Square Root Decomposition

Jul 4, 2025

Updated 18 hours ago

5 min read

Square Root Decomposition Explained: Range Queries in O(√n)

Topic: Understanding square root decomposition — how it works, why it's useful, and a complete C++ implementation for range sum queries with point updates.

There's a gap between a brute-force O(n) range query and the O(log n) power of a segment tree. Square root decomposition sits right in the middle at O(√n) — and for many problems, that's exactly the right tradeoff. It's simpler to implement than a segment tree, handles a broader class of problems without customization, and is fast enough for most competitive programming constraints up to around 10⁵ elements.

This guide explains how it works from scratch, then walks through a complete C++ implementation.

The Core Idea

Take your array and divide it into blocks of size √n. For an array of 16 elements, that's blocks of size 4. For 100 elements, blocks of size 10.

For each block, precompute and store a summary value — in this case the sum of all elements in that block. Now you have two layers of data: the original array and a smaller array of block summaries.

Square root decomposition example showing array divided into blocks with precomputed sums for efficient range queries

The diagram above shows exactly this layout. The original array sits at the bottom. Above it, each block of √n elements is collapsed into a single precomputed sum. When a query arrives, you use the block sums for fully covered blocks and walk the raw array only for the partial blocks at the edges.

How Range Queries Work

When you query range [l, r], three cases appear:

Case 1 — Same block. If l and r fall in the same block, there's no shortcut. Loop through every element from l to r individually. At most √n iterations.

Case 2 — Spanning multiple blocks. This is where the structure pays off:

Left partial block — from l to the end of its block, walk element by element.
Complete middle blocks — add the precomputed block sum directly. O(1) per block.
Right partial block — from the start of the last block to r, walk element by element.

Total cost: at most √n elements on the left edge + at most √n elements on the right edge + at most √n block sums in the middle = O(√n).

How Point Updates Work

When you update index i to a new value:

Find which block i belongs to: blockIndex = i / blockSize
Adjust the block sum: blocks[blockIndex] += newValue - oldValue
Update the raw array: nums[i] = newValue

That's it — O(1) per update. The block sum stays correct because you're adding exactly the difference.

Complete C++ Implementation

cpp

class SquareRootDecomposition {
    vector<int> nums, blocks;
    int blockSize;

public:
    SquareRootDecomposition(vector<int>& nums) {
        this->nums = nums;
        blockSize = sqrt(nums.size()) + 1;
        blocks.resize(blockSize, 0);

        for (int i = 0; i < nums.size(); i++) {
            blocks[i / blockSize] += nums[i];
        }
    }

    void update(int index, int value) {
        int blockIndex = index / blockSize;
        blocks[blockIndex] += value - nums[index]; // adjust block sum by the difference
        nums[index] = value;
    }

    int query(int left, int right) {
        int sum = 0;
        int startBlock = left / blockSize;
        int endBlock   = right / blockSize;

        // Case 1: both endpoints in the same block
        if (startBlock == endBlock) {
            for (int i = left; i <= right; i++)
                sum += nums[i];
            return sum;
        }

        // Case 2: left partial block
        for (int i = left; i < (startBlock + 1) * blockSize; i++)
            sum += nums[i];

        // Case 2: complete middle blocks (use precomputed sums)
        for (int i = startBlock + 1; i < endBlock; i++)
            sum += blocks[i];

        // Case 2: right partial block
        for (int i = endBlock * blockSize; i <= right; i++)
            sum += nums[i];

        return sum;
    }
};

Walking Through the Code

Constructor — sets blockSize to √n + 1 (the +1 ensures the last block is never missed for non-perfect-square sizes). Then iterates through the array once, accumulating each element into the correct block. O(n) build time.

update(index, value) — instead of recomputing the whole block sum, it adjusts by the delta: newValue - oldValue. If the old value was 3 and the new value is 10, the block sum increases by exactly 7. Clean and O(1).

query(left, right) — handles the same-block edge case first, then does the three-part traversal for cross-block ranges. The middle loop uses blocks[i] directly — no inner loop, just one addition per complete block.

Complexity Summary

Operation	Time Complexity
Build	O(n)
Point update	O(1)
Range query	O(√n)
Space	O(n) for array + O(√n) for blocks

Square Root Decomposition vs Segment Tree

	Square Root Decomposition	Segment Tree
Build time	O(n)	O(n)
Query time	O(√n)	O(log n)
Update time	O(1) point, O(√n) range	O(log n)
Implementation	Simple — ~30 lines	Moderate — ~60–80 lines
Flexibility	Very high — works on almost anything	High — requires custom node design
Best for	Mo's algorithm, offline queries, unusual aggregates	Online queries, strict time limits

The key advantage of square root decomposition is flexibility. Any operation that's hard to merge across a segment tree (like "median of range", "sort a subarray") is often easy to handle with block decomposition because you're always working with small chunks of raw data directly.

When to Choose Square Root Decomposition Over a Segment Tree

The problem involves offline queries — pair it with Mo's Algorithm for O((n + q)√n) total.
The aggregate function is hard to merge — medians, modes, and sorted order don't compose cleanly in a segment tree but work fine with block decomposition.
You need a quick implementation in a contest — ~30 lines versus ~70 for a lazy segment tree.
Constraints are around n, q ≤ 10⁵ — O(√n) ≈ 316 operations per query, which comfortably fits in time limits.

For tighter constraints (n, q ≤ 10⁶) or when lazy propagation is needed, a Segment Tree is the better choice.

Common Mistakes to Avoid

Forgetting the +1 in blockSize — sqrt(n) for a non-perfect square like n=10 gives 3.16, which truncates to 3. With blockSize=3, the last element at index 9 belongs to block 3 but blocks only has 3 entries (indices 0,1,2). The +1 prevents this out-of-bounds bug.
Recomputing the whole block sum on update — the delta trick (value - nums[index]) is O(1). Recomputing from scratch is O(√n) for no gain.
Off-by-one in the right partial block — the right partial block starts at endBlock * blockSize, not right. Starting at right skips elements.

For more on block-based algorithms, CP-Algorithms has a detailed guide on sqrt decomposition covering the range update variant and Mo's algorithm integration.

Also Explore These Topics

Written by Abhijeet Singh Rajput · Published on Notehub