ReLU → Max Pool Solved Numericals

Section 01

The Full Pipeline — Every CNN Block in Order

📖 The Assembly Line

Raw Material → Stamping → Inspection → Packaging

Imagine a factory assembly line. Raw steel sheets come in (the input image). A stamping press shapes them with a mould (the convolution kernel): every region gets pressed, producing a new shaped sheet (the feature map). A quality inspector then scraps every piece that scores below zero and leaves an empty slot in its place (the ReLU, which turns every negative value into 0). Finally, a packing machine groups every 2×2 batch of pieces and keeps only the best one (the max pool). What arrives at the warehouse is a compact, high-quality summary of the original sheet.

That is the order of operations in the classic CNN block used in both numericals on this page: convolution, then ReLU, then max pooling.

Input (5×5 image) → Conv 2D (3×3 kernel) → Feature Map (3×3) → ReLU (max(0, x)) → Max Pool (2×2, S=1) → Output (2×2)

📋 What You Will Compute By Hand

Two complete numericals. Each one goes through Step 1 — Convolution (every dot product, position by position), Step 2 — ReLU (zero out every negative), and Step 3 — Max Pooling (slide a 2×2 window and keep the maximum). Nothing skipped, nothing assumed.

Section 02

Numerical 1 — Full Pipeline: Conv → ReLU → Max Pool

Given: A 5×5 input image and a 3×3 kernel. No padding. Stride 1 for both conv and pool (2×2 pool, stride 1).

🖼 Input Image (5×5)

[1, 2, 3, 0, 1]
[4, 5, 6, 1, 2]
[7, 8, 9, 0, 3]
[2, 1, 0, 4, 5]
[6, 3, 2, 1, 0]

⚙ Kernel (3×3)

[1, 0, −1]
[1, 0, −1]
[1, 0, −1]

🔎 What This Kernel Does

This kernel has +1 in the left column, 0 in the middle, −1 in the right column. It subtracts the right side from the left side of every 3×3 patch — a classic vertical edge detector. Bright on the left, dark on the right → large positive output.
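
A quick way to see the sign convention is to apply the kernel to a few patches in isolation. The sketch uses NumPy. The three patches are hypothetical values chosen for illustration, not taken from the input image, and `response` is an illustrative helper name.

```python
import numpy as np

# Vertical edge kernel from Numerical 1: +1 left column, 0 middle, -1 right column
k1 = np.array([[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]])

# Hypothetical patches (illustration only, not from the 5x5 input)
bright_left = np.array([[9, 5, 0],
                        [9, 5, 0],
                        [9, 5, 0]])
dark_left = bright_left[:, ::-1]   # mirrored: dark on the left, bright on the right
flat = np.full((3, 3), 7)          # uniform patch, no edge at all


def response(patch, kernel):
    """Element-wise multiply, then sum: one convolution output value."""
    return int(np.sum(patch * kernel))


print(response(bright_left, k1))  # 27  (left column sum 27 minus right column sum 0)
print(response(dark_left, k1))    # -27
print(response(flat, k1))         # 0
```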

📐 Output Size after Convolution

Formula

O = ⌊(N − F + 2P) / S⌋ + 1 = ⌊(5 − 3 + 0) / 1⌋ + 1 = 3 → Feature map is 3×3

After Pool

O = ⌊(3 − 2) / 1⌋ + 1 = 2 → Final output is 2×2
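
The size formula can be turned into a one-line helper and chained across stages. This is a minimal sketch that assumes square inputs and kernels, so one dimension is enough. `out_size` is a hypothetical name. Pooling is treated as a window of size F with no padding.

```python
def out_size(n, f, p=0, s=1):
    """O = floor((N - F + 2P) / S) + 1 for one spatial dimension."""
    return (n - f + 2 * p) // s + 1


# Numerical 1: 5x5 input, 3x3 conv (S=1, P=0), then 2x2 pool (S=1)
conv1 = out_size(5, 3)            # 3
pool1 = out_size(conv1, 2, s=1)   # 2

# Numerical 2: 4x4 input, 3x3 conv (S=1, P=0), then 2x2 pool (S=2)
conv2 = out_size(4, 3)            # 2
pool2 = out_size(conv2, 2, s=2)   # 1

print(conv1, pool1, conv2, pool2)  # 3 2 2 1
```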

① Step 1 — Convolution (9 dot products)

The 3×3 kernel slides across the 5×5 input with stride 1. Every position produces one value. Here are all 9:

Position [0,0] — top-left patch

Patch [1,2,3 / 4,5,6 / 7,8,9] ⊙ Kernel

(1×1)+(2×0)+(3×−1) + (4×1)+(5×0)+(6×−1) + (7×1)+(8×0)+(9×−1)
= (1+0−3) + (4+0−6) + (7+0−9)
= −2 + (−2) + (−2) = −6 → FM[0,0] = −6
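
Here is the same dot product in NumPy, using the `inp1` and `k1` arrays from the Section 05 script. The sketch also contrasts the correct element-wise multiply-and-sum with matrix multiplication (`@`), a common mistake. `@` returns a 3×3 matrix rather than one number.

```python
import numpy as np

inp1 = np.array([[1, 2, 3, 0, 1],
                 [4, 5, 6, 1, 2],
                 [7, 8, 9, 0, 3],
                 [2, 1, 0, 4, 5],
                 [6, 3, 2, 1, 0]])
k1 = np.array([[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]])

i, j = 0, 0                     # output position [0,0]
patch = inp1[i:i+3, j:j+3]      # rows 0-2, cols 0-2

# Correct: element-wise product, then sum all 9 terms -> one scalar
fm_00 = np.sum(patch * k1)
print(fm_00)                                # -6

# The same number as a dot product of the flattened arrays
print(np.dot(patch.ravel(), k1.ravel()))    # -6

# Wrong: matrix multiplication gives a 3x3 matrix, not a single value
print((patch @ k1).shape)                   # (3, 3)
```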

Position [0,1] — shift right by 1

Patch [2,3,0 / 5,6,1 / 8,9,0] ⊙ Kernel

(2×1)+(3×0)+(0×−1) + (5×1)+(6×0)+(1×−1) + (8×1)+(9×0)+(0×−1)
= (2+0+0) + (5+0−1) + (8+0+0)
= 2 + 4 + 8 = 14 → FM[0,1] = 14

Position [0,2] — shift right again

Patch [3,0,1 / 6,1,2 / 9,0,3] ⊙ Kernel

(3×1)+(0×0)+(1×−1) + (6×1)+(1×0)+(2×−1) + (9×1)+(0×0)+(3×−1)
= (3+0−1) + (6+0−2) + (9+0−3)
= 2 + 4 + 6 = 12 → FM[0,2] = 12

Position [1,0] — move down to row 1

Patch [4,5,6 / 7,8,9 / 2,1,0] ⊙ Kernel

(4×1)+(5×0)+(6×−1) + (7×1)+(8×0)+(9×−1) + (2×1)+(1×0)+(0×−1)
= (4+0−6) + (7+0−9) + (2+0+0)
= −2 + (−2) + 2 = −2 → FM[1,0] = −2

Position [1,1] — centre of feature map

Patch [5,6,1 / 8,9,0 / 1,0,4] ⊙ Kernel

(5×1)+(6×0)+(1×−1) + (8×1)+(9×0)+(0×−1) + (1×1)+(0×0)+(4×−1)
= (5+0−1) + (8+0+0) + (1+0−4)
= 4 + 8 + (−3) = 9 → FM[1,1] = 9

Position [1,2]

Patch [6,1,2 / 9,0,3 / 0,4,5] ⊙ Kernel

(6×1)+(1×0)+(2×−1) + (9×1)+(0×0)+(3×−1) + (0×1)+(4×0)+(5×−1)
= (6+0−2) + (9+0−3) + (0+0−5)
= 4 + 6 + (−5) = 5 → FM[1,2] = 5

Position [2,0] — bottom row, left

Patch [7,8,9 / 2,1,0 / 6,3,2] ⊙ Kernel

(7×1)+(8×0)+(9×−1) + (2×1)+(1×0)+(0×−1) + (6×1)+(3×0)+(2×−1)
= (7+0−9) + (2+0+0) + (6+0−2)
= −2 + 2 + 4 = 4 → FM[2,0] = 4

Position [2,1]

Patch [8,9,0 / 1,0,4 / 3,2,1] ⊙ Kernel

(8×1)+(9×0)+(0×−1) + (1×1)+(0×0)+(4×−1) + (3×1)+(2×0)+(1×−1)
= (8+0+0) + (1+0−4) + (3+0−1)
= 8 + (−3) + 2 = 7 → FM[2,1] = 7

Position [2,2] — bottom-right patch

Patch [9,0,3 / 0,4,5 / 2,1,0] ⊙ Kernel

(9×1)+(0×0)+(3×−1) + (0×1)+(4×0)+(5×−1) + (2×1)+(1×0)+(0×−1)
= (9+0−3) + (0+0−5) + (2+0+0)
= 6 + (−5) + 2 = 3 → FM[2,2] = 3
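
Instead of writing out nine sums, you can extract every patch at once with `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later). Each patch is then contracted against the kernel with `np.einsum`. This sketches a vectorised alternative to the loop-based `conv2d` in Section 05; it is not the document's own implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

inp1 = np.array([[1, 2, 3, 0, 1],
                 [4, 5, 6, 1, 2],
                 [7, 8, 9, 0, 3],
                 [2, 1, 0, 4, 5],
                 [6, 3, 2, 1, 0]])
k1 = np.array([[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]])

# Every 3x3 patch at once: shape (3, 3, 3, 3) = (out_row, out_col, k_row, k_col)
windows = sliding_window_view(inp1, (3, 3))

# Multiply each patch by the kernel and sum over the two kernel axes
fm1 = np.einsum("ijkl,kl->ij", windows, k1)
print(fm1)
# [[-6 14 12]
#  [-2  9  5]
#  [ 4  7  3]]
```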

📈 Feature Map after Convolution (3×3)

[−6, 14, 12]
[−2, 9, 5]
[4, 7, 3]

② Step 2 — ReLU Activation: max(0, x)

Apply ReLU element-wise. Every negative value becomes 0. Every positive value stays unchanged.

Position	Conv Output	ReLU Rule	Result
[0,0]	−6	→ max(0, −6)	0
[0,1]	14	→ max(0, 14)	14
[0,2]	12	→ max(0, 12)	12
[1,0]	−2	→ max(0, −2)	0
[1,1]	9	→ max(0, 9)	9
[1,2]	5	→ max(0, 5)	5
[2,0]	4	→ max(0, 4)	4
[2,1]	7	→ max(0, 7)	7
[2,2]	3	→ max(0, 3)	3
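
In NumPy, ReLU is a single element-wise call. This sketch hard-codes the feature map computed above and counts how many entries were negative. The variable `killed` is just an illustrative name.

```python
import numpy as np

fm1 = np.array([[-6, 14, 12],
                [-2,  9,  5],
                [ 4,  7,  3]])

r1 = np.maximum(0, fm1)          # element-wise max(0, x)
r1_alt = fm1 * (fm1 > 0)         # equivalent: keep positives, zero the rest
killed = int(np.sum(fm1 < 0))    # how many values ReLU set to 0

print(r1)
# [[ 0 14 12]
#  [ 0  9  5]
#  [ 4  7  3]]
print(killed)                    # 2
```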

⚡ After ReLU (3×3)

[0, 14, 12]
[0, 9, 5]
[4, 7, 3]

③ Step 3 — Max Pooling: 2×2 window, Stride 1

Output size: O = ⌊(3 − 2)/1⌋ + 1 = 2 → 2×2 output. Slide the 2×2 window over the ReLU map:

Window [0:2, 0:2] → Out[0,0]

max(0, 14, 0, 9) = 14

Window [0:2, 1:3] → Out[0,1]

max(14, 12, 9, 5) = 14

Window [1:3, 0:2] → Out[1,0]

max(0, 9, 4, 7) = 9

Window [1:3, 1:3] → Out[1,1]

max(9, 5, 7, 3) = 9
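
Overlapping max pooling can reuse the sliding-window idea: take every 2×2 window, keep every `stride`-th one, and reduce with `max`. This sketch assumes NumPy 1.20 or later. `max_pool_windows` is a hypothetical helper, not the `max_pool` function from Section 05. Both top windows contain position [0,1], so the value 14 lands in both top cells.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

r1 = np.array([[0, 14, 12],
               [0,  9,  5],
               [4,  7,  3]])


def max_pool_windows(x, pool=2, stride=1):
    """2-D max pooling: build all pool x pool windows, keep every stride-th one, take each max."""
    windows = sliding_window_view(x, (pool, pool))[::stride, ::stride]
    return windows.max(axis=(2, 3))


p1 = max_pool_windows(r1, pool=2, stride=1)
print(p1)
# [[14 14]
#  [ 9  9]]
```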

🏆 Final Output after Max Pool (2×2)

[14, 14]
[9, 9]

🎯 Numerical 1 — Full Summary

Input 5×5 → Conv (3×3 kernel, S=1, P=0) → Feature Map 3×3 [−6,14,12 / −2,9,5 / 4,7,3] → ReLU [0,14,12 / 0,9,5 / 4,7,3] → MaxPool (2×2, S=1) → Final [[14,14],[9,9]]. ReLU zeroed the two negatives (−6 and −2). With stride 1 the two top pooling windows overlap and both contain the 14, so the strongest edge response fills both top cells.

Section 03

Numerical 2 — Different Kernel, Stride 2 Pool

Given: A 4×4 input image and a 3×3 sharpening kernel. No padding. Stride 1 conv, then 2×2 MaxPool with stride 2 (non-overlapping).

Input (4×4) → Conv 3×3 (S=1, P=0) → Feature Map (2×2) → ReLU (max(0, x)) → MaxPool 2×2 (Stride 2) → Output (1×1)

🖼 Input Image (4×4)

[2, 4, 1, 3]
[5, 8, 2, 6]
[1, 3, 7, 4]
[0, 2, 5, 9]

⚙ Kernel (3×3) — Sharpening

[0, −1, 0]
[−1, 5, −1]
[0, −1, 0]

📐 Output Sizes

After Conv

O = ⌊(4 − 3 + 0) / 1⌋ + 1 = 2 → Feature map is 2×2

After Pool

O = ⌊(2 − 2) / 2⌋ + 1 = 1 → Final output is 1×1 (a single number!)

① Step 1 — Convolution (4 dot products on the 4×4 input)

Position [0,0]

Patch [2,4,1 / 5,8,2 / 1,3,7] ⊙ Kernel

(2×0)+(4×−1)+(1×0) + (5×−1)+(8×5)+(2×−1) + (1×0)+(3×−1)+(7×0)
= (0−4+0) + (−5+40−2) + (0−3+0)
= −4 + 33 + (−3) = 26 → FM[0,0] = 26

Position [0,1]

Patch [4,1,3 / 8,2,6 / 3,7,4] ⊙ Kernel

(4×0)+(1×−1)+(3×0) + (8×−1)+(2×5)+(6×−1) + (3×0)+(7×−1)+(4×0)
= (0−1+0) + (−8+10−6) + (0−7+0)
= −1 + (−4) + (−7) = −12 → FM[0,1] = −12

Position [1,0]

Patch [5,8,2 / 1,3,7 / 0,2,5] ⊙ Kernel

(5×0)+(8×−1)+(2×0) + (1×−1)+(3×5)+(7×−1) + (0×0)+(2×−1)+(5×0)
= (0−8+0) + (−1+15−7) + (0−2+0)
= −8 + 7 + (−2) = −3 → FM[1,0] = −3

Position [1,1]

Patch [8,2,6 / 3,7,4 / 2,5,9] ⊙ Kernel

(8×0)+(2×−1)+(6×0) + (3×−1)+(7×5)+(4×−1) + (2×0)+(5×−1)+(9×0)
= (0−2+0) + (−3+35−4) + (0−5+0)
= −2 + 28 + (−5) = 21 → FM[1,1] = 21

📈 Feature Map after Convolution (2×2)

[26, −12]
[−3, 21]

② Step 2 — ReLU Activation

Position	Conv Output	ReLU Rule	Result
[0,0]	26	→ max(0, 26)	26
[0,1]	−12	→ max(0, −12)	0
[1,0]	−3	→ max(0, −3)	0
[1,1]	21	→ max(0, 21)	21

⚡ After ReLU (2×2)

[26, 0]
[0, 21]

③ Step 3 — Max Pooling: 2×2 window, Stride 2

Output size: O = ⌊(2 − 2)/2⌋ + 1 = 1 → single scalar output. Only one window — it covers the entire 2×2 ReLU map:

Window [0:2, 0:2] — the entire ReLU map → Out[0,0]

max(26, 0, 0, 21) = 26
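
When the stride equals the pool size, the windows tile the map without overlap. Pooling then reduces to a reshape followed by a max over the within-block axes. This minimal sketch assumes the height and width are divisible by the pool size; otherwise the reshape raises an error. `max_pool_nonoverlap` is an illustrative name, and the `np.arange(16)` input is an extra test case, not part of Numerical 2.

```python
import numpy as np

r2 = np.array([[26,  0],
               [ 0, 21]])


def max_pool_nonoverlap(x, pool=2):
    """Stride == pool size: split x into disjoint pool x pool blocks, take each block's max.
    Assumes both dimensions of x are divisible by pool."""
    h, w = x.shape
    blocks = x.reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))


p2 = max_pool_nonoverlap(r2)
print(p2)  # [[26]]

# A larger map makes the block structure visible
print(max_pool_nonoverlap(np.arange(16).reshape(4, 4)))
# [[ 5  7]
#  [13 15]]
```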

🏆 Final Output after Max Pool (1×1)

[26]

🎯 Numerical 2 — Full Summary

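Input 4×4 → Conv (sharpening 3×3, S=1, P=0) → Feature Map 2×2 [26, −12 / −3, 21] → ReLU → [26, 0 / 0, 21] → MaxPool (2×2, S=2) → single value 26. The sharpening kernel computes 5 × centre minus the four edge-adjacent neighbours, so it responds strongly where the centre pixel stands out. The patches centred on 8 and on 7 (neighbour sums 14 in both cases) give 26 and 21. The patches centred on 2 and 3, which are noticeably darker than their neighbours, go negative. ReLU removed those two negative responses, and max pooling selected the strongest, 26.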
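
The corners of the sharpening kernel are 0, so each output is 5 × centre minus the sum of the four edge-adjacent neighbours. Equivalently, it is centre + 4 × (centre − mean of neighbours): a pixel brighter than its neighbours is pushed up and a darker one is pushed down. This small sketch recomputes the feature map that way. `sharpen_at` is a hypothetical helper.

```python
import numpy as np

inp2 = np.array([[2, 4, 1, 3],
                 [5, 8, 2, 6],
                 [1, 3, 7, 4],
                 [0, 2, 5, 9]])


def sharpen_at(x, i, j):
    """Sharpening response for the 3x3 patch whose top-left corner is (i, j):
    5 * centre minus the four edge-adjacent neighbours (corner weights are 0)."""
    ci, cj = i + 1, j + 1  # centre pixel of the patch
    centre = x[ci, cj]
    neighbours = x[ci - 1, cj] + x[ci + 1, cj] + x[ci, cj - 1] + x[ci, cj + 1]
    return int(5 * centre - neighbours)


fm2 = [[sharpen_at(inp2, i, j) for j in range(2)] for i in range(2)]
print(fm2)  # [[26, -12], [-3, 21]]
```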

Section 04

Side-by-Side Pipeline Summary

Stage	Numerical 1 (5×5 input)	Numerical 2 (4×4 input)
Input	5×5 = 25 values	4×4 = 16 values
Kernel	3×3 vertical edge detector `[1,0,−1 / 1,0,−1 / 1,0,−1]`	3×3 sharpening `[0,−1,0 / −1,5,−1 / 0,−1,0]`
After Conv	3×3 feature map `[−6,14,12 / −2,9,5 / 4,7,3]`	2×2 feature map `[26,−12 / −3,21]`
Negatives	2 values (−6, −2)	2 values (−12, −3)
After ReLU	`[0,14,12 / 0,9,5 / 4,7,3]`	`[26,0 / 0,21]`
Pool Config	2×2, Stride 1 → overlapping	2×2, Stride 2 → non-overlapping
Final Output	`[[14,14],[9,9]]` — 2×2	`[[26]]` — single scalar

Section 05

Python — Verify Both Pipelines

import numpy as np

# ── Convolution (cross-correlation, no flip) ──────────────────
def conv2d(x, k):
    """No padding, stride 1."""
    KH, KW = k.shape
    OH, OW = x.shape[0]-KH+1, x.shape[1]-KW+1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            out[i, j] = np.sum(x[i:i+KH, j:j+KW] * k)
    return out

# ── ReLU ──────────────────────────────────────────────────────
def relu(x):
    return np.maximum(0, x)

# ── Max Pool ──────────────────────────────────────────────────
def max_pool(x, pool=2, stride=2):
    OH = (x.shape[0] - pool) // stride + 1
    OW = (x.shape[1] - pool) // stride + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            out[i, j] = x[i*stride:i*stride+pool, j*stride:j*stride+pool].max()
    return out

# ═══════════════════════════════════════════════
# NUMERICAL 1 — 5×5 input, vertical edge kernel
# ═══════════════════════════════════════════════
inp1 = np.array([
    [1,2,3,0,1],
    [4,5,6,1,2],
    [7,8,9,0,3],
    [2,1,0,4,5],
    [6,3,2,1,0]
])
k1 = np.array([[1,0,-1],[1,0,-1],[1,0,-1]])

fm1 = conv2d(inp1, k1)
r1  = relu(fm1)
p1  = max_pool(r1, pool=2, stride=1)

print("N1 Feature Map:\n", fm1)
print("N1 After ReLU:\n",  r1)
print("N1 Max Pool output:\n", p1)

# ═══════════════════════════════════════════════
# NUMERICAL 2 — 4×4 input, sharpening kernel
# ═══════════════════════════════════════════════
inp2 = np.array([
    [2,4,1,3],
    [5,8,2,6],
    [1,3,7,4],
    [0,2,5,9]
])
k2 = np.array([[0,-1,0],[-1,5,-1],[0,-1,0]])

fm2 = conv2d(inp2, k2)
r2  = relu(fm2)
p2  = max_pool(r2, pool=2, stride=2)

print("N2 Feature Map:\n", fm2)
print("N2 After ReLU:\n",  r2)
print("N2 Max Pool output:\n", p2)

OUTPUT

N1 Feature Map:
 [[-6. 14. 12.]
 [-2.  9.  5.]
 [ 4.  7.  3.]]
N1 After ReLU:
 [[ 0. 14. 12.]
 [ 0.  9.  5.]
 [ 4.  7.  3.]]
N1 Max Pool output:
 [[14. 14.]
 [ 9.  9.]]
N2 Feature Map:
 [[ 26. -12.]
 [ -3.  21.]]
N2 After ReLU:
 [[26.  0.]
 [ 0. 21.]]
N2 Max Pool output:
 [[26.]]
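
The script's `conv2d` is commented as cross-correlation (no flip). Textbook convolution flips the kernel on both axes first. CNN layers in frameworks such as PyTorch compute cross-correlation, which is why a kernel like the one in Numerical 1 can be read without flipping. This sketch shows what the flip would change for the two kernels here. `true_conv2d` is a hypothetical helper name.

```python
import numpy as np


def conv2d(x, k):
    """Cross-correlation (no kernel flip), no padding, stride 1, as in the script above."""
    KH, KW = k.shape
    OH, OW = x.shape[0] - KH + 1, x.shape[1] - KW + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            out[i, j] = np.sum(x[i:i + KH, j:j + KW] * k)
    return out


def true_conv2d(x, k):
    """Textbook convolution: flip the kernel on both axes, then cross-correlate."""
    return conv2d(x, np.flip(k))


inp1 = np.array([[1, 2, 3, 0, 1], [4, 5, 6, 1, 2], [7, 8, 9, 0, 3],
                 [2, 1, 0, 4, 5], [6, 3, 2, 1, 0]])
inp2 = np.array([[2, 4, 1, 3], [5, 8, 2, 6], [1, 3, 7, 4], [0, 2, 5, 9]])
k1 = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
k2 = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])

# Flipping the vertical-edge kernel turns it into -k1, so every output changes sign
print(np.array_equal(true_conv2d(inp1, k1), -conv2d(inp1, k1)))  # True

# The sharpening kernel is symmetric, so flipping it changes nothing
print(np.array_equal(true_conv2d(inp2, k2), conv2d(inp2, k2)))  # True
```

For the vertical-edge kernel, the flip swaps which side is subtracted. For the symmetric sharpening kernel, the two operations coincide.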

Section 06

Golden Rules — The Three-Step Sequence

⚡ Conv → ReLU → MaxPool — What Every Student Must Internalise

Always compute output size before you start. O = ⌊(N − F + 2P)/S⌋ + 1. Know your dimensions at every stage — an error here means all subsequent numbers are wrong.

Each convolution output is a dot product, not a matrix multiplication. Multiply the patch and the kernel element-wise, then sum all the products (9 for a 3×3 kernel, 4 for 2×2, 25 for 5×5) to get one number. Do not multiply rows by columns as in matrix multiplication.

ReLU is trivial but critical. Every negative → 0, every positive stays. It is the non-linearity that lets the network learn non-linear decision boundaries. Without it, stacking convolutions is just one big linear transform.

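In both numericals, max pooling is applied to the ReLU output, in the order Conv → ReLU → Pool. For max pooling, swapping ReLU and Pool gives identical numbers, because ReLU is monotone and never reorders values: max(0, max(a, b, c, d)) = max(max(0, a), max(0, b), max(0, c), max(0, d)). Pooling first does not let negatives propagate, since the ReLU afterwards still zeroes them, and it needs fewer ReLU evaluations. The order does matter for average pooling. For a window [−4, 2, 1, 1], ReLU then average gives 1, while average then ReLU gives 0.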

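Pool stride controls spatial compression. With a 2×2 window, stride 1 shrinks each dimension by only one (N → N − 1). The windows overlap, so one strong activation can appear in several outputs, as 14 does in Numerical 1. Stride 2 roughly halves each dimension (N → ⌊(N − 2)/2⌋ + 1, which is N/2 for even N). When the stride equals the pool size, the windows do not overlap and every input value falls in at most one window.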
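
Two of the rules above can be checked numerically: whether swapping ReLU and pooling changes the result, and whether stacked convolutions without ReLU collapse into one. This sketch assumes NumPy 1.20 or later. `pool2d`, `corr2d`, `full_conv2d` and the 2×2 kernel `k_extra` are hypothetical names and values chosen only for these checks.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20


def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)


def pool2d(x, size=2, stride=1, reduce=np.max):
    """2-D pooling: apply `reduce` over each size x size window, keeping every stride-th one."""
    windows = sliding_window_view(x, (size, size))[::stride, ::stride]
    return reduce(windows, axis=(2, 3))


def corr2d(x, k):
    """Valid cross-correlation, stride 1 (what the CNN 'convolution' computes on this page)."""
    windows = sliding_window_view(x, k.shape)
    return np.einsum("ijkl,kl->ij", windows, k)


def full_conv2d(a, b):
    """Full 2-D convolution of two kernels: out[u, v] = sum over p, q of a[p, q] * b[u-p, v-q]."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1))
    for p in range(a.shape[0]):
        for q in range(a.shape[1]):
            out[p:p + b.shape[0], q:q + b.shape[1]] += a[p, q] * b
    return out


fm1 = np.array([[-6, 14, 12],
                [-2,  9,  5],
                [ 4,  7,  3]])

# Check 1: with MAX pooling, ReLU -> Pool and Pool -> ReLU agree
relu_then_pool = pool2d(relu(fm1))
pool_then_relu = relu(pool2d(fm1))
print(np.array_equal(relu_then_pool, pool_then_relu))  # True

# Check 2: with AVERAGE pooling the order matters
x = np.array([[-4.0, 2.0],
              [ 1.0, 1.0]])
print(pool2d(relu(x), reduce=np.mean))  # [[1.]]
print(relu(pool2d(x, reduce=np.mean)))  # [[0.]]

# Check 3: without ReLU, two stacked convolutions equal ONE convolution
inp1 = np.array([[1, 2, 3, 0, 1], [4, 5, 6, 1, 2], [7, 8, 9, 0, 3],
                 [2, 1, 0, 4, 5], [6, 3, 2, 1, 0]])
k1 = np.array([[1, 0, -1]] * 3)
k_extra = np.array([[1, 2], [3, 4]])    # hypothetical second-layer kernel
stacked = corr2d(corr2d(inp1, k1), k_extra)
merged = corr2d(inp1, full_conv2d(k1, k_extra))
print(np.allclose(stacked, merged))     # True

# Inserting ReLU between the two layers breaks that equivalence for this input
with_relu = corr2d(relu(corr2d(inp1, k1)), k_extra)
print(np.allclose(with_relu, merged))   # False
```

Check 3 works because cross-correlating twice is the same as cross-correlating once with the full convolution of the two kernels, here a 4×4 kernel. The ReLU in between is what stops the two layers from merging into one linear map.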