The Full Pipeline — Every CNN Block in Order
That is exactly what happens in one CNN block — in that order, every time.
Two complete numericals. Each one goes through Step 1 — Convolution (every dot product, position by position), Step 2 — ReLU (zero out every negative), and Step 3 — Max Pooling (slide a 2×2 window and keep the maximum). Nothing skipped, nothing assumed.
Numerical 1 — Full Pipeline: Conv → ReLU → Max Pool
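Given: A 5×5 input image and a 3×3 kernel. No padding. Stride 1 for both conv and pool (2×2 pool, stride 1).
Input = [1,2,3,0,1 / 4,5,6,1,2 / 7,8,9,0,3 / 2,1,0,4,5 / 6,3,2,1,0]
Kernel = [1,0,−1 / 1,0,−1 / 1,0,−1]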
This kernel has +1 in the left column, 0 in the middle, −1 in the right column. It subtracts the right side from the left side of every 3×3 patch — a classic vertical edge detector. Bright on the left, dark on the right → large positive output.
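To make the "left minus right" behaviour concrete, here is a minimal sketch that applies the kernel to three hand-made 3×3 patches. The patch values and the helper name `response` are illustrative assumptions and are not part of the worked example.

```python
# Vertical edge kernel from the text: +1 left column, 0 middle, -1 right column.
KERNEL = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

def response(patch, kernel=KERNEL):
    """Dot product of a 3x3 patch with the kernel (hypothetical helper)."""
    return sum(p * k
               for patch_row, kernel_row in zip(patch, kernel)
               for p, k in zip(patch_row, kernel_row))

# Made-up patches, chosen only to illustrate the sign of the response.
bright_left = [[9, 5, 0]] * 3    # bright on the left, dark on the right
bright_right = [[0, 5, 9]] * 3   # the mirror image
flat = [[4, 4, 4]] * 3           # no edge at all

print(response(bright_left))    # 27
print(response(bright_right))   # -27
print(response(flat))           # 0
```

The mirrored patch gives the same magnitude with the opposite sign, which is why ReLU later keeps only edges of one polarity.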
① Step 1 — Convolution (9 dot products)
The 3×3 kernel slides across the 5×5 input with stride 1. Every position produces one value. Here are all 9:
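Position [0,0] (rows 0–2, cols 0–2):
= (1+0−3) + (4+0−6) + (7+0−9)
= −2 + (−2) + (−2) = −6 → FM[0,0] = −6
Position [0,1] (rows 0–2, cols 1–3):
= (2+0+0) + (5+0−1) + (8+0+0)
= 2 + 4 + 8 = 14 → FM[0,1] = 14
Position [0,2] (rows 0–2, cols 2–4):
= (3+0−1) + (6+0−2) + (9+0−3)
= 2 + 4 + 6 = 12 → FM[0,2] = 12
Position [1,0] (rows 1–3, cols 0–2):
= (4+0−6) + (7+0−9) + (2+0+0)
= −2 + (−2) + 2 = −2 → FM[1,0] = −2
Position [1,1] (rows 1–3, cols 1–3):
= (5+0−1) + (8+0+0) + (1+0−4)
= 4 + 8 + (−3) = 9 → FM[1,1] = 9
Position [1,2] (rows 1–3, cols 2–4):
= (6+0−2) + (9+0−3) + (0+0−5)
= 4 + 6 + (−5) = 5 → FM[1,2] = 5
Position [2,0] (rows 2–4, cols 0–2):
= (7+0−9) + (2+0+0) + (6+0−2)
= −2 + 2 + 4 = 4 → FM[2,0] = 4
Position [2,1] (rows 2–4, cols 1–3):
= (8+0+0) + (1+0−4) + (3+0−1)
= 8 + (−3) + 2 = 7 → FM[2,1] = 7
Position [2,2] (rows 2–4, cols 2–4):
= (9+0−3) + (0+0−5) + (2+0+0)
= 6 + (−5) + 2 = 3 → FM[2,2] = 3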
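The nine dot products can also be computed in one shot. The sketch below is one possible vectorised version, assuming NumPy 1.20 or later for `sliding_window_view`; the 5×5 input is the `inp1` array from the verification script, and the variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

image = np.array([[1, 2, 3, 0, 1],
                  [4, 5, 6, 1, 2],
                  [7, 8, 9, 0, 3],
                  [2, 1, 0, 4, 5],
                  [6, 3, 2, 1, 0]])
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# patches[i, j] is the 3x3 window whose top-left corner sits at (i, j).
patches = sliding_window_view(image, kernel.shape)   # shape (3, 3, 3, 3)

# Multiply every patch by the kernel and sum over the two window axes.
feature_map = (patches * kernel).sum(axis=(2, 3))
print(feature_map)
```

Each entry of `feature_map` corresponds to one of the hand-computed positions, so it offers a quick cross-check of the arithmetic.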
② Step 2 — ReLU Activation: max(0, x)
Apply ReLU element-wise. Every negative value becomes 0. Every positive value stays unchanged.
| Position | Conv Output | ReLU Rule | Result |
|---|---|---|---|
| [0,0] | −6 | → max(0, −6) | 0 |
| [0,1] | 14 | → max(0, 14) | 14 |
| [0,2] | 12 | → max(0, 12) | 12 |
| [1,0] | −2 | → max(0, −2) | 0 |
| [1,1] | 9 | → max(0, 9) | 9 |
| [1,2] | 5 | → max(0, 5) | 5 |
| [2,0] | 4 | → max(0, 4) | 4 |
| [2,1] | 7 | → max(0, 7) | 7 |
| [2,2] | 3 | → max(0, 3) | 3 |
③ Step 3 — Max Pooling: 2×2 window, Stride 1
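Output size: O = ⌊(3 − 2)/1⌋ + 1 = 2 → 2×2 output. Slide the 2×2 window over the ReLU map:
Window [0,0]: max(0, 14, 0, 9) = 14
Window [0,1]: max(14, 12, 9, 5) = 14
Window [1,0]: max(0, 9, 4, 7) = 9
Window [1,1]: max(9, 5, 7, 3) = 9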
Input 5×5 → Conv (3×3 kernel, S=1, P=0)
→ Feature Map 3×3 [−6,14,12 / −2,9,5 / 4,7,3]
→ ReLU → [0,14,12 / 0,9,5 / 4,7,3]
→ MaxPool (2×2, S=1) → Final [[14,14],[9,9]].
The two negatives (−6 and −2) were killed by ReLU. Max pooling then pulled the strongest signal (14, the edge response) into both top cells.
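The overlapping windows of the stride-1 pool can be traced with a few lines of plain Python. This is a minimal sketch that starts from the ReLU map above; the variable names are illustrative.

```python
relu_map = [[0, 14, 12],
            [0,  9,  5],
            [4,  7,  3]]
pool, stride = 2, 1

# Same formula as in the text: floor((3 - 2) / 1) + 1 = 2
out_size = (len(relu_map) - pool) // stride + 1

pooled = []
for i in range(out_size):
    row = []
    for j in range(out_size):
        r, c = i * stride, j * stride          # top-left corner of the window
        window = [relu_map[r + dr][c + dc]
                  for dr in range(pool) for dc in range(pool)]
        print(f"window [{i},{j}]: {window} -> max {max(window)}")
        row.append(max(window))
    pooled.append(row)

print(pooled)   # [[14, 14], [9, 9]]
```

Because the stride is smaller than the window, the value 14 falls inside both top windows and the value 9 inside both bottom windows, which is why each appears twice in the output.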
Numerical 2 — Different Kernel, Stride 2 Pool
Given: A 4×4 input image and a 3×3 sharpening kernel. No padding. Stride 1 conv, then 2×2 MaxPool with stride 2 (non-overlapping).
Input = [2,4,1,3 / 5,8,2,6 / 1,3,7,4 / 0,2,5,9]
Kernel = [0,−1,0 / −1,5,−1 / 0,−1,0]
① Step 1 — Convolution (4 dot products on the 4×4 input)
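Position [0,0] (rows 0–2, cols 0–2):
= (0−4+0) + (−5+40−2) + (0−3+0)
= −4 + 33 + (−3) = 26 → FM[0,0] = 26
Position [0,1] (rows 0–2, cols 1–3):
= (0−1+0) + (−8+10−6) + (0−7+0)
= −1 + (−4) + (−7) = −12 → FM[0,1] = −12
Position [1,0] (rows 1–3, cols 0–2):
= (0−8+0) + (−1+15−7) + (0−2+0)
= −8 + 7 + (−2) = −3 → FM[1,0] = −3
Position [1,1] (rows 1–3, cols 1–3):
= (0−2+0) + (−3+35−4) + (0−5+0)
= −2 + 28 + (−5) = 21 → FM[1,1] = 21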
② Step 2 — ReLU Activation
| Position | Conv Output | ReLU Rule | Result |
|---|---|---|---|
| [0,0] | 26 | → max(0, 26) | 26 |
| [0,1] | −12 | → max(0, −12) | 0 |
| [1,0] | −3 | → max(0, −3) | 0 |
| [1,1] | 21 | → max(0, 21) | 21 |
③ Step 3 — Max Pooling: 2×2 window, Stride 2
Output size: O = ⌊(2 − 2)/2⌋ + 1 = 1 → single scalar output. Only one window, covering the entire 2×2 ReLU map:
max(26, 0, 0, 21) = 26
Input 4×4 → Conv (sharpening 3×3, S=1, P=0)
→ Feature Map 2×2 [26, −12, −3, 21]
→ ReLU → [26, 0, 0, 21]
→ MaxPool (2×2, S=2) → single value 26.
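The sharpening kernel amplified the two patches whose centre pixel is large relative to its four neighbours (centres 8 and 7, giving 26 and 21) and pushed the other two negative (centres 2 and 3, each surrounded by larger values). ReLU removed the two negative responses. Max pool selected the strongest: 26.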
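Since the corner weights of this kernel are zero, each response reduces to 5 × centre minus the sum of the four direct neighbours. The sketch below recomputes the feature map that way, using the 4×4 `inp2` values from the verification script; the dictionary `responses` is an illustrative name.

```python
image = [[2, 4, 1, 3],
         [5, 8, 2, 6],
         [1, 3, 7, 4],
         [0, 2, 5, 9]]

# Kernel [0,-1,0 / -1,5,-1 / 0,-1,0]:
# response = 5 * centre - (up + down + left + right); corners have weight 0.
responses = {}
for r in (1, 2):        # the only centres a 3x3 kernel can reach on a 4x4 input
    for c in (1, 2):
        centre = image[r][c]
        neighbours = (image[r - 1][c] + image[r + 1][c]
                      + image[r][c - 1] + image[r][c + 1])
        value = 5 * centre - neighbours
        responses[(r - 1, c - 1)] = value
        print(f"FM[{r - 1},{c - 1}]: 5*{centre} - {neighbours} = {value}")
```

The two positive entries come from centres 8 and 7, whose neighbours sum to only 14; the two negative entries come from centres 2 and 3, whose neighbours sum to 22 and 18.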
Side-by-Side Pipeline Summary
| Stage | Numerical 1 (5×5 input) | Numerical 2 (4×4 input) |
|---|---|---|
| Input | 5×5 = 25 values | 4×4 = 16 values |
| Kernel | 3×3 vertical edge detector [1,0,−1 / 1,0,−1 / 1,0,−1] | 3×3 sharpening [0,−1,0 / −1,5,−1 / 0,−1,0] |
| After Conv | 3×3 feature map [−6,14,12 / −2,9,5 / 4,7,3] | 2×2 feature map [26,−12 / −3,21] |
| Negatives | 2 values (−6, −2) | 2 values (−12, −3) |
| After ReLU | [0,14,12 / 0,9,5 / 4,7,3] | [26,0 / 0,21] |
| Pool Config | 2×2, Stride 1 → overlapping | 2×2, Stride 2 → non-overlapping |
| Final Output | [[14,14],[9,9]] (2×2) | [[26]] (single scalar) |
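The shapes in the table all follow from the same output-size formula. Here is a small sketch of it; the helper name `out_size` is hypothetical, and the `padding` term is the usual generalisation of the P=0 case used in both numericals.

```python
def out_size(n, k, stride=1, padding=0):
    """Output length along one axis: floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1

# Numerical 1: 5x5 input, 3x3 conv (S=1, P=0), then 2x2 pool with stride 1.
n1_conv = out_size(5, 3)
n1_pool = out_size(n1_conv, 2, stride=1)

# Numerical 2: 4x4 input, 3x3 conv (S=1, P=0), then 2x2 pool with stride 2.
n2_conv = out_size(4, 3)
n2_pool = out_size(n2_conv, 2, stride=2)

print(f"Numerical 1: 5 -> {n1_conv} -> {n1_pool}")   # 5 -> 3 -> 2
print(f"Numerical 2: 4 -> {n2_conv} -> {n2_pool}")   # 4 -> 2 -> 1
```

For comparison, pooling the 3×3 map of Numerical 1 with stride 2 would give `out_size(3, 2, stride=2) = 1`, and the last row and column would never be covered by a window.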
Python — Verify Both Pipelines
import numpy as np
# ── Convolution (cross-correlation, no flip) ──────────────────
def conv2d(x, k):
    """No padding, stride 1."""
    KH, KW = k.shape
    OH, OW = x.shape[0]-KH+1, x.shape[1]-KW+1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            # element-wise product of the patch and the kernel, then sum
            out[i, j] = np.sum(x[i:i+KH, j:j+KW] * k)
    return out
# ── ReLU ──────────────────────────────────────────────────────
def relu(x):
    return np.maximum(0, x)
# ── Max Pool ──────────────────────────────────────────────────
def max_pool(x, pool=2, stride=2):
    OH = (x.shape[0] - pool) // stride + 1
    OW = (x.shape[1] - pool) // stride + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            # keep the largest value inside each pool×pool window
            out[i, j] = x[i*stride:i*stride+pool, j*stride:j*stride+pool].max()
    return out
# ═══════════════════════════════════════════════
# NUMERICAL 1 — 5×5 input, vertical edge kernel
# ═══════════════════════════════════════════════
inp1 = np.array([
    [1,2,3,0,1],
    [4,5,6,1,2],
    [7,8,9,0,3],
    [2,1,0,4,5],
    [6,3,2,1,0]
])
k1 = np.array([[1,0,-1],[1,0,-1],[1,0,-1]])
fm1 = conv2d(inp1, k1)
r1 = relu(fm1)
p1 = max_pool(r1, pool=2, stride=1)
print("N1 Feature Map:\n", fm1)
print("N1 After ReLU:\n", r1)
print("N1 Max Pool output:\n", p1)
# ═══════════════════════════════════════════════
# NUMERICAL 2 — 4×4 input, sharpening kernel
# ═══════════════════════════════════════════════
inp2 = np.array([
    [2,4,1,3],
    [5,8,2,6],
    [1,3,7,4],
    [0,2,5,9]
])
k2 = np.array([[0,-1,0],[-1,5,-1],[0,-1,0]])
fm2 = conv2d(inp2, k2)
r2 = relu(fm2)
p2 = max_pool(r2, pool=2, stride=2)
print("N2 Feature Map:\n", fm2)
print("N2 After ReLU:\n", r2)
print("N2 Max Pool output:\n", p2)
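The script's comment notes that `conv2d` is really cross-correlation (no kernel flip). The sketch below, under the assumption that "true" convolution means rotating the kernel by 180° before sliding it, shows what that distinction would change for the two kernels used here. The helper names `cross_correlate` and `true_convolve` are illustrative, and `sliding_window_view` needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def cross_correlate(x, k):
    """Slide k over x without flipping it (no padding, stride 1)."""
    return (sliding_window_view(x, k.shape) * k).sum(axis=(2, 3))

def true_convolve(x, k):
    """Mathematical convolution: rotate the kernel 180 degrees, then slide."""
    return cross_correlate(x, np.flip(k))

inp1 = np.array([[1, 2, 3, 0, 1],
                 [4, 5, 6, 1, 2],
                 [7, 8, 9, 0, 3],
                 [2, 1, 0, 4, 5],
                 [6, 3, 2, 1, 0]])
inp2 = np.array([[2, 4, 1, 3],
                 [5, 8, 2, 6],
                 [1, 3, 7, 4],
                 [0, 2, 5, 9]])
k1 = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])    # vertical edge
k2 = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])   # sharpening

# The edge kernel is antisymmetric left to right: flipping it negates it,
# so true convolution would negate the whole feature map.
edge_flips_sign = np.array_equal(true_convolve(inp1, k1),
                                 -cross_correlate(inp1, k1))

# The sharpening kernel is symmetric: flipping it changes nothing.
sharpen_unchanged = np.array_equal(true_convolve(inp2, k2),
                                   cross_correlate(inp2, k2))

print(edge_flips_sign, sharpen_unchanged)   # True True
```

For Numerical 1 the choice therefore matters once ReLU is applied, since negating the map swaps which responses survive; for Numerical 2 both definitions give the same result.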