Backpropagation Step by Step

Section 01

The Network — Read the Diagram

The image above shows a neural network with 2 inputs → 2 hidden neurons → 1 output. Below is an exact reproduction with all weights labelled. We will solve this network completely — forward pass, loss, then full backpropagation — and the animated player lets you step through each computation exactly as you would on paper.

📌

Network Parameters (from the diagram)

Inputs: x₁ = 0.35, x₂ = 0.7
Layer 1 weights: w₁,₁ = 0.2 (x₁→h₁), w₂,₁ = 0.2 (x₂→h₁), w₁,₂ = 0.3 (x₁→h₂), w₂,₂ = 0.3 (x₂→h₂)
Layer 2 weights: w₁,₃ = 0.3 (h₁→o₃), w₂,₃ = 0.9 (h₂→o₃)
Activation: Sigmoid | Target y = 1.0 | Loss: MSE with the conventional ½ factor, L = ½(ŷ − y)²

Section 02

Interactive Animated Step-Through

Press ▶ Auto Play to watch the computation animate, or use ← → to step through manually at your own pace. Each step shows the exact formula and result as you would write it on paper.

Section 03

Forward Pass — Complete Paper-Style Solution

Here is every calculation written out exactly as you would show it in an exam or on paper. No shortcuts. No skipping. Every intermediate value stated explicitly.

① Hidden Layer — Neuron h₁

🔵 h₁: Weighted Sum + Sigmoid

NET

z_h1 = w₁,₁ · x₁ + w₂,₁ · x₂
z_h1 = (0.2)(0.35) + (0.2)(0.70)
z_h1 = 0.0700 + 0.1400 = 0.2100

ACT

a_h1 = σ(z_h1) = 1 / (1 + e⁻⁰·²¹)
a_h1 = 1 / (1 + 0.8106) = 1 / 1.8106 = 0.5523

② Hidden Layer — Neuron h₂

🔵 h₂: Weighted Sum + Sigmoid

NET

z_h2 = w₁,₂ · x₁ + w₂,₂ · x₂
z_h2 = (0.3)(0.35) + (0.3)(0.70)
z_h2 = 0.1050 + 0.2100 = 0.3150

ACT

a_h2 = σ(z_h2) = 1 / (1 + e⁻⁰·³¹⁵)
a_h2 = 1 / (1 + 0.7298) = 1 / 1.7298 = 0.5781

③ Output Layer — Neuron o₃

🟢 o₃: Weighted Sum + Sigmoid + Loss

NET

z_o3 = w₁,₃ · a_h1 + w₂,₃ · a_h2
z_o3 = (0.3)(0.5523) + (0.9)(0.5781)
z_o3 = 0.1657 + 0.5203 = 0.6860

ACT

ŷ = a_o3 = σ(z_o3) = 1 / (1 + e⁻⁰·⁶⁸⁶)
ŷ = 1 / (1 + 0.5036) = 1 / 1.5036 = 0.6651

LOSS

L = ½(ŷ − y)² = ½(0.6651 − 1.0)²
L = ½(−0.3349)² = ½ × 0.1122 = 0.0561

Neuron	Input Sum (z)	Activation σ(z)	Note
h₁	0.2100	0.5523	First hidden neuron
h₂	0.3150	0.5781	Second hidden neuron
o₃	0.6860	0.6651	Prediction ŷ
Loss L	0.0561		½(ŷ − 1.0)²
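The three neuron computations above share one pattern: a weighted sum followed by a sigmoid. A minimal sketch of that pattern in plain Python follows. The helper names `sigmoid` and `layer` and the nested-list weight layout are illustrative assumptions, not part of the original walkthrough.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    """Return (z, a) lists for one layer.

    weights[j] holds the incoming weights of neuron j,
    in the same order as `inputs`.
    """
    zs = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return zs, [sigmoid(z) for z in zs]

x = [0.35, 0.70]
W_hidden = [[0.2, 0.2],   # w11, w21 -> h1
            [0.3, 0.3]]   # w12, w22 -> h2
W_output = [[0.3, 0.9]]   # w13, w23 -> o3
y = 1.0

z_h, a_h = layer(x, W_hidden)      # hidden layer
z_o, a_o = layer(a_h, W_output)    # output layer
y_hat = a_o[0]
loss = 0.5 * (y_hat - y) ** 2

print("hidden z:", [round(v, 4) for v in z_h])
print("hidden a:", [round(v, 4) for v in a_h])
print("z_o3, y_hat, loss:", round(z_o[0], 4), round(y_hat, 4), round(loss, 4))
```

Rounded to four decimals, the printed values should match the summary table above.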

Section 04

Backward Pass — Full Chain Rule Derivation

📐

Sigmoid Derivative — Key Formula

σ'(z) = σ(z) × (1 − σ(z))
This means you never need to recompute e⁻ᶻ — just reuse the stored activation value. For any neuron with activation a: σ'(z) = a × (1 − a)
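One way to convince yourself of this identity is to compare a × (1 − a) with a numerical slope of the sigmoid. The sketch below uses a central finite difference with an arbitrarily chosen step `eps`. The function names are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime_from_activation(a):
    """Derivative written in terms of the stored activation a = sigmoid(z)."""
    return a * (1.0 - a)

eps = 1e-6  # finite-difference step (an arbitrary small value)
for z in (0.21, 0.315, 0.686):
    a = sigmoid(z)
    analytic = sigmoid_prime_from_activation(a)
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    print(f"z = {z:.3f}  a(1-a) = {analytic:.6f}  finite diff = {numeric:.6f}")
```

The two columns should agree to the printed precision for each z used in this network.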

① Output Error Signal δ_o3

🔴 Step B1: δ at the output neuron

dL/dŷ

Derivative of MSE loss:
dL/dŷ = ŷ − y = 0.6651 − 1.0 = −0.3349

σ'(z_o3)

Sigmoid derivative at output:
σ'(z_o3) = a_o3 × (1 − a_o3) = 0.6651 × (1 − 0.6651)
= 0.6651 × 0.3349 = 0.2228

δ_o3

Output error signal (chain rule):
δ_o3 = dL/dŷ × σ'(z_o3) = (−0.3349) × 0.2228 = −0.074617
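A hand calculation multiplies factors that were already rounded to four decimals, so its fifth and sixth decimals can drift from a full-precision result. The sketch below computes δ_o3 both ways to show the size of that drift. The variable names are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Forward pass at full floating-point precision.
x1, x2, y = 0.35, 0.70, 1.0
a_h1 = sigmoid(0.2 * x1 + 0.2 * x2)
a_h2 = sigmoid(0.3 * x1 + 0.3 * x2)
y_hat = sigmoid(0.3 * a_h1 + 0.9 * a_h2)

# Chain rule: dL/dz_o3 = dL/dy_hat * dy_hat/dz_o3.
dL_dyhat = y_hat - y
sig_prime = y_hat * (1.0 - y_hat)
delta_full = dL_dyhat * sig_prime

# Same product, but with each factor rounded to four decimals first,
# as in a calculation done on paper.
delta_rounded = round(dL_dyhat, 4) * round(sig_prime, 4)

print(f"full precision : {delta_full:.6f}")
print(f"rounded factors: {delta_rounded:.6f}")
```

Both results agree to about four decimal places, which is enough for the weight updates that follow.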

② Output-Layer Weight Gradients

dL / dw₁,₃

δ_o3 × a_h1

= −0.074617 × 0.5523

= −0.041212

Gradient for the weight connecting h₁ to o₃

dL / dw₂,₃

δ_o3 × a_h2

= −0.074617 × 0.5781

= −0.043140

Gradient for the weight connecting h₂ to o₃

③ Propagate Error to Hidden Layer

🔴 Step B2: Error at h₁ and h₂

→h₁

dL/da_h1 = δ_o3 × w₁,₃ = (−0.074617) × 0.3 = −0.022385

σ'(z_h1)

σ'(z_h1) = a_h1 × (1 − a_h1) = 0.5523 × 0.4477 = 0.2473

δ_h1

δ_h1 = dL/da_h1 × σ'(z_h1) = (−0.022385) × 0.2473 = −0.005536

→h₂

dL/da_h2 = δ_o3 × w₂,₃ = (−0.074617) × 0.9 = −0.067155

σ'(z_h2)

σ'(z_h2) = a_h2 × (1 − a_h2) = 0.5781 × 0.4219 = 0.2439

δ_h2

δ_h2 = dL/da_h2 × σ'(z_h2) = (−0.067155) × 0.2439 = −0.016380

④ Input-Layer Weight Gradients (All 4 weights)

dL / dw₁,₁

δ_h1 × x₁

= −0.005536 × 0.35

= −0.001938

x₁ → h₁ weight gradient

dL / dw₂,₁

δ_h1 × x₂

= −0.005536 × 0.70

= −0.003875

x₂ → h₁ weight gradient

dL / dw₁,₂

δ_h2 × x₁

= −0.016380 × 0.35

= −0.005733

x₁ → h₂ weight gradient

dL / dw₂,₂

δ_h2 × x₂

= −0.016380 × 0.70

= −0.011466

x₂ → h₂ weight gradient
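A common sanity check for a backward pass is to compare each analytic gradient with a finite-difference estimate of the loss slope. The following is a minimal sketch of that check for this 2-2-1 network. The flat weight ordering, the function names, and the step size `eps` are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

X = (0.35, 0.70)
Y = 1.0
# Weight order: w11, w21, w12, w22, w13, w23 (same names as the walkthrough).
W = [0.2, 0.2, 0.3, 0.3, 0.3, 0.9]

def forward(w):
    """Return (a_h1, a_h2, y_hat) for the 2-2-1 sigmoid network."""
    a_h1 = sigmoid(w[0] * X[0] + w[1] * X[1])
    a_h2 = sigmoid(w[2] * X[0] + w[3] * X[1])
    y_hat = sigmoid(w[4] * a_h1 + w[5] * a_h2)
    return a_h1, a_h2, y_hat

def loss(w):
    return 0.5 * (forward(w)[2] - Y) ** 2

def backprop(w):
    """Analytic gradients via the delta rule derived above."""
    a_h1, a_h2, y_hat = forward(w)
    d_o3 = (y_hat - Y) * y_hat * (1 - y_hat)
    d_h1 = d_o3 * w[4] * a_h1 * (1 - a_h1)
    d_h2 = d_o3 * w[5] * a_h2 * (1 - a_h2)
    return [d_h1 * X[0], d_h1 * X[1], d_h2 * X[0], d_h2 * X[1],
            d_o3 * a_h1, d_o3 * a_h2]

def numeric(w, eps=1e-6):
    """Central finite differences: (L(w + eps) - L(w - eps)) / (2 * eps)."""
    grads = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up) - loss(down)) / (2 * eps))
    return grads

analytic = backprop(W)
approx = numeric(W)
names = ["w11", "w21", "w12", "w22", "w13", "w23"]
for name, g_a, g_n in zip(names, analytic, approx):
    print(f"{name}: backprop = {g_a:.6f}  finite diff = {g_n:.6f}")
```

If the chain-rule derivation is right, the two columns agree to the printed precision for all six weights.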

Section 05

Weight Update — Before & After (η = 0.5)

Rule: W_new = W_old − η × (dL/dW). Applied to all 6 weights simultaneously.

Weight	Connection	Old Value	Gradient	η × Gradient	New Value	Change
w₁,₁	x₁ → h₁	0.2000	−0.001938	−0.000969	0.2010	↑ +0.0010
w₂,₁	x₂ → h₁	0.2000	−0.003875	−0.001938	0.2019	↑ +0.0019
w₁,₂	x₁ → h₂	0.3000	−0.005733	−0.002867	0.3029	↑ +0.0029
w₂,₂	x₂ → h₂	0.3000	−0.011466	−0.005733	0.3057	↑ +0.0057
w₁,₃	h₁ → o₃	0.3000	−0.041212	−0.020606	0.3206	↑ +0.0206
w₂,₃	h₂ → o₃	0.9000	−0.043140	−0.021570	0.9216	↑ +0.0216

💡

All gradients are negative → all weights increase

Since ŷ = 0.665 was below the target y = 1.0, the network needed to predict higher. All gradients are negative, so subtracting them (W − η × negative) makes every weight increase. Because every input and hidden activation in this network is positive, larger weights raise z at each neuron and therefore raise ŷ on the next forward pass, which is exactly what we needed. Gradient descent is working correctly.
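To check that claim, you can apply the update and rerun the forward pass. The sketch below hard-codes the six gradients from the hand calculation in Section 04. The `predict` helper and the flat weight ordering are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x1=0.35, x2=0.70):
    """Forward pass; w = (w11, w21, w12, w22, w13, w23)."""
    a_h1 = sigmoid(w[0] * x1 + w[1] * x2)
    a_h2 = sigmoid(w[2] * x1 + w[3] * x2)
    return sigmoid(w[4] * a_h1 + w[5] * a_h2)

eta, y = 0.5, 1.0
old = [0.2, 0.2, 0.3, 0.3, 0.3, 0.9]
# Gradients copied from the hand calculation in Section 04.
grads = [-0.001938, -0.003875, -0.005733, -0.011466, -0.041212, -0.043140]

# Gradient descent step: subtract eta times each gradient.
new = [w - eta * g for w, g in zip(old, grads)]

y_before, y_after = predict(old), predict(new)
loss_before = 0.5 * (y_before - y) ** 2
loss_after = 0.5 * (y_after - y) ** 2
print(f"y_hat: {y_before:.4f} -> {y_after:.4f}")
print(f"loss : {loss_before:.4f} -> {loss_after:.4f}")
```

The prediction should move toward the target and the loss should shrink, though a single step with η = 0.5 only closes a small part of the gap.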

Section 06

Python Verification — All Numbers Confirmed

import numpy as np

# ── Network from the diagram ──────────────────────────────
x1, x2   = 0.35, 0.70
w11, w21 = 0.2, 0.2   # to h1
w12, w22 = 0.3, 0.3   # to h2
w13, w23 = 0.3, 0.9   # to o3
y        = 1.0
lr       = 0.5

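def sig(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1 / (1 + np.exp(-z))

def sigD(z):
    """Sigmoid derivative, computed as s * (1 - s) with s = sig(z)."""
    s = sig(z)
    return s * (1 - s)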

# ── FORWARD PASS ──────────────────────────────────────────
z_h1 = w11*x1 + w21*x2          # 0.21
a_h1 = sig(z_h1)

z_h2 = w12*x1 + w22*x2          # 0.315
a_h2 = sig(z_h2)

z_o3 = w13*a_h1 + w23*a_h2
a_o3 = sig(z_o3)                 # ŷ

loss = 0.5 * (a_o3 - y)**2

print("=== FORWARD PASS ===")
print(f"z_h1 = {z_h1:.4f}  a_h1 = {a_h1:.4f}")
print(f"z_h2 = {z_h2:.4f}  a_h2 = {a_h2:.4f}")
print(f"z_o3 = {z_o3:.4f}  y_hat = {a_o3:.4f}")
print(f"Loss = {loss:.4f}")

# ── BACKWARD PASS ─────────────────────────────────────────
dL_do3 = a_o3 - y                # dL/dŷ
d_o3   = dL_do3 * sigD(z_o3)    # δ_o3
dW13   = d_o3 * a_h1
dW23   = d_o3 * a_h2

dL_ah1 = d_o3 * w13
dL_ah2 = d_o3 * w23
d_h1   = dL_ah1 * sigD(z_h1)   # δ_h1
d_h2   = dL_ah2 * sigD(z_h2)   # δ_h2
dW11   = d_h1 * x1
dW21   = d_h1 * x2
dW12   = d_h2 * x1
dW22   = d_h2 * x2

print("\n=== BACKWARD PASS ===")
print(f"δ_o3  = {d_o3:.6f}")
print(f"dW13  = {dW13:.6f}   dW23 = {dW23:.6f}")
print(f"δ_h1  = {d_h1:.6f}   δ_h2 = {d_h2:.6f}")
print(f"dW11  = {dW11:.6f}   dW21 = {dW21:.6f}")
print(f"dW12  = {dW12:.6f}   dW22 = {dW22:.6f}")

# ── WEIGHT UPDATES (η = 0.5) ───────────────────────────────
print("\n=== UPDATED WEIGHTS ===")
print(f"w11: {w11:.4f} → {w11 - lr*dW11:.4f}")
print(f"w21: {w21:.4f} → {w21 - lr*dW21:.4f}")
print(f"w12: {w12:.4f} → {w12 - lr*dW12:.4f}")
print(f"w22: {w22:.4f} → {w22 - lr*dW22:.4f}")
print(f"w13: {w13:.4f} → {w13 - lr*dW13:.4f}")
print(f"w23: {w23:.4f} → {w23 - lr*dW23:.4f}")

OUTPUT

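=== FORWARD PASS ===
z_h1 = 0.2100  a_h1 = 0.5523
z_h2 = 0.3150  a_h2 = 0.5781
z_o3 = 0.6860  y_hat = 0.6651
Loss = 0.0561

=== BACKWARD PASS ===
δ_o3  = -0.074605
dW13  = -0.041205   dW23 = -0.043130
δ_h1  = -0.005534   δ_h2 = -0.016377
dW11  = -0.001937   dW21 = -0.003874
dW12  = -0.005732   dW22 = -0.011464

=== UPDATED WEIGHTS ===
w11: 0.2000 → 0.2010
w21: 0.2000 → 0.2019
w12: 0.3000 → 0.3029
w22: 0.3000 → 0.3057
w13: 0.3000 → 0.3206
w23: 0.9000 → 0.9216

The script carries full floating-point precision, so its backward-pass values differ from the hand calculation in Section 04 (which rounds intermediates to four decimals) in the fifth or sixth decimal place, for example δ_o3 = −0.074605 versus −0.074617. The forward-pass values and the updated weights agree to four decimals.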

Section 07

Paper-Exam Cheat-Sheet — The 8-Step Recipe

📋 Solve Any Small Network in 8 Steps

1. Compute z for every hidden neuron: z = Σ(wᵢ · xᵢ), the sum of (weight × input) over each incoming connection. No activation yet.

2. Apply the activation to get a: a = σ(z) = 1/(1+e⁻ᶻ). Store a for the backward pass. With sigmoid, σ'(z) = a × (1 − a), so z itself does not need to be recomputed.

3. Repeat steps 1–2 for every layer until you reach the output. The output neuron's activation is your prediction ŷ.

4. Compute the loss: L = ½(ŷ − y)² for MSE, or L = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)] for binary cross-entropy.

5. Start backprop at the output: δ_output = (ŷ − y) × σ'(z_output) = (ŷ − y) × ŷ × (1 − ŷ). This form assumes the MSE loss with a sigmoid output.

6. Compute weight gradients at the output layer: dL/dW = δ_output × a_hidden. One gradient per weight.

7. Propagate δ backward: δ_hidden = (δ_output × W_to_output) × σ'(z_hidden), using the value of W_to_output from before the update. Then compute dL/dW = δ_hidden × input for each input-layer weight.

8. Update all weights simultaneously: W_new = W_old − η × (dL/dW). Use the same η for all weights in one step.
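The eight steps above can be sketched as a single function for a network with one hidden layer and one sigmoid output. The function name `train_step`, the list-based weight layout, and the choice of five iterations are assumptions made for illustration. The inputs, weights, target, and η are those of this document's network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, y, W_h, W_o, eta):
    """One forward/backward/update pass (MSE loss, sigmoid everywhere).

    W_h[j] = incoming weights of hidden neuron j;
    W_o[j] = weight from hidden neuron j to the single output.
    """
    # Steps 1-2: weighted sums and activations of the hidden layer.
    z_h = [sum(w * xi for w, xi in zip(row, x)) for row in W_h]
    a_h = [sigmoid(z) for z in z_h]
    # Step 3: repeat for the output layer to get the prediction.
    y_hat = sigmoid(sum(w * a for w, a in zip(W_o, a_h)))
    # Step 4: loss.
    loss = 0.5 * (y_hat - y) ** 2
    # Step 5: output error signal.
    d_o = (y_hat - y) * y_hat * (1 - y_hat)
    # Step 6: output-layer weight gradients.
    g_o = [d_o * a for a in a_h]
    # Step 7: hidden error signals (using the old W_o) and their gradients.
    d_h = [d_o * w * a * (1 - a) for w, a in zip(W_o, a_h)]
    g_h = [[d * xi for xi in x] for d in d_h]
    # Step 8: update every weight with the same learning rate.
    W_h = [[w - eta * g for w, g in zip(row, g_row)]
           for row, g_row in zip(W_h, g_h)]
    W_o = [w - eta * g for w, g in zip(W_o, g_o)]
    return loss, W_h, W_o

W_h = [[0.2, 0.2], [0.3, 0.3]]
W_o = [0.3, 0.9]
losses = []
for step in range(5):
    loss, W_h, W_o = train_step([0.35, 0.70], 1.0, W_h, W_o, eta=0.5)
    losses.append(loss)
    print(f"step {step}: loss = {loss:.4f}")
```

The first printed loss is the one computed by hand in Section 03, and each later value is the loss measured before that step's update.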

🧠

Memory Trick — "ZASA-ΔWWU"