The Network — Read the Diagram
The image above shows a neural network with 2 inputs → 2 hidden neurons → 1 output. Below is an exact reproduction with all weights labelled. We will solve this network completely — forward pass, loss, then full backpropagation — and the animated player lets you step through each computation exactly as you would on paper.
Inputs: x₁ = 0.35, x₂ = 0.7
Layer 1 weights: w₁,₁ = 0.2 (x₁→h₁), w₂,₁ = 0.2 (x₂→h₁),
w₁,₂ = 0.3 (x₁→h₂), w₂,₂ = 0.3 (x₂→h₂)
Layer 2 weights: w₁,₃ = 0.3 (h₁→o₃), w₂,₃ = 0.9 (h₂→o₃)
Activation: Sigmoid | Target y = 1.0 | Loss: MSE with a ½ factor, L = ½(ŷ − y)² | No bias terms
Forward Pass — Complete Paper-Style Solution
Here is every calculation written out exactly as you would show it in an exam or on paper. No shortcuts. No skipping. Every intermediate value stated explicitly.
① Hidden Layer — Neuron h₁
z_h1 = (0.2)(0.35) + (0.2)(0.70)
z_h1 = 0.0700 + 0.1400 = 0.2100
a_h1 = σ(z_h1) = 1 / (1 + e^(−0.2100)) = 1 / (1 + 0.8106) = 1 / 1.8106 = 0.5523
② Hidden Layer — Neuron h₂
z_h2 = (0.3)(0.35) + (0.3)(0.70)
z_h2 = 0.1050 + 0.2100 = 0.3150
a_h2 = σ(z_h2) = 1 / (1 + e^(−0.3150)) = 1 / (1 + 0.7298) = 1 / 1.7298 = 0.5781
③ Output Layer — Neuron o₃
z_o3 = (0.3)(0.5523) + (0.9)(0.5781)
z_o3 = 0.1657 + 0.5203 = 0.6860
ŷ = σ(z_o3) = 1 / (1 + e^(−0.6860)) = 1 / (1 + 0.5036) = 1 / 1.5036 = 0.6651
L = ½(ŷ − y)² = ½(0.6651 − 1.0)² = ½(−0.3349)² = ½ × 0.1122 = 0.0561
| Quantity | Input Sum (z) | Value | Note |
|---|---|---|---|
| h₁ | 0.2100 | 0.5523 | Activation σ(z) of the first hidden neuron |
| h₂ | 0.3150 | 0.5781 | Activation σ(z) of the second hidden neuron |
| o₃ | 0.6860 | 0.6651 | Activation σ(z), the prediction ŷ |
| Loss L | n/a | 0.0561 | ½(ŷ − 1.0)² |
Backward Pass — Full Chain Rule Derivation
σ'(z) = σ(z) × (1 − σ(z))
This means you never need to recompute e⁻ᶻ — just reuse the stored activation value.
For any neuron with activation a: σ'(z) = a × (1 − a)
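The identity can be checked numerically. The sketch below compares a × (1 − a) against a central finite difference of the sigmoid at the three pre-activations from this example. The helper name `sigmoid` and the step size `eps` are illustrative choices, not part of the original walkthrough.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Pre-activations from the worked example (z_h1, z_h2, z_o3), rounded to 4 decimals.
z = np.array([0.2100, 0.3150, 0.6860])
a = sigmoid(z)

# Derivative from the stored activation: no extra exponential needed.
analytic = a * (1.0 - a)

# Central finite difference as an independent reference.
eps = 1e-6  # assumed step size
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)

max_gap = float(np.max(np.abs(analytic - numeric)))
print("a * (1 - a):      ", np.round(analytic, 6))
print("finite difference:", np.round(numeric, 6))
print("max gap:          ", max_gap)
```

The two rows should agree to many decimal places, which is why the backward pass can reuse the activations stored during the forward pass.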
① Output Error Signal δ_o3
dL/dŷ = ŷ − y = 0.6651 − 1.0 = −0.3349
σ'(z_o3) = a_o3 × (1 − a_o3) = 0.6651 × (1 − 0.6651)
= 0.6651 × 0.3349 = 0.2228
δ_o3 = dL/dŷ × σ'(z_o3) = (−0.3349) × 0.2228 ≈ −0.0746 (−0.074605 without intermediate rounding)
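② Output-Layer Weight Gradients
Each output-layer gradient is the output delta times the activation feeding that weight (values rounded to 4 decimals).
dL/dw₁,₃ = δ_o3 × a_h1 = (−0.0746) × 0.5523 ≈ −0.0412
dL/dw₂,₃ = δ_o3 × a_h2 = (−0.0746) × 0.5781 ≈ −0.0431
③ Propagate Error to Hidden Layer
Each hidden delta is the output delta sent back through the connecting weight, then scaled by the local sigmoid derivative.
dL/da_h1 = δ_o3 × w₁,₃ = (−0.0746) × 0.3 ≈ −0.0224
σ'(z_h1) = a_h1 × (1 − a_h1) = 0.5523 × 0.4477 = 0.2473
δ_h1 = dL/da_h1 × σ'(z_h1) = (−0.0224) × 0.2473 ≈ −0.0055
dL/da_h2 = δ_o3 × w₂,₃ = (−0.0746) × 0.9 ≈ −0.0671
σ'(z_h2) = a_h2 × (1 − a_h2) = 0.5781 × 0.4219 = 0.2439
δ_h2 = dL/da_h2 × σ'(z_h2) = (−0.0671) × 0.2439 ≈ −0.0164
④ Input-Layer Weight Gradients (All 4 weights)
Each input-layer gradient is the hidden delta times the input feeding that weight.
dL/dw₁,₁ = δ_h1 × x₁ = (−0.0055) × 0.35 ≈ −0.0019
dL/dw₂,₁ = δ_h1 × x₂ = (−0.0055) × 0.70 ≈ −0.0039
dL/dw₁,₂ = δ_h2 × x₁ = (−0.0164) × 0.35 ≈ −0.0057
dL/dw₂,₂ = δ_h2 × x₂ = (−0.0164) × 0.70 ≈ −0.0115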
Weight Update — Before & After (η = 0.5)
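Rule: W_new = W_old − η × (dL/dW), applied to all 6 weights simultaneously, meaning every gradient is computed from the old weights before any weight changes. The gradients in the table differ from a full-precision computation in their last one or two digits (for example −0.041212 versus −0.041205 for w₁,₃), which is consistent with rounding of intermediate values. The new weights are unaffected at 4 decimals.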
| Weight | Connection | Old Value | Gradient | η × Gradient | New Value | Change |
|---|---|---|---|---|---|---|
| w₁,₁ | x₁ → h₁ | 0.2000 | −0.001938 | −0.000969 | 0.2010 | ↑ +0.0010 |
| w₂,₁ | x₂ → h₁ | 0.2000 | −0.003875 | −0.001938 | 0.2019 | ↑ +0.0019 |
| w₁,₂ | x₁ → h₂ | 0.3000 | −0.005733 | −0.002867 | 0.3029 | ↑ +0.0029 |
| w₂,₂ | x₂ → h₂ | 0.3000 | −0.011466 | −0.005733 | 0.3057 | ↑ +0.0057 |
| w₁,₃ | h₁ → o₃ | 0.3000 | −0.041212 | −0.020606 | 0.3206 | ↑ +0.0206 |
| w₂,₃ | h₂ → o₃ | 0.9000 | −0.043140 | −0.021570 | 0.9216 | ↑ +0.0216 |
Since ŷ = 0.665 was below the target y = 1.0, the network needed to predict higher. All gradients are negative, so subtracting them (W − η × negative) increases every weight. Because every input, weight, and activation in this network is positive, larger weights raise z at each neuron, so the next forward pass produces a larger output, which is what we needed. Gradient descent is working correctly.
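The claim that the update moves the prediction toward the target can be checked by repeating the forward pass with the new weights. This sketch takes the rounded "New Value" column from the table as given; the `forward` helper is a hypothetical name introduced for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, x1=0.35, x2=0.70):
    """Forward pass of the 2-2-1 network; w = (w11, w21, w12, w22, w13, w23)."""
    w11, w21, w12, w22, w13, w23 = w
    a_h1 = sigmoid(w11 * x1 + w21 * x2)
    a_h2 = sigmoid(w12 * x1 + w22 * x2)
    return sigmoid(w13 * a_h1 + w23 * a_h2)

y = 1.0
old_w = (0.2000, 0.2000, 0.3000, 0.3000, 0.3000, 0.9000)
new_w = (0.2010, 0.2019, 0.3029, 0.3057, 0.3206, 0.9216)  # "New Value" column

y_old, y_new = forward(old_w), forward(new_w)
loss_old = 0.5 * (y_old - y) ** 2
loss_new = 0.5 * (y_new - y) ** 2

print(f"before: y_hat = {y_old:.4f}, loss = {loss_old:.4f}")
print(f"after:  y_hat = {y_new:.4f}, loss = {loss_new:.4f}")
```

If the update is correct, the second line should show a slightly larger ŷ and a slightly smaller loss. A single step with η = 0.5 closes only a small part of the gap to y = 1.0.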
Python Verification — All Numbers Confirmed
```python
import numpy as np

# ── Network from the diagram ──────────────────────────────
x1, x2 = 0.35, 0.70
w11, w21 = 0.2, 0.2   # to h1
w12, w22 = 0.3, 0.3   # to h2
w13, w23 = 0.3, 0.9   # to o3
y = 1.0
lr = 0.5


def sig(z):
    """Sigmoid activation."""
    return 1 / (1 + np.exp(-z))


def sigD(z):
    """Sigmoid derivative, sig(z) * (1 - sig(z))."""
    s = sig(z)
    return s * (1 - s)


# ── FORWARD PASS ──────────────────────────────────────────
z_h1 = w11*x1 + w21*x2          # 0.21
a_h1 = sig(z_h1)
z_h2 = w12*x1 + w22*x2          # 0.315
a_h2 = sig(z_h2)
z_o3 = w13*a_h1 + w23*a_h2
a_o3 = sig(z_o3)                # ŷ
loss = 0.5 * (a_o3 - y)**2

print("=== FORWARD PASS ===")
print(f"z_h1 = {z_h1:.4f} a_h1 = {a_h1:.4f}")
print(f"z_h2 = {z_h2:.4f} a_h2 = {a_h2:.4f}")
print(f"z_o3 = {z_o3:.4f} y_hat = {a_o3:.4f}")
print(f"Loss = {loss:.4f}")

# ── BACKWARD PASS ─────────────────────────────────────────
dL_do3 = a_o3 - y               # dL/dŷ
d_o3 = dL_do3 * sigD(z_o3)      # δ_o3
dW13 = d_o3 * a_h1
dW23 = d_o3 * a_h2
dL_ah1 = d_o3 * w13             # error reaching h1
dL_ah2 = d_o3 * w23             # error reaching h2
d_h1 = dL_ah1 * sigD(z_h1)      # δ_h1
d_h2 = dL_ah2 * sigD(z_h2)      # δ_h2
dW11 = d_h1 * x1
dW21 = d_h1 * x2
dW12 = d_h2 * x1
dW22 = d_h2 * x2

print("\n=== BACKWARD PASS ===")
print(f"δ_o3 = {d_o3:.6f}")
print(f"dW13 = {dW13:.6f} dW23 = {dW23:.6f}")
print(f"δ_h1 = {d_h1:.6f} δ_h2 = {d_h2:.6f}")
print(f"dW11 = {dW11:.6f} dW21 = {dW21:.6f}")
print(f"dW12 = {dW12:.6f} dW22 = {dW22:.6f}")

# ── WEIGHT UPDATES (η = 0.5) ───────────────────────────────
print("\n=== UPDATED WEIGHTS ===")
print(f"w11: {w11:.4f} → {w11 - lr*dW11:.4f}")
print(f"w21: {w21:.4f} → {w21 - lr*dW21:.4f}")
print(f"w12: {w12:.4f} → {w12 - lr*dW12:.4f}")
print(f"w22: {w22:.4f} → {w22 - lr*dW22:.4f}")
print(f"w13: {w13:.4f} → {w13 - lr*dW13:.4f}")
print(f"w23: {w23:.4f} → {w23 - lr*dW23:.4f}")
```
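As an independent cross-check on the chain-rule results, the gradients can also be estimated by finite differences, which needs only the forward pass. The sketch below is self-contained; `loss_fn`, `analytic_grads`, `numeric_grads`, and the step `eps` are names and values assumed here for illustration.

```python
import numpy as np

X1, X2, Y = 0.35, 0.70, 1.0

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(w):
    """Half squared error for weights w = [w11, w21, w12, w22, w13, w23]."""
    a_h1 = sig(w[0] * X1 + w[1] * X2)
    a_h2 = sig(w[2] * X1 + w[3] * X2)
    y_hat = sig(w[4] * a_h1 + w[5] * a_h2)
    return 0.5 * (y_hat - Y) ** 2

def analytic_grads(w):
    """Backpropagation gradients, in the same order as w."""
    a_h1 = sig(w[0] * X1 + w[1] * X2)
    a_h2 = sig(w[2] * X1 + w[3] * X2)
    y_hat = sig(w[4] * a_h1 + w[5] * a_h2)
    d_o3 = (y_hat - Y) * y_hat * (1 - y_hat)   # output delta
    d_h1 = d_o3 * w[4] * a_h1 * (1 - a_h1)     # hidden delta for h1
    d_h2 = d_o3 * w[5] * a_h2 * (1 - a_h2)     # hidden delta for h2
    return np.array([d_h1 * X1, d_h1 * X2, d_h2 * X1, d_h2 * X2,
                     d_o3 * a_h1, d_o3 * a_h2])

def numeric_grads(w, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grads = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grads[i] = (loss_fn(w + step) - loss_fn(w - step)) / (2 * eps)
    return grads

w = np.array([0.2, 0.2, 0.3, 0.3, 0.3, 0.9])
g_bp, g_fd = analytic_grads(w), numeric_grads(w)
for name, a, n in zip(["w11", "w21", "w12", "w22", "w13", "w23"], g_bp, g_fd):
    print(f"{name}: backprop = {a:.6f}   finite diff = {n:.6f}")
```

Agreement between the two columns is a common sanity check that a hand derivation has no sign or indexing mistake, since the finite-difference estimate never uses the chain rule.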
Paper-Exam Cheat-Sheet — The 8-Step Recipe
1. Z: compute pre-activation z
2. A: activate → get a
3. S: sum across layer
4. A: again for next layer
5. Δ: delta at output
6. W: weight gradients
7. W: propagate delta back
8. U: update weights
Say it out loud and you will never forget the order of operations.
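The recipe maps onto a few lines of matrix code. The following is a minimal sketch, assuming the same network with its weights packed into a matrix `W1` (hidden layer) and a vector `W2` (output layer); the function name `train_step` and the choice of 5 repetitions are illustrative and not taken from the walkthrough.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W1, W2, x, y, lr):
    """One forward pass, backward pass, and update for a 2-2-1 sigmoid network."""
    # Forward: pre-activation then activation, layer by layer.
    a_h = sigmoid(W1 @ x)              # hidden activations, shape (2,)
    y_hat = sigmoid(W2 @ a_h)          # scalar prediction
    loss = 0.5 * (y_hat - y) ** 2

    # Backward: output delta, then hidden deltas via the output weights.
    d_o = (y_hat - y) * y_hat * (1 - y_hat)
    d_h = d_o * W2 * a_h * (1 - a_h)   # shape (2,)

    # Weight gradients, all computed from the old weights.
    grad_W2 = d_o * a_h
    grad_W1 = np.outer(d_h, x)         # row i holds gradients into hidden neuron i

    # Update: return new weights instead of modifying the inputs in place.
    return W1 - lr * grad_W1, W2 - lr * grad_W2, loss

x = np.array([0.35, 0.70])
W1 = np.array([[0.2, 0.2],     # weights into h1: from x1, x2
               [0.3, 0.3]])    # weights into h2: from x1, x2
W2 = np.array([0.3, 0.9])      # weights into o3: from h1, h2

losses = []
for _ in range(5):
    W1, W2, loss = train_step(W1, W2, x, y=1.0, lr=0.5)
    losses.append(float(loss))

print([round(l, 4) for l in losses])
```

The first entry should match the hand-computed loss of 0.0561, and each later entry should be smaller as the prediction moves toward the target.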