The Story: A Whisper Telephone Through Many Rooms
By the last room, the original pixels have been transformed into something much more abstract: a sentence of probabilities — "90% cat, 7% fox, 3% dog."
That journey — input → weighted sums → activations → output — is forward propagation. Nothing learns yet. It is pure, deterministic arithmetic flowing in one direction.
Forward propagation is the process of passing an input through every layer of a neural network — computing weighted sums and applying activations — to produce a final prediction. No weights change during the forward pass.
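The definition above can be written compactly as a per-layer recurrence. This is a notational sketch: the superscripts are layer indices (matching z¹, a¹ used later), not exponents, and the symbols L (number of layers), f (the layer's activation), and a⁰ (the input) are introduced here for illustration.

```latex
\begin{aligned}
a^{0} &= x \\
z^{l} &= W^{l}\, a^{l-1} + b^{l}, \qquad l = 1, \dots, L \\
a^{l} &= f^{l}\!\left(z^{l}\right) \\
\hat{y} &= a^{L}
\end{aligned}
```

Every quantity on the right-hand side is already known when the line is evaluated, which is why the pass runs in one direction and changes no weights.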
The Computation Graph — Animated Flow
Each layer is a station. Data flows strictly left → right. Every station performs two operations: an affine transformation and an activation. The graph below animates the full forward pass.
The Four Core Operations
Numerical 1 — Single Neuron, One Layer
A neuron receives inputs x = [2, 3]ᵀ, weights W = [0.5, −0.4], bias b = 1. Activation: ReLU.
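The single-neuron case can be checked in a few lines of NumPy. This is a minimal sketch using the values stated above; the variable names are chosen here for illustration.

```python
import numpy as np

# Values from the worked example
x = np.array([2.0, 3.0])    # inputs
w = np.array([0.5, -0.4])   # weights
b = 1.0                     # bias

# Weighted sum: 0.5*2 + (-0.4)*3 + 1 = 1.0 - 1.2 + 1.0 = 0.8
z = w @ x + b

# ReLU keeps positive values unchanged and clips negatives to zero
a = max(0.0, z)

print(f"z = {z:.2f}")  # z = 0.80
print(f"a = {a:.2f}")  # a = 0.80
```

Because z is positive, ReLU passes it through unchanged, so the neuron outputs 0.8.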
Numerical 2 — Full 2-Layer Network + Softmax
Input: 2 neurons | Hidden: 2 neurons (ReLU) | Output: 2 neurons (Softmax) — binary classification.
z¹₁ = 0.1×1 + 0.2×2 = 0.1 + 0.4 = 0.5
z¹₂ = 0.3×1 + 0.4×2 = 0.3 + 0.8 = 1.1
∴ z¹ = [0.5, 1.1]ᵀ
a¹₁ = ReLU(0.5) = 0.5
a¹₂ = ReLU(1.1) = 1.1
∴ a¹ = [0.5, 1.1]ᵀ (both positive, unchanged)
z²₁ = 0.5×0.5 + (−0.3)×1.1 = 0.25 − 0.33 = −0.08
z²₂ = (−0.1)×0.5 + 0.6×1.1 = −0.05 + 0.66 = 0.61
∴ z² = [−0.08, 0.61]ᵀ
e^(−0.08) ≈ 0.923 | e^(0.61) ≈ 1.840 | Sum ≈ 2.763
ŷ₁ = 0.923 ÷ 2.763 ≈ 0.334 → 33.4%
ŷ₂ = 1.840 ÷ 2.763 ≈ 0.666 → 66.6%
✅ Sum = 1.000 — valid probability distribution
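For reference, the softmax step used in this example can be written out as follows. This is a sketch in the article's notation, with the index j running over the two output neurons.

```latex
\hat{y}_i = \frac{e^{z^{2}_i}}{\sum_{j=1}^{2} e^{z^{2}_j}},
\qquad
\hat{y}_1 = \frac{e^{-0.08}}{e^{-0.08} + e^{0.61}} \approx \frac{0.923}{2.763} \approx 0.334,
\qquad
\hat{y}_2 = \frac{e^{0.61}}{e^{-0.08} + e^{0.61}} \approx \frac{1.840}{2.763} \approx 0.666
```

Both outputs share the same denominator, which is what forces them to sum to 1.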
The network predicts Class 1 (the second output, ŷ₂, counting classes from zero as the code does) with 66.6% confidence. These weights are arbitrary illustrative values; no learning has happened yet. Backpropagation will later adjust W¹, W², b¹, b² to improve this output.
Python Implementation
import numpy as np
# ── Inputs and Weights ────────────────────────────────
x = np.array([1, 2], dtype=float)
W1 = np.array([[0.1, 0.2],
               [0.3, 0.4]])
b1 = np.zeros(2)
W2 = np.array([[ 0.5, -0.3],
               [-0.1,  0.6]])
b2 = np.zeros(2)
# ── Activation helpers ────────────────────────────────
def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()
# ── Forward Propagation ───────────────────────────────
z1 = W1 @ x + b1 # Layer 1 affine
a1 = relu(z1) # Layer 1 activation
z2 = W2 @ a1 + b2 # Layer 2 affine
y_hat = softmax(z2) # Softmax output
print(f"z1 = {z1}")
print(f"a1 = {a1}")
print(f"z2 = {z2}")
print(f"y_hat = {y_hat}")
print(f"Pred = Class {np.argmax(y_hat)}")
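The same computation generalizes to any number of layers by looping over them. The sketch below is one possible way to organize it, not the article's own implementation: the `forward` helper and the `(W, b, activation)` tuple layout are assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

def forward(x, layers):
    """Run a forward pass through a list of layers.

    `layers` is a hypothetical structure: a list of (W, b, activation)
    tuples applied in order. No weights are modified.
    """
    a = x
    for W, b, activation in layers:
        z = W @ a + b        # affine transformation
        a = activation(z)    # non-linearity
    return a

# Same weights as the two-layer example above
layers = [
    (np.array([[0.1, 0.2], [0.3, 0.4]]), np.zeros(2), relu),
    (np.array([[0.5, -0.3], [-0.1, 0.6]]), np.zeros(2), softmax),
]

y_hat = forward(np.array([1.0, 2.0]), layers)
print(np.round(y_hat, 3))                   # [0.334 0.666]
print(f"Pred = Class {np.argmax(y_hat)}")   # Pred = Class 1
```

The result matches the hand calculation, which is a quick way to confirm that the loop and the step-by-step version compute the same thing.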
Golden Rules
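Always compute softmax with the shifted exponent e^(z − max(z)). This prevents numerical overflow and leaves the final probabilities unchanged, because the common factor e^(−max(z)) cancels between numerator and denominator.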
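The effect of the max-subtraction rule is easy to see on large logits. The sketch below compares a naive softmax with the shifted version; the function names and the test values [1000, 1001] are chosen here purely for illustration.

```python
import numpy as np

def softmax_naive(z):
    e = np.exp(z)                # overflows to inf for large z
    return e / e.sum()

def softmax_stable(z):
    e = np.exp(z - np.max(z))    # largest exponent is exactly 0
    return e / e.sum()

small = np.array([-0.08, 0.61])      # logits from the worked example
large = np.array([1000.0, 1001.0])   # illustrative large logits

# On small logits both versions agree
same_on_small = np.allclose(softmax_naive(small), softmax_stable(small))
print(same_on_small)                 # True

# On large logits the naive version produces inf / inf = nan
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(large)
stable_large = softmax_stable(large)

print(naive_large)                   # [nan nan]
print(np.round(stable_large, 3))     # [0.269 0.731]
```

Shifting every logit by the same constant multiplies numerator and denominator by the same factor, so the probabilities are identical wherever the naive version is computable.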