The Story That Explains Standard Deviation
You are a quality control manager at a biscuit factory. Every biscuit should weigh exactly 20 grams. At the end of the day two machines both produced biscuits with a mean weight of 20g. Your boss is happy. But you check the individual weights.
| Biscuit | Machine A (g) | Machine B (g) |
|---|---|---|
| 1 | 19.8 | 14.0 |
| 2 | 20.1 | 26.0 |
| 3 | 20.0 | 18.0 |
| 4 | 19.9 | 22.0 |
| 5 | 20.2 | 20.0 |
Machine A: mean = 20g, std dev = 0.16g. Nearly perfect every time.
Machine B: mean = 20g, std dev = 4.47g. All over the place.
A biscuit weighing 14g is underweight, so customers complain. A biscuit weighing 26g breaks the packaging and wastes ingredients. The mean looks fine, but the standard deviation reveals that Machine B is far too inconsistent. This is why standard deviation is central to quality-control methods such as Six Sigma, which are widely used in manufacturing.
What Standard Deviation Actually Means
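Standard deviation answers one simple question: "How far does a typical value sit from the mean?" Strictly, it is the root-mean-square distance from the mean, so large deviations count for more than they would in a plain average of distances.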
Standard deviation is simply the square root of variance. That one operation makes all the difference: it brings the unit back to the same scale as your original data. The sample variance of Machine A is 0.025 g², which is hard to interpret because the unit is squared grams. The standard deviation is 0.16g, so now you can say "biscuits typically vary by about 0.16 grams around the mean."
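In symbols, using the sample convention (dividing by n − 1) that the worked example in this article follows, the two quantities can be written as below. Here the x_i are the data values, x̄ is their mean, and n is the number of values; this notation is a common textbook choice rather than something taken from the text above.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}
```

For Machine A the squared deviations are 0.04, 0.01, 0, 0.01 and 0.04, which sum to 0.10, so s² = 0.10 / 4 = 0.025 g² and s ≈ 0.16 g.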
Step-by-Step Calculation
Dataset — exam scores of 8 students:
[52, 74, 68, 90, 61, 85, 72, 78]
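Step 1: Calculate the mean
(52 + 74 + 68 + 90 + 61 + 85 + 72 + 78) / 8 = 580 / 8 = 72.5
Step 2: Subtract the mean from each score and square the result
(52−72.5)² = 420.25
(74−72.5)² = 2.25
(68−72.5)² = 20.25
(90−72.5)² = 306.25
(61−72.5)² = 132.25
(85−72.5)² = 156.25
(72−72.5)² = 0.25
(78−72.5)² = 30.25
Step 3: Add up the squared differences
420.25 + 2.25 + 20.25 + 306.25 + 132.25 + 156.25 + 0.25 + 30.25 = 1068
Step 4: Divide by n − 1 to get the sample variance
1068 / (8 − 1) = 1068 / 7 = 152.57
Step 5: Take the square root to get the standard deviation
√152.57 = 12.35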
The mean exam score is 72.5 and the standard deviation is 12.35. This tells you that a typical student scored within about 12 marks of the mean, either above or below. Most students scored between 60 and 85.
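The interpretation above can be checked directly. The sketch below, which uses only the standard library, builds the band of one standard deviation either side of the mean and lists the scores that fall inside it; the variable names are illustrative.

```python
import statistics

scores = [52, 74, 68, 90, 61, 85, 72, 78]

mean = statistics.mean(scores)      # 72.5
std_dev = statistics.stdev(scores)  # sample std dev, about 12.35

# Band of one standard deviation either side of the mean
low, high = mean - std_dev, mean + std_dev

# Scores that fall inside the band
within = [x for x in scores if low <= x <= high]

print(f"Band: {low:.2f} to {high:.2f}")
print(f"Within one std dev: {within} ({len(within)} of {len(scores)})")
# Band: 60.15 to 84.85
# Within one std dev: [74, 68, 61, 72, 78] (5 of 8)
```

Five of the eight scores land inside the band; 85 sits just outside the unrounded upper edge of 84.85. With only eight values, the share is not expected to match a theoretical percentage closely.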
Python Implementation
Manual calculation
```python
scores = [52, 74, 68, 90, 61, 85, 72, 78]
n = len(scores)

mean = sum(scores) / n
squared_diffs = [(x - mean) ** 2 for x in scores]
variance = sum(squared_diffs) / (n - 1)  # sample variance
std_dev = variance ** 0.5

print(f"Mean: {mean}")              # Mean: 72.5
print(f"Variance: {variance:.2f}")  # Variance: 152.57
print(f"Std Dev: {std_dev:.2f}")    # Std Dev: 12.35
```
Using the statistics module
```python
import statistics

scores = [52, 74, 68, 90, 61, 85, 72, 78]

print(statistics.mean(scores))      # 72.5
print(statistics.variance(scores))  # ≈ 152.57 (sample variance)
print(statistics.stdev(scores))     # ≈ 12.35 (sample std dev)
print(statistics.pstdev(scores))    # ≈ 11.55 (population std dev)
```
Using NumPy
```python
import numpy as np

scores = [52, 74, 68, 90, 61, 85, 72, 78]

print(np.mean(scores))         # 72.5
print(np.std(scores, ddof=1))  # ≈ 12.35 (sample)
print(np.std(scores, ddof=0))  # ≈ 11.55 (population)
print(np.var(scores, ddof=1))  # ≈ 152.57 (sample variance)
```
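One detail that is easy to miss: `np.std` and `np.var` default to `ddof=0` (the population formulas), whereas `statistics.stdev` and `statistics.variance` use the sample formulas. A short check, reusing the exam scores, shows which NumPy call lines up with which `statistics` function.

```python
import statistics
import numpy as np

scores = [52, 74, 68, 90, 61, 85, 72, 78]

default_np = float(np.std(scores))         # no ddof given, so ddof=0
sample_np = float(np.std(scores, ddof=1))  # sample std dev

# The NumPy default matches the population std dev ...
print(round(default_np, 2), round(statistics.pstdev(scores), 2))  # 11.55 11.55
# ... and ddof=1 matches the sample std dev
print(round(sample_np, 2), round(statistics.stdev(scores), 2))    # 12.35 12.35
```

Passing `ddof` explicitly, as the examples in this article do, avoids mixing the two conventions by accident.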
Complete comparison
```python
import numpy as np

machine_a = [19.8, 20.1, 20.0, 19.9, 20.2]
machine_b = [14.0, 26.0, 18.0, 22.0, 20.0]

for name, data in [("Machine A", machine_a), ("Machine B", machine_b)]:
    mean = np.mean(data)
    std_dev = np.std(data, ddof=1)  # sample std dev
    print(f"{name}: mean={mean:.1f}g std_dev={std_dev:.2f}g")
# Machine A: mean=20.0g std_dev=0.16g
# Machine B: mean=20.0g std_dev=4.47g
```
The 68 – 95 – 99.7 Rule
When data follows a normal distribution (bell curve), standard deviation lets you estimate what percentage of values fall within a given number of standard deviations of the mean. This is called the Empirical Rule, and its percentages are rounded approximations.
| Range | Share of all values | What it means |
|---|---|---|
| Mean ± 1 std dev | 68% | The "normal" range |
| Mean ± 2 std devs | 95% | Covers almost everyone |
| Mean ± 3 std devs | 99.7% | Anything beyond is an extreme outlier |
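The three percentages are rounded values taken from the normal distribution. As a rough check, the share of a normal distribution within k standard deviations of the mean equals erf(k / √2), which Python's standard library can evaluate directly. The helper name `share_within` is made up for this illustration.

```python
import math


def share_within(k: float) -> float:
    """Share of a normal distribution within k std devs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Within ±{k} std dev: {share_within(k) * 100:.2f}%")
# Within ±1 std dev: 68.27%
# Within ±2 std dev: 95.45%
# Within ±3 std dev: 99.73%
```

The rule only holds to the extent that the data really is close to normal; heavily skewed data can deviate from these shares considerably.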
Real Example — Adult Male Heights
Assume adult male heights approximately follow a normal distribution with mean = 175 cm and std dev = 7 cm.
| Range | Calculation | Heights | % of men |
|---|---|---|---|
| ±1σ | 175 ± 7 | 168 cm – 182 cm | 68% |
| ±2σ | 175 ± 14 | 161 cm – 189 cm | 95% |
| ±3σ | 175 ± 21 | 154 cm – 196 cm | 99.7% |
| Beyond ±3σ | — | <154 cm or >196 cm | 0.3% |
```python
mean = 175   # cm
std_dev = 7  # cm

print(f"68% range: {mean - std_dev} – {mean + std_dev} cm")
# 68% range: 168 – 182 cm
print(f"95% range: {mean - 2*std_dev} – {mean + 2*std_dev} cm")
# 95% range: 161 – 189 cm
print(f"99.7% range: {mean - 3*std_dev} – {mean + 3*std_dev} cm")
# 99.7% range: 154 – 196 cm
```
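For questions that do not fall on a whole number of standard deviations, the same model can be queried directly. The sketch below uses `statistics.NormalDist` (Python 3.8+) with the mean and std dev from the example; the 190 cm cut-off is an arbitrary value chosen for illustration.

```python
from statistics import NormalDist

heights = NormalDist(mu=175, sigma=7)  # model from the example above

# z-score: how many std devs 190 cm lies above the mean
z = (190 - heights.mean) / heights.stdev

# Share of the distribution above 190 cm (upper tail only)
share_taller = 1 - heights.cdf(190)

# Share between 168 cm and 182 cm (mean ± 1 std dev)
share_1sd = heights.cdf(182) - heights.cdf(168)

print(f"z-score of 190 cm: {z:.2f}")                   # 2.14
print(f"Share taller than 190 cm: {share_taller:.1%}")  # 1.6%
print(f"Share within ±1 std dev: {share_1sd:.1%}")      # 68.3%
```

Note that the 0.3% beyond ±3σ in the table is split between both tails, so only about half of it lies above 196 cm.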
Standard Deviation in Machine Learning
Standard deviation is not just a statistics concept — it powers several core machine learning techniques you will use every day.
Feature Scaling — StandardScaler
Many ML algorithms (SVM, KNN, Neural Networks) are sensitive to the scale of features. StandardScaler transforms each feature so that it has mean = 0 and std dev = 1. This is called Z-score normalisation.
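```python
from sklearn.preprocessing import StandardScaler

# Raw features on very different scales
heights = [[160], [175], [180], [155], [190]]  # cm
weights = [[55], [70], [80], [50], [95]]       # kg

# StandardScaler divides by the population std dev (ddof=0)
scaled_heights = StandardScaler().fit_transform(heights)
scaled_weights = StandardScaler().fit_transform(weights)

print("Original heights:", [h[0] for h in heights])
print("Scaled heights:  ", [round(float(s[0]), 2) for s in scaled_heights])
print("Scaled weights:  ", [round(float(s[0]), 2) for s in scaled_weights])
# Original heights: [160, 175, 180, 155, 190]
# Scaled heights:   [-0.93, 0.23, 0.62, -1.32, 1.4]
# Scaled weights:   [-0.91, 0.0, 0.61, -1.22, 1.52]
```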
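Under the hood, z-score normalisation subtracts the mean and divides by the standard deviation: z = (x − mean) / std dev. A minimal sketch with the standard library, assuming the population standard deviation, which is what `StandardScaler` uses by default:

```python
import statistics

heights = [160, 175, 180, 155, 190]  # cm

mean = statistics.fmean(heights)      # 172.0
std_dev = statistics.pstdev(heights)  # population std dev, about 12.88

# z-score: distance from the mean in units of std dev
z_scores = [(h - mean) / std_dev for h in heights]

print([round(z, 2) for z in z_scores])
# [-0.93, 0.23, 0.62, -1.32, 1.4]
```

After scaling, the values have mean 0 and population std dev 1, whatever the original unit was.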
Anomaly Detection
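```python
import numpy as np

# Baseline response times (ms) recorded during normal operation
baseline = [120, 118, 122, 119, 121, 120, 118, 123, 119]
# Readings to check against the baseline
response_times = [120, 118, 122, 119, 121, 120, 118, 500, 123, 119]

# Estimate "normal" from the baseline only; including the 500 ms spike
# would inflate the std dev to about 120 ms and hide it
mean = np.mean(baseline)
std_dev = np.std(baseline, ddof=1)
print(f"Mean: {mean:.1f} ms")        # Mean: 120.0 ms
print(f"Std Dev: {std_dev:.1f} ms")  # Std Dev: 1.7 ms

# Any value more than 3 std devs above the baseline mean is an anomaly
threshold = mean + 3 * std_dev
anomalies = [t for t in response_times if t > threshold]
print(f"Anomalies detected: {anomalies}")
# Anomalies detected: [500]
```
Flagging values beyond 3 standard deviations is a common first-pass technique in fraud detection, abnormal lab-result flagging, and server monitoring. Note that the mean and std dev come from the baseline. Computed over all ten readings, the 500 ms spike would pull the mean up to 158 ms and the std dev up to about 120 ms, pushing the threshold to roughly 518 ms and letting the spike through. Against the baseline (mean 120 ms, std dev about 1.7 ms), a 500 ms response is roughly 219 standard deviations away. That is not normal latency; that is an outage.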
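One caveat is worth checking in code: if the mean and standard deviation are estimated from data that already contains the outlier, the outlier inflates both, and the 3-standard-deviation threshold can end up above the very value it was meant to catch. A minimal sketch with the ten response times from the example above (the `_all` suffixes are just illustrative names):

```python
import statistics

response_times = [120, 118, 122, 119, 121, 120, 118, 500, 123, 119]

# Statistics estimated from all ten readings, outlier included
mean_all = statistics.fmean(response_times)
std_all = statistics.stdev(response_times)
threshold_all = mean_all + 3 * std_all

flagged = [t for t in response_times if t > threshold_all]

print(f"Mean: {mean_all:.1f} ms, std dev: {std_all:.1f} ms")
print(f"Threshold: {threshold_all:.1f} ms")
print(f"Flagged: {flagged}")
# Mean: 158.0 ms, std dev: 120.2 ms
# Threshold: 518.5 ms
# Flagged: []
```

The 500 ms reading is not flagged here, which is why the statistics are usually taken from a baseline of known-normal readings, or from estimators that are less sensitive to outliers.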
Variance vs Standard Deviation — Summary
| Property | Variance | Standard Deviation |
|---|---|---|
| Symbol | σ² or s² | σ or s |
| Formula | Mean of squared differences | Square root of variance |
| Unit | Squared (e.g. g², mins²) | Same as data (e.g. g, mins) |
| Interpretable? | Harder | Easy |
| Used in | PCA, ANOVA, math proofs | Reporting, scaling, outlier detection |
| Sensitive to outliers? | Yes | Yes |