The Archaeologist, The Doctor & The Opinion Poll
In 1991, hikers in the Alps stumbled upon a 5,300-year-old frozen corpse, now famous as Ötzi the Iceman. Scientists could not travel back in time to watch him die. But radiocarbon dating of his remains placed his death roughly 5,300 years ago, and analysis of his bones put his age at death at approximately 45 years. Not a guess. Not a certainty. An evidence-based estimate with a measurable margin of error.
A cardiologist measures a patient's blood pressure three times and reports: "Your average systolic pressure is 138 mmHg — we should investigate." She did not measure every heartbeat in the patient's lifetime. She took a sample and made an inference about the patient's true underlying pressure.
Before an election, a poll reports: "Party A will win 43% of votes — margin of error ±3%, 95% confidence." That single sentence contains the entire machinery of estimation theory: a point estimate (43%), an interval estimate (40%–46%), a confidence level (95%), and a margin of error (±3%). Understanding what each of those numbers actually means — and what they do not mean — is what this tutorial is about.
Estimation theory is the formal statistical framework that tells us how to make those estimates, how precise they are, and how confident we should be in them. It is the engine behind clinical trials, quality control, A/B testing, credit risk models, and virtually every data-driven decision.
The Two Branches of Estimation
When you collect a sample and want to say something about the population, you have two tools: give a single best-guess number, or give a range of plausible values. These are the two branches of estimation — and you almost always need both.
Point estimate:
- One number as best guess
- E.g. x̄ estimates μ
- Simple, but carries no uncertainty information
- E.g. "mean salary = ₹42,000"
Interval estimate:
- Lower and upper bounds
- Captures uncertainty
- Built on the sampling distribution
- E.g. "₹39,500 to ₹44,500"
Confidence level:
- How often the interval-building method works
- NOT the probability that the true μ is inside
- Higher = wider interval
- 95% is the scientific standard
A 95% confidence interval does NOT mean there is a 95% probability the true population mean lies inside this specific interval. The true mean is fixed — it either is or is not in the interval. What 95% means is: if we repeated this sampling process 100 times, approximately 95 of the 100 intervals we constructed would contain the true mean. The confidence is in the method, not in any single interval.
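The repeated-sampling reading of "95%" can be checked with a small simulation. The sketch below is illustrative only: the population (normal with μ = 50, σ = 10), the sample size of 25, and the helper name `coverage_rate` are assumptions, not values from the text. It builds many t-intervals from fresh samples and counts how often they contain μ.

```python
import numpy as np
from scipy import stats

def coverage_rate(mu=50.0, sigma=10.0, n=25, trials=5000, level=0.95, seed=0):
    """Fraction of t-intervals, built from repeated samples, that contain mu."""
    rng = np.random.default_rng(seed)
    t_star = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    # One row per simulated study: `trials` independent samples of size n
    samples = rng.normal(mu, sigma, size=(trials, n))
    means = samples.mean(axis=1)
    ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
    lower = means - t_star * ses
    upper = means + t_star * ses
    return float(np.mean((lower <= mu) & (mu <= upper)))

rate = coverage_rate()
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should sit close to 0.95. Any single interval either contains μ or misses it; the 95% describes the long-run hit rate of the procedure.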
Point Estimation — The Best Single Guess
A point estimator is a formula (a statistic) that uses sample data to produce a single number as the best estimate of an unknown population parameter. The most common estimators are:
| Population Parameter | Symbol | Point Estimator | Sample Statistic |
|---|---|---|---|
| Population Mean | μ | Sample Mean | x̄ = Σxᵢ / n |
| Population Variance | σ² | Sample Variance | s² = Σ(xᵢ−x̄)² / (n−1) |
| Population Std Dev | σ | Sample Std Dev | s = √[Σ(xᵢ−x̄)²/(n−1)] |
| Population Proportion | P | Sample Proportion | p̂ = x / n |
Properties of a Good Estimator
| Property | What It Means | Example |
|---|---|---|
| Unbiasedness | On average, the estimator equals the true parameter | E(x̄) = μ, so x̄ is unbiased for μ |
| Consistency | Estimate converges to true value as n → ∞ | Larger samples give better estimates |
| Efficiency | Minimum variance among all unbiased estimators | x̄ is more efficient than the median for normal data |
| Sufficiency | Uses all available information in the sample | x̄ is sufficient for μ in a normal population |
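The efficiency row can be made concrete with a simulation. This is a sketch under assumed settings (standard normal population, samples of size 30, 20,000 repetitions): it compares how much the sample mean and the sample median vary from sample to sample.

```python
import numpy as np

rng = np.random.default_rng(1)
# 20,000 hypothetical samples of size 30 from a standard normal population
samples = rng.normal(loc=0.0, scale=1.0, size=(20000, 30))

# Variance of each estimator across the repeated samples
var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()
ratio = var_median / var_mean

print(f"Variance of sample means:   {var_mean:.4f}")
print(f"Variance of sample medians: {var_median:.4f}")
print(f"Ratio (median / mean):      {ratio:.2f}")
```

For normal data the ratio tends toward π/2 ≈ 1.57 as n grows, so the median needs noticeably more data to match the precision of the mean. For heavy-tailed data the ranking can flip, which is why the claim is qualified with "for normal data".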
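When you compute deviations from the sample mean x̄ instead of the true population mean μ, you systematically underestimate spread, because x̄ is by construction the value that minimizes the sum of squared deviations for that sample. Dividing by n−1 instead of n corrects this: it inflates the estimate just enough to make E(s²) = σ². This is why ddof=1 is the right default when your data are a sample rather than the whole population. (Strictly, s² is unbiased for σ², while s itself still slightly underestimates σ in small samples.)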
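A quick simulation shows the effect of the divisor. The sketch assumes a normal population with true variance 4 and very small samples (n = 5), chosen only to make the bias easy to see.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 4.0   # true population variance (assumed for the demo)
n = 5          # small samples make the bias visible

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(50000, n))

mean_var_n = samples.var(axis=1, ddof=0).mean()    # divide by n
mean_var_n1 = samples.var(axis=1, ddof=1).mean()   # divide by n - 1

print(f"True variance:               {sigma2:.2f}")
print(f"Average estimate, ddof=0:    {mean_var_n:.2f}")
print(f"Average estimate, ddof=1:    {mean_var_n1:.2f}")
```

The ddof=0 average should land near σ²·(n−1)/n = 3.2, while the ddof=1 average should land near 4.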
Interval Estimation & Confidence Intervals
A point estimate is useful but incomplete — it tells you the best single guess but says nothing about how uncertain that guess is. A sample mean of ₹42,000 from 10 workers feels much less reliable than ₹42,000 from 10,000 workers. Interval estimation captures that uncertainty by constructing a range of plausible values for the parameter.
Increasing confidence level widens the interval. If someone asks for a 100% confidence interval, it would span from −∞ to +∞ — perfectly certain but completely useless. The art of estimation is choosing a confidence level high enough to be trustworthy (usually 95%) while keeping the interval narrow enough to be actionable and informative.
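The trade-off between confidence and width is easy to tabulate. The summary statistics below (x̄ = 42,000, s = 6,000, n = 25) are hypothetical and serve only to illustrate the pattern.

```python
import numpy as np
from scipy import stats

x_bar, s, n = 42000.0, 6000.0, 25   # hypothetical summary statistics
se = s / np.sqrt(n)

widths = {}
for level in (0.80, 0.90, 0.95, 0.99, 0.999):
    # Two-sided critical value for this confidence level
    t_star = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    lower = x_bar - t_star * se
    upper = x_bar + t_star * se
    widths[level] = upper - lower
    print(f"{level:.1%} CI: {lower:,.0f} to {upper:,.0f}  (width {widths[level]:,.0f})")
```

Each step up in confidence widens the interval, and the widening accelerates as the level approaches 100%.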
Margin of Error — The ± Number Everyone Sees
The Margin of Error (MoE) is the half-width of a confidence interval. It is the ±3% you see in election polls, the ±0.5°C in weather forecasts, the ±2 kg in clinical trials. It tells you how far your point estimate might stray from the true population value in either direction.
MoE ≈ 1/√n for proportions at 95% confidence with p̂ = 0.5 (the worst case), because 1.96 × √(0.25/n) = 0.98/√n. At n = 1,000: 1/√1000 ≈ ±3.2% (±3.1% with the exact 0.98 factor), precise enough for most political polling at a reasonable cost. At n = 10,000: MoE ≈ ±1.0%, so 10× more people buys only about 3× more precision. That is why serious polls rarely exceed 2,000 respondents: beyond that, cost grows faster than precision.
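The rule of thumb can be compared against the exact normal-approximation formula. In this sketch the helper name `moe_proportion` and the list of sample sizes are illustrative choices.

```python
import numpy as np
from scipy import stats

def moe_proportion(n, p=0.5, level=0.95):
    """Margin of error for a proportion using the normal approximation."""
    z_star = stats.norm.ppf(1 - (1 - level) / 2)
    return z_star * np.sqrt(p * (1 - p) / n)

for n in (100, 500, 1000, 2000, 10000):
    exact = moe_proportion(n)
    rule = 1 / np.sqrt(n)
    print(f"n = {n:>6}: exact MoE = ±{exact:.2%}, 1/sqrt(n) = ±{rule:.2%}")
```

The shortcut slightly overstates the margin at every n, which keeps it on the safe side, and both columns shrink with the square root of the sample size.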
Step-by-Step: Building a Confidence Interval
A quality engineer samples 36 LED bulbs from a production line and measures their lifespan (hours). She wants to estimate the true average lifespan of all bulbs produced with a 95% confidence interval.
| Sample Data | Value |
|---|---|
| Sample size (n) | 36 bulbs |
| Sample mean (x̄) | 1,480 hours |
| Sample std dev (s) | 120 hours |
| Confidence level | 95% |
| σ (population std dev) | Unknown → use t-interval |
Step 1. Degrees of freedom: df = n − 1 = 36 − 1 = 35.
Step 2. Critical value: for a 95% CI with df = 35, t* = 2.030 (from a t-table or scipy.stats.t.ppf(0.975, df=35)).
Step 3. Standard error: SE = s / √n = 120 / √36 = 120 / 6 = 20 hours
Step 4. Margin of error: MoE = t* × SE = 2.030 × 20 = 40.6 hours
Step 5. Interval bounds:
Lower = x̄ − MoE = 1,480 − 40.6 = 1,439.4 hours
Upper = x̄ + MoE = 1,480 + 40.6 = 1,520.6 hours
Result: 95% CI = (1,439.4, 1,520.6) hours
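The same arithmetic can be reproduced directly from the summary statistics, without the raw data. The function name `t_interval_from_summary` is a hypothetical helper written for this sketch.

```python
import numpy as np
from scipy import stats

def t_interval_from_summary(x_bar, s, n, level=0.95):
    """Return (lower, upper, moe) for a t-based CI from summary statistics."""
    df = n - 1
    t_star = stats.t.ppf(1 - (1 - level) / 2, df=df)
    se = s / np.sqrt(n)
    moe = t_star * se
    return x_bar - moe, x_bar + moe, moe

lower, upper, moe = t_interval_from_summary(x_bar=1480, s=120, n=36)
print(f"MoE: ±{moe:.1f} hours")
print(f"95% CI: ({lower:.1f}, {upper:.1f}) hours")
# MoE: ±40.6 hours
# 95% CI: (1439.4, 1520.6) hours
```

Passing a different `level`, such as 0.99, reuses the same steps with a larger critical value.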
Narrower CI (more precise): larger n ↑, smaller s ↓, lower confidence level ↓.
Wider CI (less precise): smaller n ↓, larger s ↑, higher confidence level ↑.
In practice, the most powerful lever is sample size: doubling n shrinks MoE by a factor of √2 ≈ 1.41.
Python Implementation
Point Estimation
import numpy as np
from scipy import stats
np.random.seed(42)
# Sample: lifespan of 36 LED bulbs (hours)
sample = np.random.normal(loc=1480, scale=120, size=36)
# Point estimates
x_bar = np.mean(sample) # estimates μ
s = np.std(sample, ddof=1) # estimates σ with Bessel's correction (ddof=1)
n = len(sample)
print(f"Point estimate of μ (sample mean x̄): {x_bar:.2f} hrs")
print(f"Point estimate of σ (sample std s): {s:.2f} hrs")
print(f"Sample size n: {n}")
# Output: exact values depend on the simulated draw. With a true mean of 1480
# and a true SD of 120, expect x̄ within a few tens of hours of 1480 and s near 120.
Confidence Interval — Using SciPy t-Distribution
from scipy import stats
import numpy as np
np.random.seed(42)
sample = np.random.normal(loc=1480, scale=120, size=36)
x_bar = np.mean(sample)
s = np.std(sample, ddof=1)
n = len(sample)
se = s / np.sqrt(n)
# 95% CI using t-distribution (σ unknown)
# Note: the keyword is `confidence` in SciPy >= 1.9; older releases call it `alpha`
ci_95 = stats.t.interval(confidence=0.95, df=n-1, loc=x_bar, scale=se)
# 99% CI
ci_99 = stats.t.interval(confidence=0.99, df=n-1, loc=x_bar, scale=se)
# Critical t* values
t_95 = stats.t.ppf(0.975, df=n-1)
t_99 = stats.t.ppf(0.995, df=n-1)
print(f"Standard Error (SE): {se:.2f} hrs")
print(f"t* for 95% CI (df={n-1}): {t_95:.4f}")
print(f"Margin of Error 95%: ±{t_95 * se:.2f} hrs")
print(f"95% CI: ({ci_95[0]:.2f}, {ci_95[1]:.2f}) hrs")
print(f"99% CI: ({ci_99[0]:.2f}, {ci_99[1]:.2f}) hrs")
Confidence Interval for a Proportion
import numpy as np
from scipy import stats
# Scenario: 420 out of 1,000 voters prefer Party A
n = 1000
x = 420 # successes
p_hat = x / n # 0.42
se_prop = np.sqrt(p_hat * (1 - p_hat) / n)
# 95% CI for proportion (z-interval — n is large)
z_star = stats.norm.ppf(0.975) # 1.95996...
moe = z_star * se_prop
ci_low = p_hat - moe
ci_hi = p_hat + moe
print(f"Sample proportion p̂: {p_hat:.4f} ({p_hat*100:.1f}%)")
print(f"Standard Error: {se_prop:.4f}")
print(f"Critical z*: {z_star:.4f}")
print(f"Margin of Error: ±{moe:.4f} (±{moe*100:.2f}%)")
print(f"95% CI: ({ci_low*100:.2f}%, {ci_hi*100:.2f}%)")
# Output:
# Sample proportion p̂: 0.4200 (42.0%)
# Standard Error: 0.0156
# Critical z*: 1.9600
# Margin of Error: ±0.0306 (±3.06%)
# 95% CI: (38.94%, 45.06%)
Required Sample Size Calculation
import numpy as np
from scipy import stats
# How large a sample do we need for a desired Margin of Error?
z_star = stats.norm.ppf(0.975) # 1.96 for 95% CI
sigma = 120 # estimated std dev (from pilot study)
desired_moe = 25 # we want MoE ≤ 25 hours
# n = (z* × σ / MoE)²
n_required = (z_star * sigma / desired_moe) ** 2
print(f"Required sample size: n ≥ {np.ceil(n_required):.0f}")
# Output: Required sample size: n ≥ 89
# For proportion — worst case p̂ = 0.5
desired_moe_prop = 0.03 # ±3%
n_prop = (z_star * 0.5 / desired_moe_prop) ** 2
print(f"Required n for ±3% prop CI: n ≥ {np.ceil(n_prop):.0f}")
# Output: Required n for ±3% prop CI: n ≥ 1068
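Use a z-interval when the population σ is known (very rare) or when n > 100 (t approaches z anyway).
Use a t-interval when σ is unknown and n ≤ 100, which covers almost every real-world scenario for a mean.
When in doubt, use the t-interval: it is slightly wider (more conservative) than the z-interval, and the two converge as n grows, so little is lost by defaulting to it. Both rely on the sample mean being approximately normal, which holds for roughly normal data or reasonably large n.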
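How quickly t approaches z can be seen by printing the 95% critical values side by side. The sample sizes in this sketch are arbitrary illustrations.

```python
from scipy import stats

z_star = stats.norm.ppf(0.975)   # 95% two-sided critical value for z

t_stars = {}
for n in (5, 10, 30, 100, 1000):
    t_stars[n] = stats.t.ppf(0.975, df=n - 1)
    gap = t_stars[n] - z_star
    print(f"n = {n:>4}: t* = {t_stars[n]:.3f}  (z* = {z_star:.3f}, gap = {gap:.3f})")
```

The gap is large for tiny samples and close to zero by n = 100, which is why the choice between the two only matters for small n.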
Estimation Methods — Comparison Table
| Property | Point Estimation | Interval Estimation (CI) |
|---|---|---|
| Output | Single number (x̄, p̂, s) | Range (lower, upper bound) |
| Expresses uncertainty? | No | Yes |
| Easy to communicate? | Very easy | Requires explanation |
| Accounts for sample size? | No | Yes — via SE = s/√n |
| Useful for decisions? | Partially | Fully |
| Required for hypothesis tests? | As input | Directly equivalent |
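The "directly equivalent" entry in the last row refers to a duality: a two-sided one-sample t-test at α = 0.05 rejects a hypothesized mean μ₀ exactly when μ₀ falls outside the 95% t-interval. The sketch below checks this on simulated data; the sample and the list of candidate μ₀ values are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=1480, scale=120, size=36)  # simulated, illustrative only

n = len(sample)
x_bar = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)
t_star = stats.t.ppf(0.975, df=n - 1)
lower, upper = x_bar - t_star * se, x_bar + t_star * se

results = []
for mu0 in (1400, 1450, 1480, 1520, 1560):
    p_value = stats.ttest_1samp(sample, popmean=mu0).pvalue
    inside = bool(lower <= mu0 <= upper)
    results.append((mu0, inside, p_value))
    print(f"mu0 = {mu0}: inside 95% CI = {inside}, p-value = {p_value:.4f}")
```

Every candidate inside the interval should show p ≥ 0.05, and every candidate outside it should show p < 0.05.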
| Confidence Level | z* (known σ) | α (significance) | Tail Area Each Side | Typical Use |
|---|---|---|---|---|
| 90% | 1.645 | 0.10 | 5% | Exploratory studies, quick estimates |
| 95% | 1.960 | 0.05 | 2.5% | Scientific standard — most common |
| 99% | 2.576 | 0.01 | 0.5% | Medical trials, high-stakes decisions |
| 99.9% | 3.291 | 0.001 | 0.05% | Safety-critical systems (aviation, nuclear) |
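The critical values in this table do not need to be memorized; they follow from the standard normal quantile function. A minimal sketch using `scipy.stats.norm.ppf`:

```python
from scipy import stats

z_values = {}
for level in (0.90, 0.95, 0.99, 0.999):
    alpha = 1 - level                 # significance level
    tail = alpha / 2                  # area in each tail
    z_values[level] = stats.norm.ppf(1 - tail)
    print(f"{level:.1%} confidence: alpha = {alpha:.3f}, "
          f"tail = {tail:.4f}, z* = {z_values[level]:.3f}")
```

Swapping `stats.norm.ppf(1 - tail)` for `stats.t.ppf(1 - tail, df=n - 1)` gives the matching t* for a sample of size n.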
Golden Rules
- Default to the t-interval: use stats.t.interval() in Python unless n > 100 or σ is truly known.
- Always set ddof=1 for sample data: np.std(data, ddof=1) gives the sample standard deviation s with Bessel's correction. Using ddof=0 underestimates σ and produces a CI that is too narrow, a falsely precise result that understates real uncertainty.
- Plan the sample size before collecting data: use n = (z* × σ / MoE)² to determine how many observations you need to achieve a desired precision. Running a study and then discovering the CI is too wide to be useful wastes time and resources that cannot be recovered.