
Estimation Theory

A complete guide to estimation theory in statistics — covering point vs interval estimation, how confidence intervals are built and interpreted, what margin of error really means, the confidence–width trade-off, and full Python implementations using NumPy, SciPy and the t-distribution. Includes three SVG diagrams, a real LED bulb step-by-step worked example, and stories from election polling, archaeology, and quality control.

Section 01

The Archaeologist, The Doctor & The Opinion Poll

In 1991, hikers in the Alps stumbled upon a frozen corpse, now famous as Ötzi the Iceman. Scientists could not travel back in time to watch him die. But radiocarbon dating of his remains put his death at roughly 5,300 years ago, and analysis of his bones put his age at death at approximately 45 years. Not a guess. Not a certainty. An evidence-based estimate with a measurable margin of error.

A cardiologist measures a patient's blood pressure three times and reports: "Your average systolic pressure is 138 mmHg — we should investigate." She did not measure every heartbeat in the patient's lifetime. She took a sample and made an inference about the patient's true underlying pressure.

💡
Every Estimate in the Real World Works This Way

Before an election, a poll reports: "Party A will win 43% of votes — margin of error ±3%, 95% confidence." That single sentence contains the entire machinery of estimation theory: a point estimate (43%), an interval estimate (40%–46%), a confidence level (95%), and a margin of error (±3%). Understanding what each of those numbers actually means — and what they do not mean — is what this tutorial is about.

Estimation theory is the formal statistical framework that tells us how to make those estimates, how precise they are, and how confident we should be in them. It is the engine behind clinical trials, quality control, A/B testing, credit risk models, and virtually every data-driven decision.


Section 02

The Two Branches of Estimation

When you collect a sample and want to say something about the population, you have two tools: give a single best-guess number, or give a range of plausible values. These are the two branches of estimation — and you almost always need both.

Point Estimation
Single Value
  • One number as best guess
  • e.g. x̄ estimates μ
  • Simple but no uncertainty info
  • E.g. "mean salary = ₹42,000"
Interval Estimation
Range of Values
  • Lower and upper bounds
  • Captures uncertainty
  • Built on sampling distribution
  • E.g. "₹39,500 to ₹44,500"
Confidence Level
90% / 95% / 99%
  • How often the interval works
  • NOT probability the true μ is inside
  • Higher = wider interval
  • 95% is the scientific standard
⚠️
The Most Misunderstood Statement in Statistics

A 95% confidence interval does NOT mean there is a 95% probability the true population mean lies inside this specific interval. The true mean is fixed — it either is or is not in the interval. What 95% means is: if we repeated this sampling process 100 times, approximately 95 of the 100 intervals we constructed would contain the true mean. The confidence is in the method, not in any single interval.
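The long-run reading of "95%" can be checked with a small simulation. The sketch below is illustrative only: the population parameters, sample size, seed, and number of repetitions are all assumed values, not taken from the text. It builds a t-interval from each simulated sample and counts how often the interval contains the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only
true_mu, true_sigma = 50.0, 10.0    # hypothetical population parameters
n, n_repeats = 30, 10_000           # assumed sample size and repetition count

# One simulated sample per row
samples = rng.normal(true_mu, true_sigma, size=(n_repeats, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)

# 95% t-interval for every sample
t_star = stats.t.ppf(0.975, df=n - 1)
lower = means - t_star * ses
upper = means + t_star * ses

# Share of intervals that contain the (fixed) true mean
coverage = np.mean((lower <= true_mu) & (true_mu <= upper))
print(f"Intervals containing the true mean: {coverage:.1%}")
```

The printed share should land close to 95%. Any single interval in the simulation either contains the true mean or does not; the 95% describes the procedure across all repetitions.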


Section 03

Point Estimation — The Best Single Guess

A point estimator is a formula (a statistic) that uses sample data to produce a single number as the best estimate of an unknown population parameter. The most common estimators are:

Population Parameter | Symbol | Point Estimator | Sample Statistic
Population Mean | μ | Sample Mean | x̄ = Σxᵢ / n
Population Variance | σ² | Sample Variance | s² = Σ(xᵢ−x̄)² / (n−1)
Population Std Dev | σ | Sample Std Dev | s = √[Σ(xᵢ−x̄)²/(n−1)]
Population Proportion | P | Sample Proportion | p̂ = x / n
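As a rough illustration, the formulas in the table can be written out directly and cross-checked against NumPy. The salary figures and the threshold of 42 are made-up values for this sketch.

```python
import numpy as np

# Hypothetical sample: 8 monthly salaries, in thousands of rupees
data = np.array([38.0, 41.5, 44.0, 39.5, 42.0, 45.5, 40.0, 43.5])
n = len(data)

x_bar = data.sum() / n                        # x̄ = Σxᵢ / n
s2 = ((data - x_bar) ** 2).sum() / (n - 1)    # s² = Σ(xᵢ − x̄)² / (n − 1)
s = np.sqrt(s2)                               # s = √s²

# Proportion estimator: share of salaries above 42 (arbitrary threshold)
x = int((data > 42).sum())                    # number of "successes"
p_hat = x / n                                 # p̂ = x / n

print(f"x̄ = {x_bar:.2f}, s² = {s2:.3f}, s = {s:.3f}, p̂ = {p_hat:.3f}")

# Cross-check the hand-written formulas against NumPy's built-ins
print(np.isclose(x_bar, np.mean(data)),
      np.isclose(s2, np.var(data, ddof=1)),
      np.isclose(s, np.std(data, ddof=1)))
```

Note that NumPy's `np.var` and `np.std` divide by n unless `ddof=1` is passed, so the cross-check sets it explicitly.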

Properties of a Good Estimator

Property | What It Means | Example
Unbiasedness | On average, the estimator equals the true parameter | E(x̄) = μ, so x̄ is unbiased for μ
Consistency | Estimate converges to true value as n → ∞ | Larger samples give better estimates
Efficiency | Minimum variance among all unbiased estimators | x̄ is more efficient than the median for normal data
Sufficiency | Uses all available information in the sample | x̄ is sufficient for μ in a normal population
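The efficiency row can be illustrated with a simulation. This is a sketch under assumed settings (standard normal population, n = 101, an arbitrary seed): it compares how much the sample mean and the sample median vary from sample to sample. For normal data, theory puts the variance ratio near π/2 ≈ 1.57 for large n.

```python
import numpy as np

rng = np.random.default_rng(1)    # arbitrary seed
n, n_repeats = 101, 20_000        # assumed sample size and repetition count

# Many samples from a standard normal population (true mean = median = 0)
samples = rng.normal(loc=0.0, scale=1.0, size=(n_repeats, n))

sample_means = samples.mean(axis=1)
sample_medians = np.median(samples, axis=1)

var_mean = sample_means.var()
var_median = sample_medians.var()
ratio = var_median / var_mean

print(f"Average of sample means:   {sample_means.mean():+.4f}  (unbiased: near 0)")
print(f"Variance of sample mean:   {var_mean:.5f}")
print(f"Variance of sample median: {var_median:.5f}")
print(f"Ratio median/mean:         {ratio:.2f}")
```

Both estimators centre on the true value here, but the median needs noticeably more data to reach the same precision as the mean.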
🎯
Why n−1 (Bessel's Correction) Makes s² Unbiased

When you compute deviations from the sample mean x̄ instead of the true population mean μ, you systematically underestimate spread because x̄ is closer to the sample points than μ is. Dividing by n−1 instead of n corrects this: it inflates the estimate just enough to make E(s²) = σ². This is why ddof=1 is the right default when estimating variance or standard deviation from a sample.
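A quick simulation makes the bias visible. The sketch below uses assumed values (σ² = 4, tiny samples of n = 5, an arbitrary seed) and averages the two versions of the variance estimate over many samples. Dividing by n should average out near (n−1)/n × σ² = 3.2, while dividing by n−1 should average out near 4.

```python
import numpy as np

rng = np.random.default_rng(2)    # arbitrary seed
true_var = 4.0                    # assumed population variance (σ = 2)
n, n_repeats = 5, 200_000         # small n makes the bias easy to see

samples = rng.normal(0.0, np.sqrt(true_var), size=(n_repeats, n))

mean_var_n = samples.var(axis=1, ddof=0).mean()     # divide by n
mean_var_n1 = samples.var(axis=1, ddof=1).mean()    # divide by n − 1

print(f"True variance σ²:              {true_var:.3f}")
print(f"Average estimate, ddof=0 (÷n):   {mean_var_n:.3f}")
print(f"Average estimate, ddof=1 (÷n−1): {mean_var_n1:.3f}")
```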


Section 04

Interval Estimation & Confidence Intervals ⭐

A point estimate is useful but incomplete — it tells you the best single guess but says nothing about how uncertain that guess is. A sample mean of ₹42,000 from 10 workers feels much less reliable than ₹42,000 from 10,000 workers. Interval estimation captures that uncertainty by constructing a range of plausible values for the parameter.

[Figure: Anatomy of a Confidence Interval. A sampling distribution centred on the point estimate x̄, with lower bound x̄ − MoE and upper bound x̄ + MoE, a central 95% confidence area, and 2.5% in each tail. A second panel shows 100 CIs: 95 contain the true μ and 5 miss it.]
Confidence Interval — Known σ (z-interval)
CI = x̄ ± z* × (σ / √n)
Use when population standard deviation σ is known (rare in practice) or n is very large (>100). z* = 1.645 for 90%, 1.960 for 95%, 2.576 for 99%.
Confidence Interval — Unknown σ (t-interval)
CI = x̄ ± t* × (s / √n)
Use when σ is unknown — which is almost always the case. t* comes from the t-distribution with df = n − 1 degrees of freedom. As n → ∞, t* → z*. Always prefer this for real-world samples.
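The claim that t* approaches z* as n grows can be checked numerically. A minimal sketch, with the sample sizes chosen arbitrarily for illustration:

```python
from scipy import stats

# Two-sided 95% critical value from the standard normal distribution
z_star = stats.norm.ppf(0.975)

# Two-sided 95% critical values from the t-distribution, df = n − 1
t_stars = {}
for n in (5, 10, 30, 100, 1000):
    t_stars[n] = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:>4}: t* = {t_stars[n]:.4f}   (z* = {z_star:.4f})")
```

For small n the t critical value is clearly larger than z*, which is what widens the t-interval; by n = 1000 the two are nearly indistinguishable.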
[Figure: Critical z* Values for Common Confidence Levels. 90% confidence: ±1.645, with 5% in each tail. 95% confidence ⭐: ±1.960, with 2.5% in each tail. 99% confidence: ±2.576, with 0.5% in each tail.]
📐
The Confidence–Width Trade-off

Increasing confidence level widens the interval. If someone asks for a 100% confidence interval, it would span from −∞ to +∞ — perfectly certain but completely useless. The art of estimation is choosing a confidence level high enough to be trustworthy (usually 95%) while keeping the interval narrow enough to be actionable and informative.
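To see the trade-off in numbers, the sketch below holds one sample fixed and varies only the confidence level. The summary statistics (x̄ = 100, s = 15, n = 25) are hypothetical values picked for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical summary statistics from a single sample
x_bar, s, n = 100.0, 15.0, 25
se = s / np.sqrt(n)

widths = {}
for conf in (0.80, 0.90, 0.95, 0.99, 0.999):
    # First positional argument of t.interval is the confidence level
    lo, hi = stats.t.interval(conf, df=n - 1, loc=x_bar, scale=se)
    widths[conf] = hi - lo
    print(f"{conf:.1%} CI: ({lo:7.2f}, {hi:7.2f})   width = {widths[conf]:.2f}")
```

The data never change, yet the interval keeps widening as the confidence level rises, and it grows fastest as the level approaches 100%.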


Section 05

Margin of Error — The ± Number Everyone Sees

The Margin of Error (MoE) is the half-width of a confidence interval. It is the ±3% you see in election polls, the ±0.5°C in weather forecasts, the ±2 kg in clinical trials. It tells you how far your point estimate is likely to stray from the true population value in either direction, at the stated confidence level.

Margin of Error (Mean)
MoE = z* × (σ / √n)   or   t* × (s / √n)
MoE for estimating a population mean. The interval is then x̄ ± MoE. Use z* when σ is known or n > 100; use t* otherwise.
Margin of Error (Proportion)
MoE = z* × √[ p̂(1 − p̂) / n ]
MoE for estimating a population proportion — used in polls, A/B tests, and quality control. The interval is p̂ ± MoE. Maximum MoE occurs at p̂ = 0.5.
[Figure: How Sample Size Affects Margin of Error & CI Width. x-axis: sample size n (25, 50, 100, 400, 1000); y-axis: margin of error. Annotation: diminishing returns. To halve MoE, quadruple n.]
💡
The Square Root Law — Why Polls Use ~1,000 People

MoE ≈ 1/√n for proportions with 95% confidence and p̂ = 0.5 (worst case), since z* × 0.5 = 1.96 × 0.5 = 0.98 ≈ 1. At n=1,000: MoE ≈ 1/√1000 ≈ ±3.2%, precise enough for most political decisions at a reasonable cost. At n=10,000: MoE ≈ ±1.0%, so 10× more people buys only about 3× more precision. That is why serious polls rarely exceed 2,000 respondents. Beyond that, cost grows faster than precision.
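The square-root behaviour is easy to tabulate. The sketch below compares the worst-case 95% margin of error for a proportion with the 1/√n shortcut; `moe_proportion` is a helper name introduced here for illustration, not a library function.

```python
import numpy as np
from scipy import stats

z_star = stats.norm.ppf(0.975)    # two-sided 95% critical value


def moe_proportion(n, p_hat=0.5, z=z_star):
    """Margin of error for a proportion: z* × √[p̂(1 − p̂) / n]."""
    return z * np.sqrt(p_hat * (1 - p_hat) / n)


for n in (25, 100, 400, 1000, 1600, 10_000):
    exact = moe_proportion(n)
    shortcut = 1 / np.sqrt(n)
    print(f"n = {n:>6}: MoE = ±{exact:.2%}   (1/√n shortcut: ±{shortcut:.2%})")

# Quadrupling n halves the margin of error
print(moe_proportion(400) / moe_proportion(1600))
```

The shortcut slightly overstates the margin of error because z* × 0.5 is 0.98 rather than 1, which errs on the conservative side.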


Section 06

Step-by-Step: Building a Confidence Interval

A quality engineer samples 36 LED bulbs from a production line and measures their lifespan (hours). She wants to estimate the true average lifespan of all bulbs produced with a 95% confidence interval.

Sample Data | Value
Sample size (n) | 36 bulbs
Sample mean (x̄) | 1,480 hours
Sample std dev (s) | 120 hours
Confidence level | 95%
σ (population std dev) | Unknown → use t-interval
🧮 Constructing the 95% Confidence Interval
Step 1
Identify the critical value t*.
Degrees of freedom df = n − 1 = 36 − 1 = 35.
For 95% CI with df=35: t* = 2.030 (from t-table or scipy.stats.t.ppf(0.975, df=35)).
Step 2
Calculate the Standard Error (SE).
SE = s / √n = 120 / √36 = 120 / 6 = 20 hours
Step 3
Calculate the Margin of Error (MoE).
MoE = t* × SE = 2.030 × 20 = 40.6 hours
Step 4
Build the confidence interval.
Lower = x̄ − MoE = 1,480 − 40.6 = 1,439.4 hours
Upper = x̄ + MoE = 1,480 + 40.6 = 1,520.6 hours
95% CI = (1,439.4 , 1,520.6) hours
Interpret
We are 95% confident that the true mean lifespan of all LED bulbs produced lies between 1,439 and 1,521 hours. If the company's guarantee is a mean lifespan of 1,400 hours, the entire interval sits above it, so this sample provides strong evidence the standard is being met.
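The four steps can be reproduced directly from the summary statistics in the table, without any raw data. A minimal sketch:

```python
import numpy as np
from scipy import stats

# Summary statistics from the LED bulb example
n, x_bar, s = 36, 1480.0, 120.0

# Step 1: critical value for a 95% CI with df = n − 1
t_star = stats.t.ppf(0.975, df=n - 1)

# Step 2: standard error
se = s / np.sqrt(n)

# Step 3: margin of error
moe = t_star * se

# Step 4: confidence interval
lower, upper = x_bar - moe, x_bar + moe

print(f"t* = {t_star:.3f}, SE = {se:.1f}, MoE = {moe:.1f}")
print(f"95% CI = ({lower:.1f}, {upper:.1f}) hours")
```

Working from summary statistics like this is handy when only a reported mean, standard deviation, and sample size are available.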
What Makes This Interval Narrower or Wider?

Narrower CI (more precise): larger n ↑, smaller s ↓, lower confidence level ↓.
Wider CI (less precise): smaller n ↓, larger s ↑, higher confidence level ↑.
In practice, the most powerful lever is sample size — doubling n shrinks MoE by a factor of √2 ≈ 1.41.


Section 07

Python Implementation

Point Estimation

import numpy as np
from scipy import stats

np.random.seed(42)

# Sample: lifespan of 36 LED bulbs (hours)
sample = np.random.normal(loc=1480, scale=120, size=36)

# Point estimates
x_bar = np.mean(sample)             # estimates μ
s     = np.std(sample, ddof=1)      # estimates σ (ddof=1 applies Bessel's correction)
n     = len(sample)

print(f"Point estimate of μ (sample mean x̄): {x_bar:.2f} hrs")
print(f"Point estimate of σ (sample std  s):  {s:.2f} hrs")
print(f"Sample size n: {n}")
# Output: exact values depend on the simulated draw. Expect x̄ near 1480
# and s near 120 (the parameters used to generate the sample), with n = 36.

Confidence Interval — Using SciPy t-Distribution

from scipy import stats
import numpy as np

np.random.seed(42)
sample = np.random.normal(loc=1480, scale=120, size=36)

x_bar = np.mean(sample)
s     = np.std(sample, ddof=1)
n     = len(sample)
se    = s / np.sqrt(n)

# 95% CI using t-distribution (σ unknown)
ci_95 = stats.t.interval(confidence=0.95, df=n-1, loc=x_bar, scale=se)

# 99% CI
ci_99 = stats.t.interval(confidence=0.99, df=n-1, loc=x_bar, scale=se)

# Critical t* values
t_95 = stats.t.ppf(0.975, df=n-1)
t_99 = stats.t.ppf(0.995, df=n-1)

print(f"Standard Error (SE): {se:.2f} hrs")
print(f"t* for 95% CI (df=35): {t_95:.4f}")
print(f"Margin of Error 95%: ±{t_95 * se:.2f} hrs")
print(f"95% CI: ({ci_95[0]:.2f}, {ci_95[1]:.2f}) hrs")
print(f"99% CI: ({ci_99[0]:.2f}, {ci_99[1]:.2f}) hrs")

Confidence Interval for a Proportion

import numpy as np
from scipy import stats

# Scenario: 420 out of 1,000 voters prefer Party A
n       = 1000
x       = 420          # successes
p_hat   = x / n        # 0.42

se_prop = np.sqrt(p_hat * (1 - p_hat) / n)

# 95% CI for proportion (z-interval — n is large)
z_star = stats.norm.ppf(0.975)           # 1.95996...
moe    = z_star * se_prop
ci_low = p_hat - moe
ci_hi  = p_hat + moe

print(f"Sample proportion p̂:    {p_hat:.4f} ({p_hat*100:.1f}%)")
print(f"Standard Error:          {se_prop:.4f}")
print(f"Critical z*:             {z_star:.4f}")
print(f"Margin of Error:         ±{moe:.4f} (±{moe*100:.2f}%)")
print(f"95% CI: ({ci_low*100:.2f}%, {ci_hi*100:.2f}%)")

# Output:
# Sample proportion p̂:    0.4200 (42.0%)
# Standard Error:          0.0156
# Critical z*:             1.9600
# Margin of Error:         ±0.0306 (±3.06%)
# 95% CI: (38.94%, 45.06%)

Required Sample Size Calculation

import numpy as np
from scipy import stats

# How large a sample do we need for a desired Margin of Error?

z_star   = stats.norm.ppf(0.975)    # 1.96 for 95% CI
sigma    = 120                       # estimated std dev (from pilot study)
desired_moe = 25                     # we want MoE ≤ 25 hours

# n = (z* × σ / MoE)²
n_required = (z_star * sigma / desired_moe) ** 2

print(f"Required sample size: n ≥ {np.ceil(n_required):.0f}")
# Output: Required sample size: n ≥ 89

# For proportion — worst case p̂ = 0.5
desired_moe_prop = 0.03              # ±3%
n_prop = (z_star * 0.5 / desired_moe_prop) ** 2
print(f"Required n for ±3% prop CI: n ≥ {np.ceil(n_prop):.0f}")
# Output: Required n for ±3% prop CI: n ≥ 1068
⚠️
t-interval vs z-interval — When to Use Which

Use z-interval when: population σ is known (very rare) OR n > 100 (t approaches z anyway).
Use t-interval when: σ is unknown AND n ≤ 100 — which is almost every real-world scenario.
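When in doubt, use the t-interval for a mean: it gives a slightly wider (more conservative) interval and converges to the z-interval as n grows. It still assumes roughly normal data, or a sample large enough for the sampling distribution of x̄ to be approximately normal.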


Section 08

Estimation Methods — Comparison Table

Property | Point Estimation | Interval Estimation (CI)
Output | Single number (x̄, p̂, s) | Range (lower, upper bound)
Expresses uncertainty? | No | Yes
Easy to communicate? | Very easy | Requires explanation
Accounts for sample size? | No | Yes, via SE = s/√n
Useful for decisions? | Partially | Fully
Required for hypothesis tests? | As input | Directly equivalent

Confidence Level | z* (known σ) | α (significance) | Tail Area Each Side | Typical Use
90% | 1.645 | 0.10 | 5% | Exploratory studies, quick estimates
95% | 1.960 | 0.05 | 2.5% | Scientific standard, most common
99% | 2.576 | 0.01 | 0.5% | Medical trials, high-stakes decisions
99.9% | 3.291 | 0.001 | 0.05% | Safety-critical systems (aviation, nuclear)
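The z* column does not need to be memorised. As a short sketch, each value follows from the confidence level via α = 1 − confidence and z* = Φ⁻¹(1 − α/2), which SciPy exposes as `stats.norm.ppf`:

```python
from scipy import stats

z_table = {}
for conf in (0.90, 0.95, 0.99, 0.999):
    alpha = 1 - conf                           # significance level
    tail = alpha / 2                           # area in each tail
    z_table[conf] = stats.norm.ppf(1 - tail)   # critical value z*
    print(f"{conf:.1%} confidence: α = {alpha:.3f}, "
          f"tail = {tail:.2%}, z* = {z_table[conf]:.3f}")
```

The same pattern works for any other confidence level, for example 0.98 or 0.995.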

Section 09

Golden Rules

🎯 Estimation Theory — Key Rules
1
Always report a CI alongside a point estimate. A sample mean without a confidence interval is like a weather forecast with no uncertainty range. The CI tells decision-makers how much to trust the estimate. Point estimates alone are incomplete statistical communication.
2
"95% confident" means the method works 95% of the time — not that this interval has a 95% chance of containing μ. The true μ is fixed. Once an interval is computed, it either contains μ or it does not. The 95% refers to the long-run performance of the procedure.
3
Use t-intervals when σ is unknown — which is almost always. The z-interval requires knowing the population standard deviation, which is essentially never available in practice. Always default to stats.t.interval() in Python unless n > 100 or σ is truly known.
4
To halve the Margin of Error, you must quadruple the sample size. MoE ∝ 1/√n. Doubling n reduces MoE by only √2 ≈ 1.41×. This "square root law" governs the economics of every survey and experiment — precision is expensive.
5
Use ddof=1 for sample standard deviation in all CI calculations. np.std(data, ddof=1) gives the sample standard deviation s, the square root of the unbiased variance estimate s². Using ddof=0 underestimates σ and produces a CI that is too narrow, a falsely precise result that understates real uncertainty.
6
Calculate required sample size before collecting data — not after. Use n = (z* × σ / MoE)² to determine how many observations you need to achieve a desired precision. Running a study and then discovering the CI is too wide to be useful wastes time and resources that cannot be recovered.