The Judge's Threshold ⚖️
Picture a courtroom. A defendant stands accused, and the judge must decide: how much evidence is "enough" to convict? Set the bar too low, and innocent people go to prison. Set it too high, and guilty people walk free. Before the trial even begins, society agrees on a standard — a threshold of proof called "beyond reasonable doubt."
In statistics, that threshold has a name: the Significance Level, written as the Greek letter α (alpha). It is the single number you choose — always before collecting data — that defines how much evidence you need before you will reject the null hypothesis.
Most people encounter α = 0.05 and never question it. But where did 0.05 come from? What does it really mean? What happens when you change it? And how do you choose the right α for your own study? This tutorial answers all of it — with stories, diagrams, and worked examples.
The significance level α is the maximum probability of making a Type I error (rejecting a true H₀) that you are willing to tolerate. It is your "false alarm budget" — set before the experiment, never adjusted after seeing the data.
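To make the "false alarm budget" concrete, here is a minimal simulation sketch. It assumes a right-tailed z-test with known σ = 1 and data generated with H₀ true; the function name, sample size, and trial count are illustrative choices, not part of the tutorial. If the definition above holds, the share of rejections should land close to α.

```python
import random
from statistics import NormalDist

def simulate_type1_rate(alpha=0.05, n=30, trials=20_000, seed=0):
    """Fraction of right-tailed z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample under H0: true mean 0, known sigma 1 (assumed setup)
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1.0 / n ** 0.5)  # z = (mean - 0) / (sigma / sqrt(n))
        if z > z_crit:
            rejections += 1  # a false alarm, because H0 is true here
    return rejections / trials

rate = simulate_type1_rate()
print(f"Observed false-alarm rate at alpha = 0.05: {rate:.3f}")
```

Rerunning with a different `alpha` should move the observed rate with it, which is the sense in which α is a budget you set rather than a property of the data.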
Where Did α = 0.05 Come From? 📜
The Story of Ronald Fisher
The year is 1925. Ronald Aylmer Fisher, a British statistician working at an agricultural research station, publishes a landmark book: Statistical Methods for Research Workers. In it, he writes almost casually that a result is "significant" if it would occur by chance fewer than 1 in 20 times — that is, with probability less than 0.05.
Fisher never intended this to be a universal law. He chose 1-in-20 as a convenient, round threshold for agricultural experiments — whether a new fertiliser genuinely improved crop yields versus random soil variation. Over the following century, the 0.05 threshold spread from agronomy into medicine, psychology, economics, and engineering, eventually becoming the global default.
Fisher himself warned that 0.05 was not meant to be a rigid rule for all situations. He argued that a scientific fact should be confirmed by independent repetition, not judged by a single p-value crossing an arbitrary line. The American Statistical Association issued a formal statement in 2016 echoing this, yet α = 0.05 remains the default in most fields.
What α Actually Means — Precisely
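This is where most textbooks are vague. Let's be exact. Setting α = 0.05 means that, if H₀ is actually true, you accept a 5% chance of wrongly rejecting it. It does not mean there is a 5% chance that H₀ is true, and it says nothing about the probability of missing a real effect.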
Visualising α on the Bell Curve 📊
When you set α, you are literally drawing a line on the null distribution — everything beyond that line is the "rejection region." Here is how the same test looks with three different α values. Notice how the rejection region grows as α increases, making it easier to reject H₀ — but also increasing the risk of a false alarm.
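The line that marks the rejection region can be computed directly. The sketch below assumes a standard normal null distribution (a z-test) and uses Python's standard-library `statistics.NormalDist`; the helper name `critical_values` is illustrative.

```python
from statistics import NormalDist

def critical_values(alpha):
    """Critical z-values that bound the rejection region for a given alpha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return {
        # All of alpha sits in the upper tail
        "right_tailed": std_normal.inv_cdf(1 - alpha),
        # alpha is split equally between both tails
        "two_tailed": std_normal.inv_cdf(1 - alpha / 2),
    }

for alpha in (0.01, 0.05, 0.10):
    cv = critical_values(alpha)
    print(f"alpha = {alpha:.2f}: reject if Z > {cv['right_tailed']:.3f} "
          f"(right-tailed) or |Z| > {cv['two_tailed']:.3f} (two-tailed)")
```

As α grows from 0.01 to 0.10, the critical value moves toward the centre of the curve, so the rejection region widens.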
α and the Two Types of Error 🎯
Choosing α is fundamentally a trade-off between two kinds of mistakes. Understanding this trade-off is one of the most important skills in applied statistics — because the "right" α depends entirely on which mistake is more costly in your specific situation.
- Type I error (false positive): rejecting a TRUE H₀
  - Probability = α (you control this)
  - Lower α → fewer false alarms
  - But: harder to detect real effects
  - Example: convicting an innocent person
  - Example: approving a useless drug
- Type II error (false negative): failing to reject a FALSE H₀
  - Probability = β (related to Power)
  - Lower α → higher β (worse), at a fixed sample size
  - Power = 1 − β (probability of detecting a real effect)
  - Example: acquitting a guilty person
  - Example: rejecting a life-saving drug
- The trade-off
  - Lowering α increases β
  - Raising α decreases β
  - For a given effect size, only a larger sample reduces both
  - Choose α based on which error costs more
  - Medical: prefer low α (safety first)
  - Exploration: α = 0.10 may be fine
| | H₀ is Actually TRUE | H₀ is Actually FALSE |
|---|---|---|
| You Reject H₀ | ❌ Type I Error. Probability = α (false alarm) | ✅ Correct Decision. Probability = Power = 1 − β |
| You Fail to Reject H₀ | ✅ Correct Decision. Probability = 1 − α | ❌ Type II Error. Probability = β (missed effect) |
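The trade-off between α and β can be put into numbers. The following is a sketch for a right-tailed one-sample z-test with known σ, where the standardised effect size (0.3) and the sample sizes (50 and 200) are made-up values chosen only for illustration.

```python
from statistics import NormalDist

def type2_error(alpha, effect_size, n):
    """Beta for a right-tailed one-sample z-test.

    effect_size is the true shift in units of sigma (an assumed value).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection boundary under H0
    shift = effect_size * n ** 0.5          # mean of Z when the effect is real
    # Beta = probability that Z stays below the boundary despite a real effect
    return std_normal.cdf(z_crit - shift)

for n in (50, 200):
    for alpha in (0.01, 0.05, 0.10):
        beta = type2_error(alpha, effect_size=0.3, n=n)
        print(f"n = {n:3d}, alpha = {alpha:.2f}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Within each sample size, β falls as α rises; moving from the smaller to the larger sample lowers β at every α, which is the only way to improve both error rates at once for a fixed effect.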
Story 1 — The Nuclear Safety Inspector ☢️
An engineer monitors radiation levels at a nuclear plant. The null hypothesis is: radiation is within safe limits (H₀: μ ≤ 50 mSv/year). She is testing whether levels have risen dangerously.
What should α be? Think carefully about the consequences:
- Type I Error (α): She raises a false alarm — the plant shuts down unnecessarily, costing millions. Costly, but reversible.
- Type II Error (β): She misses a real radiation spike — workers are exposed to dangerous levels. Catastrophic, irreversible.
The Type II error is far more dangerous. At a fixed sample size, the only way to reduce β is to raise α. In safety-critical monitoring like this, engineers often use α = 0.10: they'd rather trigger ten false alarms than miss one genuine hazard.
Note: At α = 0.10, the right-tailed critical value is 1.282, so Z = 1.60 > 1.282 → reject H₀. At α = 0.05, the critical value is 1.645, and Z = 1.60 < 1.645 → would FAIL to reject. The choice of α changed the outcome.
With Z = 1.60: at α = 0.10, the plant alerts. At α = 0.05, no action is taken. At α = 0.01, definitely no action. Same data, same test — three different decisions based solely on the threshold chosen beforehand. This is why α must be set before analysis, not chosen to make results "significant."
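The three decisions can be reproduced with a few lines. This sketch takes the test statistic Z = 1.60 from the story as given and assumes a right-tailed z-test; `decide_right_tailed` is an illustrative helper name.

```python
from statistics import NormalDist

def decide_right_tailed(z, alpha):
    """Compare a test statistic against the right-tailed critical value for alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    decision = "reject H0" if z > z_crit else "fail to reject H0"
    return z_crit, decision

z_observed = 1.60  # value taken from the story
for alpha in (0.10, 0.05, 0.01):
    z_crit, decision = decide_right_tailed(z_observed, alpha)
    print(f"alpha = {alpha:.2f}: critical value = {z_crit:.3f} -> {decision}")
```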
Story 2 — The Clinical Drug Trial 💊
A pharmaceutical company develops a new cancer drug. The FDA requires clinical trials before approval. The null hypothesis: the drug has no effect (or is harmful). The alternative: the drug improves survival rates.
Here the consequences are reversed from the nuclear case:
- Type I Error (α): Approving a drug that doesn't work — patients take an ineffective treatment, may forgo better alternatives, company profits falsely. Serious harm.
- Type II Error (β): Rejecting a drug that genuinely saves lives — delayed access. Terrible, but the drug can be retested.
The FDA's conventional standard for Phase III trials is α = 0.05 (typically two-sided), and stricter thresholds such as α = 0.01 are sometimes applied where the evidence must be rock-solid. The priority is preventing false approvals.
Story 3 — The Particle Physics Standard 🔬
In 2012, CERN announced the discovery of the Higgs boson — the so-called "God particle." Did they use α = 0.05? Not even close.
Particle physics uses the "5-sigma" standard, which corresponds to a one-sided α ≈ 0.0000003 (about 3 in 10 million). Why? Because physicists run billions of collision experiments. At α = 0.05, you'd expect 5% of all tests in which H₀ is true to produce false discoveries purely by chance; for every billion such tests, that's 50 million false positives. The 5-sigma rule keeps the number of false discoveries manageable even at astronomical scales.
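The link between "5 sigma" and α is a one-sided tail probability of the standard normal distribution. The sketch below computes it with `math.erfc` for numerical accuracy in the far tail; the count of one billion tests is a hypothetical round number used only to show the scale.

```python
import math

def one_sided_alpha(n_sigma):
    """Upper-tail probability of a standard normal beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

alpha_5sigma = one_sided_alpha(5)
n_tests = 1_000_000_000  # hypothetical number of tests where H0 is true

expected_fp_at_005 = 0.05 * n_tests
expected_fp_at_5sigma = alpha_5sigma * n_tests

print(f"alpha at 5 sigma: {alpha_5sigma:.3e}")
print(f"Expected false positives at alpha = 0.05: {expected_fp_at_005:,.0f}")
print(f"Expected false positives at 5 sigma:      {expected_fp_at_5sigma:,.0f}")
```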
α vs p-value — The Most Common Confusion
Students frequently confuse α and the p-value. They are fundamentally different things, and mixing them up leads to incorrect conclusions.
| Feature | Significance Level (α) | p-value |
|---|---|---|
| What it is | A threshold you choose in advance | A probability calculated from your data |
| When it's set | Before data collection | After data collection and analysis |
| Who determines it | You (the researcher), based on context | Calculated from the test statistic |
| What it means | Your tolerance for a false positive | Probability of data at least this extreme if H₀ were true |
| How it's used | Sets the decision boundary | Compared against α to make the decision |
| Fixed or variable? | Fixed — you do NOT change it after seeing results | Variable — depends entirely on your sample |
| Decision rule | If p < α → Reject H₀ | If p < α → Reject H₀ |
α is the speed limit sign posted before you drive. The p-value is your speedometer reading during the drive. The analogy is about timing, not direction: the sign is fixed in advance and the reading comes from the drive itself. In hypothesis testing it is the low reading that triggers the decision. If the p-value falls below α, you reject H₀; if it is at or above α, you fail to reject H₀. You don't change the speed limit after seeing how fast you're going.
Worked Comparison — Same Data, Three Values of α
A marketing team runs an A/B test on a new website design. They measure conversion rates. The Z-test gives a test statistic of Z = +1.75 (right-tailed). Watch how the conclusion changes with α.
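One way to work through the comparison is to convert Z = +1.75 into a p-value once and then hold it against each α. This is a sketch assuming a right-tailed z-test on a standard normal null; the variable names are illustrative.

```python
from statistics import NormalDist

z_observed = 1.75  # test statistic from the A/B test

# Right-tailed p-value: probability of a Z at least this large if H0 is true
p_value = 1 - NormalDist().cdf(z_observed)
print(f"p-value = {p_value:.4f}")

decisions = {}
for alpha in (0.10, 0.05, 0.01):
    decisions[alpha] = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: p {'<' if p_value < alpha else '>='} alpha -> {decisions[alpha]}")
```

The p-value is computed once from the data and never changes; only the threshold it is compared against does.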
How to Choose Your α — A Field Guide
| Field / Situation | Typical α | Reasoning | Priority |
|---|---|---|---|
| 🔬 Particle physics, astronomy | 0.0000003 (5σ) | Billions of tests; any false discovery is catastrophic | Minimise α |
| 💊 FDA drug approval | 0.01 – 0.05 | False approvals harm patients; very high evidence bar | Minimise α |
| 🧬 Genomics (GWAS) | 5 × 10⁻⁸ | Millions of SNPs tested simultaneously; Bonferroni correction | Minimise α |
| 🏭 Industrial quality control | 0.01 – 0.05 | Stopping production is expensive; balance false alarms | Balance |
| 📊 Social science research | 0.05 | Convention; sample sizes moderate; effects complex | Balance |
| ☢️ Safety / environmental monitoring | 0.05 – 0.10 | Missing a real hazard (Type II) is worse than false alarm | Raise α |
| 📈 A/B testing (marketing) | 0.05 – 0.10 | Low cost of Type I; quick iteration; business decisions | Raise α |
| 🔍 Exploratory / pilot studies | 0.10 – 0.20 | Generating hypotheses, not confirming them; high β risk | Raise α |
If you run 20 hypothesis tests each at α = 0.05 and every H₀ is true, you expect 1 false positive just by chance. The Bonferroni correction addresses this by dividing α by the number of tests: α_adjusted = α / m. For 20 tests at α = 0.05, each individual test uses α_adjusted = 0.0025. This keeps the overall family-wise error rate at or below 5%.
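The effect of the correction on the family-wise error rate can be checked numerically. The sketch below assumes the 20 tests are independent and that every null hypothesis is true, which is the simplest case; Bonferroni's guarantee itself does not need independence.

```python
def family_wise_error_rate(alpha_per_test, m):
    """Probability of at least one false positive across m independent tests,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha_per_test) ** m

m = 20
alpha = 0.05
alpha_adjusted = alpha / m  # Bonferroni: 0.05 / 20 = 0.0025

fwer_uncorrected = family_wise_error_rate(alpha, m)
fwer_bonferroni = family_wise_error_rate(alpha_adjusted, m)

print(f"Expected false positives without correction: {alpha * m:.1f}")
print(f"FWER without correction: {fwer_uncorrected:.3f}")
print(f"FWER with Bonferroni:    {fwer_bonferroni:.3f}")
```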
α, Sample Size, and Statistical Power
There is a three-way relationship connecting α, sample size (n), and statistical power (1 − β): for a given effect size, fixing any two determines the third. Understanding this triangle is essential for designing any experiment properly.
Most research aims for 80% power (β = 0.20) at α = 0.05. This means: a 20% chance of missing a real effect, and a 5% chance of a false positive. This 80/20 balance was suggested by statistician Jacob Cohen and remains the standard for sample size calculations in clinical and social science research.
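A sample size calculation ties the triangle together. The sketch below uses the common normal-approximation formula n = ((z_α + z_power) / d)² for a one-sample z-test, where d is the standardised effect size. The value d = 0.5 is an assumed example, and the two-sided version ignores the negligible chance of rejecting in the wrong tail.

```python
import math
from statistics import NormalDist

def sample_size_one_sample_z(effect_size, alpha=0.05, power=0.80, two_sided=True):
    """Approximate n for a one-sample z-test with known sigma.

    effect_size is the true mean shift divided by sigma (an assumed input).
    """
    std_normal = NormalDist()
    tail_alpha = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail_alpha)  # boundary of the rejection region
    z_power = std_normal.inv_cdf(power)           # quantile matching the target power
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)

n_at_005 = sample_size_one_sample_z(effect_size=0.5, alpha=0.05)
n_at_001 = sample_size_one_sample_z(effect_size=0.5, alpha=0.01)
print(f"n needed at alpha = 0.05, 80% power: {n_at_005}")
print(f"n needed at alpha = 0.01, 80% power: {n_at_001}")
```

Tightening α while holding power at 80% raises the required n, which is the practical cost of a stricter threshold.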
The Golden Rules of α
Choosing α is one of the most consequential decisions a researcher makes, yet it's often set without thought. It determines the sensitivity of your test, the balance between false alarms and missed discoveries, and the credibility of your conclusions. Use it deliberately. Set it with intention. And never, ever change it after seeing your data.