The Story That Explains Stationarity
The second is a man-made canal. Locks and gates keep the water level within a precise band year-round. Its depth in January is statistically identical to its depth in July. Any rule you learn about the canal this week still applies next year.
Statistical models are engineers who want to build a pump on the riverbank. They need to know how deep the water will be, but only the canal lets them plan reliably. The wild river's constantly shifting nature makes every calculation expire the moment it is made.
Stationarity is the difference between the wild river and the canal. A stationary time series is one whose statistical properties — mean, variance, and autocorrelation structure — do not change over time. It is the foundation that makes time series modelling possible.
Every classical time series model — AR, MA, ARMA, ARIMA — ultimately relies on stationarity. Models that assume the canal find the wild river unworkable. Before fitting any model, the single most important question you must answer is: "Is my series stationary?" This tutorial gives you the complete toolkit to answer it.
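As a rough first check before any formal test, one can split a series into consecutive chunks and compare their means and variances. The sketch below assumes NumPy is available; `chunk_stats` is a hypothetical helper written for this illustration, not a library function.

```python
import numpy as np

def chunk_stats(series, n_chunks=4):
    """Return (mean, variance) for each consecutive chunk of the series."""
    chunks = np.array_split(np.asarray(series, dtype=float), n_chunks)
    return [(float(c.mean()), float(c.var())) for c in chunks]

rng = np.random.default_rng(0)
noise = rng.normal(0, 1, 2000)   # the canal: stationary white noise
walk = np.cumsum(noise)          # the wild river: a random walk

for name, s in [("white noise", noise), ("random walk", walk)]:
    print(name)
    for mean, var in chunk_stats(s):
        print(f"  mean={mean:8.3f}  var={var:8.3f}")
```

For the white noise, the chunk means and variances should stay close to each other; for the random walk they typically wander. This is only an eyeball diagnostic and does not replace a formal test.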
What Stationarity Actually Means
Stationarity is not one single condition — it comes in two flavours with very different practical implications. Understanding the distinction is essential before running any test.
Left: strict stationarity — the full probability distribution is identical at every point in time. Right: weak (covariance) stationarity — only the mean, variance, and lag-k covariances are constant; the shape of the distribution may change. In practice, almost all time series tests check for weak stationarity.
Weak vs Strict Stationarity — Side by Side
| Property | Strict stationarity | Weak (covariance) stationarity |
|---|---|---|
| Distribution | Entire joint distribution time-invariant | Shape may change; not required constant |
| Mean | Constant (implied, when it exists) | E[yₜ] = μ constant for all t |
| Variance | Constant (implied, when it exists) | Var(yₜ) = σ² constant for all t |
| Autocovariance | Depends only on the lag k (implied, when it exists) | Cov(yₜ, yₜ₊ₖ) depends only on the lag k, not on t |
| Higher moments | All moments constant (skewness, kurtosis) | Not required to be constant |
| Implies the other? | Implies weak: yes, if the variance is finite | Implies strict: no, except for Gaussian (Normal) processes |
| Practical use | Theoretical proofs; rarely verified empirically | Required by ARMA, ARIMA, VAR |
| Testable? | No direct standard test exists | Yes: ADF, KPSS, PP tests |
For Gaussian (Normal) processes, weak and strict stationarity are equivalent. This is because a Gaussian process is completely characterised by its mean and its autocovariance function: if those do not depend on time, neither does the joint distribution. Since many economic and financial models assume Gaussian errors, this equivalence is often exploited in practice.
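Written out formally, the two definitions compare as follows. The notation (F for the joint distribution function, γ(k) for the lag-k autocovariance) is a common textbook convention assumed here rather than taken from the text above.

```latex
\begin{aligned}
\text{Strict:}\quad
  & F_{y_{t_1},\dots,y_{t_n}}(x_1,\dots,x_n)
    = F_{y_{t_1+h},\dots,y_{t_n+h}}(x_1,\dots,x_n)
    \quad \text{for all } n,\ t_1,\dots,t_n,\ h \\
\text{Weak:}\quad
  & E[y_t] = \mu, \qquad
    \operatorname{Var}(y_t) = \sigma^2 < \infty, \qquad
    \operatorname{Cov}(y_t, y_{t+k}) = \gamma(k)
    \quad \text{for all } t,\ k
\end{aligned}
```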
Why Stationarity Matters — The Consequences of Ignoring It
This is called spurious regression: it can arise whenever you regress two unrelated non-stationary series on each other. Their trending or wandering levels create the illusion of correlation. Classic examples in the literature include shoe-size regressions on stock prices and stork-population regressions on birth rates. Both appear statistically significant. Both are nonsense.
The standard fix is to make each series stationary before modelling.
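The effect is easy to reproduce by simulation. The sketch below assumes NumPy and uses arbitrary illustrative choices (500 observations, 200 trials, a 0.5 correlation threshold): it draws pairs of independent random walks and counts how often they look strongly correlated, before and after differencing.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_trials = 500, 200

def share_strongly_correlated(pairs, threshold=0.5):
    """Fraction of (x, y) pairs whose absolute correlation exceeds threshold."""
    hits = [abs(np.corrcoef(x, y)[0, 1]) > threshold for x, y in pairs]
    return float(np.mean(hits))

# Independent random walks: there is no true relationship between x and y
walks = [(np.cumsum(rng.normal(size=n_obs)), np.cumsum(rng.normal(size=n_obs)))
         for _ in range(n_trials)]
# The same pairs after first differencing (independent white noise again)
diffs = [(np.diff(x), np.diff(y)) for x, y in walks]

share_levels = share_strongly_correlated(walks)
share_diffs = share_strongly_correlated(diffs)
print(f"|corr| > 0.5 in levels:      {share_levels:.0%}")
print(f"|corr| > 0.5 in differences: {share_diffs:.0%}")
```

A substantial share of the level pairs should cross the threshold even though the series are unrelated, while essentially none of the differenced pairs do.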
Three properties, each illustrated. Box 1 (green): stationary mean oscillates around μ — non-stationary (faint red) drifts. Box 2 (gold): stationary variance stays within a constant band — non-stationary widens like a funnel. Box 3 (purple): the ACF decay pattern (solid vs dashed — two time windows) is identical, confirming covariance depends only on lag, not on when it is measured.
Types of Non-Stationarity — Know Your Enemy
Non-stationarity is not a single problem — it has four distinct flavours, each requiring a different cure. Misdiagnosing the type leads to the wrong treatment.
Each panel shows a different violation with its fix. Trend: mean drifts up — fix with differencing. Variance: swings grow — fix with log. Structural break: sudden level shift — fix with dummy variable. Seasonal: repeating cycle in mean — fix with seasonal differencing.
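Of the four fixes, the dummy-variable treatment of a structural break is the least self-explanatory, so here is a minimal sketch of it. It assumes NumPy and a break date that is already known; `fit_level_shift` is a hypothetical helper, and a real analysis would usually need to estimate or justify the break point.

```python
import numpy as np

def fit_level_shift(y, break_index):
    """Least-squares fit of y_t = a + b * D_t, with D_t = 1 for t >= break_index."""
    y = np.asarray(y, dtype=float)
    dummy = (np.arange(len(y)) >= break_index).astype(float)
    X = np.column_stack([np.ones(len(y)), dummy])
    (level, shift), *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ np.array([level, shift])
    return level, shift, residuals

rng = np.random.default_rng(2)
y = 10 + rng.normal(0, 1, 200)
y[120:] += 5                      # sudden level shift at t = 120
level, shift, resid = fit_level_shift(y, break_index=120)
print(f"level={level:.2f}, shift={shift:.2f}, residual mean={resid.mean():.2e}")
```

The residuals, with the shift removed, are the series you would then test for stationarity.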
The ADF Test — Augmented Dickey-Fuller
The prosecution presents evidence — the ADF test statistic. If the evidence is strong enough (p-value < 0.05), we reject the null and declare the series stationary. If the evidence is weak (p > 0.05), we fail to reject — the series remains presumed non-stationary and must be treated (differenced or transformed) before modelling.
The critical subtlety: failing to reject H₀ does not prove non-stationarity. It just means we lack sufficient evidence for stationarity. This is why we complement ADF with the KPSS test — which flips the burden of proof.
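To see where the test statistic comes from, the sketch below runs the basic (non-augmented) Dickey-Fuller regression Δyₜ = α + γ·yₜ₋₁ + εₜ with plain NumPy and reports the t-statistic on γ. It is a simplified illustration under stated assumptions: no lagged difference terms, a constant-only specification, and the approximate asymptotic 5% critical value of −2.86. `dickey_fuller_stat` is a hypothetical helper, not the statsmodels implementation.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic on gamma in  dy_t = alpha + gamma * y_{t-1} + e_t  (no lagged terms)."""
    y = np.asarray(y, dtype=float)
    dy, y_lag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones(len(y_lag)), y_lag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(3)
eps = rng.normal(size=500)
walk = np.cumsum(eps)                       # unit root: gamma = 0
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]      # stationary: gamma = -0.5

CRIT_5PCT = -2.86   # approximate asymptotic 5% critical value, constant-only case
for name, s in [("random walk", walk), ("AR(1)", ar1)]:
    stat = dickey_fuller_stat(s)
    print(f"{name:12s} stat={stat:7.2f}  reject H0: {stat < CRIT_5PCT}")
```

The statistic is compared with Dickey-Fuller critical values rather than the usual t-table, because under the null the regressor yₜ₋₁ is itself non-stationary. The "augmented" version adds lagged Δy terms to absorb autocorrelation in the errors.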
The KPSS Test — Flipping the Burden of Proof
The ADF test can fail to reject H₀ (non-stationarity) simply because it lacks statistical power — especially with short series. The KPSS test (Kwiatkowski-Phillips-Schmidt-Shin) reverses the hypothesis: it assumes stationarity by default and tests whether there is evidence against it.
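For intuition about what KPSS measures, here is a rough NumPy sketch of the level-stationarity statistic: the scaled sum of squared partial sums of the demeaned series, divided by a long-run variance estimate. The lag rule of thumb and the Bartlett weights are assumptions made for this illustration, and `kpss_level_stat` is a hypothetical helper; the statsmodels `kpss` function should be preferred in practice.

```python
import numpy as np

def kpss_level_stat(y, n_lags=None):
    """KPSS statistic for level stationarity with a Bartlett-kernel long-run variance."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n_lags is None:
        n_lags = int(4 * (n / 100) ** 0.25)   # rule-of-thumb lag choice (assumption)
    e = y - y.mean()                          # residuals from a constant-only fit
    partial_sums = np.cumsum(e)
    long_run_var = e @ e / n
    for k in range(1, n_lags + 1):
        weight = 1 - k / (n_lags + 1)         # Bartlett weights
        long_run_var += 2 * weight * (e[k:] @ e[:-k]) / n
    return (partial_sums @ partial_sums) / (n**2 * long_run_var)

rng = np.random.default_rng(4)
noise = rng.normal(size=500)
walk = np.cumsum(noise)
CRIT_5PCT = 0.463   # tabulated 5% critical value for the level-stationarity case
for name, s in [("white noise", noise), ("random walk", walk)]:
    stat = kpss_level_stat(s)
    print(f"{name:12s} stat={stat:6.3f}  reject H0: {stat > CRIT_5PCT}")
```

A stationary series keeps its partial sums small, so the statistic stays low; a wandering or trending series accumulates large partial sums and pushes the statistic above the critical value.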
Using both tests together eliminates ambiguity. The four possible outcomes tell a precise story.
| ADF Result | KPSS Result | Conclusion | Action |
|---|---|---|---|
| Reject H₀ (p < .05) | Fail to reject H₀ (p > .05) | Stationary ✓ — both agree | Proceed with ARMA modelling |
| Fail to reject H₀ (p > .05) | Reject H₀ (p < .05) | Non-stationary — both agree | Difference or log-transform, then re-test |
| Reject H₀ (p < .05) | Reject H₀ (p < .05) | Conflicting evidence, possibly trend-stationary | Re-run both tests with a trend specification (regression='ct'); detrend if they then agree on stationarity |
| Fail to reject H₀ (p > .05) | Fail to reject H₀ (p > .05) | Inconclusive — disagreement | Increase sample size; try PP test; use judgement from visual inspection |
With fewer than ~100 observations, KPSS often fails to reject its null (stationarity) even when the series is genuinely non-stationary. For short series, rely more heavily on ADF and visual inspection. KPSS is most reliable with 200+ observations.
ADF & KPSS in Python — Full Workflow
```python
# ─── 0. Imports ──────────────────────────────────────────────────────────────
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
from statsmodels.tsa.stattools import adfuller, kpss

# ─── 1. Create two test series ───────────────────────────────────────────────
np.random.seed(42)
n = 300

# Non-stationary: random walk (unit root)
rw = np.cumsum(np.random.normal(0, 1, n))
rw_series = pd.Series(rw, name='Random Walk')

# Stationary: AR(1) with φ=0.7
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.7 * ar[t-1] + np.random.normal(0, 1)
ar_series = pd.Series(ar, name='AR(1) φ=0.7')

# ─── 2. Reusable test function ────────────────────────────────────────────────
def run_stationarity_tests(series, name=""):
    print(f"\n{'─'*54}")
    print(f" Series: {name}")
    print(f"{'─'*54}")

    # ── ADF (H₀: unit root / non-stationary) ──
    adf_stat, adf_p, adf_lags, _, adf_crit, _ = adfuller(series, autolag='AIC')
    print(f"\n[ADF Test] H₀: unit root (non-stationary)")
    print(f" ADF Stat : {adf_stat:>10.4f}")
    print(f" p-value : {adf_p:>10.6f}")
    print(f" Crit 1% : {adf_crit['1%']:>10.4f}")
    print(f" Crit 5% : {adf_crit['5%']:>10.4f}")
    print(f" Lags used : {adf_lags}")
    adf_conc = 'STATIONARY ✓' if adf_p < 0.05 else 'NON-STATIONARY ✗'
    print(f" Verdict : {adf_conc}")

    # ── KPSS (H₀: stationary) ──
    kpss_stat, kpss_p, kpss_lags, kpss_crit = kpss(series, regression='c', nlags='auto')
    print(f"\n[KPSS Test] H₀: stationary")
    print(f" KPSS Stat : {kpss_stat:>10.4f}")
    print(f" p-value : {kpss_p:>10.4f}")
    print(f" Crit 5% : {kpss_crit['5%']:>10.4f}")
    kpss_conc = 'NON-STATIONARY ✗' if kpss_p < 0.05 else 'STATIONARY ✓'
    print(f" Verdict : {kpss_conc}")

    # Combined conclusion
    if adf_p < 0.05 and kpss_p > 0.05:
        combined = "✅ BOTH AGREE: STATIONARY"
    elif adf_p > 0.05 and kpss_p < 0.05:
        combined = "❌ BOTH AGREE: NON-STATIONARY"
    elif adf_p < 0.05 and kpss_p < 0.05:
        combined = "⚠️ TREND-STATIONARY (disagree)"
    else:
        combined = "⚠️ INCONCLUSIVE (disagree)"
    print(f"\n Combined : {combined}")

run_stationarity_tests(rw_series, "Random Walk")
run_stationarity_tests(ar_series, "AR(1) φ=0.7")
```
Differencing — The Most Powerful Stationarity Fix
A quant on the trading desk suggested: "Stop reporting the price. Report the change — how many rupees did it move today?" Suddenly the series became tractable: changes centred around zero, had consistent volatility, and the ADF test passed immediately.
That simple act — subtracting yesterday from today — is first differencing. It transforms a random walk (non-stationary) into white noise (stationary). It is the single most important transformation in time series analysis.
Left: a random walk — mean drifts upward, ADF fails (p=0.99). Centre: the differencing formula Δyₜ = yₜ − yₜ₋₁. Right: after one difference — series oscillates around a stable zero mean, variance is constant, ADF passes (p≈0.000).
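The claim that differencing turns a random walk into white noise can be checked directly. A minimal sketch, assuming NumPy and a simulated price series (the starting level of 100 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
shocks = rng.normal(0, 1, 300)      # daily price changes (white noise)
price = 100 + np.cumsum(shocks)     # random-walk price level

changes = np.diff(price)            # first difference: y_t - y_{t-1}

# Differencing a random walk recovers the underlying shocks
# (the first shock is lost because y_0 has no predecessor).
print(np.allclose(changes, shocks[1:]))

half = len(changes) // 2
print(f"price mean:  first half {price[:half].mean():.2f}, second half {price[half:].mean():.2f}")
print(f"change mean: first half {changes[:half].mean():.2f}, second half {changes[half:].mean():.2f}")
```

Note that one observation is lost per difference, which is why the pandas equivalent `.diff()` is usually followed by `.dropna()`.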
Log Transformation — Taming Growing Variance
What the engineer needs is a compressor — a device that reduces large signals proportionally more than small ones, creating consistent, manageable dynamics throughout.
The log transformation is the statistical compressor. When variance grows proportionally with the level (as in stock prices, GDP, or population), the log flattens that relationship: large values are compressed far more than small ones, so proportional swings become swings of constant size. The result is a series with consistent variance, a prerequisite for stationarity. The transformation requires strictly positive values.
Left: original exponential series — variance (band width) grows as level rises, ADF fails. Right: after log transformation — the oscillations have uniform height throughout, variance is stabilised within the green band. Log-differencing (applying first difference after the log) then removes the remaining trend entirely.
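A small deterministic sketch makes the compression visible. It assumes NumPy and uses an artificial series with fixed ±5% swings around an exponential trend (an illustrative choice, so the effect can be verified exactly rather than statistically):

```python
import numpy as np

t = np.arange(240)
trend = 100 * np.exp(0.01 * t)
# Deterministic +/-5% swings around the trend, so the effect is easy to verify
swing = np.where(t % 2 == 0, 1.05, 0.95)
series = trend * swing

raw_dev = series - trend                    # deviations in original units
log_dev = np.log(series) - np.log(trend)    # deviations on the log scale

half = len(t) // 2
print(f"raw std: first half {raw_dev[:half].std():.2f}, second half {raw_dev[half:].std():.2f}")
print(f"log std: first half {log_dev[:half].std():.4f}, second half {log_dev[half:].std():.4f}")
```

In original units the swings grow with the level; on the log scale they have the same size in both halves. The trend itself is still present after the log and has to be removed by differencing.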
The Complete Stationarity Pipeline in Python
We now build the full end-to-end workflow: generate a non-stationary series with both trend and growing variance, apply log transformation, apply first differencing, verify stationarity at each step.
```python
# Continues from the previous listing: reuses its imports and run_stationarity_tests()

# ─── 1. Generate realistic non-stationary series ─────────────────────────────
# Simulates GDP-like data: exponential growth + seasonal pattern + noise
np.random.seed(99)
n = 240  # 20 years of monthly data
t = np.arange(n)
trend = 100 * np.exp(0.008 * t)             # exponential growth
seasonal = 10 * np.sin(2 * np.pi * t / 12)  # yearly cycle
noise = trend * 0.04 * np.random.randn(n)   # proportional noise
raw = pd.Series(trend + seasonal + noise,
                index=pd.date_range('2000-01-01', periods=n, freq='MS'),
                name='Simulated GDP-like Series')
print(f"Shape: {raw.shape}")
print(f"Range: {raw.min():.1f} to {raw.max():.1f}")

# ─── 2. Step-by-step transformation pipeline ─────────────────────────────────
stages = {}
stages['1_raw'] = raw
stages['2_log'] = np.log(raw)
stages['3_log_diff'] = np.log(raw).diff().dropna()
stages['4_seas_diff'] = stages['3_log_diff'].diff(12).dropna()

print(f"{'Stage':<22} {'ADF p':>10} {'KPSS p':>10} {'Verdict'}")
print("-" * 65)
for label, s in stages.items():
    adf_p = adfuller(s, autolag='AIC')[1]
    kpss_p = kpss(s, regression='c', nlags='auto')[1]
    if adf_p < 0.05 and kpss_p > 0.05:
        verdict = "✅ STATIONARY"
    elif adf_p > 0.05 and kpss_p < 0.05:
        verdict = "❌ NON-STATIONARY"
    else:
        verdict = "⚠️ INCONCLUSIVE"
    print(f"{label:<22} {adf_p:>10.4f} {kpss_p:>10.4f} {verdict}")

# ─── 3. Detailed ADF and KPSS on the log-differenced series ──────────────────
final_series = stages['3_log_diff']
run_stationarity_tests(final_series, "Log-Differenced GDP-like Series")

# ─── 4. Verify: mean and variance are stable ─────────────────────────────────
# Split into three roughly equal thirds and compare statistics.
# Splitting by position keeps each segment a pandas Series.
split_points = np.array_split(np.arange(len(final_series)), 3)
thirds = [final_series.iloc[idx] for idx in split_points]
print(f"\n{'Period':<10} {'Mean':>10} {'Std Dev':>12} {'Min':>10} {'Max':>10}")
print("-" * 56)
for i, seg in enumerate(thirds, 1):
    print(f"Period {i} {seg.mean():>10.5f} {seg.std():>12.5f} {seg.min():>10.5f} {seg.max():>10.5f}")

# ─── 5. Visual comparison of all four stages ─────────────────────────────────
fig, axes = plt.subplots(4, 1, figsize=(12, 10))
titles = [
    '① Raw Series — exponential trend + growing variance (NON-STATIONARY)',
    '② After log() — growth linearised, variance stabilised, trend remains (NON-STATIONARY)',
    '③ After log + diff() — stationary! (STATIONARY ✓)',
    '④ After log + diff + seasonal diff — removes seasonal pattern too'
]
colors = ['#f87171', '#f59e0b', '#34d399', '#60a5fa']
for ax, (label, s), title, color in zip(axes, stages.items(), titles, colors):
    s.plot(ax=ax, color=color, linewidth=1.2)
    ax.axhline(s.mean(), color=color, linestyle='--', alpha=0.6, linewidth=1)
    ax.set_title(title, fontsize=9, pad=4)
    ax.set_facecolor('#0d1117')
    ax.spines['bottom'].set_color('#2a3050')
    ax.spines['left'].set_color('#2a3050')
plt.tight_layout(pad=1.5)
plt.savefig('stationarity_pipeline.png', dpi=120, facecolor='#0d1117')
plt.show()
```
Stationarity Tests — Complete Comparison
| Property | ADF Test | KPSS Test | PP Test (Phillips-Perron) |
|---|---|---|---|
| Null hypothesis (H₀) | Unit root (non-stationary) | Stationary | Unit root (non-stationary) |
| Reject H₀ when | p < 0.05 → stationary | p < 0.05 → non-stationary | p < 0.05 → stationary |
| Test statistic sign | Negative (more negative = better) | Positive (larger = worse) | Negative (more negative = better) |
| Handles autocorrelation | By adding lagged Δyₜ terms | By HAC variance estimator | By non-parametric correction |
| Power in small samples | Moderate | Low — often fails to reject H₀ | Moderate |
| Sensitive to structural breaks | Yes — may fail near breaks | Yes | Less so |
| Preferred when | Default first test; most widely used | Complementary to ADF; confirming stationarity | Suspected heteroskedasticity or autocorrelation of unknown form; larger samples |
| Python function | adfuller(series) in statsmodels | kpss(series) in statsmodels | PhillipsPerron(series) in the arch package (arch.unitroot), not statsmodels |
Transformation Reference — What Each Fix Does
| Problem Detected | Visual Clue | ADF / KPSS Signal | Transformation | Formula | Real Example |
|---|---|---|---|---|---|
| Linear trend (drift) | Mean moves steadily up/down | ADF fails; KPSS fails | First difference | yₜ − yₜ₋₁ | Inflation rate, bond yields |
| Exponential trend | J-curve shape | ADF fails; KPSS fails | Log + first difference | log(yₜ) − log(yₜ₋₁) | GDP, stock prices, population |
| Growing variance only | Funnel shape; homoscedastic after log | ADF may pass; KPSS fails | Log transform | log(yₜ) | Volatility, right-skewed financial series |
| Seasonal non-stationarity | ACF spikes at lag s, 2s, 3s… | ADF may pass at lag 1 but fails seasonally | Seasonal difference | yₜ − yₜ₋ₛ | Monthly retail sales (s=12) |
| Trend + seasonality | Rising wave with cycles | Both fail | Log + regular + seasonal diff | Δ₁Δ₁₂ log(yₜ) | Airline passengers, electricity demand |
| Structural break | Sudden jump in mean level | ADF may fail (an unmodelled break can mimic a unit root) | Dummy variable for break point | Dₜ = 1 if t ≥ break | GDP pre/post financial crisis |
| None — already stationary | Flat oscillation, constant width | ADF passes; KPSS passes | No transformation needed | yₜ as-is | Daily returns, white noise residuals |
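
The seasonal and combined rows of the table can be illustrated with a noise-free example. The sketch assumes NumPy; `difference` is a hypothetical helper equivalent to pandas `.diff(lag).dropna()`, and the series (a linear trend of 0.5 per month plus a 12-month sine cycle) is chosen so the result can be checked by hand.

```python
import numpy as np

def difference(y, lag=1):
    """Return y_t - y_{t-lag}; the first `lag` observations are dropped."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]

t = np.arange(120)                               # ten years of monthly data
seasonal = 10 * np.sin(2 * np.pi * t / 12)       # repeating yearly cycle (s = 12)
series = 0.5 * t + seasonal                      # linear trend plus seasonality

seasonal_diff = difference(series, lag=12)       # removes the cycle, leaves 12 * 0.5
both_diffs = difference(seasonal_diff, lag=1)    # removes the remaining constant

print(seasonal_diff[:3])
print(np.abs(both_diffs).max())
```

The seasonal difference cancels the cycle exactly and leaves the constant yearly growth; the additional regular difference removes that constant as well, up to floating-point error.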
Golden Rules — Stationarity in Practice
Use regression='c' (constant) for series with a non-zero mean and no trend.
Use regression='ct' (constant + trend) only when the series has a visible
deterministic trend. Misspecifying the deterministic terms distorts the test's size and power and can produce wrong conclusions.