
ARIMA & SARIMA — A Complete Step-by-Step Guide with Python

A deep-focus, story-driven tutorial on ARIMA and SARIMA — from stationarity and differencing to ACF/PACF reading, parameter selection, full Python implementation, and residual diagnostics. Includes seven animated SVG diagrams, comparison tables, a decision flowchart, and a complete grid-search workflow.

Section 01

The Story That Explains ARIMA

The Detective Who Only Reads Yesterday's Newspaper
Imagine a detective who has to predict tomorrow's stock price. He has one rule: he can only read the newspaper from the past few days — no external tips, no economic reports, no expert opinions. Just the price history itself.

He notices something clever: today's price is almost always close to yesterday's price, adjusted for a small drift. And whenever the market over-reacted to news last week, it corrected itself this week. He uses both observations — the momentum (past values) and the correction (past errors) — to make a remarkably accurate prediction.

That detective is ARIMA. It reads only the past of the series itself. No external features. No domain knowledge. Just the signal hidden inside the history of the data.

ARIMA stands for AutoRegressive Integrated Moving Average. It is the most widely used classical model for univariate time series forecasting — from stock prices and economic indicators to weather readings and sensor data. It works by combining three distinct mechanisms, each solving a specific problem that raw time series data presents.

🔎
What "Univariate" Means Here

ARIMA is a univariate model: it uses only one variable, the series itself. It finds patterns within that single stream of numbers across time. Unlike regression, it never looks at external predictors. The entire signal must live inside the series' own past values and past errors.


Section 02

Quick Primer — What Is a Time Series?

Before ARIMA makes sense, you need to see the four building blocks every time series is made of. Think of a time series as a song: it has a melody (trend), a beat (seasonality), long-form musical movements (cycles), and random crackling from the speakers (noise).

🎦 Animated — The Four Components of a Time Series
Panels: Original · Trend · Seasonal · Residual

The raw series (gold) is the sum of a smooth trend (blue dashes), a repeating seasonal wave (green), and small unpredictable noise (purple). ARIMA works best after the trend is removed via differencing.
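To make the decomposition concrete, here is a minimal sketch that builds a synthetic series from the three components shown in the figure (long-run cycles are left out for brevity). The slope, amplitude, and noise level are made-up values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is repeatable
t = np.arange(120)                       # 120 monthly observations (10 years)

trend    = 100 + 0.8 * t                       # slow upward drift
seasonal = 15 * np.sin(2 * np.pi * t / 12)     # pattern repeating every 12 steps
noise    = rng.normal(0, 3, size=t.size)       # unpredictable remainder

series = trend + seasonal + noise        # additive composition

# Removing the two systematic parts leaves only the noise.
remainder = series - trend - seasonal
print(np.allclose(remainder, noise))
```

With real data the components are unknown and have to be estimated, which is why the rest of this guide leans on differencing rather than on subtracting a known trend.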


Section 03

Stationarity — The One Rule ARIMA Cannot Break

The Carpenter Who Needs a Flat Table
A carpenter can build beautiful furniture — but only on a flat, stable workbench. If the workbench is tilted, wobbling, or constantly changing height, the tools slip, measurements drift, and the finished chair has uneven legs no matter how skilled the carpenter is.

ARIMA is the carpenter. Your time series is the workbench. Stationarity means the workbench is flat — the statistical properties (mean, variance, autocorrelation structure) do not change over time. Without it, ARIMA's calculations are built on shifting ground and every forecast will be unreliable.
🎦 Animated — Stationary vs Non-Stationary Series
Panels: Non-stationary (trending upward, mean shifts over time) ✗ · Stationary (oscillates around a fixed, constant mean) ✓

Left: non-stationary series — the mean drifts upward. ARIMA will fail here. Right: stationary series — values oscillate around a stable mean. ARIMA works here. The fix for the left panel is differencing — subtract yesterday from today.

🔴 Non-Stationary (Fails ADF Test)
Property | Behaviour
Mean | Changes over time (drifts up/down)
Variance | Often grows larger over time
Autocorrelation | Depends on when you measure
ACF plot | Decays very slowly, linearly
ADF p-value | > 0.05 (fail to reject unit root)
ARIMA ready? | No, must difference first

✅ Stationary (Passes ADF Test)
Property | Behaviour
Mean | Constant across the whole series
Variance | Constant (homoscedastic)
Autocorrelation | Depends only on lag, not time
ACF plot | Decays quickly to zero
ADF p-value | < 0.05 (reject unit root)
ARIMA ready? | Yes ✓

⚙️ How to Make a Series Stationary — The Checklist
ADF Test
Run adfuller(series). If p > 0.05 → non-stationary → you must act.
Log
If variance grows with level (funnel shape), take log(series) first to stabilise variance.
1st Diff
Apply Δyₜ = yₜ − yₜ₋₁. This removes a linear trend. Re-run ADF — usually passes now. Set d = 1.
2nd Diff
Still non-stationary? Difference again: Δ²yₜ = Δyₜ − Δyₜ₋₁. Removes quadratic trends. Set d = 2. (Rare.)
Verify
Confirm stationarity on the transformed series. Track d — it becomes the middle parameter of ARIMA(p, d, q).
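
The differencing steps in the checklist can be checked on noise-free toy series. This is a small sketch using only NumPy; the two trends are hypothetical, and on real data you would re-run the ADF test after each step rather than inspect the values by eye.

```python
import numpy as np

t = np.arange(50, dtype=float)

linear    = 5.0 + 2.0 * t            # linear trend, slope 2
quadratic = 1.0 + 0.5 * t**2         # quadratic trend

d1_linear = np.diff(linear)          # Δy_t = y_t − y_{t−1}
d1_quad   = np.diff(quadratic)       # still trending: a straight line in t
d2_quad   = np.diff(quadratic, n=2)  # Δ²y_t = Δy_t − Δy_{t−1}

print(d1_linear[:3])   # constant, equal to the slope
print(d1_quad[:3])     # grows with t, so one difference is not enough
print(d2_quad[:3])     # constant after the second difference
```

Each difference shortens the series by one observation, which is one more reason to stop at the smallest d that works.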

Section 04

ACF & PACF — Reading the Model's X-Ray

Before you pick p and q, you need to read two diagnostic plots. These are the X-ray and MRI of your series — they reveal exactly what memory structure is hiding inside.

🎦 Animated — ACF vs PACF Patterns & What They Tell You
Panels: ACF (Autocorrelation Function), lags 1 to 6, cuts off at lag 2 → suggests MA(2) · PACF (Partial Autocorrelation Function), lags 1 to 6, cuts off at lag 2 → suggests AR(2)

Blue dashed lines = 95% confidence bands. Bars outside the bands are statistically significant. ACF cutting off at lag q → MA(q). PACF cutting off at lag p → AR(p). Both tailing off slowly → ARMA(p,q).

ACF Pattern | PACF Pattern | Model to Try | Real-World Example
Cuts off after lag q | Tails off (decays gradually) | MA(q) | Queue wait times after a sudden influx
Tails off (decays gradually) | Cuts off after lag p | AR(p) | Daily temperature (today ≈ yesterday)
Tails off with damped oscillation | Tails off with damped oscillation | ARMA(p,q) | Stock returns (both momentum and reversal)
Very slow, near-linear decay | First spike near 1.0 | Difference first! d > 0 | Stock price levels (random walk)
Significant spikes at lags 12, 24… | Significant spikes at lags 12, 24… | SARIMA with s=12 | Monthly retail sales (yearly season)
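
The first two rows of the table can be reproduced on simulated data. The sketch below is an illustration under stated assumptions: the helper names `sample_acf` and `pacf_from_acf` are hypothetical, written by hand with NumPy (the PACF uses the Durbin-Levinson recursion), and the coefficients 0.7 and 0.8 are arbitrary. In practice `statsmodels.tsa.stattools.acf` and `pacf` do the same job.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations rho_0 .. rho_nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

def pacf_from_acf(rho):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    nlags = len(rho) - 1
    phi = np.zeros((nlags + 1, nlags + 1))
    pacf = np.ones(nlags + 1)
    for k in range(1, nlags + 1):
        num = rho[k] - np.dot(phi[k - 1, 1:k], rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], rho[1:k])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        pacf[k] = phi[k, k]
    return pacf

rng = np.random.default_rng(42)
n = 20_000
eps = rng.normal(size=n)

# AR(1): y_t = 0.7 * y_{t-1} + eps_t
ar = np.zeros(n)
for i in range(1, n):
    ar[i] = 0.7 * ar[i - 1] + eps[i]

# MA(1): y_t = eps_t + 0.8 * eps_{t-1}
ma = eps.copy()
ma[1:] += 0.8 * eps[:-1]

band = 1.96 / np.sqrt(n)      # approximate 95% band for white noise
for name, y in [("AR(1)", ar), ("MA(1)", ma)]:
    rho = sample_acf(y, 5)
    print(name, "ACF ", np.round(rho[1:], 2))
    print(name, "PACF", np.round(pacf_from_acf(rho)[1:], 2))
print("95% band: ±", round(band, 3))
```

For the AR(1) series the ACF decays geometrically while the PACF drops inside the band after lag 1; for the MA(1) series the roles are reversed. A long simulated series is used so the pattern is not blurred by sampling noise; with 100 to 200 real observations the cut-offs are rarely this clean.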

Section 05

ARIMA(p, d, q) — Dissecting the Three Parameters

🎦 Animated — The Three Gears of ARIMA
Gears: p = AR order (past values used) · d = integration order (differences applied) · q = MA order (past errors used)

Three gears, three jobs. The AR gear (p) regresses on past values. The Integration gear (d) removes the trend by differencing. The MA gear (q) uses past forecast errors to self-correct.

🔁
p — AutoRegressive
yₜ = φ₁yₜ₋₁ + φ₂yₜ₋₂ + … + φₚyₜ₋ₚ + εₜ
Today's value is a weighted sum of the last p values plus noise. Like predicting today's temperature from the last p days. Captures momentum — the series tends to continue in the direction it was going. Identified from PACF cutting off at lag p.
✓ Captures persistence and momentum
✗ Infinite memory — effect never fully vanishes
⌫️
d — Integration
Δyₜ = yₜ − yₜ₋₁ (first difference)
The number of times the series is differenced to become stationary. d = 0: already stationary. d = 1: one difference removes a linear trend (most common). d = 2: two differences remove a quadratic trend (rare). This is the I in ARIMA ("Integrated").
✓ Removes non-stationarity cleanly
✗ Over-differencing introduces unnecessary noise
🔄
q — Moving Average
yₜ = εₜ + θ₁εₜ₋₁ + … + θqεₜ₋q
Today's value partly depends on the last q forecast errors — not past values. Captures mean-reversion: when the model over-shoots, future values correct back. Think of supply chain shocks that ripple through q periods, then fade. Identified from ACF cutting off at lag q.
✓ Finite memory: a shock affects only the next q observations, then vanishes
✗ Cannot capture long-run persistence alone
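
The "infinite memory" of AR and the "finite memory" of MA are easiest to see by tracing a single unit shock through each model. The sketch below is illustrative only: the function names and the coefficients (φ = 0.7, θ = 0.6 and 0.3) are hypothetical choices, not values from any fitted model.

```python
import numpy as np

def ar1_impulse(phi, horizon):
    """Effect of a unit shock at time 0 on an AR(1) series, k steps later."""
    return phi ** np.arange(horizon + 1)

def ma_impulse(thetas, horizon):
    """Effect of a unit shock at time 0 on an MA(q) series, k steps later."""
    weights = np.zeros(horizon + 1)
    weights[0] = 1.0
    q = min(len(thetas), horizon)
    weights[1:q + 1] = thetas[:q]
    return weights

print(np.round(ar1_impulse(0.7, 6), 3))        # shrinks geometrically, never exactly zero
print(ma_impulse(np.array([0.6, 0.3]), 6))     # exactly zero beyond lag q = 2
```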
💡
The Golden Starting Grid for ARIMA

When unsure: start with ARIMA(1,1,1) as your baseline. Then try ARIMA(2,1,2) and ARIMA(1,1,0). Compare using AIC (lower is better). Never go above p=3 or q=3 without strong ACF/PACF justification — more parameters usually overfit. Use auto_arima() from pmdarima to search automatically.
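
To see how AIC trades fit against complexity, the sketch below applies the standard definitions AIC = 2k − 2·ln(L) and BIC = k·ln(n) − 2·ln(L). The three log-likelihoods are hypothetical numbers invented for the example, not results from the airline data.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

# Hypothetical fits on n = 144 observations:
# (label, log-likelihood, number of estimated parameters)
fits = [
    ("ARIMA(1,1,0)", 118.0, 2),
    ("ARIMA(1,1,1)", 124.5, 3),
    ("ARIMA(3,1,3)", 125.0, 7),
]

for label, ll, k in fits:
    print(f"{label}: AIC={aic(ll, k):.1f}  BIC={bic(ll, k, 144):.1f}")

best = min(fits, key=lambda f: aic(f[1], f[2]))[0]
print("Lowest AIC:", best)
```

In this made-up comparison the largest model has the highest likelihood, yet its four extra parameters cost more than the tiny improvement in fit, so the smaller ARIMA(1,1,1) wins.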


Section 06

The Complete ARIMA Workflow — Step by Step

🎦 Animated — ARIMA End-to-End Pipeline
1. Plot (visualise raw series) → 2. ADF Test (check stationarity) → 3. Difference (set d; log if needed) → 4. ACF/PACF (choose p and q) → 5. Fit & AIC (compare candidates) → 6. Residuals & Forecast (Ljung-Box test; generate predictions)
01
Plot & Explore
Always visualise the raw series first. Look for trend, seasonality, outliers, missing values, and structural breaks (sudden level shifts). This determines your entire modelling strategy before you write a line of code.
02
Test for Stationarity (ADF)
Run the Augmented Dickey-Fuller test. H₀ = unit root (non-stationary). If p > 0.05, the series is non-stationary and must be differenced. Also run KPSS as a cross-check (it reverses the null hypothesis).
03
Transform & Difference
If variance grows with level → log-transform first. Then apply first difference. Re-test with ADF. Record d (the number of differences). Stop differencing as soon as ADF p < 0.05 — over-differencing hurts.
04
Read ACF & PACF
Plot ACF and PACF of the differenced stationary series. Sharp cut-off in PACF at lag p → try AR(p). Sharp cut-off in ACF at lag q → try MA(q). Both tailing off → try ARMA(p,q).
05
Fit Candidate Models & Compare AIC
Fit 3–5 candidate ARIMA(p,d,q) models. Compare AIC scores — lower is better. AIC penalises complexity so it naturally avoids over-fitted models. You can also use auto_arima() for automatic selection.
06
Diagnose Residuals & Forecast
Run the Ljung-Box test — p > 0.05 confirms residuals are white noise. Plot residuals vs time (no pattern) and their ACF (no significant spikes). If all checks pass, generate forecasts with confidence intervals.
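
The Ljung-Box check in step 06 is less of a black box once the statistic is written out: Q = n(n+2) · Σ ρₖ² / (n − k), summed over lags 1 to m and compared with a chi-square distribution. Below is a hand-rolled sketch (the function name is hypothetical) on simulated residuals. It compares Q with the 95% chi-square quantile for 10 degrees of freedom; when testing residuals of a fitted ARMA(p,q) model the degrees of freedom are usually reduced by p + q, which this sketch ignores.

```python
import numpy as np

def ljung_box_q(resid, nlags):
    """Ljung-Box statistic Q = n(n+2) * sum_k rho_k^2 / (n - k), k = 1..nlags."""
    x = np.asarray(resid, dtype=float) - np.mean(resid)
    n = len(x)
    denom = np.dot(x, x)
    lags = np.arange(1, nlags + 1)
    rho = np.array([np.dot(x[:n - k], x[k:]) / denom for k in lags])
    return n * (n + 2) * np.sum(rho**2 / (n - lags))

rng = np.random.default_rng(1)
white = rng.normal(size=2000)            # what good residuals should look like

leftover = np.zeros(2000)                # residuals that still follow an AR(1) pattern
for i in range(1, 2000):
    leftover[i] = 0.7 * leftover[i - 1] + white[i]

CHI2_95_DF10 = 18.307                    # 95% quantile of chi-square, 10 degrees of freedom
for name, r in [("white noise", white), ("AR(1) leftover", leftover)]:
    q = ljung_box_q(r, 10)
    verdict = "autocorrelated" if q > CHI2_95_DF10 else "no evidence of autocorrelation"
    print(f"{name}: Q = {q:.1f}  ->  {verdict}")
```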

Section 07

ARIMA in Python — Full Working Example

We will use the classic airline passengers dataset (Box & Jenkins, 1976) — monthly passenger counts from 1949 to 1960. It has a clear upward trend and strong yearly seasonality, making it a perfect test case.

# ─── 0. Install if needed ────────────────────────────────────────────────────
# pip install statsmodels pmdarima matplotlib pandas numpy

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

from statsmodels.tsa.stattools   import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model  import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# ─── 1. Load data ─────────────────────────────────────────────────────────────
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'
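df  = pd.read_csv(url, header=0, index_col=0, parse_dates=True)
series = df.squeeze().asfreq('MS')   # set month-start frequency explicitly (avoids a statsmodels frequency warning)
print(f"Shape: {series.shape}  |  Period: {series.index[0].date()} → {series.index[-1].date()}")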
OUTPUT
Shape: (144,) | Period: 1949-01-01 → 1960-12-01
# ─── 2. Stationarity test (raw series) ────────────────────────────────────────
adf_raw = adfuller(series, autolag='AIC')
print(f"[RAW]  ADF stat={adf_raw[0]:.4f}  p={adf_raw[1]:.4f}")
# p >> 0.05 → non-stationary → must transform

# ─── 3. Log + first difference ────────────────────────────────────────────────
log_series  = np.log(series)           # stabilise variance
diff_series = log_series.diff().dropna()   # remove trend  (d = 1)

adf_diff = adfuller(diff_series, autolag='AIC')
print(f"[DIFF] ADF stat={adf_diff[0]:.4f}  p={adf_diff[1]:.6f}")
OUTPUT
[RAW]  ADF stat= 0.8153  p= 0.9918    ← clearly non-stationary
[DIFF] ADF stat=-4.0198  p= 0.001311  ← stationary after log-diff ✓
# ─── 4. ACF / PACF plots (on differenced series) ──────────────────────────────
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(diff_series,  lags=30, ax=ax1, title='ACF — log-differenced airline')
plot_pacf(diff_series, lags=30, ax=ax2, title='PACF — log-differenced airline')
plt.tight_layout()
plt.savefig('acf_pacf.png', dpi=120)
plt.show()
# ACF: significant spikes at lags 1, 12 — suggests MA(1) + seasonal component
# PACF: spike at lag 1  — suggests AR(1)
# ─── 5. Fit candidate ARIMA models, compare AIC ───────────────────────────────
candidates = [(1,1,0), (0,1,1), (1,1,1), (2,1,1), (1,1,2)]
results = []

for order in candidates:
    try:
        m = ARIMA(log_series, order=order).fit()
        results.append({'order': order, 'AIC': round(m.aic, 2), 'BIC': round(m.bic, 2)})
    except Exception as err:   # skip orders that fail to fit, but report them
        print(f"ARIMA{order} failed: {err}")

res_df = pd.DataFrame(results).sort_values('AIC')
print(res_df.to_string(index=False))
OUTPUT
order      AIC      BIC
(1,1,1)  -242.88  -233.57  ← Best AIC ✓
(2,1,1)  -241.82  -229.41
(0,1,1)  -241.13  -234.93
(1,1,2)  -240.94  -228.53
(1,1,0)  -231.07  -224.87
# ─── 6. Fit best model ARIMA(1,1,1) ──────────────────────────────────────────
best_model = ARIMA(log_series, order=(1, 1, 1))
best_fit   = best_model.fit()
print(best_fit.summary())

# ─── 7. Residual diagnostics ─────────────────────────────────────────────────
# Drop the first residual: with d = 1 there is no earlier value to difference
# against, so it reflects the model's start-up rather than a real forecast error.
residuals = best_fit.resid.iloc[1:]

# Ljung-Box: p > 0.05 at all lags → white noise ✓
lb = acorr_ljungbox(residuals, lags=20, return_df=True)
sig_lags = lb[lb['lb_pvalue'] < 0.05]
print(f"Significant lags in Ljung-Box: {len(sig_lags)}")

# Normality of residuals
from scipy.stats import shapiro
stat, p_norm = shapiro(residuals)
print(f"Shapiro-Wilk p = {p_norm:.4f}  →  {'Normal ✓' if p_norm > 0.05 else 'Non-normal ✗'}")
OUTPUT
Significant lags in Ljung-Box: 0   ← residuals are white noise ✓
Shapiro-Wilk p = 0.1843  →  Normal ✓
# ─── 8. Forecast 24 months ahead (on original scale) ────────────────────────
n_forecast = 24
forecast_obj  = best_fit.get_forecast(steps=n_forecast)
fc_mean_log   = forecast_obj.predicted_mean
fc_ci_log     = forecast_obj.conf_int(alpha=0.05)   # 95% CI

# Invert log transform
fc_mean = np.exp(fc_mean_log)
fc_lower = np.exp(fc_ci_log.iloc[:, 0])
fc_upper = np.exp(fc_ci_log.iloc[:, 1])

# Print first 6 forecasted values with intervals
out_df = pd.DataFrame({
    'Forecast': fc_mean.round(0),
    'Lower_95': fc_lower.round(0),
    'Upper_95': fc_upper.round(0)
})
print(out_df.head(6))
OUTPUT
            Forecast  Lower_95  Upper_95
1961-01-01     450.0     378.0     536.0
1961-02-01     432.0     352.0     530.0
1961-03-01     495.0     393.0     624.0
1961-04-01     476.0     366.0     619.0
1961-05-01     505.0     377.0     677.0
1961-06-01     570.0     414.0     785.0
⚠️
Why ARIMA(1,1,1) Still Misses Something Here

Look at the widening confidence intervals: just one month ahead the 95% range is already 378 to 536, roughly 70 to 85 passengers either side of the point forecast, and by six months ahead it has stretched to 414 to 785. The airline data has a strong 12-month seasonal pattern that pure ARIMA(1,1,1) cannot model. The model is leaving systematic seasonal structure in the residuals. This is exactly the problem that SARIMA was invented to fix, and it is the focus of every section from here forward.


Section 08

From ARIMA to SARIMA — Why Seasons Break Everything

The Hotel Manager Who Ignores Christmas
Imagine a hotel manager using ARIMA to forecast room bookings. The model learns that bookings this month are close to last month's bookings — good so far. But every December, bookings explode, and every February they collapse. ARIMA has no way to know this unless it sees enough data — and even then, it can only model these patterns clumsily through very long AR or MA lags (lag 12, lag 24…).

This is inefficient and fragile. The correct solution is to give the model a dedicated seasonal layer that speaks in units of full seasons — "what happened exactly 12 months ago?" — rather than individual months. That is SARIMA: ARIMA with an added seasonal gear.

SARIMA (Seasonal ARIMA) extends ARIMA with a second set of three parameters that operate at the seasonal frequency rather than the observation frequency. The full model is written as ARIMA(p,d,q)(P,D,Q)[s].


Section 09

SARIMA(p,d,q)(P,D,Q)[s] — Six Parameters Explained

🎦 Animated — SARIMA's Two Layers Operating at Different Frequencies
Layers: Non-seasonal layer (p,d,q), operating at every observation (month by month), shown with AR(p=1) · Seasonal layer (P,D,Q)[s], operating at every s-th observation (Y1, Y2, Y3), shown with SAR(P=1) at lag 12 months, s = 12 (seasonal period)

The blue layer operates month-by-month (non-seasonal). The gold layer skips ahead by the full seasonal period s=12 — connecting January this year to January last year. Both layers run simultaneously inside one unified model.

📈
p, d, q (lowercase)
Non-seasonal component
Identical to regular ARIMA. p: AR lags at the observation level. d: number of non-seasonal differences. q: MA errors at the observation level. These capture short-range patterns between adjacent observations.
🍂
P, D, Q (uppercase)
Seasonal component
Same ideas as p,d,q but at the seasonal frequency. P: seasonal AR lags (at multiples of s). D: seasonal differences applied. Q: seasonal MA errors. These capture patterns that repeat every s observations.
📅
s — Seasonal Period
How long is one season?
The number of observations per complete seasonal cycle. s = 12 for monthly data with yearly cycle. s = 7 for daily data with weekly cycle. s = 4 for quarterly data. s = 24 for hourly data with daily cycle.
Parameter | Layer | What It Controls | How to Choose | Typical Values
p | Non-seasonal | AR order (past values) | PACF cut-off lag | 0, 1, 2
d | Non-seasonal | Differencing order | ADF test; count diffs | 0, 1 (rarely 2)
q | Non-seasonal | MA order (past errors) | ACF cut-off lag | 0, 1, 2
P | Seasonal | Seasonal AR lags | ACF/PACF spikes at s, 2s… | 0, 1
D | Seasonal | Seasonal differencing | Check seasonal ACF decay | 0, 1
Q | Seasonal | Seasonal MA errors | ACF spikes at seasonal lags | 0, 1
s | Both | Season length | Domain knowledge / plot | 4, 7, 12, 24, 52

The Universal Starting Point for Monthly Data

For monthly data with a yearly cycle, always start with SARIMA(1,1,1)(1,1,1)[12]. This compact model, specified by six orders plus the period s, handles a staggering range of real business data. One non-seasonal difference (d=1) removes trend. One seasonal difference (D=1) removes seasonality. One AR + MA at each level captures residual autocorrelation. Compare anything more exotic against this baseline before adopting it.
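
What d = 1 and D = 1 each remove can be seen on a noise-free toy series. The sketch below is a minimal NumPy illustration; the monthly pattern and the slope are made-up numbers.

```python
import numpy as np

s = 12
t = np.arange(10 * s)                                  # ten years of monthly data
monthly_pattern = np.array([-8, -10, -3, 0, 4, 12, 20, 18, 6, -2, -9, 2], dtype=float)

y = 50 + 0.5 * t + np.tile(monthly_pattern, 10)        # linear trend + fixed yearly pattern

seasonal_diff = y[s:] - y[:-s]          # D = 1: y_t − y_{t−12}
both_diff     = np.diff(seasonal_diff)  # then d = 1: ordinary first difference

print(seasonal_diff[:4])   # the yearly pattern cancels; only 12 × slope remains
print(both_diff[:4])       # the remaining constant cancels too
```

Note the cost: the two differences together consume 13 observations, which is part of why SARIMA needs several full seasonal cycles of data.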


Section 10

What SARIMA's Forecast Actually Looks Like

🎦 Animated — SARIMA Fit & Forecast with Confidence Intervals
Legend: Historical · Fitted · Forecast (24-step) · 95% CI. Vertical axis ticks from 100 to 700.

Blue = historical data. Green dashes = in-sample fitted values (how well the model tracked history). Gold = 24-step-ahead forecast. The shaded region = 95% confidence interval — it widens as uncertainty compounds further into the future. A tight CI at horizon 1 widening to a wide CI at horizon 24 is completely expected and correct behaviour.
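
Why the interval must widen can be shown with the simplest integrated model, a random walk, i.e. ARIMA(0,1,0). Its h-step forecast error is the sum of h independent shocks, so the error variance is h·σ² and the interval half-width grows like √h. The values of `sigma` and `last_value` below are hypothetical; a fitted SARIMA model has a more complicated variance formula, but the same qualitative behaviour.

```python
import numpy as np

sigma = 0.05                       # hypothetical one-step error standard deviation
last_value = 6.0                   # hypothetical last observed value (e.g. a log level)
h = np.arange(1, 25)               # horizons 1..24

# Random walk: forecast stays at the last value, error variance is h * sigma^2.
half_width = 1.96 * sigma * np.sqrt(h)
lower, upper = last_value - half_width, last_value + half_width

for step in (1, 4, 24):
    print(f"h={step:2d}: [{lower[step - 1]:.3f}, {upper[step - 1]:.3f}]")
```

The interval at horizon 4 is exactly twice as wide as at horizon 1, because √4 = 2.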


Section 11

SARIMA in Python — Full Working Example

from statsmodels.tsa.statespace.sarimax import SARIMAX
import itertools

# ─── 1. Seasonal ADF — check if seasonal differencing is needed ───────────────
# A large spike at lag 12 in ACF of the diff'd series → D = 1
seas_diff = log_series.diff(12).dropna()   # seasonal difference
both_diff = seas_diff.diff(1).dropna()    # then non-seasonal difference

adf_both = adfuller(both_diff)
print(f"After log + seasonal diff + diff: p = {adf_both[1]:.6f}")
# p << 0.05 → fully stationary  (d=1, D=1)
OUTPUT
After log + seasonal diff + diff: p = 0.000001 ← fully stationary ✓
# ─── 2. Grid search over (p,q,P,Q) — fix d=1, D=1, s=12 ─────────────────────
p_range = [0, 1, 2]
q_range = [0, 1, 2]
P_range = [0, 1]
Q_range = [0, 1]

grid_results = []

for p, q, P, Q in itertools.product(p_range, q_range, P_range, Q_range):
    try:
        m = SARIMAX(
            log_series,
            order=(p, 1, q),
            seasonal_order=(P, 1, Q, 12),
            enforce_stationarity=False,
            enforce_invertibility=False
        ).fit(disp=False)
        grid_results.append({
            '(p,d,q)(P,D,Q)[s]': f"({p},1,{q})({P},1,{Q})[12]",
            'AIC': round(m.aic, 2)
        })
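    except Exception as err:   # skip combinations that fail to fit, but report them
        print(f"SARIMA({p},1,{q})({P},1,{Q})[12] failed: {err}")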

grid_df = pd.DataFrame(grid_results).sort_values('AIC')
print(grid_df.head(6).to_string(index=False))
OUTPUT
(p,d,q)(P,D,Q)[s]        AIC
(1,1,1)(1,1,1)[12]   -302.14  ← Best ✓
(2,1,1)(1,1,1)[12]   -300.22
(1,1,2)(1,1,1)[12]   -299.98
(0,1,1)(1,1,1)[12]   -299.71
(1,1,0)(1,1,1)[12]   -298.85
(1,1,1)(0,1,1)[12]   -297.43
# ─── 3. Fit the best model: SARIMA(1,1,1)(1,1,1)[12] ────────────────────────
sarima = SARIMAX(
    log_series,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False
)
sarima_fit = sarima.fit(disp=False)

print(sarima_fit.summary())

# ─── 4. Residual diagnostics ─────────────────────────────────────────────────
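# Drop the first 13 residuals (d + D·s = 1 + 12): they come from the
# differencing start-up, not from genuine one-step forecasts.
res_sarima = sarima_fit.resid.iloc[13:]
lb_sarima  = acorr_ljungbox(res_sarima, lags=24, return_df=True)
sig_sarima = lb_sarima[lb_sarima['lb_pvalue'] < 0.05]
print(f"\nLjung-Box significant lags: {len(sig_sarima)} (want 0)")

from scipy.stats import shapiro
stat2, p2 = shapiro(res_sarima)
print(f"Shapiro-Wilk p = {p2:.4f}  → {'Normal ✓' if p2 > 0.05 else 'Non-normal'}")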
OUTPUT
Ljung-Box significant lags: 0 (want 0)   ← clean white-noise residuals ✓
Shapiro-Wilk p = 0.2614  → Normal ✓
# ─── 5. Hold-out test: last 24 months as test set ────────────────────────────
train_log = log_series[:-24]
test_orig = series[-24:]

sarima_train = SARIMAX(
    train_log,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False
).fit(disp=False)

fc_obj   = sarima_train.get_forecast(steps=24)
fc_mean  = np.exp(fc_obj.predicted_mean)
fc_lower = np.exp(fc_obj.conf_int().iloc[:, 0])
fc_upper = np.exp(fc_obj.conf_int().iloc[:, 1])

# ─── 6. Evaluate ─────────────────────────────────────────────────────────────
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae   = mean_absolute_error(test_orig, fc_mean)
rmse  = np.sqrt(mean_squared_error(test_orig, fc_mean))
mape  = np.mean(np.abs((test_orig.values - fc_mean.values) / test_orig.values)) * 100

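# Seasonal-naïve benchmark: predict each test month with the actual value from 12 months earlier.
# Note: for the second test year those values come from the test period itself, and the
# textbook MASE scales by the in-sample naïve MAE instead; treat this ratio as a relative MAE.
naive_seas  = series[-36:-12].values
mae_naive   = mean_absolute_error(test_orig, naive_seas)
mase        = mae / mae_naive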

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape:.2f}%")
print(f"MASE: {mase:.4f}  → {'Beats naïve ✓' if mase < 1 else 'Does NOT beat naïve ✗'}")
OUTPUT
MAE:  10.34
RMSE: 12.87
MAPE: 2.11%
MASE: 0.4183  → Beats naïve ✓
# ─── 7. Full auto_arima shortcut (pmdarima) ───────────────────────────────────
from pmdarima import auto_arima

auto_model = auto_arima(
    log_series,
    seasonal=True,
    m=12,
    d=None,          # auto-detect
    D=None,          # auto-detect seasonal differencing
    stepwise=True,   # faster than full grid search
    information_criterion='aic',
    trace=True,
    error_action='ignore',
    suppress_warnings=True
)
print(f"\nBest model: {auto_model.order}  Seasonal: {auto_model.seasonal_order}")
print(f"AIC: {auto_model.aic():.2f}")
OUTPUT
Fit ARIMA: order=(2, 1, 2) seasonal_order=(1, 1, 0); AIC=-298.21
Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 1, 0); AIC=-268.44
Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 0); AIC=-286.13
Fit ARIMA: order=(0, 1, 1) seasonal_order=(1, 1, 0); AIC=-295.72
Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 1); AIC=-302.14  ← selected
...
Best model: (1, 1, 1)  Seasonal: (1, 1, 1, 12)
AIC: -302.14

Section 12

ARIMA vs SARIMA — Head-to-Head Comparison

🎦 Animated — Residual ACF: ARIMA vs SARIMA (Why SARIMA Wins)
Panels: ARIMA(1,1,1) residual ACF, with a seasonal spike at lag 12 (seasonal pattern not captured ✗) · SARIMA(1,1,1)(1,1,1)[12] residual ACF, with all lags within the confidence bands (white noise ✓)

Left: ARIMA residual ACF shows a massive unexplained spike at lag 12 — the model left the entire seasonal pattern in the residuals. Right: SARIMA residual ACF — all bars inside confidence bands, confirming white-noise residuals. This is the single best visual argument for always using SARIMA when seasonal data is present.

Property | ARIMA(p,d,q) | SARIMA(p,d,q)(P,D,Q)[s]
Models seasonality | No, only via long AR/MA lags | Yes, dedicated seasonal layer
Parameters | 3 (p,d,q) | 7 (p,d,q,P,D,Q,s)
Data required | Smaller datasets fine | Needs ≥ 3–4 full seasonal cycles
Residual ACF at lag s | Usually still significant spike | Inside confidence bands ✓
MAPE on seasonal data | 5–15% typical | 1–4% typical
AIC on airline data | −242 | −302 (lower = better)
Training speed | Fast | Slower (more parameters)
Best for | Non-seasonal or unknown seasonality | Any series with a clear repeating cycle

Section 13

Decision Flowchart — Which Model to Use?

🎦 Animated — ARIMA vs SARIMA Decision Tree
Flow: Start → Is it stationary? NO: log-diff the series and set d=1 (or d=2). YES: already stationary, d=0. Then → Seasonal pattern? YES: SARIMA (p,d,q)(P,D,Q)[s]. NO: ARIMA(p, d, q).

Follow the arrows. Stationarity → differencing → seasonality check → model choice. These are the only decisions you need to make before writing code.


Section 14

Evaluation Metrics — How to Know If Your Model Is Good

Metric | Formula | Scale-free? | Punishes Large Errors? | Best Use
MAE | mean(|yₜ − ŷₜ|) | No (same units as data) | No (equal weight) | Intuitive error in original units
RMSE | √mean((yₜ − ŷₜ)²) | No | Yes (squares big errors) | When large errors are especially costly
MAPE | mean(|yₜ−ŷₜ|/|yₜ|)×100 | Yes (percentage) | No | Business reporting; comparing across series
MASE | MAE / MAE(naïve) | Yes | No | MASE < 1 = beats naïve; gold standard metric
AIC / BIC | −2·ln(L) + k·penalty | N/A | N/A | Model selection only; never use for forecast quality
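
The four forecast-accuracy formulas fit in a few lines of NumPy. This is a sketch with made-up numbers, and the function names are hypothetical. For MASE it uses the common textbook scaling, the in-sample MAE of a (seasonal) naïve forecast on the training data, with `s=1` meaning the plain last-value forecast.

```python
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    return np.mean(np.abs((y - yhat) / y)) * 100

def mase(y, yhat, train, s=1):
    """MAE scaled by the in-sample MAE of the naive forecast with period s."""
    scale = np.mean(np.abs(train[s:] - train[:-s]))
    return mae(y, yhat) / scale

# Made-up numbers purely for illustration
train  = np.array([100.0, 110.0, 105.0, 115.0, 120.0])
actual = np.array([125.0, 130.0])
pred   = np.array([120.0, 134.0])

print(f"MAE  = {mae(actual, pred):.2f}")
print(f"RMSE = {rmse(actual, pred):.2f}")
print(f"MAPE = {mape(actual, pred):.2f}%")
print(f"MASE = {mase(actual, pred, train):.2f}")
```

For monthly data with a yearly cycle you would pass `s=12`, so the benchmark becomes "same month last year".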
⚠️
The One Mistake That Invalidates All Your Metrics

Never shuffle time series data before splitting train/test. You must split chronologically — train on earlier data, test on later data. If you randomly split rows, future information leaks into the training set, the model effectively memorises the future, and your metrics are completely fictitious. A model that "achieves 1% MAPE" via random split may actually be 15% on a proper temporal split.
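
A chronological split is just index arithmetic. The sketch below shows a single hold-out split and an expanding-window variant that yields several train/test pairs; both helper names are hypothetical, and the sizes (144 observations, 24-month hold-out, 12-month folds) mirror the airline example only for illustration.

```python
import numpy as np

def temporal_split(n_obs, test_size):
    """Single chronological split: earliest part trains, latest part tests."""
    idx = np.arange(n_obs)
    return idx[:-test_size], idx[-test_size:]

def expanding_window_splits(n_obs, initial_train, horizon):
    """Yield (train_idx, test_idx); training grows, the test block always follows it."""
    start = initial_train
    while start + horizon <= n_obs:
        yield np.arange(start), np.arange(start, start + horizon)
        start += horizon

train_idx, test_idx = temporal_split(144, 24)
print(train_idx[-1], test_idx[0])   # last training index, first test index

for tr, te in expanding_window_splits(144, initial_train=96, horizon=12):
    print(f"train 0..{tr[-1]}  ->  test {te[0]}..{te[-1]}")
```

In every pair the largest training index is smaller than the smallest test index, which is exactly the property a random shuffle destroys.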


Section 15

Golden Rules — ARIMA & SARIMA in Production

⏳ ARIMA & SARIMA — Non-Negotiable Rules
1
Always plot before modelling. Trend, seasonality, outliers, level shifts, and missing values are all visible in a simple line plot. Ninety percent of modelling decisions are obvious from the chart — the rest come from ACF/PACF.
2
Run ADF and KPSS together. ADF: H₀ = non-stationary. KPSS: H₀ = stationary. If ADF says stationary but KPSS says non-stationary, the evidence is conflicting — apply one more difference and re-test.
3
Do not over-difference. d ≥ 2 is almost never correct for real economic data. Over-differencing introduces MA unit roots, inflates variance, and degrades forecast accuracy. Stop at the minimum d that achieves stationarity.
4
If data has a seasonal pattern, always use SARIMA — never plain ARIMA. Trying to capture a 12-month seasonal pattern with AR(12) is statistically inefficient, wastes 12 parameters, and almost always produces worse forecasts than SARIMA(0,0,0)(1,1,1)[12].
5
Diagnose residuals — it is not optional. Ljung-Box p < 0.05 means your model has not captured everything in the data. Never publish a forecast from a model with autocorrelated residuals. Go back to step 4 and revise p, q, P, or Q.
6
Use AIC for model selection, not RMSE on training data. RMSE on the training set always rewards models with more parameters. AIC penalises complexity appropriately. BIC penalises more heavily still (ln(n) per parameter instead of 2), so it tends to select smaller models; prefer BIC when you want the most parsimonious candidate.
7
Forecast intervals grow with horizon — that is correct, not a bug. A 12-month-ahead SARIMA forecast must have a much wider interval than a 1-month-ahead forecast. If your intervals are suspiciously tight at long horizons, something is wrong with your model. Always report intervals alongside point forecasts.
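
Rule 2 can be written down as a small decision function. This is a sketch of one reasonable reading of that rule, not a standard library routine: the function name is hypothetical, it takes the two p-values as plain numbers (for example from `adfuller` and `kpss`), and the wording of the verdicts is illustrative.

```python
def stationarity_verdict(adf_p, kpss_p, alpha=0.05):
    """Combine ADF (H0: unit root) and KPSS (H0: stationary) p-values."""
    adf_rejects_unit_root = adf_p < alpha
    kpss_rejects_stationarity = kpss_p < alpha

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary: difference"
    if adf_rejects_unit_root and kpss_rejects_stationarity:
        return "conflicting: try one more difference and re-test"
    return "inconclusive: neither test rejects"

print(stationarity_verdict(0.001, 0.10))
print(stationarity_verdict(0.99, 0.01))
print(stationarity_verdict(0.03, 0.02))
```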