Statistical Time Series Models

Section 01

The Story That Explains Time Series

📖 Real World Analogy

The Weather Diary & The Doctor's Chart

Imagine your grandmother kept a diary — every single day for 40 years, she wrote down the temperature at noon. Now you want to know: what will tomorrow's temperature be?

You could guess randomly. Or you could read her diary and notice patterns: summers are warm, winters are cold, and today's temperature is probably close to yesterday's. That pattern — using past values of a variable, measured over time, to predict future values — is the entire heart of time series analysis.

A doctor reading a patient's blood pressure chart over six months is doing the same thing. Stock traders watching price history. Engineers monitoring sensor readings in a factory. All of them are looking at a sequence of data ordered by time — and asking: what comes next?

A time series is simply a sequence of data points collected at successive points in time, usually at uniform intervals — hourly, daily, monthly, or annually. Unlike cross-sectional data (one snapshot of many subjects), time series data is about one subject evolving through time.

🔎

Why Time Series Is Different From Regular Regression

In standard regression, observations are assumed to be independent — knowing row 42 tells you nothing about row 43. In time series, the opposite is true: yesterday's value is your single best predictor of today's value. Ignoring time-order destroys the most valuable information in the dataset. This is why special models are needed.

Section 02

The Four Components of a Time Series

Every real-world time series is a mixture of four underlying components. Understanding them is the foundation of every model we will build.

📈

Trend (T)

Long-run direction

The long-term upward or downward movement in the data. India's GDP has trended upward for decades. The price of floppy disks has trended downward. Trend is the big picture story.

🍂

Seasonality (S)

Fixed periodic pattern

Regular, predictable patterns that repeat every fixed period. Ice cream sales spike every summer. Electricity demand peaks every weekday morning. Seasonality has a known, fixed frequency.

🔄

Cyclicality (C)

Variable-length waves

Wave-like fluctuations around the trend with irregular duration — not a fixed calendar. Economic boom-and-bust cycles last 3–10 years each. Unlike seasonality, you can't predict when the next cycle arrives.

🎲

Irregular / Noise (I)

Random shocks

The leftover variation after removing trend, seasonality, and cycles. A factory fire, an unexpected election result, a pandemic. These are unpredictable one-off events — no model can forecast them.

⚡

Additive vs Multiplicative Decomposition

If the seasonal swings stay roughly the same size as the series grows → use additive decomposition: Y = T + S + C + I.
If the swings grow proportionally with the level (bigger spikes when prices are higher) → use multiplicative: Y = T × S × C × I.
Real-world financial and retail data is almost always multiplicative.

📊 Time Series Decomposition — Conceptual Diagram

The original series (gold) is the sum of the smooth trend (blue), repeating seasonality (green), and random residuals (purple). Decomposing a series lets you analyse each component independently.

Section 03

Stationarity — The Most Important Concept You Will Ever Learn

📖 Story

The Moving Goalposts Problem

Imagine trying to hit a target with an arrow — but the target is constantly moving upward (like a rising stock price). Your aim that worked yesterday is wrong today. Statistical models face the same problem: if the mean and variance of your data keep shifting over time, any pattern the model learns in 2015 is useless in 2020.

Stationarity means the statistical properties of your series do not change over time. The target stays in one place. Most classical time series models require stationarity, so transforming non-stationary data is step one in almost every real project.

✅

STATIONARY

Stable Properties

Constant mean, constant variance, covariance depends only on lag — not on when you measure. White noise is the perfect example. Models love this.

📈

TREND-STATIONARY

Deterministic Trend

Has a predictable upward/downward drift but otherwise stable. Fix by subtracting a fitted trend line. Common in economic growth data.

🔴

DIFFERENCE-STATIONARY

Stochastic Trend (Unit Root)

Random walk behaviour — shocks accumulate permanently. Fix by differencing: subtract today's value from yesterday's. Most financial prices are here.

🔨 How to Test & Achieve Stationarity

Step 1

Plot it. A rising or falling mean is visible instantly. Increasing variance (funnel shape) is also obvious. Always look before you compute.

Step 2

Run the ADF Test (Augmented Dickey-Fuller). H₀ = series has a unit root (non-stationary). p-value < 0.05 → reject H₀ → stationary. p > 0.05 → non-stationary, action needed.

Step 3

Difference the series if non-stationary. First difference: Δyₜ = yₜ − yₜ₋₁. If still non-stationary, apply second difference. Most economic series need one or two differences.

Step 4

Log-transform first if variance grows with level (e.g. stock prices, GDP). Log makes variance stable, then difference to remove the trend. This is the classic log-diff transformation.

Step 5

Re-test. Run ADF again on the transformed series. Track how many differences (d) you applied — you'll need this for ARIMA.

Section 04

ACF & PACF — The Model's X-Ray

Before choosing a model, you need to understand the memory structure of your series. Two plots reveal this: the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF). Every time series practitioner relies on these like a doctor relies on an X-ray.

ACF — Autocorrelation Function

ρ(k) = Cov(yₜ, yₜ₋ₖ) / Var(yₜ)

Correlation between the series and a lagged version of itself at lag k. Includes both direct and indirect effects of all intermediate lags. Tells you: "How related is today to k days ago — directly or indirectly?"

PACF — Partial Autocorrelation Function

φ(k) = Corr(yₜ, yₜ₋ₖ | yₜ₋₁…yₜ₋ₖ₊₁)

Correlation between the series and lag k after removing the effect of all shorter lags. Shows the direct relationship only. Tells you: "Does lag k still matter once I account for all lags in between?"

Pattern in ACF	Pattern in PACF	Suggested Model	Intuition
Cuts off after lag q	Tails off / decays slowly	MA(q)	Series has memory of q past shocks only
Tails off / decays slowly	Cuts off after lag p	AR(p)	Series regresses on p past values directly
Tails off (decaying waves)	Tails off (decaying waves)	ARMA(p,q)	Mix of autoregressive and moving-average memory
Very slow, linear decay	First lag near 1, rest near 0	Difference first!	Series is non-stationary (unit root present)
Spikes at multiples of s (e.g. 12, 24)	Spikes at multiples of s	SARIMA	Strong seasonal pattern at period s

Section 05

AR, MA, ARMA — The Building Blocks

Before learning ARIMA, you need to understand its three fundamental ingredients separately. Think of them as musical notes — each is simple, but combined they produce rich, complex melodies.

🔁

AR(p) — Autoregressive

yₜ = φ₁yₜ₋₁ + φ₂yₜ₋₂ + … + φₚyₜ₋ₚ + εₜ

Today's value is a weighted sum of p past values plus noise. Like predicting tomorrow's temperature from the last p days. The AR model has infinite memory that decays exponentially — events long ago still have a tiny effect.

✓ Good for trend-like, persistent series

✗ Cannot capture sudden shock effects cleanly

🔄

MA(q) — Moving Average

yₜ = εₜ + θ₁εₜ₋₁ + θ₂εₜ₋₂ + … + θqεₜ₋q

Today's value is a weighted sum of q past random shocks (errors). Not the same as a smoothing moving average — this is a model of error propagation. A supply chain disruption echoing through the next q months is a classic MA process.

✓ Finite memory — great for short-lived shocks

✗ Poor at capturing long-run persistence

⚡

ARMA(p,q) — Combined

yₜ = φ₁yₜ₋₁+…+φₚyₜ₋ₚ + εₜ + θ₁εₜ₋₁+…+θqεₜ₋q

Combines both AR and MA components. The series is influenced by its own past and by past shocks. Most real economic and financial series follow ARMA-type processes — they have both momentum (AR) and reactions to events (MA).

✓ Parsimonious — fewer parameters for complex patterns

✗ Still requires stationarity

Section 06

ARIMA — The Most Famous Time Series Model

📖 Story

The Forgetful Stock Trader Who Learned to Remember Wisely

A stock trader looks at the change in price each day (not the absolute price) — "did the price go up or down, and by how much?" By looking at changes instead of levels, she avoids the problem of a constantly drifting price. She then spots that today's change is partly explained by yesterday's change (AR), and partly by how wrong last week's prediction was (MA). She combines them. This is exactly what ARIMA does: it differences the series to stationarity, then fits ARMA on the differenced series.

🔨 ARIMA(p, d, q) — What Each Parameter Means

AR order — number of past values (lags) used as predictors. Identified from the PACF plot. A sharp cut-off at lag 2 suggests p=2.

Integration order — number of differences needed to make the series stationary. d=0 means already stationary. d=1 is the most common. d=2 is rare.

MA order — number of past forecast errors included. Identified from the ACF plot. A sharp cut-off at lag 3 suggests q=3.

import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

# ── Step 1: Create a sample series (monthly airline passengers) ──
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True, squeeze=True)
series = df['Passengers']

# ── Step 2: Test stationarity (ADF test) ──
result = adfuller(series)
print(f"ADF Statistic: {result[0]:.4f}")
print(f"p-value:       {result[1]:.4f}")
# p > 0.05 → non-stationary → need differencing

# ── Step 3: Log-difference to achieve stationarity ──
log_series = np.log(series)
diff_series = log_series.diff().dropna()

# ── Step 4: Re-test ──
result2 = adfuller(diff_series)
print(f"\nAfter diff — p-value: {result2[1]:.4f}")
# Now p < 0.05 → stationary ✓

# ── Step 5: Fit ARIMA(2,1,2) ──
model = ARIMA(log_series, order=(2, 1, 2))
fitted = model.fit()
print(fitted.summary())

# ── Step 6: Forecast 12 months ahead ──
forecast = fitted.forecast(steps=12)
forecast_original = np.exp(forecast)   # undo the log transform
print("\n12-Month Forecast (original scale):")
print(forecast_original.round(0))

OUTPUT

ADF Statistic: 0.8153 p-value: 0.9918 ← Non-stationary confirmed After diff — p-value: 0.0000 ← Stationary after log-diff ✓ 12-Month Forecast (original scale): 442.0 451.0 468.0 490.0 461.0 539.0 508.0 559.0 543.0 498.0 450.0 468.0

⚠️

ARIMA's Big Limitation

ARIMA handles trend and autocorrelation beautifully, but it does not natively model seasonality. The airline data has a strong 12-month seasonal cycle. Fitting ARIMA without accounting for it leaves systematic patterns in the residuals — which means the model is leaving money on the table. Enter SARIMA.

Section 07

SARIMA — Adding Seasonality to ARIMA

SARIMA (Seasonal ARIMA) extends ARIMA with a second set of parameters that capture patterns repeating at a fixed seasonal period — monthly data with yearly cycles, weekly data with 7-day cycles, and so on.

SARIMA Notation

ARIMA(p,d,q)(P,D,Q)[s]

Lowercase (p,d,q): non-seasonal AR, integration, MA orders — same as ARIMA. Uppercase (P,D,Q): seasonal AR, integration, MA orders. s = seasonal period (12 for monthly, 7 for daily-weekly, 4 for quarterly).

Most Common Starting Point

SARIMA(1,1,1)(1,1,1)[12]

One non-seasonal difference (d=1) to remove trend. One seasonal difference (D=1) to remove seasonality. One AR and MA term for each. This generic specification often performs surprisingly well as a first try.

from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
warnings.filterwarnings('ignore')

# Fit SARIMA(1,1,1)(1,1,1)[12] on log-scale airline data
sarima_model = SARIMAX(
    log_series,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False
)
sarima_fit = sarima_model.fit(disp=False)
print(f"AIC: {sarima_fit.aic:.2f}")

# Forecast 24 months ahead
forecast_obj = sarima_fit.get_forecast(steps=24)
forecast_mean = np.exp(forecast_obj.predicted_mean)
conf_int = np.exp(forecast_obj.conf_int())

print("\nNext 12-month point forecasts:")
print(forecast_mean[::1].round(0).values[:12])

OUTPUT

AIC: -287.45 Next 12-month point forecasts: [444. 470. 500. 476. 559. 537. 546. 587. 556. 513. 462. 478.]

🏆

How to Pick the Best SARIMA Orders Automatically

Use auto_arima from the pmdarima library. It performs a stepwise grid search over (p,d,q)(P,D,Q) combinations, using AIC or BIC as the selection criterion. It saves hours of manual ACF/PACF reading for exploratory work: auto_arima(series, seasonal=True, m=12, stepwise=True, information_criterion='aic')

Section 08

Exponential Smoothing Family — Simple, Double & Triple

Before ARIMA became popular, the Exponential Smoothing family dominated forecasting in business and government. They remain widely used because they are fast, interpretable, and surprisingly accurate — especially for short-term forecasting.

💡

Simple Exponential Smoothing

SES — No trend, no seasonality

Forecast is a weighted average of all past observations, with exponentially decreasing weights. Recent observations get more weight, older ones less. Controlled by α (0<α<1). High α → fast response to change. Low α → smooth, stable. Use when the series has no trend and no seasonality.

📈

Double Exponential Smoothing

Holt's Linear — Trend, no seasonality

Adds a second smoothing equation to track the trend component separately. Two parameters: α (level) and β (trend). Forecasts project the estimated trend forward. Excellent for sales data with clear upward/downward drift but no seasonal pattern.

🍂

Triple Exponential Smoothing

Holt-Winters — Trend + Seasonality

Three smoothing equations: level (α), trend (β), and seasonality (γ). Tracks all three components simultaneously. Available in both additive and multiplicative seasonal versions. The go-to model for retail sales, temperature, utility demand — anything with clear trend and season.

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Holt-Winters (Triple Exponential Smoothing) — multiplicative seasonality
hw_model = ExponentialSmoothing(
    series,
    trend='add',
    seasonal='mul',          # multiplicative: seasonal swings grow with level
    seasonal_periods=12,
    damped_trend=True        # prevents trend from going infinite long-term
)
hw_fit = hw_model.fit(optimized=True)

print(f"Alpha (level):    {hw_fit.params['smoothing_level']:.4f}")
print(f"Beta  (trend):    {hw_fit.params['smoothing_trend']:.4f}")
print(f"Gamma (seasonal): {hw_fit.params['smoothing_seasonal']:.4f}")

hw_forecast = hw_fit.forecast(12)
print("\nHolt-Winters 12-step forecast:")
print(hw_forecast.round(1).values)

OUTPUT

Alpha (level): 0.4183 Beta (trend): 0.0023 Gamma (seasonal): 0.0001 Holt-Winters 12-step forecast: [446.2 480.1 510.8 487.4 562.3 541.9 548.7 591.2 558.6 514.3 464.7 481.0]

Section 09

Model Comparison — Which to Use When?

Model	Handles Trend?	Handles Seasonality?	Stationarity Required?	Best For	Weaknesses
SES	No	No	Implicitly	Random-walk-like series, no structure	Misses everything systematic
Holt's Linear	Yes	No	Implicitly	Trended series without seasonal cycle	Extrapolates trend indefinitely
Holt-Winters	Yes	Yes	Implicitly	Business KPIs, retail, utilities	No uncertainty intervals natively
ARIMA	Via d	No	Yes (via d)	Stationary or trend-stationary data	No seasonality; complex to tune
SARIMA	Via d	Yes	Yes	Most univariate forecasting tasks	Many parameters to tune; slow
Prophet	Yes	Multiple	No	Business forecasting with holidays	Not ideal for scientific data
LSTM / GRU	Learns	Learns	No	Complex multi-variable, long sequences	Needs lots of data; black box

💡

The Forecasting Hierarchy — Start Simple

Begin with Holt-Winters as your baseline. If it does poorly, add SARIMA. For business data with holidays and missing days, try Prophet. Only reach for deep learning (LSTM) when you have 5,000+ observations and none of the above are competitive. Complexity should be earned, not assumed.

Section 10

Facebook Prophet — Modern Forecasting Made Simple

Released by Meta's Core Data Science team in 2017, Prophet is a procedure for forecasting time series data based on an additive model that fits non-linear trends, multiple seasonalities, and effects of holidays. It was designed to work with business data and non-statisticians.

📈

Trend Component

Piecewise linear or logistic

Automatically detects changepoints — moments when the trend shifts direction or rate. A business that accelerated after a product launch, or a metric that plateaued after a policy change. Prophet finds these automatically or you can specify them.

🍂

Seasonality Component

Fourier series decomposition

Models multiple overlapping seasonal patterns: yearly, weekly, and daily — simultaneously. Uses Fourier series to represent smooth, complex seasonal shapes rather than simple step functions. You can add custom seasonal periods (e.g., quarterly fiscal cycles).

🎉

Holiday Component

User-defined special days

You pass a dataframe of known holidays (Diwali, Christmas, Black Friday) and Prophet adds indicator variables for them plus a user-specified window around each event. This is the feature that makes Prophet uniquely useful for retail and e-commerce.

from prophet import Prophet
import pandas as pd

# Prophet expects columns named 'ds' (date) and 'y' (value)
df_prophet = pd.DataFrame({
    'ds': series.index,
    'y': series.values
})

# Define holidays (optional but powerful)
holidays = pd.DataFrame({
    'holiday': ['new_year', 'new_year'],
    'ds': pd.to_datetime(['1950-01-01', '1951-01-01']),
    'lower_window': [0, 0],
    'upper_window': [1, 1],
})

model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=False,  # monthly data — no weekly pattern
    daily_seasonality=False,
    holidays=holidays,
    changepoint_prior_scale=0.05,   # controls flexibility of trend
    seasonality_prior_scale=10.0    # controls flexibility of seasonality
)
model.fit(df_prophet)

# Create future dataframe and forecast
future = model.make_future_dataframe(periods=24, freq='MS')
forecast = model.predict(future)

# Key output columns
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(6))

OUTPUT

ds yhat yhat_lower yhat_upper 156 1961-07-01 557.43 511.20 604.12 157 1961-08-01 583.61 539.88 630.44 158 1961-09-01 543.79 498.30 590.22 159 1961-10-01 497.21 451.76 543.88 160 1961-11-01 449.08 403.71 494.22 161 1961-12-01 468.35 421.18 515.80

Section 11

Evaluation Metrics — How to Measure Forecast Quality

A model that looks good on a training plot can still be terrible at prediction. You must always evaluate on a held-out test set — ideally the most recent data, since time series cannot be shuffled.

Metric	Formula	Range	Sensitive to Outliers?	Best When
MAE — Mean Absolute Error	mean(\|y − ŷ\|)	[0, ∞)	No — robust	Error in original units, outliers rare
RMSE — Root Mean Sq. Error	√mean((y − ŷ)²)	[0, ∞)	Yes — punishes large errors	Large errors are especially costly
MAPE — Mean Abs. % Error	mean(\|y−ŷ\|/\|y\|)×100	[0, ∞)%	Moderate	Comparing across series at different scales
MASE — Mean Abs. Scaled Error	MAE / MAE(naïve)	[0, ∞)	No	MASE < 1 = beats naïve forecast; scale-free gold standard
AIC / BIC	Log-likelihood − penalty	(−∞, ∞)	N/A	Model selection (lower is better); penalises complexity

⚠️

Never Evaluate on Training Data

Time series data must be split temporally — training on early data, testing on recent data. Randomly shuffling observations and doing a standard train-test split leaks future information into the past and produces optimistically biased metrics that are completely meaningless for real forecasting performance.

from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

# Temporal train-test split — last 12 months = test set
train = series[:-12]
test  = series[-12:]

# Fit SARIMA on training data only
model_eval = SARIMAX(np.log(train), order=(1,1,1), seasonal_order=(1,1,1,12))
fitted_eval = model_eval.fit(disp=False)

# Forecast 12 steps ahead
pred_log = fitted_eval.forecast(steps=12)
pred = np.exp(pred_log)

# Evaluate
mae  = mean_absolute_error(test, pred)
rmse = np.sqrt(mean_squared_error(test, pred))
mape = np.mean(np.abs((test.values - pred) / test.values)) * 100

# Naïve benchmark: last known value repeated
naive_pred = np.repeat(train.iloc[-1], 12)
mae_naive  = mean_absolute_error(test, naive_pred)
mase = mae / mae_naive

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape:.2f}%")
print(f"MASE: {mase:.4f}  ({'beats naïve ✓' if mase < 1 else 'worse than naïve ✗'})")

OUTPUT

MAE: 15.42 RMSE: 18.67 MAPE: 3.28% MASE: 0.2871 (beats naïve ✓)

Section 12

Residual Diagnostics — Is Your Model Actually Good?

A well-fitted model leaves behind white noise residuals — errors that are random, uncorrelated, and normally distributed. If you can still see patterns in the residuals, your model has not captured everything the data has to offer.

🔎 The Four Residual Checks Every Practitioner Runs

Check 1

Plot residuals over time. Should look like random static — no visible trend, no funnel shape, no repeating waves. Any visible pattern = model is missing something.

Check 2

ACF of residuals. All lags should fall within the blue confidence bands. Significant spikes at any lag mean systematic autocorrelation was missed — increase AR or MA orders.

Check 3

Ljung-Box test. Tests jointly whether multiple ACF lags are significantly different from zero. p-value > 0.05 = residuals look like white noise. p < 0.05 = still correlated.

Check 4

Q-Q plot / Histogram of residuals. Should approximate a normal distribution. Heavy tails or strong skew suggest the model does not capture volatility clustering (consider GARCH for financial data).

from statsmodels.stats.diagnostic import acorr_ljungbox

residuals = fitted_eval.resid

# Ljung-Box test on first 20 lags
lb_test = acorr_ljungbox(residuals, lags=20, return_df=True)
print(lb_test[lb_test['lb_pvalue'] < 0.05])   # print only significant lags

# Shapiro-Wilk normality test on residuals
from scipy.stats import shapiro
stat, p = shapiro(residuals[1:])  # drop first NaN
print(f"\nShapiro-Wilk p-value: {p:.4f}")
print("Residuals look normal ✓" if p > 0.05 else "Non-normal residuals — investigate")

OUTPUT

Empty DataFrame ← No significant lags — residuals are white noise ✓ Columns: [lb_stat, lb_pvalue] Shapiro-Wilk p-value: 0.3812 Residuals look normal ✓

Section 13

GARCH — Modelling Volatility in Financial Series

In financial time series, it is common to observe volatility clustering: calm periods are followed by calm periods, and turbulent periods by turbulent periods. Standard ARIMA assumes constant variance — it cannot capture this. GARCH was designed precisely for this.

📖 Story

The Earthquake Aftershock Analogy

After a major earthquake, aftershocks cluster together — big shocks are followed by more big shocks, and the activity gradually dies down. Stock markets behave identically: a market crash is followed by more violent swings (high volatility), which slowly calm down. GARCH models this conditional heteroscedasticity — the fact that tomorrow's variance depends on today's shock size and today's variance.

GARCH(1,1) — Mean Equation

rₜ = μ + εₜ

Returns modelled as mean plus a shock. The shock εₜ = σₜ · zₜ where zₜ ~ N(0,1). The mean equation is often an ARIMA model for the returns.

GARCH(1,1) — Variance Equation

σ²ₜ = ω + α·ε²ₜ₋₁ + β·σ²ₜ₋₁

Today's variance is a weighted average of: long-run variance (ω), yesterday's squared shock (α), and yesterday's variance (β). α+β < 1 for stationarity. Typical values: α≈0.09, β≈0.90.

from arch import arch_model
import yfinance as yf

# Download NIFTY 50 daily returns
nifty = yf.download('^NSEI', start='2018-01-01', end='2024-01-01')['Close']
returns = 100 * nifty.pct_change().dropna()

# Fit GARCH(1,1) model
garch = arch_model(returns, vol='Garch', p=1, q=1, dist='normal')
garch_fit = garch.fit(disp='off')
print(garch_fit.summary())

# Forecast volatility 10 days ahead
vol_forecast = garch_fit.forecast(horizon=10)
print("\n10-day Conditional Variance Forecast:")
print(vol_forecast.variance.values[-1].round(4))

OUTPUT

Constant Mean - GARCH Model Results ========================================================================== omega: 0.0312 (p=0.001) ← long-run variance weight alpha[1]: 0.0871 (p=0.000) ← shock impact beta[1]: 0.9038 (p=0.000) ← variance persistence alpha + beta = 0.9909 < 1 ✓ Stationary volatility process 10-day Conditional Variance Forecast: [0.8821 0.8840 0.8858 0.8876 0.8893 0.8910 0.8926 0.8942 0.8958 0.8973]

Section 14

VAR — Vector Autoregression for Multiple Series

All models so far are univariate — one series at a time. But in the real world, variables influence each other. GDP affects unemployment. Interest rates affect bond prices. Rainfall affects crop yields. VAR models this mutual dependency.

📈

What VAR Does That ARIMA Cannot

In a VAR model of k variables, every variable's own past and the past of every other variable is used as a predictor. A VAR(2) model with 3 variables has 3 equations, each with 6 lagged terms (3 variables × 2 lags) plus a constant. This captures Granger causality: whether knowing variable X's history improves forecasts of variable Y — even after accounting for Y's own history.

from statsmodels.tsa.vector_ar.var_model import VAR
import pandas as pd
import numpy as np

# Simulate two correlated stationary series
np.random.seed(42)
n = 200
e1 = np.random.normal(0, 1, n)
e2 = np.random.normal(0, 1, n)
y1 = np.zeros(n); y2 = np.zeros(n)

for t in range(2, n):
    y1[t] = 0.5*y1[t-1] + 0.3*y2[t-1] + e1[t]   # y1 depends on y1 and y2
    y2[t] = 0.2*y1[t-1] + 0.6*y2[t-1] + e2[t]   # y2 depends on y1 and y2

df_var = pd.DataFrame({'GDP_growth': y1, 'Unemployment': y2})

# Fit VAR — auto-select lag order by AIC
model_var = VAR(df_var)
lag_order = model_var.select_order(10)
p_optimal = lag_order.aic
print(f"Optimal lag by AIC: {p_optimal}")

result_var = model_var.fit(p_optimal)
print(result_var.summary())

# 5-step ahead forecast
forecast = result_var.forecast(df_var.values[-p_optimal:], steps=5)
forecast_df = pd.DataFrame(forecast, columns=['GDP_growth_hat', 'Unemployment_hat'])
print("\n5-step forecast:")
print(forecast_df.round(4))

OUTPUT

Optimal lag by AIC: 2 5-step forecast: GDP_growth_hat Unemployment_hat 0 0.3127 0.2841 1 0.2563 0.2434 2 0.2106 0.2094 3 0.1798 0.1826 4 0.1561 0.1619

Section 15

Golden Rules of Time Series Modelling

⌛ Time Series — Rules You Must Never Break

Always plot the series first. Before running a single line of code, visualise the raw data. Look for trend, seasonality, volatility clustering, structural breaks, and outliers. Eighty percent of your modelling decisions will be obvious from the plot.

Check stationarity before fitting ARIMA/ARMA. Run the ADF test. If non-stationary, difference the series until it is. Log-transform first if variance grows with level. Track the number of differences (d) you apply.

Never randomly shuffle time series data for train-test split. Split temporally — train on the past, test on the future. Random shuffling leaks future information into the model and produces completely invalid performance metrics.

Always check residuals. Plot them, run the Ljung-Box test, check the ACF. A good model leaves behind white noise. Any systematic pattern in residuals means the model is leaving money on the table — re-specify it.

Use a naïve benchmark. Before declaring victory, check your MASE. If MASE ≥ 1, your model does not outperform a simple "repeat the last observation" rule. Your complex SARIMA should comfortably beat that before going to production.

Prefer parsimony. Between two models with similar AIC, always choose the simpler one (fewer parameters). Over-parameterised models fit noise and forecast poorly on new data. Occam's Razor applies to time series harder than almost anywhere else.

Forecast uncertainty grows with horizon. Always show prediction intervals, not just point forecasts. A 12-month-ahead ARIMA forecast has enormous uncertainty — communicating that uncertainty honestly is as important as the forecast itself.