The Story That Explains Time Series
You could guess randomly. Or you could read her diary and notice patterns: summers are warm, winters are cold, and today's temperature is probably close to yesterday's. That pattern — using past values of a variable, measured over time, to predict future values — is the entire heart of time series analysis.
A doctor reading a patient's blood pressure chart over six months is doing the same thing. Stock traders watching price history. Engineers monitoring sensor readings in a factory. All of them are looking at a sequence of data ordered by time — and asking: what comes next?
A time series is simply a sequence of data points collected at successive points in time, usually at uniform intervals — hourly, daily, monthly, or annually. Unlike cross-sectional data (one snapshot of many subjects), time series data is about one subject evolving through time.
In standard regression, observations are assumed to be independent — knowing row 42 tells you nothing about row 43. In time series, the opposite is true: yesterday's value is your single best predictor of today's value. Ignoring time-order destroys the most valuable information in the dataset. This is why special models are needed.
The Four Components of a Time Series
Every real-world time series is a mixture of four underlying components. Understanding them is the foundation of every model we will build.
If the seasonal swings stay roughly the same size as the series grows → use
additive decomposition: Y = T + S + C + I.
If the swings grow proportionally with the level (bigger spikes when prices are higher)
→ use multiplicative: Y = T × S × C × I.
Real-world financial and retail data is almost always multiplicative.
The original series (gold) is the sum of the smooth trend (blue), repeating seasonality (green), and random residuals (purple). Decomposing a series lets you analyse each component independently.
Stationarity — The Most Important Concept You Will Ever Learn
Stationarity means the statistical properties of your series do not change over time. The target stays in one place. Most classical time series models require stationarity, so transforming non-stationary data is step one in almost every real project.
ACF & PACF — The Model's X-Ray
Before choosing a model, you need to understand the memory structure of your series. Two plots reveal this: the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF). Every time series practitioner relies on these like a doctor relies on an X-ray.
| Pattern in ACF | Pattern in PACF | Suggested Model | Intuition |
|---|---|---|---|
| Cuts off after lag q | Tails off / decays slowly | MA(q) | Series has memory of q past shocks only |
| Tails off / decays slowly | Cuts off after lag p | AR(p) | Series regresses on p past values directly |
| Tails off (decaying waves) | Tails off (decaying waves) | ARMA(p,q) | Mix of autoregressive and moving-average memory |
| Very slow, linear decay | First lag near 1, rest near 0 | Difference first! | Series is non-stationary (unit root present) |
| Spikes at multiples of s (e.g. 12, 24) | Spikes at multiples of s | SARIMA | Strong seasonal pattern at period s |
AR, MA, ARMA — The Building Blocks
Before learning ARIMA, you need to understand its three fundamental ingredients separately. Think of them as musical notes — each is simple, but combined they produce rich, complex melodies.
ARIMA — The Most Famous Time Series Model
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
# ── Step 1: Create a sample series (monthly airline passengers) ──
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True, squeeze=True)
series = df['Passengers']
# ── Step 2: Test stationarity (ADF test) ──
result = adfuller(series)
print(f"ADF Statistic: {result[0]:.4f}")
print(f"p-value: {result[1]:.4f}")
# p > 0.05 → non-stationary → need differencing
# ── Step 3: Log-difference to achieve stationarity ──
log_series = np.log(series)
diff_series = log_series.diff().dropna()
# ── Step 4: Re-test ──
result2 = adfuller(diff_series)
print(f"\nAfter diff — p-value: {result2[1]:.4f}")
# Now p < 0.05 → stationary ✓
# ── Step 5: Fit ARIMA(2,1,2) ──
model = ARIMA(log_series, order=(2, 1, 2))
fitted = model.fit()
print(fitted.summary())
# ── Step 6: Forecast 12 months ahead ──
forecast = fitted.forecast(steps=12)
forecast_original = np.exp(forecast) # undo the log transform
print("\n12-Month Forecast (original scale):")
print(forecast_original.round(0))
ARIMA handles trend and autocorrelation beautifully, but it does not natively model seasonality. The airline data has a strong 12-month seasonal cycle. Fitting ARIMA without accounting for it leaves systematic patterns in the residuals — which means the model is leaving money on the table. Enter SARIMA.
SARIMA — Adding Seasonality to ARIMA
SARIMA (Seasonal ARIMA) extends ARIMA with a second set of parameters that capture patterns repeating at a fixed seasonal period — monthly data with yearly cycles, weekly data with 7-day cycles, and so on.
from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
warnings.filterwarnings('ignore')
# Fit SARIMA(1,1,1)(1,1,1)[12] on log-scale airline data
sarima_model = SARIMAX(
log_series,
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 12),
enforce_stationarity=False,
enforce_invertibility=False
)
sarima_fit = sarima_model.fit(disp=False)
print(f"AIC: {sarima_fit.aic:.2f}")
# Forecast 24 months ahead
forecast_obj = sarima_fit.get_forecast(steps=24)
forecast_mean = np.exp(forecast_obj.predicted_mean)
conf_int = np.exp(forecast_obj.conf_int())
print("\nNext 12-month point forecasts:")
print(forecast_mean[::1].round(0).values[:12])
Use auto_arima from the pmdarima library. It performs a stepwise grid search
over (p,d,q)(P,D,Q) combinations, using AIC or BIC as the selection criterion.
It saves hours of manual ACF/PACF reading for exploratory work:
auto_arima(series, seasonal=True, m=12, stepwise=True, information_criterion='aic')
Exponential Smoothing Family — Simple, Double & Triple
Before ARIMA became popular, the Exponential Smoothing family dominated forecasting in business and government. They remain widely used because they are fast, interpretable, and surprisingly accurate — especially for short-term forecasting.
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# Holt-Winters (Triple Exponential Smoothing) — multiplicative seasonality
hw_model = ExponentialSmoothing(
series,
trend='add',
seasonal='mul', # multiplicative: seasonal swings grow with level
seasonal_periods=12,
damped_trend=True # prevents trend from going infinite long-term
)
hw_fit = hw_model.fit(optimized=True)
print(f"Alpha (level): {hw_fit.params['smoothing_level']:.4f}")
print(f"Beta (trend): {hw_fit.params['smoothing_trend']:.4f}")
print(f"Gamma (seasonal): {hw_fit.params['smoothing_seasonal']:.4f}")
hw_forecast = hw_fit.forecast(12)
print("\nHolt-Winters 12-step forecast:")
print(hw_forecast.round(1).values)
Model Comparison — Which to Use When?
| Model | Handles Trend? | Handles Seasonality? | Stationarity Required? | Best For | Weaknesses |
|---|---|---|---|---|---|
| SES | No | No | Implicitly | Random-walk-like series, no structure | Misses everything systematic |
| Holt's Linear | Yes | No | Implicitly | Trended series without seasonal cycle | Extrapolates trend indefinitely |
| Holt-Winters | Yes | Yes | Implicitly | Business KPIs, retail, utilities | No uncertainty intervals natively |
| ARIMA | Via d | No | Yes (via d) | Stationary or trend-stationary data | No seasonality; complex to tune |
| SARIMA | Via d | Yes | Yes | Most univariate forecasting tasks | Many parameters to tune; slow |
| Prophet | Yes | Multiple | No | Business forecasting with holidays | Not ideal for scientific data |
| LSTM / GRU | Learns | Learns | No | Complex multi-variable, long sequences | Needs lots of data; black box |
Begin with Holt-Winters as your baseline. If it does poorly, add SARIMA. For business data with holidays and missing days, try Prophet. Only reach for deep learning (LSTM) when you have 5,000+ observations and none of the above are competitive. Complexity should be earned, not assumed.
Facebook Prophet — Modern Forecasting Made Simple
Released by Meta's Core Data Science team in 2017, Prophet is a procedure for forecasting time series data based on an additive model that fits non-linear trends, multiple seasonalities, and effects of holidays. It was designed to work with business data and non-statisticians.
from prophet import Prophet
import pandas as pd
# Prophet expects columns named 'ds' (date) and 'y' (value)
df_prophet = pd.DataFrame({
'ds': series.index,
'y': series.values
})
# Define holidays (optional but powerful)
holidays = pd.DataFrame({
'holiday': ['new_year', 'new_year'],
'ds': pd.to_datetime(['1950-01-01', '1951-01-01']),
'lower_window': [0, 0],
'upper_window': [1, 1],
})
model = Prophet(
yearly_seasonality=True,
weekly_seasonality=False, # monthly data — no weekly pattern
daily_seasonality=False,
holidays=holidays,
changepoint_prior_scale=0.05, # controls flexibility of trend
seasonality_prior_scale=10.0 # controls flexibility of seasonality
)
model.fit(df_prophet)
# Create future dataframe and forecast
future = model.make_future_dataframe(periods=24, freq='MS')
forecast = model.predict(future)
# Key output columns
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(6))
Evaluation Metrics — How to Measure Forecast Quality
A model that looks good on a training plot can still be terrible at prediction. You must always evaluate on a held-out test set — ideally the most recent data, since time series cannot be shuffled.
| Metric | Formula | Range | Sensitive to Outliers? | Best When |
|---|---|---|---|---|
| MAE — Mean Absolute Error | mean(|y − ŷ|) | [0, ∞) | No — robust | Error in original units, outliers rare |
| RMSE — Root Mean Sq. Error | √mean((y − ŷ)²) | [0, ∞) | Yes — punishes large errors | Large errors are especially costly |
| MAPE — Mean Abs. % Error | mean(|y−ŷ|/|y|)×100 | [0, ∞)% | Moderate | Comparing across series at different scales |
| MASE — Mean Abs. Scaled Error | MAE / MAE(naïve) | [0, ∞) | No | MASE < 1 = beats naïve forecast; scale-free gold standard |
| AIC / BIC | Log-likelihood − penalty | (−∞, ∞) | N/A | Model selection (lower is better); penalises complexity |
Time series data must be split temporally — training on early data, testing on recent data. Randomly shuffling observations and doing a standard train-test split leaks future information into the past and produces optimistically biased metrics that are completely meaningless for real forecasting performance.
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
# Temporal train-test split — last 12 months = test set
train = series[:-12]
test = series[-12:]
# Fit SARIMA on training data only
model_eval = SARIMAX(np.log(train), order=(1,1,1), seasonal_order=(1,1,1,12))
fitted_eval = model_eval.fit(disp=False)
# Forecast 12 steps ahead
pred_log = fitted_eval.forecast(steps=12)
pred = np.exp(pred_log)
# Evaluate
mae = mean_absolute_error(test, pred)
rmse = np.sqrt(mean_squared_error(test, pred))
mape = np.mean(np.abs((test.values - pred) / test.values)) * 100
# Naïve benchmark: last known value repeated
naive_pred = np.repeat(train.iloc[-1], 12)
mae_naive = mean_absolute_error(test, naive_pred)
mase = mae / mae_naive
print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape:.2f}%")
print(f"MASE: {mase:.4f} ({'beats naïve ✓' if mase < 1 else 'worse than naïve ✗'})")
Residual Diagnostics — Is Your Model Actually Good?
A well-fitted model leaves behind white noise residuals — errors that are random, uncorrelated, and normally distributed. If you can still see patterns in the residuals, your model has not captured everything the data has to offer.
from statsmodels.stats.diagnostic import acorr_ljungbox
residuals = fitted_eval.resid
# Ljung-Box test on first 20 lags
lb_test = acorr_ljungbox(residuals, lags=20, return_df=True)
print(lb_test[lb_test['lb_pvalue'] < 0.05]) # print only significant lags
# Shapiro-Wilk normality test on residuals
from scipy.stats import shapiro
stat, p = shapiro(residuals[1:]) # drop first NaN
print(f"\nShapiro-Wilk p-value: {p:.4f}")
print("Residuals look normal ✓" if p > 0.05 else "Non-normal residuals — investigate")
GARCH — Modelling Volatility in Financial Series
In financial time series, it is common to observe volatility clustering: calm periods are followed by calm periods, and turbulent periods by turbulent periods. Standard ARIMA assumes constant variance — it cannot capture this. GARCH was designed precisely for this.
from arch import arch_model
import yfinance as yf
# Download NIFTY 50 daily returns
nifty = yf.download('^NSEI', start='2018-01-01', end='2024-01-01')['Close']
returns = 100 * nifty.pct_change().dropna()
# Fit GARCH(1,1) model
garch = arch_model(returns, vol='Garch', p=1, q=1, dist='normal')
garch_fit = garch.fit(disp='off')
print(garch_fit.summary())
# Forecast volatility 10 days ahead
vol_forecast = garch_fit.forecast(horizon=10)
print("\n10-day Conditional Variance Forecast:")
print(vol_forecast.variance.values[-1].round(4))
VAR — Vector Autoregression for Multiple Series
All models so far are univariate — one series at a time. But in the real world, variables influence each other. GDP affects unemployment. Interest rates affect bond prices. Rainfall affects crop yields. VAR models this mutual dependency.
In a VAR model of k variables, every variable's own past and the past of every other variable is used as a predictor. A VAR(2) model with 3 variables has 3 equations, each with 6 lagged terms (3 variables × 2 lags) plus a constant. This captures Granger causality: whether knowing variable X's history improves forecasts of variable Y — even after accounting for Y's own history.
from statsmodels.tsa.vector_ar.var_model import VAR
import pandas as pd
import numpy as np
# Simulate two correlated stationary series
np.random.seed(42)
n = 200
e1 = np.random.normal(0, 1, n)
e2 = np.random.normal(0, 1, n)
y1 = np.zeros(n); y2 = np.zeros(n)
for t in range(2, n):
y1[t] = 0.5*y1[t-1] + 0.3*y2[t-1] + e1[t] # y1 depends on y1 and y2
y2[t] = 0.2*y1[t-1] + 0.6*y2[t-1] + e2[t] # y2 depends on y1 and y2
df_var = pd.DataFrame({'GDP_growth': y1, 'Unemployment': y2})
# Fit VAR — auto-select lag order by AIC
model_var = VAR(df_var)
lag_order = model_var.select_order(10)
p_optimal = lag_order.aic
print(f"Optimal lag by AIC: {p_optimal}")
result_var = model_var.fit(p_optimal)
print(result_var.summary())
# 5-step ahead forecast
forecast = result_var.forecast(df_var.values[-p_optimal:], steps=5)
forecast_df = pd.DataFrame(forecast, columns=['GDP_growth_hat', 'Unemployment_hat'])
print("\n5-step forecast:")
print(forecast_df.round(4))