Covariance in Statistics

Section 01

The Ice Cream & Drowning Story

Imagine a researcher combs through 20 years of summer data and discovers something alarming: on days when ice cream sales spike, the number of drowning incidents also rises. Does ice cream cause drowning? Should we ban ice cream at beaches?

Of course not. Both are driven by a third factor — hot weather. When it's hot, people buy more ice cream and swim more, which raises drowning risk. The two variables move together, but one does not cause the other.

💡

This is what Covariance captures

Covariance tells you whether two variables tend to move in the same direction or in opposite directions. It measures the direction of a linear relationship, but on its own it tells you neither how strong that relationship is nor why it exists.

Understanding covariance is the first step toward understanding correlation, PCA, and multivariate statistics. Every time a data scientist asks "do these two features move together?", they are thinking about covariance.

Section 02

What Covariance Measures

Covariance measures how much two variables change together relative to their own means. For each observation, we look at how far variable X is from its mean, and how far variable Y is from its mean. We then multiply those two deviations and average the results.

Positive Cov

Cov > 0

X↑ when Y↑
X↓ when Y↓
E.g. height & weight

Negative Cov

Cov < 0

X↑ when Y↓
X↓ when Y↑
E.g. price & demand

Zero Cov

Cov ≈ 0

No linear relationship
A nonlinear (curved) relationship may still exist
E.g. shoe size & IQ
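The three cases above can be sketched with a few made-up numbers. The helper name sample_cov and the data are illustrative assumptions, not part of any library; the third case uses a curve (y = x²) on x values that are symmetric around zero, which is what makes the covariance come out at zero.

```python
import numpy as np

def sample_cov(x, y):
    """Sample covariance: sum of deviation products divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1))

# Illustrative x values, symmetric around zero
x = np.array([-2, -1, 0, 1, 2])

cov_positive = sample_cov(x, 3 * x + 1)   # y rises as x rises
cov_negative = sample_cov(x, -3 * x + 1)  # y falls as x rises
cov_curved = sample_cov(x, x ** 2)        # y depends on x, but not linearly

print(cov_positive)  # 7.5
print(cov_negative)  # -7.5
print(cov_curved)    # 0.0
```

Note that the curved case has a covariance of zero even though y is completely determined by x.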

⚠️

Covariance ≠ Correlation

Covariance has no upper or lower bound. Its value depends on the units of X and Y, making it hard to compare across datasets. Correlation normalises covariance to the range [−1, +1] and is therefore more interpretable. Always know which one you are looking at.
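A minimal sketch of the unit problem, using hypothetical numbers: the same time measurements are expressed first in hours and then in minutes. The covariance scales with the unit, while the correlation should stay the same.

```python
import numpy as np

# Hypothetical data: time spent on a task and a resulting score
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 55.0, 61.0, 64.0, 70.0])
minutes = hours * 60  # same quantity, different unit

# Off-diagonal entry of the 2x2 matrix is the covariance / correlation
cov_hours = np.cov(hours, score)[0, 1]
cov_minutes = np.cov(minutes, score)[0, 1]
corr_hours = np.corrcoef(hours, score)[0, 1]
corr_minutes = np.corrcoef(minutes, score)[0, 1]

print(f"Cov with hours:    {cov_hours:.2f}")
print(f"Cov with minutes:  {cov_minutes:.2f}")
print(f"Corr with hours:   {corr_hours:.4f}")
print(f"Corr with minutes: {corr_minutes:.4f}")
```

The two covariances differ by a factor of 60 even though the underlying relationship is identical, which is why a raw covariance cannot be compared across unit systems.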

Section 03

The Formula

There are two versions of the covariance formula — one for a population and one for a sample. In data science, you will almost always use the sample formula because you are working with a subset of data.

Population Covariance

Cov(X,Y) = Σ (xᵢ − μₓ)(yᵢ − μᵧ) / N

Divide by N (all members). Use only when you have data for the entire population, e.g. all employees in a company.

Sample Covariance

Cov(X,Y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

Divide by n − 1 (Bessel's correction). Use this in practice — it gives an unbiased estimate of the population covariance.

🎯

Why n − 1?

When you estimate the mean from the sample, you "use up" one degree of freedom. Dividing by n − 1 corrects for this and prevents underestimating variability. This is called Bessel's correction.
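One way to see the effect of the correction is to treat a handful of (x, y) pairs as a complete toy population and enumerate every possible sample drawn from it. The sketch below assumes samples of size 3 drawn with replacement; the function name covariance and the five pairs are illustrative choices. Averaged over all samples, the n − 1 version should recover the population covariance, while the divide-by-n version should come out too small.

```python
from itertools import product

import numpy as np

# Toy data, assumed here to be the ENTIRE population
population = [(2, 50), (4, 60), (6, 70), (8, 85), (10, 95)]

def covariance(pairs, ddof):
    """Covariance of (x, y) pairs, dividing by len(pairs) - ddof."""
    x, y = np.array(pairs, dtype=float).T
    return float(np.sum((x - x.mean()) * (y - y.mean())) / (len(pairs) - ddof))

# Population formula: divide by N
true_cov = covariance(population, ddof=0)

# Every ordered sample of size 3 drawn with replacement (5**3 = 125 samples)
samples = list(product(population, repeat=3))

mean_divide_by_n = np.mean([covariance(s, ddof=0) for s in samples])
mean_divide_by_n_minus_1 = np.mean([covariance(s, ddof=1) for s in samples])

print(f"Population covariance:          {true_cov:.3f}")
print(f"Average estimate, divide by n:  {mean_divide_by_n:.3f}")
print(f"Average estimate, divide by n-1: {mean_divide_by_n_minus_1:.3f}")
```

Enumerating all samples rather than simulating random ones keeps the comparison exact, at the cost of only working for very small populations.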

Section 04

Step-by-Step Calculation

Let's use a realistic scenario: five students' hours studied (X) and their exam scores (Y). We want to know whether studying more is associated with scoring higher.

Student	Hours Studied (X)	Exam Score (Y)
A	2	50
B	4	60
C	6	70
D	8	85
E	10	95

🧮 Calculating Sample Covariance

Step 1

Calculate the mean of X (hours studied).
x̄ = (2 + 4 + 6 + 8 + 10) / 5 = 6.0

Step 2

Calculate the mean of Y (exam score).
ȳ = (50 + 60 + 70 + 85 + 95) / 5 = 72.0

Step 3

Compute each deviation product (xᵢ − x̄)(yᵢ − ȳ):
A: (2−6)(50−72) = (−4)(−22) = 88
B: (4−6)(60−72) = (−2)(−12) = 24
C: (6−6)(70−72) = (0)(−2) = 0
D: (8−6)(85−72) = (2)(13) = 26
E: (10−6)(95−72) = (4)(23) = 92

Step 4

Sum the products.
88 + 24 + 0 + 26 + 92 = 230

Step 5

Divide by n − 1 = 5 − 1 = 4.
Cov(X, Y) = 230 / 4 = 57.5

✅

Result: Cov(X, Y) = 57.5

The positive covariance confirms that students who study more hours tend to score higher. The two variables move in the same direction. The raw value 57.5 is in units of (hours × marks), which is hard to interpret on its own — that is why we often move on to computing Pearson's correlation.

Section 05

Python Implementation

Manual Calculation

import numpy as np

X = [2, 4, 6, 8, 10]   # Hours studied
Y = [50, 60, 70, 85, 95]  # Exam scores

x_mean = np.mean(X)   # 6.0
y_mean = np.mean(Y)   # 72.0

n = len(X)
cov_manual = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y)) / (n - 1)

print(f"Manual Sample Covariance: {cov_manual}")
# Output: Manual Sample Covariance: 57.5

Using NumPy

import numpy as np

X = [2, 4, 6, 8, 10]
Y = [50, 60, 70, 85, 95]

# np.cov returns a 2x2 covariance matrix:
# [[Var(X),    Cov(X, Y)],
#  [Cov(Y, X), Var(Y)   ]]
cov_matrix = np.cov(X, Y)

print(cov_matrix)
# [[ 10.   57.5]
#  [ 57.5 332.5]]   ← off-diagonal entries are Cov(X, Y)

print(f"Cov(X, Y) = {cov_matrix[0, 1]}")
# Output: Cov(X, Y) = 57.5 (same as cov_matrix[1, 0])

Using Pandas

import pandas as pd

df = pd.DataFrame({
    'hours': [2, 4, 6, 8, 10],
    'score': [50, 60, 70, 85, 95]
})

# Full covariance matrix for all numeric columns
# (diagonal = variances, off-diagonal = covariances)
print(df.cov())
#        hours  score
# hours   10.0   57.5
# score   57.5  332.5

# Single pairwise value
print(df['hours'].cov(df['score']))
# Output: 57.5

⚠️

np.cov() Divides by n − 1 by Default

Unlike np.std() and np.var(), which default to ddof=0, np.cov() divides by n − 1 unless told otherwise (equivalent to ddof=1), so it returns the sample covariance. If you specifically need the population covariance, pass ddof=0 explicitly: np.cov(X, Y, ddof=0).
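A quick check of these defaults on the study data from the earlier sections. The variable names are illustrative; bias=True is shown as an alternative way of requesting the divide-by-n version.

```python
import numpy as np

X = [2, 4, 6, 8, 10]      # Hours studied
Y = [50, 60, 70, 85, 95]  # Exam scores

sample_cov = np.cov(X, Y)[0, 1]                     # default: divide by n - 1
population_cov = np.cov(X, Y, ddof=0)[0, 1]         # divide by n
population_cov_alt = np.cov(X, Y, bias=True)[0, 1]  # same result as ddof=0

print(sample_cov)      # 57.5
print(population_cov)  # 46.0

# np.var (like np.std) defaults to ddof=0, the opposite of np.cov
print(np.var(X))          # 8.0
print(np.var(X, ddof=1))  # 10.0
```

Because the defaults differ between functions, passing ddof explicitly in both places is a simple way to keep a calculation consistent.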

Section 06

Covariance vs Correlation — Comparison

Covariance and correlation are closely related. Correlation is simply the standardised version of covariance — you divide by the standard deviations of both variables to remove the effect of units.

Property	Covariance	Correlation
Range	(−∞, +∞)	[−1, +1]
Unit-dependent?	Yes	No
Interpretable magnitude?	Hard	Easy
Detects direction	Yes	Yes
Used in PCA?	Yes	Sometimes
Formula	Σ(xᵢ−x̄)(yᵢ−ȳ) / (n−1)	Cov(X,Y) / (σₓ · σᵧ)
Symmetric?	Yes	Yes

📐

The Relationship Formula

r(X,Y) = Cov(X,Y) / (σₓ × σᵧ)
Pearson's correlation r is the covariance divided by the product of the two standard deviations, each computed with the same divisor as the covariance (n − 1 for sample data). This scales the value to [−1, +1] regardless of the original units.
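The relationship formula can be checked on the study data by building r from its parts and comparing the result with np.corrcoef. This is a sketch with illustrative variable names; note that the standard deviations use ddof=1 so that they match the n − 1 divisor used by np.cov.

```python
import numpy as np

X = np.array([2, 4, 6, 8, 10], dtype=float)      # Hours studied
Y = np.array([50, 60, 70, 85, 95], dtype=float)  # Exam scores

cov_xy = np.cov(X, Y)[0, 1]  # sample covariance (n - 1)
s_x = np.std(X, ddof=1)      # sample standard deviation of X
s_y = np.std(Y, ddof=1)      # sample standard deviation of Y

r_manual = cov_xy / (s_x * s_y)
r_numpy = np.corrcoef(X, Y)[0, 1]

print(f"r (from covariance) = {r_manual:.4f}")  # ≈ 0.9972
print(f"r (np.corrcoef)     = {r_numpy:.4f}")   # ≈ 0.9972
```

Mixing a sample covariance with population standard deviations (ddof=0) would give a slightly different number, and one that can fall outside [−1, +1].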

Section 07

Golden Rules

🎯 Covariance — Key Rules

Sign tells direction, not magnitude. Positive = variables move together; negative = they move oppositely. On its own, the raw number does not tell you how strongly they are related, because it also depends on the scale of each variable.

Always use ddof=1 for sample data. np.cov() does this by default, but be explicit if you are computing manually or using other libraries to avoid the population-formula mistake.

Covariance is unit-dependent. If you convert one variable from hours to minutes, the covariance is multiplied by 60. Never compare covariances across different datasets or unit systems; use correlation instead.

Zero covariance ≠ no relationship. Two variables can have a perfect curved relationship (e.g. Y = X², with X values spread symmetrically around zero) and still show zero covariance. Covariance only captures linear association.

Covariance is symmetric. Cov(X, Y) = Cov(Y, X) always. The order of the variables does not matter.

Correlation ≠ causation. A strong covariance (or correlation) between two variables never proves that one causes the other. Always look for confounding variables, like the hot weather driving both ice cream sales and drowning rates.
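The confounding rule can be illustrated with a small simulation. Everything here is synthetic: the coefficients, noise levels and the helper remove_linear_effect are assumptions chosen for the sketch, not real beach data. Temperature drives both series, and neither series influences the other. Once the straight-line effect of temperature is subtracted from each, the covariance between them should largely disappear.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n_days = 5000

# Synthetic data: temperature drives both series; neither affects the other
temperature = rng.normal(25, 5, n_days)
ice_cream = 10 * temperature + rng.normal(0, 20, n_days)
drownings = 0.2 * temperature + rng.normal(0, 1, n_days)

def remove_linear_effect(values, driver):
    """Subtract a straight-line fit of `values` on `driver`, leaving the residuals."""
    slope, intercept = np.polyfit(driver, values, 1)
    return values - (slope * driver + intercept)

# Covariance of the raw series
raw_cov = np.cov(ice_cream, drownings)[0, 1]

# Covariance after removing the part of each series explained by temperature
adjusted_cov = np.cov(
    remove_linear_effect(ice_cream, temperature),
    remove_linear_effect(drownings, temperature),
)[0, 1]

print(f"Covariance, raw:                   {raw_cov:.2f}")
print(f"Covariance, temperature removed:   {adjusted_cov:.2f}")
```

In real data the confounder is rarely known in advance, so a near-zero adjusted covariance for one candidate variable does not rule out others.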