The Ice Cream & Drowning Story
Imagine a researcher combs through 20 years of summer data and discovers something alarming: on days when ice cream sales spike, the number of drowning incidents also rises. Does ice cream cause drowning? Should we ban ice cream at beaches?
Of course not. Both are driven by a third factor — hot weather. When it's hot, people buy more ice cream and swim more, which raises drowning risk. The two variables move together, but one does not cause the other.
Covariance tells you whether two variables tend to increase together, decrease together, or move in opposite directions. It measures the direction of a linear relationship — but not its strength on its own.
Understanding covariance is the first step toward understanding correlation, PCA, and multivariate statistics. Every time a data scientist asks "do these two features move together?", they are thinking about covariance.
What Covariance Measures
Covariance measures how much two variables change together relative to their own means. For each observation, we look at how far variable X is from its mean, and how far variable Y is from its mean. We then multiply those two deviations and average the results.
Positive covariance
- X↑ when Y↑
- X↓ when Y↓
- E.g. height & weight
Negative covariance
- X↑ when Y↓
- X↓ when Y↑
- E.g. price & demand
Zero covariance
- No linear relationship
- A curved (non-linear) relationship may still exist
- E.g. shoe size & IQ
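The three cases can be checked on toy data. The following is a minimal sketch using made-up values chosen purely for illustration; the third series follows y = x², a curve with no linear trend, so its covariance with x comes out as zero even though y depends entirely on x.

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])

# Hypothetical toy series, not taken from any real dataset
y_up = 2 * x + 1      # rises as x rises
y_down = -2 * x + 1   # falls as x rises
y_curve = x ** 2      # fully determined by x, but not linearly


def sample_cov(a, b):
    """Sample covariance: the off-diagonal entry of the 2x2 matrix from np.cov."""
    return np.cov(a, b)[0, 1]


cov_up = sample_cov(x, y_up)        # positive
cov_down = sample_cov(x, y_down)    # negative
cov_curve = sample_cov(x, y_curve)  # zero, despite the exact relationship

print(cov_up, cov_down, cov_curve)
```

A zero covariance therefore rules out a linear trend only; it does not show that the variables are unrelated.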
Covariance has no upper or lower bound. Its value depends on the units of X and Y, making it hard to compare across datasets. Correlation normalises covariance to the range [−1, +1] and is therefore more interpretable. Always know which one you are looking at.
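The unit dependence is easy to see by rescaling one variable. The sketch below assumes a small made-up set of study hours and exam scores, then expresses the same study time in minutes: the covariance grows by a factor of 60 while the correlation stays the same.

```python
import numpy as np

# Hypothetical data: hours studied and exam scores for five students
hours = np.array([2, 4, 6, 8, 10])
score = np.array([50, 60, 70, 85, 95])
minutes = hours * 60  # same information, different unit

cov_hours = np.cov(hours, score)[0, 1]
cov_minutes = np.cov(minutes, score)[0, 1]     # 60 times cov_hours

r_hours = np.corrcoef(hours, score)[0, 1]
r_minutes = np.corrcoef(minutes, score)[0, 1]  # unchanged by the rescaling

print(f"Covariance:  {cov_hours:.1f} (hours) vs {cov_minutes:.1f} (minutes)")
print(f"Correlation: {r_hours:.3f} (hours) vs {r_minutes:.3f} (minutes)")
```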
The Formula
There are two versions of the covariance formula — one for a population and one for a sample. In data science, you will almost always use the sample formula because you are working with a subset of data.
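Written out, the two versions look as follows. Here μ_X and μ_Y are the population means, N is the population size, x̄ and ȳ are the sample means, and n is the sample size; the names σ_XY and s_XY are notation chosen for this sketch.

```latex
\sigma_{XY} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)
\qquad \text{(population)}

s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\qquad \text{(sample)}
```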
When you estimate the mean from the sample, you "use up" one degree of freedom. Dividing by n − 1 corrects for this and prevents underestimating variability. This is called Bessel's correction.
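A small simulation illustrates the effect. This is a sketch under stated assumptions: the bivariate normal population, its covariance of 0.8, the sample size of 5, and the seed are all arbitrary choices for illustration. Averaged over many small samples, dividing by n tends to undershoot the true covariance, while dividing by n − 1 lands close to it.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
true_cov = 0.8                   # assumed population covariance
population = [[1.0, true_cov], [true_cov, 1.0]]
n, trials = 5, 20_000

# Draw many small samples; result has shape (trials, n, 2)
samples = rng.multivariate_normal([0.0, 0.0], population, size=(trials, n))
x, y = samples[..., 0], samples[..., 1]

# Sum of products of deviations from each sample's own means
dx = x - x.mean(axis=1, keepdims=True)
dy = y - y.mean(axis=1, keepdims=True)
sum_products = (dx * dy).sum(axis=1)

mean_div_n = (sum_products / n).mean()          # tends to undershoot
mean_div_n1 = (sum_products / (n - 1)).mean()   # close to true_cov on average

print(f"divide by n:     {mean_div_n:.3f}")
print(f"divide by n - 1: {mean_div_n1:.3f}")
```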
Step-by-Step Calculation
Let's use a realistic scenario: the hours studied (X) and exam scores (Y) of five students. We want to know whether studying more is associated with scoring higher.
| Student | Hours Studied (X) | Exam Score (Y) |
|---|---|---|
| A | 2 | 50 |
| B | 4 | 60 |
| C | 6 | 70 |
| D | 8 | 85 |
| E | 10 | 95 |
Step 1: Compute the means
x̄ = (2 + 4 + 6 + 8 + 10) / 5 = 6.0
ȳ = (50 + 60 + 70 + 85 + 95) / 5 = 72.0
Step 2: Multiply each pair of deviations from the means
A: (2−6)(50−72) = (−4)(−22) = 88
B: (4−6)(60−72) = (−2)(−12) = 24
C: (6−6)(70−72) = (0)(−2) = 0
D: (8−6)(85−72) = (2)(13) = 26
E: (10−6)(95−72) = (4)(23) = 92
Step 3: Sum the products
88 + 24 + 0 + 26 + 92 = 230
Step 4: Divide by n − 1 = 4
Cov(X, Y) = 230 / 4 = 57.5
The positive covariance indicates that, in this sample, students who study more hours tend to score higher. The two variables move in the same direction. The raw value 57.5 is in units of (hours × marks), which is hard to interpret on its own. That is why we often move on to computing Pearson's correlation.
Python Implementation
Manual Calculation
import numpy as np
X = [2, 4, 6, 8, 10] # Hours studied
Y = [50, 60, 70, 85, 95] # Exam scores
x_mean = np.mean(X) # 6.0
y_mean = np.mean(Y) # 72.0
n = len(X)
cov_manual = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y)) / (n - 1)
print(f"Manual Sample Covariance: {cov_manual}")
# Output: Manual Sample Covariance: 57.5
Using NumPy
import numpy as np
X = [2, 4, 6, 8, 10]
Y = [50, 60, 70, 85, 95]
# np.cov returns a 2x2 covariance matrix
cov_matrix = np.cov(X, Y)
print(cov_matrix)
# [[ 10.   57.5]
#  [ 57.5 332.5]]  ← diagonal: Var(X), Var(Y); off-diagonal: Cov(X, Y)
print(f"Cov(X, Y) = {cov_matrix[0, 1]}")
# Output: Cov(X, Y) = 57.5 (same as cov_matrix[1, 0])
Using Pandas
import pandas as pd
df = pd.DataFrame({
    'hours': [2, 4, 6, 8, 10],
    'score': [50, 60, 70, 85, 95],
})
# Full covariance matrix for all numeric columns
print(df.cov())
#        hours  score
# hours   10.0   57.5
# score   57.5  332.5
# Single pairwise value
print(df['hours'].cov(df['score']))
# Output: 57.5
Unlike np.std(), which defaults to ddof=0, np.cov() divides by n − 1 by default (equivalent to ddof=1) and so returns the sample covariance. If you specifically need the population covariance, pass ddof=0 explicitly: np.cov(X, Y, ddof=0).
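The difference between the defaults can be checked directly on the study data. A minimal sketch, with variable names chosen for illustration:

```python
import numpy as np

X = [2, 4, 6, 8, 10]       # hours studied
Y = [50, 60, 70, 85, 95]   # exam scores

sample_cov = np.cov(X, Y)[0, 1]              # default: divides by n - 1
population_cov = np.cov(X, Y, ddof=0)[0, 1]  # divides by n

sample_std = np.std(X, ddof=1)   # n - 1 must be requested explicitly
population_std = np.std(X)       # np.std defaults to ddof=0

print(f"Covariance: sample {sample_cov:.1f}, population {population_cov:.1f}")
print(f"Std of X:   sample {sample_std:.3f}, population {population_std:.3f}")
```

With only five observations the gap is large: the same sum of products, 230, is divided by 4 in one case and by 5 in the other.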
Covariance vs Correlation — Comparison
Covariance and correlation are closely related. Correlation is simply the standardised version of covariance — you divide by the standard deviations of both variables to remove the effect of units.
| Property | Covariance | Correlation |
|---|---|---|
| Range | (−∞, +∞) | [−1, +1] |
| Unit-dependent? | Yes | No |
| Interpretable magnitude? | Hard | Easy |
| Detects direction | Yes | Yes |
| Used in PCA? | Yes | Sometimes |
| Formula | Σ(xᵢ−x̄)(yᵢ−ȳ) / (n−1) | Cov(X,Y) / (σₓ · σᵧ) |
| Symmetric? | Yes | Yes |
r(X,Y) = Cov(X,Y) / (σₓ × σᵧ)
Pearson's correlation r is covariance divided by the product of both standard deviations.
This scales the value to [−1, +1] regardless of the original units.
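Applied to the study data, the formula can be sketched as below. Note the assumption that the standard deviations use the same n − 1 divisor as the covariance (ddof=1); mixing divisors would give a ratio that no longer matches Pearson's r.

```python
import numpy as np

X = [2, 4, 6, 8, 10]       # hours studied
Y = [50, 60, 70, 85, 95]   # exam scores

cov_xy = np.cov(X, Y)[0, 1]   # sample covariance (n - 1)
s_x = np.std(X, ddof=1)       # sample standard deviations, same n - 1 divisor
s_y = np.std(Y, ddof=1)

r_manual = cov_xy / (s_x * s_y)
r_numpy = np.corrcoef(X, Y)[0, 1]   # NumPy's built-in Pearson r

print(f"r (manual) = {r_manual:.3f}")
print(f"r (NumPy)  = {r_numpy:.3f}")
```

Both values come out close to 1, which says the relationship in this sample is strongly linear, something the raw covariance of 57.5 could not tell us on its own.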
Golden Rules
Divide by n − 1 when computing covariance from a sample. np.cov() does this by default, but be explicit if you are computing manually or using other libraries, to avoid the population-formula mistake.