The Problem Box Plots Solve
You have a dataset of 100 house prices. You could print all 100 numbers, but that tells you nothing at a glance. You could calculate the mean, but one mansion skews it. You could calculate the minimum, Q1, the median (Q2), Q3, and the maximum by hand, but that is five separate numbers to hold in your head.
A box plot (also called a box-and-whisker plot) solves all of this in one diagram. It shows you the median, the spread of the middle 50%, the full range of normal values, and every outlier — all at once, instantly readable.
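To make the problem concrete, here is a minimal sketch using only Python's standard library. The `prices` list is invented for illustration (it is not from a real dataset), and `method="inclusive"` is chosen because it gives the same linear-interpolation quartiles as NumPy's default percentile.

```python
from statistics import mean, median, quantiles

# Hypothetical house prices in £k; the last value plays the role of the mansion.
prices = [180, 195, 210, 225, 240, 255, 270, 285, 300, 2500]

# Quartiles by linear interpolation between the sorted data points.
q1, q2, q3 = quantiles(prices, n=4, method="inclusive")

five_numbers = {
    "min": min(prices),
    "q1": q1,
    "median": q2,
    "q3": q3,
    "max": max(prices),
}

print(f"mean   = {mean(prices):.1f}")    # pulled far upwards by the single mansion
print(f"median = {median(prices):.1f}")  # barely affected by it
print(five_numbers)
```

The mean lands well above nine of the ten prices, while the median stays in the middle of the typical houses. The five numbers in `five_numbers` are what a box plot turns into a picture.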
The box plot was invented by statistician John Tukey in 1970 — the same person who defined the 1.5 × IQR outlier rule. He designed it specifically to summarise large datasets in a single visual without losing information about spread, skewness, or extreme values. It remains one of the most widely used charts in data science today.
Anatomy of a Box Plot
Every box plot has five components. Learn these and you can read any box plot instantly.
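The pieces a box plot draws (the box from Q1 to Q3, the median line, the two whiskers, and any outlier points) can be computed directly. The sketch below assumes the common Tukey convention of fences at 1.5 × IQR. The helper name `box_plot_components` and the `sample` values are hypothetical, and other quartile conventions would give slightly different numbers.

```python
from statistics import quantiles


def box_plot_components(values, whis=1.5):
    """Return the values a Tukey-style box plot would draw (illustrative helper)."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences are cut-offs, not drawn lines: points beyond them count as outliers.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]

    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # smallest value still inside the lower fence
        "whisker_high": inside[-1],  # largest value still inside the upper fence
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 40]  # made-up values
parts = box_plot_components(sample)
for name, value in parts.items():
    print(f"{name:>12}: {value}")
```

Note that the whiskers end at real data points (12 and 21 here), not at the fences themselves, and that 40 is reported separately as an outlier.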
How to Read a Box Plot — Step by Step
| What you see | What it means | Takeaway |
|---|---|---|
| Narrow box | Middle 50% is tightly clustered — low IQR | Data is consistent and predictable |
| Wide box | High spread in the typical range — large IQR | High variability in typical values |
| Median left of centre | More values cluster at the low end — right skew | Use median not mean to summarise |
| Median right of centre | More values cluster at the high end — left skew | Use median not mean to summarise |
| Long right whisker | Large normal spread on the high side | Check for skewness before modelling |
| Dots beyond whisker | Outliers: values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR | Investigate: error, anomaly, or genuine extreme |
| Many outlier dots | Heavy-tailed distribution | Consider robust statistics or transformation |
Glance at the median line position to check skewness. Glance at the box width to judge spread. Glance for dots outside whiskers to spot outliers. That is all you need in 5 seconds to understand a dataset's shape.
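The same three glances can be approximated in code. This is a rough sketch rather than a formal skewness test: the function name `quick_read` and the input values are invented for illustration, and the position of the median inside the box is only a hint about skew.

```python
from statistics import quantiles


def quick_read(values, whis=1.5):
    """Mimic the three visual checks: median position, box size, outlier dots."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Median nearer Q1 means the upper half of the box is longer: a right-skew hint.
    lower_half, upper_half = med - q1, q3 - med
    if upper_half > lower_half:
        skew_hint = "right"
    elif upper_half < lower_half:
        skew_hint = "left"
    else:
        skew_hint = "roughly symmetric"

    # Count points beyond the 1.5 x IQR fences (the dots on the plot).
    n_outliers = sum(v < q1 - whis * iqr or v > q3 + whis * iqr for v in data)
    return {"skew_hint": skew_hint, "iqr": iqr, "outliers": n_outliers}


result = quick_read([2, 3, 3, 4, 4, 5, 6, 8, 11, 30])  # made-up values
print(result)
```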
Drawing Box Plots in Python
Basic box plot with Matplotlib
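```python
import matplotlib.pyplot as plt

# Salaries in £k; 380 is far above the rest and will be drawn as an outlier
salaries = [32, 33, 34, 48, 49, 51,
            72, 75, 78, 95, 120, 380]

fig, ax = plt.subplots(figsize=(10, 4))

# vert=False draws a horizontal box; patch_artist=True lets the box be filled
bp = ax.boxplot(salaries, vert=False, patch_artist=True, widths=0.5)

# Style the box, the median line, and the outlier markers
bp['boxes'][0].set_facecolor('#378add'); bp['boxes'][0].set_alpha(0.3)
bp['medians'][0].set_color('#1d9e75'); bp['medians'][0].set_linewidth(2.5)
# markeredgecolor (not color) controls the outline of a hollow marker
bp['fliers'][0].set(marker='o', markeredgecolor='#e24b4a',
                    markerfacecolor='none', markersize=8)

ax.set_title('Salary Distribution: Box Plot', fontsize=13)
ax.set_xlabel('Salary (£k)')
ax.set_yticks([])  # a single box needs no category axis
plt.tight_layout()
plt.show()
```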
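If you want the numbers behind the picture, Matplotlib exposes the statistics it computes for a box plot through `matplotlib.cbook.boxplot_stats`. The sketch below reuses the salary figures from the example above; the key names shown (`q1`, `med`, `q3`, `whislo`, `whishi`, `fliers`) are the ones that function returns.

```python
from matplotlib import cbook

salaries = [32, 33, 34, 48, 49, 51,
            72, 75, 78, 95, 120, 380]

# boxplot_stats returns one dict per dataset; there is a single dataset here.
stats = cbook.boxplot_stats(salaries)[0]

print("Q1, median, Q3:", stats["q1"], stats["med"], stats["q3"])
print("Whisker ends:  ", stats["whislo"], stats["whishi"])
print("Outliers:      ", list(stats["fliers"]))
```

Here the upper whisker stops at 120, the largest salary inside the upper fence, and 380 is the only point reported as an outlier.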
Comparing multiple groups side by side
```python
import matplotlib.pyplot as plt

# Salaries in £k per department; the last value in each list is an extreme
departments = {
    'Engineering': [65, 72, 78, 85, 90, 95, 68, 74, 82, 88, 150],
    'Marketing': [45, 48, 52, 55, 58, 61, 65, 47, 53, 60, 110],
    'Sales': [35, 40, 48, 52, 55, 60, 70, 42, 50, 58, 200],
    'HR': [38, 42, 44, 46, 48, 50, 52, 43, 47, 49, 90],
}

fig, ax = plt.subplots(figsize=(10, 5))
colors = ['#378add', '#1d9e75', '#ba7517', '#7f77dd']

# One box per department, drawn in dictionary order
bp = ax.boxplot(list(departments.values()),
                patch_artist=True, widths=0.5)
# Label the boxes on the axis itself: boxplot's `labels` argument was
# renamed `tick_labels` in Matplotlib 3.9, so this works on either side
ax.set_xticklabels(list(departments.keys()))

for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color); patch.set_alpha(0.3)
for median in bp['medians']:
    median.set_color('#1a1d2e'); median.set_linewidth(2)
for flier in bp['fliers']:
    flier.set(marker='o', markerfacecolor='none',
              markeredgecolor='#e24b4a', markersize=8)

ax.set_title('Salary Distribution by Department', fontsize=13)
ax.set_ylabel('Salary (£k)')
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
```
Box plot with Seaborn
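```python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Long-format data: one row per employee, with a salary (£k) and a department
data = {
    'salary': [65, 72, 78, 85, 90, 95, 68, 74, 82, 88, 150,
               45, 48, 52, 55, 58, 61, 65, 47, 53, 60, 110,
               35, 40, 48, 52, 55, 60, 70, 42, 50, 58, 200],
    'dept': ['Engineering']*11 + ['Marketing']*11 + ['Sales']*11
}
df = pd.DataFrame(data)

fig, ax = plt.subplots(figsize=(10, 5))
# Seaborn groups by the x column and draws one box per department.
# On seaborn 0.13+, passing palette without hue emits a FutureWarning;
# adding hue='dept', legend=False silences it.
sns.boxplot(data=df, x='dept', y='salary',
            palette=['#378add', '#1d9e75', '#ba7517'],
            width=0.5, linewidth=1.5,
            flierprops=dict(marker='o', markerfacecolor='none',
                            markeredgecolor='#e24b4a', markersize=8), ax=ax)

ax.set_title('Salary Distribution by Department', fontsize=13)
ax.set_xlabel('Department')
ax.set_ylabel('Salary (£k)')
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
```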
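With the data already in a DataFrame, the quartiles behind each box can be tabulated alongside the chart. This is a small sketch using the same salary figures as the Seaborn example; the variable name `summary` is chosen here for illustration, and pandas' default linear interpolation is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    'salary': [65, 72, 78, 85, 90, 95, 68, 74, 82, 88, 150,
               45, 48, 52, 55, 58, 61, 65, 47, 53, 60, 110,
               35, 40, 48, 52, 55, 60, 70, 42, 50, 58, 200],
    'dept': ['Engineering']*11 + ['Marketing']*11 + ['Sales']*11,
})

# One row per department, one column per quantile (0.25, 0.5, 0.75)
summary = df.groupby('dept')['salary'].quantile([0.25, 0.5, 0.75]).unstack()
summary['iqr'] = summary[0.75] - summary[0.25]
print(summary)
```

Reading the table next to the plot is a quick way to confirm which department has the highest median and which has the widest box.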
Reading Box Plots — Three Real Stories
Story 1 — The Reliable Machine
```python
import matplotlib.pyplot as plt

# Biscuit weights in grams from two machines with the same 20g target
machine_a = [19.8, 20.0, 20.1, 19.9, 20.2, 20.0, 19.8, 20.1, 20.0, 19.9]
machine_b = [14.0, 26.0, 18.0, 22.0, 20.0, 28.0, 12.0, 24.0, 16.0, 20.0]

fig, ax = plt.subplots(figsize=(8, 4))
# Vertical boxes are the default orientation
bp = ax.boxplot([machine_a, machine_b], patch_artist=True)
# Label via the axis: boxplot's `labels` was renamed `tick_labels` in Matplotlib 3.9
ax.set_xticklabels(['Machine A', 'Machine B'])

bp['boxes'][0].set_facecolor('#1d9e75'); bp['boxes'][0].set_alpha(0.3)
bp['boxes'][1].set_facecolor('#e24b4a'); bp['boxes'][1].set_alpha(0.3)

# Dashed reference line at the target weight
ax.axhline(20, color='#7f77dd', linestyle='--', alpha=0.5, label='Target: 20g')
ax.set_title('Biscuit Weight: Machine A vs Machine B')
ax.set_ylabel('Weight (g)')
ax.legend()
plt.tight_layout()
plt.show()
```
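Machine A: a tiny box hugging the 20g target line, with barely visible whiskers and no outliers. Machine B: a tall box with long whiskers stretching from 12g to 28g. With this data, neither machine produces outlier dots; the difference is entirely in the spread. The two means are almost identical (about 20g), yet the stories are completely different. A box plot communicates this in one glance without a single number.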
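The contrast can be checked numerically. The sketch below uses the same weights as the plot and only the standard library; the `iqr` helper is defined here for illustration.

```python
from statistics import mean, quantiles

machine_a = [19.8, 20.0, 20.1, 19.9, 20.2, 20.0, 19.8, 20.1, 20.0, 19.9]
machine_b = [14.0, 26.0, 18.0, 22.0, 20.0, 28.0, 12.0, 24.0, 16.0, 20.0]


def iqr(values):
    """Interquartile range, i.e. the height of the box."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


for name, weights in [("Machine A", machine_a), ("Machine B", machine_b)]:
    spread = max(weights) - min(weights)
    print(f"{name}: mean = {mean(weights):.2f} g, "
          f"IQR = {iqr(weights):.2f} g, range = {spread:.1f} g")
```

The means differ by only a few hundredths of a gram, while Machine B's interquartile range is many times larger than Machine A's.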
Story 2 — Exam Score Distribution
```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated exam scores: Class A is tightly grouped, Class B is widely spread
np.random.seed(0)
class_a = np.clip(np.random.normal(68, 5, 40), 40, 100)   # mean 68, sd 5
class_b = np.clip(np.random.normal(65, 18, 40), 20, 100)  # mean 65, sd 18

fig, ax = plt.subplots(figsize=(8, 4))
bp = ax.boxplot([class_a, class_b],
                patch_artist=True,
                flierprops=dict(marker='o', markerfacecolor='none',
                                markeredgecolor='#e24b4a', markersize=7))
# Label via the axis: boxplot's `labels` was renamed `tick_labels` in Matplotlib 3.9
ax.set_xticklabels(['Class A', 'Class B'])

bp['boxes'][0].set_facecolor('#378add'); bp['boxes'][0].set_alpha(0.3)
bp['boxes'][1].set_facecolor('#ba7517'); bp['boxes'][1].set_alpha(0.3)

ax.set_title('Exam Score Distribution: Two Classes')
ax.set_ylabel('Score'); ax.set_ylim(0, 110); ax.grid(axis='y', alpha=0.3)
plt.tight_layout(); plt.show()
```
Story 3 — Spotting Sensor Faults
```python
import numpy as np
import matplotlib.pyplot as plt

# 50 simulated readings around 21°C, plus three injected faulty values
np.random.seed(1)
normal = np.random.normal(21, 0.5, 50)
faulty = np.append(normal, [35.2, 38.7, 2.1])

fig, ax = plt.subplots(figsize=(8, 4))
# Filled red diamonds make the outliers stand out
bp = ax.boxplot(faulty, vert=False, patch_artist=True,
                flierprops=dict(marker='D', markerfacecolor='#e24b4a',
                                markeredgecolor='#e24b4a', markersize=9))
bp['boxes'][0].set_facecolor('#378add'); bp['boxes'][0].set_alpha(0.25)
bp['medians'][0].set_color('#1d9e75'); bp['medians'][0].set_linewidth(2)

ax.set_title('Room Temperature: Sensor Readings')
ax.set_xlabel('Temperature (°C)'); ax.set_yticks([])

# Point an arrow at each injected fault (the box sits at y = 1)
for x in [35.2, 38.7, 2.1]:
    ax.annotate('Faulty reading', xy=(x, 1), xytext=(x, 1.3),
                fontsize=9, color='#e24b4a', ha='center',
                arrowprops=dict(arrowstyle='->', color='#e24b4a'))
plt.tight_layout(); plt.show()
```
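Once the plot has shown that something is wrong, the next step is usually to locate the suspect readings in the raw data. The sketch below applies the 1.5 × IQR fences and keeps each flagged reading's index so it can be traced back to a timestamp or log entry. The `readings` list is a small made-up series, not the simulated data above.

```python
from statistics import quantiles

# Hypothetical temperature log in °C, in the order the readings arrived
readings = [20.6, 21.1, 20.9, 21.4, 35.2, 21.0,
            20.8, 2.1, 21.3, 20.7, 38.7, 21.2]

q1, _, q3 = quantiles(readings, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep (position, value) pairs so each suspect reading can be traced back
flagged = [(i, r) for i, r in enumerate(readings) if r < low or r > high]
clean = [r for r in readings if low <= r <= high]

print(f"fences: {low:.2f} to {high:.2f} °C")
print("flagged:", flagged)
print("kept:", len(clean), "of", len(readings))
```

Flagging is not the same as deleting: whether a flagged value is a sensor fault or a genuine event still needs a human decision.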
Box Plot vs Histogram — Which to Use?
| Situation | Box plot | Histogram |
|---|---|---|
| Comparing multiple groups | Better | Cluttered |
| Spotting outliers | Better (drawn as individual points) | Harder (visible only as isolated bars) |
| Understanding distribution shape | Limited | Better |
| Reading exact quartile values | Built in | Manual |
| Summarising in reports | Compact | Takes more space |
| Seeing bimodal distribution | Invisible | Clear |
Use a histogram first to understand the shape of a single variable. Switch to a box plot when you need to compare groups, spot outliers, or summarise data compactly.
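The bimodality row in the table is worth seeing with numbers. In this sketch, two invented datasets share exactly the same five-number summary, so their box plots would look identical, yet simple histogram-style bin counts reveal that one is flat and the other has two peaks. The helper names are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: everything the box and whiskers encode."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


def bin_counts(values, width=5):
    """Histogram-style counts, keyed by the left edge of each bin."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


flat = list(range(21))                                         # evenly spread 0..20
two_peaks = [0, 5, 5, 5, 5, 5, 10, 15, 15, 15, 15, 15, 20]     # clusters at 5 and 15

print(five_number_summary(flat))
print(five_number_summary(two_peaks))
print(bin_counts(flat))
print(bin_counts(two_peaks))
```

Both summaries come out the same, but the bin counts for `two_peaks` dip sharply in the middle bin, which is the pattern a box plot cannot show.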