
Understanding Box Plots

A box plot is a visual summary of quartiles, IQR and outliers in a single chart. Learn how to read, build and interpret box plots with real stories and Python code using Matplotlib and Seaborn.

[Figure: annotated horizontal box plot marking the minimum, Q1, median (Q2), Q3, maximum, the IQR, and an outlier on each side; axis ticks run from 60 to 530.]
Box
The rectangle spans from Q1 to Q3 — it contains the middle 50% of all data. Its width is the IQR. A narrow box means values cluster tightly. A wide box means high spread in the typical range.
Median line
The vertical line inside the box marks Q2 — the median. Where it sits inside the box tells you about skewness. Centred = symmetric distribution. Shifted toward Q1 = right-skewed. Shifted toward Q3 = left-skewed.
Whiskers
The lines extending from the box reach to the smallest and largest non-outlier values — typically within 1.5 × IQR of Q1 and Q3. They show how far the bulk of data stretches beyond the middle 50%.
Outliers
Points beyond the whisker fences — farther than 1.5 × IQR from Q1 or Q3 — are plotted individually as circles. They are not part of the whisker range and may warrant investigation.
IQR
The Interquartile Range is Q3 − Q1. It is the core spread metric: robust to outliers and anchored entirely in the middle half of the data.
Section 01

The Problem Box Plots Solve

You have a dataset of 100 house prices. You could print all 100 numbers, but that tells you nothing at a glance. You could calculate the mean, but one mansion skews it. You could calculate the minimum, Q1, the median (Q2), Q3 and the maximum by hand, but that is five separate numbers to hold in your head.

A box plot (also called a box-and-whisker plot) solves all of this in one diagram. It shows you the median, the spread of the middle 50%, the full range of normal values, and every outlier — all at once, instantly readable.
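To see how a single extreme value distorts the mean, here is a minimal sketch using Python's standard `statistics` module. The ten prices are made-up illustrative values (in £k), not data from this article.

```python
import statistics

# Hypothetical house prices in £k: nine ordinary homes and one mansion
prices = [180, 195, 210, 220, 235, 240, 255, 270, 290, 2500]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

print(f"Mean:   {mean_price}")    # pulled upward by the single mansion
print(f"Median: {median_price}")  # stays with the typical homes
```

With these numbers the mean is higher than nine of the ten prices, while the median still lands among the ordinary homes.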

💡
Invented by John Tukey

The box plot was invented by statistician John Tukey in 1970 — the same person who defined the 1.5 × IQR outlier rule. He designed it specifically to summarise large datasets in a single visual without losing information about spread, skewness, or extreme values. It remains one of the most widely used charts in data science today.


Section 02

Anatomy of a Box Plot

Every box plot has five components. Learn these and you can read any box plot instantly.

📦 The Five Parts of a Box Plot
Box
The rectangle spans from Q1 to Q3 — it contains the middle 50% of all data. Its width is the IQR. A narrow box means values cluster tightly. A wide box means high spread in the typical range.
Median line
The vertical line inside the box marks Q2 — the median. Where it sits inside the box tells you about skewness. Centred = symmetric data. Left of centre = right-skewed. Right of centre = left-skewed.
Whiskers
Lines extending from each side of the box. They reach to the last data point still within the fence (Q1 − 1.5×IQR on the left, Q3 + 1.5×IQR on the right). They do not always reach the fence — only if a data point exists there.
Fences
The invisible boundaries at Q1 − 1.5×IQR and Q3 + 1.5×IQR. Data points beyond the fences are outliers. The whiskers never extend past a fence: each one stops at the last data point on or inside it.
Outlier dots
Individual points plotted beyond the whisker ends. Each dot is a single data value that fell outside the 1.5×IQR fence. Multiple dots = multiple outliers.
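The five parts above can be computed directly. The following is a minimal sketch, not a reference implementation: the helper name `box_plot_stats` and the sample salaries are assumptions for illustration, and quartiles use the standard library's "inclusive" method, which interpolates linearly between data points.

```python
import statistics

def box_plot_stats(values, k=1.5):
    """Return the numbers a box plot draws, using the k x IQR fence rule."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points,
    # the same convention as NumPy's default percentile method
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        # whiskers end at real data points, not at the fences
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Sample salaries in £k (illustrative data)
salaries = [32, 33, 34, 48, 49, 51, 72, 75, 78, 95, 120, 380]
stats = box_plot_stats(salaries)
for name, value in stats.items():
    print(f"{name:>12}: {value}")
```

For this sample the upper fence is 138.875, but the upper whisker ends at 120, the largest value inside the fence, and 380 is the only outlier.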

Section 03

How to Read a Box Plot — Step by Step

What you see | What it means | Action
Narrow box | Middle 50% is tightly clustered (low IQR) | Data is consistent and predictable
Wide box | High spread in the typical range (large IQR) | High variability in typical values
Median left of centre | More values cluster at the low end (right skew) | Use median not mean to summarise
Median right of centre | More values cluster at the high end (left skew) | Use median not mean to summarise
Long right whisker | Large normal spread on the high side | Check for skewness before modelling
Dots beyond whisker | Outliers: values beyond the 1.5×IQR fence | Investigate: error, anomaly, or genuine extreme
Many outlier dots | Heavy-tailed distribution | Consider robust statistics or transformation
🎯
The 5-Second Box Plot Rule

Glance at the median line position to check skewness. Glance at the box width to judge spread. Glance for dots outside whiskers to spot outliers. That is all you need in 5 seconds to understand a dataset's shape.
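The three glances can be approximated in code. This is a rough sketch under stated assumptions: the function name `five_second_summary` is hypothetical, and the `tolerance` of 0.1 around the box centre is an arbitrary illustrative threshold, not an established rule.

```python
import statistics

def five_second_summary(values, k=1.5, tolerance=0.1):
    """Numeric version of the three glances: median position, box width, outliers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # 0.0 means the median sits on Q1, 1.0 means it sits on Q3
    position = (median - q1) / iqr if iqr else 0.5
    if position < 0.5 - tolerance:
        skew_hint = "right-skewed"
    elif position > 0.5 + tolerance:
        skew_hint = "left-skewed"
    else:
        skew_hint = "roughly symmetric"
    outliers = [v for v in data if v < q1 - k * iqr or v > q3 + k * iqr]
    return {"skew_hint": skew_hint, "box_width": iqr, "outliers": outliers}

# Two small made-up datasets: one evenly spaced, one with a long right tail
even = five_second_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
tailed = five_second_summary([1, 1, 2, 2, 3, 5, 8, 13, 40])
print(even)
print(tailed)
```

The median position is only a hint about skewness inside the box. It says nothing about the tails, so the outlier list should be read alongside it.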


Section 04

Drawing Box Plots in Python

Basic box plot with Matplotlib

import matplotlib.pyplot as plt

salaries = [32, 33, 34, 48, 49, 51,
            72, 75, 78, 95, 120, 380]

fig, ax = plt.subplots(figsize=(10, 4))

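# patch_artist=True draws the box as a fillable patch; vert=False lays the plot out horizontally
bp = ax.boxplot(salaries, vert=False, patch_artist=True, widths=0.5)

# Style the box, the median line and the outlier markers
bp['boxes'][0].set_facecolor('#378add');  bp['boxes'][0].set_alpha(0.3)
bp['medians'][0].set_color('#1d9e75');    bp['medians'][0].set_linewidth(2.5)
# markeredgecolor (not color) controls the outline of the flier markers
bp['fliers'][0].set(marker='o', markeredgecolor='#e24b4a',
                    markerfacecolor='none', markersize=8)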

ax.set_title('Salary Distribution — Box Plot', fontsize=13)
ax.set_xlabel('Salary (£k)')
ax.set_yticks([])
plt.tight_layout()
plt.show()
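If you want the numbers behind the picture, Matplotlib exposes the statistics it draws through `matplotlib.cbook.boxplot_stats`. A minimal sketch, reusing the salary list from the example above:

```python
from matplotlib import cbook

salaries = [32, 33, 34, 48, 49, 51,
            72, 75, 78, 95, 120, 380]

# boxplot_stats returns one dict per dataset; whis=1.5 is the default fence multiplier
stats = cbook.boxplot_stats(salaries, whis=1.5)[0]

print("Q1:", stats['q1'], "median:", stats['med'], "Q3:", stats['q3'])
print("IQR:", stats['iqr'])
print("Whisker ends:", stats['whislo'], "to", stats['whishi'])
print("Outliers:", list(stats['fliers']))
```

Checking these values is a quick way to confirm that a dot on the chart really is the data point you think it is.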

Comparing multiple groups side by side

import matplotlib.pyplot as plt

departments = {
    'Engineering': [65, 72, 78, 85, 90, 95, 68, 74, 82, 88, 150],
    'Marketing':   [45, 48, 52, 55, 58, 61, 65, 47, 53, 60, 110],
    'Sales':       [35, 40, 48, 52, 55, 60, 70, 42, 50, 58, 200],
    'HR':          [38, 42, 44, 46, 48, 50, 52, 43, 47, 49,  90],
}

fig, ax = plt.subplots(figsize=(10, 5))
colors = ['#378add', '#1d9e75', '#ba7517', '#7f77dd']

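bp = ax.boxplot(list(departments.values()),
                patch_artist=True, widths=0.5)
# Label the boxes via the axis: the boxplot `labels` keyword is deprecated
# in recent Matplotlib releases (renamed to `tick_labels` in 3.9)
ax.set_xticklabels(list(departments.keys()))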

for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color); patch.set_alpha(0.3)
for median in bp['medians']:
    median.set_color('#1a1d2e'); median.set_linewidth(2)
for flier in bp['fliers']:
    flier.set(marker='o', markerfacecolor='none',
              markeredgecolor='#e24b4a', markersize=8)

ax.set_title('Salary Distribution by Department', fontsize=13)
ax.set_ylabel('Salary (£k)')
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
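The same comparison can be made numerically. The sketch below uses only the standard library; the helper name `summarise` is an assumption for illustration, and quartiles use linear interpolation (the "inclusive" method).

```python
import statistics

departments = {
    'Engineering': [65, 72, 78, 85, 90, 95, 68, 74, 82, 88, 150],
    'Marketing':   [45, 48, 52, 55, 58, 61, 65, 47, 53, 60, 110],
    'Sales':       [35, 40, 48, 52, 55, 60, 70, 42, 50, 58, 200],
    'HR':          [38, 42, 44, 46, 48, 50, 52, 43, 47, 49,  90],
}

def summarise(values, k=1.5):
    """Return (median, IQR, outliers) using the k x IQR fence rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    outliers = [v for v in data if v < q1 - k * iqr or v > q3 + k * iqr]
    return median, iqr, outliers

summary = {name: summarise(values) for name, values in departments.items()}

print(f"{'Department':<12} {'Median':>7} {'IQR':>6}  Outliers")
for name, (median, iqr, outliers) in summary.items():
    print(f"{name:<12} {median:>7} {iqr:>6}  {outliers}")
```

On this data Engineering has the highest median (82) and the widest box (IQR 16), HR has the narrowest box (IQR 6), and every department has exactly one high outlier.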

Box plot with Seaborn

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

data = {
    'salary': [65,72,78,85,90,95,68,74,82,88,150,
               45,48,52,55,58,61,65,47,53,60,110,
               35,40,48,52,55,60,70,42,50,58,200],
    'dept':   ['Engineering']*11 + ['Marketing']*11 + ['Sales']*11
}
df = pd.DataFrame(data)

fig, ax = plt.subplots(figsize=(10, 5))
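# Note: seaborn 0.13+ warns when palette is passed without hue;
# there, adding hue='dept' and legend=False gives the same colouring
sns.boxplot(data=df, x='dept', y='salary',
            palette=['#378add','#1d9e75','#ba7517'],
            width=0.5, linewidth=1.5,
            flierprops=dict(marker='o', markerfacecolor='none',
                            markeredgecolor='#e24b4a', markersize=8), ax=ax)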
ax.set_title('Salary Distribution by Department', fontsize=13)
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

Section 05

Reading Box Plots — Three Real Stories

Story 1 — The Reliable Machine

import matplotlib.pyplot as plt

machine_a = [19.8, 20.0, 20.1, 19.9, 20.2, 20.0, 19.8, 20.1, 20.0, 19.9]
machine_b = [14.0, 26.0, 18.0, 22.0, 20.0, 28.0, 12.0, 24.0, 16.0, 20.0]

fig, ax = plt.subplots(figsize=(8, 4))
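bp = ax.boxplot([machine_a, machine_b],
                patch_artist=True, vert=True)
# Set tick labels on the axis; the boxplot `labels` keyword is deprecated in Matplotlib 3.9+
ax.set_xticklabels(['Machine A', 'Machine B'])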

bp['boxes'][0].set_facecolor('#1d9e75'); bp['boxes'][0].set_alpha(0.3)
bp['boxes'][1].set_facecolor('#e24b4a'); bp['boxes'][1].set_alpha(0.3)

ax.axhline(20, color='#7f77dd', linestyle='--', alpha=0.5, label='Target: 20g')
ax.set_title('Biscuit Weight — Machine A vs Machine B')
ax.set_ylabel('Weight (g)')
ax.legend()
plt.tight_layout()
plt.show()
What the box plot shows instantly

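Machine A: a tiny box hugging the 20g target line, with whiskers barely visible and no outliers. Machine B: a large box and long whiskers stretching from 12g to 28g, also with no outliers, because its readings are spread evenly rather than punctuated by a few extremes. The means are almost identical (19.98g and 20.00g), yet the story is completely different. A box plot communicates this in one glance without a single number.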
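A quick numeric check of the same story, as a minimal sketch using the standard library. The helper name `mean_and_iqr` is an assumption for illustration.

```python
import statistics

machine_a = [19.8, 20.0, 20.1, 19.9, 20.2, 20.0, 19.8, 20.1, 20.0, 19.9]
machine_b = [14.0, 26.0, 18.0, 22.0, 20.0, 28.0, 12.0, 24.0, 16.0, 20.0]

def mean_and_iqr(values):
    """Return the mean and the interquartile range (linear interpolation)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.mean(values), q3 - q1

for name, weights in [("Machine A", machine_a), ("Machine B", machine_b)]:
    mean, iqr = mean_and_iqr(weights)
    print(f"{name}: mean = {mean:.2f} g, IQR = {iqr:.3f} g")
```

The means differ by only about 0.02 g, while Machine B's IQR (7 g) is roughly 40 times Machine A's (about 0.175 g). The mean hides exactly the difference the box width makes obvious.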

Story 2 — Exam Score Distribution

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
class_a = np.clip(np.random.normal(68, 5, 40), 40, 100)
class_b = np.clip(np.random.normal(65, 18, 40), 20, 100)

fig, ax = plt.subplots(figsize=(8, 4))
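bp = ax.boxplot([class_a, class_b],
                patch_artist=True,
                flierprops=dict(marker='o', markerfacecolor='none',
                                markeredgecolor='#e24b4a', markersize=7))
# Set tick labels on the axis; the boxplot `labels` keyword is deprecated in Matplotlib 3.9+
ax.set_xticklabels(['Class A', 'Class B'])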
bp['boxes'][0].set_facecolor('#378add'); bp['boxes'][0].set_alpha(0.3)
bp['boxes'][1].set_facecolor('#ba7517'); bp['boxes'][1].set_alpha(0.3)
ax.set_title('Exam Score Distribution — Two Classes')
ax.set_ylabel('Score'); ax.set_ylim(0, 110); ax.grid(axis='y', alpha=0.3)
plt.tight_layout(); plt.show()

Story 3 — Spotting Sensor Faults

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)
normal = np.random.normal(21, 0.5, 50)
faulty = np.append(normal, [35.2, 38.7, 2.1])

fig, ax = plt.subplots(figsize=(8, 4))
bp = ax.boxplot(faulty, vert=False, patch_artist=True,
                flierprops=dict(marker='D', markerfacecolor='#e24b4a',
                                markeredgecolor='#e24b4a', markersize=9))
bp['boxes'][0].set_facecolor('#378add'); bp['boxes'][0].set_alpha(0.25)
bp['medians'][0].set_color('#1d9e75');   bp['medians'][0].set_linewidth(2)
ax.set_title('Room Temperature — Sensor Readings')
ax.set_xlabel('Temperature (°C)'); ax.set_yticks([])
for x in [35.2, 38.7, 2.1]:
    ax.annotate('Faulty reading', xy=(x, 1), xytext=(x, 1.3),
                fontsize=9, color='#e24b4a', ha='center',
                arrowprops=dict(arrowstyle='->', color='#e24b4a'))
plt.tight_layout(); plt.show()

Section 06

Box Plot vs Histogram — Which to Use?

Situation | Box plot | Histogram
Comparing multiple groups | Better | Cluttered
Spotting outliers | Better | Not flagged explicitly
Understanding distribution shape | Limited | Better
Reading exact quartile values | Built in | Manual
Summarising in reports | Compact | Takes more space
Seeing bimodal distribution | Invisible | Clear
📐
Best Practice

Use a histogram first to understand the shape of a single variable. Switch to a box plot when you need to compare groups, spot outliers, or summarise data compactly.
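To illustrate why the histogram comes first, here is a small sketch with made-up readings that form two separate clusters. It prints the box plot statistics and then a text histogram built with `collections.Counter`. The data and the bin width of 5 are assumptions chosen for illustration.

```python
import statistics
from collections import Counter

# Hypothetical readings with two separate clusters (around 20 and around 40)
data = [18, 19, 20, 20, 21, 21, 22, 23,
        38, 39, 40, 40, 41, 41, 42, 43]

# Box plot view: quartiles and 1.5 x IQR outliers
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
outliers = [v for v in data if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
print(f"Box plot view: Q1={q1}, median={median}, Q3={q3}, outliers={outliers}")

# Histogram view: count values in bins of width 5
bin_counts = Counter((v // 5) * 5 for v in data)
for start in range(15, 45, 5):
    print(f"{start:>2}-{start + 4}: {'#' * bin_counts[start]}")
```

The box plot numbers look unremarkable: the median (30.5) sits exactly in the middle of the box and there are no outliers. The bin counts expose the gap: no values at all between 25 and 34, which is exactly where the median falls.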


Section 07

Golden Rules

🎯 Box Plots — Key Rules
1. The whiskers stop at the last data point within the fence, not at the fence itself. A short whisker means no data exists near the boundary.
2. Skewness is visible in the median line position. Left of centre = right-skewed (mean typically higher than median). Right of centre = left-skewed.
3. Box plots are best for comparing groups. Side-by-side box plots instantly reveal which group has a higher median, more spread, or more outliers.
4. In Matplotlib, pass patch_artist=True when you want filled boxes and set flierprops when outliers need to stand out. By default the boxes are unfilled and the outlier markers are plain black circles that are easy to overlook.
5. Box plots hide bimodal distributions. Data with two peaks can look unremarkable in a box plot. Always check with a histogram first.