Mean, Median and Mode Explained — Formulas, Examples & When to Use Each

Section 01

Overview — What Are Measures of Central Tendency?

In statistics, a measure of central tendency is a single value that summarises a dataset by identifying where the centre of the data lies. The three most common measures are:

Mean

x̄ / μ

Sum ÷ Count
Sensitive to outliers
Best for symmetric data

Median

Middle value (sorted)
Outlier-resistant
Best for skewed data

Mode

Most frequent value
Works on categorical data
Can be multiple values

📌

Why This Matters

Choosing the wrong measure can completely mislead your analysis. Reporting average income in a country with extreme inequality gives a false picture — the median is far more honest.

Section 02

The Mean (Arithmetic Average)

The mean is calculated by adding all values and dividing by the number of values. It takes into account every single value in the dataset — which makes it powerful but also vulnerable to extreme values (outliers).

Formula

Population Mean

μ = (Σ xᵢ) / N

Σ xᵢ = sum of all population values
N = total population size

Sample Mean

x̄ = (Σ xᵢ) / n

Σ xᵢ = sum of sample values
n = sample size (subset of data)

Real Example — Employee Salaries

A small company has 10 employees with the following monthly salaries (₹):

Emp #	1	2	3	4	5	6	7	8	9	10
Salary (₹)	32,000	35,000	37,000	38,000	40,000	42,000	45,000	47,000	50,000	54,000

🧮 Step-by-Step Calculation

Step 1

Sum all salaries:
32,000 + 35,000 + 37,000 + 38,000 + 40,000 + 42,000 + 45,000 + 47,000 + 50,000 + 54,000 = ₹4,20,000

Step 2

Divide by count (n = 10):
x̄ = 4,20,000 / 10 = ₹42,000

Result

The average salary is ₹42,000/month — a fair representation since all salaries are close to each other.

The Outlier Problem — When Mean Fails

Now suppose the CEO joins with a salary of ₹5,00,000/month:

⚠️

Outlier Effect on Mean

New Sum = 4,20,000 + 5,00,000 = ₹9,20,000
New Mean = 9,20,000 / 11 = ₹83,636/month

But 10 out of 11 employees earn between ₹32,000 and ₹54,000! The mean of ₹83,636 does NOT represent a typical salary. This is why salary surveys usually report the median rather than the mean.

Python Code

mean_example.py

import numpy as np

salaries = [32000, 35000, 37000, 38000, 40000,
            42000, 45000, 47000, 50000, 54000]

print("Mean (no CEO)  :", np.mean(salaries))        # 42000.0

salaries_with_ceo = salaries + [500000]
print("Mean (with CEO):", np.mean(salaries_with_ceo))  # 83636.36
print("Median (robust):", np.median(salaries_with_ceo)) # 42000.0

Industry Applications of Mean

Industry	Application
💰 Finance	Average daily stock return, average monthly expenses
🏭 Manufacturing	Average product weight, average defect rate per batch
🏥 Healthcare	Average patient recovery time, average dosage
🎓 Education	Average test score across a class
🛒 E-commerce	Average order value (AOV), average delivery time

When to Use the Mean

Data is normally distributed (symmetric, bell-shaped)
No significant outliers present
Variables are continuous (height, weight, temperature, exam scores)
You need the value in further calculations (standard deviation, regression)

Section 03

The Median (Middle Value)

The median is the middle value in a sorted dataset. At least half the values lie at or below it and at least half lie at or above it. Because the median depends only on the order of the values, not their magnitude, a few extreme outliers barely move it.

Formula

Odd Number of Values

M = value at (n+1)/2

Sort the data. The median is the single middle value.
e.g. n = 9 → position 5

Even Number of Values

M = [x(n/2) + x(n/2+1)] / 2

Sort the data. Average the two middle values.
e.g. n = 10 → average positions 5 & 6
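The two cases above can be sketched as one small function. This is a minimal illustration rather than a library implementation: the name `median_by_position` is hypothetical, and note that the formulas use 1-based positions while Python lists are indexed from 0. The data is the salary list from Section 02.

```python
def median_by_position(values):
    """Return the median using the odd/even position rules."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)          # the formulas assume sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n+1)/2 is 0-based index n//2
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


salaries = [32000, 35000, 37000, 38000, 40000,
            42000, 45000, 47000, 50000, 54000]

print(median_by_position(salaries))             # 41000.0 (n = 10, even)
print(median_by_position(salaries + [500000]))  # 42000   (n = 11, odd)
```

In practice `statistics.median` or `np.median` does the same job; writing it out by hand is only useful for seeing where the position rules come from.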

Real Example — House Prices in a City

Seven properties recently sold in a neighbourhood (₹ Lakhs):

Property	A	B	C	D	E	F	Luxury Villa
Price (₹L)	45	52	47	55	50	48	420

🧮 Step-by-Step Median Calculation

Step 1

Sort the values: 45, 47, 48, 50, 52, 55, 420

Step 2

n = 7 (odd) → Middle position = (7+1)/2 = 4th value

Median

M = ₹50 Lakhs ✓

Compare

Mean = (45+47+48+50+52+55+420)/7 = ₹102.4 Lakhs, which is completely misleading! No typical buyer pays ₹102.4L when 6 out of 7 homes cost under ₹56L.

💡

Real-World Insight — Income Inequality

India's GDP per capita (a rough proxy for mean income per person) ≈ ₹2.1 Lakhs/year. But the median Indian income is closer to ₹50,000–70,000/year. The massive gap exists because a small number of ultra-wealthy individuals pull the mean far above what most people actually earn.

Python Code

median_example.py

import numpy as np

house_prices = [45, 52, 47, 55, 50, 48, 420]

print("Mean  :", np.mean(house_prices))    # 102.43 (misleading!)
print("Median:", np.median(house_prices))  # 50.0  — accurate ✓

Industry Applications of Median

Industry	Application
🏠 Real Estate	Median home price in a city (standard reporting measure)
📊 Economics	Median household income, median net worth
🏥 Healthcare	Median survival time in clinical trials (not average)
💼 HR / Recruitment	Median salary for a job role — fairer than mean for job ads
🚚 Logistics	Median delivery time — unaffected by rare extreme delays

When to Use the Median

Data is skewed (not symmetric — income, property prices, wealth)
Outliers are present and significant
Ordinal data — values have order but distances are not equal (e.g., survey ratings 1–5)
Reporting typical experience rather than a mathematical average
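For ordinal data with an even number of values, averaging the two middle values can produce a number that is not on the scale at all. One possible workaround, sketched here with Python's standard `statistics` module, is to report the low or high median, which is always an actual data point. The ratings are made up for illustration.

```python
from statistics import median, median_low, median_high

# Hypothetical survey ratings on a 1-5 scale (illustrative data only)
ratings = [1, 2, 3, 3, 4, 4, 5, 5]

print(median(ratings))       # 3.5, not a valid rating on the scale
print(median_low(ratings))   # 3, the lower of the two middle values
print(median_high(ratings))  # 4, the higher of the two middle values
```

Which of the two to report is a judgment call; the important part is that the result stays a value respondents could actually have chosen.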

Section 04

The Mode (Most Frequent Value)

The mode is the value that appears most frequently in a dataset. Unlike the mean and median, the mode can be used with categorical (non-numeric) data. A dataset may have one mode, two modes (bimodal), multiple modes, or no mode at all.

Mode Definition

Mo = argmax f(x)

The value x that maximises frequency f(x).
Simply: the most occurring value.

Real Example — Retail Shoe Store

A shoe store records sizes purchased by 238 customers to decide which size to order the most:

Size	6	6.5	7	7.5	8	8.5 ★	9	9.5	10	10.5	11
Customers	5	8	15	22	38	45	40	30	20	10	5

🧮

Finding the Mode

Mode = Size 8.5 — purchased by 45 customers, the highest frequency.

The mode is the ONLY measure that directly tells the store: order the most 8.5s! No formula for mean or median could give this actionable insight as directly.
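As a rough sketch, the mode can be read straight off the frequency table without expanding it into 238 individual records. The variable names below are illustrative; the counts are the ones from the table above. The weighted mean is included only to show why it is less actionable here.

```python
# Frequency table from the shoe store example: size -> number of customers
size_counts = {6: 5, 6.5: 8, 7: 15, 7.5: 22, 8: 38, 8.5: 45,
               9: 40, 9.5: 30, 10: 20, 10.5: 10, 11: 5}

total_customers = sum(size_counts.values())

# Mode: the size with the highest count
mode_size = max(size_counts, key=size_counts.get)

# Mean: each size weighted by how many customers bought it
mean_size = sum(size * n for size, n in size_counts.items()) / total_customers

print("Customers:", total_customers)      # 238
print("Mode size:", mode_size)            # 8.5
print("Mean size:", round(mean_size, 2))  # 8.59, not a size the store can order
```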

Real Example — Favourite Colour Survey (Categorical Data)

A school surveys 200 students: "What is your favourite colour?"

Blue ★	Red	Green	Yellow	Purple	Orange
72	58	35	18	12	5

⚠️

Why Only Mode Works Here

You cannot calculate a mean or median for colours: they are categorical, not numeric, and have no natural order. Mode = Blue is the only valid measure. Averaging category labels is a common mistake in data science, so always check your data type before choosing a measure.

Unimodal, Bimodal & Multimodal Distributions

Type	Definition	Example
Unimodal	One peak / one mode	Exam scores clustered around 70
Bimodal	Two peaks / two modes	Customer ages: peaks at 25 and 55
Multimodal	Three+ peaks	Hourly road traffic: peaks in the morning, at lunchtime and in the evening
No Mode	All equally frequent	Rolling a fair die
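The four cases in the table can be told apart by counting how many values share the top frequency. The sketch below uses `collections.Counter`; the function name `find_modes` and the convention of returning an empty list when every value ties are assumptions chosen to match the table, not a standard API.

```python
from collections import Counter


def find_modes(values):
    """Return a list of modes; empty if all distinct values are equally frequent."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    modes = [value for value, freq in counts.items() if freq == top]
    # If several distinct values all tie, treat the dataset as having no mode
    if len(modes) == len(counts) and len(counts) > 1:
        return []
    return modes


print(find_modes([60, 65, 65, 65, 70]))     # [65]      unimodal
print(find_modes([25, 25, 40, 55, 55]))     # [25, 55]  bimodal
print(find_modes([1, 2, 3, 4, 5, 6]))       # []        no mode
print(find_modes(["Blue", "Red", "Blue"]))  # ['Blue']  works on categories too
```

Note that this counts exact ties in raw values. Spotting peaks in continuous data (such as customer ages) usually needs a histogram or density plot first.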

Python Code

mode_example.py

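from statistics import mode

import numpy as np

# statistics.mode handles numbers and strings alike. In recent SciPy releases,
# scipy.stats.mode returns a scalar (so .mode[0] fails) and rejects non-numeric input.
shoe_sizes = [6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
counts     = [5, 8, 15, 22, 38, 45, 40, 30, 20, 10, 5]

# Expand the frequency table into one entry per customer
data = np.repeat(shoe_sizes, counts)
print("Mode:", mode(data.tolist()))   # 8.5

# Categorical example (top three colours from the survey)
colours = ["Blue"]*72 + ["Red"]*58 + ["Green"]*35
print("Favourite:", mode(colours))  # Blue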

Industry Applications of Mode

Industry	Application
🛍️ Retail	Most popular product size, colour, or category to stock
📣 Marketing	Most common customer age group, most popular channel
🏥 Healthcare	Most common diagnosis, most frequently prescribed drug
🚗 Transportation	Most common trip duration, most popular route
📱 Social Media	Most liked post type, most used hashtag

When to Use the Mode

Data is categorical (colours, brands, yes/no, city names)
You need to identify the most popular item or preference
Detecting clusters or groups in data (bimodal = two distinct groups)
Quality control — finding the most common defect type
Imputing missing values in categorical ML columns

Section 05

Skewness — How Distribution Shape Changes Everything

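For a unimodal distribution, the relationship between mean, median, and mode indicates its shape. The orderings below are rules of thumb that hold for most real-world data rather than strict guarantees. Understanding skewness is essential for choosing the right measure and avoiding misleading analysis.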

Symmetric

Bell-shaped

Mean = Median = Mode

Right Skewed (+)

Long tail on right

Mode < Median < Mean

Left Skewed (−)

Long tail on left

Mean < Median < Mode

Distribution	Shape	Relationship	Best Measure
Symmetric	Bell-shaped	Mean = Median = Mode	Any — all are equal
Right-Skewed (+)	Long tail on right	Mode < Median < Mean	Median
Left-Skewed (−)	Long tail on left	Mean < Median < Mode	Median
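One simple way to put a number on the direction of skew is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation: positive for a right skew, negative for a left skew. The sketch below uses the salary data from Section 02; the function name is hypothetical, and the mirrored dataset exists only to demonstrate the sign change.

```python
from statistics import mean, median, stdev


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev."""
    spread = stdev(values)            # sample standard deviation
    if spread == 0:
        return 0.0                    # all values identical: no skew
    return 3 * (mean(values) - median(values)) / spread


salaries = [32000, 35000, 37000, 38000, 40000,
            42000, 45000, 47000, 50000, 54000]
with_ceo = salaries + [500000]
mirrored = [-x for x in with_ceo]     # artificial left-skewed data

print(pearson_median_skewness(with_ceo) > 0)   # True: mean is above median
print(pearson_median_skewness(mirrored) < 0)   # True: mean is below median
```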

📐

Pearson's Approximation

For moderately skewed distributions:
Mean − Mode ≈ 3 × (Mean − Median)

This formula lets you estimate the mode from mean and median, or verify consistency in your data.
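Rearranging the approximation gives Mode ≈ 3 × Median − 2 × Mean. A minimal sketch, with a hypothetical function name and made-up summary values:

```python
def estimate_mode(mean_value, median_value):
    """Empirical estimate from Pearson's approximation: 3 * median - 2 * mean."""
    return 3 * median_value - 2 * mean_value


# Hypothetical right-skewed summary: mean 50, median 45
print(estimate_mode(50, 45))   # 35, so mode < median < mean as expected

# Symmetric case: when mean equals median, the estimate is the same value
print(estimate_mode(70, 70))   # 70
```

Treat the result as a rough estimate only. On small or strongly skewed samples it can sit some distance from the observed mode.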

Section 06

Complete Real Example — Student Exam Scores

A class of 20 students takes a maths exam (out of 100). Here are their scores:

#	Score	#	Score	#	Score	#	Score	#	Score
1	45	5	60	9	65	13	72	17	85
2	52	6	60	10	65	14	75	18	80
3	55	7	62	11	65	15	78	19	88
4	58	8	62	12	68	16	70	20	92

🧮 Calculating All Three Measures

Mean

Sum = 1,357 → x̄ = 1,357 / 20 = 67.85

Median

n = 20 (even) → Average 10th and 11th values in sorted list
..., 62, 65, 65, 65, 68, ... → M = (65+65)/2 = 65.0

Mode

65 appears 3 times, more than any other value → Mo = 65

Interpretation for the Teacher

Measure	Value	What It Tells the Teacher
Mean	67.85	Mathematical average: useful for comparing this class against other classes or past years.
Median	65.0	Typical student score: about half the class scored above 65 and about half below. Better for reporting typical performance.
Mode	65	Most common score: several students clustered here; useful for identifying where teaching focus paid off.
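The three measures for this class can be double-checked with Python's standard `statistics` module. The scores are entered in the same order as the table; this is just a quick verification sketch.

```python
from statistics import mean, median, multimode

scores = [45, 52, 55, 58, 60, 60, 62, 62, 65, 65,
          65, 68, 72, 75, 78, 70, 85, 80, 88, 92]

print("Count :", len(scores))        # 20
print("Sum   :", sum(scores))
print("Mean  :", mean(scores))
print("Median:", median(scores))     # 65.0
print("Mode  :", multimode(scores))  # [65]
```

`multimode` returns a list, so it would also reveal a tie if two scores were equally common.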

Section 07

Decision Guide — Which Measure to Choose?

Situation	Use Mean	Use Median	Use Mode
Data type	Continuous numeric	Numeric or ordinal	Categorical or discrete
Distribution	Symmetric	Skewed	Any
Outliers present?	No outliers	Outliers present	Doesn't matter
Goal	Further calculations	Describe typical value	Find most popular value
Real example	Average temperature	Household income	Most popular product
ML imputation	Normal features	Skewed features	Categorical columns

🎯 Golden Rules for Data Scientists

Always visualize your data before choosing a measure — check for skewness and outliers using histograms and box plots.

Report both mean and median when in doubt — the gap between them immediately reveals skewness.

Never use mean for categorical data — it is mathematically meaningless.

For ML missing value imputation: use mean for normal features, median for skewed features, mode for categorical columns.

A large gap between mean and median is always a red flag — investigate for outliers or skewness before proceeding.
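The imputation rule above can be sketched with the standard library alone. The `impute` helper, the column names, and the data are all hypothetical; in a real project a library such as pandas or scikit-learn would normally do this work.

```python
from statistics import mean, median, mode


def impute(column, strategy):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in column]


# Hypothetical columns with missing entries marked as None
heights = [160, 170, None, 180]                 # roughly symmetric
incomes = [30000, 35000, None, 40000, 500000]   # skewed by one large value
cities = ["Pune", "Delhi", "Pune", None]        # categorical

print(impute(heights, "mean"))    # [160, 170, 170, 180]
print(impute(incomes, "median"))  # [30000, 35000, 37500.0, 40000, 500000]
print(impute(cities, "mode"))     # ['Pune', 'Delhi', 'Pune', 'Pune']
```

Filling the income column with its mean instead would insert 151,250, a value far above four of the five observed incomes, which is why the median is the safer choice for skewed features.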

Section 08

Quick Reference Card

Property	Mean	Median	Mode
Symbol	x̄ or μ	M	Mo
Formula	Σxᵢ / n	Middle value (sorted)	Most frequent value
Data Type	Numeric only	Numeric or ordinal	Any (incl. categorical)
Outlier effect?	Yes — strongly	No — robust	No — unaffected
Always unique?	Always 1 value	Always 1 value	0, 1, or many
Skewed data	Misleading	Reliable	Depends
Used in Std Dev?	Yes	No	No
Real Estate	Avoid	Standard use	Rare
Income reporting	Avoid	Always preferred	Sector breakdown
ML imputation	Normal features	Skewed features	Categorical features