Overview — What Are Measures of Central Tendency?
In statistics, a measure of central tendency is a single value that summarises a dataset by identifying where the centre of the data lies. The three most common measures are:
- Mean: sum ÷ count; sensitive to outliers; best for symmetric data
- Median: middle value of the sorted data; outlier-resistant; best for skewed data
- Mode: most frequent value; works on categorical data; can be multiple values
Choosing the wrong measure can completely mislead your analysis. Reporting average income in a country with extreme inequality gives a false picture — the median is far more honest.
The Mean (Arithmetic Average)
The mean is calculated by adding all values and dividing by the number of values. It takes into account every single value in the dataset — which makes it powerful but also vulnerable to extreme values (outliers).
Formula
Population mean: μ = Σxᵢ / N, where N = total population size
Sample mean: x̄ = Σxᵢ / n, where n = sample size (subset of data)
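The definition of the mean (sum of the values divided by their count) translates into a few lines of Python. This is a minimal sketch for illustration; the function name `arithmetic_mean` is introduced here and is not part of any library. The data is the salary list used in the example that follows.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Monthly salaries (₹) of the 10 employees in the example
salaries = [32000, 35000, 37000, 38000, 40000, 42000, 45000, 47000, 50000, 54000]
print(arithmetic_mean(salaries))   # 42000.0
```

The same function serves for a population (N values) or a sample (n values); only the interpretation of the result differs.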
Real Example — Employee Salaries
A small company has 10 employees with the following monthly salaries (₹):
| Emp # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Salary (₹) | 32,000 | 35,000 | 37,000 | 38,000 | 40,000 | 42,000 | 45,000 | 47,000 | 50,000 | 54,000 |
32,000 + 35,000 + 37,000 + 38,000 + 40,000 + 42,000 + 45,000 + 47,000 + 50,000 + 54,000 = ₹4,20,000
x̄ = 4,20,000 / 10 = ₹42,000
The Outlier Problem — When Mean Fails
Now suppose the CEO joins with a salary of ₹5,00,000/month:
New Sum = 4,20,000 + 5,00,000 = ₹9,20,000
New Mean = 9,20,000 / 11 = ₹83,636/month
Yet 10 of the 11 employees earn between ₹32,000 and ₹54,000. The mean of ₹83,636 does not represent a typical salary, which is why salary surveys usually report the median.
Python Code
```python
import numpy as np

salaries = [32000, 35000, 37000, 38000, 40000, 42000, 45000, 47000, 50000, 54000]
print("Mean (no CEO) :", np.mean(salaries))              # 42000.0

# Add the CEO's salary and compare the mean with the median
salaries_with_ceo = salaries + [500000]
print("Mean (with CEO):", np.mean(salaries_with_ceo))    # ≈ 83636.36
print("Median (robust):", np.median(salaries_with_ceo))  # 42000.0
```
Industry Applications of Mean
| Industry | Application |
|---|---|
| 💰 Finance | Average daily stock return, average monthly expenses |
| 🏭 Manufacturing | Average product weight, average defect rate per batch |
| 🏥 Healthcare | Average patient recovery time, average dosage |
| 🎓 Education | Average test score across a class |
| 🛒 E-commerce | Average order value (AOV), average delivery time |
When to Use the Mean
- Data is normally distributed (symmetric, bell-shaped)
- No significant outliers present
- Variables are continuous (height, weight, temperature, exam scores)
- You need the value in further calculations (standard deviation, regression)
The Median (Middle Value)
The median is the middle value in a sorted dataset: at least half the values lie at or below it and at least half lie at or above it. Because the median depends on the ordering of the values rather than their magnitude, an extreme outlier barely moves it.
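Formula
n odd: M = value at position (n + 1) / 2 of the sorted data, e.g. n = 9 → position 5
n even: M = average of the values at positions n / 2 and n / 2 + 1, e.g. n = 10 → average of positions 5 and 6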
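The odd and even cases can be sketched as a small function. This is an illustrative implementation, not the one NumPy uses; the name `median_value` and the sample lists are assumptions made for the example.

```python
def median_value(values):
    """Return the middle value of the sorted data.

    For an even number of values, average the two middle values.
    """
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # 1-based position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2  # 1-based positions n/2 and n/2 + 1


print(median_value([7, 1, 5, 3, 9]))       # 5
print(median_value([7, 1, 5, 3, 9, 11]))   # 6.0
```

Sorting first is essential: taking the middle element of unsorted data gives an arbitrary value.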
Real Example — House Prices in a City
Seven properties recently sold in a neighbourhood (₹ Lakhs):
| Property | A | B | C | D | E | F | Luxury Villa |
|---|---|---|---|---|---|---|---|
| Price (₹L) | 45 | 52 | 47 | 55 | 50 | 48 | 420 |
Sorted: 45, 47, 48, 50, 52, 55, 420 → Median = ₹50 L (the 4th of 7 values)
Mean = 717 / 7 ≈ ₹102.43 L, about twice what any of the six ordinary properties sold for.
India's GDP per capita (mean income per person) ≈ ₹2.1 Lakhs/year. But the median Indian income is closer to ₹50,000–70,000/year. The massive gap exists because a small number of ultra-wealthy individuals pull the mean far above what most people actually earn.
Python Code
```python
import numpy as np

house_prices = [45, 52, 47, 55, 50, 48, 420]   # ₹ Lakhs
print("Mean  :", np.mean(house_prices))     # ≈ 102.43, pulled up by the luxury villa
print("Median:", np.median(house_prices))   # 50.0, close to a typical sale
```
Industry Applications of Median
| Industry | Application |
|---|---|
| 🏠 Real Estate | Median home price in a city (standard reporting measure) |
| 📊 Economics | Median household income, median net worth |
| 🏥 Healthcare | Median survival time in clinical trials (not average) |
| 💼 HR / Recruitment | Median salary for a job role — fairer than mean for job ads |
| 🚚 Logistics | Median delivery time — unaffected by rare extreme delays |
When to Use the Median
- Data is skewed (not symmetric — income, property prices, wealth)
- Outliers are present and significant
- Ordinal data — values have order but distances are not equal (e.g., survey ratings 1–5)
- Reporting typical experience rather than a mathematical average
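For ordinal data such as 1–5 ratings, averaging the two middle values can produce a number like 3.5 that is not on the scale. One possible approach, sketched below, is to use `statistics.median_low` or `statistics.median_high` from the standard library, which always return an actual data point. The ratings are hypothetical and only illustrate the behaviour.

```python
from statistics import median, median_low, median_high

# Hypothetical survey responses on a 1-5 scale (illustrative data only)
ratings = [5, 3, 4, 2, 4, 1, 3, 5]

print(median(ratings))        # 3.5 (average of the two middle ratings)
print(median_low(ratings))    # 3   (lower of the two middle ratings)
print(median_high(ratings))   # 4   (higher of the two middle ratings)
```

Whether to report the low or the high median is a judgment call; the point is that both stay on the original rating scale.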
The Mode (Most Frequent Value)
The mode is the value that appears most frequently in a dataset. Unlike the mean and median, the mode can be used with categorical (non-numeric) data. A dataset may have one mode, two modes (bimodal), multiple modes, or no mode at all.
Simply: the most frequently occurring value.
Real Example — Retail Shoe Store
A shoe store records sizes purchased by 238 customers to decide which size to order the most:
| Size | 6 | 6.5 | 7 | 7.5 | 8 | 8.5 ★ | 9 | 9.5 | 10 | 10.5 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Customers | 5 | 8 | 15 | 22 | 38 | 45 | 40 | 30 | 20 | 10 | 5 |
Mode = Size 8.5 — purchased by 45 customers, the highest frequency.
The mode is the ONLY measure that directly tells the store: order the most 8.5s! No formula for mean or median could give this actionable insight as directly.
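When the data already arrives as a frequency table, the mode can be read off without expanding it into raw observations. A minimal sketch, using the counts from the table above (the variable names are illustrative):

```python
# Frequency table from the shoe store example: size -> number of customers
size_counts = {
    6: 5, 6.5: 8, 7: 15, 7.5: 22, 8: 38, 8.5: 45,
    9: 40, 9.5: 30, 10: 20, 10.5: 10, 11: 5,
}

# The mode is the key with the largest count
modal_size = max(size_counts, key=size_counts.get)

print(modal_size, size_counts[modal_size])   # 8.5 45
print(sum(size_counts.values()))             # 238
```

Note that `max` returns only the first key it finds with the highest count, so a tie between two sizes would need separate handling.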
Real Example — Favourite Colour Survey (Categorical Data)
A school surveys 200 students: "What is your favourite colour?"
| Blue ★ | Red | Green | Yellow | Purple | Orange |
|---|---|---|---|---|---|
| 72 | 58 | 35 | 18 | 12 | 5 |
You cannot calculate a mean or median for colours: they are categorical, not numeric, so Mode = Blue is the only valid measure. Applying a mean or median to categorical data is a common mistake in data science; always check your data type before choosing a measure.
Unimodal, Bimodal & Multimodal Distributions
| Type | Definition | Example |
|---|---|---|
| Unimodal | One peak / one mode | Exam scores clustered around 70 |
| Bimodal | Two peaks / two modes | Customer ages: peaks at 25 and 55 |
| Multimodal | Three+ peaks | Website traffic with morning, midday, and evening peaks |
| No Mode | All equally frequent | Rolling a fair die |
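The categories in the table can be approximated in code by counting how many values tie for the highest frequency. The sketch below uses `statistics.multimode` (Python 3.8+); the helper `describe_modes` and the sample lists are assumptions for illustration. It counts exact ties, which is stricter than spotting peaks in a histogram of continuous data.

```python
from statistics import multimode


def describe_modes(values):
    """Label a dataset by how many values tie for the highest frequency."""
    modes = multimode(values)
    distinct = len(set(values))
    if not modes or (distinct > 1 and len(modes) == distinct):
        return "no mode", []            # empty, or every value equally frequent
    labels = {1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes), "multimodal"), modes


print(describe_modes([70, 68, 70, 72, 70, 65]))   # ('unimodal', [70])
print(describe_modes([25, 25, 55, 55, 40]))       # ('bimodal', [25, 55])
print(describe_modes([1, 2, 3, 4, 5, 6]))         # ('no mode', [])
```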
Python Code
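```python
from statistics import mode   # standard library; works on strings too

import numpy as np
from scipy import stats

shoe_sizes = [6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
counts     = [5, 8, 15, 22, 38, 45, 40, 30, 20, 10, 5]   # same counts as the table
data = np.repeat(shoe_sizes, counts)   # expand the table into 238 observations

# Newer SciPy returns a scalar here, older SciPy a length-1 array;
# np.atleast_1d handles both.
print("Mode:", np.atleast_1d(stats.mode(data).mode)[0])   # 8.5

# Categorical example: recent SciPy releases reject non-numeric input,
# so use statistics.mode for strings.
colours = ["Blue"] * 72 + ["Red"] * 58 + ["Green"] * 35
print("Favourite:", mode(colours))   # Blue
```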
Industry Applications of Mode
| Industry | Application |
|---|---|
| 🛍️ Retail | Most popular product size, colour, or category to stock |
| 📣 Marketing | Most common customer age group, most popular channel |
| 🏥 Healthcare | Most common diagnosis, most frequently prescribed drug |
| 🚗 Transportation | Most common trip duration, most popular route |
| 📱 Social Media | Most liked post type, most used hashtag |
When to Use the Mode
- Data is categorical (colours, brands, yes/no, city names)
- You need to identify the most popular item or preference
- Detecting clusters or groups in data (bimodal = two distinct groups)
- Quality control — finding the most common defect type
- Imputing missing values in categorical ML columns
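The last point, mode imputation for a categorical column, can be sketched in plain Python. The function `impute_with_mode`, the use of `None` as the missing-value marker, and the city data are all assumptions made for this example; a real pipeline would more likely use a dataframe library.

```python
from statistics import mode


def impute_with_mode(column, missing=None):
    """Replace missing entries with the most frequent observed value."""
    observed = [value for value in column if value is not missing]
    if not observed:
        raise ValueError("column has no observed values to take a mode from")
    fill = mode(observed)
    return [fill if value is missing else value for value in column]


# Hypothetical categorical column with gaps marked as None
cities = ["Pune", "Delhi", None, "Pune", "Mumbai", None, "Pune"]
print(impute_with_mode(cities))
# ['Pune', 'Delhi', 'Pune', 'Pune', 'Mumbai', 'Pune', 'Pune']
```

The mode is computed from the observed values only, so the missing entries do not influence which category is chosen as the fill value.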
Skewness — How Distribution Shape Changes Everything
The relationship between mean, median, and mode reveals the shape of a distribution. Understanding skewness is essential for choosing the right measure and avoiding misleading analysis.
| Distribution | Shape | Relationship | Best Measure |
|---|---|---|---|
| Symmetric | Bell-shaped | Mean = Median = Mode | Any — all are equal |
| Right-Skewed (+) | Long tail on right | Mode < Median < Mean | Median |
| Left-Skewed (−) | Long tail on left | Mean < Median < Mode | Median |
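The relationships in the table suggest a quick rule of thumb: compare the mean with the median to guess the direction of skew. The sketch below is only that, a rough check and not a formal skewness statistic; the function name `skew_direction` and its `tolerance` parameter are assumptions introduced here.

```python
from statistics import mean, median


def skew_direction(values, tolerance=0.0):
    """Guess the skew direction from the sign of (mean - median)."""
    gap = mean(values) - median(values)
    if gap > tolerance:
        return "right-skewed"       # mean pulled above the median
    if gap < -tolerance:
        return "left-skewed"        # mean pulled below the median
    return "roughly symmetric"


house_prices = [45, 52, 47, 55, 50, 48, 420]   # ₹ Lakhs, from the median example
print(skew_direction(house_prices))            # right-skewed
print(skew_direction([1, 2, 3, 4, 5]))         # roughly symmetric
```

On real data a non-zero `tolerance` is sensible, since the mean and median of a symmetric sample rarely match exactly.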
For moderately skewed distributions:
Mean − Mode ≈ 3 × (Mean − Median)
This empirical rule of thumb lets you estimate the mode from the mean and median, or check whether your summary statistics are roughly consistent with one another.
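Rearranging the relation gives Mode ≈ 3 × Median − 2 × Mean. A minimal sketch follows; the function name `estimate_mode` and the input values are illustrative, and the result is only an approximation for moderately skewed data.

```python
def estimate_mode(mean, median):
    """Rearrange Mean - Mode ≈ 3 × (Mean - Median) to solve for the mode."""
    return 3 * median - 2 * mean


# Illustrative values for a moderately right-skewed dataset
print(estimate_mode(mean=50, median=48))   # 44
```

Checking the result against the original relation: Mean − Mode = 50 − 44 = 6, and 3 × (Mean − Median) = 3 × 2 = 6.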
Complete Real Example — Student Exam Scores
A class of 20 students takes a maths exam (out of 100). Here are their scores:
| # | Score | # | Score | # | Score | # | Score | # | Score |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 45 | 5 | 60 | 9 | 65 | 13 | 72 | 17 | 85 |
| 2 | 52 | 6 | 60 | 10 | 65 | 14 | 75 | 18 | 80 |
| 3 | 55 | 7 | 62 | 11 | 65 | 15 | 78 | 19 | 88 |
| 4 | 58 | 8 | 62 | 12 | 68 | 16 | 70 | 20 | 92 |
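All three measures for this class can be computed with the standard library. The sketch below enters the scores in student order (# 1 to 20) exactly as they appear in the table; the variable names are illustrative.

```python
from statistics import mean, median, multimode

# Scores for students 1-20, in the order shown in the table
scores = [45, 52, 55, 58, 60, 60, 62, 62, 65, 65,
          65, 68, 72, 75, 78, 70, 85, 80, 88, 92]

print("n      :", len(scores))
print("Sum    :", sum(scores))
print("Mean   :", mean(scores))
print("Median :", median(scores))      # sorts internally before taking the middle
print("Mode(s):", multimode(scores))   # list of every value tied for most frequent
```

Using `multimode` rather than `mode` makes it visible if two scores are tied for most frequent.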
Mean: x̄ = 1,357 / 20 = 67.85
Median: sorted scores ..., 62, 65, 65, 65, 68, ... → positions 10 and 11 are both 65 → M = (65 + 65) / 2 = 65.0
Mode: 65 occurs three times, more often than any other score → Mo = 65
Interpretation for the Teacher
| Measure | Value | What It Tells the Teacher |
|---|---|---|
| Mean | 67.85 | Mathematical average; useful for comparing this class against other classes or past years. |
| Median | 65.0 | Typical student score; at least half the class scored 65 or below and at least half scored 65 or above. Better for reporting typical performance. |
| Mode | 65 | Most common score; three students scored exactly 65, which shows where results cluster. |
Decision Guide — Which Measure to Choose?
| Situation | Use Mean | Use Median | Use Mode |
|---|---|---|---|
| Data type | Continuous numeric | Numeric or ordinal | Categorical or discrete |
| Distribution | Symmetric | Skewed | Any |
| Outliers present? | No outliers | Outliers present | Doesn't matter |
| Goal | Further calculations | Describe typical value | Find most popular value |
| Real example | Average temperature | Household income | Most popular product |
| ML imputation | Normal features | Skewed features | Categorical columns |
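The guide can be turned into a rough first-pass helper. The sketch below is a heuristic under stated assumptions, not a standard algorithm: the function `suggest_measure` and the `rel_gap` threshold of 0.1 are hypothetical, and the gap between mean and median is used as a simple stand-in for "skewed or has outliers".

```python
from numbers import Number
from statistics import mean, median


def suggest_measure(values, rel_gap=0.1):
    """Suggest 'mean', 'median', or 'mode' using the decision guide as a rule of thumb."""
    if not all(isinstance(value, Number) for value in values):
        return "mode"                    # categorical data: only the mode applies
    centre, middle = mean(values), median(values)
    scale = abs(middle) or 1.0           # avoid dividing by zero
    if abs(centre - middle) / scale > rel_gap:
        return "median"                  # mean pulled away: skew or outliers likely
    return "mean"


print(suggest_measure(["Blue", "Red", "Blue"]))           # mode
print(suggest_measure([45, 52, 47, 55, 50, 48, 420]))     # median
print(suggest_measure([32000, 35000, 37000, 38000, 40000,
                       42000, 45000, 47000, 50000, 54000]))   # mean
```

A helper like this is a starting point only; plotting the distribution remains the more reliable way to choose.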
Quick Reference Card
| Property | Mean | Median | Mode |
|---|---|---|---|
| Symbol | x̄ or μ | M | Mo |
| Formula | Σxᵢ / n | Middle value (sorted) | Most frequent value |
| Data Type | Numeric only | Numeric or ordinal | Any (incl. categorical) |
| Outlier effect? | Yes — strongly | No — robust | No — unaffected |
| Always unique? | Always 1 value | Always 1 value | 0, 1, or many |
| Skewed data | Misleading | Reliable | Depends |
| Used in Std Dev? | Yes | No | No |
| Real Estate | Avoid | Standard use | Rare |
| Income reporting | Avoid | Usually preferred | Sector breakdown |
| ML imputation | Normal features | Skewed features | Categorical features |