Numbers That Carry Uncertainty 🎲
Imagine you are about to roll a die. Before it lands, you cannot say with certainty what the outcome will be — but you know it will be some number. Now imagine you record that number, call it X. This X is not a fixed, known quantity. It is a number that takes different values depending on the outcome of a random experiment. That is a Random Variable.
Random variables are the bridge between the abstract language of probability (sample spaces, events) and the practical world of data, measurement, and machine learning. Every dataset column you have ever analysed (customer age, transaction amount, exam score, daily rainfall) can be treated as a set of realisations of a random variable. Much of the machinery of statistics is built around describing, comparing, and modelling them.
This tutorial covers four pillars: what a random variable is (including the crucial distinction between discrete and continuous types), and the three functions, PMF, PDF, and CDF, that characterise its probabilistic behaviour.
Random variables are written as capital letters (X, Y, Z). Their specific observed values are written as lowercase letters (x, y, z). So "X = 3" means the random variable X took the specific value 3 in one trial. P(X = 3) is the probability that this happens. This distinction matters throughout statistics and machine learning literature.
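The notation can be made concrete with a short sketch. It treats a fair six-sided die as the random experiment, X as the face shown, and estimates P(X = 3) by simulation. The function name `roll_die`, the seed, and the number of trials are illustrative choices, not part of the tutorial.

```python
import random
from fractions import Fraction


def roll_die(rng: random.Random) -> int:
    """Return one realisation x of the random variable X (face of a fair die)."""
    return rng.randint(1, 6)


rng = random.Random(42)   # fixed seed so the run is repeatable
n_trials = 60_000
observations = [roll_die(rng) for _ in range(n_trials)]

# The relative frequency of the event {X = 3} approximates P(X = 3).
estimate = observations.count(3) / n_trials
exact = Fraction(1, 6)

print(f"estimated P(X = 3) = {estimate:.4f}")
print(f"exact     P(X = 3) = {float(exact):.4f}")
```

Each entry of `observations` is a lowercase x; the rule that generates them is the uppercase X.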
What Is a Random Variable? 🔢
The Story: The Call Centre Manager
Priya manages a call centre. Each hour, she observes how many customer complaints arrive. Some hours it's 0, some hours 12, occasionally 30. She can't predict the exact number in advance — but she can describe the pattern. She defines X = "number of complaints per hour." X is her random variable.
Meanwhile, her colleague Arjun monitors how long each call lasts. A call might last 2.3 minutes, 7.81 minutes, or 14.002 minutes. Any positive real number is possible — there is no list of discrete options. He defines Y = "call duration in minutes." Y is also a random variable, but of a fundamentally different type.
Discrete vs Continuous — The Core Split ⚖️
The most fundamental distinction in the theory of random variables is whether the variable takes values from a countable set or an uncountable continuum. This determines everything: which formula you use, which distribution family applies, which plots make sense, and which statistical methods are valid.
Discrete random variables:
- Takes countable values (finite or countably infinite)
- Values can be listed: 0, 1, 2, 3…
- Gaps exist between possible values
- Described by PMF: P(X = x)
- Sum of all PMF values = 1
- CDF is a staircase function
- Examples: die rolls, coin flips, counts

Continuous random variables:
- Takes any value in an interval
- Values cannot be listed (uncountable)
- No gaps between possible values
- Described by PDF: f(x)
- Area under PDF curve = 1
- CDF is a continuous curve with no jumps
- Examples: height, time, temperature

The key difference:
- Discrete: P(X=x) can be > 0
- Continuous: P(X=x) = 0 always
- For continuous: ask P(a ≤ X ≤ b)
- Probabilities need intervals, not points
- This is why PDFs are densities, not probabilities
- Areas = probabilities (for continuous)
- Heights = probabilities (for discrete)
| Feature | Discrete | Continuous |
|---|---|---|
| Values | Countable set {0, 1, 2, 3, …} | Uncountable interval [a, b] or ℝ |
| Probability at a point | P(X = x) > 0 possible | P(X = x) = 0 always |
| Probability function | PMF — p(x) = P(X = x) | PDF — f(x), area gives probability |
| Sums to | Σ p(x) = 1 | ∫ f(x)dx = 1 |
| CDF shape | Staircase (step function) | Continuous curve (often S-shaped) |
| Common distributions | Binomial, Poisson, Geometric | Normal, Exponential, Uniform, Beta |
| Real examples | Number of defects, goals scored, clicks | Height, weight, time, temperature |
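The claim that P(X = x) = 0 for a continuous variable can be checked numerically. The sketch below assumes, purely for illustration, that call duration follows an Exponential distribution with a rate of 0.2 per minute (the tutorial does not specify a distribution), and shrinks an interval around 7.81 minutes.

```python
import math

RATE = 0.2  # assumed rate per minute (mean call length 5 minutes); not from the tutorial


def exponential_cdf(x: float, rate: float = RATE) -> float:
    """F(x) = P(Y <= x) for an Exponential(rate) variable."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def interval_probability(a: float, b: float) -> float:
    """P(a <= Y <= b) = F(b) - F(a) for a continuous variable."""
    return exponential_cdf(b) - exponential_cdf(a)


# Shrink an interval centred on y = 7.81 minutes.
widths = [1.0, 0.1, 0.01, 0.001]
probs = [interval_probability(7.81 - w / 2, 7.81 + w / 2) for w in widths]

for w, p in zip(widths, probs):
    print(f"interval width {w:>6}: probability = {p:.6f}")
```

As the interval narrows towards a single point, the probability falls towards zero, which is why continuous variables are described by areas over intervals.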
Probability Mass Function (PMF) 📊
The Story: The Quality Control Inspector
Ravi inspects batches of 3 smartphones. Each phone independently has a 20% chance of having a defect. He defines X = "number of defective phones in a batch of 3." X can be 0, 1, 2, or 3. Before shipping any batch, Ravi wants to know the probability of each possible count. That probability assignment — one value for each possible outcome — is the Probability Mass Function.
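With n = 3 and p = 0.2, the binomial formula gives P(X=0) = 0.512, P(X=1) = 0.384, P(X=2) = 0.096, and P(X=3) = 0.008, which sum to 1. The expected value is E[X] = 0×0.512 + 1×0.384 + 2×0.096 + 3×0.008 = 0.6, so on average 0.6 phones per batch are defective. (This matches n×p = 3×0.2 = 0.6 ✓)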
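Ravi's probabilities can be reproduced with a few lines of code. This is a minimal sketch using the binomial formula C(n,k)pᵏ(1−p)ⁿ⁻ᵏ with n = 3 and p = 0.2 from the story; the helper name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) when X counts successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 3, 0.2
pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

total = sum(pmf.values())                        # a valid PMF sums to 1
mean = sum(k * prob for k, prob in pmf.items())  # E[X] = sum of x * p(x)

for k, prob in pmf.items():
    print(f"P(X = {k}) = {prob:.3f}")
print(f"sum of PMF = {total:.3f}, E[X] = {mean:.3f}")
```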
More PMF Examples
| Experiment | X (Random Variable) | Possible Values | PMF Example |
|---|---|---|---|
| Coin flip (fair) | X = 1 if Heads, 0 if Tails | {0, 1} | P(X=0) = P(X=1) = 0.5 |
| Die roll (fair) | X = face value shown | {1, 2, 3, 4, 5, 6} | P(X=k) = 1/6 for each k |
| Goals in a football match | X = total goals scored | {0, 1, 2, 3, …} | Poisson: P(X=k) = e⁻λ λᵏ/k! |
| Emails per hour | X = number of emails received | {0, 1, 2, 3, …} | Poisson with λ = average rate |
| Raffle draw | X = 1 if win, 0 if lose | {0, 1} | P(X=1)=1/1000, P(X=0)=999/1000 |
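The Poisson rows in the table can be explored with a short sketch. The rate λ = 2.5 goals per match is a made-up value for illustration, and the infinite support is truncated at 30 terms because the remaining tail is negligible for this rate.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = e^(-lam) * lam^k / k! for a Poisson(lam) count."""
    return exp(-lam) * lam**k / factorial(k)


lam = 2.5  # assumed average goals per match (illustrative only)
probs = [poisson_pmf(k, lam) for k in range(30)]

total = sum(probs)              # very close to 1 after truncation
p_at_most_two = sum(probs[:3])  # P(X <= 2)

for k in range(6):
    print(f"P(X = {k}) = {probs[k]:.4f}")
print(f"sum over k < 30 = {total:.6f}, P(X <= 2) = {p_at_most_two:.4f}")
```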
Probability Density Function (PDF) 🌊
The Story: The Hospital Wait Time
The emergency room of a hospital records wait times. A patient might wait 8.7 minutes, 23.14 minutes, or 5.003 minutes. The wait time Y is a continuous random variable: it can take any non-negative real value. The probability that Y equals exactly 8.7 minutes is zero, because probability is spread over a continuum of possible times and no single exact value carries any of it.
But the probability that Y falls between 10 and 20 minutes is a perfectly meaningful number — it is the area under the probability density curve between those two points. The function that defines this curve is the Probability Density Function.
f(x) is NOT a probability. It is a density — like population density (people per km², not just people). f(x) can be greater than 1. What gives probability is the area under the curve over an interval: P(a ≤ X ≤ b) = ∫ₐᵇ f(x)dx. The total area under the entire curve must equal 1, but the height f(x) at any point can exceed 1.
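A small example shows a density exceeding 1 while the total area stays at 1. The sketch uses a Uniform(0, 0.5) variable, chosen only because its density is a constant 2, and a simple midpoint-rule integrator; both helper names are illustrative.

```python
def uniform_pdf(x: float, a: float = 0.0, b: float = 0.5) -> float:
    """Density of Uniform(a, b): 1/(b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def integrate(f, lo: float, hi: float, steps: int = 10_000) -> float:
    """Approximate the area under f between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


height = uniform_pdf(0.25)                    # density value, larger than 1
total_area = integrate(uniform_pdf, 0.0, 0.5)  # total probability
prob = integrate(uniform_pdf, 0.1, 0.3)        # P(0.1 <= X <= 0.3)

print(f"f(0.25) = {height}")
print(f"total area = {total_area:.6f}")
print(f"P(0.1 <= X <= 0.3) = {prob:.6f}")
```

The height of 2 is not a probability; only the areas (1 for the whole curve, 0.4 for the interval) are.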
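Worked example: adult height H ~ N(μ = 165 cm, σ = 8 cm).
f(h) = (1/(8√(2π))) × exp[−(h−165)²/128]
P(H > 170) = 1 − Φ((170−165)/8) = 1 − Φ(0.625) ≈ 0.266. About 26.6% of adults are taller than 170 cm.
For the standard normal Z, P(−1 ≤ Z ≤ +1) ≈ 0.683 (68.3%), the famous 68% rule.
μ ± 1σ → 68.3% fall between 157 and 173 cm
μ ± 2σ → 95.4% fall between 149 and 181 cm
μ ± 3σ → 99.7% fall between 141 and 189 cm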
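The height figures can be checked with Python's standard library. The sketch assumes heights follow a normal distribution with mean 165 cm and standard deviation 8 cm, which is what the density formula (165 in the exponent, 128 = 2×8²) implies.

```python
from statistics import NormalDist

# Assumed model: adult height in cm, Normal(mean=165, sd=8).
height = NormalDist(mu=165, sigma=8)

# P(H > 170) is the area to the right of 170.
p_taller_170 = 1 - height.cdf(170)
print(f"P(H > 170) = {p_taller_170:.3f}")

# Coverage of mu +/- k*sigma for k = 1, 2, 3.
coverage = {}
for k in (1, 2, 3):
    lo, hi = 165 - k * 8, 165 + k * 8
    coverage[k] = height.cdf(hi) - height.cdf(lo)
    print(f"{lo} to {hi} cm: {coverage[k]:.3f}")
```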
Cumulative Distribution Function (CDF) 📈
The Story: The Weather Forecaster
A meteorologist studies daily rainfall. Instead of asking "What's the probability of exactly 15mm of rain?", she asks "What is the probability of getting at most 15mm of rain?" This cumulative question — phrased with "at most" or "less than or equal to" — is precisely what the Cumulative Distribution Function answers for any value.
The CDF is defined for every type of random variable, discrete and continuous. It approaches 0 as x → −∞, approaches 1 as x → +∞, and is always non-decreasing. For discrete variables it looks like stairs. For continuous variables it is a continuous curve, often S-shaped. It is arguably the most fundamental function in all of probability theory.
Three Properties of Every CDF
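These properties turn interval probabilities into a simple subtraction: P(a < X ≤ b) = F(b) − F(a). For Ravi's batches of 3 phones (X ~ Binomial(3, 0.2)), F(1) = P(X=0) + P(X=1) = 0.512 + 0.384 = 0.896 and F(3) = 1, so:
P(1 < X ≤ 3) = F(3) − F(1) = 1.000 − 0.896 = 0.104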
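A discrete CDF is just the running total of the PMF. The sketch below builds the staircase for the smartphone example (Binomial with n = 3, p = 0.2) and evaluates the interval probability; the function name `F` mirrors the tutorial's notation, and the list layout is an illustrative choice.

```python
from itertools import accumulate
from math import floor

support = [0, 1, 2, 3]
pmf = [0.512, 0.384, 0.096, 0.008]   # Binomial(3, 0.2) probabilities
cdf_steps = list(accumulate(pmf))     # running totals: F(0), F(1), F(2), F(3)


def F(x: float) -> float:
    """Staircase CDF: P(X <= x) for any real x."""
    if x < support[0]:
        return 0.0
    index = min(floor(x), support[-1])  # flat between integers, flat after 3
    return cdf_steps[index]


p_interval = F(3) - F(1)   # P(1 < X <= 3)
print(f"F(1) = {F(1):.3f}, F(3) = {F(3):.3f}")
print(f"P(1 < X <= 3) = {p_interval:.3f}")
```

Evaluating `F(1.5)` returns the same value as `F(1)`, which is the flat tread of the staircase between two possible values.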
PMF vs PDF vs CDF — The Full Comparison
| Feature | PMF p(x) | PDF f(x) | CDF F(x) |
|---|---|---|---|
| Variable type | Discrete only | Continuous only | Both discrete & continuous |
| Value at a point | P(X=x) — a probability | A density (NOT a probability) | P(X≤x) — always a probability |
| Range of values | 0 ≤ p(x) ≤ 1 | f(x) ≥ 0 (can exceed 1!) | 0 ≤ F(x) ≤ 1 always |
| Sums/integrates to | Σ p(x) = 1 | ∫ f(x)dx = 1 | F(+∞) = 1 |
| Shape | Bar chart (spikes at values) | Smooth curve or flat line | Staircase (discrete) or S-curve |
| Probability of interval | Σ p(x) for x in [a,b] | ∫ₐᵇ f(x)dx (area under curve) | F(b) − F(a) = P(a < X ≤ b) |
| Relationship | CDF = cumulative sum of PMF | CDF = integral of PDF | PDF = derivative of CDF |
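The last row of the table, PDF = derivative of CDF, can be verified numerically. This sketch uses the standard normal distribution as an example and a central finite difference with a step size chosen for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def numerical_derivative(f, x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


points = [-1.0, 0.0, 0.5, 2.0]
gaps = []
for x in points:
    slope_of_cdf = numerical_derivative(z.cdf, x)
    density = z.pdf(x)
    gaps.append(abs(slope_of_cdf - density))
    print(f"x = {x:>4}: dF/dx = {slope_of_cdf:.6f}, f(x) = {density:.6f}")
```

At every point the slope of the CDF matches the height of the PDF up to numerical error.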
Common Discrete Distributions & Their PMFs 🎰
| Distribution | PMF p(x) | Parameters | Mean | Use Case |
|---|---|---|---|---|
| Bernoulli | p if x=1; (1−p) if x=0 | p ∈ [0,1] | p | Single trial: click/no click |
| Binomial B(n,p) | C(n,k)pᵏ(1−p)ⁿ⁻ᵏ | n trials, p success prob | np | n trials, count successes |
| Poisson(λ) | e⁻λ λᵏ / k! | λ = average rate | λ | Count events per unit time/area |
| Geometric(p) | (1−p)ᵏ⁻¹ p | p = success probability | 1/p | Trials until first success |
| Uniform(a,b) discrete | 1/(b−a+1) | a = min, b = max | (a+b)/2 | Fair die, lottery |
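The Mean column can be checked directly from a PMF using E[X] = Σ x·p(x). The sketch does this for the Geometric row with an assumed success probability p = 0.25, truncating the infinite sum where the remaining terms are negligible.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): first success occurs on trial k (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p


p = 0.25               # assumed success probability (illustrative)
support = range(1, 400)  # truncation of the infinite support

total = sum(geometric_pmf(k, p) for k in support)
mean = sum(k * geometric_pmf(k, p) for k in support)

print(f"sum of PMF = {total:.6f}")
print(f"E[X] = {mean:.6f}  (table formula 1/p = {1 / p})")
```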
Common Continuous Distributions & Their PDFs 🌊
| Distribution | PDF f(x) | Range | Mean | Use Case |
|---|---|---|---|---|
| Uniform(a,b) | 1/(b−a) | [a, b] | (a+b)/2 | Equal likelihood over interval |
| Normal N(μ,σ²) | (1/(σ√(2π)))e^(−(x−μ)²/(2σ²)) | (−∞, +∞) | μ | Heights, errors, test scores |
| Exponential(λ) | λe^(−λx) | [0, +∞) | 1/λ | Time between events, survival |
| Beta(α,β) | x^(α−1)(1−x)^(β−1)/B(α,β) | [0, 1] | α/(α+β) | Probabilities, proportions |
| Gamma(α,β) | x^(α−1)e^(−x/β)/(βᵅΓ(α)) | [0, +∞) | αβ | Waiting times, insurance claims |
| Log-Normal | (1/(xσ√(2π)))e^(−(ln x−μ)²/(2σ²)) | (0, +∞) | e^(μ+σ²/2) | Stock prices, income, file sizes |
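The Gamma row uses the scale parametrisation, where β multiplies the mean rather than dividing it. A numerical check makes this visible: with assumed values α = 3 and β = 2 (chosen only for illustration), the density should integrate to 1 and have mean αβ = 6. The midpoint integrator is a simple illustrative helper.

```python
from math import exp, gamma


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and scale beta, as written in the table."""
    return x ** (alpha - 1) * exp(-x / beta) / (beta**alpha * gamma(alpha))


def integrate(f, lo: float, hi: float, steps: int = 200_000) -> float:
    """Approximate the area under f between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


alpha, beta = 3.0, 2.0   # assumed shape and scale (illustrative)
upper = 200.0            # the tail beyond this point is negligible here

area = integrate(lambda x: gamma_pdf(x, alpha, beta), 0.0, upper)
mean = integrate(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, upper)

print(f"area under PDF = {area:.5f}")
print(f"mean = {mean:.5f}  (table formula alpha*beta = {alpha * beta})")
```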
Using the CDF — Practical Calculations 🔧
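Worked example: exam scores X ~ N(μ = 70, σ = 10).
P(X ≤ 80) = F(80) = Φ(1) ≈ 0.841. 84.1% of students score 80 or below.
P(X > 80) = 1 − F(80) ≈ 0.159. 15.9% of students score above 80.
P(60 ≤ X ≤ 80) = F(80) − F(60) = 0.841 − 0.159 = 0.682 (68.2%)
90th percentile: z = 1.282 → x = μ + z×σ = 70 + 1.282×10 = 82.82
90% of students score below 82.82.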
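These CDF calculations map directly onto Python's standard library. The sketch assumes exam scores follow a normal distribution with mean 70 and standard deviation 10, the values used in the percentile formula above; `inv_cdf` is the inverse CDF (quantile function).

```python
from statistics import NormalDist

# Assumed model: exam scores, Normal(mean=70, sd=10).
scores = NormalDist(mu=70, sigma=10)

p_at_most_80 = scores.cdf(80)                     # "at most" question
p_above_80 = 1 - scores.cdf(80)                   # complement
p_between_60_80 = scores.cdf(80) - scores.cdf(60)  # interval: F(b) - F(a)
percentile_90 = scores.inv_cdf(0.90)              # score below which 90% fall

print(f"P(X <= 80)       = {p_at_most_80:.3f}")
print(f"P(X > 80)        = {p_above_80:.3f}")
print(f"P(60 <= X <= 80) = {p_between_60_80:.3f}")
print(f"90th percentile  = {percentile_90:.2f}")
```

The interval probability prints as 0.683 rather than 0.682 because the code subtracts unrounded CDF values.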
Random Variables in Data Science & ML 🤖
| Application | Random Variable | Type | Distribution Used |
|---|---|---|---|
| 🖱️ Ad click prediction | X = 1 (click), 0 (no click) | Discrete | Bernoulli / Binomial |
| 📦 Inventory management | X = daily demand (units) | Discrete | Poisson |
| 📉 Stock price modelling | X = daily log-return | Continuous | Normal (price itself: Log-Normal) |
| ⏱️ Server response time | X = time to respond (ms) | Continuous | Exponential / Gamma |
| 🎯 Bayesian A/B testing | X = true conversion rate | Continuous | Beta |
| 🔤 NLP: word frequency | X = count of word w in doc | Discrete | Multinomial / Poisson |
| 🧬 Genomics: mutations | X = mutations per genome | Discrete | Poisson |
| 🏠 House price prediction | X = log(house price) | Continuous | Normal (log-transformed) |
Many ML loss functions are, in effect, statements about random variables and their distributions, because minimising them is equivalent to maximising a likelihood. Minimising Mean Squared Error corresponds to assuming the target is normally distributed around the prediction with constant variance. Cross-entropy loss (for classification) corresponds to a Bernoulli or categorical distribution. Poisson loss is used for count data. Understanding PMFs and PDFs is the key to understanding why certain loss functions are appropriate for certain problems.
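The link between Mean Squared Error and the normal distribution can be shown in a few lines. The sketch compares MSE with the negative log-likelihood of a Gaussian whose mean is the prediction and whose standard deviation is fixed at 1. The data and the two sets of predictions are made up for illustration.

```python
import math


def mse(targets, predictions):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def gaussian_nll(targets, predictions, sigma=1.0):
    """Negative log-likelihood if each target ~ Normal(prediction, sigma^2)."""
    const = 0.5 * math.log(2 * math.pi * sigma**2)
    return sum(
        const + (t - p) ** 2 / (2 * sigma**2)
        for t, p in zip(targets, predictions)
    )


targets = [3.0, 5.0, 7.5, 10.0]   # made-up observations
model_a = [2.8, 5.3, 7.0, 10.4]   # made-up predictions, close fit
model_b = [4.0, 4.0, 9.0, 8.0]    # made-up predictions, worse fit

n = len(targets)
offset = n * 0.5 * math.log(2 * math.pi)  # part of the NLL that ignores predictions

for name, preds in (("A", model_a), ("B", model_b)):
    print(f"model {name}: MSE = {mse(targets, preds):.4f}, "
          f"Gaussian NLL = {gaussian_nll(targets, preds):.4f}")
```

With σ = 1 the negative log-likelihood equals `offset + n * MSE / 2`, so whichever predictions minimise one also minimise the other.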
The Golden Rules
Random variables give numbers to uncertainty. The discrete/continuous split determines your mathematical language. The PMF assigns a probability to each discrete outcome. The PDF describes the density of probability across a continuous range, with areas as probabilities. The CDF accumulates probability from left to right and answers every "at most" question directly. Together, these four tools give a complete probabilistic description of a real-world quantity, and they underpin statistical models, machine learning algorithms, and data-driven decisions.