Big-O Notation Explained: Time Complexity & Best, Worst, Avg

Section 01

Three Runners, Three Roads

🏃 Real World Analogy

The Race That Explains Everything

Picture three runners who must travel from city A to city B — same destination, same distance, but completely different roads.

Runner 1 gets the clear highway. No traffic, no red lights, straight tarmac. She arrives in record time — every single time, no matter what. This is your best case: conditions conspire perfectly in your algorithm's favour.

Runner 2 gets normal city streets. Some lights, occasional slow drivers, a roundabout or two. She finishes in a reasonable time — not great, not terrible. This is your average case: what you'd expect on a typical day with typical data.

Runner 3 gets rush-hour gridlock. Every junction is jammed, every light is red, a lorry has shed its load on the main road. He crawls in at the very end — exhausted and hours late. This is your worst case: the data that maximally punishes your algorithm.

When computer scientists talk about algorithm efficiency, they're analysing these three runners. Big-O describes Runner 3's ceiling. Big-Ω describes Runner 1's floor. Big-Θ describes what happens on average, every typical run. Knowing all three tells you everything about whether an algorithm will hold up — and where it will break.

Every time you write a loop, call a sort function, or query a database, you're implicitly choosing a road for your data to travel. The notations below give you a precise, mathematical language to describe that journey — and to compare it with any alternative.

💡

Why This Matters in Practice

An algorithm that runs in 10 ms on 100 items might take 10 seconds on 10,000 items if it's O(n²) — or still just 13 ms if it's O(n log n). Understanding asymptotic notation is the difference between code that scales and code that silently destroys your servers.

Section 02

The Three Notations

Each notation answers a different question about your algorithm's performance. Used together they give you a complete picture of how it behaves across all possible inputs.

▲

Big-O — Worst-Case Ceiling

O(n) — upper bound

Big-O gives the maximum time or space an algorithm will ever consume, relative to input size n. It answers: "In the worst possible scenario, how does performance grow?" This is the notation you'll use 90% of the time — interviewers, system designers, and library authors all default to worst-case guarantees.

Use it when: designing systems that must meet SLA guarantees, or choosing between algorithms in a production context.

▼

Big-Ω — Best-Case Floor

Ω(1) — lower bound

Big-Omega gives the minimum time an algorithm will ever take — the absolute best scenario. It answers: "Even under perfect conditions, how fast can this possibly be?" A sort algorithm has Ω(n log n) because you must at least read every element. This is mathematically useful for proving limits.

Use it when: proving that no algorithm in a problem class can do better than a certain bound (theoretical lower bounds).

↔

Big-Θ — Tight / Average Bound

Θ(n log n) — tight bound

Big-Theta is the exact growth rate when best and worst case are the same order of magnitude. It says: "This algorithm always behaves like f(n), neither faster nor slower." Merge sort is Θ(n log n) regardless of input — it always divides and merges. This is the most precise of the three notations.

Use it when: benchmarking real-world performance or analysing algorithms with consistent behaviour across all inputs.

📖

Quick Memory Aid

O = "Oh no, the worst" · Ω = "At best, omega fast" · Θ = "THeta = THe exact truth". When you can only remember one, it's almost always Big-O that people want.

Section 03

Asymptotic Notation — The Formal Theory

📚 From the Textbook

Why We Squint at the Horizon

The three runners gave us intuition. Cormen (CLRS, Chapter 3) gives us the mathematics. The word asymptotic means “as n approaches infinity” — we deliberately stop caring about small inputs and squint at the far horizon, where only the order of growth survives.

Why throw away detail? Because on small inputs everything is fast — a clumsy O(n²) sort beats a clever O(n log n) sort when n is 5, thanks to smaller constants. The interesting question is what happens when n grows large, and there the constant factors and lower-order terms become irrelevant noise. Asymptotic notation is the precise language for describing that horizon — and every definition below is built from the same three ingredients: two functions, constant multipliers, and a threshold n₀ beyond which the relationship is guaranteed to hold.

📐

The One Idea Behind All Five Notations

Every asymptotic bound says the same kind of thing: “for inputs at least as big as some threshold n₀, function f is hemmed in by a constant multiple of function g.” What changes between O, Ω, Θ, o and ω is only which side g hems f in from, and whether the bound is tight or strict.

Θ-Notation — Asymptotically Tight Bound

Θ is the central notation in CLRS. It is a two-sided bound: f(n) is squeezed between two constant multiples of g(n) once n is large enough. Formally, Θ(g(n)) is a set of functions:

Formal Definition (CLRS)

Θ(g(n)) = { f(n) : ∃ c₁, c₂, n₀ > 0 such that
0 ≤ c₁·g(n) ≤ f(n) ≤ c₂·g(n) for all n ≥ n₀ }

Read aloud: “there exist positive constants c₁, c₂ and a threshold n₀ so that, for every input at least n₀, f(n) sits in the band between c₁·g(n) and c₂·g(n).” We write f(n) = Θ(g(n)), and say g(n) is an asymptotically tight bound for f(n).

🧮 Animated — The Θ “Sandwich” (f hemmed in on both sides)

For every n to the right of n₀, the blue f(n) stays trapped between the green lower bound and the red upper bound — that is exactly what f(n) = Θ(g(n)) means.

O-Notation & Ω-Notation — The One-Sided Bounds

Drop one side of the sandwich and you get the one-sided bounds. O keeps only the ceiling; Ω keeps only the floor.

O — Asymptotic Upper Bound

O(g(n)) = { f(n) : ∃ c, n₀ > 0
with 0 ≤ f(n) ≤ c·g(n), ∀ n ≥ n₀ }

f grows no faster than g. The ceiling only — f could be much smaller.

Ω — Asymptotic Lower Bound

Ω(g(n)) = { f(n) : ∃ c, n₀ > 0
with 0 ≤ c·g(n) ≤ f(n), ∀ n ≥ n₀ }

f grows no slower than g. The floor only — f could be much larger.

📍 Animated — One-Sided Bounds: O (ceiling) vs Ω (floor)

Same threshold idea, half the sandwich: O traps f from above, Ω traps f from below. Beyond n₀ the guarantee holds; before it, anything goes.

🔗

Theorem 3.1 — The Link Between All Three

For any two functions f(n) and g(n), we have f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)). In words: a tight bound is exactly an upper bound and a lower bound holding simultaneously. This is why proving Θ usually means proving O and Ω separately, then stitching them together.

A Worked Proof — ½n² − 3n = Θ(n²)

This is the canonical CLRS example. To prove a Θ bound we must produce actual constants c₁, c₂ and a threshold n₀ that satisfy the definition. We need: c₁n² ≤ ½n² − 3n ≤ c₂n² for all n ≥ n₀.

🧾 Deriving the Constants Step by Step

Step 1

Divide every term by n² (valid since n > 0): c₁ ≤ ½ − 3/n ≤ c₂.

Step 2

Right inequality: ½ − 3/n is always ≤ ½, so choosing c₂ = ½ works for every n ≥ 1.

Step 3

Left inequality: we need ½ − 3/n to be a positive constant. At n = 7, ½ − 3/7 = 1/14, and it only grows for larger n.

Step 4

So pick c₁ = 1/14 and threshold n₀ = 7. Both inequalities now hold for all n ≥ 7.

Done

With c₁ = 1/14, c₂ = ½, n₀ = 7 the definition is satisfied → ½n² − 3n = Θ(n²). ✓

💡

The Shortcut Everyone Uses

In practice you rarely grind through constants. The lower-order −3n term and the ½ coefficient both vanish asymptotically, leaving n². The same reasoning shows any polynomial a_dn^d + … + a₁n + a₀ with a_d > 0 is Θ(n^d) — keep the highest-degree term, drop its coefficient.

little-o and little-ω — The Strict (Non-Tight) Bounds

O and Ω allow the bound to be tight (f can equal a constant multiple of g). The lowercase cousins forbid that: they describe bounds that are strictly looser, the way < differs from ≤.

∘

little-o — Strict Upper Bound

f(n) = o(g(n))

For every constant c > 0 there is an n₀ such that 0 ≤ f(n) < c·g(n) for all n ≥ n₀. f becomes negligible relative to g. Equivalently lim_n→∞ f(n)/g(n) = 0. Example: 2n = o(n²).

little-ω — Strict Lower Bound

f(n) = ω(g(n))

For every constant c > 0 there is an n₀ such that 0 ≤ c·g(n) < f(n) for all n ≥ n₀. f dominates g without bound. Equivalently lim_n→∞ f(n)/g(n) = ∞. Example: n²/2 = ω(n).

⚖️

The Key Distinction

≤ vs <

O and Ω are like ≤ and ≥ — the bound may be reached. o and ω are like < and > — the bound is never reached, the gap keeps widening. So n² = O(n²) is true, but n² = o(n²) is false.

The Real-Number Analogy

Cormen draws a memorable parallel: comparing the growth rates of two functions is like comparing two real numbers a and b. This single table makes all five notations click.

Asymptotic Relation	Meaning	Number Analogy	Tight?
`f(n) = O(g(n))`	f grows no faster than g	a ≤ b	may be tight
`f(n) = Ω(g(n))`	f grows no slower than g	a ≥ b	may be tight
`f(n) = Θ(g(n))`	f grows at the same rate as g	a = b	always tight
`f(n) = o(g(n))`	f grows strictly slower than g	a < b	never tight
`f(n) = ω(g(n))`	f grows strictly faster than g	a > b	never tight

Formal Properties of Asymptotic Notation

These properties (CLRS §3.1) let you manipulate bounds algebraically. Note which ones each notation obeys — the gaps are as instructive as the entries.

Property	Statement	Holds For
Transitivity	f = O(g) and g = O(h) ⇒ f = O(h)	O, Ω, Θ, o, ω
Reflexivity	f(n) = O(f(n))	O, Ω, Θ only
Symmetry	f = Θ(g) ⇔ g = Θ(f)	Θ only
Transpose symmetry	f = O(g) ⇔ g = Ω(f); f = o(g) ⇔ g = ω(f)	O↔Ω, o↔ω

⚠️

Two Traps Cormen Warns About

1. The “=” sign is one-directional. Writing f(n) = O(g(n)) really means f(n) ∈ O(g(n)) — f belongs to the set of functions bounded by g. You can never flip it: O(g(n)) = f(n) is meaningless.

2. Not every pair is comparable. Unlike real numbers, asymptotic notation has no trichotomy. Two functions can fail to satisfy any of O, Ω, Θ with respect to each other — for example n versus n^{1 + sin n}, whose exponent oscillates forever. Real numbers always satisfy exactly one of <, =, >; growth rates do not.

Section 04

Growth Curves — Animated Diagram

Watch how each complexity class explodes (or barely grows) as input size n increases from 1 to 20. The vertical axis represents relative time units. Even at small n, the difference between O(n²) and O(n log n) is dramatic.

📈 Complexity Growth — n = 1 to 20

Speed

▬ O(1) ▬ O(log n) ▬ O(n) ▬ O(n log n) ▬ O(n²)

Values are scaled logarithmically to keep O(n²) visible alongside O(1). Real growth rates are far more extreme.

Section 05

Common Data Structure Operations

The table below shows how the three complexity cases map to real operations across the most common data structures. Use it as a quick reference when choosing a structure for a specific access pattern.

Data Structure	Operation	Best Case	Average Case	Worst Case	Space
Array	Access by index	O(1)	O(1)	O(1)	O(n)
	Search (unsorted)	O(1)	O(n)	O(n)	—
	Insert / Delete (end)	O(1)	O(1)	O(1)	—
	Insert / Delete (middle)	O(n)	O(n)	O(n)	—
Linked List	Access by index	O(1)	O(n)	O(n)	O(n)
	Insert / Delete (head)	O(1)	O(1)	O(1)	—
	Search	O(1)	O(n)	O(n)	—
Stack	Push / Pop / Peek	O(1)	O(1)	O(1)	O(n)
	Search	O(1)	O(n)	O(n)	—
Queue	Enqueue / Dequeue	O(1)	O(1)	O(1)	O(n)
	Search	O(1)	O(n)	O(n)	—
Hash Table	Insert / Search / Delete	O(1)	O(1)	O(n)	O(n)
Binary Search Tree	Insert / Search / Delete	O(1)	O(log n)	O(n)	O(n)

⚠️

BST Worst Case = Sorted Input

An unbalanced BST degrades to O(n) when data is inserted in sorted order — it becomes a linked list. Always use a self-balancing variant (AVL tree, Red-Black tree) when input order is unpredictable. Balanced BSTs guarantee O(log n) for all three operations.

Section 06

Real-World Impact: O(n²) vs O(n log n)

Abstract notation becomes concrete when you count actual steps. Sorting 1,000 items with bubble sort versus merge sort reveals a gap that grows catastrophically with scale.

🔴 Naïve Approach — Bubble Sort — O(n²)

Metric	Value
Input size (n)	1,000 items
Algorithm	Bubble Sort
Complexity	O(n²)
Comparisons	~499,500
Swaps (worst)	~499,500
Total operations	~999,000
At n = 10,000	~100,000,000
At n = 100,000	~10,000,000,000
Practical verdict	Unusable at scale

🟢 Optimised — Merge Sort — O(n log n)

Metric	Value
Input size (n)	1,000 items
Algorithm	Merge Sort
Complexity	O(n log n)
Comparisons	~9,965
Merge passes	~10
Total operations	~9,965
At n = 10,000	~132,877
At n = 100,000	~1,660,964
Practical verdict	Scales to millions

📊 The Numbers at a Glance

n=1,000

Bubble Sort: ~999,000 ops · Merge Sort: ~9,965 ops · Merge Sort is ~100× faster

n=10,000

Bubble Sort: ~100M ops · Merge Sort: ~133K ops · Merge Sort is ~750× faster

n=100,000

Bubble Sort: ~10B ops · Merge Sort: ~1.66M ops · Merge Sort is ~6,000× faster

Section 07

Python Code — Annotated Examples

Linear Search — O(n) worst, O(1) best

def linear_search(arr, target):
    # Best case: target is first element → O(1)
    # Average case: target is in the middle → O(n/2) = O(n)
    # Worst case: target is last or absent → O(n)
    for i, val in enumerate(arr):
        if val == target:
            return i      # Found — stops early in best/average case
    return -1            # Not found — always scans every element

nums = [4, 7, 2, 9, 1, 5, 8, 3, 6]
print(linear_search(nums, 4))   # Best case: found at index 0 → 1 step
print(linear_search(nums, 6))   # Worst case: found at index 8 → 9 steps
print(linear_search(nums, 99))  # Miss: scans all 9 elements → 9 steps

OUTPUT

0 8 -1

Binary Search — O(log n) worst, O(1) best

def binary_search(arr, target):
    # REQUIRES: arr must be sorted
    # Best case: target is the middle element → O(1)
    # Worst case: target is absent or at the edges → O(log n)
    left, right = 0, len(arr) - 1

    while left <= right:
        mid = (left + right) // 2   # Integer division — halves the search space
        if arr[mid] == target:
            return mid              # Found
        elif arr[mid] < target:
            left = mid + 1          # Discard left half
        else:
            right = mid - 1         # Discard right half

    return -1                       # Target not found

sorted_nums = [1, 3, 5, 7, 9, 11, 13, 15, 17]
# n = 9 → log₂(9) ≈ 3.17 → at most 4 iterations
print(binary_search(sorted_nums, 9))   # Best: hits middle (index 4) instantly
print(binary_search(sorted_nums, 1))   # Worst: 4 iterations to reach left edge
print(binary_search(sorted_nums, 99))  # Miss: 4 iterations, exhausts range

OUTPUT

4 0 -1

Bubble Sort — O(n²)

def bubble_sort(arr):
    # O(n²) — two nested loops, each up to n iterations
    # Best case O(n) with early exit when already sorted
    n = len(arr)
    for i in range(n):
        swapped = False
        for j in range(0, n - i - 1):   # Each pass bubbles largest to end
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break                         # Early exit: already sorted
    return arr

data = [64, 34, 25, 12, 22, 11, 90]
print(bubble_sort(data))  # ~21 comparisons for n=7

OUTPUT

[11, 12, 22, 25, 34, 64, 90]

Merge Sort — O(n log n)

def merge_sort(arr):
    # O(n log n) — divides in half log n times, merges in O(n) each time
    # Θ(n log n): same complexity for ALL inputs — no lucky case
    if len(arr) <= 1:
        return arr

    mid = len(arr) // 2
    left  = merge_sort(arr[:mid])   # Recurse left half
    right = merge_sort(arr[mid:])   # Recurse right half

    return _merge(left, right)

def _merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    return result + left[i:] + right[j:]

data = [64, 34, 25, 12, 22, 11, 90]
print(merge_sort(data))  # ~17 comparisons for n=7 vs bubble's ~21

OUTPUT

[11, 12, 22, 25, 34, 64, 90]

Section 08

The Complexity Ladder

These classes form a hierarchy from constant time bliss to exponential despair. Each rung of the ladder tells you what a million-item input actually means in practice.

O(1) — Constant Time

Real DS example: Hash table lookup by key, array access by index, stack push/pop.
At n = 1,000,000: Still exactly 1 operation. Doesn't matter if you have 10 or 10 billion items — the time is identical. This is the holy grail.

O(log n) — Logarithmic

Real DS example: Binary search, balanced BST lookup, heap insert/extract.
At n = 1,000,000: About 20 operations (log₂ 1,000,000 ≈ 20). Feels instant — even if you double your data, you only add one more step.

O(n) — Linear

Real DS example: Linear search, linked list traversal, summing an array, copying a list.
At n = 1,000,000: One million operations. Perfectly acceptable for a single scan — your bottleneck is usually I/O, not the algorithm.

O(n log n) — Linearithmic

Real DS example: Merge sort, heapsort, Timsort (Python's default sort), FFT.
At n = 1,000,000: About 20 million operations. The sweet spot for sorting — fast enough for most real systems, and provably optimal for comparison-based sorts.

O(n²) — Quadratic

Real DS example: Bubble sort, insertion sort, naive string matching, comparing all pairs.
At n = 1,000,000: One trillion operations. At modern CPU speeds (~10⁹ ops/sec), that's ~17 minutes. Completely infeasible for large inputs. Acceptable only for n < 10,000.

O(2ⁿ) — Exponential

Real DS example: Naive recursive Fibonacci, brute-force subset enumeration, travelling salesman (brute force).
At n = 1,000,000: 2^1,000,000 operations — a number with 300,000 digits. Not even theoretically computable. Avoid without memoisation or dynamic programming.

Section 09

When Each Notation Matters

Knowing the notations is half the job. Knowing when to reach for each is what separates a practitioner from someone who's just memorised definitions.

🔌

Choosing an Algorithm

Use Big-O. When selecting between two algorithms for production, you care about the worst your system will ever face. Big-O guarantees that upper bound regardless of input shape.

→ Big-O / Worst Case

📋

Proving a Lower Bound

Use Big-Ω. When you want to prove that no algorithm in a problem class can possibly do better (e.g. Ω(n log n) for comparison sorts), Big-Omega is your tool.

→ Big-Ω / Best Case

📈

Benchmarking Real Performance

Use Big-Θ. When best and worst case match (like merge sort), Theta gives you the most honest characterisation for performance prediction, profiling, and capacity planning.

→ Big-Θ / Tight Bound

🎯

Interview Answers

Always state worst case first. Interviewers assume Big-O unless otherwise specified. Lead with O(n²), then optionally add "with early exit, Ω(n) in the best case." Never flip the order.

→ Big-O by default

💾

Space Complexity

Often ignored — just as important. Merge sort is O(n log n) time but O(n) space (auxiliary array). Recursive algorithms add O(n) or O(log n) stack frames. Always analyse both dimensions separately.

→ Big-O applies to space too

⚙️

Amortised Analysis

Use when single ops are occasionally expensive but the average over many ops is cheap. Dynamic array append is O(1) amortised — most appends are instant, but periodic resizing is O(n). Spread the cost across all operations.

→ Amortised Big-O

Section 10

The Golden Rules

These seven rules cover the most common mistakes in complexity analysis. Internalise them and your Big-O derivations will rarely be wrong.

★ Big-O, Big-Ω, Big-Θ — Non-Negotiable Rules

Always state worst case unless told otherwise. When someone asks "what's the complexity?", they mean Big-O worst case. If you mean best case, explicitly say "Ω(1) in the best case when the element is first."

Drop constants — O(2n) = O(n). Big-O describes growth rate as n → ∞. Multiplicative constants don't change the shape of the curve — O(2n) and O(n) are both linear. Same for O(n/2), O(100n), etc.

Drop lower-order terms — O(n² + n) = O(n²). At large n, the n² term completely dominates. The n term becomes negligible. O(n³ + n² + n + 1) simplifies to just O(n³). Keep only the fastest-growing term.

Different inputs = different variables: O(a + b), not O(n). If a function iterates over array A then array B, the complexity is O(a + b) where a = len(A) and b = len(B). Using n for both assumes they're the same size — a dangerous and common mistake.

Nested loops multiply: two nested O(n) loops = O(n²). An outer loop running n times containing an inner loop running n times = n × n operations. Three nested loops = O(n³). Sequential (non-nested) loops of the same size add: O(n) + O(n) = O(n).

Space complexity counts too — recursive calls use stack space. A recursive function that calls itself n times has O(n) space complexity from the call stack alone, even if each frame uses O(1) memory. Binary search recursion uses O(log n) space. Always report both time and space.

Best case is rarely useful — systems are designed for the worst. Saying "this is O(1) in the best case" doesn't help you plan infrastructure. A database query that's O(1) for cached results and O(n) for cold reads must be designed around the O(n) path. Best case is a curiosity; worst case is an engineering constraint.

✅

You're Ready

With Big-O, Big-Ω, and Big-Θ in your toolkit — plus the seven golden rules — you can analyse any algorithm, communicate its performance precisely, and make principled decisions about which data structure or algorithm to reach for in any situation. Practice by analysing every loop you write. It becomes instinctive within weeks.