The Story That Explains B-Trees
Option A — Binary Search Tree approach: One giant alphabetical list on a single long scroll. To find any book, you walk half the scroll, then half again… It works, but each "fold" requires a separate trip to a different storage room. With 10 million books, that's up to 23 trips.
Option B — B-Tree approach: A hierarchical cabinet system. The top cabinet has 500 labels (A–Z plus ranges). Each drawer has sub-labels. You navigate 2–3 cabinets and you're done — regardless of whether the library has 10,000 or 10 billion books.
That's the magic of B-Trees. Each node holds many keys, so the tree stays very shallow even with billions of records. Fewer levels = fewer disk reads = blazing speed.
A B-Tree (Balanced Tree) is a self-balancing search tree designed specifically for systems that read and write large blocks of data — like databases and file systems. Unlike a binary tree where each node has at most 2 children, a B-Tree node can have hundreds of children, which keeps the tree height tiny.
Disk I/O is the bottleneck in databases — reading one value from disk takes ~10ms, but reading 1000 sequential values from the same disk block takes almost the same time. B-Trees exploit this by packing many keys into one node (one disk block), minimising the number of disk reads to find any record.
B-Tree vs Binary Search Tree — The Key Difference
| Property | Value |
|---|---|
| Keys per node | 1 |
| Children per node | 0 or 2 |
| Height (1M records) | ~20 levels |
| Disk reads per lookup | ~20 |
| Balance guarantee | ❌ No (standard) |
| Property | Value |
|---|---|
| Keys per node | 1 to 199 |
| Children per node | 2 to 200 |
| Height (1M records) | 3–4 levels |
| Disk reads per lookup | 3–4 |
| Balance guarantee | ✅ Always |
B-Tree Structure — Anatomy
▲ Each node appears with a fade-in animation. Yellow = root, Blue = internal, Green = leaf. All leaf nodes sit at the same depth — this is the balance guarantee.
Operation 1 — Search
▲ Watch the search path animate: root → internal node (35,44) → leaf with 40. Only 3 disk reads total.
Python Code — B-Tree Search
class BTreeNode:
def __init__(self, leaf=False):
self.keys = [] # sorted list of keys in this node
self.children = [] # child node references (len = len(keys)+1 for internal)
self.leaf = leaf # True if this is a leaf node
class BTree:
def __init__(self, t):
self.root = BTreeNode(leaf=True) # tree starts as single empty leaf
self.t = t # minimum degree: each node has t-1..2t-1 keys
# ─── SEARCH ───────────────────────────────────────────────
def search(self, node, k):
"""Return (node, index) if k found, else None."""
i = 0
while i < len(node.keys) and k > node.keys[i]:
i += 1 # advance past keys smaller than k
if i < len(node.keys) and node.keys[i] == k:
return (node, i) # ✅ key found at index i
if node.leaf:
return None # ❌ leaf reached, key not in tree
return self.search(node.children[i], k) # recurse into child[i]
Operation 2 — Insert
Python Code — B-Tree Insert with Split
# ─── INSERT PUBLIC ENTRY POINT ───────────────────────────
def insert(self, k):
root = self.root
if len(root.keys) == (2 * self.t - 1): # root is FULL → must split first
new_root = BTreeNode()
new_root.children = [self.root] # old root becomes child
self._split_child(new_root, 0) # split old root, promote median
self.root = new_root # tree grows one level taller
self._insert_non_full(self.root, k)
# ─── INSERT INTO A GUARANTEED NON-FULL NODE ──────────────
def _insert_non_full(self, node, k):
i = len(node.keys) - 1 # start from rightmost key
if node.leaf:
node.keys.append(None) # make room
while i >= 0 and k < node.keys[i]:
node.keys[i + 1] = node.keys[i] # shift right to keep sorted order
i -= 1
node.keys[i + 1] = k # insert k in correct position
else:
while i >= 0 and k < node.keys[i]:
i -= 1 # find correct child to descend
i += 1
if len(node.children[i].keys) == (2 * self.t - 1):
self._split_child(node, i) # proactively split full child
if k > node.keys[i]:
i += 1 # decide which of the 2 new children to use
self._insert_non_full(node.children[i], k) # recurse down
# ─── SPLIT A FULL CHILD ──────────────────────────────────
def _split_child(self, parent, i):
t = self.t
full = parent.children[i] # the child to split
sibling = BTreeNode(leaf=full.leaf)
# Median key rises to parent
parent.keys.insert(i, full.keys[t - 1]) # promote median
parent.children.insert(i + 1, sibling)
# Right half goes to sibling
sibling.keys = full.keys[t:] # keys after median → new right sibling
full.keys = full.keys[:t - 1] # keys before median stay in full
if not full.leaf: # split children pointers too
sibling.children = full.children[t:]
full.children = full.children[:t]
Classical B-Tree insertion splits on the way back up (after finding the leaf). The proactive approach splits on the way down, so we only need a single pass and never backtrack. This is the algorithm used in real database engines like PostgreSQL and SQLite — it's more I/O efficient.
Operation 3 — Delete
Python Code — B-Tree Delete
# ─── DELETE PUBLIC ENTRY POINT ───────────────────────────
def delete(self, node, k):
t = self.t
i = 0
while i < len(node.keys) and k > node.keys[i]:
i += 1
if i < len(node.keys) and node.keys[i] == k:
# Key found in this node
if node.leaf:
node.keys.pop(i) # Case 1: simply remove from leaf
elif len(node.children[i].keys) >= t:
# Case 2a: replace with in-order predecessor
pred = self._get_pred(node, i)
node.keys[i] = pred
self.delete(node.children[i], pred)
elif len(node.children[i+1].keys) >= t:
# Case 2b: replace with in-order successor
succ = self._get_succ(node, i)
node.keys[i] = succ
self.delete(node.children[i+1], succ)
else:
# Case 2c: both children have t-1 keys → merge
self._merge(node, i)
self.delete(node.children[i], k)
else:
# Key not in this node — go deeper
if node.leaf:
return # key not in tree
if len(node.children[i].keys) < t:
self._fill(node, i) # Case 3: ensure child has ≥ t keys
# After fill, child may have merged — re-check index
if i >= len(node.keys) and i > 0:
self.delete(node.children[i-1], k)
else:
self.delete(node.children[i], k)
def _get_pred(self, node, i):
"""Largest key in left subtree (in-order predecessor)."""
curr = node.children[i]
while not curr.leaf:
curr = curr.children[-1]
return curr.keys[-1]
def _get_succ(self, node, i):
"""Smallest key in right subtree (in-order successor)."""
curr = node.children[i + 1]
while not curr.leaf:
curr = curr.children[0]
return curr.keys[0]
def _merge(self, parent, i):
"""Merge child[i+1] into child[i]; drop key[i] from parent."""
child = parent.children[i]
sibling = parent.children[i + 1]
child.keys.append(parent.keys[i]) # parent key drops down
child.keys.extend(sibling.keys) # sibling keys join child
child.children.extend(sibling.children) # sibling children move too
parent.keys.pop(i) # remove key from parent
parent.children.pop(i + 1) # remove sibling from parent
Complete Interactive Demo — Build a B-Tree Live
The code below builds a B-Tree step by step and shows the tree structure after each insertion. Each line is annotated to show what it is doing:
# ══════════════════════════════════════════════════════════
# FULL B-TREE DEMO — Build, Search, Delete
# ══════════════════════════════════════════════════════════
class BTreeNode:
def __init__(self, leaf=False):
self.keys = [] # keys stored in ascending order
self.children = [] # child pointers (len = len(keys)+1 for internal)
self.leaf = leaf # True for leaf nodes
class BTree:
def __init__(self, t=2): # t=2 → classic 2-3-4 tree (max 3 keys per node)
self.root = BTreeNode(leaf=True)
self.t = t
def insert(self, k):
root = self.root
if len(root.keys) == 2 * self.t - 1: # root full → grow tree
s = BTreeNode()
s.children = [self.root]
self._split_child(s, 0)
self.root = s
self._insert_non_full(self.root, k)
def _insert_non_full(self, node, k):
i = len(node.keys) - 1
if node.leaf:
node.keys.append(None)
while i >= 0 and k < node.keys[i]:
node.keys[i+1] = node.keys[i] # shift right
i -= 1
node.keys[i+1] = k
else:
while i >= 0 and k < node.keys[i]: i -= 1
i += 1
if len(node.children[i].keys) == 2*self.t-1:
self._split_child(node, i)
if k > node.keys[i]: i += 1
self._insert_non_full(node.children[i], k)
def _split_child(self, parent, i):
t = self.t
y = parent.children[i] # full child to split
z = BTreeNode(leaf=y.leaf) # new right sibling
parent.keys.insert(i, y.keys[t-1]) # promote median
parent.children.insert(i+1, z)
z.keys = y.keys[t:] # right half to sibling
y.keys = y.keys[:t-1] # left half stays
if not y.leaf:
z.children = y.children[t:]
y.children = y.children[:t]
def search(self, node, k):
i = 0
while i < len(node.keys) and k > node.keys[i]: i += 1
if i < len(node.keys) and node.keys[i] == k: return (node, i)
if node.leaf: return None
return self.search(node.children[i], k)
def print_tree(self, node=None, indent=0):
if node is None: node = self.root
print(" " * indent + f"[{', '.join(map(str, node.keys))}]")
for child in node.children:
self.print_tree(child, indent + 1)
# ── Build and demonstrate ──────────────────────────────────
bt = BTree(t=2) # minimum degree 2 → each node: 1 to 3 keys
keys = [10, 20, 5, 6, 12, 30, 7, 17]
for k in keys:
bt.insert(k)
print(f"After inserting {k}:")
bt.print_tree()
print()
# Search
result = bt.search(bt.root, 12)
print(f"Search 12: {'FOUND at node ' + str(result[0].keys) if result else 'NOT FOUND'}")
Complexity Summary
| Operation | Time Complexity | Disk I/Os | Space | Notes |
|---|---|---|---|---|
| Search | O(log n) | O(logt n) | O(1) | With t=500 and n=10⁹: only 4–5 I/Os |
| Insert | O(log n) | O(logt n) | O(1) | Splits add a constant overhead per level |
| Delete | O(log n) | O(logt n) | O(1) | Merges/rotations bounded by tree height |
| Space | — | — | O(n) | Each key stored once; up to 50% utilisation |
| In-order traversal | O(n) | O(n/B) | O(h) | h = tree height; B = keys per block |
B-Tree vs B+ Tree vs B* Tree
| Variant | Data in internal nodes? | Leaf linking | Min fill | Best used in |
|---|---|---|---|---|
| B-Tree | ✅ Yes | ❌ No leaf chain | 50% | General key-value stores; Btrfs filesystem |
| B+ Tree | ❌ Internal nodes: keys only | ✅ Leaves linked as a list | 50% | MySQL InnoDB, PostgreSQL — range queries blazing fast |
| B* Tree | ✅ Yes | ❌ No | 67% | Where space utilisation matters — delays splits, redistributes first |
In a B+ Tree, all actual data records live in the leaf nodes. Internal nodes only store separator keys for navigation. Because the leaves are linked in order (leaf₁ → leaf₂ → leaf₃ → …), a range query like WHERE age BETWEEN 20 AND 30 can find the start position in O(log n) and then scan forward linearly — no tree traversal needed for the range portion.
Real-World Use Cases
Benefits of B-Trees
Comparison Table — Data Structures for Search
| Structure | Search | Insert | Delete | Range Query | Disk Friendly | Balanced? |
|---|---|---|---|---|---|---|
| Array (sorted) | O(log n) | O(n) | O(n) | O(log n + k) | Medium | N/A |
| Hash Table | O(1) avg | O(1) avg | O(1) avg | ❌ No | Poor | N/A |
| BST (unbalanced) | O(n) worst | O(n) worst | O(n) worst | O(n) worst | Very Poor | ❌ |
| AVL / Red-Black | O(log n) | O(log n) | O(log n) | O(log n + k) | Poor (1 key/node) | ✅ |
| Skip List | O(log n) avg | O(log n) avg | O(log n) avg | O(log n + k) | Moderate | Probabilistic |
| B-Tree ✅ | O(log n) | O(log n) | O(log n) | O(log n + k) | Excellent | ✅ Always |
| B+ Tree ✅ | O(log n) | O(log n) | O(log n) | O(log n + k)* | Best | ✅ Always |
* B+ Tree range scan: after O(log n) to find start, remaining scan is O(k) — sequential leaf traversal, no tree re-traversal.
Golden Rules for B-Trees
A B-Tree is a self-balancing tree where each node holds many keys and pointers — keeping the tree very short, so any record in a billion-row database can be found with just 3–4 disk reads, and the tree stays perfectly balanced after every insert and delete without any extra effort.