The Story That Explains Bucket Sort
Once every letter is in a slot, each slot holds only a small, similar pile. The clerk does a quick tidy-up within each slot, then walks the wall left-to-right and scoops the slots back into one neat, fully-ordered bundle.
That is Bucket Sort exactly: scatter items into buckets by value, sort each small bucket, then gather them back in order.
Bucket Sort is a distribution sort, not a comparison sort. Instead of repeatedly comparing pairs (like Bubble or Quick Sort), it uses the value itself to decide where an element goes. When data is spread fairly evenly across a range, this can sort in near linear time — faster than the O(n log n) comparison-sort barrier.
Split the range into buckets, drop each value into the bucket that covers its range, sort the (now small) buckets cheaply, then concatenate the buckets in order. The magic is that placing a value into a bucket costs no comparisons at all — just arithmetic.
Each value drops straight into its bucket by value — no comparisons needed.
The Four Phases
For values uniformly distributed in [0, 1) with n buckets,
the bucket for a value x is int(n × x). Example with n = 8:
0.12 → int(8×0.12)=0, 0.42 → int(8×0.42)=3,
0.91 → int(8×0.91)=7. For a general range, first scale:
int(n × (x − min) / (max − min)).
Hands-On: Watch Each Line Run
The heart of the tutorial. Press Play (or Step) and watch the three phases unfold: values leave the input row and drop into buckets, each bucket gets sorted, then everything is gathered into the output row. At every moment the exact Python line doing that work lights up on the right.
def bucket_sort(arr):
n = len(arr)
buckets = [[] for _ in range(n)]
for x in arr:
idx = int(n * x)
buckets[idx].append(x)
for b in buckets:
b.sort()
result = []
for b in buckets:
result.extend(b)
return result
Amber = the value or bucket being touched right now. Green = sorted / gathered into the final result. Indigo = still waiting. The amber code line is the instruction the interpreter is executing at that instant.
The Code, Line by Line — "This Line Does That"
Same program as the animation, now fully annotated. Each line maps to exactly one job the visualiser highlights.
def bucket_sort(arr):
n = len(arr) # how many values (and buckets)
buckets = [[] for _ in range(n)] # 1) CREATE n empty buckets
for x in arr: # 2) SCATTER: visit each value
idx = int(n * x) # pick its bucket by value
buckets[idx].append(x) # drop x into that bucket
for b in buckets: # 3) SORT every bucket
b.sort() # small bucket -> cheap sort
result = [] # 4) GATHER into one list
for b in buckets: # walk buckets left -> right
result.extend(b) # append bucket contents
return result # fully sorted output
def bucket_sort(arr) — defines the function and receives the list of values in [0, 1).
n = len(arr) — counts the values; we'll also use it as the bucket count.
buckets = [[] for _ in range(n)] — Phase 1: build n empty lists.
for x in arr — Phase 2 begins: loop over every input value.
idx = int(n * x) — compute the destination bucket from the value itself. No comparison.
buckets[idx].append(x) — drop the value into its bucket.
for b in buckets — Phase 3: iterate over each bucket.
b.sort() — sort that bucket in place. Buckets are tiny, so this is cheap.
result = [] — Phase 4: start an empty output list.
for b in buckets — walk the buckets in index order (smallest range first).
result.extend(b) — append the bucket's sorted contents to the result.
return result — hand back the fully sorted list.
A Full Worked Example
Let's scatter [0.42, 0.37, 0.73, 0.12, 0.91, 0.32, 0.55, 0.78] into
n = 8 buckets, using idx = int(8 × value).
| Value | int(8 × value) | Goes to bucket |
|---|---|---|
| 0.42 | int(3.36) | 3 |
| 0.37 | int(2.96) | 2 |
| 0.73 | int(5.84) | 5 |
| 0.12 | int(0.96) | 0 |
| 0.91 | int(7.28) | 7 |
| 0.32 | int(2.56) | 2 (joins 0.37) |
| 0.55 | int(4.40) | 4 |
| 0.78 | int(6.24) | 6 |
| Bucket | After Scatter | After Sort |
|---|---|---|
| 0 | [0.12] | [0.12] |
| 1 | [ ] empty | [ ] |
| 2 | [0.37, 0.32] | [0.32, 0.37] |
| 3 | [0.42] | [0.42] |
| 4 | [0.55] | [0.55] |
| 5 | [0.73] | [0.73] |
| 6 | [0.78] | [0.78] |
| 7 | [0.91] | [0.91] |
Walking buckets 0→7 and concatenating gives [0.12, 0.32, 0.37, 0.42, 0.55, 0.73, 0.78, 0.91] — fully sorted. Only bucket 2 ever needed an internal sort, because every other bucket held at most one value.
Why Distribution Matters: Uniform vs Skewed
Bucket Sort's speed depends entirely on how evenly values spread across the buckets. Spread them out and it flies; pile them into one bucket and it collapses into whatever sort you used inside.
| Bucket | Items |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| ... | ~1 each |
| Result | O(n + k) |
| Bucket | Items |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 2 | n (all here!) |
| ... | 0 |
| Result | O(n²) |
If most values land in one bucket, that bucket's inner sort does all the work and you inherit its complexity — typically O(n²) with insertion sort. Bucket Sort assumes (or engineers) a roughly uniform distribution. It is not a general-purpose sorter for arbitrary data.
Time & Space Complexity
| Case | When | Complexity |
|---|---|---|
| Best | Values uniform, ~1 per bucket | O(n + k) |
| Average | Reasonably uniform input | O(n + k) |
| Worst | All values in one bucket | O(n²) |
| Space | Buckets + output | O(n + k) |
| Stable? | If inner sort is stable | Yes ✓ |
| In-place? | Needs extra bucket storage | No |
Comparison sorts can't do better than O(n log n) in general. Bucket Sort sidesteps that limit by not comparing during distribution — here k is the number of buckets. With k ≈ n and uniform data, the total is effectively O(n). The trade-off is the extra memory for buckets and the uniform-distribution assumption.
Benefits
When To Use It — and When Not To
sorted() — Python's Timsort is robust and fast without assumptions.Bucket Sort vs Other Sorts
| Algorithm | Type | Average | Worst | Space | Stable |
|---|---|---|---|---|---|
| Bucket Sort | Distribution | O(n + k) | O(n²) | O(n + k) | Yes* |
| Counting Sort | Distribution | O(n + k) | O(n + k) | O(k) | Yes |
| Radix Sort | Distribution | O(d(n + k)) | O(d(n + k)) | O(n + k) | Yes |
| Merge Sort | Comparison | O(n log n) | O(n log n) | O(n) | Yes |
| Quick Sort | Comparison | O(n log n) | O(n²) | O(log n) | No |
*Bucket Sort is stable only when the inner bucket sort is stable. k = number of buckets; d = number of digits.
Bucket, Counting, and Radix Sort are cousins — all avoid comparisons by using values as addresses. Counting Sort suits small integer ranges, Radix Sort sorts digit by digit, and Bucket Sort generalises to real-valued ranges. Radix Sort often uses Counting Sort as its per-digit step, just as Bucket Sort uses a small sort per bucket.
Handy Variation — Any Numeric Range
The basic version assumes values in [0, 1). To sort any range, scale each value into that interval first.
def bucket_sort_range(arr, k=10):
if not arr:
return arr
lo, hi = min(arr), max(arr)
if lo == hi:
return arr[:] # all equal -> nothing to do
buckets = [[] for _ in range(k)]
span = (hi - lo) / k
for x in arr:
idx = int((x - lo) / span)
if idx == k: # the max value lands one past the end
idx -= 1
buckets[idx].append(x)
out = []
for b in buckets:
out.extend(sorted(b)) # sort + gather in one step
return out
Golden Rules
int(n × x) for [0, 1), or
int(k × (x − min)/(max − min)) for a general range. Always guard the
max value so it doesn't land in an out-of-range bucket.
.sort() (Timsort) is a safe default.
O(n + k) extra memory for the buckets and
the output list.