Data Structure 📂 Data Structures (Core) · 1 of 6 63 min read

Arrays — The Foundation of All Data Structures

A complete, beginner-to-intermediate tutorial on arrays covering the cinema-seats mental model, animated interactive visualisation of indexing / insertion / deletion / linear search, Python code with output blocks, time complexity table, static vs dynamic arrays, cache locality, a 4-step memory pipeline, and 6 golden rules.

Section 01

The Story Behind Arrays

A Row of Cinema Seats
Picture a cinema with a single row of numbered seats: Seat 0, Seat 1, Seat 2 … all the way to the last one. Every seat is the same size and they sit shoulder-to-shoulder with no gaps allowed between them.

When you want to find your friend in seat 7, you don't ask the usher to search the whole auditorium — you walk straight to position 7. You know exactly where it is because seats are evenly spaced from the entrance. That instant lookup is the superpower of an array.

But here's the catch: if someone arrives late and wants to sit between seats 3 and 4, every person from seat 4 onward must shuffle one seat to the right to make room. And if someone leaves mid-film, the remaining audience shuffles left to close the gap. That shuffling cost is the price of keeping seats contiguous.

An array is a fixed-size, contiguous block of memory that stores elements of the same type. Each element sits at a predictable offset from the start, so reading any element by its index costs exactly the same time regardless of array size — a property called random access in O(1).

💡
Why Zero-Indexing?

The index is not a count of seats — it is an offset from the first seat. The first element is 0 seats away from the start, so its index is 0. This makes address arithmetic trivial: address = base + index × element_size. Every major language except a handful of legacy ones follows this convention.


Section 02

Interactive Diagram — See Arrays in Motion

Use the controls below to step through four core operations: indexing, insertion, deletion, and linear search. Each frame shows what happens to the cells in memory.

▶ Array Visualiser
Press Play or Step → to begin.
0 / 0
Speed
The Core Trade-off in One Sentence

Arrays give you blazing-fast reads (O(1)) in exchange for slow writes in the middle (O(n)). If your workload is read-heavy and you know the index, arrays are the best possible choice. If you insert and delete frequently in the middle, consider a linked list or a dynamic array with amortised O(1) appends.


Section 03

Time Complexity at a Glance

Operation Time (Best) Time (Worst) In-place? Notes
Read by index O(1) O(1) Yes Direct address arithmetic. Always constant time.
Write by index O(1) O(1) Yes Same as read — one memory write at a computed offset.
Append (end) O(1) O(n)* Varies Amortised O(1) for dynamic arrays (Python list, Java ArrayList). O(n) only on resize.
Insert (middle/start) O(n) O(n) Yes Every element after the insertion point shifts right. Insert at index 0 = worst case.
Delete (middle/start) O(n) O(n) Yes Every element after the deletion point shifts left. Delete at index 0 = worst case.
Delete (end) O(1) O(1) Yes Just decrement the length pointer. No shifting needed.
Search (unsorted) O(1) O(n) Yes Best case: found at index 0. Worst: last element or not present.
Search (sorted) O(1) O(log n) Yes Binary search — halves search space each step. Requires sorted array.
Space usage O(n) O(n) No per-element overhead. Tightest possible memory layout.
⚠️
Amortised vs Worst-Case for Dynamic Arrays

Python's list and Java's ArrayList are dynamic arrays: when full, they allocate a new block roughly 2× the current size and copy everything over. That copy is O(n) — but it happens so rarely (exponentially spaced out) that the amortised cost per append is O(1). This is not magic; it is mathematics. Still, if you're appending 10 million items and want zero pauses, pre-allocate with list = [None] * N upfront.


Section 04

Python Arrays — Code & Output

1 · Creating and Indexing

# Python's built-in list IS a dynamic array under the hood
seats = [12, 7, 45, 3, 28, 16, 9]

# O(1) index reads
print("First element :", seats[0])   # index 0
print("Middle element:", seats[3])   # index 3
print("Last element  :", seats[-1])  # negative index wraps from end

# Slice — creates a new list, O(k) where k = slice length
print("Indices 1–3   :", seats[1:4])  # [7, 45, 3]

# Length — O(1), stored as metadata
print("Length        :", len(seats))
OUTPUT
First element : 12 Middle element: 3 Last element : 9 Indices 1–3 : [7, 45, 3] Length : 7
🐍
Negative Indices Are Pure Syntax Sugar

seats[-1] is equivalent to seats[len(seats) - 1]. Python computes the real index before performing the memory lookup. The cost is still O(1) — just one extra subtraction. There is no backwards scan.

2 · Insertion and Deletion

cinema = [10, 20, 30, 40, 50]

# INSERT at index 2 — O(n): elements 30, 40, 50 shift right
cinema.insert(2, 99)
print("After insert(2, 99):", cinema)

# DELETE by index — O(n): elements after index 2 shift left
del cinema[2]
print("After del [2]      :", cinema)

# APPEND to end — amortised O(1)
cinema.append(60)
print("After append(60)   :", cinema)

# POP from end — O(1): no shifting
last = cinema.pop()
print(f"Popped: {last}, list now:", cinema)
OUTPUT
After insert(2, 99): [10, 20, 99, 30, 40, 50] After del [2] : [10, 20, 30, 40, 50] After append(60) : [10, 20, 30, 40, 50, 60] Popped: 60, list now: [10, 20, 30, 40, 50]

3 · Linear Search vs Binary Search

# ── Linear Search — works on any list, O(n) ──────────────
def linear_search(arr, target):
    for i, val in enumerate(arr):
        if val == target:
            return i
    return -1

# ── Binary Search — sorted array only, O(log n) ──────────
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if   arr[mid] == target: return mid
        elif arr[mid] <  target: lo = mid + 1
        else:                    hi = mid - 1
    return -1

unsorted = [12, 7, 45, 3, 28, 16, 9]
sorted_a = sorted(unsorted)  # [3, 7, 9, 12, 16, 28, 45]

print("Linear search for 28:", linear_search(unsorted, 28))
print("Binary search for 28:", binary_search(sorted_a, 28))
print("Linear search for 99:", linear_search(unsorted, 99))
print("Binary search for 99:", binary_search(sorted_a, 99))
OUTPUT
Linear search for 28: 4 Binary search for 28: 5 Linear search for 99: -1 Binary search for 99: -1
🏆
Binary Search: How Fast is O(log n) Really?

For an array of 1 billion elements, binary search needs at most 30 comparisons (log₂(1,000,000,000) ≈ 30). Linear search needs up to 1 billion. The only requirement: the array must be sorted first. If you search the same array thousands of times, the one-time O(n log n) sort cost is repaid almost immediately.


Section 05

Static vs Dynamic Arrays

🔒
Static Array
Fixed size declared at creation. Lives on the stack (small) or pre-allocated heap. Zero overhead per element — just raw values. Can't grow without a full copy. Languages: C, C++, Go arrays, NumPy ndarray.
Use when: size is known, max performance needed
📈
Dynamic Array
Resizes automatically by doubling capacity when full. Stores a pointer, length, and capacity alongside data. Append is amortised O(1). Slight memory overhead. Languages: Python list, Java ArrayList, C++ vector, JS Array.
Use when: size is unknown or grows over time

Section 06

When Arrays Win & When They Don't

❌ Avoid Arrays When…
Frequent insertions/deletions in the middle
Size is completely unknown and changes wildly
You need fast membership tests on large sets
Key-value lookups (use a dict/hash map instead)
Ordered insertions keeping sorted order (use a heap/BST)
✅ Arrays Excel When…
Random access by index is the primary operation
Data is read far more often than written
Append-only workloads (logs, buffers, queues)
Cache performance matters (contiguous in memory)
Numeric computation (NumPy, matrix operations)
🧠
Cache Locality — The Hidden Superpower

Modern CPUs load memory in cache lines of 64 bytes. When you read arr[0], the CPU automatically fetches arr[0] through roughly arr[15] (for 4-byte integers) into the L1 cache. Iterating sequentially over an array is therefore far faster in practice than the O(n) notation suggests — cache hits are nearly free. Linked lists, by contrast, scatter nodes across memory and suffer frequent cache misses.


Section 07

Pipeline: What Happens Inside arr[i]

01
Bounds Check (Python only)
Python verifies 0 ≤ i < len(arr). If not, it raises IndexError before touching memory. C/C++ skip this — hence buffer overflow vulnerabilities.
02
Address Calculation
CPU computes address = base_ptr + i × element_size. For a Python list, element_size is the size of a pointer (8 bytes on 64-bit). One multiply and one add.
03
Cache Lookup
CPU checks L1 cache (4 cycles), then L2 (12 cycles), then L3 (40 cycles), then RAM (~200 cycles). If the array was recently accessed, it is almost certainly in L1 or L2.
04
Value Returned
The value at the computed address is loaded into a CPU register and returned. Total wall time on a cache-hot array: roughly 1–4 nanoseconds.

Section 08

6 Golden Rules for Arrays

📐 Arrays — Non-Negotiable Rules
1
Default to a list (dynamic array) in Python. The overhead of pre-sizing a static array is almost never worth the minor memory savings unless you're writing NumPy-level numerical code.
2
Append to the end; never insert at the front. arr.append(x) is amortised O(1). arr.insert(0, x) is always O(n) — it shifts every single element. If you need a fast front-insert, use collections.deque instead.
3
Use binary search on sorted arrays — never linear search if you can avoid it. bisect.bisect_left(arr, x) in Python gives you O(log n) search in one line. The precondition is that the array is sorted; maintaining that sort is your responsibility.
4
Prefer list comprehensions and slices over explicit loops. CPython's built-in list operations are implemented in C and run 5–20× faster than equivalent Python for-loops. [x * 2 for x in arr] is both faster and more readable than a manual loop with appends.
5
For numeric data, always use NumPy arrays instead of Python lists. A Python list of integers stores a pointer per element (8 bytes) plus object overhead (~28 bytes each). A NumPy array stores raw 4-byte integers. For 1 million numbers: ~256 MB (list) vs ~4 MB (NumPy). NumPy also vectorises loops to SIMD instructions automatically.
6
When you need both fast search and fast insert/delete, switch data structures. No array can beat O(log n) for both simultaneously. Consider a balanced BST, a skip list, or a sorted container like sortedcontainers.SortedList. Know when the array has served its purpose.

Section 09

Code Execution Animator — Watch Every Line Run

Select an operation, then press Play or click Step →. The highlighted line in the code panel shows exactly which instruction is executing, while the memory panel mirrors the result in real time.

⚡ Live Code Executor
📄 Python Code
🧠 Memory & Variables
Variable Watch
Press Play or Step → to start the animation.
0 / 0
Speed
🔍
How to Read the Animator

The purple arrow ▶ marks the currently executing line. The Memory panel reflects the array state after that line runs. Blue cells = being read, green = written/created, amber = shifting, red = being scanned, purple = found/result. The Variable Watch table tracks every live variable in real time.