Data Structure 📂 Part I – Foundations · 1 of 4 72 min read

Data Structures Explained

A comprehensive, beginner-to-advanced tutorial covering every major data structure — Arrays, Linked Lists, Stacks, Queues, Hash Tables, Trees, Heaps, Graphs, and Tries. Every topic includes a real-world story analogy, an animated SVG diagram, Python code with colour-coded syntax, output blocks, and complexity tables. By the end, you will know exactly which structure to choose for any problem.

Section 01

The Story That Explains Data Structures

The Library Without a System
Imagine a library with 1 million books — but none of them are sorted, labeled, or catalogued. Every time you want a book, a librarian must walk through every shelf until they find it. Finding one book could take hours. Adding a new book? You just toss it somewhere.

Now imagine a second library: books are organised by genre, then author, then title. There is a card index at the front. A map on the wall. A numbering system on every shelf. You find any book in under a minute. This is what data structures do for a computer — they give raw data a shape, an address, and a purpose.

Choosing the wrong data structure is like choosing the wrong library system. A computer scientists' most important skill is knowing which structure to pick — and why.

A data structure is a way of organising, storing, and managing data in a computer so that it can be accessed and modified efficiently. Every program you write uses data structures — whether you know it or not. Lists, dictionaries, queues, trees — these are all data structures, each with its own strengths, weaknesses, and ideal use cases.

💡
Why Data Structures Matter

The difference between a program that runs in 0.001 seconds and one that takes 10 hours on the same machine is almost always the data structure chosen. Algorithms and data structures are two sides of the same coin — a great algorithm with a bad data structure is still slow. Knowing both is what separates junior developers from senior engineers.


Section 02

The Big Picture — Types of Data Structures

Data structures are broadly split into two families: Linear (elements arranged in a sequence) and Non-Linear (elements connected in hierarchies or networks).

📏
Linear Structures
Sequential Access
Elements are stored in a sequence. Each element has exactly one predecessor and one successor (except the first and last). Array, Linked List, Stack, Queue, Deque are all linear structures. Easy to traverse, natural for ordered data.
🌳
Non-Linear Structures
Hierarchical / Networked
Elements can have multiple connections. Data fans out in multiple directions. Trees, Graphs, Heaps are non-linear. Better for representing real-world relationships like family trees, networks, or filesystem directories.
🔑
Hash-Based Structures
Key-Value Lookup
Use a hash function to map keys to storage locations for near-instant lookup. Hash Tables, Dictionaries, Sets live here. The backbone of databases, caches, and symbol tables in every programming language runtime.
📊 Data Structure Family Tree
Data Structures Linear Non-Linear Hash-Based Array Linked List Stack Queue Tree (BST) Heap Graph Hash Table Set / Dict AVERAGE TIME COMPLEXITY — SEARCH O(1) Hash Table O(log n) BST / Heap O(n) Array / Linked List O(n²) Naive Graph O(V+E) BFS / DFS

Each family trades memory for speed in different ways. The right choice depends entirely on your most frequent operation.


Section 03

Arrays — The Foundation of Everything

The Egg Carton
An array is exactly like an egg carton. Each slot is numbered, all slots are the same size, and they are all lined up in a row. If you want egg number 7, you go directly to slot 7 — no searching, no wandering. But if you want to add a new egg to the middle? You must either buy a bigger carton and move every egg, or leave a gap. Arrays are fast at access, slow at insertion and deletion in the middle.

An array is a contiguous block of memory holding elements of the same type. Every element occupies the same number of bytes. Given a base address and an index, the CPU can compute any element's exact location in one arithmetic operation: address = base + index × element_size. This is why array access is always O(1).

🔢 Array in Memory — Index-to-Address Mapping
MEMORY ADDRESS BASE+0 BASE+4 BASE+8 BASE+12 BASE+16 BASE+20 BASE+24 42 17 88 56 23 99 11 [0] [1] [2] [3] [4] [5] [6] arr[3] → O(1) direct access addr = base + 3×4 = BASE+12

Highlighted cell [3] is accessed instantly — the CPU computes its exact memory address without scanning.

Array Complexity Summary

OperationBestAverageWorstNotes
Access by indexO(1)O(1)O(1)Direct address computation
Search (unsorted)O(1)O(n)O(n)Linear scan needed
Search (sorted)O(1)O(log n)O(log n)Binary search applies
Insert at endO(1)O(1)*O(n)*Amortised (dynamic arrays)
Insert at middleO(n)O(n)O(n)Must shift elements right
Delete from endO(1)O(1)O(1)Just decrease size
Delete from middleO(n)O(n)O(n)Must shift elements left

Python: Arrays in Action

# Python lists are dynamic arrays under the hood
arr = [42, 17, 88, 56, 23, 99, 11]

# O(1) — direct index access
print(arr[3])          # → 56
print(arr[-1])         # → 11  (last element)

# O(1) amortised — append to end
arr.append(77)

# O(n) — insert in the middle (shifts elements)
arr.insert(2, 100)    # insert 100 at index 2

# O(n) — delete from middle
arr.pop(2)             # remove element at index 2

# O(log n) — binary search on sorted array
import bisect
sorted_arr = sorted(arr)
idx = bisect.bisect_left(sorted_arr, 56)
print(f"56 found at index {idx}")

# Memory-efficient numeric arrays (numpy)
import numpy as np
np_arr = np.array([1, 2, 3, 4, 5], dtype=np.int32)
print(np_arr * 2)       # → [2, 4, 6, 8, 10] — vectorised O(n)
OUTPUT
56 11 56 found at index 3 [ 2 4 6 8 10]

Section 04

Linked Lists — Flexible but Scattered

The Treasure Hunt
Imagine a treasure hunt where each clue tells you only where to find the next clue. You start at clue 1 — it says "go to the oak tree". At the oak tree you find clue 2 — it says "go to the red barn". You cannot skip to clue 5 directly. You must follow the chain.

That is a linked list. Each element (a node) holds two things: the actual data, and a pointer to the next node. You can insert a new clue anywhere in the hunt without rewriting all the others — just change two pointers. But to find clue 5, you always start at clue 1 and count forward.
🔗 Singly Linked List — Node Structure
HEAD 42 data | next→ 17 data | next→ 88 data | next→ 56 data | next→ 23 data | NULL TAIL (next = NULL) NEW: 99 insert after [88] O(1) pointer change ✓ ⚠ Nodes scattered in memory — no cache locality — slower traversal than arrays

Insertion after a known node requires only two pointer changes — O(1). But finding that node first requires O(n) traversal.

Types of Linked Lists

➡️
Singly Linked List
next pointer only
Each node has one pointer to the next node. Traversal is forward only. Simple, low memory overhead. Used in stacks and simple queues.
↔️
Doubly Linked List
prev + next pointers
Each node has pointers to both next and previous nodes. Enables backward traversal and O(1) deletion with a node reference. Python's deque is built on this.
🔄
Circular Linked List
tail.next = head
The last node points back to the first, forming a ring. Used in round-robin scheduling, music playlists, and buffer management.
# Building a Singly Linked List from scratch in Python

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):          # O(n) — must walk to tail
        node = Node(data)
        if not self.head:
            self.head = node; return
        cur = self.head
        while cur.next:
            cur = cur.next
        cur.next = node

    def prepend(self, data):         # O(1) — insert at head
        node = Node(data)
        node.next = self.head
        self.head = node

    def delete(self, data):          # O(n) — search then remove
        cur = self.head
        if cur and cur.data == data:
            self.head = cur.next; return
        while cur.next:
            if cur.next.data == data:
                cur.next = cur.next.next; return
            cur = cur.next

    def __repr__(self):
        vals, cur = [], self.head
        while cur:
            vals.append(str(cur.data))
            cur = cur.next
        return ' → '.join(vals) + ' → NULL'

ll = LinkedList()
for v in [42, 17, 88, 56]:
    ll.append(v)
ll.prepend(99)
ll.delete(17)
print(ll)
OUTPUT
99 → 42 → 88 → 56 → NULL

Section 05

Stacks — Last In, First Out

The Plate Stack in a Cafeteria
In a cafeteria, clean plates are stacked on a spring-loaded holder. The cook places new plates on top. A student always takes the plate from the top. The plate at the very bottom was placed first, but nobody will ever reach it until all the plates above it are taken. This is LIFO — Last In, First Out.

Your computer uses a stack constantly: every time you call a function, a "stack frame" (containing local variables and return address) is pushed onto the call stack. When the function returns, its frame is popped. When you get a "stack overflow" error, you have pushed too many frames — infinite recursion is the usual culprit.
📦 Stack — Push / Pop Operations
← PUSH "D" here (TOP) D ← TOP C B A ← BOTTOM PUSH(D) add to top → O(1) POP() → D remove from top → O(1) PEEK() → D view top without removing → O(1) REAL-WORLD STACK USES 🔙 Browser back button (page history) 📞 Function call stack (recursion) ↩ Undo / Redo in text editors 🔢 Expression evaluation (postfix) 🔤 Balanced parentheses checker 🌲 DFS graph traversal (iterative)
# Stack using Python list (list.append = push, list.pop = pop)
stack = []

stack.append('A')     # push A
stack.append('B')     # push B
stack.append('C')     # push C

print("Top:", stack[-1])     # peek → C
print("Pop:", stack.pop())   # pop → C
print("Stack:", stack)       # ['A', 'B']

# Classic interview problem: balanced parentheses
def is_balanced(s):
    stack, pairs = [], {')':'(', ']':'[', '}':'{'}
    for ch in s:
        if ch in '([{':
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack[-1] != pairs[ch]:
                return False
            stack.pop()
    return not stack

print(is_balanced("({[]})"))   # True
print(is_balanced("([)]"))     # False
OUTPUT
Top: C Pop: C Stack: ['A', 'B'] True False

Section 06

Queues — First In, First Out

The Coffee Shop Queue
You join the back of the queue at a coffee shop. The person who arrived first gets served first. New arrivals join the back. This is FIFO — First In, First Out.

Operating systems use queues everywhere: print jobs, CPU task scheduling, network packet buffers, message brokers (Kafka, RabbitMQ), and BFS graph traversal all rely on queue logic. Fairness is built-in — no element jumps the line.
➡️
Simple Queue
FIFO
Enqueue at rear, dequeue from front. The classic. Python's collections.deque gives O(1) at both ends.
✔ Simple, fair scheduling
✖ No priority, single direction
🔢
Priority Queue
Highest priority first
Each element has a priority. The highest-priority element is dequeued first regardless of insertion order. Built on a min-heap internally.
✔ Dijkstra's algorithm, A*, scheduling
✖ O(log n) enqueue/dequeue
↔️
Deque (Double-Ended)
Front + rear access
Elements can be added or removed from both ends in O(1). Python's collections.deque implements this using a doubly linked list.
✔ Sliding window, palindrome check
✖ Slightly more memory than list
from collections import deque
import heapq

# ── Simple Queue ──────────────────────────────────────
q = deque()
q.append('Task A')     # enqueue → O(1)
q.append('Task B')
q.append('Task C')
print(q.popleft())        # dequeue → 'Task A'  O(1)

# ── Priority Queue (Min-Heap) ─────────────────────────
pq = []
heapq.heappush(pq, (3, 'Low priority'))
heapq.heappush(pq, (1, 'Critical task'))
heapq.heappush(pq, (2, 'Normal task'))

while pq:
    priority, task = heapq.heappop(pq)   # always pops minimum
    print(f"  Priority {priority}: {task}")

# ── Deque (sliding window max) ─────────────────────────
def sliding_max(nums, k):
    dq, result = deque(), []
    for i, n in enumerate(nums):
        while dq and nums[dq[-1]] <= n:
            dq.pop()
        dq.append(i)
        if dq[0] == i - k:
            dq.popleft()
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result

print(sliding_max([1,3,-1,-3,5,3,6,7], k=3))
OUTPUT
Task A Priority 1: Critical task Priority 2: Normal task Priority 3: Low priority [3, 3, 5, 5, 6, 7]

Section 07

Hash Tables — The O(1) Lookup Machine

The Hotel Key System
Imagine a hotel with 1,000 rooms numbered 000–999. When a guest checks in with passport number AB123456, the front desk doesn't scan all 1,000 rooms. Instead, they run the passport number through a formula (the hash function) that instantly produces a room number — say, 347. The guest goes directly to room 347.

Two guests could theoretically be assigned the same room — this is called a collision. The hotel handles it by chaining a second key inside the same room (separate chaining) or finding the next empty room (open addressing). A good hash function makes collisions rare. This is exactly how Python's dict works.
🔑 Hash Table — Hashing and Collision Resolution
KEYS "apple" "banana" "cherry" "date" hash(key) % table_size HASH TABLE (8 buckets) [0] empty [1] apple [2] empty [3] cherry [4] date [5] banana [6] empty [7] empty VALUES 🍎 price=1.2 🍒 price=3.5 🌴 price=0.8 🍌 price=0.5 LOOKUP COMPLEXITY O(1) average O(n) worst (collisions) 0.7 ideal load factor

"date" (highlighted) maps to bucket [4]. Load factor (items÷buckets) should stay below 0.7 for good performance.

# Python dict IS a hash table — O(1) average for all operations
prices = {
    'apple':  1.20,
    'banana': 0.50,
    'cherry': 3.50,
    'date':   0.80
}

# O(1) — lookup, insert, delete
print(prices['cherry'])          # 3.5
prices['elderberry'] = 5.00      # insert
del prices['date']                # delete
print('apple' in prices)          # True → O(1)

# Frequency counter — classic hash table pattern
words = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat']
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1
print(freq)    # {'cat': 3, 'dog': 2, 'bird': 1}

# Two-sum problem — O(n) using hash table
def two_sum(nums, target):
    seen = {}
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
OUTPUT
3.5 True {'cat': 3, 'dog': 2, 'bird': 1} [0, 1]

Section 08

Trees — Hierarchical Power

The Company Org Chart
A company has a CEO at the top. Below the CEO are VPs. Below each VP are Managers. Below each Manager are Employees. You can find any employee's boss in O(log n) steps if the company is structured logically. You can also answer: "who reports to the Marketing VP?" or "what is the longest chain of command?" in efficient scans.

A tree is a hierarchical data structure with a root node and subtrees of children. Every node except the root has exactly one parent. Trees without cycles are everywhere: filesystems, HTML DOM, JSON, compiler ASTs, and databases all use trees internally.
🌳 Binary Search Tree (BST) — Structure & Invariant
50 ROOT 30 70 20 40 60 80 BST INVARIANT: left subtree < node < right subtree (at every node) < 50 (left) = 50 (root) > 50 (right)

BST search: at each node, go left if target < node, right if target > node. Finds 60 in 3 comparisons instead of 7.

Tree Traversal Methods

⬅️
In-Order (LNR)
Left → Node → Right
Visits nodes in sorted order for a BST. Result: 20, 30, 40, 50, 60, 70, 80. Used to produce sorted output from a BST.
⬆️
Pre-Order (NLR)
Node → Left → Right
Visits root before children. Result: 50, 30, 20, 40, 70, 60, 80. Used to copy/serialize a tree, or evaluate prefix expressions.
⬇️
Post-Order (LRN)
Left → Right → Node
Visits root after children. Result: 20, 40, 30, 60, 80, 70, 50. Used in deletion (delete children before parent) and expression evaluation.
class BST:
    class Node:
        def __init__(self, val):
            self.val, self.left, self.right = val, None, None

    def __init__(self):
        self.root = None

    def insert(self, val):      # O(log n) avg, O(n) worst (skewed)
        def _ins(node, v):
            if not node: return BST.Node(v)
            if v < node.val: node.left  = _ins(node.left,  v)
            else:            node.right = _ins(node.right, v)
            return node
        self.root = _ins(self.root, val)

    def inorder(self):          # O(n) — sorted output
        result = []
        def _walk(node):
            if node:
                _walk(node.left)
                result.append(node.val)
                _walk(node.right)
        _walk(self.root); return result

    def search(self, val):      # O(log n) average
        node = self.root
        while node:
            if   val == node.val: return True
            elif val <  node.val: node = node.left
            else:                 node = node.right
        return False

bst = BST()
for v in [50, 30, 70, 20, 40, 60, 80]:
    bst.insert(v)

print("In-order:", bst.inorder())
print("Search 60:", bst.search(60))
print("Search 55:", bst.search(55))
OUTPUT
In-order: [20, 30, 40, 50, 60, 70, 80] Search 60: True Search 55: False

Section 09

Heaps — The Priority Expert

A heap is a complete binary tree that satisfies the heap property: in a min-heap, every parent is smaller than its children. In a max-heap, every parent is larger. The root is always the smallest (or largest) element, making it perfect for priority queues and sorting.

⛰️ Min-Heap — Always O(1) to Find Minimum
MIN-HEAP TREE FORM 1 3 2 7 5 4 6 STORED AS ARRAY 1 3 2 7 5 4 6 [0] [1] [2] [3] [4] [5] [6] INDEX FORMULAS Parent of i : floor((i-1) / 2) Left child : 2i + 1 Right child : 2i + 2 Min always at index 0 → O(1) heappush / heappop → O(log n)

A heap is a complete binary tree stored as a flat array. No pointers needed — parent/child relationships are computed by index arithmetic.

import heapq

# Python heapq is a MIN-heap by default
nums = [7, 3, 1, 5, 4, 2, 6]
heapq.heapify(nums)          # O(n) — convert list to heap in-place
print("Heap:", nums)

# Always pops the minimum
print("Pop min:", heapq.heappop(nums))   # 1  → O(log n)
print("Pop min:", heapq.heappop(nums))   # 2  → O(log n)

# Find k largest elements — classic heap use
data = [10, 4, 20, 1, 7, 15, 3]
print("3 largest:", heapq.nlargest(3, data))   # [20, 15, 10]
print("3 smallest:", heapq.nsmallest(3, data)) # [1, 3, 4]

# Max-heap trick: negate the values
max_heap = []
for n in data:
    heapq.heappush(max_heap, -n)
print("Max:", -heapq.heappop(max_heap))    # 20
OUTPUT
Heap: [1, 3, 2, 5, 4, 7, 6] Pop min: 1 Pop min: 2 3 largest: [20, 15, 10] 3 smallest: [1, 3, 4] Max: 20

Section 10

Graphs — The Most Powerful Structure

The Road Map
Cities are dots. Roads between cities are lines. Google Maps is a graph. The internet is a graph (routers = nodes, cables = edges). Social media is a graph (users = nodes, friendships = edges). Graphs are the most powerful and general data structure in computer science — almost every real-world relationship can be modelled as one.

BFS (Breadth-First Search) finds the shortest path in an unweighted graph — like finding the fewest hops between two Facebook friends. Dijkstra's algorithm finds the shortest path in a weighted graph — like the fastest route on Google Maps.
🗺️ Directed Weighted Graph — BFS vs DFS
A B C D E F 4 2 3 5 1 6 2 BFS from A (level by level): A → B,C → D → E,F DFS from A (deep first): A → B → D → E → F → C
from collections import deque

# Adjacency list representation (most common)
graph = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D'],
    'D': ['E', 'F'],
    'E': ['F'],
    'F': []
}

# BFS — shortest path (unweighted)
def bfs(graph, start, goal):
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal: return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

# DFS — explores depth first
def dfs(graph, node, visited=None):
    if visited is None: visited = set()
    visited.add(node)
    result = [node]
    for n in graph[node]:
        if n not in visited:
            result += dfs(graph, n, visited)
    return result

print("BFS A→F:", bfs(graph, 'A', 'F'))
print("DFS from A:", dfs(graph, 'A'))
OUTPUT
BFS A→F: ['A', 'B', 'D', 'F'] DFS from A: ['A', 'B', 'D', 'E', 'F', 'C']

Section 11

Big-O Complexity — Choosing Wisely

Every data structure has a complexity profile. This is the single most important table in this tutorial — memorise it, and you will make the right architecture decisions in every interview and production system.

Data Structure Access Search Insert Delete Space Best Use Case
Array O(1) O(n) O(n) O(n) O(n) Index-heavy access, numeric computation
Linked List O(n) O(n) O(1)* O(1)* O(n) Frequent insertion/deletion, queues, undo stacks
Stack O(n) O(n) O(1) O(1) O(n) LIFO ordering, call stacks, expression parsing
Queue O(n) O(n) O(1) O(1) O(n) FIFO scheduling, BFS, message buffers
Hash Table N/A O(1)** O(1)** O(1)** O(n) Fast lookup by key, caching, deduplication
BST (balanced) O(n) O(log n) O(log n) O(log n) O(n) Sorted data, range queries, databases
Heap O(1)† O(n) O(log n) O(log n) O(n) Priority queues, top-k, Dijkstra's
Graph (adj list) N/A O(V+E) O(1) O(E) O(V+E) Relationships, networks, routing, social graphs

* with known node reference · ** average case · † min or max only

🏆
The Decision Framework

Ask three questions: (1) What operation is most frequent? — if it's lookup by key, use a hash table. If it's finding minimum/maximum repeatedly, use a heap. (2) Is the data ordered? — if yes and you need range queries, use a BST. (3) Is memory tight? — arrays are the most memory-efficient per element; linked structures add pointer overhead.


Section 12

Structure Comparison — When to Use What

ScenarioWrong ChoiceRight ChoiceWhy
Check if username exists in a database of 10M users Array / List Hash Set O(1) lookup vs O(n) scan
Undo the last 20 actions in a text editor Queue Stack LIFO — last action undone first
Find the shortest path between cities Array Graph + BFS/Dijkstra Relationships naturally form a graph
Process customer support tickets in arrival order Stack Queue FIFO — first arrived, first served
Always serve the highest-urgency hospital patient next Queue Priority Queue (Heap) Need "most urgent first", not "oldest first"
Find all employees earning between £40k–£60k Hash Table BST / Sorted Array Range queries need ordered structure
Frequent insertion/removal in the middle of a list Array Doubly Linked List Array shifts are O(n); linked insert is O(1)
Store 1 million integers and access by index constantly Linked List Array (numpy) Cache-friendly contiguous memory, O(1) access

Section 13

Advanced: Tries — The String Search Expert

A Trie (prefix tree) is a tree specialised for strings. Each node represents a single character. Words are paths from root to a marked node. If 1,000 words share the prefix "pre", that prefix is stored only once — not 1,000 times.

🔤 Trie — Storing "car", "card", "care", "cat", "cut"
* c a u r t ✓ "cat" ✓"car" t ✓ "cut" → d(card) e(care)

Prefix "ca" is stored once and shared by "car", "card", "care", "cat". Trie lookup is O(m) where m = length of the word — independent of how many words are stored.

🔍
Where Tries Are Used

Autocomplete in search engines and IDEs, spell checkers, IP routing tables (CIDR prefix matching), phone contact search, and word games like Scrabble all use Trie structures. The reason: no other structure gives O(m) prefix search with shared storage as elegantly.

# Trie implementation in Python
class TrieNode:
    def __init__(self):
        self.children = {}        # char → TrieNode
        self.is_end   = False    # marks complete word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):      # O(m) where m = len(word)
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):       # O(m) — exact match
        node = self.root
        for ch in word:
            if ch not in node.children: return False
            node = node.children[ch]
        return node.is_end

    def starts_with(self, prefix):  # O(p) — autocomplete check
        node = self.root
        for ch in prefix:
            if ch not in node.children: return False
            node = node.children[ch]
        return True

trie = Trie()
for w in ['car', 'card', 'care', 'cat', 'cut']:
    trie.insert(w)

print(trie.search('card'))       # True
print(trie.search('car'))        # True
print(trie.search('cars'))       # False (not inserted)
print(trie.starts_with('ca'))   # True  (prefix check)
print(trie.starts_with('cu'))   # True
print(trie.starts_with('cv'))   # False
OUTPUT
True True False True True False

Section 14

Golden Rules

🌿 Data Structures — Non-Negotiable Rules
1
Never choose a structure before identifying the dominant operation. Ask: "What will I do to this data 90% of the time?" — then match the structure that makes that operation O(1) or O(log n).
2
Arrays are not always slow. Their cache locality (contiguous memory) makes them dramatically faster than linked structures in practice, even when the big-O is the same. A sorted array + binary search often outperforms a BST for read-heavy workloads.
3
Python's dict and set are hash tables. Use them relentlessly for membership testing (x in s), deduplication, and frequency counting. Converting a list to a set before searching is one of the most common O(n) → O(1) optimisations in Python.
4
Use collections.deque instead of list as a queue. list.pop(0) is O(n) because it shifts all remaining elements. deque.popleft() is O(1). This single swap can reduce BFS from O(n²) to O(n).
5
Heaps give you the minimum or maximum in O(1), but searching arbitrary values is O(n). If you need both fast priority access AND arbitrary lookup, combine a heap with a hash map. This pattern powers many real-time leaderboards and schedulers.
6
Graphs are the most general structure — almost every problem involving relationships, dependencies, or networks is a graph problem. Learn BFS, DFS, Dijkstra's, and topological sort thoroughly. These four algorithms solve the vast majority of graph interview problems.
7
An unbalanced BST degrades to a linked list. Inserting sorted data into a plain BST creates a O(n) search worst case. Always use a self-balancing tree (AVL, Red-Black — Python's sortedcontainers.SortedList) for production work that requires ordered access.