Data Structures in Python

Section 01

The Story That Explains Data Structures

📖 Real World Analogy

The Library Without a System

Imagine a library with 1 million books — but none of them are sorted, labeled, or catalogued. Every time you want a book, a librarian must walk through every shelf until they find it. Finding one book could take hours. Adding a new book? You just toss it somewhere.

Now imagine a second library: books are organised by genre, then author, then title. There is a card index at the front. A map on the wall. A numbering system on every shelf. You find any book in under a minute. This is what data structures do for a computer — they give raw data a shape, an address, and a purpose.

Choosing the wrong data structure is like choosing the wrong library system. A computer scientists' most important skill is knowing which structure to pick — and why.

A data structure is a way of organising, storing, and managing data in a computer so that it can be accessed and modified efficiently. Every program you write uses data structures — whether you know it or not. Lists, dictionaries, queues, trees — these are all data structures, each with its own strengths, weaknesses, and ideal use cases.

💡

Why Data Structures Matter

The difference between a program that runs in 0.001 seconds and one that takes 10 hours on the same machine is almost always the data structure chosen. Algorithms and data structures are two sides of the same coin — a great algorithm with a bad data structure is still slow. Knowing both is what separates junior developers from senior engineers.

Section 02

The Big Picture — Types of Data Structures

Data structures are broadly split into two families: Linear (elements arranged in a sequence) and Non-Linear (elements connected in hierarchies or networks).

📏

Linear Structures

Sequential Access

Elements are stored in a sequence. Each element has exactly one predecessor and one successor (except the first and last). Array, Linked List, Stack, Queue, Deque are all linear structures. Easy to traverse, natural for ordered data.

🌳

Non-Linear Structures

Hierarchical / Networked

Elements can have multiple connections. Data fans out in multiple directions. Trees, Graphs, Heaps are non-linear. Better for representing real-world relationships like family trees, networks, or filesystem directories.

🔑

Hash-Based Structures

Key-Value Lookup

Use a hash function to map keys to storage locations for near-instant lookup. Hash Tables, Dictionaries, Sets live here. The backbone of databases, caches, and symbol tables in every programming language runtime.

📊 Data Structure Family Tree

Each family trades memory for speed in different ways. The right choice depends entirely on your most frequent operation.

Section 03

Arrays — The Foundation of Everything

📖 Story

The Egg Carton

An array is exactly like an egg carton. Each slot is numbered, all slots are the same size, and they are all lined up in a row. If you want egg number 7, you go directly to slot 7 — no searching, no wandering. But if you want to add a new egg to the middle? You must either buy a bigger carton and move every egg, or leave a gap. Arrays are fast at access, slow at insertion and deletion in the middle.

An array is a contiguous block of memory holding elements of the same type. Every element occupies the same number of bytes. Given a base address and an index, the CPU can compute any element's exact location in one arithmetic operation: address = base + index × element_size. This is why array access is always O(1).

🔢 Array in Memory — Index-to-Address Mapping

Highlighted cell [3] is accessed instantly — the CPU computes its exact memory address without scanning.

Array Complexity Summary

Operation	Best	Average	Worst	Notes
Access by index	O(1)	O(1)	O(1)	Direct address computation
Search (unsorted)	O(1)	O(n)	O(n)	Linear scan needed
Search (sorted)	O(1)	O(log n)	O(log n)	Binary search applies
Insert at end	O(1)	O(1)*	O(n)	*Amortised (dynamic arrays)
Insert at middle	O(n)	O(n)	O(n)	Must shift elements right
Delete from end	O(1)	O(1)	O(1)	Just decrease size
Delete from middle	O(n)	O(n)	O(n)	Must shift elements left

Python: Arrays in Action

# Python lists are dynamic arrays under the hood
arr = [42, 17, 88, 56, 23, 99, 11]

# O(1) — direct index access
print(arr[3])          # → 56
print(arr[-1])         # → 11  (last element)

# O(1) amortised — append to end
arr.append(77)

# O(n) — insert in the middle (shifts elements)
arr.insert(2, 100)    # insert 100 at index 2

# O(n) — delete from middle
arr.pop(2)             # remove element at index 2

# O(log n) — binary search on sorted array
import bisect
sorted_arr = sorted(arr)
idx = bisect.bisect_left(sorted_arr, 56)
print(f"56 found at index {idx}")

# Memory-efficient numeric arrays (numpy)
import numpy as np
np_arr = np.array([1, 2, 3, 4, 5], dtype=np.int32)
print(np_arr * 2)       # → [2, 4, 6, 8, 10] — vectorised O(n)

OUTPUT

56 11 56 found at index 3 [ 2 4 6 8 10]

Section 04

Linked Lists — Flexible but Scattered

📖 Story

The Treasure Hunt

Imagine a treasure hunt where each clue tells you only where to find the next clue. You start at clue 1 — it says "go to the oak tree". At the oak tree you find clue 2 — it says "go to the red barn". You cannot skip to clue 5 directly. You must follow the chain.

That is a linked list. Each element (a node) holds two things: the actual data, and a pointer to the next node. You can insert a new clue anywhere in the hunt without rewriting all the others — just change two pointers. But to find clue 5, you always start at clue 1 and count forward.

🔗 Singly Linked List — Node Structure

Insertion after a known node requires only two pointer changes — O(1). But finding that node first requires O(n) traversal.

Types of Linked Lists

➡️

Singly Linked List

next pointer only

Each node has one pointer to the next node. Traversal is forward only. Simple, low memory overhead. Used in stacks and simple queues.

↔️

Doubly Linked List

prev + next pointers

Each node has pointers to both next and previous nodes. Enables backward traversal and O(1) deletion with a node reference. Python's deque is built on this.

🔄

Circular Linked List

tail.next = head

The last node points back to the first, forming a ring. Used in round-robin scheduling, music playlists, and buffer management.

# Building a Singly Linked List from scratch in Python

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):          # O(n) — must walk to tail
        node = Node(data)
        if not self.head:
            self.head = node; return
        cur = self.head
        while cur.next:
            cur = cur.next
        cur.next = node

    def prepend(self, data):         # O(1) — insert at head
        node = Node(data)
        node.next = self.head
        self.head = node

    def delete(self, data):          # O(n) — search then remove
        cur = self.head
        if cur and cur.data == data:
            self.head = cur.next; return
        while cur.next:
            if cur.next.data == data:
                cur.next = cur.next.next; return
            cur = cur.next

    def __repr__(self):
        vals, cur = [], self.head
        while cur:
            vals.append(str(cur.data))
            cur = cur.next
        return ' → '.join(vals) + ' → NULL'

ll = LinkedList()
for v in [42, 17, 88, 56]:
    ll.append(v)
ll.prepend(99)
ll.delete(17)
print(ll)

OUTPUT

99 → 42 → 88 → 56 → NULL

Section 05

Stacks — Last In, First Out

📖 Story

The Plate Stack in a Cafeteria

In a cafeteria, clean plates are stacked on a spring-loaded holder. The cook places new plates on top. A student always takes the plate from the top. The plate at the very bottom was placed first, but nobody will ever reach it until all the plates above it are taken. This is LIFO — Last In, First Out.

Your computer uses a stack constantly: every time you call a function, a "stack frame" (containing local variables and return address) is pushed onto the call stack. When the function returns, its frame is popped. When you get a "stack overflow" error, you have pushed too many frames — infinite recursion is the usual culprit.

📦 Stack — Push / Pop Operations

# Stack using Python list (list.append = push, list.pop = pop)
stack = []

stack.append('A')     # push A
stack.append('B')     # push B
stack.append('C')     # push C

print("Top:", stack[-1])     # peek → C
print("Pop:", stack.pop())   # pop → C
print("Stack:", stack)       # ['A', 'B']

# Classic interview problem: balanced parentheses
def is_balanced(s):
    stack, pairs = [], {')':'(', ']':'[', '}':'{'}
    for ch in s:
        if ch in '([{':
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack[-1] != pairs[ch]:
                return False
            stack.pop()
    return not stack

print(is_balanced("({[]})"))   # True
print(is_balanced("([)]"))     # False

OUTPUT

Top: C Pop: C Stack: ['A', 'B'] True False

Section 06

Queues — First In, First Out

📖 Story

The Coffee Shop Queue

You join the back of the queue at a coffee shop. The person who arrived first gets served first. New arrivals join the back. This is FIFO — First In, First Out.

Operating systems use queues everywhere: print jobs, CPU task scheduling, network packet buffers, message brokers (Kafka, RabbitMQ), and BFS graph traversal all rely on queue logic. Fairness is built-in — no element jumps the line.

➡️

Simple Queue

FIFO

Enqueue at rear, dequeue from front. The classic. Python's collections.deque gives O(1) at both ends.

✔ Simple, fair scheduling

✖ No priority, single direction

🔢

Priority Queue

Highest priority first

Each element has a priority. The highest-priority element is dequeued first regardless of insertion order. Built on a min-heap internally.

✔ Dijkstra's algorithm, A*, scheduling

✖ O(log n) enqueue/dequeue

↔️

Deque (Double-Ended)

Front + rear access

Elements can be added or removed from both ends in O(1). Python's collections.deque implements this using a doubly linked list.

✔ Sliding window, palindrome check

✖ Slightly more memory than list

from collections import deque
import heapq

# ── Simple Queue ──────────────────────────────────────
q = deque()
q.append('Task A')     # enqueue → O(1)
q.append('Task B')
q.append('Task C')
print(q.popleft())        # dequeue → 'Task A'  O(1)

# ── Priority Queue (Min-Heap) ─────────────────────────
pq = []
heapq.heappush(pq, (3, 'Low priority'))
heapq.heappush(pq, (1, 'Critical task'))
heapq.heappush(pq, (2, 'Normal task'))

while pq:
    priority, task = heapq.heappop(pq)   # always pops minimum
    print(f"  Priority {priority}: {task}")

# ── Deque (sliding window max) ─────────────────────────
def sliding_max(nums, k):
    dq, result = deque(), []
    for i, n in enumerate(nums):
        while dq and nums[dq[-1]] <= n:
            dq.pop()
        dq.append(i)
        if dq[0] == i - k:
            dq.popleft()
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result

print(sliding_max([1,3,-1,-3,5,3,6,7], k=3))

OUTPUT

Task A Priority 1: Critical task Priority 2: Normal task Priority 3: Low priority [3, 3, 5, 5, 6, 7]

Section 07

Hash Tables — The O(1) Lookup Machine

📖 Story

The Hotel Key System

Imagine a hotel with 1,000 rooms numbered 000–999. When a guest checks in with passport number AB123456, the front desk doesn't scan all 1,000 rooms. Instead, they run the passport number through a formula (the hash function) that instantly produces a room number — say, 347. The guest goes directly to room 347.

Two guests could theoretically be assigned the same room — this is called a collision. The hotel handles it by chaining a second key inside the same room (separate chaining) or finding the next empty room (open addressing). A good hash function makes collisions rare. This is exactly how Python's dict works.

🔑 Hash Table — Hashing and Collision Resolution

"date" (highlighted) maps to bucket [4]. Load factor (items÷buckets) should stay below 0.7 for good performance.

# Python dict IS a hash table — O(1) average for all operations
prices = {
    'apple':  1.20,
    'banana': 0.50,
    'cherry': 3.50,
    'date':   0.80
}

# O(1) — lookup, insert, delete
print(prices['cherry'])          # 3.5
prices['elderberry'] = 5.00      # insert
del prices['date']                # delete
print('apple' in prices)          # True → O(1)

# Frequency counter — classic hash table pattern
words = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat']
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1
print(freq)    # {'cat': 3, 'dog': 2, 'bird': 1}

# Two-sum problem — O(n) using hash table
def two_sum(nums, target):
    seen = {}
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]

OUTPUT

3.5 True {'cat': 3, 'dog': 2, 'bird': 1} [0, 1]

Section 08

Trees — Hierarchical Power

📖 Story

The Company Org Chart

A company has a CEO at the top. Below the CEO are VPs. Below each VP are Managers. Below each Manager are Employees. You can find any employee's boss in O(log n) steps if the company is structured logically. You can also answer: "who reports to the Marketing VP?" or "what is the longest chain of command?" in efficient scans.

A tree is a hierarchical data structure with a root node and subtrees of children. Every node except the root has exactly one parent. Trees without cycles are everywhere: filesystems, HTML DOM, JSON, compiler ASTs, and databases all use trees internally.

🌳 Binary Search Tree (BST) — Structure & Invariant

BST search: at each node, go left if target < node, right if target > node. Finds 60 in 3 comparisons instead of 7.

Tree Traversal Methods

⬅️

In-Order (LNR)

Left → Node → Right

Visits nodes in sorted order for a BST. Result: 20, 30, 40, 50, 60, 70, 80. Used to produce sorted output from a BST.

⬆️

Pre-Order (NLR)

Node → Left → Right

Visits root before children. Result: 50, 30, 20, 40, 70, 60, 80. Used to copy/serialize a tree, or evaluate prefix expressions.

⬇️

Post-Order (LRN)

Left → Right → Node

Visits root after children. Result: 20, 40, 30, 60, 80, 70, 50. Used in deletion (delete children before parent) and expression evaluation.

class BST:
    class Node:
        def __init__(self, val):
            self.val, self.left, self.right = val, None, None

    def __init__(self):
        self.root = None

    def insert(self, val):      # O(log n) avg, O(n) worst (skewed)
        def _ins(node, v):
            if not node: return BST.Node(v)
            if v < node.val: node.left  = _ins(node.left,  v)
            else:            node.right = _ins(node.right, v)
            return node
        self.root = _ins(self.root, val)

    def inorder(self):          # O(n) — sorted output
        result = []
        def _walk(node):
            if node:
                _walk(node.left)
                result.append(node.val)
                _walk(node.right)
        _walk(self.root); return result

    def search(self, val):      # O(log n) average
        node = self.root
        while node:
            if   val == node.val: return True
            elif val <  node.val: node = node.left
            else:                 node = node.right
        return False

bst = BST()
for v in [50, 30, 70, 20, 40, 60, 80]:
    bst.insert(v)

print("In-order:", bst.inorder())
print("Search 60:", bst.search(60))
print("Search 55:", bst.search(55))

OUTPUT

In-order: [20, 30, 40, 50, 60, 70, 80] Search 60: True Search 55: False

Section 09

Heaps — The Priority Expert

A heap is a complete binary tree that satisfies the heap property: in a min-heap, every parent is smaller than its children. In a max-heap, every parent is larger. The root is always the smallest (or largest) element, making it perfect for priority queues and sorting.

⛰️ Min-Heap — Always O(1) to Find Minimum

A heap is a complete binary tree stored as a flat array. No pointers needed — parent/child relationships are computed by index arithmetic.

import heapq

# Python heapq is a MIN-heap by default
nums = [7, 3, 1, 5, 4, 2, 6]
heapq.heapify(nums)          # O(n) — convert list to heap in-place
print("Heap:", nums)

# Always pops the minimum
print("Pop min:", heapq.heappop(nums))   # 1  → O(log n)
print("Pop min:", heapq.heappop(nums))   # 2  → O(log n)

# Find k largest elements — classic heap use
data = [10, 4, 20, 1, 7, 15, 3]
print("3 largest:", heapq.nlargest(3, data))   # [20, 15, 10]
print("3 smallest:", heapq.nsmallest(3, data)) # [1, 3, 4]

# Max-heap trick: negate the values
max_heap = []
for n in data:
    heapq.heappush(max_heap, -n)
print("Max:", -heapq.heappop(max_heap))    # 20

OUTPUT

Heap: [1, 3, 2, 5, 4, 7, 6] Pop min: 1 Pop min: 2 3 largest: [20, 15, 10] 3 smallest: [1, 3, 4] Max: 20

Section 10

Graphs — The Most Powerful Structure

📖 Story

The Road Map

Cities are dots. Roads between cities are lines. Google Maps is a graph. The internet is a graph (routers = nodes, cables = edges). Social media is a graph (users = nodes, friendships = edges). Graphs are the most powerful and general data structure in computer science — almost every real-world relationship can be modelled as one.

BFS (Breadth-First Search) finds the shortest path in an unweighted graph — like finding the fewest hops between two Facebook friends. Dijkstra's algorithm finds the shortest path in a weighted graph — like the fastest route on Google Maps.

🗺️ Directed Weighted Graph — BFS vs DFS

from collections import deque

# Adjacency list representation (most common)
graph = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D'],
    'D': ['E', 'F'],
    'E': ['F'],
    'F': []
}

# BFS — shortest path (unweighted)
def bfs(graph, start, goal):
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal: return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

# DFS — explores depth first
def dfs(graph, node, visited=None):
    if visited is None: visited = set()
    visited.add(node)
    result = [node]
    for n in graph[node]:
        if n not in visited:
            result += dfs(graph, n, visited)
    return result

print("BFS A→F:", bfs(graph, 'A', 'F'))
print("DFS from A:", dfs(graph, 'A'))

OUTPUT

BFS A→F: ['A', 'B', 'D', 'F'] DFS from A: ['A', 'B', 'D', 'E', 'F', 'C']

Section 11

Big-O Complexity — Choosing Wisely

Every data structure has a complexity profile. This is the single most important table in this tutorial — memorise it, and you will make the right architecture decisions in every interview and production system.

Data Structure	Access	Search	Insert	Delete	Space	Best Use Case
Array	O(1)	O(n)	O(n)	O(n)	O(n)	Index-heavy access, numeric computation
Linked List	O(n)	O(n)	O(1)*	O(1)*	O(n)	Frequent insertion/deletion, queues, undo stacks
Stack	O(n)	O(n)	O(1)	O(1)	O(n)	LIFO ordering, call stacks, expression parsing
Queue	O(n)	O(n)	O(1)	O(1)	O(n)	FIFO scheduling, BFS, message buffers
Hash Table	N/A	O(1)**	O(1)**	O(1)**	O(n)	Fast lookup by key, caching, deduplication
BST (balanced)	O(n)	O(log n)	O(log n)	O(log n)	O(n)	Sorted data, range queries, databases
Heap	O(1)†	O(n)	O(log n)	O(log n)	O(n)	Priority queues, top-k, Dijkstra's
Graph (adj list)	N/A	O(V+E)	O(1)	O(E)	O(V+E)	Relationships, networks, routing, social graphs

* with known node reference · ** average case · † min or max only

🏆

The Decision Framework

Ask three questions: (1) What operation is most frequent? — if it's lookup by key, use a hash table. If it's finding minimum/maximum repeatedly, use a heap. (2) Is the data ordered? — if yes and you need range queries, use a BST. (3) Is memory tight? — arrays are the most memory-efficient per element; linked structures add pointer overhead.

Section 12

Structure Comparison — When to Use What

Scenario	Wrong Choice	Right Choice	Why
Check if username exists in a database of 10M users	Array / List	Hash Set	O(1) lookup vs O(n) scan
Undo the last 20 actions in a text editor	Queue	Stack	LIFO — last action undone first
Find the shortest path between cities	Array	Graph + BFS/Dijkstra	Relationships naturally form a graph
Process customer support tickets in arrival order	Stack	Queue	FIFO — first arrived, first served
Always serve the highest-urgency hospital patient next	Queue	Priority Queue (Heap)	Need "most urgent first", not "oldest first"
Find all employees earning between £40k–£60k	Hash Table	BST / Sorted Array	Range queries need ordered structure
Frequent insertion/removal in the middle of a list	Array	Doubly Linked List	Array shifts are O(n); linked insert is O(1)
Store 1 million integers and access by index constantly	Linked List	Array (numpy)	Cache-friendly contiguous memory, O(1) access

Section 13

Advanced: Tries — The String Search Expert

A Trie (prefix tree) is a tree specialised for strings. Each node represents a single character. Words are paths from root to a marked node. If 1,000 words share the prefix "pre", that prefix is stored only once — not 1,000 times.

🔤 Trie — Storing "car", "card", "care", "cat", "cut"

Prefix "ca" is stored once and shared by "car", "card", "care", "cat". Trie lookup is O(m) where m = length of the word — independent of how many words are stored.

🔍

Where Tries Are Used

Autocomplete in search engines and IDEs, spell checkers, IP routing tables (CIDR prefix matching), phone contact search, and word games like Scrabble all use Trie structures. The reason: no other structure gives O(m) prefix search with shared storage as elegantly.

# Trie implementation in Python
class TrieNode:
    def __init__(self):
        self.children = {}        # char → TrieNode
        self.is_end   = False    # marks complete word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):      # O(m) where m = len(word)
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):       # O(m) — exact match
        node = self.root
        for ch in word:
            if ch not in node.children: return False
            node = node.children[ch]
        return node.is_end

    def starts_with(self, prefix):  # O(p) — autocomplete check
        node = self.root
        for ch in prefix:
            if ch not in node.children: return False
            node = node.children[ch]
        return True

trie = Trie()
for w in ['car', 'card', 'care', 'cat', 'cut']:
    trie.insert(w)

print(trie.search('card'))       # True
print(trie.search('car'))        # True
print(trie.search('cars'))       # False (not inserted)
print(trie.starts_with('ca'))   # True  (prefix check)
print(trie.starts_with('cu'))   # True
print(trie.starts_with('cv'))   # False

OUTPUT

True True False True True False

Section 14

Golden Rules

🌿 Data Structures — Non-Negotiable Rules

Never choose a structure before identifying the dominant operation. Ask: "What will I do to this data 90% of the time?" — then match the structure that makes that operation O(1) or O(log n).

Arrays are not always slow. Their cache locality (contiguous memory) makes them dramatically faster than linked structures in practice, even when the big-O is the same. A sorted array + binary search often outperforms a BST for read-heavy workloads.

Python's dict and set are hash tables. Use them relentlessly for membership testing (x in s), deduplication, and frequency counting. Converting a list to a set before searching is one of the most common O(n) → O(1) optimisations in Python.

Use collections.deque instead of list as a queue. list.pop(0) is O(n) because it shifts all remaining elements. deque.popleft() is O(1). This single swap can reduce BFS from O(n²) to O(n).

Heaps give you the minimum or maximum in O(1), but searching arbitrary values is O(n). If you need both fast priority access AND arbitrary lookup, combine a heap with a hash map. This pattern powers many real-time leaderboards and schedulers.

Graphs are the most general structure — almost every problem involving relationships, dependencies, or networks is a graph problem. Learn BFS, DFS, Dijkstra's, and topological sort thoroughly. These four algorithms solve the vast majority of graph interview problems.

An unbalanced BST degrades to a linked list. Inserting sorted data into a plain BST creates a O(n) search worst case. Always use a self-balancing tree (AVL, Red-Black — Python's sortedcontainers.SortedList) for production work that requires ordered access.