Software Architectures
📂 Software Design in the Real World
· 4 of 5
61 min read
WhatsApp's Secret: How Erlang/BEAM Handles 2 Million Connections on One Server
A deep-dive into why WhatsApp chose Erlang and the BEAM virtual machine to serve 450 million users with just 55 engineers — covering BEAM processes, preemptive scheduling, OTP supervision trees, message passing, and hot code loading, with animated diagrams and real architecture walkthroughs.
Section 01
The Story Behind 2 Million Connections
📖 The Origin Story
How WhatsApp Shocked Silicon Valley with 55 Engineers
It was January 2014. Facebook acquired a tiny company called WhatsApp for $19 billion — the largest tech acquisition in history at the time. The world asked: why so much for a messaging app?
The answer wasn't the product. It was the engineering.
WhatsApp had 450 million users sending over 50 billion messages per day, with peak loads exceeding 2 million simultaneous TCP connections on a single server. The entire engineering team? 55 people. Most companies needed thousands of engineers and hundreds of servers to serve a fraction of that load.
The secret weapon was a 30-year-old programming language built for telephone networks — Erlang, running on the BEAM virtual machine. While the rest of the world was building chat apps on Ruby on Rails or Node.js, WhatsApp had gone back to the 1980s — and won.
This tutorial will take you deep into why Erlang/BEAM can handle millions of concurrent connections where other stacks collapse, explaining every architectural concept with clear analogies, animated diagrams, and real-world context.
📡
The Numbers That Matter
A single commodity Linux server running Erlang/BEAM handled 2 million simultaneous WebSocket/TCP connections during WhatsApp's peak. For context, a typical Java application server (Tomcat with thread-per-connection) would run out of memory trying to maintain just 10,000 concurrent connections on the same hardware.
Section 02
The C10K Problem — Why Concurrency Is Hard
Before understanding why Erlang wins, you need to understand the problem every messaging platform faces. It has a name: the C10K problem (10,000 concurrent connections).
📖 Analogy
The Restaurant That Couldn't Seat Everyone
Imagine a restaurant where every customer who walks in is assigned a dedicated waiter. That waiter does nothing else — they just stand next to the customer's table, waiting for the customer to speak, ready to serve instantly.
When 50 customers arrive, you have 50 waiters. When 1,000 customers arrive, you need 1,000 waiters — most of them standing idle waiting for customers to place their orders. The waiters eat food, take up space, cost money, and mostly just wait.
This is exactly how traditional thread-per-connection servers work. Each TCP connection gets its own OS thread. Most threads just sit there, blocked, waiting for a message that hasn't arrived yet. At 2 million connections, you'd need 2 million OS threads. That's not a server — that's a catastrophe.
🧵
OS Threads
Traditional Model
Each connection = 1 OS thread. Threads consume ~1–8 MB of stack memory each. At 10,000 connections = 10 GB just for stacks. At 2 million connections = completely impossible.
⚡
Event Loop
Node.js / Nginx Model
Single thread handles all connections via epoll/kqueue. Excellent for I/O but one slow operation blocks everyone. CPU-intensive work destroys throughput. No true parallelism.
🌳
BEAM Processes
Erlang / Elixir Model
Each connection = 1 lightweight BEAM process (~300 bytes initial heap). At 2 million connections = ~600 MB just for process heaps. Each process is isolated, preemptively scheduled, and independently garbage collected.
📊 Memory Cost Per Connection Model — Animated Comparison
BEAM process overhead is so small it barely registers. At 10,000 connections, OS threads demand 16 GB while BEAM processes need just ~3 MB.
Section 03
What is Erlang and the BEAM VM?
Erlang is a functional, concurrent, fault-tolerant programming language created at Ericsson in 1986 by Joe Armstrong, Robert Virding, and Mike Williams. Its sole original purpose was to power telephone switching software — systems that had to handle thousands of calls simultaneously, never go down, and be updated without rebooting.
📖 History
Built for Nine Nines — 99.9999999% Uptime
Ericsson's AXD301 ATM switch, written in Erlang, achieved nine nines of availability — that's 99.9999999% uptime, or roughly 31 milliseconds of downtime per year. The entire internet aspires to five nines (5 minutes downtime per year). Ericsson had already solved reliability at a level 10,000× better, decades ago.
The language was open-sourced in 1998. For most of the 2000s, it stayed in the shadows — until massively concurrent internet systems started hitting the same problems telephone networks had solved in the 1980s.
🧠 The BEAM Virtual Machine — Key Components
BEAM
Bogdan/Björn's Erlang Abstract Machine — the bytecode VM that runs compiled Erlang code. Think JVM for Java, but built specifically for concurrency and soft real-time guarantees.
Scheduler
One OS thread per CPU core (default). Each scheduler has its own run queue. BEAM schedules its lightweight processes across all schedulers, fully utilising multi-core hardware automatically.
Processes
Ultra-lightweight concurrent entities — not OS threads, not system processes. They live entirely inside the BEAM VM. Each has its own heap, stack, and message mailbox. No shared memory.
GC
Per-process garbage collection — each BEAM process collects its own garbage independently. There are no global GC pauses that stop the world. A long GC in one process doesn't delay others.
OTP
Open Telecom Platform — a set of libraries and design patterns (Supervisors, GenServers, Applications) that make building fault-tolerant distributed systems dramatically easier.
Section 04
BEAM Processes — The Core Superpower
The BEAM process is the fundamental unit of concurrency in Erlang. Understanding it deeply is the key to understanding why WhatsApp could do what it did.
📦
Initial Heap Size
233 words (~1.8 KB)
A brand-new BEAM process starts with this tiny heap. It grows on demand. Compare to ~1 MB minimum for an OS thread stack.
⚡
Spawn Time
< 1 microsecond
Creating a BEAM process is nearly instant. Spawning an OS thread takes microseconds to milliseconds. You can spawn millions per second.
🔢
Max Processes
134 million (default)
The default BEAM VM limit is 134,217,728 processes per node. This is configurable. WhatsApp ran comfortably at 2 million simultaneous processes.
🔬 Anatomy of a Single BEAM Process — Interactive Diagram
Each BEAM process is a fully isolated universe. It has its own heap, stack, mailbox, and GC cycle. The scheduler treats every process fairly — no single process can hog the CPU.
🏆
Why Isolation Is the Key Insight
Because each BEAM process has its own private memory, there is no shared state to lock. No mutexes. No semaphores. No race conditions. Two processes never corrupt each other's memory. When one process crashes, it dies alone — it cannot corrupt the heap of any other process. This is the foundation of Erlang's legendary fault tolerance.
Section 05
Preemptive Scheduling — The Fairness Engine
One of BEAM's most critical and misunderstood features is its preemptive scheduler. This is what prevents a single slow process from blocking millions of others.
📖 Analogy
The Supermarket with a Time Limit Per Customer
Imagine a supermarket checkout with 10 tills. Normally customers can stand at the till as long as they like — until they're done. One customer with 500 items holds up everyone behind them.
BEAM's scheduler is like a supermarket where each customer gets exactly 2,000 "actions" at the till. When they use those up, they must step to the back of the queue, even if they're not done — and the next customer steps forward. No customer can monopolise a till.
In BEAM terms, each process gets a reduction quota of 2,000 reductions. Every function call, pattern match, and message send consumes reductions. When the quota is spent, the scheduler forcibly suspends that process and runs the next one. This is preemptive, not cooperative.
The BEAM scheduler rotates through all runnable processes in round-robin fashion. Every process gets a fair time slice, measured in reductions — not wall-clock time. This gives predictable, bounded latency even under heavy load.
Section 06
Message Passing — No Shared Memory, No Locks
In traditional concurrent programming, threads share memory. Two threads can read and write the same variable. This leads to race conditions, deadlocks, and hours of debugging. Erlang eliminates this entirely.
📖 Analogy
The Office Without a Whiteboard
Imagine two employees sharing one whiteboard. When both try to write at the same time, they corrupt each other's work. To prevent this, they need a lock — one must wait while the other writes. Under high traffic, this waiting becomes a bottleneck. If one employee holds the lock and crashes, the other waits forever — a deadlock.
Erlang's model: there is no shared whiteboard. Instead, every employee (process) has their own private notepad. If Employee A wants Employee B to know something, A writes a copy of the information on a note and drops it in B's mailbox. A's notepad and B's notepad never share memory. No lock needed. No race condition possible. Crash-safe.
💬 Message Passing — Animated Process Communication
When Process A sends to Process B, the message data is deep-copied into B's mailbox. A and B never share a memory address. A process crash cannot corrupt another process's data.
Section 07
OTP Supervisors — The "Let It Crash" Philosophy
Erlang's most radical idea isn't concurrency — it's how to handle failure. Most languages try to defensively prevent crashes everywhere. Erlang says: let it crash, then restart it automatically.
📖 Analogy
The Hospital with Self-Healing Cells
Imagine your body. Cells die all the time — by the billions every day. But you don't notice because a replacement system immediately regenerates them. You don't need to prevent every cell death; you just need robust replacement machinery.
OTP Supervisors are that machinery. Each supervised process is a cell. If it crashes (unexpected error, bad data, network timeout), the Supervisor sees the crash notification, looks up the restart strategy, and spawns a fresh replacement process within milliseconds. The rest of the system didn't know the crash happened. The user sees maybe a momentarily dropped connection — then reconnects automatically.
🌳 OTP Supervision Tree — WhatsApp-Style Architecture
Watch User #2's worker: it crashes (turns red), the Supervisor detects the crash signal, and restarts a fresh worker process in milliseconds. Users #1 and #3 never notice. The rest of the system is unaffected.
💡
"Let It Crash" vs Defensive Programming
Traditional languages wrap every operation in try/catch and spend enormous effort preventing crashes. Erlang's philosophy: don't try to handle every possible error in every function. Let the process crash on unexpected input. The Supervisor will restart it in a known-good state. This results in far simpler code — no defensive null checks, no try/catch pyramids, no partially-corrupted state to clean up.
Section 08
WhatsApp's Actual Architecture on Erlang/BEAM
Let's trace a real WhatsApp message from Alice's phone to Bob's phone, and understand exactly which BEAM components handle each step.
01
📱 TCP Connection Established
Alice's phone opens a persistent TCP connection to a WhatsApp server. A new BEAM process (~300 bytes) is spawned specifically for this connection. This process lives as long as Alice is online — potentially hours or days — consuming almost no CPU while idle, just sitting in the scheduler's wait queue listening for data.
02
📨 Message Arrives from Alice
Alice types and sends a message. The TCP data arrives at Alice's connection process. The process wakes from its idle wait, parses the binary protocol, and sends an internal BEAM message to the router process: send({to: bob_id, content: "Hey"}). The connection process immediately returns to listening state.
03
🗺️ Router Process Looks Up Bob
A message routing process checks an in-memory ETS (Erlang Term Storage) table — a highly optimised, concurrent hash table built into BEAM. It looks up bob_id and finds Bob's connection process PID, or discovers Bob is offline and routes to a delivery queue instead. ETS lookups are O(1) and lock-free for reads.
04
📬 Bob's Connection Process Receives It
The router sends the message to Bob's connection process via BEAM message passing. Bob's process wakes up, serialises the message into WhatsApp's binary protocol, and writes it to Bob's TCP socket. Bob's phone receives the message. The entire path from Alice's send to Bob's receive can happen in under 50ms, all within BEAM on a single server.
05
✅ Delivery Acknowledgement
Bob's phone sends an ACK back. Bob's connection process receives it, sends it back through the router, Alice's connection process receives it, and writes the "double tick" signal to Alice's socket. Alice sees the double tick. All of this is BEAM process communication — fast, reliable, with zero locks.
🌐 2 Million Connections on One Server — Visualised
Each blinking dot is an active phone connection, maintained by exactly one BEAM process. 2 million dots — one commodity server. Most processes are sleeping (waiting for messages), consuming only a few bytes of memory each.
Section 09
WhatsApp's Key Erlang/BEAM Optimisations
Getting to 2 million connections wasn't default Erlang — WhatsApp pushed the BEAM to its limits with several key optimisations documented by engineer Rick Reed at EUC 2012.
⚙️ WhatsApp's BEAM Tuning Secrets
1
Custom Linux Kernel Tuning — Modified net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, and file descriptor limits to allow millions of simultaneous TCP connections. Default Linux limits would have capped them at ~65,536 connections per port.
2
Reduced Process Heap Sizes — Tweaked initial BEAM process heap from the default 233 words down, and aggressively tuned garbage collection thresholds. When you have 2 million processes, even a few hundred bytes saved per process adds up to gigabytes.
3
Binary Data Sharing — Erlang binaries larger than 64 bytes are stored in a shared heap with reference counting (not copied on send). A 50KB profile photo sent to 100 processes exists once in memory, with 100 reference pointers. This avoids catastrophic memory multiplication.
4
ETS for Hot Data — Erlang Term Storage (ETS) tables provided O(1) concurrent reads with no locking for the session registry — mapping user IDs to BEAM process PIDs. At 2 million users, this lookup had to be nanosecond-fast and lock-free.
5
BEAM SMP Schedulers = CPU Cores — Configured one BEAM scheduler thread per physical CPU core (not logical hyperthreaded core). This avoided false sharing and cache bouncing between schedulers, giving near-linear throughput scaling as cores increased.
6
Network I/O via async_threads — BEAM's async thread pool handled blocking I/O operations (file system, legacy APIs) off the scheduler threads, preventing any I/O wait from blocking the main BEAM schedulers.
Section 10
Erlang/BEAM vs Other Concurrency Models
Feature
OS Threads (Java/C++)
Async/Event Loop (Node.js)
Goroutines (Go)
BEAM Processes (Erlang)
Min memory per unit
~1–8 MB stack
Single thread (shared)
~2–8 KB initial stack
~300 bytes initial heap
10k concurrent units
~10–80 GB RAM
Low RAM but serial
~20–80 MB RAM
~3 MB RAM
Scheduling
OS preemptive
Cooperative (single thread)
Go runtime preemptive
BEAM preemptive (2k reductions)
True parallelism
Yes (OS threads)
No (GIL-free but single-threaded)
Yes (GOMAXPROCS)
Yes (1 scheduler per core)
Garbage collection
Global stop-the-world (JVM)
V8 GC pauses
Tri-color, short but global
Per-process, zero global pause
Fault isolation
One thread crash = process crash
One uncaught exception = process crash
Panics recovered per goroutine
Process crash → Supervisor restarts it
Race conditions
Yes — shared memory + locks
No (single thread)
Possible — shared memory with channels
Impossible — no shared memory
Hot code upgrade
Requires restart
Requires restart
Requires restart
Live upgrade — zero downtime
WhatsApp @ 2M connections
Impossible
Not truly parallel
Theoretically possible
Achieved in production ✓
🎯
When to Choose Erlang/BEAM (Elixir)
BEAM shines at massively concurrent, stateful, long-lived connections — chat servers, IoT device networks, multiplayer game backends, telecom switches, real-time dashboards. It is not optimised for raw CPU-heavy computation (image processing, ML training, compression). For those, Go or C++ wins. For connections-per-server, nothing matches BEAM.
Section 11
Hot Code Loading — Deploy Without Downtime
One of Erlang's most astonishing features is hot code loading — the ability to update running code on a live production system without stopping it or dropping a single connection.
📖 Analogy
Replacing an Aeroplane Engine Mid-Flight
In most systems, deploying new code is like landing the plane, swapping the engines on the runway, and taking off again. Users see downtime. Sessions drop. Connections reset.
BEAM's hot code loading is like replacing an engine while the plane is still in the air at 35,000 feet. The BEAM VM can hold two versions of a module simultaneously. New processes use the new version. Old processes finish their current work and transition to the new version naturally. No connections are dropped. No service interruption. No restart.
Ericsson ATM switches stayed online through software upgrades for decades using exactly this mechanism. WhatsApp deployed updates to tens of millions of connected users without anyone's message being dropped.
🔄 Hot Code Loading — How It Works
Step 1
New module version is compiled to .beam bytecode and loaded into the VM alongside the current version.
Step 2
BEAM holds two versions: current (old) and old. Any third load ejects the old version.
Step 3
Processes still running in the old module continue running undisturbed — they hold a reference to that code.
Step 4
New processes spawned after the load use the new module version automatically.
Step 5
Existing processes transition to the new version on their next fully qualified function call (e.g. module:function() syntax).
Section 12
Real-World Use Cases — Who Uses Erlang/BEAM Today
💬
WhatsApp
Messaging
The defining case study. 2M connections/server, 450M users, 55 engineers. The entire backend routing and delivery layer was Erlang. Acquired for $19B in 2014.
🎮
Discord
Real-Time Chat
Used Elixir (Erlang on BEAM) to power their voice/text backend. Handled millions of concurrent WebSocket connections per server with dramatically less infrastructure than competitors.
📡
Ericsson Telecom
Telecom Infrastructure
The original use case. Erlang powers AXD301, HLR systems, and countless switching platforms. 40% of all world telecom traffic routes through Erlang-based systems.
🗄️
CouchDB / Riak
Distributed Databases
Both written in Erlang for their distributed, fault-tolerant properties. Riak powers backend storage at Riot Games, GitHub, and many Fortune 500 companies requiring consistent uptime.
🐰
RabbitMQ
Message Broker
The world's most popular open-source message broker is written in Erlang. It handles millions of messages per second across distributed clusters with built-in fault tolerance and zero-downtime upgrades.
🌊
Elixir / Phoenix
Modern BEAM Ecosystem
Elixir is a modern language that runs on BEAM. Phoenix LiveView demonstrated 2 million WebSocket connections on a single server in 2019, proving BEAM's concurrency record holds in modern stacks.
Section 13
The Seven Pillars of BEAM's Power
⚡
CONCURRENCY
Millions of Processes
Lightweight BEAM processes (300 bytes) make millions of concurrent connections feasible on commodity hardware.
🛡️
ISOLATION
No Shared Memory
Process isolation eliminates race conditions, deadlocks, and cascading failures. Each process is an independent universe.
🔄
RESILIENCE
Let It Crash
OTP Supervisors restart crashed processes automatically. The system heals itself without human intervention or downtime.
⚖️
FAIRNESS
Preemptive Scheduling
2,000-reduction quantum guarantees no process can starve others. Millisecond latency maintained even at full load.
♻️
GC DESIGN
Per-Process GC
Garbage collection happens independently per process. No stop-the-world pauses. GC in one process never stalls others.
🚀
UPGRADES
Hot Code Loading
Deploy new code to a running system without dropping connections. The telecom industry's 30-year proven feature.
Section 14
When Erlang/BEAM Is NOT the Right Choice
⚠️
Honest Limitations — Know Before You Build
Erlang is not a universal solution. Understanding its limitations is as important as understanding its strengths.
Limitation
Why It Matters
Mitigation
CPU-heavy computation
BEAM processes are preempted every 2,000 reductions. Heavy number crunching is slow.
Use NIFs (native code) for compute-heavy work, or offload to C/Rust
Large binary data
Message copying for binaries > 64 bytes goes to shared binary heap. Misuse causes fragmentation.
Use sub-binaries, match contexts, and explicit binary ref-counting awareness
Hiring difficulty
Far fewer Erlang/Elixir engineers than Java/Python/JavaScript. Higher salary premium.
Elixir has a friendlier syntax; growing community and talent pool
Ecosystem maturity
Fewer libraries than Node.js/Python for non-telecom domains (ML, image processing, etc.)
Use Elixir's Hex package ecosystem + Erlang NIFs to call C/Rust libraries
Learning curve
Functional, immutable, pattern-matching paradigm is unfamiliar to OOP developers.
Elixir provides a gentler on-ramp with familiar Ruby-like syntax on top of BEAM
Debugging distributed state
With millions of processes, tracing a bug through message flows is non-trivial.
OTP's sys module, recon library, and observer tool provide powerful introspection
Section 15
Golden Rules — Building on Erlang/BEAM
🌱 BEAM Architecture — Non-Negotiable Principles
1
One process per connection, always. This is the BEAM idiom. Don't fight it with thread pools or async callbacks. Spawn a process for each TCP connection, each user session, each active entity. The VM is built for this.
2
Never share state through memory. If two processes need shared data, use a dedicated GenServer process as the single source of truth, or an ETS table for high-read data. Shared memory kills the fault-isolation guarantee.
3
Supervise everything. Every process that matters should be under a Supervisor. Unsupervised processes that crash silently are the most dangerous bugs in BEAM systems. Build your supervision tree first, write process logic second.
4
Use receive with timeouts. A process waiting in receive forever is a resource leak waiting to happen. Always use after Timeout clauses. In production systems, zombie processes accumulate silently without them.
5
Monitor ETS table memory. ETS tables grow unboundedly unless explicitly managed. Implement :ets.delete_object/2 or TTL mechanisms. Unchecked ETS growth is the #1 cause of OOM crashes in production Erlang systems.
6
Benchmark with realistic concurrency. BEAM performance characteristics change dramatically between 1,000 and 1,000,000 processes. Always load test at realistic connection counts before production launch.
7
Consider Elixir for new projects. Elixir runs on BEAM with identical performance characteristics but provides a modern, approachable syntax, a thriving ecosystem (Hex), and excellent tooling (mix, ExUnit, Phoenix LiveView).
🏁
The WhatsApp Legacy
WhatsApp proved something the software industry needed to hear: the right tool for the job beats brute-force scaling every time. While competitors were buying hundreds of servers, WhatsApp was getting more out of one server than anyone thought possible — by trusting a 30-year-old language built for a different industry but engineered for the exact problem they faced. Today, Erlang and Elixir power the backbone of real-time communication at companies worldwide, and the BEAM VM's concurrency model remains the gold standard for systems that must stay alive, stay fast, and stay connected — at any scale.