Software Architectures 📂 Software Design in the Real World · 4 of 5 61 min read

WhatsApp's Secret: How Erlang/BEAM Handles 2 Million Connections on One Server

A deep-dive into why WhatsApp chose Erlang and the BEAM virtual machine to serve 450 million users with just 55 engineers — covering BEAM processes, preemptive scheduling, OTP supervision trees, message passing, and hot code loading, with animated diagrams and real architecture walkthroughs.

Section 01

The Story Behind 2 Million Connections

How WhatsApp Shocked Silicon Valley with 55 Engineers
It was January 2014. Facebook acquired a tiny company called WhatsApp for $19 billion — the largest tech acquisition in history at the time. The world asked: why so much for a messaging app?

The answer wasn't the product. It was the engineering.

WhatsApp had 450 million users sending over 50 billion messages per day, with peak loads exceeding 2 million simultaneous TCP connections on a single server. The entire engineering team? 55 people. Most companies needed thousands of engineers and hundreds of servers to serve a fraction of that load.

The secret weapon was a 30-year-old programming language built for telephone networks — Erlang, running on the BEAM virtual machine. While the rest of the world was building chat apps on Ruby on Rails or Node.js, WhatsApp had gone back to the 1980s — and won.

This tutorial will take you deep into why Erlang/BEAM can handle millions of concurrent connections where other stacks collapse, explaining every architectural concept with clear analogies, animated diagrams, and real-world context.

📡
The Numbers That Matter

A single commodity Linux server running Erlang/BEAM handled 2 million simultaneous WebSocket/TCP connections during WhatsApp's peak. For context, a typical Java application server (Tomcat with thread-per-connection) would run out of memory trying to maintain just 10,000 concurrent connections on the same hardware.


Section 02

The C10K Problem — Why Concurrency Is Hard

Before understanding why Erlang wins, you need to understand the problem every messaging platform faces. It has a name: the C10K problem (10,000 concurrent connections).

The Restaurant That Couldn't Seat Everyone
Imagine a restaurant where every customer who walks in is assigned a dedicated waiter. That waiter does nothing else — they just stand next to the customer's table, waiting for the customer to speak, ready to serve instantly.

When 50 customers arrive, you have 50 waiters. When 1,000 customers arrive, you need 1,000 waiters — most of them standing idle waiting for customers to place their orders. The waiters eat food, take up space, cost money, and mostly just wait.

This is exactly how traditional thread-per-connection servers work. Each TCP connection gets its own OS thread. Most threads just sit there, blocked, waiting for a message that hasn't arrived yet. At 2 million connections, you'd need 2 million OS threads. That's not a server — that's a catastrophe.
🧵
OS Threads
Traditional Model
Each connection = 1 OS thread. Threads consume ~1–8 MB of stack memory each. At 10,000 connections = 10 GB just for stacks. At 2 million connections = completely impossible.
Event Loop
Node.js / Nginx Model
Single thread handles all connections via epoll/kqueue. Excellent for I/O but one slow operation blocks everyone. CPU-intensive work destroys throughput. No true parallelism.
🌳
BEAM Processes
Erlang / Elixir Model
Each connection = 1 lightweight BEAM process (~300 bytes initial heap). At 2 million connections = ~600 MB just for process heaps. Each process is isolated, preemptively scheduled, and independently garbage collected.
📊 Memory Cost Per Connection Model — Animated Comparison
0 500 MB 2 GB 8 GB 16 GB RAM for 10k connections OS Threads Java / .NET ~16 GB RAM Event Loop Node.js ~800 MB BEAM Procs Erlang / Elixir ~3 MB ✓ 5,000× cheaper! Memory Used for 10,000 Simultaneous Connections

BEAM process overhead is so small it barely registers. At 10,000 connections, OS threads demand 16 GB while BEAM processes need just ~3 MB.


Section 03

What is Erlang and the BEAM VM?

Erlang is a functional, concurrent, fault-tolerant programming language created at Ericsson in 1986 by Joe Armstrong, Robert Virding, and Mike Williams. Its sole original purpose was to power telephone switching software — systems that had to handle thousands of calls simultaneously, never go down, and be updated without rebooting.

Built for Nine Nines — 99.9999999% Uptime
Ericsson's AXD301 ATM switch, written in Erlang, achieved nine nines of availability — that's 99.9999999% uptime, or roughly 31 milliseconds of downtime per year. The entire internet aspires to five nines (5 minutes downtime per year). Ericsson had already solved reliability at a level 10,000× better, decades ago.

The language was open-sourced in 1998. For most of the 2000s, it stayed in the shadows — until massively concurrent internet systems started hitting the same problems telephone networks had solved in the 1980s.
🧠 The BEAM Virtual Machine — Key Components
BEAM
Bogdan/Björn's Erlang Abstract Machine — the bytecode VM that runs compiled Erlang code. Think JVM for Java, but built specifically for concurrency and soft real-time guarantees.
Scheduler
One OS thread per CPU core (default). Each scheduler has its own run queue. BEAM schedules its lightweight processes across all schedulers, fully utilising multi-core hardware automatically.
Processes
Ultra-lightweight concurrent entities — not OS threads, not system processes. They live entirely inside the BEAM VM. Each has its own heap, stack, and message mailbox. No shared memory.
GC
Per-process garbage collection — each BEAM process collects its own garbage independently. There are no global GC pauses that stop the world. A long GC in one process doesn't delay others.
OTP
Open Telecom Platform — a set of libraries and design patterns (Supervisors, GenServers, Applications) that make building fault-tolerant distributed systems dramatically easier.

Section 04

BEAM Processes — The Core Superpower

The BEAM process is the fundamental unit of concurrency in Erlang. Understanding it deeply is the key to understanding why WhatsApp could do what it did.

📦
Initial Heap Size
233 words (~1.8 KB)
A brand-new BEAM process starts with this tiny heap. It grows on demand. Compare to ~1 MB minimum for an OS thread stack.
Spawn Time
< 1 microsecond
Creating a BEAM process is nearly instant. Spawning an OS thread takes microseconds to milliseconds. You can spawn millions per second.
🔢
Max Processes
134 million (default)
The default BEAM VM limit is 134,217,728 processes per node. This is configurable. WhatsApp ran comfortably at 2 million simultaneous processes.
🔬 Anatomy of a Single BEAM Process — Interactive Diagram
BEAM Process PID: <0.1234.0> Process Identifier CALL STACK Function frames Return addresses Local variables PRIVATE HEAP Terms & data Binaries (small) Grows on demand MESSAGE MAILBOX (Queue) Messages received — processed in order, one at a time REDUCTION COUNTER Scheduler yields at 2000 reductions SCHEDULER 1 per CPU core Run queue Preemptive Work-stealing Fair scheduling PER-PROCESS GC Generational GC Only this process No stop-the-world Short pauses (<1ms) Other procs unaffected 💬 Communication: only via message passing — NO shared memory, NO locks, NO race conditions

Each BEAM process is a fully isolated universe. It has its own heap, stack, mailbox, and GC cycle. The scheduler treats every process fairly — no single process can hog the CPU.

🏆
Why Isolation Is the Key Insight

Because each BEAM process has its own private memory, there is no shared state to lock. No mutexes. No semaphores. No race conditions. Two processes never corrupt each other's memory. When one process crashes, it dies alone — it cannot corrupt the heap of any other process. This is the foundation of Erlang's legendary fault tolerance.


Section 05

Preemptive Scheduling — The Fairness Engine

One of BEAM's most critical and misunderstood features is its preemptive scheduler. This is what prevents a single slow process from blocking millions of others.

The Supermarket with a Time Limit Per Customer
Imagine a supermarket checkout with 10 tills. Normally customers can stand at the till as long as they like — until they're done. One customer with 500 items holds up everyone behind them.

BEAM's scheduler is like a supermarket where each customer gets exactly 2,000 "actions" at the till. When they use those up, they must step to the back of the queue, even if they're not done — and the next customer steps forward. No customer can monopolise a till.

In BEAM terms, each process gets a reduction quota of 2,000 reductions. Every function call, pattern match, and message send consumes reductions. When the quota is spent, the scheduler forcibly suspends that process and runs the next one. This is preemptive, not cooperative.
⛔ Cooperative Scheduling (Node.js)
What HappensImpact
Process A runs heavy loopAll others blocked
Process A must yield voluntarilyNo guaranteed fairness
Slow DB call blocks event loop100ms jank for everyone
One bad callback = service degradedCascading slowdowns
✅ Preemptive Scheduling (BEAM)
What HappensImpact
Process A runs heavy loopPaused at 2,000 reductions
Scheduler forces context switchGuaranteed fairness
Slow process can't block othersMillisecond response time
2M processes share schedulers evenlySoft real-time guarantees
⚙️ Reduction-Based Preemption — Animated Scheduler Rotation
P-1 P-2 P-3 P-4 P-5 P-6 BEAM Scheduler Round-robin fair scheduling 2,000 reductions per turn Reduction Counter — Process P-1 / 2000 Each function call uses 1 reduction HOW PREEMPTION WORKS Process runs — uses reductions Counter hits 2,000 Scheduler forcibly suspends it Next process in queue runs P-1 rejoins queue, counter reset

The BEAM scheduler rotates through all runnable processes in round-robin fashion. Every process gets a fair time slice, measured in reductions — not wall-clock time. This gives predictable, bounded latency even under heavy load.


Section 06

Message Passing — No Shared Memory, No Locks

In traditional concurrent programming, threads share memory. Two threads can read and write the same variable. This leads to race conditions, deadlocks, and hours of debugging. Erlang eliminates this entirely.

The Office Without a Whiteboard
Imagine two employees sharing one whiteboard. When both try to write at the same time, they corrupt each other's work. To prevent this, they need a lock — one must wait while the other writes. Under high traffic, this waiting becomes a bottleneck. If one employee holds the lock and crashes, the other waits forever — a deadlock.

Erlang's model: there is no shared whiteboard. Instead, every employee (process) has their own private notepad. If Employee A wants Employee B to know something, A writes a copy of the information on a note and drops it in B's mailbox. A's notepad and B's notepad never share memory. No lock needed. No race condition possible. Crash-safe.
💬 Message Passing — Animated Process Communication
Process A PID: <0.100.0> Private Heap {user, "Alice", msg} send(PID_B, Message) 📨 copy of msg 📨 copy of msg 📨 copy of msg Message is COPIED — not shared Process B PID: <0.200.0> MAILBOX [msg1, msg2, ...] receive > pattern match ✓ No shared memory  |  ✓ No locks  |  ✓ No deadlocks  |  ✓ No race conditions  |  ✓ Crash-safe isolation

When Process A sends to Process B, the message data is deep-copied into B's mailbox. A and B never share a memory address. A process crash cannot corrupt another process's data.


Section 07

OTP Supervisors — The "Let It Crash" Philosophy

Erlang's most radical idea isn't concurrency — it's how to handle failure. Most languages try to defensively prevent crashes everywhere. Erlang says: let it crash, then restart it automatically.

The Hospital with Self-Healing Cells
Imagine your body. Cells die all the time — by the billions every day. But you don't notice because a replacement system immediately regenerates them. You don't need to prevent every cell death; you just need robust replacement machinery.

OTP Supervisors are that machinery. Each supervised process is a cell. If it crashes (unexpected error, bad data, network timeout), the Supervisor sees the crash notification, looks up the restart strategy, and spawns a fresh replacement process within milliseconds. The rest of the system didn't know the crash happened. The user sees maybe a momentarily dropped connection — then reconnects automatically.
🌳 OTP Supervision Tree — WhatsApp-Style Architecture
ROOT SUPERVISOR Strategy: one_for_one CONNECTION SUPERVISOR Manages user connections MESSAGE SUPERVISOR Routes & delivers messages PRESENCE SUPERVISOR Tracks online/offline state conn_worker User #1 conn_worker User #2 💥 CRASH → RESTART ↺ conn_worker User #3 msg_router Queue A msg_router Queue B msg_router Queue C Supervisor Restart Strategies one_for_one — Only the crashed child is restarted. Others continue unaffected. one_for_all — If one child crashes, all children restart. For tightly coupled workers. rest_for_one — Crashed child + all children started after it restart.

Watch User #2's worker: it crashes (turns red), the Supervisor detects the crash signal, and restarts a fresh worker process in milliseconds. Users #1 and #3 never notice. The rest of the system is unaffected.

💡
"Let It Crash" vs Defensive Programming

Traditional languages wrap every operation in try/catch and spend enormous effort preventing crashes. Erlang's philosophy: don't try to handle every possible error in every function. Let the process crash on unexpected input. The Supervisor will restart it in a known-good state. This results in far simpler code — no defensive null checks, no try/catch pyramids, no partially-corrupted state to clean up.


Section 08

WhatsApp's Actual Architecture on Erlang/BEAM

Let's trace a real WhatsApp message from Alice's phone to Bob's phone, and understand exactly which BEAM components handle each step.

01
📱 TCP Connection Established
Alice's phone opens a persistent TCP connection to a WhatsApp server. A new BEAM process (~300 bytes) is spawned specifically for this connection. This process lives as long as Alice is online — potentially hours or days — consuming almost no CPU while idle, just sitting in the scheduler's wait queue listening for data.
02
📨 Message Arrives from Alice
Alice types and sends a message. The TCP data arrives at Alice's connection process. The process wakes from its idle wait, parses the binary protocol, and sends an internal BEAM message to the router process: send({to: bob_id, content: "Hey"}). The connection process immediately returns to listening state.
03
🗺️ Router Process Looks Up Bob
A message routing process checks an in-memory ETS (Erlang Term Storage) table — a highly optimised, concurrent hash table built into BEAM. It looks up bob_id and finds Bob's connection process PID, or discovers Bob is offline and routes to a delivery queue instead. ETS lookups are O(1) and lock-free for reads.
04
📬 Bob's Connection Process Receives It
The router sends the message to Bob's connection process via BEAM message passing. Bob's process wakes up, serialises the message into WhatsApp's binary protocol, and writes it to Bob's TCP socket. Bob's phone receives the message. The entire path from Alice's send to Bob's receive can happen in under 50ms, all within BEAM on a single server.
05
✅ Delivery Acknowledgement
Bob's phone sends an ACK back. Bob's connection process receives it, sends it back through the router, Alice's connection process receives it, and writes the "double tick" signal to Alice's socket. Alice sees the double tick. All of this is BEAM process communication — fast, reliable, with zero locks.
🌐 2 Million Connections on One Server — Visualised
BEAM SERVER Erlang OTP Node 2,000,000 processes 16 cores × scheduler ~1M users online ~1M users online = 2 Million concurrent BEAM processes

Each blinking dot is an active phone connection, maintained by exactly one BEAM process. 2 million dots — one commodity server. Most processes are sleeping (waiting for messages), consuming only a few bytes of memory each.


Section 09

WhatsApp's Key Erlang/BEAM Optimisations

Getting to 2 million connections wasn't default Erlang — WhatsApp pushed the BEAM to its limits with several key optimisations documented by engineer Rick Reed at EUC 2012.

⚙️ WhatsApp's BEAM Tuning Secrets
1
Custom Linux Kernel Tuning — Modified net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, and file descriptor limits to allow millions of simultaneous TCP connections. Default Linux limits would have capped them at ~65,536 connections per port.
2
Reduced Process Heap Sizes — Tweaked initial BEAM process heap from the default 233 words down, and aggressively tuned garbage collection thresholds. When you have 2 million processes, even a few hundred bytes saved per process adds up to gigabytes.
3
Binary Data Sharing — Erlang binaries larger than 64 bytes are stored in a shared heap with reference counting (not copied on send). A 50KB profile photo sent to 100 processes exists once in memory, with 100 reference pointers. This avoids catastrophic memory multiplication.
4
ETS for Hot Data — Erlang Term Storage (ETS) tables provided O(1) concurrent reads with no locking for the session registry — mapping user IDs to BEAM process PIDs. At 2 million users, this lookup had to be nanosecond-fast and lock-free.
5
BEAM SMP Schedulers = CPU Cores — Configured one BEAM scheduler thread per physical CPU core (not logical hyperthreaded core). This avoided false sharing and cache bouncing between schedulers, giving near-linear throughput scaling as cores increased.
6
Network I/O via async_threads — BEAM's async thread pool handled blocking I/O operations (file system, legacy APIs) off the scheduler threads, preventing any I/O wait from blocking the main BEAM schedulers.

Section 10

Erlang/BEAM vs Other Concurrency Models

Feature OS Threads (Java/C++) Async/Event Loop (Node.js) Goroutines (Go) BEAM Processes (Erlang)
Min memory per unit ~1–8 MB stack Single thread (shared) ~2–8 KB initial stack ~300 bytes initial heap
10k concurrent units ~10–80 GB RAM Low RAM but serial ~20–80 MB RAM ~3 MB RAM
Scheduling OS preemptive Cooperative (single thread) Go runtime preemptive BEAM preemptive (2k reductions)
True parallelism Yes (OS threads) No (GIL-free but single-threaded) Yes (GOMAXPROCS) Yes (1 scheduler per core)
Garbage collection Global stop-the-world (JVM) V8 GC pauses Tri-color, short but global Per-process, zero global pause
Fault isolation One thread crash = process crash One uncaught exception = process crash Panics recovered per goroutine Process crash → Supervisor restarts it
Race conditions Yes — shared memory + locks No (single thread) Possible — shared memory with channels Impossible — no shared memory
Hot code upgrade Requires restart Requires restart Requires restart Live upgrade — zero downtime
WhatsApp @ 2M connections Impossible Not truly parallel Theoretically possible Achieved in production ✓
🎯
When to Choose Erlang/BEAM (Elixir)

BEAM shines at massively concurrent, stateful, long-lived connections — chat servers, IoT device networks, multiplayer game backends, telecom switches, real-time dashboards. It is not optimised for raw CPU-heavy computation (image processing, ML training, compression). For those, Go or C++ wins. For connections-per-server, nothing matches BEAM.


Section 11

Hot Code Loading — Deploy Without Downtime

One of Erlang's most astonishing features is hot code loading — the ability to update running code on a live production system without stopping it or dropping a single connection.

Replacing an Aeroplane Engine Mid-Flight
In most systems, deploying new code is like landing the plane, swapping the engines on the runway, and taking off again. Users see downtime. Sessions drop. Connections reset.

BEAM's hot code loading is like replacing an engine while the plane is still in the air at 35,000 feet. The BEAM VM can hold two versions of a module simultaneously. New processes use the new version. Old processes finish their current work and transition to the new version naturally. No connections are dropped. No service interruption. No restart.

Ericsson ATM switches stayed online through software upgrades for decades using exactly this mechanism. WhatsApp deployed updates to tens of millions of connected users without anyone's message being dropped.
🔄 Hot Code Loading — How It Works
Step 1
New module version is compiled to .beam bytecode and loaded into the VM alongside the current version.
Step 2
BEAM holds two versions: current (old) and old. Any third load ejects the old version.
Step 3
Processes still running in the old module continue running undisturbed — they hold a reference to that code.
Step 4
New processes spawned after the load use the new module version automatically.
Step 5
Existing processes transition to the new version on their next fully qualified function call (e.g. module:function() syntax).

Section 12

Real-World Use Cases — Who Uses Erlang/BEAM Today

💬
WhatsApp
Messaging
The defining case study. 2M connections/server, 450M users, 55 engineers. The entire backend routing and delivery layer was Erlang. Acquired for $19B in 2014.
🎮
Discord
Real-Time Chat
Used Elixir (Erlang on BEAM) to power their voice/text backend. Handled millions of concurrent WebSocket connections per server with dramatically less infrastructure than competitors.
📡
Ericsson Telecom
Telecom Infrastructure
The original use case. Erlang powers AXD301, HLR systems, and countless switching platforms. 40% of all world telecom traffic routes through Erlang-based systems.
🗄️
CouchDB / Riak
Distributed Databases
Both written in Erlang for their distributed, fault-tolerant properties. Riak powers backend storage at Riot Games, GitHub, and many Fortune 500 companies requiring consistent uptime.
🐰
RabbitMQ
Message Broker
The world's most popular open-source message broker is written in Erlang. It handles millions of messages per second across distributed clusters with built-in fault tolerance and zero-downtime upgrades.
🌊
Elixir / Phoenix
Modern BEAM Ecosystem
Elixir is a modern language that runs on BEAM. Phoenix LiveView demonstrated 2 million WebSocket connections on a single server in 2019, proving BEAM's concurrency record holds in modern stacks.

Section 13

The Seven Pillars of BEAM's Power

CONCURRENCY
Millions of Processes
Lightweight BEAM processes (300 bytes) make millions of concurrent connections feasible on commodity hardware.
🛡️
ISOLATION
No Shared Memory
Process isolation eliminates race conditions, deadlocks, and cascading failures. Each process is an independent universe.
🔄
RESILIENCE
Let It Crash
OTP Supervisors restart crashed processes automatically. The system heals itself without human intervention or downtime.
⚖️
FAIRNESS
Preemptive Scheduling
2,000-reduction quantum guarantees no process can starve others. Millisecond latency maintained even at full load.
♻️
GC DESIGN
Per-Process GC
Garbage collection happens independently per process. No stop-the-world pauses. GC in one process never stalls others.
🚀
UPGRADES
Hot Code Loading
Deploy new code to a running system without dropping connections. The telecom industry's 30-year proven feature.

Section 14

When Erlang/BEAM Is NOT the Right Choice

⚠️
Honest Limitations — Know Before You Build

Erlang is not a universal solution. Understanding its limitations is as important as understanding its strengths.

Limitation Why It Matters Mitigation
CPU-heavy computation BEAM processes are preempted every 2,000 reductions. Heavy number crunching is slow. Use NIFs (native code) for compute-heavy work, or offload to C/Rust
Large binary data Message copying for binaries > 64 bytes goes to shared binary heap. Misuse causes fragmentation. Use sub-binaries, match contexts, and explicit binary ref-counting awareness
Hiring difficulty Far fewer Erlang/Elixir engineers than Java/Python/JavaScript. Higher salary premium. Elixir has a friendlier syntax; growing community and talent pool
Ecosystem maturity Fewer libraries than Node.js/Python for non-telecom domains (ML, image processing, etc.) Use Elixir's Hex package ecosystem + Erlang NIFs to call C/Rust libraries
Learning curve Functional, immutable, pattern-matching paradigm is unfamiliar to OOP developers. Elixir provides a gentler on-ramp with familiar Ruby-like syntax on top of BEAM
Debugging distributed state With millions of processes, tracing a bug through message flows is non-trivial. OTP's sys module, recon library, and observer tool provide powerful introspection

Section 15

Golden Rules — Building on Erlang/BEAM

🌱 BEAM Architecture — Non-Negotiable Principles
1
One process per connection, always. This is the BEAM idiom. Don't fight it with thread pools or async callbacks. Spawn a process for each TCP connection, each user session, each active entity. The VM is built for this.
2
Never share state through memory. If two processes need shared data, use a dedicated GenServer process as the single source of truth, or an ETS table for high-read data. Shared memory kills the fault-isolation guarantee.
3
Supervise everything. Every process that matters should be under a Supervisor. Unsupervised processes that crash silently are the most dangerous bugs in BEAM systems. Build your supervision tree first, write process logic second.
4
Use receive with timeouts. A process waiting in receive forever is a resource leak waiting to happen. Always use after Timeout clauses. In production systems, zombie processes accumulate silently without them.
5
Monitor ETS table memory. ETS tables grow unboundedly unless explicitly managed. Implement :ets.delete_object/2 or TTL mechanisms. Unchecked ETS growth is the #1 cause of OOM crashes in production Erlang systems.
6
Benchmark with realistic concurrency. BEAM performance characteristics change dramatically between 1,000 and 1,000,000 processes. Always load test at realistic connection counts before production launch.
7
Consider Elixir for new projects. Elixir runs on BEAM with identical performance characteristics but provides a modern, approachable syntax, a thriving ecosystem (Hex), and excellent tooling (mix, ExUnit, Phoenix LiveView).

🏁
The WhatsApp Legacy

WhatsApp proved something the software industry needed to hear: the right tool for the job beats brute-force scaling every time. While competitors were buying hundreds of servers, WhatsApp was getting more out of one server than anyone thought possible — by trusting a 30-year-old language built for a different industry but engineered for the exact problem they faced. Today, Erlang and Elixir power the backbone of real-time communication at companies worldwide, and the BEAM VM's concurrency model remains the gold standard for systems that must stay alive, stay fast, and stay connected — at any scale.