WhatsApp + Erlang/BEAM: 2M Connections Per Server

Section 01

The Story Behind 2 Million Connections

📖 The Origin Story

How WhatsApp Shocked Silicon Valley with 55 Engineers

It was January 2014. Facebook acquired a tiny company called WhatsApp for $19 billion — the largest tech acquisition in history at the time. The world asked: why so much for a messaging app?

The answer wasn't the product. It was the engineering.

WhatsApp had 450 million users sending over 50 billion messages per day, with peak loads exceeding 2 million simultaneous TCP connections on a single server. The entire engineering team? 55 people. Most companies needed thousands of engineers and hundreds of servers to serve a fraction of that load.

The secret weapon was a 30-year-old programming language built for telephone networks — Erlang, running on the BEAM virtual machine. While the rest of the world was building chat apps on Ruby on Rails or Node.js, WhatsApp had gone back to the 1980s — and won.

This tutorial will take you deep into why Erlang/BEAM can handle millions of concurrent connections where other stacks collapse, explaining every architectural concept with clear analogies, animated diagrams, and real-world context.

📡

The Numbers That Matter

A single commodity Linux server running Erlang/BEAM handled 2 million simultaneous WebSocket/TCP connections during WhatsApp's peak. For context, a typical Java application server (Tomcat with thread-per-connection) would run out of memory trying to maintain just 10,000 concurrent connections on the same hardware.

Section 02

The C10K Problem — Why Concurrency Is Hard

Before understanding why Erlang wins, you need to understand the problem every messaging platform faces. It has a name: the C10K problem (10,000 concurrent connections).

📖 Analogy

The Restaurant That Couldn't Seat Everyone

Imagine a restaurant where every customer who walks in is assigned a dedicated waiter. That waiter does nothing else — they just stand next to the customer's table, waiting for the customer to speak, ready to serve instantly.

When 50 customers arrive, you have 50 waiters. When 1,000 customers arrive, you need 1,000 waiters — most of them standing idle waiting for customers to place their orders. The waiters eat food, take up space, cost money, and mostly just wait.

This is exactly how traditional thread-per-connection servers work. Each TCP connection gets its own OS thread. Most threads just sit there, blocked, waiting for a message that hasn't arrived yet. At 2 million connections, you'd need 2 million OS threads. That's not a server — that's a catastrophe.

🧵

OS Threads

Traditional Model

Each connection = 1 OS thread. Threads consume ~1–8 MB of stack memory each. At 10,000 connections = 10 GB just for stacks. At 2 million connections = completely impossible.

⚡

Event Loop

Node.js / Nginx Model

Single thread handles all connections via epoll/kqueue. Excellent for I/O but one slow operation blocks everyone. CPU-intensive work destroys throughput. No true parallelism.

🌳

BEAM Processes

Erlang / Elixir Model

Each connection = 1 lightweight BEAM process (~300 bytes initial heap). At 2 million connections = ~600 MB just for process heaps. Each process is isolated, preemptively scheduled, and independently garbage collected.

📊 Memory Cost Per Connection Model — Animated Comparison

BEAM process overhead is so small it barely registers. At 10,000 connections, OS threads demand 16 GB while BEAM processes need just ~3 MB.

Section 03

What is Erlang and the BEAM VM?

Erlang is a functional, concurrent, fault-tolerant programming language created at Ericsson in 1986 by Joe Armstrong, Robert Virding, and Mike Williams. Its sole original purpose was to power telephone switching software — systems that had to handle thousands of calls simultaneously, never go down, and be updated without rebooting.

📖 History

Built for Nine Nines — 99.9999999% Uptime

Ericsson's AXD301 ATM switch, written in Erlang, achieved nine nines of availability — that's 99.9999999% uptime, or roughly 31 milliseconds of downtime per year. The entire internet aspires to five nines (5 minutes downtime per year). Ericsson had already solved reliability at a level 10,000× better, decades ago.

The language was open-sourced in 1998. For most of the 2000s, it stayed in the shadows — until massively concurrent internet systems started hitting the same problems telephone networks had solved in the 1980s.

🧠 The BEAM Virtual Machine — Key Components

BEAM

Bogdan/Björn's Erlang Abstract Machine — the bytecode VM that runs compiled Erlang code. Think JVM for Java, but built specifically for concurrency and soft real-time guarantees.

Scheduler

One OS thread per CPU core (default). Each scheduler has its own run queue. BEAM schedules its lightweight processes across all schedulers, fully utilising multi-core hardware automatically.

Processes

Ultra-lightweight concurrent entities — not OS threads, not system processes. They live entirely inside the BEAM VM. Each has its own heap, stack, and message mailbox. No shared memory.

Per-process garbage collection — each BEAM process collects its own garbage independently. There are no global GC pauses that stop the world. A long GC in one process doesn't delay others.

OTP

Open Telecom Platform — a set of libraries and design patterns (Supervisors, GenServers, Applications) that make building fault-tolerant distributed systems dramatically easier.

Section 04

BEAM Processes — The Core Superpower

The BEAM process is the fundamental unit of concurrency in Erlang. Understanding it deeply is the key to understanding why WhatsApp could do what it did.

📦

Initial Heap Size

233 words (~1.8 KB)

A brand-new BEAM process starts with this tiny heap. It grows on demand. Compare to ~1 MB minimum for an OS thread stack.

⚡

Spawn Time

< 1 microsecond

Creating a BEAM process is nearly instant. Spawning an OS thread takes microseconds to milliseconds. You can spawn millions per second.

🔢

Max Processes

134 million (default)

The default BEAM VM limit is 134,217,728 processes per node. This is configurable. WhatsApp ran comfortably at 2 million simultaneous processes.

🔬 Anatomy of a Single BEAM Process — Interactive Diagram

Each BEAM process is a fully isolated universe. It has its own heap, stack, mailbox, and GC cycle. The scheduler treats every process fairly — no single process can hog the CPU.

🏆

Why Isolation Is the Key Insight

Because each BEAM process has its own private memory, there is no shared state to lock. No mutexes. No semaphores. No race conditions. Two processes never corrupt each other's memory. When one process crashes, it dies alone — it cannot corrupt the heap of any other process. This is the foundation of Erlang's legendary fault tolerance.

Section 05

Preemptive Scheduling — The Fairness Engine

One of BEAM's most critical and misunderstood features is its preemptive scheduler. This is what prevents a single slow process from blocking millions of others.

📖 Analogy

The Supermarket with a Time Limit Per Customer

Imagine a supermarket checkout with 10 tills. Normally customers can stand at the till as long as they like — until they're done. One customer with 500 items holds up everyone behind them.

BEAM's scheduler is like a supermarket where each customer gets exactly 2,000 "actions" at the till. When they use those up, they must step to the back of the queue, even if they're not done — and the next customer steps forward. No customer can monopolise a till.

In BEAM terms, each process gets a reduction quota of 2,000 reductions. Every function call, pattern match, and message send consumes reductions. When the quota is spent, the scheduler forcibly suspends that process and runs the next one. This is preemptive, not cooperative.

⛔ Cooperative Scheduling (Node.js)

What Happens	Impact
Process A runs heavy loop	All others blocked
Process A must yield voluntarily	No guaranteed fairness
Slow DB call blocks event loop	100ms jank for everyone
One bad callback = service degraded	Cascading slowdowns

✅ Preemptive Scheduling (BEAM)

What Happens	Impact
Process A runs heavy loop	Paused at 2,000 reductions
Scheduler forces context switch	Guaranteed fairness
Slow process can't block others	Millisecond response time
2M processes share schedulers evenly	Soft real-time guarantees

⚙️ Reduction-Based Preemption — Animated Scheduler Rotation

The BEAM scheduler rotates through all runnable processes in round-robin fashion. Every process gets a fair time slice, measured in reductions — not wall-clock time. This gives predictable, bounded latency even under heavy load.

Section 06

Message Passing — No Shared Memory, No Locks

In traditional concurrent programming, threads share memory. Two threads can read and write the same variable. This leads to race conditions, deadlocks, and hours of debugging. Erlang eliminates this entirely.

📖 Analogy

The Office Without a Whiteboard

Imagine two employees sharing one whiteboard. When both try to write at the same time, they corrupt each other's work. To prevent this, they need a lock — one must wait while the other writes. Under high traffic, this waiting becomes a bottleneck. If one employee holds the lock and crashes, the other waits forever — a deadlock.

Erlang's model: there is no shared whiteboard. Instead, every employee (process) has their own private notepad. If Employee A wants Employee B to know something, A writes a copy of the information on a note and drops it in B's mailbox. A's notepad and B's notepad never share memory. No lock needed. No race condition possible. Crash-safe.

💬 Message Passing — Animated Process Communication

When Process A sends to Process B, the message data is deep-copied into B's mailbox. A and B never share a memory address. A process crash cannot corrupt another process's data.

Section 07

OTP Supervisors — The "Let It Crash" Philosophy

Erlang's most radical idea isn't concurrency — it's how to handle failure. Most languages try to defensively prevent crashes everywhere. Erlang says: let it crash, then restart it automatically.

📖 Analogy

The Hospital with Self-Healing Cells

Imagine your body. Cells die all the time — by the billions every day. But you don't notice because a replacement system immediately regenerates them. You don't need to prevent every cell death; you just need robust replacement machinery.

OTP Supervisors are that machinery. Each supervised process is a cell. If it crashes (unexpected error, bad data, network timeout), the Supervisor sees the crash notification, looks up the restart strategy, and spawns a fresh replacement process within milliseconds. The rest of the system didn't know the crash happened. The user sees maybe a momentarily dropped connection — then reconnects automatically.

🌳 OTP Supervision Tree — WhatsApp-Style Architecture

Watch User #2's worker: it crashes (turns red), the Supervisor detects the crash signal, and restarts a fresh worker process in milliseconds. Users #1 and #3 never notice. The rest of the system is unaffected.

💡

"Let It Crash" vs Defensive Programming

Traditional languages wrap every operation in try/catch and spend enormous effort preventing crashes. Erlang's philosophy: don't try to handle every possible error in every function. Let the process crash on unexpected input. The Supervisor will restart it in a known-good state. This results in far simpler code — no defensive null checks, no try/catch pyramids, no partially-corrupted state to clean up.

Section 08

WhatsApp's Actual Architecture on Erlang/BEAM

Let's trace a real WhatsApp message from Alice's phone to Bob's phone, and understand exactly which BEAM components handle each step.

📱 TCP Connection Established

Alice's phone opens a persistent TCP connection to a WhatsApp server. A new BEAM process (~300 bytes) is spawned specifically for this connection. This process lives as long as Alice is online — potentially hours or days — consuming almost no CPU while idle, just sitting in the scheduler's wait queue listening for data.

📨 Message Arrives from Alice

Alice types and sends a message. The TCP data arrives at Alice's connection process. The process wakes from its idle wait, parses the binary protocol, and sends an internal BEAM message to the router process: send({to: bob_id, content: "Hey"}). The connection process immediately returns to listening state.

🗺️ Router Process Looks Up Bob

A message routing process checks an in-memory ETS (Erlang Term Storage) table — a highly optimised, concurrent hash table built into BEAM. It looks up bob_id and finds Bob's connection process PID, or discovers Bob is offline and routes to a delivery queue instead. ETS lookups are O(1) and lock-free for reads.

📬 Bob's Connection Process Receives It

The router sends the message to Bob's connection process via BEAM message passing. Bob's process wakes up, serialises the message into WhatsApp's binary protocol, and writes it to Bob's TCP socket. Bob's phone receives the message. The entire path from Alice's send to Bob's receive can happen in under 50ms, all within BEAM on a single server.

✅ Delivery Acknowledgement

Bob's phone sends an ACK back. Bob's connection process receives it, sends it back through the router, Alice's connection process receives it, and writes the "double tick" signal to Alice's socket. Alice sees the double tick. All of this is BEAM process communication — fast, reliable, with zero locks.

🌐 2 Million Connections on One Server — Visualised

Each blinking dot is an active phone connection, maintained by exactly one BEAM process. 2 million dots — one commodity server. Most processes are sleeping (waiting for messages), consuming only a few bytes of memory each.

Section 09

WhatsApp's Key Erlang/BEAM Optimisations

Getting to 2 million connections wasn't default Erlang — WhatsApp pushed the BEAM to its limits with several key optimisations documented by engineer Rick Reed at EUC 2012.

⚙️ WhatsApp's BEAM Tuning Secrets

Custom Linux Kernel Tuning — Modified net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, and file descriptor limits to allow millions of simultaneous TCP connections. Default Linux limits would have capped them at ~65,536 connections per port.

Reduced Process Heap Sizes — Tweaked initial BEAM process heap from the default 233 words down, and aggressively tuned garbage collection thresholds. When you have 2 million processes, even a few hundred bytes saved per process adds up to gigabytes.

Binary Data Sharing — Erlang binaries larger than 64 bytes are stored in a shared heap with reference counting (not copied on send). A 50KB profile photo sent to 100 processes exists once in memory, with 100 reference pointers. This avoids catastrophic memory multiplication.

ETS for Hot Data — Erlang Term Storage (ETS) tables provided O(1) concurrent reads with no locking for the session registry — mapping user IDs to BEAM process PIDs. At 2 million users, this lookup had to be nanosecond-fast and lock-free.

BEAM SMP Schedulers = CPU Cores — Configured one BEAM scheduler thread per physical CPU core (not logical hyperthreaded core). This avoided false sharing and cache bouncing between schedulers, giving near-linear throughput scaling as cores increased.

Network I/O via async_threads — BEAM's async thread pool handled blocking I/O operations (file system, legacy APIs) off the scheduler threads, preventing any I/O wait from blocking the main BEAM schedulers.

Section 10

Erlang/BEAM vs Other Concurrency Models

Feature	OS Threads (Java/C++)	Async/Event Loop (Node.js)	Goroutines (Go)	BEAM Processes (Erlang)
Min memory per unit	~1–8 MB stack	Single thread (shared)	~2–8 KB initial stack	~300 bytes initial heap
10k concurrent units	~10–80 GB RAM	Low RAM but serial	~20–80 MB RAM	~3 MB RAM
Scheduling	OS preemptive	Cooperative (single thread)	Go runtime preemptive	BEAM preemptive (2k reductions)
True parallelism	Yes (OS threads)	No (GIL-free but single-threaded)	Yes (GOMAXPROCS)	Yes (1 scheduler per core)
Garbage collection	Global stop-the-world (JVM)	V8 GC pauses	Tri-color, short but global	Per-process, zero global pause
Fault isolation	One thread crash = process crash	One uncaught exception = process crash	Panics recovered per goroutine	Process crash → Supervisor restarts it
Race conditions	Yes — shared memory + locks	No (single thread)	Possible — shared memory with channels	Impossible — no shared memory
Hot code upgrade	Requires restart	Requires restart	Requires restart	Live upgrade — zero downtime
WhatsApp @ 2M connections	Impossible	Not truly parallel	Theoretically possible	Achieved in production ✓

🎯

When to Choose Erlang/BEAM (Elixir)

BEAM shines at massively concurrent, stateful, long-lived connections — chat servers, IoT device networks, multiplayer game backends, telecom switches, real-time dashboards. It is not optimised for raw CPU-heavy computation (image processing, ML training, compression). For those, Go or C++ wins. For connections-per-server, nothing matches BEAM.

Section 11

Hot Code Loading — Deploy Without Downtime

One of Erlang's most astonishing features is hot code loading — the ability to update running code on a live production system without stopping it or dropping a single connection.

📖 Analogy

Replacing an Aeroplane Engine Mid-Flight

In most systems, deploying new code is like landing the plane, swapping the engines on the runway, and taking off again. Users see downtime. Sessions drop. Connections reset.

BEAM's hot code loading is like replacing an engine while the plane is still in the air at 35,000 feet. The BEAM VM can hold two versions of a module simultaneously. New processes use the new version. Old processes finish their current work and transition to the new version naturally. No connections are dropped. No service interruption. No restart.

Ericsson ATM switches stayed online through software upgrades for decades using exactly this mechanism. WhatsApp deployed updates to tens of millions of connected users without anyone's message being dropped.

🔄 Hot Code Loading — How It Works

Step 1

New module version is compiled to .beam bytecode and loaded into the VM alongside the current version.

Step 2

BEAM holds two versions: current (old) and old. Any third load ejects the old version.

Step 3

Processes still running in the old module continue running undisturbed — they hold a reference to that code.

Step 4

New processes spawned after the load use the new module version automatically.

Step 5

Existing processes transition to the new version on their next fully qualified function call (e.g. module:function() syntax).

Section 12

Real-World Use Cases — Who Uses Erlang/BEAM Today

💬

Messaging

The defining case study. 2M connections/server, 450M users, 55 engineers. The entire backend routing and delivery layer was Erlang. Acquired for $19B in 2014.

🎮

Discord

Real-Time Chat

Used Elixir (Erlang on BEAM) to power their voice/text backend. Handled millions of concurrent WebSocket connections per server with dramatically less infrastructure than competitors.

📡

Ericsson Telecom

Telecom Infrastructure

The original use case. Erlang powers AXD301, HLR systems, and countless switching platforms. 40% of all world telecom traffic routes through Erlang-based systems.

🗄️

CouchDB / Riak

Distributed Databases

Both written in Erlang for their distributed, fault-tolerant properties. Riak powers backend storage at Riot Games, GitHub, and many Fortune 500 companies requiring consistent uptime.

🐰

RabbitMQ

Message Broker

The world's most popular open-source message broker is written in Erlang. It handles millions of messages per second across distributed clusters with built-in fault tolerance and zero-downtime upgrades.

🌊

Elixir / Phoenix

Modern BEAM Ecosystem

Elixir is a modern language that runs on BEAM. Phoenix LiveView demonstrated 2 million WebSocket connections on a single server in 2019, proving BEAM's concurrency record holds in modern stacks.

Section 13

The Seven Pillars of BEAM's Power

⚡

CONCURRENCY

Millions of Processes

Lightweight BEAM processes (300 bytes) make millions of concurrent connections feasible on commodity hardware.

🛡️

ISOLATION

No Shared Memory

Process isolation eliminates race conditions, deadlocks, and cascading failures. Each process is an independent universe.

🔄

RESILIENCE

Let It Crash

OTP Supervisors restart crashed processes automatically. The system heals itself without human intervention or downtime.

⚖️

FAIRNESS

Preemptive Scheduling

2,000-reduction quantum guarantees no process can starve others. Millisecond latency maintained even at full load.

♻️

GC DESIGN

Per-Process GC

Garbage collection happens independently per process. No stop-the-world pauses. GC in one process never stalls others.

🚀

UPGRADES

Hot Code Loading

Deploy new code to a running system without dropping connections. The telecom industry's 30-year proven feature.

Section 14

When Erlang/BEAM Is NOT the Right Choice

⚠️

Honest Limitations — Know Before You Build

Erlang is not a universal solution. Understanding its limitations is as important as understanding its strengths.

Limitation	Why It Matters	Mitigation
CPU-heavy computation	BEAM processes are preempted every 2,000 reductions. Heavy number crunching is slow.	Use NIFs (native code) for compute-heavy work, or offload to C/Rust
Large binary data	Message copying for binaries > 64 bytes goes to shared binary heap. Misuse causes fragmentation.	Use sub-binaries, match contexts, and explicit binary ref-counting awareness
Hiring difficulty	Far fewer Erlang/Elixir engineers than Java/Python/JavaScript. Higher salary premium.	Elixir has a friendlier syntax; growing community and talent pool
Ecosystem maturity	Fewer libraries than Node.js/Python for non-telecom domains (ML, image processing, etc.)	Use Elixir's Hex package ecosystem + Erlang NIFs to call C/Rust libraries
Learning curve	Functional, immutable, pattern-matching paradigm is unfamiliar to OOP developers.	Elixir provides a gentler on-ramp with familiar Ruby-like syntax on top of BEAM
Debugging distributed state	With millions of processes, tracing a bug through message flows is non-trivial.	OTP's `sys` module, `recon` library, and observer tool provide powerful introspection

Section 15

Golden Rules — Building on Erlang/BEAM

🌱 BEAM Architecture — Non-Negotiable Principles

One process per connection, always. This is the BEAM idiom. Don't fight it with thread pools or async callbacks. Spawn a process for each TCP connection, each user session, each active entity. The VM is built for this.

Never share state through memory. If two processes need shared data, use a dedicated GenServer process as the single source of truth, or an ETS table for high-read data. Shared memory kills the fault-isolation guarantee.

Supervise everything. Every process that matters should be under a Supervisor. Unsupervised processes that crash silently are the most dangerous bugs in BEAM systems. Build your supervision tree first, write process logic second.

Use receive with timeouts. A process waiting in receive forever is a resource leak waiting to happen. Always use after Timeout clauses. In production systems, zombie processes accumulate silently without them.

Monitor ETS table memory. ETS tables grow unboundedly unless explicitly managed. Implement :ets.delete_object/2 or TTL mechanisms. Unchecked ETS growth is the #1 cause of OOM crashes in production Erlang systems.

Benchmark with realistic concurrency. BEAM performance characteristics change dramatically between 1,000 and 1,000,000 processes. Always load test at realistic connection counts before production launch.

Consider Elixir for new projects. Elixir runs on BEAM with identical performance characteristics but provides a modern, approachable syntax, a thriving ecosystem (Hex), and excellent tooling (mix, ExUnit, Phoenix LiveView).

🏁

The WhatsApp Legacy

WhatsApp proved something the software industry needed to hear: the right tool for the job beats brute-force scaling every time. While competitors were buying hundreds of servers, WhatsApp was getting more out of one server than anyone thought possible — by trusting a 30-year-old language built for a different industry but engineered for the exact problem they faced. Today, Erlang and Elixir power the backbone of real-time communication at companies worldwide, and the BEAM VM's concurrency model remains the gold standard for systems that must stay alive, stay fast, and stay connected — at any scale.