Slack Architecture: Threading, WebSockets & Flannel

Section 01

The Story That Built Slack's Backbone

📖 Origin Story

From a Dying Game to a $27B Messaging Giant

In 2012, a small team at Tiny Speck was building a massively multiplayer game called Glitch. The game was shutting down — but inside the company, the engineers had accidentally built something extraordinary: an internal chat tool that let them coordinate across time zones, track conversations in threads, and never lose context.

That tool became Slack. By 2021, it had 12 million daily active users sending over 1.8 billion messages per day. The engineering challenge was staggering: how do you deliver a message from Alice in Mumbai to Bob in London — instantly — while keeping every reply threaded, every notification real-time, and every connection alive across millions of simultaneous users?

The answer involved three interlocking systems: Message Threading, WebSockets at scale, and a little-known but brilliant piece of infrastructure called Flannel.

This tutorial is a deep-dive into those three pillars. By the end, you will understand how Slack transformed a naive HTTP polling architecture into a real-time messaging powerhouse — and the specific engineering decisions that made it possible at a scale most engineers never confront.

💡

What You Will Learn

How Slack's threaded messaging model is structured in data; why HTTP polling fails at 10M users and how WebSockets save the day; what Flannel is, why Slack built it, and how it solved the "thundering herd" and channel fan-out problems. Packed with diagrams, real numbers, and architecture stories.

Section 02

Message Threading — The Problem It Solves

💬 Analogy

The Conference Room vs. The Noticeboard

Imagine a company with 200 employees sharing a single room. Every thought anyone has gets shouted into the room in sequence. Alice asks about the deployment. Bob answers. Chandra asks about lunch. Diana asks a follow-up about the deployment. Now nobody can follow anything.

This is what unthreaded channels look like. A channel like #engineering with 50 active users becomes an incomprehensible river of noise within minutes.

Threads are the equivalent of pulling someone into a side conversation. The main channel carries the headline; the detail stays organised underneath the original message — accessible to anyone, but not polluting the shared stream.

The Data Model Behind Threading

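Every message in Slack carries a ts (timestamp) field that doubles as its ID, unique within a channel. It looks like a floating-point number (1609459200.000100), although Slack's public API returns it as a string. Threading hangs on one further field, thread_ts, supported by a few denormalised fields on the parent: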

📋 Slack Message Object — Threading Fields

ts

Timestamp ID of this specific message. Example: 1609459200.000100. Doubles as the unique message identifier.

thread_ts

Thread parent timestamp. If this field exists and equals ts, this message is the root of a thread. If it exists and differs from ts, this message is a reply to the thread whose root has that timestamp.

reply_count

Integer. How many replies this root message has received. Stored on the parent; updated via counter increment on every new reply.

reply_users

Array of user IDs who have replied. Used to render the reply avatars shown beneath the root message in the channel view.

latest_reply

Timestamp of the most recent reply — so Slack can show "Last reply 3 minutes ago" without fetching all replies.

subscribed

Boolean. Whether the current user has opted into notifications for this thread. Defaults to true if they sent a message in it.
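The field list above can be pictured as a small record type. This is a minimal sketch, not Slack's actual schema: the Message class and its helper methods are hypothetical names, and ts is kept as a string because that is how Slack's public API returns it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical in-memory shape of the threading fields listed above."""
    ts: str                           # message ID, e.g. "1609459200.000100"
    channel_id: str
    text: str
    thread_ts: Optional[str] = None   # None for a plain, unthreaded message
    reply_count: int = 0              # only meaningful on a thread root
    reply_users: List[str] = field(default_factory=list)
    latest_reply: Optional[str] = None

    def is_thread_root(self) -> bool:
        # A root carries its own ts in thread_ts.
        return self.thread_ts is not None and self.thread_ts == self.ts

    def is_reply(self) -> bool:
        # A reply points at a different message's ts.
        return self.thread_ts is not None and self.thread_ts != self.ts


root = Message(ts="1609459200.000100", channel_id="C42", text="Deploy at 5?",
               thread_ts="1609459200.000100", reply_count=1,
               reply_users=["U2"], latest_reply="1609459260.000200")
reply = Message(ts="1609459260.000200", channel_id="C42", text="Works for me",
                thread_ts="1609459200.000100")
plain = Message(ts="1609459230.000100", channel_id="C42", text="Lunch anyone?")
```

Everything the channel view needs to draw the thread indicator (reply_count, reply_users, latest_reply) lives on the root, which is why no second lookup is required.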

Two Views, One Data Store

Slack serves two completely different views from the same underlying message store: the channel view (flat, latest N root messages) and the thread view (all replies to a single parent). The trick is how they query.

📥 Channel View Query

Goal	Strategy
Fetch main channel messages	WHERE channel_id = X AND (thread_ts IS NULL OR thread_ts = ts) ORDER BY ts DESC LIMIT 50
Render thread indicator	Use reply_count, reply_users, latest_reply from parent row — no JOIN needed
Click "View thread"	Triggers a separate thread panel; fetches replies lazily

📤 Thread View Query

Goal	Strategy
Fetch all replies	WHERE thread_ts = <parent_ts> AND channel_id = X ORDER BY ts ASC
Root message at top	Fetched separately by ts = thread_ts, always pinned
Pagination	Cursor-based on ts; client requests >last_ts for newer replies
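Both queries can be tried against an in-memory SQLite database. This is a sketch under stated assumptions: the messages table, its columns, and the sample rows are invented for illustration, and ts is stored as a fixed-width string so that text ordering matches time ordering. Note the parentheses around the OR in the channel query; without them, AND binds tighter and the filter would match root messages from every channel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        channel_id TEXT NOT NULL,
        ts         TEXT NOT NULL,   -- fixed-width "seconds.microseconds" string
        thread_ts  TEXT,            -- NULL, own ts (root), or the root's ts (reply)
        text       TEXT NOT NULL,
        PRIMARY KEY (channel_id, ts)
    )
""")
rows = [
    ("C42", "1609459200.000100", "1609459200.000100", "Deploy at 5?"),   # thread root
    ("C42", "1609459230.000100", None,                "Lunch anyone?"),  # unthreaded
    ("C42", "1609459260.000200", "1609459200.000100", "Works for me"),   # reply
    ("C42", "1609459290.000300", "1609459200.000100", "Same here"),      # reply
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)

CHANNEL_VIEW = """
    SELECT text FROM messages
    WHERE channel_id = ? AND (thread_ts IS NULL OR thread_ts = ts)
    ORDER BY ts DESC LIMIT 50
"""
THREAD_VIEW = """
    SELECT text FROM messages
    WHERE channel_id = ? AND thread_ts = ? AND ts > ?
    ORDER BY ts ASC
"""

channel_view = [r[0] for r in conn.execute(CHANNEL_VIEW, ("C42",))]
# Using the parent's own ts as the cursor skips the root and returns replies only.
thread_view = [r[0] for r in conn.execute(
    THREAD_VIEW, ("C42", "1609459200.000100", "1609459200.000100"))]
```

The same ts > cursor predicate serves pagination: the client passes the last ts it has seen and receives only newer replies.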

🔑

Why Float Timestamps as IDs?

Using Unix epoch with microsecond precision as a message ID gives Slack sortability for free. No separate sort column. Pagination is trivial — "give me messages after ts=X" becomes a range scan on the primary key. The decimal portion (.000100) acts as a tiebreaker for messages created within the same second by different servers.
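One practical caveat: comparing these IDs as binary floats leans on rounding behaviour, so it is safer to compare them as strings or as integer pairs. The sketch below uses hypothetical helper names (parse_ts, messages_after) to show cursor-style pagination on exact integer keys.

```python
from typing import Iterable, List, Tuple


def parse_ts(ts: str) -> Tuple[int, int]:
    """Split "seconds.microseconds" into two integers that compare exactly."""
    seconds, _, micros = ts.partition(".")
    return int(seconds), int(micros.ljust(6, "0")[:6])


def messages_after(all_ts: Iterable[str], cursor: str) -> List[str]:
    """Return timestamps strictly newer than the cursor, oldest first."""
    key = parse_ts(cursor)
    return sorted((t for t in all_ts if parse_ts(t) > key), key=parse_ts)


stored = ["1609459200.000200", "1609459200.000100", "1609459201.000050"]
page = messages_after(stored, "1609459200.000100")
```

Two messages created in the same second differ only in the microsecond part, which is exactly the tiebreaker role described above.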

Thread Notification Logic

One of threading's most complex problems is who to notify when a reply arrives. Slack uses a subscription model layered on top of the data structure:

Auto-Subscribe on Participation

When you post a message or reply in a thread, Slack automatically sets subscribed = true for your user_id against that thread_ts. You implicitly joined the conversation.

Explicit Mention Override

If someone @mentions you in a thread reply — even one you haven't participated in — a notification is generated and you are auto-subscribed. This ensures no mention goes unnoticed.

Channel-Level Filter

Thread notifications only generate pushes if the user hasn't muted the parent channel. Channel mute wins; thread subscription loses. This prevents notification storms in large organisations.

Delivery via Real-Time Layer

The notification is pushed over the WebSocket connection if the user is online, or to mobile push (APNs / FCM) if they are offline. The thread system doesn't own delivery — it hands off to the transport layer.

Unread State Tracking

Each user has a per-thread last_read pointer — the ts of the last message they saw. Unread count = number of messages in that thread with ts > last_read. Updated when the user opens the thread panel.
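The five rules above reduce to a short decision function. This is a sketch, not Slack's implementation: Recipient, Delivery, and route_notification are hypothetical names, and it assumes that a channel mute suppresses mentions as well as subscriptions. The unread count compares ts values as fixed-width strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Delivery(Enum):
    NONE = "none"
    WEBSOCKET = "websocket"
    MOBILE_PUSH = "mobile_push"


@dataclass
class Recipient:
    user_id: str
    subscribed: bool       # participated in, or followed, the thread
    mentioned: bool        # @mentioned in this reply
    channel_muted: bool
    online: bool


def route_notification(r: Recipient) -> Delivery:
    if r.channel_muted:                        # channel mute wins
        return Delivery.NONE
    if not (r.subscribed or r.mentioned):      # nothing ties this user to the thread
        return Delivery.NONE
    # The thread layer only picks a transport; delivery itself happens elsewhere.
    return Delivery.WEBSOCKET if r.online else Delivery.MOBILE_PUSH


def unread_count(reply_ts_list: List[str], last_read: str) -> int:
    """Count replies newer than the user's last_read pointer."""
    return sum(1 for ts in reply_ts_list if ts > last_read)
```

For example, a subscribed user who is offline gets Delivery.MOBILE_PUSH, while the same user with the channel muted gets Delivery.NONE.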

Section 03

Threading at Scale — The Hard Problems

🚫

Fan-Out Amplification

problem: high cardinality

A message in a channel with 10,000 members means 10,000 potential notification evaluations. A highly-replied thread with 500 subscribers means each new reply triggers 500 lookups. Naive row-per-event in a relational DB collapses under this load.

✅ Solution: subscription lists pre-computed; async worker queues for delivery

📋

Counter Consistency

problem: concurrent writes

reply_count on the parent row is a counter that many workers might increment simultaneously. Two replies arriving at the same millisecond could both read "5", both write "6" — now the count is wrong. Classic lost-update problem under concurrent load.

✅ Solution: atomic counter increment (Redis INCR) or DB-level atomic UPDATE reply_count = reply_count + 1
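The lost update and its fix can be reproduced deterministically. The sketch below uses SQLite and an invented single-row table; the two "workers" are simulated by interleaving their reads and writes by hand rather than with real threads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (ts TEXT PRIMARY KEY, reply_count INTEGER NOT NULL)")
conn.execute("INSERT INTO messages VALUES ('1609459200.000100', 5)")
PARENT = "1609459200.000100"


def read_count() -> int:
    return conn.execute(
        "SELECT reply_count FROM messages WHERE ts = ?", (PARENT,)).fetchone()[0]


# Lost update: both workers read 5 before either writes, so both write 6.
seen_by_a = read_count()
seen_by_b = read_count()
conn.execute("UPDATE messages SET reply_count = ? WHERE ts = ?", (seen_by_a + 1, PARENT))
conn.execute("UPDATE messages SET reply_count = ? WHERE ts = ?", (seen_by_b + 1, PARENT))
after_racy = read_count()          # one increment has been lost

# Atomic increment: the database performs the read-modify-write in one statement.
conn.execute("UPDATE messages SET reply_count = 5 WHERE ts = ?", (PARENT,))
for _ in range(2):
    conn.execute(
        "UPDATE messages SET reply_count = reply_count + 1 WHERE ts = ?", (PARENT,))
after_atomic = read_count()
```

The racy path ends at 6 after two replies; the atomic path ends at 7, because the increment never passes through application memory.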

🔄

Cross-Channel Threads

problem: data isolation

Slack allows "Also send to channel" when replying in a thread. This creates a message visible in both the thread AND the main channel — effectively two references to the same content, with different read-state tracking for each context.

✅ Solution: broadcast flag on message; channel_feed and thread_feed are separate logical views over the same row

⚠️

The Thundering Herd in Threading

Imagine a CEO posts a company-wide message in a 50,000-person channel. Within 30 seconds, 500 people reply. Each reply triggers a fan-out to 50,000 notification evaluations. That is 25 million operations in under a minute. Without careful queue-based fan-out and rate limiting per-user, this single event can take down the notification pipeline for everyone.

Threading Scenario	Engineering Challenge	Slack's Approach	Complexity
Reply in 2-person DM thread	Minimal — 1 recipient	Direct push via WebSocket	Low
Reply in 50-person channel thread	Fan-out to all thread subscribers (~15)	Async notification worker batch	Medium
Reply in 10,000-member channel thread	Subscription list can be large; delivery lag risk	Sharded notification queues per subscriber bucket	High
"Also send to channel" in giant workspace	Dual fan-out: thread subscribers + channel members	Separate fan-out jobs; deduplication by user_id before push	Very High
Thread in shared channel (cross-workspace)	Members live in two separate data tenants	Cross-org federation layer; separate delivery pipelines per org	Extreme
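The "deduplication by user_id" and "sharded queues per subscriber bucket" entries in the table can be sketched together. The function names and the bucket count are assumptions for illustration; a production system would hand each bucket to its own worker queue.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def bucket_for(user_id: str, num_buckets: int) -> int:
    """Stable bucket index; crc32 is used because hash() varies between runs."""
    return zlib.crc32(user_id.encode("utf-8")) % num_buckets


def plan_fanout(thread_subscribers: Iterable[str],
                channel_members: Iterable[str],
                sender_id: str,
                num_buckets: int = 4) -> Dict[int, List[str]]:
    """Merge both audiences, drop duplicates and the sender, then shard."""
    recipients = (set(thread_subscribers) | set(channel_members)) - {sender_id}
    buckets: Dict[int, List[str]] = defaultdict(list)
    for user_id in sorted(recipients):
        buckets[bucket_for(user_id, num_buckets)].append(user_id)
    return dict(buckets)


plan = plan_fanout(thread_subscribers=["U1", "U2", "U3"],
                   channel_members=["U2", "U3", "U4", "U5"],
                   sender_id="U1")
total_recipients = sum(len(users) for users in plan.values())
```

U2 and U3 belong to both audiences but appear in the plan once, so an "Also send to channel" reply produces one push per person rather than two.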

Section 04

WebSockets — Why HTTP Alone Cannot Power Slack

📖 The Polling Nightmare

How Slack Almost Killed Its Own Servers

In Slack's early days, the client used HTTP long-polling: the browser would open a request to the server, the server would hold it open until a message arrived (or 30 seconds elapsed), then the client would immediately open another request.

At 10,000 users, this meant 10,000 open HTTP connections per server — each burning memory, each requiring a full TLS handshake every 30 seconds, each generating overhead even in silence. The infrastructure bill was enormous and latency was brutal. A message could take up to 2 seconds to appear on screen after it was sent.

The engineering team knew they needed a fundamental rethink. The answer built on a mechanism HTTP/1.1 already had: the Upgrade header, which the WebSocket protocol (RFC 6455) uses to take over the connection.

How WebSockets Work — The Upgrade Handshake

A WebSocket connection starts its life as a normal HTTP/1.1 request. The client sends a special Upgrade: websocket header. If the server agrees, both sides switch protocols — and that single TCP connection becomes a persistent, full-duplex channel: both sides can send messages at any time, with no new handshake required.

🔁 WebSocket Upgrade — The 4-Step Dance

Step 1

Client sends HTTP GET with headers: Upgrade: websocket, Connection: Upgrade, Sec-WebSocket-Key: <base64-nonce>

Step 2

Server responds 101 Switching Protocols with Sec-WebSocket-Accept (the Base64-encoded SHA-1 hash of the client's key concatenated with a fixed GUID). TLS layer stays in place underneath.

Step 3

Both sides are now in WebSocket framing mode. Messages are sent as frames: 2-byte header + optional extended length + masking key (client→server only) + payload.

Step 4

Slack sends a hello event to confirm the channel is ready. The client starts receiving real-time events immediately — no polling, no repeated handshakes.
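Steps 2 and 3 are mechanical enough to write out. The sketch below computes the Sec-WebSocket-Accept value as RFC 6455 defines it and encodes a single unmasked server-to-client text frame. The function names are illustrative, and the frame encoder deliberately ignores fragmentation, masking, and control frames.

```python
import base64
import hashlib
import struct

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def server_text_frame(payload: str) -> bytes:
    """Encode one unmasked, unfragmented text frame (server to client)."""
    data = payload.encode("utf-8")
    first = 0x81                       # FIN bit set + opcode 0x1 (text)
    if len(data) < 126:
        header = struct.pack("!BB", first, len(data))        # 2-byte header
    elif len(data) < 65536:
        header = struct.pack("!BBH", first, 126, len(data))  # 16-bit extended length
    else:
        header = struct.pack("!BBQ", first, 127, len(data))  # 64-bit extended length
    return header + data


# Sample key taken from RFC 6455, section 1.3.
sample_accept = accept_key("dGhlIHNhbXBsZSBub25jZQ==")
hello_frame = server_text_frame('{"type":"hello"}')
```

A short event such as the hello message costs two bytes of framing, compared with a full set of HTTP headers on every long-poll cycle.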

HTTP Polling vs. WebSockets — Head to Head

Dimension	HTTP Long-Polling	WebSocket	Winner
Connection overhead	Full TLS handshake every ~30s	One TLS handshake; persistent	WebSocket
Latency per message	Up to 2,000ms (wait for next poll cycle)	~50–200ms (network only)	WebSocket
Server memory per idle user	~4–8KB (HTTP request state)	~2–4KB (socket file descriptor)	WebSocket
Server-to-client push	Only during active poll window	Any time, instantly	WebSocket
Client-to-server messages	New HTTP request each time	Same open channel, any time	WebSocket
Load balancer compatibility	Stateless — easy	Sticky sessions required	HTTP Polling
Horizontal scaling	Trivial — any server handles any request	Complex — client pinned to server	HTTP Polling
Firewall / proxy compatibility	Universal	Some corporate proxies block WS	HTTP Polling

⚡

The Real-World Numbers

After Slack switched to WebSockets, message delivery latency dropped from an average of 1,400ms to under 200ms. Server count for the real-time layer dropped by roughly 60% at the same traffic level. The persistent connection also enabled a new class of features — typing indicators, presence dots, live emoji reactions — that would have been economically impossible over polling.

Section 05

WebSockets at Scale — The Engineering Nightmares

A single WebSocket server can comfortably hold 50,000–100,000 simultaneous connections on modern hardware (thanks to Linux's epoll event loop). But Slack at peak had millions of concurrent connections. That means a fleet of WebSocket servers — and a fleet creates problems a single server never had.

🔌

Problem 1: Sticky Sessions

User Alice's WebSocket is connected to Server A. A message for Alice arrives at Server B (which processed the API call). How does Server B deliver to Alice's socket on Server A? Requires cross-server pub/sub routing.

cross-server delivery

🔁

Problem 2: Channel Fan-Out

A message in a 5,000-member channel must be delivered to every online member — who are spread across hundreds of WebSocket servers. The sending server must know which servers have relevant connections and push to each of them.

broadcast amplification

📌

Problem 3: Presence

The green dot next to a user's name means their WebSocket is alive. But which server tracks this? If a user has three sessions open (mobile, laptop, desktop), three connections exist on potentially three servers. Presence = "at least one live connection."

distributed state

🌞

Problem 4: Reconnection Storms

If a WebSocket server restarts (deploy, crash, scaling event), all connected clients immediately try to reconnect to a new server — simultaneously. This can overwhelm the auth layer and remaining servers. Classic thundering herd.

thundering herd

📋

Problem 5: Message Ordering

If two messages arrive for the same user via different delivery paths (direct push + channel fan-out), which arrives first? The client must de-duplicate and sequence by ts — the server can't guarantee ordering across network hops.

ordering & deduplication

🚫

Problem 6: Slow Consumer

A client on a slow mobile connection can't consume messages as fast as the server produces them. The server's outbound buffer fills up. Do you drop messages? Block? Slack needs to deliver all messages, in order, even over poor connectivity.

back-pressure
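Problem 5 puts the burden of ordering on the client. A minimal sketch of that client-side logic follows, assuming ts values are fixed-width strings; ChannelTimeline is a hypothetical name and does not come from Slack's client code.

```python
import bisect
from typing import List, Set, Tuple


class ChannelTimeline:
    """Client-side buffer that ignores duplicates and keeps events sorted by ts."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self._events: List[Tuple[str, str]] = []   # (ts, text), sorted by ts

    def receive(self, ts: str, text: str) -> bool:
        """Insert an event; return False if this ts was already delivered."""
        if ts in self._seen:
            return False
        self._seen.add(ts)
        bisect.insort(self._events, (ts, text))    # keeps the list ordered
        return True

    def texts(self) -> List[str]:
        return [text for _, text in self._events]


timeline = ChannelTimeline()
timeline.receive("1609459260.000200", "second")
timeline.receive("1609459200.000100", "first")          # arrives late
duplicate_accepted = timeline.receive("1609459260.000200", "second")
```

Whichever delivery path wins the race, the rendered order depends only on ts, and a second copy of the same event is dropped.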

🔄

The Pub/Sub Backbone

Slack's WebSocket fleet sits behind a publish/subscribe broker (historically Redis Pub/Sub, later a custom solution). When a message is published to channel C, the application server publishes to a topic. Every WebSocket gateway server subscribed to that channel's topic receives the event and delivers it to its locally-connected clients who are members. This is the core of how cross-server delivery works.

🌐 WebSocket Fleet Architecture — Message Fan-Out Flow

Dave sends a message in #engineering. The App Server publishes to the Flannel broker. All three WS Gateways (subscribed to channel_42) receive the event and push it to their locally-connected users.
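The flow described above can be modelled in a few lines. This is a toy, in-process stand-in for the broker (neither Redis nor Flannel), with hypothetical Broker and Gateway classes; a list of delivered pairs stands in for real socket writes.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, dict], None]


class Broker:
    """Toy in-process pub/sub broker keyed by topic name."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver to every subscriber of the topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, event)
        return len(handlers)


class Gateway:
    def __init__(self, name: str, local_members: Dict[str, List[str]]) -> None:
        self.name = name
        self.local_members = local_members            # topic -> locally connected users
        self.delivered: List[Tuple[str, str]] = []    # (user_id, text) pushed to sockets

    def on_event(self, topic: str, event: dict) -> None:
        for user_id in self.local_members.get(topic, []):
            self.delivered.append((user_id, event["text"]))


broker = Broker()
gw_a = Gateway("GW-A", {"channel_42": ["alice", "dave"]})
gw_b = Gateway("GW-B", {"channel_42": ["bob"]})
gw_c = Gateway("GW-C", {"channel_7": ["erin"]})
for gw in (gw_a, gw_b, gw_c):
    for topic in gw.local_members:
        broker.subscribe(topic, gw.on_event)

receivers = broker.publish("channel_42", {"text": "deploy done"})
```

GW-C never subscribed to channel_42, so it does no work for this event; only the two gateways with local members are touched.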

Section 06

Flannel — Slack's Custom Channel Service

📖 Engineering Story

When Redis Said "No More"

Around 2015, Slack's real-time layer ran on Redis Pub/Sub. Redis is single-threaded, blazing fast, and dead simple to reason about. For a while, it worked beautifully.

Then Slack hit 1 million daily active users. Then 3 million. The pattern became clear: on Monday mornings (US Eastern time), every team across America opened Slack simultaneously. Hundreds of thousands of reconnecting WebSocket clients all flooded the same Redis nodes with subscription requests. Redis, being single-threaded, began queueing these — and the queue grew faster than it shrank. Messages were delayed by seconds. The green dots froze.

The team had tried sharding Redis across more nodes, but the fundamental problem wasn't memory or throughput — it was that channel subscriptions had to live on one node per channel, and popular channels (like #general in a 10,000-person workspace) would become hotspots no matter how many Redis nodes you added. You can't shard a single pub/sub topic across nodes without fundamentally redesigning the routing logic.

The answer was: build a new thing. That thing was Flannel.

What Flannel Is

Flannel is Slack's purpose-built channel server — a distributed system that sits between the application servers and the WebSocket gateways, managing all event routing, channel fan-out, and presence aggregation. It was publicly announced by Slack Engineering in 2019.

🏠

What Flannel Replaces

the old Redis Pub/Sub layer

Flannel replaces direct Redis Pub/Sub subscriptions by the WebSocket gateways. Instead of each gateway subscribing to thousands of Redis topics (one per channel), gateways connect to Flannel — and Flannel owns channel membership and event routing.

📈

What Flannel Adds

intelligent fan-out

Flannel knows which WS gateways have connections to members of a given channel. It routes events only to the gateways that need them. A channel with 10,000 members but only 200 online means 200 targeted deliveries — not 10,000 broadcast evaluations.

👥

Presence Aggregation

first-class citizen

Flannel is the single source of truth for presence. A user with 3 open tabs has 3 connections on potentially 3 gateways. Flannel aggregates these into one presence state per user. When the last tab closes, Flannel fires the "went offline" event after a grace period.

Section 07

Flannel Architecture — How It Actually Works

The Three-Layer Model

Flannel operates as three logically distinct layers, each solving a different part of the scaling problem:

Layer A — Client Gateway Interface

WebSocket gateways connect to Flannel servers on startup and register themselves. They report which users are connected to them — so Flannel always knows "user X is on gateway G." This registration is heartbeated every few seconds; stale gateways are evicted. Flannel maintains a connection map: user_id → gateway_id(s).

Layer B — Channel Membership Index

Flannel maintains an in-memory index of channel_id → [list of online user_ids]. This is not the full channel membership (which could be 10,000 people) — it is only the currently online members. The index is updated lazily: users joining/leaving channels update it; the full sync happens on reconnect. This is the key optimisation: fan-out is proportional to online members, not total members.

Layer C — Event Router

When an application server publishes an event to channel X, Flannel: (1) looks up online members of channel X, (2) maps each member to their gateway(s), (3) deduplicates gateway targets, (4) pushes the event to each unique gateway. The gateway then delivers to its local WebSocket connections. Flannel never touches individual sockets — it only talks to gateways.

🆕 Flannel Event Routing — Channel Fan-Out Anatomy

Flannel's routing is proportional to online members, not total channel membership. A 10,000-member channel with 3% online requires at most 300 delivery operations (fewer once gateway targets are deduplicated), not 10,000.
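The three layers map onto two dictionaries and one lookup. The sketch below is a simplified model under stated assumptions: EventRouter and its methods are hypothetical names, registration is a direct method call rather than a heartbeated gateway connection, and nothing is ever evicted.

```python
from typing import Dict, List, Set


class EventRouter:
    """Hypothetical model of the connection map, online index, and router."""

    def __init__(self) -> None:
        self.user_gateways: Dict[str, Set[str]] = {}     # Layer A: user_id -> gateway ids
        self.online_members: Dict[str, Set[str]] = {}    # Layer B: channel_id -> online users

    def register(self, user_id: str, gateway_id: str, channels: List[str]) -> None:
        self.user_gateways.setdefault(user_id, set()).add(gateway_id)
        for channel_id in channels:
            self.online_members.setdefault(channel_id, set()).add(user_id)

    def route(self, channel_id: str) -> Set[str]:
        """Layer C: the unique gateways that must receive this channel's event."""
        targets: Set[str] = set()
        for user_id in self.online_members.get(channel_id, set()):
            targets |= self.user_gateways.get(user_id, set())
        return targets


router = EventRouter()
router.register("alice", "GW-1", ["channel_42"])
router.register("bob", "GW-2", ["channel_42"])
router.register("carol", "GW-3", ["channel_42"])
router.register("dave", "GW-1", ["channel_42", "channel_7"])
targets = router.route("channel_42")
```

Four online members collapse to three gateway pushes, because Alice and Dave share GW-1. The router returns gateway IDs only, in line with the point that it never touches individual sockets.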

Section 08

Flannel's Key Innovations Explained

Innovation 1 — Lazy Channel Hydration

When Flannel starts up, it does not pre-load every channel's membership list into memory. Channels are "hydrated" lazily: the first time an event arrives for a channel, Flannel queries the database for that channel's online members and caches the result. Subsequent events use the cache.

💡

Why This Matters

Slack has millions of channels. Most channels are dormant at any moment — a channel might only see a message once a week. Loading all channel memberships on startup would require terabytes of memory. Lazy hydration means Flannel's working set is proportional to currently active channels, not all channels — typically 100x smaller.
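Lazy hydration is a load-on-first-use cache. In the sketch below, LazyChannelIndex and its loader callback are hypothetical; the loader stands in for whatever lookup the real system performs, and the eviction policy is left to the caller.

```python
from typing import Callable, Dict, Set


class LazyChannelIndex:
    """Loads a channel's online-member set on first use, then serves it from memory."""

    def __init__(self, loader: Callable[[str], Set[str]]) -> None:
        self._loader = loader                    # hypothetical lookup, e.g. a DB query
        self._cache: Dict[str, Set[str]] = {}
        self.loads = 0                           # how many times the loader ran

    def members(self, channel_id: str) -> Set[str]:
        if channel_id not in self._cache:
            self._cache[channel_id] = set(self._loader(channel_id))
            self.loads += 1
        return self._cache[channel_id]

    def evict(self, channel_id: str) -> None:
        """Drop a dormant channel so memory tracks the active working set."""
        self._cache.pop(channel_id, None)

    def cached_channels(self) -> int:
        return len(self._cache)


fake_db = {"channel_42": {"alice", "bob"}, "channel_7": {"erin"}}
index = LazyChannelIndex(lambda channel_id: fake_db.get(channel_id, set()))
first = index.members("channel_42")
second = index.members("channel_42")     # served from the cache, no second load
```

channel_7 exists in the backing store but costs no memory until its first event arrives.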

Innovation 2 — The Thundering Herd Defense

The Monday morning reconnection storm was the original trigger for building Flannel. The solution has four components:

🛡️ Flannel's Thundering Herd Defences

Jittered Reconnect Backoff: When a WebSocket client disconnects, it waits a random interval (e.g., 1–5 seconds) before reconnecting, with exponential backoff on repeated failures. Once retries are included, this spreads what would be a simultaneous burst across a 30–60 second window.

Gateway-Level Rate Limiting: Each Flannel node caps the rate at which new gateway registrations are processed. A gateway flood during a deploy is throttled — gateways queue behind a leaky bucket, preventing a single Flannel node from being overwhelmed.

Session Resume (not Re-Subscribe): When a client reconnects to a gateway, Flannel can resume the existing subscription state rather than rebuilding it from scratch. The client sends its last-seen event ID; Flannel replays missed events from a short ring buffer. No full channel re-hydration needed.

Consistent Hashing for Gateway Assignment: New connections are assigned to Flannel nodes via consistent hashing on workspace_id. This means all connections from one workspace tend to land on the same Flannel shard — reducing cross-shard routing for workspace-internal events.
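The first defence in the list is the easiest to adopt in any client. This sketch assumes a 1–5 second random base wait that doubles with each consecutive failure and is capped at 60 seconds; the function name and all parameter defaults are illustrative rather than Slack's real values.

```python
import random
from typing import Optional


def reconnect_delay(failures: int,
                    min_wait: float = 1.0,
                    max_wait: float = 5.0,
                    cap: float = 60.0,
                    rng: Optional[random.Random] = None) -> float:
    """Random base wait, doubled per consecutive failure, capped at `cap` seconds."""
    rng = rng or random.Random()
    jitter = rng.uniform(min_wait, max_wait)     # spreads clients apart in time
    return min(cap, jitter * (2 ** failures))


seeded = random.Random(42)                       # seeded only to make the demo repeatable
delays = [reconnect_delay(failures, rng=seeded) for failures in range(8)]
```

Because every client draws its own random base, a fleet that disconnects at the same instant comes back spread over several seconds, and repeated failures push the stragglers further apart instead of retrying in lockstep.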

Innovation 3 — Presence Without a Database

Traditional presence systems require a database write on every connect and disconnect event. At millions of events per day, this is expensive. Flannel solves it with an entirely in-memory model:

👥 Flannel Presence Model

Connect

Gateway registers user_id with Flannel in-memory. Flannel increments a connection_count[user_id]. If count goes from 0 → 1, fire user_presence_changed: active event.

Heartbeat

Every 30s, gateway sends a keep-alive for all connected users. Flannel resets a per-user last_seen timer. If a heartbeat is missed, count the absence — but don't immediately mark as offline (short network blip tolerance).

Disconnect

Gateway reports user_id disconnected. Flannel decrements connection_count. If the count reaches 0, it starts a grace timer (typically 10–30 seconds). If the timer expires without a new connection, fire user_presence_changed: away.

Multi-device

User has 3 open tabs → connection_count = 3. Closes 2 tabs → count = 1. Still online. Closes last tab → count = 0 → grace timer starts. This is how Slack avoids marking you offline just for switching browser tabs.
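The connect, disconnect, and multi-device rules above fit in one small class. This is a sketch with hypothetical names; time is passed in as a plain number so the behaviour can be followed without real timers, and heartbeats are left out.

```python
from typing import Dict, List, Tuple


class PresenceTracker:
    """In-memory presence with a grace period; the caller supplies the clock."""

    def __init__(self, grace_seconds: float = 20.0) -> None:
        self.grace_seconds = grace_seconds
        self.connection_count: Dict[str, int] = {}
        self.offline_deadline: Dict[str, float] = {}     # user_id -> grace expiry time
        self.events: List[Tuple[str, str]] = []          # (user_id, "active" | "away")

    def connect(self, user_id: str) -> None:
        count = self.connection_count.get(user_id, 0)
        self.connection_count[user_id] = count + 1
        pending = self.offline_deadline.pop(user_id, None)
        # Announce "active" only if the user was truly offline, not mid-grace.
        if count == 0 and pending is None:
            self.events.append((user_id, "active"))

    def disconnect(self, user_id: str, now: float) -> None:
        count = self.connection_count.get(user_id, 0)
        if count == 0:
            return                                       # unknown or already offline
        self.connection_count[user_id] = count - 1
        if count == 1:                                   # last connection just closed
            self.offline_deadline[user_id] = now + self.grace_seconds

    def tick(self, now: float) -> None:
        """Fire "away" for users whose grace period ended with no reconnect."""
        expired = [u for u, t in self.offline_deadline.items() if t <= now]
        for user_id in expired:
            del self.offline_deadline[user_id]
            self.events.append((user_id, "away"))


presence = PresenceTracker(grace_seconds=20.0)
presence.connect("alice")                 # first connection: "active"
presence.connect("alice")                 # second tab, no new event
presence.disconnect("alice", now=5.0)     # one tab left, still online
presence.disconnect("alice", now=6.0)     # count hits 0, grace timer starts
presence.connect("alice")                 # back within the grace window
presence.tick(now=30.0)                   # nothing pending, no "away"
presence.disconnect("alice", now=40.0)
presence.tick(now=61.0)                   # grace expired: "away"
```

The brief drop between the disconnect at 6.0 and the reconnect produces no events at all, which is the flicker the grace timer is meant to hide.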

✅

The Grace Timer — Why It Exists

Mobile networks frequently drop WebSocket connections and reconnect within seconds (switching from WiFi to cellular, for example). Without a grace timer, every subway commuter would flicker between online and offline dozens of times on their commute. The 10–30 second grace window absorbs these micro-disconnects and keeps the green dot stable — a small detail that has enormous impact on user experience.

Section 09

Flannel's Benefits — Why It Won

Benefit	Before Flannel (Redis Pub/Sub)	After Flannel	Impact
Fan-out cost	Proportional to total channel members	Proportional to online members only	10–30x reduction
Reconnect storm handling	Redis single thread overwhelmed	Rate-limited + session resume	Thundering herd eliminated
Presence accuracy	Per-event DB writes; eventual consistency	In-memory with grace timer	Sub-second presence updates
Memory footprint	All channel memberships in Redis	Lazy hydration — active channels only	~100x smaller working set
Horizontal scaling	Single Redis node per channel topic	Consistent hashing across Flannel nodes	Linear scaling
Message latency	Variable — Redis queue depth dependent	Stable under load	p99 latency improved ~40%
Missed event recovery	Client must full-sync on reconnect	Ring buffer replay from last event ID	Reconnect cost near zero

Real-World Use Cases Flannel Enables

⚡

Typing Indicators

When Alice types, a user_typing event must reach all online channel members within ~300ms — or the indicator is useless. Flannel's low-latency routing makes ephemeral events like this practical at scale.

event: user_typing { channel, user_id, expires_in: 3000ms }

👍

Live Emoji Reactions

Emoji reactions are essentially micro-edits to a message. Each reaction requires fan-out to all channel members viewing that message. Flannel routes these as lightweight delta events, not full message re-deliveries.

event: reaction_added { message_ts, emoji, user_id }

📊

Read State Sync

When you mark a channel as read on your phone, your laptop should instantly reflect that. Flannel delivers cross-device state sync events — your own actions are broadcast to all your own sessions via the same pipeline.

event: channel_marked { channel_id, ts, unread_count: 0 }

👤

Presence Broadcasting

When you set yourself as "Away" or return from a meeting, Flannel broadcasts your presence change to all workspaces you share channels with — potentially spanning multiple organisational boundaries via Slack Connect.

event: user_presence_changed { user_id, presence: "active" }

🔒

Channel Join/Leave Sync

When 200 users join a new #all-hands channel simultaneously (common during a company announcement), Flannel updates the online membership index incrementally — no thundering herd on the channel index itself.

event: member_joined_channel { channel_id, user_id }

🆕

App / Bot Events

Slack apps and bots subscribe to events via the Events API. Flannel routes relevant workspace events to registered app endpoints — the same routing infrastructure that serves human users also serves programmatic integrations.

event: app_mention { channel, text, user, ts }

Section 10

The Complete System — How All Three Layers Interlock

📖 The Full Journey

Tracing a Single Message from "Send" to "Seen"

Alice is in the #engineering channel. She replies to Bob's message about a deployment (creating a threaded reply). Bob is online, connected via WebSocket to Gateway 2. Flannel routes the event. Here is every step that happens in roughly 150 milliseconds.

Alice Hits Send

Alice's Slack client sends an HTTP POST to the API endpoint. The message body includes channel, text, thread_ts (the parent message's timestamp). The API validates, authenticates via OAuth token, rate-limits Alice's workspace.

Message Persistence

The application server writes the message row to the database with a new ts value. It atomically increments reply_count on the parent message row. It updates latest_reply and reply_users on the parent. The HTTP response returns 200 OK with the new message's ts — Alice's client confirms the message was accepted. This is the end of the synchronous path.

Event Publication to Flannel

The application server asynchronously publishes a message event to Flannel, targeted at channel_id. The event payload is the full message JSON. This is a non-blocking fire-and-forget from the HTTP handler's perspective.

Flannel — Channel Lookup

Flannel receives the event. It checks its in-memory index for channel_42: online members are Alice (GW-1), Bob (GW-2), Carol (GW-3), Dave (GW-1). It deduplicates the target gateways: {GW-1, GW-2, GW-3}. Three delivery operations are queued.

Gateway Delivery

Each gateway receives the event from Flannel. It iterates its locally-connected sockets for the relevant users. Bob's socket on GW-2 receives the event. The gateway writes a WebSocket text frame containing the event JSON to Bob's socket. Bob's browser receives it via the onmessage handler. Slack's client-side JavaScript renders the reply under the thread. Bob sees Alice's message.

Thread Notification Async Path

In parallel, a separate notification worker checks the thread subscription list for thread_ts. Bob is subscribed (he authored the parent). Carol is subscribed (she replied yesterday). Bob is online — his notification is already delivered via the WebSocket event above. Carol is offline — a mobile push notification is enqueued for APNs/FCM delivery. Her phone buzzes with "Alice replied in #engineering."

Read State Update

When Bob opens the thread panel, his client sends a subscriptions.mark API call, setting his last_read for that thread to the latest ts. Flannel broadcasts a thread_marked event to Bob's other devices (his phone) — the unread badge disappears everywhere simultaneously.

🌐

Total Elapsed Time: ~150ms

Steps 1–2 take ~40–80ms (network + DB write). Steps 3–5 take ~30–60ms (Flannel lookup + gateway push + network). That adds up to roughly 70–140ms, so Bob sees Alice's message within about 150ms of her hitting Send, which is indistinguishable from instant to a human. The notification for Carol happens asynchronously and doesn't block any of this.

Section 11

Architecture Comparison — The Three Pillars Side by Side

Dimension	💬 Message Threading	⚡ WebSockets at Scale	🆕 Flannel
Core Problem Solved	Conversation context and organisability in busy channels	Real-time delivery without HTTP overhead	Efficient fan-out across millions of persistent connections
Key Data Structure	ts + thread_ts float-timestamp relationship model	Persistent TCP/TLS connection with framing protocol	In-memory channel → online member → gateway index
Scaling Challenge	Fan-out notification storms; counter contention	Cross-server delivery; sticky session routing	Thundering herd; lazy hydration; consistent hashing
State Location	Persistent DB (message rows, subscription tables)	Per-server (socket file descriptors)	In-memory on Flannel nodes; no DB for presence
Failure Mode	Lost notification if worker queue drops	Client reconnects; replays missed events	Ring buffer replay; lazy re-hydration on node restart
Key Metric	reply_count accuracy; notification delivery rate	Message latency p50/p99; connection count per server	Online fan-out ratio; presence update latency
Introduced In Slack	2017 (public launch of threads)	~2014 (original WebSocket migration)	~2018 (internal); 2019 (public engineering blog)

Section 12

Lessons for Your Own Real-Time System

You're probably not building the next Slack. But the architectural lessons here apply to any system with real-time delivery requirements — from a live collaborative editor to a trading dashboard to a multiplayer game lobby. Here are the non-negotiable rules:

🌟 Golden Rules — Real-Time Systems Architecture

Fan-out cost is proportional to online recipients, not total recipients. Design your fan-out pipeline to first check who is actually connected before broadcasting. An "online member index" in memory is the single highest-leverage optimisation in any pub/sub system.

Separate your write path from your delivery path. The HTTP handler that persists a message should return success to the client immediately. Delivery (fan-out, notification, push) happens asynchronously. Coupling delivery to the write path creates cascading failures under load.

Always design for reconnection from day one. WebSocket connections drop. Clients reconnect. Your system must be able to deliver missed events from a ring buffer or event log without requiring a full state re-sync. The reconnect experience defines perceived reliability.

Presence is a distributed-state problem: solve it with grace periods. Never fire "user went offline" the instant a connection drops. Use a short grace window (10–30 seconds) to absorb network blips. False-offline events drop sharply, and the user experience improves noticeably.

Jitter is your friend against thundering herds. Any time you have many clients that might reconnect simultaneously (deploy, crash, network event), add exponential backoff with random jitter. 1,000 clients reconnecting over 60 seconds looks like normal load; 1,000 clients reconnecting at T=0 looks like a DDoS against yourself.

Float timestamps as IDs are underrated. Slack's approach of using Unix epoch with microsecond precision as the message ID gives you sortability, uniqueness, and pagination for free — no UUID lookup table, no separate sort column, no sequence generator bottleneck. Consider it for any append-heavy, time-ordered dataset.

Build the simple thing first; replace it when it breaks. Slack ran on Redis Pub/Sub for years before building Flannel. They didn't over-engineer upfront — they built for their current scale and replaced components when the data said they had to. The lesson: know your bottleneck metrics and build ahead of them, not a decade ahead.
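The reconnection rule above is the one most often postponed, so here is a minimal sketch of ring-buffer replay. EventRingBuffer is a hypothetical name, event IDs are simple counters, and returning None signals that the client has fallen too far behind and needs a full re-sync.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class EventRingBuffer:
    """Keeps the last `capacity` events so a reconnecting client can catch up."""

    def __init__(self, capacity: int) -> None:
        self._events: Deque[Tuple[int, dict]] = deque(maxlen=capacity)
        self._next_id = 1

    def append(self, event: dict) -> int:
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, event))    # oldest entry drops off when full
        return event_id

    def replay_after(self, last_seen_id: int) -> Optional[List[dict]]:
        """Events newer than last_seen_id, or None if a full re-sync is needed."""
        oldest_id = self._events[0][0] if self._events else self._next_id
        if last_seen_id < oldest_id - 1:
            return None                            # gap: some events already evicted
        return [event for event_id, event in self._events if event_id > last_seen_id]


buffer = EventRingBuffer(capacity=3)
for text in ["a", "b", "c", "d", "e"]:
    buffer.append({"text": text})                  # ids 1..5; buffer holds ids 3, 4, 5

missed = buffer.replay_after(3)                    # client last saw id 3
too_old = buffer.replay_after(1)                   # id 2 is gone, replay impossible
```

The buffer size sets the trade-off: a larger buffer lets clients survive longer outages without a full sync, at the cost of memory on every node that holds one.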

Section 13

Quick Reference Summary

💬

Message Threading

ts + thread_ts

Two float fields that turn a flat message list into a conversation tree. Channel view shows roots only; thread view loads replies on demand. Notification fan-out is subscription-based.

⚡

WebSockets at Scale

50k+ per server

One TLS handshake per client session; full-duplex persistent channel. Cross-server delivery via pub/sub broker. Sticky sessions required. Reconnection design is critical for resilience.

🆕

Flannel

online-aware routing

Replaces Redis Pub/Sub with an intelligent channel server. Routes events to gateways, not users. Lazy hydration. In-memory presence with grace timers. Thundering herd defences built in.

🏆

The Takeaway

Slack's architecture is a masterclass in progressive refinement under scale. Each component (threading data model, WebSocket fleet, Flannel routing) was built to solve a specific bottleneck encountered at a specific scale milestone. None of it was over-engineered upfront. The result is a system that delivers 1.8 billion messages a day at sub-200ms latency, built by a team that started out making a browser-based multiplayer game.