
FL System Architecture: The Complete Client–Server Topology Guide

A deep-dive into how federated learning systems are physically wired together. Covers the canonical star topology, the five-layer communication protocol stack (gRPC → TLS → compression → serialisation → FL logic), synchronous vs. asynchronous operating modes, the full round lifecycle in 7 steps, four topology variants (standard star, hierarchical, P2P, split learning), client selection strategies including Oort and Power-of-Choice, and a complete production-ready implementation in Flower.

Section 01

The Blueprint Before the Bricks

Building a Skyscraper Without Moving the Land
Imagine a construction company awarded contracts to build identical floors in skyscrapers across New York, London, and Tokyo simultaneously. Each site has its own workers, materials, and local rules — none of which can leave the city. But the final buildings must all match the same master blueprint.

A central architect sends the blueprint to each site. Workers build their floor locally. The architect reviews progress reports (not the buildings themselves), refines the blueprint, and sends updates. No materials ever cross an ocean — only knowledge does.

This is FL System Architecture. The "blueprint" is the global model. The "sites" are clients. The "architect" is the aggregation server. The "progress reports" are gradient updates. And the client–server topology is the engineering scaffold that makes it all work reliably at scale.

Before writing a single line of FL code, every practitioner must understand the system topology — how components connect, communicate, and co-ordinate. Getting this wrong means models that diverge, clients that stall, and systems that collapse the moment a single node drops off the network.

💡
What This Topic Covers

We will dissect the client–server topology that underpins every federated system: the roles each node plays, how data and model weights flow between them, the communication protocol stack, synchronous vs asynchronous operation modes, hierarchical and peer-to-peer extensions, and how to implement a production-ready FL server with Flower (flwr).


Section 02

The Core Client–Server Topology

The canonical FL topology is a star network: one central server (or a small cluster) at the hub, and N clients at the spokes. This is not the only topology (we will cover hierarchical and P2P variants), but it is where every FL practitioner starts.

🏠 FL Client–Server Star Topology — Full System View
[Figure: FL client–server star topology. Hub: server (aggregation engine: FedAvg · FedProx · SCAFFOLD). Spokes: Client 1, hospital MRI data (n=12,400 scans); Client 2, mobile keyboard (n=88,000 tokens); Client 3, factory IoT sensor (n=5,200 cycles); Client 4, bank transactions (n=34,000 records); Client 5, vehicle camera (n=190,000 frames). Legend: global model broadcast (server → clients); gradient/weight update (clients → server).]

Star topology: one central server broadcasts the global model; each client trains locally and returns only weight updates. Raw data never crosses the dashed lines.

📋 What Each Node Does

☁️
Aggregation Server
Hub of the star
Holds the global model weights between rounds. Selects participating clients each round. Receives compressed weight deltas. Runs the aggregation algorithm (FedAvg, FedProx, etc.). Broadcasts the updated global model. Tracks convergence metrics. Does not store any client raw data — ever.
✓ Single source of truth for model state
✗ Potential bottleneck at very high client counts
📱
Client Node
Spoke of the star
Stores private local data that never leaves. Downloads the current global model when selected. Runs local SGD for E epochs on its own data. Computes weight delta: Δw = w_local − w_global. Uploads only Δw (optionally compressed + noise-masked). May decline participation if battery / bandwidth is low.
✓ Full data sovereignty maintained
✗ Heterogeneous compute; may straggle
📋
Coordinator / Selector
Orchestration layer
Often a separate process from the aggregation server. Maintains a registry of all eligible clients and their metadata (last-seen, battery, WiFi status, data size, round history). Applies selection strategy: random, power-of-choice, importance-weighted. Manages round timing and timeout logic. In small deployments, often co-located with the server.
✓ Decouples selection policy from aggregation
✗ Must handle client churn registration at scale
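
The coordinator role above can be sketched in a few lines. This is a minimal illustration, not a Flower API: the `ClientRecord` fields, the `eligible` thresholds, and the `select_uniform` helper are all hypothetical names chosen to mirror the metadata listed in the card.

```python
import random
from dataclasses import dataclass


@dataclass
class ClientRecord:
    """Hypothetical registry entry; field names are illustrative only."""
    client_id: str
    battery_pct: float
    on_wifi: bool
    num_examples: int


def eligible(rec: ClientRecord, min_battery: float = 20.0) -> bool:
    # Placeholder thresholds; a real coordinator would make these configurable.
    return rec.on_wifi and rec.battery_pct > min_battery and rec.num_examples > 0


def select_uniform(registry, fraction, min_clients, seed=None):
    """Uniformly sample a fraction of the eligible clients (at least min_clients)."""
    pool = [r for r in registry if eligible(r)]
    k = max(min_clients, int(len(pool) * fraction))
    k = min(k, len(pool))  # cannot select more clients than are eligible
    return random.Random(seed).sample(pool, k)


registry = [
    ClientRecord(f"c{i}", battery_pct=10.0 + 9 * i,
                 on_wifi=(i % 3 != 0), num_examples=100 * (i + 1))
    for i in range(10)
]
selected = select_uniform(registry, fraction=0.5, min_clients=2, seed=0)
print([r.client_id for r in selected])
```

Keeping eligibility and sampling in separate functions is what lets the selection policy change (for example to a loss-based strategy) without touching the aggregation code.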

Section 03

The Communication Protocol Stack

Data flowing between server and clients doesn't travel as raw Python objects. A full protocol stack handles serialisation, compression, encryption, and transport. Understanding each layer prevents the most common production failures.

🔌 FL Communication Protocol Stack
[Figure: five-layer stack, top to bottom]
Layer 5 (App): FL Application Layer. FitIns / EvaluateIns / FitRes · Flower strategy objects · model weights as NumPy arrays
Layer 4 (Serial): Serialisation Layer. Protocol Buffers (protobuf) · MessagePack · weights → bytes · Parameters object
Layer 3 (Compress): Compression Layer. Top-k sparsification · quantisation (fp32→int8) · sketching · gradient clipping + DP noise
Layer 2 (Encrypt): Security Layer. TLS 1.3 mutual auth · Secure Aggregation (SecAgg) · Homomorphic Encryption (optional)
Layer 1 (Transport): Transport Layer. gRPC over HTTP/2 · bidirectional streaming · timeout / retry logic · connection pooling

Every gradient update passes down all five layers before transmission and back up on receipt. Compression alone can reduce bandwidth by 10–1000×, depending on the method and on how much accuracy loss is acceptable.

Layer | Technology Used | What It Does | Bandwidth Impact
FL Application | Flower flwr, TensorFlow Federated | Defines training logic, strategy, client/server interface | N/A
Serialisation | protobuf, MessagePack, NumPy bytes | Converts tensors to byte streams for transmission | Baseline
Compression | Top-k, quantisation, random mask | Reduces gradient size before encryption | 10–1000× reduction
Security | TLS 1.3, SecAgg, DP noise | Prevents interception; hides individual updates | +5–15% overhead
Transport | gRPC / HTTP2 | Reliable delivery with streaming and multiplexing | Lowest latency option
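
To make the compression row concrete, here is a small NumPy sketch of top-k sparsification followed by int8 quantisation. It is an illustration under stated assumptions rather than any framework's built-in codec: the helper names, the 1% keep fraction, and the single-scale symmetric quantiser are all choices made for this example.

```python
import numpy as np


def top_k_sparsify(delta: np.ndarray, k_fraction: float):
    """Keep the k_fraction largest-magnitude entries; return (indices, values)."""
    flat = delta.ravel()
    k = max(1, int(flat.size * k_fraction))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx.astype(np.int32), flat[idx]


def densify(idx, values, shape):
    """Rebuild a dense array: zeros everywhere except the kept entries."""
    out = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    out[idx] = values
    return out.reshape(shape)


def quantise_int8(x: np.ndarray):
    """Symmetric linear quantisation fp32 -> int8 with one scale factor."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantise_int8(q, scale):
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
delta = rng.normal(size=(256, 128)).astype(np.float32)  # stand-in weight delta

idx, vals = top_k_sparsify(delta, k_fraction=0.01)
q, scale = quantise_int8(vals)

dense_bytes = delta.nbytes
sent_bytes = idx.nbytes + q.nbytes + 4  # indices + int8 values + one fp32 scale
print(f"dense: {dense_bytes} B, sent: {sent_bytes} B, "
      f"ratio: {dense_bytes / sent_bytes:.1f}x")
```

Note that the indices themselves cost bandwidth (4 bytes each here), which is why the achievable ratio depends heavily on the keep fraction and on how the indices are encoded.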
🔑
Why gRPC over REST?

FL systems use gRPC (not REST) for three reasons. First, gRPC uses HTTP/2 which supports bidirectional streaming — the server can push the new model to a client while that client is still uploading its gradients from the previous round. Second, Protocol Buffers are 3–10× more compact than JSON. Third, gRPC has built-in deadline/retry semantics that handle the unreliable nature of edge client connections gracefully. Flower uses gRPC by default; TensorFlow Federated offers both gRPC and REST.
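
The size argument can be checked with a rough experiment. The sketch below compares a JSON text encoding of fp32 weights against their raw byte representation; raw bytes are used here as a stand-in for a binary wire format and are not protobuf itself, so treat the ratio as indicative only.

```python
import json

import numpy as np

weights = np.random.default_rng(0).normal(size=10_000).astype(np.float32)

as_json = json.dumps(weights.tolist()).encode("utf-8")  # text encoding
as_bytes = weights.tobytes()                             # 4 bytes per fp32 value

print(f"JSON: {len(as_json)} B, binary: {len(as_bytes)} B, "
      f"ratio: {len(as_json) / len(as_bytes):.1f}x")

# Round-trip the binary form to confirm nothing was lost.
restored = np.frombuffer(as_bytes, dtype=np.float32)
```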


Section 04

Synchronous vs. Asynchronous Operation

One of the most consequential architectural decisions is whether the server waits for all selected clients before aggregating, or whether it aggregates whenever updates arrive. This is the sync vs. async trade-off.

The Synchronous Marathon vs. The Asynchronous Relay Race
Imagine organising a marathon where 100 runners must all cross the finish line before the race clock advances. The fastest runners wait at the finish line for the slowest. Nobody's time is wasted training — but the clock only moves when everyone arrives. This is synchronous FL: slow stragglers block every round.

Now imagine a relay race where each runner passes the baton as soon as they finish their leg, without waiting for anyone else. The team's collective speed improves continuously. A tired runner who drops the baton doesn't stop the race. This is asynchronous FL: the global model updates the moment any client returns — but the model may drift if fast clients dominate all the updates.
⚙️ Synchronous vs Asynchronous FL — Round Timing
[Figure: round timing for clients C1–C4 from t=0 to t=T. Synchronous FL: server waits for ALL selected clients; a barrier holds until the slowest client (C2) finishes, then aggregation (AGG) runs; stable convergence, straggler bottleneck. Asynchronous FL: server aggregates on EACH client return (partial aggregation); no straggler blocking, model staleness risk.]

Sync FL: all clients must finish before aggregation. Async FL: server aggregates each update immediately — faster rounds, but risk of gradient staleness.

⏳ Synchronous FL
How it works
Server selects K clients, broadcasts model
Waits until min(K, threshold) clients respond
Aggregates all received updates at once
Broadcasts updated model for next round
Best for: Cross-silo (hospitals, banks) where clients are reliable servers, not mobile devices
Risk: One slow client delays every other client
⚡ Asynchronous FL
How it works
Server continuously accepts incoming updates
Aggregates each update with momentum into global model
Fast clients train on a newer model version
Slow clients may upload stale gradients (staleness τ)
Best for: Cross-device (billions of phones) where client availability is unpredictable
Risk: Gradient staleness degrades convergence if τ is large
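
One common way to handle staleness τ is to shrink the mixing weight of an update the older it is. The sketch below assumes a FedAsync-style rule with a polynomial discount; the function names, the base weight α = 0.6, and the exponent 0.5 are illustrative assumptions, not values taken from this guide.

```python
import numpy as np


def staleness_weight(alpha: float, staleness: int, a: float = 0.5) -> float:
    """Polynomial discount: older updates get a smaller mixing weight."""
    return alpha * (1.0 + staleness) ** (-a)


def async_apply(global_w, client_w, client_version, server_version, alpha=0.6):
    """Mix one client's weights into the global model as soon as they arrive."""
    tau = server_version - client_version  # versions elapsed since download
    a_t = staleness_weight(alpha, tau)
    new_w = (1.0 - a_t) * global_w + a_t * client_w
    return new_w, server_version + 1


global_w = np.zeros(3)
version = 0
# Each arrival: (weights returned by the client, model version it trained from)
arrivals = [(np.ones(3), 0), (np.ones(3) * 2.0, 0), (np.ones(3) * 3.0, 2)]
for client_w, trained_from in arrivals:
    global_w, version = async_apply(global_w, client_w, trained_from, version)
    print(version, global_w)
```

The second arrival trained from version 0 but lands when the server is at version 1, so it is mixed in with a reduced weight; the third is fresh and gets the full α.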

Section 05

The Full Round Lifecycle — Step by Step

A single FL communication round is more complex than it first appears. Here is every state transition from round start to round end, with the exact data flowing at each step.

01
Client Registration & Eligibility Check
Before any round begins, clients register with the coordinator by sending a ClientHello message containing: device ID, battery %, connection type (WiFi/4G), available RAM, local dataset size, and last participation timestamp. The coordinator marks clients as ELIGIBLE or INELIGIBLE based on configurable thresholds (e.g. battery > 20%, on WiFi, idle for 5+ minutes in Google's Gboard system).
02
Client Selection (Coordinator → Server)
The coordinator selects a subset S of eligible clients using a selection strategy. Default: uniform random sample of fraction C (e.g. C=0.1 means 10% of eligible clients). Advanced strategies: power-of-choice (deliberately biases selection toward clients with the highest local loss to speed up convergence), deadline-aware selection (only selects clients likely to finish within the round timeout). Selected clients receive a RoundConfig object containing round ID, local epochs E, batch size B, and learning rate η.
03
Model Broadcast (Server → Clients)
The server serialises the current global weights w_t into a Parameters object (Flower) or ServerMessage (TFF). The weights are compressed (optional quantisation to int8) and encrypted via TLS. For a 100M parameter model, this is ~400 MB in fp32 or ~100 MB quantised. Clients acknowledge receipt with a ModelAck message containing their local dataset size, which is used later for weighted aggregation.
04
Local Training (Client-Side)
Each client initialises its local model with w_t, then runs E epochs of mini-batch SGD on its local dataset D_k, minimising the local loss L_k(w) = (1/|D_k|) Σ_{(x_i, y_i) ∈ D_k} ℓ(w; x_i, y_i). After E epochs, the client holds local weights w_k and computes the update Δw_k = w_k − w_t. This delta is the only information that will leave the client.
05
Gradient Upload (Clients → Server)
The client applies optional gradient compression (top-k sparsification keeps only the largest k% of gradient values by magnitude; the rest are zeroed), then adds Gaussian noise N(0, σ²) for differential privacy (if enabled). The compressed, noised delta is serialised to protobuf and sent via gRPC streaming. Along with Δw_k, the client also reports its local loss value and the number of samples it trained on, metadata used by the server strategy.
06
Aggregation (Server)
Once the server has received updates from enough of the selected clients, or the round timeout expires, it runs the aggregation algorithm. (In Flower, min_available_clients is a different threshold: it controls how many clients must be connected before a round can start.) FedAvg: w_{t+1} = Σ_k (n_k / n) · w_k, where n_k is client k's dataset size and n = Σ_k n_k. The aggregated weights form the new global model w_{t+1}. Aggregation is typically <1 second even for 1000 clients in a GPU cluster.
07
Evaluation & Round Completion
The server evaluates w_{t+1} on its held-out validation dataset. It logs: global loss, global accuracy, per-round client participation rate, average gradient magnitude, and wall-clock time per round. If convergence criteria are met (e.g. loss plateau for 5 consecutive rounds), training stops. Otherwise, round t+1 begins from step 01. Final model weights are saved and optionally deployed to all clients.
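
Steps 02 to 06 can be traced end to end in a toy simulation. The sketch below is a NumPy stand-in with no networking, compression, or privacy layer: the linear-regression task, the five synthetic clients, and every hyperparameter are assumptions picked so that the script runs in well under a second.

```python
import numpy as np

rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0])


def make_client(n):
    """Synthetic private dataset for one client: y = X @ true_w + noise."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y


def local_train(w_t, X, y, epochs=5, lr=0.05, batch=16):
    """Steps 04-05: E epochs of mini-batch SGD; return (delta_k, n_k)."""
    w = w_t.copy()
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            b = order[start:start + batch]
            grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # MSE gradient
            w -= lr * grad
    return w - w_t, len(y)


def fedavg(w_t, updates):
    """Step 06: w_{t+1} = sum_k (n_k / n) * w_k, with w_k = w_t + delta_k."""
    n = sum(n_k for _, n_k in updates)
    return sum((n_k / n) * (w_t + delta) for delta, n_k in updates)


clients = [make_client(n) for n in (40, 120, 300, 80, 200)]
w = np.zeros(2)
for rnd in range(1, 11):
    chosen = rng.choice(len(clients), size=3, replace=False)  # step 02
    updates = [local_train(w, *clients[k]) for k in chosen]   # steps 03-05
    w = fedavg(w, updates)                                    # step 06
    print(f"round {rnd}: w = {np.round(w, 3)}")
```

Only `delta` and `len(y)` cross the client boundary in `local_train`, which mirrors the claim in step 04 that the delta is the only information leaving the client.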

Section 06

Beyond Star: Three Topology Variants

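The standard client–server star works well for up to ~10,000 clients. Above that, or in settings with geographic constraints, extended topologies take over. The cards below cover the baseline star, three topology variants (hierarchical, peer-to-peer, split learning), and two related training schemes (cluster-based and personalised FL) that are often deployed alongside them.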

🌎 FL Topology Variants
[Figure: two topology diagrams. 1. Standard Star: 1 server, N clients (C1–C5); use for <10K clients; simple, standard approach. 2. Hierarchical (Two-Tier): global server plus regional edge servers (EDGE-A, EDGE-B, EDGE-C); use for 10K–10M clients; geographic / 5G edge deployment.]

Left: standard star for small-to-medium deployments. Right: hierarchical two-tier for geographic scale — regional edge servers aggregate locally before reporting to the global server.
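
A useful property of the two-tier layout is that, if every edge server forwards its sample count along with its aggregate, one pass of hierarchical weighted averaging gives the same result as a flat star over all clients. The sketch below checks this numerically; the region names and client sizes are made up for illustration.

```python
import numpy as np


def weighted_avg(items):
    """items: list of (weights, num_examples). Returns (average, total examples)."""
    total = sum(n for _, n in items)
    avg = sum((n / total) * w for w, n in items)
    return avg, total


rng = np.random.default_rng(1)
# Hypothetical layout: three regions, each with a few clients.
regions = {
    "edge_a": [(rng.normal(size=4), 120), (rng.normal(size=4), 300)],
    "edge_b": [(rng.normal(size=4), 50), (rng.normal(size=4), 75),
               (rng.normal(size=4), 500)],
    "edge_c": [(rng.normal(size=4), 1000)],
}

# Tier 1: each edge server aggregates its own clients and keeps the count.
edge_results = [weighted_avg(clients) for clients in regions.values()]
# Tier 2: the global server aggregates edge results, weighted by those counts.
two_tier, _ = weighted_avg(edge_results)

# Reference: flat star topology over all clients at once.
flat, _ = weighted_avg([c for clients in regions.values() for c in clients])
print(np.allclose(two_tier, flat))
```

The equivalence only holds when each edge aggregates once per global round. If edge servers run several local rounds before reporting, the two-tier result drifts from the flat one, which is the trade-off behind the reduced WAN traffic.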

Standard Star
1 server · N clients
One central aggregation server communicates directly with all selected clients. Simple to implement and reason about. Works well up to ~10K clients with sufficient server bandwidth. The default topology for most FL frameworks (Flower, PySyft, TFF).
🏠
Hierarchical (Two-Tier)
Global server · Edge servers · Clients
Regional edge servers (e.g. 5G MEC nodes or hospital cluster nodes) aggregate locally first, then report to a global cloud server. Drastically reduces WAN traffic. Used by Huawei in 5G FL, and by healthcare consortia spanning multiple countries. Convergence is slightly slower per global round but each round is much faster.
🔗
Peer-to-Peer (Decentralised)
No central server · Gossip protocol
No central server at all. Each client communicates with a small neighbourhood of peers and averages their models via gossip protocols (e.g. MATCHA, D-PSGD). Eliminates the single point of failure and the trust requirement on the server. Used in blockchain-integrated FL systems. Convergence proofs are harder; not yet production-standard for most applications.
⚙️
Split Learning
Model partitioned across client + server
The neural network is split: clients compute forward pass through early layers, send only the smashed data (intermediate activations) to the server, which computes the rest. The server sends gradients back to complete the backward pass. Used for vertical FL where clients have different feature sets for the same samples. Requires more communication rounds but allows huge models on weak clients.
📋
Cluster-Based FL
Clients grouped by data similarity
Clients with similar data distributions are clustered before training. Each cluster trains its own specialised global model (IFCA algorithm). Addresses the non-IID problem by separating heterogeneous clients. Useful when client populations are genuinely multi-modal (e.g. teenage users vs professional users of the same keyboard app have very different language patterns).
🎉
Personalised FL (pFL)
Global model + local fine-tuning head
Combines topology and training: a shared global backbone is trained in a federated manner, then each client fine-tunes a small personal head (last 1–2 layers) on local data. Techniques: FedPer, Per-FedAvg (MAML-inspired), Ditto (regularised local objective). Achieves the best of both worlds: global generality + local personalisation. Apple uses this for on-device personalisation of Siri and keyboard.
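
For the peer-to-peer card, the core mechanism is neighbour averaging. The sketch below shows only that averaging step on a ring of eight nodes (algorithms such as D-PSGD interleave it with local gradient steps, which are omitted here); the ring layout and the equal 1/3 weights are assumptions for the example.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Each node averages itself with its two ring neighbours (weight 1/3 each)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


n_nodes, dim = 8, 3
rng = np.random.default_rng(0)
models = rng.normal(size=(n_nodes, dim))  # row i = node i's current parameters
target = models.mean(axis=0)              # what a central server would compute

W = ring_mixing_matrix(n_nodes)
for step in range(60):
    models = W @ models                   # one gossip round: neighbours only

spread = np.max(np.abs(models - target))
print(f"max deviation from the global mean after 60 rounds: {spread:.2e}")
```

Because the mixing matrix is doubly stochastic, the network-wide mean is preserved at every step, and all nodes drift toward it without any node ever seeing more than two neighbours.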

Section 07

Client Selection Strategies

Which clients train in each round is one of the most impactful decisions in FL system design. Random selection is the baseline — but it ignores data quality, connectivity, and model bias.

📋 Client Selection Strategies — Impact on Model Quality
[Figure: random selection vs Power-of-Choice. Random: 4 of 10 clients selected with uniform probability. Power-of-Choice: high-loss clients preferred; example local losses L=0.2, 0.8, 0.4, 1.1, 0.3, 1.4, 0.7; circle size ∝ local loss. Legend: selected (high loss), maybe selected, not selected (low loss).]

Random selection (left) treats all eligible clients equally. Power-of-choice (right) biases selection toward clients with higher local loss — these clients have the most to learn and speed up convergence.

Strategy | Selection Criterion | Convergence Speed | Fairness | Best For
Uniform Random | Equal probability for all eligible clients | Baseline | High | Default; most deployments
Power-of-Choice | Sample d candidates; pick top-k by local loss | 1.5–3× faster | Medium | Non-IID data; slow convergence
Deadline-Aware | Predict training time; select likely finishers | Fewer stragglers | Low (fast clients favoured) | Mobile cross-device FL
Importance Weighted | Weight by data quality / label diversity score | Best final accuracy | Medium | Medical imaging; rare class data
Oort (University of Michigan) | Utility = data utility × system utility | SOTA in heterogeneous nets | High (enforced fairness) | Production cross-device systems
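
The Power-of-Choice row translates almost directly into code. The following is a hedged sketch of the two-stage rule (sample d candidates, keep the k with the highest reported loss); the function name and the extra loss values are illustrative, and a real deployment would need the candidates to report their current local loss, which costs an extra message per round.

```python
import numpy as np


def power_of_choice(local_losses, data_sizes, d, k, rng):
    """Sample d candidates (probability ∝ data size), keep the k with highest loss."""
    sizes = np.asarray(data_sizes, dtype=float)
    candidates = rng.choice(len(sizes), size=d, replace=False,
                            p=sizes / sizes.sum())
    losses = np.asarray(local_losses)[candidates]
    return candidates[np.argsort(losses)[::-1][:k]]


rng = np.random.default_rng(7)
# First seven losses are the figure's examples; the last three are made up.
losses = [0.2, 0.8, 0.4, 1.1, 0.3, 1.4, 0.7, 0.5, 0.9, 0.6]
sizes = [100] * 10
picked = power_of_choice(losses, sizes, d=6, k=3, rng=rng)
print(sorted(picked.tolist()))
```

The candidate-set size d is the knob: d = k reduces to random sampling, while d equal to the full population always picks the globally highest-loss clients and gives up the most fairness.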

Section 08

Implementing FL Architecture with Flower

Flower (flwr) is one of the most widely used FL frameworks, designed to be framework-agnostic (works with PyTorch, TensorFlow, JAX, scikit-learn). It implements the full client–server topology we've described, using gRPC for communication.

🚀
Flower Architecture Map to Our Concepts

flwr.server.Server = Aggregation Server | flwr.server.strategy.Strategy = Aggregation Algorithm (FedAvg, FedProx, etc.) | flwr.client.Client = Client Node | flwr.server.start_server() = Coordinator entrypoint | flwr.client.start_client() = Client registration + round participation

💻 Complete Flower Server Implementation

# server.py — FL Aggregation Server with Custom FedAvg Strategy
import flwr as fl
from flwr.common import Metrics
from typing import List, Tuple, Optional, Dict
import numpy as np

# ── Custom weighted FedAvg strategy ──────────────────────
class WeightedFedAvg(fl.server.strategy.FedAvg):
    """FedAvg + server-side evaluation logging."""

    def aggregate_fit(
        self,
        server_round: int,
        results: List,
        failures: List,
    ) -> Tuple[Optional[fl.common.Parameters], Dict]:

        # Log participation stats each round
        total     = len(results) + len(failures)
        success   = len(results)
        fail_rate = len(failures) / total if total > 0 else 0
        print(f"[Round {server_round}] Clients: {success}/{total} "
              f"| Failure rate: {fail_rate:.1%}")

        # Delegate aggregation to parent FedAvg
        aggregated_params, metrics = super().aggregate_fit(
            server_round, results, failures
        )
        return aggregated_params, metrics

    def aggregate_evaluate(
        self,
        server_round: int,
        results: List,
        failures: List,
    ) -> Tuple[Optional[float], Dict]:
        # Weighted average of client-reported losses.
        # Each result is a (ClientProxy, EvaluateRes) pair; the loss and
        # sample count live on the EvaluateRes object.
        if not results:
            return None, {}

        total_samples = sum(res.num_examples for _, res in results)
        if total_samples == 0:
            return None, {}
        weighted_loss = sum(res.num_examples * res.loss
                            for _, res in results) / total_samples
        print(f"[Round {server_round}] Global loss: {weighted_loss:.4f}")
        return weighted_loss, {}

# ── Server configuration ──────────────────────────────────
strategy = WeightedFedAvg(
    fraction_fit=0.1,           # 10% of clients per round
    fraction_evaluate=0.05,     # 5% of clients for eval
    min_fit_clients=10,         # minimum to start a round
    min_evaluate_clients=5,     # minimum for evaluation
    min_available_clients=50,   # wait until 50 clients connect
)

# ── Start the server ──────────────────────────────────────
if __name__ == "__main__":
    fl.server.start_server(
        server_address="0.0.0.0:8080",   # gRPC endpoint
        config=fl.server.ServerConfig(num_rounds=20),
        strategy=strategy,
    )
SERVER CONSOLE OUTPUT
INFO flwr 1.8.0 / Starting Flower server, listening on 0.0.0.0:8080
INFO Flower ECE: gRPC server running (20 rounds), SSL disabled
[Round 1] Clients: 12/15 | Failure rate: 20.0%
[Round 1] Global loss: 0.8341
[Round 2] Clients: 14/15 | Failure rate: 6.7%
[Round 2] Global loss: 0.7204
[Round 3] Clients: 13/15 | Failure rate: 13.3%
[Round 3] Global loss: 0.6118
...
[Round 20] Clients: 15/15 | Failure rate: 0.0%
[Round 20] Global loss: 0.1973

💻 Complete Flower Client Implementation

# client.py — FL Client Node (PyTorch backend)
import flwr as fl
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from collections import OrderedDict
from typing import List, Dict, Tuple
import numpy as np

# ── Model definition ──────────────────────────────────────
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.fc = nn.Linear(64 * 4 * 4, 10)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

# ── Flower client class ───────────────────────────────────
class FLClient(fl.client.NumPyClient):
    def __init__(self, model, trainloader, valloader, client_id):
        self.model       = model
        self.trainloader = trainloader
        self.valloader   = valloader
        self.client_id   = client_id
        self.device      = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)

    def get_parameters(self, config) -> List[np.ndarray]:
        # Extract model weights as NumPy arrays → Flower serialises to protobuf
        return [val.cpu().numpy() for _, val
                in self.model.state_dict().items()]

    def set_parameters(self, parameters: List[np.ndarray]):
        # Load server weights into local model
        params_dict  = zip(self.model.state_dict().keys(), parameters)
        state_dict   = OrderedDict(
            {k: torch.tensor(v) for k, v in params_dict}
        )
        self.model.load_state_dict(state_dict, strict=True)

    def fit(self, parameters, config) -> Tuple[List, int, Dict]:
        # Step 1: load global weights from server
        self.set_parameters(parameters)

        # Step 2: read hyperparams from server config
        lr         = config.get("learning_rate", 0.01)
        epochs     = config.get("local_epochs",  3)
        batch_size = config.get("batch_size",    32)

        # Step 3: local training
        optimizer = torch.optim.SGD(self.model.parameters(),
                                     lr=lr, momentum=0.9)
        criterion = nn.CrossEntropyLoss()
        self.model.train()

        for _ in range(epochs):
            for images, labels in self.trainloader:
                images, labels = images.to(self.device), labels.to(self.device)
                optimizer.zero_grad()
                criterion(self.model(images), labels).backward()
                optimizer.step()

        # Step 4: return updated weights + dataset size (for weighted FedAvg)
        return self.get_parameters(config={}), len(self.trainloader.dataset), {}

    def evaluate(self, parameters, config) -> Tuple[float, int, Dict]:
        self.set_parameters(parameters)
        criterion = nn.CrossEntropyLoss()
        self.model.eval()
        loss, correct = 0.0, 0

        with torch.no_grad():
            for images, labels in self.valloader:
                images, labels = images.to(self.device), labels.to(self.device)
                outputs = self.model(images)
                loss    += criterion(outputs, labels).item()
                correct += (outputs.argmax(1) == labels).sum().item()

        n        = len(self.valloader.dataset)
        accuracy = correct / n
        return loss / len(self.valloader), n, {"accuracy": accuracy}

# ── Launch the client ─────────────────────────────────────
if __name__ == "__main__":
    import sys
    client_id = int(sys.argv[1]) if len(sys.argv) > 1 else 0

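    # Each client loads ONLY its own local data partition.
    # NOTE: load_local_partition is not defined in this file; supply your own
    # function that returns (trainloader, valloader) for the given client_id.
    trainloader, valloader = load_local_partition(client_id)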
    model  = Net()
    client = FLClient(model, trainloader, valloader, client_id)

    fl.client.start_client(
        server_address="server-host:8080",   # gRPC server address
        client=client.to_client(),
    )
CLIENT CONSOLE OUTPUT (client 3, 20 rounds)
INFO flwr 1.8.0 / Starting Flower client
INFO Connecting to server at server-host:8080
[Round 1] fit() called | local epochs=3 | n=1240 samples
[Round 1] evaluate() → loss=0.8821 | accuracy=68.2%
[Round 5] fit() called | local epochs=3 | n=1240 samples
[Round 5] evaluate() → loss=0.5233 | accuracy=81.4%
[Round 10] fit() called | local epochs=3 | n=1240 samples
[Round 10] evaluate() → loss=0.3140 | accuracy=88.9%
[Round 20] fit() called | local epochs=3 | n=1240 samples
[Round 20] evaluate() → loss=0.1876 | accuracy=93.7%
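
One gap worth noting: the client's fit() reads learning_rate, local_epochs, and batch_size from config, but the server listing never sends them, so the client always falls back to its defaults. A minimal sketch of the missing piece follows, assuming Flower's on_fit_config_fn hook on FedAvg; the halving schedule is a made-up example, and the local `Scalar` alias simply mirrors the value types Flower accepts in config dictionaries.

```python
from typing import Dict, Union

# Local alias mirroring the value types allowed in Flower config dicts.
Scalar = Union[bool, bytes, float, int, str]


def fit_config(server_round: int) -> Dict[str, Scalar]:
    """Per-round training config; keys match what FLClient.fit() reads."""
    return {
        # Hypothetical schedule: halve the learning rate every 10 rounds.
        "learning_rate": 0.01 * (0.5 ** ((server_round - 1) // 10)),
        "local_epochs": 3,
        "batch_size": 32,
    }


# Hooking it into the strategy from server.py (shown as a comment, not run here):
#   strategy = WeightedFedAvg(..., on_fit_config_fn=fit_config)

print(fit_config(1), fit_config(11))
```

Even with this in place, batch_size has no effect in the client above, because the DataLoader is built before fit() is called; honouring it would mean constructing the loader inside fit().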
⚙️
Scaling to Real Deployments: What the Code Doesn't Show

The implementation above is clean and functional, but production systems add: (1) SecAgg: cryptographic secret sharing so the server never sees individual client gradients. (2) Differential Privacy: clip + Gaussian noise before upload (flwr has DPFedAvgFixed built-in). (3) Compression: top-k sparsification plugin for large models. (4) TLS: pass SSL certificates to start_server() through its certificates argument; grpc_max_message_length is a separate setting that raises the message size limit for large models. (5) Client state persistence: save client model between rounds so returning clients resume training.
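
Item (2) is the easiest of these to sketch by hand. The snippet below shows the clip-then-noise step on a single update; it is an illustration under assumptions (the function name, clip bound, and noise multiplier are arbitrary) and does not compute a privacy budget, which requires a separate accountant.

```python
import numpy as np


def clip_and_noise(delta, clip_norm, noise_multiplier, rng):
    """Clip the update to L2 norm <= clip_norm, then add Gaussian noise."""
    norm = float(np.linalg.norm(delta))
    clipped = delta * min(1.0, clip_norm / (norm + 1e-12))
    sigma = noise_multiplier * clip_norm  # noise std scales with the clip bound
    noised = clipped + rng.normal(0.0, sigma, size=delta.shape)
    return noised, clipped


rng = np.random.default_rng(0)
delta = rng.normal(size=1000) * 5.0  # stand-in for a client's weight delta
noised, clipped = clip_and_noise(delta, clip_norm=1.0,
                                 noise_multiplier=0.1, rng=rng)
print(f"norm before: {np.linalg.norm(delta):.2f}, "
      f"after clipping: {np.linalg.norm(clipped):.2f}")
```

Clipping first is what makes the noise meaningful: it bounds how much any single client can move the aggregate, so a fixed noise scale can mask each contribution.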


Section 09

Architecture Decision Guide

Choosing the right topology and operating mode depends on your specific constraints. Use this table to make the decision systematically.

Constraint | Recommended Topology | Sync Mode | Selection Strategy | Notes
🏠 <100 reliable servers (hospitals, banks) | Standard Star | Synchronous | Uniform random | Cross-silo; clients are reliable; FedAvg default
📱 10K–10M mobile devices | Standard Star or Hierarchical | Async or Semi-Sync | Deadline-aware / Oort | Cross-device; high churn; need straggler mitigation
🏠🏠 Multi-country, geo-distributed | Hierarchical (2-tier) | Sync within tier, Async across | Regional coordinator | Reduces cross-WAN bandwidth by 80–95%
🦔 No trusted central server | Peer-to-Peer (gossip) | Asynchronous | Neighbour-based | Blockchain FL; slower convergence
🔌 Tiny edge devices (<512MB RAM) | Split Learning | Synchronous | Uniform | Clients only run first few network layers
🏭 Vertical FL (different features, same users) | Standard Star + VFL protocol | Synchronous | All participants always | Needs PSI for user alignment; use FATE framework

Section 10

Architecture Golden Rules

🌟 FL System Architecture — Non-Negotiable Rules
1
Never let raw data cross a topology boundary. If any component in your architecture allows raw features or labels to flow outside the originating client node, it is not federated learning — it is distributed learning with privacy violations. Audit every gRPC message type in your implementation.
2
Design for client dropout from day one. In cross-device FL, expect 20–60% of selected clients to fail in any given round. Set min_fit_clients to 60–70% of your selection target — never require 100% of selected clients. Use min_available_clients to wait until enough clients are online before starting a round.
3
Use gRPC with TLS mutual authentication. Plain HTTP is never acceptable in production FL. Clients must verify the server's certificate (prevents model injection attacks) and the server must verify client certificates (prevents gradient poisoning from rogue participants).
4
Weight your aggregation by client dataset size. Unweighted averaging gives equal influence to a client with 50 samples and one with 50,000 samples. Always pass num_examples from clients and use it in FedAvg weighting: w_agg = Σ (n_k / n_total) × w_k.
5
Maintain a server-side validation set. Since client data never reaches the server, you need a small, representative, held-out dataset on the server to track global model quality over rounds. Without it, you are flying blind. Aim for 1–5% of total estimated data size.
6
Compress before encrypting, not after. Top-k sparsification or int8 quantisation must happen before TLS encryption. Encrypting first then compressing yields almost no size reduction (encrypted data is incompressible). Compression → DP noise → TLS is the correct pipeline order.
7
Track per-round client participation rate as a first-class metric. A sudden drop in participation (e.g. from 60% to 20%) is almost always a system signal — not a data signal. It means clients are failing eligibility checks, experiencing network issues, or the round timeout is too short. Log participation_rate alongside loss every round.
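
Rule 6 can be checked empirically. The sketch below uses zlib as a generic lossless compressor (a stand-in for the principle, not the top-k or int8 codecs themselves) and random bytes as a stand-in for ciphertext, on the assumption that well-encrypted data is statistically indistinguishable from random.

```python
import os
import zlib

import numpy as np

# A sparse update: about 1% non-zero values, as left by top-k sparsification.
rng = np.random.default_rng(0)
update = np.zeros(100_000, dtype=np.float32)
nonzero = rng.choice(update.size, size=1_000, replace=False)
update[nonzero] = rng.normal(size=1_000).astype(np.float32)
plain = update.tobytes()

# Stand-in for ciphertext: uniformly random bytes of the same length.
cipher_like = os.urandom(len(plain))

compressed_plain = zlib.compress(plain, level=6)
compressed_cipher = zlib.compress(cipher_like, level=6)
print(f"original: {len(plain)} B")
print(f"compress plaintext:        {len(compressed_plain)} B")
print(f"compress ciphertext-like:  {len(compressed_cipher)} B")
```

The structured plaintext shrinks dramatically, while the random-looking bytes do not shrink at all, which is why compression has to sit above the security layer in the stack.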
🚀
Coming Up in Topic 3

Topic 3: The Non-IID Problem & Advanced Aggregation Algorithms. Now that you understand the topology, we go deeper into what happens when client data is heterogeneous. We will cover FedProx, SCAFFOLD, FedNova, and MOON — algorithms specifically designed to handle non-IID data that FedAvg cannot converge on. We will benchmark all four on pathologically non-IID CIFAR-10 partitions.