The Story That Explains Federated Learning
For years this meant five small, mediocre models — each starved of data. Then someone proposed a clever ritual. "Nobody sends data. Instead, every Monday morning, a central coordinator emails out one shared model. Each hospital trains that copy overnight on its own private records. On Tuesday, they send back only the lessons the model learned — the adjusted dial settings, never the patients. The coordinator blends all five sets of lessons into one improved model, and the cycle repeats next Monday."
Each weekly Monday-to-Tuesday cycle is a training round. The data never moves. Only the model travels. After enough rounds, the shared model can come close to one trained on every hospital's data at once, yet no record ever left its building. That, in one story, is Federated Learning, and the heartbeat of the whole thing is the training round.
Centralized vs. Federated — Where the Data Lives
In classical (centralized) training, all data is gathered into one place and the model learns from the pile. In federated learning, the data stays put on each device or organisation (a client), and only model parameters move to and from a central server. The diagram below shows the same model living in two worlds.
Left: raw records are pulled to one server (a privacy risk). Right: records never move — the model is sent out, trained locally, and only its parameters return.
The Training Round — the Heartbeat of Federated Learning
A training round (sometimes called a communication round or global round) is one complete cycle of sending the shared model out, training it locally, sending the updates back, and merging them into an improved shared model. Federated learning is simply this round repeated dozens or hundreds of times until the model stops improving. The animation below is one round, looping forever.
One full loop = one round. The global model that comes out becomes the input to the next round.
Anatomy of a Single Round — the Five Steps
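Every round, no matter the algorithm, walks through the same five steps: (1) select the participating clients, (2) broadcast the current global model to them, (3) train locally on each client's private data, (4) upload the updated weights, and (5) aggregate the uploads into a new global model. The server orchestrates; the clients do the heavy local lifting.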
Don't confuse a round with an epoch
This is the single most common point of confusion for newcomers.
| Term | Where it happens | What one unit means |
|---|---|---|
| Local epoch | Inside one client | One full pass over that client's local dataset. |
| Round (global round) | Across the whole system | One full broadcast → local-train → upload → aggregate cycle. |
So a single round may contain, say, E = 5 local epochs per client. More local epochs per round means less communication but a higher risk of clients "drifting" apart on non-identical data. Tuning this balance is the core craft of federated learning.
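To make the distinction concrete, the sketch below counts how many local gradient updates one client performs inside a single round. The helper name `local_updates_per_round` and the example numbers are illustrative assumptions, and it assumes mini-batch SGD where a final partial batch still counts as one update.

```python
import math

def local_updates_per_round(n_samples: int, local_epochs: int, batch_size: int) -> int:
    """Gradient updates one client performs in ONE round (hypothetical helper)."""
    # One local epoch = one full pass over the client's data, split into mini-batches.
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return local_epochs * batches_per_epoch

# Assumed example: a client with 600 samples, E = 5 local epochs, B = 50.
per_round = local_updates_per_round(n_samples=600, local_epochs=5, batch_size=50)
print(per_round)        # 60 local updates, but only ONE upload to the server
print(per_round * 20)   # 1200 local updates over T = 20 rounds (20 uploads)
```

Raising E multiplies the local work per round without adding any uploads, which is exactly where both the communication savings and the drift risk come from.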
A Worked Example — One Round of FedAvg by Hand
The classic aggregation rule is Federated Averaging (FedAvg). The server doesn't just take a plain average of client models — it takes a weighted average, where each client's weight is proportional to how much data it has. A hospital with 5,000 patients should count more than a clinic with 500.
Suppose three hospitals join round t. For simplicity, imagine the model is a single number (one weight). After local training, each returns its own value:
| Client | Samples n_k | Local weight w_k | Share n_k/n | Contribution (share × weight) |
|---|---|---|---|---|
| Hospital A | 5,000 | 0.80 | 0.500 | 0.400 |
| Hospital B | 3,000 | 0.60 | 0.300 | 0.180 |
| Hospital C | 2,000 | 0.30 | 0.200 | 0.060 |
| Total | 10,000 | — | 1.000 | 0.640 |
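Adding up the last column gives the new global weight after aggregation: 0.400 + 0.180 + 0.060 = 0.640. The short check below reproduces that sum with NumPy and compares it to a plain unweighted mean; the variable names are illustrative and not taken from any library.

```python
import numpy as np

samples = np.array([5000, 3000, 2000])        # samples held by Hospitals A, B, C
local_weights = np.array([0.80, 0.60, 0.30])  # single model weight returned by each

shares = samples / samples.sum()                       # each client's share of the data
fedavg_weight = float(np.sum(shares * local_weights))  # weighted average (FedAvg)
plain_mean = float(local_weights.mean())               # ignores dataset sizes

print(f"FedAvg weight: {fedavg_weight:.3f}")  # 0.640
print(f"Plain mean:    {plain_mean:.3f}")     # 0.567
```

The weighted result sits closer to Hospital A's 0.80 than the plain mean does, because A holds half of all the samples.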
Implementing a Training Round in Python (from scratch)
Here is a minimal, dependency-light FedAvg loop. It makes the round structure crystal clear: the outer loop is rounds, the inner work is local training, and the server step is weighted aggregation.
import numpy as np

# ─── A toy "model" is just a vector of weights ───────────────
def local_train(global_w, client_data, lr=0.1, local_epochs=5):
    """Each client trains a COPY of the global model on its own data."""
    w = global_w.copy()                    # never mutate the global model
    X, y = client_data
    for _ in range(local_epochs):          # local epochs run INSIDE one round
        grad = X.T @ (X @ w - y) / len(y)  # full-batch least-squares gradient
        w = w - lr * grad
    return w

def fedavg(client_weights, client_sizes):
    """Server aggregation: weighted average by sample count (FedAvg)."""
    n = sum(client_sizes)                  # total samples this round
    new_w = sum((nk / n) * wk
                for wk, nk in zip(client_weights, client_sizes))
    return new_w

# ─── THE FEDERATED TRAINING LOOP ─────────────────────────────
def federated_training(clients, n_features, rounds=20, frac=0.5):
    global_w = np.zeros(n_features)        # start from a blank model
    for t in range(rounds):                # <<< each t is ONE ROUND >>>
        # 1. SELECT a subset of clients
        m = max(1, int(frac * len(clients)))
        selected = np.random.choice(len(clients), m, replace=False)
        updates, sizes = [], []
        for k in selected:
            data = clients[k]
            # 2. BROADCAST + 3. LOCAL TRAIN
            wk = local_train(global_w, data)
            # 4. UPLOAD the updated weights (not the data!)
            updates.append(wk)
            sizes.append(len(data[1]))
        # 5. AGGREGATE into the new global model
        global_w = fedavg(updates, sizes)
        print(f"Round {t+1:2d}/{rounds} | ‖w‖ = {np.linalg.norm(global_w):.4f}")
    return global_w

# ─── Demo: 4 clients, each with its own private data ─────────
np.random.seed(0)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(4):
    X = np.random.randn(200, 3)
    y = X @ true_w + 0.05 * np.random.randn(200)
    clients.append((X, y))

final = federated_training(clients, n_features=3, rounds=20)
print("Learned :", np.round(final, 3))
print("True    :", true_w)
Notice the model converges to the true weights [2, -1, 0.5] without any client ever sharing its X or y. Only the trained weight vectors crossed the network.
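One way to sanity-check that claim is to compare the federated result against a model fitted on all the data pooled in one place. The sketch below is self-contained and re-implements the same toy setup in compact form; the function name `run_fedavg`, the use of full client participation, and the seeded random generator are assumptions made for illustration only.

```python
import numpy as np

def run_fedavg(clients, n_features, rounds=20, local_epochs=5, lr=0.1):
    """Compact FedAvg on linear regression with every client participating (sketch)."""
    global_w = np.zeros(n_features)
    sizes = np.array([len(y) for _, y in clients])
    for _ in range(rounds):
        local_models = []
        for X, y in clients:
            w = global_w.copy()              # each client starts from the global model
            for _ in range(local_epochs):    # full-batch gradient steps on local data
                w -= lr * X.T @ (X @ w - y) / len(y)
            local_models.append(w)
        # Server step: average the local models, weighted by sample count.
        global_w = np.average(np.stack(local_models), axis=0, weights=sizes)
    return global_w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(4):
    X = rng.standard_normal((200, 3))
    y = X @ true_w + 0.05 * rng.standard_normal(200)
    clients.append((X, y))

federated_w = run_fedavg(clients, n_features=3)

# Centralized baseline: pool every client's data and solve least squares directly.
# This pooling is only possible in a toy demo; it is what federated learning avoids.
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
central_w, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

print("Federated  :", np.round(federated_w, 3))
print("Centralized:", np.round(central_w, 3))
print("Max gap    :", float(np.max(np.abs(federated_w - central_w))))
```

On this easy toy problem, where every client draws data from the same distribution, the two solutions should agree closely. With non-identical client data the gap can be larger.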
Watching the Model Improve, Round by Round
The whole point of running many rounds is that accuracy generally climbs and loss generally falls from one cycle to the next, steeply at first, then leveling off. Individual rounds can still dip, especially when only some clients participate. We usually stop when the gains per round become negligible (a stopping criterion). The animated chart shows a typical curve.
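A stopping criterion of this kind can be sketched as a simple patience rule. The function name `should_stop`, the `min_delta` and `patience` parameters, and the accuracy values are all hypothetical; real systems may instead watch the loss, use a held-out validation set, or run a fixed round budget.

```python
def should_stop(accuracy_history, min_delta=0.001, patience=3):
    """True when each of the last `patience` rounds improved accuracy by less than `min_delta`."""
    if len(accuracy_history) <= patience:
        return False                          # not enough rounds to judge yet
    recent = accuracy_history[-(patience + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < min_delta for gain in gains)

# Made-up per-round accuracies: steep early gains, then a plateau.
history = [0.52, 0.68, 0.77, 0.82, 0.845, 0.8455, 0.8458, 0.8460]
for round_idx in range(1, len(history) + 1):
    if should_stop(history[:round_idx]):
        print(f"Stop after round {round_idx}")   # Stop after round 8
        break
```

Requiring several consecutive small gains, instead of one, keeps a single noisy round from ending training early.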
Each dot is the accuracy of the global model measured after that round's aggregation. Most learning happens early; later rounds polish.
The Knobs That Control a Round
| Hyperparameter | Symbol | What it controls | Trade-off if too high |
|---|---|---|---|
| Number of rounds | T | How many full cycles run in total. | Wasted compute & communication after convergence. |
| Client fraction | C | What share of clients participate each round. | More communication & stragglers slow each round. |
| Local epochs | E | How long each client trains before reporting back. | Clients "drift" apart on non-IID data → unstable global model. |
| Local batch size | B | Mini-batch size during local training. | Fewer gradient updates per local epoch, so less progress per round. |
| Learning rate | η | Local step size each gradient update. | Local models overshoot & diverge. |
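The FedAvg paper (McMahan et al.) reported that doing more local computation per round, for example more local epochs E or smaller local batches B, can dramatically cut the number of rounds needed, and that raising the client fraction C also helps up to a point. This trades cheap local computation for expensive network communication, and that trade-off is the central economic lever of federated learning.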
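To see why the number of rounds matters so much, the sketch below estimates network traffic under deliberately simple assumptions: every selected client downloads and uploads the full model once per round, parameters are uncompressed 32-bit floats, and `estimate_traffic_bytes` is a hypothetical helper, not part of any framework.

```python
def estimate_traffic_bytes(n_params, n_clients, client_fraction, rounds, bytes_per_param=4):
    """Rough total bytes moved: (download + upload) x selected clients x rounds."""
    selected = max(1, int(client_fraction * n_clients))    # clients taking part each round
    per_round = 2 * selected * n_params * bytes_per_param  # model goes out and comes back
    return per_round * rounds

# Assumed setting: a 1M-parameter model, 100 clients, C = 0.1.
few_rounds = estimate_traffic_bytes(1_000_000, 100, 0.1, rounds=50)
many_rounds = estimate_traffic_bytes(1_000_000, 100, 0.1, rounds=500)
print(few_rounds / 1e9, "GB")    # 4.0 GB
print(many_rounds / 1e9, "GB")   # 40.0 GB
```

Under these assumptions traffic grows linearly with the number of rounds, so any setting that reaches the same accuracy in fewer rounds saves communication in direct proportion.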
Golden Rules for Training Rounds
In one line: Federated learning trains a shared model through repeated training rounds — broadcast, local-train, upload, aggregate — so that many parties learn together while their data stays private and never moves.