
Federated Learning: Training Rounds Explained

A visual, story-driven guide to how federated learning works through repeated training rounds. Using a five-hospitals analogy, animated diagrams, a hand-worked FedAvg example, tables, and color-coded Python, it explains the broadcast → local-train → upload → aggregate cycle, the difference between a round and a local epoch, and how a model converges over many rounds while data never leaves the client.

Section 01

The Story That Explains Federated Learning

The Five Hospitals That Could Never Share a File
Imagine five hospitals in five different cities. Each treats thousands of patients and each wants to build a smarter model that can spot a rare disease early. The catch: no hospital is allowed to email its patient records to anyone. Privacy law forbids it, ethics forbid it, and frankly the patients would never agree.

For years this meant five small, mediocre models — each starved of data. Then someone proposed a clever ritual. "Nobody sends data. Instead, every Monday morning, a central coordinator emails out one shared model. Each hospital trains that copy overnight on its own private records. On Tuesday, they send back only the lessons the model learned — the adjusted dial settings, never the patients. The coordinator blends all five sets of lessons into one improved model, and the cycle repeats next Monday."

Each weekly Monday-to-Tuesday cycle is a training round. The data never moves. Only the model travels. After enough rounds, the shared model becomes as sharp as if it had seen every hospital's data at once — yet no record ever left its building. That, in one story, is Federated Learning, and the heartbeat of the whole thing is the training round.
Section 02

Centralized vs. Federated — Where the Data Lives

In classical (centralized) training, all data is gathered into one place and the model learns from the pile. In federated learning, the data stays put on each device or organisation (a client), and only model parameters move to and from a central server. The diagram below shows the same model living in two worlds.

🗺 Data flow — Centralized vs Federated
[Diagram, two panels. Centralized: four data sources send raw data to the server (raw data leaves the device ⚠). Federated: the server exchanges the model with four clients (only the model travels ✓).]

Left: raw records are pulled to one server (a privacy risk). Right: records never move — the model is sent out, trained locally, and only its parameters return.

Section 03

The Training Round — the Heartbeat of Federated Learning

A training round (sometimes called a communication round or global round) is one complete cycle of sending the shared model out, training it locally, sending the updates back, and merging them into an improved shared model. Federated learning is simply this round repeated dozens or hundreds of times until the model stops improving. The animation below is one round, looping forever.

🔄 One federated training round (looping)
[Animation: the server holds the global model and exchanges it with Clients A, B, C, and D. Steps 1-2: broadcast the model out. Step 3: train locally on private data. Step 4: upload updates back. Step 5: aggregate into the new global model.]

One full loop = one round. The global model that comes out becomes the input to the next round.

Section 04

Anatomy of a Single Round — the Five Steps

Every round, no matter the algorithm, walks through the same five steps. The server orchestrates; the clients do the heavy local lifting.

The 5 steps inside round t
1 · Select
The server picks a subset of available clients (e.g. 10 of 1,000 phones that are charging on Wi-Fi). It rarely uses all of them.
2 · Broadcast
The server sends the current global model weights $w_t$ to every selected client.
3 · Local train
Each client $k$ trains on its own private data for a few local epochs, producing its own updated weights $w_k^{t+1}$.
4 · Upload
Clients send back only their updated weights (or the delta). No raw data is ever transmitted.
5 · Aggregate
The server merges all updates into a new global model $w_{t+1}$, usually via a weighted average (FedAvg). The round ends; $t \to t+1$.
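
Step 1 (Select) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Client` record, its `charging` and `on_wifi` flags, and the `select_clients` helper are hypothetical names invented for this example, and real systems apply richer eligibility rules.

```python
import random
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical client record; the field names are illustrative only."""
    client_id: int
    charging: bool
    on_wifi: bool


def select_clients(clients, m, seed=None):
    """Pick up to m eligible clients uniformly at random, without replacement."""
    eligible = [c for c in clients if c.charging and c.on_wifi]
    rng = random.Random(seed)
    return rng.sample(eligible, min(m, len(eligible)))


# 1,000 simulated phones; availability is random purely for the demo.
rng = random.Random(0)
phones = [Client(i, rng.random() < 0.5, rng.random() < 0.7) for i in range(1000)]

chosen = select_clients(phones, m=10, seed=1)
print([c.client_id for c in chosen])
```

Sampling without replacement means no client is asked to train twice in the same round, and filtering first keeps ineligible devices out of the draw.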

Don't confuse a round with an epoch

This is the single most common point of confusion for newcomers.

| Term | Where it happens | What one unit means |
| --- | --- | --- |
| Local epoch | Inside one client | One full pass over that client's local dataset. |
| Round (global round) | Across the whole system | One full broadcast → local-train → upload → aggregate cycle. |

So a single round may contain, say, E = 5 local epochs per client. More local epochs per round means less communication but a higher risk of clients "drifting" apart on non-identical data. Tuning this balance is the core craft of federated learning.
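
To make the round/epoch distinction concrete, the sketch below counts how many gradient steps a single client takes inside one round. It assumes plain mini-batch SGD with a local batch size B; the helper name `local_steps_per_round` and the example numbers are hypothetical.

```python
import math


def local_steps_per_round(n_k, local_epochs, batch_size):
    """Mini-batch gradient steps one client takes in one round.

    Assumes each local epoch is one pass over the client's n_k samples
    in batches of `batch_size` (the last batch may be smaller).
    """
    batches_per_epoch = math.ceil(n_k / batch_size)
    return local_epochs * batches_per_epoch


# A client with 600 samples, E = 5 local epochs, B = 50:
print(local_steps_per_round(600, local_epochs=5, batch_size=50))  # 60
```

All 60 of those steps happen between one broadcast and one upload, so they cost a single communication exchange.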

Section 05

A Worked Example — One Round of FedAvg by Hand

The classic aggregation rule is Federated Averaging (FedAvg). The server doesn't just take a plain average of client models — it takes a weighted average, where each client's weight is proportional to how much data it has. A hospital with 5,000 patients should count more than a clinic with 500.

FedAvg Aggregation
$$w_{t+1} = \sum_k \frac{n_k}{n}\, w_k^{t+1}$$
New global weights = sum of each client's weights, scaled by its share of the total data $n_k/n$.
Total samples
$$n = \sum_k n_k$$
The total number of training samples across all participating clients in this round.
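
Step 4 allows clients to upload a delta instead of full weights. A short derivation shows why the two are interchangeable under FedAvg. The symbol $\Delta_k = w_k^{t+1} - w_t$ is notation introduced here for the change client $k$ made to the broadcast model; it relies only on the shares $n_k/n$ summing to 1.

```latex
\begin{aligned}
w_{t+1} &= \sum_k \frac{n_k}{n}\, w_k^{t+1}
         = \sum_k \frac{n_k}{n}\,\bigl(w_t + \Delta_k\bigr) \\
        &= w_t \underbrace{\sum_k \frac{n_k}{n}}_{=\,1}
           + \sum_k \frac{n_k}{n}\,\Delta_k
         = w_t + \sum_k \frac{n_k}{n}\,\Delta_k .
\end{aligned}
```

So averaging the deltas and adding the result to the current global model gives the same $w_{t+1}$ as averaging the full client weights.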

Suppose three hospitals join round t. For simplicity, imagine the model is a single number (one weight). After local training, each returns its own value:

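| Client | Samples $n_k$ | Local weight $w_k$ | Share $n_k/n$ | Contribution (share × weight) |
| --- | --- | --- | --- | --- |
| Hospital A | 5,000 | 0.80 | 0.500 | 0.400 |
| Hospital B | 3,000 | 0.60 | 0.300 | 0.180 |
| Hospital C | 2,000 | 0.30 | 0.200 | 0.060 |
| Total | 10,000 | | 1.000 | 0.640 |
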
Computing the new global weight
Weighted: $w_{t+1}$ = 0.400 + 0.180 + 0.060 = 0.640
Compare: a plain (unweighted) average would give (0.80 + 0.60 + 0.30) / 3 ≈ 0.567, which wrongly lets tiny Hospital C count as much as huge Hospital A.
Result: the new global weight 0.640 is broadcast in round t+1, and the whole cycle repeats.
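
The hand calculation can be checked in a few lines of Python. This sketch reuses the three hospitals' numbers from the table; the variable names are chosen for this example only.

```python
sizes = [5000, 3000, 2000]      # n_k for Hospitals A, B, C
weights = [0.80, 0.60, 0.30]    # local weight w_k returned by each hospital

n = sum(sizes)
weighted = sum(nk / n * wk for nk, wk in zip(sizes, weights))
plain = sum(weights) / len(weights)

print(f"FedAvg (weighted): {weighted:.3f}")   # 0.640
print(f"Plain average:     {plain:.3f}")      # 0.567
```

The gap between the two results comes entirely from Hospital A's larger share pulling the weighted average toward its value of 0.80.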
Section 06

Implementing a Training Round in Python (from scratch)

Here is a minimal, dependency-light FedAvg loop. It makes the round structure crystal clear: the outer loop is rounds, the inner work is local training, and the server step is weighted aggregation.

import numpy as np

# ─── A toy "model" is just a vector of weights ───────────────
def local_train(global_w, client_data, lr=0.1, local_epochs=5):
    """Each client trains a COPY of the global model on its own data."""
    w = global_w.copy()                 # never mutate the global model
    X, y = client_data
    for _ in range(local_epochs):     # several local epochs run inside ONE round
        # full-batch gradient of the half mean squared error 0.5 * mean((Xw - y)^2)
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w


def fedavg(client_weights, client_sizes):
    """Server aggregation: weighted average by sample count (FedAvg)."""
    n = sum(client_sizes)                # total samples this round
    new_w = sum((nk / n) * wk
                for wk, nk in zip(client_weights, client_sizes))
    return new_w


# ─── THE FEDERATED TRAINING LOOP ─────────────────────────────
def federated_training(clients, n_features, rounds=20, frac=0.5):
    global_w = np.zeros(n_features)        # start from a blank model

    for t in range(rounds):              # <<< each t is ONE ROUND >>>
        # 1. SELECT a subset of clients
        m = max(1, int(frac * len(clients)))
        selected = np.random.choice(len(clients), m, replace=False)

        updates, sizes = [], []
        for k in selected:
            data = clients[k]
            # 2. BROADCAST  +  3. LOCAL TRAIN
            wk = local_train(global_w, data)
            # 4. UPLOAD the updated weights (not the data!)
            updates.append(wk)
            sizes.append(len(data[1]))

        # 5. AGGREGATE into the new global model
        global_w = fedavg(updates, sizes)
        print(f"Round {t+1:2d}/{rounds}  |  ‖w‖ = {np.linalg.norm(global_w):.4f}")

    return global_w


# ─── Demo: 4 clients, each with its own private data ─────────
np.random.seed(0)
true_w  = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(4):
    X = np.random.randn(200, 3)
    y = X @ true_w + 0.05 * np.random.randn(200)
    clients.append((X, y))

final = federated_training(clients, n_features=3, rounds=20)
print("Learned :", np.round(final, 3))
print("True    :", true_w)
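▶ Output (illustrative and abridged to rounds 1, 5, 10, and 20; a real run prints all 20 rounds, and its exact values may differ)
Round  1/20  |  ‖w‖ = 0.6231
Round  5/20  |  ‖w‖ = 1.9874
Round 10/20  |  ‖w‖ = 2.2741
Round 20/20  |  ‖w‖ = 2.2906
Learned : [ 1.997 -0.998  0.501]
True    : [ 2.  -1.   0.5]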

Notice the model converges to the true weights [2, -1, 0.5] without any client ever sharing its X or y. Only the trained weight vectors crossed the network.
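
The opening story claims the shared model ends up about as good as one trained on all the data at once. For this toy linear problem that can be checked directly, because a simulation is free to pool the data. The sketch below is self-contained and makes two simplifying assumptions that are not in the loop above: every client participates in every round, and the centralized baseline is an ordinary least-squares fit via `np.linalg.lstsq`. On non-IID client data the gap would generally be larger.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])

# Four simulated clients, each holding its own (X, y).
clients = []
for _ in range(4):
    X = rng.standard_normal((200, 3))
    y = X @ true_w + 0.05 * rng.standard_normal(200)
    clients.append((X, y))

sizes = [len(y) for _, y in clients]
n = sum(sizes)

# Federated: all clients join every round (assumption, keeps the demo simple).
w = np.zeros(3)
for round_idx in range(50):
    local = []
    for X, y in clients:
        wk = w.copy()
        for epoch in range(5):            # full-batch gradient steps
            wk -= 0.1 * (X.T @ (X @ wk - y)) / len(y)
        local.append(wk)
    w = sum(nk / n * wk for nk, wk in zip(sizes, local))

# Centralized baseline: pool everything (only possible in a simulation).
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
w_central, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

gap = np.abs(w - w_central).max()
print("Federated  :", np.round(w, 3))
print("Centralized:", np.round(w_central, 3))
print("Max gap    :", gap)
```

The two solutions need not match exactly, since several local steps per round let each client move toward its own optimum, but with similar data on every client the difference stays small.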

Section 07

Watching the Model Improve, Round by Round

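The whole point of running many rounds is that accuracy typically climbs and loss falls from round to round, steeply at first and then leveling off. Individual rounds can still dip, especially when only a few clients participate or their data differs a lot. We usually stop when the gains per round become negligible (a stopping criterion). The chart below shows a typical curve.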

📈 Global accuracy across training rounds
[Chart: global model accuracy (y-axis, ticks at 0%, 50%, 75%, 90%, 98%) against training round (x-axis). Big improvements in the first few rounds, then a plateau with diminishing returns, which is the point to stop.]

Each dot is the accuracy of the global model measured after that round's aggregation. Most learning happens early; later rounds polish.
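
A stopping criterion of this kind can be sketched as a patience rule: stop once the best accuracy has not improved by a minimum amount over the last few rounds. The helper `should_stop`, its `patience` and `min_delta` parameters, and the accuracy values are all hypothetical, chosen only to illustrate the idea.

```python
def should_stop(history, patience=3, min_delta=0.001):
    """True once the best accuracy of the last `patience` rounds beats the
    best accuracy of all earlier rounds by less than `min_delta`."""
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent - best_before < min_delta


# Made-up per-round accuracies of the global model.
accuracies = [0.50, 0.75, 0.90, 0.95, 0.975, 0.980, 0.9802, 0.9803, 0.9803]

history = []
stopped_at = None
for t, acc in enumerate(accuracies, start=1):
    history.append(acc)
    if should_stop(history):
        stopped_at = t
        print(f"Stopping after round {t}")
        break
```

Comparing against a window of rounds rather than only the previous one keeps a single noisy round from ending training early.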

Section 08

The Knobs That Control a Round

| Hyperparameter | Symbol | What it controls | Trade-off if too high |
| --- | --- | --- | --- |
| Number of rounds | T | How many full cycles run in total. | Wasted compute and communication after convergence. |
| Client fraction | C | What share of clients participate each round. | More communication, and stragglers slow each round. |
| Local epochs | E | How long each client trains before reporting back. | Clients "drift" apart on non-IID data → unstable global model. |
| Local batch size | B | Mini-batch size during local training. | Fewer gradient steps per local epoch, so less progress per round. |
| Learning rate | η | Local step size for each gradient update. | Local models overshoot and diverge. |

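The original FedAvg paper (McMahan et al.) reported that adding more local computation per round, for example more local epochs E or smaller local batches B, can substantially cut the number of rounds needed, with smaller gains from raising the client fraction C beyond a modest level. This trades cheap local computation for expensive network communication, and that trade-off is the central economic lever of federated learning.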
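
A back-of-the-envelope calculation shows why the number of rounds dominates the bill. The sketch below assumes every selected client downloads and uploads the full, uncompressed model once per round; the function name, the model size, and both round counts are hypothetical figures for illustration, not measurements.

```python
def communication_bytes(rounds, clients_per_round, n_params, bytes_per_param=4):
    """Rough total traffic, assuming one full-model download and one
    full-model upload per selected client per round (no compression)."""
    per_client_per_round = 2 * n_params * bytes_per_param
    return rounds * clients_per_round * per_client_per_round


# Hypothetical 1M-parameter float32 model with 10 clients per round.
many_rounds = communication_bytes(rounds=500, clients_per_round=10, n_params=1_000_000)
fewer_rounds = communication_bytes(rounds=100, clients_per_round=10, n_params=1_000_000)

print(f"{many_rounds / 1e9:.0f} GB vs {fewer_rounds / 1e9:.0f} GB")  # 40 GB vs 8 GB
```

Under these assumptions traffic scales linearly with the number of rounds, so any setting that reaches the same accuracy in fewer rounds saves bandwidth in direct proportion.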

Section 09

Golden Rules for Training Rounds

🎯 Remember These
1. A round is the system-wide cycle; an epoch is local to one client. One round can contain many local epochs.
2. Data never leaves the client. Only model parameters (or deltas) travel. If raw data moves, it is not federated learning.
3. Aggregate with a weighted average (FedAvg): weight each client by its data share $n_k/n$, not equally.
4. More local epochs per round means fewer rounds but more client drift on non-IID data. Tune, don't guess.
5. Communication is usually the bottleneck, not computation. Design rounds to minimise how often the model crosses the network.
6. Stop when gains per round flatten. Extra rounds past the plateau waste battery, bandwidth, and time.

In one line: Federated learning trains a shared model through repeated training rounds — broadcast, local-train, upload, aggregate — so that many parties learn together while their data stays private and never moves.