Federated Learning: Training Rounds Explained

Section 01

The Story That Explains Federated Learning

📖 Real World Analogy

The Five Hospitals That Could Never Share a File

Imagine five hospitals in five different cities. Each treats thousands of patients and each wants to build a smarter model that can spot a rare disease early. The catch: no hospital is allowed to email its patient records to anyone. Privacy law forbids it, ethics forbid it, and frankly the patients would never agree.

For years this meant five small, mediocre models — each starved of data. Then someone proposed a clever ritual. "Nobody sends data. Instead, every Monday morning, a central coordinator emails out one shared model. Each hospital trains that copy overnight on its own private records. On Tuesday, they send back only the lessons the model learned — the adjusted dial settings, never the patients. The coordinator blends all five sets of lessons into one improved model, and the cycle repeats next Monday."

Each weekly Monday-to-Tuesday cycle is a training round. The data never moves. Only the model travels. After enough rounds, the shared model can come close to one trained on every hospital's data at once, yet no record ever left its building. That, in one story, is Federated Learning, and the heartbeat of the whole thing is the training round.

Section 02

Centralized vs. Federated — Where the Data Lives

In classical (centralized) training, all data is gathered into one place and the model learns from the pile. In federated learning, the data stays put on each device or organisation (a client), and only model parameters move to and from a central server. The diagram below shows the same model living in two worlds.

🗺 Data flow — Centralized vs Federated

Left: raw records are pulled to one server (a privacy risk). Right: records never move — the model is sent out, trained locally, and only its parameters return.

Section 03

The Training Round — the Heartbeat of Federated Learning

A training round (sometimes called a communication round or global round) is one complete cycle of sending the shared model out, training it locally, sending the updates back, and merging them into an improved shared model. Federated learning is simply this round repeated dozens or hundreds of times until the model stops improving. The animation below is one round, looping forever.

🔄 One federated training round (looping)

Step 1–2: broadcast model out
Step 3: train locally on private data
Step 4: upload updates back
Step 5: aggregate into new model

One full loop = one round. The global model that comes out becomes the input to the next round.

Section 04

Anatomy of a Single Round — the Five Steps

Every round, no matter the algorithm, walks through the same five steps. The server orchestrates; the clients do the heavy local lifting.

The 5 steps inside round t

1 · Select

The server picks a subset of available clients (e.g. 10 of 1,000 phones that are charging on Wi-Fi). It rarely uses all of them.

2 · Broadcast

The server sends the current global model weights w_t to every selected client.

3 · Local train

Each client trains on its own private data for a few local epochs, producing its own updated weights w^k_{t+1}.

4 · Upload

Clients send back only their updated weights (or the delta). No raw data is ever transmitted.

5 · Aggregate

The server merges all updates into a new global model w_{t+1}, usually a weighted average (FedAvg). The round ends; t → t+1.
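Step 4 says clients may upload either their full updated weights or only the delta from the broadcast model. The sketch below suggests the two are interchangeable for the server, assuming the same sample-count weighting is used in both cases. The function names `aggregate_weights` and `aggregate_deltas` and all numbers are illustrative assumptions, not part of any library.

```python
import numpy as np

def aggregate_weights(client_weights, client_sizes):
    """Weighted average of full client weight vectors."""
    n = sum(client_sizes)
    return sum((nk / n) * wk for wk, nk in zip(client_weights, client_sizes))

def aggregate_deltas(global_w, client_deltas, client_sizes):
    """Add the weighted average of client deltas to the broadcast model."""
    n = sum(client_sizes)
    avg_delta = sum((nk / n) * dk for dk, nk in zip(client_deltas, client_sizes))
    return global_w + avg_delta

# Hypothetical broadcast model and three client results
global_w = np.array([0.5, -0.2])
client_weights = [np.array([0.8, -0.1]),
                  np.array([0.6, -0.4]),
                  np.array([0.3, 0.0])]
client_sizes = [5000, 3000, 2000]

# A delta is simply "what changed" relative to the broadcast model
client_deltas = [wk - global_w for wk in client_weights]

from_weights = aggregate_weights(client_weights, client_sizes)
from_deltas = aggregate_deltas(global_w, client_deltas, client_sizes)
print(np.allclose(from_weights, from_deltas))  # the two routes agree
```

The two agree because the shares n_k/n sum to 1, so the broadcast model cancels out of the weighted sum of deltas.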

Don't confuse a round with an epoch

This is the single most common point of confusion for newcomers.

Term	Where it happens	What one unit means
Local epoch	Inside one client	One full pass over that client's local dataset.
Round (global round)	Across the whole system	One full broadcast → local-train → upload → aggregate cycle.

So a single round may contain, say, E = 5 local epochs per client. More local epochs per round means less communication but a higher risk of clients "drifting" apart on non-identical data. Tuning this balance is the core craft of federated learning.
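The drift mentioned above can be seen in a toy setting. This is a minimal sketch, assuming each client's loss is a one-dimensional quadratic with its own optimum (a stand-in for non-identical data). The helper names `local_gd` and `spread_after_local_training` and the client values are made up for illustration.

```python
def local_gd(w, a, c, lr, epochs):
    """Run `epochs` gradient steps on the loss f(w) = 0.5 * a * (w - c)**2."""
    for _ in range(epochs):
        w = w - lr * a * (w - c)   # gradient of f is a * (w - c)
    return w

# Two hypothetical clients whose data pull the weight in opposite directions:
# each tuple is (curvature a, local optimum c)
clients = [(1.0, -1.0), (1.0, 3.0)]
global_w = 1.0   # the model broadcast at the start of the round
lr = 0.1

def spread_after_local_training(epochs):
    """Gap between the client models after `epochs` local epochs."""
    local = [local_gd(global_w, a, c, lr, epochs) for a, c in clients]
    return max(local) - min(local)

for E in (1, 5, 20):
    print(f"E = {E:2d}  spread between client models = "
          f"{spread_after_local_training(E):.3f}")
```

Both clients start from the same broadcast weight, yet the gap between their local models widens as E grows, because each one moves further toward its own optimum before the server averages them.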

Section 05

A Worked Example — One Round of FedAvg by Hand

The classic aggregation rule is Federated Averaging (FedAvg). The server doesn't just take a plain average of client models — it takes a weighted average, where each client's weight is proportional to how much data it has. A hospital with 5,000 patients should count more than a clinic with 500.

FedAvg Aggregation

w_{t+1} = Σ_k (n_k / n) · w^k_{t+1}

New global weights = sum of each client's weights, scaled by its share of the total data n_k/n.

Total samples

n = Σ_k n_k

The total number of training samples across all participating clients in this round.

Suppose three hospitals join round t. For simplicity, imagine the model is a single number (one weight). After local training, each returns its own value:

Client	Samples n_k	Local weight w^k	Share n_k/n	Contribution (share × weight)
Hospital A	5,000	0.80	0.500	0.400
Hospital B	3,000	0.60	0.300	0.180
Hospital C	2,000	0.30	0.200	0.060
Total	10,000	—	1.000	0.640

Computing the new global weight

Weighted

w_{t+1} = 0.400 + 0.180 + 0.060 = 0.640

Compare

A plain (unweighted) average would give (0.80 + 0.60 + 0.30) / 3 = 0.567 — which wrongly lets tiny Hospital C count as much as huge Hospital A.

Result

The new global weight 0.640 is broadcast in round t+1, and the whole cycle repeats.
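The arithmetic in the table can be checked with a few lines of plain Python. This is only a sketch that re-computes the numbers above; the dictionary names `samples` and `local_w` are chosen here for illustration.

```python
# Values copied from the worked example table
samples = {"Hospital A": 5000, "Hospital B": 3000, "Hospital C": 2000}
local_w = {"Hospital A": 0.80, "Hospital B": 0.60, "Hospital C": 0.30}

n = sum(samples.values())                        # total samples this round

# FedAvg: each client's weight is scaled by its share of the data
weighted = sum(samples[k] / n * local_w[k] for k in samples)

# Plain average: every client counts equally, regardless of size
plain = sum(local_w.values()) / len(local_w)

print(f"weighted (FedAvg): {weighted:.3f}")      # 0.640
print(f"plain average:     {plain:.3f}")         # 0.567
```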

Section 06

Implementing a Training Round in Python (from scratch)

Here is a minimal, dependency-light FedAvg loop. It makes the round structure crystal clear: the outer loop is rounds, the inner work is local training, and the server step is weighted aggregation.

import numpy as np

# ─── A toy "model" is just a vector of weights ───────────────
def local_train(global_w, client_data, lr=0.1, local_epochs=5):
    """Each client trains a COPY of the global model on its own data."""
    w = global_w.copy()                 # never mutate the global model
    X, y = client_data
    for _ in range(local_epochs):     # several local epochs run inside ONE round
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w


def fedavg(client_weights, client_sizes):
    """Server aggregation: weighted average by sample count (FedAvg)."""
    n = sum(client_sizes)                # total samples this round
    new_w = sum((nk / n) * wk
                for wk, nk in zip(client_weights, client_sizes))
    return new_w


# ─── THE FEDERATED TRAINING LOOP ─────────────────────────────
def federated_training(clients, n_features, rounds=20, frac=0.5):
    global_w = np.zeros(n_features)        # start from a blank model

    for t in range(rounds):              # <<< each t is ONE ROUND >>>
        # 1. SELECT a subset of clients
        m = max(1, int(frac * len(clients)))
        selected = np.random.choice(len(clients), m, replace=False)

        updates, sizes = [], []
        for k in selected:
            data = clients[k]
            # 2. BROADCAST  +  3. LOCAL TRAIN
            wk = local_train(global_w, data)
            # 4. UPLOAD the updated weights (not the data!)
            updates.append(wk)
            sizes.append(len(data[1]))

        # 5. AGGREGATE into the new global model
        global_w = fedavg(updates, sizes)
        print(f"Round {t+1:2d}/{rounds}  |  ‖w‖ = {np.linalg.norm(global_w):.4f}")

    return global_w


# ─── Demo: 4 clients, each with its own private data ─────────
np.random.seed(0)
true_w  = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(4):
    X = np.random.randn(200, 3)
    y = X @ true_w + 0.05 * np.random.randn(200)
    clients.append((X, y))

final = federated_training(clients, n_features=3, rounds=20)
print("Learned :", np.round(final, 3))
print("True    :", true_w)

▶ Output (abridged to rounds 1, 5, 10 and 20; exact values may vary)

Round 1/20 | ‖w‖ = 0.6231
Round 5/20 | ‖w‖ = 1.9874
Round 10/20 | ‖w‖ = 2.2741
Round 20/20 | ‖w‖ = 2.2906
Learned : [ 1.997 -0.998 0.501]
True : [ 2. -1. 0.5]

Notice that the learned weights land very close to the true weights [2, -1, 0.5] even though no client ever shared its X or y. Only the trained weight vectors crossed the network.

Section 07

Watching the Model Improve, Round by Round

The whole point of running many rounds is that accuracy generally climbs and loss generally falls from round to round, steeply at first and then leveling off. We usually stop when the gains per round become negligible (a stopping criterion). The animated chart shows a typical curve.

📈 Global accuracy across training rounds

Chart labels: "Global model accuracy"; "Big improvements in the first few rounds"; "Diminishing returns, time to stop".

Each dot is the accuracy of the global model measured after that round's aggregation. Most learning happens early; later rounds polish.
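One way to turn "stop when the gains become negligible" into code is a patience rule. This is a minimal sketch under assumptions: the function `should_stop`, its `patience` and `min_delta` parameters, and the accuracy values are all hypothetical, and other stopping criteria are equally reasonable.

```python
def should_stop(history, patience=3, min_delta=0.001):
    """Return True when the best accuracy of the last `patience` rounds
    beats the best earlier accuracy by less than `min_delta`."""
    if len(history) <= patience:
        return False                      # not enough rounds to judge yet
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent - best_before < min_delta

# Made-up per-round accuracies: steep early gains, then a plateau
accuracies = [0.52, 0.68, 0.77, 0.82, 0.845, 0.851, 0.8513, 0.8515, 0.8516]

history = []
stopped_at = None
for round_no, acc in enumerate(accuracies, start=1):
    history.append(acc)                   # accuracy measured after aggregation
    if should_stop(history):
        stopped_at = round_no
        break

print("stopped after round", stopped_at)  # stopped after round 9
```

A larger `patience` tolerates noisy rounds at the cost of running a little longer past the plateau.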

Section 08

The Knobs That Control a Round

Hyperparameter	Symbol	What it controls	Trade-off if too high
Number of rounds	T	How many full cycles run in total.	Wasted compute & communication after convergence.
Client fraction	C	What share of clients participate each round.	More communication & stragglers slow each round.
Local epochs	E	How long each client trains before reporting back.	Clients "drift" apart on non-IID data → unstable global model.
Local batch size	B	Mini-batch size during local training.	Fewer gradient updates per local epoch, so less local progress per round (a very small B gives noisier gradients instead).
Learning rate	η	Local step size each gradient update.	Local models overshoot & diverge.

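The famous FedAvg paper showed that doing more local computation per round, for example more local epochs E or smaller batches B, can dramatically cut the number of rounds needed, while raising the client fraction C tends to help with diminishing returns. The result is a trade of cheap local computation for expensive network communication. That trade-off is the central economic lever of federated learning.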
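To see why the number of rounds matters so much, a back-of-the-envelope traffic estimate helps. This is a rough sketch, assuming every selected client downloads and uploads the full model once per round with no compression. The function `communication_bytes`, the model size, and the round counts are hypothetical figures, not measurements.

```python
def communication_bytes(rounds, clients_per_round, n_params, bytes_per_param=4):
    """Total traffic if each selected client downloads and uploads
    the full model once per round (4 bytes per float32 parameter)."""
    per_client_per_round = 2 * n_params * bytes_per_param   # download + upload
    return rounds * clients_per_round * per_client_per_round

# Hypothetical setup: 1,000,000-parameter model, 10 clients per round
few_local_epochs = communication_bytes(rounds=200, clients_per_round=10,
                                       n_params=1_000_000)
more_local_epochs = communication_bytes(rounds=50, clients_per_round=10,
                                        n_params=1_000_000)

print(f"200 rounds: {few_local_epochs / 1e9:.1f} GB")   # 16.0 GB
print(f" 50 rounds: {more_local_epochs / 1e9:.1f} GB")  # 4.0 GB
```

Under these assumptions traffic scales linearly with the number of rounds, so any setting that reaches the same accuracy in fewer rounds saves bandwidth in direct proportion.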

Section 09

Golden Rules for Training Rounds

🎯 Remember These

A round is the system-wide cycle; an epoch is local to one client. One round can contain many local epochs.

Data never leaves the client. Only model parameters (or deltas) travel. If raw data moves, it is not federated learning.

Aggregate with a weighted average (FedAvg) — weight each client by its data size n_k/n, not equally.

More local epochs per round = fewer rounds but more client drift on non-IID data. Tune, don't guess.

Communication is usually the bottleneck, not computation. Design rounds to minimise how often the model crosses the network.

Stop when gains per round flatten. Extra rounds past the plateau waste battery, bandwidth, and time.

In one line: Federated learning trains a shared model through repeated training rounds — broadcast, local-train, upload, aggregate — so that many parties learn together while their data stays private and never moves.