Client Selection & Sampling in Federated Learning

A visual guide to how federated learning picks which clients train each round. Using a pollster analogy, animated diagrams, a worked probability table, comparison tables, and color-coded Python, it covers why you can't use every client, the eligibility funnel, and four sampling strategies — uniform, data-weighted, power-of-choice, and resource-aware — plus the speed-vs-fairness and participation-bias trade-offs.

Section 01

The Story That Explains Client Selection

The Pollster Who Cannot Phone Everyone
Imagine a pollster who wants to know what an entire country thinks. There are 200 million voters. Calling every single one is impossible — it would take years, cost a fortune, and most people wouldn't pick up anyway.

So the pollster does something clever: each evening she calls a small, carefully chosen sample — maybe 1,500 people — and only those who are actually home and willing to talk. The next evening, a fresh sample. Over many evenings, the rotating samples add up to a faithful picture of the whole country, at a tiny fraction of the cost.

Federated learning faces the exact same problem. A model might be trained across millions of phones, but you cannot — and should not — involve all of them in every round. Most are asleep, on battery, or off Wi-Fi. So each round the server selects and samples a small cohort of clients to do the training. Who you pick, and how, quietly decides how fast the model learns, how fair it is, and whether it ever converges at all. That is the art of client selection & sampling.
Section 02

Why Not Just Use Every Client?

In cross-device federated learning the client pool can be enormous and unreliable. Selecting a subset each round — controlled by the client fraction C — is not laziness; it is survival.

| Problem with "use everyone" | What goes wrong |
|---|---|
| Scale | Coordinating millions of devices per round is infeasible; the server becomes a bottleneck. |
| Stragglers | The round can only finish when the slowest client reports. One dead-slow phone stalls everyone. |
| Availability | Most devices are not eligible: not charging, not idle, not on un-metered Wi-Fi. |
| Communication cost | Every selected client downloads and uploads the full model. Bandwidth is the bottleneck. |
| Diminishing returns | Beyond a certain cohort size, extra clients barely improve the averaged update. |
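
The stragglers and diminishing-returns rows can be made concrete with a small simulation. This is only a sketch under stated assumptions: the log-normal client latencies and the unit-variance noise on each update are hypothetical choices for illustration, not measurements from a real deployment.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(m, trials=2000):
    """Return (mean round time, spread of the averaged update) for cohort size m."""
    # Hypothetical per-client latency in seconds; log-normal gives a few very slow devices.
    latency = rng.lognormal(mean=1.0, sigma=0.8, size=(trials, m))
    # Hypothetical scalar "update" per client: a true signal of 1.0 plus unit-variance noise.
    update = 1.0 + rng.normal(0.0, 1.0, size=(trials, m))
    # A synchronous round ends when the slowest client reports, hence the max.
    round_time = latency.max(axis=1).mean()
    # Averaging m noisy updates shrinks the spread roughly as 1 / sqrt(m).
    noise = update.mean(axis=1).std()
    return float(round_time), float(noise)

results = {m: simulate(m) for m in (10, 40, 160)}
for m, (t, s) in results.items():
    print(f"m={m:4d}  mean round time={t:6.2f}s  spread of averaged update={s:.3f}")
```

Under these assumptions, quadrupling the cohort only halves the noise in the averaged update, while the wait for the slowest client keeps growing.
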
Section 03

The Selection Funnel — From Millions to a Cohort

Before any model trains, each round filters the giant pool down through three gates: who exists, who is eligible right now, and who gets sampled.

[Figure: the client selection funnel. From top to bottom: all clients (population K, millions); eligible clients (availability filter: charging + idle + Wi-Fi); sampled cohort (sampling rule, size m = C·K).]

Only the bottom slice — the sampled cohort — actually downloads the model, trains, and uploads an update this round.
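
The three gates can be sketched in a few lines. The `Client` record, its field names, and the availability rates in the demo are hypothetical; a real system would read these signals from the device. The cohort size follows m = C·K, clipped to however many clients are eligible.

```python
import random
from dataclasses import dataclass

@dataclass
class Client:
    cid: int
    charging: bool
    idle: bool
    unmetered_wifi: bool

def eligible(c: Client) -> bool:
    """Gate 2: the availability filter (charging + idle + un-metered Wi-Fi)."""
    return c.charging and c.idle and c.unmetered_wifi

def sample_cohort(clients, C, rng):
    """Gate 3: sample m = C*K clients uniformly from the eligible pool."""
    K = len(clients)                       # gate 1: everyone who exists
    m = max(1, round(C * K))               # target cohort size
    pool = [c for c in clients if eligible(c)]
    m = min(m, len(pool))                  # cannot pick more than are eligible
    return rng.sample(pool, m)             # uniform, without replacement

rng = random.Random(0)
# Hypothetical availability rates: 30% charging, 50% idle, 60% on un-metered Wi-Fi.
clients = [Client(i, rng.random() < 0.3, rng.random() < 0.5, rng.random() < 0.6)
           for i in range(10_000)]
cohort = sample_cohort(clients, C=0.01, rng=rng)
n_eligible = sum(eligible(c) for c in clients)
print(f"population={len(clients)}  eligible={n_eligible}  cohort={len(cohort)}")
```
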

Section 04

How the Cohort Is Chosen — Sampling Strategies

Among the eligible clients, several rules decide who actually trains. The animation shows a pool where a rotating cohort lights up amber each "round."

[Animation: a rotating sampled cohort across rounds. Legend: idle client (not selected), sampled this round, coordinating server.]

Across rounds the cohort rotates, so over time many clients contribute — just never all at once.

Four common sampling rules
Uniform
Pick m eligible clients at random, each equally likely. The FedAvg default — simple and unbiased in expectation.
Weighted
Sample with probability proportional to data size n_k, so data-rich clients appear more often.
Power-of-choice
Draw a larger candidate set, query each candidate for its current local loss, and keep the highest-loss clients, so training goes where the model is weakest.
Resource-aware
Prefer fast, well-connected, charged devices to cut straggler delay — but watch for systematic bias against slower phones.
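
Resource-aware selection could look like the sketch below. It is one possible design, not a standard algorithm: the `speeds` scores and the `epsilon` mixing weight are hypothetical. Mixing the speed-proportional distribution with a uniform one keeps a floor under every client's probability, which is one way to limit the bias against slower phones.

```python
import numpy as np

def resource_aware_probs(speeds, epsilon=0.2):
    """Selection probabilities: mostly proportional to speed, partly uniform."""
    speeds = np.asarray(speeds, dtype=float)
    K = len(speeds)
    by_speed = speeds / speeds.sum()       # favours fast, well-connected devices
    uniform = np.full(K, 1.0 / K)          # gives every device the same chance
    return (1.0 - epsilon) * by_speed + epsilon * uniform

def select_resource_aware(speeds, m, epsilon=0.2, rng=None):
    """Return indices of m distinct clients drawn with the mixed probabilities."""
    if rng is None:
        rng = np.random.default_rng()
    p = resource_aware_probs(speeds, epsilon)
    m = min(m, len(p))
    return rng.choice(len(p), size=m, replace=False, p=p)

speeds = [50, 40, 30, 5, 1]                # hypothetical throughput scores
p = resource_aware_probs(speeds)
picks = select_resource_aware(speeds, m=2, rng=np.random.default_rng(0))
print("probabilities:", np.round(p, 3))
print("picked clients:", sorted(picks.tolist()))
```

Setting `epsilon=0` recovers pure speed-proportional sampling, and `epsilon=1` recovers uniform sampling.
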
Section 05

A Worked Example — Selection Probabilities

Suppose the eligible pool holds K = 5 clients (here K counts eligible clients, not the whole population) and we sample m = 2 per round. Compare uniform and data-weighted selection. Under uniform sampling, each client's chance of being picked in a round is m / K = 2 / 5 = 40%, regardless of its data size.

Uniform per-round pick chance
P(k) = m / K
Every eligible client is equally likely. Data size is ignored at selection time.
Weighted selection probability
P(k) = n_k / Σ_j n_j
Each draw favours clients holding more data, n_k.
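| Client | Samples n_k | Uniform P | Weighted P (per draw) |
|---|---|---|---|
| A | 500 | 0.40 | 0.50 |
| B | 250 | 0.40 | 0.25 |
| C | 150 | 0.40 | 0.15 |
| D | 70 | 0.40 | 0.07 |
| E | 30 | 0.40 | 0.03 |
| Total | 1,000 | | 1.00 |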
Reading the table
Uniform
Tiny client E is sampled as often as huge client A. Fair to clients, but the averaged update may be noisy.
Weighted
On each draw, A is about 17× more likely to be picked than E (0.50 vs 0.03), matching its share of the data. Convergence tends to be faster, but E rarely contributes.
Trade-off
Speed vs fairness. Weighted learns faster; uniform gives small clients a voice. The right choice depends on your goal.
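
The table can be reproduced in a few lines, and the same script can turn the per-draw weights into a per-round chance of appearing in the cohort. That last step is a sketch that assumes the m = 2 clients are drawn one after another without replacement, with the weights renormalised after the first draw.

```python
import numpy as np

names = ["A", "B", "C", "D", "E"]
n = np.array([500, 250, 150, 70, 30], dtype=float)   # samples per client
K, m = len(n), 2

uniform_p = np.full(K, m / K)     # per-round pick chance under uniform sampling
per_draw = n / n.sum()            # weighted probability on a single draw

# Per-round inclusion chance for weighted sampling with m = 2:
# P(k drawn first) + sum over j != k of P(j drawn first) * P(k drawn second | j first).
incl = per_draw.copy()
for k in range(K):
    for j in range(K):
        if j != k:
            incl[k] += per_draw[j] * per_draw[k] / (1.0 - per_draw[j])

for name, u, w, r in zip(names, uniform_p, per_draw, incl):
    print(f"{name}: uniform={u:.2f}  weighted per draw={w:.2f}  weighted per round={r:.3f}")
print(f"A vs E per draw: {per_draw[0] / per_draw[4]:.1f}x")
```

Under that assumption A lands in roughly 81% of rounds and E in roughly 8%, so the per-round gap is closer to 10× than to the per-draw 17×.
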
Section 06

Client Selection in Python — Three Strategies

One function, a strategy switch. Each returns the indices of the clients chosen for this round — the cohort the server will then broadcast to.

import numpy as np

def select_clients(sizes, m, strategy="uniform",
                   losses=None, candidate_factor=3):
    """Return indices of m clients chosen from the eligible pool.

    sizes    : list of n_k (samples per eligible client)
    m        : cohort size to select this round
    strategy : 'uniform' | 'weighted' | 'power_of_choice'
    losses   : current local loss per client (for power-of-choice)
    """
    K     = len(sizes)
    sizes = np.array(sizes, dtype=float)
    m     = min(m, K)                       # never ask for more than exist

    if strategy == "uniform":
        # every eligible client equally likely
        return np.random.choice(K, m, replace=False)

    if strategy == "weighted":
        # probability ∝ data size n_k
        p = sizes / sizes.sum()
        return np.random.choice(K, m, replace=False, p=p)

    if strategy == "power_of_choice":
        # validate first so a missing argument fails before any sampling
        if losses is None:
            raise ValueError("power_of_choice needs per-client losses")
        # 1) draw a larger candidate set weighted by data
        d  = min(K, candidate_factor * m)
        p  = sizes / sizes.sum()
        cand = np.random.choice(K, d, replace=False, p=p)
        # 2) keep the m candidates with the HIGHEST local loss
        loss = np.asarray(losses, dtype=float)[cand]
        top  = cand[np.argsort(loss)[::-1][:m]]   # train where weakest
        return top

    raise ValueError(f"unknown strategy: {strategy}")


# ─── Demo over a few rounds ──────────────────────────────────
np.random.seed(0)
sizes  = [500, 250, 150, 70, 30]      # 5 eligible clients
losses = [0.2, 0.9, 0.4, 0.7, 0.1]      # pretend current losses

for strat in ("uniform", "weighted", "power_of_choice"):
    picks = select_clients(sizes, m=2, strategy=strat, losses=losses)
    print(f"{strat:16s} → clients {sorted(picks.tolist())}")
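Output (illustrative: the uniform and weighted picks depend on the seed and the NumPy version; the power_of_choice pick is deterministic in this demo because candidate_factor * m = 6 exceeds K = 5, so every client becomes a candidate)
uniform          → clients [2, 4]
weighted         → clients [0, 1]
power_of_choice  → clients [1, 3]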

See how weighted leans toward the data-rich clients 0 and 1, while power_of_choice homes in on clients 1 and 3 — the ones with the highest current loss, where the model most needs work.
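
With only five clients the candidate set covers the whole pool, so the effect of its size is hidden. The sketch below uses a larger, entirely hypothetical pool (random sizes and losses) to show how the candidate count `d` moves power-of-choice between plain data-weighted sampling (`d = m`) and a greedy pick of the highest-loss clients (`d = K`).

```python
import numpy as np

def power_of_choice(sizes, losses, m, d, rng):
    """Draw d candidates with probability proportional to size; keep the m with highest loss."""
    sizes = np.asarray(sizes, dtype=float)
    losses = np.asarray(losses, dtype=float)
    m = min(m, len(sizes))
    d = max(m, min(d, len(sizes)))          # candidate set: at least m, at most the pool
    cand = rng.choice(len(sizes), size=d, replace=False, p=sizes / sizes.sum())
    return cand[np.argsort(losses[cand])[::-1][:m]]

rng = np.random.default_rng(0)
K, m = 50, 5
sizes = rng.integers(20, 500, size=K)       # hypothetical n_k values
losses = rng.uniform(0.1, 1.0, size=K)      # hypothetical current local losses

mean_loss = {}
for d in (m, 15, K):
    picked = [losses[power_of_choice(sizes, losses, m, d, rng)].mean() for _ in range(500)]
    mean_loss[d] = float(np.mean(picked))
    print(f"d={d:2d}  mean loss of selected cohort={mean_loss[d]:.3f}")
```

A larger `d` pushes the cohort toward the highest-loss clients, at the price of more loss queries per round and a narrower set of participants.
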

Section 07

Choosing a Strategy — Trade-offs at a Glance

| Strategy | Picks by | Strength | Risk |
|---|---|---|---|
| Uniform random | Equal chance | Unbiased, dead simple | Slower; noisy averaged update |
| Data-weighted | Sample size n_k | Faster convergence | Small clients rarely heard (fairness) |
| Power-of-choice | Highest local loss | Targets weak spots; fewer rounds | Needs loss queries; can over-focus |
| Resource-aware | Speed / battery / Wi-Fi | Fewer stragglers; reliable | Systematically excludes slow devices → bias |
Section 08

Golden Rules for Client Selection & Sampling

🎯 Remember These
1. Never select everyone. Sample a cohort each round: scale, stragglers, and bandwidth demand it.
2. Filter for eligibility first (charging, idle, un-metered Wi-Fi), then sample from what remains.
3. Uniform sampling is unbiased and a safe default; reach for weighted or power-of-choice only when you need faster convergence.
4. Watch for participation bias: if slow or rural devices are always skipped, the model quietly learns only the fast majority.
5. Bigger cohorts help, until they don't. Past a point, extra clients bring diminishing returns at a real communication cost.
6. Rotate over time so most clients eventually contribute; selection is about who trains this round, not who ever trains.
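
Rules 4 and 6 can be checked numerically by counting how many clients are picked at least once over a run. Under uniform sampling from a fixed pool, the expected coverage after R rounds is 1 - (1 - m/K)^R. The sketch below compares that formula with a simulation and with data-weighted sampling; the pool size, the number of rounds, and the skewed size distribution are hypothetical.

```python
import numpy as np

def coverage(K, m, rounds, rng, p=None):
    """Fraction of K clients picked at least once over the given number of rounds."""
    seen = np.zeros(K, dtype=bool)
    for _ in range(rounds):
        # p=None means uniform sampling; otherwise p gives per-draw weights.
        seen[rng.choice(K, size=m, replace=False, p=p)] = True
    return float(seen.mean())

rng = np.random.default_rng(0)
K, m, rounds = 1000, 50, 40

sizes = rng.lognormal(mean=4.0, sigma=1.5, size=K)   # hypothetical, heavily skewed n_k
expected_uniform = 1.0 - (1.0 - m / K) ** rounds
cov_uniform = coverage(K, m, rounds, rng)
cov_weighted = coverage(K, m, rounds, rng, p=sizes / sizes.sum())

print(f"uniform:  expected={expected_uniform:.3f}  simulated={cov_uniform:.3f}")
print(f"weighted: simulated={cov_weighted:.3f}")
```

A coverage figure well below the uniform baseline is a cheap early warning that some clients are being left out of training.
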

In one line: Client selection & sampling decides which small cohort trains in each round — balancing convergence speed, fairness, and the cost of stragglers and bandwidth, so a model can learn across millions of devices without ever touching all of them at once.