Client Selection & Sampling in Federated Learning

Section 01

The Story That Explains Client Selection

📖 Real World Analogy

The Pollster Who Cannot Phone Everyone

Imagine a pollster who wants to know what an entire country thinks. There are 200 million voters. Calling every single one is impossible — it would take years, cost a fortune, and most people wouldn't pick up anyway.

So the pollster does something clever: each evening she calls a small, carefully chosen sample — maybe 1,500 people — and only those who are actually home and willing to talk. The next evening, a fresh sample. Over many evenings, the rotating samples add up to a faithful picture of the whole country, at a tiny fraction of the cost.

Federated learning faces the exact same problem. A model might be trained across millions of phones, but you cannot — and should not — involve all of them in every round. Most are asleep, on battery, or off Wi-Fi. So each round the server selects and samples a small cohort of clients to do the training. Who you pick, and how, quietly decides how fast the model learns, how fair it is, and whether it ever converges at all. That is the art of client selection & sampling.

Section 02

Why Not Just Use Every Client?

In cross-device federated learning the client pool can be enormous and unreliable. Selecting a subset each round — controlled by the client fraction C — is not laziness; it is survival.

Problem with "use everyone"	What goes wrong
Scale	Coordinating millions of devices per round is infeasible; the server becomes a bottleneck.
Stragglers	A synchronous round can only finish when the slowest selected client reports. One dead-slow phone stalls everyone.
Availability	Most devices are not eligible — not charging, not idle, not on un-metered Wi-Fi.
Communication cost	Every selected client downloads and uploads the full model. Bandwidth is the bottleneck.
Diminishing returns	Beyond a certain cohort size, extra clients barely improve the averaged update.
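As a rough sketch of how the client fraction C turns into a cohort size, the original FedAvg formulation takes m = max(C · K, 1), where K is the number of clients. The helper name `cohort_size`, the rounding choice, and the cap at K below are assumptions made for illustration.

```python
def cohort_size(num_clients: int, fraction: float) -> int:
    """Cohort size m = max(round(C * K), 1), capped at K (illustrative helper)."""
    if num_clients <= 0:
        raise ValueError("need at least one client")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("client fraction C must lie in [0, 1]")
    # Rounding to the nearest integer is an assumption; the floor of 1
    # guarantees that at least one client trains even when C is tiny.
    m = max(int(round(fraction * num_clients)), 1)
    return min(m, num_clients)


for C in (0.0, 0.01, 0.1, 1.0):
    print(f"C = {C:<4} -> m = {cohort_size(1000, C)}")
```

With C = 0 the floor of one client still applies, so a round never runs with an empty cohort.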

Section 03

The Selection Funnel — From Millions to a Cohort

Before any model trains, each round filters the giant pool down through three gates: who exists, who is eligible right now, and who gets sampled.

Figure: the client selection funnel

Only the bottom slice — the sampled cohort — actually downloads the model, trains, and uploads an update this round.
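The three gates can be sketched in a few lines. This is a toy simulation, not a production scheduler: the `Device` class, the eligibility flags, and the availability rates (30% charging, 50% idle, 60% on un-metered Wi-Fi) are all made-up assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Device:
    device_id: int
    charging: bool
    idle: bool
    unmetered_wifi: bool


def is_eligible(d: Device) -> bool:
    # Gate 2: eligible only if every condition holds right now.
    return d.charging and d.idle and d.unmetered_wifi


def run_funnel(population, m, rng):
    """Gate 1 is the population itself; gate 2 filters; gate 3 samples."""
    eligible = [d for d in population if is_eligible(d)]
    # Uniform sampling without replacement from the eligible pool.
    cohort = rng.sample(eligible, min(m, len(eligible)))
    return eligible, cohort


rng = random.Random(0)
# Hypothetical availability rates, chosen only for the demo.
population = [
    Device(i, rng.random() < 0.3, rng.random() < 0.5, rng.random() < 0.6)
    for i in range(10_000)
]
eligible, cohort = run_funnel(population, m=100, rng=rng)
print(f"population {len(population)} -> eligible {len(eligible)} -> cohort {len(cohort)}")
```

Only the devices in `cohort` would go on to download the model this round.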

Section 04

How the Cohort Is Chosen — Sampling Strategies

Among the eligible clients, several rules decide who actually trains. The animation shows a pool where a rotating cohort lights up amber each "round."

🎯 A rotating sampled cohort across rounds (animated)

Legend: idle client (not selected), sampled this round, coordinating server.

Across rounds the cohort rotates, so over time many clients contribute — just never all at once.
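To see how quickly a rotating cohort reaches the whole pool, here is a small simulation under simplifying assumptions: every client is always eligible and each round draws a fresh uniform cohort. The function name `coverage_over_rounds` and the pool and cohort sizes are illustrative.

```python
import numpy as np


def coverage_over_rounds(num_clients, m, rounds, seed=0):
    """Fraction of clients that have trained at least once, after each round."""
    rng = np.random.default_rng(seed)
    seen = np.zeros(num_clients, dtype=bool)
    coverage = []
    for _ in range(rounds):
        # Fresh uniform cohort of size m, no repeats within a round.
        cohort = rng.choice(num_clients, size=m, replace=False)
        seen[cohort] = True
        coverage.append(float(seen.mean()))
    return coverage


cov = coverage_over_rounds(num_clients=1000, m=50, rounds=100)
for r in (1, 10, 50, 100):
    print(f"after {r:3d} rounds: {cov[r - 1]:.0%} of clients have trained at least once")
```

Under these assumptions the expected coverage after r rounds is 1 − (1 − m/K)^r, so each round touches only 5% of the pool yet nearly every client has contributed after 100 rounds.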

Four common sampling rules

Uniform

Pick m eligible clients at random, each equally likely. The FedAvg default — simple and unbiased in expectation.

Weighted

Sample with probability proportional to data size n_k, so data-rich clients appear more often.

Power-of-choice

Draw a larger candidate set, ask their current local loss, keep the highest-loss clients — train where the model is weakest.

Resource-aware

Prefer fast, well-connected, charged devices to cut straggler delay — but watch for systematic bias against slower phones.
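The resource-aware rule is the only one of the four not covered by the Python listing later in this piece, so here is one possible sketch. It prefers the fastest devices but reserves a few slots for randomly chosen slower ones, to soften the bias mentioned above. The function `select_resource_aware`, the `explore_frac` parameter, and the per-device time estimates are hypothetical and not taken from any specific system.

```python
import numpy as np


def select_resource_aware(est_round_time, m, explore_frac=0.2, rng=None):
    """Pick mostly fast devices, plus a few random slower ones (illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    est = np.asarray(est_round_time, dtype=float)
    m = min(m, len(est))
    n_explore = min(int(round(explore_frac * m)), m)
    n_fast = m - n_explore

    order = np.argsort(est)          # indices sorted fastest first
    fast = order[:n_fast]            # deterministic: the quickest devices
    rest = order[n_fast:]            # everyone not already taken
    if n_explore > 0:
        # Random slots give slower devices a chance to participate.
        explore = rng.choice(rest, size=n_explore, replace=False)
    else:
        explore = np.empty(0, dtype=order.dtype)
    return np.concatenate([fast, explore])


# Hypothetical estimated seconds per round for 8 eligible devices.
times = [2, 3, 40, 5, 60, 4, 90, 6]
picks = select_resource_aware(times, m=4, explore_frac=0.25,
                              rng=np.random.default_rng(0))
print("selected clients:", sorted(picks.tolist()))
```

Setting `explore_frac=0` gives the purely greedy rule, which is where the systematic bias against slow phones is strongest.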

Section 05

A Worked Example — Selection Probabilities

Suppose K = 5 eligible clients and we sample m = 2 per round without replacement. Compare uniform vs data-weighted selection. Under uniform, each client's chance of being picked in a round is m / K = 2 / 5 = 40%, regardless of size.

Uniform per-round pick chance

P(k) = m / K

Every eligible client is equally likely. Data size is ignored at selection time.

Weighted selection probability

P(k) = n_k / Σ_j n_j

Each draw favours clients holding more data, n_k.

Client	Samples n_k	Uniform P	Weighted P (per draw)
A	500	0.40	0.50
B	250	0.40	0.25
C	150	0.40	0.15
D	70	0.40	0.07
E	30	0.40	0.03
Total	1,000	—	1.00

Reading the table

Uniform

Tiny client E is sampled as often as huge client A. Fair to clients, but the averaged update may be noisy.

Weighted

Per draw, A is about 17× more likely to be picked than E (0.50 vs 0.03), matching its data dominance. Convergence is often faster, but E rarely contributes.

Trade-off

Speed vs fairness. Weighted sampling often learns faster; uniform gives small clients a voice. The right choice depends on your goal.
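The table's numbers can be reproduced, and extended from per-draw to per-round chances, with a short calculation. The per-round figure below assumes the two weighted draws are made one after the other without replacement, renormalising over the clients not yet picked; that sampling scheme is an assumption of this sketch.

```python
import numpy as np

names = ["A", "B", "C", "D", "E"]
sizes = np.array([500, 250, 150, 70, 30], dtype=float)
K, m = len(sizes), 2

uniform_p = np.full(K, m / K)        # per-round chance under uniform sampling
per_draw_p = sizes / sizes.sum()     # weighted chance on a single draw

# Chance of landing in a weighted cohort of size 2:
# picked first, or picked second after some other client j went first.
incl = per_draw_p.copy()
for k in range(K):
    for j in range(K):
        if j != k:
            incl[k] += per_draw_p[j] * per_draw_p[k] / (1.0 - per_draw_p[j])

for n, u, p, q in zip(names, uniform_p, per_draw_p, incl):
    print(f"{n}: uniform {u:.2f} | weighted per draw {p:.2f} | weighted per round {q:.3f}")
```

The per-round chances sum to m = 2, as they must for a cohort of fixed size. The per-round gap between A and E is narrower than the per-draw ratio, because a client can be picked at most once per round.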

Section 06

Client Selection in Python — Three Strategies

One function, a strategy switch. Each returns the indices of the clients chosen for this round — the cohort the server will then broadcast to.

import numpy as np

def select_clients(sizes, m, strategy="uniform",
                   losses=None, candidate_factor=3):
    """Return indices of m clients chosen from the eligible pool.

    sizes    : list of n_k (samples per eligible client)
    m        : cohort size to select this round
    strategy : 'uniform' | 'weighted' | 'power_of_choice'
    losses   : current local loss per client (for power-of-choice)
    candidate_factor : power-of-choice draws candidate_factor * m candidates
    """
    K     = len(sizes)
    sizes = np.array(sizes, dtype=float)
    m     = min(m, K)                       # never ask for more than exist

    if strategy == "uniform":
        # every eligible client equally likely
        return np.random.choice(K, m, replace=False)

    if strategy == "weighted":
        # probability ∝ data size n_k
        p = sizes / sizes.sum()
        return np.random.choice(K, m, replace=False, p=p)

    if strategy == "power_of_choice":
        # fail fast, before any random numbers are drawn
        if losses is None:
            raise ValueError("power_of_choice needs per-client losses")
        # 1) draw a larger candidate set weighted by data
        d  = min(K, candidate_factor * m)
        p  = sizes / sizes.sum()
        cand = np.random.choice(K, d, replace=False, p=p)
        # 2) keep the m candidates with the HIGHEST local loss
        loss = np.asarray(losses, dtype=float)[cand]
        top  = cand[np.argsort(loss)[::-1][:m]]   # train where weakest
        return top

    raise ValueError(f"unknown strategy: {strategy}")


# ─── Demo over a few rounds ──────────────────────────────────
np.random.seed(0)
sizes  = [500, 250, 150, 70, 30]      # 5 eligible clients
losses = [0.2, 0.9, 0.4, 0.7, 0.1]      # pretend current losses

for strat in ("uniform", "weighted", "power_of_choice"):
    picks = select_clients(sizes, m=2, strategy=strat, losses=losses)
    print(f"{strat:16s} → clients {sorted(picks.tolist())}")

▶ Output

uniform          → clients [2, 4]
weighted         → clients [0, 1]
power_of_choice  → clients [1, 3]

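See how weighted leans toward the data-rich clients 0 and 1, while power_of_choice homes in on clients 1 and 3, the ones with the highest current loss, where the model most needs work. With only five clients the candidate set (d = min(K, 3m) = 5) covers the whole pool, so the power_of_choice pick here is deterministic; the uniform and weighted picks depend on the random seed.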

Section 07

Choosing a Strategy — Trade-offs at a Glance

Strategy	Picks by	Strength	Risk
Uniform random	Equal chance	Unbiased, dead simple	Slower; noisy averaged update
Data-weighted	Sample size n_k	Faster convergence	Small clients rarely heard (fairness)
Power-of-choice	Highest local loss	Targets weak spots; fewer rounds	Needs loss queries; can over-focus
Resource-aware	Speed / battery / Wi-Fi	Fewer stragglers; reliable	Systematically excludes slow devices → bias

Section 08

Golden Rules for Client Selection & Sampling

🎯 Remember These

Never select everyone. Sample a cohort each round — scale, stragglers, and bandwidth demand it.

Filter for eligibility first (charging, idle, un-metered Wi-Fi), then sample from what remains.

Uniform sampling is unbiased and a safe default; reach for weighted or power-of-choice only when you need faster convergence.

Watch for participation bias: if slow or rural devices are always skipped, the model quietly learns only the fast majority.

Bigger cohorts help — until they don't. Past a point, extra clients bring diminishing returns for real communication cost.

Rotate over time so most clients eventually contribute; selection is about who this round, not who ever.

In one line: Client selection & sampling decides which small cohort trains in each round — balancing convergence speed, fairness, and the cost of stragglers and bandwidth, so a model can learn across millions of devices without ever touching all of them at once.