The Story That Explains Client Selection
Imagine a pollster who wants to know what an entire country thinks. Calling every household every evening is impossible, so she does something clever: each evening she calls a small, carefully chosen sample of perhaps 1,500 people, and counts only those who are actually home and willing to talk. The next evening, she draws a fresh sample. Over many evenings, the rotating samples add up to a faithful picture of the whole country at a tiny fraction of the cost.
Federated learning faces the exact same problem. A model might be trained across millions of phones, but you cannot — and should not — involve all of them in every round. Most are asleep, on battery, or off Wi-Fi. So each round the server selects and samples a small cohort of clients to do the training. Who you pick, and how, quietly decides how fast the model learns, how fair it is, and whether it ever converges at all. That is the art of client selection & sampling.
Why Not Just Use Every Client?
In cross-device federated learning the client pool can be enormous and unreliable. Selecting a subset each round, sized by the client fraction C (the share of clients asked to participate in a round), is not laziness; it is survival.
| Problem with "use everyone" | What goes wrong |
|---|---|
| Scale | Coordinating millions of devices per round is infeasible; the server becomes a bottleneck. |
| Stragglers | The round can only finish when the slowest client reports. One dead-slow phone stalls everyone. |
| Availability | Most devices are not eligible — not charging, not idle, not on un-metered Wi-Fi. |
| Communication cost | Every selected client downloads and uploads the full model. Bandwidth is the bottleneck. |
| Diminishing returns | Beyond a certain cohort size, extra clients barely improve the averaged update. |
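Two quantities drive this table: how many clients a round asks for, and how long the round then takes. Below is a minimal sketch assuming a FedAvg-style rule m = max(⌊C·K⌋, 1) for cohort size and a synchronous round that waits for every selected client. The function names `cohort_size` and `round_time` and all latency values are hypothetical, chosen only for illustration.

```python
import math

def cohort_size(C, K):
    """Clients to sample this round: a fraction C of the K eligible clients,
    floored to an integer but never fewer than one (assumed FedAvg-style rule)."""
    if not 0.0 <= C <= 1.0:
        raise ValueError("client fraction C must lie in [0, 1]")
    return max(math.floor(C * K), 1)

def round_time(latencies):
    """A synchronous round ends only when the slowest selected client reports."""
    return max(latencies)

# Hypothetical pools with C = 0.001
print(cohort_size(0.001, 1_000_000))   # 1000
print(cohort_size(0.001, 500))         # 1  (0.5 floors to 0, clamped to 1)

# Hypothetical seconds each selected client needs to train and upload
latencies = [12.0, 15.5, 9.8, 300.0, 14.1]
print(round_time(latencies))                        # 300.0: the straggler sets the pace
print(round_time(latencies[:3] + latencies[4:]))    # 15.5 without the straggler
```

A single slow device inflates the round from about 15 s to 300 s, which is why straggler handling and cohort size are designed together.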
The Selection Funnel — From Millions to a Cohort
Before any model trains, each round filters the giant pool down through three gates: who exists, who is eligible right now, and who gets sampled.
Only the final stage, the sampled cohort, actually downloads the model, trains, and uploads an update this round.
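The three gates can be sketched as a filter followed by a draw. This is a minimal illustration, assuming a hypothetical `Device` record whose eligibility flags mirror the availability conditions in the table above (charging, idle, on unmetered Wi-Fi) and uniform sampling at the last gate; production systems will apply their own criteria.

```python
import random
from dataclasses import dataclass

@dataclass
class Device:
    # Hypothetical device record; field names are illustrative only.
    device_id: int
    charging: bool
    idle: bool
    unmetered_wifi: bool

def is_eligible(d):
    # Gate 2: eligible right now only if charging, idle, and on unmetered Wi-Fi
    return d.charging and d.idle and d.unmetered_wifi

def select_cohort(pool, m, rng):
    """Gate 1 is the pool itself (who exists); gate 2 keeps eligible devices;
    gate 3 samples at most m of them uniformly without replacement."""
    eligible = [d for d in pool if is_eligible(d)]
    cohort = rng.sample(eligible, min(m, len(eligible)))
    return eligible, cohort

rng = random.Random(42)
# Hypothetical population: each flag independently true with probability 0.5
pool = [Device(i, rng.random() < 0.5, rng.random() < 0.5, rng.random() < 0.5)
        for i in range(10_000)]
eligible, cohort = select_cohort(pool, m=100, rng=rng)
# Pool and cohort sizes are fixed; the eligible count varies (about 1 in 8 here)
print(len(pool), len(eligible), len(cohort))
```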
How the Cohort Is Chosen — Sampling Strategies
Among the eligible clients, a sampling rule decides who actually trains: uniform random, data-weighted, power-of-choice, or resource-aware.
Across rounds the cohort rotates, so over time many clients contribute — just never all at once.
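A small simulation makes the rotation concrete. Assuming uniform sampling with a fresh cohort each round (the pool size K, cohort size m, and round count below are arbitrary choices), a given client is missed in one round with probability 1 − m/K, so the expected fraction of clients that have trained at least once after R rounds is 1 − (1 − m/K)^R.

```python
import random

def coverage_over_rounds(K, m, rounds, seed=0):
    """Fraction of the K clients that have trained at least once after each
    round, with a fresh uniform cohort of m distinct clients every round."""
    rng = random.Random(seed)
    seen = set()
    history = []
    for _ in range(rounds):
        cohort = rng.sample(range(K), m)   # m distinct clients this round
        seen.update(cohort)
        history.append(len(seen) / K)
    return history

K, m, R = 1_000, 50, 40
sim = coverage_over_rounds(K, m, R)
for r in (1, 10, 20, 40):
    expected = 1 - (1 - m / K) ** r
    print(f"round {r:2d}: simulated {sim[r - 1]:.3f}  expected {expected:.3f}")
```

With only 5% of the pool active per round, roughly 87% of clients have contributed by round 40, yet no round ever touched more than 50 of them.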
A Worked Example — Selection Probabilities
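Suppose K = 5 eligible clients and we sample m = 2 per round, and compare uniform with data-weighted selection. Under uniform sampling, each client's chance of being picked in a round is m / K = 2 / 5 = 40%, regardless of size. Under data-weighted sampling, the chance of being picked on any single draw is the client's share of the data, n_k / Σ n_j, where the n_j sum to 1,000 here.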
| Client | Samples (n_k) | Uniform P (per round) | Weighted P (per draw) |
|---|---|---|---|
| A | 500 | 0.40 | 0.50 |
| B | 250 | 0.40 | 0.25 |
| C | 150 | 0.40 | 0.15 |
| D | 70 | 0.40 | 0.07 |
| E | 30 | 0.40 | 0.03 |
| Total | 1,000 | — | 1.00 |
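The weighted column is the chance of being picked on a single draw, not the chance of landing in the two-client cohort. A short exact calculation shows how lopsided cohort membership becomes, assuming sequential weighted sampling without replacement (each successive pick proportional to n_k among the clients not yet chosen), which to our understanding is how `np.random.choice(..., replace=False, p=p)` behaves. The helper names are hypothetical.

```python
from itertools import permutations

def per_draw_probs(sizes):
    """Chance of being picked on a single weighted draw: n_k / sum of n_j."""
    total = sum(sizes)
    return [n / total for n in sizes]

def inclusion_probs(sizes, m):
    """Exact chance that each client ends up in an m-client cohort when the
    cohort is drawn one client at a time, without replacement, each pick
    proportional to n_k among the clients not yet chosen."""
    p = per_draw_probs(sizes)
    incl = [0.0] * len(sizes)
    # Enumerate every ordered sequence of m distinct clients (fine for small K).
    for order in permutations(range(len(sizes)), m):
        prob, remaining = 1.0, 1.0
        for k in order:
            prob *= p[k] / remaining   # renormalize over clients still available
            remaining -= p[k]
        for k in order:
            incl[k] += prob
    return incl

sizes = [500, 250, 150, 70, 30]      # clients A..E from the table
m = 2
uniform = [m / len(sizes)] * len(sizes)
for name, u, first, cohort in zip("ABCDE", uniform,
                                  per_draw_probs(sizes),
                                  inclusion_probs(sizes, m)):
    print(f"{name}: uniform {u:.2f}  first draw {first:.2f}  in cohort {cohort:.2f}")
```

Under these assumptions client A ends up in about 81% of cohorts and client E in about 8%, against a flat 40% under uniform sampling; the in-cohort probabilities sum to m = 2, as they must. That gap is the fairness risk of data-weighted selection.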
Client Selection in Python — Three Strategies
One function, a strategy switch. Each returns the indices of the clients chosen for this round — the cohort the server will then broadcast to.
```python
import numpy as np

def select_clients(sizes, m, strategy="uniform",
                   losses=None, candidate_factor=3):
    """Return indices of m clients chosen from the eligible pool.

    sizes            : list of n_k (samples per eligible client)
    m                : cohort size to select this round
    strategy         : 'uniform' | 'weighted' | 'power_of_choice'
    losses           : current local loss per client (for power-of-choice)
    candidate_factor : candidate set size d = candidate_factor * m (power-of-choice)
    """
    K = len(sizes)
    sizes = np.array(sizes, dtype=float)
    m = min(m, K)                                  # never ask for more than exist

    if strategy == "uniform":
        # every eligible client equally likely
        return np.random.choice(K, m, replace=False)

    if strategy == "weighted":
        # probability ∝ data size n_k
        p = sizes / sizes.sum()
        return np.random.choice(K, m, replace=False, p=p)

    if strategy == "power_of_choice":
        # validate before spending any random draws
        if losses is None:
            raise ValueError("power_of_choice needs per-client losses")
        # 1) draw a larger candidate set of d clients, weighted by data
        d = min(K, candidate_factor * m)
        p = sizes / sizes.sum()
        cand = np.random.choice(K, d, replace=False, p=p)
        # 2) keep the m candidates with the HIGHEST local loss
        loss = np.asarray(losses, dtype=float)[cand]
        return cand[np.argsort(loss)[::-1][:m]]    # train where the model is weakest

    raise ValueError(f"unknown strategy: {strategy}")

# ─── Demo: one round per strategy ────────────────────────────
np.random.seed(0)
sizes = [500, 250, 150, 70, 30]      # 5 eligible clients
losses = [0.2, 0.9, 0.4, 0.7, 0.1]   # pretend current losses

for strat in ("uniform", "weighted", "power_of_choice"):
    picks = select_clients(sizes, m=2, strategy=strat, losses=losses)
    print(f"{strat:16s} → clients {sorted(picks.tolist())}")
```
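Weighted sampling favors the data-rich clients 0 and 1, which together hold 75% of the data, though any single seeded draw can still land on a smaller client. Power-of-choice is deterministic in this demo: with d = min(5, 3 × 2) = 5, the candidate set is the whole pool, so it always returns clients 1 and 3, the ones with the highest current loss, where the model most needs work.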
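One seeded run shows a single draw per strategy, so one way to check the weighted behavior is to repeat that draw many times and count how often each client lands in the cohort. This sketch re-implements the `weighted` branch inline (the same `np.random.choice(K, m, replace=False, p=p)` call) rather than importing `select_clients`; the 10,000-round count and the seed are arbitrary.

```python
import numpy as np

def weighted_frequencies(sizes, m, rounds, seed=0):
    """Empirical share of rounds in which each client is selected, using the
    same call as the 'weighted' branch: np.random.choice(..., replace=False, p=p)."""
    np.random.seed(seed)
    sizes = np.asarray(sizes, dtype=float)
    K = len(sizes)
    p = sizes / sizes.sum()
    counts = np.zeros(K)
    for _ in range(rounds):
        picks = np.random.choice(K, m, replace=False, p=p)
        counts[picks] += 1            # picks are distinct, so each gets +1
    return counts / rounds

freq = weighted_frequencies([500, 250, 150, 70, 30], m=2, rounds=10_000)
print(np.round(freq, 3))              # client 0 appears most often, client 4 least
```

The frequencies should settle near roughly 0.81, 0.57, 0.37, 0.18 and 0.08 for clients 0 to 4, and they sum to exactly 2 because every round selects two distinct clients.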
Choosing a Strategy — Trade-offs at a Glance
| Strategy | Picks by | Strength | Risk |
|---|---|---|---|
| Uniform random | Equal chance | Unbiased, dead simple | Slower; noisy averaged update |
| Data-weighted | Sample size n_k | Faster convergence | Small clients rarely heard (fairness) |
| Power-of-choice | Highest local loss | Targets weak spots; fewer rounds | Needs loss queries; can over-focus |
| Resource-aware | Speed / battery / Wi-Fi | Fewer stragglers; reliable | Systematically excludes slow devices → bias |
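The resource-aware risk can be seen in a few lines. This sketch assumes a hypothetical deadline-based rule: only clients whose estimated round latency fits under a deadline are considered, and the cohort is drawn uniformly from those. The latencies, the 30 s deadline, and the fast/slow split of the fleet are invented for illustration.

```python
import random

def resource_aware_select(latency, m, deadline, rng):
    """Sample m clients uniformly among those expected to finish before the deadline."""
    fast_enough = [k for k, t in enumerate(latency) if t <= deadline]
    return rng.sample(fast_enough, min(m, len(fast_enough)))

rng = random.Random(7)
K = 200
# Hypothetical fleet: first half fast phones (5-20 s), second half slow (40-120 s)
latency = ([rng.uniform(5, 20) for _ in range(K // 2)]
           + [rng.uniform(40, 120) for _ in range(K // 2)])
slow = set(range(K // 2, K))

rounds, m = 100, 10
picked_slow = 0
for _ in range(rounds):
    cohort = resource_aware_select(latency, m=m, deadline=30.0, rng=rng)
    picked_slow += sum(k in slow for k in cohort)

print(f"slow-device share of all selections: {picked_slow / (rounds * m):.2f}")  # 0.00
```

Every round now finishes within the deadline, but half the fleet never contributes a single update; if the slow devices hold systematically different data, the model never learns from it.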
Golden Rules for Client Selection & Sampling
In one line: Client selection & sampling decides which small cohort trains in each round — balancing convergence speed, fairness, and the cost of stragglers and bandwidth, so a model can learn across millions of devices without ever touching all of them at once.