The Story That Explains Weighted Averaging
What is the sweetness of the final tank? Not the plain average of the three sweetness levels — that would let tiny Farm C count exactly as much as huge Farm A. The honest answer is the volume-weighted average: each farm's sweetness counts in proportion to how many litres it actually contributed. The 600-litre batch dominates the blend; the 100-litre batch barely nudges it.
That is precisely how a federated server combines the model updates it receives. Each client's update is a "batch," its data size n_k is the "volume," and the new global model is the weighted average of all the updates. Get the weighting right and the blend is faithful; get it wrong and a tiny client can sour the whole tank.
Aggregation in federated learning is just a weighted average of vectors. The only real questions are: what do we weight by, and how do we stay robust when one contributor sends garbage.
The Aggregation Operator
After a round, the server holds K client updates, each a full vector of model weights w_k. It must collapse them into one global vector. The standard rule is the data-size-weighted average.
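Written out, with n_k the number of samples held by client k and n the total over the K participating clients, the rule reads as follows. The symbol w for the new global vector is a labelling choice made here for illustration:

```latex
w \;=\; \sum_{k=1}^{K} \alpha_k \, w_k,
\qquad
\alpha_k \;=\; \frac{n_k}{n},
\qquad
n \;=\; \sum_{j=1}^{K} n_j .
```

Because the α_k are non-negative and sum to 1, w is a convex combination of the client vectors.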
The average is applied coordinate-wise across the entire weight tensor. The first weight of the global model is the weighted average of every client's first weight; the second of every client's second; and so on, for millions of parameters. Two models can only be averaged if they share the same architecture and parameter order.
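A minimal sketch of what "same architecture" means in practice, assuming each client model is stored as a dict mapping parameter names to NumPy arrays. The helper name `average_named_weights` and the parameter names `layer.weight` and `layer.bias` are hypothetical, chosen only for this example:

```python
import numpy as np


def average_named_weights(client_models, sizes):
    """Data-size-weighted average of models stored as {name: array} dicts.

    Hypothetical helper for illustration: it refuses to average unless
    every client exposes the same parameter names with the same shapes.
    """
    reference = client_models[0]
    for model in client_models[1:]:
        if model.keys() != reference.keys():
            raise ValueError("clients disagree on parameter names")
        for name in reference:
            if model[name].shape != reference[name].shape:
                raise ValueError(f"shape mismatch for {name!r}")

    alphas = np.asarray(sizes, dtype=float)
    alphas = alphas / alphas.sum()            # alpha_k = n_k / n

    # Average each named tensor element by element across clients.
    return {
        name: sum(a * m[name] for a, m in zip(alphas, client_models))
        for name in reference
    }


client_a = {
    "layer.weight": np.array([[1.0, 2.0], [3.0, 4.0]]),
    "layer.bias": np.array([0.0, 1.0]),
}
client_b = {
    "layer.weight": np.array([[3.0, 4.0], [5.0, 6.0]]),
    "layer.bias": np.array([2.0, 3.0]),
}

global_model = average_named_weights([client_a, client_b], sizes=[100, 300])
print(global_model["layer.weight"])
print(global_model["layer.bias"])
```

Matching tensors by name rather than by position is one way to keep the parameter order from drifting between clients. A flat vector works just as well, provided every client flattens its model in the same order.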
Why Weight at All? Equal vs Proportional
The animated bar below splits the blend into each client's share of the total data. The segment widths are the weights α_k: this is literally how much "say" each client gets in the average.
Under equal averaging all three segments would be 33% wide — letting tiny Client C distort the blend as much as Client A.
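The two weighting schemes can be compared numerically. This short sketch assumes the three clients hold 600, 300 and 100 samples, the sizes used in the worked example that follows:

```python
import numpy as np

sizes = np.array([600, 300, 100], dtype=float)   # clients A, B, C

proportional = sizes / sizes.sum()               # alpha_k = n_k / n
equal = np.full(len(sizes), 1.0 / len(sizes))    # alpha_k = 1 / K

for name, p, e in zip("ABC", proportional, equal):
    print(f"Client {name}: proportional {p:.0%}, equal {e:.0%}")
```

Proportional weighting gives shares of 60%, 30% and 10%, while equal weighting gives each client one third regardless of how much data it holds.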
A Worked Example — Blending Three Update Vectors
Let each model be a 2-parameter vector [w₁, w₂]. Three clients return updates this round:
| Client | Samples n_k | Update w_k | Weight α_k | α_k · w_k |
|---|---|---|---|---|
| A | 600 | [0.90, 0.20] | 0.60 | [0.540, 0.120] |
| B | 300 | [0.40, 0.80] | 0.30 | [0.120, 0.240] |
| C | 100 | [0.10, 0.10] | 0.10 | [0.010, 0.010] |
| Total | 1,000 | — | 1.00 | [0.670, 0.370] |
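The table can be checked in a few lines of NumPy. This is a sketch that only reproduces the arithmetic above; the variable names are arbitrary:

```python
import numpy as np

sizes = np.array([600, 300, 100], dtype=float)
updates = np.array([
    [0.90, 0.20],   # client A
    [0.40, 0.80],   # client B
    [0.10, 0.10],   # client C
])

alphas = sizes / sizes.sum()                 # [0.6, 0.3, 0.1]
contributions = alphas[:, None] * updates    # one row of alpha_k * w_k per client
global_w = contributions.sum(axis=0)         # data-weighted average
plain_mean = updates.mean(axis=0)            # equal-weight average, for contrast

print(np.round(contributions, 3))
print(np.round(global_w, 3))
print(np.round(plain_mean, 3))
```

The weighted result matches the Total row, [0.670, 0.370]. The equal-weight mean of the same three vectors is about [0.467, 0.367], so the first coordinate is pulled noticeably toward small Client C.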
When the Average Breaks — Robust Aggregation
A weighted average has one dangerous weakness: it is not robust. A single client that sends a wildly wrong update — through a bug, a broken sensor, or a deliberate poisoning attack — can drag the mean far off course. The animated number line shows four honest updates clustered together and one malicious outlier. Watch the mean get yanked toward the attacker while the median barely moves.
The coordinate-wise median ignores the extreme value; the mean does not. This is why untrusted settings use robust aggregation.
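The effect can be reproduced on a single coordinate. The numbers below are made up for illustration (four honest values near 0.5 and one attacker value of 9.0); they are not taken from the animation:

```python
import numpy as np

honest = np.array([0.48, 0.50, 0.51, 0.53])   # made-up honest updates
attacker = 9.0                                # made-up malicious update
all_updates = np.append(honest, attacker)

mean_before, mean_after = honest.mean(), all_updates.mean()
median_before, median_after = np.median(honest), np.median(all_updates)

print(f"mean:   {mean_before:.3f} -> {mean_after:.3f}")
print(f"median: {median_before:.3f} -> {median_after:.3f}")
```

With these values the mean jumps from 0.505 to about 2.204, while the median moves only from 0.505 to 0.51.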
When clients can't all be trusted, swap the plain mean for a coordinate-wise median, a trimmed mean (drop the most extreme values per coordinate before averaging), or defences like Krum and Bulyan. They sacrifice a little accuracy for resistance to a few bad actors.
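For a feel of how a distance-based defence differs from a median, here is a minimal sketch of a Krum-style selection rule as it is commonly described: score each update by the summed squared distance to its K − f − 2 nearest neighbours, then adopt the single lowest-scoring update. The function name `krum`, the parameter `f` (an assumed upper bound on the number of bad clients) and the sample vectors are all assumptions made for this example. Bulyan is not shown.

```python
import numpy as np


def krum(updates, f):
    """Return the single client update selected by a Krum-style rule.

    updates : array-like of shape (K, d), one row per client
    f       : assumed upper bound on the number of bad clients
    """
    U = np.asarray(updates, dtype=float)
    K = len(U)
    if K < 2 * f + 3:
        raise ValueError("this rule needs K >= 2f + 3 clients")

    # Pairwise squared Euclidean distances, shape (K, K).
    diffs = U[:, None, :] - U[None, :, :]
    dist2 = (diffs ** 2).sum(axis=-1)

    scores = np.empty(K)
    for i in range(K):
        others = np.delete(dist2[i], i)          # distances to the other K-1 clients
        nearest = np.sort(others)[: K - f - 2]   # keep the K-f-2 closest
        scores[i] = nearest.sum()

    return U[np.argmin(scores)]


updates = [
    [0.52, 0.49],
    [0.50, 0.51],
    [0.48, 0.50],
    [0.51, 0.52],
    [9.00, -9.00],   # poisoned update
]
chosen = krum(updates, f=1)
print(chosen)
```

Unlike the median or trimmed mean, this rule returns one of the submitted vectors unchanged rather than a blend. That is one source of the accuracy cost mentioned above.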
The Family of Aggregation Rules
| Rule | Weights by | Robust to bad clients? | Best when… |
|---|---|---|---|
| Equal mean | Nothing (1/K) | No | Clients hold similar amounts of IID data |
| Data-weighted (FedAvg) | Sample size n_k | No | Honest clients, uneven data sizes |
| Coordinate-wise median | — (order statistic) | Yes | Some clients may be faulty or malicious |
| Trimmed mean | Middle values only | Yes | A known small fraction of outliers |
| Krum / Bulyan | Geometric closeness | Yes | Adversarial / Byzantine threat model |
Aggregation in Python — One Function, Four Rules
The same inputs, a method switch. Each returns one aggregated weight vector the server adopts as the new global model.
```python
import numpy as np


def aggregate(updates, sizes=None, method='weighted', trim=0.2):
    """Combine client weight vectors into one global vector.

    updates : list of 1-D arrays, one per client (same shape)
    sizes   : list of n_k (needed for 'weighted')
    method  : 'mean' | 'weighted' | 'median' | 'trimmed'
    trim    : fraction of clients cut from each end ('trimmed' only)
    """
    U = np.array(updates, dtype=float)           # shape (K, d)

    if method == 'mean':                         # equal average
        return U.mean(axis=0)

    if method == 'weighted':                     # FedAvg: α_k = n_k / n
        if sizes is None:
            raise ValueError("'weighted' needs sizes")
        a = np.array(sizes, dtype=float)
        a = a / a.sum()                          # normalise to sum to 1
        return (a[:, None] * U).sum(axis=0)

    if method == 'median':                       # robust: per-coordinate median
        return np.median(U, axis=0)

    if method == 'trimmed':                      # drop extremes, then mean
        k = int(trim * len(U))                   # values cut from each end
        Us = np.sort(U, axis=0)                  # sort each coordinate separately
        core = Us[k: len(U) - k] if len(U) - 2 * k > 0 else Us
        return core.mean(axis=0)

    raise ValueError(f'unknown method: {method}')


# ─── Demo: 4 honest clients + 1 poisoned update ──────────────
np.random.seed(0)
honest = [np.array([0.50, 0.50]) + 0.02 * np.random.randn(2) for _ in range(4)]
poisoned = [np.array([9.00, -9.00])]             # one malicious client
updates = honest + poisoned
sizes = [300, 300, 300, 300, 300]                # equal data sizes

for m in ('mean', 'weighted', 'median', 'trimmed'):
    g = aggregate(updates, sizes, method=m)
    print(f'{m:9s} → {np.round(g, 3)}')
```
One poisoned client drags mean and weighted to roughly [2.2, −1.4], nowhere near the honest cluster at [0.5, 0.5]. The two results coincide here because every client has the same data size. The median and trimmed mean shrug off the attacker and land within a few hundredths of the honest cluster.
Golden Rules for Weighted Averaging
Normalise the weights so that α_k = n_k / Σ n_j, or the global model lands on the wrong scale.
In one line: Weighted averaging blends many client updates into one global model by summing them in proportion to each client's data. It is simple and effective when clients are honest, but swap in a robust rule the moment any contributor can't be trusted.