Weighted Averaging of Model Updates (Federated)

Section 01

The Story That Explains Weighted Averaging

📖 Real-World Analogy

The Juice Factory's Mixing Tank

A juice factory receives orange juice from three farms and pours it all into one giant tank to bottle a single, consistent product. Farm A sends a 600-litre batch that is quite sweet; Farm B sends 300 litres of medium sweetness; Farm C sends a small 100-litre batch that is rather sour.

What is the sweetness of the final tank? Not the plain average of the three sweetness levels — that would let tiny Farm C count exactly as much as huge Farm A. The honest answer is the volume-weighted average: each farm's sweetness counts in proportion to how many litres it actually contributed. The 600-litre batch dominates the blend; the 100-litre batch barely nudges it.

That is precisely how a federated server combines the model updates it receives. Each client's update is a "batch," its data size n_k is the "volume," and the new global model is the weighted average of all the updates. Get the weighting right and the blend is faithful; get it wrong and a tiny client can sour the whole tank.

🧪

The Core Insight

Aggregation in federated learning is just a weighted average of vectors. The only real questions are what to weight by, and how to stay robust when one contributor sends garbage.

Section 02

The Aggregation Operator

After a round, the server holds K client updates — each a full vector of model weights w^k. It must collapse them into one global vector. The standard rule is the data-size-weighted average.

Weighted average

w = Σ_k α_k · w^k

Each client gets a weight α_k; the global model is the sum of weighted client vectors.

The weights must sum to 1

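α_k = n_k / n , Σ_k α_k = 1 , where n = Σ_k n_k

In FedAvg, α_k is each client's share of the total data n. Normalising the weights to sum to 1 makes the result a convex combination of the client models, so it stays on the same scale as they are. Using the raw n_k as weights would instead blow the result up by a factor of about n.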
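Here is a minimal NumPy sketch of the operator. It assumes each client's weights have already been flattened into a 1-D array, and the helper name `weighted_average` is ours rather than a library API. The second half illustrates the scale argument: if every client sent the identical vector, the normalised blend returns that vector unchanged, while raw n_k weights multiply it by n = 1,000.

```python
import numpy as np

def weighted_average(client_weights, sizes):
    """Return sum_k alpha_k * w^k with alpha_k = n_k / n (FedAvg-style)."""
    W = np.asarray(client_weights, dtype=float)  # shape (K, d): one row per client
    n = np.asarray(sizes, dtype=float)           # n_k for each client
    alpha = n / n.sum()                          # alpha_k = n_k / n, sums to 1
    return alpha @ W                             # (K,) @ (K, d) -> (d,)

# Scale check: three clients send the same vector (hypothetical data).
same_vector = [[1.0, 2.0]] * 3
sizes = [600, 300, 100]
normalised = weighted_average(same_vector, sizes)                       # stays [1, 2]
unnormalised = np.asarray(sizes, dtype=float) @ np.asarray(same_vector)  # scaled by n
print(normalised)
print(unnormalised)
```

The matrix product `alpha @ W` is exactly the sum Σ_k α_k · w^k, evaluated for every coordinate at once.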

⚠️

Averaging Happens Element by Element

The average is applied coordinate-wise across the entire weight tensor. The first weight of the global model is the weighted average of every client's first weight; the second of every client's second; and so on, for millions of parameters. Two models can only be averaged if they share the same architecture and parameter order.
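The next sketch applies the same rule to a model stored as several named tensors. The layout, a dict from parameter name to array, is a hypothetical stand-in for a framework "state dict". Keying by name rather than by position is one way to guarantee that every client's parameters line up. The function refuses to average when names or shapes disagree, which is the architecture check described above.

```python
import numpy as np

def average_state(states, sizes):
    """Coordinate-wise weighted average of per-client parameter dicts.

    states : list of dicts mapping parameter name -> ndarray (hypothetical layout)
    sizes  : n_k for each client
    """
    ref = states[0]
    for s in states[1:]:
        # Same architecture means same parameter names and same tensor shapes.
        if s.keys() != ref.keys():
            raise ValueError("clients disagree on parameter names")
        for name in ref:
            if s[name].shape != ref[name].shape:
                raise ValueError(f"shape mismatch for {name!r}")
    alpha = np.asarray(sizes, dtype=float)
    alpha /= alpha.sum()  # alpha_k = n_k / n
    # For each tensor, blend element by element across clients.
    return {name: sum(a * s[name] for a, s in zip(alpha, states)) for name in ref}

client_a = {"layer.weight": np.array([[1.0, 2.0], [3.0, 4.0]]), "layer.bias": np.array([0.0, 1.0])}
client_b = {"layer.weight": np.array([[3.0, 2.0], [1.0, 0.0]]), "layer.bias": np.array([2.0, 1.0])}
global_state = average_state([client_a, client_b], sizes=[300, 100])
print(global_state["layer.weight"])
print(global_state["layer.bias"])
```

With sizes 300 and 100 the weights are 0.75 and 0.25. Each output entry is therefore 0.75 × (A's entry) + 0.25 × (B's entry), computed independently for every position in every tensor.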

Section 03

Why Weight at All? Equal vs Proportional

Picture the blend as one bar split into a segment per client. The segment widths are the weights α_k: literally how much "say" each client gets in the average.

Figure: ⚖️ Each client's share of the blend (α_k = n_k/n). Client A: most data, biggest say; Client B: medium; Client C: least data, smallest say.

Under equal averaging all three segments would be 33% wide — letting tiny Client C distort the blend as much as Client A.
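To put numbers on that, the short sketch below compares both weighting schemes. It uses the client sizes from the worked example in Section 04 (600, 300 and 100 samples). The "ratio" column is our own illustrative measure: how many times more say equal averaging gives a client than its data share justifies.

```python
import numpy as np

sizes = np.array([600, 300, 100], dtype=float)  # n_k for clients A, B, C
K = len(sizes)
proportional = sizes / sizes.sum()               # alpha_k = n_k / n
equal = np.full(K, 1.0 / K)                      # alpha_k = 1 / K

for name, p, e in zip("ABC", proportional, equal):
    # ratio > 1 means equal averaging over-weights this client
    print(f"Client {name}: proportional {p:.0%}, equal {e:.0%}, ratio {e / p:.2f}x")
```

Under equal averaging, Client C ends up with more than three times the influence its 10% share of the data would give it.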

Section 04

A Worked Example — Blending Three Update Vectors

Let each model be a 2-parameter vector [w₁, w₂]. Three clients return updates this round:

Client	Samples n_k	Update w^k	Weight α_k	α_k · w^k
A	600	[0.90, 0.20]	0.60	[0.540, 0.120]
B	300	[0.40, 0.80]	0.30	[0.120, 0.240]
C	100	[0.10, 0.10]	0.10	[0.010, 0.010]
Total	1,000	—	1.00	[0.670, 0.370]

🧮 Computing the blend, coordinate by coordinate

w₁

0.60×0.90 + 0.30×0.40 + 0.10×0.10 = 0.670

w₂

0.60×0.20 + 0.30×0.80 + 0.10×0.10 = 0.370

Weighted

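Global update = [0.670, 0.370]. In w₁ the blend is pulled toward data-rich Client A (0.670 versus a plain mean of 0.467). In w₂ the shifts roughly cancel: A's extra weight pulls toward 0.20, while tiny Client C's low 0.10 loses most of its weight.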

Equal

A plain mean would be [0.467, 0.367] — Client C wrongly counts as much as A.
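The table can be reproduced in a few lines of NumPy. This is a sketch that simply re-derives the numbers above. The `contributions` array is the α_k · w^k column, and summing it down the client axis gives the global update.

```python
import numpy as np

sizes = np.array([600, 300, 100], dtype=float)
updates = np.array([[0.90, 0.20],   # Client A
                    [0.40, 0.80],   # Client B
                    [0.10, 0.10]])  # Client C

alpha = sizes / sizes.sum()                # [0.6, 0.3, 0.1]
contributions = alpha[:, None] * updates   # row k is alpha_k * w^k
weighted = contributions.sum(axis=0)       # data-weighted global update
plain = updates.mean(axis=0)               # equal averaging, for comparison

print(np.round(contributions, 3))
print("weighted:  ", np.round(weighted, 3))
print("plain mean:", np.round(plain, 3))
```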

Section 05

When the Average Breaks — Robust Aggregation

A weighted average has one dangerous weakness: it is not robust. A single client that sends a wildly wrong update can drag the mean far off course, whether the cause is a bug, a broken sensor, or a deliberate poisoning attack. Picture four honest updates clustered together on a number line and one malicious outlier far away. The mean gets yanked toward the attacker, while the median barely moves.

Figure: 🛡️ Mean vs median under a poisoned update. It shows the honest cluster, the malicious outlier, and the mean dragged off toward the outlier.

The coordinate-wise median ignores the extreme value; the mean does not. This is why untrusted settings use robust aggregation.
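A one-coordinate sketch makes the difference concrete. The honest values and the outlier sizes are hypothetical. As the attacker moves further out, the mean follows it without limit, while the median stays inside the honest cluster.

```python
import numpy as np

honest = [1.0, 1.1, 0.9, 1.0]   # hypothetical honest values for one coordinate
results = {}
for bad in (10.0, 100.0, 1000.0):  # hypothetical poisoned values, ever more extreme
    v = np.array(honest + [bad])
    results[bad] = (v.mean(), np.median(v))
    print(f"outlier={bad:>7}: mean={v.mean():.2f}, median={np.median(v):.2f}")
```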

🔑

Robust Alternatives

When clients can't all be trusted, swap the plain mean for one of these:

- a coordinate-wise median;
- a trimmed mean, which drops the most extreme values per coordinate before averaging;
- defences like Krum and Bulyan, which favour updates that lie close to the rest.

These rules typically give up a little accuracy on honest data in exchange for resistance to a bounded number of bad actors.
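The `aggregate` function in Section 07 covers the median and the trimmed mean but not Krum, so here is a sketch of Krum as it is commonly described. Each client is scored by the sum of squared distances to its n − f − 2 nearest neighbours, and the single lowest-scoring update is chosen as the aggregate. Here f is the assumed maximum number of malicious clients, and the rule needs n > 2f + 2. The demo data are hypothetical, and Bulyan, which builds on Krum, is not shown.

```python
import numpy as np

def krum(updates, f):
    """Select the update whose n - f - 2 nearest neighbours are closest.

    updates : array-like of shape (n, d), one row per client
    f       : assumed maximum number of Byzantine clients (needs n > 2f + 2)
    Returns (selected update, its index).
    """
    U = np.asarray(updates, dtype=float)
    n = len(U)
    if n <= 2 * f + 2:
        raise ValueError("Krum needs n > 2f + 2 clients")
    diffs = U[:, None, :] - U[None, :, :]
    d2 = (diffs ** 2).sum(axis=-1)        # pairwise squared distances, shape (n, n)
    m = n - f - 2                         # neighbours counted per client
    scores = []
    for i in range(n):
        others = np.delete(d2[i], i)      # distances from client i to every other client
        scores.append(np.sort(others)[:m].sum())
    best = int(np.argmin(scores))
    return U[best], best

clients = np.array([[0.50, 0.50], [0.52, 0.49], [0.48, 0.51],
                    [0.51, 0.52], [0.49, 0.48],
                    [9.00, -9.00]])        # last row: poisoned update
selected, idx = krum(clients, f=1)
print(idx, selected)
```

Krum returns one client's own update instead of a blend. That is why it resists outliers, and also why it can waste the information in the other honest updates.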

Section 06

The Family of Aggregation Rules

⚖️

Data-Weighted (FedAvg)

α_k = n_k / n

The default. Bigger datasets get a bigger say. Fast and faithful when clients are honest.

📊

Performance-Weighted

α_k ∝ quality

Weight by validation accuracy or inverse loss, so better-performing clients influence the blend more.

⏳

Staleness-Weighted

async FL

In asynchronous FL, down-weight updates computed on an old global model so stale work counts less.
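The two non-size rules in the cards above can be sketched the same way. Both helpers are hypothetical, because the text fixes only the idea and not a formula. Using inverse validation loss as the "quality" score is one of the options the card names. The staleness penalty is an assumption: one common choice is polynomial decay (1 + τ_k)^(−a), here multiplied by n_k, with a = 0.5 and τ_k counting how many global versions old the client's starting model was.

```python
import numpy as np

def performance_weights(val_losses, eps=1e-12):
    """alpha_k proportional to 1 / loss_k (one possible quality score)."""
    inv = 1.0 / (np.asarray(val_losses, dtype=float) + eps)
    return inv / inv.sum()

def staleness_weights(sizes, staleness, a=0.5):
    """alpha_k proportional to n_k * (1 + tau_k) ** -a  (assumed decay form).

    tau_k: how many global model versions old client k's starting point was.
    """
    n = np.asarray(sizes, dtype=float)
    tau = np.asarray(staleness, dtype=float)
    raw = n * (1.0 + tau) ** (-a)
    return raw / raw.sum()  # renormalise so the weights still sum to 1

perf = performance_weights([0.2, 0.4, 0.8])               # lower loss, bigger say
stale = staleness_weights([300, 300, 300], staleness=[0, 3, 8])
print(np.round(perf, 3), np.round(stale, 3))
```

Both functions end by renormalising, so either set of weights can be passed straight to the same weighted sum used for FedAvg.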

Rule	Weights by	Robust to bad clients?	Best when…
Equal mean	Nothing (1/K)	No	Clients hold similar amounts of IID data
Data-weighted (FedAvg)	Sample size n_k	No	Honest clients, uneven data sizes
Coordinate-wise median	— (order statistic)	Yes	Some clients may be faulty or malicious
Trimmed mean	Middle values only	Yes	A known small fraction of outliers
Krum / Bulyan	Geometric closeness	Yes	Adversarial / Byzantine threat model

Section 07

Aggregation in Python — One Function, Four Rules

Every rule takes the same inputs, and a method switch picks which one runs. Each branch returns one aggregated weight vector that the server adopts as the new global model.

import numpy as np

def aggregate(updates, sizes=None, method='weighted', trim=0.2):
    """Combine client weight vectors into one global vector.

    updates : list of 1-D arrays, one per client (same shape)
    sizes   : list of n_k (needed for 'weighted')
    method  : 'mean' | 'weighted' | 'median' | 'trimmed'
    """
    U = np.array(updates, dtype=float)        # shape (K, d)

    if method == 'mean':                       # equal average
        return U.mean(axis=0)

    if method == 'weighted':                   # FedAvg: α_k = n_k / n
        if sizes is None:
            raise ValueError("'weighted' needs sizes (one n_k per client)")
        a = np.array(sizes, float)
        a = a / a.sum()                       # normalise to sum to 1
        return (a[:, None] * U).sum(axis=0)

    if method == 'median':                     # robust: per-coordinate median
        return np.median(U, axis=0)

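    if method == 'trimmed':                    # drop extremes, then mean
        k  = int(trim * len(U))                # values dropped from EACH end, per coordinate
        Us = np.sort(U, axis=0)                # sort every coordinate independently
        # keep the middle K - 2k values; if trimming would leave nothing, use all
        core = Us[k: len(U) - k] if len(U) - 2 * k > 0 else Us
        return core.mean(axis=0)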

    raise ValueError(f'unknown method: {method}')


# ─── Demo: 4 honest clients + 1 poisoned update ──────────────
np.random.seed(0)
honest   = [np.array([0.50, 0.50]) + 0.02 * np.random.randn(2) for _ in range(4)]
poisoned = [np.array([9.00, -9.00])]            # one malicious client
updates  = honest + poisoned
sizes    = [300, 300, 300, 300, 300]      # equal data sizes

for m in ('mean', 'weighted', 'median', 'trimmed'):
    g = aggregate(updates, sizes, method=m)
    print(f'{m:9s} → {np.round(g, 3)}')

OUTPUT

mean      → [ 2.222 -1.394]
weighted  → [ 2.222 -1.394]
median    → [0.535 0.497]
trimmed   → [0.531 0.495]

🎯

Read the Output

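One poisoned client drags mean and weighted all the way to about [2.22, −1.39], nowhere near the honest cluster at roughly [0.5, 0.5]. With equal data sizes the weighted rule reduces to the plain mean, so both fail the same way. The median and trimmed mean shrug off the attacker and land within a few hundredths of the honest updates.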

Section 08

Golden Rules for Weighted Averaging

🧪 Aggregation — Non-Negotiable Rules

Weights must sum to 1. Always normalise α_k = n_k / Σ n_j, or the global model lands on the wrong scale.

Weight by data size by default. A plain mean lets a tiny client distort the blend as much as a huge one.

Average coordinate-wise over identical architectures. You cannot average models with different shapes or parameter orderings.

The mean is not robust. If any client may be faulty or malicious, switch to median, trimmed mean, or Krum.

In asynchronous FL, down-weight stale updates computed on an out-of-date global model.

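Aggregation is the one step that combines every contribution, which makes it the natural place to add privacy protection. With secure aggregation, the server learns only the combined result, never an individual update. With differential privacy, calibrated noise is added so that no single client's data can be read back out of the model.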
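The core trick behind secure aggregation can be sketched as a toy pairwise-masking scheme. This is a simplified illustration, not a real protocol. Each pair of clients shares a random mask, which one adds and the other subtracts, so every upload looks like noise on its own while the masks cancel exactly in the server's sum. Here a single seeded generator stands in for pairwise shared secrets (an assumption). Real protocols also need key agreement, finite-field arithmetic and handling of clients that drop out. Clients upload n_k · w^k, so the server recovers the FedAvg blend by dividing the sum by n.

```python
import numpy as np

def masked_uploads(updates, seed=0):
    """Toy pairwise masking: for each pair i < j, client i adds mask m_ij
    and client j subtracts it, so all masks cancel in the sum."""
    rng = np.random.default_rng(seed)  # stands in for pairwise shared secrets
    U = [np.asarray(u, dtype=float) for u in updates]
    masked = [u.copy() for u in U]
    for i in range(len(U)):
        for j in range(i + 1, len(U)):
            m = rng.normal(scale=100.0, size=U[i].shape)
            masked[i] += m
            masked[j] -= m
    return masked

sizes = np.array([600, 300, 100], dtype=float)
updates = [np.array([0.90, 0.20]), np.array([0.40, 0.80]), np.array([0.10, 0.10])]
scaled = [n_k * u for n_k, u in zip(sizes, updates)]  # each client uploads n_k * w^k
uploads = masked_uploads(scaled)                      # individually look random
global_update = np.sum(uploads, axis=0) / sizes.sum() # masks cancel; divide by n
print(np.round(global_update, 3))
```

Up to floating-point rounding, the result matches the [0.670, 0.370] from the worked example, although the server never saw any client's unmasked update.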

In one line: Weighted averaging blends many client updates into one global model by summing them in proportion to each client's data — simple and effective when clients are honest, but swap in a robust rule the moment any contributor can't be trusted.