Notes on Federated Learning and Differential Privacy

#machinelearning #privacy #ai #python

Notes on Federated Learning and Differential Privacy

2026-05-31 · privacy-preserving ML

Working notes on building federated learning (FL) from scratch, what actually breaks under
Non-IID data, and how differential privacy (DP) and secure aggregation fit on top —
including the honest negative results that the marketing slides leave out. They follow the
implementation in
federated-learning-lab
(FedAvg / FedProx / SCAFFOLD, DP-SGD, secure aggregation; 33/33 tests, literature
cross-validated).

1. What federated learning actually is

The data never moves. Instead of pooling everyone's data on one server, each client trains
locally and sends model updates to a server that aggregates them. The canonical loop
(FedAvg) is:

Server broadcasts the global model.
Each client does a few local SGD epochs on its own data.
Each client sends back its updated weights.
Server averages the weights (weighted by client data size) → new global model.

That's it. The elegance is that raw data stays on-device; the difficulty is that the clients'
data distributions are not identical.

2. The Non-IID problem (where FedAvg starts to hurt)

FedAvg implicitly assumes every client sees roughly the same distribution. Real clients don't —
one hospital sees different cases than another, one phone's keyboard sees different language.
Under Non-IID data, each client's local optimum pulls in a different direction, so averaging
their updates produces client drift: the global model lands somewhere none of them wanted.

Two well-known fixes, both implemented and measured in the lab:

FedProx — add a proximal term that penalises drifting too far from the global model. Stabilises training when clients are heterogeneous.
SCAFFOLD — track control variates (correction terms) that estimate and subtract the drift direction. More state to communicate, but corrects the bias FedProx only damps.

The honest finding worth repeating: on a strongly Non-IID split (e.g. label-skewed MNIST), the
fancy methods don't always beat plain FedAvg by much — and sometimes the dominant lever is
just more communication rounds. Reporting the case where your method doesn't win is what
separates a lab from a brochure.

3. Differential privacy: the model still leaks

Keeping data on-device is not privacy. Model updates leak information about the data that
produced them — membership inference and gradient-inversion attacks reconstruct training samples
from gradients. To get a real guarantee you add differential privacy.

DP-SGD makes each training step private by:

Per-sample gradient clipping — bound each example's contribution to a max norm C.
Gaussian noise — add noise calibrated to C to the summed gradients.

The result is a formal (ε, δ) guarantee: the trained model is provably almost the same
whether or not any single example was in the data. The cost is the privacy–utility
trade-off — smaller ε (stronger privacy) means more noise and lower accuracy. There is no
free lunch; the contribution is measuring the curve, not claiming privacy is costless.

4. Secure aggregation: hide the individual update

DP bounds what the final model leaks. Secure aggregation addresses a different threat: a
curious server seeing each client's individual update. With secure aggregation, clients mask
their updates so the server can compute only the sum — no single client's contribution is
visible — yet the masks cancel in aggregate. DP (what the model leaks) and secure aggregation
(what the server sees) are complementary, not substitutes.

5. Why "from scratch" and "33/33 tests"

Privacy ML is exactly the domain where a subtly wrong implementation gives a false sense of
safety — a clipping bug or a miscalibrated noise multiplier silently voids the ε guarantee. So
the lab:

implements each algorithm from scratch (FedAvg / FedProx / SCAFFOLD, plus FedPer / Byzantine-robust / FedAdam / FedLoRA),
cross-validates against the literature so behaviour matches published results, and
ships 33/33 passing tests and explicit negative results.

For privacy and security work, the test suite and the reproduction are the credibility.

Takeaway

Federated learning moves the model, not the data — but on-device ≠ private. Non-IID data breaks
naive averaging (FedProx/SCAFFOLD help, sometimes only a little); DP-SGD buys a formal (ε, δ)
guarantee at a measurable accuracy cost; secure aggregation hides individual updates from the
server. The trustworthy version of all three is the one with the tests and the honest curves.

→ From-scratch implementations, tests, and negative results:
github.com/waynehacking8/federated-learning-lab