Gaussian Mixture Models: Soft Clustering with the EM Algorithm

#machinelearning #ai #beginners #datascience

K-Means is the clustering algorithm everyone learns first. But it makes two strong assumptions: every point belongs fully to exactly one cluster (hard assignment), and clusters are round blobs. Real data breaks both. Gaussian Mixture Models (GMMs) relax them — elliptical clusters, and soft probabilistic membership — and they're fit with the elegant EM algorithm.

Data as a mixture of Gaussians

A GMM assumes your data was generated by several Gaussian "bell curves" mixed together. Each component has a mean (centre), a covariance matrix (its shape — how wide, tall, and tilted), and a mixing weight (what fraction of the data it explains). Fitting the model means finding those.

The chicken-and-egg problem

To place the Gaussians you'd need to know which points belong to which cluster — but to assign points you'd need the Gaussians. EM breaks this loop: guess, softly assign, re-estimate, repeat.

E-step: responsibilities

For each point and cluster, compute weight × Gaussian density, then normalise so a point's numbers sum to 1. Those are the responsibilities — soft assignments. A point deep in one cluster gets ~[1,0,0]; a point between two gets [0.5, 0.5, 0].

const r = model.map(k => k.weight * gauss(p, k.mean, k.cov));
const s = r.reduce((a,b)=>a+b, 0);
const resp = r.map(v => v / s);   // sums to 1

M-step: re-fit each Gaussian

Treat the responsibilities as soft counts and re-estimate each Gaussian: the new mean is a responsibility-weighted average of the points; the new covariance is the weighted spread around it (this is what lets the ellipse stretch and tilt); the new weight is its share of total responsibility.

Why the log-likelihood always climbs

EM has a beautiful guarantee: each full E+M iteration never decreases the data's log-likelihood. It rises and plateaus — so watch it to detect convergence. It can settle into a local optimum depending on the random start, which is why people run it a few times and keep the best.

GMM vs K-Means

K-Means is actually a special case of GMM (force spherical, equal covariances and hard assignments). GMM is strictly more expressive — elliptical, differently-sized, overlapping clusters, plus a measure of uncertainty. You still choose K; use BIC or AIC to compare.

Watch the ellipses grow to fit 2D data and points blend colours where clusters overlap:
https://dev48v.infy.uk/ml/day22-gmm-em.html