
ORCHESTRATE

Active Inference, The Learn Arc — Part 16: Session §2.2 — Why Free Energy, and What the Bound Buys You

Session 2.2 — Why free energy?

Series: The Learn Arc — 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 15: Session 2.1. This is Part 16.

The session

Chapter 2, §2. Session title: Why free energy? Route: /learn/session/2/s2_why_free_energy.

Session 2.1 named the problem: P(o) is intractable. Session 2.2 is where the book solves it with one move — swap the exact posterior for a tractable Q(s) and measure the gap.

The move

Variational inference in one identity:

F[Q, o]  =  KL( Q(s)  ||  P(s|o) )  −  log P(o)

Two terms. The first, a KL divergence, is ≥ 0 (Gibbs' inequality) and ≈ 0 when Q is close to the true posterior. The second is the surprise: the negative log evidence we couldn't compute.

Since the KL term is non-negative, drop it:

F[Q, o]  ≥  − log P(o)

So:

  • Minimize F with respect to Q → Q approaches the true posterior (the KL term shrinks).
  • The minimum value of F is an upper bound on surprise, tight when the variational family contains the posterior.

That's Eq. 2.5 in the book. It is the single most important inequality in Active Inference. Everything after this is exploration of what you can do with it.
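Eq. 2.5 is easy to check numerically. Below is a minimal sketch with a made-up two-state model; the numbers and names (`p_joint`, `free_energy`) are illustrative assumptions, not from the book or the Workbench. The point: F is computable from the joint P(s, o) alone, always sits at or above the surprise, and touches it exactly when Q is the posterior.

```python
import math

# Hypothetical two-state generative model: P(s, o) for one fixed observation o.
# These numbers are illustrative, not from the book or the Workbench.
p_joint = [0.3, 0.1]                     # P(s=0, o), P(s=1, o)
p_o = sum(p_joint)                       # P(o) = 0.4 (tractable here, intractable in general)
posterior = [p / p_o for p in p_joint]   # P(s|o) = [0.75, 0.25]

def free_energy(q):
    """F[Q, o] = E_Q[log Q(s) - log P(s, o)] -- needs only the joint, never P(o)."""
    return sum(qs * (math.log(qs) - math.log(pj)) for qs, pj in zip(q, p_joint))

def kl(q, p):
    """D_KL(Q || P)."""
    return sum(qs * math.log(qs / ps) for qs, ps in zip(q, p))

q = [0.5, 0.5]                           # an arbitrary variational distribution
surprise = -math.log(p_o)

# The identity: F = KL(Q || P(s|o)) + surprise, hence F >= surprise.
assert abs(free_energy(q) - (kl(q, posterior) + surprise)) < 1e-12
assert free_energy(q) >= surprise
# At Q = posterior the bound is tight: F = -log P(o).
assert abs(free_energy(posterior) - surprise) < 1e-12
```

In a real model you never form `posterior` or `p_o`; they appear here only so the identity can be verified against them.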

Why this buys everything

Three downstream wins, each of which becomes an entire chapter later:

  1. Perception (Chapter 4, Eq. 4.13) — fix o, descend F with respect to Q(s). You get the posterior without ever computing P(o).
  2. Action (Chapter 3, Eq. 3.7 + Chapter 4, Eq. 4.14) — run the same argument forward in time with o as a variable. You get expected free energy — the value of a plan.
  3. Model evidence (Chapter 9) — the minimum value of F at convergence is the log Bayes factor comparing this model to an empty null. Fitting Active Inference models to data is just minimizing F twice: once over Q, once over the model's parameters.

Three jobs, one quantity, one gradient. The free-energy bound is the lever.
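Win 1 can be sketched in a handful of lines: fix the observation, parameterize Q by a single logit, and run plain gradient descent on F. Everything here (the model numbers, the step size, the softmax parameterization) is an illustrative assumption, not the Workbench's actual update rule.

```python
import math

# Perception as descent on F: the observation is fixed inside p_joint,
# and Q is parameterized by one logit t. Numbers are made up.
p_joint = [0.3, 0.1]                      # P(s, o) for the observed o
posterior = [p / sum(p_joint) for p in p_joint]

def softmax2(t):
    """Two-point categorical Q parameterized by a single logit."""
    e0, e1 = math.exp(t), math.exp(-t)
    return [e0 / (e0 + e1), e1 / (e0 + e1)]

def F(q):
    """Free energy E_Q[log Q(s) - log P(s, o)]."""
    return sum(qs * (math.log(qs) - math.log(pj)) for qs, pj in zip(q, p_joint))

t = 0.0                                   # start at the uniform Q
for _ in range(2000):
    eps = 1e-6                            # central-difference gradient of F in t
    g = (F(softmax2(t + eps)) - F(softmax2(t - eps))) / (2 * eps)
    t -= 0.1 * g                          # descend the free energy

q = softmax2(t)
assert abs(q[0] - posterior[0]) < 1e-3    # Q converged to P(s|o) = [0.75, 0.25]
```

Note what never happens: P(o) is never computed. The descent only ever touches the joint.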

The linked lab

Session 2.2 is where the /cookbook/bayes-one-step-coin recipe stops being a toy. You're not just watching Bayes' rule; you're watching variational inference in its smallest form, where the agent's Q is a categorical over two hypotheses. When you feed the agent heads, the free-energy gradient pulls Q from uniform toward "biased" without ever enumerating the observation space. In this toy it doesn't matter; in any interesting case it's the whole game.
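A stripped-down sketch of the same idea, with assumed numbers (the 0.9 bias and the hypothesis labels are mine, not the recipe's): one update over a two-hypothesis categorical Q, driven only by the likelihood of the observation actually seen.

```python
# Toy version of the one-step-coin idea; numbers are illustrative,
# not the Workbench's actual recipe.
likelihood_heads = {"fair": 0.5, "biased": 0.9}   # P(o=heads | s)
q = {"fair": 0.5, "biased": 0.5}                  # uniform prior belief

def step(q, obs_lik):
    """One free-energy-minimizing update. With an unrestricted categorical Q
    over two hypotheses, the minimum of F coincides with the exact posterior."""
    joint = {s: q[s] * obs_lik[s] for s in q}     # prior x likelihood
    z = sum(joint.values())
    return {s: v / z for s, v in joint.items()}

q = step(q, likelihood_heads)
# After one heads, belief tilts toward "biased": 0.9 / (0.5 + 0.9)
assert abs(q["biased"] - 0.9 / 1.4) < 1e-12
```

Only the likelihood of heads is ever evaluated; the rest of the observation space never enters the update.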

Glass labels the update signal with equation_id. Open /glass/agent/<id> during the run, click the equation label, read the full record. Chapter 2 stops being an abstract chapter and becomes a line of code you can trace.

The two vocabularies worth learning here

Session 2.2 introduces the two terms that show up in every Active Inference paper you'll read:

  • Variational distribution Q — your tractable approximation to the true posterior. In the Workbench, Q lives in agent.state.beliefs.
  • Free energy F — the functional being minimized. The word "free" is thermodynamic baggage; ignore the metaphor, trust the math.

When an engineering paper says "we trained the VAE by maximizing the ELBO," it's minimizing F. The ELBO is just free energy with the sign flipped, so maximizing one minimizes the other. The Active Inference and deep-learning communities have been running the same optimization under different names since 2014.
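The sign flip is a one-liner to verify. A sketch using the same kind of made-up two-state joint as before (illustrative numbers only):

```python
import math

# ELBO = E_Q[log P(s,o) - log Q(s)] and F = E_Q[log Q(s) - log P(s,o)],
# so ELBO == -F exactly. Numbers are illustrative.
p_joint = [0.3, 0.1]
q = [0.6, 0.4]

elbo = sum(qs * (math.log(pj) - math.log(qs)) for qs, pj in zip(q, p_joint))
F = sum(qs * (math.log(qs) - math.log(pj)) for qs, pj in zip(q, p_joint))

assert abs(elbo + F) < 1e-12              # maximizing ELBO == minimizing F
assert elbo <= math.log(sum(p_joint))     # ELBO lower-bounds log P(o)
```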

The concepts this session surfaces

  • KL divergence — D_KL(Q || P) = ∑ Q(s) log [Q(s) / P(s)]. Non-negative; zero iff Q = P.
  • Jensen's inequality — why KL is non-negative.
  • Variational family — the set of tractable Qs you're restricting to.
  • Upper bound on surprise — the core property that makes F useful.
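The first two bullets can be spot-checked numerically. A minimal sketch (the helper names and random distributions are assumptions for illustration): KL stays non-negative across random distribution pairs and hits exactly zero when Q = P.

```python
import math
import random

def kl(q, p):
    """D_KL(Q || P) = sum_s Q(s) log(Q(s) / P(s))."""
    return sum(qs * math.log(qs / ps) for qs, ps in zip(q, p))

def rand_dist(n, rng):
    """A random categorical distribution over n outcomes."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    return [x / sum(w) for x in w]

rng = random.Random(0)
for _ in range(1000):
    q, p = rand_dist(4, rng), rand_dist(4, rng)
    assert kl(q, p) >= 0.0        # Gibbs' inequality, proved via Jensen
p = rand_dist(4, rng)
assert kl(p, p) == 0.0            # zero iff Q = P
```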

The quiz

Q: Minimizing F with respect to Q gives you:

  • ☐ A better prior.
  • ☐ An approximation to P(s|o) (the posterior). ✓
  • ☐ The exact value of P(o).
  • ☐ A discount on future reward.

Why: Minimizing F with respect to Q drives the KL-divergence term toward zero, which means Q ≈ P(s|o). You do not compute P(o); you sidestep it by computing F.


The mental move

Chapter 2's whole move, in one sentence: when you can't compute the posterior, compute something that bounds the posterior, then minimize the bound. Every interesting Active Inference paper after 2015 is a specialisation of that sentence.

Next

Part 17: Session §2.3 — The cost of being wrong. We look at what free energy actually means when your model is wrong: where the surprise lives, how the bound tightens, and why bad models can still be used if you know what "bad" is costing you.


⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license

📖 Active Inference, Parr, Pezzulo, Friston — MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference

Part 15: Session 2.1 · Part 16: Session 2.2 (this post) · Part 17: Session 2.3 → coming soon
