Series: The Learn Arc — 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 15: Session 2.1. This is Part 16.
The session
Chapter 2, §2. Session title: Why free energy? Route: /learn/session/2/s2_why_free_energy.
Session 2.1 named the problem: P(o) is intractable. Session 2.2 is where the book solves it with one move — swap the exact posterior for a tractable Q(s) and measure the gap.
The move
Variational inference in one identity:
F[Q, o] = KL( Q(s) || P(s|o) ) − log P(o)
Two terms. The first is a KL divergence: non-negative by definition, and approaching zero as Q approaches the true posterior. The second is the surprise — the negative log evidence we couldn't compute.
Re-arrange:
F[Q, o] ≥ − log P(o)
So:
- Minimize F with respect to Q → Q approaches the true posterior (KL shrinks).
- The minimum value of F is an upper bound on surprise, tight exactly when Q equals the posterior.
That's Eq. 2.5 in the book. It is the single most important inequality in Active Inference. Everything after this is exploration of what you can do with it.
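The identity and the bound can be checked numerically in a few lines. A minimal sketch in plain Python (not the Workbench's BEAM code; the coin numbers are made up for illustration):

```python
import math

# Toy model: s ∈ {fair, biased}, observation o = heads.
# Prior P(s) and likelihood P(o = heads | s) are illustrative numbers.
prior = {"fair": 0.5, "biased": 0.5}
lik   = {"fair": 0.5, "biased": 0.9}

# With only two hypotheses we CAN compute P(o) and the exact posterior,
# which is what lets us verify the identity directly.
p_o  = sum(prior[s] * lik[s] for s in prior)
post = {s: prior[s] * lik[s] / p_o for s in prior}

def free_energy(q):
    """F[Q, o] = E_Q[log Q(s) - log P(o, s)] -- no P(o) needed."""
    return sum(q[s] * (math.log(q[s]) - math.log(prior[s] * lik[s]))
               for s in q if q[s] > 0)

def kl(q, p):
    return sum(q[s] * math.log(q[s] / p[s]) for s in q if q[s] > 0)

q = {"fair": 0.3, "biased": 0.7}          # an arbitrary variational Q

# The identity: F[Q, o] = KL(Q || P(s|o)) - log P(o).
assert abs(free_energy(q) - (kl(q, post) - math.log(p_o))) < 1e-12

# The bound: F >= -log P(o), with equality at the exact posterior.
assert free_energy(q) >= -math.log(p_o)
assert abs(free_energy(post) + math.log(p_o)) < 1e-12
```

Note that `free_energy` only ever touches the joint P(o, s) = P(s) P(o | s); the evidence `p_o` appears here purely to verify the identity against the exact answer.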
Why this buys everything
Three downstream wins, each of which gets an entire chapter later:
- Perception (Chapter 4, Eq. 4.13) — fix o, descend F with respect to Q(s). You get the posterior without ever computing P(o).
- Action (Chapter 3, Eq. 3.7 + Chapter 4, Eq. 4.14) — run the same argument forward in time with o as a variable. You get expected free energy — the value of a plan.
- Model evidence (Chapter 9) — the minimum value of F at convergence is the log Bayes factor comparing this model to an empty null. Fitting data to Active Inference models is just minimizing F twice: once over Q, once over the model's parameters.
Three jobs, one quantity, one gradient. The free-energy bound is the lever.
The linked lab
Session 2.2 is where the /cookbook/bayes-one-step-coin recipe stops being a toy. You're not just watching Bayes' rule — you're watching variational inference in miniature: the agent's Q is a categorical over two hypotheses. When you feed the agent heads, the free-energy gradient pulls Q from uniform toward "biased" — without ever enumerating the observation space. In this toy that doesn't matter; in any interesting case it's the whole game.
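That pull can be sketched in a few lines, under my own assumptions (plain Python with a finite-difference gradient and illustrative coin numbers; the Workbench's actual update code lives elsewhere):

```python
import math

# Q(biased) = sigmoid(theta); observing heads, descend F over theta.
# Prior and likelihood values are illustrative, not the Workbench's.
prior = {"fair": 0.5, "biased": 0.5}
lik   = {"fair": 0.5, "biased": 0.9}   # P(heads | s)

def F(theta):
    qb = 1 / (1 + math.exp(-theta))
    q = {"fair": 1 - qb, "biased": qb}
    # E_Q[log Q(s) - log P(o, s)]: only the observed o = heads appears.
    return sum(q[s] * (math.log(q[s]) - math.log(prior[s] * lik[s]))
               for s in q)

theta, lr, eps = 0.0, 0.5, 1e-6        # theta = 0 means uniform Q
for _ in range(200):
    grad = (F(theta + eps) - F(theta - eps)) / (2 * eps)
    theta -= lr * grad

qb = 1 / (1 + math.exp(-theta))
exact = (0.5 * 0.9) / (0.5 * 0.5 + 0.5 * 0.9)   # Bayes: P(biased | heads)
assert abs(qb - exact) < 1e-3
print(f"Q(biased) pulled from 0.5 to {qb:.3f}")
```

The loop only ever evaluates F, and F only needs the joint for the heads actually observed — nothing here enumerates the observation space.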
Glass labels the update signal with equation_id. Open /glass/agent/<id> during the run, click the equation label, read the full record. Chapter 2 stops being an abstract chapter and becomes a line of code you can trace.
The two vocabularies worth learning here
Session 2.2 introduces the two terms that show up in every Active Inference paper you'll read:
- Variational distribution Q — your tractable approximation to the true posterior. In the Workbench, Q lives in agent.state.beliefs.
- Free energy F — the functional being minimized. The word "free" is thermodynamic baggage; ignore the metaphor, trust the math.
When an engineering paper says "we trained the VAE by maximizing the ELBO," it is minimizing F under another name: the ELBO is just free energy with the sign flipped, ELBO = −F. The Active Inference and deep-learning communities have been running the same optimization under different names since 2014.
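The sign flip is easy to confirm term by term. A sketch with the same kind of two-hypothesis toy (illustrative numbers):

```python
import math

prior = {"fair": 0.5, "biased": 0.5}
lik   = {"fair": 0.5, "biased": 0.9}   # P(o = heads | s)
q     = {"fair": 0.3, "biased": 0.7}   # any variational Q

# F    = E_Q[log Q(s) - log P(o, s)]
# ELBO = E_Q[log P(o, s) - log Q(s)]
F    = sum(q[s] * (math.log(q[s]) - math.log(prior[s] * lik[s])) for s in q)
elbo = sum(q[s] * (math.log(prior[s] * lik[s]) - math.log(q[s])) for s in q)

assert abs(elbo + F) < 1e-12   # ELBO = -F, exactly
```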
The concepts this session surfaces
- KL divergence — D_KL(Q || P) = ∑ Q(s) log [Q(s) / P(s)]. Non-negative; zero iff Q = P.
- Jensen's inequality — why KL is non-negative.
- Variational family — the set of tractable Qs you're restricting to.
- Upper bound on surprise — the core property that makes F useful.
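The first of those properties (non-negative, zero iff Q = P) can be confirmed directly. A quick sketch with arbitrary distributions:

```python
import math

def kl(q, p):
    """D_KL(Q || P) = sum_s Q(s) log(Q(s) / P(s)); needs supp(Q) ⊆ supp(P)."""
    return sum(qs * math.log(qs / ps) for qs, ps in zip(q, p) if qs > 0)

assert kl([0.5, 0.5], [0.5, 0.5]) == 0.0   # zero iff Q = P
assert kl([0.9, 0.1], [0.5, 0.5]) > 0      # otherwise strictly positive
assert kl([0.5, 0.5], [0.9, 0.1]) > 0      # and note: not symmetric
```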
The quiz
Q: Minimizing F with respect to Q gives you:
- ☐ A better prior.
- ☐ An approximation to P(s|o) (the posterior). ✓
- ☐ The exact value of P(o).
- ☐ A discount on future reward.
Why: The minimum of F with respect to Q drives the KL-divergence term to zero, which means Q ≈ P(s|o). You do not compute P(o); you sidestep it by computing F.
Run it yourself
- /learn/session/2/s2_why_free_energy — the session page.
- /cookbook/bayes-one-step-coin — where F collapses to a Bayes step.
- /cookbook/vfe-decompose-complexity-accuracy — F split into its complexity and accuracy terms.
- /equations — filter by by_family: :vfe for the full VFE registry.
The mental move
Chapter 2's whole move, in one sentence: when you can't compute the posterior, compute something that bounds the surprise, then minimize the bound. Every interesting Active Inference paper after 2015 is a specialisation of that sentence.
Next
Part 17: Session §2.3 — The cost of being wrong. We look at what free energy actually means when your model is wrong. Where the surprise lives, how the bound tightens, and why bad models can still be used if you know what "bad" is costing you.
⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license
📖 Active Inference, Parr, Pezzulo, Friston — MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference
← Part 15: Session 2.1 · Part 16: Session 2.2 (this post) · Part 17: Session 2.3 → coming soon
