Series: The Learn Arc — 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 2: Chapter 1. This is Part 3.
The hero line
The Workbench's canonical metadata renders Chapter 2 as:
From Bayes' rule to variational free energy — the minimal machinery.
That is the whole chapter. From an identity everyone already knows — Bayes' rule — to the single functional that the rest of the book runs on. Everything you'll hear about "the Free Energy Principle" on a podcast is built from four lines of algebra, and this chapter writes them down.
Start where you already are
If you've ever estimated anything from noisy data, you already know the identity at the top of the chapter:
P(state | observation) = P(observation | state) · P(state) / P(observation)
That's Bayes' rule, Eq. 2.1 in the book. The posterior on the left. The likelihood × prior in the numerator. The evidence in the denominator.
The evidence — P(observation) — is the villain of the whole chapter. It's an integral over every possible state the world could be in. In a maze with a few cells you can write it down. In a brain with billions of neurons, you absolutely cannot.
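The two-hypothesis coin makes this concrete. A minimal sketch in Python; the 0.8 heads-bias and the hypothesis labels are illustrative choices, not values taken from the book or the Workbench:

```python
# One step of Bayes' rule (Eq. 2.1) on a two-hypothesis coin.
# The 0.8 bias and the labels are illustrative, not the book's numbers.

prior = {"fair": 0.5, "biased": 0.5}             # P(state)
likelihood_heads = {"fair": 0.5, "biased": 0.8}  # P(heads | state)

# The evidence: a sum over every possible state.
# Trivial with two states, intractable at scale.
evidence = sum(likelihood_heads[s] * prior[s] for s in prior)

posterior = {s: likelihood_heads[s] * prior[s] / evidence for s in prior}
print(posterior)  # mass slides toward "biased"
```

With two states the evidence is a two-term sum; the chapter's whole problem is that in any realistic state space that sum becomes an integral nobody can do.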
Chapter 2 is the book's answer to that problem.
The cookbook has 50 variants of this
Before we walk through the math, open /cookbook and you'll see the Workbench's running inventory of every Active Inference flavor the book covers, from Bayes on a coin all the way to hierarchical predictive coding.
For this chapter, the one to watch is bayes-one-step-coin. One coin, one observation, one posterior. Zero moving parts other than the math itself.
Each recipe card in the Workbench has:
- A Math block with the relevant equation in LaTeX.
- A four-audience explanation (kid / real-world / equation / derivation) that matches your learning path.
- A Runtime section listing the exact spec, world, horizon, policy depth, and preference strength.
- Cross-references to the book equations, the sessions, the labs.
- Three clickable Run in Builder / Labs / Studio buttons — each boots a real Jido agent against a real world.
That cross-reference block matters. The equations aren't wallpaper; they're hyperlinks. Click /equations/vfe and you land on the full record for variational free energy — chapter, section, symbols table, dependencies, verification status.
The "low road" in four beats
Here's the chapter, compressed without losing shape.
Beat 1: sidestep the evidence. The denominator in Bayes' rule is intractable. So instead of computing P(observation) exactly, pick a simpler distribution Q(state) and measure how far off Q is from the true posterior. The measure is KL divergence.
Beat 2: define free energy. Introduce a quantity F[Q, observation] that equals the KL divergence between Q and the true posterior plus the negative log evidence — the surprise of the observation.
F = KL( Q(state) ‖ P(state | observation) ) − log P(observation)
That's Eq. 2.5. Two terms. The KL term contains the true posterior, so you can't compute it directly, but it is non-negative by definition. That non-negativity is the whole lever.
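The identity in Eq. 2.5 can be checked numerically. A minimal sketch on an illustrative two-state coin world (the 0.5/0.8 numbers and labels are my choices, not the book's): F computed from the joint alone decomposes exactly into KL plus surprise.

```python
import math

# Checking Eq. 2.5 on an illustrative two-state coin world.
prior      = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": 0.5, "biased": 0.8}   # P(heads | state)

evidence  = sum(likelihood[s] * prior[s] for s in prior)           # P(o)
posterior = {s: likelihood[s] * prior[s] / evidence for s in prior}

Q = {"fair": 0.7, "biased": 0.3}  # an arbitrary (wrong) approximate posterior

# Computable form: F = E_Q[log Q(s) - log P(s, o)] -- needs only the joint,
# never the evidence or the true posterior.
F = sum(q * (math.log(q) - math.log(likelihood[s] * prior[s]))
        for s, q in Q.items())

# Eq. 2.5's decomposition: F = KL(Q || posterior) + surprise.
kl       = sum(q * math.log(q / posterior[s]) for s, q in Q.items())
surprise = -math.log(evidence)
assert abs(F - (kl + surprise)) < 1e-12
```

The point the code makes: the left-hand side needs only the joint P(state, observation), which the generative model gives you for free, while each right-hand term separately needs the intractable evidence.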
Beat 3: the inequality. Because KL divergence is ≥ 0:
F ≥ − log P(observation) = surprise
So minimizing F with respect to Q squeezes Q toward the true posterior AND upper-bounds your surprise at the observation. Two birds, one gradient step. That is Eq. 2.6 and it is the single most important inequality in the book.
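You can watch Eq. 2.6 happen by brute force. A sketch on an illustrative two-state coin world (0.5/0.8 likelihoods, my numbers): sweep Q over its one free parameter and F bottoms out at the true posterior, where the bound is tight.

```python
import math

# Sweeping Q(biased) = q over a grid to watch F >= surprise (Eq. 2.6),
# with equality when Q equals the true posterior. Illustrative numbers.
prior = {"fair": 0.5, "biased": 0.5}
lik   = {"fair": 0.5, "biased": 0.8}        # P(heads | state)
joint = {s: lik[s] * prior[s] for s in prior}   # P(s, heads)
surprise = -math.log(sum(joint.values()))       # -log P(heads)

def F(q):
    """Free energy of Q = {biased: q, fair: 1 - q} given one heads."""
    Q = {"biased": q, "fair": 1 - q}
    return sum(p * (math.log(p) - math.log(joint[s])) for s, p in Q.items())

grid = [i / 1000 for i in range(1, 1000)]
assert all(F(q) >= surprise - 1e-12 for q in grid)  # the bound holds everywhere

best_q = min(grid, key=F)
print(best_q)                # close to P(biased | heads) = 0.4 / 0.65
print(F(best_q) - surprise)  # close to 0: the bound is tight at the posterior
```

No gradient machinery here, just a grid, but the shape is the chapter's claim: descending F in Q is the same thing as doing Bayesian inference.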
Beat 4: action enters. Now swap the observation for one you haven't seen yet, re-run the same argument, and you get expected free energy — the value of a plan. But that's Chapter 3. Chapter 2 stops exactly here, having shown that belief updating is free-energy minimization in Q.
Why this matters for everything after
Three claims fall out for free — each one is an entire downstream chapter:
- Perception is free-energy minimization in *Q*. (Chapter 2's concluding move.)
- Action is free-energy minimization in the *observation* — the agent picks actions that produce the observations it expects to see. (Chapter 3.)
- Learning is free-energy minimization in the *parameters of the generative model* — A, B, C, D matrices getting updated by sufficient statistics. (Chapters 4 and 7.)
Three faces of one functional. Every subsequent chapter in the book picks one of those faces and zooms in.
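The learning face fits in a few lines too. This is a toy Dirichlet-count update in the spirit of the A-matrix learning the book describes in Chapters 4 and 7; the shapes, names, and numbers are illustrative, not the Workbench's API.

```python
import numpy as np

# Toy sufficient-statistics update for a likelihood matrix A = P(o | s).
# Rows index observations, columns index states. Illustrative only.
a_counts = np.ones((2, 2))   # flat Dirichlet counts to start

def learn(obs_idx: int, state_belief: np.ndarray) -> None:
    """Add the posterior state belief to the counts for the seen outcome."""
    a_counts[obs_idx] += state_belief

# One datum: saw observation 0 while believing mostly in state 1.
learn(0, np.array([0.38, 0.62]))

# Normalize columns to recover the learned likelihood P(o | s).
A = a_counts / a_counts.sum(axis=0, keepdims=True)
print(A)
```

Same functional, different argument: here the thing being nudged is the model's parameters rather than the belief Q or the observation.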
Run it yourself
Open /cookbook/bayes-one-step-coin and scroll to the Math block. You'll see p(h | d) ∝ p(d | h) p(h). That's it. One identity.
Now click Run in Studio. A tracked agent instantiates with a uniform prior over two hypotheses {coin is fair, coin is biased}. You feed it one observation (heads). The posterior updates. The probability mass slides toward "biased." That's perception in its minimal form — the likelihood ratio P(heads | biased) / P(heads | fair) doing one step of work.
Every subsequent recipe under the perception-* and bayes-* tags scales this up:
- bayes-sequential-urns — sequential evidence; 5 urns, update as observations arrive.
- bayes-odds-log-linear — reframe Bayes as summing log-odds; gives you intuitions for neurons as evidence accumulators.
- perception-sweep-iteration-budget — what happens when the agent can only afford k message-passing iterations per step.
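The log-odds reframing behind bayes-odds-log-linear fits in a few lines. A sketch assuming illustrative fair/biased coin probabilities (0.5 and 0.8, my numbers, not the recipe's): each flip adds a constant log-likelihood ratio, so sequential Bayes becomes pure evidence accumulation.

```python
import math

# Bayes as log-odds accumulation: one additive update per observation.
# Coin probabilities are illustrative, not the recipe's values.
p_heads = {"fair": 0.5, "biased": 0.8}

def accumulate(observations: str) -> float:
    """Return P(biased | data) after a string of 'H'/'T' flips."""
    log_odds = 0.0   # log[P(biased) / P(fair)]; uniform prior => 0
    for obs in observations:
        p_b = p_heads["biased"] if obs == "H" else 1 - p_heads["biased"]
        p_f = p_heads["fair"]   if obs == "H" else 1 - p_heads["fair"]
        log_odds += math.log(p_b / p_f)   # constant increment per datum
    return 1 / (1 + math.exp(-log_odds)) # back to a probability

print(accumulate("HHHHT"))  # five flips, mostly heads: belief tilts to biased
```

The additive form is why this recipe feeds the neurons-as-evidence-accumulators intuition: a running sum is something a membrane potential can plausibly do.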
All 50 recipes are under /cookbook. Filter by tag bayesian for the first eight, then by perception for the next batch.
The Glass Engine point
Every signal the agent emits while running any of these recipes is tagged with the equation that produced it. Open /glass/agent/<id> during a run and you'll see the posterior update, labeled with Eq. 4.13 (state-belief update) inline. The codebase can't tell you the story Parr/Pezzulo/Friston are telling in words; but it can show you which line of math fired at which step, and that's the bridge between reading the chapter and trusting it.
The mental move
Bayes' rule is not about coins or urns. It's about how evidence updates belief. Chapter 2's move is to turn that one-liner into a quantity you can actually minimize — because brains don't compute integrals, they descend gradients.
Minimize F[Q, o] → you get the posterior (perception).
Minimize F[Q, policy] → you get a plan (Chapter 3).
Minimize F[θ, data] → you get learning (Chapters 4 and 7).
One functional. Three jobs. That's "the low road."
Next
Part 4: Chapter 3 — The High Road to Active Inference. We add time. One plan has two columns of cost — risk (pragmatic: your observations don't match your preferences) and ambiguity (epistemic: you don't know enough yet). Expected Free Energy. Softmax over policies. Why an Active Inference agent is simultaneously curious and goal-directed with no extra knobs. Runnable as /cookbook/efe-decompose-epistemic-pragmatic.
⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license
📖 Active Inference, Parr, Pezzulo, Friston — MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference
← Part 2: Chapter 1 · Part 3: The Low Road (this post) · Part 4: The High Road →


