
ORCHESTRATE

Active Inference, The Learn Arc — Part 15: Session §2.1 — Inference as Bayes' Rule, and the Hole Free Energy Fills

Session 2.1 — Inference as Bayes' rule

Series: The Learn Arc — 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 14: Session 1.3. This is Part 15.

The session

Chapter 2, §1. Session title: Inference as Bayes' rule. Route: /learn/session/2/s1_inference_as_bayes.

You've done Chapter 1. You know the claim. You've seen one loop close in /world. Now Chapter 2 gets to work. Session 2.1 is where the book starts proving Session 1.1's one-liner — and it starts with the most familiar identity in probabilistic reasoning: Bayes' rule.

The hero identity

P(state | observation) = P(observation | state) · P(state) / P(observation)

Posterior equals likelihood-times-prior divided by evidence. That's Eq. 2.1 in the book. It's Session 2.1's entire topic in one line.

Three things to notice about that identity:

1. The numerator is trivial to compute. Given a model P(o|s) and a prior P(s), you multiply. Done.

2. The denominator is a nightmare. P(o) is the marginal — the probability of seeing this observation across every possible hidden state. In a world of 4 corridor cells, it's a sum of 4 terms. In a brain with billions of plausible states, the integral is intractable.

3. Without P(o) you cannot normalize. The numerator gives you proportional posterior mass. To get a probability distribution you need the normalizer. And the normalizer is the nightmare term.
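Those three points fit in a dozen lines. Here is a minimal sketch of Eq. 2.1 for the 4-cell corridor world; the specific probabilities are illustrative, not taken from the Workbench:

```python
# Bayes' rule over a tiny discrete world: 4 corridor cells.
prior = [0.25, 0.25, 0.25, 0.25]        # P(s): uniform over 4 cells
likelihood = [0.8, 0.1, 0.05, 0.05]     # P(o | s) for the observation just seen

joint = [l * p for l, p in zip(likelihood, prior)]  # numerator: trivial, just multiply
evidence = sum(joint)                               # P(o): 4 terms here; intractable at scale
posterior = [j / evidence for j in joint]           # normalize by the nightmare term

assert abs(sum(posterior) - 1.0) < 1e-12
```

With 4 cells the evidence is a 4-term sum. The point of the session is that this one line, `sum(joint)`, is the line that stops scaling.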

Session 2.1 doesn't solve the problem. It names the problem. The solution — variational free energy — arrives in Session 2.2.

What the session page shows you

Open /learn/session/2/s1_inference_as_bayes and the path-specific narration gives you the identity in your vocabulary:

Kid

Your brain plays a guessing game. It starts with a hunch, sees a clue, and updates. Every new clue changes the odds, but only a little. That's Bayes' rule.

Real-world

Bayes' rule says: take what you already believed (prior), multiply by how likely the evidence is if your belief were true (likelihood), normalize. Out pops what you should believe now (posterior). The catch: normalizing is hard.

Equation

P(s|o) = P(o|s) · P(s) / P(o). The numerator is tractable; the evidence P(o) requires marginalizing over all states. For interesting problems this sum is exponentially expensive — and this is the hole variational inference fills.

Derivation

Bayes' identity in its general form P(s|o) = P(o|s)P(s) / ∑_s' P(o|s')P(s') treats the evidence as a constant normalizer. In high-dimensional hidden-state spaces, that sum is a sum over configurations; in continuous spaces it's an integral. Both are generally intractable. The minimum-KL variational approach of Chapter 2's next section replaces the exact posterior with a tractable family Q and upper-bounds the negative log evidence (the "surprise") by F[Q] = D_KL(Q ‖ P(s|o)) − ln P(o).
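The bound in that last sentence can be checked numerically. The sketch below (hypothetical two-state numbers, not from the book) computes F[Q] in its expectation form E_Q[ln Q(s) − ln P(o, s)] and verifies that it equals the surprise −ln P(o) exactly when Q is the true posterior, and exceeds it otherwise:

```python
import math

# Hypothetical two-state example, fixed observation o.
prior = {"s1": 0.5, "s2": 0.5}
lik   = {"s1": 0.5, "s2": 0.9}   # P(o | s)

evidence = sum(lik[s] * prior[s] for s in prior)                # P(o)
posterior = {s: lik[s] * prior[s] / evidence for s in prior}    # P(s | o)

def free_energy(q):
    """F[Q] = E_Q[ln Q(s) - ln P(o, s)] = D_KL(Q || P(s|o)) - ln P(o)."""
    return sum(q[s] * (math.log(q[s]) - math.log(lik[s] * prior[s])) for s in q)

surprise = -math.log(evidence)
assert abs(free_energy(posterior) - surprise) < 1e-9   # bound is tight at Q = posterior
assert free_energy({"s1": 0.5, "s2": 0.5}) > surprise  # any other Q sits above the surprise
```

Minimizing F over Q therefore does two jobs at once: it drives Q toward the exact posterior (shrinking the KL term) and it tightens an upper bound on surprise, which is exactly the move Session 2.2 makes.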

The most important exercise in the chapter

Open /cookbook/bayes-one-step-coin in another tab. You'll see the minimum recipe that exercises Eq. 2.1:

  • Two hypotheses: {coin is fair, coin is biased}.
  • One observation: a single toss outcome, heads or tails.
  • One prior: uniform P(fair) = P(biased) = 0.5.
  • One likelihood: P(heads | fair) = 0.5, P(heads | biased) = 0.9.

Click Run in Studio and a tracked Jido agent instantiates with that prior. Feed it heads. The Eq. 4.13-style belief update (which in this minimal case reduces to Bayes' rule exactly) shifts the posterior toward "biased."
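The arithmetic behind that shift fits in a few lines. This is a plain-Python sketch of the cookbook recipe, not the Workbench's BEAM implementation:

```python
# One-step coin update: the /cookbook/bayes-one-step-coin numbers.
prior = {"fair": 0.5, "biased": 0.5}
lik_heads = {"fair": 0.5, "biased": 0.9}   # P(heads | hypothesis)

evidence = sum(lik_heads[h] * prior[h] for h in prior)             # P(heads) = 0.7
posterior = {h: lik_heads[h] * prior[h] / evidence for h in prior}

print(posterior)   # heads shifts belief toward "biased": ~0.643 vs ~0.357
```

A single heads moves the agent from 50/50 to roughly 64/36 in favor of "biased": one observation, one multiply, one normalize.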

That one step is Bayes' rule running. The Workbench's Glass trace labels the signal with equation_id: "eq_4_13_state_belief_update" (Eq. 4.13 is the general form; in this single-step, two-hypothesis case it collapses to Eq. 2.1). Click the equation label in Glass and you land on the equation page, chapter cited, verification status visible.

The concepts this session surfaces

Four glossary chips on Session 2.1's page:

  • posterior: P(s|o), the quantity we want.
  • likelihood: P(o|s), the observation model.
  • prior: P(s), what you believed before.
  • evidence: P(o), the intractable normalizer.

Every downstream chapter has opinions about at least three of these. Session 2.1 introduces them cleanly.

The linked Workbench stops

Session 2.1 points at:

The one-page history

Session 2.1 ends with a quiet history move: "Bayes' rule has been the right answer since 1763. The reason Active Inference is new is not that the rule changed, but that we finally have a tractable way to approximate the evidence term for real brains." The rest of Chapter 2 walks that claim.

The quiz

Q: The "hard part" of applying Bayes' rule at scale is:

  • ☐ Computing the prior.
  • ☐ Computing the likelihood.
  • ☐ Computing the evidence P(o). ✓
  • ☐ Normalizing doubles to probabilities.

Why: The prior and likelihood are modeling choices; you pick them. The evidence requires summing (or integrating) over every possible hidden state, which is exponentially expensive for interesting models. That's the problem Chapter 2's next session solves.

The mental move

When you next read a paper that says "we used variational inference to approximate the posterior" — this is what they mean. They had Bayes' rule, the evidence term was intractable, they swapped in a tractable Q and minimized divergence. Session 2.1 is the setup. Session 2.2 (in Part 16) is the payoff.

Run it yourself

Next

Part 16: Session §2.2 — *Why free energy?* It solves the problem Session 2.1 posed. Variational inference. KL divergence. The free-energy bound. One of the book's three big equations, derived line by line.


⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license

📖 Active Inference, Parr, Pezzulo, Friston — MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference

Part 14: Session 1.3 · Part 15: Session 2.1 (this post) · Part 16: Session 2.2 → coming soon
