
ORCHESTRATE


Active Inference, The Learn Arc — Part 19: Session §3.1 — Expected Free Energy in One Page

Session 3.1 — Expected Free Energy

Series: The Learn Arc — 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 18: Session 2.4. This is Part 19.

The session

Chapter 3, §1. Session title: Expected Free Energy. Route: /learn/session/3/s1_expected_free_energy.

Session 2.4 set it up. Session 3.1 delivers. The thing an Active Inference agent minimizes when choosing between plans is called Expected Free Energy, written G(π). It's what makes the agent's policies honest, curious, and goal-seeking all at once.

The equation

G(π)  =  E_Q[ log Q(s|π)  −  log P(o, s | π) ]

That's it. Chapter 2's variational free energy, re-derived over observations you haven't seen yet (because you haven't acted yet).
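
Read literally, the expectation runs over the predicted joint Q(s|π) P(o|s). Here is a toy numeric sketch of the definition — my own two-state, two-observation numbers, not the Workbench's model:

```python
import numpy as np

# Hypothetical toy model: 2 states, 2 observations (illustration only).
Q_s = np.array([0.7, 0.3])             # Q(s|π): predicted states under policy π
P_o_given_s = np.array([[0.9, 0.1],    # P(o|s): sensor model, rows = states
                        [0.2, 0.8]])
P_s = np.array([0.5, 0.5])             # prior P(s|π)

# G(π) = E_Q[ log Q(s|π) − log P(o, s|π) ], expectation under Q(s|π) P(o|s)
G = 0.0
for s in range(2):
    for o in range(2):
        joint = P_o_given_s[s, o] * P_s[s]   # P(o, s|π)
        G += Q_s[s] * P_o_given_s[s, o] * (np.log(Q_s[s]) - np.log(joint))
print(G)
```

Nothing here is specific to a grid world or a robot; any generative model with those three arrays gives you a G per policy.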

Chapter 3 walks you through the same algebra twice — once in information-theoretic form, once in ELBO-style form — arriving at the same decomposition we previewed in Part 4:

G(π)  =  RISK               +  AMBIGUITY
      =  KL[ Q(o|π) ‖ C ]   +  E_Q[ H[ P(o|s) ] ]
  • Risk — how far your expected observations under π deviate from your preferences C.
  • Ambiguity — how much uncertainty about the world your plan doesn't resolve.

Minimize the sum. That's the agent's objective. Softmax over −G(π) gives you a policy posterior. Sample or argmax.
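
As a sketch with invented numbers — assuming each policy's Q(o|π) and ambiguity have already been computed — the decomposition and the softmax look like:

```python
import numpy as np

# Illustration only: two candidate policies with assumed summary statistics.
C = np.array([0.8, 0.2])                      # preferences over observations
Q_o = np.array([[0.7, 0.3],                   # Q(o|π) for each policy
                [0.4, 0.6]])
ambiguity = np.array([0.3, 0.9])              # E_Q[H[P(o|s)]] per policy (assumed)

risk = np.sum(Q_o * np.log(Q_o / C), axis=1)  # KL[ Q(o|π) ‖ C ]
G = risk + ambiguity                          # G(π) = risk + ambiguity

q_pi = np.exp(-G) / np.exp(-G).sum()          # policy posterior: softmax over −G
print(q_pi)
```

Policy 0 matches the preferences more closely and resolves more uncertainty, so it ends up with lower G and most of the posterior mass.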

The first time you see the two columns together

Open /cookbook/efe-decompose-epistemic-pragmatic and press Run in Studio. The UI shows each candidate policy's G, split into its two summands. You watch the policy posterior reorder as the agent approaches its goal — risk falls first, then ambiguity falls as the world gets explored, until one policy dominates.

That side-by-side visibility is the whole reason Chapter 3 earned its hero line. The theory predicts specific trade-offs; the Workbench shows them live.

The three shapes of a policy

Session 3.1 introduces a vocabulary that all subsequent chapters rely on:

  • Pragmatic policy — risk dominates. The agent is heading for the goal.
  • Epistemic policy — ambiguity dominates. The agent is exploring to disambiguate.
  • Mixed policy — both terms active. The agent is balancing information and reward.

An Active Inference agent doesn't switch between modes. It runs the same softmax in all three. When information is scarce, ambiguity is large and epistemic plans win. When the world is well-characterized, risk drives behavior. The transition is smooth, one gradient.
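
That "same softmax, smooth transition" claim can be sketched with made-up numbers: hold risk fixed and let ambiguity shrink as the world gets characterized.

```python
import numpy as np

# Illustration (invented numbers): plan 0 explores, plan 1 heads for the goal.
risk = np.array([1.0, 0.2])

def policy_posterior(ambiguity):
    G = risk + ambiguity          # same objective in every regime
    p = np.exp(-G)
    return p / p.sum()

# Information-scarce: the goal-directed plan leaves lots of uncertainty unresolved.
early = policy_posterior(np.array([0.1, 1.5]))   # epistemic plan dominates
# Well-characterized world: ambiguity is small everywhere, risk decides.
late = policy_posterior(np.array([0.1, 0.1]))    # pragmatic plan dominates
print(early, late)
```

No mode switch, no schedule: the posterior reorders purely because one summand of G changed.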

Why EFE is not just "RL with an exploration bonus"

This is the question most engineers ask on first encounter. There's a one-sentence answer and a one-paragraph answer.

One-sentence: exploration bonuses are arbitrary scalars you add to a reward; EFE's epistemic term is a principled consequence of the Bayesian formulation and is measured in the same units as the pragmatic term.

Paragraph: in RL you pick an exploration strategy (ε-greedy, upper confidence bound, intrinsic curiosity, RND) and you tune a scalar that balances it against reward. EFE derives the balance from the generative model's structure. The "exploration term" is literally the entropy of the sensor model averaged over your posterior over states. There's no tunable ε — the trade-off is set by P(o|s) itself, and changing P(o|s) (by learning, Chapter 7) smoothly changes the agent's exploration behavior. Nothing arbitrary.
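
To see why there is no tunable ε in this picture, note that the epistemic term is just E_Q[H[P(o|s)]]. A sketch with hypothetical sensor matrices: sharpening P(o|s) — for example, by the learning covered in Chapter 7 — shrinks the term with no knob to turn.

```python
import numpy as np

def ambiguity(P_o_given_s, Q_s):
    """E_Q[H[P(o|s)]]: sensor-model entropy averaged over the state posterior."""
    H = -np.sum(P_o_given_s * np.log(P_o_given_s), axis=1)  # per-state entropy
    return Q_s @ H

Q_s = np.array([0.5, 0.5])
noisy = np.array([[0.6, 0.4], [0.4, 0.6]])      # uninformative sensor
sharp = np.array([[0.95, 0.05], [0.05, 0.95]])  # same sensor after learning
print(ambiguity(noisy, Q_s), ambiguity(sharp, Q_s))
```

The exploration drive falls out of the model itself: a noisier sensor yields a larger epistemic term, a sharper one a smaller term, in the same nats that measure risk.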

Where this lands in the Workbench

The Workbench implements G(π) via the Plan action (AgentPlane.Actions.Plan). For each candidate policy, the action computes the expected free energy using the agent's current Q(s) and the C matrix from the spec. The result is a vector of G values indexed by policy. Eq. 4.14 softmaxes this to get Q(π).

Every signal the Plan action emits is tagged equation_id: "eq_4_14_policy_posterior" and carries the per-policy {F_pi, G_pi, risk, ambiguity} tuple. Glass renders this so you can click any policy-posterior signal and read the decomposition for that specific tick.

The concepts this session surfaces

  • Expected Free Energy (G) — the functional Chapter 3 derives.
  • Risk — KL from expected observations to preferences.
  • Ambiguity — expected entropy of the sensor.
  • Policy posterior Q(π) — softmax over −G − F.
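
Since the Plan signals carry both F_pi and G_pi per policy, that last quantity can be sketched as follows (invented values, not Workbench output):

```python
import numpy as np

# Hypothetical per-policy values for illustration only.
F = np.array([0.4, 0.6, 0.5])   # variational free energy per policy
G = np.array([1.3, 0.7, 1.1])   # expected free energy per policy

q_pi = np.exp(-(G + F))         # softmax over −G − F
q_pi /= q_pi.sum()
print(q_pi)
```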

The quiz

Q: The RISK term in Expected Free Energy measures:

  • ☐ How far the agent is from the goal in units of distance.
  • ☐ The KL divergence between expected observations under a policy and the preferred distribution C. ✓
  • ☐ The entropy of the sensor model.
  • ☐ The agent's uncertainty about the next state.

Why: Risk in Chapter 3 is a KL divergence, not a metric distance. If the preferred distribution C places mass on "reaching cell X" and the policy's expected observations place mass elsewhere, risk is large. This is the "pragmatic" line of the bill.

Run it yourself

Open /cookbook/efe-decompose-epistemic-pragmatic in Studio and press Run, or step through the session at /learn/session/3/s1_expected_free_energy.

The mental move

Chapter 3's first move is audacious: write the value of a plan as one quantity that simultaneously captures goal-seeking and curiosity. The proof is in Sessions 3.1 through 3.3. This session is the statement. You now have the equation you're going to decompose for the rest of the chapter.

Next

Part 20: Session §3.2 — Epistemic vs pragmatic value. We unpack the decomposition one term at a time. What "risk" does when the world is well-known. What "ambiguity" does when it isn't. And the crossover point where they trade off.


⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license

📖 Active Inference, Parr, Pezzulo, Friston — MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference

Part 18: Session 2.4 · Part 19: Session 3.1 (this post) · Part 20: Session 3.2 → coming soon
