Series: The Learn Arc — 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 17: Session 2.3. This is Part 18.
The session
Chapter 2, §4. Session title: Action as inference. Route: /learn/session/2/s4_action_as_inference.
Chapter 2 has been about perception. Session 2.4 is the chapter's final move — the move that makes the rest of the book possible. It shows that the same free-energy machinery you used to infer Q(s) can be used to pick actions, by letting the observation itself become a variable.
The flip
In Sessions 2.1–2.3, the observation o was fixed. You saw it; the agent inferred.
In Session 2.4, the observation o becomes expected — a function of the policy you might pick:
o → o_π = o sampled under policy π
Re-derive free energy with o_π instead of o, and instead of a scalar F[Q, o] you get a functional parameterized by the policy:
F[Q, o_π] = "if I followed policy π, how surprising would the resulting observation be?"
Now minimize this with respect to the policy. The policy that minimizes expected surprise is the policy that makes you see observations you already believe.
That's action as inference. No reward function. No value network. Just surprise minimization, run forward in time.
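The flip above can be sketched numerically. A minimal, hypothetical example (the distributions, policy names, and helper function below are mine, not the workbench's): score each policy by the expected surprise of the observations it would produce under the agent's current beliefs, then pick the minimizer.

```python
import numpy as np

# Illustrative sketch of "action as inference": each policy induces a
# predicted observation distribution P(o | pi); expected surprise is the
# expectation of -log P(o) under that prediction. All numbers are made up.

belief_over_o = np.array([0.7, 0.2, 0.1])   # P(o): what the agent expects to see

policies = {
    "stay": np.array([0.8, 0.15, 0.05]),    # outcomes match beliefs well
    "wander": np.array([0.1, 0.3, 0.6]),    # outcomes clash with beliefs
}

def expected_surprise(p_o_given_pi, p_o):
    # E_{o ~ P(o|pi)}[-log P(o)]: how surprising pi's outcomes would be.
    return float(-(p_o_given_pi * np.log(p_o)).sum())

scores = {name: expected_surprise(p, belief_over_o) for name, p in policies.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)   # the policy whose outcomes match beliefs wins
```

The "stay" policy wins precisely because it makes the agent see observations it already believes, which is the session's point in three lines of arithmetic.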
Why this is a quiet revolution
Most of reinforcement learning is built around reward: a scalar signal you hand the agent, and the agent maximizes expected total reward. Active Inference has no reward. Instead it has a preference distribution P(o) (Chapter 3 calls it C), and actions are picked to make observations match preferences.
The reframing sounds subtle until you see what it buys:
Curiosity for free. Actions that reduce ambiguity about the world (epistemic actions) reduce expected surprise, so they get selected even without any explicit reward. An Active Inference agent explores because exploration minimizes the same functional that goal-seeking does.
Goals are distributions, not scalars. Want the agent to reach state X, avoid state Y, and stay near state Z? Put mass on all three in your C. Multi-objective agency, no Pareto frontier.
Behavioral equivalences get cheap. Because action minimizes expected free energy — and expected free energy factors into KL-to-preference plus entropy-of-sensor — you get a clean mapping from RL's reward-shaping tricks to Active Inference's C-shaping tricks. Chapter 3 makes this exact.
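The "goals are distributions" point admits a tiny sketch. Assuming an illustrative four-outcome space (the C values and policy names below are invented, not the book's), a multi-modal preference puts mass on every preferred outcome at once, and a policy's risk is the KL divergence from its predicted observations to C:

```python
import numpy as np

# Multi-modal preference distribution C over four outcomes: prefer
# outcomes 0 and 2, avoid 1 and 3. All values here are illustrative.
C = np.array([0.45, 0.05, 0.45, 0.05])

policies = {
    "go_left": np.array([0.85, 0.05, 0.05, 0.05]),  # mostly reaches outcome 0
    "go_bad": np.array([0.05, 0.85, 0.05, 0.05]),   # mostly lands in avoided outcome 1
}

def kl(p, q):
    # KL[p || q]: how far the predicted observations sit from the preferences.
    return float((p * np.log(p / q)).sum())

risk = {name: kl(p, C) for name, p in policies.items()}
print(risk)   # go_left's risk is far lower than go_bad's
```

No scalar reward was defined anywhere: "reach 0 or 2, avoid 1 and 3" is expressed entirely by where C's mass sits.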
The session's hero insight
"You pick actions the way you pick beliefs — by minimizing free energy."
One gradient. Two arguments (Q and π). Two cognitive operations (perceive and decide). One substrate.
Where it all goes
Session 2.4 is the launchpad for Chapter 3. Chapter 3's Expected Free Energy decomposes exactly this quantity — the F[Q, o_π] you just met — into risk (goal distance) and ambiguity (epistemic uncertainty). The decomposition tells you why an Active Inference agent explores when it needs information and exploits when it knows what to do, without needing two separate mechanisms.
Chapter 4 Eq. 4.14 then makes this concrete: softmax over −G(π) − F(π) gives you the policy posterior. Same F from Chapter 2, same decomposition from Chapter 3, one softmax. An Active Inference agent is this computation.
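As a rough sketch of that Eq. 4.14 shape (the G and F numbers below are invented for illustration, not taken from the book): the policy posterior is just a softmax over the negated sum of expected and variational free energy.

```python
import numpy as np

# Sketch of the Eq. 4.14 computation: Q(pi) = softmax(-G(pi) - F(pi)).
# G and F values per policy are made up for illustration.
G = np.array([2.0, 0.5, 1.2])   # expected free energy per policy
F = np.array([0.3, 0.4, 0.2])   # variational free energy per policy

logits = -(G + F)
q_pi = np.exp(logits) / np.exp(logits).sum()   # softmax: the policy posterior
print(q_pi)   # the policy with the lowest G + F gets the most mass
```

One softmax, two free energies, and the whole agent's decision rule fits in three lines.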
The first runnable demo
/cookbook/efe-decompose-epistemic-pragmatic is the recipe that takes Session 2.4's flip and shows it alive. The agent has two policies; for each, it computes expected free energy and splits it into risk + ambiguity. When you Run in Studio, you'll see the decomposition update per tick.
The session itself foreshadows this recipe — you don't run it yet. Session 2.4 sets up the argument; Chapter 3 earns it.
The concepts this session surfaces
- Action as inference — actions are variational parameters.
- Expected observation — o as a function of π.
- Expected surprise — what minimizing F[Q, o_π] targets.
- Preference distribution C — P(o) as the goal.
The quiz
Q: In Active Inference, "reward" maps to:
- ☐ A scalar signal from the environment.
- ☐ A preference distribution P(o) the agent tries to make true. ✓
- ☐ The policy posterior's entropy.
- ☐ The variational family's complexity.
Why: There is no scalar reward. The agent picks actions to make future observations match its preferred distribution C = P(o). Chapter 3 decomposes this into risk (goal distance) and ambiguity (epistemic uncertainty). Both fall out of the same gradient.
Run it yourself
- /learn/session/2/s4_action_as_inference — session page.
- /cookbook/efe-decompose-epistemic-pragmatic — the running demo for Chapter 3 ahead.
- /cookbook/epistemic-curiosity-driver — pure exploration, no goal.
- /cookbook/preference-goal-vs-avoid — multi-modal C in action.
- /equations — EFE family equations.
The mental move
The book has now shown you that one functional handles perception (Chapter 2), and it's about to show you the same functional handles action (Chapter 3). Before you continue, pause: this is the unification claim from Session 1.1 being defended in writing for the first time. Sessions 1.1 through 2.4 were setup. Session 3.1 is the payoff.
Next
Part 19: Session 3.1 — Expected Free Energy. Chapter 3's opening. One equation. Two terms. The value of a plan as a bill with two lines — risk + ambiguity. The reason Active Inference works.
⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license
📖 Active Inference, Parr, Pezzulo, Friston — MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference
← Part 17: Session 2.3 · Part 18: Session 2.4 (this post) · Part 19: Session 3.1 → coming soon