Series: The Learn Arc — 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 7: A Recipe for Designing. This is Part 8.
The hero line
POMDPs in full colour — message passing, Dirichlet learning, hierarchy.
Chapter 7 is the longest chapter in the book, and for good reason. It takes the machinery of Chapters 1–6 and turns up every knob. Planning gets deeper (sophisticated tree search). Learning gets real (online Dirichlet updates to the A and B matrices). Composition gets recursive (hierarchical agents whose high level is another agent's low level).
If you only read one chapter of Active Inference with a runtime open in the other tab, read this one.
Three additions stacked on the Chapter 6 template
Every agent in Chapter 7 still fits the six-question template from Chapter 6. But three engines get new teeth.
Teeth 1 — Sophisticated planning
Chapter 6 planned over policies of a fixed horizon with a flat expansion. Chapter 7 introduces belief-propagated tree search: propagate beliefs forward through each candidate action, then score the leaves by G, then propagate the scores back up the tree, weighted by the likelihood of actually landing in that branch under your current model.
The result is a policy posterior that reflects which plans are likely to be on-path given the observations you'll actually collect. Shallow plans that look great on paper can get down-weighted because the belief-propagation finds they're unlikely to survive their own noise.
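As a loose illustration of the recursion — in Python rather than the workbench's Elixir, with a simplified one-step G and made-up shapes — the back-propagated scoring can be sketched like this:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def expected_free_energy(q_s, A, C):
    """One-step G for a predicted state belief q_s:
    risk (divergence of predicted outcomes from preferences C)
    plus ambiguity (expected outcome entropy). Illustrative form only."""
    q_o = A @ q_s                                   # predicted outcome distribution
    risk = q_o @ (np.log(q_o + 1e-16) - np.log(C + 1e-16))
    H_A = -(A * np.log(A + 1e-16)).sum(axis=0)      # outcome entropy per state
    ambiguity = q_s @ H_A
    return risk + ambiguity

def sophisticated_G(q_s, A, B, C, depth):
    """Recursive tree search: score each action by its immediate G plus
    the softmin-weighted G of the subtree it opens up. (The full
    algorithm also branches over predicted observations; omitted here.)"""
    n_actions = B.shape[2]
    G = np.zeros(n_actions)
    for u in range(n_actions):
        q_next = B[:, :, u] @ q_s                   # propagate belief forward
        G[u] = expected_free_energy(q_next, A, C)
        if depth > 1:
            G_sub = sophisticated_G(q_next, A, B, C, depth - 1)
            # weight continuations by how likely the agent is to choose them
            G[u] += softmax(-G_sub) @ G_sub
    return G
```

The key line is the last one: deeper branches only contribute in proportion to the probability of actually taking them, which is what lets unpromising subtrees fade rather than dominate the score.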
/cookbook/sophisticated-plan-tree-search runs an agent with SophisticatedPlanner as its Plan block. You'll watch deeper policies emerge — plans that commit epistemic actions early, exploit later, and survive worlds that punish naive greedy search.
Companion recipes:
- /cookbook/sophisticated-plan-vs-naive — side-by-side with the Chapter-4 default planner.
- /cookbook/sophisticated-plan-prune — how aggressive pruning changes which branches survive.
- /cookbook/sophisticated-plan-commitment — when committing to a plan early buys you more than re-evaluating every tick.
Teeth 2 — Dirichlet learning (Eq. 7.10)
The agent arrives with priors over A and B. Chapter 7 shows what happens when those priors update every time the agent sees something.
A Dirichlet distribution over the columns of A is a conjugate prior for a categorical likelihood. When the agent observes an outcome, the Dirichlet parameters update by simple addition: add the co-occurrence you just saw — in practice, the outer product of the observed outcome with the posterior state belief — to the corresponding counts. After enough data, the posterior over A converges to the true P(o|s) of the world.
Same story for B — the agent builds a better transition model from its own trajectories.
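A minimal Python sketch of the count-adding update — illustrative only, not the workbench code; shapes, names, and the exact-state assumption are all mine:

```python
import numpy as np

def update_dirichlet_A(a, obs_onehot, q_s):
    """Eq. 7.10-style update: add the outer product of the observed
    outcome and the posterior state belief to the Dirichlet counts."""
    return a + np.outer(obs_onehot, q_s)

def expected_A(a):
    """Posterior expectation of A: normalise counts column-wise."""
    return a / a.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
true_A = np.array([[0.9, 0.2],
                   [0.1, 0.8]])           # true P(o|s) of the world
a = np.ones((2, 2))                       # diffuse Dirichlet prior
for _ in range(2000):
    s = rng.integers(2)
    o = rng.choice(2, p=true_A[:, s])
    q_s = np.eye(2)[s]                    # assume the state is inferred exactly
    a = update_dirichlet_A(a, np.eye(2)[o], q_s)
print(expected_A(a))                      # columns converge toward true_A
```

With a soft state posterior instead of a one-hot q_s, the same line spreads fractional counts across states — which is exactly why learning and inference interact.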
/cookbook/dirichlet-learn-a-matrix instantiates an agent with a diffuse Dirichlet prior on A, runs it for N ticks against a noisy world, and you watch the posterior A sharpen column by column. The Glass trace labels each update equation_id: "eq_7_10_dirichlet_a" — you can audit every sample that shifted a count.
Companion recipes:
- /cookbook/dirichlet-learn-b-matrix — the same update for transitions.
- /cookbook/dirichlet-concentration-prior-effect — how prior strength trades off against data rate.
- /cookbook/dirichlet-forget-then-relearn — what happens when the world changes; forgetting rates and how to tune them.
- /cookbook/dirichlet-learn-and-plan-simultaneously — planning under a moving model.
Teeth 3 — Hierarchy
The final move of Chapter 7 is the one that scales. Take the agent's current level and treat its state beliefs as another level's observations. The higher level infers over slow-changing latents (contexts, intentions, task identity) whose job is to modulate the lower level's A/B matrices.
The math is just another nested application of Eq. 4.13. The architectural win: hierarchical agents can reason about what task they're in while still acting fluently within it.
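One way to picture the modulation — a Python sketch under assumed shapes, not the workbench's actual mechanism: the high level's belief over contexts mixes a family of low-level transition models.

```python
import numpy as np

def modulated_B(q_context, B_per_context):
    """The high level's context belief q(c) mixes the low level's
    transition models: B_low = sum_c q(c) * B_c."""
    return np.tensordot(q_context, B_per_context, axes=1)

# two contexts with opposite dynamics (illustrative)
B_stay = np.eye(2)
B_swap = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
B_per_context = np.stack([B_stay, B_swap])   # shape (contexts, s', s)

q_context = np.array([0.9, 0.1])             # high level: mostly "stay" regime
B_low = modulated_B(q_context, B_per_context)
```

When the high-level belief flips after a regime change, the same mixing line hands the low level a qualitatively different transition model — no rewiring required.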
/cookbook/hierarchical-context-switch runs a two-level agent in a world that changes regime halfway through. The high level notices the change (its state belief flips) and reconfigures the low level's transition model. You can watch the composition in action.
Companion:
- /cookbook/hierarchical-timescale-separation — why the high level must run slower than the low level.
The muscle payoff
Stack the three teeth and you get an agent that:
- Plans deep enough to be strategic (sophisticated tree search).
- Learns its own world model online (Dirichlet on A, B).
- Adapts to regime changes (hierarchy).
And it's all one functional. All the machinery is still Eq. 4.13 + Eq. 4.14 + Eq. 7.10, applied at different scales.
The five sessions
Chapter 7 has five sessions under /learn/chapter/7:
- Discrete-time refresher — fast recap of Chapters 4–6 before we add depth.
- Message passing (Eq. 4.13 in full) — the forward + backward sweep fully annotated.
- Learning A and B (Eq. 7.10) — the Dirichlet update as a count-adder.
- Hierarchical agents — the two-level composition pattern.
- Worked example — build one, run one, read its trace.
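The forward + backward sweep of the message-passing session can be caricatured in a few lines of Python — a fixed-point iteration in the spirit of Eq. 4.13 under assumed shapes, not the annotated workbench implementation:

```python
import numpy as np

def normalise(v):
    return v / v.sum()

def infer_states(obs, A, B, iters=16):
    """Fixed-point sweep: each q(s_t) combines the likelihood message
    with a forward message from q(s_{t-1}) and a backward message
    from q(s_{t+1})."""
    T, n_s = len(obs), A.shape[1]
    q = [np.full(n_s, 1.0 / n_s) for _ in range(T)]
    for _ in range(iters):
        for t in range(T):
            log_q = np.log(A[obs[t]] + 1e-16)            # likelihood message
            if t > 0:
                log_q += np.log(B @ q[t - 1] + 1e-16)    # forward message
            if t < T - 1:
                log_q += np.log(B.T @ q[t + 1] + 1e-16)  # backward message
            q[t] = normalise(np.exp(log_q - log_q.max()))
    return q
```

Because each update reads its neighbours' current beliefs, evidence flows both directions: a late observation can revise an early belief, which is the whole point of the backward sweep.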
BEAM payoff: hierarchical composition
Hierarchical agents are where BEAM pays dividends that no Python implementation matches. Each level is a separate Jido.AgentServer — supervised process, its own state, its own mailbox. The two levels exchange Jido.Signals (not raw state). The scheduler handles concurrency for free.
When an agent crashes (let's say a message-passing iteration diverges and crashes the Perceive step), OTP's supervisor restarts just that level. The other levels keep running. The composition is fault-tolerant in a way that matters the moment you put hierarchical agents into production.
Run it yourself
- /cookbook/sophisticated-plan-tree-search — the deep-planning flagship.
- /cookbook/dirichlet-learn-a-matrix — online A learning.
- /cookbook/dirichlet-learn-b-matrix — online B learning.
- /cookbook/hierarchical-context-switch — hierarchy at work.
- /cookbook/dirichlet-learn-and-plan-simultaneously — plan + learn together.
- /learn/chapter/7 — all five sessions.
The mental move
Chapters 4 and 6 gave you the template. Chapter 7 teaches you what Active Inference can do once you let planning go deep, let learning go online, and let models stack hierarchically. This is the chapter your colleagues will recognize as actual capability.
Next
Part 9: Chapter 8 — Active Inference in Continuous Time. Motion of the mode, generalised coordinates, Eq. 4.19 fully unpacked. The continuous-time twin of everything we just built, and the chapter that makes predictive coding's gradient-stack structure visible.
⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license
📖 Active Inference, Parr, Pezzulo, Friston — MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference
← Part 7: A Recipe for Designing · Part 8: Discrete Time (this post) · Part 9: Continuous Time → coming soon