Active Inference — The Learn Arc, Part 35: Session §7.1 — Discrete-time refresher

#activeinference #pomdp #ai #elixir

Series: The Learn Arc — 50 posts through the Active Inference workbench.
Previous: Part 34 — Session §6.3: Run and Inspect, the Closing Workflow

Hero line. Chapter 7 is the longest chapter in the book. Before it piles on learning, depth, and hierarchy, Session 7.1 pulls you back to the bare discrete-time POMDP loop — the on-ramp that makes every later addition cheap.

Why a refresher opens the muscle chapter

Chapters 4 and 6 taught the four matrices and wired them into a running loop. Chapter 7 is where the framework earns its keep: Dirichlet learning on A and B, hierarchical layers, continuous generalised coordinates. None of that makes sense without a clean mental model of the underlying discrete loop.

Session 7.1 is that clean model in one page.

Five beats

One time step, end to end. The agent holds a belief q(s_t), observes o_t, runs the Eq 4.13 update, enumerates policies, scores them with expected free energy (EFE), acts, and advances. That is the whole loop.
State, observation, action, in that order. The hidden state is what the world is. The observation is what leaks through A. The action is what the agent picks after scoring policies. Getting this order straight heads off most "why is it doing that" confusion before Chapter 7 loads more on top.
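Eq 4.13 is the heartbeat. Posterior over states ∝ likelihood × prior. In practice it is a softmax over the log-likelihood of the observation (the slice of log A selected by o_t, which is a row when A is stored as observations × states) plus the log of the predicted prior. Every fancier thing in Chapter 7 is a change to what you put into that equation, not a change to the equation.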
Policies, not actions, are first-class. A policy is a sequence of actions. Eq 4.14 scores each policy by its EFE. The first step of the winning policy becomes the action. This distinction is what lets hierarchy and learning slot in cleanly in later sessions.
The matrices never move, at least not yet. A, B, C, D are the contract, and they stay fixed for the whole of 7.1. The session reminds you exactly which shape each one has and which question it answers before 7.3 starts learning A and B online.
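
The five beats above can be traced in a few lines of code. What follows is a minimal sketch in plain Python, not the workbench's Elixir code. It rests on several assumptions: the function names (`update_belief`, `expected_free_energy`, `step`) and the toy numbers are invented for illustration, A is stored as observations × states, C is written as a preferred distribution over observations, and EFE uses the common risk-plus-ambiguity decomposition, which may differ in detail from the book's Eq 4.14.

```python
import itertools
import math

EPS = 1e-16  # keeps log(0) finite

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Assumed toy model: 2 hidden states, 2 observations, 2 actions.
A = [[0.9, 0.2],              # A[o][s] = P(o | s), shape (obs, states)
     [0.1, 0.8]]
B = [[[1.0, 0.0],             # B[a][s_next][s] = P(s_next | s, a)
      [0.0, 1.0]],            # action 0: stay
     [[0.0, 1.0],
      [1.0, 0.0]]]            # action 1: switch
C = [0.8, 0.2]                # preferred distribution over observations
D = [0.5, 0.5]                # prior over the initial hidden state

def update_belief(prior, o):
    """Posterior over states ∝ likelihood × prior, computed in log space."""
    log_post = [math.log(A[o][s] + EPS) + math.log(prior[s] + EPS)
                for s in range(len(prior))]
    return softmax(log_post)

def likelihood_entropy(s):
    """Entropy of P(o | s): how ambiguous state s is to observe."""
    return -sum(A[o][s] * math.log(A[o][s] + EPS) for o in range(len(A)))

def expected_free_energy(q, policy):
    """Sum of risk + ambiguity over the steps of one policy (assumed form)."""
    G = 0.0
    for a in policy:
        q = matvec(B[a], q)   # predicted states under this action
        qo = matvec(A, q)     # predicted observations
        risk = sum(p * (math.log(p + EPS) - math.log(c + EPS))
                   for p, c in zip(qo, C))
        ambiguity = sum(q[s] * likelihood_entropy(s) for s in range(len(q)))
        G += risk + ambiguity
    return G

def step(prior, o, horizon=2):
    """One time step: update belief, score policies, act, advance."""
    q = update_belief(prior, o)
    policies = list(itertools.product(range(len(B)), repeat=horizon))
    G = [expected_free_energy(q, pi) for pi in policies]
    q_pi = softmax([-g for g in G])         # lower EFE, higher probability
    action = policies[G.index(min(G))][0]   # first step of the winning policy
    next_prior = matvec(B[action], q)       # predicted prior for the next step
    return q, policies, G, q_pi, action, next_prior

if __name__ == "__main__":
    q, policies, G, q_pi, action, next_prior = step(D, o=0)
    print("posterior:", q)
    for pi, g, p in zip(policies, G, q_pi):
        print(pi, "G =", round(g, 3), "q(pi) =", round(p, 3))
    print("action:", action, "next prior:", next_prior)
```

With these toy numbers, observing o = 0 from the flat prior moves the belief to roughly [0.82, 0.18]. The stay-stay policy has the lowest EFE, so the chosen action is 0. Note that only `update_belief` touches the observation, and only `expected_free_energy` touches C. That separation is the reason later additions can change what goes into each function without changing the loop.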

Quiz

In one time step, which equation fires first — 4.13 or 4.14?
If A is the identity, what happens to the Eq 4.13 update?
Why does Active Inference score policies instead of individual actions?

(Answers are in the session transcript.)

Run it yourself

mix phx.server
# open http://localhost:4000/learn/session/7/s1_discrete_refresher

Recommended cookbook recipe: intro/loop-one-step — steps a single agent through exactly one Eq 4.13 → Eq 4.14 → act cycle and prints each intermediate value. Keep the matrices printed on a sticky note next to the window.

Next

Part 36: Session §7.2 — Message passing and Eq 4.13 in depth. The refresher becomes the textbook derivation: softmax, log-A, predicted prior, and why the update is exactly belief propagation on a two-node factor graph. The first real muscle rep of Chapter 7.

Powered by The ORCHESTRATE Active Inference Learning Workbench — Phoenix/LiveView on pure Jido.