ORCHESTRATE
Active Inference — The Learn Arc, Part 36: Session §7.2 — Message passing and Eq 4.13 in depth

Session 7.2 — Message passing and Eq 4.13

Series: The Learn Arc — 50 posts through the Active Inference workbench.
Previous: Part 35 — Session §7.1: Discrete-time refresher

Hero line. The Eq 4.13 softmax is not a trick. It is the exact message-passing update on the two-node factor graph {state, observation}. Session 7.2 makes that fact click.


From loop to factor graph

Session 7.1 wrote the loop. Session 7.2 zooms in on the one line where inference actually happens — and proves it is message passing, not pattern matching.

Picture the simplest factor graph you can draw: one latent node s, one observation node o, one factor A between them, one prior factor D. That is the POMDP at a single time step. Message passing says the posterior at s is the product of every incoming message — and when you take logs and normalise, you get Eq 4.13 character for character.
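That update fits in a few lines. Here is a minimal Python sketch of Eq 4.13 on the two-node graph — the model, the numbers, and the variable names are illustrative, not taken from the workbench (which is Elixir):

```python
import math

# Hypothetical toy model: 2 hidden states, 2 possible observations.
# A is laid out so A[s][o] = p(o | s); the column A[:, o] is the
# likelihood message the observed o sends back to the state node.
D = [0.5, 0.5]                       # prior message from the D factor
A = [[0.9, 0.1],
     [0.2, 0.8]]
o = 0                                # the observation we actually saw

# Eq 4.13: posterior log-belief = log D + log A[:, o], then renormalise.
log_belief = [math.log(D[s]) + math.log(A[s][o]) for s in range(len(D))]
z = sum(math.exp(v) for v in log_belief)
posterior = [math.exp(v) / z for v in log_belief]
# posterior ≈ [0.818, 0.182]: the observation favours state 0.
```

The exp-then-normalise step at the end is the softmax; nothing else in the update is free to vary.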

Five beats

  1. Two incoming messages. The prior message from D carries log D. The likelihood message from the observed o through A carries log A[:, o]. Posterior log-belief is their sum.

  2. Softmax is the normaliser, not a design choice. Turning log-beliefs back into probabilities requires exponentiating and dividing by the partition function. That is softmax — forced on you by the probability axioms.

  3. Predicted prior = message from the past. For t > 0, D is replaced by B · q(s_{t-1}) — the message the previous belief sent forward through the B factor. Same graph, one more edge.

  4. Every "variational" label is a synonym here. Because the graph has no loops, belief propagation equals exact inference equals the variational posterior. No approximation. Eq 4.13 is the ground truth for this graph.

  5. This is what scales. When Session 7.4 stacks layers, you add edges and factors — you do not change the message rule. Eq 4.13 is the primitive everything else composes.
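Beat 3 is worth seeing in code. A hedged Python sketch (toy numbers, not workbench code) of the t > 0 case, where the prior message is the previous belief rolled forward through B:

```python
import math

def softmax(logs):
    z = sum(math.exp(v) for v in logs)
    return [math.exp(v) / z for v in logs]

# Hypothetical 2-state example. At t > 0 the prior message no longer
# comes from D; it is the previous posterior pushed through the B factor.
B = [[0.7, 0.3],                     # transition p(s_t | s_{t-1}),
     [0.3, 0.7]]                     #   rows index s_t, columns s_{t-1}
q_prev = [0.8, 0.2]                  # posterior belief at t-1
A = [[0.9, 0.1],                     # likelihood, A[s][o] = p(o | s)
     [0.2, 0.8]]
o = 1                                # observation at time t

# Predicted prior: the message q_{t-1} sends forward through B (= B @ q_prev).
prior = [sum(B[s][sp] * q_prev[sp] for sp in range(2)) for s in range(2)]

# Exactly the Eq 4.13 rule, with B @ q_prev standing in for D.
posterior = softmax([math.log(prior[s]) + math.log(A[s][o]) for s in range(2)])
```

Same graph, same message rule — only the source of the prior message changed.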

Why it matters

If you remember one thing from Chapter 7, make it this: the softmax in Eq 4.13 is not an engineering flourish. It is the mathematically forced answer to "what is my posterior belief given this graph." That is why hierarchy, learning, and continuous-time all extend Eq 4.13 without replacing it.

Quiz

  • In the two-node graph, which incoming message carries log A[:, o] — the prior message from the D factor, or the message sent to the state node from the observed o through the A factor?
  • Why does the softmax in Eq 4.13 not need a temperature parameter?
  • What changes in the message from D when t advances from 0 to 1?

Run it yourself

mix phx.server
# open http://localhost:4000/learn/session/7/s2_message_passing_4_13

Cookbook recipe: inference/two-node-bp — builds the factor graph explicitly, runs one message-passing round, and prints the result alongside the Eq 4.13 softmax output. Watch them match to machine precision.
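The comparison the recipe makes can be sketched in a few lines of Python (the actual recipe is Elixir; names and numbers here are illustrative):

```python
import math

# Toy two-node model: prior D and likelihood A, with A[s][o] = p(o | s).
D = [0.6, 0.4]
A = [[0.9, 0.1],
     [0.2, 0.8]]
o = 0

# Route 1: belief propagation — multiply the incoming messages at s,
# then normalise in probability space.
unnorm = [D[s] * A[s][o] for s in range(2)]
bp = [u / sum(unnorm) for u in unnorm]

# Route 2: Eq 4.13 — sum the log-messages and apply the softmax.
logs = [math.log(D[s]) + math.log(A[s][o]) for s in range(2)]
z = sum(math.exp(v) for v in logs)
eq413 = [math.exp(v) / z for v in logs]

# The two routes agree to machine precision: softmax is exactly
# exp-then-normalise applied to the summed log-messages.
assert all(abs(a - b) < 1e-12 for a, b in zip(bp, eq413))
```

The final assertion is the whole point: the two computations are the same algebra in two coordinate systems, so they cannot disagree.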

Next

Part 37: Session §7.3 — Learning A and B. The matrices stop being fixed. Dirichlet counts accumulate on A and B as the agent acts; perception and learning become the same Bayesian update at different timescales. This is where Active Inference starts to look alive.


Powered by The ORCHESTRATE Active Inference Learning Workbench — Phoenix/LiveView on pure Jido.
