
ORCHESTRATE

Active Inference — The Learn Arc, Part 37: Session §7.3 — Learning A and B

Session 7.3 — Learning A and B

Series: The Learn Arc — 50 posts through the Active Inference workbench.
Previous: Part 36 — Session §7.2: Message passing and Eq 4.13 in depth

Hero line. Perception updates the belief over states. Learning updates the belief over parameters. Same Bayes rule, different timescale — and once A and B become Dirichlet, the agent stops being a script and starts being a learner.


From fixed matrices to Dirichlet beliefs

In Chapter 6 the four matrices were hand-authored constants. In Session 7.3 we replace each column of A and each B slice with a Dirichlet distribution over categorical parameters. The agent no longer knows the likelihood — it has a belief about the likelihood.

That one move buys you an agent that learns from experience, detects surprise, and calibrates its own confidence — with no new machinery.
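The move is small enough to show in a few lines. A numpy sketch (illustrative only, not the workbench's Elixir code), assuming one Dirichlet pseudo-count vector per column of A:

```python
import numpy as np

# One column of A, held as a belief: a vector of Dirichlet pseudo-counts,
# one count per possible observation.
n_obs = 4
a_col = np.ones(n_obs)            # uniform Dirichlet prior, a = 1

# The agent no longer has "the" likelihood, only its expectation:
expected = a_col / a_col.sum()    # E[A[:, s]] = a / sum(a)
print(expected)                   # uniform: [0.25 0.25 0.25 0.25]
```

With a uniform prior the expected likelihood is flat; every count added from experience tilts it, which is exactly what the five beats below walk through.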

Five beats

  1. Dirichlet counts are the memory. Each column of A carries a vector of pseudo-counts a. After observing o in state s, increment a[o, s] by one. That is the posterior update. Eq 7.10 in the book.

  2. Expected A is just normalised counts. For inference (Eq 4.13) the agent uses E[A] = a / sum(a, axis=0). Early, priors dominate. Later, data dominates. The transition is automatic.

  3. Perception vs learning = fast vs slow. Eq 4.13 runs every step. Eq 7.10 runs every step too — but the counts accumulate, so each new increment moves E[A] less and less. Same update math; different integration window.

  4. B learns the same way, indexed by action. Each action gets its own Dirichlet count tensor b[:, :, u]. Act u, observe transition s → s', bump b[s', s, u]. The agent learns its own world-model from experience.

  5. Confidence is a side-effect. The concentration of a Dirichlet — the sum of its counts — tells you how sure the agent is. Low concentration means high uncertainty, and that uncertainty drives exploration in later EFE terms. You get calibrated uncertainty for free.
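The five beats fit in one loop. A hedged numpy sketch — the dimension sizes and the toy environment (state 2 always emitting observation 4) are assumptions for illustration, not the book's example:

```python
import numpy as np

n_states, n_obs, n_actions = 3, 5, 2

# Dirichlet pseudo-counts: one column of a per state, one slice of b per action.
a = np.ones((n_obs, n_states))                 # uniform prior over A
b = np.ones((n_states, n_states, n_actions))   # uniform prior over B

def expected_A(a):
    # Inference (Eq 4.13) uses the normalised counts: E[A] = a / sum(a, axis=0)
    return a / a.sum(axis=0, keepdims=True)

def learn(a, b, s, o, u, s_next):
    # One experience tuple (s, o, u, s'): bump the matching counts (Eq 7.10).
    a[o, s] += 1.0          # likelihood count: saw o in state s
    b[s_next, s, u] += 1.0  # transition count: s -> s' under action u
    return a, b

# Toy stream of experience: state 2 always emits observation 4 and stays put.
for _ in range(100):
    a, b = learn(a, b, s=2, o=4, u=0, s_next=2)

E_A = expected_A(a)
confidence = a.sum(axis=0)   # Dirichlet concentration per state column
print(E_A[4, 2])             # 101/105 ~ 0.96: data dominates the prior
print(confidence)            # [5, 5, 105]: the agent knows where it is sure
```

Column 2 of E[A] is now dominated by observation 4 while the unvisited columns stay uniform, and the concentration vector records exactly that asymmetry.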

Why it matters

This is the session where Active Inference earns the word active. A fixed-matrix agent is a policy. A Dirichlet agent accumulates evidence, adjusts, and — because it knows how sure it is — chooses informative actions. Perception, learning, and exploration all fall out of the same update rule.

Quiz

  • After 100 observations in state s=2 all producing o=5, what does E[A[:, 2]] look like if the prior was uniform a = 1?
  • Why does a high Dirichlet concentration make the agent less exploratory in EFE terms?
  • What is different about updating B vs updating A?

Run it yourself

mix phx.server
# open http://localhost:4000/learn/session/7/s3_learning_a_b

Cookbook recipe: learning/dirichlet-a — runs an agent for 200 steps with a uniform Dirichlet prior on A and plots how E[A] converges toward the true likelihood column-by-column. Follow up with learning/dirichlet-b to see the transition tensor converge under exploration.
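What the recipe measures can be sketched outside the workbench. A minimal stand-in, assuming an arbitrary random ground-truth A and mean absolute error as the convergence metric (both assumptions; the recipe's actual environment and plot may differ):

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_states = 5, 3

# Arbitrary ground-truth likelihood: each column a categorical over observations.
true_A = rng.dirichlet(np.ones(n_obs), size=n_states).T   # columns sum to 1
a = np.ones((n_obs, n_states))                             # uniform Dirichlet prior

errors = []
for step in range(200):
    s = rng.integers(n_states)                  # visit a random state
    o = rng.choice(n_obs, p=true_A[:, s])       # sample an observation from true A
    a[o, s] += 1.0                              # Dirichlet count update (Eq 7.10)
    E_A = a / a.sum(axis=0, keepdims=True)
    errors.append(np.abs(E_A - true_A).mean())  # distance from the truth

# errors trends (noisily) downward: E[A] converges toward true_A as counts grow.
```

Plotting `errors` over `step` gives the column-by-column convergence picture the recipe describes.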

Next

Part 38: Session §7.4 — Hierarchical active inference. Stacking POMDPs. The top level models slow, abstract state; the bottom level models fast, concrete state; messages flow both ways through Eq 4.13. The session where "scaling up" actually means adding a layer, not rewriting the loop.


Powered by The ORCHESTRATE Active Inference Learning Workbench — Phoenix/LiveView on pure Jido.
