Series navigation — ◀ Part 1: Why I built a BEAM-native workbench for the Free Energy Principle · Part 3: Chapter 2 — The Low Road from Bayes to Free Energy ▶
Series: The Learn Arc — 50 posts teaching Active Inference through a live BEAM-native workbench. Part 1 set the map. This is Part 2.
The one-sentence claim
Chapter 1 of Active Inference (Parr, Pezzulo, Friston — MIT Press 2022) makes a claim that is audacious in one line and then spends 230 pages proving it:
Perception, action, and learning are the same process — running in three directions on the same generative model.
The Workbench's canonical chapter metadata renders it as:
Perception, action, learning — one loop, one theory.
That's not a metaphor. It's a mathematical statement. The agent holds beliefs about hidden states of the world. It acts. The act changes the world. The world generates new observations. The observations update the beliefs. And all three steps — perceive, plan, act — are derived from the same variational free-energy functional.
If that sounds abstract, this post is where it stops being abstract. We're going to watch a real Jido agent close its first loop.
The loop
The Workbench implements the loop in WorkbenchWeb.Episode, a supervised GenServer that owns one episode. Every tick runs exactly this sequence:
1. WorldPlane.Engine.current_observation/1 -> ObservationPacket
2. AgentPlane.Runtime.perceive/2 -> beliefs updated (Eq. 4.13 / B.5)
3. AgentPlane.Runtime.plan/1 -> F, G, policy posterior (Eq. 4.11, 4.10, 4.14)
4. AgentPlane.Runtime.act/2 -> ActionPacket emitted
5. WorldPlane.Engine.apply_action/2 -> new world + next obs, terminal?
6. (optional) Dirichlet learners update A/B (Eq. 7.10, B.10-B.12)
Six steps. Three of them (2, 3, 4) are perception, planning, and action. They all land inside the same agent's state. There is no separate "motor system" module, no separate "perceptual system" module. There's one generative model. The difference between perceiving and acting is which slot of that model you're inferring.
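The tick above fits in a few dozen lines of ordinary code. Here is a minimal sketch in Python (the Workbench itself is Elixir; every name and number below is illustrative, not the real implementation). The world is a 3-cell corridor with a goal at the right, and the planner is a distance-to-goal stand-in for the real expected-free-energy minimization:

```python
import random

# Illustrative stand-in for one Episode tick. The only signals that
# cross between agent and world are `obs` and `action`, mirroring
# ObservationPacket / ActionPacket.

A = [[0.8, 0.1, 0.1],      # likelihood p(obs = o | state = s):
     [0.1, 0.8, 0.1],      # a mostly-accurate sensor over 3 cells
     [0.1, 0.1, 0.8]]
GOAL = 2

def perceive(beliefs, obs):
    """Step 2: Bayes update -- posterior proportional to likelihood x prior."""
    post = [A[obs][s] * b for s, b in enumerate(beliefs)]
    z = sum(post)
    return [p / z for p in post]

def plan(beliefs):
    """Step 3 (stand-in): pick the action whose expected outcome is closest
    to the goal. The real loop scores policies by expected free energy G."""
    def expected_distance(action):
        return sum(b * abs(min(max(s + action, 0), 2) - GOAL)
                   for s, b in enumerate(beliefs))
    return min([-1, +1], key=expected_distance)

def world_step(state, action):
    """Step 5: the world applies the action, then emits the next observation."""
    new_state = min(max(state + action, 0), 2)
    obs = random.choices([0, 1, 2],
                         weights=[A[o][new_state] for o in range(3)])[0]
    return new_state, obs

random.seed(1)
state, obs, beliefs = 0, 0, [1/3, 1/3, 1/3]
for tick in range(4):
    beliefs = perceive(beliefs, obs)         # step 2: perception
    action = plan(beliefs)                   # steps 3-4: plan, emit action
    state, obs = world_step(state, action)   # step 5: world responds
    print(tick, action, [round(b, 2) for b in beliefs])
```

Even with a noisy sensor, the agent walks to the goal, and the belief vector sharpens as evidence accumulates. The point of the sketch is the shape of the loop, not the placeholder planner.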
The Markov blanket
One more structural point before we run anything.
The book's Figure 1.1 puts the idea in one picture: the agent is separated from the world by a Markov blanket — a contract that says "only these signals cross." In the Workbench, that blanket is enforced at the code level.
Look at SharedContracts.Blanket. Two packet types, and nothing else, cross between the agent and the world:
- ObservationPacket — world → agent. Sensory channels at time t.
- ActionPacket — agent → world. One action at time t.
The agent app doesn't depend on the world app. The world app doesn't depend on the agent app. Both depend on shared_contracts. A test — plane_separation_test.exs — enforces this at CI time.
That's not a coding style choice. It's how Active Inference is actually defined. The agent is "whatever is on the other side of the blanket from the world." If your code lets them reach around the blanket, you haven't implemented the theory, you've implemented something else.
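To make the contract concrete, here is a Python sketch of the two packet types. The field names (`t`, `channels`, `action`) are guesses for illustration; the actual schema lives in SharedContracts.Blanket on the Elixir side:

```python
from dataclasses import dataclass

# Hypothetical Python rendering of the blanket contract.
# Field names are illustrative, not the real SharedContracts schema.

@dataclass(frozen=True)
class ObservationPacket:       # world -> agent
    t: int                     # tick index
    channels: tuple            # one sensory value per observation channel

@dataclass(frozen=True)
class ActionPacket:            # agent -> world
    t: int
    action: int                # index into the action set

# The discipline the blanket enforces: the agent's step function may
# consume an ObservationPacket and produce an ActionPacket, nothing else.
def agent_step(obs: ObservationPacket) -> ActionPacket:
    return ActionPacket(t=obs.t, action=0)   # placeholder policy
```

Everything else about the world (its grid, its transition code, its internals) is invisible to the agent, which is exactly what a Markov blanket formalizes.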
Watching it run
Open /world in the Workbench and you'll see the simplest possible demonstration:
Pick a world (start with Tiny Open Goal — it has a guaranteed two-step solution), leave the blanket at its defaults (6 observation channels, 4 cardinal actions), keep policy_depth: 5 and preference_strength: 4.0, click Create agent + world, then press Step.
What you'll see:
- A 3×3 grid with an orange @ — that's the agent.
- A green heat-map below it — those are the agent's marginal state beliefs at this tick. Bright means "I think I'm here."
- A policy-posterior table — top-5 policies with their F (free energy) and G (expected free energy) values.
- A history strip — what actions the agent has emitted so far.
Press Step again. The heat-map sharpens. The selected policy shifts. The agent moves. The maze re-renders.
You are looking at one loop, one theory. Each Step button press is one complete Perceive → Plan → Act cycle through the GenServer above. The F column is what the agent is solving for at step 2 (perception); the G column is what it solves for at step 3 (planning). They come from the same variational bound on surprise — they're the same math, run over different time horizons.
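If you want to see that sameness in numbers, here is a toy two-state calculation in Python (illustrative values, not the Workbench's). F is the variational bound minimized during perception; G is the expected free energy scored during planning, written in its common risk-plus-ambiguity decomposition:

```python
import math

ln = math.log
A = [[0.9, 0.2],    # p(o | s): rows are observations, columns are states
     [0.1, 0.8]]
prior = [0.5, 0.5]  # p(s) before seeing anything
o = 0               # what we observed

# --- F: variational free energy of a candidate posterior q(s) ---
# F = KL[q || prior] - E_q[ln p(o|s)]  >=  -ln p(o)   (bound on surprise)
def F(q):
    kl = sum(qs * ln(qs / ps) for qs, ps in zip(q, prior) if qs > 0)
    accuracy = sum(qs * ln(A[o][s]) for s, qs in enumerate(q))
    return kl - accuracy

exact = [A[o][s] * prior[s] for s in range(2)]
exact = [e / sum(exact) for e in exact]           # true Bayes posterior
surprise = -ln(sum(A[o][s] * prior[s] for s in range(2)))
print(F([0.5, 0.5]), F(exact), surprise)  # F bottoms out at the exact
                                          # posterior, where F == surprise

# --- G: expected free energy of a predicted state distribution q(s') ---
# G = KL[q(o') || C]  +  E_{q(s')} H[p(o'|s')]   (risk + ambiguity)
C = [0.9, 0.1]                                    # preferred outcomes
def G(q_next):
    q_o = [sum(A[oo][s] * q_next[s] for s in range(2)) for oo in range(2)]
    risk = sum(qo * ln(qo / c) for qo, c in zip(q_o, C) if qo > 0)
    ambiguity = sum(q_next[s] * -sum(A[oo][s] * ln(A[oo][s])
                                     for oo in range(2)) for s in range(2))
    return risk + ambiguity

print(G([1.0, 0.0]), G([0.0, 1.0]))  # predicting state 0 scores lower G:
                                     # its outcomes better match C
```

Same ingredients (a likelihood A, a distribution over states, a log-probability bound), pointed backward at the observation just received for F and forward at outcomes not yet seen for G.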
Why this matters for the series
Chapter 1 is a preview of everything that follows. The sessions under it (open /learn/chapter/1) break the preview into three 8-15 minute workshops:
- What are we even doing? — orients you to "agent + world + blanket."
- The agent loop up close — walks through Perceive/Plan/Act on the Tiny Open Goal maze.
- How an Active Inference agent differs from RL — sets up the contrast with reward-maximization that Chapter 2 will sharpen.
Each session carries a path-specific narration (kid/real/equation/derivation), an attributed book excerpt, a short quiz, and links to the labs where you can play with the same idea.
Tracked agents: from one-off demos to a workshop
/world gives you a fresh agent per click. That's fine for a first look. But once you start building real intuition, you'll want an agent that survives — one you can tweak, re-run, archive, resurrect.
That's what Studio is for. From Studio you can:
- Instantiate an agent from a saved spec (the Builder's Save+Instantiate button lands here).
- Attach an existing agent to any world that implements WorldPlane.WorldBehaviour.
- Stop / Archive / Trash / Restore / Empty trash — full lifecycle, with Mnesia-backed persistence so state survives a Phoenix restart.
- Detach from an episode without killing the agent, so it keeps its learned Dirichlet parameters for the next run.
You don't need Studio yet. But when we get to Chapter 7 (Dirichlet learning) and Chapter 10 (composition), you'll be glad the lifecycle is already there.
Run it yourself
Everything in this post is clickable:
- The chapter page: /learn/chapter/1 — read the three sessions.
- The live loop: /world — pick Tiny Open Goal, click Step.
- The Studio dashboard: /studio — preview the lifecycle surface.
- The episode code: apps/workbench_web/lib/workbench_web/episode.ex — the 6-step loop in ~450 lines.
Open /world, press Step 3 times, and ask yourself: where does the agent end and the world begin? That question has a precise answer in this codebase — the Markov blanket — and it's the question every subsequent chapter sharpens.
The mental move
Before Part 3, do this one thing:
- Watch the belief heat-map update across three steps.
- Notice that it sharpens (narrower peak) after every observation — that's perception.
- Notice that the policy posterior reorders before the agent acts — that's planning.
- Notice that the new position doesn't magically match the belief peak — that's how the agent learns to trust its model less when the world surprises it.
All three of those happen for the same reason, from the same functional. That's the claim. Chapter 2 is where we derive it.
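The sharpening you're asked to watch in step 2 can be reproduced in a few lines (toy numbers, Python for portability). Each observation multiplies a likelihood into the beliefs, and consistent evidence drives the posterior's entropy down tick after tick:

```python
import math

# Toy model of the heat-map sharpening: 3 cells, a sensor that says
# "you're in cell 0" with 80% reliability. Numbers are illustrative.

A = [0.8, 0.1, 0.1]                  # p(obs = "cell 0" | state = s)

def update(beliefs):
    post = [a * b for a, b in zip(A, beliefs)]
    z = sum(post)
    return [p / z for p in post]

def entropy(q):
    return -sum(p * math.log(p) for p in q if p > 0)

beliefs = [1/3, 1/3, 1/3]
entropies = []
for step in range(3):
    beliefs = update(beliefs)        # one observation absorbed per step
    entropies.append(entropy(beliefs))
    print(step, round(entropies[-1], 3), [round(b, 3) for b in beliefs])
```

The entropy falls on every step and the peak grows. That is perception as inference: not a special module, just repeated multiplication by the likelihood.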
Next
Part 3: Chapter 2 — The Low Road to Active Inference. We start from Bayes' rule on a single coin flip and build up to the variational free-energy bound. Equations 2.1 through 2.12. Runnable as /cookbook/bayes-one-step-coin.
⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license
📖 Active Inference, Parr, Pezzulo, Friston — MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference
← Part 1: Why I built this · Part 2: Chapter 1 (this post) · Part 3: The Low Road →