
ORCHESTRATE


Active Inference — The Learn Arc, Part 42: Session §8.3 — Action on sensors

Session 8.3 — Action on sensors

Series: The Learn Arc — 50 posts through the Active Inference workbench.
Previous: Part 41 — Session §8.2: Eq 4.19, the quadratic free energy

Hero line. In continuous time, action is the other way to drive F down. Perception changes the belief; action changes the world so the sensors finally agree with the prediction. Same equation, different variable.


The other gradient

Session 8.2 derived Eq 4.19 and then ran ∂F/∂μ to update the belief. Session 8.3 runs ∂F/∂a — the gradient with respect to action — and shows that motor control falls out without adding any new machinery.

Perception minimises F by updating μ. Action minimises F by moving the sensors themselves. The agent has two handles on the same quantity.
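To make the "two handles" concrete, here is a minimal scalar sketch — assumed names and a deliberately simplified model (linear generative map g(μ) = μ, a Gaussian prior with mean η, unit-free precisions), not the workbench code or Eq 4.19 verbatim:

```python
# Toy scalar free energy with a sensory term and a prior term:
#   F = 0.5 * pi_o * (o - g(mu))**2 + 0.5 * pi_mu * (mu - eta)**2
# Assumptions: g(mu) = mu, prior mean eta, scalar everything.

pi_o, pi_mu = 1.0, 1.0   # sensory and prior precisions
eta = 0.0                # prior mean

def F(o, mu):
    return 0.5 * pi_o * (o - mu) ** 2 + 0.5 * pi_mu * (mu - eta) ** 2

def dF_dmu(o, mu):
    # Perception's handle: move the belief toward data and prior.
    return -pi_o * (o - mu) + pi_mu * (mu - eta)

def dF_da(o, mu, do_da=1.0):
    # Action's handle: only the sensory term depends on a, via o = o(a).
    return pi_o * (o - mu) * do_da

o, mu = 2.0, 0.0
print(F(o, mu))                    # 2.0 — the one quantity both handles act on
print(dF_dmu(o, mu), dF_da(o, mu)) # -2.0 2.0 — descend either to reduce F
```

Note that `dF_da` contains only the sensory error: that single fact is doing all the work in the beats below.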

Five beats

  1. Action enters through the sensors. The agent's predictions live in g(μ). Observations arrive as o. The sensory error is o − g(μ). Action changes o — indirectly, through the body and the world — so the agent can close the loop either by updating μ or by changing what the sensors report.

  2. The gradient ∂F/∂a only "sees" the sensors. Because F depends on action only through the sensory term, the action policy is driven entirely by sensory prediction error. Elegant — and controversial.

  3. Reflexes are action minimising F. A knee-jerk is "sensor disagrees with expected posture → muscle contracts to restore it." In Eq 4.19 language, that is pure ∂F/∂a descent with high sensory precision at the proprioceptive channels.

  4. Goal-directed action = set the prior, let action chase it. Instead of rewards, the agent sets C (preferences) on the expected sensory trajectory. Action then drives sensors to match C. Desire is just a prior the sensors have not yet caught up to.

  5. Precision weighting selects what action does. Which sensor channel has the highest precision determines what action will chase. Biology calls this "attention." The workbench exposes it as a slider.
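Beat 5 can be sketched numerically. Assume (purely for illustration) two sensory channels that both read the same action state, `o_i = a`, with different precisions; descending -∂F/∂a then drives the sensors toward whichever channel's prediction carries the most precision:

```python
import numpy as np

# Sketch of precision-weighted action selection (toy model, assumed names).
# Two channels predict opposite values; action shifts both observations.

def simulate(pi, mu, steps=200, lr=0.05):
    # pi: per-channel precisions; mu: per-channel predictions g(mu)
    a = 0.0
    for _ in range(steps):
        o = np.array([a, a])            # assumed world: o_i = a on both channels
        dF_da = np.sum(pi * (o - mu))   # precision-weighted sensory error
        a -= lr * dF_da                 # descend -dF/da
    return a

mu = np.array([1.0, -1.0])                  # channels predict +1 and -1
print(simulate(np.array([10.0, 0.1]), mu))  # high precision on channel 0: a ≈ +0.98
print(simulate(np.array([0.1, 10.0]), mu))  # high precision on channel 1: a ≈ -0.98
```

The fixed point is the precision-weighted average of the predictions, so sliding the precisions is literally sliding what action chases — the "attention" knob the workbench exposes.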

Why it matters

This is the moment the framework stops needing a separate planner. In discrete time you enumerated policies and scored them with expected free energy (EFE). In continuous time you just follow -∂F/∂a. There is no separate "control" step — there is only free energy, flowing down two gradients at once.
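"Two gradients at once" fits in a dozen lines. A hedged sketch, assuming the simplest possible world (the sensor reads the acted state directly, o = a) and a preference C encoded as a prior on sensation — toy names, not the workbench API:

```python
# Both gradients descend the same F together (illustrative scalar model).
pi_o, pi_C = 1.0, 1.0        # sensory precision, preference precision
C = 3.0                      # preferred sensation: "desire as a prior"
mu, a, lr = 0.0, 0.0, 0.1

for _ in range(300):
    o = a                    # assumed world: sensors read the acted state
    # perception: dF/dmu pulls the belief toward both data and preference
    mu -= lr * (pi_o * (mu - o) + pi_C * (mu - C))
    # action: dF/da pushes the sensors toward the (updated) belief
    a -= lr * (pi_o * (a - mu))

print(round(mu, 2), round(a, 2))   # 3.0 3.0 — belief and sensors meet at C
```

No policy enumeration, no controller: perception and action are two update lines descending one scalar, and the preference is reached because neither gradient can rest until the sensors match it.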

Quiz

  • Why does ∂F/∂a contain only the sensory term, not the prior or dynamical terms?
  • What happens when sensory precision is very low compared to dynamical precision?
  • How does setting C to a non-zero expected trajectory produce goal-directed behaviour?

Run it yourself

mix phx.server
# open http://localhost:4000/learn/session/8/s3_action_on_sensors

Cookbook recipe: continuous/action-gradient — a continuous-time agent that tracks a moving target. Toggle between "perception only" (freeze action) and "action only" (freeze belief) to see that each gradient alone fails; together they close the loop.
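The freeze-one-gradient experiment can be sketched in toy Python as well (assumed model as above: o = a, unit precisions, preference C — not the actual cookbook code):

```python
# Toggle each gradient off, as in the cookbook recipe (illustrative).
def run(update_mu=True, update_a=True, C=3.0, lr=0.1, steps=500):
    mu, a = 0.0, 0.0
    for _ in range(steps):
        o = a                                  # assumed world: o = a
        if update_mu:
            mu -= lr * ((mu - o) + (mu - C))   # perception gradient
        if update_a:
            a -= lr * (a - mu)                 # action gradient
    return mu, a

print(run(update_a=False))   # perception only: belief splits the difference, sensors stuck at 0
print(run(update_mu=False))  # action only: sensors chase a belief that never moves
print(run())                 # both: sensors actually reach the preference C
```

Each gradient alone stalls short of C; only together do they close the loop — which is exactly what the recipe's toggle makes visible.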

Next

Part 43: Session §8.4 — Continuous play. The final Chapter 8 session. A free-form continuous-time playground: change the precisions on the fly, poke the agent, swap the world dynamics, watch the belief retune. The session that builds physical intuition for every knob you just met.


Powered by The ORCHESTRATE Active Inference Learning Workbench — Phoenix/LiveView on pure Jido.
