DEV Community

ORCHESTRATE

Bird Meadow: a multi-agent Active Inference world I'd like the community to poke holes in

TL;DR. I'm Michael Polzin. I just shipped, as open source, a multi-agent Active Inference world — birds that hear and sing — running on top of audit-corrected variational free energy / expected free energy math from Parr, Pezzulo & Friston (2022, MIT Press). It's pure Elixir on the BEAM (Jido v2.2.0 — no Python, no LangChain). 78 tests pass. Five audit anchors verified against a brute-force forward-backward ground truth. Six scenarios reproduce visually in a Phoenix LiveView at /labs/meadow.

I am asking the Active Inference / Elixir / scientific-computing communities to poke holes in this. If the math is wrong, or if my falsifiable empirical claims don't reproduce, I want to hear it now — publicly, with the receipts attached. The repo is below.

Repo: https://github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench
Latest commit: 650a185 (2026-05-07)

What's verified

Five audit anchors corresponding to claims about the variational inference identity, each tested against a brute-force forward-backward HMM (AgentPlane.ExactInference) on small enumerable bundles:

  • F[q] >= -ln p(y) (agent_plane/test/meadow/vfe_bound_test.exs). Passing for every length-3 observation sequence under stay/stay and flip/stay actions, with exact-marginal q, uniform q, and point-mass-wrong q.
  • ELBO[q] <= ln p(y) (agent_plane/test/meadow/elbo_bound_test.exs). Passing under the same conditions.
  • q (recognition) vs p(eta | y) (exact posterior) code-path separation (agent_plane/test/meadow/q_vs_p_naming_test.exs). Enforced by code grep and at the spec level; the two cannot collide in source.
  • Inter-agent conditional-independence (Markov-blanket) partition (agent_plane/test/meadow/blanket_ci_test.exs). Replay determinism with :argmax selection: bird A's beliefs are bitwise-identical when bird B is replaced by a scripted-action stand-in.
  • No thermodynamic over-claim (agent_plane/test/meadow/no_thermo_overclaim_test.exs). Recursive lint over apps/{agent_plane,world_plane}/lib for enthalpy/helmholtz/gibbs outside disclaimed docstrings.
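For readers without an Elixir toolchain handy, the first anchor is easy to sanity-check in a few lines of any language. Here is a Python sketch of the same brute-force idea — enumerate every state path of a tiny HMM to get the exact evidence, then confirm that the mean-field VFE of any q sits above -ln p(y). The matrices below are hypothetical toy numbers, not the repo's bundles:

```python
import itertools, math

# Hypothetical 2-state, 2-observation toy model (not the repo's bundles).
A = [[0.9, 0.2], [0.1, 0.8]]   # A[o][s] = p(o | s)
B = [[0.7, 0.3], [0.3, 0.7]]   # B[s'][s] = p(s' | s), a "stay"-like kernel
D = [0.5, 0.5]                 # prior over the initial state

obs = [0, 1, 0]                # a length-3 observation sequence

def log_joint(path, obs):
    lp = math.log(D[path[0]]) + math.log(A[obs[0]][path[0]])
    for t in range(1, len(path)):
        lp += math.log(B[path[t]][path[t - 1]]) + math.log(A[obs[t]][path[t]])
    return lp

# Exact evidence by brute force: sum p(s_{1:3}, y_{1:3}) over all 8 state paths.
log_py = math.log(sum(math.exp(log_joint(p, obs))
                      for p in itertools.product([0, 1], repeat=3)))

def vfe(q, obs):
    # F[q] = E_q[ln q(s)] - E_q[ln p(s, y)] under a fully factorised q.
    f = 0.0
    for p in itertools.product([0, 1], repeat=3):
        w, lq = 1.0, 0.0
        for t, s in enumerate(p):
            w *= q[t][s]
            lq += math.log(q[t][s])
        f += w * (lq - log_joint(p, obs))
    return f

uniform = [[0.5, 0.5]] * 3
wrong   = [[0.999, 0.001]] * 3   # near point-mass on one state at every step

# The bound F[q] >= -ln p(y) holds no matter how bad q is.
assert vfe(uniform, obs) >= -log_py
assert vfe(wrong, obs) >= -log_py
```

The Elixir anchor does the same thing against AgentPlane.ExactInference, just over the real bundles and action pairs.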

A subtle thing I caught while writing this: my first textbook-chain VFE used log(B * q_prev) (the Jensen-tightened form). The mean-field bound F[q] >= -ln p(y) requires log(B) * q_prev (the "expected log") instead. Both are valid VFE decompositions, but only the latter satisfies the joint mean-field bound that the audit anchor cites, and the bound test specifically exercises that expected-log form. If you want to nitpick this further, I'd love the conversation.
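The gap between the two forms is just concavity of the logarithm: for each next state, ln E_q[B] >= E_q[ln B], so the Jensen form is never below the expected-log form. A quick numerical illustration with hypothetical numbers (any stochastic kernel and any belief will do):

```python
import math

# Illustrative numbers only: B[s'][s] = p(s' | s), q_prev a belief over the previous state.
B = [[0.7, 0.3], [0.3, 0.7]]
q_prev = [0.6, 0.4]

for s_next in range(2):
    jensen = math.log(sum(B[s_next][s] * q_prev[s] for s in range(2)))
    expected_log = sum(q_prev[s] * math.log(B[s_next][s]) for s in range(2))
    # log is concave, so ln E_q[B] >= E_q[ln B] for every next state.
    assert jensen >= expected_log
```

Because the expected-log term is the smaller (more negative) one, swapping it in can only raise the free energy, which is exactly what keeps F[q] above -ln p(y).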

What's visible in the live UI

Run mix phx.server, then open http://localhost:4000/labs/meadow. Click cells to place birds, pick a tier (Convergent, Simple, Complex, Resonant), pick a preferred song token (t1-t4), and press Start.

I drove six scenarios end-to-end through the LiveView in Chrome:

| Scenario | Setup | Outcome |
| --- | --- | --- |
| A | Same-prior ConvergentBirds at corners of 8x8, distance 14 | Cluster at distance ~5 by t=321 (reached distance 1 at t=65) |
| B | Orthogonal-prior pair, same setup | Looser cluster, distance ~3 at t=176 |
| C | SimpleBirds (uniform A on hearing factors) at corners | Never moved; birds only sing. Audit prediction confirmed |
| D | 4 ConvergentBirds, mixed t1/t2 priors | Clusters form, but cross token boundaries at v1 |
| E | 4x4 grid, same-prior pair always in hearing range | Tight tracking: Bird 2 picks move_north toward singing Bird 1 |
| F | UI safety guards (duplicate, empty start, remove, reset) | All work as designed |

What I am being honest about

These are real, named limits — not hidden:

  1. ConvergentBird is drawn to any audible source. Token preference modulates the strength of attraction, not its presence. Matching priors give a tighter cluster (Experiment 1: median 4 vs 8 control) but orthogonal-prior pairs still drift together. Stronger token discrimination would need a partner_token-conditional A-factor structure.

  2. Call-response at policy_depth >= 2 is throttled by Jido's per-action 60s timeout: at experimental scale, the 1000-dim observation matvecs in pure Elixir don't finish in time. The integration test passes at depth 1; testing the call-response hypothesis at depth 2 needs an Nx-backed math path. Documented in source.

  3. ResonantBird's hierarchical meta-loop is currently a context-swap heuristic, not a full hierarchical Bayesian planner. The existing AgentPlane.Hierarchical is maze-coupled; rewiring for meadows is plumbing, not new science.

  4. Spatial convergence required adding a tier. The original plan claimed SimpleBird would converge. It doesn't — SimpleBird's A is uniform conditional on state. ConvergentBird (5-state partner_bearing factor with a bearing-update B kernel) is the minimal POMDP factor structure that makes EFE produce a movement gradient. This is named honestly in the source moduledoc.
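The uniform-A point in item 4 is easy to check numerically. In the standard discrete risk + ambiguity form of expected free energy, an A that is uniform conditional on state makes every action predict identical observations, so EFE is flat over actions and argmax selection has no gradient to follow. A Python sketch with hypothetical 2x2 matrices (not the repo's bundles):

```python
import math

def efe(A, B_a, q_s, log_C):
    """One-step expected free energy G(a) = risk + ambiguity (standard discrete form)."""
    n_s, n_o = len(q_s), len(A)
    # Predicted next-state and observation distributions under action a.
    q_s_next = [sum(B_a[sp][s] * q_s[s] for s in range(n_s)) for sp in range(n_s)]
    q_o = [sum(A[o][sp] * q_s_next[sp] for sp in range(n_s)) for o in range(n_o)]
    # Risk: KL divergence from predicted observations to preferences C.
    risk = sum(q_o[o] * (math.log(q_o[o]) - log_C[o]) for o in range(n_o))
    # Ambiguity: expected entropy of the likelihood mapping.
    ambiguity = sum(q_s_next[sp] *
                    -sum(A[o][sp] * math.log(A[o][sp]) for o in range(n_o))
                    for sp in range(n_s))
    return risk + ambiguity

# Hypothetical 2-state, 2-observation toy setup.
q_s = [0.8, 0.2]                                 # current belief over states
log_C = [math.log(0.9), math.log(0.1)]           # preference for observation 0
stay = [[1.0, 0.0], [0.0, 1.0]]                  # B[s'][s] for "stay"
move = [[0.2, 0.8], [0.8, 0.2]]                  # B[s'][s] for "move"

A_uniform = [[0.5, 0.5], [0.5, 0.5]]             # SimpleBird-like: o carries no state info
A_informative = [[0.9, 0.1], [0.1, 0.9]]         # ConvergentBird-like

# Uniform A: both actions predict the same observations, so EFE is flat over actions.
assert abs(efe(A_uniform, stay, q_s, log_C) - efe(A_uniform, move, q_s, log_C)) < 1e-9
# Informative A: the actions now separate, so selection has a gradient to follow.
assert efe(A_informative, stay, q_s, log_C) != efe(A_informative, move, q_s, log_C)
```

This is the whole reason ConvergentBird needed the partner_bearing factor: it is the smallest A/B structure under which the two actions stop being EFE-equivalent.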

How to reproduce, locally, in under 5 minutes

git clone https://github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench.git
cd TheORCHESTRATEActiveInferenceWorkbench/active_inference

# Fast scientific suite (~60s on a laptop):
mix test apps/world_plane/test/worlds/ \
         apps/agent_plane/test/meadow_obs_adapter_test.exs \
         apps/agent_plane/test/bundle_builder/ \
         apps/agent_plane/test/meadow/ \
         apps/workbench_web/test/workbench_web/

# Run the experiments at smoke scale (~4 min):
mix test apps/agent_plane/test/meadow/experiment_one_test.exs \
         apps/agent_plane/test/meadow/experiment_two_test.exs \
         --include slow_experiment

# Open the UI:
MIX_ENV=dev mix phx.server   # then http://localhost:4000/labs/meadow

What I'd love from this community

  • Active inference researchers: is the partner_bearing factor honest to the spirit of Friston's framework? Are my audit anchors the right ones? What additional ones would you want?
  • Elixir / Nx people: what's the cleanest path to put the inner matvec on Nx so we can run policy_depth >= 2 within Jido's per-action timeout?
  • Anyone: clone, run, file an issue, send a PR. Tell me where the reasoning is wrong. I built this expecting to be corrected.

The commit message and project memory both say it: this build was done to take a previously-private audit and demonstrate it as working code, in public, with the math honest and the gaps named. If the community confirms — or refutes — any of this, the truth wins either way.


Built by Michael Polzin (THE ORCHESTRATE METHOD / LEVEL UP). Code is CC BY-NC-ND. The mathematical content is from Parr, Pezzulo & Friston (2022) Active Inference, MIT Press. Generated with substantial Claude Code pair-programming, all of which is reviewable in the commit history.
