Most agents are "state blind". I built an orchestration layer with a synthetic visual tree to give agents actual Episodic Memory (LanceDB + Postgres).

Rish Maniar — Wed, 24 Jun 2026 00:04:53 +0000

I’ve been building Atom (https://github.com/rush86999/atom), a self-hosted orchestration platform in Python/FastAPI.

(Full disclosure upfront: I designed the state machines and memory architecture, but I heavily used Cursor, Aider, and Claude Code to accelerate the boilerplate and test coverage. I use API providers for the LLM reasoning, but the memory, embeddings, and orchestration are entirely local.)

I love lightweight runtimes (like OpenClaw) for simple scripts, but they break on real business workflows (like processing invoices at my company). The issue isn't the model; it's State Blindness. Agents fire a tool call into the void and hallucinate success because they have no deterministic way to verify whether the UI actually changed.

Dumping raw DOMs blows up the context window, and passing screenshots is incredibly token-wasteful. So I built this to handle state explicitly:

Synthetic Grounding (Canvas AI Accessibility)
Instead of screenshots or raw HTML, Atom injects a hidden, structured semantic description layer into its Canvas workspace. It’s basically an accessibility screen reader optimized for an LLM's context window. The agent "reads" this logical tree to ground itself visually before deciding on the next tool call.
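To make the idea concrete, here is a minimal sketch of what such a semantic layer could look like. This is not Atom's actual schema: `Node`, `render`, and `state_changed` are hypothetical names I'm using for illustration. The point is that a compact role/name/state tree serializes to a few lines of text, and comparing two renderings gives a deterministic "did the UI change?" check.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of a hypothetical semantic UI tree (not Atom's real schema)."""
    role: str                                   # e.g. "form", "textbox", "button"
    name: str                                   # human-readable label
    state: dict = field(default_factory=dict)   # e.g. {"value": "42.00"}
    children: list["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> str:
    """Serialize the tree as indented text, one line per element."""
    attrs = " ".join(f"{k}={v!r}" for k, v in sorted(node.state.items()))
    line = "  " * depth + f'[{node.role}] "{node.name}"' + (f" {attrs}" if attrs else "")
    return "\n".join([line] + [render(child, depth + 1) for child in node.children])


def state_changed(before: Node, after: Node) -> bool:
    """Deterministic check: did the tool call alter the rendered state?"""
    return render(before) != render(after)


# Assumed example: an invoice form before and after the agent fills it in.
before = Node("form", "Invoice", children=[
    Node("textbox", "Amount", {"value": ""}),
    Node("button", "Submit", {"enabled": False}),
])
after = Node("form", "Invoice", children=[
    Node("textbox", "Amount", {"value": "42.00"}),
    Node("button", "Submit", {"enabled": True}),
])

print(render(after))
# [form] "Invoice"
#   [textbox] "Amount" value='42.00'
#   [button] "Submit" enabled=True
```

If `state_changed(before, after)` comes back `False` after a tool call, the agent knows the action did not land, instead of assuming success.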
True Episodic Memory
Slapping a vector database on chat logs is retrieval, not memory. Atom splits it:

Hot State: PostgreSQL handles the active Workflow State Machine.

Cold Memory: Every time the agent parses that semantic visual layer, it vectorizes the actual workflow state snapshot and stores it locally in LanceDB.

Local Embeddings: It uses FastEmbed (BAAI/bge-small-en-v1.5) by default, so embedding generation is 100% local and fast.

When the agent fails and later retries a similar task, it retrieves those specific visual snapshots to see what the application state looked like during the failure, so it can self-correct.

Execution & Governance
You plug in your preferred API provider (OpenAI, Anthropic, DeepSeek, etc.) for the brain. Because I don't want an autonomous script having root access on day one, agents start in a sandbox ("Student" tier) and must maintain a high Readiness Score based on human-intervention rates before they are allowed to execute autonomously.
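As a rough illustration of that gating, here is one way a readiness check could be expressed. The formula, the `min_runs` floor, and the `0.95` threshold are my assumptions for the sketch, not Atom's actual scoring rules; only the idea of scoring by human-intervention rate comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    total_runs: int
    human_interventions: int


def readiness_score(stats: RunStats, min_runs: int = 20) -> float:
    """Share of runs completed without a human stepping in.

    Agents with too little history score 0.0 so they stay sandboxed.
    Both the formula and min_runs are illustrative assumptions.
    """
    if stats.total_runs < min_runs:
        return 0.0
    return max(0.0, 1.0 - stats.human_interventions / stats.total_runs)


def may_run_autonomously(stats: RunStats, threshold: float = 0.95) -> bool:
    """Gate autonomous execution on the readiness score (threshold is assumed)."""
    return readiness_score(stats) >= threshold


print(may_run_autonomously(RunStats(total_runs=10, human_interventions=0)))    # False
print(may_run_autonomously(RunStats(total_runs=100, human_interventions=2)))   # True
print(may_run_autonomously(RunStats(total_runs=100, human_interventions=10)))  # False
```

The first case is the important one: a brand-new agent with a clean record still stays in the sandbox until it has enough supervised runs behind it.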

I'd love for this community to roast the memory architecture. Has anyone else tried using synthetic accessibility trees for local state grounding?

Repo: https://github.com/rush86999/atom

DEV Community: Rish Maniar
