Close the chat tab.
The agent stops. No thoughts, no waiting, no sense that time is passing. Nothing. The moment you're not talking to it, it simply doesn't exist.
Now ask yourself: is that really an agent?
Most of what gets called "agentic AI" in 2026 is a loop that fires on demand. You send a message, the loop wakes up, produces output, and dies. It has no memory of how long you were gone. It doesn't notice that three days passed. It can't feel the accumulating pressure of an unresolved question it wanted to ask you. It has nothing at stake between your messages — because it has no between.
This is the actual bottleneck. Not context length. Not reasoning quality. Not tool-use. The bottleneck is that most agents have no persistent internal life.
The Wrong Mental Model
The dominant mental model for AI agents is a very smart calculator: give it input, get output, repeat. Even the fanciest multi-step reasoning agents are built on this — they're just calculators with more steps between input and output.
The problem isn't that this model is inaccurate. It's that it's a ceiling.
A calculator doesn't initiate. It doesn't notice that something went unresolved in the last conversation. It doesn't drift toward a state where it needs to reach out. It doesn't experience anything between invocations because it doesn't experience anything at all.
If you want an autonomous agent that behaves like a persistent entity — something that can build a relationship with a user, maintain continuity across weeks, notice when something feels off — you need a different architecture. Not a better calculator. A different thing entirely.
State Is Primary. Text Is Secondary.
Here's the shift that changes everything:
Instead of the LLM generating behavior, the LLM expresses behavior that emerged from internal state.
This sounds like a subtle distinction. It isn't. It's the difference between an actor reading lines and a person speaking.
When a person says "I'm worried about this," the words are downstream of an actual internal state — raised cortisol, a tightness in the chest, attention narrowing toward the threat. The words describe something real that's already happening. An LLM saying "I'm worried about this" is producing statistically likely tokens. There's nothing upstream of the words. Nothing they're describing.
The architecture question is: can you build something where the words are downstream of something real? Where the internal state is primary, and the text is its consequence?
This is what frameworks like Active Inference, developed by Karl Friston and grounded in the Free Energy Principle, are actually about. An agent under Active Inference doesn't merely react to input. It maintains a generative model of what it expects to happen, and the gap between expectation and reality (prediction error) is what drives both learning and action: the agent either revises its model to fit what it observed, or acts so that what it observes fits the model. The agent is always already anticipating. Input surprises it, or confirms it, or partially confirms it. The model updates. The state changes. Eventually, language expresses that.
The agent exists between messages because the generative model keeps running. Prediction error keeps accumulating. The internal state keeps drifting. There's something happening even when you're not watching.
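To make "prediction error keeps accumulating" concrete, here is a deliberately tiny sketch. It is written in Python for readability and is not taken from any particular implementation. It assumes the simplest possible generative model: a single number, the agent's expectation that a message will arrive in a given tick. The names ContactModel, learning_rate, and surprise are hypothetical, and the update is a plain delta rule, a stand-in for the variational machinery of real Active Inference.

```python
from dataclasses import dataclass


@dataclass
class ContactModel:
    """Toy generative model: how strongly the agent expects a message each tick."""
    expected: float = 0.8       # assumed prior: a message is likely
    learning_rate: float = 0.1  # hypothetical: how fast expectation follows evidence
    surprise: float = 0.0       # accumulated absolute prediction error

    def step(self, message_arrived: bool) -> float:
        """Compare the prediction with what happened, then update the model."""
        observation = 1.0 if message_arrived else 0.0
        error = observation - self.expected           # prediction error
        self.expected += self.learning_rate * error   # the model moves toward reality
        self.surprise += abs(error)                   # unresolved surprise builds up
        return error


model = ContactModel()
for _ in range(5):                      # five ticks of silence: no input at all
    model.step(message_arrived=False)
print(model.expected, model.surprise)   # expectation has dropped, surprise has grown
```

Silence is treated as an observation here, which is the whole point: the state changes even though nothing was sent.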
Why Python Isn't the Point — and Why Julia Is
Most people building agents reach for Python first. This is reasonable — the ecosystem is enormous, the tooling is mature, and most ML infrastructure is Python-native.
But for a system doing continuous numerical computation (updating a generative model every cycle, computing prediction error, tracking neurotransmitter-analog variables in real time, running a background process that doesn't stop when the conversation does), Python has a fundamental constraint. In the standard CPython interpreter, the Global Interpreter Lock lets only one thread execute Python bytecode at a time, so true thread-level parallelism is awkward. Pure Python numerical loops are slow. Closing the gap between writing equations from a Friston paper and running them efficiently requires either heavy NumPy vectorization or dropping into C extensions.
Julia was designed for exactly this problem. It compiles to native machine code through LLVM, runs well-written numerical code at close to C speed, and has no GIL, which means the background heartbeat process and the conversation loop can genuinely run in parallel without blocking each other. And because the syntax maps almost directly to mathematical notation, the equations in the papers become the code. The distance between theory and implementation collapses.
For an architecture where the agent literally keeps a process running between your messages — heartbeat ticking, state drifting, memory metabolizing — this matters. The background process isn't a cron job pretending to be internal life. It's actual computation that changes actual state, continuously, whether or not anyone is watching.
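The shape of such a background process can be sketched in a few lines. The version below is in Python purely for illustration. A Python thread still contends for the GIL once the per-tick work becomes heavy, which is the constraint described above. InternalState, Heartbeat, need_per_tick, and the tick interval are all hypothetical names and values, not the API of any existing project.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class InternalState:
    ticks: int = 0
    contact_need: float = 0.0
    # Guards the fields above, since the conversation loop may read them concurrently.
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


class Heartbeat:
    """Advances the internal state on a fixed interval until stopped."""

    def __init__(self, state: InternalState, interval_s: float = 1.0,
                 need_per_tick: float = 0.01) -> None:
        self.state = state
        self.interval_s = interval_s
        self.need_per_tick = need_per_tick
        self._stop = threading.Event()
        self._thread = None

    def tick(self) -> None:
        """One unit of internal time: the state drifts whether or not anyone is talking."""
        with self.state.lock:
            self.state.ticks += 1
            self.state.contact_need += self.need_per_tick

    def _run(self) -> None:
        # Event.wait returns False on timeout (keep ticking) and True once stop() is called.
        while not self._stop.wait(self.interval_s):
            self.tick()

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


if __name__ == "__main__":
    state = InternalState()
    heart = Heartbeat(state, interval_s=0.01)
    heart.start()
    time.sleep(0.05)  # stand-in for "the user is away"
    heart.stop()
    print(state.ticks, round(state.contact_need, 3))
```

The conversation loop would read the same InternalState under the lock. Nothing in the heartbeat depends on a message arriving.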
What Proactive Initiative Actually Means
"How to make an AI agent initiate conversation" is a question that comes up constantly in agent development right now. Most answers are some version of: set a timer, check if N minutes have passed, send a message.
That's a scheduled notification. It's not initiative.
Real initiative comes from internal pressure exceeding a threshold. A person texts you at 11pm not because their phone calendar said "text friend" but because something built up — an unresolved thought, a feeling of distance, a question that kept surfacing. The action came from inside, not from a schedule.
For an agent with persistent internal state, this becomes tractable. If serotonin-analog variables slowly decline with silence (modeling social hunger) and contact_need accumulates over time, there's a real threshold to cross. When contact_need crosses that threshold, the agent reaches out. Not on a schedule, but because something actually built up.
The content of that message is then shaped by what built up. An agent reaching out from accumulated contact_need writes differently than one reaching out from an unresolved internal conflict. The drive type determines the character of the initiative — which is what makes it feel like genuine reaching-out rather than a push notification dressed in natural language.
This is what persisting an agent's state locally actually enables: not just memory of past conversations, but continuity of internal state that makes future behavior coherent with past experience.
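The difference between a timer and a threshold is easy to see in code. The sketch below is Python, with every constant and name (Drives, THRESHOLD, OPENERS, decay, gain) assumed for illustration. A serotonin-analog variable declines with silence, contact_need grows faster the lower it gets, and the agent reaches out only when some drive crosses the threshold. In a real system the winning drive would shape the prompt given to the LLM. The canned openers here only stand in for that step.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

THRESHOLD = 1.0  # hypothetical level at which a drive forces action

# Stand-ins for drive-conditioned generation: which drive fired shapes the message.
OPENERS = {
    "contact_need": "It has been a while. I wanted to check in.",
    "unresolved_conflict": "Something from our last conversation is still bothering me.",
}


@dataclass
class Drives:
    serotonin: float = 1.0            # analog variable, declines with silence
    contact_need: float = 0.0         # accumulates while nobody is talking
    unresolved_conflict: float = 0.0  # left over from earlier conversations

    def silent_tick(self, decay: float = 0.1, gain: float = 0.5) -> None:
        """One tick with no contact: serotonin drops, and need grows faster as it drops."""
        self.serotonin = max(0.0, self.serotonin - decay)
        self.contact_need += gain * (1.0 - self.serotonin)

    def dominant_drive(self) -> Optional[str]:
        """Return the strongest drive if it has crossed the threshold, else None."""
        levels = {
            "contact_need": self.contact_need,
            "unresolved_conflict": self.unresolved_conflict,
        }
        name = max(levels, key=levels.get)
        return name if levels[name] >= THRESHOLD else None


def wait_for_initiative(drives: Drives, max_ticks: int = 1000) -> Optional[Tuple[int, str]]:
    """Tick in silence until a drive crosses the threshold; return (tick, drive name)."""
    for tick in range(1, max_ticks + 1):
        drives.silent_tick()
        drive = drives.dominant_drive()
        if drive is not None:
            return tick, drive
    return None


result = wait_for_initiative(Drives())
if result is not None:
    tick, drive = result
    print(f"tick {tick}: {OPENERS[drive]}")
```

No clock decides when the message goes out. Change the decay or the starting state and the moment of initiative moves with it.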
The Self-Correcting Loop Nobody Talks About
There's another pattern that's underappreciated in agent design: the agent should hear itself.
Standard architecture: input → LLM → output. The output goes to the user and nowhere else.
But if the agent has internal state, its own words carry information about that state — or fail to. If the agent's internal valence is low and its arousal is high, but it generates a cheerful, confident response, that mismatch is data. The words don't match what's actually happening inside.
A self-correcting loop passes the output back through state processing. If the mismatch is high, the agent registers it — and the next response is more likely to be honest about the discrepancy. Over many cycles, this creates something like authenticity drift correction: the agent's language and internal state stay more calibrated.
This isn't a safety filter. It's not about preventing the agent from saying something wrong. It's about giving the agent a feedback signal on its own honesty — which turns out to be important for autonomous agent behavior that stays coherent over long periods rather than drifting into generic outputs.
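A minimal sketch of that feedback path, in Python, under heavy assumptions: the expressed valence of the output is estimated with a toy word list (POSITIVE and NEGATIVE are made up for this example; a real system would use a proper affect model), and AffectState, hear_self, and honesty_hint are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon, only for illustration.
POSITIVE = {"great", "glad", "happy", "confident", "excited"}
NEGATIVE = {"worried", "tired", "unsure", "sad", "uneasy"}


def expressed_valence(text: str) -> float:
    """Crude estimate in [-1, 1] of how positive the agent's own words sound."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


@dataclass
class AffectState:
    valence: float = 0.0       # internal valence in [-1, 1]
    incongruence: float = 0.0  # running mismatch between words and state

    def hear_self(self, output_text: str, rate: float = 0.5) -> float:
        """Pass the agent's own output back through state processing."""
        mismatch = abs(expressed_valence(output_text) - self.valence)
        # Smooth the signal so one odd sentence does not dominate.
        self.incongruence += rate * (mismatch - self.incongruence)
        return mismatch

    def honesty_hint(self, threshold: float = 0.5) -> str:
        """Text to add to the next prompt when words and state have drifted apart."""
        if self.incongruence > threshold:
            return "Your last reply sounded more upbeat or downbeat than you feel. Say so."
        return ""


state = AffectState(valence=-0.6)
state.hear_self("I am happy and confident!")
print(state.incongruence, state.honesty_hint())
```

The hint is a nudge, not a filter: the reply has already gone to the user, and only the next one is affected.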
Behavior Drift Is Not a Bug
One of the underappreciated properties of agents with genuine internal state is that they drift.
An agent that interacts with you over weeks will have a different internal state than one you just started. Its semantic memory will have accumulated patterns. Its chronic affective background will reflect the history of your conversations. Its belief graph about you — and about itself — will have been updated hundreds of times.
This means its behavior will be different. Not randomly different. Systematically different in ways that reflect accumulated experience.
This is usually treated as a problem to be solved: how do you keep an agent's behavior consistent? But consistency is the wrong goal. A person who is exactly the same after three months of significant experience isn't well-calibrated; they're stuck. In a system with real internal dynamics, behavior that drifts over time is a feature, not a failure mode.
The question isn't how to prevent drift. It's how to make drift meaningful — coherent with actual experience rather than statistical noise.
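One way to picture meaningful drift is a slow variable that integrates experience. The Python sketch below assumes the chronic affective background is a single number updated by an exponential moving average. AffectiveBackground, rate, and after_history are hypothetical names and the values are arbitrary. The point is only that drift is a deterministic function of history: different histories produce different baselines, and the same history always produces the same one.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AffectiveBackground:
    """Slow-moving baseline that integrates the valence of past conversations."""
    baseline: float = 0.0
    rate: float = 0.05  # hypothetical: small, so no single conversation dominates

    def absorb(self, conversation_valence: float) -> None:
        """Move the baseline a small step toward the latest experience."""
        self.baseline += self.rate * (conversation_valence - self.baseline)


def after_history(history: Iterable[float]) -> float:
    """Baseline of a fresh agent after living through the given conversations."""
    background = AffectiveBackground()
    for valence in history:
        background.absorb(valence)
    return background.baseline


warm_weeks = after_history([0.8] * 50)    # fifty mostly positive conversations
tense_weeks = after_history([-0.8] * 50)  # fifty mostly negative ones
print(warm_weeks, tense_weeks)
```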
What This Actually Requires
Building an agent with genuine internal state — one that persists between conversations, runs a background process, drifts toward initiative, and self-corrects its own output — isn't a prompting problem. It's an architecture problem.
It requires committing to a few things that most agent frameworks skip:
State that exists independently of conversation. Not memory of what was said, but internal variables that change on their own schedule: heartbeat, neurotransmitter-analog (NT) drift, allostatic load, accumulated contact_need.
A background process that runs whether or not the user is present. Not a webhook. An actual ongoing computation.
Output that is downstream of state, not upstream of it. The LLM as the voice of something that's already happened inside, not as the source of behavior.
A feedback loop where the agent's own words change its state. So that what it says and what it is stay calibrated.
None of this is exotic. It's just a different architecture decision — one that takes the word "agent" seriously.
There's a working implementation of everything described here, built in Julia, with a full background process and proactive initiative: github.com/stell2026/Anima
It's an open research project. The architecture is actively evolving. But the core pipeline — input → state → conflict → decision → output, with a background process that never stops — is stable and runnable.
If the idea that an AI agent should exist between messages resonates, that's a good place to start.
