Persona drift is an architecture problem, not a prompt problem

Haijun Wen — Mon, 08 Jun 2026 18:33:30 +0000

Most production conversational AI degrades the same way: after enough turns, the persona drifts. The voice you carefully prompted for averages out into something blander. People notice by about week six.

I spent the last year building a compound-AI engine called ArcOS around a single bet: language belongs to the model, but staying in character, following rules, and remembering belong in hard-coded logic — not in a system prompt.

The default that fails

One LLM call per turn: prompt + history → answer. As history grows, the model averages over inconsistent past turns. Persona drifts toward the mean. You can fight it with longer prompts, but you're patching a structural problem with text.
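As a rough sketch of that default loop (not ArcOS code), the snippet below shows one call per turn with an ever-growing history. The names `single_call_turn` and `prompt_share` are hypothetical, and the character share is only a crude proxy for how much of the context the fixed prompt still occupies, not a measurement of drift.

```python
from typing import Callable

# Hypothetical stand-in for any text-in, text-out model call.
ModelFn = Callable[[str], str]


def single_call_turn(model: ModelFn, system_prompt: str,
                     history: list[str], message: str) -> str:
    """One LLM call per turn: prompt + history -> answer."""
    history.append(f"user: {message}")
    # The fixed prompt sits at the top; everything after it keeps growing.
    context = system_prompt + "\n" + "\n".join(history)
    reply = model(context)
    history.append(f"assistant: {reply}")
    return reply


def prompt_share(system_prompt: str, history: list[str]) -> float:
    """Fraction of the context (by characters) taken up by the fixed prompt."""
    total = len(system_prompt) + sum(len(turn) for turn in history)
    return len(system_prompt) / total
```

Calling `prompt_share` after each turn shows the fixed prompt shrinking relative to the history, which is the structural pressure a longer prompt only postpones.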

Five stages instead of one

ArcOS splits a turn into five stages:

1. Perception: translate the message into a structured signal.
2. Strategy: deterministic code decides what should happen.
3. Assembly: build the exact context the writer will see.
4. Generation: the model writes the reply.
5. Memory: extract, store, and recall facts for next time.

Stages 1, 4, and 5 use a language model. Stages 2 and 3 are deterministic code — 75 hard-coded decision blocks, not instructions hidden in a prompt.
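The five stages above can be sketched as one function per stage. This is a minimal illustration under stated assumptions, not the ArcOS implementation: every name here (`Signal`, `TurnState`, `decide`, `run_turn`, the two-entry rule table, the prompt wording) is hypothetical, and `fake_model` is a deterministic stub standing in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for any text-in, text-out model call.
ModelFn = Callable[[str], str]


@dataclass
class Signal:
    intent: str
    text: str


@dataclass
class TurnState:
    persona: str
    facts: list[str] = field(default_factory=list)


def perceive(model: ModelFn, message: str) -> Signal:
    # Stage 1 (model): reduce the raw message to a structured signal.
    intent = model(f"Classify the intent of: {message}").strip().lower()
    return Signal(intent=intent, text=message)


def decide(signal: Signal) -> str:
    # Stage 2 (code): a plain lookup, no model involved.
    rules = {"question": "answer_directly",
             "complaint": "acknowledge_then_help"}
    return rules.get(signal.intent, "continue_conversation")


def assemble(state: TurnState, signal: Signal, action: str) -> str:
    # Stage 3 (code): fixed sections in a fixed order, persona last.
    sections = [
        "KNOWN FACTS:\n" + "\n".join(state.facts),
        f"USER MESSAGE:\n{signal.text}",
        f"ACTION:\n{action}",
        f"PERSONA:\n{state.persona}",
    ]
    return "\n\n".join(sections)


def generate(model: ModelFn, context: str) -> str:
    # Stage 4 (model): the writer only sees the assembled context.
    return model(context)


def remember(model: ModelFn, state: TurnState, signal: Signal) -> None:
    # Stage 5 (model + code): extract a fact, then store it in code.
    fact = model(f"Extract one fact from: {signal.text}").strip()
    if fact:
        state.facts.append(fact)


def run_turn(model: ModelFn, state: TurnState, message: str) -> str:
    signal = perceive(model, message)
    action = decide(signal)
    context = assemble(state, signal, action)
    reply = generate(model, context)
    remember(model, state, signal)
    return reply


def fake_model(prompt: str) -> str:
    # Deterministic stub so the sketch runs without a real backbone.
    if prompt.startswith("Classify"):
        return "question"
    if prompt.startswith("Extract"):
        return "user asked about pricing"
    return "reply written from: " + prompt.splitlines()[-1]
```

With the stub, `run_turn(fake_model, TurnState(persona="Dry, concise."), "How much is it?")` exercises all five stages. The point of the split is that `decide` and `assemble` can be unit-tested like any other code.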

Why assembly is the lever

The context the writer sees is built from fixed sections, with the persona/instruction block placed structurally last, at the tail. Conversation history can never land after it, so the persona instructions are always the last thing the writer reads before it generates, no matter how long the conversation runs.
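One way to picture tail-anchored assembly is shown below. It assumes a rough character budget, hypothetical section labels, and a hypothetical `build_context` function. The persona block is reserved first, history is trimmed oldest-first to fit what remains, and the persona is appended last.

```python
def build_context(persona: str, history: list[str], message: str,
                  max_chars: int = 2000) -> str:
    """Assemble fixed sections with the persona block always at the tail."""
    tail = f"[PERSONA]\n{persona}"
    current = f"[MESSAGE]\n{message}"

    # Reserve room for the sections that must never be dropped.
    # The budget is approximate: section labels and separators are not counted.
    budget = max_chars - len(tail) - len(current)

    kept: list[str] = []
    # Walk history newest-first and keep turns while they still fit.
    for turn in reversed(history):
        cost = len(turn) + 1  # +1 for the joining newline
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    kept.reverse()  # restore chronological order

    sections = ["[HISTORY]\n" + "\n".join(kept), current, tail]
    return "\n\n".join(sections)
```

However long `history` gets, the returned string ends with the persona block; only old turns are sacrificed.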

Memory that doesn't smear

Memory is its own stage. It stores interactions and recalls them semantically, with a bi-temporal design that separates when something happened from when it was recorded. You get precise recall instead of a summarized, lossy blur.
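A bi-temporal record can be sketched with two timestamps per memory. This is an illustration only: `MemoryRecord`, `MemoryStore`, and the field names are hypothetical, and plain keyword matching stands in for the semantic recall described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MemoryRecord:
    text: str
    valid_at: datetime     # when the thing happened
    recorded_at: datetime  # when the system stored it


class MemoryStore:
    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def add(self, text: str, valid_at: datetime,
            recorded_at: datetime) -> None:
        self._records.append(MemoryRecord(text, valid_at, recorded_at))

    def recall(self, keyword: str, happened_by: datetime,
               known_by: datetime) -> list[MemoryRecord]:
        """Return matches that had happened by one time and were known by another."""
        hits = [
            r for r in self._records
            if keyword.lower() in r.text.lower()  # keyword match, not semantic
            and r.valid_at <= happened_by
            and r.recorded_at <= known_by
        ]
        return sorted(hits, key=lambda r: r.valid_at)
```

Keeping both timestamps lets a query distinguish "what was true in March" from "what the system knew in March", which a single timestamp or a rolling summary cannot do.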

Model-agnostic by construction

Because persona, control, and routing live in code and config rather than in the model, swapping the backbone (Gemma, Llama, Qwen) is a configuration change, not a rewrite. No single-vendor lock-in.
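A minimal sketch of what "a configuration change" could look like, assuming a hypothetical JSON config with a `backbone` key and a hypothetical `BACKENDS` registry. The backends here are echo stubs rather than real model clients, so the example runs on its own.

```python
import json
from typing import Callable

# Hypothetical stand-in for any text-in, text-out model call.
ModelFn = Callable[[str], str]


def _echo_backend(tag: str) -> ModelFn:
    # Stub backend: a real one would call a served model instead.
    def call(prompt: str) -> str:
        return f"[{tag}] {prompt}"
    return call


# Hypothetical registry mapping config names to callables.
BACKENDS: dict[str, ModelFn] = {
    "gemma": _echo_backend("gemma"),
    "llama": _echo_backend("llama"),
    "qwen": _echo_backend("qwen"),
}


def load_backbone(config_text: str) -> ModelFn:
    """Pick the writer model from config; nothing else in the pipeline changes."""
    name = json.loads(config_text)["backbone"]
    try:
        return BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown backbone: {name!r}") from None
```

Switching from `'{"backbone": "gemma"}'` to `'{"backbone": "qwen"}'` changes which callable the pipeline receives and nothing else, because persona and routing never lived inside the model.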

It runs in production today on web and desktop, on an NVIDIA H200 I own and operate, with a native iOS app in development. I built most of it solo, AI-native.