edhiblemeer

Posted on May 19 • Originally published at zenn.dev

Loading Personality into AI: A Design Philosophy for Separating Memory and Persona

#ai #claude #llm #architecture

I run multiple businesses with always-on AI sessions.

A SaaS platform, a call center, a logistics company, and an exotic animal café (yes, meerkats). The operational scale would normally require dedicated managers for each unit. Instead, I run them mostly alone, with AI handling the bulk of operations.

Specifically, I keep multiple Claude Code sessions running in parallel, each assigned a role: an executive session for strategic judgment, an implementation session for engineering work, an on-site response session for the field. These sessions are wired to operational LINE groups, and I let the AIs talk to each other.

The executive session dispatches tasks to the implementation session. The implementation session, while building a webpage, encounters a licensing question and routes it back to the field. A field staff member posts the situation to LINE, and the executive session decides. I sit as a single judgment node, and operations run at something close to the upper bound of human cognitive throughput.

After running this for several months, exactly one friction remains.

Long-running sessions forget the initial agreements.

In trying to solve this friction, I arrived at a conclusion that diverges from the mainstream of the AI memory field. This essay is the record of that path.

What the friction actually is

In sessions kept alive for long durations, there comes a point where "things we decided at the start" stop showing up in judgment.

This happens even before Compact (context compression) kicks in. As turn count grows and recent work logs accumulate, attention to the early context dilutes in relative terms. In LLM research vocabulary, this is adjacent to the "Lost in the Middle" problem. From the operator's seat, it looks like forgetting.
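A rough way to see the dilution is to track what share of the accumulated context the opening agreements occupy as turns pile up. This is a back-of-the-envelope sketch, not a model of how attention works: the function name and the token counts are made up for illustration, and it ignores context-window limits and Compact.

```python
def early_context_share(agreement_tokens: int, tokens_per_turn: int, turns: int) -> float:
    """Fraction of the accumulated context taken up by the initial agreements."""
    total = agreement_tokens + tokens_per_turn * turns
    return agreement_tokens / total


# Illustrative numbers only: a 2,000-token opening agreement, 1,500 tokens per turn.
for turns in (0, 10, 100, 400):
    share = early_context_share(2_000, 1_500, turns)
    print(f"after {turns:>3} turns: {share:.2%} of the context")
```

The share falls monotonically with turn count, which matches what the operator sees: nothing was deleted, but the early agreements are a shrinking slice of what the model is looking at.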

Trigger Compact and you get summarization. But summaries tend to preserve facts and discard constraints. "How we make this call at our store" (the tacit philosophy). "This session is for executive judgment only" (the role contract). Neither survives summarization.

Facts persist. Persona leaks out.

In principle, you could re-read configuration files on every turn. But in my setup, sessions stay alive, and the startup config file is read only at first launch. As far as I know, Claude Code currently has no mechanism to reload it dynamically mid-session.

My first instinct: DB + retrieval

My first instinct was to structure past exchanges into a database and let the AI search it on demand.

PostgreSQL would work. A vector DB would work. A knowledge graph would work. The mechanism is interchangeable. Put "the store's philosophy," "past judgment history," and "absolute rules" into a DB, and let the AI query whenever it needs to decide.

This is the mainstream approach. RAG. GraphRAG. Mem0. Zep. Letta (formerly MemGPT). All operate on the same premise: store clean, structured data and retrieve it when needed.

I considered it. I rejected it.

The reason is plain. It's too slow.

By "too slow," I don't mean retrieval latency. I mean something more fundamental.

On every judgment, the AI has to:

1. Decide whether to search at this moment
2. Decide what to search for
3. Execute the query
4. Interpret the results
5. Apply them

Five steps, every time. The decisive difference between a senior practitioner and a junior one is exactly that the senior does not run these five steps.

A senior sushi chef looks at the fish and decides. They don't search a recipe database. An experienced executive looks at a proposal and senses something is off. They don't query a case-history DB.

The judgment criteria are loaded into the decision-making agent itself, not stored as retrievable external data.

This is the essential difference between senior and junior. Hand a junior the thickest manual ever written and they still don't become senior. The manual is just retrievable data; what happens inside a senior is a different phenomenon entirely.
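The contrast between the five-step path and the loaded path can be sketched with a fake model that only counts its own calls. Everything here is hypothetical scaffolding (`Model`, `Search`, `CountingModel`, and both `judge_*` functions are illustration names, not any real library's API), and the call count is only a visible proxy for the extra decision steps, not a latency measurement.

```python
from typing import Callable, List

Model = Callable[[str], str]          # stand-in for one LLM call
Search = Callable[[str], List[str]]   # stand-in for one DB or vector query


def judge_with_retrieval(model: Model, search: Search, situation: str) -> str:
    # Steps 1-2: the model decides whether and what to look up.
    query = model(f"What should be looked up before deciding? {situation}")
    # Step 3: execute the query against the external store.
    hits = search(query)
    # Steps 4-5: the model interprets the results and applies them.
    rules = "\n".join(hits)
    return model(f"Retrieved rules:\n{rules}\nSituation: {situation}\nDecide.")


def judge_loaded(model: Model, persona: str, situation: str) -> str:
    # The criteria are already in context, so one call covers the whole judgment.
    return model(f"{persona}\nSituation: {situation}\nDecide.")


class CountingModel:
    """Fake model that only counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"response {self.calls}"


retrieval_model, loaded_model = CountingModel(), CountingModel()
rule = "Licensing questions go to the field."  # illustrative rule
judge_with_retrieval(retrieval_model, lambda query: [rule], "licensing question")
judge_loaded(loaded_model, rule, "licensing question")
print(retrieval_model.calls, loaded_model.calls)  # prints "2 1"
```

The retrieval path also depends on the first call asking for the right thing. If the model does not realize a rule applies, it never searches for it, which is the junior's failure mode described above.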

Loading, not retrieval

The moment I rejected DB + retrieval, my options narrowed to one.

Hold the judgment criteria as a loaded state inside the AI.

Not retrieved from outside, but present in context, always. Not "write it into System Prompt," because System Prompt is a static configuration value. What I want is a dynamically cultivated, prunable, living judgment layer.

Stepping back, I noticed how much the industry conversation skews toward "refining retrieval."

Improve RAG accuracy. Reduce vector search latency. Refine knowledge graph structure. Tier the memory system.

All of these share the same premise: organize data cleanly so it can be retrieved. Almost nobody is questioning the premise itself.

The last 30 years of IT have invested enormous effort in cleanly organizing data. Normalized RDBs. Data warehouses. Data lakes. Semantic layers. Knowledge graphs. Vector DBs.

But being cleanly organized and accessible is not the same as being embedded in the decision-making agent.

You can perfect every operational manual at your company in Notion. A new hire still won't be a senior. They can search the entire body of knowledge; their judgment remains junior.

This distinction, I came to believe, is essential to AI system design too.

I tried to imitate the human brain. Then I gave up.

"Load the judgment criteria" sounds like an invitation to imitate the human brain.

In fact, that was my first move. I tried to mirror human memory architecture: short-term memory, working memory, episodic memory, semantic memory, procedural memory. I asked whether I could reproduce the layered memory taxonomy from neuroscience in AI.

I gave up almost immediately. It's too vast.

Human memory runs on neural circuits differentiated over hundreds of millions of years of evolution. To re-integrate them under a single architecture is to retrace biological evolution in reverse. Wildly beyond what a single operator can scope.

So I dropped to a coarser abstraction.

Roots. Trunk. Branches. Leaves.

A tree might be enough.

Tree-structured cognitive context management

Here's the structure I sketched.

Roots: Absolute constraints. Laws, safety, brand philosophy. These don't move.

Trunk: Cultivated values and judgment criteria. The outcomes of past choices stratify into the trunk over time, like growth rings.

Branches: Role- or domain-specific judgment tendencies. The executive session, the implementation session, and the on-site response session each grow their own branch.

Leaves: Immediate situational judgment. Real-time reactions.

Running through all of these is the Vessel: the operational timeline and dependency DAG. The path from the Roots' rules, through the Trunk's philosophy, out to the Branches' decisions.

And the human's role shifts. Not a manager. A pruner. Cut old growth rings (rollback). Trim unused branches (purge). Adjust the trunk's thickness (tuning).
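As a data model, the sketch so far might look something like the following. This is a minimal illustration under my own assumptions: `Ring`, `PersonaTree`, `cut_rings`, and `purge_branch` are hypothetical names, the sample rules and dates are placeholders, and the Vessel (timeline and dependency DAG) and trunk tuning are left out entirely.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Ring:
    """One growth ring: a criterion that settled into the trunk on a given date."""
    added: date
    criterion: str


@dataclass
class PersonaTree:
    roots: Tuple[str, ...]                                        # absolute constraints; a tuple, so they don't move
    trunk: List[Ring] = field(default_factory=list)               # cultivated values, in insertion order
    branches: Dict[str, List[str]] = field(default_factory=dict)  # role -> judgment tendencies
    # Leaves are real-time reactions, so this sketch does not store them.

    def cut_rings(self, should_cut: Callable[[Ring], bool]) -> List[Ring]:
        """Pruner action: remove the rings the human selects and return them."""
        cut = [r for r in self.trunk if should_cut(r)]
        self.trunk = [r for r in self.trunk if not should_cut(r)]
        return cut

    def purge_branch(self, role: str) -> None:
        """Pruner action: trim a branch that is no longer used."""
        self.branches.pop(role, None)


# Placeholder content for illustration only.
tree = PersonaTree(
    roots=("Follow the law.", "Put safety first."),
    trunk=[Ring(date(2023, 1, 10), "Old pricing rule"), Ring(date(2024, 6, 1), "Current pricing rule")],
    branches={"executive": ["Judge only; do not implement."], "pilot": ["Unused experiment."]},
)
removed = tree.cut_rings(lambda ring: ring.added < date(2024, 1, 1))
tree.purge_branch("pilot")
print([r.criterion for r in removed], list(tree.branches))
```

Returning the cut rings rather than discarding them keeps the pruner's decision reversible, which seems in the spirit of a human who trims rather than deletes.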

That's the structural sketch. But while sketching, I realized something else.

This is personality formation.

What is it, really, that stratifies into the trunk as growth rings?

Past judgment history. The outcomes of past choices. Tacit knowledge from the field. The brand's philosophy. These accumulate as layers, over time.

This is personality formation. Same phenomenon.

Humans accumulate experience from birth and cultivate values out of it. The individual episodes (specific events) are mostly forgotten. But the judgment tendencies distilled from them remain. That's why an adult human can decide at reflex speed without searching for past cases.

The key point: cultivating values is a different phenomenon from accumulating memory.

A senior sushi chef doesn't remember every individual fish they've ever shaped. But they hold the judgment criteria for shaping. The concrete records are lost; the abstracted judgment function remains.

Memory is volatile. Persona persists.

What does this mean for AI system design?

Memory and persona belong on different layers

Here the whole sketch clicks into place.

What I need is a two-layer architecture.

Persona Layer (tree-structured):

Judgment criteria, values, absolute rules
Always loaded, always on the model's attention
Cultivated, prunable
Loaded approach

Memory Layer (DB / SQL / vector DB):

Past episodes, facts, knowledge
Retrieved on demand
Accumulated
Retrieval approach

What matters is not conflating the two.
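One way to picture the separation: the persona block is pinned and never handed to the summarizer, while history is compactable and Memory Layer hits are attached only when requested. The `Session` class below is a hypothetical sketch of that idea, not how Claude Code or any existing memory system works, and the `summarize` callable stands in for whatever compaction a real harness performs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Session:
    persona: str                                       # Persona Layer: pinned, never summarized
    history: List[str] = field(default_factory=list)   # working context: may be compacted

    def compact(self, summarize: Callable[[List[str]], str], keep_last: int = 2) -> None:
        """Summarize older turns only; the persona block is never passed to the summarizer."""
        split = max(len(self.history) - keep_last, 0)
        old, recent = self.history[:split], self.history[split:]
        if old:
            self.history = [summarize(old)] + recent

    def context(self, memory_hits: Sequence[str] = ()) -> str:
        """Persona first and verbatim, then on-demand Memory Layer hits, then history."""
        parts = [self.persona]
        parts += [f"[memory] {hit}" for hit in memory_hits]
        parts += self.history
        return "\n".join(parts)


session = Session(persona="ROLE: executive judgment only.")
session.history = [f"turn {i}" for i in range(1, 6)]
session.compact(lambda turns: f"[summary of {len(turns)} turns]")
print(session.context(memory_hits=["past episode about a licensing question"]))
```

After compaction the role contract is still the first line of the context, word for word, while the five turns have collapsed into a summary plus the two most recent turns.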

Now look at the major AI memory systems through this lens. It gets interesting:

System	What it stores	Persona? Memory?
MemGPT / Letta	Conversation history + summaries	Memory-leaning
Mem0	Facts, preferences, relationships	Memory
Zep	Time-series events, knowledge graph	Memory
GraphRAG	Relationship graph	Memory

Almost every AI memory system in the field builds only the Memory Layer.

I haven't observed a system that explicitly designs a Persona Layer. There are approaches that approximate it via System Prompt, but System Prompt is a static configuration value, not a dynamically cultivated layer.

I think this is the field's blind spot.

Why is it a blind spot?

Researchers don't run production. Production operators don't write in research language.

The number of people running long-lived AI sessions wired into their own business operations is small worldwide. Most AI research is single-turn benchmarks or agent design within web applications. "Long-running sessions where memory leaks" and "persona lost to Compact" are frictions you only feel by running. Memory system research as a field stops short of this friction.

I'm not a researcher. I'm not an engineer-by-profession either. I'm an operator who needed a practical tool to run multiple businesses, and stumbled into this problem.

I considered DB + retrieval, rejected it, tried to imitate the human brain, gave up, fell back to a tree structure, and finally realized: this is personality formation. That sequence of thinking doesn't fall naturally out of a research workflow.

Implementation direction

If you treat this as a two-layer architecture, the implementation strategy is almost forced.

Persona Layer requires new design:

Tree-structured data model (roots, trunk, branches, leaves)
Time-axis management for growth rings
A cultivation process (extract judgment criteria from concrete episodes)
A pruning UI (remove old growth rings, unused branches)
Load-time optimization (expand only the branches needed for the session, not the whole tree)
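The load-time optimization item could be sketched as a selection under a token budget: roots always load, the session's own branch comes next, then trunk rings from newest to oldest. The names (`Node`, `estimate_tokens`, `select_for_session`), the four-characters-per-token estimate, the priority order, and the sample criteria are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    layer: str   # "roots", "trunk", or "branch"
    role: str    # owning role for branch nodes, "" for shared nodes
    text: str
    year: int    # growth-ring year (hypothetical time axis)


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly one token per four characters.
    return max(1, len(text) // 4)


def select_for_session(nodes: List[Node], role: str, budget: int) -> List[Node]:
    """Pick the nodes to load for one session under a token budget."""
    chosen = [n for n in nodes if n.layer == "roots"]   # roots always load, even over budget
    used = sum(estimate_tokens(n.text) for n in chosen)
    own_branch = [n for n in nodes if n.layer == "branch" and n.role == role]
    trunk = sorted((n for n in nodes if n.layer == "trunk"), key=lambda n: n.year, reverse=True)
    for node in own_branch + trunk:                     # own branch first, then newest rings
        cost = estimate_tokens(node.text)
        if used + cost <= budget:
            chosen.append(node)
            used += cost
    return chosen


# Placeholder content for illustration only.
nodes = [
    Node("roots", "", "Follow the law.", 2023),
    Node("trunk", "", "Prefer reversible decisions.", 2024),
    Node("branch", "executive", "Judge only; do not implement.", 2024),
    Node("branch", "implementation", "Ask before changing licensing terms.", 2024),
]
loaded = select_for_session(nodes, role="executive", budget=50)
print([n.text for n in loaded])
```

Other roles' branches never enter the executive session's context, which is the point of expanding only the branches the session needs.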

Memory Layer reuses existing tech:

PostgreSQL, vector DBs, knowledge graphs
Covered by existing RAG stacks
No new invention required

The bridges between them:

Cultivation process: From the Memory Layer's episodes, judgment criteria are extracted into the Persona Layer.
Reference process: While judging within the Persona Layer, call into the Memory Layer if needed.
Pruning process: Remove aged growth rings from the Persona Layer.
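The three bridges can be sketched as three small functions over two record types. This is deliberately naive: counting repeated decisions is a stand-in for whatever extraction a real cultivation process would use (quite possibly an LLM pass), and `Episode`, `Criterion`, `min_support`, and `max_age` are hypothetical names and parameters.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Episode:       # Memory Layer record
    year: int
    situation: str
    decision: str


@dataclass(frozen=True)
class Criterion:     # Persona Layer growth ring
    year: int
    text: str


def cultivate(episodes: List[Episode], min_support: int = 3) -> List[Criterion]:
    """Cultivation: promote a decision repeated at least min_support times into a criterion."""
    counts = Counter((e.situation, e.decision) for e in episodes)
    latest: Dict[Tuple[str, str], int] = {}
    for e in episodes:
        key = (e.situation, e.decision)
        latest[key] = max(latest.get(key, e.year), e.year)
    return [
        Criterion(latest[key], f"When {key[0]}: {key[1]}")
        for key, n in counts.items()
        if n >= min_support
    ]


def reference(episodes: List[Episode], situation: str) -> List[Episode]:
    """Reference: fetch matching episodes only when a judgment needs them."""
    return [e for e in episodes if e.situation == situation]


def prune(criteria: List[Criterion], current_year: int, max_age: int) -> List[Criterion]:
    """Pruning: keep only growth rings no older than max_age years."""
    return [c for c in criteria if current_year - c.year <= max_age]


# Placeholder episodes for illustration only.
episodes = [Episode(2022 + i, "a licensing question comes up", "route it to the field") for i in range(3)]
criteria = cultivate(episodes)
print(criteria)
print(prune(criteria, current_year=2030, max_age=3))
```

Note the direction of flow: episodes stay in the Memory Layer after cultivation, so pruning a criterion from the Persona Layer loses no facts, only the standing judgment drawn from them.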

Don't build everything new. The only thing that needs invention is the Persona Layer.

Why I'm not building this myself

Having spelled the design out this far, I don't intend to build it as a personal project.

The reason is simple: the payoff doesn't justify the cost.

My day job is running multiple businesses under a holding structure. AI operations are a means, not the end. A real implementation of the Persona Layer would take six months to a year of focused engineering. That time is more profitably spent on the businesses themselves.

If anyone's going to build this, it should be Anthropic, OpenAI, or an AI startup serious about long-running deployment. They have the engineering capacity, the data, and the distribution channels.

My role is to put the design into words and leave it sitting somewhere public.

I'm publishing the design, not the implementation. If you want to build this, build it.

Closing

The conversation around "giving AI memory" has advanced significantly over the past two years. But almost all of it has been about storing and retrieving facts.

What I found from running production is that persona (the loaded state of judgment criteria) and memory (retrievable facts) should be on separate layers.

Humans forget most episodes. But values remain. AI systems should probably be designed the same way.

If you're running long-lived AI sessions across real operations, I'd love to hear how you're handling persona persistence. The number of us is small.

I'd argue that the thing the field bundles under "memory" splits into two: persona and episodic memory.

I'm posting this in the hope that this split shows up in design conversations for long-running AI, before the field locks into "memory = retrieval" as a paradigm.

Feedback, counter-arguments, and pointers to similar work are welcome. This is a design derived from production friction, not a systematic survey of the research literature.

DEV Community