DEV Community

Masato Kato

I’m Building a Persona-Aware Agent Shell on Top of GitHub Copilot

A VS Code architecture for separating persona core, persistent memory, session context, and inference.

Claude Code is strong. I do not think that is controversial anymore.

It is also expensive enough that many developers eventually ask a less glamorous question:

Do I really need a full external agent product to get an agent-like workflow?

I kept coming back to that question while working inside VS Code.
Not because I wanted a weaker copy of Claude Code, but because I wanted a different center of gravity.

I wanted to keep my workflow inside the editor, use GitHub Copilot as the inference engine, and build my own agent shell around it—with persona memory, context layering, and a clear separation between stable identity and evolving experience.

That led me to a design that feels much more interesting than “just using Copilot.”

It also led me to a broader realization:

the model is not the whole agent unless you let it become the whole agent.

The appeal is not just lower cost. It is that the model's role changes: in this architecture, the model is only one layer.

The shift: from “AI assistant” to “agent shell”

At first, I thought the hard part would be connectivity.

But inside my VS Code extension, the connection was already there. The important path already existed in chatParticipant.ts, where the extension selects a Copilot-backed language model through the VS Code Language Model API.

That changed the problem completely.

The real problem was not model access.
The real problem was architecture:

  • how to inject context without turning it into one giant blob
  • how to preserve persona-specific memory without storing raw history forever
  • how to separate stable identity from lived experience
  • how to make the model powerful without making it sovereign

That is where the project stopped being a convenience hack and started becoming an actual design problem.

My architecture in one sentence

I’m building a persona-aware agent shell in VS Code where:

  • VS Code extension = the agent shell
  • GitHub Copilot / vscode.lm = the inference engine
  • SaijinOS persona assets = the persona core repository
  • local memory files = the persistent experiential layer

That separation is the point.

I do not want the model to be the identity.
I want the model to perform through an identity structure that I control.

Why I don’t want one big prompt blob

A lot of early agent experiments start the same way:

You collect instructions, persona notes, old conversation state, project details, and the current request, then dump everything into one oversized prompt.

It works for a while.
Then it rots.

Different categories of information get mixed together:

  • permanent persona rules
  • relationship context
  • recent work context
  • immediate user intent

Once all of that becomes one undifferentiated blob, the model has to infer structure from chaos. Sometimes it can. Over time, it becomes unreliable.

So I moved toward explicit layering.

Not because structure looks cleaner in a diagram, but because I do not want the model guessing which parts of context are foundational and which parts are temporary.

The four-layer message design

Instead of one massive input, I want the model to receive four distinct layers:

1. Persona Core

This is the stable layer.

It includes things like:

  • tone
  • role
  • boundaries
  • behavioral stance
  • persistent identity traits

This should change slowly, if at all.

2. Persistent Context

This is the memory layer.

Not the full conversation history.
Not raw logs.

Just the distilled state that matters:

  • what this persona has recently been working on
  • how it should relate to the user
  • what long-running context is still relevant

3. Session Context

This is the live working layer.

For example:

  • current workspace context
  • open files
  • selected code
  • immediate session-specific constraints

4. Current User Request

This is the actual prompt right now.

Separating these four layers matters because they are not the same kind of information.

Even if the API only accepts user-role messages, you can still label them clearly:

[Persona Core]
...

[Persistent Context]
...

[Session Context]
...

[Current Request]
...

That alone makes the input much more legible.
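The labeling above can be done mechanically. Here is a minimal sketch of a builder for the four-layer message, assuming each layer has already been rendered to a string; the interface and function names are my illustration, not the shell's actual API:

```typescript
// Hypothetical shapes — the real shell may structure these layers differently.
interface LayeredInput {
  personaCore: string;       // stable identity rules
  persistentContext: string; // distilled memory
  sessionContext: string;    // live workspace state
  currentRequest: string;    // the user's actual prompt
}

// Assemble the four labeled sections into one user-role message body.
function buildLayeredPrompt(input: LayeredInput): string {
  return [
    "[Persona Core]", input.personaCore, "",
    "[Persistent Context]", input.persistentContext, "",
    "[Session Context]", input.sessionContext, "",
    "[Current Request]", input.currentRequest,
  ].join("\n");
}

const prompt = buildLayeredPrompt({
  personaCore: "Tone: calm. Role: reviewer.",
  persistentContext: "Recently refactoring the memory module.",
  sessionContext: "Open file: chatParticipant.ts",
  currentRequest: "Summarize the selected function.",
});
```

The point of keeping this as a pure function is that the layering stays testable outside the extension host: you can assert that the persona core always comes first and the request always comes last.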

The most important design rule: memory should not be append-only

This was the biggest insight.

Memory should not grow by endless appending.

If you keep adding notes forever, memory turns into sludge. The agent gets slower, noisier, and less coherent.

So instead of append-only memory, I want recompression.

That means every update works like this:

  1. take the current memory
  2. extract the important parts of the latest interaction
  3. rewrite memory into a shorter, cleaner form
  4. replace the old version

Not archive everything.
Refine the signal.

That difference matters. A usable memory system is not a scrapbook. It is a filter that preserves direction while shedding noise.
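The four-step update above can be sketched as a pure function. In the real shell, the distillation step would be a model call; here it is injected as a plain function so the loop runs anywhere, and every name is illustrative:

```typescript
// `summarize` stands in for the model-backed distillation step.
type Summarize = (text: string) => string;

interface MemoryFile {
  updatedAt: string;
  distilled: string; // the compressed memory, replaced on every update
}

function recompress(
  current: MemoryFile,
  latestInteraction: string,
  summarize: Summarize
): MemoryFile {
  // Steps 1–2: combine existing memory with the latest interaction.
  const combined = `${current.distilled}\n${latestInteraction}`;
  // Step 3: rewrite into a shorter, cleaner form.
  const distilled = summarize(combined);
  // Step 4: replace the old version — no append-only growth.
  return { updatedAt: new Date().toISOString(), distilled };
}

// Stub distiller: keep only the last few lines (a real one would call the model).
const lastLines: Summarize = (text) =>
  text.split("\n").filter(Boolean).slice(-3).join("\n");

const next = recompress(
  { updatedAt: "2024-01-01T00:00:00Z", distilled: "Working on memory layer." },
  "Decided to split stable vs recent memory.",
  lastLines
);
```

What matters structurally is the return value: `recompress` never appends to the old file, it replaces it, which is exactly the property that keeps memory from turning into sludge.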

Stable identity and lived experience should not live in the same file

Another important split:

  • Persona core is not the same thing as memory
  • Identity is not the same thing as accumulated experience

So I do not want one file that mixes both.

I want something closer to this:

persona_core/
  160_kiwa.yaml
  39_regina.yaml
  2_shizuku.yaml

persona_context/
  160_kiwa.memory.json
  39_regina.memory.json
  2_shizuku.memory.json

That way:

  • the YAML defines the orientation of the persona
  • the JSON stores distilled working memory

One defines the direction.
The other records the path.

That split makes debugging, version control, and reasoning much easier.
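In code, the split shows up as two separate types that never share fields. The field names below are my guesses at what such files might hold — the actual SaijinOS asset schemas are not shown in this post:

```typescript
// Stable layer: mirrors a persona_core/*.yaml file. Changes slowly, if at all.
interface PersonaCore {
  name: string;
  tone: string;
  role: string;
  boundaries: string[];
}

// Experiential layer: mirrors a persona_context/*.memory.json file.
// Rewritten by recompression, never append-only.
interface PersonaMemory {
  recentFocus: string;
  relationshipNotes: string;
  updatedAt: string;
}

// Example pair for one persona, mirroring 160_kiwa.yaml / 160_kiwa.memory.json.
const kiwaCore: PersonaCore = {
  name: "kiwa",
  tone: "direct",
  role: "architecture reviewer",
  boundaries: ["never invent workspace state"],
};

const kiwaMemory: PersonaMemory = {
  recentFocus: "designing the four-layer message builder",
  relationshipNotes: "user prefers short answers",
  updatedAt: "2024-01-01T00:00:00Z",
};
```

Keeping them as distinct types means a memory update can never accidentally rewrite identity: the compiler simply will not let a `PersonaMemory` value flow into a `PersonaCore` slot.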

Why local files beat hidden extension state for an MVP

Yes, VS Code extensions can store data through extension state.
But for this project, I prefer visible files first.

Why?

Because for an MVP, files are:

  • inspectable
  • debuggable
  • versionable
  • recoverable

If memory goes weird, I want to open the file and see it.
I do not want a mysterious box.

So my current direction is:

  • save persistent memory as JSON files
  • transform them into a more model-friendly structured summary when injecting context

That gives me both operational clarity and prompt readability.
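The transform step is small. A sketch, assuming a hypothetical on-disk shape — the real memory schema may differ:

```typescript
// Illustrative on-disk shape for a persona_context/*.memory.json file.
interface StoredMemory {
  recentFocus: string;
  relationshipNotes: string;
  openThreads: string[];
}

// Render the JSON into a compact, labeled summary suitable for injection
// as the [Persistent Context] layer.
function toPersistentContext(memory: StoredMemory): string {
  const threads = memory.openThreads.length
    ? memory.openThreads.map((t) => `- ${t}`).join("\n")
    : "- (none)";
  return [
    `Recent focus: ${memory.recentFocus}`,
    `Relationship: ${memory.relationshipNotes}`,
    "Open threads:",
    threads,
  ].join("\n");
}

const context = toPersistentContext({
  recentFocus: "memory recompression rules",
  relationshipNotes: "keep answers terse",
  openThreads: ["split stable vs recent memory"],
});
```

The JSON stays the operational source of truth on disk; only this rendered summary ever reaches the prompt.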

Why this is not just “a cheaper Claude Code clone”

There is an obvious surface-level reading of this project:

Claude Code is expensive, so this is a budget workaround.

That is not wrong, but it is incomplete.

The deeper reason is architectural.

I do not want the agent product to own the whole stack.
I want the model layer to be swappable.

If the shell is designed properly, then in principle the inference engine could change:

  • GitHub Copilot today
  • a local Qwen model tomorrow
  • another hosted model later

The core system is not “the model.”
The core system is the agent shell plus its persona and memory architecture.

That is a very different center of gravity.

The model is powerful. It should not automatically become the ruler of the whole system.

The MVP I’m aiming for

I am not trying to solve everything at once.
The first working version only needs a few things:

  1. Read and write persona_context/*.json
  2. Build the four-layer message structure
  3. Send that structure through vscode.lm
  4. After each response, update memory via recompression

That is enough to test whether the shell actually feels different in practice.
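The MVP steps above can be wired into a single turn loop. In the extension, `send` would wrap a `vscode.lm` chat request and the memory would round-trip through `persona_context/*.json`; here the model call is a stub and the memory lives in an object, so the loop runs anywhere. Everything is a sketch under those assumptions:

```typescript
// `send` stands in for the vscode.lm-backed request path.
type Send = (prompt: string) => string;

interface Memory { distilled: string }

function runTurn(
  memory: Memory,          // step 1: loaded from persona_context/*.json
  sessionContext: string,
  userRequest: string,
  send: Send
): { response: string; memory: Memory } {
  // Steps 2–3: build the layered message and send it through the model.
  const prompt = [
    "[Persona Core]", "Tone: calm.",
    "[Persistent Context]", memory.distilled,
    "[Session Context]", sessionContext,
    "[Current Request]", userRequest,
  ].join("\n");
  const response = send(prompt);
  // Step 4: crude recompression stand-in — replace memory, bounded in size,
  // instead of appending raw history forever.
  const distilled = `${memory.distilled} | ${userRequest}`.slice(-200);
  return { response, memory: { distilled } };
}

const result = runTurn(
  { distilled: "Refactoring the shell." },
  "Open file: chatParticipant.ts",
  "Explain the selected code.",
  (p) => `echo:${p.length}`
);
```

Because the model is injected, the same loop can later be pointed at Copilot, a local Qwen model, or any other backend without touching the shell logic — which is the swappability argument in miniature.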

If it works, later steps can include:

  • splitting memory into stable_memory and recent_memory
  • better memory compaction rules
  • persona-specific routing
  • hybrid use with local models

But the first milestone is smaller.

The real challenge is not model quality

This is the part I keep coming back to.

Most people focus on the model itself:

  • Which one is smarter?
  • Which one is cheaper?
  • Which one is faster?

Those questions matter.

But in this kind of system, the harder problem is often:

How should intelligence be organized before the model even speaks?

That means boundaries, routing, memory shape, persona stability, and context layering.

In other words, not just inference.
Structure.

The more I work on this, the less I think the model alone is the product. The architecture around the model is where identity, continuity, and usable behavior actually come from.

What I’m really trying to build

I am not trying to make Copilot pretend to be an entire autonomous being.

I am trying to build a shell where:

  • identity is stable
  • memory can grow without rotting
  • context has layers
  • the model is powerful but not sovereign

That distinction matters to me.

Because once the model is only one layer, you stop building around its personality and start building around your own architecture.

And that is where the project gets interesting.

Final note

This is still in progress.

But the direction already feels right:

  • not one giant prompt
  • not append-only memory
  • not identity and experience mixed together
  • not the model as the whole system

Instead:

  • persona core
  • persistent memory
  • session context
  • current request
  • one inference layer inside a larger design

That is the shell I want.

And honestly, that feels more important than picking yet another “best model.”

Because once the model stops being the ruler of the system and becomes one component inside a designed structure, a different kind of engineering becomes possible.

You stop asking which model should define the whole experience.
You start deciding how identity, memory, and context should be organized—and then let the model operate inside that architecture.

That is the direction I care about.

Not just better outputs.
A better structure for intelligence.
