I’ve been working with coding agents for quite a while now.
I’ve been a software engineer for more than 15 years, and at first it was hard for me to accept that the rules of the game had changed forever.
I’ve stopped thinking of coding agents as autocomplete. In many tasks, they can reason through codebases and produce solid implementations. But one thing still feels missing.
I haven’t managed to feel that I’m working side by side with an engineer who knows the repository. Someone familiar with the project’s codebase, its strategies, its typical errors, the commands that should be run and the ones that shouldn’t.
A veteran teammate, not a rookie who has to review the whole repo, starting from the README and the Makefile, before writing a single line of code.
At first I thought it was all about refining prompts.
Then I focused on operational memory, skills, MCPs, rules, global instructions, AGENTS.md, CLAUDE.md, and everything else I kept reading about over and over in articles and posts.
I also had a “context” phase. I became obsessed with improving the context my agent was working with.
And yet I still had the same feeling.
The more I obsessed over prompts, memory, skills, and context, the more I started to feel that what the agent was missing was continuity.
Something more human. Something closer to what a teammate would ask on their first day at work:
- Where were we?
- What did we do yesterday?
- What hypotheses did we discard?
- Which file mattered?
- Which test was the right one?
- What should I not touch?
- Where do I start?
Since I work intensively in large repositories, I kept hitting a major limitation with Codex (the agent I mainly use): every session started again from the README. It frustrated me to watch it rediscover the repo, try overly broad commands, or attempt to run huge test suites that had nothing to do with the task at hand.
So I started building a tool focused on operational continuity.
I called it AICTX.
In one sentence: aictx is a repo-local continuity runtime for coding agents.
The idea is that each new session behaves less like an isolated prompt and more like the same repo-native engineer continuing previous work.
After many iterations, the workflow has consolidated into something like this:
user prompt
→ agent extracts a narrow task goal
→ aictx resume gives repo-local continuity
→ agent receives an execution contract
→ agent works
→ aictx finalize stores what happened
→ next session starts from continuity, not from zero
→ the user receives feedback about continuity
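To make that loop concrete, here is roughly what one session could look like from the shell. Only the `resume` and `finalize` subcommands come from the workflow above; everything around them is my own illustrative sketch, not the tool's exact behavior or output.

```sh
# A minimal session sketch: only `aictx resume` and `aictx finalize` are taken
# from the workflow above; the surrounding steps are illustrative.
cd my-large-repo

# Start of session: surface repo-local continuity (state, decisions, contract).
aictx resume

# ... the agent works within the execution contract it received ...

# End of session: persist what happened for the next session to resume from.
aictx finalize
```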
AICTX stores and reuses things like work state, handoffs, decisions, failure memory, strategy memory, execution summaries, RepoMap hints, execution contracts, and contract compliance signals.
All of these are auditable artifacts that are easy to inspect at the repo level.
One of the things I like most about the tool is that I can enable portability and keep the most important continuity artifacts versioned, so I can continue a task on my personal laptop, my work laptop, or anywhere else.
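As a rough illustration of what that inspection and versioning could look like in practice (the `.aictx/` directory and every file name below are assumptions I made for the sketch, not the tool's actual layout):

```sh
# Hypothetical repo-level layout: directory and file names are assumptions.
ls .aictx/
# contract.json  decisions.md  failures.md  handoff.md  repomap.json

# With portability enabled, the key continuity artifacts can be versioned
# like any other file and followed across machines.
git add .aictx/
git commit -m "continuity handoff"
```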
The execution contract part feels especially interesting to me. Instead of giving the agent a vague block of context, AICTX tries to give it an operational route (see the sketch after this list):
- first_action
- edit_scope
- test_command
- finalize_command
- contract_strength
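For the demo task described below, a populated contract could look something like this. The five field names come from the list above; the values, the JSON shape, and the file path are all illustrative assumptions on my part:

```sh
# Hypothetical contract: field names are real, values and path are assumptions.
cat .aictx/contract.json
# {
#   "first_action":      "extend the edge-case tests in tests/test_parser.py",
#   "edit_scope":        ["tests/test_parser.py"],
#   "test_command":      "pytest tests/test_parser.py",
#   "finalize_command":  "aictx finalize",
#   "contract_strength": "strict"
# }
```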
I wanted to check whether this actually worked, rather than just rely on my own impressions while watching the agent work with AICTX. So I created a small Python demo repo and ran the same two-session task twice: once with AICTX and once without.
Before getting into the test itself, it’s worth stressing that I mainly work with Codex, so the results are most representative of Codex.
The task was intentionally simple: add support for a new BLOCKED status, and then continue in a second session to validate parser edge cases.
This is important: the demo is not designed under conditions where AICTX has the maximum possible advantage. The repository is small, the task is simple, and the continuation prompt without AICTX includes enough manual context.
Even so, in the second session a clear difference appeared.
(Note: all demo metrics are available here)
Session 2
| Metric | with_aictx | without_aictx | Difference |
|---|---|---|---|
| Files explored | 5 | 10 | -50.0% |
| Files edited | 1 | 3 | -66.7% |
| Commands run | 8 | 15 | -46.7% |
| Tests run | 1 | 4 | -75.0% |
| Exploration steps before first edit | 6 | 15 | -60.0% |
| Time to complete | 72s | 119s | -39.5% |
| Total tokens | 208,470 | 296,157 | -29.6% |
| API reference cost | $0.5983 | $0.8789 | -31.9% |
The most interesting difference for me was not the tokens. It was where the agent started.
- With AICTX:
  - first_relevant_file = tests/test_parser.py
  - first_edit_file = tests/test_parser.py
- Without AICTX:
  - first_relevant_file = README.md
  - first_edit_file = src/taskflow/parser.py
With AICTX, the second session behaved more like an operational continuation.
Without AICTX, it behaved more like a new agent reconstructing the state of the project.
Across both sessions, the savings were more moderate:
| Metric | with_aictx | without_aictx | Difference |
|---|---|---|---|
| Files explored | 13 | 19 | -31.6% |
| Commands run | 19 | 26 | -26.9% |
| Tests run | 3 | 6 | -50.0% |
| Time to complete | 166s | 222s | -25.2% |
| Total tokens | 455,965 | 492,800 | -7.5% |
| API reference cost | $1.3129 | $1.4591 | -10.0% |
Honest result: AICTX did not magically win at everything.
In the first session, it had overhead. There wasn’t much accumulated continuity to reuse yet, so it doesn’t make sense to sell it as a universal token saver.
There is also another important nuance: the execution without AICTX found and fixed an additional edge case related to UTF-8 BOM input. So I also wouldn’t say that AICTX produced “better code.”
The honest conclusion would be this:
AICTX produced a correct, more focused continuation with less repo rediscovery.
The execution without AICTX produced a broader solution, but it needed more exploration, more commands, more tests, and more time.
For me, this fits the initial hypothesis quite well:
- AICTX is not a magical token saver.
- It has overhead in the first session.
- Its value appears when work continues across sessions.
- The real problem is not just “giving the model more context.”
- The problem is making each agent session feel less like starting from zero.
And I suspect this demo actually understates the real size of the problem. In a large repo, where the previous session left behind decisions, failed attempts, scope boundaries, correct test commands, and known risks, continuity should matter even more.
I still haven’t fully reached the feeling of continuity I’m looking for, but I’m getting closer. To push that feeling a bit further, AICTX has the agent give the user operational-continuity feedback: a startup banner at the beginning of each session and a summary at the end of each execution.
The tool is very much alive, and I keep evolving it as I solve my own pain points. I’d love feedback: what works, possible improvements, issues you notice, or even PRs if anyone feels like contributing.
If anyone wants to try it:
pipx install aictx
aictx install
cd repo_path
aictx init
# then just work with your coding agent as usual
With AICTX, I’m not trying to replace good prompts, skills, or already established memory and context-management tools. I’m simply trying to make operational continuity easier in the large code repositories I iterate on again and again.
I’d be really happy if it ends up being useful to someone along the way.
If you try it, I’d love to know whether it improves your workflow, or whether it gets in the way.


Happy to share more about the tool itself: architecture, code, iterations, artifacts, portability, or whatever else. I could also go deeper into the metrics extraction process if anyone is interested.
Just ask!