Programming with Jack Chew

Posted on May 24

2026 Q1 is the year developers still build the agent harness. 2026 Q3 / 2027 is the year the LLM builds its own harness.

#agents #ai #llm #tooling

In Q1 2026, developers still build the agent harness.

By Q3 2026 or 2027, the LLM builds its own harness.

Today, every AI coding agent — Claude Code, Cursor, Codex, Gemini CLI, Aider, you name it — depends on the same hidden layer:

the files that brief the agent before it starts work.

AGENTS.md
CLAUDE.md
.cursor/rules
SKILLS/
MCP server lists
memory schemas
test commands
lint commands
“Do not touch these paths.”
“Require human approval before this.”

Different IDE, same boilerplate.
Different repo, same boilerplate.
Different agent, same boilerplate.

That is the agent harness problem.

The hidden work behind AI coding agents

Most people talk about the coding agent itself.

But in practice, the quality of an AI coding session often depends on the context layer around the agent.

Before the agent starts coding, it needs to know:

what kind of project this is
what framework it uses
what files are important
what commands run tests
what commands run linting
what paths should not be touched
what tools are available
what memory should persist
what failure modes to avoid
what coding conventions to follow
when human approval is required
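To make that list concrete, here is a minimal sketch of how those facts could be captured in one structure and rendered as a briefing file. The `HarnessContext` class, its field names, and the example values (Typer, `pytest -q`, `ruff check .`) are illustrative assumptions, not a format that any particular agent or tool requires.

```python
from dataclasses import dataclass, field


@dataclass
class HarnessContext:
    """Hypothetical container for the facts an agent needs before it starts."""

    project_type: str
    framework: str
    test_command: str
    lint_command: str
    protected_paths: list[str] = field(default_factory=list)
    approval_required_for: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the context as a small AGENTS.md-style briefing."""
        lines = [
            "# Project briefing",
            f"- Project type: {self.project_type}",
            f"- Framework: {self.framework}",
            f"- Run tests with: `{self.test_command}`",
            f"- Run linting with: `{self.lint_command}`",
        ]
        lines += [f"- Do not touch: `{p}`" for p in self.protected_paths]
        lines += [f"- Ask a human before: {a}" for a in self.approval_required_for]
        return "\n".join(lines) + "\n"


# Example values are made up for illustration.
context = HarnessContext(
    project_type="Python CLI app",
    framework="Typer",
    test_command="pytest -q",
    lint_command="ruff check .",
    protected_paths=["migrations/", ".github/workflows/"],
    approval_required_for=["publishing a release"],
)
print(context.to_markdown())
```

Keeping the facts in one structure and rendering them on demand is one way to avoid retyping the same rules for each agent.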

Without this layer, even strong coding agents can make subtle mistakes, such as running the wrong test command or editing a path they should have left alone.

With this layer, the same agent can behave much more consistently.

That layer is what I call the harness.

Why this still exists in 2026

In theory, the LLM should be able to inspect a repo and generate all of this itself.

In practice, we are not fully there yet.

The models are smart enough to do real coding work, but they cannot yet be relied on to produce accurate project-specific ground truth from scratch on every fresh repo.

They can do it sometimes.

Not always.

So the human stays in the loop.

We write the same repo instructions again.

We copy the same rules across projects.

We maintain separate files for Claude Code, Cursor, Codex-style agents, Continue, Windsurf, and others.

Small work per repo.

Painful in aggregate.

The future: self-generating harnesses

I think this is temporary.

Soon, the coding model should be able to:

read the repo
understand the task
detect the project type
generate the right harness
connect the right tools
create memory schemas
write validation scripts
refine the loop until the task is complete

At that point, the harness layer disappears as a separately authored artifact.

But until then, developers still need a bridge.

I built harnessforge

I built harnessforge to test this idea.

It is a local, open-source harness generator for AI coding agents.

It is not another coding agent.

Your coding agent stays the brain.

harnessforge just lays down the ground truth the agent reads before work begins.

Run:

uvx harnessforge init

or install:

pip install harnessforge

It runs locally, makes no network calls by default, and takes a few seconds to inspect your repo and generate the startup files that AI coding agents commonly read.
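As a rough illustration of what "inspecting a repo" without network calls can look like, the sketch below guesses project types from marker files at the repo root. This is not harnessforge's actual detection logic, which is not described here. The `MARKERS` table and the `detect_project_types` function are hypothetical names chosen for the example.

```python
from pathlib import Path
import tempfile

# Hypothetical marker files; a real tool would likely check many more signals.
MARKERS = {
    "pyproject.toml": "python",
    "package.json": "node",
    "Cargo.toml": "rust",
    "go.mod": "go",
}


def detect_project_types(repo_root: Path) -> list[str]:
    """Return the sorted project types whose marker file exists at the repo root."""
    found = {kind for name, kind in MARKERS.items() if (repo_root / name).is_file()}
    return sorted(found)


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pyproject.toml").write_text("[project]\nname = 'demo'\n", encoding="utf-8")
    (root / "package.json").write_text("{}\n", encoding="utf-8")
    detected = detect_project_types(root)
    print(detected)
```

Because the check only reads the local filesystem, the same repo always yields the same result, which is the property that makes a generator like this deterministic.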

What it generates

Depending on the project and blueprint, harnessforge can generate files such as:

AGENTS.md
SOUL.md
TOOLS.md
MEMORY.md
SKILLS/
.claude/CLAUDE.md
.cursor/rules
.continue/
.windsurf/rules
blueprint-specific validators

The goal is simple:

give the coding agent a stronger starting point.
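One way to picture the "different agent, same boilerplate" problem is a single briefing text written to several agent-specific locations. The sketch below is an assumption about how such a fan-out could work, not a description of harnessforge internals. `write_harness` and `TARGETS` are hypothetical names, and only two single-file targets from the list above are used, since directory-style targets may need their own layouts.

```python
from pathlib import Path
import tempfile

# Two single-file targets taken from the list above.
TARGETS = ["AGENTS.md", ".claude/CLAUDE.md"]


def write_harness(repo_root: Path, body: str, targets: list[str]) -> list[Path]:
    """Write the same briefing text to each target path, creating parent folders."""
    written = []
    for relative in targets:
        path = repo_root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if path.exists():
            continue  # never overwrite a file the developer already maintains
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written


# Demo: AGENTS.md already exists, so only the missing file is created.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text("hand-written rules\n", encoding="utf-8")
    created = write_harness(root, "# Project briefing\n", TARGETS)
    created_names = [p.relative_to(root).as_posix() for p in created]
    print(created_names)
```

Skipping existing files is a deliberate choice in this sketch: generated context should not silently replace rules a developer wrote by hand.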

Current blueprints

The current version includes these blueprints:

`rag-agent`

For retrieval systems, knowledge-base agents, citation enforcement, and grounded responses.

`finance-agent`

For finance or stock-related agents, including market-data handling and validation rules around trade execution safety.

`support-agent`

For customer support flows such as intent detection, knowledge-base lookup, ticket creation, escalation, and ticket lineage.

`workflow-agent`

For multi-step orchestration with tool logs, idempotency, and validation structure.

`python-cli-app`

A default blueprint for greenfield Python CLI projects.

Why this matters

The important idea is not the specific files.

The important idea is that coding agents need a reliable project-specific operating context.

Today, we manually maintain that context.

Tomorrow, the model may generate it automatically.

harnessforge is meant to sit in the middle.

A bridge, not a moat.

Use it now.

Throw it away when the models catch up.

Example workflow

uvx harnessforge init

Then open Claude Code, Cursor, Codex, Gemini CLI, Aider, or another coding agent inside the repo.

The agent now has project-specific context files to read before it starts work.

Instead of starting from a blank repo, the agent starts with:

project rules
tool definitions
memory structure
validation expectations
blueprint-specific failure modes
agent-specific startup files

The coding agent still writes the code.

The harness just gives it the right context.
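A rule such as "do not touch these paths" can also be checked mechanically after the agent finishes. The following is a minimal sketch under stated assumptions: the `find_violations` helper, the pattern list, and the changed-file list are all hypothetical, and a real validator would probably take its changed paths from version control.

```python
from fnmatch import fnmatchcase


def find_violations(changed_paths: list[str], protected_patterns: list[str]) -> list[str]:
    """Return the changed paths that match any protected glob pattern."""
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in protected_patterns)
    ]


# Illustrative inputs: patterns a harness might declare, and files an agent edited.
protected = ["migrations/*", ".env"]
changed = ["src/app.py", "migrations/0001_init.py", ".env"]

violations = find_violations(changed, protected)
print(violations)
```

A non-empty result is a natural trigger for the "require human approval" rule instead of a hard failure.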

The bet

My bet is:

2026 Q1: developers still build the agent harness.

2026 Q3 / 2027: the LLM builds its own harness.

Until that happens, a local, deterministic harness generator can make AI coding workflows more reliable.

GitHub:
https://github.com/jcaiagent7143-ui/harnessforge

PyPI:
https://pypi.org/project/harnessforge/

I would love feedback from developers using Claude Code, Cursor, Codex, Gemini CLI, Aider, Continue, Windsurf, or other coding agents in real repos.

How are you managing your agent harness today?

Are you manually maintaining AGENTS.md, CLAUDE.md, .cursor/rules, MCP configs, memory files, and validation rules?

Or do you think the next generation of coding models will generate this layer automatically?

DEV Community