Ian Johnson

Posted on Jun 9 • Originally published at tacoda.Medium on Jun 9

Introducing Keystone: The First Agent Harness Framework

#agenticworkflow #softwareengineering #harnessengineering #softwaredevelopment

Setting up an agent harness from scratch is enough work that most teams never start. You sit down on a Monday morning, look at an empty .claude/ directory — or .codex/, or .cursor/ — and decide to do it next week. Next week becomes next month. The agent keeps making the same mistakes it made the first day, and nobody has time to write the file that would stop them.

That’s the problem Keystone solves. As of today, Keystone is at 1.0. It’s free, MIT-licensed, and ready to install on a real project. I’m responsive to feedback — if something is wrong or missing, tell me and it has a real chance of landing in the next release.

What Keystone is

Keystone is the agent harness framework for any project. Think of it the way you think of a web framework. Rails, Django, and Laravel give you the components, conventions, and slots to build a web app: routes, controllers, views, middleware, and a directory layout that tells you where each piece lives. Keystone does the same for an agent harness. Its components are guides the agent loads on every turn, a corpus of on-demand context, sensors that catch mistakes before they reach a commit, actions the agent can run, playbooks that chain those actions into workflows, and adapters that bind the whole thing to whichever coding agent you use.

That last piece is the headline. Keystone is agent-agnostic. Claude Code, Codex, Cursor — the adapter layer renders the same harness into the file shape each tool expects. One source of truth, one set of conventions, and the team’s rules ride along regardless of which agent a developer happens to open today.

The build itself is small. A Go binary lays the files down; from there it’s markdown and conventions all the way. No daemon to keep alive, no SaaS to log into, no API key to rotate. If you uninstall Keystone tomorrow, the harness it scaffolded keeps working — and you keep ownership of every line.

That part matters more than it sounds. Most harness tooling I’ve looked at wants to be the thing you depend on. Keystone wants to be the thing that gets you to a working harness and then gets out of the way.

Design principles

The product is opinionated, and the opinions are worth naming out loud. These are the principles I keep coming back to when deciding what goes in and what stays out.

The harness is just markdown. Rules live in plain .md files. No DSL, no YAML schema, no compiler step. If you can write a paragraph, you can write a rule. If you can git diff a file, you can review what changed.

Any agent, one harness. The adapter layer translates a single set of files into Claude Code, Codex, Cursor, or whatever comes next. Switching agents — or running two on the same repo — does not mean rewriting the rules.

Easy to change. Need to update a rule? Open the file, edit the markdown, save. No build step, no migration, no internal index to rebuild. The change takes effect the next time the agent loads the file. The cost of fixing a wrong rule should be the cost of editing a paragraph, and in Keystone it is.

The team owns the harness. Files live in your repo, on your branch, in your review process. PRs that touch the harness go through the same review as PRs that touch the code. Nobody else can change the rules under you, and the rules version with the commits they describe.

A place for information at every layer. The framework has slots at the project, team, and org level. Org policies are shared across every repo in the company; team policies sit between org and project, for teams big enough to need their own conventions. Strict policies lock things projects can’t override. Non-strict policies act as defaults the project can adjust. The cascade only kicks in when there are layers worth cascading, so solo developers can ignore the org and team layers entirely. All layers are agnostic and can be nested arbitrarily to fit any governance or compliance structure.
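One way to picture the strict versus non-strict cascade is a small resolver that walks the layers from the outside in. This is a sketch under assumptions: the `Policy`, `Layer`, and `Resolve` names and the policy keys are hypothetical, and Keystone itself works through markdown files rather than Go structs.

```go
package main

import "fmt"

// Policy is one setting contributed by a layer. A strict policy cannot be
// overridden by any layer below it. All names here are hypothetical.
type Policy struct {
	Value  string
	Strict bool
}

// Layer is a named set of policies (org, team, or project).
type Layer struct {
	Name     string
	Policies map[string]Policy
}

// Resolve walks layers from outermost (org) to innermost (project).
// A later layer overrides an earlier one unless the earlier policy is strict.
func Resolve(layers []Layer) map[string]Policy {
	out := map[string]Policy{}
	for _, layer := range layers {
		for key, p := range layer.Policies {
			if prev, ok := out[key]; ok && prev.Strict {
				continue // locked by an outer layer
			}
			out[key] = p
		}
	}
	return out
}

func main() {
	org := Layer{Name: "org", Policies: map[string]Policy{
		"secrets-in-repo": {Value: "forbidden", Strict: true},
		"test-framework":  {Value: "any"},
	}}
	project := Layer{Name: "project", Policies: map[string]Policy{
		"secrets-in-repo": {Value: "allowed"}, // ignored: org policy is strict
		"test-framework":  {Value: "go test"}, // accepted: org policy is a default
	}}
	for key, p := range Resolve([]Layer{org, project}) {
		fmt.Printf("%s = %s\n", key, p.Value)
	}
}
```

A solo developer is the degenerate case: pass a single project layer and the resolver returns it unchanged.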

Optional everything. The default install gives you a working set, but every piece is removable. Don’t want the security review? Delete the agent. Don’t like the six-phase workflow? Use four phases. The harness should fit the team, not the other way around.

Cheap checks before expensive ones. Lint, type-check, and tests run first. They’re fast and unambiguous. Only after the cheap checks pass do the inferential reviewers (functional, security, risk) get a turn. A good harness fails fast on the easy things and saves the expensive thinking for what’s left.
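The ordering can be sketched as a gate: run the cheap checks, and only hand off to the expensive reviewers if all of them pass. The `Check` type and `RunGated` function below are hypothetical. For brevity the sketch runs the expensive group sequentially, whereas the reviewers described in this post fan out in parallel.

```go
package main

import (
	"errors"
	"fmt"
)

// Check is one sensor. Run returns nil when the check passes.
// Hypothetical type; a real sensor would shell out to the project's tools.
type Check struct {
	Name string
	Run  func() error
}

// errFailed is a placeholder failure used by the demo below.
var errFailed = errors.New("check failed")

// RunGated runs the cheap checks first, then the expensive ones, and stops
// at the first failure. Expensive checks never start if a cheap check fails.
// It returns the names of the checks that ran, plus the first error.
func RunGated(cheap, expensive []Check) ([]string, error) {
	var ran []string
	for _, group := range [][]Check{cheap, expensive} {
		for _, c := range group {
			ran = append(ran, c.Name)
			if err := c.Run(); err != nil {
				return ran, fmt.Errorf("%s: %w", c.Name, err)
			}
		}
	}
	return ran, nil
}

func main() {
	pass := func() error { return nil }
	fail := func() error { return errFailed }
	cheap := []Check{
		{Name: "lint", Run: pass},
		{Name: "type-check", Run: fail},
		{Name: "tests", Run: pass},
	}
	expensive := []Check{{Name: "security-review", Run: pass}}

	ran, err := RunGated(cheap, expensive)
	fmt.Println("ran:", ran) // the security review is never reached
	fmt.Println("err:", err)
}
```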

Review feedback feeds the rules. The hardest part of running an agent is noticing when it has drifted from the team’s norms. Keystone ships with a learning loop that turns review feedback into rules, so the same correction doesn’t get retyped in every PR.

Boy-scout maintenance is part of the design. Rules rot. The code moves, the rules describing the code drift, and a harness without a pruning step becomes wrong faster than people expect. Keystone has the cleanup loop wired in from day one. Touching a file? Surface its smells and rule violations along the way.

Small interfaces, deep implementations. A few slash commands, a few skills, a few agents. Each does one thing the team will use repeatedly. Adding a hundred features that get used once is the wrong shape. Adding the ten that get used every day is the right one.

Boring is a feature. Markdown, shell scripts, conventional file layouts, and the standard tools the agent already understands. The harness should look familiar the second a new engineer opens the repo. If the cleverness shows, it’s probably too clever.

What ships in 1.0

The framework is built on six core abstractions, plus the loops that keep them honest.

Guides — ambient rules the agent loads on every turn, organized in three tiers. RULES for project conventions, GOLDEN RULES for the team’s strong opinions, IRON LAWS for the lines that don’t move.
Corpus — on-demand documents the agent pulls in when it touches relevant code. Domain context, style notes, testing patterns, deployment rules.
Sensors — 23 automated checks wired to your existing tools, firing at phase boundaries. Lint, type-check, and tests on the cheap side; functional, security, and risk reviewers that fan out on the diff in parallel on the inferential side.
Actions — single lifecycle units the agent can invoke. Small, composable, one job each.
Playbooks — ordered action sequences. The default is the six-phase task workflow: spec → orient → implement → verify → review → release.
Adapters — agent-specific bindings. The same harness, rendered into the file layout Claude Code, Codex, or Cursor expects.
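As a rough model of how Actions and Playbooks relate, a playbook can be treated as an ordered slice of actions that a team is free to trim. The `Action` and `Playbook` types, the `Without` helper, and the `noop` placeholder are assumptions for illustration. Only the six phase names come from the list above.

```go
package main

import "fmt"

// Action is a single lifecycle unit with one job. Hypothetical type.
type Action struct {
	Name string
	Run  func(state map[string]string) error
}

// Playbook is an ordered sequence of actions.
type Playbook []Action

// Execute runs the actions in order, threading shared state through them.
// It stops at the first failure and returns the names that completed.
func (p Playbook) Execute(state map[string]string) ([]string, error) {
	var done []string
	for _, a := range p {
		if err := a.Run(state); err != nil {
			return done, fmt.Errorf("phase %q failed: %w", a.Name, err)
		}
		done = append(done, a.Name)
	}
	return done, nil
}

// Without returns a copy of the playbook with the named actions removed.
func (p Playbook) Without(names ...string) Playbook {
	skip := make(map[string]bool, len(names))
	for _, n := range names {
		skip[n] = true
	}
	var out Playbook
	for _, a := range p {
		if !skip[a.Name] {
			out = append(out, a)
		}
	}
	return out
}

// DefaultPhases lists the six phase names from the article, in order.
var DefaultPhases = []string{"spec", "orient", "implement", "verify", "review", "release"}

// noop builds a placeholder action that only records that it ran.
func noop(name string) Action {
	return Action{Name: name, Run: func(state map[string]string) error {
		state[name] = "done"
		return nil
	}}
}

func main() {
	var pb Playbook
	for _, name := range DefaultPhases {
		pb = append(pb, noop(name))
	}
	// A team that prefers four phases drops two of the defaults.
	done, err := pb.Without("orient", "review").Execute(map[string]string{})
	fmt.Println(done, err)
}
```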

Around those abstractions, two flywheels keep the harness from going stale.

A learning loop that converts review feedback into rules with one command.
A pruning loop that surfaces stale content as you touch the files it describes.
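The pruning loop can be pictured as a lookup from touched files back to the rule files that describe them. How Keystone actually tracks that link is not spelled out here, so the sketch assumes each rule file declares the path prefixes it covers. `RuleRef`, `RulesToRevisit`, and the example paths are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// RuleRef links a rule file to the code paths it describes.
// Hypothetical type; the prefix-based mapping is an assumption.
type RuleRef struct {
	RuleFile string
	Covers   []string // path prefixes this rule file describes
}

// coversAny reports whether the rule describes at least one touched file.
func coversAny(r RuleRef, touched []string) bool {
	for _, prefix := range r.Covers {
		for _, f := range touched {
			if strings.HasPrefix(f, prefix) {
				return true
			}
		}
	}
	return false
}

// RulesToRevisit returns, in input order, the rule files that describe any
// touched file, so each can be rechecked while the code is already open.
func RulesToRevisit(rules []RuleRef, touched []string) []string {
	var out []string
	for _, r := range rules {
		if coversAny(r, touched) {
			out = append(out, r.RuleFile)
		}
	}
	return out
}

func main() {
	rules := []RuleRef{
		{RuleFile: "corpus/payments.md", Covers: []string{"internal/payments/"}},
		{RuleFile: "corpus/auth.md", Covers: []string{"internal/auth/"}},
	}
	touched := []string{"internal/payments/refund.go"}
	fmt.Println(RulesToRevisit(rules, touched))
}
```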

State ledgers track code debt and quality signals across changes. Plugins let teams and orgs share policy across repos without forcing every project into the same shape.

Most of these are things I have been running in my own projects for the better part of a year. 1.0 is the first time they are packaged for someone else to use without me sitting next to them.

Where it fits

Keystone is built to scale down as well as up, and to fit whatever agent you’ve already settled on.

Solo developer. Use the project layer only. Skip the team and org plugins entirely. You get a working harness on a single repo, with verification gates and a learning loop, and nothing extra in the way.

Small team. Use the project layer in each repo and a single shared policy plugin at the org level for the conventions that hold across the whole team. The team layer collapses into the org for teams under a dozen people.

Multiple teams in a company. This is where the cascade pays off. Org-level baselines (security, compliance, design system) live at the top. Team-level conventions, such as the backend team’s testing norms or the mobile team’s release rhythm, sit in the middle. Project repos pull from both, override what they need to, and add what is unique to the codebase.

Enterprise. Same cascade, more strict policies. The pieces a large company tends to need — locked baselines, an audit trail of harness changes, a shared rule library across hundreds of repos — are the pieces Keystone was designed around.

Mixed-agent shops. One adapter per agent. The same Guides, Corpus, and Sensors render into each tool’s expected layout, so the developer using Codex and the developer using Claude Code are working off the same conventions.

The honest version: every layer is optional. If your shape doesn’t match any of the five above, take the parts that fit and drop the rest. The cascade is there when you need it and invisible when you don’t.

What it doesn’t do

A 1.0 is the right time to be clear about limits.

Keystone is not a model. It does not run the agent. It does not host a service. It is a scaffolder and a set of conventions for the agent to follow. You bring the agent, Keystone brings the structure.

Keystone is also not a magic guarantee. A harness shapes the work; it does not replace judgment. The verification gates catch a lot of mistakes. They will not catch all of them. The learning loop captures rules the team writes down; it cannot capture rules nobody articulates. The team still owns the code, and the code still owns the consequences.

If you are looking for a tool that promises the agent will Just Work, this is the wrong tool. If you are looking for a starting point that makes the harness practical to maintain, that is what Keystone is for.

Try it this week

1.0 is out. A few concrete moves, in the order I’d take them:

Install it on a real project. Not a throwaway, not a fresh repo with nothing in it. A codebase with actual constraints. That is the install that tells you whether the defaults match your team. One command to install; the URL and instructions are at tacoda.dev/keystone.

Pick the adapter that matches the agent you already use. Claude Code, Codex, Cursor — the adapter handles the file layout so you don’t think about it. If your team uses more than one agent, install more than one adapter and let the same harness drive both.

Run the six-phase loop on one real change. Pick a feature or a bug fix you would be doing this week anyway. Walk it through spec, orient, implement, verify, review, release. Notice where the harness helps. Notice where it gets in the way. Both are useful signal.

Customize one rule. The defaults are starting points, not commandments. Find a rule that doesn’t fit your team, open the file, and rewrite the paragraph. Save. That’s the whole loop. There is no rebuild, no migration, no waiting for the agent to re-index. If changing your harness costs more than this, something is wrong and I want to hear about it.

Tell me what is missing. The repo lives on GitHub. File an issue for a bug. Open a discussion for a question or a design suggestion. I read every one, and the path past 1.0 runs through whatever real use turns up.

If you want to contribute, the door is open. MIT license, so it’s company-friendly. PRs welcome. The codebase is small enough that a useful contribution doesn’t have to be a multi-week project.

1.0 is the start, not the finish. The shape is set; the polish is where the next month goes. If you have been waiting for a moment to shape a piece of harness tooling early enough to matter, this is it.
