I’ve been poking at a bunch of AI agent frameworks and coding tools this past year. For personal projects, I often just use Hermes Agent or something similar because it's fast and saves tokens.
But honestly? When I actually have to ship something to production, I can't just use those raw agent setups. Between security compliance, instability, and the sheer complexity of real-world codebases, it’s just too risky.
For production, I keep going back to CLI tools like Claude Code, Codex, or Gemini CLI.
Why? Because in production:
- Perfect > Fast: I'd rather it take longer but be absolutely correct and secure.
- Traceability & Long Plans: I need to track the exact progress of long-running plans without having to babysit them or intervene constantly.
- Consistent Quality: No matter which team member kicks off the task, the output quality and adherence to our repo's standards need to be exactly the same.
And I realized the way to get there isn't finding a magical new model. It's tuning the harness.
These CLIs (Claude Code, Codex, Gemini CLI) already ship a pretty solid baseline harness out of the box: planners, hooks, auto mode, skills. But that baseline has no idea what my specific repo cares about. It doesn't know my team's review rules, what "Done" looks like for us, or which artifacts we need to persist.
So I started focusing on Harness Fine-Tuning: writing my team's specific review rules, producer/reviewer pairs, and task shapes into actual version-controlled files instead of re-explaining them in a prompt every single session.
I've finally open-sourced my personal harness setup: harness-loom.
It’s not another agent framework. It sits on top of whatever harness your CLI already ships and lets you shape it to fit your production repo. You define your rules in one canonical place (.harness/loom/), and it derives the specific configs for Claude, Codex, or Gemini.
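To make the "one canonical source, many derived configs" idea concrete, here is a minimal Python sketch. It is not harness-loom's actual code: the rule contents, the render/derive helpers, and the simplification that the output is just one instruction file per CLI are all hypothetical. The only outside facts it leans on are the default project-instruction file names each CLI reads: CLAUDE.md for Claude Code, AGENTS.md for Codex, and GEMINI.md for Gemini CLI.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Hypothetical canonical rule set. In a real setup this would be loaded
# from files under .harness/loom/; the structure here is an assumption.
CANONICAL_RULES: dict[str, list[str]] = {
    "review": [
        "Every change needs a passing test run before review.",
        "No secrets or credentials in diffs.",
    ],
    "done": [
        "Plan file updated with final status.",
        "Review notes persisted next to the plan.",
    ],
}

# Default project-instruction file each CLI reads from the repo root.
TARGETS: dict[str, str] = {
    "claude": "CLAUDE.md",
    "codex": "AGENTS.md",
    "gemini": "GEMINI.md",
}


def render(rules: dict[str, list[str]]) -> str:
    """Render the canonical rules as one Markdown instruction body."""
    lines = ["<!-- Generated from .harness/loom/; do not edit by hand. -->", ""]
    for section, items in rules.items():
        lines.append(f"## {section.capitalize()}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


def derive(rules: dict[str, list[str]], out_dir: Path) -> dict[str, Path]:
    """Write the same rendered rules to every CLI's instruction file."""
    body = render(rules)
    written: dict[str, Path] = {}
    for cli, filename in TARGETS.items():
        path = out_dir / filename
        path.write_text(body, encoding="utf-8")
        written[cli] = path
    return written


# Usage: derive into a throwaway directory and read the results back.
with tempfile.TemporaryDirectory() as tmp:
    written = derive(CANONICAL_RULES, Path(tmp))
    generated = {cli: p.read_text(encoding="utf-8") for cli, p in written.items()}
```

The point of the design is that rules are edited in exactly one place and the per-CLI files are build output, so every teammate's CLI sees the same review rules and definition of "Done" no matter which tool they launch.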
I’m still porting the remaining features from my private setup into the open-source repo, but the core factory is there and ready to use. I'll be pushing updates frequently!
If you are trying to use AI assistants for serious production work and want them to act more like a predictable system rather than a one-off chat, I'd love for you to poke at it.
🔗 GitHub Repo: harness-loom
Has anyone else felt the need to shift from "prompt engineering" to "harness engineering" for production work?