joacod

Posted on May 22 • Originally published at joacod.com

What I Learned Building My Own AI Harness

#programming #ai #architecture #software

Last month I spent some time building nano-harness, a small personal AI harness.

Not a framework, not the final agent platform that will replace everything and save humanity, or whatever people are saying this week.

Just a local-first desktop app to understand how these systems actually work when you stop talking about agents in abstract terms and start connecting the pieces yourself.

That was the real goal. I wanted to learn the shape of the thing. Provider adapters, local state, runs, tool calls, approvals, events, sessions, memory ideas, MCP, skills, all the boring plumbing behind the nice demos.

And, as usually happens, the project went further than planned.

The app runs on Electron, can be packaged for macOS, Linux, and Windows, and has a local SQLite database, a core runtime, provider abstractions, a run inspector, approval boundaries, and the beginning of a few bigger ideas that are still unfinished or behind feature flags.

Which is fine.

It was never supposed to become a huge product. It was a learning project, and I had a lot of fun building it.

Now I think it is time to wrap it up.

The Main Thing

The most useful thing I got from this project is not that AI can write code; we already know that. The useful part was seeing, once again, how far you can go when you combine good software architecture with a good AI assistant.

No magic there. The assistant helps a lot. It keeps momentum, writes boring implementation details, moves code around, catches mistakes, proposes tests, and makes it easier to keep going when the project has too many small pieces.

But the assistant is not the architecture.

That part still matters, and honestly I think it matters more now, because AI makes it very easy to produce code faster than you can think about whether that code belongs in the right place.

If the structure is bad, AI helps you make the bad structure bigger. If the abstractions are confused, AI extends the confusion. If every feature leaks into every layer, the assistant will not save you from that; it will just help you create a more impressive mess.

But if the structure is clear, AI becomes much more useful.

In nano-harness, the important abstraction was the run.

A run is just a bounded attempt to satisfy one user request. It has messages, provider calls, actions, policy decisions, approvals, events, and persisted state. Nothing fancy, but that one idea makes the rest of the app easier to reason about.
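To make that concrete, here is a minimal TypeScript sketch of a run as a data structure. It is not nano-harness's actual code: `Run`, `RunEvent`, `startRun`, `appendEvent`, and the status values are hypothetical names chosen for illustration. What it shows is that messages, provider calls, actions, policy decisions, and approvals all land in one append-only event log attached to one bounded run.

```typescript
// Hypothetical shape of a run: one bounded attempt to satisfy one user request.
// None of these names come from nano-harness; they only illustrate the idea.
type RunStatus = "running" | "awaiting_approval" | "completed" | "failed";

type RunEvent =
  | { kind: "message"; role: "user" | "assistant"; text: string }
  | { kind: "provider_call"; model: string }
  | { kind: "action"; tool: string; input: unknown }
  | { kind: "policy_decision"; tool: string; decision: "allow" | "deny" | "ask" }
  | { kind: "approval"; tool: string; approved: boolean }
  | { kind: "status"; status: RunStatus };

interface Run {
  readonly id: string;
  readonly request: string; // the single user request this run tries to satisfy
  status: RunStatus;
  readonly events: RunEvent[]; // append-only log, easy to persist and inspect
}

function startRun(id: string, request: string): Run {
  return {
    id,
    request,
    status: "running",
    events: [{ kind: "message", role: "user", text: request }],
  };
}

// Every change goes through the event log, so the current status is always explainable.
function appendEvent(run: Run, event: RunEvent): void {
  run.events.push(event);
  if (event.kind === "status") run.status = event.status;
}

// Usage: a run that asks for approval before a shell action, then completes.
const run = startRun("run-1", "Rename the config file");
appendEvent(run, { kind: "provider_call", model: "planner" });
appendEvent(run, { kind: "policy_decision", tool: "shell", decision: "ask" });
appendEvent(run, { kind: "status", status: "awaiting_approval" });
appendEvent(run, { kind: "approval", tool: "shell", approved: true });
appendEvent(run, { kind: "status", status: "completed" });
```

Because status only changes through an event, an inspector can replay the log and explain how a run ended up where it did.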

That is the kind of abstraction I like. Not abstract because it sounds smart, abstract because it reduces the number of things you need to keep in your head.

The rest was the usual discipline:

the desktop app owns the UI
the core owns orchestration
the infra layer owns side effects
the shared package owns contracts
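A hedged sketch of that split, in a ports-and-adapters style: the shared contracts are plain interfaces, the core function depends only on those interfaces, and infra provides the implementations. `ProviderAdapter`, `EventStore`, `runOnce`, `EchoProvider`, and `InMemoryEventStore` are hypothetical stand-ins, not nano-harness code; a real infra layer would wrap a provider SDK and SQLite. Calls are synchronous here for brevity, while real provider calls would be asynchronous.

```typescript
// --- shared: contracts only, no behavior ---
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}
interface ProviderAdapter {
  readonly name: string;
  complete(messages: ChatMessage[]): ChatMessage;
}
interface EventStore {
  append(runId: string, event: string): void;
  list(runId: string): string[];
}

// --- core: orchestration only; knows nothing about HTTP, SQLite, or Electron ---
function runOnce(
  runId: string,
  request: string,
  provider: ProviderAdapter,
  store: EventStore,
): ChatMessage {
  store.append(runId, `request:${request}`);
  const reply = provider.complete([{ role: "user", content: request }]);
  store.append(runId, `reply:${provider.name}`);
  return reply;
}

// --- infra: side effects live here (stand-ins for a real SDK and SQLite) ---
class EchoProvider implements ProviderAdapter {
  readonly name = "echo";
  complete(messages: ChatMessage[]): ChatMessage {
    const last = messages[messages.length - 1];
    return { role: "assistant", content: `echo: ${last.content}` };
  }
}

class InMemoryEventStore implements EventStore {
  private readonly events: Record<string, string[]> = {};
  append(runId: string, event: string): void {
    if (!this.events[runId]) this.events[runId] = [];
    this.events[runId].push(event);
  }
  list(runId: string): string[] {
    return (this.events[runId] ?? []).slice(); // copy so callers cannot mutate the log
  }
}

// --- desktop app: wires the pieces together and renders the result ---
const store = new InMemoryEventStore();
const reply = runOnce("run-1", "hello", new EchoProvider(), store);
```

Swapping providers or storage then means writing a new infra class; the core function does not change.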

Nothing revolutionary, just useful, and useful beats clever almost every time.

The Buzzwords Get Smaller When You Build

There are a lot of words around AI development right now. Agents, harnesses, skills, MCP, memory, spec-driven development, self-improving systems, context engines.

Some of those ideas are real, some are overloaded, some are early, and some are mostly marketing. You only find out which is which when you try to build the small version and realize what they actually require. That is why I wanted to build my own version.

Not because nano-harness is better than mature tools. It is not. I use other tools every day, but building the small version gives you a different kind of knowledge. You stop treating the concept as magic and start seeing the cracks.

A provider is an adapter.
A tool call is a contract, an input schema, an execution boundary, and a result.
Memory is storage, recall, ranking, confidence, provenance, approval, and deciding what should not be remembered.
MCP exposes resources and tools, and immediately creates questions about filtering, credentials, safety, and context bloat.
Skills are reusable workflow packages, and the hard part is knowing when to inject them and when to leave the model alone.
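The tool-call line is the easiest one to show in code. Here is a minimal sketch, assuming a hand-rolled schema check instead of a real validation library such as JSON Schema or Zod; `ToolDefinition`, `callTool`, and the `read_file` tool are hypothetical. It has the four parts named above: a contract, an input schema, an execution boundary that never lets an exception escape, and a structured result.

```typescript
// Hypothetical tool contract: declared schema + execute + structured result.
type FieldType = "string" | "number" | "boolean";

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, FieldType>; // required fields and their types
  execute(input: Record<string, unknown>): string;
}

type ToolResult = { ok: true; output: string } | { ok: false; error: string };

// Validate model-supplied input against the declared schema; null means valid.
function validate(schema: Record<string, FieldType>, input: unknown): string | null {
  if (typeof input !== "object" || input === null) return "input must be an object";
  const record = input as Record<string, unknown>;
  for (const field of Object.keys(schema)) {
    const expected = schema[field];
    if (typeof record[field] !== expected) return `field "${field}" must be a ${expected}`;
  }
  return null;
}

// Execution boundary: invalid input and thrown errors become results, not crashes.
function callTool(tool: ToolDefinition, input: unknown): ToolResult {
  const problem = validate(tool.inputSchema, input);
  if (problem !== null) return { ok: false, error: problem };
  try {
    return { ok: true, output: tool.execute(input as Record<string, unknown>) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Usage with a fake tool that "reads" from an in-memory file map.
const files: Record<string, string> = { "notes.md": "# Notes" };
const readFile: ToolDefinition = {
  name: "read_file",
  description: "Read a file from the workspace",
  inputSchema: { path: "string" },
  execute(input) {
    const path = input.path as string;
    if (!(path in files)) throw new Error(`no such file: ${path}`);
    return files[path];
  },
};

const good = callTool(readFile, { path: "notes.md" }); // { ok: true, output: "# Notes" }
const missing = callTool(readFile, { path: "x.md" }); // { ok: false, error: "no such file: x.md" }
const invalid = callTool(readFile, { path: 42 }); // { ok: false, error: 'field "path" must be a string' }
```

The model only ever sees `ToolResult` values, so a bad call turns into feedback it can react to instead of a crashed run.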

Once you see it that way, the field becomes less intimidating and more interesting. It also becomes obvious that many of the hard parts are still the same old software engineering problems.

Boundaries, contracts, state, observability, safety, UX, debuggability, maintainability.

Same old friends, different interface.

The Tools

For the actual coding workflow, my tool of choice is still opencode.

I have tried almost everything at this point, and opencode is the one that feels best to me. It is open source, it is not trying to trap me inside one company, and it fits the way I like to work. That matters more to me every month.

For models, I used a combination of GPT-5.5 for planning and GPT-5.4 for execution. And this project reinforced something I already believed: once the plan is good enough, a lot of models can do useful execution work.

The quality of the plan matters a lot. If you ask the model to design the architecture, implement the feature, refactor the code, infer all the missing context, and test everything at the same time, sometimes it works. Many times it creates very convincing nonsense.

I had better results when I used stronger models for planning, architecture, and review, and cheaper or local models for simpler execution tasks.

For some parts I even used a local model with MLX, specifically Qwen3.6-35B-A3B. Very impressive for its size.

No, it is not GPT-5.5 or Opus 4.7, and it is not the model I would pick for the hardest architecture decisions. But if someone had shown me a year ago that a local model could perform this well on real coding tasks, I probably would not have believed it.

And that changes something.

Local Is Not a Toy Anymore

I have no doubt that the strongest models will stay inside big companies for a long time, and they will be useful for specific high-value tasks.

But a lot of normal work does not need the biggest possible model. A lot of company workflows are repetitive, structured, private, and full of local context. For that kind of work, open models and local execution are going to be enough in many cases.

That is not only a cost argument, although cost matters a lot with all the crazy pricing changes around AI tools lately. It is also a privacy argument.

We have sold our information very cheaply for a very long time. Maybe we should not repeat the same mistake with every repo, every document, every customer workflow, and every internal decision we feed into AI systems. Maybe we should do it better this time.

At least that is what I want to believe. This is why I keep paying attention to local inference with llama.cpp and MLX. Not because local models are always better (they are not), but because knowing when a local model is enough is becoming a real advantage.

Some useful skills for future engineering are model routing, workflow design, cost control, privacy boundaries, and knowing which model to use for which job.
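Model routing can start as a plain function. The sketch below assumes three hypothetical model tiers and a `containsPrivateData` flag on each task; the model labels are placeholders, not real identifiers or recommendations. It encodes the split described earlier (stronger models for planning, architecture, and review; cheaper models for well-specified execution) plus a privacy boundary that keeps private work local regardless of task kind.

```typescript
// Hypothetical routing rules: privacy decides location, task kind decides strength.
type TaskKind = "plan" | "architecture" | "review" | "execute";

interface Task {
  kind: TaskKind;
  containsPrivateData: boolean;
}

interface ModelChoice {
  model: string; // placeholder label, not a real model identifier
  local: boolean;
}

const STRONG_REMOTE: ModelChoice = { model: "strong-remote", local: false };
const CHEAP_REMOTE: ModelChoice = { model: "cheap-remote", local: false };
const LOCAL: ModelChoice = { model: "local-mlx", local: true };

function routeTask(task: Task): ModelChoice {
  // Privacy boundary first: private data never leaves the machine.
  if (task.containsPrivateData) return LOCAL;
  // Judgment-heavy work goes to the strongest available model.
  if (task.kind !== "execute") return STRONG_REMOTE;
  // Well-specified execution work is usually fine on a cheaper model.
  return CHEAP_REMOTE;
}
```

Whether a local model is actually good enough for a given private task is still the real judgment call; this sketch simply refuses to send private work anywhere else. Replacing `CHEAP_REMOTE` with `LOCAL` for execution is the cost-control variant.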

Less sexy, much closer to real engineering.

What It Actually Taught Me

The most useful part of nano-harness was thinking about the abstractions.

What belongs in the core?
What belongs in infra?
What should be a local artifact?
What should be persisted?
What should require approval?
What should be visible in the UI?
What should never become a primitive too early?

That last one matters a lot. A lot of software gets worse because we promote ideas into primitives too fast.

You add one workflow, then another, then another, and suddenly the core knows about everything. Specs, memory, skills, projects, providers, UI modes, files, shell commands, and half the roadmap. Then every change becomes risky because every layer depends on every other layer.

I wanted to avoid that. So the rule became "keep the core small and boring".

Most things should be compositions of existing primitives: runs, actions, policy, events, context, and local artifacts.

Spec-driven development does not need to be a new core. It can be a workflow built on runs.

Memory does not need to be shoved into every layer. It can be proposed from evidence and recalled with provenance.
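A minimal sketch of that idea, assuming memories are proposed by a run, stay inert until a person approves them, and carry the run and evidence they came from. `MemoryEntry`, `proposeMemory`, `approveMemory`, `recall`, and the naive keyword match are hypothetical; a real system would likely rank with something better than word overlap.

```typescript
// Hypothetical memory record: every entry carries its provenance.
interface MemoryEntry {
  id: number;
  text: string;
  sourceRunId: string; // provenance: which run proposed it
  evidence: string; // provenance: what in that run supported it
  confidence: number; // 0..1, assigned when the memory is proposed
  approved: boolean; // proposals stay inert until a person approves them
}

const memories: MemoryEntry[] = [];

// Proposing never makes a memory active; it only records a candidate.
function proposeMemory(
  text: string,
  sourceRunId: string,
  evidence: string,
  confidence: number,
): MemoryEntry {
  const entry: MemoryEntry = {
    id: memories.length + 1,
    text,
    sourceRunId,
    evidence,
    confidence,
    approved: false,
  };
  memories.push(entry);
  return entry;
}

function approveMemory(id: number): void {
  for (const m of memories) {
    if (m.id === id) m.approved = true;
  }
}

// Recall approved entries that share a word with the query, highest confidence first.
// Naive keyword overlap stands in for real ranking.
function recall(query: string, limit = 3): MemoryEntry[] {
  const words = query.toLowerCase().split(/\W+/).filter((w) => w.length > 2);
  return memories
    .filter((m) => m.approved && words.some((w) => m.text.toLowerCase().indexOf(w) !== -1))
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, limit);
}

// Usage: one approved memory, one proposal that was never approved.
const pnpm = proposeMemory(
  "User prefers pnpm over npm",
  "run-7",
  "User replaced `npm install` with `pnpm install` in the plan",
  0.9,
);
proposeMemory("Project uses tabs for indentation", "run-8", "One file used tabs", 0.4);
approveMemory(pnpm.id);
const hits = recall("Which package manager should I use, npm or pnpm?");
```

Since every recalled entry keeps `sourceRunId` and `evidence`, the UI can show why the assistant "remembers" something and link back to the run that produced it.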

Skills do not need to become a marketplace. They can start as local markdown packages.
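Here is one way a local markdown skill could work, sketched under assumptions: each skill is a markdown file with a small front-matter header (`name`, `description`, `triggers`), and a skill is injected only when one of its trigger words appears in the request. The file format, `parseSkill`, and `selectSkills` are hypothetical; the sketch parses strings instead of reading from disk so it stays self-contained.

```typescript
interface Skill {
  name: string;
  description: string;
  triggers: string[];
  body: string; // the markdown instructions that would be injected into context
}

// Parse a minimal "---\nkey: value\n---\nbody" header. Not a full YAML parser.
function parseSkill(source: string): Skill {
  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(source);
  if (!match) throw new Error("skill is missing its front-matter header");
  const meta: Record<string, string | undefined> = {};
  for (const line of match[1].split("\n")) {
    const colon = line.indexOf(":");
    if (colon > 0) meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return {
    name: meta["name"] ?? "unnamed",
    description: meta["description"] ?? "",
    triggers: (meta["triggers"] ?? "")
      .split(",")
      .map((t) => t.trim().toLowerCase())
      .filter((t) => t.length > 0),
    body: match[2].trim(),
  };
}

// Inject only skills whose triggers appear in the request; otherwise leave the model alone.
function selectSkills(request: string, skills: Skill[]): Skill[] {
  const text = request.toLowerCase();
  return skills.filter((s) => s.triggers.some((t) => text.indexOf(t) !== -1));
}

// Usage: one skill, one request that triggers it and one that does not.
const commitSkill = parseSkill(
  "---\nname: commit-message\ndescription: Write conventional commit messages\ntriggers: commit, changelog\n---\nUse the format `type(scope): summary`.",
);
const chosen = selectSkills("Please write a commit for these changes", [commitSkill]);
const none = selectSkills("Explain this stack trace", [commitSkill]);
```

Trigger words are the crudest possible answer to "when should this be injected", but they keep the decision visible and easy to debug.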

MCP does not need to take over the product. It can sit behind filtering, inspection, and approval.
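A hedged sketch of what "behind filtering, inspection, and approval" could mean in code, assuming tools discovered from an MCP server arrive as simple descriptors. The `McpToolInfo` shape, the allowlist format, the `readOnly` flag, `filterTools`, and `decide` are all hypothetical; this does not use a real MCP SDK or transport. Filtering keeps unlisted tools out of the model's context (which also limits context bloat), inspection is an audit log, and anything that is not read-only waits for explicit approval.

```typescript
// Hypothetical descriptor for a tool discovered from an MCP server.
interface McpToolInfo {
  server: string;
  name: string;
  readOnly: boolean; // assumed known; unknown tools should be treated as not read-only
}

type Decision = "allow" | "ask" | "deny";

interface AuditEntry {
  tool: string;
  decision: Decision;
}

const auditLog: AuditEntry[] = [];

// Filtering: only allowlisted tools are ever shown to the model.
function filterTools(tools: McpToolInfo[], allowlist: string[]): McpToolInfo[] {
  return tools.filter((t) => allowlist.indexOf(`${t.server}/${t.name}`) !== -1);
}

// Policy plus inspection: every call gets a decision, and every decision is logged.
function decide(tool: McpToolInfo, allowlist: string[]): Decision {
  const id = `${tool.server}/${tool.name}`;
  let decision: Decision;
  if (allowlist.indexOf(id) === -1) decision = "deny";
  else if (tool.readOnly) decision = "allow";
  else decision = "ask"; // side effects wait for explicit user approval
  auditLog.push({ tool: id, decision });
  return decision;
}

// Usage with three discovered tools, only two of them allowlisted.
const discovered: McpToolInfo[] = [
  { server: "github", name: "search_issues", readOnly: true },
  { server: "github", name: "create_issue", readOnly: false },
  { server: "github", name: "delete_repo", readOnly: false },
];
const allowlist = ["github/search_issues", "github/create_issue"];
const visible = filterTools(discovered, allowlist); // delete_repo is never shown to the model
const decisions = discovered.map((t) => decide(t, allowlist)); // ["allow", "ask", "deny"]
```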

That was the fun part. Not because the code is perfect (it is not), but because the project kept forcing small architectural decisions where I had to choose between adding power and preserving clarity. That is the real game.

Wrapping It Up

I am wrapping this project up now.

There is a lot more that could be done. A proper Spec Workbench, better memory, stronger skills, real MCP transport, better packaging, better docs, more benchmarks, a cleaner onboarding flow. Maybe a real release.

But not every project needs to become a product. Sometimes a project has already done its job once it changes how you think, and nano-harness did that for me.

It made me more confident building AI-assisted tools. It made me more confident in my own architecture instincts. It gave me a clearer picture of what these harnesses are under the hood. And it reminded me again that every MVP leaves knowledge for the next one.

That knowledge compounds. Every time you finish something, even something small or unfinished or weird, the next thing starts from a better place.

There has never been a better time to try ideas: AI lowers the cost of exploring, refactoring, testing, and learning by doing. You can move faster, but you still need taste, judgment, and a sense of when to stop, and you still need to review the code and care about quality. No shortcuts.

Experiment, build the thing, fail a little, refactor, write down what you learned, repeat.

The best is yet to come.
