I Didn't Need a Smarter Model. I Needed to Onboard It

#ai #claude #productivity #automation

Why the bottleneck for AI on real tickets isn't intelligence. It's that your brain starts warm and the model starts cold.

Two people get the same ticket. One starts from zero.

When a ticket lands on me, my brain lights up before I've finished reading the title. This service owns that logic. The route is probably over here. That controller needs a new method, and there's a DAO at the end of the chain I'll have to touch. I'm not being clever. I've just been in this codebase long enough that the map is already in my head.

Hand the same ticket to an AI and it's staring at a blank wall. It has the code. What it doesn't have is the map. It doesn't know our layers, our naming, or that we shipped something almost identical last week. Same ticket, same repo, and one of us starts warm while the other starts cold.

I'd love to tell you I understood that from the start. I didn't. When I first wired AI into our workflow I assumed what most people assume: the model is smart enough to work the codebase out on its own. Point it at the repo, give it the ticket, get something useful back. That assumption is the exact thing that blew up in my face, and cleaning up the mess is what taught me the whole game is context, not intelligence.

Here's how naive I was about it at first. For weeks, every ticket started the same way. I'd screenshot the Jira ticket, paste it into Claude, and wait while it dug through our code trying to work out where things lived. Then I'd start coding. I was the integration layer. Copy from Jira, paste into Claude, sit there, repeat. At some point it stopped feeling like using AI and started feeling like a chore I was doing for the AI. The loop was dumb enough that something else should be running it. So I built that something.

So I automated it, and it got confidently wrong

The flow itself is simple. A ticket gets assigned, a branch gets cut for it automatically, and an agent writes an implementation plan into a dedicated folder. We keep separate folders for plans, for feature notes, and for the stuff the team keeps coming back to. The agent reads the ticket, searches the code, and drops a plan: what the task is, which files probably change, how to go about it, what's still unclear. Then it links that plan back to the ticket so it's waiting for whoever picks the work up.
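To make the four parts of a plan concrete, here's a rough sketch of the shape. The class name, field names, ticket key, and file path are all invented for illustration; the real plans are just files the agent writes, not objects.

```python
from dataclasses import dataclass, field


@dataclass
class ImplementationPlan:
    """Hypothetical shape of one plan file: task, files, approach, open questions."""

    ticket_key: str
    task: str
    likely_files: list[str] = field(default_factory=list)
    approach: str = ""
    open_questions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the plan as plain text, one section per part."""
        files = "\n".join(f"- {p}" for p in self.likely_files) or "- (none identified)"
        questions = "\n".join(f"- {q}" for q in self.open_questions) or "- (none)"
        return (
            f"Plan for {self.ticket_key}\n\n"
            f"Task\n{self.task}\n\n"
            f"Files likely to change\n{files}\n\n"
            f"Approach\n{self.approach}\n\n"
            f"Still unclear\n{questions}\n"
        )


# Example values are made up.
plan = ImplementationPlan(
    ticket_key="PROJ-123",
    task="Let users export invoices as CSV.",
    likely_files=["src/dao/invoice_dao.py"],
    approach="Follow the closest existing export feature.",
    open_questions=["Should cancelled invoices be included?"],
)
plan_text = plan.render()
```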

On paper, exactly what I'd been doing by hand. In practice the plans were bad.

Not bad like broken English. Bad like confidently wrong. The agent had read access across the whole repo and a search tool, so it would poke around, land on some file that looked relevant, and build a plan on top of it. The plans put validation logic in the controller. They put database queries straight in the service layer. Both of those are backwards from how we build. Validation belongs in middleware for us, and every database call lives in the DAO layer, never the service.

So the plan read like it knew what it was talking about, and it would march a developer right into a rejected PR. Picture a junior trusting one of these. They write the code the plan describes, open the PR, get torn apart in review, then sit there confused about why the "AI plan" sent them the wrong way.

It was slow on top of that. Every run, the agent rebuilt its picture of our architecture from nothing, because it had nothing to start from. That's the cold start again. I'd automated it without fixing the actual problem. I had handed off my busywork and thrown away the one thing that made the manual version worth doing. I already knew the shape of the code. The agent didn't, and I hadn't given it any way to find out.

It was never about how smart the model was

My first instinct was the obvious wrong one. Maybe a better model fixes this. It doesn't. A better model still has no clue that in our repo the DAO is the only place allowed to touch the database. You can't buy that off a pricing page. It isn't intelligence.

That's when it actually clicked. The problem was never how smart the model was. It started cold every single time, and I started warm. I'd been trying to upgrade the brain when the thing missing was the map in my head, and nobody had ever written that map down. A new hire would hit the exact same wall on day one. The difference is we don't expect a new hire to be brilliant. We expect to onboard them.

So I stopped trying to make the agent smarter and started onboarding it instead. New hires don't get a bigger brain. They get docs, they get the conventions, and they get someone pointing at the last person's work saying "do it like that." I built the same thing, in layers instead of one giant file.

There's a thin entry point at the top. Project basics, plus pointers that say "for this kind of work, go read that." The real knowledge isn't in there. Under it sits the core conventions doc, where the architecture is spelled out flat. Controllers call services, services call DAOs, DAOs own the database. Validation in middleware. The naming rules, and the patterns we don't break. Then there's a set of specialized files the agent only opens when it needs them. One for on-call database work, one for the data model, one for translations. It doesn't read those every time. It reaches for the right one when the ticket actually touches that area.
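If it helps to see the layering as code, here's a minimal sketch. The file names and trigger keywords are assumptions I made up for this example, and in practice the agent follows the pointers in the entry file rather than a keyword table. The idea is the same, though: two layers are always read, and the rest only on demand.

```python
# Hypothetical file names; the real layout differs per repo.
ALWAYS_READ = [
    "docs/agent/ENTRY.md",        # thin entry point: project basics plus pointers
    "docs/agent/conventions.md",  # core doc: layers, naming, patterns we don't break
]

# Specialized files, opened only when the ticket touches that area.
# The trigger keywords are illustrative assumptions.
ON_DEMAND = {
    "docs/agent/oncall-database.md": {"migration", "index", "slow query"},
    "docs/agent/data-model.md": {"schema", "table", "entity"},
    "docs/agent/translations.md": {"translation", "locale", "i18n"},
}


def docs_for_ticket(ticket_text: str) -> list[str]:
    """Return the docs to read for a ticket, always-read layers first."""
    text = ticket_text.lower()
    selected = list(ALWAYS_READ)
    for path, keywords in ON_DEMAND.items():
        if any(keyword in text for keyword in keywords):
            selected.append(path)
    return selected


docs = docs_for_ticket("Add German translation for the invoice email")
```

A ticket that mentions none of the keywords gets only the two always-read files, which is the point: the specialized knowledge stays out of the way until it's needed.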

Then I rewrote the agent's marching orders. Before it searches for anything, it reads the entry point and the core doc. It gets its bearings on the overall structure first. Only then does it go looking in the area the ticket is about. And the instruction that pulled the most weight was the simplest one. Find the closest existing implementation and copy its pattern. That is word for word what I'd tell a junior on day one. Don't invent an approach. Go find something we already built that's close, and follow it.
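The ordering is the whole trick, so here's a small sketch of how those marching orders could be assembled. The function name and the exact wording of the steps are mine for this example, not the literal prompt we run.

```python
def build_planning_instructions(ticket_summary: str, docs_to_read: list[str]) -> str:
    """Assemble ordered instructions: read the docs first, search second, copy a pattern third."""
    # Orientation comes first: one step per doc, in the order given.
    steps = [f"Read {doc} in full before searching for anything." for doc in docs_to_read]
    steps += [
        "Only then search the area of the codebase the ticket is about.",
        "Find the closest existing implementation and copy its pattern.",
        "Do not invent a new approach if a similar feature already exists.",
        "Write the plan: the task, files likely to change, the approach, and what is still unclear.",
    ]
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"Ticket: {ticket_summary}\n\nFollow these steps in order:\n{numbered}"


instructions = build_planning_instructions(
    "Add credit notes",  # made-up ticket
    ["ENTRY.md", "conventions.md"],  # hypothetical doc names
)
```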

The plans turned around almost immediately. Validation showed up in middleware. Queries showed up in the DAO. The model hadn't gotten smarter between Tuesday and Wednesday. I had just written down the rules I'd been carrying in my head and forced it to read them before it touched anything.

The part most people skip is what comes after that. A memory layer you write once and forget about rots fast. The code moves, conventions shift, and those docs quietly go stale until the agent is following rules that stopped being true months ago. So we baked the upkeep into the workflow itself. When the agent learns something new, or when we ship a feature that changes how an area works, updating the memory is part of the job and not a thing someone remembers to do later. The repo keeps its own docs current. A person's mental model updates as they work. The agent's has to update the same way, or it slides right back to confidently wrong.
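We handle the upkeep as a workflow step, not a script. Still, a cheap guard is easy to imagine: flag any doc whose covered code changed after the doc was last touched. This is a hypothetical sketch, not something we run; the doc names, areas, and dates are invented, and in real life the dates would come from version control.

```python
from datetime import date


def stale_docs(
    doc_updated: dict[str, date],
    doc_covers: dict[str, list[str]],
    code_changed: dict[str, date],
) -> list[str]:
    """Return docs covering at least one area that changed after the doc's last update."""
    stale = []
    for doc, updated in doc_updated.items():
        for area in doc_covers.get(doc, []):
            changed = code_changed.get(area)
            if changed is not None and changed > updated:
                stale.append(doc)
                break  # one newer area is enough to flag the doc
    return sorted(stale)


# Invented example data.
flagged = stale_docs(
    doc_updated={"conventions.md": date(2024, 1, 10), "translations.md": date(2024, 3, 1)},
    doc_covers={"conventions.md": ["src/services", "src/dao"], "translations.md": ["src/i18n"]},
    code_changed={"src/dao": date(2024, 2, 1), "src/i18n": date(2024, 2, 15)},
)
```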

Then the plan started arguing back

Once the plans were trustworthy, the thing I didn't plan for turned out to be the best part. The plan stopped being the valuable bit.

I added a step where the agent grills its own plan after writing it. It walks back through what it wrote the way a senior would in a design review. Where's the ambiguity. What edge case is missing. Which assumption needs a real answer before anyone writes a line. It writes those questions down, with its own recommended answer next to each one.

And then it holds the line. When I come back to actually build the ticket, the agent won't start writing code until those questions are answered, even when I tell it to just get on with it. It drags the design conversation to the front, the one I'd normally skip and then pay for later in a half-built feature and an ugly PR.

So the output grew up. It went from "a plan" to "a forced design conversation before a single line exists." The value moved off the answer and onto the questions. That's the piece I'd fight hardest to keep now, and it's the one I never set out to build.
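The gate itself is simple to describe in code. Here's a minimal sketch, assuming each question carries the agent's recommended answer plus a slot for the human's decision. The names are hypothetical; in our setup this lives in the agent's instructions, not in a Python class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenQuestion:
    text: str
    recommended_answer: str       # the agent's own suggestion
    answer: Optional[str] = None  # the decision, filled in by a human


def unanswered(questions: list[OpenQuestion]) -> list[OpenQuestion]:
    """Questions with no decision yet; a blank answer does not count."""
    return [q for q in questions if not (q.answer and q.answer.strip())]


def may_start_coding(questions: list[OpenQuestion]) -> bool:
    """Coding is allowed only once every question has an answer."""
    return not unanswered(questions)


# Made-up example questions.
questions = [
    OpenQuestion("Should cancelled invoices be exported?", "No, skip them."),
    OpenQuestion("What happens on an empty date range?", "Return an empty file.", answer="Return an empty file."),
]
blocked = not may_start_coding(questions)

# Accepting the agent's recommendation counts as an answer.
questions[0].answer = questions[0].recommended_answer
unblocked = may_start_coding(questions)
```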

Telling QA what actually shipped

The other place this quietly pays off is after the code is done, at deploy time.

A ticket almost never ships clean on the first try. It goes out, QA finds something, it comes back, gets patched, goes out again. A single ticket can rack up several of those rounds. The messy reality is that a few deploys in, nobody is totally sure what just went out the door. QA opens a ticket and can't tell which points got handled in this release and which are still sitting open. So they re-test the whole thing, or they re-test the wrong thing and miss the part that actually changed.

Now when a deploy goes out, an agent reads what was in it and leaves a summary on the tickets it touched. Here's what shipped. These items from the ticket are done. These aren't, yet. QA opens it and knows what to check instead of guessing. Small thing. It kills a real, recurring source of confusion, and it's exactly the sort of glue nobody wants to write out by hand on every single release.
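The summary boils down to a split between what shipped and what didn't. A rough sketch, assuming the ticket's items and the shipped items have already been matched up, which is the hard part the agent does by reading the deploy. The function name and example items are invented.

```python
def deploy_summary(ticket_items: list[str], shipped_items: set[str]) -> str:
    """Split a ticket's items into shipped and still-open, keeping ticket order."""
    done = [item for item in ticket_items if item in shipped_items]
    pending = [item for item in ticket_items if item not in shipped_items]
    lines = ["Shipped in this deploy:"]
    lines += [f"  [x] {item}" for item in done] or ["  (nothing from this ticket)"]
    lines.append("Still open:")
    lines += [f"  [ ] {item}" for item in pending] or ["  (none)"]
    return "\n".join(lines)


summary = deploy_summary(
    ["Export invoices as CSV", "Handle empty date range", "Translate error message"],
    {"Export invoices as CSV", "Translate error message"},
)
```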

That's the actual throughline. None of this is "AI writes my code." It's AI sitting in the gaps between the tools I already use, the ticket and the codebase and the deploy and the QA pass, carrying the context I used to carry myself.

What changed, and what I skip

This isn't a demo I built once and screenshotted for a standup. We actually use it.

The real change is in how I work, not in what the AI spits out. I keep three or four tickets moving at the same time now. Each one shows up with its branch already cut and its context already loaded, so I'm not paying the cold-start tax on every one of them. That tax, the "what is this, where does it live, how do we do it here" tax, is exactly what used to pin me to one ticket at a time.

I'll be straight about where it doesn't pull its weight. For small tickets I ignore the plan and just do the thing. Faster that way. The pipeline isn't sacred, and shoving every tiny ticket through it would be its own kind of waste.

There was dumber pain along the way too. We started on a pricing plan that handed out an access token that expired, so for a stretch the whole thing kept dying until someone manually pasted a fresh secret back in. Very cutting-edge stuff. We switched to a proper key and it stopped fighting us. A couple of rough edges are still here. The agent finds files mostly by searching for names, which is great for "where's addInvoice" and useless for "where do we handle offline payments," so a smarter search step is the obvious next move. And I still don't measure whether the plans are any good, even though the signal is right there for the taking. Compare the files a merged PR actually changed against the files the plan said it would. Right now "it helps" is a feeling, not a number. That one bugs me.
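The comparison is small enough to sketch. I haven't built this, so treat it as one possible shape, not a description of anything running. It assumes you can get both file lists as sets of paths. Precision asks how many of the planned files were actually touched; recall asks how many of the touched files the plan saw coming.

```python
def plan_accuracy(planned_files: set[str], changed_files: set[str]) -> dict[str, float]:
    """Compare the files a plan predicted against the files a merged PR changed."""
    hits = planned_files & changed_files
    # Guard against empty sets so an empty plan or an empty PR doesn't divide by zero.
    precision = len(hits) / len(planned_files) if planned_files else 0.0
    recall = len(hits) / len(changed_files) if changed_files else 0.0
    return {"precision": precision, "recall": recall}


# Made-up file lists: the plan named three files, the PR changed four, two overlap.
score = plan_accuracy(
    planned_files={"a.py", "b.py", "c.py"},
    changed_files={"b.py", "c.py", "d.py", "e.py"},
)
```

In that example two of the three planned files were changed and two of the four changed files were planned, so precision is 2/3 and recall is 1/2.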

The takeaway

The model was never the bottleneck. Context was.

I didn't make any of this better by reaching for a bigger model. I made it better by writing down the things I'd tell a new hire. Where validation goes. Where the database calls live. Go copy the nearest thing we've already built. Then I forced the agent to read all of it before it touched the code, and kept those notes alive as the code kept moving. The boring fix beat the exciting one, and it kept on beating it.

Once the AI started warm instead of cold, the plan stopped being the point. Every ticket lands with its context already loaded, and that's the only reason I can keep three or four of them in the air instead of grinding through them one at a time.

So if you're about to do this where you work, don't start with the model. Start with the context your own brain loads without being asked, the stuff you never had to write down because you already knew it. That is the part the AI is missing. The rest is plumbing.