6 months of heavy Claude Code use: 4 prompt patterns that survived my notebook cleanup

#ai #claude #productivity #programming

I've been using Claude Code as a daily driver for about six months — not as a novelty, but as the thing I reach for when I'd otherwise grep, refactor, or write a migration script by hand. Over that time I rewrote my prompts four or five times. The version I use now barely resembles what I started with. Here's what changed.

1. Generic prompts produce generic output

The first month, almost every prompt I wrote looked like "refactor this function" or "find bugs in this file". The results were what you'd expect: plausible, hedged, sometimes useful, rarely something I'd commit. The model wasn't bad — my prompt was telling it to guess what I wanted.

Compare "refactor this function" to:

Find the function `<NAME>` in `<PATH>`. Identify one cohesive block
(5–20 lines) that does a single thing. Propose an extraction:
- Show the new helper signature (params + return type) and full body
- List every call site that needs updating with file:line
- Identify any closed-over variables that need to become parameters
- Run the type-checker after the refactor and report errors

Before applying, summarize the blast radius and wait for my approval.

Same task. The second version produces a code review I can act on. The first produces homework I have to grade. The difference isn't Claude getting smarter — it's that the second prompt has constrained the output enough that there's only one shape the answer can take. Specificity is leverage.

2. Every prompt should demand structured output

The prompts that survived my notebook cleanup all share one trait: they force Claude into a taxonomy. CONFIRMED_DEAD vs LIKELY_DEAD vs FALSE_POSITIVE. ROOT_CAUSE picks-exactly-one from a fixed list. SAFE_INLINE vs NEEDS_REVIEW. The taxonomy doesn't have to be elaborate — it just has to be there.

Why does this matter? Because unstructured output is unreviewable at scale. If Claude hands me three paragraphs about "potential issues", I have to read all three and decide whether each one is a real finding or a hedge. If Claude hands me a list classified into three buckets with evidence per item, I can skim the CONFIRMED bucket, sanity-check the LIKELY bucket, and ignore the FALSE_POSITIVE bucket. The taxonomy is doing triage for me.

That's the difference between "Claude is impressive" and "Claude is useful in production".

3. Safety gates beat reverting commits

Early on I let Claude apply changes directly. It was faster. It was also occasionally disastrous. The single worst case: I asked Claude to "clean up unused imports" in a directory, and it removed an import that was only referenced inside a template string for dynamic loading. The build passed locally. The staging deploy failed because the dynamically-loaded module was no longer in the bundle. I lost an afternoon to a git bisect that should have taken five minutes.

The fix wasn't smarter prompts — it was a gate. Every prompt I keep now ends with some variant of:

"Do not apply. Output the proposal and wait for approval."

The cost is one extra round-trip. The benefit is I see the blast radius before it lands. Propose-before-apply is the single highest-ROI change I made.

4. What I changed in my workflow

I deleted ~80% of my prompt notebook. The keepers all had explicit taxonomies or explicit gates.
I stopped writing one-shot prompts for novel problems. If I'd only use a prompt once, the prompt isn't the artifact — the answer is. I just ask conversationally.
I added file:line citations as a requirement to any prompt that reports findings. Vague pointers cost me too much grep time.
I made "wait for my approval before applying" the default ending, not an exception.
I started keeping a small file of anti-patterns — prompts I'd written that produced confidently wrong output — so I'd remember why I cut them.

The meta-lesson

Prompt design is a real engineering discipline now, not a soft skill. The teams that treat it like one will get production-grade output. The teams that treat Claude as a chatbot will keep getting chatbot output and blame the model.

If this resonated and you want the prompt set I ended up with — 50 prompts, 10 workflows, 5 drop-in skills, 8 CLAUDE.md templates by tech stack — I packaged it as The Senior Engineer's Playbook for Claude Code, $9 (launch price, going to $15 after the first 100 sell).

See sample prompts and grab the playbook →

14-day refund, no questions. If a single prompt doesn't save you the $9 in real work, I'll refund and we part on good terms.

Happy to debate any of the patterns above in the comments.

Top comments (1)

Ken Imoto • May 21

Pattern 3 (the approval gate) is the one I had to learn the hard way. My publish harness shipped 3 broken articles before I added a hold-and-confirm step on the at-job. The "cite line numbers" rule I'd promote to non-negotiable - single highest leverage rule against hallucinated refactors that touch the wrong symbol. The structured classification idea is interesting; I do something similar with ERR trap codes so silent fails get categorized rather than swallowed.