Boris Kl

Posted on Jun 6

I set up Claude Code for a real production project. Here's what actually earned its keep

#ai #productivity #claudecode #automation

Everyone's got a "10 AI coding tricks" post. This isn't that. This is what's left after three weeks of running Claude Code on a real project — a bilingual booking bot for a beauty salon (Telegram + WhatsApp, Postgres, Google Calendar) — once the novelty wore off and only the useful parts survived.

Out of the box, Claude Code is a very smart intern with amnesia. Every session it shows up brilliant and clueless. The whole game is fixing the clueless part. Four things did that for me: a CLAUDE.md file, two custom agents, one skill, and two hooks. Everything else I tried, I deleted.

CLAUDE.md: the file that pays rent every single day

CLAUDE.md sits in your repo root and gets read at the start of every session. Mine started as three lines. It grew every time the assistant did something I had to undo.

That's the trick, honestly. Don't write CLAUDE.md upfront — grow it from failures. Mine now includes things like:

## Architecture rules
- Business logic lives in src/core/ and must not know about
  Telegram or WhatsApp. Channel code lives in src/adapters/.
- All times stored in UTC; convert only for display.
- Booking creation must stay double-booking-safe — never remove
  locks or constraints around it.

## Working agreements
- Before "done": run typecheck && lint && test and show the result.
- Schema changes go through a migration file. Always.
- Prefer the smallest diff that does the job.

Each of those lines exists because the assistant once did the opposite. It put Telegram-specific code in core logic — new rule. It "fixed" a timezone bug by converting at storage time — new rule. It reported "done" with failing types — new rule.

Three weeks in, I almost never repeat an instruction. That file is the difference between an assistant and a goldfish.

Custom agents: the reviewer I argue with

Custom agents live in .claude/agents/ as markdown files with a system prompt. You invoke them for a specific job, they do it with their own instructions and tool limits, and they don't pollute your main session's context.

The one that earns its keep daily is a code reviewer:

---
name: code-reviewer
description: "Reviews changes for bugs and security issues"
  before they are committed.
tools: Read, Grep, Glob, Bash
---

You are a strict but practical code reviewer.
Check, in this order: correctness (timezone boundaries,
double-booking windows), security (unvalidated webhook input,
SQL built by concatenation, missing signature checks),
project rules from CLAUDE.md, and whether behavior changed
without a test changing.
Report findings ordered by severity, with file:line and a
concrete fix. If something is fine, don't pad the review.

The point isn't that it catches everything. The point is that it's a different context with one job. The main session wrote the code and is biased toward liking it. The reviewer agent reads it cold. It regularly catches things the main session waved through — a webhook handler that trusted message_id without checking the signature, a slot calculation that broke across midnight.

It found the midnight bug before my client's customers did. That one agent paid for the whole setup.

A skill that stops me from skipping steps

Skills are reusable workflows — a SKILL.md file describing a procedure the assistant follows when you invoke it. I have exactly one that matters, /add-feature:

Restate what we're building, confirm.
List files that will change and why. Smallest possible diff.
Implement, following CLAUDE.md.
Write tests for the changed units.
Run the code-reviewer agent on the diff. Fix what it finds.
Summarize: what changed, how to try it, what I must do manually.

Nothing clever. It's a checklist. But here's the thing about checklists — they work precisely because on the fifth feature of the day, I would skip the review step. The skill doesn't get tired at 11pm. Pilots figured this out decades ago; we're just catching up.

Hooks: the two-line insurance policy

Hooks run shell commands on events. I only need two.

The first blocks any edit to secrets files. The assistant has no business touching .env, ever, and now it physically can't — a PreToolUse hook checks the file path and exits with an error if it looks like secrets. Cost me five minutes to write. Worth it the first time a refactor tried to "helpfully" update an env var.

The second runs the typecheck after every file edit and pipes problems straight back into the session. The assistant sees its own type errors immediately instead of discovering them at the end, which means it fixes them while the context is hot. This one change cut my "it said done but nothing compiles" rate to roughly zero.

{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{ "type": "command",
        "command": "bash .claude/hooks/protect-secrets.sh" }]
    }],
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{ "type": "command",
        "command": "npm run --silent typecheck" }]
    }]
  }
}

What I tried and deleted

For honesty's sake: I also built an agent for writing commit messages (the main session does this fine), a skill for deployments (too risky to automate, I want my hands on that), and a hook that auto-ran the full test suite on every edit (made everything crawl — the typecheck is the right granularity; full tests run at review time).

If a piece of setup doesn't save you something every day, it's not configuration, it's clutter.

The honest summary

Claude Code without setup is a talented freelancer on their first day, every day. With a grown-from-failures CLAUDE.md, one cold-eyed reviewer agent, one checklist skill and two hooks, it's closer to a colleague who's been on the project for a month.

The setup took me about two hours total, spread over days, mostly as reactions to things that annoyed me. The payback is that I now ship features for a production bot — payments, reminders, a wait-list — in evenings, alone, without the quality dropping.

Start with CLAUDE.md. Add a reviewer agent the first time you catch a bug you should've caught. Grow the rest from your own failures — they're better teachers than my list anyway.

Top comments (4)

Ashan de Silva • Jun 7

This is the useful version of a Claude Code writeup because it separates novelty from operational leverage. The things that keep earning their place for me are not clever prompts. They are the boring rails around the work: repo-local rules, small task packets, verification commands, and a habit of making the agent read the existing code before touching anything.

The pattern I keep seeing: Claude Code feels magical on day one, then messy on day ten unless the project has a real operating shape. Once the rules and checks are explicit, it becomes much more like a junior engineer with absurd stamina than a magic autocomplete box.

Boris Kl • Jun 21

Yeah, "junior with absurd stamina" is exactly the feel. The day-ten mess almost always traces back to one missing thing for me: a verification command the agent can run on itself. If it can't check its own work, it'll break something with full confidence and move on. Read-before-touch is the other half. A big chunk of my CLAUDE.md is just "go see how X already works before you write Y."

xulingfeng • Jun 7

The reviewer agent catching the midnight bug before customers did — that line hit. I've been burned by the same kind of edge case (TZ boundaries at midnight) more times than I count. The CLAUDE.md approach of growing rules from actual failures instead of upfront planning is the real takeaway here. That's how you build a doc that actually earns its keep.

Boris Kl • Jun 21

Midnight TZ bugs are their own special hell. The grow-rules-from-failures thing wasn't a plan, honestly. I just got tired of re-explaining the same gotcha every session, so each line in that file is basically a scar. Upfront rules always read clean and then miss the exact failure mode that bites you in prod.