DEV Community

Deva

One line was corrupting every draft my generator produced

One line. That is all it took to contaminate every draft my content engine generated.

I run a generation pipeline where claude -p produces draft posts. My CLAUDE.md defines an orchestrator policy that requires Claude to emit a routing verdict on every turn, something like ▸ T0 · main thread · generate draft · interactive. This is genuinely useful behavior in interactive sessions: it tells me at a glance what tier of work got dispatched and why.

The problem: claude -p does not know whether it is running interactively or headlessly. It follows the instructions in CLAUDE.md regardless. So every single draft that came back from the generator had that verdict line sitting at the top, before the actual post body.

Here is what a raw generation looked like:

▸ T0 · main thread · generate draft · interactive

The thing most developers get wrong about shipping fast...

The verdict line is not content. It is orchestration metadata. But it was getting saved to disk and rendered in the dashboard as part of the draft. Every draft. Not occasionally, not on edge cases. Every single one.

The fix was a _clean() function that strips lines matching the routing verdict pattern before the draft gets saved or displayed. The regex matches lines starting with ▸ T followed by digits; _clean() drops those lines and rejoins the rest. It runs on the raw output before anything downstream touches it.

The tradeoff I had to think through: strip at save time or strip at display time?

Strip at display time means the raw artifact on disk is faithful to what the model returned. You can always go back and see the full output. The downside is that every consumer of that data (the dashboard, any export, any future pipeline stage) has to remember to strip. That is a leak waiting to happen. One new consumer that forgets, and metadata is content again.

Strip at save time means the stored draft is clean by definition. Nothing downstream has to think about it. The cost is that you lose the raw output, but the raw output has no value here. The routing verdict is not data I need to keep.

I went with strip at save time. The principle is simple: garbage that enters the store becomes everyone's problem. Stop it at the door.
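
A sketch of the save-time choice, assuming drafts are written to plain files: the hypothetical save_draft() calls _clean() exactly once before writing, so a reader such as the hypothetical load_draft() needs no stripping step of its own. The file layout and both function names are illustrative; _clean() is repeated with the same assumed pattern so the example runs on its own.

```python
import re
from pathlib import Path

# Assumed verdict pattern, as in the post: "▸ T" followed by digits at the start of a line.
VERDICT_LINE = re.compile(r"^\s*▸ T\d+")


def _clean(raw: str) -> str:
    kept = [line for line in raw.splitlines() if not VERDICT_LINE.match(line)]
    return "\n".join(kept).strip("\n")


def save_draft(raw_output: str, path: Path) -> str:
    """Hypothetical save step: clean once at the point of entry, then persist."""
    draft = _clean(raw_output)
    path.write_text(draft, encoding="utf-8")
    return draft


def load_draft(path: Path) -> str:
    # Every reader (dashboard, export, later pipeline stage) gets clean text for free.
    return path.read_text(encoding="utf-8")
```

The raw model output is discarded here, which matches the reasoning above: the verdict carries nothing worth keeping, so there is no reason to store it.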

What I would do differently: the root issue is that the orchestrator policy in CLAUDE.md does not distinguish between interactive use and headless claude -p calls. A better design emits the routing verdict only when a human is watching. The right fix is a context signal (an env var or a system prompt addendum) that suppresses non-content output in pipeline mode. _clean() is the right fix in the short term. It is not the right architecture for the long term.
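
One way that context signal could look, sketched in Python under stated assumptions: the generator prepends a pipeline-mode note to the prompt it passes to claude -p and also sets an environment flag. The note's wording, the CONTENT_PIPELINE_HEADLESS variable, and the helper names are all hypothetical. The flag only has an effect if the CLAUDE.md policy is written to check for it, and nothing guarantees the model honors the note, so _clean() would stay in place as a backstop.

```python
import os
import subprocess

# Hypothetical addendum text; not part of any documented CLAUDE.md convention.
HEADLESS_NOTE = (
    "Pipeline mode: you are running headlessly. "
    "Do not emit a routing verdict line; output only the post body."
)


def build_generator_command(prompt: str, headless: bool = True) -> list[str]:
    """Build argv for a `claude -p` call, prepending the pipeline-mode note when headless."""
    full_prompt = f"{HEADLESS_NOTE}\n\n{prompt}" if headless else prompt
    return ["claude", "-p", full_prompt]


def build_generator_env(headless: bool = True) -> dict[str, str]:
    # Hypothetical flag: useful only if the CLAUDE.md policy is told to look for it.
    env = dict(os.environ)
    if headless:
        env["CONTENT_PIPELINE_HEADLESS"] = "1"
    return env


def generate(prompt: str) -> str:
    """Run the generator once and return its raw stdout (still worth passing through _clean())."""
    result = subprocess.run(
        build_generator_command(prompt),
        env=build_generator_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Interactive sessions never go through this path, so they keep the verdict line; only the pipeline opts out.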

The deeper lesson is that any LLM-generated output you pipe into a store will carry artifacts from the instructions you gave the model. Those instructions exist for good reasons, but they do not know their output is headed for a database. You have to stand at the boundary and decide what is data and what is noise, and you have to make that decision exactly once, at the point of entry, not every time something reads the data downstream.

Until the root fix lands, _clean() runs on every save. Every draft that comes out is clean. The dashboard shows content, not metadata. That is the job.
