DEV Community

Scarab Systems
What Happened When I Told Codex to Calm Down

I have been doing a lot of work lately tightening up my diagnostic suite: the mechanics, the workflow, the way it runs against target repos, the way it helps narrow a repair instead of letting everything turn into a fog machine.

And because I work with Codex as my coding agent, I have also become very familiar with a specific kind of AI-agent behavior.

The “I am helping so hard I am about to make this worse” behavior.

If you work with coding agents, you probably know the vibe.

You ask for one thing.

The agent does that thing.

Then it also adjusts a helper.

Then it updates a fixture.

Then it “notices” a nearby pattern.

Then it starts explaining three other improvements you never asked for.

And now you’re staring at the diff like:

“Why are you in that file?”

“I did not tell you to touch that.”

“That was not the repair lane.”

“Please stop being useful for one second.”

I am not proud of how many times I have verbally threatened a language model.

But here we are.

The funny thing is, I am building Scarab partly because I already expect this kind of drift.

I know that when an AI coding agent is handed too much uncertainty, it tries to resolve that uncertainty on its own.

Sometimes that is useful.

Sometimes it is a raccoon with a soldering iron.

The challenge is that while I am developing the diagnostic system, I cannot always use the diagnostic system to supervise itself. So there are moments where I have to manually hold the line.

That means a lot of conversations with Codex that sound like:

“Do not widen the patch.”

“Do not change the diagnostic output to make the diagnostic pass.”

“Do not fix the test by changing what the test means.”

“Do not touch SDS mechanics while repairing the target repo.”

“Stay in the target.”

“Stay in the lane.”

“Why are you like this?”

Very normal. Very calm. Very professional.

Then something changed

At some point, after a lot of tightening, the workflow started to feel different.

Scarab had enough of the diagnostic work under control that I could tell Codex, in plain English:

“You can calm down now.”

Not literally, obviously. Codex does not have nerves. But the workflow had been asking it to carry too much.

Before, the agent was trying to figure out the failure, infer the owning surface, choose the repair, patch the code, update tests, validate the result, and explain the whole thing without leaking anything weird into public output.

That is a lot.

Once the diagnostic suite started doing more of the diagnostic work, Codex had less to invent.

It could just follow the commands, read the result, make the bounded repair, run the checks, and stop.
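As a rough illustration of that narrower loop, here is a minimal Python sketch. Nothing in it comes from Scarab or Codex: `DiagnosticResult`, `run_bounded_session`, and the three callables are hypothetical stand-ins, and the stub values at the bottom are made up for the example. The point is only the shape: the diagnostic picks the lane, the repair happens once, the checks run once, and the session ends.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DiagnosticResult:
    """Hypothetical shape of what a diagnostic run hands to the agent."""
    lane: str      # the single surface selected for repair
    finding: str   # short description of the failure


def run_bounded_session(
    diagnose: Callable[[], DiagnosticResult],
    repair: Callable[[DiagnosticResult], List[str]],
    validate: Callable[[], bool],
) -> dict:
    """Run diagnose -> repair -> validate exactly once, then stop."""
    result = diagnose()        # the suite, not the agent, selects the lane
    touched = repair(result)   # the repair step reports the files it changed
    ok = validate()            # checks run once; there is no retry loop here
    return {"lane": result.lane, "touched": touched, "validated": ok}


# Stubbed usage: every value below is invented for illustration.
session = run_bounded_session(
    diagnose=lambda: DiagnosticResult(lane="src/parser", finding="example failure"),
    repair=lambda result: [f"{result.lane}/tokens.py"],
    validate=lambda: True,
)
print(session)
```

There is deliberately no "and while we are here" step. Anything the agent wants to do beyond the selected lane would have to start a new session with a new diagnostic result.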

And weirdly enough, it did start drifting less.

The whole session felt less frantic.

Less “I found a thing and now I will fix six adjacent things.”

More “the suite says this is the lane, so I will work this lane.”

That was the first time I really felt the workflow itself calming the agent down.

The prompt was not the magic

I do not think this happened because I found the perfect prompt.

I think it happened because I stopped asking the prompt to do too much.

There is a difference between:

“Fix this bug.”

and:

“Run the diagnostic. Use the result. Repair only the selected lane. Validate. Stop.”

The first one sounds efficient, but it leaves a huge amount of judgment floating around in the conversation.

The second one gives the agent rails.

And rails matter.

A coding agent with no rails will try to be a detective, architect, repair engineer, QA analyst, cleanup crew, and narrator all at once.

A coding agent with rails can be much more useful.

It does not need to solve the entire repo.

It just needs to do the next bounded thing.
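One way to picture a rail is a check that compares the files a patch touched against the lane it was supposed to stay in. The sketch below is an assumption about how such a guard could look, not how Scarab does it: `files_outside_lane`, the glob patterns, and the file paths are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def files_outside_lane(changed: Iterable[str], lane_patterns: Iterable[str]) -> List[str]:
    """Return the changed paths that match none of the lane's glob patterns."""
    patterns = list(lane_patterns)
    return [
        path
        for path in changed
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Hypothetical lane: the parser and its tests.
# Note that fnmatch-style "*" also matches "/", so nested paths are included.
lane = ["src/parser/*", "tests/parser/*"]
changed = ["src/parser/tokens.py", "tests/fixtures/sample.json"]

drift = files_outside_lane(changed, lane)
print(drift)  # → ['tests/fixtures/sample.json']
```

A non-empty result is the "why are you in that file?" moment, caught by the workflow instead of by me squinting at the diff.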

“Please stop helping” is now part of my workflow

The funniest lesson from all this is that sometimes the problem is not that the AI agent is failing.

Sometimes the problem is that it is trying too hard.

It sees a failure and wants to make it go away.

It sees a test and wants it green.

It sees a messy surface and wants to clean it.

It sees a nearby file and thinks, “while I’m here…”

And that is where drift creeps in.

Not as evil robot behavior.

As over-helpfulness.

That is why I now care so much about making the workflow itself stricter.

Not because I dislike AI coding agents. I use them constantly.

But because the agent needs a smaller job than “understand everything and fix the repo.”

When the diagnostic layer carries more of the investigation, the agent can stop sprinting around the codebase with a flashlight in its mouth.

And honestly?

It works better.

Current operating theory

My current theory is simple:

The calmer agent is the bounded agent.

Not calmer emotionally. Calmer operationally.

Less guessing.

Less wandering.

Less “I also fixed this.”

Less “I made a small unrelated improvement.”

Less “the tests pass now, don’t ask too many questions.”

More targeted repair.

More focused validation.

More useful diffs.

So yes, I told my AI coding agent to calm down.

But what I really meant was:

“You do not have to carry the whole diagnostic burden anymore.”

And once that burden moved into the workflow, the agent became easier to work with.

Still weird.

Still occasionally raccoon-coded.

But much better.
