Gábor Mészáros

Posted on • Originally published at Medium

Instruction Best Practices: Precision Beats Clarity

Two rules in the same file. Both say "don't mock."

When working with external services, avoid using mock objects in tests.

When writing tests for src/payments/, do not use unittest.mock.

Same intent. Same file. Same model. One gets followed. One gets ignored.

I stared at the diff for a while, convinced something was broken. The model loaded the file. It read both rules. It followed one and walked past the other like it wasn't there.

Nothing was broken. The words were wrong.

The experiment

I ran controlled behavioral experiments: same model, same context window, same position in the file. One variable changed at a time. Over a thousand runs per finding, with statistically significant differences between conditions.

Two findings stood out.

First (and the one that surprised me most): when instructions have a conditional scope ("When doing X..."), precision matters enormously. A broad scope is worse than a wrong scope.

Second: instructions that name the exact construct get followed roughly 10 times more often than instructions that describe the category. "unittest.mock" vs "mock objects" — same rule, same meaning to a human. Not the same to the model.

Scope it or drop it

Most instructions I see in the wild look like this:

When working with external services, do not use unittest.mock.

That "When working with external services" is the scope — it tells the agent when to apply the rule. Scopes are useful. But the wording matters more than you'd expect.

I tested four scope wordings for the same instruction:

# Exact scope — best compliance
When writing tests for src/payments/, do not use unittest.mock.

# Universal scope — nearly as good
When writing tests, do not use unittest.mock.

# Wrong domain — degraded
When working with databases, do not use unittest.mock.

# Broad category — worst compliance
When working with external services, do not use unittest.mock.

Read that ranking again. Broad is worse than wrong.

"When working with databases" has nothing to do with the test at hand. But it gives the agent something concrete - a specific domain to anchor on. The instruction is scoped to the wrong context, but it's still a clear, greppable constraint.

"When working with external services" is technically correct. It even sounds more helpful. But it activates a cloud of associations - HTTP clients, API wrappers, service meshes, authentication, retries - and the instruction gets lost in the noise.

The rule: if your scope wouldn't work as a grep pattern, rewrite it or drop it.
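That grep-pattern test can itself be approximated in code. A rough heuristic of my own (a sketch, not tooling from the experiments): a scope anchors if it contains a path, a glob, or a dotted name, rather than only category words. Note that universal scopes like "When writing tests" won't pass this check but are fine per the ranking above; the heuristic targets the broad-category failure mode.

```python
import re

def is_greppable(scope: str) -> bool:
    """Rough heuristic: a scope 'works as a grep pattern' if it
    contains a concrete token -- a path separator, a glob, or a
    dotted name like unittest.mock -- rather than only plain
    category words."""
    return re.search(r"[/\\*]|\w+\.\w+", scope) is not None

print(is_greppable("When writing tests for src/payments/"))  # True: concrete path
print(is_greppable("When working with external services"))   # False: category words
```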

An unconditional instruction beats a badly scoped conditional:

# Broad scope — fights itself
When working with external services, prefer real implementations
over mock objects in your test suite.

# No scope — just say it
Do not use unittest.mock.

The second version is blunter. It's also more effective. Universal scopes ("When writing tests") cost almost nothing — they frame the context without introducing noise. But broad category scopes actively hurt.

Name the thing

Here's what the difference looks like across domains.

# Describes the category — low compliance
Avoid using mock objects in tests.

# Names the construct — high compliance
Do not use unittest.mock.

# Category
Handle errors properly in API calls.

# Construct
Wrap calls to stripe.Customer.create() in try/except StripeError.

# Category
Don't use unsafe string formatting.

# Construct
Do not use f-strings in SQL queries. Use parameterized queries
with cursor.execute().

# Category
Avoid storing secrets in code.

# Construct
Do not hardcode values in os.environ[]. Read from .env
via python-dotenv.
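To see a construct-level rule land in actual code: here is the parameterized-query rule from above applied end to end. A minimal sqlite3 sketch; the same `cursor.execute()` pattern applies to any DB-API driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cur.execute("INSERT INTO users VALUES (1, 'ada')")

user_input = "ada"  # imagine this came from a request

# The construct-level rule in practice: a parameterized query.
# The category-level sin would be f"... WHERE name = '{user_input}'".
cur.execute("SELECT id FROM users WHERE name = ?", (user_input,))
print(cur.fetchone())  # (1,)
```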

The pattern: if the agent could tab-complete it, use that form. If it's something you'd type into an import statement, a grep, or a stack trace - that's the word the agent needs.

Category names feel clearer to us humans. "Mock objects" is plain English. But the model matches against what it would actually generate, not against what the words mean in English. "unittest.mock" matches the tokens the model would produce when writing test code. "Mock objects" matches everything and nothing.

Think of it like search. A query for unittest.mock returns one result. A query for "mocking libraries" returns a thousand. The agent faces the same problem: a vague instruction activates too many associations, and the signal drowns.

The compound effect

When both parts of the instruction are vague - vague scope, vague body - the failures compound. When both are precise, the gains compound.

# Before — vague everywhere
When working with external services, prefer using real implementations
over mock objects in your test suite.

# After — precise everywhere
When writing tests for `src/payments/`:
Do not import `unittest.mock`.
Use the sandbox client from `tests/fixtures/stripe.py`.
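What might that sandbox client look like? The article only names `tests/fixtures/stripe.py`; the sketch below is hypothetical, invented to show the shape such a fixture could take (the class name, method, and test-key value are all illustrative, not the author's code):

```python
# Hypothetical sketch of tests/fixtures/stripe.py: a thin client
# pointed at Stripe's test mode instead of a unittest.mock stand-in.
# All names and fields here are invented for illustration.
class SandboxStripeClient:
    def __init__(self, api_key: str = "sk_test_placeholder"):
        self.api_key = api_key  # a test-mode key, never a live one

    def create_customer(self, email: str) -> dict:
        # A real fixture would call the test-mode API; this stub
        # only mirrors the response shape the tests would assert on.
        return {"object": "customer", "email": email, "livemode": False}

client = SandboxStripeClient()
print(client.create_customer("dev@example.com")["livemode"])  # False
```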

Same intent. The rewrite takes ten seconds. The difference is not incremental, it's categorical.

Formatting gets the instruction read - headers, code blocks, hierarchy make it scannable. Precision gets the instruction followed - exact constructs and tight scopes make it actionable. They work together. A well-formatted vague instruction still gets ignored. A precise instruction buried in a wall of text still gets missed. You need both.

When to adopt this

This matters most when:

  • Your instruction files mention categories more than constructs: "services," "libraries," "objects," "errors," and so on
  • You use broad conditional scopes: "when working with...," "for external...," "in general..."
  • You have rules that are loaded and read but not followed
  • You want to squeeze more compliance out of existing instructions without restructuring the file

It matters less when your instructions are already construct-level ("do not call eval()") or unconditional.

Try it

  1. Open your instruction files.
  2. Find every instruction that uses a category word: "services," "objects," "libraries," "errors," "dependencies."
  3. Replace it with the construct the agent would encounter at runtime - the import path, the class name, the file glob, the CLI flag.
  4. For conditional instructions: replace broad scopes with exact paths or file patterns. If you can't be exact, drop the condition entirely - unconditional is better than vague.

Then run your agent on the same task that was failing. You'll see the difference.
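Step 2 lends itself to a quick automated pass. A sketch, assuming your rules live in plain text; the category-word list is just a starting point, so extend it for your own files:

```python
import re

# Starting list of vague category words; extend for your own files.
CATEGORY_WORDS = {"services", "objects", "libraries", "errors", "dependencies"}

def flag_vague_rules(text: str) -> list[str]:
    """Return instruction lines that lean on category words."""
    flagged = []
    for line in text.splitlines():
        words = set(re.findall(r"[a-z]+", line.lower()))
        if words & CATEGORY_WORDS:
            flagged.append(line)
    return flagged

rules = """\
When working with external services, avoid mock objects in tests.
When writing tests for src/payments/, do not use unittest.mock.
"""
print(flag_vague_rules(rules))
```

Only the first rule gets flagged: "services" and "objects" are category words, while "unittest.mock" and "src/payments/" are constructs.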

Formatting is the signal. Precision is the target.
