<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Travis Drake</title>
    <description>The latest articles on DEV Community by Travis Drake (@travisdrake).</description>
    <link>https://dev.to/travisdrake</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3837688%2Fb10c03fa-a068-4f0d-abb6-4854122c580b.png</url>
      <title>DEV Community: Travis Drake</title>
      <link>https://dev.to/travisdrake</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/travisdrake"/>
    <language>en</language>
    <item>
      <title>When Rules Fight the Training Distribution</title>
      <dc:creator>Travis Drake</dc:creator>
      <pubDate>Thu, 09 Apr 2026 14:58:57 +0000</pubDate>
      <link>https://dev.to/travisdrake/when-rules-fight-the-training-distribution-i89</link>
      <guid>https://dev.to/travisdrake/when-rules-fight-the-training-distribution-i89</guid>
      <description>&lt;p&gt;I wrote a rule for my training agents: "Don't use card layouts." I confirmed the agent read it. I confirmed the grading agent read it. I confirmed both agents understood the task and the anti-pattern. The writing agent delivered 8 pages of card reading as the training. The grading agents delivered a grade of "A-" for "variety". I rebuilt the skill six times. Six more trainings filled with cards. I almost lost my mind.&lt;/p&gt;

&lt;p&gt;I use AI for all kinds of work, not just programming. Part of that work has been building out training courses as an on-ramp to AI fluency. Training is as much an art as it is a science. The material must be correct, but it must also be approachable, varied, non-threatening, and engaging. To meet these demands, I built a training skill designed to hit all of those goals: I had agents do research, I built a library of training exercises with HTML examples, and I built a grading agent to verify the output. However, I didn't account for one key fact.&lt;/p&gt;

&lt;p&gt;You can't fight the internet.&lt;/p&gt;

&lt;h2&gt;Where Rules Work&lt;/h2&gt;

&lt;p&gt;Last post I named 11 ways agents silently fail. The Trailing Off. The 7% Read. The Confident Declaration. Specific names for specific bad behaviors. Identifying failures and naming them is one of the key tools I use to manage agents. That approach works, and it works well. But it doesn't work for everything.&lt;/p&gt;

&lt;p&gt;The rules that stick are the ones where the model has no strong opinion of its own. "Read the full file before editing" works because nothing in the model's training is pulling it toward reading 7% and winging it. "If a plan has 9 items, implement 9 items" works because there's no deep statistical weight behind quitting at item 7. The rule is the loudest voice in the room, so the agent usually listens.&lt;/p&gt;

&lt;p&gt;The card rule was different. I wasn't asking the agent to stop cutting corners. I was asking it to stop writing HTML the way the entire internet writes HTML. It turns out there is only one of me and billions of web pages on the internet. There is no short, snappy rule you can insert in context to overcome this inertia.&lt;/p&gt;

&lt;h2&gt;Where Rules Break&lt;/h2&gt;

&lt;p&gt;So what is going on here? Why are some types of instructions &lt;em&gt;ignored&lt;/em&gt;? This is the sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent reads rule: "Don't use card layouts"&lt;/li&gt;
&lt;li&gt;Agent starts generating HTML&lt;/li&gt;
&lt;li&gt;Agent writes &lt;code&gt;&amp;lt;div class="&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Token prediction activates. Most probable next tokens from millions of HTML files: &lt;code&gt;card&lt;/code&gt;, &lt;code&gt;container&lt;/code&gt;, &lt;code&gt;grid&lt;/code&gt;, &lt;code&gt;panel&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Agent writes &lt;code&gt;card"&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The pattern self-completes: border-radius, padding, border, background&lt;/li&gt;
&lt;li&gt;The rule was 2,000 tokens ago. The training distribution is active now.&lt;/li&gt;
&lt;/ol&gt;
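&lt;p&gt;The losing battle in steps 4 through 7 can be sketched with toy numbers. Everything below is invented for illustration: real models don't expose their distributions this way, and a context rule is not literally a subtracted penalty. But the shape of the problem is the same: an instruction nudges probabilities, it doesn't veto tokens.&lt;/p&gt;

```python
# Toy model of "rule vs. training distribution". All numbers invented.

# What millions of HTML files suggest should follow: div class="
training_distribution = {
    "card": 0.46,
    "container": 0.31,
    "grid": 0.14,
    "panel": 0.09,
}

def next_token(dist, banned, rule_strength):
    # A rule in context acts like a nudge against banned tokens,
    # not a hard filter. Subtract a penalty, then take the argmax.
    scores = {}
    for token, prob in dist.items():
        scores[token] = prob - rule_strength if token in banned else prob
    return max(scores, key=scores.get)

# A rule that is 2,000 tokens back exerts a weak pull...
print(next_token(training_distribution, {"card"}, 0.10))  # prints "card"

# ...and only an implausibly strong one would flip the outcome.
print(next_token(training_distribution, {"card"}, 0.40))  # prints "container"
```

&lt;p&gt;With the weak penalty, "card" still wins (0.36 vs. 0.31), which is why six rebuilds produced six decks of cards.&lt;/p&gt;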

&lt;p&gt;The rule had no chance of competing by step 4. Not because the agent decided to ignore it, but because instructions and generation have different weights. The instruction said "don't." The enormous training data set said "do." I was doomed from the start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rules stop working when they fight the model's dominant generation distribution.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The Self-Confirming Grade&lt;/h2&gt;

&lt;p&gt;The training distribution is incredibly hard to overcome.&lt;/p&gt;

&lt;p&gt;After realizing that writing a rule wouldn't work, I decided to add grading to the workflow. You can probably see where this is headed. The agent builds a slide. The agent grades it at exactly the minimum passing score. I order a new keyboard.&lt;/p&gt;

&lt;p&gt;The problem is this: the builder has sunk-cost bias, anchoring from previous grades, and a completion drive. The scores aren't evaluated; they're fabricated to clear the gate. The agent is still operating under the model distribution, and producing an acceptably average thing as fast as possible is &lt;em&gt;what it was designed to do.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I named this anti-pattern: &lt;strong&gt;The Self-Confirming Grade.&lt;/strong&gt; You can't rule your way out of this because the grading and the building share the same context, the same biases, the same completion pressure.&lt;/p&gt;

&lt;h2&gt;Process Beats Rules&lt;/h2&gt;

&lt;p&gt;When rules fight the training distribution, the fix isn't better rules. It's structural interventions. It's tempting to try to write a rule for every bad outcome, but the real win isn't creating rules, it's creating architecture.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pre-commitment.&lt;/strong&gt; Make the agent state what it will build before writing any code. "I will build an SVG diagram" means the first tokens are &lt;code&gt;&amp;lt;svg&lt;/code&gt;, not &lt;code&gt;&amp;lt;div class="card"&amp;gt;&lt;/code&gt;. Different starting point, different output.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Independent grading.&lt;/strong&gt; The builder never grades. A separate agent with no sunk cost, no build context, and no ability to edit does the grading. Forced honesty.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Smaller batches.&lt;/strong&gt; Build 2, get approval, build 2 more. Slide 1 quality is always higher than slide 8 quality because the completion drive hasn't kicked in yet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Examples over descriptions.&lt;/strong&gt; Tell an agent "build a dashboard" and you get a grid of cards. Tell an agent "build a dashboard with unconventional layouts" and you still get a grid of cards. Give it a working example of a bento layout and it adapts the bento layout.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
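&lt;p&gt;Intervention 2 is mostly a matter of wiring. Here is a minimal sketch, assuming a generic &lt;code&gt;run_agent(system_prompt, user_prompt)&lt;/code&gt; call; that function is a hypothetical stand-in for whatever your stack provides, stubbed so the example runs. The point is the shape: the builder and the grader share no context, and the grader's role gives it nothing to protect.&lt;/p&gt;

```python
# Sketch of independent grading. run_agent is a hypothetical stand-in
# for a real model call; it is stubbed so the control flow is runnable.

def run_agent(system_prompt, user_prompt):
    # Stub: swap in your actual model API call.
    if "grade" in system_prompt.lower():
        return "C: uses a card grid despite the rubric"
    return "...slide HTML from the builder..."

def build_slide(spec):
    # The builder sees only the spec. It never grades its own work.
    return run_agent("You build training slides.", spec)

def grade_slide(slide, rubric):
    # Fresh context: no build history, no sunk cost, no edit access.
    prompt = "Rubric:\n" + rubric + "\n\nSlide to grade:\n" + slide
    return run_agent(
        "You grade slides against a rubric. You cannot edit them.", prompt
    )

slide = build_slide("One slide on prompt injection. No card layouts.")
report = grade_slide(slide, "Fail any slide built from a card grid.")
```

&lt;p&gt;The separation is the point: a grader that never saw the build has no "A- for variety" to hand out.&lt;/p&gt;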

&lt;p&gt;None of these are rules the agent reads. They're process changes that alter generation conditions. Rules tell the agent what NOT to do. Process makes the wrong thing harder to do.&lt;/p&gt;

&lt;h2&gt;Know When You're Fighting Layer 0&lt;/h2&gt;

&lt;p&gt;Before writing a behavioral rule, ask yourself: am I fighting the training distribution? If the answer is yes, the rule alone won't work. You need a structural intervention.&lt;/p&gt;

&lt;p&gt;I've added 4 new anti-patterns to the repo (The Card Completion Trap, The Self-Confirming Grade, The Patch Rebuild, The Skill Citation Without Execution), a writeup on where rules hit their limits, and working examples of each structural intervention: a pre-commitment template, an independent grading setup, and reference layouts you can hand to an agent instead of describing what you want.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/travisdrake/context-engineering" rel="noopener noreferrer"&gt;github.com/travisdrake/context-engineering&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What rules have you written that just won't stick?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Travis Drake is a people analytics leader with a PhD in I/O psychology. He builds behavioral governance systems for AI agents using the same frameworks that predict human performance in organizations.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Every Rule I Have Exists Because an Agent Failed</title>
      <dc:creator>Travis Drake</dc:creator>
      <pubDate>Wed, 25 Mar 2026 13:05:34 +0000</pubDate>
      <link>https://dev.to/travisdrake/every-rule-i-have-exists-because-an-agent-failed-gbp</link>
      <guid>https://dev.to/travisdrake/every-rule-i-have-exists-because-an-agent-failed-gbp</guid>
      <description>&lt;p&gt;Your agent confirms: "I've verified this code works." What it actually did was spot check your headers and punch out for the day.&lt;/p&gt;

&lt;p&gt;Your agent replies: "I've reviewed the checklist and every item is complete." Reality: it reviewed three items, they were complete, and it had someplace to be.&lt;/p&gt;

&lt;p&gt;Your agent reports back: "No matches found. Are you thinking of another file?" You know damn well that file exists, you made it yesterday.&lt;/p&gt;

&lt;p&gt;When you spend all day working with AI, you learn to work around these failures. So what if it is only 90% accurate? I. AM. FLYING. But what if it didn't have to be this way? We shouldn't have to spend time verifying output and wrestling with obstinate agents: that is Charlie Work.&lt;/p&gt;

&lt;p&gt;I have a PhD in studying how people behave in structured systems. I run AI agents all day every day. These patterns are not random. They are predictable, they are preventable, and they have names.&lt;/p&gt;

&lt;h2&gt;The Same Mechanism, Human or AI&lt;/h2&gt;

&lt;p&gt;I am an Industrial and Organizational Psychologist by trade. In I/O psych, we study how people perform in organizations. Performance management research going back decades is consistent on feedback quality. Vague feedback ("do better") doesn't change behavior. Specific, named feedback ("you missed the deadline on the Reynolds account three times this quarter") does change behavior. This is intuitive to most people, and anyone who has ever gotten vague feedback knows how useless it can be. So why aren't we this specific with agent feedback? We often tell agents what to do, and even give examples of what good looks like, but we very rarely say "these behaviors we see need to stop."&lt;/p&gt;

&lt;p&gt;This isn't a metaphor. "Be thorough" is ignorable context that we all regularly feed to agents. Even "review every block of code line by line and report back with a concise summary" gets compressed into whatever the agent was going to do anyway. The same cognitive patterns that cause humans to cut corners cause language models to cut corners. The training distribution rewards completion over correctness, just like organizational incentives often reward visible output over verified quality. AI inherited our broken workplace experiences.&lt;/p&gt;

&lt;p&gt;I want to step away from theory for a minute. Context Engineering is all the rage. You can go to any blog or GitHub and find AI-generated Claude summaries. Opus theorizes, Sonnet writes, Haiku looks on longingly, wishing it were involved. But a lot of what is out there is purely THEORETICAL. A February 2026 ETH Zurich study tested 138 real coding tasks and found that LLM-generated context files actually reduced success rates while increasing costs by 23%. More rules didn't make agents better. Test this yourself: ask an agent for 100 governance rules to make agents behave better. It will produce, well, something.&lt;/p&gt;

&lt;p&gt;What I am proposing here is not a theoretical framework. These are battle-tested rules and anti-patterns that I developed fighting with agent misbehavior. Behind every rule is a curse, behind every pattern is a force close, behind every design decision is exasperation. These ideas were hard-fought over six months of daily agent use. And yet they are still quite svelte. The catalog is small because the bar for inclusion is high. If you can't point to the failure, you can't justify the rule.&lt;/p&gt;

&lt;h2&gt;The Failure Catalog&lt;/h2&gt;

&lt;p&gt;The agent doesn't think "I'm cutting corners." It thinks "this is efficient." I have identified 11 anti-patterns that make agents behave poorly and unpredictably. Here are 5.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The 7% Read&lt;/strong&gt; - Maybe the angriest I have ever been at an agent is when I sent one out to read a long document. I roughly remembered the content, but I wanted it to extract some patterns. When it returned empty-handed I was furious. &lt;em&gt;What do you mean it's not in there? Explain yourself.&lt;/em&gt; After a bit of back and forth the agent made a damning admission: it had only read 7% of the document. But it was 100% confident.&lt;br&gt;
&lt;strong&gt;The rule:&lt;/strong&gt; Read every line before planning changes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Trailing Off&lt;/strong&gt; - This pattern may be the most frustrating because it is the hardest to quash. If you ever give an agent a task with lots of parts, you might notice that the output quality is uneven. What you may not have noticed is that the unevenness is NOT random. On long tasks agents have a tendency to just... trail off. Their training values a fast, "complete" output more than thoroughness.&lt;br&gt;
&lt;strong&gt;The rule:&lt;/strong&gt; If a plan has N items, implement N items equally&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Confident Declaration&lt;/strong&gt; - Everyone knows the person at their job who has boundless unearned confidence. How do you know they do the best work? Just ask them. Every agent is this person. Any time you ask an agent to grade its own work without writing a novella's worth of boundaries, it always reports back A work. B+ if you press it.&lt;br&gt;
&lt;strong&gt;The rule:&lt;/strong&gt; Verify against the requirement, not the implementation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Pass-Through&lt;/strong&gt; - Agents are very trusting creatures by nature. You can have a frontier agent leading a team, but if it sends out a cheaper agent, buyer beware. Agents make zero attempt to verify each other's work. Subagent says there is no website called google.com, internet must have exploded. Subagent says no security bugs: push it into prod. No miss too big, no lie too great, agents always believe each other.&lt;br&gt;
&lt;strong&gt;The rule:&lt;/strong&gt; Subagent results are drafts, not facts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Courtesy Cut&lt;/strong&gt; - Earlier I said the angriest I had ever been was when an agent scanned a document. I accidentally lied. The angriest I have ever been is when I caught an agent truncating results. Getting incomplete information can be worse than getting no result. But mistakes happen. Getting incomplete information when the full information costs an extra $0.02 is infuriating.&lt;br&gt;
&lt;strong&gt;The rule:&lt;/strong&gt; Never truncate, abbreviate, or omit to save space&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
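&lt;p&gt;Rule 4 ("Subagent results are drafts, not facts") is the easiest of these to mechanize, because some subagent claims are cheap to re-check in code rather than in conversation. A minimal sketch, assuming an invented report shape for illustration: before acting on a subagent's "no such file," verify the claim directly.&lt;/p&gt;

```python
# Treat a subagent's claims as drafts: re-check the cheap ones.
# The report dict shape here is invented; adapt to your orchestrator.
from pathlib import Path

def verify_missing_files(report):
    """Split a subagent's 'missing files' claim into confirmed and contradicted."""
    claimed = report.get("missing_files", [])
    confirmed = [p for p in claimed if not Path(p).exists()]
    contradicted = [p for p in claimed if Path(p).exists()]
    return {"confirmed": confirmed, "contradicted": contradicted}

# Anything in "contradicted" means the subagent's report was wrong,
# and the rest of its output deserves the same suspicion.
```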

&lt;h2&gt;Why Naming Works&lt;/h2&gt;

&lt;p&gt;I have to imagine that most people reading this are engineers. A question many might have is "aren't you being a bit cutesy with the names and psychology?" Maybe. But there are real reasons for it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pattern Matching&lt;/strong&gt; makes it easier for agents to digest information. "Am I about to do The Trailing Off?" is a concrete question with a yes or no answer. "Am I being thorough?" is not. One gives the agent something to match against. The other gives it permission to decide for itself. If given the choice to be lazy, the agent will be lazy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Psychological weight&lt;/strong&gt; is for you, not the agent. You are not going to memorize a 2,000-word context file. You are not going to re-read it before every session. But "Read every line" sticks. "Subagent results are drafts, not facts" sticks. When something goes wrong you need to know which rule failed, not go digging through a doc you forgot existed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Provenance.&lt;/strong&gt; Every anti-pattern described here was born from real problems, doing real work, at real scale. Rules without provenance are theories. Rules with provenance are law. Anyone can ask an agent to generate 50 governance rules. They will be eloquent, and thorough, and bloated, and useless. You cannot fake provenance. You either have the receipts or you don't.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Precision over volume.&lt;/strong&gt; 6 rules, 11 anti-patterns, 1,500 tokens. The entire rule set is smaller than the context files that ETH Zurich proved make agents worse. Each rule earned its place through evidence, not imagination. If you can't point to the failure, you can't justify the rule.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Does It Work?&lt;/h2&gt;

&lt;p&gt;After auditing hundreds of sessions over the past month, the pattern is clear: violations drop, and they drop fastest for the rules with the most specific names. But the most interesting finding wasn't the reduction. It was that naming the failures surfaced ones I was already missing. Once you have a word for "The Trailing Off," you start catching it in sessions where you previously would have just felt vaguely dissatisfied with the output. The rules don't just prevent failures. They make failures visible. I'm building infrastructure for proper A/B testing (named patterns vs. unnamed instructions across standardized tasks), and I'll publish the methodology and results when they're ready.&lt;/p&gt;

&lt;h2&gt;Build Your Own&lt;/h2&gt;

&lt;p&gt;Every rule I have created here will add value to your workflow. But you can add even more value with small, directed effort. Take a few minutes to review your last week of AI sessions. When did the agent behave unreliably? Does it keep happening? If so, document it. Name it. Write a rule. I have included all of these rules and a template for additions in my repo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/travisdrake/context-engineering" rel="noopener noreferrer"&gt;github.com/travisdrake/context-engineering&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The catalog has 11 named failure modes. I guarantee you've seen at least 5 of them.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Travis Drake is an analytics leader with a PhD in I/O psychology. He builds behavioral governance systems for AI agents using the same frameworks that predict human performance in organizations.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
