DEV Community

Evan-dong
Why Your AI Coding Assistant Keeps Overengineering: 4 Rules That Actually Fix It


You ask Claude Code to add input validation. It writes an abstract class, three subclasses, and a factory method. 200 lines. You needed 5.

You ask it to fix a bug. It refactors three adjacent functions, adds type hints everywhere, and switches all single quotes to double quotes. Half the diff has nothing to do with the bug.

This isn't a you problem. It's a fundamental LLM tendency.

Andrej Karpathy — OpenAI co-founder, former Tesla AI lead — catalogued the same frustrations. In January 2026, he posted on X about four recurring failure modes in LLM-assisted coding. Someone turned those observations into a single CLAUDE.md file. Drop it in your project root, and Claude Code follows these rules automatically.

The repo: forrestchang/andrej-karpathy-skills. Currently at 97.8k stars.

Rule 1: Think Before Coding

The most common failure mode: you say "write an export function," and the model silently decides on CSV format, all fields, and overwriting any existing file. You didn't specify any of that.

This rule requires the model to state assumptions explicitly, list multiple interpretations when they exist, and stop when confused rather than guessing.

Before: Silently assumes CSV, all fields, overwrites existing file.

After: "Before I implement this, I want to clarify: export format? Which fields? File handling — overwrite or append?"

One surfaces assumptions in 30 seconds. The other costs you 30 minutes of debugging.
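In code, the rule amounts to pushing silent assumptions into the interface. A minimal sketch (the function name and parameters here are illustrative, not from the repo): every decision the model would otherwise make silently — format, fields, overwrite behavior — becomes an explicit parameter with a loud failure mode.

```python
import csv
from pathlib import Path

def export_records(records, path, fmt="csv", fields=None, overwrite=False):
    """Export records with every assumption surfaced as a parameter."""
    if fmt != "csv":
        raise ValueError(f"unsupported format: {fmt}")
    path = Path(path)
    # File handling is a stated decision, not a silent default.
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} exists; pass overwrite=True to replace it")
    # Field selection defaults to "all", but the caller can narrow it.
    fields = fields or list(records[0].keys())
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
    return path
```

The point isn't this particular signature — it's that each default is visible in the code and cheap to question in review.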

Rule 2: Simplicity First

LLMs have a deep-seated tendency to overengineer. You ask for a discount calculation, you get a Strategy pattern with a factory method and a config file.

The rule: no features beyond what was asked, no abstractions for single-use code, no speculative future-proofing. If 200 lines could be 50, rewrite.

```python
from abc import ABC

# Before: 40 lines of Strategy pattern
class DiscountStrategy(ABC): ...
class PercentageDiscount(DiscountStrategy): ...
class DiscountFactory: ...

# After: what was actually needed
def apply_discount(price, pct):
    return price * (1 - pct / 100)
```

Rule 3: Surgical Changes

You ask it to fix a bug. It fixes the bug, but also adds type hints to adjacent functions, reformats quotes, renames a variable for "clarity." Your diff is 40 lines when it should be 4.

The rule: touch only what you must. Every changed line should trace directly to the user's request. If you spot unrelated dead code, mention it — don't delete it.

```python
from datetime import datetime, timezone

# Before: asked to fix timezone bug, got 4 extra changes
def format_date(dt: datetime) -> str:  # added type hints
    """Format a datetime object for display."""  # added docstring
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # actual fix
    return dt.strftime("%Y-%m-%d %H:%M")  # changed quotes

# After: one line changed, one line in the diff
def format_date(dt):
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.strftime('%Y-%m-%d %H:%M')
```

Rule 4: Goal-Driven Execution

LLMs write code and call it done. No tests, no verification.

This rule reframes vague tasks into verifiable goals: "Add validation" becomes "Write tests for invalid inputs, then make them pass." The model loops until criteria are met instead of stopping at "looks about right."
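A sketch of what that reframing looks like for "add validation" (the function and its rules are illustrative, not from the repo): the checks are written down first and define the finish line, and the implementation isn't "done" until they all pass.

```python
# Goal, stated as checks before writing the implementation:
def check_rejects(value):
    """Return True only if validate_age rejects the value."""
    try:
        validate_age(value)
    except ValueError:
        return True
    return False

# Implementation written to satisfy the checks above.
def validate_age(value):
    """Accept an integer age in [0, 150]; raise ValueError otherwise."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"age must be an integer, got {type(value).__name__}")
    if not 0 <= value <= 150:
        raise ValueError(f"age out of range: {value}")
    return value

# The exit criteria — the model loops until all of these hold:
assert validate_age(30) == 30
assert check_rejects(-1)
assert check_rejects("30")
assert check_rejects(200)
```

"Looks about right" can't fake its way past four assertions.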

Installation

```shell
# Claude Code plugin
/plugin marketplace add forrestchang/andrej-karpathy-skills
/plugin install andrej-karpathy-skills@karpathy-skills

# Or direct download
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md
```

My Take

Of the four, Surgical Changes delivers the biggest day-to-day improvement. Overengineering is obvious — you see 200 lines and know something's wrong. Bad assumptions surface during debugging. But drive-by changes to unrelated code slip through review, especially in large diffs where formatting changes mix with logic changes.

Limitation worth noting: these rules address behavioral tendencies, not capability gaps. If the model doesn't understand your architecture, four rules won't save it.

One file, zero setup, immediate effect. Worth the 30 seconds.


If you're building AI workflows that call multiple models, EvoLink provides a single API gateway to 30+ models — one endpoint, pay-per-use, no vendor lock-in.
