WonderLab

Posted on May 22

One Open Source Project a Day (No. 72): Andrej Karpathy Skills — Fix Four Chronic LLM Coding Problems With a Single CLAUDE.md

#opensource #ai #claude #karpathy

Introduction

"LLMs excel at looping until they meet specific goals — so provide success criteria rather than imperative instructions."

This is article No. 72 in the "One Open Source Project a Day" series. Today we are exploring andrej-karpathy-skills.

This project is unusual: its core is not a tool, framework, or library — it is a single CLAUDE.md file.

The story starts with Andrej Karpathy posting on X after heavy Claude Code usage, documenting the failure patterns he observed in LLM coding: diving into implementation without clarification, engineering simple problems into complex solutions, and making unrequested changes to adjacent code.

The multica-ai team distilled those observations into four actionable behavioral principles and packaged them into a CLAUDE.md — drop it in a project, and Claude Code changes how it behaves. It also ships in Claude Code plugin and Cursor rules formats, covering the two main AI coding tools.

The project rests on a frequently overlooked idea: rather than teaching an LLM exactly what to do, teach it how to think.

What You Will Learn

The four LLM coding failure modes Karpathy identified
The content and real-world examples behind each of the four principles
Three installation methods: standalone CLAUDE.md / Claude Code plugin / Cursor rules
Why "give success criteria" is more effective than "give step-by-step instructions"
How to verify the guidelines are actually working

Prerequisites

Familiarity with Claude Code, Cursor, or similar AI coding tools
Enough hands-on coding experience to recognize LLM coding pain points

Project Background

Project Introduction

andrej-karpathy-skills is fundamentally a behavioral configuration file. Its design philosophy stems from one key insight:

LLM coding problems are often not about capability — they are about unconstrained behavior.

The model is capable of writing simple code, but nothing tells it "don't write complex code." It is capable of asking for clarification first, but there is no pressure making it do so. It knows it shouldn't touch unrelated code, but the habit of "while I'm at it" is hard to suppress.

This CLAUDE.md file makes those constraints explicit, injecting them into every conversation as context.
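As a rough mental model only, and not a description of how Claude Code is actually implemented, "injecting constraints as context" amounts to placing the file's text ahead of whatever the user asks. The function name `build_prompt` and the section labels below are hypothetical.

```python
from pathlib import Path


def build_prompt(project_dir: Path, user_message: str) -> str:
    """Prepend the project's CLAUDE.md (if any) to the user's message.

    Hypothetical illustration; real tools assemble context in their own way.
    """
    guidelines_path = project_dir / "CLAUDE.md"
    if not guidelines_path.is_file():
        # No guidelines file: the model sees only the request.
        return user_message
    guidelines = guidelines_path.read_text(encoding="utf-8").strip()
    return f"# Project guidelines\n{guidelines}\n\n# Request\n{user_message}"
```

The point of the sketch is that the guidelines travel with every request, so the model never has to be reminded of them by hand.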

The project ships in three formats to cover different workflows and tools.

Author/Team

Maintainer: multica-ai (Multica AI team)
Inspiration: Andrej Karpathy's observations shared on X about LLM coding usage
Original author: The CLAUDE.md content was originally compiled by forrestchang; multica-ai extended it into a full plugin ecosystem

About Andrej Karpathy: Co-founder of OpenAI, former Tesla AI Director, now independent researcher. Known for nanoGPT, Neural Networks: Zero to Hero, and other educational projects widely followed in the AI community. His practical feedback on AI tools carries significant weight.

Project Stats

📄 Core file: CLAUDE.md (behavioral guidelines)
🔌 Claude Code plugin + Cursor rules
📖 Includes: EXAMPLES.md (contrast examples for each principle)
📄 License: MIT
🌐 Repository: multica-ai/andrej-karpathy-skills

Main Features

Core Utility

This CLAUDE.md directly targets the four most common LLM coding failure modes:

Common LLM coding failures
  ├── Silent assumptions (dive into code without clarifying)
  ├── Over-engineering (turn simple problems into complex ones)
  ├── Scope creep (touch unrelated code while "in the area")
  └── Vague execution (no verifiable definition of done)
        ↓ CLAUDE.md injects four behavioral principles
  Changed behavior
  ├── Think Before Coding
  ├── Simplicity First
  ├── Surgical Changes
  └── Goal-Driven Execution

Quick Start

Method 1: Claude Code plugin (recommended, global)

/plugin marketplace add forrestchang/andrej-karpathy-skills
/plugin install andrej-karpathy-skills@karpathy-skills

Applies to all projects after install — no per-project setup needed.

Method 2: Per-project CLAUDE.md

cd your-project
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

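If your project already has a CLAUDE.md, do not run the command as-is, because curl -o would overwrite it. Download the file under a different name and merge the content manually.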
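A minimal sketch of such a merge, assuming the guidelines were downloaded to a separate file rather than over the existing CLAUDE.md. The marker comment and both function names are hypothetical and not part of the project; the marker only exists so that running the merge twice does not duplicate the text.

```python
from pathlib import Path

# Hypothetical marker, not part of the project; it makes the merge idempotent.
MARKER = "<!-- karpathy-guidelines -->"


def merge_guidelines(existing: str, guidelines: str) -> str:
    """Append guidelines to existing CLAUDE.md text, at most once."""
    if MARKER in existing:
        # Already merged on a previous run; leave the text untouched.
        return existing
    return f"{existing.rstrip()}\n\n{MARKER}\n{guidelines.strip()}\n"


def merge_files(target: Path, downloaded: Path) -> None:
    """Merge the downloaded guidelines file into the project's CLAUDE.md."""
    merged = merge_guidelines(
        target.read_text(encoding="utf-8"),
        downloaded.read_text(encoding="utf-8"),
    )
    target.write_text(merged, encoding="utf-8")
```

Appending is the simplest strategy; if your existing file already covers some of the same ground, a manual read-through is still worthwhile.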

Method 3: Cursor rules

The repo ships .cursor/rules/karpathy-guidelines.mdc with alwaysApply: true. Copy it into your project:

mkdir -p .cursor/rules
cp path/to/karpathy-guidelines.mdc .cursor/rules/

Verify it's active: Cursor → Settings → Rules — the guideline should appear in the list.
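If you prefer to check the copied file itself, the sketch below looks for alwaysApply: true in the file's leading frontmatter. It assumes the .mdc file starts with a block delimited by `---` lines containing simple `key: value` pairs; that layout and the function name `always_apply_enabled` are assumptions for illustration.

```python
def always_apply_enabled(mdc_text: str) -> bool:
    """Return True if the leading '---' frontmatter block sets alwaysApply: true."""
    lines = mdc_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the setting was not found
        key, sep, value = line.partition(":")
        if sep and key.strip() == "alwaysApply":
            return value.strip().lower() == "true"
    return False
```

Typical use would be to pass in the text of .cursor/rules/karpathy-guidelines.mdc read from disk.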

Signs the guidelines are working:

Claude asks clarifying questions before implementing, rather than diving in
PR diffs are cleaner — only the requested changes appear
No more "while I was there, I also improved..." in commit messages
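The last sign can be spot-checked mechanically. The sketch below scans commit subjects for wording that hints at unrequested changes; the phrase list is a hypothetical starting point, not something the project ships, and a clean result is only weak evidence.

```python
import re

# Hypothetical phrase list; tune it to your team's wording.
DRIVE_BY_PATTERN = re.compile(
    r"while i was (there|at it)|also (improved|cleaned up|refactored)",
    re.IGNORECASE,
)


def flag_drive_by_commits(messages: list[str]) -> list[str]:
    """Return the commit messages that hint at unrequested changes."""
    return [m for m in messages if DRIVE_BY_PATTERN.search(m)]
```

One way to feed it is with the lines printed by `git log --format=%s` for the branch under review.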

The Four Principles

Principle 1: Think Before Coding

The problem it targets: When faced with ambiguous requirements, LLMs tend to make implicit assumptions and execute silently, rather than surfacing the uncertainty.

Wrong behavior example:

User: Add a feature to export user data
LLM: [immediately starts writing code]
     → Assumes CSV format, assumes all fields, assumes synchronous download
     → But the user actually needed a background job + specific fields + JSON format

Correct behavior (the principle requires surfacing uncertainty first):

User: Add a feature to export user data
LLM: Before I start, a few things to clarify:
     1. Scope: Export all users or a filtered subset?
     2. Format: CSV / JSON / Excel?
     3. Delivery: Browser download, background job via email, or API endpoint?
     4. Fields: Which fields to include? Any sensitive data restrictions?

Another canonical example — "Make search faster" has three completely different valid meanings:

Interpretation	Actual Need	Solution
Response time	API returns slowly	Add indexes, caching, optimize queries
Throughput	High concurrency	Horizontal scaling, queuing
Perceived UX speed	User feels it is slow	Preloading, skeleton screens, instant feedback

All three call for fundamentally different approaches, so none of them is a safe default.

What the principle requires: When multiple reasonable interpretations exist, present all of them and let the user choose. When genuinely confused, stop and say "I'm not sure how to handle this" rather than pushing through.
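The decision rule can be written down in a few lines. This is an illustrative sketch, not code from the project: `Interpretation` and `clarify_or_proceed` are hypothetical names, and the options are taken from the "Make search faster" table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    label: str
    approach: str


def clarify_or_proceed(request: str, options: list[Interpretation]) -> str:
    """Ask the user to choose when a request has more than one plausible reading."""
    if not options:
        # Genuinely confused: say so instead of pushing through.
        return f"I'm not sure how to handle this: {request!r}. Can you say more?"
    if len(options) == 1:
        return f"Proceeding with: {options[0].approach}"
    lines = [f"{request!r} could mean several things. Which one do you want?"]
    for number, option in enumerate(options, start=1):
        lines.append(f"  {number}. {option.label}: {option.approach}")
    return "\n".join(lines)


search_options = [
    Interpretation("Response time", "add indexes, caching, optimize queries"),
    Interpretation("Throughput", "horizontal scaling, queuing"),
    Interpretation("Perceived UX speed", "preloading, skeleton screens, instant feedback"),
]
print(clarify_or_proceed("Make search faster", search_options))
```

The only branch that starts work without asking is the one where a single reading exists.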

Principle 2: Simplicity First

The problem it targets: LLMs have a strong over-engineering tendency — introducing abstractions, frameworks, and "flexibility" before complexity is actually needed.

Wrong behavior example — discount calculation:

# User request: implement a simple discount calculation

# ❌ LLM's "solution" (10x the code needed):
from abc import ABC, abstractmethod

class DiscountStrategy(ABC):
    @abstractmethod
    def calculate(self, price: float) -> float: ...

class PercentageDiscount(DiscountStrategy):
    # DiscountConfig is one of the extra classes; quoted because it is not shown here
    def __init__(self, config: "DiscountConfig"): ...
    def calculate(self, price: float) -> float: ...

class DiscountCalculator:
    def __init__(self, strategy: DiscountStrategy): ...
    # Cart is likewise not shown here
    def apply(self, cart: "Cart") -> float: ...

# ...plus factory class, config class, registry...

# ✅ What was actually needed (one function):
def apply_discount(price: float, discount_pct: float) -> float:
    return price * (1 - discount_pct / 100)

Another example — "Save user preferences":

❌ LLM implemented:
   - A caching layer with expiry (nobody asked)
   - Input validation (no bad data has appeared yet)
   - Conflict merging logic (nobody hit this problem)
   - A change notification system (nobody mentioned it)

✅ What was actually needed:
   - One function that writes preferences to the database
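For scale, here is roughly what "one function that writes preferences to the database" can look like. It is a sketch under assumptions: SQLite via the standard library, preferences stored as a single JSON blob, and a hypothetical `preferences` table with one row per user.

```python
import json
import sqlite3


def save_preferences(conn: sqlite3.Connection, user_id: int, prefs: dict) -> None:
    """Write a user's preferences as one JSON blob, replacing any earlier row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO preferences (user_id, data) VALUES (?, ?)",
            (user_id, json.dumps(prefs)),
        )


# Hypothetical schema for the sketch: one row per user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE preferences (user_id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
save_preferences(conn, 1, {"theme": "dark"})
save_preferences(conn, 1, {"theme": "light"})  # overwrites the earlier row
```

No cache, no merge logic, no notifications. Those can be added when a real requirement shows up.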

The benchmark the principle provides:

"Would a senior engineer look at this and say it's overcomplicated?"

If 200 lines could be 50, rewrite it as 50.

Core maxim:

"Good code is code that solves today's problem simply, not tomorrow's problem prematurely."

Premature complexity is not just wasteful — it makes code harder to understand, introduces more bugs, and slows development, even when it follows recognized design patterns.

Principle 3: Surgical Changes

The problem it targets: LLMs do "drive-by refactoring" — while fixing a bug, they also update quote styles, add type annotations, rename variables, and reorganize imports.

This behavior feels helpful but has two serious problems:

Makes diffs hard to review: Reviewers cannot tell which changes fix the bug and which are "while I was there" improvements
Introduces unexpected regressions: Every unrequested change is a potential risk point

The right approach:

# Original code (has a bug, and some "imperfections")
def calculate_total(items):
    total = 0
    for item in items:
        total += item['price']  # single quotes
    return total                 # no type annotation

# ❌ LLM's "comprehensive improvement":
def calculate_total(items: list[dict]) -> float:  # added type annotation
    """Calculate total price of items."""          # added docstring
    total: float = 0.0                             # changed variable type
    for item in items:
        total += item["price"]  # changed to double quotes ("style consistency")
    return total

# ✅ Only fix the bug (suppose the bug is a crash when items is None):
def calculate_total(items):
    if not items:               # only this guard and its return are added
        return 0
    total = 0
    for item in items:
        total += item['price']  # original style preserved
    return total

Specific requirements of the principle:

Every changed line must trace directly to the user's request
Match existing code style even if you prefer a different one
Do not improve code you happen to pass through unless explicitly asked
Only clean up unused imports/variables that your changes created — leave pre-existing dead code alone
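The first requirement can be made measurable. The sketch below uses the standard library's difflib to count added and removed lines for the two rewrites of calculate_total shown earlier (comments stripped). The helper name `changed_lines` is hypothetical, and a raw line count is only a proxy for "traces to the request".

```python
import difflib

ORIGINAL = """\
def calculate_total(items):
    total = 0
    for item in items:
        total += item['price']
    return total
"""

SURGICAL = """\
def calculate_total(items):
    if not items:
        return 0
    total = 0
    for item in items:
        total += item['price']
    return total
"""

COMPREHENSIVE = '''\
def calculate_total(items: list[dict]) -> float:
    """Calculate total price of items."""
    total: float = 0.0
    for item in items:
        total += item["price"]
    return total
'''


def changed_lines(before: str, after: str) -> int:
    """Count added plus removed lines between two versions of a file."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff also emits "  " (unchanged) and "? " (hint) lines; skip those.
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


print("surgical:", changed_lines(ORIGINAL, SURGICAL))
print("comprehensive:", changed_lines(ORIGINAL, COMPREHENSIVE))
```

The surgical version touches fewer lines, and every one of them belongs to the fix, which is exactly what a reviewer wants to see.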

Principle 4: Goal-Driven Execution

The problem it targets: Given a vague task, LLMs produce plans that look comprehensive but lack verifiable outcomes.

A vague plan example:

Task: "Refactor the auth module"

❌ Vague plan:
   1. Review existing code
   2. Identify problems
   3. Improve structure
   4. Run tests
   → Not a single step has a clear definition of "done"

The principle requires converting tasks to verifiable goals:

Task: "Fix the login bug"

✅ Goal-driven plan:
   Step 1: Write a failing test that reproduces the bug
           Checkpoint: Test actually fails on current code
   Step 2: Implement the fix
           Checkpoint: The test now passes
   Step 3: Run the full test suite
           Checkpoint: No new regressions
   → Every step has a clear, objective definition of "complete"
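Steps 1 and 2 of the plan above can be sketched concretely. The bug here is invented for illustration (a case-sensitive email lookup), and all function names are hypothetical; the point is that one and the same test fails before the fix and passes after it.

```python
from typing import Callable, Optional

Lookup = Callable[[dict[str, str], str], Optional[str]]


def find_user_buggy(users: dict[str, str], email: str) -> Optional[str]:
    """Current code: the lookup is case-sensitive, so 'Ann@Example.com' is not found."""
    return users.get(email)


def find_user_fixed(users: dict[str, str], email: str) -> Optional[str]:
    """Step 2: the fix normalizes the email before the lookup."""
    return users.get(email.strip().lower())


def login_test_passes(find_user: Lookup) -> bool:
    """Step 1: the test that encodes the bug report. True means the test passes."""
    users = {"ann@example.com": "ann"}
    return find_user(users, "Ann@Example.com") == "ann"


# Checkpoint 1: the test fails on the current code.
assert login_test_passes(find_user_buggy) is False
# Checkpoint 2: the same test passes after the fix.
assert login_test_passes(find_user_fixed) is True
```

Step 3, running the full suite, is whatever test command the project already uses.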

Another example:

Task: "Refactor the auth module"

✅ Concretized:
   1. All existing tests pass (record baseline)
   2. Extract TokenService (checkpoint: standalone unit tests pass)
   3. Refactor AuthController to use TokenService (checkpoint: integration tests pass)
   4. All original tests still pass (no regressions)
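A plan like this is essentially a list of (step, checkpoint) pairs that stops at the first failed checkpoint. The sketch below models that; `run_plan` is a hypothetical helper, and the lambdas stand in for real test runs.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def run_plan(steps: list[Step]) -> list[str]:
    """Run steps in order; stop at the first checkpoint that fails."""
    log = []
    for name, checkpoint in steps:
        if not checkpoint():
            log.append(f"FAILED: {name}")
            break  # later steps build on this one, so do not continue
        log.append(f"ok: {name}")
    return log


# Stand-in checkpoints; in practice each would invoke the relevant tests.
plan = [
    ("baseline: existing tests pass", lambda: True),
    ("extract TokenService: unit tests pass", lambda: True),
    ("AuthController uses TokenService: integration tests pass", lambda: False),
    ("all original tests still pass", lambda: True),
]
print(run_plan(plan))
```

Because each checkpoint is a yes/no answer, there is never any doubt about which step the work is stuck on.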

Karpathy's core insight:

LLMs excel at "looping until they meet specific goals" — so providing success criteria is more effective than providing imperative instructions.

Imperative instructions ("do A, then B, then C") leave the LLM without guidance when something goes wrong. Declarative goals ("this test must pass," "this interface must be callable") let the LLM choose its own path while giving it a clear completion criterion.
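The "looping until a goal is met" idea can be illustrated with a small generate-and-check loop. This is a conceptual sketch, not how any particular tool works: `loop_until` is a hypothetical helper, and the candidate functions stand in for successive attempts at the discount function from Principle 2.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def loop_until(
    propose: Callable[[int], T],
    meets_goal: Callable[[T], bool],
    max_attempts: int = 5,
) -> Optional[T]:
    """Try candidates until one satisfies the success criterion, or give up."""
    for attempt in range(max_attempts):
        candidate = propose(attempt)
        if meets_goal(candidate):
            return candidate
    return None


# Stand-ins for successive attempts at a discount function.
candidates = [
    lambda price, pct: price - pct,              # wrong: subtracts the raw percentage
    lambda price, pct: price * pct / 100,        # wrong: returns the discount itself
    lambda price, pct: price * (1 - pct / 100),  # correct
]


def goal(fn: Callable[[float, float], float]) -> bool:
    """Success criterion: 25% off 200 must be 150."""
    return fn(200, 25) == 150


winner = loop_until(lambda i: candidates[i], goal, max_attempts=len(candidates))
```

The criterion says nothing about how to get there, which is what leaves room to recover after a wrong first attempt.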

Why a File, Not a Tool?

The design choice of this project is worth reflecting on. Faced with LLM coding behavior problems, many solutions are possible:

Build an agent framework to constrain behavior
Develop post-processing tools to detect and correct issues
Fine-tune a better model

andrej-karpathy-skills chose the simplest one: a text file, placed in the project, which the LLM reads and follows itself.

This choice is itself the best demonstration of "Simplicity First" — minimum mechanism, today's problem solved. And a text file has one advantage no tool can match: it can be read, understood, and modified by anyone at any time, with no black box.

Project Links & Resources

Official Resources

🌟 GitHub: https://github.com/multica-ai/andrej-karpathy-skills
📄 Direct CLAUDE.md download: Available via curl (see Quick Start above)
📖 Examples: EXAMPLES.md in the repo (contrast examples for each principle — recommended reading)

Target Audience

Daily Claude Code / Cursor users who want to reduce LLM over-engineering and unnecessary code changes
Team engineering productivity leads who want to fold behavioral standards into a shared CLAUDE.md and standardize AI-assisted coding across a team
Developers who care about reviewable PRs, are tired of LLM-generated "super diffs," and want clean, focused pull requests containing only the requested changes

Summary

Key Takeaways

Origin: Directly distilled from Karpathy's first-hand observations of LLM coding failure modes — grounded in real usage
Four principles:
- Think Before Coding: Turn implicit assumptions into explicit questions rather than silently picking one
- Simplicity First: Write the minimum code to solve today's problem, don't pre-build "flexibility"
- Surgical Changes: Every changed line traces to the request; no drive-by refactoring
- Goal-Driven Execution: Provide success criteria, not step-by-step instructions
Three install formats: CLAUDE.md (per-project) / Claude Code plugin (global) / Cursor rules
Core philosophy: LLMs excel at looping toward goals — give them goals, not procedures
Self-demonstrating: The simplest possible solution (one file) to the problem it addresses — Simplicity First, embodied

One-Line Review

andrej-karpathy-skills does something deceptively small but far-reaching: compresses the engineering wisdom of "how to use LLMs well for coding" into a single text file anyone can read, understand, and drop into any project — and that file itself is the best proof of the simple-first philosophy it advocates.

