
Skills, Not Vibes: Teaching AI Agents to Write Clean Code


In February 2025, Andrej Karpathy coined "vibe coding" to describe programming's new reality: give in to the vibes, accept all changes, "forget that the code even exists." He called it "not too bad for throwaway weekend projects." But for production systems? That's where the trouble starts.

I've watched AI-generated codebases accumulate the same mess developers spent decades learning to avoid—duplication everywhere, inconsistent naming, missing edge cases. Then it hit me: these are exactly the problems Robert C. Martin warned about in Clean Code almost two decades ago.

So I went back to the book, specifically Chapter 17's catalog of 66 code smells and heuristics. These aren't just relevant to AI coding—they're more relevant. AI makes exactly the mistakes Uncle Bob warned us about, just faster and at scale.

The solution? Skills—instruction files that AI agents read before writing code. I've translated Clean Code's complete catalog into Python skills you can use today. They work in Google's Antigravity IDE, Anthropic's Claude Code, and anywhere that supports the Agent Skills standard.

Let me show you why we need this, and how to implement it.


Even Linus Torvalds Vibe Codes (Sometimes)

In January 2026, Linus Torvalds revealed a side project called AudioNoise—a digital audio effects simulator he'd been tinkering with over the holidays. The Python visualizer, he noted, was "basically written by vibe-coding."

In his own words from the repo:

"I know more about analog filters—and that's not saying much—than I do about python. It started out as my typical 'google and do the monkey-see-monkey-do' kind of programming, but then I cut out the middle-man—me—and just used Google Antigravity to do the audio sample visualizer."

The Hacker News discussion revealed two camps. Some saw it as validation: "It's official, vibe coding is legit." Others noted the crucial context: Torvalds used AI for the part he lacks expertise in (Python visualization) while hand-coding the parts he knows (C and digital signal processing).

One commenter nailed it: "There's a big difference between vibe-coding an entire project and having an AI build a component that you lack competency for."

Another observation cut deeper: "If anyone on the planet knows how to do vibe coding right, it's him"—because Torvalds spent decades mastering code review. He can spot bad code instantly. Most of us can't.

But here's what's telling: Torvalds wrote tests for his hand-coded C—numerical accuracy checks for the DSP primitives he understands. The vibe-coded Python visualizer? No tests, no type hints, and a duplicated function definition that slipped right through. The same four-line method appears twice in a row—the first an empty stub, the second the real implementation. It's textbook "Accept All, don't read the diffs." The code runs fine (Python silently overwrites the first definition), but it's exactly the kind of dead code that accumulates into maintenance nightmares.

This works for Torvalds' toy project precisely because it's a throwaway learning exercise. The moment that visualizer needs to be production code, those missing guardrails become technical debt.

The same week, Torvalds rejected "AI slop" submissions to the Linux kernel, arguing that documentation telling people not to submit garbage won't help because "the people who would submit it won't read the documentation anyway."

The lesson isn't that vibe coding is bad. It's that context matters. Skills let you define when to enforce rigor and when to let the vibes flow.


The Data: AI Code Quality Is Getting Worse

Google's DORA Report found AI adoption shows a negative relationship with software delivery stability. The 2025 report's central finding: "AI doesn't fix a team; it amplifies what's already there." Without robust control systems—strong testing, mature practices, fast feedback loops—increased AI-generated code leads to instability. Skills are exactly those control systems, encoded as instructions.

Carnegie Mellon researchers analyzed 807 GitHub repositories after Cursor adoption: +30% static analysis warnings, +41% code complexity. The speed gains were transient; the quality problems compounded.

GitClear's analysis of 211 million lines of code from Google, Microsoft, Meta, and enterprise repositories found code duplication increased 4x with AI adoption. For the first time in their dataset, copy/pasted code exceeded refactored code.

Even Anthropic's Agentic Coding Trends Report shows the gap: developers use AI in roughly 60% of their work, but can fully delegate only 0-20% of tasks. The rest requires "thoughtful setup, active supervision, and human judgment."

That gap—between what AI touches and what AI can own—is exactly what skills address. The setup is the skill. The supervision is the rules.

The Pattern: AI Recreates Classic Code Smells

The research consistently identifies the same failure patterns. Here's how they map to specific Clean Code violations:

Naming and Consistency Problems

  • Inconsistent variable names across similar functions
  • Vague names like data, tmp, proc
  • Mixing naming conventions (camelCase and snake_case)
  • Clean Code rules: N1 (descriptive names), G11 (consistency), G24 (conventions)

Code Duplication

  • Copy/paste instead of extracting shared logic
  • Same calculation appearing in multiple places
  • Pattern repetition that should be abstracted
  • Clean Code rule: G5 (DRY - Don't Repeat Yourself)

Missing Safety Checks

  • No validation of input boundaries
  • Assumptions about data structure without verification
  • Missing null/None checks
  • Clean Code rules: G3 (boundary conditions), G4 (don't override safeties), G26 (be precise)

Readability Issues

  • Magic numbers without explanation (what does 86400 mean?)
  • Unused variables cluttering code
  • Functions mixing multiple abstraction levels
  • Clean Code rules: G12 (remove clutter), G16 (no obscured intent), G34 (single abstraction level)

Performance Problems

  • Functions doing multiple things at once
  • Exposing internal data unnecessarily
  • Nested loops that could be optimized
  • Clean Code rules: G8 (minimize public interface), G30 (functions do one thing)

These aren't arbitrary style preferences—they're the exact problems that make code hard to maintain, debug, and extend. The skills we'll build enforce these rules automatically.

The fix isn't to stop using AI. It's to give AI the explicit rules it needs to follow.

That's what skills do.


What Are Skills?

Skills are markdown files containing domain-specific instructions that AI agents read before working on your code. They follow the Agent Skills open standard and work in Google Antigravity, Anthropic's Claude Code, and other compatible agents.

The architecture is called Progressive Disclosure. Instead of dumping every instruction into the agent's context at once (causing what Antigravity's docs call "Context Saturation"), skills work in layers:

  1. Discovery: The agent sees only a lightweight menu of skill names and descriptions
  2. Activation: When your request matches a skill's description, the full instructions load
  3. Execution: Scripts and templates are read only when the task requires them

This keeps the agent fast and focused. It's not thinking about database migrations when you're writing a React component.
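To make the mechanics concrete, here's a rough sketch of the first two layers—hypothetical helper code with made-up names, not any agent's actual implementation (layer 3, reading bundled scripts on demand, is omitted):

from pathlib import Path

def read_frontmatter(skill_file: Path) -> dict[str, str]:
    """Parse only the name/description header of a SKILL.md file."""
    meta, lines = {}, skill_file.read_text().splitlines()
    for line in lines[1:] if lines[:1] == ["---"] else []:
        if line == "---":
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def discover_skills(skills_dir: Path) -> dict[Path, dict[str, str]]:
    """Layer 1 (discovery): the agent sees only a menu of names and descriptions."""
    return {f: read_frontmatter(f) for f in skills_dir.glob("*/SKILL.md")}

def activate(skills: dict[Path, dict[str, str]], request: str) -> list[str]:
    """Layer 2 (activation): load full instructions only for matching skills.
    Real agents match semantically; this stand-in just overlaps keywords."""
    return [f.read_text()  # the full skill body enters context only here
            for f, meta in skills.items()
            if any(w in meta["description"].lower() for w in request.lower().split())]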

The format is simple:

---
name: skill-name
description: When this skill should activate
---

# Skill Title

Your instructions, examples, and rules here.

The description field is crucial—it's the trigger phrase. The agent semantically matches your request against all available skill descriptions to decide which ones to load. "Enforces function best practices" is vague. "Use when writing or refactoring Python functions" tells the agent exactly when to activate.
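In frontmatter terms, the difference looks like this:

# Vague - gives the agent no activation condition
description: Enforces function best practices

# Specific - names the exact situation that should trigger it
description: Use when writing or refactoring Python functions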

Skills can do far more than enforce coding standards—the community has built skills for Stripe integration, Metasploit security testing, voice agents, and even multi-agent startup automation. This article focuses on one specific use case: encoding Clean Code principles.

Let me show you how to translate Clean Code's catalog into working skills.


Building the Skills: Three Examples

Rather than catalog all 66 rules exhaustively, I'll show you three critical categories in detail. The complete implementation is at the end.

1. Comments (C1-C5): Code Should Explain Itself

Uncle Bob is famously skeptical of comments—not because documentation is bad, but because comments rot: the code changes, and the comments describing it rarely keep up.

File Reference: clean-comments/SKILL.md

---
name: clean-comments
description: Use when writing, fixing, editing, or reviewing Python comments and docstrings. Enforces Clean Code principles—no metadata, no redundancy, no commented-out code.
---

# Clean Comments

## C1: No Inappropriate Information

Comments shouldn't hold metadata. Use Git for author names, change history, 
ticket numbers, and dates. Comments are for technical notes about code only.

## C2: Delete Obsolete Comments

If a comment describes code that no longer exists or works differently, 
delete it immediately. Stale comments become "floating islands of 
irrelevance and misdirection."

## C3: No Redundant Comments

# Bad - the code already says this
i += 1  # increment i
user.save()  # save the user

# Good - explains WHY, not WHAT
i += 1  # compensate for zero-indexing in display

## C4: Write Comments Well

If a comment is worth writing, write it well:
- Choose words carefully
- Use correct grammar
- Don't ramble or state the obvious
- Be brief

## C5: Never Commit Commented-Out Code

# DELETE THIS - it's an abomination
# def old_calculate_tax(income):
#     return income * 0.15

Who knows how old it is? Who knows if it's meaningful? Delete it. 
Git remembers everything.

## The Goal

The best comment is the code itself. If you need a comment to explain 
what code does, refactor first, comment last.

2. Functions (F1-F4): Small, Focused, Obvious

Functions should do one thing, do it well, and have an obvious purpose.

File Reference: clean-functions/SKILL.md

---
name: clean-functions
description: Use when writing or refactoring Python functions. Enforces Clean Code principles—maximum 3 arguments, single responsibility, no flag parameters.
---

# Clean Functions

## F1: Too Many Arguments (Maximum 3)

# Bad - too many parameters
def create_user(name, email, age, country, timezone, language, newsletter):
    ...

# Good - use a dataclass or dict
from dataclasses import dataclass

@dataclass
class UserData:
    name: str
    email: str
    age: int
    country: str
    timezone: str
    language: str
    newsletter: bool

def create_user(data: UserData):
    ...

More than 3 arguments means your function is doing too much or needs 
a data structure.

## F2: No Output Arguments

Don't modify arguments as side effects. Return values instead.

# Bad - modifies argument
def append_footer(report: Report) -> None:
    report.append("\n---\nGenerated by System")

# Good - returns new value
def with_footer(report: Report) -> Report:
    return report + "\n---\nGenerated by System"

## F3: No Flag Arguments

Boolean flags mean your function does at least two things.

# Bad - function does two different things
def render(is_test: bool):
    if is_test:
        render_test_page()
    else:
        render_production_page()

# Good - split into two functions
def render_test_page(): ...
def render_production_page(): ...

## F4: Delete Dead Functions

If it's not called, delete it. No "just in case" code. Git preserves history.

3. General Principles (G1-G36): The Core Rules

These are the fundamental patterns that separate clean code from legacy nightmares.

File Reference: clean-general/SKILL.md

---
name: clean-general
description: Use when reviewing Python code quality. Enforces Clean Code's core principles—DRY, single responsibility, clear intent, no magic numbers, proper abstractions.
---

# General Clean Code Principles

## Critical Rules

**G5: DRY (Don't Repeat Yourself)**

Every piece of knowledge has one authoritative representation.

# Bad - duplication
tax_rate = 0.0825
ca_total = subtotal * 1.0825
ny_total = subtotal * 1.07

# Good - single source of truth
TAX_RATES = {"CA": 0.0825, "NY": 0.07}
def calculate_total(subtotal: float, state: str) -> float:
    return subtotal * (1 + TAX_RATES[state])

**G16: No Obscured Intent**

Don't be clever. Be clear.

# Bad - what does this do?
return (x & 0x0F) << 4 | (y & 0x0F)

# Good - obvious intent
return pack_coordinates(x, y)

**G23: Prefer Polymorphism to If/Else**

# Bad - will grow forever
def calculate_pay(employee):
    if employee.type == "SALARIED":
        return employee.salary
    elif employee.type == "HOURLY":
        return employee.hours * employee.rate
    elif employee.type == "COMMISSIONED":
        return employee.base + employee.commission

# Good - open/closed principle
class SalariedEmployee:
    def calculate_pay(self): return self.salary

class HourlyEmployee:
    def calculate_pay(self): return self.hours * self.rate

class CommissionedEmployee:
    def calculate_pay(self): return self.base + self.commission

**G25: Replace Magic Numbers with Named Constants**

# Bad
if elapsed_time > 86400:
    ...

# Good
SECONDS_PER_DAY = 86400
if elapsed_time > SECONDS_PER_DAY:
    ...

**G30: Functions Should Do One Thing**

If you can extract another function, your function does more than one thing.

**G36: Law of Demeter (Avoid Train Wrecks)**

# Bad - reaching through multiple objects
output_dir = context.options.scratch_dir.absolute_path

# Good - one dot
output_dir = context.get_scratch_dir()

## Enforcement Checklist

When reviewing AI-generated code, verify:
- [ ] No duplication (G5)
- [ ] Clear intent, no magic numbers (G16, G25)
- [ ] Polymorphism over conditionals (G23)
- [ ] Functions do one thing (G30)
- [ ] No Law of Demeter violations (G36)
- [ ] Boundary conditions handled (G3)
- [ ] Dead code removed (G9)

The Complete Catalog

I've translated all 66 rules from Clean Code Chapter 17 into skills spanning six categories, plus three Python-specific rules (P1-P3—P1, for instance, bans wildcard imports) that stand in for the book's Java section:


Comments (C1-C5): Minimal, accurate commenting

  • C1: No inappropriate information (metadata belongs in version control)
  • C2: Delete obsolete comments immediately
  • C3: No redundant comments that repeat the code
  • C4: Write comments well—brief, grammatical, purposeful
  • C5: Never commit commented-out code

Environment (E1-E2): One-command build and test

  • E1: Build requires only one step
  • E2: Tests require only one step

Functions (F1-F4): Small, focused, obvious

  • F1: Maximum 3 arguments (use data structures for more)
  • F2: No output arguments (return values instead)
  • F3: No flag arguments (split into separate functions)
  • F4: Delete dead functions

General (G1-G36): Core principles

  • G1: Multiple languages in one source file
  • G2: Obvious behavior is unimplemented
  • G3: Incorrect behavior at the boundaries
  • G4: Overridden safeties
  • G5: Duplication
  • G6: Code at wrong level of abstraction
  • G7: Base classes depending on their derivatives
  • G8: Too much information
  • G9: Dead code
  • G10: Vertical separation
  • G11: Inconsistency
  • G12: Clutter
  • G13: Artificial coupling
  • G14: Feature envy
  • G15: Selector arguments
  • G16: Obscured intent
  • G17: Misplaced responsibility
  • G18: Inappropriate static
  • G19: Use explanatory variables
  • G20: Function names should say what they do
  • G21: Understand the algorithm
  • G22: Make logical dependencies physical
  • G23: Prefer polymorphism to if/else or switch/case
  • G24: Follow standard conventions
  • G25: Replace magic numbers with named constants
  • G26: Be precise
  • G27: Structure over convention
  • G28: Encapsulate conditionals
  • G29: Avoid negative conditionals
  • G30: Functions should do one thing
  • G31: Hidden temporal couplings
  • G32: Don't be arbitrary
  • G33: Encapsulate boundary conditions
  • G34: Functions should descend only one level of abstraction
  • G35: Keep configurable data at high levels
  • G36: Avoid transitive navigation

Names (N1-N7): Descriptive, unambiguous, right-sized

  • N1: Choose descriptive names
  • N2: Choose names at the right abstraction level
  • N3: Use standard nomenclature where possible
  • N4: Use unambiguous names
  • N5: Use long names for long scopes
  • N6: Avoid encodings (Hungarian notation, etc.)
  • N7: Names should describe side effects

Tests (T1-T9): Fast, independent, exhaustive

  • T1: Insufficient tests—test everything that could break
  • T2: Use a coverage tool
  • T3: Don't skip trivial tests
  • T4: Ignored tests indicate ambiguity
  • T5: Test boundary conditions
  • T6: Exhaustively test near bugs
  • T7: Patterns of failure are diagnostic
  • T8: Coverage patterns can be revealing
  • T9: Tests should be fast

Get the complete skill files:

Clean Code Skills for AI Agents

License: MIT

Teach your AI to write code that doesn't suck.

This repository contains Agent Skills that enforce Robert C. Martin's Clean Code principles. They work with Google Antigravity, Anthropic's Claude Code, and any agent that supports the Agent Skills standard.

Why?

AI generates code fast, but research shows it also generates technical debt fast:

  • GitClear: 4x increase in code duplication with AI adoption
  • Carnegie Mellon: +30% static analysis warnings, +41% code complexity after Cursor adoption
  • Google DORA: Negative relationship between AI adoption and software delivery stability

These skills encode battle-tested solutions to exactly these problems—directly into your AI workflow.

What's Included

Skill | Description | Rules
boy-scout | Orchestrator—always leave code cleaner than you found it | Coordinates all skills
python-clean-code | Master skill with all 66 rules | C1-C5, E1-E2, F1-F4, G1-G36, N1-N7, P1-P3, T1-T9
clean-comments | Minimal, accurate commenting | C1-C5
clean-functions | Small, focused, obvious functions | F1-F4

The repo includes:

  • boy-scout: An orchestrator skill that embodies the Boy Scout Rule—"always leave code cleaner than you found it"—and coordinates the other skills
  • python-clean-code: A master skill with all 66 rules, plus a quick reference table and anti-patterns cheatsheet
  • Individual skills for each category (clean-comments, clean-functions, clean-general, clean-names, clean-tests)—drop in only what you need
  • Installation instructions for Antigravity, Claude Code, and other Agent Skills-compatible tools

How to Use These Skills

Skills sit in a specific place in the agent ecosystem. Rules are passive guardrails that are always on. Skills are agent-triggered—the model decides when to equip them based on your intent. If you're using MCP servers (connections to external tools like GitHub or Postgres), think of MCP as the "hands" and skills as the "brains" that direct them.

For Antigravity

  1. Create .agent/skills/ in your project root (or ~/.gemini/antigravity/skills/ for global access)
  2. Save the skill as a folder with a SKILL.md file inside (e.g., .agent/skills/python-clean-code/SKILL.md)
  3. Ask the agent to review or write code—it'll automatically apply the rules when relevant
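
A project-local setup ends up looking like this (using the skill names from the repo):

.agent/
└── skills/
    ├── python-clean-code/
    │   └── SKILL.md
    ├── clean-comments/
    │   └── SKILL.md
    └── clean-functions/
        └── SKILL.md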

Global vs Project Skills

  • Project-specific: .agent/skills/
  • Global Antigravity: ~/.gemini/antigravity/skills/

The agent only loads full skill content when needed, so comprehensive skills don't slow down simple requests.

Going Further

The skills in this article are instruction-only—they tell the agent what to do. For stricter enforcement, you could add a scripts/ folder with a linter that compatible agents run automatically, or an examples/ folder with before/after code samples for few-shot learning. The format supports it; we're just keeping things simple here.
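
To give a flavor of the stricter option, here's a minimal sketch of such a scripts/ linter—an illustration under my own assumptions, not a file from the published repo—that uses Python's ast module to flag two rules from the catalog (F1: too many arguments, G25: magic numbers):

import ast
import sys
from pathlib import Path

MAX_ARGS = 3          # F1: more than 3 arguments wants a data structure
ALLOWED = {0, 1, -1}  # G25: any other bare number should be a named constant

def lint(path: Path) -> list[str]:
    """Report F1 and G25 violations in one Python source file."""
    issues = []
    for node in ast.walk(ast.parse(path.read_text(), filename=str(path))):
        if isinstance(node, ast.FunctionDef) and len(node.args.args) > MAX_ARGS:
            issues.append(f"{path}:{node.lineno} F1: '{node.name}' takes "
                          f"{len(node.args.args)} arguments")
        # Naive on purpose: this also flags the right-hand side of
        # constant definitions like SECONDS_PER_DAY = 86400.
        if (isinstance(node, ast.Constant)
                and type(node.value) in (int, float)
                and node.value not in ALLOWED):
            issues.append(f"{path}:{node.lineno} G25: magic number {node.value}")
    return issues

if __name__ == "__main__":
    problems = [msg for arg in sys.argv[1:] for msg in lint(Path(arg))]
    print("\n".join(problems) or "clean")
    sys.exit(1 if problems else 0)

An agent that executes skill scripts could run something like this after each edit and feed the findings into its next pass.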


A Real-World Example

Here's code that violates multiple Clean Code rules:

from utils import *  # P1
# Author: John, Modified: 2024-01-15  # C1
def proc(d, t, flag=False):  # N1, F1, F3
    # Process the data  # C3
    x = []  # N1
    for i in d:
        if flag:  # F3
            if i['type'] == 'A':  # G23
                x.append(i['val'] * 1.0825)  # G25
            elif i['type'] == 'B':
                x.append(i['val'] * 1.05)  # G25
        else:
            x.append(i['val'])
    with open(f'/tmp/{t}.json', 'w') as f:  # G6
        json.dump(x, f)
    # Old approach  # C5
    # for item in d:
    #     print(item)
    return x

Violations: P1, C1, C3, C5, F1, F3, G6, G23, G25, N1

With the Clean Code skill active, ask your AI agent to refactor this:

import json
from pathlib import Path
from typing import List, Literal
from dataclasses import dataclass

TAX_RATE_CA = 0.0825
TAX_RATE_NY = 0.05
TransactionType = Literal['CA', 'NY']

@dataclass
class Transaction:
    value: float
    type: TransactionType

def apply_tax(transaction: Transaction) -> float:
    """Apply state-specific tax to transaction value."""
    tax_rates = {'CA': TAX_RATE_CA, 'NY': TAX_RATE_NY}
    return transaction.value * (1 + tax_rates[transaction.type])

def process_transactions_with_tax(
    transactions: List[Transaction]
) -> List[float]:
    """Calculate taxed values for all transactions."""
    return [apply_tax(t) for t in transactions]

def process_transactions_without_tax(
    transactions: List[Transaction]
) -> List[float]:
    """Extract raw values from all transactions."""
    return [t.value for t in transactions]

def save_results(values: List[float], output_path: Path) -> None:
    """Save processed values to JSON file."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open('w') as f:
        json.dump(values, f)

The refactored version:

  • ✅ No wildcard imports (P1)
  • ✅ No metadata comments (C1)
  • ✅ No redundant comments (C3)
  • ✅ No commented-out code (C5)
  • ✅ Descriptive names (N1)
  • ✅ No flag arguments (F3)
  • ✅ Named constants instead of magic numbers (G25)
  • ✅ Functions do one thing (G30)
  • ✅ Polymorphism through data structure (G23)

Anatomy of a Vibe-Coded Script

Remember the duplicated function I mentioned in Torvalds' AudioNoise visualizer? Here it is:

def update_slider_text(self, val):
    """Helper to update slider texts (Width and End Point)."""
    start_val, end_val = val
    width = end_val - start_val

def update_slider_text(self, val):
    """Helper to update slider texts (Width and End Point)."""
    start_val, end_val = val
    width = end_val - start_val

    if self.x_mode == 'Time':
        self.slider.valtext.set_text(f"Window: {start_val:.3f} + {width:.3f} s")
    else:
        self.slider.valtext.set_text(f"Window: {int(start_val)} + {int(width)}")

The first definition unpacks values, calculates width, then... returns None. The second definition is the real implementation. Python silently overwrites the first with the second, so the code runs. But it's textbook dead code—Clean Code rule G9: Remove dead code.
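
This smell is also mechanically detectable. A quick illustration (my own snippet, not anything from the AudioNoise repo) using Python's ast module:

import ast
from collections import Counter

def duplicate_definitions(source: str) -> list[str]:
    """Find functions defined more than once in the same module or class body (G9)."""
    dupes = []
    for scope in ast.walk(ast.parse(source)):
        if isinstance(scope, (ast.Module, ast.ClassDef)):
            counts = Counter(
                node.name for node in scope.body
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)))
            dupes += [name for name, n in counts.items() if n > 1]
    return dupes

Off-the-shelf tools catch it too—pyflakes reports it as a redefinition, and pylint flags it as function-redefined—a reminder that skills complement conventional linting rather than replace it.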

With the skill active, an agent refactors the entire 600-line script. The duplicate vanishes, magic numbers become constants, and nested functions get extracted into focused methods:

def update_slider_text(self, val: tuple[float, float]):
    """Update slider text with either time or sample count."""
    start_val, end_val = val
    width = end_val - start_val

    if self.x_mode == 'Time':
        self.slider.valtext.set_text(f"Window: {start_val:.3f} + {width:.3f} s")
    else:
        self.slider.valtext.set_text(f"Window: {int(start_val)} + {int(width)}")


The refactored version:

  • ✅ Dead code removed (G9)
  • ✅ Type hints added (clarity)
  • ✅ Single, authoritative definition (G5)
  • ✅ Magic numbers extracted to constants (G25)
  • ✅ Large methods decomposed (G30)

The full diff shows 600+ lines reduced to ~440—not by removing functionality, but by eliminating duplication and extracting reusable patterns.


Why This Matters Now

Vibe coding isn't going away. AI will get better at generating code, not worse. But "better at generating" doesn't mean "better at maintaining."

The research is clear: AI produces code faster, but that code accumulates technical debt faster too. Without guard rails, we're building tomorrow's legacy systems today.

Uncle Bob's Clean Code principles are almost 20 years old, but they're exactly what we need now. They're not arbitrary style preferences—they're battle-tested solutions to the problems AI recreates at scale.

Skills give you the mechanism to encode these rules directly into your AI workflow. Whether you're using Antigravity, Claude Code, or another agent, the approach is the same: define what clean code means, then let the AI follow the rules.

Your agent doesn't know what good code looks like unless you tell it.

So tell it.


Resources

The Book

  • Clean Code by Robert C. Martin: Amazon

Skills Documentation

Research Cited

Get the Skills

The future of programming is human intent translated by AI. Make sure the translation preserves quality, not just speed.
