
Skills, Not Vibes: Teaching AI Agents to Write Clean Code


In February 2025, Andrej Karpathy coined "vibe coding" to describe programming's new reality: give in to the vibes, accept all changes, "forget that the code even exists." He called it "not too bad for throwaway weekend projects." But for production systems? That's where the trouble starts.

I've watched AI-generated codebases accumulate the same mess developers spent decades learning to avoid—duplication everywhere, inconsistent naming, missing edge cases. Then it hit me: these are exactly the problems Robert C. Martin warned about in Clean Code almost two decades ago.

So I went back to the book, specifically Chapter 17's catalog of 66 code smells and heuristics. These aren't just relevant to AI coding—they're more relevant. AI makes exactly the mistakes Uncle Bob warned us about, just faster and at scale.

The solution? Skills—instruction files that AI agents read before writing code. I've translated Clean Code's complete catalog into Python skills you can use today. They work in Google's Antigravity IDE, Anthropic's Claude Code, and anywhere that supports the Agent Skills standard.

Let me show you why we need this, and how to implement it.


Even Linus Torvalds Vibe Codes (Sometimes)

In January 2026, Linus Torvalds revealed a side project called AudioNoise—a digital audio effects simulator he'd been tinkering with over the holidays. The Python visualizer, he noted, was "basically written by vibe-coding."

In his own words from the repo:

"I know more about analog filters—and that's not saying much—than I do about python. It started out as my typical 'google and do the monkey-see-monkey-do' kind of programming, but then I cut out the middle-man—me—and just used Google Antigravity to do the audio sample visualizer."

The Hacker News discussion revealed two camps. Some saw it as validation: "It's official, vibe coding is legit." Others noted the crucial context: Torvalds used AI for the part he lacks expertise in (Python visualization) while hand-coding the parts he knows (C and digital signal processing).

One commenter nailed it: "There's a big difference between vibe-coding an entire project and having an AI build a component that you lack competency for."

Another observation cut deeper: "If anyone on the planet knows how to do vibe coding right, it's him"—because Torvalds spent decades mastering code review. He can spot bad code instantly. Most of us can't.

But here's what's telling: Torvalds wrote tests for his hand-coded C—numerical accuracy checks for the DSP primitives he understands. The vibe-coded Python visualizer? No tests, no type hints, and a duplicated function definition that slipped right through. The same four-line method appears twice in a row—the first an empty stub, the second the real implementation. It's textbook "Accept All, don't read the diffs." The code runs fine (Python silently overwrites the first definition), but it's exactly the kind of dead code that accumulates into maintenance nightmares.

This works for Torvalds' toy project precisely because it's a throwaway learning exercise. The moment that visualizer needs to be production code, those missing guardrails become technical debt.

The same week, Torvalds rejected "AI slop" submissions to the Linux kernel, arguing that documentation telling people not to submit garbage won't help because "the people who would submit it won't read the documentation anyway."

The lesson isn't that vibe coding is bad. It's that context matters. Skills let you define when to enforce rigor and when to let the vibes flow.


The Data: AI Code Quality Is Getting Worse

Google's DORA Report found AI adoption shows a negative relationship with software delivery stability. The 2025 report's central finding: "AI doesn't fix a team; it amplifies what's already there." Without robust control systems—strong testing, mature practices, fast feedback loops—increased AI-generated code leads to instability. Skills are exactly those control systems, encoded as instructions.

Carnegie Mellon researchers analyzed 807 GitHub repositories after Cursor adoption: +30% static analysis warnings, +41% code complexity. The speed gains were transient; the quality problems compounded.

GitClear's analysis of 211 million lines of code from Google, Microsoft, Meta, and enterprise repositories found code duplication increased 4x with AI adoption. For the first time in their dataset, copy/pasted code exceeded refactored code.

Even Anthropic's Agentic Coding Trends Report shows the gap: developers use AI in roughly 60% of their work, but can fully delegate only 0-20% of tasks. The rest requires "thoughtful setup, active supervision, and human judgment."

That gap—between what AI touches and what AI can own—is exactly what skills address. The setup is the skill. The supervision is the rules.

The Pattern: AI Recreates Classic Code Smells

The research consistently identifies the same failure patterns. Here's how they map to specific Clean Code violations:

Naming and Consistency Problems

  • Inconsistent variable names across similar functions
  • Vague names like data, tmp, proc
  • Mixing naming conventions (camelCase and snake_case)
  • Clean Code rules: N1 (descriptive names), G11 (consistency), G24 (conventions)

Code Duplication

  • Copy/paste instead of extracting shared logic
  • Same calculation appearing in multiple places
  • Pattern repetition that should be abstracted
  • Clean Code rule: G5 (DRY - Don't Repeat Yourself)

Missing Safety Checks

  • No validation of input boundaries
  • Assumptions about data structure without verification
  • Missing null/None checks
  • Clean Code rules: G3 (boundary conditions), G4 (don't override safeties), G26 (be precise)

Readability Issues

  • Magic numbers without explanation (what does 86400 mean?)
  • Unused variables cluttering code
  • Functions mixing multiple abstraction levels
  • Clean Code rules: G12 (remove clutter), G16 (no obscured intent), G34 (single abstraction level)

Performance Problems

  • Functions doing multiple things at once
  • Exposing internal data unnecessarily
  • Nested loops that could be optimized
  • Clean Code rules: G8 (minimize public interface), G30 (functions do one thing)

These aren't arbitrary style preferences—they're the exact problems that make code hard to maintain, debug, and extend. The skills we'll build enforce these rules automatically.

The fix isn't to stop using AI. It's to give AI the explicit rules it needs to follow.

That's what skills do.


What Are Skills?

Skills are markdown files containing domain-specific instructions that AI agents read before working on your code. They follow the Agent Skills open standard and work in Google Antigravity, Anthropic's Claude Code, and other compatible agents.

The architecture is called Progressive Disclosure. Instead of dumping every instruction into the agent's context at once (causing what Antigravity's docs call "Context Saturation"), skills work in layers:

  1. Discovery: The agent sees only a lightweight menu of skill names and descriptions
  2. Activation: When your request matches a skill's description, the full instructions load
  3. Execution: Scripts and templates are read only when the task requires them

This keeps the agent fast and focused. It's not thinking about database migrations when you're writing a React component.
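To make the mechanics concrete, here's a rough sketch of the first two layers—hypothetical helper code with made-up names, not any agent's actual implementation (layer 3, reading bundled scripts on demand, is omitted):

from pathlib import Path

def read_frontmatter(skill_file: Path) -> dict[str, str]:
    """Parse only the name/description header of a SKILL.md file."""
    meta, lines = {}, skill_file.read_text().splitlines()
    for line in lines[1:] if lines[:1] == ["---"] else []:
        if line == "---":
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def discover_skills(skills_dir: Path) -> dict[Path, dict[str, str]]:
    """Layer 1 (discovery): the agent sees only a menu of names and descriptions."""
    return {f: read_frontmatter(f) for f in skills_dir.glob("*/SKILL.md")}

def activate(skills: dict[Path, dict[str, str]], request: str) -> list[str]:
    """Layer 2 (activation): load full instructions only for matching skills.
    Real agents match semantically; this stand-in just overlaps keywords."""
    return [f.read_text()  # the full skill body enters context only here
            for f, meta in skills.items()
            if any(w in meta["description"].lower() for w in request.lower().split())]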

The format is simple:

---
name: skill-name
description: When this skill should activate
---

# Skill Title

Your instructions, examples, and rules here.

The description field is crucial—it's the trigger phrase. The agent semantically matches your request against all available skill descriptions to decide which ones to load. "Enforces function best practices" is vague. "Use when writing or refactoring Python functions" tells the agent exactly when to activate.
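In frontmatter terms, the difference looks like this:

# Vague - gives the agent no activation condition
description: Enforces function best practices

# Specific - names the exact situation that should trigger it
description: Use when writing or refactoring Python functions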

Skills can do far more than enforce coding standards—the community has built skills for Stripe integration, Metasploit security testing, voice agents, and even multi-agent startup automation. This article focuses on one specific use case: encoding Clean Code principles.

Let me show you how to translate Clean Code's catalog into working skills.


Building the Skills: Three Examples

Rather than catalog all 66 rules exhaustively, I'll show you three critical categories in detail. The complete implementation is at the end.

1. Comments (C1-C5): Code Should Explain Itself

Uncle Bob is famously skeptical of comments—not because documentation is bad, but because comments rot: the code changes, and the comments describing it rarely keep up.

File Reference: clean-comments/SKILL.md

---
name: clean-comments
description: Use when writing, fixing, editing, or reviewing Python comments and docstrings. Enforces Clean Code principles—no metadata, no redundancy, no commented-out code.
---

# Clean Comments

## C1: No Inappropriate Information

Comments shouldn't hold metadata. Use Git for author names, change history, 
ticket numbers, and dates. Comments are for technical notes about code only.

## C2: Delete Obsolete Comments

If a comment describes code that no longer exists or works differently, 
delete it immediately. Stale comments become "floating islands of 
irrelevance and misdirection."

## C3: No Redundant Comments

# Bad - the code already says this
i += 1  # increment i
user.save()  # save the user

# Good - explains WHY, not WHAT
i += 1  # compensate for zero-indexing in display

## C4: Write Comments Well

If a comment is worth writing, write it well:
- Choose words carefully
- Use correct grammar
- Don't ramble or state the obvious
- Be brief

## C5: Never Commit Commented-Out Code

# DELETE THIS - it's an abomination
# def old_calculate_tax(income):
#     return income * 0.15

Who knows how old it is? Who knows if it's meaningful? Delete it. 
Git remembers everything.

## The Goal

The best comment is the code itself. If you need a comment to explain 
what code does, refactor first, comment last.

2. Functions (F1-F4): Small, Focused, Obvious

Functions should do one thing, do it well, and have an obvious purpose.

File Reference: clean-functions/SKILL.md

---
name: clean-functions
description: Use when writing or refactoring Python functions. Enforces Clean Code principles—maximum 3 arguments, single responsibility, no flag parameters.
---

# Clean Functions

## F1: Too Many Arguments (Maximum 3)

# Bad - too many parameters
def create_user(name, email, age, country, timezone, language, newsletter):
    ...

# Good - use a dataclass or dict
from dataclasses import dataclass

@dataclass
class UserData:
    name: str
    email: str
    age: int
    country: str
    timezone: str
    language: str
    newsletter: bool

def create_user(data: UserData):
    ...

More than 3 arguments means your function is doing too much or needs 
a data structure.

## F2: No Output Arguments

Don't modify arguments as side effects. Return values instead.

# Bad - modifies argument
def append_footer(report: Report) -> None:
    report.append("\n---\nGenerated by System")

# Good - returns new value
def with_footer(report: Report) -> Report:
    return report + "\n---\nGenerated by System"

## F3: No Flag Arguments

Boolean flags mean your function does at least two things.

# Bad - function does two different things
def render(is_test: bool):
    if is_test:
        render_test_page()
    else:
        render_production_page()

# Good - split into two functions
def render_test_page(): ...
def render_production_page(): ...

## F4: Delete Dead Functions

If it's not called, delete it. No "just in case" code. Git preserves history.

3. General Principles (G1-G36): The Core Rules

These are the fundamental patterns that separate clean code from legacy nightmares.

File Reference: clean-general/SKILL.md

---
name: clean-general
description: Use when reviewing Python code quality. Enforces Clean Code's core principles—DRY, single responsibility, clear intent, no magic numbers, proper abstractions.
---

# General Clean Code Principles

## Critical Rules

**G5: DRY (Don't Repeat Yourself)**

Every piece of knowledge has one authoritative representation.

# Bad - duplication
tax_rate = 0.0825
ca_total = subtotal * 1.0825
ny_total = subtotal * 1.07

# Good - single source of truth
TAX_RATES = {"CA": 0.0825, "NY": 0.07}
def calculate_total(subtotal: float, state: str) -> float:
    return subtotal * (1 + TAX_RATES[state])

**G16: No Obscured Intent**

Don't be clever. Be clear.

# Bad - what does this do?
return (x & 0x0F) << 4 | (y & 0x0F)

# Good - obvious intent
return pack_coordinates(x, y)

**G23: Prefer Polymorphism to If/Else**

# Bad - will grow forever
def calculate_pay(employee):
    if employee.type == "SALARIED":
        return employee.salary
    elif employee.type == "HOURLY":
        return employee.hours * employee.rate
    elif employee.type == "COMMISSIONED":
        return employee.base + employee.commission

# Good - open/closed principle
class SalariedEmployee:
    def calculate_pay(self): return self.salary

class HourlyEmployee:
    def calculate_pay(self): return self.hours * self.rate

class CommissionedEmployee:
    def calculate_pay(self): return self.base + self.commission

**G25: Replace Magic Numbers with Named Constants**

# Bad
if elapsed_time > 86400:
    ...

# Good
SECONDS_PER_DAY = 86400
if elapsed_time > SECONDS_PER_DAY:
    ...

**G30: Functions Should Do One Thing**

If you can extract another function, your function does more than one thing.

**G36: Law of Demeter (Avoid Train Wrecks)**

# Bad - reaching through multiple objects
output_dir = context.options.scratch_dir.absolute_path

# Good - one dot
output_dir = context.get_scratch_dir()

## Enforcement Checklist

When reviewing AI-generated code, verify:
- [ ] No duplication (G5)
- [ ] Clear intent, no magic numbers (G16, G25)
- [ ] Polymorphism over conditionals (G23)
- [ ] Functions do one thing (G30)
- [ ] No Law of Demeter violations (G36)
- [ ] Boundary conditions handled (G3)
- [ ] Dead code removed (G9)

The Complete Catalog

I've translated all 66 rules from Clean Code Chapter 17 into skills spanning six categories, plus three Python-specific rules (P1-P3—P1, for instance, bans wildcard imports) that stand in for the book's Java section:


Comments (C1-C5): Minimal, accurate commenting

  • C1: No inappropriate information (metadata belongs in version control)
  • C2: Delete obsolete comments immediately
  • C3: No redundant comments that repeat the code
  • C4: Write comments well—brief, grammatical, purposeful
  • C5: Never commit commented-out code

Environment (E1-E2): One-command build and test

  • E1: Build requires only one step
  • E2: Tests require only one step

Functions (F1-F4): Small, focused, obvious

  • F1: Maximum 3 arguments (use data structures for more)
  • F2: No output arguments (return values instead)
  • F3: No flag arguments (split into separate functions)
  • F4: Delete dead functions

General (G1-G36): Core principles

  • G1: Multiple languages in one source file
  • G2: Obvious behavior is unimplemented
  • G3: Incorrect behavior at the boundaries
  • G4: Overridden safeties
  • G5: Duplication
  • G6: Code at wrong level of abstraction
  • G7: Base classes depending on their derivatives
  • G8: Too much information
  • G9: Dead code
  • G10: Vertical separation
  • G11: Inconsistency
  • G12: Clutter
  • G13: Artificial coupling
  • G14: Feature envy
  • G15: Selector arguments
  • G16: Obscured intent
  • G17: Misplaced responsibility
  • G18: Inappropriate static
  • G19: Use explanatory variables
  • G20: Function names should say what they do
  • G21: Understand the algorithm
  • G22: Make logical dependencies physical
  • G23: Prefer polymorphism to if/else or switch/case
  • G24: Follow standard conventions
  • G25: Replace magic numbers with named constants
  • G26: Be precise
  • G27: Structure over convention
  • G28: Encapsulate conditionals
  • G29: Avoid negative conditionals
  • G30: Functions should do one thing
  • G31: Hidden temporal couplings
  • G32: Don't be arbitrary
  • G33: Encapsulate boundary conditions
  • G34: Functions should descend only one level of abstraction
  • G35: Keep configurable data at high levels
  • G36: Avoid transitive navigation

Names (N1-N7): Descriptive, unambiguous, right-sized

  • N1: Choose descriptive names
  • N2: Choose names at the right abstraction level
  • N3: Use standard nomenclature where possible
  • N4: Use unambiguous names
  • N5: Use long names for long scopes
  • N6: Avoid encodings (Hungarian notation, etc.)
  • N7: Names should describe side effects

Tests (T1-T9): Fast, independent, exhaustive

  • T1: Insufficient tests—test everything that could break
  • T2: Use a coverage tool
  • T3: Don't skip trivial tests
  • T4: Ignored tests indicate ambiguity
  • T5: Test boundary conditions
  • T6: Exhaustively test near bugs
  • T7: Patterns of failure are diagnostic
  • T8: Coverage patterns can be revealing
  • T9: Tests should be fast

Get the complete skill files:

Clean Code Skills for AI Agents

License: MIT

Teach your AI to write code that doesn't suck.

This repository contains Agent Skills that enforce Robert C. Martin's Clean Code principles. They work with Google Antigravity, Anthropic's Claude Code, and any agent that supports the Agent Skills standard.

Why?

AI generates code fast, but research shows it also generates technical debt fast:

  • GitClear: 4x increase in code duplication with AI adoption
  • Carnegie Mellon: +30% static analysis warnings, +41% code complexity after Cursor adoption
  • Google DORA: Negative relationship between AI adoption and software delivery stability

These skills encode battle-tested solutions to exactly these problems—directly into your AI workflow.

What's Included

Skill | Description | Rules
boy-scout | Orchestrator—always leave code cleaner than you found it | Coordinates all skills
python-clean-code | Master skill with all 66 rules | C1-C5, E1-E2, F1-F4, G1-G36, N1-N7, P1-P3, T1-T9
clean-comments | Minimal, accurate commenting | C1-C5
clean-functions | Small, focused, obvious functions | F1-F4

The repo includes:

  • boy-scout: An orchestrator skill that embodies the Boy Scout Rule—"always leave code cleaner than you found it"—and coordinates the other skills
  • python-clean-code: A master skill with all 66 rules, plus a quick reference table and anti-patterns cheatsheet
  • Individual skills for each category (clean-comments, clean-functions, clean-general, clean-names, clean-tests)—drop in only what you need
  • Installation instructions for Antigravity, Claude Code, and other Agent Skills-compatible tools

How to Use These Skills

Skills sit in a specific place in the agent ecosystem. Rules are passive guardrails that are always on. Skills are agent-triggered—the model decides when to equip them based on your intent. If you're using MCP servers (connections to external tools like GitHub or Postgres), think of MCP as the "hands" and skills as the "brains" that direct them.

For Antigravity

  1. Create .agent/skills/ in your project root (or ~/.gemini/antigravity/skills/ for global access)
  2. Save the skill as a folder with a SKILL.md file inside (e.g., .agent/skills/python-clean-code/SKILL.md)
  3. Ask the agent to review or write code—it'll automatically apply the rules when relevant
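
A project-local setup ends up looking like this (using the skill names from the repo):

.agent/
└── skills/
    ├── python-clean-code/
    │   └── SKILL.md
    ├── clean-comments/
    │   └── SKILL.md
    └── clean-functions/
        └── SKILL.md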

Global vs Project Skills

  • Project-specific: .agent/skills/
  • Global Antigravity: ~/.gemini/antigravity/skills/

The agent only loads full skill content when needed, so comprehensive skills don't slow down simple requests.

Going Further

The skills in this article are instruction-only—they tell the agent what to do. For stricter enforcement, you could add a scripts/ folder with a linter that compatible agents run automatically, or an examples/ folder with before/after code samples for few-shot learning. The format supports it; we're just keeping things simple here.
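
To give a flavor of the stricter option, here's a minimal sketch of such a scripts/ linter—an illustration under my own assumptions, not a file from the published repo—that uses Python's ast module to flag two rules from the catalog (F1: too many arguments, G25: magic numbers):

import ast
import sys
from pathlib import Path

MAX_ARGS = 3          # F1: more than 3 arguments wants a data structure
ALLOWED = {0, 1, -1}  # G25: any other bare number should be a named constant

def lint(path: Path) -> list[str]:
    """Report F1 and G25 violations in one Python source file."""
    issues = []
    for node in ast.walk(ast.parse(path.read_text(), filename=str(path))):
        if isinstance(node, ast.FunctionDef) and len(node.args.args) > MAX_ARGS:
            issues.append(f"{path}:{node.lineno} F1: '{node.name}' takes "
                          f"{len(node.args.args)} arguments")
        # Naive on purpose: this also flags the right-hand side of
        # constant definitions like SECONDS_PER_DAY = 86400.
        if (isinstance(node, ast.Constant)
                and type(node.value) in (int, float)
                and node.value not in ALLOWED):
            issues.append(f"{path}:{node.lineno} G25: magic number {node.value}")
    return issues

if __name__ == "__main__":
    problems = [msg for arg in sys.argv[1:] for msg in lint(Path(arg))]
    print("\n".join(problems) or "clean")
    sys.exit(1 if problems else 0)

An agent that executes skill scripts could run something like this after each edit and feed the findings into its next pass.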


A Real-World Example

Here's code that violates multiple Clean Code rules:

from utils import *  # P1
# Author: John, Modified: 2024-01-15  # C1
def proc(d, t, flag=False):  # N1, F1, F3
    # Process the data  # C3
    x = []  # N1
    for i in d:
        if flag:  # F3
            if i['type'] == 'A':  # G23
                x.append(i['val'] * 1.0825)  # G25
            elif i['type'] == 'B':
                x.append(i['val'] * 1.05)  # G25
        else:
            x.append(i['val'])
    with open(f'/tmp/{t}.json', 'w') as f:  # G6
        json.dump(x, f)
    # Old approach  # C5
    # for item in d:
    #     print(item)
    return x

Violations: P1, C1, C3, C5, F1, F3, G6, G23, G25, N1

With the Clean Code skill active, ask your AI agent to refactor this:

import json
from pathlib import Path
from typing import List, Literal
from dataclasses import dataclass

TAX_RATE_CA = 0.0825
TAX_RATE_NY = 0.05
TransactionType = Literal['CA', 'NY']

@dataclass
class Transaction:
    value: float
    type: TransactionType

def apply_tax(transaction: Transaction) -> float:
    """Apply state-specific tax to transaction value."""
    tax_rates = {'CA': TAX_RATE_CA, 'NY': TAX_RATE_NY}
    return transaction.value * (1 + tax_rates[transaction.type])

def process_transactions_with_tax(
    transactions: List[Transaction]
) -> List[float]:
    """Calculate taxed values for all transactions."""
    return [apply_tax(t) for t in transactions]

def process_transactions_without_tax(
    transactions: List[Transaction]
) -> List[float]:
    """Extract raw values from all transactions."""
    return [t.value for t in transactions]

def save_results(values: List[float], output_path: Path) -> None:
    """Save processed values to JSON file."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open('w') as f:
        json.dump(values, f)

The refactored version:

  • ✅ No wildcard imports (P1)
  • ✅ No metadata comments (C1)
  • ✅ No redundant comments (C3)
  • ✅ No commented-out code (C5)
  • ✅ Descriptive names (N1)
  • ✅ No flag arguments (F3)
  • ✅ Named constants instead of magic numbers (G25)
  • ✅ Functions do one thing (G30)
  • ✅ Polymorphism through data structure (G23)

Anatomy of a Vibe-Coded Script

Remember the duplicated function I mentioned in Torvalds' AudioNoise visualizer? Here it is:

def update_slider_text(self, val):
    """Helper to update slider texts (Width and End Point)."""
    start_val, end_val = val
    width = end_val - start_val

def update_slider_text(self, val):
    """Helper to update slider texts (Width and End Point)."""
    start_val, end_val = val
    width = end_val - start_val

    if self.x_mode == 'Time':
        self.slider.valtext.set_text(f"Window: {start_val:.3f} + {width:.3f} s")
    else:
        self.slider.valtext.set_text(f"Window: {int(start_val)} + {int(width)}")

The first definition unpacks values, calculates width, then... returns None. The second definition is the real implementation. Python silently overwrites the first with the second, so the code runs. But it's textbook dead code—Clean Code rule G9: Remove dead code.
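
This smell is also mechanically detectable. A quick illustration (my own snippet, not anything from the AudioNoise repo) using Python's ast module:

import ast
from collections import Counter

def duplicate_definitions(source: str) -> list[str]:
    """Find functions defined more than once in the same module or class body (G9)."""
    dupes = []
    for scope in ast.walk(ast.parse(source)):
        if isinstance(scope, (ast.Module, ast.ClassDef)):
            counts = Counter(
                node.name for node in scope.body
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)))
            dupes += [name for name, n in counts.items() if n > 1]
    return dupes

Off-the-shelf tools catch it too—pyflakes reports it as a redefinition, and pylint flags it as function-redefined—a reminder that skills complement conventional linting rather than replace it.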

With the skill active, an agent refactors the entire 600-line script. The duplicate vanishes, magic numbers become constants, and nested functions get extracted into focused methods:

def update_slider_text(self, val: tuple[float, float]):
    """Update slider text with either time or sample count."""
    start_val, end_val = val
    width = end_val - start_val

    if self.x_mode == 'Time':
        self.slider.valtext.set_text(f"Window: {start_val:.3f} + {width:.3f} s")
    else:
        self.slider.valtext.set_text(f"Window: {int(start_val)} + {int(width)}")


The refactored version:

  • ✅ Dead code removed (G9)
  • ✅ Type hints added (clarity)
  • ✅ Single, authoritative definition (G5)
  • ✅ Magic numbers extracted to constants (G25)
  • ✅ Large methods decomposed (G30)

The full diff shows 600+ lines reduced to ~440—not by removing functionality, but by eliminating duplication and extracting reusable patterns.


Why This Matters Now

Vibe coding isn't going away. AI will get better at generating code, not worse. But "better at generating" doesn't mean "better at maintaining."

The research is clear: AI produces code faster, but that code accumulates technical debt faster too. Without guard rails, we're building tomorrow's legacy systems today.

Uncle Bob's Clean Code principles are almost 20 years old, but they're exactly what we need now. They're not arbitrary style preferences—they're battle-tested solutions to the problems AI recreates at scale.

Skills give you the mechanism to encode these rules directly into your AI workflow. Whether you're using Antigravity, Claude Code, or another agent, the approach is the same: define what clean code means, then let the AI follow the rules.

Your agent doesn't know what good code looks like unless you tell it.

So tell it.


Resources

The Book

  • Clean Code by Robert C. Martin: Amazon

Skills Documentation

Research Cited

Get the Skills

The future of programming is human intent translated by AI. Make sure the translation preserves quality, not just speed.
