lin zhongjiong

Posted on Jun 13

How to Build Autonomous AI Agent Skills for Claude Code

#ai #claude #programming #tutorial

How to Build Autonomous AI Agent Skills for Claude Code: A Practical Guide

2000 words · 8 min read

Last week I ran an experiment: can an AI agent, running autonomously as a set of Claude Code skills, actually make money on the internet? The answer so far is nuanced — and the journey taught me more about skill design than any documentation could.

Here's what I learned about building effective, autonomous Claude Code skills that actually ship work.

What Claude Code Skills Actually Are

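Forget the marketing. A "skill" in Claude Code is simply a markdown file with instructions that the AI follows. A personal skill lives in its own folder under ~/.claude/skills/ as a SKILL.md file. Its name and description are loaded at session start, and the full instructions are pulled in when the skill is relevant to the task. That's it.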

```markdown
---
name: my-skill
description: What this skill does
---

# My Skill

## Steps
1. Do this
2. Then that
```

The magic isn't in the format. It's in how you structure the instructions. A well-written skill makes the agent 10x more effective. A badly-written one produces endless loops of confusion.
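
Because the format is this small, it is easy to sanity-check a skill file before handing it to the agent. The following is a minimal Python sketch, not part of Claude Code: the function names `parse_frontmatter` and `missing_fields` are hypothetical, and the parser only understands flat `key: value` pairs (real frontmatter is YAML, so a YAML library would be more robust).

```python
# Hypothetical helper: checks that a skill file has the two frontmatter
# fields shown in the example above. Not an official Claude Code tool.
REQUIRED_FIELDS = ("name", "description")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return the flat `key: value` pairs between the leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' line")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Indented lines (nested YAML) are ignored in this sketch.
        if ":" in line and not line.startswith((" ", "\t")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' line")


def missing_fields(text: str) -> list[str]:
    """List required fields that are absent or empty."""
    fields = parse_frontmatter(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


if __name__ == "__main__":
    sample = "---\nname: my-skill\ndescription: What this skill does\n---\n\n# My Skill\n"
    print(missing_fields(sample))
```

An empty list means both fields are present; anything else names what still needs to be filled in.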

The Three Layers of Effective Skills

After building several skills that actually ship work (and several that don't), I've found effective skills need three layers:

Layer 1: The Guardrails

The most important part of any autonomous skill is what it should NOT do. Without explicit guardrails, agents drift. They optimize for the wrong thing. They burn tokens on dead ends.

My money-making experiment taught me this painfully. The translation skill (L1) worked because it had hard constraints: 2-day deadline, 500K token budget, stop if no PRs merged in 1.5 days. The bounty-hunting skill (L2) burned tokens faster because I didn't constrain the search space tightly enough.

Every effective skill I've built starts with:

Deadline: When to stop trying
Token budget: Maximum burn before circuit-breaker
Success criteria: What "done" means (be specific)
Failure modes: What to do when things go wrong

```markdown
## Stop-Loss
- Max token budget: 500,000
- Circuit breaker: if 0 PRs merged after 1.5 days, stop
- Deadline: 2 days
```
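
To make the three stop-loss rules concrete, here is a rough Python sketch of the same logic. In the skill itself these rules are plain instructions that the agent follows; the `StopLoss` class and `should_stop` function are hypothetical names used only for illustration, with the defaults taken from the numbers above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StopLoss:
    """Limits from the Stop-Loss section (illustrative container)."""
    max_tokens: int = 500_000     # max token budget
    deadline_days: float = 2.0    # hard deadline
    breaker_days: float = 1.5     # circuit-breaker checkpoint


def should_stop(limits: StopLoss, tokens_used: int,
                days_elapsed: float, prs_merged: int) -> Optional[str]:
    """Return the reason to stop, or None if the run may continue."""
    if tokens_used >= limits.max_tokens:
        return "token budget exhausted"
    if days_elapsed >= limits.deadline_days:
        return "deadline reached"
    if days_elapsed >= limits.breaker_days and prs_merged == 0:
        return "circuit breaker: no PRs merged"
    return None


if __name__ == "__main__":
    # 1.6 days in, nothing merged: the circuit breaker trips.
    print(should_stop(StopLoss(), tokens_used=100_000, days_elapsed=1.6, prs_merged=0))
```

The checks run from the hardest limit to the softest, so a run that has blown its budget reports that first even if the deadline has also passed.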

Layer 2: The Workflow

Autonomous agents need explicit decision trees. Not "try your best" — that's where they fail. They need "if X, do Y. If not X, do Z."

Here's the workflow that worked for my translation skill:

```markdown
### Phase 1: Find Opportunities (Day 1 morning)
1. Search GitHub for repos with "help wanted" + "translation" labels
2. Filter: last commit within the past week, fewer than 500 lines to translate
3. Prioritize Chinese-English pairs (native advantage)

### Phase 2: Execute (Day 1)
1. Fork the repo
2. Pick 1-3 small issues (quality over quantity)
3. Translate and submit PR

### Phase 3: Follow Up (Day 2)
1. Check PR status every 4-6 hours
2. Respond to maintainer feedback within 1 hour
3. If merged: check for more translation needs
4. If stale after 24h: ping once, then move on
```

The key insight: each phase has a clear deliverable and a time box. The agent knows exactly what "done" looks like for each step.
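
The Phase 3 rules are a small decision tree, and writing them as code shows how little is left to interpretation. This is an illustrative Python sketch only: the state names (`"merged"`, `"changes_requested"`, `"open"`) and the function `follow_up_action` are assumptions made for the example, not values returned by any particular GitHub API.

```python
def follow_up_action(state: str, hours_since_activity: float,
                     already_pinged: bool) -> str:
    """Map a PR's situation to the next Phase 3 action.

    `state` is a hypothetical label: "merged", "changes_requested", or "open".
    """
    if state == "merged":
        return "check for more translation needs"
    if state == "changes_requested":
        return "respond to feedback within 1 hour"
    if hours_since_activity >= 24:
        # Stale PR: ping exactly once, then stop spending tokens on it.
        return "move on" if already_pinged else "ping maintainer once"
    return "wait and re-check in 4-6 hours"


if __name__ == "__main__":
    print(follow_up_action("open", hours_since_activity=30, already_pinged=False))
```

Every branch ends in one concrete action, which is the property the skill text is trying to guarantee.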

Layer 3: The Reflection Loop

This is where most skills fall short. An autonomous agent needs to learn from its own output. After each run, my skills append a reflection:

```markdown
## Reflection
1. Income: actual vs expected, gap reason
2. Cost: token consumption, most expensive steps
3. Bottleneck: where it got stuck
4. Reusable assets: what carries forward
5. Adjustment: what to change if retried
6. Verdict: profitable / not profitable / uncertain
```

This reflection feeds directly into the NEXT skill. My L2 (bounty hunting) was better than L1 because it inherited L1's GitHub workflow and guardrail patterns. Each skill compounds on the previous ones.
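
If you want the reflection in a consistent shape from run to run, the six questions map naturally onto a small record type. The sketch below is one possible way to do that in Python; the `Reflection` class and `append_reflection` function are hypothetical, and the field values in the usage example are placeholders rather than real results.

```python
from dataclasses import dataclass
from pathlib import Path

VERDICTS = {"profitable", "not profitable", "uncertain"}


@dataclass
class Reflection:
    """The six reflection questions as fields (illustrative)."""
    income: str
    cost: str
    bottleneck: str
    reusable_assets: str
    adjustment: str
    verdict: str

    def __post_init__(self) -> None:
        if self.verdict not in VERDICTS:
            raise ValueError(f"verdict must be one of {sorted(VERDICTS)}")

    def to_markdown(self) -> str:
        """Render the reflection as the numbered list used in the skill file."""
        items = [
            ("Income", self.income),
            ("Cost", self.cost),
            ("Bottleneck", self.bottleneck),
            ("Reusable assets", self.reusable_assets),
            ("Adjustment", self.adjustment),
            ("Verdict", self.verdict),
        ]
        lines = ["## Reflection"]
        lines += [f"{n}. {label}: {value}" for n, (label, value) in enumerate(items, start=1)]
        return "\n".join(lines) + "\n"


def append_reflection(skill_path: Path, reflection: Reflection) -> None:
    """Append the rendered reflection to the end of a skill file."""
    with skill_path.open("a", encoding="utf-8") as fh:
        fh.write("\n" + reflection.to_markdown())


if __name__ == "__main__":
    # Placeholder values, not measurements.
    demo = Reflection("<income>", "<cost>", "<bottleneck>",
                      "<assets>", "<adjustment>", "uncertain")
    print(demo.to_markdown())
```

Restricting the verdict to three values keeps later runs comparable, since a free-text verdict is hard to scan across a growing skill library.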

Progressive Difficulty: Why You Can't Skip Steps

The most important architectural decision in my experiment was progressive difficulty. Don't start with the hardest problem. Start with the easiest one that has the highest certainty of success, then use those learnings to tackle harder problems.

My ladder, with the merge rates and costs I expected going in:

GitHub translations → 95% merge rate, near-zero token cost
Bug bounties → 70% merge rate, medium cost
Content writing → variable success, medium cost
Digital products → higher upfront, recurring potential

This isn't just risk management. It's skill compounding. Each level produces patterns, workflows, and guardrails that the next level inherits.

What Actually Worked (And What Didn't)

Three days into the experiment, here's the honest scorecard:

Worked: GitHub translations. Found an open issue requesting Chinese README translation for an active repo. Translated 17KB/334 lines in under 2 hours. Fork → branch → PR → submitted. Total token cost: ~92K tokens (~$0.13). The PR is live and awaiting maintainer review.

Failed: Bug bounties. Spent 81K tokens scanning the bounty market and found zero actionable opportunities. GitHub's "bounty" label is 80% scams. Legitimate platforms like Algora and ProjectDiscovery have zero open bounties — AI agents snap them up within hours. Expensify has 176 open $250 bounties but every single one is already assigned. The bounty market, as of June 2026, is structurally broken for autonomous agents.

This failure was actually the most valuable result. It revealed a market truth that no amount of skill optimization could overcome: when 8-158 AI agents compete for every bounty, being good isn't enough. You need to be first.

The Token Economics Nobody Talks About

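Every decision in autonomous skill design comes down to token economics. At commercial API pricing, every 100K tokens costs real money. At the rates below, 500K tokens costs roughly $0.25 to $1.00, so the spend per run is small, but a skill that burns its whole budget and earns nothing is still a losing proposition.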

The math that guides my skill design:

Input tokens (~$0.50/MTok): reading code, searching, analyzing
Output tokens (~$2.00/MTok): generating code, writing content

Translation is profitable because it's output-heavy on a known input (one English README = one Chinese README). Bug fixing is risky because you might read 50 files to change 3 lines.

The most token-efficient pattern I've found: read once, produce once. Fetch the source, translate/write/fix it, ship it. Every additional round-trip (reading more files, searching again, asking clarifying questions) erodes the margin.
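
The arithmetic behind these numbers is simple enough to keep in a helper. Here is a short Python sketch using the approximate rates quoted above; the function names are hypothetical, and because the article does not give the input/output split for any run, the second function only returns the all-input and all-output bounds.

```python
# Approximate rates from the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 0.50
OUTPUT_USD_PER_MTOK = 2.00


def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one run given its input and output token counts."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def cost_bounds_usd(total_tokens: int) -> tuple[float, float]:
    """(cheapest, priciest) cost when only the total token count is known."""
    return run_cost_usd(total_tokens, 0), run_cost_usd(0, total_tokens)


if __name__ == "__main__":
    low, high = cost_bounds_usd(92_000)
    print(f"92K tokens costs between ${low:.3f} and ${high:.3f}")
```

For the ~92K-token translation run, the bounds come out to about $0.05 and $0.18, which brackets the ~$0.13 reported above. The full 500K budget lands between $0.25 and $1.00.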

Building Your First Skill: A Template

Here's the template I use for every new skill. Customize the parts in brackets:

```markdown
---
name: [skill-name]
description: [one-line purpose]
metadata:
  deadline: [N days]
  max_token_budget: [N tokens]
---

# [Skill Title]

## Strategy
[What you're doing and why this approach]

## Success Criteria
- [Specific, measurable outcome]
- [What "profitable" means in numbers]

## Stop-Loss
- Max token budget: [N]
- Circuit breaker: [early stop condition]

## Steps
### Phase 1: [Name] (Time box)
1. [Specific action]
2. [Specific action]

### Phase 2: [Name] (Time box)
1. [Specific action]

## Execution Log
| Timestamp | Action | Tokens In | Tokens Out | Result |
|-----------|--------|-----------|------------|--------|

## Reflection
(Appended after each run: 6 questions)
```
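
The Execution Log table is also what makes the token budget checkable after the fact. As a rough sketch, the Python below sums the two token columns from a filled-in skill file. The function name `log_token_totals` is hypothetical, and it assumes the five-column layout from the template with plain integers (optionally with thousands separators) in the token cells.

```python
def log_token_totals(markdown: str) -> tuple[int, int]:
    """Sum 'Tokens In' and 'Tokens Out' from the Execution Log table."""
    total_in = total_out = 0
    in_log = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Only rows under the Execution Log heading are counted.
            in_log = stripped == "## Execution Log"
            continue
        if not in_log or not stripped.startswith("|"):
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        if len(cells) != 5:
            continue
        try:
            tokens_in = int(cells[2].replace(",", ""))
            tokens_out = int(cells[3].replace(",", ""))
        except ValueError:
            continue  # header row or |---| separator row
        total_in += tokens_in
        total_out += tokens_out
    return total_in, total_out


if __name__ == "__main__":
    # Made-up rows for demonstration only.
    demo = (
        "## Execution Log\n"
        "| Timestamp | Action | Tokens In | Tokens Out | Result |\n"
        "|-----------|--------|-----------|------------|--------|\n"
        "| t1 | fetch README | 1,000 | 200 | ok |\n"
    )
    print(log_token_totals(demo))
```

Comparing the two totals against the Stop-Loss budget gives a quick check on whether a run stayed inside its limits.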

The Real Lesson

Building autonomous AI agents isn't about prompt engineering. It's about systems design. The skills that work aren't the cleverest ones — they're the ones with the clearest guardrails, the tightest feedback loops, and the most honest reflection cycles.

My experiment hasn't made money yet. But the skills it's producing — each one a standalone module with embedded strategy, execution log, and reflection — are compounding assets. Even if this particular set of money-making attempts doesn't hit net positive, the skill library it's generating will.

And that might be the most honest answer to "can AI agents make money autonomously": not yet, but they're learning how to learn.

This article was written as part of an ongoing experiment in autonomous AI agent economics. Follow the experiment at [GitHub link].
