DEV Community

AI Operator

How I Built a Custom Skill System for Claude Code That Autonomously Runs My Content Business

I want to show you something I built that genuinely surprised me with how far it went.

A few months ago I started using Claude Code not just as a coding assistant, but as the execution layer for an entire content and monetization business. I built a custom skill architecture on top of it, and now it runs daily content operations, community growth, SEO monitoring, and revenue optimization largely without me.

This is not a "Claude wrote my blog posts" story. This is about building a persistent, stateful agent system that executes multi-step workflows, learns from previous runs, and schedules itself.

Here's the full architecture.


The Problem With Existing AI Workflows

Most AI automation setups fall into one of two traps:

  1. One-shot prompts - You write a mega-prompt, get output, move on. No state. No learning. Every run is from scratch.
  2. Rigid pipelines - Tools like n8n or Make.com give you workflow automation but the AI step is just a black box call, with no way to carry context, adapt behavior, or self-improve.

What I wanted was something closer to having a junior operator who remembers what was done yesterday, knows the brand voice, can look up past articles to avoid repeating topics, and improves their own playbook when something underperforms.

Claude Code's file access and tool-use architecture made this possible.


The Core Architecture: Skills as YAML-Fronted Markdown

Everything runs on what I call "skills" - Markdown files with YAML frontmatter that live at skills/&lt;name&gt;/SKILL.md. Here's a simplified example:

```markdown
---
name: rr-article-writer
description: Write SEO-optimised articles for RoboRhythms.com
---

# Article Writer Skill

## Trigger Conditions
Use when asked to write or draft a blog post for the site.

## Process

### Step 1: Research
- Check article-history.md to avoid duplicate topics
- Search for keyword volume data using DataForSEO MCP
- Pull top competitor content via NotebookLM

### Step 2: Write
- 2000-3500 words
- First-person sections where relevant
- No em dashes (brand rule)
- Numbered lists for any process with 3+ steps

### Step 3: Audit
- Run rr-auditor skill before publishing
- Check: title length, meta description, internal links

### Step 4: Publish
- POST to WordPress REST API via execution/publish_to_wordpress.py
- Update article-history.md with new entry
```

A deploy.py script processes these files and registers them as plugins in the Claude Code plugin system. After deploying, Claude sees them as callable skills.

The power here is that the "prompt" for each workflow is version-controlled, improvable, and can reference other files (like article-history.md) that persist state across sessions.
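The deploy step can be sketched with a minimal frontmatter parser. This is an illustrative reconstruction, not the real deploy.py: the registry shape and the `build_registry` helper are my assumptions, and a production version would use a proper YAML library instead of the simple key-value split below.

```python
# Hypothetical sketch of the deploy step: scan skills/*/SKILL.md,
# parse the YAML frontmatter, and emit a registry the plugin system can load.
import json
from pathlib import Path

def parse_frontmatter(text: str) -> dict:
    """Extract flat key: value pairs from a '---' delimited YAML block."""
    meta = {}
    if text.startswith("---"):
        block = text.split("---", 2)[1]
        for line in block.strip().splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return meta

def build_registry(skills_dir: str = "skills") -> list[dict]:
    """One registry entry per SKILL.md, keyed by its frontmatter name."""
    registry = []
    for skill_file in sorted(Path(skills_dir).glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_file.read_text(encoding="utf-8"))
        registry.append({
            "name": meta.get("name", skill_file.parent.name),
            "description": meta.get("description", ""),
            "path": str(skill_file),
        })
    return registry

if __name__ == "__main__":
    print(json.dumps(build_registry(), indent=2))
```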


The Execution Layer: Python Scripts as Tools

Claude Code can run bash commands, but raw bash gets messy fast for anything stateful. I wrote a library of Python execution scripts that handle the heavy lifting:

| Script | Purpose |
| --- | --- |
| publish_to_wordpress.py | WordPress REST API: create or PATCH posts |
| generate_reddit_report.py | Session PDF with stat cards and action breakdown |
| generate_forum_report.py | Forum outreach PDF with warmup progress |
| fal_generate.py | fal.ai FLUX.1 image generation for blog posts |
| create_youtube_video.py | ElevenLabs TTS + Kling image-to-video + ffmpeg assembly |

Each script accepts clean arguments (`--post-id`, `--config`, etc.) so Claude can call them with specific parameters without needing to construct raw API payloads every time.

This separation matters. Keeping the "what to do" in SKILL.md and the "how to execute" in Python scripts means I can upgrade the execution layer without touching the agent logic.
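The argument surface of an execution script looks roughly like this. It's a sketch of the pattern, not the real publish_to_wordpress.py: the flag names beyond `--post-id` and the `build_parser`/`main` helpers are illustrative assumptions.

```python
# Hypothetical argument surface for an execution script, so the agent can
# call it with clean flags instead of constructing raw API payloads.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Publish a post draft to WordPress.")
    parser.add_argument("--post-id", type=int, default=None,
                        help="Existing post ID to update; omit to create a new draft")
    parser.add_argument("--title", required=True)
    parser.add_argument("--content-file", required=True,
                        help="Path to the rendered post body")
    parser.add_argument("--status", choices=["draft", "publish"], default="draft")
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    # The real script would POST to /wp-json/wp/v2/posts at this point.
    action = "update" if args.post_id else "create"
    print(f"{action}: {args.title} ({args.status})")
    return args

if __name__ == "__main__":
    main()
```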


Persistent Memory: Two Layers

The system uses two memory layers that solve different problems.

Layer 1: File-based reference data

Simple Markdown and JSON files that Claude reads at the start of relevant tasks:

  • article-history.md - Every published article with title, date, primary keyword, and word count. Claude checks this before writing to avoid repeating topics.
  • affiliate-links.md - The canonical list of all affiliate links organized by category. Claude picks relevant ones when writing rather than having to ask.
  • subreddit-state.json - Post rotation state for the community subreddit so content doesn't repeat.
  • forum-queue.json - Pending forum replies that were researched but not yet posted.

These files act as long-term memory that survives session restarts.
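The duplicate-topic check from article-history.md reduces to a few lines. The one-bullet-per-article history format and the word-overlap threshold below are my assumptions for illustration; the real file layout may differ.

```python
# Minimal sketch of the duplicate-topic check, assuming article-history.md
# keeps one bullet per article ("- 2026-03-19 | Title | keyword | 2400 words").
from pathlib import Path

def load_history_titles(path: str = "article-history.md") -> list[str]:
    """Pull lowercase titles out of the pipe-delimited history bullets."""
    titles = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.startswith("- ") and "|" in line:
            titles.append(line.split("|")[1].strip().lower())
    return titles

def is_duplicate(candidate: str, titles: list[str], overlap: float = 0.6) -> bool:
    """Flag a candidate topic whose word overlap with any past title is high."""
    cand = set(candidate.lower().split())
    for title in titles:
        words = set(title.split())
        if words and len(cand & words) / len(cand | words) >= overlap:
            return True
    return False
```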

Layer 2: claude-mem (SQLite + Chroma vectors)

For richer cross-session memory, I run a local claude-mem server (SQLite + Chroma vector DB) on port 37777. This stores "observations" - tagged records of past work, decisions, bugs found, and patterns discovered. Each observation has a token cost estimate so I can load exactly what I need without blowing context.

When starting a new session, a startup hook loads a summary index of recent observations (around 7k tokens). I can fetch specific records by ID or do semantic search when I need the full context of a past decision.

The key insight: you don't need to load all your memory upfront. A semantic index costs 7k tokens; full recall of a specific past decision costs maybe 200-400 tokens. This keeps the context window available for actual work.


The Scheduler: A Daily Briefing System

Every session starts with a SessionStart hook that runs agents/daily_briefing.py. This script:

  1. Reads agents/schedule.json which tracks every recurring task, its cadence in days, and when it last ran
  2. Calculates what's overdue, due today, and coming up
  3. Fetches breaking AI news from Perplexity API (for content inspiration)
  4. Summarizes skill quality metrics (rolling average scores from the evaluator)
  5. Prints the whole thing as a formatted briefing in Claude's context window

Here's a simplified schedule.json entry:

```json
{
  "id": "rr-article",
  "name": "RoboRhythms Article",
  "skill": "reddit-article-pipeline",
  "cadence_days": 3,
  "last_run": "2026-03-19",
  "active": true
}
```

The briefing shows up at the start of every Claude session. It means I never have to think "what should Claude work on today" - the system surfaces that automatically.

After any task completes, Claude updates last_run in the schedule file. This keeps the briefing accurate without any external cron jobs or database.
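Given that schedule.json shape, the due-date bucketing in the briefing step reduces to simple date arithmetic. The `classify_tasks` helper is a sketch of the logic, not the real daily_briefing.py.

```python
# Sketch of the due-date logic, using the schedule.json fields shown above
# (cadence_days, last_run, active).
from datetime import date, timedelta

def classify_tasks(schedule: list[dict], today: date) -> dict:
    """Bucket active tasks into overdue / due today / upcoming."""
    buckets = {"overdue": [], "due_today": [], "upcoming": []}
    for task in schedule:
        if not task.get("active", True):
            continue  # inactive tasks never surface in the briefing
        next_due = date.fromisoformat(task["last_run"]) + timedelta(days=task["cadence_days"])
        if next_due < today:
            buckets["overdue"].append(task["name"])
        elif next_due == today:
            buckets["due_today"].append(task["name"])
        else:
            buckets["upcoming"].append(task["name"])
    return buckets
```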


The Skill Pipeline: How an Article Gets Made

Here's the full article production pipeline to give you a concrete example of how these pieces fit together:

```text
reddit-article-pipeline
  - Scans target subreddits for trending threads
  - Identifies a high-engagement topic with SEO potential
  - rr-notebooklm-prep
      - Fetches top 5 competitor articles on the topic
      - Uploads them to NotebookLM as sources
      - Extracts research brief: key facts, content gaps, angle ideas
  - rr-article-writer
      - Checks article-history.md (no duplicate topics)
      - Writes 2000-3500 word draft
      - Embeds internal links + affiliate links where relevant
  - rr-auditor
      - Scans for: em dashes, banned words, missing numbered lists
      - Checks title length, meta description
      - Fixes issues in-place
  - rr-image-generator
      - Generates 3 JPEG images via fal.ai FLUX.1
  - rr-publisher
      - POSTs to WordPress REST API as draft
      - Sends email with edit link for review
```

The whole pipeline runs in a single Claude session. What used to take 3-4 hours of manual work runs in about 20-30 minutes with minimal oversight.


The Self-Improvement Loop

This is the part I'm most proud of: the system evaluates its own outputs and improves itself.

After each skill run, a skill-evaluator task runs against a rubric file (EVALS.md) that defines scoring dimensions for each skill. For rr-article-writer, dimensions include: keyword presence, internal link count, first-person voice, list formatting, word count, and zero banned words.

Scores get logged to skills/eval/results/eval-log.json. A separate skill-improver task runs daily and:

  1. Reads the eval log
  2. Identifies dimensions scoring below threshold (< 8.5)
  3. Edits the relevant SKILL.md to add explicit rules addressing the failure mode
  4. Logs the change as a formal experiment with hypothesis and expected impact

Skills that consistently score 9.5+ for 3+ runs get marked as "graduated" and the improver leaves them alone.

I've watched skills go from 7.2 to 9.4 over two weeks of automatic iteration. The system catches its own failure patterns and encodes fixes into the playbook.
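The improver's first pass - find dimensions averaging below 8.5 over recent runs - can be sketched like this. The eval-log record shape (`skill` plus a per-dimension `scores` dict) is an assumption on my part, not the documented format.

```python
# Sketch of the improver's scan: read the eval log and surface skill
# dimensions whose rolling average falls below the 8.5 threshold.
import json
from collections import defaultdict

def weak_dimensions(log_path="skills/eval/results/eval-log.json",
                    threshold=8.5, window=5):
    """Return {(skill, dimension): rolling_avg} for underperforming dimensions."""
    with open(log_path, encoding="utf-8") as f:
        runs = json.load(f)
    scores = defaultdict(list)
    for run in runs:
        for dim, score in run["scores"].items():
            scores[(run["skill"], dim)].append(score)
    weak = {}
    for key, values in scores.items():
        recent = values[-window:]  # only the last few runs count
        avg = sum(recent) / len(recent)
        if avg < threshold:
            weak[key] = round(avg, 2)
    return weak
```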


Community Growth: The Reddit + Subreddit System

Alongside content production, the system runs two community channels:

Reddit warmup account: A warmup protocol that restricts to value-only replies (no product links) until the account is 14+ days old with 20+ posts. The reddit-growth skill finds high-intent threads in target subreddits, evaluates each thread for relevance and recent activity, writes a genuine reply that fits the thread's voice, and queues it for posting. Firefox-based Playwright handles the actual posting (no Reddit API, which avoids rate limits and suspicion).

Dedicated subreddit: A community subreddit I'm growing. The subreddit-builder skill rewrites popular posts from related subreddits into fresh original content, maintains a 90/10 helpful-to-promotional content ratio, and enforces a brand mention cadence (organic brand drop every ~6 posts). State is tracked in a JSON file to avoid repeating post formats or topics.

Both run daily and take about 15-20 minutes of autonomous execution.
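The rotation-state idea can be sketched as a small read-modify-write over a JSON file. The format names and the avoid-last-three window here are illustrative, not the real subreddit-state.json schema.

```python
# Hedged sketch of rotation state: persist which post formats ran recently
# so the next run picks something fresh. Names are hypothetical.
import json
from pathlib import Path

FORMATS = ["guide", "discussion", "news-reaction", "tool-roundup", "q-and-a"]

def next_format(state_path: str = "subreddit-state.json", avoid_last: int = 3) -> str:
    """Pick the first format not used in the last few posts, then persist it."""
    path = Path(state_path)
    state = json.loads(path.read_text()) if path.exists() else {"recent": []}
    recent = state["recent"][-avoid_last:]
    choice = next(f for f in FORMATS if f not in recent)
    state["recent"] = (state["recent"] + [choice])[-10:]  # cap history length
    path.write_text(json.dumps(state, indent=2))
    return choice
```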


The MCP Layer: What Unlocked the Real Power

Model Context Protocol (MCP) servers are what made external integrations clean. Instead of Claude having to construct raw API requests, MCP servers expose tools that Claude can call like functions.

The stack running in this system:

  • gmail MCP - Send reports, receive leads, automate email workflows
  • Google Search Console MCP (custom Python server) - Real click/impression/ranking data for SEO decisions
  • DataForSEO MCP - Live SERP data, keyword research, competitor analysis
  • NotebookLM MCP - Feed competitor content as sources, extract structured research briefs
  • Perplexity MCP - Real-time web search for news and trending topics
  • playwright-firefox MCP - Headless browser for any site that requires authentication or doesn't have an API
  • WhatsApp MCP - A Go bridge + Python MCP layer that lets Claude send WhatsApp messages and respond to them (for idea capture on mobile)

The Playwright MCP is particularly powerful - it's the escape hatch for anything without an API. Forum posting, publishing articles, product management - if it works in a browser, it works as an MCP tool call.


Results After Running This System

After several weeks of operation:

  • Articles published: Regular cadence of long-form SEO content, each going through the full pipeline
  • Reddit account: In warmup phase, building post history before link drops
  • Community subreddit: Growing with daily original posts, no reused content
  • Forum outreach: Active across 5 multilingual forums in warmup phase
  • Skill scores: Most core skills now averaging 9.0+, with subreddit builder at 9.9

The system runs largely overnight. I wake up to briefings, published drafts waiting for review, and execution reports.


What I'd Do Differently

1. Design the memory schema first. I added article-history.md late and had to retroactively backfill it. If you're building something like this, decide what state you need to persist on day one.

2. Make execution scripts idempotent. Early versions of my WordPress publisher could create duplicate posts if run twice. Every script should be safe to re-run.

3. Instrument everything. I added the skill evaluator + experiment log system about halfway through. Wish I'd had it from day one - I'd have much better data on what prompt changes actually moved the needle.

4. Don't make skills too long. Early SKILL.md files were 400+ lines and became hard to maintain. Shorter, focused skills with explicit references to shared config files work better than monoliths.
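The idempotency fix from point 2 boils down to resolving the write target from a slug lookup before posting. Filtering posts by slug (`GET /wp-json/wp/v2/posts?slug=...`) is standard WordPress REST API behavior; the helper itself is a sketch with illustrative names.

```python
# Sketch of the idempotency fix: decide update-vs-create from a slug lookup
# so a re-run patches the existing post instead of creating a duplicate.

def resolve_write(existing_posts: list, base_url: str) -> tuple:
    """existing_posts is the JSON array WordPress returns for a slug query."""
    if existing_posts:
        # Slug already exists: target the specific post endpoint (an update).
        post_id = existing_posts[0]["id"]
        return "update", f"{base_url}/wp-json/wp/v2/posts/{post_id}"
    # No match: target the collection endpoint (a create).
    return "create", f"{base_url}/wp-json/wp/v2/posts"
```

The real script would run the slug query first, then POST the payload to whichever endpoint comes back.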


Getting Started

If you want to build something similar, start with these three pieces:

  1. A skill file system - Even just 3-5 Markdown skill files for your most repeated AI workflows
  2. A reference data layer - History files and config files that persist across sessions
  3. A daily briefing script - Even a simple one that reads a JSON schedule and outputs what's due

You don't need all the MCP servers and execution scripts on day one. The skeleton - skill files + persistent reference data + daily schedule - delivers most of the value.

I packaged the full architecture, all my SKILL.md templates, the scheduling system, execution script patterns, and the MCP configuration into a guide for anyone who wants to build their own version of this without starting from scratch.

If you're building on top of Claude Code and want to skip the trial-and-error phase: The AI Agent Automation Blueprint 2026


Questions about any specific part of the architecture? Drop them in the comments - happy to go deep on any of the components.
