
スシロー


I Built a Python Pipeline That Drafts Affiliate Articles Locally with Claude — Here's the Code, the 41-Second Run, and the Bug T

If you read this, you'll be able to run a small Python pipeline on your own laptop that: (1) generates a draft article from a topic + a keyword list, (2) injects your affiliate links only where they're contextually relevant, and (3) refuses to save anything where the title doesn't match the body. No SaaS, no cron server — just python pipeline.py "Laravel N+1" and a Markdown file lands in out/.

I run this every morning. Over 6 weeks it produced 17 drafts; my honest conversion is still low (think single-digit clicks, not "100,000 yen a month"), but the machinery works and the failure modes are interesting. This is the build log, not a get-rich post.

The architecture: one Python file, Claude for prose, a hard validation gate

The whole thing is ~180 lines. The non-obvious design decision: the LLM never touches your affiliate links. Claude writes prose; a deterministic Python step does link insertion. Why? Because the first version let the model embed links, and Claude happily invented https://amzn.to/laravel-pro — a URL that does not exist. Hallucinated affiliate links are worse than no links: they leak trust and earn nothing.

So the contract is:

  • Claude (claude-opus-4-8 via the Anthropic SDK): topic → structured JSON {title, sections[], keywords_used[]}.
  • Python: takes that JSON, matches section text against a curated link table, and inserts roughly one link per 400 words (with a floor of one per section, so a short section that matches a trigger can still get a link).
  • A validation gate: title tokens must overlap the body, or the draft is rejected and nothing is written.

Here is the generation core. It uses the Anthropic Messages API with a forced JSON shape via a tool definition — that's the reliable way to get structured output, far better than "please return JSON" in the prompt.

# pipeline.py  (Python 3.11)
import json, os, re, sys, pathlib
from anthropic import Anthropic
from links import inject_links  # deterministic link insertion lives in links.py

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
MODEL = "claude-opus-4-8"

ARTICLE_TOOL = {
    "name": "emit_article",
    "description": "Return the drafted technical article as structured data.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "sections": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "h2": {"type": "string"},
                        "body_md": {"type": "string"},
                    },
                    "required": ["h2", "body_md"],
                },
            },
            "keywords_used": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "sections", "keywords_used"],
    },
}

def draft(topic: str, keywords: list[str]) -> dict:
    prompt = (
        f"You are a senior backend engineer. Write a hands-on article on: {topic}.\n"
        f"Each H2 must contain at least one of these search keywords: {keywords}.\n"
        "Include real numbers and one runnable code block per section. "
        "Do NOT include any URLs or affiliate links — leave linking to the pipeline."
    )
    resp = client.messages.create(
        model=MODEL,
        max_tokens=4000,
        tools=[ARTICLE_TOOL],
        tool_choice={"type": "tool", "name": "emit_article"},
        messages=[{"role": "user", "content": prompt}],
    )
    for block in resp.content:
        if block.type == "tool_use":
            return block.input
    raise RuntimeError("model did not call emit_article")

tool_choice forcing emit_article is the part that took me three tries to get right. Without it, ~1 in 8 runs returned a chatty text block ("Sure! Here's your article...") and my json.loads blew up. Forcing the tool dropped that failure rate to zero across the last 60 runs.
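A forced tool call returns a tool_use block, but a shape check on the payload is still cheap insurance, for example if a response is cut short. A minimal sketch, assuming a hypothetical helper named check_article (not part of the original script) that only verifies the fields the schema marks as required:

```python
def check_article(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks usable."""
    problems = []
    # title must be a non-empty string
    if not isinstance(data.get("title"), str) or not data["title"].strip():
        problems.append("title missing or empty")
    # sections must be a non-empty list of {h2, body_md} objects
    sections = data.get("sections")
    if not isinstance(sections, list) or not sections:
        problems.append("sections missing or empty")
    else:
        for i, sec in enumerate(sections):
            if not isinstance(sec, dict):
                problems.append(f"section {i} is not an object")
                continue
            for key in ("h2", "body_md"):
                if not isinstance(sec.get(key), str) or not sec[key].strip():
                    problems.append(f"section {i}: {key} missing or empty")
    # keywords_used only needs to be a list (it may be empty)
    if not isinstance(data.get("keywords_used"), list):
        problems.append("keywords_used missing or not a list")
    return problems
```

One way to use it would be to call it on the dict returned by draft() and reject the run if the list is non-empty, before any link injection happens.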

The link table: deterministic insertion, max 1 per 400 words, no hallucinated amzn.to

This is the boring part that actually protects revenue. I keep a hand-written table of links I'm actually registered for (A8.net, an affiliate-enabled book retailer, etc.), each with a list of trigger keywords. Python inserts a link only when a section genuinely discusses that topic, and never more than one per ~400 words — because a wall of affiliate links is the fastest way to get a reader to bounce and an editor to flag spam.

# links.py
LINK_TABLE = [
    {
        "triggers": ["n+1", "eloquent", "query log", "eager loading"],
        "anchor": "a practical Laravel performance book",
        "url": "https://example-a8-link/laravel-perf",  # your real A8 tracking URL
    },
    {
        "triggers": ["new nisa", "index fund", "brokerage"],
        "anchor": "open a tsumitate NISA account",
        "url": "https://example-a8-link/nisa",
    },
]

def inject_links(body_md: str) -> tuple[str, int]:
    words = max(len(body_md.split()), 1)
    budget = max(1, words // 400)          # 1 link per full 400 words, with a floor of 1 per call
    low = body_md.lower()
    inserted = 0
    for link in LINK_TABLE:
        if inserted >= budget:
            break
        if any(t in low for t in link["triggers"]):
            md_link = f"[{link['anchor']}]({link['url']})"
            body_md += f"\n\n> 📚 Related: {md_link}"
            inserted += 1
    return body_md, inserted

Measured behavior on my last 17 drafts: average 1.3 links per article, and 4 articles got zero links because no section matched a trigger, which is exactly what I want. An off-topic affiliate link converts at ~0% and costs you credibility. Letting an article ship with zero links is a feature.
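Because inject_links runs once per section, its budget is computed per section. If the cap should hold across the whole article instead, one option is to compute a single budget from the total word count and spend it section by section. A minimal sketch, assuming a hypothetical function plan_links and a table in the same format as LINK_TABLE; unlike the original it has no floor of one, so an article under 400 words gets no link at all:

```python
from typing import Optional

def plan_links(sections: list[str], table: list[dict],
               words_per_link: int = 400) -> list[Optional[dict]]:
    """Pick at most one link per section, capped by an article-wide budget.

    Returns a list parallel to `sections`; each entry is the chosen link
    dict or None. Each link is used at most once per article.
    """
    total_words = sum(len(s.split()) for s in sections)
    budget = total_words // words_per_link   # no floor: short articles get 0
    used: set[str] = set()
    plan: list[Optional[dict]] = []
    for body in sections:
        choice = None
        if budget > 0:
            low = body.lower()
            for link in table:
                if link["url"] in used:
                    continue                 # already placed earlier in the article
                if any(t in low for t in link["triggers"]):
                    choice = link
                    break
        if choice is not None:
            used.add(choice["url"])
            budget -= 1
        plan.append(choice)
    return plan
```

The function only decides where links go; appending the Markdown line to each section would stay the same as in inject_links.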

The bug that published 3 articles with mismatched titles (and the gate that killed it)

Here's the failure story. Early on, my title prompt and my body prompt were two separate Claude calls. On three mornings the title said "Laravel Eloquent N+1" while the body had drifted into MySQL index design — because the second call had no memory of the first. I didn't notice until a reader DMed me "the title is lying." Mortifying.

Fix: one call returns both (already done above), plus a deterministic gate that runs before anything is written to disk. If fewer than 2 meaningful title tokens appear in the body, the draft is rejected — no file, non-zero exit code, loud message.

STOP = {"the", "a", "to", "in", "with", "and", "of", "for", "how", "i"}

def title_matches_body(title: str, body: str) -> bool:
    toks = [t for t in re.findall(r"[a-z0-9+]+", title.lower()) if t not in STOP]
    body_low = body.lower()
    hits = sum(1 for t in toks if t in body_low)
    return hits >= 2          # require 2+ real title tokens in the body

def build(topic: str, keywords: list[str]) -> pathlib.Path:
    art = draft(topic, keywords)
    parts = [f"# {art['title']}\n"]
    for sec in art["sections"]:
        body, n = inject_links(sec["body_md"])
        parts.append(f"## {sec['h2']}\n\n{body}\n")
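    full = "\n".join(parts)
    # Gate on the sections only: `full` starts with the "# title" line,
    # so checking against it would make every title match itself.
    body_only = "\n".join(parts[1:])

    if not title_matches_body(art["title"], body_only):
        raise SystemExit(f"REJECTED: title/body drift -> {art['title']!r}")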

    slug = re.sub(r"[^a-z0-9]+", "-", art["title"].lower()).strip("-")[:60]
    out = pathlib.Path("out") / f"{slug}.md"
    out.parent.mkdir(exist_ok=True)
    out.write_text(full, encoding="utf-8")
    return out

if __name__ == "__main__":
    topic = sys.argv[1] if len(sys.argv) > 1 else "Laravel Eloquent N+1"
    kws = ["eloquent", "whereHas", "eager loading", "query log"]
    path = build(topic, kws)
    print(f"wrote {path}")

Since adding title_matches_body, the gate has rejected 2 of the last 31 runs — both genuine drifts where Claude wandered off-topic in a long section. Two prevented embarrassments for the cost of a 5-line function. The >= 2 threshold matters: at >= 1, a single accidental token like "the" (before I added the stoplist) passed garbage; at >= 3, legitimate short titles got rejected. Two is the sweet spot for my title lengths.
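To see why a given threshold passes or rejects a draft, it helps to look at which title tokens were actually found. A minimal sketch, assuming a hypothetical helper named title_hits that reuses the same tokenizer and a copy of the stoplist; the two sample bodies are made up for illustration:

```python
import re

STOP = {"the", "a", "to", "in", "with", "and", "of", "for", "how", "i"}

def title_hits(title: str, body: str) -> list[str]:
    """Return the non-stopword title tokens that occur in the body."""
    toks = [t for t in re.findall(r"[a-z0-9+]+", title.lower()) if t not in STOP]
    body_low = body.lower()
    return [t for t in toks if t in body_low]

on_topic = "Eager loading removes the N+1 pattern in Eloquent relations."
drifted = "A composite index on (user_id, created_at) speeds up MySQL range scans."

print(title_hits("Fixing Laravel Eloquent N+1", on_topic))  # ['eloquent', 'n+1']
print(title_hits("Fixing Laravel Eloquent N+1", drifted))   # []
```

Note that this is a substring match, so a very short title token can match inside an unrelated longer word; that is one more reason a threshold of 1 is too permissive.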

Timing on a real laptop: 41 seconds, ~3,800 output tokens, and why I don't parallelize

On an M-class / Ryzen laptop the bottleneck is entirely the API round-trip, not Python. A full run breaks down as:

  • Claude generation (max_tokens=4000, usually ~3,800 used): 38–40 s
  • Link injection + validation + file write: <0.1 s
  • Total wall clock: ~41 s per article.
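A breakdown like the one above can be collected with a small timing context manager around each stage. A minimal sketch, assuming a hypothetical helper named timed; the stage bodies here are placeholders, not the real API call:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str, sink: dict):
    """Record the wall-clock duration of the with-block under sink[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed stages are still timed.
        sink[label] = time.perf_counter() - start

timings: dict = {}
with timed("generate", timings):
    time.sleep(0.05)          # placeholder for the draft() API call
with timed("post_process", timings):
    sum(range(1000))          # placeholder for link injection + gate + write

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Wrapping draft() and the rest of build() this way would show directly how much of the wall clock is the API round-trip.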

I deliberately do not fan out 10 topics in parallel. One article a day, hand-reviewed before posting, keeps quality up and keeps me off platform spam filters — which is the real constraint, not throughput. The machine could do 30 in 20 minutes; that's exactly the trap that gets accounts flagged.

Wiring it to GitHub Actions: one article every morning at 07:00 JST

The local script is the unit; GitHub Actions is just a free cron that runs it and commits the result. The keys live in repo secrets, never in the file. Cost note: at current Opus pricing, ~3,800 output tokens is a few cents per run — call it the price of a vending-machine coffee per month, not per article.

# .github/workflows/daily.yml
name: daily-draft
on:
  schedule:
    - cron: "0 22 * * *"   # 22:00 UTC = 07:00 JST
  workflow_dispatch: {}
jobs:
  draft:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # git push needs write access; the default token can be read-only
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install anthropic
      - env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python pipeline.py "Laravel Eloquent N+1 query optimization"
      - run: |
          git config user.name "draft-bot"
          git config user.email "bot@users.noreply.github.com"
          git add out/ && git commit -m "daily draft" || echo "nothing to commit"
          git push

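The || echo "nothing to commit" guard is load-bearing: when a run leaves nothing new under out/, git commit exits non-zero, and without the guard the commit step would go red for no good reason. One caveat: a gate rejection exits pipeline.py with a non-zero code, so the pipeline step itself fails and the later steps are skipped by default. If a rejection (correct behavior) should not look like a failure, that step needs continue-on-error: true, at the cost of also hiding real crashes.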
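If a rejection should be distinguishable from a crash in CI, one option is to give it its own exit code. A minimal sketch, assuming a hypothetical DraftRejected exception, a hypothetical exit code of 3, and a wrapper named run; none of these are in the original script:

```python
import sys

REJECTED_EXIT = 3  # hypothetical: any non-zero code not used for real errors

class DraftRejected(Exception):
    """Raised when the title/body gate fails (hypothetical name)."""

def run(build_fn, topic: str) -> int:
    """Call build_fn(topic) and translate the outcome into an exit code."""
    try:
        path = build_fn(topic)
    except DraftRejected as exc:
        # A rejection is expected behavior: report it and use the dedicated code.
        print(f"REJECTED: {exc}", file=sys.stderr)
        return REJECTED_EXIT
    print(f"wrote {path}")
    return 0
```

The entry point would then end with sys.exit(run(...)), with the gate raising DraftRejected instead of SystemExit. Any other exception still propagates as a normal crash, so the workflow can tell the two cases apart by exit code.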

What actually moves the needle (and what doesn't)

Blunt truth from 6 weeks: the pipeline is the easy 20%. Distribution is the other 80%, and code can't fake it. My drafts that got read were the ones where the topic matched the platform's audience (concrete Laravel/Python implementation posts on a dev-heavy platform), not the generic ones. The automation's real value isn't "passive income" — it's removing the 40-minute cold-start of staring at a blank editor, so I'll actually publish 5 days a week instead of 1.

If you build this, steal three ideas specifically: (1) force structured output with tool_choice so you never parse free text; (2) keep affiliate links in deterministic Python, never in the prompt, so the model can't hallucinate a payout URL; (3) add a title↔body gate before any write — it's the cheapest insurance against shipping something that lies to your readers.

The full ~180-line version, plus the link table format, is the same shape as above — copy the three functions and you have a working draft generator today. If you want to go deeper on the query-optimization side that these drafts target, a practical Laravel performance book is the one I keep open while editing.
