bluefin ships

Building 10 AI agents in pure Python (no LangChain, no SaaS, no BS)

I spent six weeks in early 2026 trying to build a "personal AI ops" stack — the kind of small daemons that handle email triage, market summaries, meeting notes, the boring infra of a knowledge worker's day.

I tried LangChain. Then LangGraph. Then CrewAI. Then a Zapier subscription. I shipped nothing for a month.

Then I deleted all of it and wrote ten Python files. Each one does one job, in 15-30 lines of core logic. They've been running on my Mac via launchd for the last three weeks. This is what I learned.

The architecture (it's almost embarrassingly small)

launchd / cron
       │
       ▼
 Recipe (.py)        ← single file, no framework
       │
   ┌───┼───┐
   ▼   ▼   ▼
 API   Web  Local
 fetch scrape files
       │
       ▼
 AI layer (3-tier fallback)
   ① Claude API
   ② Ollama local
   ③ raw passthrough
       │
       ▼
 Push (Discord / Telegram / stdout)

Every agent follows the same pipeline: fetch → reason → push. That's it. Once you internalize that shape, you stop reaching for frameworks.
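If you want that shape as code, here's a minimal sketch — the three stage functions are placeholders I'm inventing for illustration, not anything from the recipes below:

```python
from typing import Callable

def run_recipe(fetch: Callable[[], str],
               reason: Callable[[str], str],
               push: Callable[[str], None]) -> None:
    """One agent = three functions wired in a straight line."""
    data = fetch()          # API fetch / web scrape / local files
    result = reason(data)   # AI layer (with whatever fallback you like)
    push(result)            # Discord / Telegram / stdout
```

Every recipe below is just this function with different stages plugged in; the framework-shaped hole turns out to be three positional arguments.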

Recipe 1: Gmail triage in 18 lines

from googleapiclient.discovery import build
import anthropic, json

def triage_today(creds):
    gmail = build('gmail', 'v1', credentials=creds)
    msgs = gmail.users().messages().list(
        userId='me', q='newer_than:1d', maxResults=30).execute()

    summaries = []
    for m in msgs.get('messages', []):
        full = gmail.users().messages().get(userId='me', id=m['id']).execute()
        snippet = full.get('snippet', '')[:300]
        subject = next((h['value'] for h in full['payload']['headers']
                        if h['name']=='Subject'), '')
        summaries.append(f"{subject} | {snippet}")

    judged = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4", max_tokens=600,
        messages=[{"role":"user","content":
          f"Rank these emails 1-10 by importance to a freelance dev. "
          f"Return JSON [{{i, score, why}}]. Top 5 only.\n\n"
          + "\n".join(f"[{i}] {s}" for i,s in enumerate(summaries))}])
    return json.loads(judged.content[0].text)

No agent framework. No tool-use loop. Just: fetch the data, give Claude the data, parse the answer. Run it at 21:00 via launchd, push the top-5 to LINE Notify. Done.
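One gotcha with the "parse the answer" step: models sometimes wrap JSON in markdown fences or preface it with a line of prose, and a bare `json.loads` will choke. A small defensive parser (my own sketch, not part of the recipe) absorbs the common cases:

```python
import json
import re

def parse_model_json(text: str):
    """Parse JSON out of a model reply, tolerating ```json fences
    and leading prose. Trailing prose after the JSON will still fail."""
    # Strip a markdown code fence if the model added one.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Otherwise start at the first [ or { in the reply.
    start = min((i for i in (text.find("["), text.find("{")) if i != -1),
                default=0)
    return json.loads(text[start:])
```

Prompting "Return only JSON, no prose" helps, but this guard has saved several 21:00 runs from silent failure.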

Recipe 3: Hacker News morning digest

import requests, anthropic

def hn_digest():
    ids = requests.get("https://hacker-news.firebaseio.com/v0/topstories.json").json()[:10]
    stories = [requests.get(
        f"https://hacker-news.firebaseio.com/v0/item/{i}.json").json() for i in ids]

    prompt = "Summarize each in 2 lines (Traditional Chinese), keep title English:\n\n"
    prompt += "\n".join(f"{s['title']} | {s.get('url', '(text)')}" for s in stories)

    out = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4", max_tokens=1500,
        messages=[{"role":"user","content":prompt}])
    return out.content[0].text

Eleven lines. Runs at 06:00. Pushes to my Discord. Replaces three RSS readers and a habit of doom-scrolling HN at breakfast.
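The push step is just an HTTP POST to a Discord webhook — the one wrinkle is Discord's 2,000-character cap on message content. A sketch, assuming you've created a webhook URL for your channel and that individual digest lines stay short:

```python
import requests

DISCORD_LIMIT = 2000  # Discord rejects message content over 2000 chars

def chunk_message(text: str, limit: int = DISCORD_LIMIT) -> list[str]:
    """Split on newlines so no chunk exceeds the limit.
    Assumes no single line is itself longer than the limit."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        if len(current) + len(line) > limit and current:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

def push_discord(webhook_url: str, text: str) -> None:
    for chunk in chunk_message(text):
        requests.post(webhook_url, json={"content": chunk}, timeout=10)
```

Swapping the push target really is one function: Telegram's `sendMessage` and plain `print` slot into the same seam.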

Recipe 10: Multi-AI judge

The one I lean on most. When I'm stuck on a real decision, I fan out to Claude + GPT-4o + Gemini in parallel, then have Claude judge the panel:

import asyncio, anthropic

async def judge(question):
    answers = await asyncio.gather(
        ask_claude(question), ask_gpt4o(question), ask_gemini(question),
        return_exceptions=True)

    verdict = anthropic.Anthropic().messages.create(
        model="claude-opus-4", max_tokens=1500,
        messages=[{"role":"user","content": f"""
Question: {question}

[Claude]: {answers[0]}
[GPT-4o]: {answers[1]}
[Gemini]: {answers[2]}

1. Where do they agree?
2. Where do they disagree, who's more defensible?
3. Final call + confidence 1-10."""}])
    return verdict.content[0].text

Cost: ~$0.05 per call. Worth it on anything I'd otherwise spend 20 minutes turning over.
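The `ask_claude` / `ask_gpt4o` / `ask_gemini` helpers are left out above. Since the official SDK calls are synchronous, one way to fan them out concurrently — a sketch, not the exact code I run — is to push each onto a thread with `asyncio.to_thread`:

```python
import asyncio

async def fan_out(question, *sync_askers):
    """Run blocking SDK calls concurrently. Each asker is a plain
    function like lambda q: client.messages.create(...). Failures
    come back as exception objects (return_exceptions), not raised."""
    tasks = [asyncio.to_thread(ask, question) for ask in sync_askers]
    answers = await asyncio.gather(*tasks, return_exceptions=True)
    # Stringify exceptions so the judge prompt still renders.
    return [a if not isinstance(a, Exception) else f"(failed: {a})"
            for a in answers]
```

The nice side effect: if one provider is down, the judge still sees two answers plus a `(failed: ...)` marker, and the verdict degrades instead of crashing.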

Why I stopped using frameworks

LangChain optimizes for "any LLM, any tool, any chain" — which means three layers of abstraction sitting between you and the API call you actually wanted to make. Every time it broke (and it broke often), debugging meant tracing through code I didn't write.

Pure Python + the official SDKs gave me:

  • Files I can grep
  • Stack traces that make sense
  • The ability to swap Claude for Ollama in one line
  • Cron-friendly behavior (no daemon, no warm-up, just run and exit)

If your agent fits in 30 lines, a framework is a tax, not a tool.
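For reference, "swap Claude for Ollama in one line" works because Ollama's local server speaks plain HTTP. A sketch against a default install (the endpoint and `stream=False` behavior are Ollama's stock defaults; the model name is whatever you've pulled):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def ollama_payload(prompt: str, model: str = "llama3") -> dict:
    # stream=False makes Ollama return one JSON object instead of NDJSON chunks
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_call(prompt: str, model: str = "llama3") -> str:
    r = requests.post(OLLAMA_URL, json=ollama_payload(prompt, model),
                      timeout=120)
    r.raise_for_status()
    return r.json()["response"]
```

Because both `claude_call` and `ollama_call` are just `str -> str`, the swap is a change of name at the call site, not a refactor.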

The fallback pattern that saved me

Every recipe has this:

def reason(prompt):
    try:
        return claude_call(prompt)
    except Exception:
        try:
            return ollama_call(prompt)  # local llama3
        except Exception:
            return prompt  # raw passthrough — at least the data still flows

I lost two days last month to an Anthropic outage. Now my morning digest still arrives — degraded, sometimes just raw HN titles, but it arrives. Resilience is a feature.

Try it yourself

I packaged all 10 recipes as Agent Cookbook — single-file scripts, launchd/cron templates, full setup guide. $19 one-time, no subscription, the code is yours to fork.

https://vampireheart3.gumroad.com/l/agent-cookbook

Or just steal the patterns above. The architecture diagram is the real product — the code is just a transcription of it.
