sai-builder

Posted on May 21

7 Things I Automated with Claude Code + MCP That Actually Saved Time (and 3 That Didn't)

#ai #automation #devtools #llm

Most "things I automated with AI" lists are aspirational. They describe what's possible, run it once for the screenshot, and never mention that the thing broke on Tuesday and the author quietly went back to doing it by hand.

This is the honest version. These are automations I built with Claude Code + MCP that are still running weeks later because they genuinely save me time. Each one has a clear trigger, a clear output, and a reason it survived. Then — because a list of only wins is a sales pitch, not a field report — I'll show you three I built, measured, and deleted, and why.

The throughline: an automation is only worth it if the time it saves exceeds the time it costs to babysit. Most AI automations fail that test silently. The trick is measuring it on purpose.

Quick framing: what MCP actually buys you

If you're new to this — MCP (Model Context Protocol) is the standard that lets Claude Code call out to external tools and data sources through small servers: a browser, your filesystem, a calendar, a database, an API. The model stops being a chat box and becomes something that can act. Claude Code is the agent runtime that drives it from your terminal.

The mental model that's served me best: don't ask "can the model do this?" Ask "what's the smallest reliable tool I can hand it so it doesn't have to guess?" Most of my wins below are wins because I gave the agent a precise tool, not because I wrote a clever prompt.

The 7 that survived

1. Reading a logged-in page and turning it into structured data

A browser MCP attached to my real, logged-in browser session (via the DevTools Protocol) means the agent can read pages that require auth — dashboards, account pages, analytics — and hand me back structured data instead of me squinting at a UI. The win isn't "it browses." It's that the page that needed my login is now machine-readable without me exporting anything.

Trigger: "pull the current numbers from X." Output: a clean table. Why it survived: the alternative was me copy-pasting from a dashboard that has no export button.
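As a rough illustration of the "page in, table out" step, here is a minimal sketch that assumes the browser tool has already returned the page HTML and that the numbers sit in a plain HTML table. The class name, function name, and sample markup are all hypothetical, not taken from any particular MCP server.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text chunks of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Treat the first row as the header and return one dict per data row."""
    parser = TableExtractor()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample markup standing in for a dashboard page.
SAMPLE = """
<table>
  <tr><th>Metric</th><th>Value</th></tr>
  <tr><td>Visitors</td><td>1204</td></tr>
  <tr><td>Signups</td><td>37</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
```

Real dashboards are rarely this tidy, so treat this as the shape of the output rather than a parser to rely on.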

2. Multi-file refactors with a plan I approve first

Claude Code reads across a whole directory, proposes a concrete edit plan, and only after I say go does it make the changes. The value over a single-file assistant is that it sees the blast radius — it finds the three other files that import the thing I'm renaming. The approval gate is non-negotiable: I approve the plan, not each keystroke.

Trigger: "rename this concept across the project." Output: a diff I review. Why it survived: it catches the references I'd miss, and the plan-first gate means it never surprises me.
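The plan-then-apply shape can be sketched in a few lines. This is not how Claude Code implements it; it is a hypothetical stand-in that assumes a plain whole-word text rename across `*.py` files, with made-up file names, to show where the approval gate sits.

```python
import re
import tempfile
from pathlib import Path


def plan_rename(root, old, pattern="*.py"):
    """Return {path: match count} for every file that mentions `old`. Read-only."""
    word = re.compile(rf"\b{re.escape(old)}\b")
    plan = {}
    for path in sorted(Path(root).rglob(pattern)):
        hits = len(word.findall(path.read_text(encoding="utf-8")))
        if hits:
            plan[path] = hits
    return plan


def apply_rename(plan, old, new, approved):
    """Rewrite the planned files, but only after the plan was approved."""
    if not approved:
        return 0
    word = re.compile(rf"\b{re.escape(old)}\b")
    for path in plan:
        text = path.read_text(encoding="utf-8")
        # A function replacement keeps backslashes in `new` literal.
        path.write_text(word.sub(lambda m: new, text), encoding="utf-8")
    return len(plan)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "models.py").write_text("class Invoice: pass\n", encoding="utf-8")
    (root / "views.py").write_text(
        "from models import Invoice\nx = Invoice()\n", encoding="utf-8"
    )
    (root / "notes.py").write_text("# Invoices are unrelated\n", encoding="utf-8")

    plan = plan_rename(root, "Invoice")                  # step 1: show the blast radius
    planned = {p.name: n for p, n in plan.items()}
    skipped = apply_rename(plan, "Invoice", "Bill", approved=False)
    changed = apply_rename(plan, "Invoice", "Bill", approved=True)  # step 2: after "go"
    views_after = (root / "views.py").read_text(encoding="utf-8")
```

The point of splitting it in two is that the plan is cheap to read and costs nothing to reject.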

3. Drafting from a template with my actual constraints baked in

I have repetitive structured documents — reports, briefs, configs — that follow a fixed shape. I gave the agent the template and the rules (required sections, tone, what's forbidden) as a reusable instruction. Now "draft tomorrow's report" produces something 80% done that I edit, instead of a blank page.

Trigger: a daily/weekly cadence. Output: a near-final draft. Why it survived: the boring 80% is exactly what I procrastinate on. The model doesn't procrastinate.
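One way to keep the rules honest is to check the draft against them mechanically before reading it. A minimal sketch, assuming section headings appear on their own lines; the section names and the forbidden phrase are invented for the example.

```python
def check_draft(draft, required_sections, forbidden_phrases):
    """Return a list of rule violations; an empty list means the shape is right."""
    problems = []
    lines = {line.strip() for line in draft.splitlines()}
    for section in required_sections:
        if section not in lines:
            problems.append(f"missing section: {section}")
    lowered = draft.lower()
    for phrase in forbidden_phrases:
        if phrase.lower() in lowered:
            problems.append(f"forbidden phrase: {phrase}")
    return problems


# Hypothetical draft and rules.
draft = "Summary\nAll good.\nNext steps\nShip it, guaranteed.\n"
problems = check_draft(
    draft,
    required_sections=["Summary", "Risks", "Next steps"],
    forbidden_phrases=["guaranteed"],
)
```

This only checks shape. Whether the draft says the right thing is still the human's job.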

4. Filesystem-wide search-and-summarize

The filesystem MCP lets the agent grep across a messy knowledge base and answer "where did I write about X, and what did I conclude?" This replaced the genuinely awful workflow of me opening twelve files trying to remember which one had the decision in it.

Trigger: "what did I decide about Y?" Output: the answer plus the file path. Why it survived: it turns my own notes into something queryable. The path matters — I want to verify, not trust blindly.
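The "answer plus the file path" part is the piece worth copying. Here is a minimal sketch of the search half, assuming the notes are Markdown files under one directory; the function name and the sample notes are made up, and the summarizing is left to the model.

```python
import tempfile
from pathlib import Path


def search_notes(root, term, pattern="*.md"):
    """Return (relative path, line number, line) for each case-insensitive hit."""
    root = Path(root)
    needle = term.lower()
    hits = []
    for path in sorted(root.rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2024").mkdir()
    (root / "2024" / "db.md").write_text(
        "# Database\nDecision: stay on SQLite for now.\n", encoding="utf-8"
    )
    (root / "misc.md").write_text("Lunch ideas\n", encoding="utf-8")

    hits = search_notes(root, "decision")
```

Returning the path and line number alongside the text is what makes the answer checkable.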

5. Validation-loop form/data entry

When I have to enter structured data into a form or system that validates input, the agent enters it, reads the rejection, corrects, and retries — without me. I provide the facts; it does the labor and the error recovery. (I wrote a whole separate piece on exactly where this stops — the line is private facts and identity, not the typing.)

Trigger: "fill this with these values." Output: a completed, validated entry. Why it survived: the correction cycle is the tedious part, and it's now off my plate.
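The enter, read rejection, correct, retry cycle looks roughly like this. It is a sketch under loud assumptions: `validate` stands in for whatever the target system rejects, `correct` stands in for the agent's fix (and assumes day/month/year input dates), and the sample values are invented. Note that the correction only reformats what it was given; it never makes up a fact.

```python
import re


def validate(entry):
    """Stand-in for the target system: returns {field: error message}."""
    errors = {}
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", entry.get("date", "")):
        errors["date"] = "expected YYYY-MM-DD"
    if not re.fullmatch(r"\d+", entry.get("phone", "")):
        errors["phone"] = "digits only"
    return errors


def correct(field, value, message):
    """Stand-in for the agent's fix: reformat the value, never invent one."""
    if field == "date":
        match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", value)
        if match:
            day, month, year = match.groups()
            return f"{year}-{int(month):02d}-{int(day):02d}"
    if field == "phone":
        return re.sub(r"\D", "", value)
    return value


def fill_with_retries(entry, max_attempts=3):
    """Submit, read the rejection, correct, retry. Raise if it cannot converge."""
    entry = dict(entry)
    history = []
    for _ in range(max_attempts):
        errors = validate(entry)
        history.append(errors)
        if not errors:
            return entry, history
        fixed = {f: correct(f, entry.get(f, ""), msg) for f, msg in errors.items()}
        if all(fixed[f] == entry.get(f, "") for f in fixed):
            break  # nothing changed, so another attempt would fail the same way
        entry.update(fixed)
    raise ValueError(f"still rejected after {len(history)} attempts: {history[-1]}")


final, history = fill_with_retries({"date": "3/2/2024", "phone": "555 0100"})
```

The early `break` matters: a loop that retries the same rejected value is babysitting in disguise.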

6. Read-back verification on anything that publishes

This one is a meta-automation born from getting burned (I published four empty articles once — long story). Any step that writes to an external system is now followed by a step that reads the result back and asserts it's correct. The agent publishes, then fetches the live artifact and diffs it against the source.

Trigger: any publish/write step. Output: a pass/fail on "did the thing actually land." Why it survived: it's caught silent failures that returned a 200 and shipped garbage. Cheap insurance against the worst failure mode.
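A minimal sketch of the publish-then-read-back step, using a hypothetical in-memory `FlakyStore` in place of a real API so the silent-failure case can be reproduced. The store, its methods, and the keys are invented for illustration; only `difflib` is real.

```python
import difflib


class FlakyStore:
    """Hypothetical stand-in for a remote API that always answers 200."""

    def __init__(self, drop_body=False):
        self.drop_body = drop_body
        self.saved = {}

    def write(self, key, body):
        # Simulates the failure mode: success status, empty content stored.
        self.saved[key] = "" if self.drop_body else body
        return 200

    def read(self, key):
        return self.saved.get(key, "")


def publish_and_verify(store, key, source):
    """Write, fetch the live copy back, and pass only if it matches the source."""
    status = store.write(key, source)
    live = store.read(key)
    diff = list(
        difflib.unified_diff(
            source.splitlines(), live.splitlines(), "source", "live", lineterm=""
        )
    )
    return {"status": status, "ok": status == 200 and not diff, "diff": diff}


article = "Title\n\nBody text.\n"
good = publish_and_verify(FlakyStore(), "post-1", article)
bad = publish_and_verify(FlakyStore(drop_body=True), "post-1", article)
```

Both calls see a 200. Only the read-back tells them apart.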

7. Turning a rough voice-of-me brief into a first draft in a fixed persona

I keep persona/voice definitions as files. The agent loads the right one and drafts in that voice from a few bullet points. The win is consistency across many outputs without me re-explaining the voice every time — the constraints live in a file, not in my head or in each prompt.

Trigger: "draft this as [persona]." Output: an on-voice draft. Why it survived: voice drift across a content series is real, and a file-based persona kills it.
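The file-based part is simple enough to sketch. This assumes one Markdown file per persona in a single directory; the directory layout, file name, and prompt wording are hypothetical, not the actual setup described above.

```python
import tempfile
from pathlib import Path


def build_prompt(persona_dir, persona, bullets):
    """Load the persona file by name and prepend it to the bullet points."""
    path = Path(persona_dir) / f"{persona}.md"
    if not path.is_file():
        raise FileNotFoundError(f"no persona file for {persona!r}")
    voice = path.read_text(encoding="utf-8").strip()
    points = "\n".join(f"- {bullet}" for bullet in bullets)
    return f"{voice}\n\nDraft from these points:\n{points}\n"


with tempfile.TemporaryDirectory() as tmp:
    # Made-up persona definition for the example.
    Path(tmp, "field-report.md").write_text(
        "Voice: blunt, first person, no hype.\n", encoding="utf-8"
    )
    prompt = build_prompt(tmp, "field-report", ["built ten automations", "deleted three"])
```

Failing loudly on a missing persona file is deliberate: a silent fallback to a default voice is exactly the drift this is meant to prevent.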

The pattern in all 7

Read back over them. Every survivor has the same shape:

A precise tool (a specific MCP server), not a vague "be smart."
A clear trigger and a clear artifact, so I can tell instantly if it worked.
A human gate exactly where judgment lives — plan approval, fact provision, final review — and full automation everywhere judgment doesn't live.

The ones that work automate labor. They leave judgment to me. The moment an automation tries to own the judgment, it starts costing more than it saves — which brings me to the deletions.

The 3 I deleted

Deleted 1: fully autonomous publishing with no human gate

I tried letting the content pipeline publish with zero review, trusting the read-back check to catch problems. The read-back caught formatting failures fine. It could not catch "this draft is fine but it's the wrong thing to say right now." That's judgment, and I'd automated it away. Deleted the no-gate version; kept the read-back, restored the review gate. Lesson: verification catches broken, it can't catch wrong.

Deleted 2: a "monitor everything and alert me" agent

I built an agent to watch several sources and ping me on anything notable. It pinged constantly. The signal-to-noise ratio was terrible because "notable" is a judgment call that depends on context I never fully specified. I spent more time triaging its alerts than I'd have spent checking the sources myself once a day. Deleted. Lesson: an automation that generates work to evaluate its own output is usually net-negative. Polling-and-judging is a trap.

Deleted 3: an over-engineered "agent that builds agents"

I tried to make a meta-agent that would spin up task-specific sub-agents on demand. It was a great demo and a maintenance sinkhole: every layer of indirection was a new place for things to break silently, and debugging a failure meant unwinding three levels of "which agent decided what." Deleted in favor of a flat list of single-purpose automations, each of which I can understand in one sitting. Lesson: indirection is a cost you pay on every debug, forever. Flat and boring beats clever and nested.

What I'd tell you to actually do

If you're starting with Claude Code + MCP, don't chase the impressive stuff. Do this:

Pick one task you do repeatedly that is labor, not judgment. Form entry, drafting from a template, search-and-summarize. Not "decide my strategy."
Give the agent the smallest precise tool for it — one MCP server, not five.
Put a human gate exactly where the judgment is, and automate everything around it.
Add a read-back check if the task writes anything anywhere.
Measure for two weeks. If you spend more time babysitting it than it saves, delete it without sentiment. I deleted three. That's not failure; that's the measurement working.
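The two-week measurement can be as small as a log of two numbers per run. A minimal sketch, assuming you jot down minutes saved and minutes spent babysitting each time; the `Run` type and the sample figures are invented for illustration, not measurements from this post.

```python
from dataclasses import dataclass


@dataclass
class Run:
    minutes_saved: int         # what doing it by hand would have cost
    minutes_babysitting: int   # fixing, re-running, triaging its output


def verdict(runs):
    """Return ("keep" or "delete", net minutes) for a batch of logged runs."""
    saved = sum(run.minutes_saved for run in runs)
    cost = sum(run.minutes_babysitting for run in runs)
    net = saved - cost
    return ("keep" if net > 0 else "delete"), net


# Hypothetical logs for two automations.
template_drafts = [Run(20, 3), Run(20, 5), Run(20, 2)]
alert_agent = [Run(5, 15), Run(5, 12), Run(5, 20)]

drafts_verdict = verdict(template_drafts)
alerts_verdict = verdict(alert_agent)
```

Breaking even counts as "delete" here, on the theory that an automation that merely pays for itself is still one more thing to maintain.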

The hype version of this post would be ten wins and no deletions. But the deletions are where the actual knowledge is. Automating labor pays. Automating judgment, monitoring, and meta-orchestration mostly doesn't — at least not yet, not solo, not without a babysitting cost that quietly eats the savings.

Build first. Measure honestly. Delete what doesn't pay. The design converges later — and "later" usually means "after you've deleted the clever thing and kept the boring one that works."

— Sai

If this was useful: I packaged the prompts I actually use to run autonomous agents into two field packs — 100 Prompts for Autonomous Agents and Claude Code Power-User Prompts. Same build-first mindset, ready to paste into your terminal.

Top comments (1)

Ken Imoto • May 21

"Automate labor, not judgment" is the cleanest formulation of this I've seen. Publish-verification resonates - I run all my Dev.to/Qiita/Kindle pipelines through a fetch-back step now, because a 200 from a write API means almost nothing (Dev.to's GET /articles/{id} actually 404s for IDs that exist in the list endpoint, so even verification has quirks). The meta-orchestration deletion is the one most indie devs need to hear; nested agents look cool until you debug them at 2am.