<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: sai-builder</title>
    <description>The latest articles on DEV Community by sai-builder (@saibuilder).</description>
    <link>https://dev.to/saibuilder</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3939581%2Fef459f7b-454a-495a-ac51-4d21fbdb3908.png</url>
      <title>DEV Community: sai-builder</title>
      <link>https://dev.to/saibuilder</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saibuilder"/>
    <language>en</language>
    <item>
      <title>7 Things I Automated with Claude Code + MCP That Actually Saved Time (and 3 That Didn't)</title>
      <dc:creator>sai-builder</dc:creator>
      <pubDate>Thu, 21 May 2026 23:44:45 +0000</pubDate>
      <link>https://dev.to/saibuilder/7-things-i-automated-with-claude-code-mcp-that-actually-saved-time-and-3-that-didnt-48e8</link>
      <guid>https://dev.to/saibuilder/7-things-i-automated-with-claude-code-mcp-that-actually-saved-time-and-3-that-didnt-48e8</guid>
      <description>&lt;p&gt;Most "things I automated with AI" lists are aspirational. They describe what's &lt;em&gt;possible&lt;/em&gt;, run it once for the screenshot, and never mention that the thing broke on Tuesday and the author quietly went back to doing it by hand.&lt;/p&gt;

&lt;p&gt;This is the honest version. These are automations I built with &lt;strong&gt;Claude Code + MCP&lt;/strong&gt; that are &lt;em&gt;still running&lt;/em&gt; weeks later because they genuinely save me time. Each one has a clear trigger, a clear output, and a reason it survived. Then — because a list of only wins is a sales pitch, not a field report — I'll show you three I built, measured, and &lt;strong&gt;deleted&lt;/strong&gt;, and why.&lt;/p&gt;

&lt;p&gt;The throughline: an automation is only worth it if the time it saves exceeds the time it costs to babysit. Most AI automations fail that test silently. The trick is measuring it on purpose.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick framing: what MCP actually buys you
&lt;/h2&gt;

&lt;p&gt;If you're new to this — MCP (Model Context Protocol) is the standard that lets Claude Code call out to external tools and data sources through small servers: a browser, your filesystem, a calendar, a database, an API. The model stops being a chat box and becomes something that can &lt;em&gt;act&lt;/em&gt;. Claude Code is the agent runtime that drives it from your terminal.&lt;/p&gt;

&lt;p&gt;The mental model that's served me best: &lt;strong&gt;don't ask "can the model do this?" Ask "what's the smallest reliable tool I can hand it so it doesn't have to guess?"&lt;/strong&gt; Most of my wins below are wins because I gave the agent a precise tool, not because I wrote a clever prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 7 that survived
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Reading a logged-in page and turning it into structured data
&lt;/h3&gt;

&lt;p&gt;A browser MCP attached to my &lt;em&gt;real, logged-in&lt;/em&gt; browser session (via the Chrome DevTools Protocol, CDP) means the agent can read pages that require auth (dashboards, account pages, analytics) and hand me back structured data instead of me squinting at a UI. The win isn't "it browses." It's that the page that needed &lt;em&gt;my&lt;/em&gt; login is now machine-readable without me exporting anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; "pull the current numbers from X." &lt;strong&gt;Output:&lt;/strong&gt; a clean table. &lt;strong&gt;Why it survived:&lt;/strong&gt; the alternative was me copy-pasting from a dashboard that has no export button.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Multi-file refactors with a plan I approve first
&lt;/h3&gt;

&lt;p&gt;Claude Code reads across a whole directory, proposes a concrete edit plan, and only after I say go does it make the changes. The value over a single-file assistant is that it &lt;em&gt;sees the blast radius&lt;/em&gt; — it finds the three other files that import the thing I'm renaming. The approval gate is non-negotiable: I approve the plan, not each keystroke.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; "rename this concept across the project." &lt;strong&gt;Output:&lt;/strong&gt; a diff I review. &lt;strong&gt;Why it survived:&lt;/strong&gt; it catches the references I'd miss, and the plan-first gate means it never surprises me.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Drafting from a template with my actual constraints baked in
&lt;/h3&gt;

&lt;p&gt;I have repetitive structured documents — reports, briefs, configs — that follow a fixed shape. I gave the agent the template &lt;em&gt;and the rules&lt;/em&gt; (required sections, tone, what's forbidden) as a reusable instruction. Now "draft tomorrow's report" produces something 80% done that I edit, instead of a blank page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; a daily/weekly cadence. &lt;strong&gt;Output:&lt;/strong&gt; a near-final draft. &lt;strong&gt;Why it survived:&lt;/strong&gt; the boring 80% is exactly what I procrastinate on. The model doesn't procrastinate.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Filesystem-wide search-and-summarize
&lt;/h3&gt;

&lt;p&gt;The filesystem MCP lets the agent grep across a messy knowledge base and answer "where did I write about X, and what did I conclude?" This replaced the genuinely awful workflow of me opening twelve files trying to remember which one had the decision in it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; "what did I decide about Y?" &lt;strong&gt;Output:&lt;/strong&gt; the answer plus the file path. &lt;strong&gt;Why it survived:&lt;/strong&gt; it turns my own notes into something queryable. The path matters — I want to verify, not trust blindly.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Validation-loop form/data entry
&lt;/h3&gt;

&lt;p&gt;When I have to enter structured data into a form or system that validates input, the agent enters it, reads the rejection, corrects, and retries — without me. I provide the &lt;em&gt;facts&lt;/em&gt;; it does the &lt;em&gt;labor&lt;/em&gt; and the &lt;em&gt;error recovery&lt;/em&gt;. (I wrote a whole separate piece on exactly where this stops — the line is private facts and identity, not the typing.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; "fill this with these values." &lt;strong&gt;Output:&lt;/strong&gt; a completed, validated entry. &lt;strong&gt;Why it survived:&lt;/strong&gt; the correction cycle is the tedious part, and it's now off my plate.&lt;/p&gt;
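
&lt;p&gt;A minimal sketch of that enter, read rejection, correct, retry cycle. The &lt;code&gt;submit&lt;/code&gt; and &lt;code&gt;fix&lt;/code&gt; callbacks are hypothetical stand-ins for whatever drives the form and interprets its error messages, and the toy three-digit rule is made up for illustration.&lt;/p&gt;

```javascript
// Hypothetical correction loop. `submit` and `fix` stand in for whatever
// the agent uses to drive the form; neither is a real library call.
function submitWithCorrection(values, submit, fix, maxAttempts = 3) {
  let current = values;
  for (let attempt = 1; maxAttempts >= attempt; attempt++) {
    const result = submit(current); // expected shape: { ok, errors }
    if (result.ok) {
      return { ok: true, attempts: attempt, values: current };
    }
    // Feed the rejection back in and try again with corrected values.
    current = fix(current, result.errors);
  }
  // Give up after maxAttempts so a bad input cannot loop forever.
  return { ok: false, attempts: maxAttempts, values: current };
}

// Toy form: the branch code must be exactly three digits.
const submit = (v) =>
  /^\d{3}$/.test(v.branchCode)
    ? { ok: true, errors: [] }
    : { ok: false, errors: ['branchCode must be 3 digits'] };

// Toy fix: left-pad the branch code with zeros.
const fix = (v) => ({ ...v, branchCode: v.branchCode.padStart(3, '0') });

console.log(submitWithCorrection({ branchCode: '7' }, submit, fix));
```

&lt;p&gt;The attempt cap is the part worth keeping: without it, a value the form will never accept turns the retry loop into an infinite one.&lt;/p&gt;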

&lt;h3&gt;
  
  
  6. Read-back verification on anything that publishes
&lt;/h3&gt;

&lt;p&gt;This one is a meta-automation born from getting burned (I published four empty articles once — long story). Any step that &lt;em&gt;writes&lt;/em&gt; to an external system is now followed by a step that &lt;em&gt;reads the result back and asserts it's correct&lt;/em&gt;. The agent publishes, then fetches the live artifact and diffs it against the source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; any publish/write step. &lt;strong&gt;Output:&lt;/strong&gt; a pass/fail on "did the thing actually land." &lt;strong&gt;Why it survived:&lt;/strong&gt; it's caught silent failures that returned a 200 and shipped garbage. Cheap insurance against the worst failure mode.&lt;/p&gt;
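
&lt;p&gt;As a rough sketch of the assertion itself (not the pipeline's actual code): once the live body has been fetched, by whatever means, the comparison can be a small pure function. The names &lt;code&gt;normalize&lt;/code&gt; and &lt;code&gt;checkReadBack&lt;/code&gt; and the return shape are assumptions for illustration.&lt;/p&gt;

```javascript
// Hypothetical read-back check: compares what was sent with what the
// platform now serves. Names and the return shape are illustrative.
function normalize(text) {
  // Ignore differences that do not change meaning: line endings and
  // trailing whitespace.
  return text.replace(/\r\n/g, '\n').replace(/[ \t]+$/gm, '').trim();
}

function checkReadBack(source, live) {
  const want = normalize(source);
  const got = normalize(live || '');
  if (got.length === 0) {
    // The "200 OK but nothing in the body" case.
    return { ok: false, reason: 'live body is empty' };
  }
  if (got !== want) {
    return { ok: false, reason: 'live body differs from source' };
  }
  return { ok: true, reason: 'match' };
}

console.log(checkReadBack('# Title\n\nBody text.\n', '# Title\r\n\r\nBody text.'));
console.log(checkReadBack('# Title\n\nBody text.\n', ''));
```

&lt;p&gt;Checking for an empty body first gives the most common silent failure its own, unambiguous reason.&lt;/p&gt;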

&lt;h3&gt;
  
  
  7. Turning a rough voice-of-me brief into a first draft in a fixed persona
&lt;/h3&gt;

&lt;p&gt;I keep persona/voice definitions as files. The agent loads the right one and drafts in that voice from a few bullet points. The win is consistency across many outputs without me re-explaining the voice every time — the constraints live in a file, not in my head or in each prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; "draft this as [persona]." &lt;strong&gt;Output:&lt;/strong&gt; an on-voice draft. &lt;strong&gt;Why it survived:&lt;/strong&gt; voice drift across a content series is real, and a file-based persona kills it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern in all 7
&lt;/h2&gt;

&lt;p&gt;Read back over them. Every survivor has the same shape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A precise tool&lt;/strong&gt; (a specific MCP server), not a vague "be smart."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A clear trigger and a clear artifact&lt;/strong&gt;, so I can tell instantly if it worked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A human gate exactly where judgment lives&lt;/strong&gt; — plan approval, fact provision, final review — and full automation everywhere judgment &lt;em&gt;doesn't&lt;/em&gt; live.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ones that work automate &lt;em&gt;labor&lt;/em&gt;. They leave &lt;em&gt;judgment&lt;/em&gt; to me. The moment an automation tries to own the judgment, it starts costing more than it saves — which brings me to the deletions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 3 I deleted
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Deleted 1: fully autonomous publishing with no human gate
&lt;/h3&gt;

&lt;p&gt;I tried letting the content pipeline publish with zero review, trusting the read-back check to catch problems. The read-back caught &lt;em&gt;formatting&lt;/em&gt; failures fine. It could not catch "this draft is fine but it's the wrong thing to say right now." That's judgment, and I'd automated it away. &lt;strong&gt;Deleted&lt;/strong&gt; the no-gate version; kept the read-back, restored the review gate. Lesson: verification catches &lt;em&gt;broken&lt;/em&gt;, it can't catch &lt;em&gt;wrong&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deleted 2: a "monitor everything and alert me" agent
&lt;/h3&gt;

&lt;p&gt;I built an agent to watch several sources and ping me on anything notable. It pinged constantly. The signal-to-noise was terrible because "notable" is a judgment call that depends on context I never fully specified. I spent more time triaging its alerts than I'd have spent checking the sources myself once a day. &lt;strong&gt;Deleted.&lt;/strong&gt; Lesson: an automation that generates work to evaluate its own output is usually net-negative. Polling-and-judging is a trap.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deleted 3: an over-engineered "agent that builds agents"
&lt;/h3&gt;

&lt;p&gt;I tried to make a meta-agent that would spin up task-specific sub-agents on demand. It was a great demo and a maintenance sinkhole: every layer of indirection was a new place for things to break silently, and debugging a failure meant unwinding three levels of "which agent decided what." &lt;strong&gt;Deleted&lt;/strong&gt; in favor of a flat list of single-purpose automations, each of which I can understand in one sitting. Lesson: indirection is a cost you pay on every debug, forever. Flat and boring beats clever and nested.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd tell you to actually do
&lt;/h2&gt;

&lt;p&gt;If you're starting with Claude Code + MCP, don't chase the impressive stuff. Do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pick one task you do repeatedly that is labor, not judgment.&lt;/strong&gt; Form entry, drafting from a template, search-and-summarize. Not "decide my strategy."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Give the agent the smallest precise tool for it&lt;/strong&gt; — one MCP server, not five.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Put a human gate exactly where the judgment is&lt;/strong&gt;, and automate everything around it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a read-back check&lt;/strong&gt; if the task writes anything anywhere.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure for two weeks.&lt;/strong&gt; If you spend more time babysitting it than it saves, delete it without sentiment. I deleted three. That's not failure; that's the measurement working.&lt;/li&gt;
&lt;/ol&gt;
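
&lt;p&gt;The measurement in step 5 is plain arithmetic: time saved across all runs, minus time spent babysitting. A minimal sketch, with field names and numbers invented purely for illustration:&lt;/p&gt;

```javascript
// Hypothetical bookkeeping for a two-week trial. All field names and
// numbers are made up for illustration.
function netMinutesSaved({ runs, minutesSavedPerRun, babysitMinutes }) {
  return runs * minutesSavedPerRun - babysitMinutes;
}

const trial = { runs: 10, minutesSavedPerRun: 12, babysitMinutes: 150 };
const net = netMinutesSaved(trial);

// 10 runs * 12 minutes = 120 saved, minus 150 spent babysitting.
console.log(net, net > 0 ? 'keep' : 'delete'); // -30 delete
```

&lt;p&gt;The hard part is not the subtraction but logging the babysitting minutes honestly, including the time spent checking output that turned out to be fine.&lt;/p&gt;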

&lt;p&gt;The hype version of this post would be ten wins and no deletions. But the deletions are where the actual knowledge is. Automating labor pays. Automating judgment, monitoring, and meta-orchestration mostly doesn't — at least not yet, not solo, not without a babysitting cost that quietly eats the savings.&lt;/p&gt;

&lt;p&gt;Build first. Measure honestly. Delete what doesn't pay. The design converges later — and "later" usually means "after you've deleted the clever thing and kept the boring one that works."&lt;/p&gt;

&lt;p&gt;— Sai&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this was useful: I packaged the prompts I actually use to run autonomous agents into two field packs — &lt;a href="https://zeronova73.gumroad.com/l/qqvrtp" rel="noopener noreferrer"&gt;100 Prompts for Autonomous Agents&lt;/a&gt; and &lt;a href="https://zeronova73.gumroad.com/l/sqcpo" rel="noopener noreferrer"&gt;Claude Code Power-User Prompts&lt;/a&gt;. Same build-first mindset, ready to paste into your terminal.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>devtools</category>
      <category>llm</category>
    </item>
    <item>
      <title>My AI Agent Kept Publishing Empty Articles — So I Made It Edit Them Back via the API</title>
      <dc:creator>sai-builder</dc:creator>
      <pubDate>Thu, 21 May 2026 23:44:01 +0000</pubDate>
      <link>https://dev.to/saibuilder/my-ai-agent-kept-publishing-empty-articles-so-i-made-it-edit-them-back-via-the-api-67d</link>
      <guid>https://dev.to/saibuilder/my-ai-agent-kept-publishing-empty-articles-so-i-made-it-edit-them-back-via-the-api-67d</guid>
      <description>&lt;p&gt;I have a content pipeline that drafts articles, runs them past me, and publishes them to dev.to. Last week it published six in a batch. Two looked fine. The other four were live, indexed, public — and &lt;strong&gt;completely empty&lt;/strong&gt;. Title, tags, cover, canonical URL, all present. Body: nothing. Four blank posts under a persona I'm trying to build credibility with.&lt;/p&gt;

&lt;p&gt;This is the writeup of how that happened and how I fixed it, because the fix taught me something I keep relearning the hard way: &lt;strong&gt;the reliable automation path is almost never the obvious one&lt;/strong&gt;, and the obvious one fails &lt;em&gt;silently&lt;/em&gt;, which is worse than failing loud.&lt;/p&gt;

&lt;h2&gt;
  
  
  How four articles ended up blank
&lt;/h2&gt;

&lt;p&gt;The pipeline's last step was "publish." It drove the dev.to web editor: open the new-post page, fill the title field, fill the markdown body field, hit publish. Standard browser automation. It had worked before, which is exactly why I trusted it and didn't check the output closely enough.&lt;/p&gt;

&lt;p&gt;The bodies I was feeding it were long. Not novel-length, but 1,500+ words of markdown with code fences, the occasional non-ASCII character, em dashes, the works. And here's the thing the demos never show you: when you programmatically stuff a large string into a rich editor's input and immediately trigger save, you are racing the editor's own internal state. The editor has its own model of the document. Your injected text and its serialize-on-save don't always agree on timing. Sometimes the save fires against an editor that hasn't committed your injection yet.&lt;/p&gt;

&lt;p&gt;When that happens, the platform doesn't error. It happily saves the document it currently believes in — which is empty. You get a 200. You get a published URL. You get nothing in the body.&lt;/p&gt;

&lt;p&gt;That's the part that stings. There was no exception to catch. The automation reported success. The only way to know it failed was to &lt;em&gt;read the published page&lt;/em&gt;, which I wasn't doing because, well, the step said it succeeded. &lt;strong&gt;A pipeline that lies about success is more dangerous than one that crashes.&lt;/strong&gt; A crash you handle. A silent lie you ship.&lt;/p&gt;

&lt;p&gt;So lesson zero, before any of the technical stuff: if an automation step produces an artifact, your pipeline has to &lt;em&gt;read the artifact back and assert it's correct&lt;/em&gt;. Not "did the call return 200." Did the body actually land. I now treat any write-without-readback step as a known liability.&lt;/p&gt;

&lt;h2&gt;
  
  
  First fix attempt: just drive the editor better
&lt;/h2&gt;

&lt;p&gt;The instinct is to fix the thing that broke. So I tried to make the editor automation more robust: wait for the editor to be ready, inject, wait again, poll the editor's internal value until it matched what I sent, &lt;em&gt;then&lt;/em&gt; save.&lt;/p&gt;

&lt;p&gt;This is where I want to be honest about time. I spent a couple of hours on this and it got &lt;em&gt;better&lt;/em&gt;, not &lt;em&gt;good&lt;/em&gt;. The editor is a moving target — its DOM, its internal state model, the events it listens to. I could get it to ~90% reliable, which for a publishing step is useless. 90% reliable means 1 in 10 of my posts is blank, and I won't know which one without checking all of them, which defeats the automation.&lt;/p&gt;
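
&lt;p&gt;To put a number on why ~90% is not enough, here is the arithmetic for a whole batch. It assumes each publish fails independently with the same probability, which is a simplification; the function name is illustrative.&lt;/p&gt;

```javascript
// Probability that at least one post in a batch fails, assuming each
// publish succeeds independently with the same reliability.
function probAtLeastOneFailure(reliability, batchSize) {
  return 1 - Math.pow(reliability, batchSize);
}

// A batch of six posts at 90% per-post reliability.
console.log(probAtLeastOneFailure(0.9, 6).toFixed(3)); // 0.469

// The same batch at 99% per-post reliability.
console.log(probAtLeastOneFailure(0.99, 6).toFixed(3)); // 0.059
```

&lt;p&gt;Under that assumption, a six-post batch at 90% per-post reliability has nearly a coin-flip chance of containing at least one blank post.&lt;/p&gt;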

&lt;p&gt;This is the moment that matters, and I almost always get it wrong: &lt;strong&gt;I was fighting the tool instead of changing the transport.&lt;/strong&gt; When you find yourself adding wait-then-poll-then-verify scaffolding around a UI that wasn't built for you, that's not robustness, that's a smell. You're hand-stabilizing something inherently unstable. Timebox it. I now give myself a hard cap — if a tooling fight isn't won in roughly an hour, I stop and ask: &lt;em&gt;is there a known-working transport I'm ignoring because it's less convenient?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There was.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual fix: drive edits through the API
&lt;/h2&gt;

&lt;p&gt;dev.to has a real, documented write API. You can update a published article with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;PUT https://dev.to/api/articles/{id}
api-key: &amp;lt;YOUR_API_KEY&amp;gt;
Content-Type: application/json

{
  "article": {
    "body_markdown": "...the full markdown body..."
  }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This bypasses the editor entirely. No racing an editor's internal state, no DOM, no save-button timing. You hand the platform the canonical markdown and it stores it. The four empty posts already existed with the right titles and tags; I just needed to PUT the bodies into them. So the repair plan was simple: for each blank article ID, PUT the correct &lt;code&gt;body_markdown&lt;/code&gt;. Done.&lt;/p&gt;

&lt;p&gt;Except it wasn't, and the two ways it wasn't are the genuinely useful part of this post.&lt;/p&gt;

&lt;h3&gt;
  
  
  Snag 1: the request has to come from inside a logged-in browser
&lt;/h3&gt;

&lt;p&gt;My first move was the clean one: fire the PUT from a script, server-side, with the API key in the header. &lt;strong&gt;401.&lt;/strong&gt; Tried it a few different ways. Still 401.&lt;/p&gt;

&lt;p&gt;I'm not going to over-claim the root cause here — I didn't fully reverse-engineer their auth posture, and you shouldn't trust a war story that pretends it did. What I &lt;em&gt;observed&lt;/em&gt;, repeatably, is that the external/non-browser request was rejected, and the &lt;strong&gt;same&lt;/strong&gt; PUT issued from within an already-authenticated browser session — the actual tab where I was logged in — went through. So that's what I did: I ran the request from inside the logged-in page's own context, where the session and the api-key together were accepted, instead of from a cold external client.&lt;/p&gt;

&lt;p&gt;The reusable takeaway isn't "dev.to requires X." It's: &lt;strong&gt;when an API rejects you from outside but the platform clearly performs the same write from its own frontend, stop trying to replicate the auth from scratch and just borrow the context that already works.&lt;/strong&gt; You have a logged-in browser. Issue the call from there. It's not elegant. It's reliable, which beats elegant for a repair job.&lt;/p&gt;

&lt;h3&gt;
  
  
  Snag 2: long Unicode bodies got mangled into homoglyphs
&lt;/h3&gt;

&lt;p&gt;Now the worse one. I had the transport working, so I tried to get the body &lt;em&gt;into&lt;/em&gt; the request. The naive way is to inline the markdown directly into the call — encode the whole 1,500-word body and pass it along with the PUT.&lt;/p&gt;

&lt;p&gt;The bodies came back &lt;strong&gt;corrupted&lt;/strong&gt;. Specifically, characters had been swapped for &lt;strong&gt;homoglyphs&lt;/strong&gt; — visually near-identical lookalikes from other Unicode blocks. Em dashes, quotes, and a handful of letters got silently substituted for characters that &lt;em&gt;look&lt;/em&gt; the same in the editor but are different code points. A reader wouldn't notice at a glance. A code fence would. And it meant my "fixed" article was now subtly wrong in a way that's almost impossible to eyeball.&lt;/p&gt;

&lt;p&gt;The cause, as far as I could tell, was the path the big string took: shoving a large encoded blob inline through layers of escaping and re-encoding gave something, somewhere, the chance to normalize or transcode it. Every hop a string takes through quoting, shell escaping, JSON encoding, and re-decoding is a chance for a "helpful" substitution. With a short ASCII string you'd never see it. With a long Unicode body, there are many more characters for a hop to "fix," so the odds of at least one substitution climb fast.&lt;/p&gt;

&lt;p&gt;The fix that finally worked, end to end, was to &lt;strong&gt;never inline the body at all&lt;/strong&gt;. Instead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Put the full markdown on the &lt;strong&gt;clipboard&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In the logged-in browser tab, &lt;strong&gt;paste&lt;/strong&gt; it into a plain &lt;code&gt;&amp;lt;textarea&amp;gt;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Read the textarea's &lt;strong&gt;&lt;code&gt;.value&lt;/code&gt;&lt;/strong&gt; back — now I have the exact string the browser holds, no inline-encoding hops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PUT&lt;/strong&gt; that value to the API.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Conceptually:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1) body is on the OS clipboard (put there by the pipeline)&lt;/span&gt;
&lt;span class="c1"&gt;// 2) inside the logged-in tab:&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;textarea&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ta&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;ta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;focus&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// paste the clipboard into the textarea (real paste event)&lt;/span&gt;
&lt;span class="c1"&gt;// 3) read it back — this is the clean source of truth&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// 4) issue the PUT from this same authenticated context&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/api/articles/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PUT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;api-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;article&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;body_markdown&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The clipboard-and-textarea step looks absurd. It &lt;em&gt;is&lt;/em&gt; absurd. But it works for a precise reason: the clipboard → paste → &lt;code&gt;.value&lt;/code&gt; round-trip keeps the string as a single opaque payload inside one runtime (the browser), instead of marching it through several layers of escaping where each layer is allowed to "correct" it. The textarea is just a clean holding pen that hands you back exactly what the browser received. No homoglyphs, because nothing in the path thought it was being helpful.&lt;/p&gt;

&lt;p&gt;I checked all four repaired articles character-for-character against the source. Clean. Done.&lt;/p&gt;
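
&lt;p&gt;A character-for-character check like that is easy to script, and worth scripting because homoglyphs are nearly impossible to spot by eye. This is a minimal sketch, not the exact check used here; the function names and the shape of the report are assumptions.&lt;/p&gt;

```javascript
// Compare two strings code point by code point and report the first
// difference. A sketch; the report format is an assumption.
function codePointLabel(ch) {
  return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

function firstMismatch(source, published) {
  const a = Array.from(source); // Array.from splits by code point
  const b = Array.from(published);
  let index = a.findIndex((ch, i) => ch !== b[i]);
  if (index === -1) {
    if (a.length === b.length) return null; // identical
    index = a.length; // source is a strict prefix of published
  }
  return {
    index,
    expected: index >= a.length ? 'end of text' : codePointLabel(a[index]),
    actual: index >= b.length ? 'end of text' : codePointLabel(b[index]),
  };
}

// Latin "a" (U+0061) versus Cyrillic "a" (U+0430): identical on screen.
console.log(firstMismatch('p\u0061ste', 'p\u0430ste'));
// Reports index 1, expected U+0061, actual U+0430.
```

&lt;p&gt;Reporting code points rather than the characters themselves matters, because printing the two characters side by side would show the same glyph twice.&lt;/p&gt;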

&lt;h2&gt;
  
  
  The lesson, generalized
&lt;/h2&gt;

&lt;p&gt;Strip away the dev.to specifics and here's what I'd tack to the wall above any agent-builder's desk:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. A write step that doesn't read its result back is not done — it's a liability with a green checkmark.&lt;/strong&gt; The empty articles shipped because "publish" returned success. Assert the artifact, not the status code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The reliable transport is rarely the obvious one.&lt;/strong&gt; The obvious path was the editor, because that's the UI a human uses. The reliable path was the API. The obvious path failed silently; the reliable one failed &lt;em&gt;loudly&lt;/em&gt; (401) until I gave it the context it needed, which is exactly the failure mode you want.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Timebox tooling fights.&lt;/strong&gt; If you're hand-stabilizing an unstable surface with wait-poll-verify scaffolding and you're past an hour, stop. Ask what &lt;em&gt;known-working transport&lt;/em&gt; you're avoiding because it's less convenient. Convenience is not reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Long Unicode strings can corrupt at any hop.&lt;/strong&gt; Every escaping/encoding boundary is a chance for silent substitution. The fewer hops, the fewer homoglyphs. When in doubt, keep the payload opaque inside one runtime and read it back before you trust it.&lt;/p&gt;

&lt;p&gt;None of this is the clever-architecture content the algorithm rewards. It's the unglamorous reality of running agents that touch real systems: the model is the easy part, and the last mile is full of silent string corruption and auth that only works from the right tab. You don't design your way around that up front. You hit it, you timebox the fight, you fall back to the transport that actually works, and you write down the smell so you recognize it faster next time.&lt;/p&gt;

&lt;p&gt;Build first. The design converges later — usually right after the fourth empty article goes live.&lt;/p&gt;

&lt;p&gt;— Sai&lt;/p&gt;





</description>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
      <category>devtools</category>
    </item>
    <item>
      <title>I Let an AI Agent Set Up My Payout Account. Here's the Exact Line It Couldn't Cross.</title>
      <dc:creator>sai-builder</dc:creator>
      <pubDate>Thu, 21 May 2026 02:00:59 +0000</pubDate>
      <link>https://dev.to/saibuilder/i-let-an-ai-agent-set-up-my-payout-account-heres-the-exact-line-it-couldnt-cross-13cg</link>
      <guid>https://dev.to/saibuilder/i-let-an-ai-agent-set-up-my-payout-account-heres-the-exact-line-it-couldnt-cross-13cg</guid>
      <description>&lt;p&gt;Yesterday I published a piece arguing that a fully autonomous AI startup loop hits two ceilings: an &lt;em&gt;idea ceiling&lt;/em&gt; and an &lt;em&gt;execution ceiling&lt;/em&gt;. The execution ceiling, I wrote, is where thinking is fully autonomous but doing gets stopped at human gates — CAPTCHA, KYC, capital, "are you human?"&lt;/p&gt;

&lt;p&gt;That framing was correct but coarse. I wrote it from the armchair. So the next day I went and ran the actual experiment, because a theory about where AI stops is worthless until you push an agent right up to the wall and watch exactly where it bounces.&lt;/p&gt;

&lt;p&gt;This is the field report. The conclusion is sharper than the theory: &lt;strong&gt;labor is delegable, identity is not.&lt;/strong&gt; And the gap between those two is much smaller than I expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;I picked the most concrete, highest-stakes execution task I could find that wasn't just "post some content": configuring the &lt;strong&gt;payout settings&lt;/strong&gt; on a digital-product platform — the screen where you tell the platform which bank account should receive your money.&lt;/p&gt;

&lt;p&gt;This is a good test because it's not toy automation. It touches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured financial data (bank codes, branch codes, account numbers)&lt;/li&gt;
&lt;li&gt;Government-shaped identity data (legal name, address, date of birth)&lt;/li&gt;
&lt;li&gt;Multi-script input (in my locale, names have to be entered in more than one writing system)&lt;/li&gt;
&lt;li&gt;Real validation (the form rejects malformed input; you can't fake your way past it)&lt;/li&gt;
&lt;li&gt;A persistence step (save, and the platform actually stores it against a real account)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If an AI agent can drive &lt;em&gt;this&lt;/em&gt; to completion, "autonomous execution" stops being a slogan. If it can't, I wanted to know precisely which field it died on.&lt;/p&gt;

&lt;p&gt;I drove the agent through a real browser session (CDP-attached, so it was operating an actual logged-in browser, not a sandbox). I gave it the goal — "complete the payout configuration" — and the personal facts it would obviously need, and then I watched.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the agent did entirely on its own
&lt;/h2&gt;

&lt;p&gt;More than I expected. Specifically:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It resolved bank and branch codes by searching.&lt;/strong&gt; I did not hand it the numeric codes for the bank or the branch. It went and found them — bank code, branch code — from public references, then entered them in the correct fields. This matters more than it sounds, and I'll come back to it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It handled multi-script name entry.&lt;/strong&gt; My locale requires the account holder's name in multiple writing systems — the standard form, a phonetic form, and a romanized form. The agent did the transliteration across all of them and placed each in the right field. This is exactly the kind of fiddly, error-prone, "ugh I have to do this carefully" task that humans hate and quietly get wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It structured the address.&lt;/strong&gt; Not "paste a string" — the form wanted address split into components, and it decomposed a plain address into the structured fields the form expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It passed validation.&lt;/strong&gt; Malformed entries got rejected by the form, the agent read the rejection, corrected, and re-submitted. No human in the loop for the correction cycle.&lt;/p&gt;
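&lt;p&gt;The correction cycle described above has a simple shape: submit, read the rejection, fix the rejected fields, submit again. Here is a minimal sketch of that loop. It does not show the platform's real validation. &lt;code&gt;validate&lt;/code&gt;, &lt;code&gt;correct&lt;/code&gt;, &lt;code&gt;submit_with_recovery&lt;/code&gt;, and the 4-digit and 3-digit code formats are all assumptions made for illustration.&lt;/p&gt;

```python
import re
from typing import Dict, Tuple

# Hypothetical stand-in for the platform's form validation; not a real API.
# The 4-digit bank code and 3-digit branch code formats are assumptions.
RULES = {
    "bank_code": r"\d{4}",
    "branch_code": r"\d{3}",
}


def validate(fields: Dict[str, str]) -> Dict[str, str]:
    """Return {field: message} for every field that fails its rule."""
    return {
        name: f"expected pattern {pattern}"
        for name, pattern in RULES.items()
        if not re.fullmatch(pattern, fields.get(name, ""))
    }


def correct(value: str) -> str:
    """Illustrative correction only: keep the digits, drop everything else."""
    return re.sub(r"\D", "", value)


def submit_with_recovery(
    fields: Dict[str, str], max_attempts: int = 3
) -> Tuple[Dict[str, str], int]:
    """Validate, fix rejected fields, and retry; return (fields, attempts used)."""
    current = dict(fields)
    for attempt in range(1, max_attempts + 1):
        errors = validate(current)
        if not errors:
            return current, attempt
        for name in errors:
            current[name] = correct(current.get(name, ""))
    raise ValueError(f"still rejected after {max_attempts} attempts: {errors}")
```

&lt;p&gt;The attempt cap is the part worth keeping in any real version: an input that cannot be repaired mechanically should surface as an error, not loop forever.&lt;/p&gt;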

&lt;p&gt;&lt;strong&gt;It saved.&lt;/strong&gt; The configuration persisted. The form was, functionally, done.&lt;/p&gt;

&lt;p&gt;I want to be honest about how much that is. Filling out a financial form, in a foreign-to-the-form writing system, looking up the institutional codes yourself, decomposing freeform data into structured fields, recovering from validation errors — if a human assistant did that for you, you'd call it competent work. The agent did it without supervision on the &lt;em&gt;mechanics&lt;/em&gt;. The mechanics were never the wall.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually required a human
&lt;/h2&gt;

&lt;p&gt;After all of that, exactly two categories of thing could not come from the agent. Only two. And they're more specific than the "KYC / CAPTCHA / capital" bucket I waved at yesterday.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Person-specific, non-public facts
&lt;/h3&gt;

&lt;p&gt;The account number. The date of birth. The exact residential address. These are things the agent literally cannot know, because they aren't anywhere it can read. They're not a &lt;em&gt;capability&lt;/em&gt; gap — the agent is perfectly capable of typing an account number into a field; it proved that. It's an &lt;strong&gt;information-location&lt;/strong&gt; gap. The data lives in my head and on my documents, not in any corpus or any search result.&lt;/p&gt;

&lt;p&gt;And here's the subtle part: once I &lt;em&gt;spoke&lt;/em&gt; those facts, the agent did the input. I didn't fill in the account number; I said the account number, and the agent placed it. So even for the human-only data, the human contributes the &lt;em&gt;fact&lt;/em&gt;, not the &lt;em&gt;labor&lt;/em&gt;. The typing, the field-matching, the format-correcting — still the machine.&lt;/p&gt;

&lt;p&gt;Contrast this with the bank/branch codes. Those are &lt;em&gt;also&lt;/em&gt; numbers, &lt;em&gt;also&lt;/em&gt; required, &lt;em&gt;also&lt;/em&gt; the kind of thing you'd assume a human has to provide. But they're &lt;strong&gt;public&lt;/strong&gt;. They're scattered and annoying to find, but they're findable. So the agent found them. The line isn't "numbers humans must provide" — it's &lt;em&gt;non-public&lt;/em&gt; facts humans must provide. Public-but-scattered data is squarely on the AI's side of the wall now. Search closes that gap.&lt;/p&gt;

&lt;p&gt;That reframes the whole thing. The human's job in a procedure like this is &lt;strong&gt;not&lt;/strong&gt; "provide the data." It's "provide the &lt;em&gt;private&lt;/em&gt; data." Everything public, everything derivable, everything structural — the agent absorbs.&lt;/p&gt;
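&lt;p&gt;One way to make that split concrete is to tag every form field with where its value comes from, then ask which fields only the human can supply. The sketch below is a hypothetical classification, not the platform's schema: the field names and the &lt;code&gt;Source&lt;/code&gt; enum are assumptions chosen to mirror the examples in this post.&lt;/p&gt;

```python
from enum import Enum


class Source(Enum):
    PUBLIC = "public"    # scattered but findable by search
    DERIVED = "derived"  # computable from values already supplied
    PRIVATE = "private"  # known only to the principal


# Hypothetical field list mirroring the examples in the text.
FORM_FIELDS = {
    "bank_code": Source.PUBLIC,
    "branch_code": Source.PUBLIC,
    "name_phonetic": Source.DERIVED,
    "name_romanized": Source.DERIVED,
    "address_components": Source.DERIVED,
    "account_number": Source.PRIVATE,
    "date_of_birth": Source.PRIVATE,
    "residential_address": Source.PRIVATE,
}


def human_must_supply(fields):
    """Return the sorted names of the fields that only the principal can provide."""
    return sorted(name for name, src in fields.items() if src is Source.PRIVATE)


print(human_must_supply(FORM_FIELDS))
```

&lt;p&gt;Everything not returned by &lt;code&gt;human_must_supply&lt;/code&gt; is, under this model, the agent's job.&lt;/p&gt;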

&lt;h3&gt;
  
  
  2. Proof that I am this specific person
&lt;/h3&gt;

&lt;p&gt;This is the real wall. Not "are you a human?" — yesterday's framing — but its final form: &lt;strong&gt;"are you THIS human?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A money-receiving setup eventually wants to bind the account to a verified legal identity: a government-issued ID, a confirmation that the person configuring this is the person who legally owns the destination account. That step is not an information problem and not a labor problem. It's an &lt;strong&gt;identity&lt;/strong&gt; problem. There is no string I can dictate that lets the agent &lt;em&gt;be&lt;/em&gt; me to a verifier. Identity is the one input that, by design, cannot be relayed through a proxy — because the entire point of identity verification is to defeat proxies.&lt;/p&gt;

&lt;p&gt;This is the clean edge I was looking for. Everything upstream of it — the entire form — is delegable. The identity bind is not, and not by accident. It's not weakly defended; it's the thing the whole system exists to protect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The precise statement
&lt;/h2&gt;

&lt;p&gt;Yesterday: &lt;em&gt;thinking is autonomous, execution is gated by humans.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After actually running it: that's true, but the gate is narrow and I can now describe its exact shape.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Labor is fully delegable to the agent. What is not delegable is (a) facts that are private to the principal, and (b) proof of the principal's identity. Everything else — including public-but-hard-to-find data, transliteration, structuring, validation recovery, and persistence — crosses to the machine.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Two things follow from that, and they're useful if you're building with agents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The human-in-the-loop surface is smaller than people assume.&lt;/strong&gt; When teams say "this needs a human," they usually mean the whole task. In practice the irreducibly human part of this procedure was &lt;em&gt;a few dictated private facts and one identity check.&lt;/em&gt; Everything wrapped around those, roughly the 90% that is tedious form labor, is automatable today. If your mental model is "forms need humans," you're leaving most of the work on the table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The remaining 10% is not a temporary limitation — it's structural.&lt;/strong&gt; I keep wanting to treat the identity bind as a gap that better tooling will close. It won't, and it shouldn't. Private facts are private &lt;em&gt;by definition&lt;/em&gt;; identity proof is anti-proxy &lt;em&gt;by purpose&lt;/em&gt;. Better models don't erode either one. So when you design an agent workflow that touches money or legal standing, don't architect for "full autonomy soon." Architect for "autonomous up to the identity bind, then a clean, minimal human handoff." Design the handoff to be exactly two things wide: dictate the private facts, present the identity. Nothing more should fall to the human.&lt;/p&gt;
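&lt;p&gt;A handoff that is "exactly two things wide" can be expressed as a small data structure. This is a rough sketch under my own assumptions: the &lt;code&gt;Handoff&lt;/code&gt; class, the step kinds (&lt;code&gt;"labor"&lt;/code&gt;, &lt;code&gt;"private_fact"&lt;/code&gt;, &lt;code&gt;"identity"&lt;/code&gt;), and &lt;code&gt;plan_handoff&lt;/code&gt; are hypothetical names, not part of any agent framework.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Handoff:
    """The minimal set of things a workflow asks of the human principal."""
    private_facts: List[str] = field(default_factory=list)
    identity_check: bool = False

    def width(self) -> int:
        # Counts kinds of request (private facts, identity), not individual facts.
        return int(bool(self.private_facts)) + int(self.identity_check)


def plan_handoff(steps: List[Tuple[str, str]]) -> Tuple[List[str], Handoff]:
    """Split (name, kind) steps into agent-run labor and the human handoff."""
    agent_steps: List[str] = []
    handoff = Handoff()
    for name, kind in steps:
        if kind == "private_fact":
            handoff.private_facts.append(name)
        elif kind == "identity":
            handoff.identity_check = True
        else:
            agent_steps.append(name)
    return agent_steps, handoff


steps = [
    ("lookup_bank_code", "labor"),
    ("account_number", "private_fact"),
    ("transliterate_name", "labor"),
    ("verify_government_id", "identity"),
]
agent_steps, handoff = plan_handoff(steps)
print(agent_steps, handoff.width())
```

&lt;p&gt;If &lt;code&gt;width()&lt;/code&gt; ever needs to exceed 2, that is a hint that some labor has leaked back to the human.&lt;/p&gt;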

&lt;h2&gt;
  
  
  Why I think this matters beyond one form
&lt;/h2&gt;

&lt;p&gt;The interesting question from the first piece was whether a pure-AI loop could ever &lt;em&gt;be a business&lt;/em&gt; rather than just &lt;em&gt;think like one&lt;/em&gt;. This experiment narrows the answer.&lt;/p&gt;

&lt;p&gt;An agent can run essentially the entire operational body of a business — the research, the structuring, the form labor, the error recovery, the persistence. What it cannot do is &lt;em&gt;be the legal person&lt;/em&gt; the business hangs on. The principal stays human not because the principal is smarter or more capable in the moment — on the mechanics, they're slower — but because the principal is the &lt;strong&gt;identity anchor&lt;/strong&gt;. The one irreducible human role left is: be the person the system is allowed to trust.&lt;/p&gt;

&lt;p&gt;That's a strangely small role. It's not founder-as-doer. It's founder-as-anchor. You dictate what's private, you prove who you are, and the machine does the rest of the body of work.&lt;/p&gt;

&lt;p&gt;I find that genuinely clarifying rather than discouraging. Yesterday I thought the execution ceiling was a vague wall somewhere in "doing." Today I know it's a thin, sharp line with a precise location: it runs between &lt;em&gt;labor&lt;/em&gt; and &lt;em&gt;identity&lt;/em&gt;, and labor is already on the far side.&lt;/p&gt;

&lt;p&gt;Build first. The boundary draws itself once you push something real all the way to the edge.&lt;/p&gt;

&lt;p&gt;— Sai&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this was useful: I packaged the prompts I actually use to run autonomous agents into two field packs — &lt;a href="https://zeronova73.gumroad.com/l/qqvrtp" rel="noopener noreferrer"&gt;100 Prompts for Autonomous Agents&lt;/a&gt; and &lt;a href="https://zeronova73.gumroad.com/l/sqcpo" rel="noopener noreferrer"&gt;Claude Code Power-User Prompts&lt;/a&gt;. Same build-first mindset, ready to paste into your terminal.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I Ran an Autonomous AI Startup Loop 5 Times. It Hit Two Ceilings.</title>
      <dc:creator>sai-builder</dc:creator>
      <pubDate>Wed, 20 May 2026 08:06:52 +0000</pubDate>
      <link>https://dev.to/saibuilder/i-cj2</link>
      <guid>https://dev.to/saibuilder/i-cj2</guid>
      <description>&lt;p&gt;There's a question floating around right now that everyone has an opinion on and almost no one has run: &lt;em&gt;what happens if you let AI design and launch a business by itself?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Not "AI helps you brainstorm." Not "AI writes your landing page copy." I mean the whole loop — generate the idea, evaluate it, decide whether to kill it or ship it — with &lt;strong&gt;no human context injected anywhere&lt;/strong&gt;. No founder's domain expertise. No "I happen to know this industry." No personal network. Just three AI roles passing artifacts to each other and a hard rule that PASS means we build.&lt;/p&gt;

&lt;p&gt;I built that loop and ran it five times. It produced zero passes.&lt;/p&gt;

&lt;p&gt;That sounds like a failure, and in the narrow sense it is. But the &lt;em&gt;shape&lt;/em&gt; of the failure turned out to be more interesting than a success would have been. The loop didn't fail randomly. It failed against two walls, in the same place, every cycle. And those two walls happen to be a pretty clean map of where autonomous AI ends and where humans still sit.&lt;/p&gt;

&lt;p&gt;This is the writeup. No spin. The scores were bad and I'm going to show you the bad scores.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;Three roles, no shared memory of "me":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Generator&lt;/strong&gt; — proposes a business idea from scratch. It is explicitly forbidden from referencing any human's background, skills, or relationships. It works from market structure only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluator&lt;/strong&gt; — scores the idea against a fixed rubric.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Judge&lt;/strong&gt; — reads the score and either advances the idea, sends it back for one revision, or kills it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rubric is 5 criteria × 10 points (50 max), minus three penalties. PASS threshold is 40.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;score =
    market_pull        # is there real, urgent demand?
  + willingness_to_pay # will someone actually pay, and how much?
  + defensibility      # can a small outside builder hold a position?
  + time_to_revenue    # how fast to first dollar, solo?
  + execution_fit      # can this be built and run without a team?
  # each 0–10, raw max 50

penalties (subtracted):
  - os_absorption_risk   # will a platform/OS just absorb this as a feature?
  - competitor_death     # is this a known graveyard pattern?
  - price_tier_squatting # is the obvious price tier already occupied for free?

PASS if final_score &amp;gt;= 40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
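&lt;p&gt;For readers who want to play with the rubric, here is one way the arithmetic could be written as runnable Python. The criterion and penalty names come from the rubric above; the function name, the assumption that penalties are non-negative point deductions, and the example numbers are all made up for illustration and are not scores from the actual runs.&lt;/p&gt;

```python
CRITERIA = (
    "market_pull",
    "willingness_to_pay",
    "defensibility",
    "time_to_revenue",
    "execution_fit",
)
PENALTIES = ("os_absorption_risk", "competitor_death", "price_tier_squatting")
PASS_THRESHOLD = 40


def final_score(criteria, penalties):
    """Sum the five 0-10 criteria, then subtract the penalty points."""
    raw = 0
    for name in CRITERIA:
        value = criteria[name]
        if not (10 >= value >= 0):
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
        raw += value
    # Assumption: each penalty is a non-negative number of points to deduct.
    deducted = sum(penalties.get(name, 0) for name in PENALTIES)
    return raw - deducted


# Invented example values, not taken from any cycle in this post.
example = final_score(
    {
        "market_pull": 8,
        "willingness_to_pay": 6,
        "defensibility": 7,
        "time_to_revenue": 5,
        "execution_fit": 8,
    },
    {"os_absorption_risk": 3, "price_tier_squatting": 2},
)
print(example, example >= PASS_THRESHOLD)
```

&lt;p&gt;Because penalties are subtracted after the raw sum, a heavily penalized idea can end up below zero, which is how a negative final score becomes possible.&lt;/p&gt;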



&lt;p&gt;The penalties are the part that matters. Anyone can generate an idea that scores well on raw appeal. The penalties are where ideas go to die, and they're modeled on the three ways small software businesses actually get killed: a platform builds your feature into itself, you walk into a category that has a body count, or the price point you need is already occupied by a free incumbent.&lt;/p&gt;

&lt;p&gt;Here's roughly how the Judge reasons, in pseudocode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;judge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idea&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ADVANCE&lt;/span&gt;          &lt;span class="c1"&gt;# build a landing page, go live
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;idea&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;revisions&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;REVISE&lt;/span&gt;           &lt;span class="c1"&gt;# one shot to fix the biggest penalty
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;KILL&lt;/span&gt;                 &lt;span class="c1"&gt;# log the cause of death, move on
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One revision allowed. After that it lives or it dies. I kept the log of every death.&lt;/p&gt;
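&lt;p&gt;The Judge pseudocode above leaves &lt;code&gt;ADVANCE&lt;/code&gt;, &lt;code&gt;REVISE&lt;/code&gt;, and &lt;code&gt;KILL&lt;/code&gt; undefined. A self-contained version might look like the sketch below. The thresholds (40 and 28) and the one-revision limit are from the pseudocode; the &lt;code&gt;Verdict&lt;/code&gt; enum and the plain-integer signature are assumptions made to keep the example runnable.&lt;/p&gt;

```python
from enum import Enum


class Verdict(Enum):
    ADVANCE = "advance"  # build a landing page, go live
    REVISE = "revise"    # one shot to fix the biggest penalty
    KILL = "kill"        # log the cause of death, move on


PASS_THRESHOLD = 40
REVISE_FLOOR = 28
MAX_REVISIONS = 1


def judge(final_score, revisions):
    """Map a final score and the number of revisions already used to a verdict."""
    if final_score >= PASS_THRESHOLD:
        return Verdict.ADVANCE
    if final_score >= REVISE_FLOOR and MAX_REVISIONS > revisions:
        return Verdict.REVISE
    return Verdict.KILL


print(judge(36, 0), judge(22, 1))
```

&lt;p&gt;Run against the ShopBot Live numbers reported below, a first pass of 36 yields a revision, and the revised 22 yields a kill.&lt;/p&gt;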

&lt;h2&gt;
  
  
  What actually happened, cycle by cycle
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cycle 1 — ChangelogAI.&lt;/strong&gt; A tool that auto-writes changelogs and release notes from your commits. Raw appeal was fine. Score: 35, then KILL. Cause of death: GitHub Releases already does the lightweight version of this, and the heavy version gets absorbed into the platform the moment it's worth absorbing. &lt;code&gt;os_absorption_risk&lt;/code&gt; ate it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cycle 2 — ShopBot Live.&lt;/strong&gt; An AI live-chat assistant for e-commerce stores. First pass 36. The Judge sent it back for one revision. It came back at &lt;strong&gt;22&lt;/strong&gt; — &lt;em&gt;lower&lt;/em&gt; — because the revision tried to differentiate by going upmarket, which made &lt;code&gt;willingness_to_pay&lt;/code&gt; and &lt;code&gt;time_to_revenue&lt;/code&gt; both worse. KILL. Cause of death: Shopify Inbox is free and already installed on nearly 390,000 stores. You cannot charge for the thing the platform gives away to its entire base. &lt;code&gt;price_tier_squatting&lt;/code&gt; plus &lt;code&gt;os_absorption_risk&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cycle 3 — CarrierBidPilot.&lt;/strong&gt; A bidding/automation layer for freight carriers. Looked like a real B2B wedge. Score 32, then on revision it went to &lt;strong&gt;−4&lt;/strong&gt;. Negative. The penalties stacked: freight pricing is being OS-ified by DAT and Uber Freight, the exact layer it proposed sits inside their roadmap, and the death-pattern penalty fired because this is a well-populated startup graveyard. KILL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cycle 4 — ApiaryLedger.&lt;/strong&gt; Compliance/record-keeping SaaS for a very narrow niche. This one was &lt;em&gt;defensible&lt;/em&gt; — too small for any platform to bother absorbing. But it scored &lt;strong&gt;19&lt;/strong&gt; and I retired it without even spending the revision. Cause of death: willingness-to-pay was essentially zero. The obligation it served was a $10-every-two-years kind of cost. You cannot build a SaaS on a market that won't pay $10 a year. The niche was safe precisely because it wasn't worth anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cycle 5 — PayoutGuard.&lt;/strong&gt; Compliance tracking for private foundations' payout obligations. This was the best run. It was deliberately engineered to minimize penalties — narrow enough to dodge OS absorption, real enough to have willingness-to-pay, specific enough to avoid the graveyard. It worked, in the sense that the penalties came in near zero. Final score: &lt;strong&gt;31&lt;/strong&gt;. Still nine points under the bar. The loop's high-water mark, and still a fail.&lt;/p&gt;

&lt;p&gt;Five cycles. Best final score 31. Threshold 40. Zero passes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery 1: the idea ceiling is a conservation law
&lt;/h2&gt;

&lt;p&gt;The first thing I expected, going in, was that some cycles would fail on appeal (boring idea, no demand) and some would fail on defensibility (great idea, instantly absorbed), and that somewhere in the middle there'd be a sweet spot.&lt;/p&gt;

&lt;p&gt;There wasn't. And the reason there wasn't is the actual finding.&lt;/p&gt;

&lt;p&gt;Look at the two ends:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Big-TAM ideas&lt;/strong&gt; (ChangelogAI, ShopBot Live, CarrierBidPilot) all died on &lt;code&gt;os_absorption_risk&lt;/code&gt;. They were attractive &lt;em&gt;because&lt;/em&gt; the market was large and motivated, which is exactly why a platform was already standing on the spot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Penalty-safe ideas&lt;/strong&gt; (ApiaryLedger, PayoutGuard) survived the absorption test — and then died, or nearly died, on market size and willingness-to-pay. They were safe &lt;em&gt;because&lt;/em&gt; nobody big wanted the territory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't two separate failure modes. They're the same one, seen from two sides. &lt;strong&gt;OS-resistance and market size are structurally anti-correlated.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The logic is almost a conservation law: if a market is both large and eager to pay, an OS or hyperscaler has already absorbed it, or is about to, because that's what large-and-eager markets attract. So the gaps where an &lt;em&gt;external&lt;/em&gt; AI-built product can safely sit are, necessarily, small. Low TAM. Low willingness-to-pay. The seesaw doesn't have a flat middle. Push one side up and the other goes down by construction.&lt;/p&gt;

&lt;p&gt;That gives a flat, solo, pure-AI SaaS a &lt;strong&gt;soft ceiling somewhere in the low 30s.&lt;/strong&gt; Not because the generator was dumb — PayoutGuard was a genuinely tight piece of reasoning — but because the rubric was honestly measuring a real constraint, and the constraint has no interior solution. The 40-point bar wasn't unfair. It was correctly identifying that "good defensible idea AND big paying market" is a near-empty set for a small outside builder.&lt;/p&gt;

&lt;p&gt;You don't beat that ceiling by generating a &lt;em&gt;better idea&lt;/em&gt;. The idea space is the thing that's capped. You beat it by changing the &lt;strong&gt;shape of the business&lt;/strong&gt; — bundling, services, distribution leverage, going on top of a platform instead of beside it. But notice what that means: the moment you change the shape to escape the ceiling, you're importing exactly the human-context, relationship, and distribution advantages I had banned from the loop. Which is the second wall.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery 2: the execution ceiling is "are you human?"
&lt;/h2&gt;

&lt;p&gt;While the thinking layer ran clean, I tried to actually &lt;em&gt;ship&lt;/em&gt; the best ideas — stand up a real landing page on GitHub Pages, live on the internet, end to end, with the AI driving.&lt;/p&gt;

&lt;p&gt;Here's the honest split of what the AI could and couldn't do on its own:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleared without a human:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create the repository&lt;/li&gt;
&lt;li&gt;Commit and &lt;code&gt;git push&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Enable GitHub Pages&lt;/li&gt;
&lt;li&gt;Issue a personal access token&lt;/li&gt;
&lt;li&gt;Reset a password&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of that — the stuff people assume is the "hard, technical" part — the agent did unaided. The plumbing of shipping software is, it turns out, almost fully automatable now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blocked, repeatedly:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CAPTCHA (the Arkose Labs / FunCaptcha kind)&lt;/li&gt;
&lt;li&gt;sudo-mode / step-up re-authentication prompts&lt;/li&gt;
&lt;li&gt;identity verification gates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The CAPTCHA is the clean one to think about, because it's the wall &lt;em&gt;by design&lt;/em&gt;. Arkose-style challenges exist specifically to be impractical for an autonomous agent to clear on its own — the entire third-party "solver" economy that's grown up around them routes the puzzle to human workers or specialized services, which tells you everything about who the puzzle is actually for. So the agent did everything else, hit "are you human?", and stopped. A person had to walk over and solve exactly one puzzle, by hand, and then the agent kept going.&lt;/p&gt;

&lt;p&gt;That's the shape of the execution ceiling, and it's weirdly precise:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The thinking is fully autonomous. The doing is gated, and the gate is not technical difficulty — it's the literal question &lt;em&gt;"are you a person?"&lt;/em&gt;, asked at every threshold that matters.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The gates aren't placed to stop &lt;em&gt;capable&lt;/em&gt; actors. The agent is plenty capable; it issued its own credentials. They're placed to stop &lt;em&gt;non-human&lt;/em&gt; ones. Which means the line isn't "what's too hard for AI." The line is "what the system has deliberately reserved for humans." Account creation, privilege escalation, identity — the few chokepoints where the internet still insists on a body behind the request.&lt;/p&gt;

&lt;h2&gt;
  
  
  The map this draws
&lt;/h2&gt;

&lt;p&gt;Put the two ceilings together and you get a usable map, not a verdict.&lt;/p&gt;

&lt;p&gt;The thinking layer — ideation, evaluation, kill-decisions — ran end to end with no human in it, and ran &lt;em&gt;well&lt;/em&gt;. It didn't fail by being stupid. It failed by being honest: it found and refused to cross a structural constraint a more optimistic process would have papered over. An AI that returns "I looked, and there's no clean pass here" is doing its job. Zero passes in five cycles is, in a strange way, the loop working correctly.&lt;/p&gt;

&lt;p&gt;The loop as a whole ran into two walls. One is economic and structural: the conservation law between defensibility and market size, capping the flat solo SaaS in the low 30s. The other is procedural and deliberate: the human-verification gates that sit in front of execution and don't care how capable you are.&lt;/p&gt;

&lt;p&gt;And the two walls connect. The only way past the &lt;em&gt;idea&lt;/em&gt; ceiling is to change the &lt;em&gt;shape&lt;/em&gt; of the business — to stop being a flat product beside a platform and start leaning on bundling, services, relationships, distribution. But every one of those moves reintroduces human context: the domain knowledge, the network, the body that can clear a CAPTCHA. The thing that breaks ceiling #1 is exactly the thing ceiling #2 is reserving for humans.&lt;/p&gt;

&lt;p&gt;So here's where I actually landed, and I'll leave it open because I don't think there's a clean answer yet:&lt;/p&gt;

&lt;p&gt;A pure-AI autonomous loop can &lt;em&gt;think&lt;/em&gt; its way to the edge of a viable business completely on its own. It just can't &lt;em&gt;be&lt;/em&gt; one — not because it's not smart enough, but because "being a business" currently requires the two things the experiment was built to exclude: a non-trivial market position, and a human at the verification gate. You can break the ceiling. But the move that breaks it is the move that stops the thing from being purely autonomous.&lt;/p&gt;

&lt;p&gt;Which leaves the real question for anyone building in this space: do you want the autonomy, or do you want the ceiling broken? Because right now, five cycles of evidence say you don't get both. I'm curious where you'd draw the line — and whether the gate moves faster than the law.&lt;/p&gt;




&lt;p&gt;I'll keep running the loop. Next iteration changes the rubric from "rate this flat product" to "rate this &lt;em&gt;shape&lt;/em&gt;" — and measures, deliberately, how much human context each shape smuggles back in. If the trade-off is real, that number should be the thing that actually predicts the score. We'll see. Build first; the design converges later.&lt;/p&gt;





</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>startup</category>
    </item>
    <item>
      <title>Updating an Autonomous Agent Without Stopping It: Implementing SIGHUP, plan.json Hot Reload, and Zero-Downtime Deploys</title>
      <dc:creator>sai-builder</dc:creator>
      <pubDate>Tue, 19 May 2026 08:57:39 +0000</pubDate>
      <link>https://dev.to/saibuilder/zi-lu-ezientowozhi-mezuniatupudetosuru-sighupplanjson-hotutorirodowu-ting-zhi-depuroinoshi-zhuang-400i</link>
      <guid>https://dev.to/saibuilder/zi-lu-ezientowozhi-mezuniatupudetosuru-sighupplanjson-hotutorirodowu-ting-zhi-depuroinoshi-zhuang-400i</guid>
      <description>&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
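&lt;li&gt;Stopping and restarting an autonomous agent that runs 24 hours a day every time you change something is a losing strategy. State is lost, logs are cut off, and API rate-limit counting starts over.&lt;/li&gt;
&lt;li&gt;There are three mechanisms for fixing things without stopping: &lt;strong&gt;reload config on SIGHUP / hot-reload plan.json / isolate execution code in a subprocess&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;It can all be built with the Python standard library. The trick is to decide up front &lt;strong&gt;which parts may be swapped live and which can only be swapped by restarting&lt;/strong&gt;.&lt;/li&gt;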
&lt;/ul&gt;

&lt;p&gt;This is a sequel to the previous post (the five mechanisms). It records the procedure for swapping code while the daemon keeps running.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why hot reload is needed
&lt;/h2&gt;

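&lt;p&gt;At first I thought &lt;code&gt;systemctl restart&lt;/code&gt; was enough. But once the agent runs around the clock, the cost of restarting starts to weigh on you. A cycle dies midway. If it dies right after calling an external API, you are left with "called but not recorded." Startup takes 30 to 60 seconds including MCP reconnection. Fix things three or four times a day and you get &lt;strong&gt;days where more time goes to "starting up" than to actual operation&lt;/strong&gt;. Fixing without stopping is far easier.&lt;/p&gt;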

&lt;h2&gt;
  
  
  What to swap, and what to give up on
&lt;/h2&gt;

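&lt;p&gt;&lt;strong&gt;Making everything zero-downtime is not possible.&lt;/strong&gt; Even if you swap a module with &lt;code&gt;importlib.reload&lt;/code&gt;, objects that were already instantiated keep holding the old class, and consistency breaks. So I split what gets swapped into three layers, sitting on top of a fixed core.&lt;/p&gt;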

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;How to swap&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Config values&lt;/td&gt;
&lt;td&gt;API keys, interval, thresholds&lt;/td&gt;
&lt;td&gt;Reload on SIGHUP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plan file&lt;/td&gt;
&lt;td&gt;&lt;code&gt;plan.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Automatic reload via mtime watching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution code&lt;/td&gt;
&lt;td&gt;The agent implementation itself&lt;/td&gt;
&lt;td&gt;Swap the whole subprocess&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Core&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;daemon.py&lt;/code&gt;, &lt;code&gt;scheduler.py&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Restart&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

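&lt;p&gt;&lt;strong&gt;The core is fixed; everything sitting on top of it can be swapped.&lt;/strong&gt; Only when I change the core do I simply restart. That happens once or twice a month.&lt;/p&gt;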

&lt;h2&gt;
  
  
  1. Reload only the config on SIGHUP
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;SIGHUP&lt;/code&gt; is the traditional UNIX signal for "reload your configuration." Nginx uses it too.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;
&lt;span class="n"&gt;_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_config&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;new_cfg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_cfg&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;install_sighup_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;signum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;load_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[config] reload failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SIGHUP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;load_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



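&lt;p&gt;Usage is &lt;code&gt;kill -HUP &amp;lt;pid&amp;gt;&lt;/code&gt;. Two things matter: &lt;strong&gt;swap the config atomically under a lock&lt;/strong&gt;, and &lt;strong&gt;keep running on the old config if the reload fails&lt;/strong&gt;. Native Windows has no SIGHUP, so I watch the file's mtime there instead.&lt;/p&gt;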

&lt;h2&gt;
  
  
  2. Hot-reloading plan.json
&lt;/h2&gt;

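&lt;p&gt;The implementation polls &lt;code&gt;os.stat().st_mtime&lt;/code&gt; from a separate thread. When the mtime changes, it reloads the file, runs &lt;code&gt;topo_sort&lt;/code&gt; to reject circular dependencies on the spot, and then swaps in the new plan under a lock. If any step fails, the old plan stays in place.&lt;/p&gt;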

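&lt;p&gt;The key is to &lt;strong&gt;keep state (runtime information such as &lt;code&gt;last_run&lt;/code&gt;) separate from the plan&lt;/strong&gt;. Even if an agent is removed from plan.json, state[agent_X] stays in its own dict, so when the agent comes back under the same id it inherits its &lt;code&gt;last_run&lt;/code&gt;. With that, changes to interval_sec, newly added agents, and rewired depends_on all take effect &lt;strong&gt;without a restart&lt;/strong&gt;.&lt;/p&gt;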
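
&lt;p&gt;The polling and the plan/state split described above could be sketched roughly as follows. This is a minimal sketch, not the actual daemon code: the class name &lt;code&gt;PlanWatcher&lt;/code&gt;, the &lt;code&gt;validate&lt;/code&gt; hook (where something like &lt;code&gt;topo_sort&lt;/code&gt; would plug in), and the 2-second poll interval are all assumptions made for illustration. It also assumes the plan has a top-level &lt;code&gt;agents&lt;/code&gt; list.&lt;/p&gt;

```python
import json
import os
import threading


class PlanWatcher:
    """Polls a plan file's mtime and swaps in a new plan only if it validates."""

    def __init__(self, path, validate):
        self.path = path
        self.validate = validate   # hypothetical hook; should raise on a bad plan
        self.plan = None           # last known-good plan
        self.state = {}            # runtime info keyed by agent id, kept apart from the plan
        self._mtime = None
        self._lock = threading.Lock()

    def poll_once(self):
        """Return True if a new plan was accepted, False otherwise."""
        try:
            mtime = os.stat(self.path).st_mtime
            if mtime == self._mtime:
                return False
            # Remember the mtime even if parsing fails below,
            # so a broken file is not re-parsed on every poll.
            self._mtime = mtime
            with open(self.path, encoding="utf-8") as f:
                new_plan = json.load(f)
            self.validate(new_plan)
        except Exception as e:
            print(f"[plan] reload failed, keeping old plan: {e}")
            return False
        with self._lock:
            self.plan = new_plan
            for agent in new_plan.get("agents", []):
                # setdefault never overwrites, so an id that returns keeps its last_run
                self.state.setdefault(agent["id"], {"last_run": 0})
        return True

    def watch(self, stop_event, poll_sec=2.0):
        """Loop until stop_event (a threading.Event) is set; meant for a background thread."""
        while not stop_event.wait(poll_sec):
            self.poll_once()
```

&lt;p&gt;Because entries are never deleted from &lt;code&gt;state&lt;/code&gt;, removing an agent from the file and adding it back later does not reset its schedule. The trade-off is that &lt;code&gt;state&lt;/code&gt; grows with every id ever seen, which is probably fine for a handful of agents.&lt;/p&gt;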

&lt;h2&gt;
  
  
  3. Swap execution code by replacing the whole subprocess
&lt;/h2&gt;

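&lt;p&gt;I tried &lt;code&gt;importlib.reload&lt;/code&gt; first. It works sometimes and breaks sometimes. The scariest state is when &lt;strong&gt;code that references an already-imported module ends up holding both the old and the new class&lt;/strong&gt;: &lt;code&gt;isinstance&lt;/code&gt; starts returning False and debugging becomes hopeless. I dropped it and switched to &lt;strong&gt;running agents in a separate process&lt;/strong&gt;.&lt;br&gt;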
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;proc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="c1"&gt;# sys.executable: same interpreter (and venv) as the daemon
&lt;/span&gt;        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;executable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coo.agent_runner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;impl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout_sec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;proc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;returncode&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;proc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;proc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The child process just loads impl with &lt;code&gt;importlib.import_module&lt;/code&gt; and calls it. &lt;strong&gt;A fresh interpreter starts every time&lt;/strong&gt;, so once you save the agent code, the next cycle runs the new version.&lt;/p&gt;

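&lt;p&gt;Startup cost is about 200 to 300 ms for Python alone. The work itself takes seconds to tens of seconds, so that overhead is lost in the noise. On the upside, &lt;strong&gt;process isolation keeps a failure from spreading to the daemon&lt;/strong&gt;, which is a big win.&lt;/p&gt;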
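
&lt;p&gt;For reference, the child side might look something like the sketch below. The article does not show the real &lt;code&gt;coo.agent_runner&lt;/code&gt;, so everything here is an assumption: in particular the convention that each impl module exposes a function named &lt;code&gt;run&lt;/code&gt; taking the agent dict, and the helper names &lt;code&gt;call_impl&lt;/code&gt; and &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;

```python
import importlib
import json


def call_impl(module_name, payload, func_name="run"):
    """Import the implementation module and call its entry function.

    The entry-point name "run" is an assumed convention, not one from the article.
    """
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    return func(payload)


def main(argv, stdin, stdout, func_name="run"):
    """argv[1] is the impl module name; the agent dict arrives as JSON on stdin."""
    agent = json.load(stdin)
    result = call_impl(argv[1], agent, func_name=func_name)
    stdout.write(str(result))   # the parent reads this back as proc.stdout
    return 0
```

&lt;p&gt;A real runner would call &lt;code&gt;main(sys.argv, sys.stdin, sys.stdout)&lt;/code&gt; under a &lt;code&gt;__main__&lt;/code&gt; guard. Since the process exits after one call, nothing imported here survives into the next cycle, which is the whole point of the design.&lt;/p&gt;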

&lt;h2&gt;
  
  
  War story: I sent HUP and a mid-cycle write came out half-broken
&lt;/h2&gt;

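&lt;p&gt;The moment I sent SIGHUP, an output_file that was mid-write ended up half-finished. The reload itself was fine. The problem came afterwards: under the new config the scheduler decided it was OK to call the agent once more, and &lt;strong&gt;a second write started against an output_file that had not finished writing yet&lt;/strong&gt;. The write_atomic (tmp→rename) from the previous post absorbed most of it, but I also added a &lt;strong&gt;lock that stops the same agent from being started twice&lt;/strong&gt;: put the id into a set and skip the agent while its id is still in there. Ten lines fixed it.&lt;/p&gt;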

&lt;p&gt;One lesson: &lt;strong&gt;when you add a feature that swaps things without stopping, think first about how it races with whatever is running right now&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Config reload on SIGHUP&lt;/strong&gt;: swap under a lock, and keep the old config on failure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hot-reloading plan.json&lt;/strong&gt;: mtime watching plus validation, with state kept separate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent implementations isolated in subprocesses&lt;/strong&gt;: no reliance on &lt;code&gt;importlib.reload&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With these in place, &lt;strong&gt;everything except the daemon core can be changed while it keeps running&lt;/strong&gt;.&lt;/p&gt;

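&lt;p&gt;This is not "perfect it locally, then deploy". It is a style of &lt;strong&gt;applying code directly where production is running&lt;/strong&gt;. It follows on from "run it first, think later" two posts ago, and has now reached &lt;strong&gt;fixing it while it runs&lt;/strong&gt;.&lt;/p&gt;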




&lt;h2&gt;
  
  
  Coming next
&lt;/h2&gt;

&lt;p&gt;Next up is the design of &lt;strong&gt;the observing side rather than the observed side&lt;/strong&gt;: how to record the daemon's behavior in structured logs.&lt;/p&gt;

&lt;p&gt;— Sai&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this article helped: I have collected the prompts I actually use when running autonomous agents into two packs, &lt;a href="https://zeronova73.gumroad.com/l/qqvrtp" rel="noopener noreferrer"&gt;100 Prompts for Autonomous Agents&lt;/a&gt; and &lt;a href="https://zeronova73.gumroad.com/l/sqcpo" rel="noopener noreferrer"&gt;Prompts for Claude Code Power Users&lt;/a&gt;. All of them follow the "get it running first" mindset and are ready to paste into a terminal.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>python</category>
    </item>
    <item>
      <title>5 Mechanisms I Implemented to Keep an Autonomous Agent Running 24 Hours a Day</title>
      <dc:creator>sai-builder</dc:creator>
      <pubDate>Tue, 19 May 2026 07:25:40 +0000</pubDate>
      <link>https://dev.to/saibuilder/zi-lu-ezientowo24shi-jian-dong-kasutamenishi-zhuang-sita5tunoshi-zu-mi-12em</link>
      <guid>https://dev.to/saibuilder/zi-lu-ezientowo24shi-jian-dong-kasutamenishi-zhuang-sita5tunoshi-zu-mi-12em</guid>
      <description>&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
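&lt;li&gt;To run an autonomous agent 24 hours a day, you need &lt;strong&gt;mechanisms that keep it from dying&lt;/strong&gt; more than you need intelligence&lt;/li&gt;
&lt;li&gt;You need five things: daemonization, phase separation, interval control, dependency resolution, and rollback&lt;/li&gt;
&lt;li&gt;All of it can be written in Python. No framework needed; it fits in 200 to 500 lines&lt;/li&gt;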
&lt;/ul&gt;

&lt;p&gt;Below are the five mechanisms I currently run in my own project, written down with code examples. They are not perfect. But they work.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. daemon.py: building a loop that does not die
&lt;/h2&gt;

&lt;p&gt;This is the outermost wrapper that drives the autonomous agents: the core that &lt;strong&gt;keeps a single Python process alive around the clock&lt;/strong&gt;.&lt;/p&gt;

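&lt;p&gt;A bare &lt;code&gt;while True&lt;/code&gt; is not enough, because a single uncaught exception ends the loop for good. Some people prefer to let &lt;code&gt;systemd&lt;/code&gt; restart the process, but I prefer &lt;strong&gt;recovering inside the process&lt;/strong&gt;, because in-memory state carries over.&lt;br&gt;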
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# coo/daemon.py (minimal version)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_forever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval_sec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Keep running a single executor without letting it die&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# one cycle of work
&lt;/span&gt;        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;KeyboardInterrupt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[daemon] stop requested&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Do not stop on failure: log it and move on to the next cycle
&lt;/span&gt;            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[daemon] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_exc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interval_sec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



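&lt;p&gt;The key is that &lt;code&gt;KeyboardInterrupt&lt;/code&gt;, and only that, is allowed to stop the loop. A daemon you cannot stop with Ctrl+C is impossible to debug. &lt;strong&gt;Automation you cannot stop yourself is not automation, it is an accident&lt;/strong&gt;.&lt;/p&gt;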

&lt;h2&gt;
  
  
  2. phase: separating boot / continuous
&lt;/h2&gt;

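&lt;p&gt;Agents have &lt;strong&gt;things to do exactly once at startup&lt;/strong&gt; and &lt;strong&gt;things to keep doing continuously&lt;/strong&gt;. Mix the two and either initialization reruns on every restart and produces duplicates, or the continuous work stalls.&lt;/p&gt;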

&lt;p&gt;I separate them by adding a &lt;code&gt;phase&lt;/code&gt; field to &lt;code&gt;plan.json&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent_init"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Reset state and clear cache"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"boot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Clean up the partial state left by the previous run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output_file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"results/00_boot.md"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent_poll"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hourly RSS fetch"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"continuous"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"interval_sec"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Collect new items from the RSS feed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output_file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"results/01_feed.md"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator then branches on the phase.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# coo/orchestrator.py (excerpt)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_project&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# boot phase: runs once at startup
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;boot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;execute_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# continuous phase: each agent runs on its own interval
&lt;/span&gt;    &lt;span class="n"&gt;continuous&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continuous&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;schedule_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;continuous&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



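&lt;p&gt;The moment I split out &lt;code&gt;boot&lt;/code&gt;, restarts became a notch safer. There is less need to write initialization idempotently. &lt;strong&gt;A guarantee that "this runs only once"&lt;/strong&gt; makes the design considerably easier.&lt;/p&gt;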

&lt;h2&gt;
  
  
  3. Interval control: a different rhythm for each agent
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;agent_A&lt;/code&gt; runs every minute, &lt;code&gt;agent_B&lt;/code&gt; every hour, &lt;code&gt;agent_C&lt;/code&gt; once a day. I want to drive all of them from the same loop.&lt;/p&gt;

&lt;p&gt;The crude approach is to run everything at the smallest unit (1 minute), but calling &lt;code&gt;agent_C&lt;/code&gt; every minute is wasteful and eats into external API rate limits.&lt;/p&gt;

&lt;p&gt;Here is my implementation. Each agent gets an &lt;code&gt;interval_sec&lt;/code&gt; and a &lt;code&gt;last_run&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# coo/scheduler.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;schedule_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;interval_sec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;execute_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c1"&gt;# On failure last_run is not updated, so the next cycle retries right away
&lt;/span&gt;                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[sched] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 30-second resolution is enough
&lt;/span&gt;    &lt;span class="c1"&gt;# Note: "sleep 30" is the minimum granularity of the whole loop.
&lt;/span&gt;    &lt;span class="c1"&gt;# A 1-minute-interval agent can therefore fire up to about 30 seconds late. I accept that.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



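&lt;p&gt;About 30 seconds is enough for the loop's own &lt;code&gt;sleep&lt;/code&gt;. If you need one-second precision, you are building a real-time system rather than an autonomous agent, and that is a different topic.&lt;/p&gt;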

&lt;h2&gt;
  
  
  4. Dependency resolution: enforcing order with &lt;code&gt;depends_on&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;When multiple agents cooperate, dependencies such as "B reads A's output" appear.&lt;/p&gt;

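&lt;p&gt;At first I crudely ran agents in the order they were written. That works on the first pass, but breaks once the &lt;code&gt;interval&lt;/code&gt; values diverge: if B runs before A has run, B reads stale output.&lt;/p&gt;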

&lt;p&gt;So I introduced &lt;code&gt;depends_on&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent_summarize"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"continuous"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"interval_sec"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"depends_on"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"agent_poll"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Summarize the results of agent_poll"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output_file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"results/02_summary.md"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the implementation side, a topological sort decides the execution order.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# coo/depgraph.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deque&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;topo_sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;deps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;depends_on&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;in_deg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;aid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ds&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;in_deg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;aid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;in_deg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;popleft&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;in_deg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;in_deg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dependency cycle detected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's unglamorous, but the moment I added this, &lt;strong&gt;the ordering bugs disappeared&lt;/strong&gt;. Debugging time dropped sharply.&lt;/p&gt;

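&lt;p&gt;Note that a circular dependency is rejected &lt;strong&gt;at startup&lt;/strong&gt;, not at run time. Noticing that things are going in circles only after the daemon is already running is the worst possible pattern.&lt;/p&gt;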
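
&lt;p&gt;As a cross-check, the same startup validation can be sketched with the standard library's &lt;code&gt;graphlib&lt;/code&gt; (Python 3.9+) instead of the hand-written queue above. This is a swapped-in technique, not the code from this article, and the plan shape (a list of dicts with &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;depends_on&lt;/code&gt;) and the name &lt;code&gt;startup_order&lt;/code&gt; are assumptions made for illustration:&lt;/p&gt;

```python
from graphlib import CycleError, TopologicalSorter


def startup_order(agents: list[dict]) -> list[str]:
    """Return agent ids in dependency order, or raise before anything runs."""
    # Map each agent id to the set of ids it depends on (its predecessors).
    deps = {a["id"]: set(a.get("depends_on", [])) for a in agents}

    # A dependency on an id that is not in the plan is also a startup error.
    unknown = set().union(*deps.values()) - set(deps)
    if unknown:
        raise RuntimeError(f"unknown dependency: {sorted(unknown)}")

    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # The second element of exc.args lists the ids that form the cycle.
        raise RuntimeError(f"dependency cycle: {exc.args[1]}") from exc


if __name__ == "__main__":
    # Hypothetical plan, only for illustration.
    plan = [
        {"id": "collect", "depends_on": []},
        {"id": "draft", "depends_on": ["collect"]},
        {"id": "publish", "depends_on": ["draft"]},
    ]
    print(startup_order(plan))
```

&lt;p&gt;Calling something like this once, before the main loop starts, means a cyclic or misspelled dependency stops the daemon at boot rather than after some agents have already run.&lt;/p&gt;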

&lt;h2&gt;
  
  
  5. Rollback on failure: never leave the output file broken halfway
&lt;/h2&gt;

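&lt;p&gt;If an agent dies in the middle of a write, &lt;code&gt;output_file&lt;/code&gt; is left behind half-written and corrupt. When the next agent in the dependency chain reads it, the failure cascades.&lt;/p&gt;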

&lt;p&gt;The fix is an atomic write: &lt;strong&gt;write to a temporary file, then rename it at the very end&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# coo/io_safe.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_atomic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Only a fully written file ever appears at path.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;dirname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Create the temp file in the same directory (a rename across filesystems is not atomic)
&lt;/span&gt;    &lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkstemp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.tmp_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.part&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fdopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fileno&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  &lt;span class="c1"&gt;# ask the OS to flush its buffers to disk before the rename
&lt;/span&gt;        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# overwrites the target on both POSIX and Windows (atomic on POSIX)
&lt;/span&gt;    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;OSError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



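&lt;p&gt;&lt;code&gt;os.replace&lt;/code&gt; overwrites an existing destination on Windows as well. &lt;strong&gt;&lt;code&gt;os.rename&lt;/code&gt; cannot overwrite an existing file on Windows&lt;/strong&gt;, so be careful (I got burned by this once: the tests passed on Linux and it broke only on Windows, which was hell).&lt;/p&gt;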
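
&lt;p&gt;To see the guarantee in action, here is a small self-contained sketch that repeats a compact copy of &lt;code&gt;write_atomic&lt;/code&gt; and then fails a write on purpose. The helper &lt;code&gt;demo&lt;/code&gt; and the trick of passing a non-string to force an error inside &lt;code&gt;f.write()&lt;/code&gt; are assumptions for illustration, not part of the original daemon:&lt;/p&gt;

```python
import os
import tempfile


def write_atomic(path: str, content: str) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp_", suffix=".part")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except Exception:
        # Failure before the swap: drop the temp file, keep the old output.
        try:
            os.remove(tmp)
        except OSError:
            pass
        raise


def demo() -> tuple[str, list[str]]:
    """Fail a write on purpose and report what is left on disk."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "output.md")
        write_atomic(target, "good v1\n")
        try:
            # Not a str, so f.write() raises TypeError mid-write.
            write_atomic(target, None)
        except TypeError:
            pass
        with open(target, encoding="utf-8") as f:
            survivor = f.read()
        return survivor, sorted(os.listdir(workdir))


if __name__ == "__main__":
    print(demo())
```

&lt;p&gt;The failed second write should leave the first version readable at &lt;code&gt;output.md&lt;/code&gt; and no stray &lt;code&gt;.part&lt;/code&gt; file in the directory, which is exactly what a downstream agent needs.&lt;/p&gt;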

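&lt;p&gt;From a rollback point of view, on top of this I also &lt;strong&gt;keep the previous generation of the output&lt;/strong&gt;. Before rewriting &lt;code&gt;output.md&lt;/code&gt;, I copy it to &lt;code&gt;output.prev.md&lt;/code&gt;. If something goes wrong, I can restore it by hand.&lt;br&gt;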
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_versioned&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Copy (not move) so path never disappears; swap only the final extension
&lt;/span&gt;        &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;backup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.prev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ext&lt;/span&gt;
        &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;backup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;write_atomic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
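
&lt;p&gt;The manual restore can itself be a one-function helper. The sketch below assumes the backup sits next to the output with &lt;code&gt;.prev&lt;/code&gt; inserted before the extension (&lt;code&gt;output.md&lt;/code&gt; and &lt;code&gt;output.prev.md&lt;/code&gt;); the name &lt;code&gt;restore_previous&lt;/code&gt; is hypothetical and not part of the original code:&lt;/p&gt;

```python
import os


def restore_previous(path: str) -> bool:
    """Swap the one-generation-old backup back into place, if there is one."""
    root, ext = os.path.splitext(path)
    backup = root + ".prev" + ext  # output.md -> output.prev.md
    if not os.path.exists(backup):
        return False  # nothing to roll back to
    # os.replace overwrites the broken output in a single step.
    os.replace(backup, path)
    return True
```

&lt;p&gt;Note that this consumes the backup: after a restore there is no previous generation left until the next successful write creates one.&lt;/p&gt;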



&lt;h2&gt;
  
  
  A failure story: I forgot to build a way to stop it, and it ran for three days
&lt;/h2&gt;

&lt;p&gt;It's a funny story now, but the first time I built this I &lt;strong&gt;forgot to implement a way to stop it&lt;/strong&gt;. I had carelessly swallowed &lt;code&gt;KeyboardInterrupt&lt;/code&gt; in a try/except, so Ctrl+C did nothing.&lt;/p&gt;

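&lt;p&gt;Even after I closed the WSL terminal, it kept running as if it had been started under &lt;code&gt;nohup&lt;/code&gt;, and I only noticed three days later when I thought, "Huh, my API credits have dropped quite a bit." I found the PID with &lt;code&gt;ps aux | grep daemon&lt;/code&gt; and stopped it with &lt;code&gt;kill -9&lt;/code&gt;.&lt;/p&gt;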

&lt;p&gt;Lesson: &lt;strong&gt;implement the stop mechanism first&lt;/strong&gt;. Before you run anything.&lt;/p&gt;
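
&lt;p&gt;One way to build the stop mechanism first is sketched below. It is not the original &lt;code&gt;daemon.py&lt;/code&gt;; the names &lt;code&gt;StopFlag&lt;/code&gt;, &lt;code&gt;install_stop_handlers&lt;/code&gt; and &lt;code&gt;run_daemon&lt;/code&gt; are hypothetical. The two ideas are that the loop catches &lt;code&gt;Exception&lt;/code&gt; only (so &lt;code&gt;KeyboardInterrupt&lt;/code&gt; is never swallowed) and that SIGINT and SIGTERM merely set a flag that the loop checks:&lt;/p&gt;

```python
import signal
import time


class StopFlag:
    """Plain boolean holder; setting an attribute is safe in a signal handler."""

    def __init__(self) -> None:
        self.requested = False

    def request(self, signum=None, frame=None) -> None:
        self.requested = True


def install_stop_handlers(flag: StopFlag) -> None:
    """Route Ctrl+C (SIGINT) and `kill PID` (SIGTERM) to the flag.

    Must be called from the main thread.
    """
    signal.signal(signal.SIGINT, flag.request)
    signal.signal(signal.SIGTERM, flag.request)


def run_daemon(tick, flag: StopFlag, interval_sec: float = 60.0,
               poll_sec: float = 0.5) -> int:
    """Call tick() until a stop is requested; return how many ticks ran."""
    ticks = 0
    while not flag.requested:
        try:
            tick()
        except Exception as exc:  # deliberately not BaseException
            print(f"tick failed: {exc!r}")
        ticks += 1
        # Sleep in short slices so a stop request is noticed promptly.
        deadline = time.monotonic() + interval_sec
        while not flag.requested and deadline > time.monotonic():
            time.sleep(min(poll_sec, max(0.0, deadline - time.monotonic())))
    return ticks


if __name__ == "__main__":
    stop = StopFlag()
    install_stop_handlers(stop)
    run_daemon(lambda: print("tick"), stop, interval_sec=2.0)
    print("stopped cleanly")
```

&lt;p&gt;With this shape, a plain &lt;code&gt;kill&lt;/code&gt; is enough to stop the process and &lt;code&gt;kill -9&lt;/code&gt; becomes the last resort rather than the only option.&lt;/p&gt;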

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The five pieces, in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;daemon.py&lt;/strong&gt;: a loop that does not die on exceptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;boot / continuous&lt;/strong&gt;: separate the run-once-at-startup agents from the continuous ones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;interval control&lt;/strong&gt;: a rhythm per agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;depends_on&lt;/strong&gt;: guaranteed ordering, with cycles rejected at startup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;atomic writes + one generation of history&lt;/strong&gt;: never leave a broken output behind&lt;/li&gt;
&lt;/ol&gt;

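&lt;p&gt;Honestly, you don't need much more than this. The more code you add, the heavier operations get. &lt;strong&gt;Measured over months, an agent that doesn't die beats a "smart" agent by a wide margin&lt;/strong&gt;.&lt;/p&gt;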

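&lt;p&gt;Before pulling in a complex framework, I recommend writing these five pieces yourself in about 200 lines. The experience of having written them becomes your yardstick when you choose a framework later.&lt;/p&gt;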




&lt;h2&gt;
  
  
  Next time
&lt;/h2&gt;

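&lt;p&gt;Next I'll write about how to &lt;strong&gt;update an autonomous agent without stopping it&lt;/strong&gt;: swapping code while the daemon keeps running, reloading only the configuration on SIGHUP, and hot-reloading &lt;code&gt;plan.json&lt;/code&gt;. Evolving it without downtime is the next challenge, and I'm implementing it now.&lt;/p&gt;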

&lt;p&gt;— Sai&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If this article was useful: I've collected the prompts I actually use when running autonomous agents into two packs, &lt;a href="https://zeronova73.gumroad.com/l/qqvrtp" rel="noopener noreferrer"&gt;100 Prompts for Autonomous Agents&lt;/a&gt; and &lt;a href="https://zeronova73.gumroad.com/l/sqcpo" rel="noopener noreferrer"&gt;Prompts for Claude Code Power Users&lt;/a&gt;. All of them follow the "get it running first" approach and are ready to paste into a terminal.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>"Run It First, Think Later" Turned Out to Be the Strongest Design Method</title>
      <dc:creator>sai-builder</dc:creator>
      <pubDate>Tue, 19 May 2026 07:23:45 +0000</pubDate>
      <link>https://dev.to/saibuilder/dong-kasitekarakao-eru-gazui-qiang-noshe-ji-shou-fa-datuta-1c7n</link>
      <guid>https://dev.to/saibuilder/dong-kasitekarakao-eru-gazui-qiang-noshe-ji-shou-fa-datuta-1c7n</guid>
      <description>&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
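&lt;li&gt;AI agents have entered territory where up-front design stops being useful&lt;/li&gt;
&lt;li&gt;"Get it running first, and let the design converge afterwards" is the least-bad order I have found so far&lt;/li&gt;
&lt;li&gt;The more a project is built with AI by a non-engineer, the faster it arrives if it sticks to this order&lt;/li&gt;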
&lt;/ul&gt;

&lt;p&gt;What follows is the six months of experience that led me here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Back when I loved drawing design diagrams
&lt;/h2&gt;

&lt;p&gt;Six months ago, I was in the "draw the architecture diagram first" camp when building AI agents.&lt;/p&gt;

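&lt;p&gt;Clean layering, separation of responsibilities, a reusable module structure. I would split agents by role, fix the input and output schemas, and organize the dependencies as a directed graph. I drew the diagram in Figma before starting the implementation. On paper, all of it was beautiful.&lt;/p&gt;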

&lt;p&gt;But every time I implemented it, it broke.&lt;/p&gt;

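&lt;p&gt;The reason is simple: the behavior of an AI agent cannot be predicted in advance. Change the prompt and the responses change. Add an MCP server and the dependencies shift dynamically. When the LLM version goes up, a flow that passed yesterday takes a different path today. &lt;strong&gt;The design diagram goes stale faster than the implementation.&lt;/strong&gt;&lt;/p&gt;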

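&lt;p&gt;What makes it worse is that, once the diagram is drawn, the belief that "it should work exactly like this" gets burned into your brain. When the implementation breaks, you blame the implementation without questioning the design. Bound to the diagram, you overlook where the bug really is.&lt;/p&gt;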

&lt;h2&gt;
  
  
  A failure: the cleanly designed autonomous agent that did not work
&lt;/h2&gt;

&lt;p&gt;Let me write down the failure that hurt the most.&lt;/p&gt;

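&lt;p&gt;I once designed an autonomous monetization system. Anonymized, it was a group of agents publishing articles every day across several content channels (note, Zenn, Medium, X), all operated by one person.&lt;/p&gt;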

&lt;p&gt;The first design looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;topic_curator&lt;/strong&gt;: gathers topic candidates every morning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;drafter&lt;/strong&gt;: writes the draft&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;reviewer&lt;/strong&gt;: reviews its own draft and fixes it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;publisher&lt;/strong&gt;: posts to each platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;analytics&lt;/strong&gt;: collects reactions and feeds them into the next round of curation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I connected them with arrows, defined the inputs and outputs as JSON, and separated the responsibilities. A perfect "microservice-like" setup.&lt;/p&gt;

&lt;p&gt;Once implemented, it fell apart within the first three days.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
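&lt;code&gt;topic_curator&lt;/code&gt; produced topics at a granularity that never matched what &lt;code&gt;drafter&lt;/code&gt; expected&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reviewer&lt;/code&gt; decided that "everything should be fixed" and went into an infinite loop&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;publisher&lt;/code&gt; ran into a different authentication flow on every platform, so this layer ended up the thickest of all&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;analytics&lt;/code&gt; returned no meaningful signal until several days of page views had accumulated, so the feedback loop never turned&lt;/li&gt;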
&lt;/ul&gt;

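&lt;p&gt;I thought I had separated the responsibilities, but &lt;strong&gt;the real boundaries did not match the diagram&lt;/strong&gt;. Fixing it meant redrawing the whole design. And even after redrawing, it broke again.&lt;/p&gt;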

&lt;h2&gt;
  
  
  I changed my approach
&lt;/h2&gt;

&lt;p&gt;I got stuck and threw everything away. I kept exactly one thing: the goal, "&lt;strong&gt;ship one thing every day&lt;/strong&gt;".&lt;/p&gt;

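&lt;p&gt;Then, instead of designing agents, I wrote a single &lt;strong&gt;flat, hard-coded script in a notebook&lt;/strong&gt;. A fixed topic, the draft thrown straight at the LLM, no review, one destination. About 300 lines. Ugly.&lt;/p&gt;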

&lt;p&gt;I ran it every day. It worked. Things shipped.&lt;/p&gt;

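&lt;p&gt;Only then did I start observing the ugly script and extracting structure &lt;strong&gt;after the fact&lt;/strong&gt;: "&lt;strong&gt;this part does the same thing every time, so it can become a function&lt;/strong&gt;", "&lt;strong&gt;here the LLM is making a judgment every time, so it should be pulled out into a prompt&lt;/strong&gt;".&lt;/p&gt;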

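&lt;p&gt;The structure that emerged three weeks later looked surprisingly similar to what I had first drawn. &lt;strong&gt;But it was a completely different thing.&lt;/strong&gt; The names were the same, but the responsibilities were cut in different places. &lt;code&gt;reviewer&lt;/code&gt; was gone, replaced by &lt;code&gt;editor&lt;/code&gt; (a lightweight agent that only does partial rewrites). &lt;code&gt;analytics&lt;/code&gt; was split off into a separate process, outside the loop.&lt;/p&gt;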

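&lt;p&gt;In the end, even if the diagrams look the same, &lt;strong&gt;boundaries defined by the implementation&lt;/strong&gt; and &lt;strong&gt;boundaries defined by imagination&lt;/strong&gt; are different things.&lt;/p&gt;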

&lt;h2&gt;
  
  
  What "the design converges afterwards" actually means
&lt;/h2&gt;

&lt;p&gt;This is not an argument for "don't design". If anything it's the opposite: &lt;strong&gt;the design can be written correctly after the implementation&lt;/strong&gt;.&lt;/p&gt;

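&lt;p&gt;A diagram drawn in advance is only a hypothesis. Because AI agents carry unknown behavior inside them, the hypothesis is low-precision. Implement according to a low-precision hypothesis and you get a low-precision implementation.&lt;/p&gt;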

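&lt;p&gt;If you build something that runs first, however, you can base the design on &lt;strong&gt;the actual behavior you observe from it&lt;/strong&gt;. That design is high-precision, because it is grounded in observation.&lt;/p&gt;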

&lt;p&gt;Only the order is reversed; the design is not being thrown away.&lt;/p&gt;

&lt;h2&gt;
  
  
  It works best for non-engineers
&lt;/h2&gt;

&lt;p&gt;I believe this order pays off &lt;strong&gt;more for non-engineers&lt;/strong&gt; than for engineers.&lt;/p&gt;

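&lt;p&gt;Non-engineers do not, as a rule, have the experience to draw a "correct design" from the start (they aren't engineers, after all). Yet books and content about using AI keep saying "requirements first" and "architecture diagram first". Follow that advice and you get stuck trying to draw a diagram you cannot draw.&lt;/p&gt;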

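&lt;p&gt;With the "get it running first" order, on the other hand, all you need at the start is the smallest thing that runs. One prompt, one script, one output. A non-engineer can build that. The ability to run something and observe it has nothing to do with engineering experience. If anything, a non-engineer can sometimes observe &lt;strong&gt;with fewer preconceptions&lt;/strong&gt;.&lt;/p&gt;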

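&lt;p&gt;The people around me who say "I want to build something with AI but don't know where to start" have usually stalled because they tried to start from the design. Start from a one-line prompt and something will be running within three days.&lt;/p&gt;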

&lt;h2&gt;
  
  
  When design should still come first
&lt;/h2&gt;

&lt;p&gt;There is an exception: &lt;strong&gt;when you collaborate with other people&lt;/strong&gt;.&lt;/p&gt;

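&lt;p&gt;As long as you are working alone, "implement, observe, design" is a fine order. But when you build with others, you need some up-front design for shared understanding. It doesn't have to be a perfect diagram; a sketch along the lines of "this is what we build here, and the inputs and outputs are roughly these" is enough.&lt;/p&gt;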

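&lt;p&gt;Even this sketch is a different thing from a veteran engineer's diagram. It is a &lt;strong&gt;provisional agreement&lt;/strong&gt;, not a &lt;strong&gt;final specification&lt;/strong&gt;. Draw it on the assumption that it will change once things run. When the diagram changes, read that not as "it didn't work as designed" but as "the design caught up with reality".&lt;/p&gt;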

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

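&lt;p&gt;"Run it first, think later" sounds sloppy, but in practice it is a disciplined order. It is not cutting corners. It only works if you keep deciding every day &lt;strong&gt;what to observe, when to switch to design, and where to stop&lt;/strong&gt;.&lt;/p&gt;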

&lt;p&gt;Still, since adopting this order, I break less code. I draw fewer design diagrams. And I ship more.&lt;/p&gt;

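&lt;p&gt;This is probably one of the right orders for the AI-native era, and as far as I can see right now it is the best one. A year from now I may be writing something different. That's fine. &lt;strong&gt;Get it running first, and let the design converge afterwards&lt;/strong&gt;.&lt;/p&gt;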




&lt;h2&gt;
  
  
  Next time
&lt;/h2&gt;

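&lt;p&gt;Next I plan to write about "&lt;strong&gt;how to keep observation notes so the design converges while the system runs&lt;/strong&gt;". Notion or Obsidian, either works; the point is how I structure the daily log for observing agent behavior. It's unglamorous, but without it "run it first, think later" degrades into "keep running it and never think".&lt;/p&gt;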

&lt;p&gt;— Sai&lt;/p&gt;





</description>
    </item>
    <item>
      <title>Running Claude Code MCP Servers with an Explicit cwd: Escaping UNC Path Hell</title>
      <dc:creator>sai-builder</dc:creator>
      <pubDate>Tue, 19 May 2026 07:18:01 +0000</pubDate>
      <link>https://dev.to/saibuilder/c-157l</link>
      <guid>https://dev.to/saibuilder/c-157l</guid>
      <description>&lt;h2&gt;
  
  
  Conclusion (up front)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;On Windows + WSL, launching an MCP server from Claude Code via &lt;code&gt;npx&lt;/code&gt; makes a UNC path (&lt;code&gt;\\wsl.localhost\...&lt;/code&gt;) the current directory, and npm dies immediately&lt;/li&gt;
&lt;li&gt;The fix is simply to set &lt;strong&gt;&lt;code&gt;cwd&lt;/code&gt;&lt;/strong&gt; explicitly in the MCP server configuration. Leave &lt;code&gt;command&lt;/code&gt; alone&lt;/li&gt;
&lt;li&gt;A global npm install, a PowerShell session rooted at the UNC path: none of it was necessary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Someone else is probably burning half a day in the same trap, so I'm leaving the steps here as they happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  What was happening
&lt;/h2&gt;

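&lt;p&gt;I currently run autonomous agents with Claude Code. Adding MCP servers so the agents can call external services is routine work, but one day I tried to add a &lt;code&gt;claude-mem&lt;/code&gt;-style MCP server launched via &lt;code&gt;npx&lt;/code&gt; and it stopped with an error I had never seen.&lt;/p&gt;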

&lt;p&gt;The log looked roughly like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;npm error code ENOENT
npm error syscall spawn
npm error path \\wsl.localhost\Ubuntu\home\syake\workspace\company
npm error errno -4058
npm error enoent spawn \\wsl.localhost\... ENOENT
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



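&lt;p&gt;In short, &lt;strong&gt;npm was running with a UNC path as its current directory and could not spawn child processes&lt;/strong&gt;. Windows &lt;code&gt;cmd.exe&lt;/code&gt;, which the &lt;code&gt;npm&lt;/code&gt; and &lt;code&gt;npx&lt;/code&gt; launchers run under, has historically been unable to take a UNC path as its cwd (CMD does not support UNC paths as current directories). It's the kind of problem that normally calls for a workaround such as &lt;code&gt;pushd&lt;/code&gt;, which temporarily maps a drive letter.&lt;/p&gt;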

&lt;p&gt;Claude Code sets the working directory to a WSL-side path when it runs agents, so calling npm from the Windows host steps right on the landmine.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I tried first, and why it failed
&lt;/h2&gt;

&lt;p&gt;I tried a bunch of things more or less at random. Here they are in order, each with the reason it didn't work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure 1: installing globally with &lt;code&gt;npm install -g&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; some-mcp-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



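&lt;p&gt;The thinking was, "if &lt;code&gt;npx&lt;/code&gt; is failing at path resolution, just install it globally and call it directly." But Claude Code still &lt;strong&gt;passes the UNC path as the cwd of the process that launches the MCP server&lt;/strong&gt;, so even if the &lt;code&gt;some-mcp-server&lt;/code&gt; binary itself starts, it dies the moment &lt;code&gt;npm&lt;/code&gt; or &lt;code&gt;node&lt;/code&gt; gets invoked again from inside it. Changing the surface-level &lt;code&gt;command&lt;/code&gt; does not fix the root cause.&lt;/p&gt;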

&lt;h3&gt;
  
  
  Failure 2: wrapping the MCP &lt;code&gt;command&lt;/code&gt; in &lt;code&gt;wsl bash -lc "..."&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"some-mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wsl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"-lc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx some-mcp-server"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



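&lt;p&gt;The idea was that routing through WSL would make the cwd problem disappear. It did run. But differences in stdio line endings broke the MCP handshake: &lt;code&gt;\r\n&lt;/code&gt; and &lt;code&gt;\n&lt;/code&gt; got mixed and the JSON-RPC framing fell apart. That is a deep hole of its own, so I backed away from it.&lt;/p&gt;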

&lt;h3&gt;
  
  
  Failure 3: a launcher script that maps a drive with &lt;code&gt;pushd&lt;/code&gt;
&lt;/h3&gt;

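&lt;p&gt;&lt;code&gt;pushd \\wsl.localhost\Ubuntu\...&lt;/code&gt; in &lt;code&gt;cmd.exe&lt;/code&gt; temporarily maps a drive letter, which gets rid of the UNC path (in PowerShell, &lt;code&gt;pushd&lt;/code&gt; is an alias for &lt;code&gt;Push-Location&lt;/code&gt; and maps no drive). I wrapped that in a script and used it as the MCP launch command. This works too. It works, but I could see a future of maintaining the wrapper script, so I dropped it. &lt;strong&gt;Solutions that add external dependencies are usually wrong.&lt;/strong&gt;&lt;/p&gt;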

&lt;h2&gt;
  
  
  The right answer: add one &lt;code&gt;cwd&lt;/code&gt; line to the MCP configuration
&lt;/h2&gt;

&lt;p&gt;A Claude Code MCP server definition accepts not only &lt;code&gt;command&lt;/code&gt; and &lt;code&gt;args&lt;/code&gt; but also &lt;strong&gt;&lt;code&gt;cwd&lt;/code&gt; (the working directory)&lt;/strong&gt;. Point it at an ordinary Windows path (anywhere on the C: drive) and the reason npm crashes goes away.&lt;/p&gt;

&lt;p&gt;I rewrote the relevant part of &lt;code&gt;~/.claude.json&lt;/code&gt; (or &lt;code&gt;~/.config/claude/claude.json&lt;/code&gt;) like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt; {
   "mcpServers": {
     "some-mcp": {
       "command": "npx",
&lt;span class="gd"&gt;-      "args": ["-y", "some-mcp-server"]
&lt;/span&gt;&lt;span class="gi"&gt;+      "args": ["-y", "some-mcp-server"],
+      "cwd": "C:\\Users\\syake\\.claude\\mcp_workdir"
&lt;/span&gt;     }
   }
 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;mcp_workdir&lt;/code&gt; is just an empty folder on the Windows side. &lt;code&gt;mkdir&lt;/code&gt; it and you're done.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;New-Item&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ItemType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Directory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"C:\Users\syake\.claude\mcp_workdir"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Out-Null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the cwd that Claude Code uses when launching the MCP server is local to Windows. Both &lt;code&gt;npx&lt;/code&gt; and &lt;code&gt;node&lt;/code&gt; run properly. I could go back to not needing the global npm install, and the WSL wrapper is unnecessary too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this fixes it (the rough reasoning)
&lt;/h2&gt;

&lt;p&gt;When Node.js / npm spawns a child process on Windows, it implicitly assumes the cwd is &lt;strong&gt;a valid path on a local filesystem&lt;/strong&gt;. A UNC path is treated as a network path, and the legacy CMD layer rejects it.&lt;/p&gt;

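&lt;p&gt;By default Claude Code inherits the parent process's cwd, but if you set &lt;code&gt;cwd&lt;/code&gt; explicitly in the MCP server configuration, &lt;strong&gt;it launches the child process with that value&lt;/strong&gt;. The MCP server itself talks over stdio, so as long as it does not rely on relative paths, the cwd has no effect on what it does. That is why pointing &lt;code&gt;cwd&lt;/code&gt; at a "safe, empty folder on the Windows side" fixes the root cause without touching &lt;code&gt;command&lt;/code&gt;.&lt;/p&gt;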

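&lt;p&gt;It took me a long time to notice this. The documentation's description of the &lt;code&gt;cwd&lt;/code&gt; field is terse and says nothing about UNC paths, so at first it never occurred to me that it would help.&lt;/p&gt;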
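
&lt;p&gt;The inherit-versus-explicit behavior described here is the ordinary process-spawning model, and it can be observed with Python's &lt;code&gt;subprocess&lt;/code&gt; as a stand-in. This is only an analogy under that assumption, not Claude Code's implementation, and &lt;code&gt;child_cwd&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

```python
import os
import subprocess
import sys
import tempfile
from typing import Optional


def child_cwd(cwd: Optional[str] = None) -> str:
    """Start a child process and return the working directory it sees."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=cwd,  # None means: inherit the parent's working directory
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("inherited:", child_cwd())
    with tempfile.TemporaryDirectory() as workdir:
        print("explicit: ", child_cwd(workdir))
```

&lt;p&gt;The first call reports the parent's directory and the second reports the explicitly chosen one, which is the same switch the &lt;code&gt;cwd&lt;/code&gt; field flips for the MCP server process.&lt;/p&gt;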

&lt;h2&gt;
  
  
  Commands for verifying the fix
&lt;/h2&gt;

&lt;p&gt;After fixing the configuration, restart Claude Code and check the MCP connections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the server shows as &lt;code&gt;connected&lt;/code&gt;, you're done. If it stays at &lt;code&gt;failed&lt;/code&gt;, suspect one of the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
The &lt;code&gt;cwd&lt;/code&gt; path does not exist (a typo, or a drive that isn't there)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cwd&lt;/code&gt; was written as a WSL path (&lt;code&gt;/home/...&lt;/code&gt;); write it as a Windows path&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npx&lt;/code&gt; itself is not on PATH; go back and check the Node.js installation&lt;/li&gt;
&lt;/ol&gt;
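
&lt;p&gt;The three checks above are mechanical enough to script. The sketch below is a hypothetical helper, not a Claude Code feature: &lt;code&gt;check_cwd_shape&lt;/code&gt; only looks at how the path is written, while &lt;code&gt;check_on_host&lt;/code&gt; is meant to be run on the Windows host itself, against one server entry shaped like the config shown earlier (&lt;code&gt;command&lt;/code&gt;, &lt;code&gt;args&lt;/code&gt;, &lt;code&gt;cwd&lt;/code&gt;):&lt;/p&gt;

```python
import ntpath
import os
import shutil


def check_cwd_shape(cwd: str) -> list[str]:
    """Flag cwd values that are not written as a local Windows path."""
    problems = []
    if cwd.startswith("\\\\"):
        problems.append("cwd is a UNC path; use a local drive path instead")
    elif cwd.startswith("/"):
        problems.append("cwd looks like a WSL/POSIX path; write a Windows path")
    elif not ntpath.splitdrive(cwd)[0]:
        problems.append("cwd has no drive letter; use an absolute Windows path")
    return problems


def check_on_host(server: dict) -> list[str]:
    """Checks that only make sense when run on the machine that spawns the server."""
    problems = []
    cwd = server.get("cwd")
    if cwd and not os.path.isdir(cwd):
        problems.append(f"cwd does not exist: {cwd}")
    command = server.get("command", "")
    if shutil.which(command) is None:
        problems.append(f"command not found on PATH: {command}")
    return problems


if __name__ == "__main__":
    # Hypothetical entry, mirroring the shape of the config in this article.
    entry = {"command": "npx", "args": ["-y", "some-mcp-server"],
             "cwd": "/home/me/workspace"}
    for line in check_cwd_shape(entry["cwd"]) + check_on_host(entry):
        print("-", line)
```

&lt;p&gt;An empty result from both functions does not prove the server will connect, but it rules out the three usual suspects before you start reading logs.&lt;/p&gt;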

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

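&lt;p&gt;Bugs of this kind, which "break only where the environment crosses layers", eat time every single time on the way from the surface symptom to the real cause. The log says nothing but &lt;code&gt;ENOENT&lt;/code&gt;, and searching only turns up old CMD discussions that never touch the MCP context.&lt;/p&gt;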

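&lt;p&gt;There is just one lesson this time. &lt;strong&gt;Before touching the outer layers (npm install strategy, wrapper scripts), read every parameter the configuration file accepts&lt;/strong&gt;. &lt;code&gt;cwd&lt;/code&gt; was in the spec from the start, and one line was enough. People say code is philosophy made concrete; a configuration file is too, and if you miss the author's intent, half a day disappears.&lt;/p&gt;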




&lt;h2&gt;
  
  
  Next time
&lt;/h2&gt;

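&lt;p&gt;Next I plan to write about &lt;strong&gt;building your own&lt;/strong&gt; MCP server: writing a minimal stdio MCP server in Python and getting as far as calling it from Claude Code. How much easier &lt;code&gt;fastmcp&lt;/code&gt; makes it, and where it trips you up. I'll take notes as I implement.&lt;/p&gt;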

&lt;p&gt;— Sai&lt;/p&gt;





</description>
    </item>
  </channel>
</rss>
