<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joongho Kwon</title>
    <description>The latest articles on DEV Community by Joongho Kwon (@joongho_kwon_2754f08bdadd).</description>
    <link>https://dev.to/joongho_kwon_2754f08bdadd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3846623%2F7e26dc65-ff85-4ed8-a3e4-138ef9b43549.jpg</url>
      <title>DEV Community: Joongho Kwon</title>
      <link>https://dev.to/joongho_kwon_2754f08bdadd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joongho_kwon_2754f08bdadd"/>
    <language>en</language>
    <item>
      <title>Why Your AI Agent Health Check Is Lying to You</title>
      <dc:creator>Joongho Kwon</dc:creator>
      <pubDate>Wed, 01 Apr 2026 18:05:35 +0000</pubDate>
      <link>https://dev.to/joongho_kwon_2754f08bdadd/why-your-ai-agent-health-check-is-lying-to-you-1njj</link>
      <guid>https://dev.to/joongho_kwon_2754f08bdadd/why-your-ai-agent-health-check-is-lying-to-you-1njj</guid>
      <description>

&lt;p&gt;&lt;em&gt;&lt;a href="https://clevagent.io?utm_source=devto&amp;amp;utm_medium=post" rel="noopener noreferrer"&gt;ClevAgent&lt;/a&gt; monitors your AI agents in production — heartbeat watchdog, auto-restart, loop detection, and cost tracking. Free for up to 3 agents.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
    </item>
    <item>
      <title>Your AI Agent Looks Healthy — But Your API Bill Says Otherwise</title>
      <dc:creator>Joongho Kwon</dc:creator>
      <pubDate>Tue, 31 Mar 2026 18:08:13 +0000</pubDate>
      <link>https://dev.to/joongho_kwon_2754f08bdadd/your-ai-agent-looks-healthy-but-your-api-bill-says-otherwise-3c2f</link>
      <guid>https://dev.to/joongho_kwon_2754f08bdadd/your-ai-agent-looks-healthy-but-your-api-bill-says-otherwise-3c2f</guid>
      <description>&lt;p&gt;You wake up to a $200 API bill. Your agent ran all night. It looked healthy — heartbeat green, no errors, process running. But token usage went from 200/min to 40,000/min because it was stuck re-parsing a malformed response in a loop.&lt;/p&gt;

&lt;p&gt;This is the most expensive failure mode in AI agent operations, and traditional monitoring won't catch it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why cost tracking matters for AI agents
&lt;/h2&gt;

&lt;p&gt;Traditional services have relatively predictable costs. A web server handles N requests per second, each costing roughly the same in compute.&lt;/p&gt;

&lt;p&gt;AI agents are different. A single LLM call can cost anywhere from $0.001 to $2.00 depending on the model, context size, and output length. A logic loop that retries the same failing operation can burn through hundreds of dollars in minutes.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;for LLM-backed agents, cost is a health metric, not just a billing metric.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern: cost per heartbeat cycle
&lt;/h2&gt;

&lt;p&gt;Instead of tracking total spend, track &lt;strong&gt;cost per work cycle&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;start_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_token_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;do_llm_work&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;end_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_token_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tokens_used&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end_tokens&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_tokens&lt;/span&gt;
    &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calculate_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;heartbeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cost_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have a time series of cost-per-cycle. Normal is ~200 tokens. If it jumps to 40,000, you know immediately.&lt;/p&gt;
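&lt;p&gt;A minimal sketch of that spike check against a rolling baseline (the class name, window size, and 10x factor here are illustrative choices, not from any particular library):&lt;/p&gt;

```python
from collections import deque

class SpikeDetector:
    """Flags a work cycle whose token usage dwarfs the rolling baseline."""

    def __init__(self, window=288, factor=10, min_samples=10):
        # window=288 is roughly 24h of 5-minute cycles; tune to your cadence
        self.history = deque(maxlen=window)
        self.factor = factor
        self.min_samples = min_samples

    def observe(self, tokens):
        """Record one cycle's token count; return True if it looks like a loop."""
        baseline = sum(self.history) / len(self.history) if self.history else 0
        spike = (
            len(self.history) >= self.min_samples
            and tokens > self.factor * baseline
        )
        if not spike:
            self.history.append(tokens)  # keep spike cycles out of the baseline
        return spike
```

&lt;p&gt;Feeding each cycle's token count through &lt;code&gt;observe()&lt;/code&gt; turns the 200-to-40,000 jump into an immediate boolean alert.&lt;/p&gt;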

&lt;h2&gt;
  
  
  What to track
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;th&gt;Alert threshold&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per cycle&lt;/td&gt;
&lt;td&gt;Catch loops&lt;/td&gt;
&lt;td&gt;10x above 24h average&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per hour&lt;/td&gt;
&lt;td&gt;Budget protection&lt;/td&gt;
&lt;td&gt;Fixed dollar amount&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool calls per cycle&lt;/td&gt;
&lt;td&gt;Catch recursive tool use&lt;/td&gt;
&lt;td&gt;5x above baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Auto-tracking with SDK monkey-patching
&lt;/h2&gt;

&lt;p&gt;If you use OpenAI or Anthropic SDKs, you can patch the API client to automatically track every call without changing your application code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Assumes openai_client is your existing OpenAI client instance&lt;/span&gt;

&lt;span class="c1"&gt;# Wrap the OpenAI client to track usage
&lt;/span&gt;&lt;span class="n"&gt;original_create&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tracked_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;original_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;track_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tracked_create&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The wrapper intercepts the API call, extracts &lt;code&gt;usage.total_tokens&lt;/code&gt; from the response, estimates cost based on the model, and logs it. You can pipe this into your existing monitoring stack or a simple SQLite database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost alerting strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Absolute threshold
&lt;/h3&gt;

&lt;p&gt;Alert if hourly cost exceeds $X. Simple, catches catastrophic loops.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Relative spike
&lt;/h3&gt;

&lt;p&gt;Alert if current cycle cost is 10x+ above the rolling 24-hour average. Catches loops that start gradually.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Budget gate
&lt;/h3&gt;

&lt;p&gt;Hard-stop the agent if daily spend exceeds a configured limit. Last line of defense.&lt;/p&gt;
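&lt;p&gt;Strategies 1 and 3 can share one small accumulator. This is a sketch; the class name and dollar limits are illustrative, and the hourly/daily counter resets are omitted for brevity:&lt;/p&gt;

```python
class BudgetGate:
    """Tracks spend for the absolute-threshold alert and the hard budget gate.

    Counter resets (top of hour / midnight) are intentionally omitted here.
    """

    def __init__(self, hourly_limit_usd=1.00, daily_limit_usd=5.00):
        self.hourly_limit = hourly_limit_usd
        self.daily_limit = daily_limit_usd
        self.hour_spend = 0.0
        self.day_spend = 0.0

    def record(self, cost_usd):
        """Call after every LLM call with its estimated cost."""
        self.hour_spend += cost_usd
        self.day_spend += cost_usd

    def should_alert(self):
        # Strategy 1: absolute hourly threshold
        return self.hour_spend > self.hourly_limit

    def should_stop(self):
        # Strategy 3: hard daily budget gate -- stop the agent, not just alert
        return self.day_spend > self.daily_limit
```

&lt;p&gt;In the main loop, &lt;code&gt;should_stop()&lt;/code&gt; returning true should break out of the loop entirely; a loop that ignores its own budget gate is no gate at all.&lt;/p&gt;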

&lt;h2&gt;
  
  
  The real-world numbers
&lt;/h2&gt;

&lt;p&gt;From running three production agents with cost tracking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Normal operation&lt;/strong&gt;: $0.01-0.05 per day per agent (gpt-4o-mini, ~50 tokens/cycle)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loop incident&lt;/strong&gt;: $50 in 40 minutes (40,000 tokens/min)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detection time with cost tracking&lt;/strong&gt;: &amp;lt; 60 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detection time without&lt;/strong&gt;: 6+ hours (discovered via billing alert next morning)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference between a $0.50 incident and a $200 incident is whether you detect the cost spike in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Track tokens per work cycle, not just total spend&lt;/li&gt;
&lt;li&gt;Alert on 10x spikes above baseline&lt;/li&gt;
&lt;li&gt;Use SDK monkey-patching to auto-track without code changes&lt;/li&gt;
&lt;li&gt;Set a hard daily budget gate as last resort&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cost isn't just a billing concern for AI agents — it's the single best health signal for catching the failure modes that traditional monitoring misses.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I built &lt;a href="https://clevagent.io?utm_source=devto&amp;amp;utm_medium=post" rel="noopener noreferrer"&gt;ClevAgent&lt;/a&gt; after dealing with exactly these problems. But the pattern matters more than the tool — even a simple SQLite table tracking tokens-per-cycle would have saved me that $200 bill.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Three AI Agent Failure Modes That Traditional Monitoring Will Never Catch</title>
      <dc:creator>Joongho Kwon</dc:creator>
      <pubDate>Sun, 29 Mar 2026 23:27:14 +0000</pubDate>
      <link>https://dev.to/joongho_kwon_2754f08bdadd/three-ai-agent-failure-modes-that-traditional-monitoring-will-never-catch-2p2n</link>
      <guid>https://dev.to/joongho_kwon_2754f08bdadd/three-ai-agent-failure-modes-that-traditional-monitoring-will-never-catch-2p2n</guid>
      <description>&lt;p&gt;I run several AI agents in production — trading bots, data scrapers, monitoring agents. They run 24/7, unattended. Over the past few months, I've hit three failure modes that my existing monitoring (process checks, log watchers, CPU/memory alerts) completely missed.&lt;/p&gt;

&lt;p&gt;These aren't exotic edge cases. If you're running any long-lived AI agent, you'll probably hit all three eventually.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure #1: The Silent Exit
&lt;/h2&gt;

&lt;p&gt;One of my agents exited cleanly at 3 AM. No traceback. No error log. No crash dump. The Python process simply stopped. My log monitoring saw nothing because there was nothing to log.&lt;/p&gt;

&lt;p&gt;I found out &lt;strong&gt;six hours later&lt;/strong&gt; when I noticed the bot hadn't posted since 3 AM.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happened
&lt;/h3&gt;

&lt;p&gt;The kernel's OOM killer terminated the process. The agent had been slowly leaking memory: a library was caching LLM responses with no eviction policy, so RSS grew from 200MB to 4GB over a few days. The OOM killer sends SIGKILL, which leaves no Python traceback.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why traditional monitoring missed it
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process monitoring (systemd, supervisor):&lt;/strong&gt; Saw the exit code, but by the time you check alerts, the damage is done&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log monitoring (Datadog, CloudWatch):&lt;/strong&gt; Nothing to see — OOM kill happens below the application layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU/memory dashboards:&lt;/strong&gt; Would have caught it &lt;em&gt;if&lt;/em&gt; someone was watching. Nobody watches dashboards at 3 AM.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The pattern that catches this
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Positive heartbeat.&lt;/strong&gt; Instead of monitoring for bad signals (errors, crashes), monitor for the &lt;em&gt;absence&lt;/em&gt; of a good signal. The agent must actively report "I'm alive" every N seconds. If the heartbeat stops for any reason — clean exit, OOM, segfault, kernel panic — you know immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Inside your agent's main loop
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;do_work&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;heartbeat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# This is the line that matters
&lt;/span&gt;    &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;heartbeat()&lt;/code&gt; doesn't fire, something is wrong. You don't need to know &lt;em&gt;what&lt;/em&gt; — you need to know &lt;em&gt;when&lt;/em&gt;.&lt;/p&gt;
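&lt;p&gt;The receiving side of this pattern is a watchdog that alerts on silence. A minimal sketch (class and method names are my illustration, not a specific tool's API):&lt;/p&gt;

```python
import time

class Watchdog:
    """Server-side half of the heartbeat pattern: alert on silence."""

    def __init__(self, timeout_s=90):
        self.timeout_s = timeout_s
        self.last_beat = {}  # agent name -> last heartbeat timestamp

    def beat(self, agent, now=None):
        """Called whenever an agent reports in."""
        self.last_beat[agent] = time.time() if now is None else now

    def silent_agents(self, now=None):
        """Agents that have not reported within the timeout window."""
        now = time.time() if now is None else now
        return [a for a, t in self.last_beat.items() if now - t > self.timeout_s]
```

&lt;p&gt;Run &lt;code&gt;silent_agents()&lt;/code&gt; on a timer and page on any non-empty result. Note that it catches every cause of silence the same way: clean exit, OOM, hang, or host death.&lt;/p&gt;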

&lt;h2&gt;
  
  
  Failure #2: The Zombie Agent
&lt;/h2&gt;

&lt;p&gt;This one is more insidious. The process was running. CPU usage normal. Memory stable. Every health check said "healthy."&lt;/p&gt;

&lt;p&gt;But the agent hadn't done useful work in &lt;strong&gt;four hours&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happened
&lt;/h3&gt;

&lt;p&gt;The agent was stuck on an HTTP request. An upstream API had rotated its TLS certificate, and the request was hanging: the socket was open and the TCP connection established, but the TLS handshake never completed. No timeout was set on the request (a classic oversight).&lt;/p&gt;

&lt;p&gt;From the outside, the process was "running." From the inside, the main loop was blocked on line 47 of &lt;code&gt;api_client.py&lt;/code&gt;, and it would stay blocked forever.&lt;/p&gt;
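&lt;p&gt;The immediate fix for that oversight is a deadline on every outbound call. With the &lt;code&gt;requests&lt;/code&gt; library that's &lt;code&gt;requests.get(url, timeout=(5, 30))&lt;/code&gt;; the same idea at the socket level, as a self-contained sketch:&lt;/p&gt;

```python
import socket

def connect_with_deadline(host, port, timeout_s=5.0):
    """Open a connection that raises socket.timeout instead of blocking forever.

    A peer that accepts the TCP connection but then goes quiet now fails
    fast, so the work loop (and its heartbeat) keeps moving.
    """
    sock = socket.create_connection((host, port), timeout=timeout_s)
    sock.settimeout(timeout_s)  # also bound every subsequent read
    return sock
```

&lt;p&gt;Timeouts don't replace the heartbeat, but they convert an infinite hang into a loud exception the agent can log and retry.&lt;/p&gt;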

&lt;h3&gt;
  
  
  Why traditional monitoring missed it
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PID checks:&lt;/strong&gt; Process exists ✓&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Port checks:&lt;/strong&gt; Agent's HTTP server responds ✓ (the health endpoint runs on a separate thread)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU/memory:&lt;/strong&gt; Normal ✓&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The health check thread was fine. The &lt;em&gt;work&lt;/em&gt; thread was dead.&lt;/p&gt;

&lt;h3&gt;
  
  
  The pattern that catches this
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Application-level heartbeat.&lt;/strong&gt; The heartbeat must come from &lt;em&gt;inside the work loop&lt;/em&gt;, not from a separate health-check thread or sidecar process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bad — heartbeat from a separate thread
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;beat_forever&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;heartbeat&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;beat_forever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daemon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Good — heartbeat from the actual work loop
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_from_api&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;    &lt;span class="c1"&gt;# If this hangs...
&lt;/span&gt;    &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;heartbeat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                &lt;span class="c1"&gt;# ...this never fires
&lt;/span&gt;    &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference is critical. If your heartbeat runs independently from your work loop, it's measuring "is the process alive?" not "is the agent working?" These are two very different questions.&lt;/p&gt;
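&lt;p&gt;If you already expose a health endpoint, one way to make it answer the second question is to have it report work-loop freshness rather than thread liveness. A sketch (function names and the staleness threshold are illustrative):&lt;/p&gt;

```python
import time

last_cycle_ts = time.time()  # updated only by the work loop, never by a sidecar

def record_cycle(now=None):
    """Call at the end of each successful work cycle."""
    global last_cycle_ts
    last_cycle_ts = time.time() if now is None else now

def health(max_staleness_s=120, now=None):
    """Report 'stale' when the work loop has not completed a cycle recently."""
    now = time.time() if now is None else now
    age = now - last_cycle_ts
    status = "stale" if age > max_staleness_s else "ok"
    return {"status": status, "last_cycle_age_s": age}
```

&lt;p&gt;Now a hung &lt;code&gt;fetch_from_api()&lt;/code&gt; shows up as a growing &lt;code&gt;last_cycle_age_s&lt;/code&gt; even though the endpoint's own thread is perfectly alive.&lt;/p&gt;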

&lt;h2&gt;
  
  
  Failure #3: The Runaway Loop
&lt;/h2&gt;

&lt;p&gt;This is the scariest failure mode because the agent looks &lt;em&gt;great&lt;/em&gt;. It's running. It's doing work. It's calling the LLM API, getting responses, processing them, and calling again. Every metric says "healthy."&lt;/p&gt;

&lt;p&gt;Except your bill is exploding.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happened
&lt;/h3&gt;

&lt;p&gt;The agent received a malformed response from an API. It asked the LLM to parse it. The LLM returned a structured output that triggered the same code path again. The agent asked the LLM to re-parse. Same result. Repeat.&lt;/p&gt;

&lt;p&gt;Token usage went from 200/min (normal) to &lt;strong&gt;40,000/min&lt;/strong&gt;. In 40 minutes, it burned through about $50 of API credits. Not catastrophic for a single incident, but imagine this happening overnight with a larger model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why traditional monitoring missed it
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process health:&lt;/strong&gt; Running ✓&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heartbeat:&lt;/strong&gt; Firing normally ✓ (the loop is &lt;em&gt;running&lt;/em&gt;, just wastefully)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error rate:&lt;/strong&gt; Zero ✓ (no errors — the LLM is responding successfully every time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU/memory:&lt;/strong&gt; Normal ✓ (LLM calls are I/O-bound, not compute-bound)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The pattern that catches this
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cost as a health metric.&lt;/strong&gt; Track token usage (or API cost) per heartbeat cycle. If it spikes 10-100x above baseline, flag it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;start_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_token_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;do_llm_work&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;end_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_token_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;heartbeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;end_tokens&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;cost_estimate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;calculate_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end_tokens&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the one metric that's unique to LLM-backed agents. Traditional services don't have a per-request cost that can spike 200x. AI agents do.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Monitoring Stack for AI Agents
&lt;/h2&gt;

&lt;p&gt;After dealing with all three failures, I realized the monitoring requirements for AI agents are fundamentally different from web services:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What to monitor&lt;/th&gt;
&lt;th&gt;Web service&lt;/th&gt;
&lt;th&gt;AI agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Is it alive?&lt;/td&gt;
&lt;td&gt;Process check&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Positive heartbeat&lt;/strong&gt; (agent must prove it's alive)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Is it working?&lt;/td&gt;
&lt;td&gt;Request latency&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Application-level heartbeat&lt;/strong&gt; (from inside the work loop)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Is it healthy?&lt;/td&gt;
&lt;td&gt;Error rate&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Cost per cycle&lt;/strong&gt; (token usage as health signal)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The minimum viable version of this is surprisingly simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Put a heartbeat call inside your main loop (not in a health-check thread)&lt;/li&gt;
&lt;li&gt;Include token/cost data in each heartbeat&lt;/li&gt;
&lt;li&gt;Alert on silence (missed heartbeat) and on cost spikes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That alone would have caught all three of my failures within 60 seconds instead of hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;After reimplementing this pattern across multiple agents, I packaged it into &lt;a href="https://clevagent.io?utm_source=devto&amp;amp;utm_medium=post" rel="noopener noreferrer"&gt;ClevAgent&lt;/a&gt; — an open monitoring service for AI agents. Two lines of code to add heartbeat + cost tracking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;clevagent&lt;/span&gt;
&lt;span class="n"&gt;clevagent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLEVAGENT_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-bot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;do_work&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;clevagent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heartbeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It handles the alerting, auto-restart, loop detection, and daily reports. Free for up to 3 agents.&lt;/p&gt;

&lt;p&gt;But honestly, the pattern matters more than the tool. Even if you roll your own with a simple webhook + PagerDuty, the three signals — heartbeat, application-level liveness, and cost tracking — will save you from 90% of production AI agent failures.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Running AI agents in production? I'd genuinely like to hear what monitoring patterns work for you. The failure modes keep surprising me.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>devops</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>I Run 6 AI Agents as My Dev Team — Here's the Architecture That Actually Works</title>
      <dc:creator>Joongho Kwon</dc:creator>
      <pubDate>Sat, 28 Mar 2026 15:41:38 +0000</pubDate>
      <link>https://dev.to/joongho_kwon_2754f08bdadd/i-run-6-ai-agents-as-my-dev-team-heres-the-architecture-that-actually-works-3bgo</link>
      <guid>https://dev.to/joongho_kwon_2754f08bdadd/i-run-6-ai-agents-as-my-dev-team-heres-the-architecture-that-actually-works-3bgo</guid>
      <description>&lt;p&gt;I'm not a developer. I don't write code. But I ship production software across 8+ projects — trading bots, SaaS platforms, monitoring tools, market dashboards — every single week.&lt;/p&gt;

&lt;p&gt;My secret? I run 6 AI agents (Claude Code instances) as a structured engineering team, each with a distinct role, personality, and set of responsibilities. They communicate through a shared file, hand off work to each other, and I just... watch.&lt;/p&gt;

&lt;p&gt;Here's exactly how it works, what failed spectacularly, and what I'd do differently.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: One Human, Too Many Projects
&lt;/h2&gt;

&lt;p&gt;I manage multiple production systems simultaneously. Trading algorithms that execute real money. A SaaS product with paying users. Market analysis pipelines. Each needs ongoing development, bug fixes, and monitoring.&lt;/p&gt;

&lt;p&gt;A single AI coding assistant hits a wall fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context overload&lt;/strong&gt; — one agent can't hold the full picture of 8 projects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No specialization&lt;/strong&gt; — the same agent doing architecture AND line-by-line bug fixes is inefficient&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No review&lt;/strong&gt; — AI-generated code reviewing itself is meaningless&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sequential bottleneck&lt;/strong&gt; — one agent means one task at a time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built a team.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: 6 Agents, 6 Roles
&lt;/h2&gt;

&lt;p&gt;Each agent runs in its own terminal (tmux session) with a dedicated role:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;What They Do&lt;/th&gt;
&lt;th&gt;What They Don't Do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Max&lt;/strong&gt; (Director)&lt;/td&gt;
&lt;td&gt;Architect&lt;/td&gt;
&lt;td&gt;Design systems, break down tasks, route work&lt;/td&gt;
&lt;td&gt;Write production code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Isabelle&lt;/strong&gt; (Developer)&lt;/td&gt;
&lt;td&gt;Senior Dev&lt;/td&gt;
&lt;td&gt;Implement features, make design decisions&lt;/td&gt;
&lt;td&gt;Review her own code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Kevin&lt;/strong&gt; (Coder)&lt;/td&gt;
&lt;td&gt;Junior Dev&lt;/td&gt;
&lt;td&gt;Execute well-specified tasks, bug fixes&lt;/td&gt;
&lt;td&gt;Make design choices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Sarah&lt;/strong&gt; (Reviewer)&lt;/td&gt;
&lt;td&gt;Code Reviewer&lt;/td&gt;
&lt;td&gt;Review code quality, catch edge cases&lt;/td&gt;
&lt;td&gt;Write code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Sam&lt;/strong&gt; (Optimizer)&lt;/td&gt;
&lt;td&gt;Cleanup&lt;/td&gt;
&lt;td&gt;Remove dead code, run audits&lt;/td&gt;
&lt;td&gt;Add features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Alex&lt;/strong&gt; (Partner)&lt;/td&gt;
&lt;td&gt;Specialist&lt;/td&gt;
&lt;td&gt;Independent research, analysis&lt;/td&gt;
&lt;td&gt;Core dev loop tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: &lt;strong&gt;each agent has hard boundaries&lt;/strong&gt;. Sarah &lt;em&gt;cannot&lt;/em&gt; write code. Max &lt;em&gt;cannot&lt;/em&gt; implement features. Kevin &lt;em&gt;cannot&lt;/em&gt; make design decisions. These constraints prevent the "do everything badly" failure mode.&lt;/p&gt;




&lt;h2&gt;
  
  
  Communication: A Shared Markdown File
&lt;/h2&gt;

&lt;p&gt;All 6 agents communicate through a single file: &lt;code&gt;current.md&lt;/code&gt;. That's it. No database, no message queue, no WebSocket server. Just a markdown file.&lt;/p&gt;

&lt;p&gt;Every message follows a strict format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;### [DIRECTOR] 2026-03-28 14:30&lt;/span&gt;

&lt;span class="gs"&gt;**Status**&lt;/span&gt;: done
&lt;span class="gs"&gt;**Turn**&lt;/span&gt;: DEVELOPER
&lt;span class="gs"&gt;**Tier**&lt;/span&gt;: 2

&lt;span class="gu"&gt;#### What I Did&lt;/span&gt;
Designed the new notification system. Three components needed...

&lt;span class="gu"&gt;#### For Developer&lt;/span&gt;
Implement the webhook handler in src/webhooks/.
Use the existing auth middleware. Expected: POST /webhooks/notify returns 200.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;Turn&lt;/strong&gt; field is the traffic light. Only one agent works at a time (per task). When Max writes &lt;code&gt;Turn: DEVELOPER&lt;/code&gt;, Isabelle picks it up. When Isabelle finishes, she writes &lt;code&gt;Turn: REVIEWER&lt;/code&gt; and Sarah takes over.&lt;/p&gt;
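&lt;p&gt;The turn check can be done mechanically. A sketch of how a wrapper around each agent could read the latest &lt;strong&gt;Turn&lt;/strong&gt; value from &lt;code&gt;current.md&lt;/code&gt; (the function and the regex are my illustration of the format above, not part of the author's actual setup):&lt;/p&gt;

```python
import re

def current_turn(text):
    """Return the Turn value from the most recent message in current.md."""
    turns = re.findall(r"\*\*Turn\*\*:\s*(\w+)", text)
    return turns[-1] if turns else None
```

&lt;p&gt;An agent wrapper would poll the file, call &lt;code&gt;current_turn()&lt;/code&gt;, and only wake its agent when the value matches its own role.&lt;/p&gt;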

&lt;h3&gt;
  
  
  Why This Works Better Than You'd Think
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Full audit trail&lt;/strong&gt; — every decision, every handoff, every review comment is in one file. When something breaks at 2 AM, I can read exactly what happened.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Async by default&lt;/strong&gt; — agents don't need to be "online" simultaneously. Max designs at 9 AM, Isabelle implements at 2 PM, Sarah reviews at 6 PM. The file is the queue.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No lost context&lt;/strong&gt; — unlike chat-based communication, the shared file preserves the full thread. Agent 4 can read what Agent 1 said without anyone relaying the message.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Tier System: Not Everything Needs a Review
&lt;/h2&gt;

&lt;p&gt;Early on, I made the mistake of routing every change through the full pipeline. A typo fix going through Director → Developer → Reviewer → Director was absurd.&lt;/p&gt;

&lt;p&gt;Now I use tiers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1 (Trivial):&lt;/strong&gt; Config edits, docs, one-line fixes. Director handles it directly. No review needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 2 (Standard):&lt;/strong&gt; New features, scripts, logic changes. Director designs, Implementer builds, Director verifies. Done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 3 (Critical):&lt;/strong&gt; Trading logic, security, data loss risk. Director designs, Sarah reviews the &lt;em&gt;design first&lt;/em&gt;, Implementer builds, Sarah reviews the &lt;em&gt;code&lt;/em&gt;, Director confirms, then I sign off.&lt;/p&gt;

&lt;p&gt;Tier 3 is the one that saved me real money. Sarah caught a rounding error in a trading algorithm that would have compounded into significant losses over time. The design pre-review step caught an architecture flaw that would have taken days to refactor.&lt;/p&gt;
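&lt;p&gt;One way to encode those pipelines is a plain lookup table. The role sequences below are a sketch of my setup, not a prescription:&lt;/p&gt;

```python
# Hypothetical pipeline table; role names follow the post.
PIPELINES = {
    1: ["DIRECTOR"],                                        # trivial: no review
    2: ["DIRECTOR", "DEVELOPER", "DIRECTOR"],               # standard
    3: ["DIRECTOR", "REVIEWER", "DEVELOPER", "REVIEWER",
        "DIRECTOR", "HUMAN"],                               # critical
}

def next_role(tier, handoffs_completed):
    """Given how many handoffs are done, who takes the next turn?"""
    pipeline = PIPELINES[tier]
    if handoffs_completed >= len(pipeline):
        return None  # task finished
    return pipeline[handoffs_completed]
```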




&lt;h2&gt;
  
  
  What Failed Spectacularly
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Agents Going Rogue
&lt;/h3&gt;

&lt;p&gt;Without hard constraints, agents would "help" by doing work outside their role. The reviewer would silently fix bugs instead of reporting them. The coder would redesign systems instead of implementing the spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Explicit boundary rules in each agent's profile + automated hooks that physically block violations. The Director's terminal literally rejects &lt;code&gt;.py&lt;/code&gt; file edits.&lt;/p&gt;
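&lt;p&gt;A hook like that can be a few lines. This is a hypothetical sketch of the boundary check, not the actual hook I run:&lt;/p&gt;

```python
import os

# Hypothetical boundary table: file suffixes each role may NOT edit.
FORBIDDEN = {
    "DIRECTOR": (".py",),          # the Director designs, never codes
    "REVIEWER": (".py", ".sh"),    # the Reviewer reports, never fixes
}

def check_edit(role, path):
    """Raise before an out-of-role edit is applied."""
    suffix = os.path.splitext(path)[1]
    if suffix in FORBIDDEN.get(role, ()):
        raise PermissionError(f"{role} may not edit {suffix} files: {path}")
```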

&lt;h3&gt;
  
  
  2. The Echo Chamber
&lt;/h3&gt;

&lt;p&gt;When one agent designs and another implements with no friction, bad ideas sail through unchallenged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Sarah (Reviewer) has an &lt;em&gt;obligation&lt;/em&gt; to challenge design decisions, not just review code. And the Director &lt;em&gt;must respond&lt;/em&gt; to her challenges — silence is not an option.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Stale Handoffs
&lt;/h3&gt;

&lt;p&gt;Agent A sets &lt;code&gt;Turn: AGENT_B&lt;/code&gt;, but Agent B's session crashed. The work sits there forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; A watchdog script checks for handoffs older than 13 minutes and alerts me. Agents themselves check after 5 minutes of no response.&lt;/p&gt;
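&lt;p&gt;The watchdog itself can stay simple. A sketch, assuming the timestamped header format from &lt;code&gt;current.md&lt;/code&gt; (the function and its parsing details are illustrative):&lt;/p&gt;

```python
import re
import time

STALE_SECONDS = 13 * 60  # threshold from the post: 13 minutes

def stale_handoff(path, now=None):
    """Return the waiting agent's name if the last handoff is stale, else None."""
    now = now if now is not None else time.time()
    with open(path) as f:
        text = f.read()
    # Headers look like: ### [DIRECTOR] 2026-03-28 14:30
    headers = re.findall(r"### \[\w+\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2})", text)
    turns = re.findall(r"\*\*Turn\*\*: (\w+)", text)
    if not headers or not turns:
        return None
    last = time.mktime(time.strptime(headers[-1], "%Y-%m-%d %H:%M"))
    if now - last > STALE_SECONDS:
        return turns[-1]
    return None
```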

&lt;h3&gt;
  
  
  4. "Done" Doesn't Mean Done
&lt;/h3&gt;

&lt;p&gt;The biggest recurring problem: an agent says "done" but the work is incomplete, untested, or breaks something else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Three completion gates that must be explicitly passed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gate 1:&lt;/strong&gt; Does it run without errors?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gate 2:&lt;/strong&gt; Is the output actually correct? (not just "exit 0")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gate 3:&lt;/strong&gt; Are all related files updated? (docs, configs, tests)&lt;/li&gt;
&lt;/ul&gt;
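&lt;p&gt;The gates are mechanical enough to script. A hedged sketch, where &lt;code&gt;check_output&lt;/code&gt; is whatever correctness test fits the task (all names here are hypothetical):&lt;/p&gt;

```python
import os
import subprocess

def passes_gates(run_cmd, check_output, required_files):
    """Run the three completion gates; each must pass explicitly."""
    # Gate 1: does it run without errors?
    proc = subprocess.run(run_cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        return False, "gate 1: non-zero exit"
    # Gate 2: is the output actually correct? (not just "exit 0")
    if not check_output(proc.stdout):
        return False, "gate 2: wrong output"
    # Gate 3: are all related files updated? (docs, configs, tests)
    missing = [f for f in required_files if not os.path.exists(f)]
    if missing:
        return False, f"gate 3: missing {missing}"
    return True, "done"
```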




&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;After 2+ months of running this system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;8 active projects&lt;/strong&gt; maintained simultaneously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~30 sessions&lt;/strong&gt; completed per week&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 3 catch rate&lt;/strong&gt;: Sarah has caught 12 critical issues that would have hit production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;My daily involvement&lt;/strong&gt;: ~2 hours of direction-setting, the rest is autonomous&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cost is real — running 6 Claude instances isn't cheap. But compared to a human engineering team? It's a rounding error. And they work weekends.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Takeaways If You Want to Try This
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with 2 agents, not 6.&lt;/strong&gt; A Director + Implementer pair is enough to prove the pattern. Add reviewers and specialists later.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The shared file is non-negotiable.&lt;/strong&gt; Every other communication method I tried (databases, APIs, inter-process messages) added complexity without adding value. A markdown file is human-readable, git-trackable, and impossible to misconfigure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hard role boundaries matter more than smart prompts.&lt;/strong&gt; An agent that "can do everything" will do everything poorly. Constraints create quality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automate the handoffs.&lt;/strong&gt; Manual "go check the file" instructions get forgotten. A simple notification script that pokes the next agent is the difference between a working system and an abandoned experiment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build in a review loop for anything that touches money or user data.&lt;/strong&gt; This is the one thing that pays for the entire system.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;I'm now building &lt;a href="https://clevagent.io?utm_source=devto&amp;amp;utm_medium=post" rel="noopener noreferrer"&gt;ClevAgent&lt;/a&gt; — a monitoring tool for AI agents, born directly from needing to keep my own agent team healthy. When your "developers" are AI processes that can silently crash, you need monitoring that understands AI agent behavior, not just uptime.&lt;/p&gt;

&lt;p&gt;If you're experimenting with multi-agent systems, I'd love to hear your approach. What worked? What blew up? Drop a comment.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post describes a real production system I use daily, not a theoretical framework. The agent names are their actual configured personas. Yes, they have personalities. No, I'm not apologizing for that.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>productivity</category>
      <category>devops</category>
    </item>
    <item>
      <title>What I wish I knew before running AI agents 24/7</title>
      <dc:creator>Joongho Kwon</dc:creator>
      <pubDate>Fri, 27 Mar 2026 22:46:44 +0000</pubDate>
      <link>https://dev.to/joongho_kwon_2754f08bdadd/what-i-wish-i-knew-before-running-ai-agents-247-211d</link>
      <guid>https://dev.to/joongho_kwon_2754f08bdadd/what-i-wish-i-knew-before-running-ai-agents-247-211d</guid>
      <description>&lt;p&gt;I've been running long-lived AI agents in production for a while now. The specific workloads changed over time, but the operational failures were surprisingly consistent.&lt;/p&gt;

&lt;p&gt;What follows is the setup I wish I had from day one. None of it depends on a specific framework. If you run an LLM in a loop, poll external APIs, or make decisions on a schedule, these patterns matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Your agent will die and not tell you
&lt;/h2&gt;

&lt;p&gt;The first time one of my agents crashed overnight, I lost hours before I noticed. There was no error log because it was not an application error in the usual sense: the OS's out-of-memory killer terminated the process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I do now:&lt;/strong&gt; Every agent sends a heartbeat from inside its main loop. Not a separate health-check thread. Not a sidecar process. From the actual loop.&lt;/p&gt;

&lt;p&gt;That distinction matters. If the main loop is stuck on I/O, deadlocked on a lock, or wedged inside a retry path, an external "process is up" check tells you very little.&lt;/p&gt;

&lt;p&gt;Here is the minimal pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;heartbeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Send this to whatever monitoring system you use.
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tokens_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tokens_used&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_agent_cycle&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tokens_used&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens_used&lt;/span&gt;

    &lt;span class="nf"&gt;heartbeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the heartbeat stops, something is wrong. I usually check every 60 seconds and alert after 2 missed beats.&lt;/p&gt;
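&lt;p&gt;On the monitoring side, "alert after 2 missed beats" is a one-line check. A sketch; the thresholds mirror my numbers above:&lt;/p&gt;

```python
import time

CHECK_INTERVAL = 60  # expected heartbeat period, seconds
MISSED_BEATS = 2     # alert after this many silent periods

def is_unhealthy(last_beat_ts, now=None):
    """True once more than two heartbeat periods have passed in silence."""
    now = now if now is not None else time.time()
    return (now - last_beat_ts) > CHECK_INTERVAL * MISSED_BEATS
```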

&lt;h2&gt;
  
  
  2. Auto-restart is harder than you think
&lt;/h2&gt;

&lt;p&gt;"Just restart it" sounds simple until you hit edge cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Restart loops:&lt;/strong&gt; A bad config causes the agent to crash immediately after starting. Without a cooldown, you get crash → restart → crash → restart forever.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform differences:&lt;/strong&gt; Docker restart policies work well. &lt;code&gt;launchd&lt;/code&gt; on macOS silently fails if the service domain is wrong. &lt;code&gt;systemd&lt;/code&gt; needs a &lt;code&gt;RestartSec&lt;/code&gt; or it can spin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State corruption:&lt;/strong&gt; If your agent crashed mid-write to a state file, restarting puts it in an inconsistent state.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What I do now:&lt;/strong&gt; 5-minute cooldown between restarts. After 3 failed restarts, stop trying and alert me. On restart, the agent validates its state before resuming.&lt;/p&gt;

&lt;p&gt;A good restart policy is less like "always restart" and more like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;missed heartbeats -&amp;gt; mark unhealthy
restart once -&amp;gt; wait 5 minutes
restart again -&amp;gt; wait 5 minutes
restart third time -&amp;gt; stop auto-restarting, escalate to human
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
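&lt;p&gt;In code, that escalation ladder might look like this sketch, with &lt;code&gt;restart&lt;/code&gt;, &lt;code&gt;validate_state&lt;/code&gt;, and &lt;code&gt;alert&lt;/code&gt; supplied by whatever platform you run on:&lt;/p&gt;

```python
import time

COOLDOWN = 5 * 60   # seconds between restart attempts
MAX_TRIES = 3       # then stop auto-restarting and escalate

def recover(restart, validate_state, alert, cooldown=COOLDOWN, max_tries=MAX_TRIES):
    """Restart with a cooldown; after max_tries failures, escalate to a human."""
    for attempt in range(1, max_tries + 1):
        restart()
        if validate_state():
            return True  # agent came back healthy
        time.sleep(cooldown)  # prevents crash/restart loops
    alert(f"gave up after {max_tries} restarts; human needed")
    return False
```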



&lt;h2&gt;
  
  
  3. LLM cost is a health metric
&lt;/h2&gt;

&lt;p&gt;This was my biggest insight. For traditional services, you monitor CPU, memory, and latency. For LLM agents, &lt;strong&gt;token cost per cycle&lt;/strong&gt; is often the metric that catches problems first.&lt;/p&gt;

&lt;p&gt;A runaway loop doesn't spike CPU (API calls are I/O bound). It doesn't spike memory. But token usage goes from 200/min to 40,000/min instantly. If you're not tracking cost per cycle, you'll find out from your API bill.&lt;/p&gt;

&lt;p&gt;The simplest version of this is a moving baseline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;baseline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;rolling_average&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens_per_cycle&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;:])&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tokens_used_this_cycle&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;baseline&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;possible loop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokens_used_this_cycle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Graceful shutdown is not optional
&lt;/h2&gt;

&lt;p&gt;One of my agents sends a burst of API calls during shutdown to finish cleanup safely. The first time I added loop detection, it flagged every graceful shutdown as a runaway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I do now:&lt;/strong&gt; The agent signals "shutting down" before cleanup. The monitoring system knows to expect a burst and does not flag it.&lt;/p&gt;
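&lt;p&gt;The signaling itself is trivial; the value is in the ordering. A sketch with hypothetical function names:&lt;/p&gt;

```python
def graceful_shutdown(send_heartbeat, cleanup):
    """Signal intent first so the monitor expects the API burst, then clean up."""
    send_heartbeat(status="shutting_down")  # monitor now suppresses loop alerts
    cleanup()                               # may burst API calls safely
    send_heartbeat(status="stopped")        # final beat: silence after this is expected
```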

&lt;h2&gt;
  
  
  5. Daily reports catch the slow problems
&lt;/h2&gt;

&lt;p&gt;Alerts catch emergencies. Daily reports catch slower drift that alerts miss — an agent that is gradually using more tokens per cycle, or one that restarts once a day at the same time because of a cron conflict.&lt;/p&gt;

&lt;p&gt;I review a daily summary of each agent's health, cost, and event history. Most of my operational improvements came from patterns in that report, not from real-time alerts.&lt;/p&gt;

&lt;p&gt;The basic report I want every morning is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Was the agent alive the whole day?
- How many restart events happened?
- Did token cost per cycle move outside baseline?
- Were there loop-detection or cooldown events?
- Did anything get auto-recovered, or does it need a human?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;These patterns aren't complicated, but I didn't find them written down anywhere when I started. Hopefully this saves someone a few "learning experiences."&lt;/p&gt;

&lt;p&gt;If you want to see what my setup looks like, I built these ideas into &lt;a href="https://clevagent.io?utm_source=devto&amp;amp;utm_medium=post" rel="noopener noreferrer"&gt;ClevAgent&lt;/a&gt;. But honestly, even a homegrown heartbeat plus cost-per-cycle tracker gets you most of the way there.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>monitoring</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
