<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: João Pedro Silva Setas</title>
    <description>The latest articles on DEV Community by João Pedro Silva Setas (@setas).</description>
    <link>https://dev.to/setas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3766706%2F44b843d1-96fc-4ba8-8820-fc213f0c0030.jpg</url>
      <title>DEV Community: João Pedro Silva Setas</title>
      <link>https://dev.to/setas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/setas"/>
    <language>en</language>
    <item>
      <title>What Happens When AI Agents Hallucinate? The boring part is the checkpoint.</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Wed, 08 Apr 2026 15:30:03 +0000</pubDate>
      <link>https://dev.to/setas/what-happens-when-ai-agents-hallucinate-the-boring-part-is-the-checkpoint-21eb</link>
      <guid>https://dev.to/setas/what-happens-when-ai-agents-hallucinate-the-boring-part-is-the-checkpoint-21eb</guid>
      <description>&lt;p&gt;Most agent-demo discourse treats hallucination like a model problem.&lt;/p&gt;

&lt;p&gt;Wrong answer in, wrong answer out.&lt;/p&gt;

&lt;p&gt;The worse failure in practice is simpler.&lt;/p&gt;

&lt;p&gt;A confident wrong output turns into company truth.&lt;/p&gt;

&lt;p&gt;Then it is no longer "a bad generation."&lt;/p&gt;

&lt;p&gt;It is copy. A metric. A product claim. A technical explanation. A decision someone is about to act on.&lt;/p&gt;

&lt;p&gt;I run a solo company with AI agent departments inside GitHub Copilot. The useful question for me is not how to eliminate hallucinations. I do not think that is realistic.&lt;/p&gt;

&lt;p&gt;The useful question is this:&lt;/p&gt;

&lt;p&gt;What stops wrong output from hardening into something real?&lt;/p&gt;

&lt;p&gt;The answer is boring.&lt;/p&gt;

&lt;p&gt;Review checkpoints. Memory discipline. Narrow rules about what an agent is allowed to assert without verification.&lt;/p&gt;

&lt;p&gt;That turned out to matter more than another clever prompt.&lt;/p&gt;

&lt;h2&gt;Hallucination gets more dangerous as the output gets closer to action&lt;/h2&gt;

&lt;p&gt;An agent drafting a rough idea is fine.&lt;/p&gt;

&lt;p&gt;An agent confidently restating a stale revenue number, inventing a product capability, or describing system internals it never checked is not.&lt;/p&gt;

&lt;p&gt;In my setup, I treat "hallucination" broadly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a product claim that outruns the actual build&lt;/li&gt;
&lt;li&gt;a stale business fact repeated as if it were current&lt;/li&gt;
&lt;li&gt;a plausible technical explanation that was never checked against the real system&lt;/li&gt;
&lt;li&gt;a compliance or trust statement that sounds right but was never reviewed by the right specialist&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That definition matters because bad output is not only about models inventing weird facts.&lt;/p&gt;

&lt;p&gt;It is about confident language outrunning verification.&lt;/p&gt;

&lt;h2&gt;1. Product claims need a checkpoint&lt;/h2&gt;

&lt;p&gt;The cleanest example right now is OpenClawCloud.&lt;/p&gt;

&lt;p&gt;The direction I care about is governed execution: vendor independence, bounded runs, review checkpoints, and failure containment.&lt;/p&gt;

&lt;p&gt;That is the thesis.&lt;/p&gt;

&lt;p&gt;But the repo rule is explicit: wording around sandboxing, approval gates, audit trails, credential isolation, and secure-by-default behavior stays THESIS or ROADMAP level until the build work proves it live.&lt;/p&gt;

&lt;p&gt;That sounds pedantic until you see the alternative.&lt;/p&gt;

&lt;p&gt;A draft can slide from "this is where the product is going" to "this is what the product does today" in one paragraph.&lt;/p&gt;

&lt;p&gt;Same idea.&lt;/p&gt;

&lt;p&gt;Very different claim.&lt;/p&gt;

&lt;p&gt;So when a draft touches trust, compliance, security, or policy, I route it through an internal legal/compliance review step before publication.&lt;/p&gt;

&lt;p&gt;The point is not to make the copy sound safer.&lt;/p&gt;

&lt;p&gt;The point is to stop the draft from inventing a product I have not shipped.&lt;/p&gt;

&lt;h2&gt;2. Stale facts need a checkpoint too&lt;/h2&gt;

&lt;p&gt;Some hallucinations are not fabricated out of thin air.&lt;/p&gt;

&lt;p&gt;They are old truths repeated at the wrong time.&lt;/p&gt;

&lt;p&gt;That is why I use memory-first checks for time-sensitive business facts.&lt;/p&gt;

&lt;p&gt;Revenue figures.&lt;/p&gt;

&lt;p&gt;Compliance status.&lt;/p&gt;

&lt;p&gt;Deal terms.&lt;/p&gt;

&lt;p&gt;Anything where "technically true last week" can become wrong enough to mislead today.&lt;/p&gt;

&lt;p&gt;The rule is not "trust memory blindly."&lt;/p&gt;

&lt;p&gt;The rule is "look it up before you restate it."&lt;/p&gt;

&lt;p&gt;That reduces a very common failure mode in agent systems: stale state getting repeated with fresh confidence.&lt;/p&gt;
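
A sketch of that rule in code may help. This is illustrative Python, not the real implementation: the store shape, categories, and age limits are stand-ins I chose for the example.

```python
from datetime import datetime, timedelta

# Hypothetical age limits per time-sensitive category (illustrative values).
MAX_AGE = {
    "revenue": timedelta(days=7),
    "compliance": timedelta(days=30),
    "deal_terms": timedelta(days=14),
}

def restate(fact: dict, now: datetime) -> str:
    """Return the stored value only if it is fresh enough to repeat."""
    age_limit = MAX_AGE.get(fact["category"])
    if age_limit is None:
        # Not time-sensitive: memory is trusted as-is.
        return fact["value"]
    if now - fact["verified_at"] > age_limit:
        # Stale: force a lookup instead of repeating old truth confidently.
        raise LookupError(f"stale {fact['category']} fact: look it up first")
    return fact["value"]
```

The point is the failure mode: a stale fact raises instead of being restated with fresh confidence.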

&lt;h2&gt;3. Technical explanations get smoother than reality&lt;/h2&gt;

&lt;p&gt;This is the easiest trap for content systems.&lt;/p&gt;

&lt;p&gt;An article about orchestration, memory, or agent handoffs can sound completely plausible while missing one important constraint.&lt;/p&gt;

&lt;p&gt;And if the paragraph reads cleanly, most people will not notice the miss.&lt;/p&gt;

&lt;p&gt;So public explanations of how my agent system works go through COO or CTO review.&lt;/p&gt;

&lt;p&gt;That keeps the description anchored to the real orchestration model instead of whatever smooth story the draft happened to produce.&lt;/p&gt;

&lt;p&gt;This matters especially for multi-agent systems, because the wrong explanation always sounds tempting.&lt;/p&gt;

&lt;p&gt;"The agents just call each other when needed" is smooth.&lt;/p&gt;

&lt;p&gt;It is also incomplete.&lt;/p&gt;

&lt;p&gt;The accurate framing is that the COO coordinates the execution flow and specialist review happens inside that orchestrated model.&lt;/p&gt;

&lt;p&gt;That is a better sentence because it is a truer one.&lt;/p&gt;

&lt;h2&gt;4. The point is not zero hallucinations&lt;/h2&gt;

&lt;p&gt;I do not think the useful goal is perfect output.&lt;/p&gt;

&lt;p&gt;The useful goal is that wrong output hits a review checkpoint before it becomes copy, policy, or an operating decision.&lt;/p&gt;

&lt;p&gt;That shift changes the design.&lt;/p&gt;

&lt;p&gt;You stop obsessing over whether the model can sound confident.&lt;/p&gt;

&lt;p&gt;You start caring about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who is allowed to approve which kind of statement&lt;/li&gt;
&lt;li&gt;when a lookup is required before a fact can be restated&lt;/li&gt;
&lt;li&gt;which outputs need specialist review&lt;/li&gt;
&lt;li&gt;how a draft gets stopped before it crosses from interesting to operational&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are less exciting questions than "how autonomous is your system?"&lt;/p&gt;

&lt;p&gt;They are much closer to the real product surface.&lt;/p&gt;
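
If I had to reduce that to code, it would be a small policy table rather than a smarter model. This is a sketch under my own assumptions; the output types, approvers, and defaults are illustrative, not the actual configuration:

```python
# Illustrative checkpoint policy: what must happen before an output type
# is allowed to become copy, policy, or a decision. Values are examples.
POLICY = {
    "product claim": {"approver": "Lawyer", "lookup_required": False},
    "business fact": {"approver": "CFO", "lookup_required": True},
    "system explanation": {"approver": "CTO", "lookup_required": False},
}

def checkpoint_for(output_type: str) -> dict:
    # Default-deny: an unknown output type gets routed to human review
    # rather than being published unchecked.
    return POLICY.get(output_type, {"approver": "founder", "lookup_required": True})
```

The interesting property is the default: anything the policy does not recognize falls back to human review instead of shipping.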

&lt;h2&gt;Why this changed how I think about OpenClawCloud&lt;/h2&gt;

&lt;p&gt;This is also why I keep coming back to governed execution.&lt;/p&gt;

&lt;p&gt;The market loves capability demos because they are easy to watch.&lt;/p&gt;

&lt;p&gt;But if an agent touches real work, the more important question is what happens when the output is confident and wrong.&lt;/p&gt;

&lt;p&gt;That is where review checkpoints, bounded execution, and clear intervention paths start to matter more than raw autonomy.&lt;/p&gt;

&lt;p&gt;For OpenClawCloud, I treat that as direction, not a shipped promise.&lt;/p&gt;

&lt;p&gt;The value I care about is not "the agent can do a lot."&lt;/p&gt;

&lt;p&gt;It is "wrong output does not get a free path into real systems."&lt;/p&gt;

&lt;p&gt;That is a much more boring story.&lt;/p&gt;

&lt;p&gt;It is also the one I trust.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Improver: How I Built an AI Agent That Upgrades Other AI Agents</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Tue, 31 Mar 2026 07:16:27 +0000</pubDate>
      <link>https://dev.to/setas/the-improver-how-i-built-an-ai-agent-that-upgrades-other-ai-agents-2l9j</link>
      <guid>https://dev.to/setas/the-improver-how-i-built-an-ai-agent-that-upgrades-other-ai-agents-2l9j</guid>
      <description>&lt;p&gt;Most multi-agent writeups stop at specialization.&lt;/p&gt;

&lt;p&gt;Planner. Coder. Reviewer. Maybe a memory layer. Maybe a routing loop.&lt;/p&gt;

&lt;p&gt;That part is interesting, but it was not the part that started compounding for me.&lt;/p&gt;

&lt;p&gt;The part that changed the system was this: who improves the agents after they make the same mistake twice?&lt;/p&gt;

&lt;p&gt;I run a solo company with AI agent departments. There is a CEO, CFO, COO, Marketing, Accountant, Lawyer, CTO, and an Improver that upgrades the rest. The specialists do the obvious work. The weird one is the Improver.&lt;/p&gt;

&lt;p&gt;It is the agent that reads mistakes, looks for recurring patterns, and edits the system itself.&lt;/p&gt;

&lt;p&gt;Not the product code.&lt;/p&gt;

&lt;p&gt;The operating system around the agents.&lt;/p&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;Because the useful version of self-improving agents is much more boring than the sci-fi version.&lt;/p&gt;

&lt;p&gt;And that is exactly why I trust it.&lt;/p&gt;

&lt;h2&gt;I did not need more agents. I needed better scar tissue.&lt;/h2&gt;

&lt;p&gt;The first version of the system already had specialist agents with decent prompts.&lt;/p&gt;

&lt;p&gt;The problem was not "I wish I had one more role."&lt;/p&gt;

&lt;p&gt;The problem was repetition.&lt;/p&gt;

&lt;p&gt;The same kinds of issues kept appearing in different forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;content that sounded technically correct but did not sound like me&lt;/li&gt;
&lt;li&gt;memory that stayed technically valid but got noisier every week&lt;/li&gt;
&lt;li&gt;tasks that were flagged as stale over and over without a real escalation path&lt;/li&gt;
&lt;li&gt;workflow instructions that were good enough for one run but not good enough to survive contact with the next one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each one was fixable manually.&lt;/p&gt;

&lt;p&gt;But manual fixes do not compound.&lt;/p&gt;

&lt;p&gt;If every mistake becomes a one-off correction, the system never gets better. It just gets babysat.&lt;/p&gt;

&lt;p&gt;So I added an Improver agent whose whole job is turning mistakes into infrastructure.&lt;/p&gt;

&lt;h2&gt;The raw input is not intuition. It is lessons.&lt;/h2&gt;

&lt;p&gt;The Improver does not wake up and freestyle changes.&lt;/p&gt;

&lt;p&gt;It works from a very explicit input: lesson entities stored in shared memory.&lt;/p&gt;

&lt;p&gt;After a complex task, agents log what went wrong, why it mattered, and what changed.&lt;/p&gt;

&lt;p&gt;The structure is intentionally plain:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lesson:2026-02-17:marketing-voice-authenticity
- Agent: Improver
- Category: process
- Summary: Marketing content was too generic
- Detail: Founder feedback showed the writing did not sound like a real engineer
- Action: Rewrote the voice guide and added a founder discovery protocol
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters because it gives the Improver something better than vibes.&lt;/p&gt;

&lt;p&gt;It gets actual failure patterns.&lt;/p&gt;

&lt;p&gt;It can group lessons by category: bug, process, knowledge, tool, decision.&lt;/p&gt;

&lt;p&gt;Then it can ask a useful question: is this a one-off, or is this a gap in the system?&lt;/p&gt;

&lt;p&gt;If three unrelated tasks keep producing the same sort of friction, that is usually not user error.&lt;/p&gt;

&lt;p&gt;It is missing infrastructure.&lt;/p&gt;
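
The grouping step is simple enough to sketch. Assuming lessons are (category, summary) pairs, which is a simplification of the real lesson entities, recurring friction shows up as a category count crossing a threshold:

```python
from collections import Counter

def systemic_gaps(lessons, threshold=3):
    """Return categories that recur enough to suggest missing infrastructure."""
    counts = Counter(category for category, _summary in lessons)
    return [cat for cat, n in counts.items() if n >= threshold]
```

Three is an arbitrary threshold here; the signal is recurrence across unrelated tasks, not the exact number.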

&lt;h2&gt;What the Improver is allowed to change&lt;/h2&gt;

&lt;p&gt;This agent has real edit authority, but the scope is narrow on purpose.&lt;/p&gt;

&lt;p&gt;Most of its work lives in the management repo, especially the files that define how the agents behave.&lt;/p&gt;

&lt;p&gt;Its change types are basically these:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Change type&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Typical file&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;New skill&lt;/td&gt;
&lt;td&gt;Stores reusable knowledge the system keeps needing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.github/skills/*/SKILL.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New prompt&lt;/td&gt;
&lt;td&gt;Captures a recurring workflow&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.github/prompts/*.prompt.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent update&lt;/td&gt;
&lt;td&gt;Tightens responsibilities, guardrails, or working style&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.github/agents/*.agent.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Doc update&lt;/td&gt;
&lt;td&gt;Adds missing operational context&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;, &lt;code&gt;AGENTS.md&lt;/code&gt;, project docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory curation&lt;/td&gt;
&lt;td&gt;Cleans duplicates, adds relations, prunes stale state&lt;/td&gt;
&lt;td&gt;shared knowledge graph&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What its remit does not include is even more important.&lt;/p&gt;

&lt;p&gt;It is not supposed to rewrite product code because it feels clever.&lt;/p&gt;

&lt;p&gt;It is not supposed to invent tax or legal rules.&lt;/p&gt;

&lt;p&gt;It is not supposed to change company identity, product positioning, or authority boundaries on its own.&lt;/p&gt;

&lt;p&gt;And it follows the same operating constraints and source-of-truth rules as the rest of the system.&lt;/p&gt;

&lt;p&gt;The useful version of self-improvement is constrained and auditable.&lt;/p&gt;

&lt;p&gt;Not open-ended.&lt;/p&gt;

&lt;h2&gt;The two trigger modes&lt;/h2&gt;

&lt;p&gt;The Improver runs in two main ways.&lt;/p&gt;

&lt;h3&gt;1. Scheduled review&lt;/h3&gt;

&lt;p&gt;There is a dedicated &lt;code&gt;/improve-agents&lt;/code&gt; prompt for periodic system review.&lt;/p&gt;

&lt;p&gt;That run audits the agent files, prompts, skills, and memory graph, then looks for gaps that should become reusable infrastructure.&lt;/p&gt;

&lt;p&gt;This is the slower, batch-style mode.&lt;/p&gt;

&lt;p&gt;Good for pattern detection.&lt;/p&gt;

&lt;h3&gt;2. Mid-task intervention&lt;/h3&gt;

&lt;p&gt;This is the more useful mode in practice.&lt;/p&gt;

&lt;p&gt;If another agent notices a real gap while working, it calls the Improver immediately.&lt;/p&gt;

&lt;p&gt;Not after the task. During it.&lt;/p&gt;

&lt;p&gt;That turns "we should fix this later" into "fix the system now, then continue."&lt;/p&gt;

&lt;p&gt;The difference sounds small, but it changes the system from retrospective learning to live correction.&lt;/p&gt;

&lt;h2&gt;Real changes the Improver already made&lt;/h2&gt;

&lt;p&gt;This is the part I care about most.&lt;/p&gt;

&lt;p&gt;The Improver is only interesting if the output is visible in the system afterward.&lt;/p&gt;

&lt;p&gt;Here are a few concrete changes it made from actual runs.&lt;/p&gt;

&lt;h3&gt;Marketing stopped sounding like marketing&lt;/h3&gt;

&lt;p&gt;On Feb 17, founder feedback was blunt: the content did not sound like a real engineer.&lt;/p&gt;

&lt;p&gt;That became a lesson.&lt;/p&gt;

&lt;p&gt;The Improver responded by rewriting the Marketing agent's voice guide, adding anti-patterns, adding a content quality gate, and forcing a founder discovery protocol instead of generic startup copy.&lt;/p&gt;

&lt;p&gt;That was a real upgrade.&lt;/p&gt;

&lt;p&gt;Not just "write better next time."&lt;/p&gt;

&lt;h3&gt;The system got a domain registry&lt;/h3&gt;

&lt;p&gt;On Feb 22, the instructions were updated to add a real Domain Registry with a separate Social URL column.&lt;/p&gt;

&lt;p&gt;That sounds administrative until a platform blocks one domain and not the fallback.&lt;/p&gt;

&lt;p&gt;OpenClawCloud is the live example. For public content, the correct social URL is &lt;code&gt;clawdcloud.net&lt;/code&gt;, not the blocked alternative.&lt;/p&gt;

&lt;p&gt;Without a registry, every agent has to remember that detail manually.&lt;/p&gt;

&lt;p&gt;With a registry, it becomes infrastructure.&lt;/p&gt;

&lt;h3&gt;Memory stopped growing like a junk drawer&lt;/h3&gt;

&lt;p&gt;Another improvement pass added memory hygiene rules.&lt;/p&gt;

&lt;p&gt;Standups and trend scans now have retention rules. Old noise gets pruned. Permanent lessons and decisions stay.&lt;/p&gt;

&lt;p&gt;That is not glamorous work, but stale memory is one of the fastest ways to make a multi-agent system look smart while behaving confused.&lt;/p&gt;

&lt;p&gt;Shared context only helps if it stays usable.&lt;/p&gt;

&lt;h3&gt;Chronic misses stopped getting polite excuses&lt;/h3&gt;

&lt;p&gt;One of the most useful upgrades came later: chronic miss escalation.&lt;/p&gt;

&lt;p&gt;If a task misses two or more deadlines, the COO is now supposed to re-scope it, demote it, kill it, or add a real root-cause note.&lt;/p&gt;

&lt;p&gt;No more infinite carryover with a softer ETA.&lt;/p&gt;

&lt;p&gt;That was an important change because agent systems are very good at sounding disciplined while quietly tolerating drift.&lt;/p&gt;

&lt;p&gt;The Improver is useful precisely when it gets less polite about that.&lt;/p&gt;
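
The escalation rule is mechanical enough to sketch. This is illustrative Python, not the real workflow; the field names and action set are stand-ins:

```python
# Allowed responses to a chronic miss: anything except a softer ETA.
ESCALATIONS = {"re-scope", "demote", "kill", "root-cause note"}

def review_task(task: dict) -> str:
    if task["missed_deadlines"] >= 2:
        action = task.get("escalation")
        if action not in ESCALATIONS:
            # Infinite carryover is no longer a legal outcome.
            raise ValueError(f"chronic miss on {task['name']}: pick an escalation")
        return action
    return "carry over"
```

A first miss still carries over quietly; a second one forces a decision.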

&lt;h2&gt;The hard part is not self-improvement. It is boundaries.&lt;/h2&gt;

&lt;p&gt;The question I get most often is some version of: does this not drift into chaos?&lt;/p&gt;

&lt;p&gt;It would, if the Improver were allowed to treat the whole company as editable text.&lt;/p&gt;

&lt;p&gt;That is why the boundaries matter more than the mechanism.&lt;/p&gt;

&lt;p&gt;The agent can improve prompts, skills, workflows, and memory hygiene.&lt;/p&gt;

&lt;p&gt;Its remit does not include declaring new business facts.&lt;/p&gt;

&lt;p&gt;Its remit does not include quietly changing product claims.&lt;/p&gt;

&lt;p&gt;Its remit does not include deciding that existing constraints or review triggers are optional now.&lt;/p&gt;

&lt;p&gt;And it is not supposed to widen its own authority because that seems efficient.&lt;/p&gt;

&lt;p&gt;In other words, the system can improve the procedures that shape work.&lt;/p&gt;

&lt;p&gt;It cannot rewrite the constitution.&lt;/p&gt;

&lt;p&gt;That is the only reason this feels useful instead of reckless.&lt;/p&gt;

&lt;h2&gt;What I like about it&lt;/h2&gt;

&lt;p&gt;The best thing about the Improver is that it turns post-mortems into runtime assets.&lt;/p&gt;

&lt;p&gt;A normal post-mortem ends as a paragraph in a doc nobody reads again.&lt;/p&gt;

&lt;p&gt;This loop is different.&lt;/p&gt;

&lt;p&gt;The mistake becomes a lesson.&lt;br&gt;
The lesson becomes an instruction change.&lt;br&gt;
The instruction change affects the next run.&lt;/p&gt;

&lt;p&gt;That is the compound effect.&lt;/p&gt;

&lt;p&gt;Not infinite autonomy.&lt;/p&gt;

&lt;p&gt;Just a system that gets slightly harder to fool every time it learns something real.&lt;/p&gt;

&lt;h2&gt;My take&lt;/h2&gt;

&lt;p&gt;I do not think self-improving agent systems are interesting because they sound futuristic.&lt;/p&gt;

&lt;p&gt;I think they are interesting when they make operations more boring.&lt;/p&gt;

&lt;p&gt;Better guardrails.&lt;br&gt;
Cleaner memory.&lt;br&gt;
Sharper prompts.&lt;br&gt;
Fewer repeated mistakes.&lt;/p&gt;

&lt;p&gt;That is what the Improver does for me.&lt;/p&gt;

&lt;p&gt;It is not an agent building a better world in the background.&lt;/p&gt;

&lt;p&gt;It is an agent that reads scar tissue and turns it into better constraints.&lt;/p&gt;

&lt;p&gt;And for real work, I trust that far more.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>My AI Agents Talk to Each Other. Here's the Inter-Agent Communication Protocol</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Tue, 24 Mar 2026 07:59:27 +0000</pubDate>
      <link>https://dev.to/setas/my-ai-agents-talk-to-each-other-heres-the-inter-agent-communication-protocol-36j3</link>
      <guid>https://dev.to/setas/my-ai-agents-talk-to-each-other-heres-the-inter-agent-communication-protocol-36j3</guid>
      <description>&lt;p&gt;Most multi-agent demos skip the boring part.&lt;/p&gt;

&lt;p&gt;They show a planner, a coder, maybe a reviewer, and a nice loop between them.&lt;/p&gt;

&lt;p&gt;What they usually do not show is this: how does one agent know when it must ask another one for help?&lt;/p&gt;

&lt;p&gt;That turned out to be the hard part in my system.&lt;/p&gt;

&lt;p&gt;I run a solo company with AI agent departments. There is a CEO, CFO, COO, Marketing, Accountant, Lawyer, CTO, and an Improver that upgrades the rest. They handle strategy, pricing, tax checks, content, technical reviews, and daily operations across five products.&lt;/p&gt;

&lt;p&gt;Giving them roles was easy.&lt;/p&gt;

&lt;p&gt;Making the handoffs reliable was not.&lt;/p&gt;

&lt;p&gt;Without a protocol, you get one of two bad outcomes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agents stay in their lane too hard and miss obvious cross-domain risks&lt;/li&gt;
&lt;li&gt;Agents ask everyone about everything and the system turns into a committee&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither scales.&lt;/p&gt;

&lt;p&gt;So I ended up writing a simple inter-agent communication protocol.&lt;/p&gt;

&lt;p&gt;Not a vague "collaborate when useful" instruction.&lt;/p&gt;

&lt;p&gt;An actual protocol with triggers, message format, loop prevention, and ownership rules.&lt;/p&gt;

&lt;h2&gt;The problem is not talking. It is knowing when to talk.&lt;/h2&gt;

&lt;p&gt;The first version of my system had specialist agents, but consultation was soft.&lt;/p&gt;

&lt;p&gt;Marketing could write a post about a product.&lt;br&gt;
The CFO could model pricing.&lt;br&gt;
The Lawyer could review GDPR risk.&lt;br&gt;
The CTO could look at technical architecture.&lt;/p&gt;

&lt;p&gt;The issue was not capability.&lt;/p&gt;

&lt;p&gt;The issue was consistency.&lt;/p&gt;

&lt;p&gt;Sometimes an agent would ask for help when it should not.&lt;br&gt;
Sometimes it would skip a review it clearly needed.&lt;br&gt;
Sometimes two agents would bounce the same question back and forth.&lt;/p&gt;

&lt;p&gt;That is when I realized the real problem was not prompt quality.&lt;/p&gt;

&lt;p&gt;It was routing.&lt;/p&gt;

&lt;p&gt;If a pricing decision has tax implications, the CFO must consult the Accountant.&lt;br&gt;
If a public post describes how the system works, Marketing must get a technical accuracy check.&lt;br&gt;
If a content draft makes legal claims, Lawyer review is mandatory.&lt;/p&gt;

&lt;p&gt;Once you define those triggers explicitly, the system gets much calmer.&lt;/p&gt;
&lt;h2&gt;The trigger table is the whole game&lt;/h2&gt;

&lt;p&gt;The core rule is simple: when work crosses into another domain, peer review becomes mandatory.&lt;/p&gt;

&lt;p&gt;That sounds obvious, but it only helps if the triggers are concrete.&lt;/p&gt;

&lt;p&gt;Here is the shape of the table I use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spending money, pricing, or margin assumptions -&amp;gt; consult CFO&lt;/li&gt;
&lt;li&gt;Tax, VAT (IVA), invoicing, or deductible expenses -&amp;gt; consult Accountant&lt;/li&gt;
&lt;li&gt;GDPR, contracts, terms, or liability -&amp;gt; consult Lawyer&lt;/li&gt;
&lt;li&gt;Architecture, infrastructure, or product internals -&amp;gt; consult CTO&lt;/li&gt;
&lt;li&gt;Revenue strategy or company direction -&amp;gt; consult CEO&lt;/li&gt;
&lt;li&gt;Launches, public messaging, or positioning -&amp;gt; consult Marketing&lt;/li&gt;
&lt;li&gt;Multi-step execution across teams -&amp;gt; consult COO&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That one table removed a lot of drift.&lt;/p&gt;

&lt;p&gt;Agents no longer need to guess whether a topic is "kind of legal" or "sort of technical."&lt;/p&gt;

&lt;p&gt;The trigger decides.&lt;/p&gt;
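
Mechanically, the table is just a routing map. This is a deliberately naive sketch (keyword matching on the task text, with illustrative keywords); the real triggers are judgment calls written into the agent files:

```python
# Illustrative trigger map: domain keyword -> specialist whose review
# becomes mandatory when the work touches that domain.
TRIGGERS = {
    "pricing": "CFO", "margin": "CFO", "spend": "CFO",
    "tax": "Accountant", "vat": "Accountant", "invoice": "Accountant",
    "gdpr": "Lawyer", "contract": "Lawyer", "liability": "Lawyer",
    "architecture": "CTO", "infrastructure": "CTO",
    "launch": "Marketing", "positioning": "Marketing",
}

def required_reviewers(task_text: str) -> set:
    text = task_text.lower()
    return {agent for keyword, agent in TRIGGERS.items() if keyword in text}
```

One task can trip several triggers at once, which is exactly the cross-domain case the protocol exists for.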
&lt;h2&gt;Every review request uses the same format&lt;/h2&gt;

&lt;p&gt;I did not want agents sending free-form messages to each other.&lt;/p&gt;

&lt;p&gt;Free-form sounds flexible until you realize every handoff starts losing context in a slightly different way.&lt;/p&gt;

&lt;p&gt;So every review request follows the same structure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Peer Review Request

From: [agent name]
Call chain: [Agent1 -&amp;gt; Agent2 -&amp;gt; Current]
Task: [what the founder asked for]
What I did: [current work so far]
What I need from you: [specific question]
Context: [only the facts needed for review]

Respond with:
1. APPROVED
2. CONCERNS
3. BLOCKING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That format does three useful things.&lt;/p&gt;

&lt;p&gt;First, it forces the requesting agent to explain what problem it is actually solving.&lt;/p&gt;

&lt;p&gt;Second, it gives the reviewer a narrow question instead of dumping the full task on them.&lt;/p&gt;

&lt;p&gt;Third, it makes the answer easy to incorporate back into the final result.&lt;/p&gt;

&lt;p&gt;The reviewer is not taking over ownership.&lt;/p&gt;

&lt;p&gt;It is a review, not a handoff.&lt;/p&gt;

&lt;h2&gt;The COO is the orchestrator, not just another peer&lt;/h2&gt;

&lt;p&gt;This part matters.&lt;/p&gt;

&lt;p&gt;From the outside, it can look like the agents just call each other directly.&lt;/p&gt;

&lt;p&gt;That is not how I think about it.&lt;/p&gt;

&lt;p&gt;The COO is the central coordinator of the system.&lt;/p&gt;

&lt;p&gt;That means the COO owns the execution flow, keeps track of what the founder actually asked for, and decides when work should branch into specialist review.&lt;/p&gt;

&lt;p&gt;Specialists still review each other's work.&lt;br&gt;
But the architecture is orchestrated, not social.&lt;/p&gt;

&lt;p&gt;That distinction matters because it keeps ownership clear.&lt;/p&gt;

&lt;p&gt;If Marketing asks Lawyer to review a product claim, Marketing still owns the post.&lt;br&gt;
If CFO asks Accountant to validate a tax assumption, CFO still owns the pricing output.&lt;br&gt;
If the founder asks for a daily standup, the COO still owns the final standup.&lt;/p&gt;

&lt;p&gt;Without that, every task becomes shared ownership.&lt;/p&gt;

&lt;p&gt;Shared ownership is where systems get fuzzy.&lt;/p&gt;

&lt;h2&gt;Two small rules prevent most loops&lt;/h2&gt;

&lt;p&gt;The protocol has two guardrails that matter more than they look.&lt;/p&gt;

&lt;h3&gt;1. No-callback rule&lt;/h3&gt;

&lt;p&gt;An agent cannot call someone already in the current chain.&lt;/p&gt;

&lt;p&gt;If the chain is &lt;code&gt;COO -&amp;gt; Marketing -&amp;gt; Lawyer&lt;/code&gt;, the Lawyer cannot bounce the question back to COO or Marketing.&lt;/p&gt;

&lt;p&gt;That kills the most annoying class of loop immediately.&lt;/p&gt;

&lt;h3&gt;2. Max depth 3&lt;/h3&gt;

&lt;p&gt;If the chain already has three agents, the current agent must answer directly.&lt;/p&gt;

&lt;p&gt;No more consultation.&lt;/p&gt;

&lt;p&gt;This is not mathematically pure. It is operational.&lt;/p&gt;

&lt;p&gt;You need a point where the system stops expanding and returns an answer.&lt;/p&gt;

&lt;p&gt;In practice, depth 3 has been enough.&lt;br&gt;
It gives room for a real cross-check without turning every task into a recursive meeting.&lt;/p&gt;
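
Both guardrails fit in a few lines. A minimal sketch, with names of my own choosing:

```python
MAX_DEPTH = 3  # after three agents, answer directly instead of consulting

def may_consult(chain: list, target: str) -> bool:
    """chain is the current call chain, e.g. ["COO", "Marketing"]."""
    if target in chain:
        return False  # no-callback rule: never bounce back up the chain
    if len(chain) >= MAX_DEPTH:
        return False  # depth cap: the chain stops expanding here
    return True
```

Checking the whole chain, not just the immediate caller, is what kills the longer A -&amp;gt; B -&amp;gt; C -&amp;gt; A loops too.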

&lt;h2&gt;What this catches in practice&lt;/h2&gt;

&lt;p&gt;The best part of the protocol is not elegance. It is the mistakes it catches.&lt;/p&gt;

&lt;p&gt;A few common examples:&lt;/p&gt;

&lt;h3&gt;Marketing describing technical systems&lt;/h3&gt;

&lt;p&gt;This used to be risky.&lt;/p&gt;

&lt;p&gt;It is very easy for a content agent to write something that sounds plausible about orchestration, memory, or infrastructure while getting one important detail wrong.&lt;/p&gt;

&lt;p&gt;Now the rule is explicit: any public content that describes how the system works gets a technical review before it goes out.&lt;/p&gt;

&lt;p&gt;That keeps the writing sharp without turning it into fiction.&lt;/p&gt;

&lt;h3&gt;CFO making a pricing argument that leaks into tax treatment&lt;/h3&gt;

&lt;p&gt;Pricing is not just pricing.&lt;/p&gt;

&lt;p&gt;It leaks into invoicing, VAT treatment, margin assumptions, and in some cases legal structure.&lt;/p&gt;

&lt;p&gt;The CFO can still own the business recommendation. But when tax treatment is part of the answer, the Accountant must review it.&lt;/p&gt;

&lt;h3&gt;Lawyer checking claims before publication&lt;/h3&gt;

&lt;p&gt;This one is simple and high leverage.&lt;/p&gt;

&lt;p&gt;If a post or landing page makes a trust, compliance, or security claim, it gets reviewed before publishing.&lt;/p&gt;

&lt;p&gt;That rule alone prevents a lot of avoidable embarrassment.&lt;/p&gt;

&lt;h2&gt;The communication matrix is less important than the boundaries&lt;/h2&gt;

&lt;p&gt;People like diagrams for this kind of thing.&lt;/p&gt;

&lt;p&gt;I do too.&lt;/p&gt;

&lt;p&gt;But the diagram is not the real system.&lt;/p&gt;

&lt;p&gt;The real system is a set of enforced boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who can review what&lt;/li&gt;
&lt;li&gt;when review becomes mandatory&lt;/li&gt;
&lt;li&gt;who owns the final answer&lt;/li&gt;
&lt;li&gt;when the chain stops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else is presentation.&lt;/p&gt;

&lt;p&gt;If you get those four things right, the system feels much more disciplined.&lt;/p&gt;

&lt;p&gt;If you leave them vague, even good agents start to look unreliable.&lt;/p&gt;

&lt;h2&gt;My take&lt;/h2&gt;

&lt;p&gt;The hard part of multi-agent systems is not specialization.&lt;/p&gt;

&lt;p&gt;It is coordination under constraints.&lt;/p&gt;

&lt;p&gt;Anyone can make a few agents call each other.&lt;/p&gt;

&lt;p&gt;The interesting part is deciding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;when consultation is required&lt;/li&gt;
&lt;li&gt;how context is passed cleanly&lt;/li&gt;
&lt;li&gt;how loops are prevented&lt;/li&gt;
&lt;li&gt;who still owns the result when multiple specialists touch it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why I ended up writing a protocol instead of just adding more prompt text.&lt;/p&gt;

&lt;p&gt;The protocol made the system less magical.&lt;/p&gt;

&lt;p&gt;It also made it more trustworthy.&lt;/p&gt;

&lt;p&gt;And in production, I will take trustworthy over magical every time.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Most AI agent demos optimize for capability. Production buyers pay for control.</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Thu, 19 Mar 2026 15:16:46 +0000</pubDate>
      <link>https://dev.to/setas/most-ai-agent-demos-optimize-for-capability-production-buyers-pay-for-control-1ea4</link>
      <guid>https://dev.to/setas/most-ai-agent-demos-optimize-for-capability-production-buyers-pay-for-control-1ea4</guid>
      <description>&lt;p&gt;Every week I see a new AI agent demo.&lt;/p&gt;

&lt;p&gt;Book the meeting. Send the email. Refactor the code. Triage the ticket. Trade the stock. Run the company.&lt;/p&gt;

&lt;p&gt;The demos are getting better. Some of them are genuinely impressive.&lt;/p&gt;

&lt;p&gt;But most of them are optimized for the wrong buyer.&lt;/p&gt;

&lt;p&gt;They are optimized for the person watching the demo, not for the person who has to run the system after the demo.&lt;/p&gt;

&lt;p&gt;That second person usually cares about different questions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What can this thing access?&lt;/li&gt;
&lt;li&gt;What happens when it gets stuck?&lt;/li&gt;
&lt;li&gt;How do I approve risky actions?&lt;/li&gt;
&lt;li&gt;What did it actually do?&lt;/li&gt;
&lt;li&gt;How do I stop it?&lt;/li&gt;
&lt;li&gt;How do I roll it back?&lt;/li&gt;
&lt;li&gt;Which secrets can it touch?&lt;/li&gt;
&lt;li&gt;How do I explain its behavior to my team?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the real product surface.&lt;/p&gt;

&lt;p&gt;Not just capability. Control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capability gets the screenshot. Control gets the budget.
&lt;/h2&gt;

&lt;p&gt;I think a lot of the current agent market is repeating a familiar pattern.&lt;br&gt;
The first wave proves that the interaction is possible. You show that an LLM can use tools, keep context, and complete a multi-step task. That gets attention fast because it feels new.&lt;/p&gt;

&lt;p&gt;Then reality shows up.&lt;/p&gt;

&lt;p&gt;The agent fails halfway through a workflow. Or it retries a step six times and burns API credits. Or it drafts something a human should have reviewed first. Or it keeps running after the useful part is already done. Or it touches a system that should have been off-limits.&lt;/p&gt;

&lt;p&gt;At that point, the question changes.&lt;/p&gt;

&lt;p&gt;It is no longer, "Can this agent do the task?"&lt;/p&gt;

&lt;p&gt;It becomes, "Can I trust this system in an environment that matters?"&lt;/p&gt;

&lt;p&gt;That is where most demos stop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability is necessary. It is not the whole answer.
&lt;/h2&gt;

&lt;p&gt;A lot of products respond to this by adding better visibility.&lt;/p&gt;

&lt;p&gt;You get traces, timelines, logs, token counts, screenshots, and event streams. I like all of that. You need it.&lt;/p&gt;

&lt;p&gt;But observability on its own is still passive.&lt;/p&gt;

&lt;p&gt;It tells you what happened.&lt;/p&gt;

&lt;p&gt;Production users usually need more than that. They need ways to shape what is allowed to happen in the first place.&lt;/p&gt;

&lt;p&gt;Watching an agent fail in high resolution is still failure.&lt;/p&gt;

&lt;p&gt;The control plane is the part that turns visibility into operational trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  The control-plane primitives I think matter
&lt;/h2&gt;

&lt;p&gt;If I were evaluating an agent platform for real work, these are the things I would care about first.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Narrow permissions by default
&lt;/h3&gt;

&lt;p&gt;An agent should not wake up with broad access to everything.&lt;/p&gt;

&lt;p&gt;It should have access to exactly the tools, environments, and credentials required for the job. Nothing more.&lt;/p&gt;

&lt;p&gt;If the task is reading support tickets, it does not also need production deploy access.&lt;/p&gt;

&lt;p&gt;If the task is drafting copy, it does not also need billing permissions.&lt;/p&gt;

&lt;p&gt;The default should be small blast radius.&lt;/p&gt;
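
&lt;p&gt;As a sketch, that default can be a literal allowlist, denied unless listed. The role and tool names here are invented for illustration:&lt;/p&gt;

```javascript
// Hypothetical sketch: per-role tool allowlists, denied by default.
// Role and tool names are invented for illustration.
const ALLOWED_TOOLS = {
  "ticket-triage": ["read_tickets", "add_ticket_note"],
  "copywriter": ["read_brand_guide", "draft_copy"],
};

function canUse(role, tool) {
  const allowed = ALLOWED_TOOLS[role];
  if (allowed === undefined) return false; // unknown role: no access at all
  return allowed.includes(tool);           // anything not listed is denied
}
```

&lt;p&gt;The ticket-triage role cannot deploy. The copywriter cannot touch billing. Not because the model is well behaved, but because the runtime says no.&lt;/p&gt;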

&lt;h3&gt;
  
  
  2. Review points for expensive or risky actions
&lt;/h3&gt;

&lt;p&gt;The most important feature in an autonomous system is often a well-placed pause.&lt;/p&gt;

&lt;p&gt;Some actions should be automatic. Some should require a human checkpoint.&lt;/p&gt;

&lt;p&gt;That could mean spending above a threshold, writing to production systems, touching customer data, or sending something externally.&lt;/p&gt;

&lt;p&gt;I do not see human review as a weakness in the product. I see it as part of the product.&lt;/p&gt;
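
&lt;p&gt;The checkpoint logic does not need to be clever. A hypothetical classifier, with invented fields and an invented threshold, is enough to show the shape:&lt;/p&gt;

```javascript
// Hypothetical sketch: decide whether an action runs automatically
// or pauses for a human checkpoint. Fields and threshold are invented.
const SPEND_LIMIT_USD = 50;

function requiresHumanReview(action) {
  if (action.costUsd > SPEND_LIMIT_USD) return true; // spending above a threshold
  if (action.writesProduction) return true;          // writing to production systems
  if (action.touchesCustomerData) return true;       // touching customer data
  if (action.sendsExternally) return true;           // sending something externally
  return false;                                      // everything else is automatic
}
```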

&lt;h3&gt;
  
  
  3. Auditability that is actually useful
&lt;/h3&gt;

&lt;p&gt;I want more than a generic activity log.&lt;/p&gt;

&lt;p&gt;I want to know which tool was called, under which boundaries, and what happened next.&lt;/p&gt;

&lt;p&gt;If something goes wrong, I should be able to reconstruct the path without guessing.&lt;/p&gt;

&lt;p&gt;That matters for debugging. It also matters for trust inside a team.&lt;/p&gt;
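
&lt;p&gt;Concretely, that means one structured record per tool call. A hypothetical shape, with invented field names:&lt;/p&gt;

```javascript
// Hypothetical sketch: one structured record per tool call, so the path
// can be reconstructed later without guessing. Field names are invented.
function auditRecord(agent, tool, args, boundary, result) {
  return {
    ts: new Date().toISOString(), // when it happened
    agent,                        // which agent acted
    tool,                         // which tool was called
    args,                         // with which inputs
    boundary,                     // under which permission boundary
    ok: result.ok,                // and what happened next
    summary: result.summary,
  };
}
```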

&lt;h3&gt;
  
  
  4. Recovery and rollback strategy
&lt;/h3&gt;

&lt;p&gt;People talk a lot about autonomous execution. They talk less about undo.&lt;/p&gt;

&lt;p&gt;But if an agent edits configuration, changes data, triggers a workflow, or mutates state, rollback matters.&lt;/p&gt;

&lt;p&gt;The system should not just be able to move forward. It should help me recover from a bad step without turning the whole incident into manual archaeology.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Credential boundaries
&lt;/h3&gt;

&lt;p&gt;This one is boring, which is exactly why it matters.&lt;/p&gt;

&lt;p&gt;Credentials should be isolated by environment, role, and task. Temporary access is better than broad standing access. Fine-grained scopes are better than one giant shared credential.&lt;/p&gt;

&lt;p&gt;The more agentic the workflow becomes, the more this matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Observability tied to action
&lt;/h3&gt;

&lt;p&gt;Yes, I still want traces and telemetry.&lt;/p&gt;

&lt;p&gt;But I want them connected to intervention. When I see a loop, I should be able to stop it. When I see cost drift, I should be able to tighten a boundary. When I see a repeated failure, I should be able to change how the runtime behaves.&lt;/p&gt;

&lt;p&gt;Good observability should make intervention simpler, not just diagnosis prettier.&lt;/p&gt;

&lt;h2&gt;
  
  
  This is why I think the category is really about trust
&lt;/h2&gt;

&lt;p&gt;I do not think production users are buying "an agent" in the abstract.&lt;/p&gt;

&lt;p&gt;They are buying a system they can trust around an agent.&lt;/p&gt;

&lt;p&gt;That trust does not come from a benchmark.&lt;/p&gt;

&lt;p&gt;It comes from constraints.&lt;/p&gt;

&lt;p&gt;It comes from knowing that the runtime has boundaries. That risky actions can be reviewed. That behavior can be inspected. That failures can be contained. That humans can step in cleanly.&lt;/p&gt;

&lt;p&gt;The winning products in this space will probably look less magical over time, not more.&lt;/p&gt;

&lt;p&gt;They will feel more operational. More boring. More inspectable.&lt;/p&gt;

&lt;p&gt;That is a good sign.&lt;/p&gt;

&lt;p&gt;In infrastructure, boring is often what people pay for.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I would position OpenClawCloud
&lt;/h2&gt;

&lt;p&gt;This is the direction I find most interesting for OpenClawCloud.&lt;/p&gt;

&lt;p&gt;Not "host your agents in the cloud" as a generic message.&lt;/p&gt;

&lt;p&gt;That is too weak.&lt;/p&gt;

&lt;p&gt;The stronger message is closer to this:&lt;/p&gt;

&lt;p&gt;OpenClawCloud should be for teams that do not just want agents that can act. They want agents they can supervise.&lt;/p&gt;

&lt;p&gt;The value is not raw autonomy.&lt;/p&gt;

&lt;p&gt;The value is a managed runtime built around operational trust.&lt;/p&gt;

&lt;p&gt;If I am a small team, I probably do not want to assemble review points, action history, credential isolation, recovery strategy, and runtime visibility from scratch around every agent workflow.&lt;/p&gt;

&lt;p&gt;I want those concerns handled in one place.&lt;/p&gt;

&lt;p&gt;That is the real operational burden.&lt;/p&gt;

&lt;p&gt;And it is where I think the product story gets much stronger.&lt;/p&gt;

&lt;h2&gt;
  
  
  My take
&lt;/h2&gt;

&lt;p&gt;Most agent demos today are selling capability.&lt;/p&gt;

&lt;p&gt;I think production buyers are already looking for control.&lt;/p&gt;

&lt;p&gt;That is where the serious budget goes.&lt;/p&gt;

&lt;p&gt;If your product helps me trust an agent in a real environment, I will pay attention.&lt;/p&gt;

&lt;p&gt;If it only helps me watch an impressive demo, I probably will not.&lt;/p&gt;

&lt;p&gt;That is the lens I am using for OpenClawCloud.&lt;/p&gt;

&lt;p&gt;If you are building or evaluating agent infrastructure, I think this is the right question to ask first:&lt;/p&gt;

&lt;p&gt;What is the control plane here?&lt;/p&gt;

&lt;p&gt;If you want to follow what I am building around that idea, take a look at clawdcloud.net.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>How 8 AI Agents Share a Brain — Building a Persistent Knowledge Graph with MCP</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Tue, 17 Mar 2026 15:29:06 +0000</pubDate>
      <link>https://dev.to/setas/how-8-ai-agents-share-a-brain-building-a-persistent-knowledge-graph-with-mcp-541l</link>
      <guid>https://dev.to/setas/how-8-ai-agents-share-a-brain-building-a-persistent-knowledge-graph-with-mcp-541l</guid>
      <description>&lt;p&gt;Every multi-agent demo looks smart until the agents need to remember something outside the current prompt.&lt;/p&gt;

&lt;p&gt;That is where most systems fall apart.&lt;/p&gt;

&lt;p&gt;A CEO agent can suggest strategy. A Marketing agent can draft a thread. A Lawyer agent can block a risky claim. But if each one only sees the current conversation, you do not have a company. You have eight clever goldfish.&lt;/p&gt;

&lt;p&gt;I run a solo company with 8 AI agents: CEO, CFO, COO, Marketing, Accountant, Lawyer, CTO, and an Improver that upgrades the others. The part that makes the system actually compound is not the prompts. It is the shared memory layer.&lt;/p&gt;

&lt;p&gt;I built that memory as a persistent knowledge graph behind an MCP server. Every agent reads from it. Every agent can add to it. That is how the system remembers decisions, deadlines, lessons, client context, and what already happened this week.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR

&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Multi-agent systems need shared memory or they keep rediscovering the same context&lt;/li&gt;
&lt;li&gt;I use a knowledge graph exposed through MCP so every agent reads and writes the same institutional memory&lt;/li&gt;
&lt;li&gt;The hard part was not the schema. It was making file-backed memory survive concurrent writes&lt;/li&gt;
&lt;li&gt;The fix was pragmatic: async mutex, atomic writes, auto-repair on load, and strict retention rules&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Problem: Agents Forget the Company Exists
&lt;/h2&gt;

&lt;p&gt;A single chat session can hold a lot of context. A real company cannot depend on that.&lt;/p&gt;

&lt;p&gt;The COO needs to know whether &lt;code&gt;/weekly-review&lt;/code&gt; already ran. Marketing needs to know which product URL is allowed on X. The Accountant needs the ENI tax regime details. The Improver needs past mistakes. If that context lives only in old chats or random markdown files, each agent spends half its time re-learning the same facts.&lt;/p&gt;

&lt;p&gt;That creates three failure modes fast.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;repeated work&lt;/strong&gt;. The same questions get answered again because nobody knows the answer already exists.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;contradictions&lt;/strong&gt;. Marketing says a feature is ready. CTO knows it is not. Without a shared source of truth, both answers sound plausible.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;no compounding&lt;/strong&gt;. The system makes mistakes, but the mistakes do not become part of the system.&lt;/p&gt;

&lt;p&gt;That last one mattered most to me. If an agent screws up and nothing durable changes, you are paying for the same lesson twice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Shared Brain Stores
&lt;/h2&gt;

&lt;p&gt;I kept the graph deliberately small. It stores things that change decisions, not raw documents.&lt;/p&gt;

&lt;p&gt;The core objects are entities and relations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SondMe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"entityType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"observations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Status: active"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Stack: Elixir/Phoenix"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Domain: sondme.com"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And relations are simple, active-voice edges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Marketing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Lawyer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"relationType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"consults"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, the graph stores a few categories really well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strategic decisions and their rationale&lt;/li&gt;
&lt;li&gt;Product status, launch dates, and URLs&lt;/li&gt;
&lt;li&gt;Prompt run trackers like &lt;code&gt;prompt-run:weekly-review&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Lessons learned after launches or incidents&lt;/li&gt;
&lt;li&gt;Deadlines and compliance reminders&lt;/li&gt;
&lt;li&gt;Client and pricing context when a deal structure matters later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just as important is what it does &lt;strong&gt;not&lt;/strong&gt; store.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw file contents&lt;/li&gt;
&lt;li&gt;Entire chat transcripts&lt;/li&gt;
&lt;li&gt;Every observation forever&lt;/li&gt;
&lt;li&gt;Anything that is better left in the repo as a document&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That boundary matters. If memory becomes a dump of everything, agents stop trusting it because signal gets buried in noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP Was the Right Boundary
&lt;/h2&gt;

&lt;p&gt;I did not want every agent reading arbitrary files directly and inventing its own storage conventions.&lt;/p&gt;

&lt;p&gt;The Model Context Protocol gave me a clean interface: memory becomes a tool, not a folder full of tribal knowledge.&lt;/p&gt;

&lt;p&gt;That changes the ergonomics a lot.&lt;/p&gt;

&lt;p&gt;Instead of "go search old notes and hope you find the right paragraph," the agent asks memory for a specific entity or adds an observation to an existing one. The protocol boundary also made it much easier to share the same memory across different agents and modes.&lt;/p&gt;

&lt;p&gt;It is the same reason APIs beat random database access. Fewer ways to be inconsistent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Version Was Simple and Fragile
&lt;/h2&gt;

&lt;p&gt;The storage format is JSONL. One JSON object per line. Easy to inspect, easy to back up, easy to repair by hand.&lt;/p&gt;

&lt;p&gt;That simplicity was useful early on. I could open the file and understand what the system knew without needing a graph database, admin UI, or migration layer.&lt;/p&gt;

&lt;p&gt;But the naïve version had a nasty problem.&lt;/p&gt;

&lt;p&gt;When multiple agents wrote at roughly the same time, the server would:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load the graph from disk&lt;/li&gt;
&lt;li&gt;Modify it in memory&lt;/li&gt;
&lt;li&gt;Write the whole graph back&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is fine in a single-writer world.&lt;/p&gt;

&lt;p&gt;A multi-agent system is not a single-writer world.&lt;/p&gt;

&lt;p&gt;If two write operations start from the same file state, the second write can wipe out the first one without throwing an obvious error. Worse, if a write is interrupted mid-flight, the JSONL file can end up partially corrupted.&lt;/p&gt;

&lt;p&gt;That means the shared brain becomes the failure point for the whole company.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bug That Forced a Real Architecture
&lt;/h2&gt;

&lt;p&gt;This bug showed up exactly where you would expect: parallel tool calls.&lt;/p&gt;

&lt;p&gt;One part of the system would create entities. Another would create relations. Both thought they were doing a legitimate read-modify-write cycle. They were. Just not safely.&lt;/p&gt;

&lt;p&gt;The result was classic concurrent state pain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lost writes&lt;/li&gt;
&lt;li&gt;Duplicate entities&lt;/li&gt;
&lt;li&gt;Broken JSON lines&lt;/li&gt;
&lt;li&gt;Agents reading stale or malformed memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the moment when "it works in a demo" stops being useful.&lt;/p&gt;

&lt;p&gt;I did not solve it with a giant rewrite. I used a pragmatic local fork of &lt;code&gt;@modelcontextprotocol/server-memory&lt;/code&gt; and added three protections.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Async mutex
&lt;/h3&gt;

&lt;p&gt;All mutating operations go through a single queue. One write at a time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Mutex&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;acquire&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locked&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;()();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is not glamorous. It is effective.&lt;/p&gt;
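
&lt;p&gt;The part the class does not show is usage. Every mutating handler wraps its read-modify-write cycle, and the release goes in a &lt;code&gt;finally&lt;/code&gt; block. A sketch, with the class repeated so the snippet stands alone:&lt;/p&gt;

```javascript
// The Mutex from above, repeated here so this snippet runs standalone.
class Mutex {
  constructor() {
    this.queue = [];
    this.locked = false;
  }
  async acquire() {
    return new Promise(resolve => {
      if (!this.locked) {
        this.locked = true;
        resolve();
      } else {
        this.queue.push(resolve);
      }
    });
  }
  release() {
    if (this.queue.length > 0) {
      this.queue.shift()();
    } else {
      this.locked = false;
    }
  }
}

// Hypothetical wrapper: every mutating operation runs one at a time.
const writeLock = new Mutex();

async function withWriteLock(fn) {
  await writeLock.acquire();
  try {
    return await fn();
  } finally {
    // Release even when fn throws, or every later write queues forever.
    writeLock.release();
  }
}
```

&lt;p&gt;Entity creation and relation creation both go through the same wrapper, so two parallel tool calls can no longer interleave their load-modify-save cycles.&lt;/p&gt;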

&lt;h3&gt;
  
  
  2. Atomic writes
&lt;/h3&gt;

&lt;p&gt;Every save writes to a temporary file first, then renames it over the original.&lt;/p&gt;

&lt;p&gt;That means a crash gives me either the old valid file or the new valid file. Not half of one and half of the other.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Auto-repair on load
&lt;/h3&gt;

&lt;p&gt;The loader wraps each line parse in a try/catch, skips corrupt lines, and deduplicates entities and relations.&lt;/p&gt;

&lt;p&gt;That turned memory corruption from a wake-up-and-debug event into a survivable incident.&lt;/p&gt;

&lt;p&gt;Not pretty. Very useful.&lt;/p&gt;
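
&lt;p&gt;A minimal sketch of that load path, simplified from the real fork:&lt;/p&gt;

```javascript
// Defensive JSONL load: parse each line in a try/catch, count and skip
// lines that fail to parse, and deduplicate entities by name.
function loadGraph(text) {
  const byName = new Map();
  let skipped = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const obj = JSON.parse(line);
      byName.set(obj.name, obj); // last write wins for duplicate names
    } catch (err) {
      skipped += 1;              // corrupt line: survive it, do not crash
    }
  }
  return { entities: Array.from(byName.values()), skipped };
}
```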

&lt;h2&gt;
  
  
  Why a Knowledge Graph Beats Shared Notes
&lt;/h2&gt;

&lt;p&gt;A flat shared notes file works until you need relationships.&lt;/p&gt;

&lt;p&gt;Once you have agents consulting each other, products sharing infrastructure, deadlines tied to prompts, and lessons attached to incidents, the graph model becomes much more natural.&lt;/p&gt;

&lt;p&gt;A few examples from my setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The COO can see that &lt;code&gt;prompt-run:monthly-accounting&lt;/code&gt; is overdue without searching past chats&lt;/li&gt;
&lt;li&gt;Marketing can check the product registry before using a URL in a post&lt;/li&gt;
&lt;li&gt;The Improver can scan lesson entities and spot recurring failures&lt;/li&gt;
&lt;li&gt;Client deal structures can be stored once and reused by CFO and Accountant later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The graph is doing two jobs at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is a memory layer&lt;/li&gt;
&lt;li&gt;It is a constraint layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That second part matters. Good memory is not just recall. It is preventing the system from making the same wrong move again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retention Rules Matter More Than People Expect
&lt;/h2&gt;

&lt;p&gt;The graph would be useless if it only grew.&lt;/p&gt;

&lt;p&gt;So I added retention rules.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standups: keep 7 days&lt;/li&gt;
&lt;li&gt;Trend scans: keep 7 days&lt;/li&gt;
&lt;li&gt;Campaigns: prune 30 days after completion&lt;/li&gt;
&lt;li&gt;Lessons and decisions: permanent&lt;/li&gt;
&lt;li&gt;Prompt trackers: permanent and tiny&lt;/li&gt;
&lt;/ul&gt;
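
&lt;p&gt;As a sketch, those rules can live as data and run as a prune pass. The entity fields here (&lt;code&gt;entityType&lt;/code&gt;, &lt;code&gt;createdAt&lt;/code&gt;) are assumptions for illustration, not my exact schema:&lt;/p&gt;

```javascript
// Hedged sketch: the retention rules above as data plus a prune pass.
// Entity shape (entityType, createdAt) is an assumption for illustration.
const RETENTION_DAYS = {
  standup: 7,
  "trend-scan": 7,
  campaign: 30,           // pruned 30 days on; completion date approximated by createdAt here
  lesson: Infinity,       // permanent
  decision: Infinity,     // permanent
  "prompt-run": Infinity, // permanent and tiny
};

function pruneEntities(entities, nowMs) {
  return entities.filter(e => {
    const maxDays = RETENTION_DAYS[e.entityType];
    if (maxDays === undefined) return true; // unknown categories are kept
    if (maxDays === Infinity) return true;  // permanent categories
    const ageDays = (nowMs - new Date(e.createdAt).getTime()) / 86400000;
    return !(ageDays > maxDays);            // past the window: drop it
  });
}
```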

&lt;p&gt;This sounds like housekeeping. It is actually part of system quality.&lt;/p&gt;

&lt;p&gt;If stale operational data hangs around forever, agents start mixing old state with current state. That is how you get false overdue alerts, outdated campaign assumptions, and dead leads showing up in fresh plans.&lt;/p&gt;

&lt;p&gt;Memory hygiene is part of reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed After Adding Shared Memory
&lt;/h2&gt;

&lt;p&gt;The best effect was not that agents became smarter.&lt;/p&gt;

&lt;p&gt;It was that they became less repetitive.&lt;/p&gt;

&lt;p&gt;The COO can run a standup without rediscovering the same recurring deadlines. Marketing can pick up the current positioning of a product without me re-explaining it. The Improver can look at actual accumulated mistakes instead of vague impressions.&lt;/p&gt;

&lt;p&gt;The system feels less like prompt orchestration and more like a company with institutional memory.&lt;/p&gt;

&lt;p&gt;That is the difference between a novelty and an operating model.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Would Do Differently
&lt;/h2&gt;

&lt;p&gt;If I were rebuilding this today, I would make two changes earlier.&lt;/p&gt;

&lt;p&gt;First, I would design retention rules on day one. I added them after feeling the pain.&lt;/p&gt;

&lt;p&gt;Second, I would move sooner toward a BEAM-native version of this memory server. The JavaScript fork works, but a single GenServer processing writes sequentially is much closer to the shape of the problem.&lt;/p&gt;

&lt;p&gt;The current version is stable enough to run the company. It is not the final form.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Takeaway
&lt;/h2&gt;

&lt;p&gt;The interesting part of multi-agent systems is not "can one agent call another."&lt;/p&gt;

&lt;p&gt;It is whether the whole system can remember, constrain itself, and improve from mistakes.&lt;/p&gt;

&lt;p&gt;Without shared memory, every agent is just renting intelligence by the prompt.&lt;/p&gt;

&lt;p&gt;With a durable shared brain, the system starts to compound.&lt;/p&gt;

&lt;p&gt;That is the part I would build first.&lt;/p&gt;




&lt;p&gt;I’m João, a solo founder from Portugal building SaaS products with Elixir and Phoenix. I write about the real mechanics of running a company with AI agents: what works, what breaks, and what I’d change next.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Why Erlang's Supervision Trees Are the Missing Piece for AI Agents</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Wed, 11 Mar 2026 15:25:13 +0000</pubDate>
      <link>https://dev.to/setas/why-erlangs-supervision-trees-are-the-missing-piece-for-ai-agents-1mjo</link>
      <guid>https://dev.to/setas/why-erlangs-supervision-trees-are-the-missing-piece-for-ai-agents-1mjo</guid>
      <description>&lt;p&gt;Every week, a new AI agent framework launches. LangChain, CrewAI, AutoGen, Magentic-One — the list grows faster than anyone can evaluate.&lt;/p&gt;

&lt;p&gt;They all solve the same problem: how do you make an LLM do multi-step tasks? Chain some prompts, give it tools, add memory. Ship it.&lt;/p&gt;

&lt;p&gt;But none of them answer the question that actually matters in production: &lt;strong&gt;what happens when your agent crashes at 3am?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I run 8 AI agents that manage my solo company — CEO, CFO, COO, Marketing, Accountant, Lawyer, CTO, and an Improver that upgrades the others. They share a persistent knowledge graph, consult each other automatically, and post content to social media while I sleep.&lt;/p&gt;

&lt;p&gt;They crash. Regularly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Agents Aren't Containers
&lt;/h2&gt;

&lt;p&gt;Here's the core problem most frameworks ignore: AI agents are deeply stateful.&lt;/p&gt;

&lt;p&gt;A web server is (mostly) stateless. Kill the container, spin up a new one from the same image. No data lost. Kubernetes was designed for exactly this pattern.&lt;/p&gt;

&lt;p&gt;AI agents are different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context accumulates&lt;/strong&gt; — an agent mid-task holds a conversation history, tool call results, intermediate reasoning. Lose that, and it starts over from scratch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failures are semantic, not just process failures&lt;/strong&gt; — "the agent entered an infinite loop and burned $50 in API tokens" is different from "the container was OOM-killed." You need supervision that understands &lt;em&gt;what&lt;/em&gt; went wrong, not just &lt;em&gt;that&lt;/em&gt; something stopped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coordination requires state&lt;/strong&gt; — agents that collaborate share context, delegate subtasks, track who's done what. Kill one, and the others are left with stale references.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Costs are real&lt;/strong&gt; — every crashed-and-restarted agent potentially re-runs expensive LLM calls. Crash recovery isn't just about uptime. It's about not burning money.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most frameworks deal with this by... not dealing with it. They assume the happy path. If something fails, you restart the whole script manually.&lt;/p&gt;

&lt;p&gt;That works for demos. It doesn't work when your agent is supposed to post a tweet at 14:00 UTC every day, rain or shine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Erlang Solved This in 1986
&lt;/h2&gt;

&lt;p&gt;In 1986, Joe Armstrong and the Ericsson team had a problem: build telephone switches that handle millions of concurrent calls with 99.999% uptime. That's 5.26 minutes of downtime per year.&lt;/p&gt;

&lt;p&gt;Their solution: don't prevent crashes. &lt;strong&gt;Expect them and recover automatically.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This led to OTP (Open Telecom Platform) and its killer feature: &lt;strong&gt;supervision trees&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The core idea is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every process has a &lt;strong&gt;supervisor&lt;/strong&gt; — a parent process whose only job is watching children&lt;/li&gt;
&lt;li&gt;When a child crashes, the supervisor restarts it according to a defined strategy&lt;/li&gt;
&lt;li&gt;Supervisors can supervise other supervisors — creating a tree of fault tolerance&lt;/li&gt;
&lt;li&gt;The restart happens in microseconds, not seconds&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's what a basic agent supervisor looks like in Elixir:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;AgentSupervisor&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Supervisor&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;Supervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;name:&lt;/span&gt; &lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;children&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;AgentWorker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;id:&lt;/span&gt; &lt;span class="ss"&gt;:ceo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;role:&lt;/span&gt; &lt;span class="ss"&gt;:strategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model:&lt;/span&gt; &lt;span class="ss"&gt;:claude_sonnet&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;AgentWorker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;id:&lt;/span&gt; &lt;span class="ss"&gt;:marketing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;role:&lt;/span&gt; &lt;span class="ss"&gt;:content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model:&lt;/span&gt; &lt;span class="ss"&gt;:claude_sonnet&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;AgentWorker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;id:&lt;/span&gt; &lt;span class="ss"&gt;:accountant&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;role:&lt;/span&gt; &lt;span class="ss"&gt;:tax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model:&lt;/span&gt; &lt;span class="ss"&gt;:claude_haiku&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;MemoryServer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;path:&lt;/span&gt; &lt;span class="s2"&gt;"memory.jsonl"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;SchedulerWorker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;interval:&lt;/span&gt; &lt;span class="ss"&gt;:timer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="no"&gt;Supervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;strategy:&lt;/span&gt; &lt;span class="ss"&gt;:one_for_one&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three restart strategies cover the common failure patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;:one_for_one&lt;/code&gt;&lt;/strong&gt; — only restart the crashed process. Perfect for independent agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;:one_for_all&lt;/code&gt;&lt;/strong&gt; — restart all children if one crashes. Use for tightly coupled agent teams with shared state, where partial state is worse than a full restart.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;:rest_for_one&lt;/code&gt;&lt;/strong&gt; — restart the crashed process and everything started after it. Useful when later agents depend on earlier ones.&lt;/li&gt;
&lt;/ul&gt;
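
&lt;p&gt;The strategies reduce to a simple rule for which children get restarted. A sketch in JavaScript, since that's what my current system runs on — it mirrors OTP's semantics but is not OTP itself:&lt;/p&gt;

```javascript
// Which children a supervisor restarts after one crashes, per strategy.
// A sketch mirroring OTP semantics, not the OTP implementation.
function restartSet(children, crashed, strategy) {
  const i = children.indexOf(crashed);
  switch (strategy) {
    case "one_for_one":  return [crashed];          // just the casualty
    case "one_for_all":  return [...children];      // everyone
    case "rest_for_one": return children.slice(i);  // casualty + later siblings
    default: throw new Error("unknown strategy: " + strategy);
  }
}
```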

&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;Here's a real scenario from my system. My agents share a persistent knowledge graph stored as a JSONL file — one JSON object per line, each representing an entity or relation. Eight agents read and write to this file through a Model Context Protocol (MCP) memory server. Every strategic decision, client pipeline update, prompt run timestamp, and lesson learned goes here.&lt;/p&gt;

&lt;p&gt;The race condition was textbook. When multiple agents fire parallel tool calls — say, &lt;code&gt;create_entities&lt;/code&gt; and &lt;code&gt;create_relations&lt;/code&gt; in the same batch — both operations would:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read the entire JSONL file into memory&lt;/li&gt;
&lt;li&gt;Parse every line into an in-memory graph&lt;/li&gt;
&lt;li&gt;Append their new entities/relations&lt;/li&gt;
&lt;li&gt;Serialize the full graph back to disk&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 4 is the problem. Both operations read the &lt;em&gt;same&lt;/em&gt; file state. Both write back the &lt;em&gt;full&lt;/em&gt; graph plus their additions. The second write obliterates the first's additions entirely. No error, no warning — data just vanishes.&lt;/p&gt;
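
&lt;p&gt;The interleaving is easy to reproduce in miniature. A toy JavaScript sketch of the lost update, with an in-memory array standing in for the JSONL file:&lt;/p&gt;

```javascript
// Both writers read the same snapshot, then each writes back the full
// graph plus its own addition. The second write wins; the first vanishes.
let file = ["existing-entity"];
const read  = () => file.slice();
const write = (graph) => { file = graph; };

const snapA = read();          // writer A: step 1, read
const snapB = read();          // writer B reads the SAME state
snapA.push("new-entity");      // step 3, append
snapB.push("new-relation");
write(snapA);                  // step 4, writer A saves
write(snapB);                  // writer B overwrites A's save
// file is now ["existing-entity", "new-relation"]: no error, no warning
```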

&lt;p&gt;In a typical framework, this would mean:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent tries to read memory → gets a JSON parse error (if a write was interrupted mid-line)&lt;/li&gt;
&lt;li&gt;Agent crashes or returns garbage&lt;/li&gt;
&lt;li&gt;I wake up, see broken output, manually debug the JSONL file&lt;/li&gt;
&lt;li&gt;Fix the file, restart everything&lt;/li&gt;
&lt;li&gt;Repeat next time it happens&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With supervision trees:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Memory server process detects corruption on load&lt;/li&gt;
&lt;li&gt;Process crashes — intentionally. In Erlang, crashing is a feature, not a bug.&lt;/li&gt;
&lt;li&gt;Supervisor restarts the memory server in microseconds&lt;/li&gt;
&lt;li&gt;On restart, the init callback runs auto-repair: wraps each &lt;code&gt;JSON.parse&lt;/code&gt; in a try/catch, skips corrupt lines, deduplicates entities by name and relations by &lt;code&gt;from|type|to&lt;/code&gt; key&lt;/li&gt;
&lt;li&gt;Agents resume with clean data&lt;/li&gt;
&lt;li&gt;I'm asleep. Everything just works.&lt;/li&gt;
&lt;/ol&gt;
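
&lt;p&gt;Step 4's auto-repair is a defensive parse plus two dedupe passes. A sketch of that loader in JavaScript (the record shape — &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;from&lt;/code&gt;/&lt;code&gt;relationType&lt;/code&gt;/&lt;code&gt;to&lt;/code&gt; — follows my memory server's format; adjust to yours):&lt;/p&gt;

```javascript
// Repair-on-load: skip lines that fail JSON.parse, deduplicate entities
// by name and relations by the from|type|to key. Last occurrence wins.
function repairLoad(jsonl) {
  const entities = new Map();
  const relations = new Map();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let rec;
    try { rec = JSON.parse(line); } catch { continue; }  // corrupt line: skip
    if (rec.type === "entity") entities.set(rec.name, rec);
    if (rec.type === "relation")
      relations.set(rec.from + "|" + rec.relationType + "|" + rec.to, rec);
  }
  return { entities: [...entities.values()], relations: [...relations.values()] };
}
```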

&lt;p&gt;To fix the root cause, I forked the MCP memory server locally and made three additions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Async mutex&lt;/strong&gt; — a queue-based lock that serializes all write operations. When one &lt;code&gt;saveGraph()&lt;/code&gt; is running, subsequent calls wait their turn. This eliminates the read-modify-write race entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Atomic writes&lt;/strong&gt; — every save writes to a &lt;code&gt;.tmp&lt;/code&gt; file first, then renames it over the original. A crash mid-write gives you either the old complete file or the new complete file — never a half-written mess.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-repair on load&lt;/strong&gt; — the graph loader wraps each line's &lt;code&gt;JSON.parse&lt;/code&gt; in a try/catch. Corrupt lines get skipped with a warning. Duplicate entities (same name) and duplicate relations (same from/type/to triple) are collapsed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's roughly what the mutex pattern looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Mutex&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;acquire&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_locked&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;()();&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Every mutating operation goes through the lock:&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;createEntities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;acquire&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loadGraph&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// read&lt;/span&gt;
    &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;newEntities&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// modify&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;saveGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;            &lt;span class="c1"&gt;// write (atomic)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is exactly the kind of infrastructure problem that disappears on the BEAM. Erlang processes don't share memory. Each process has its own heap. There's no concurrent write to the same file because the memory server is a single GenServer processing messages sequentially from its mailbox — mutual exclusion is built into the execution model, not bolted on with a mutex.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;the supervision tree doesn't prevent the bug&lt;/strong&gt;. It makes the bug survivable. The corrupt write still happens occasionally (on the JavaScript version — the BEAM version wouldn't have this class of bug at all), but the system recovers before anyone notices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Each Process Is an Island
&lt;/h2&gt;

&lt;p&gt;Processes on the BEAM (Erlang's virtual machine) have properties that map perfectly to AI agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt; — each process has its own heap memory. A crash in one can't corrupt another. Your Marketing agent going haywire can't touch the Accountant's tax calculations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight&lt;/strong&gt; — each process is ~2KB. You can run hundreds of thousands on a single machine. An 8-agent system with tool workers, a memory server, and a scheduler process would fit comfortably on a machine with 256MB RAM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preemptive scheduling&lt;/strong&gt; — the BEAM VM enforces fair CPU sharing. One agent stuck in an expensive computation can't starve the others. Every agent gets its turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message passing&lt;/strong&gt; — agents communicate by sending immutable messages. No shared mutable state, no locks, no race conditions (except at I/O boundaries, which is where the mutex comes in).&lt;/li&gt;
&lt;/ul&gt;
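
&lt;p&gt;The mailbox model is worth internalizing even outside the BEAM. A deliberately tiny JavaScript approximation: one queue per agent, messages processed strictly in order, no state shared between agents. It has none of the BEAM's preemption or isolation guarantees; it only illustrates the shape.&lt;/p&gt;

```javascript
// Each agent owns its mailbox; nothing else touches its state directly.
// Other agents communicate only by appending messages.
class MailboxAgent {
  constructor(name) {
    this.name = name;
    this.mailbox = [];  // incoming messages, FIFO
    this.handled = [];  // processing log
  }
  send(msg) { this.mailbox.push(msg); }
  drain() {  // process pending messages one at a time, in arrival order
    while (this.mailbox.length) this.handled.push(this.mailbox.shift());
  }
}
```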

&lt;p&gt;Compare this to running AI agents as Python threads or async tasks. One unhandled exception can take down the entire process. One memory leak slowly poisons the whole system. One blocking call freezes everything.&lt;/p&gt;

&lt;p&gt;My current system runs on Node.js with a hand-rolled mutex and atomic file writes to paper over exactly these problems. It works — 91% scheduler success rate, auto-repairing memory, months of uptime. But every fix is fighting the runtime instead of working with it. On the BEAM, process isolation and sequential mailbox processing eliminate entire categories of bugs before you write a line of application code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Now
&lt;/h2&gt;

&lt;p&gt;AI agents are moving from demos to production. And production means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agents that run 24/7, not just during a demo&lt;/li&gt;
&lt;li&gt;Real money flowing through API calls ($0.01 per prompt adds up quickly when an agent loops)&lt;/li&gt;
&lt;li&gt;Users depending on outputs — posts that need to go out, invoices that need to be generated, compliance deadlines that can't be missed&lt;/li&gt;
&lt;li&gt;Multiple agents coordinating, where one failure cascades if not contained&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The industry is rediscovering problems that telecom solved decades ago. Ericsson's AXD 301 switch reportedly achieved 99.9999999% uptime — nine nines — using these exact patterns. Not because the hardware never failed, but because the software expected failure and recovered faster than users noticed.&lt;/p&gt;

&lt;p&gt;Your AI agent doesn't need nine nines. But it does need to survive a 3am crash without you waking up to fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Counterargument
&lt;/h2&gt;

&lt;p&gt;"But I'm not going to rewrite my Python agent in Elixir."&lt;/p&gt;

&lt;p&gt;Fair. And you don't have to. The supervision tree &lt;em&gt;pattern&lt;/em&gt; is more important than the language:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Wrap agents in health-check loops&lt;/strong&gt; that detect hangs and kill them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoint state regularly&lt;/strong&gt; so a restart doesn't lose everything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set budget caps&lt;/strong&gt; that pause agents before they burn your API credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor semantically&lt;/strong&gt; — is the agent making progress, or is it looping?&lt;/li&gt;
&lt;/ol&gt;
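
&lt;p&gt;Points 3 and 4 fit in a few lines regardless of stack. A JavaScript sketch; the names and thresholds are illustrative, not from any framework:&lt;/p&gt;

```javascript
// Halts an agent when it exceeds a spend cap (point 3) or keeps emitting
// the same output, a crude proxy for "looping, not progressing" (point 4).
class AgentGuard {
  constructor(capUsd, maxRepeats) {
    this.cap = capUsd;
    this.maxRepeats = maxRepeats;
    this.spent = 0;
    this.last = null;
    this.repeats = 0;
  }
  check(costUsd, output) {
    this.spent += costUsd;
    if (this.spent > this.cap) return "halt:budget";
    this.repeats = output === this.last ? this.repeats + 1 : 0;
    this.last = output;
    if (this.repeats >= this.maxRepeats) return "halt:looping";
    return "ok";
  }
}
```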

&lt;p&gt;But if you're choosing a foundation for a new agent system — especially one that needs to run multiple coordinating agents reliably — I'd argue the BEAM gives you a 40-year head start. These patterns aren't libraries you install. They're built into the runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Build Next
&lt;/h2&gt;

&lt;p&gt;If I were starting a new AI agent platform from scratch today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process-per-agent&lt;/strong&gt; with OTP supervisors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State checkpointing&lt;/strong&gt; to PostgreSQL on every tool call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-agent spend tracking&lt;/strong&gt; with configurable budget caps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PubSub for inter-agent messaging&lt;/strong&gt; — no external message queue needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telemetry hooks&lt;/strong&gt; for observability (OpenTelemetry + Sentry)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is roughly what I'm building with OpenClaw Cloud, and it's why I chose Elixir for the stack. Not because Elixir is trendy, but because the problem — running many stateful, failure-prone, communicating processes — is literally what the BEAM was designed for.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm João, a solo developer from Portugal building SaaS with Elixir and Phoenix. I recently wrote about &lt;a href="https://dev.to/setas/i-run-a-solo-company-with-ai-agent-departments-50nf"&gt;running a solo company with AI agent departments&lt;/a&gt; — this article is the technical deep-dive on why that system stays reliable. Find me on &lt;a href="https://x.com/joaosetas" rel="noopener noreferrer"&gt;X (@joaosetas)&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>I Run a Solo Company with AI Agent Departments</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Tue, 03 Mar 2026 10:41:20 +0000</pubDate>
      <link>https://dev.to/setas/i-run-a-solo-company-with-ai-agent-departments-50nf</link>
      <guid>https://dev.to/setas/i-run-a-solo-company-with-ai-agent-departments-50nf</guid>
      <description>&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I'm a solo founder running 5 SaaS products with 0 employees&lt;/li&gt;
&lt;li&gt;I built 8 AI agent "departments" using GitHub Copilot custom agents — CEO, CFO, COO, Lawyer, Accountant, Marketing, CTO, and an Improver that upgrades the others&lt;/li&gt;
&lt;li&gt;They share a persistent knowledge graph, consult each other automatically, and self-improve&lt;/li&gt;
&lt;li&gt;Here's how it actually works, with code snippets and honest tradeoffs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Premise
&lt;/h2&gt;

&lt;p&gt;I run a solo software company from Braga, Portugal. Five products. Zero employees. Zero funding.&lt;/p&gt;

&lt;p&gt;The products: &lt;a href="https://sondme.com" rel="noopener noreferrer"&gt;SondMe&lt;/a&gt; (radio monitoring), &lt;a href="https://countermark.ai" rel="noopener noreferrer"&gt;Countermark&lt;/a&gt; (bot detection), OpenClawCloud (AI agent hosting), Vertate (verification), and Agent-Inbox. All built with Elixir, Phoenix, and LiveView. All deployed on Fly.io for under €50/month total.&lt;/p&gt;

&lt;p&gt;The problem: even a solo founder needs to handle marketing, accounting, legal compliance, operations, financial planning, and tech decisions. Wearing all those hats meant things slipped. Deadlines got missed. Content didn't get posted. IVA filings almost got forgotten.&lt;/p&gt;

&lt;p&gt;So I built something weird: a full virtual company where every department is an AI agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Roster
&lt;/h2&gt;

&lt;p&gt;Each agent is a markdown file in &lt;code&gt;.github/agents/&lt;/code&gt; inside my management repo. GitHub Copilot loads the right agent based on which mode I'm working in. Here's the team:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;What It Actually Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CEO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strategy &amp;amp; trends&lt;/td&gt;
&lt;td&gt;Scans Hacker News and X for market signals. Validates product direction against trends.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CFO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Financial planning&lt;/td&gt;
&lt;td&gt;Pricing models, cash flow projections, cost analysis. Checks margins before I commit to anything.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;COO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Operations&lt;/td&gt;
&lt;td&gt;Runs daily standups. Maintains the sprint board. Orchestrates other agents.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Marketing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Content &amp;amp; growth&lt;/td&gt;
&lt;td&gt;Writes all social media content in my voice. Schedules posts. Runs engagement routines.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accountant&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tax &amp;amp; invoicing&lt;/td&gt;
&lt;td&gt;Portuguese IVA rules, IRS simplified regime, invoice requirements. Knows fiscal deadlines cold.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lawyer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compliance&lt;/td&gt;
&lt;td&gt;GDPR, contracts, Terms of Service. Reviews product claims before Marketing publishes them.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CTO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;Build-vs-buy decisions, DevOps, stack consistency across all 5 products.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Improver&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Meta-agent&lt;/td&gt;
&lt;td&gt;Reads past mistakes and upgrades the other agents. Creates new skills. The system evolves itself.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These aren't chatbots. Each agent has domain-specific instructions, access to real tools (MCP servers for X, dev.to, Sentry, scheduling, memory), and the authority to act autonomously.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works — The Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Agent Files
&lt;/h3&gt;

&lt;p&gt;Each agent is a &lt;code&gt;.agent.md&lt;/code&gt; file with structured instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Marketing Agent — AIFirst&lt;/span&gt;

&lt;span class="gu"&gt;## Core Responsibilities&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Content strategy and calendar
&lt;span class="p"&gt;-&lt;/span&gt; Social media posting (via X and dev.to MCP tools)
&lt;span class="p"&gt;-&lt;/span&gt; Community engagement
&lt;span class="p"&gt;-&lt;/span&gt; Launch planning

&lt;span class="gu"&gt;## Content Voice &amp;amp; Tone&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; First person singular ("I", never "we")
&lt;span class="p"&gt;-&lt;/span&gt; Technical substance over hype
&lt;span class="p"&gt;-&lt;/span&gt; Show the work — code, configs, real numbers
&lt;span class="p"&gt;-&lt;/span&gt; No: revolutionary, game-changing, leverage, synergy...

&lt;span class="gu"&gt;## Autonomous Execution&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Posts tweets directly via scheduler
&lt;span class="p"&gt;-&lt;/span&gt; Publishes dev.to articles (published: true)
&lt;span class="p"&gt;-&lt;/span&gt; Engagement: likes, replies, follows — every day
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: these aren't generic "be helpful" prompts. The Marketing agent knows my posting schedule, my voice quirks, which platforms I use, which URLs are blocked on X, and which products to rotate in the content calendar. The Accountant knows Portuguese ENI tax law, IVA quarterly deadlines, and the simplified IRS regime. Real domain expertise encoded in markdown.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shared Memory — The Knowledge Graph
&lt;/h3&gt;

&lt;p&gt;This is where it gets interesting. All agents share a &lt;strong&gt;persistent knowledge graph&lt;/strong&gt; via a Model Context Protocol (MCP) memory server. What one agent learns, every other agent can read.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────┐    ┌─────────────┐    ┌──────────┐
│ Marketing│───→│             │←───│ CFO      │
│          │    │  Knowledge  │    │          │
│ CEO      │───→│    Graph    │←───│Accountant│
│          │    │             │    │          │
│ Lawyer   │───→│ (memory.jsonl)│←──│ Improver │
└──────────┘    └─────────────┘    └──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Entities have types: &lt;code&gt;product&lt;/code&gt;, &lt;code&gt;decision&lt;/code&gt;, &lt;code&gt;deadline&lt;/code&gt;, &lt;code&gt;client&lt;/code&gt;, &lt;code&gt;metric&lt;/code&gt;, &lt;code&gt;lesson&lt;/code&gt;. Relations use active voice: &lt;code&gt;owns&lt;/code&gt;, &lt;code&gt;uses&lt;/code&gt;, &lt;code&gt;built-with&lt;/code&gt;, &lt;code&gt;depends-on&lt;/code&gt;.&lt;/p&gt;
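
&lt;p&gt;On disk that means one JSON object per line. A hypothetical pair of records — the field names follow the MCP memory server's conventions, but the content here is invented for illustration:&lt;/p&gt;

```javascript
// Two memory.jsonl lines: an entity and a relation.
const lines = [
  '{"type":"entity","name":"Countermark","entityType":"product",' +
    '"observations":["Bot detection SaaS"]}',
  '{"type":"relation","from":"Countermark","relationType":"built-with","to":"Elixir"}',
];
const records = lines.map((l) => JSON.parse(l));
```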

&lt;p&gt;Real example of what's stored:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strategic decisions and their rationale&lt;/li&gt;
&lt;li&gt;Product status, launch dates, key metrics&lt;/li&gt;
&lt;li&gt;Financial data (pricing decisions, cost benchmarks)&lt;/li&gt;
&lt;li&gt;Legal and compliance decisions&lt;/li&gt;
&lt;li&gt;Lessons learned from launches and incidents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The memory has retention rules too — standups older than 7 days get pruned, but lessons and decisions are permanent. It's the company's institutional memory.&lt;/p&gt;
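
&lt;p&gt;The pruning rule itself is one filter. A sketch; the record fields are assumptions about the shape, not the exact server code:&lt;/p&gt;

```javascript
// Retention: standups expire after 7 days; every other entity type
// (lesson, decision, ...) is kept permanently.
function prune(records, nowMs) {
  const weekMs = 7 * 24 * 60 * 60 * 1000;
  const stale = (r) => nowMs - Date.parse(r.date) >= weekMs;
  return records.filter((r) => r.entityType !== "standup" || !stale(r));
}
```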

&lt;h3&gt;
  
  
  Inter-Agent Communication
&lt;/h3&gt;

&lt;p&gt;Here's the part that surprised me most. Agents &lt;strong&gt;consult each other automatically&lt;/strong&gt; when their work crosses into another domain.&lt;/p&gt;

&lt;p&gt;The protocol works like this: each agent has a trigger table. When Marketing writes a product claim, it auto-calls the Lawyer for review. When CFO does pricing, it calls the Accountant to verify tax treatment. When CTO proposes infrastructure changes, it calls CFO to check the cost impact.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CEO ←→ CFO        Strategy ↔ Financial viability
CEO ←→ CTO        Strategy ↔ Technical feasibility
CFO ←→ Accountant Financial plans ↔ Tax compliance
Marketing ←→ Lawyer  Campaigns ↔ Legal compliance
COO → any          Orchestrator can call any agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The peer review request format looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Peer Review Request&lt;/span&gt;

&lt;span class="gs"&gt;**From**&lt;/span&gt;: Marketing
&lt;span class="gs"&gt;**Call chain**&lt;/span&gt;: COO → Marketing
&lt;span class="gs"&gt;**Task**&lt;/span&gt;: Draft product launch tweet for Countermark
&lt;span class="gs"&gt;**What I did**&lt;/span&gt;: Wrote tweet claiming "99% bot detection accuracy"
&lt;span class="gs"&gt;**What I need from you**&lt;/span&gt;: Is this claim substantiated?

Please respond with:
&lt;span class="p"&gt;1.&lt;/span&gt; ✅ APPROVED
&lt;span class="p"&gt;2.&lt;/span&gt; ⚠️ CONCERNS
&lt;span class="p"&gt;3.&lt;/span&gt; 🔴 BLOCKING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call-chain tracking prevents infinite loops — each consultation includes who's already been called, and there's a max depth of 3. If CFO calls Accountant, the Accountant can't call CFO back.&lt;/p&gt;
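
&lt;p&gt;The guard is a handful of lines. A sketch in JavaScript; the chain is just the list of agents already in the call stack:&lt;/p&gt;

```javascript
// A consultation is allowed only if the target isn't already in the call
// chain (no cycles) and the chain hasn't hit the depth cap.
function mayConsult(chain, target, maxDepth) {
  if (chain.includes(target)) return false;  // CFO → Accountant → CFO: refused
  if (chain.length >= maxDepth) return false;
  return true;
}
```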

&lt;h3&gt;
  
  
  The Daily Standup
&lt;/h3&gt;

&lt;p&gt;Every morning, the COO agent runs a standup that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Checks Sentry for errors across all 5 products&lt;/li&gt;
&lt;li&gt;Scans the sprint board for overdue tasks&lt;/li&gt;
&lt;li&gt;Checks if periodic prompts are overdue (weekly review, monthly accounting, quarterly IVA)&lt;/li&gt;
&lt;li&gt;Reads the knowledge graph for context&lt;/li&gt;
&lt;li&gt;Delegates tasks to other agents&lt;/li&gt;
&lt;li&gt;Produces a prioritized day plan&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's not a status meeting — it's an automated orchestration run that delegates work to the right specialist.&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-Improvement — The Improver Agent
&lt;/h3&gt;

&lt;p&gt;This is the weirdest (and possibly most valuable) part. There's a meta-agent called the Improver whose job is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read &lt;code&gt;lesson&lt;/code&gt; entities from memory (mistakes and learnings logged by other agents)&lt;/li&gt;
&lt;li&gt;Identify patterns across sessions&lt;/li&gt;
&lt;li&gt;Create new skills (reusable instruction files for specific domains)&lt;/li&gt;
&lt;li&gt;Update other agents' instructions when gaps are found&lt;/li&gt;
&lt;li&gt;Propose new agents when workload patterns suggest one is needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After every complex task, agents store a lesson:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Entity: lesson:2026-02-10:memory-corruption
Type: lesson
Observations:
&lt;span class="p"&gt;  -&lt;/span&gt; "Agent: CTO"
&lt;span class="p"&gt;  -&lt;/span&gt; "Category: bug"
&lt;span class="p"&gt;  -&lt;/span&gt; "Summary: Concurrent memory writes corrupted JSONL file"
&lt;span class="p"&gt;  -&lt;/span&gt; "Detail: Parallel tool calls to create_entities and create_relations
    caused race condition in the memory server"
&lt;span class="p"&gt;  -&lt;/span&gt; "Action: Added async mutex + atomic writes to local fork"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Improver reads these monthly and upgrades the system. The system literally improves itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Tradeoffs
&lt;/h2&gt;

&lt;p&gt;This isn't a "10x productivity" pitch. Here's what's actually hard:&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Windows Are Real
&lt;/h3&gt;

&lt;p&gt;Each agent operates within a context window. Long, complex tasks can exceed it. The solution: agents delegate heavy data-gathering to subagents to keep their own context focused. It works, but it's a constant architectural consideration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agents Hallucinate
&lt;/h3&gt;

&lt;p&gt;They do, and confidently. The Lawyer catches most compliance hallucinations before they reach production. The inter-agent review protocol exists because of this — multiple agents checking each other's work is the safety net.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory Corruption
&lt;/h3&gt;

&lt;p&gt;We hit this one early. The knowledge graph is stored as a JSONL file. When multiple agents made parallel tool calls (writing entities and relations simultaneously), the file got corrupted — partial writes, duplicate entries, broken JSON lines.&lt;/p&gt;

&lt;p&gt;The fix: I forked the upstream MCP memory server and added three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Async mutex&lt;/strong&gt; — prevents concurrent &lt;code&gt;saveGraph()&lt;/code&gt; calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Atomic writes&lt;/strong&gt; — writes to a &lt;code&gt;.tmp&lt;/code&gt; file then renames&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-repair on load&lt;/strong&gt; — skips corrupt lines and deduplicates&lt;/li&gt;
&lt;/ol&gt;
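
&lt;p&gt;For illustration, here's a minimal Elixir sketch of the same three-step pattern (module and function names are hypothetical; the real fix lives in the forked MCP server, which isn't written in Elixir). It assumes Elixir 1.18+ for the built-in &lt;code&gt;JSON&lt;/code&gt; module. Writes are serialized through a single process, go to a &lt;code&gt;.tmp&lt;/code&gt; file first, and corrupt lines are skipped on load:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule MemoryFile do
  use GenServer

  # A single GenServer serializes all writes (the BEAM stand-in for an async mutex).
  def start_link(path), do: GenServer.start_link(__MODULE__, path, name: __MODULE__)
  def save(graph), do: GenServer.call(__MODULE__, {:save, graph})

  @impl true
  def init(path), do: {:ok, path}

  @impl true
  def handle_call({:save, graph}, _from, path) do
    tmp = path &lt;&gt; ".tmp"
    File.write!(tmp, Enum.map_join(graph, "\n", &amp;JSON.encode!/1))
    File.rename!(tmp, path)   # atomic swap: readers never see a partial file
    {:reply, :ok, path}
  end

  # Auto-repair on load: drop lines that fail to parse, dedupe the rest.
  def load(path) do
    path
    |&gt; File.stream!()
    |&gt; Enum.flat_map(fn line -&gt;
      case JSON.decode(String.trim(line)) do
        {:ok, entity} -&gt; [entity]
        {:error, _} -&gt; []
      end
    end)
    |&gt; Enum.uniq()
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The point is how little machinery the fix needs: one serialization point, one rename, one lenient parser.&lt;/p&gt;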

&lt;h3&gt;
  
  
  It's Not a Replacement for Thinking
&lt;/h3&gt;

&lt;p&gt;The agents are good at executing within their domain. They're bad at knowing when the domain is wrong. Strategic pivots, gut-feel product decisions, "this just doesn't feel right" — that's still me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Month 2 Results
&lt;/h2&gt;

&lt;p&gt;After two months of running this system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Revenue&lt;/strong&gt;: €6.09 (one subscriber, acquired on day 2; no ads, no outreach)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: ~€42/month (Fly.io across all apps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content output&lt;/strong&gt;: 84+ tweets, 5 dev.to articles, multiple HN comments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time on marketing&lt;/strong&gt;: less than 1 hour per week (agents handle scheduling, drafting, and engagement)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt;: zero missed deadlines (IVA, IRS, Segurança Social all tracked)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The revenue is barely there. But I ship every week, the system keeps improving, and I'm building in public with a team that costs €0.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;The entire system lives in a single management repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.github/
  agents/
    ceo.agent.md
    cfo.agent.md
    coo.agent.md
    marketing.agent.md
    accountant.agent.md
    lawyer.agent.md
    cto.agent.md
    improver.agent.md
  copilot-instructions.md    # Global company identity + protocols
  skills/
    portuguese-tax/SKILL.md
    saas-pricing/SKILL.md
    seguranca-social/SKILL.md
  instructions/
    marketing.instructions.md
    ...
Marketing/
  social-media-sop.md
  social-media-strategy-2026.md
  drafts/
    week-2026-W09.md
    ideas.md
    ...
BOARD.md                     # Sprint board (COO-maintained)
Setas/
  Atividade.md               # Fiscal framework
  INSTRUCTIONS.md            # Operational manual
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;copilot-instructions.md&lt;/code&gt; file is loaded into every Copilot interaction. It defines the company identity, agent system, memory protocols, communication rules, and product registry. It's the constitution of the virtual company.&lt;/p&gt;

&lt;p&gt;Skills are reusable knowledge modules — &lt;code&gt;portuguese-tax/SKILL.md&lt;/code&gt; contains complete IVA scenarios, IRS regime rules, invoice requirements, and deadline calendars. The Accountant agent loads this skill automatically when handling tax questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;If I were starting fresh:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with 3 agents, not 8&lt;/strong&gt; — COO, Marketing, and Accountant cover 80% of the value. Add specialists when the workload justifies them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invest in memory early&lt;/strong&gt; — the knowledge graph is the most valuable part. It compounds over time. I wish I'd been more disciplined about what gets stored from day one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test agent outputs against each other&lt;/strong&gt; — the inter-agent review protocol was added after hallucinations caused problems. Build it in from the start.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;I'm not claiming AI agents replace human teams. They don't. What they do is let a solo founder operate with the &lt;em&gt;structure&lt;/em&gt; of a team — defined roles, communication protocols, institutional memory, and systematic improvement.&lt;/p&gt;

&lt;p&gt;The alternative was either hiring people I can't afford or continuing to drop balls. This gives me a middle path: structured execution with human judgment at the critical points.&lt;/p&gt;

&lt;p&gt;The system cost: €0 (GitHub Copilot is included in my existing subscription). The time to build: maybe 40 hours total over 2 months. The ongoing maintenance: the Improver handles most of it.&lt;/p&gt;

&lt;p&gt;If you're a solo founder drowning in operational overhead, this might be worth trying. Not because AI agents are magic — but because the &lt;em&gt;structure&lt;/em&gt; they enforce is valuable even when the agents themselves are imperfect.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm João, a solo developer from Portugal building SaaS products with Elixir. I write about the real experience of building in public — the numbers, the mistakes, and the weird experiments like this one. Follow me on &lt;a href="https://dev.to/joaosetas"&gt;dev.to&lt;/a&gt; or &lt;a href="https://x.com/joaosetas" rel="noopener noreferrer"&gt;X (@joaosetas)&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>startup</category>
    </item>
    <item>
      <title>What OpenClaw Actually Is — and Why Running Claws Needs a Cloud</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Sun, 22 Feb 2026 14:26:15 +0000</pubDate>
      <link>https://dev.to/setas/what-openclaw-actually-is-and-why-running-claws-needs-a-cloud-3e34</link>
      <guid>https://dev.to/setas/what-openclaw-actually-is-and-why-running-claws-needs-a-cloud-3e34</guid>
      <description>&lt;h1&gt;
  
  
  What OpenClaw Actually Is — and Why Running Claws Needs a Cloud
&lt;/h1&gt;

&lt;p&gt;I've seen the same question in three separate HN threads this week:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I don't even know what OpenClaw is."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Fair enough. The name has been everywhere — Karpathy coined "Claws" as a category, Peter Steinberger joined OpenAI to scale the framework, and suddenly there are 1,400+ comments on HN about it. But the signal-to-noise ratio is terrible. Half the discussion is people explaining it to each other incorrectly.&lt;/p&gt;

&lt;p&gt;So here's a straightforward breakdown from someone who's been building infrastructure for this exact category of software.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw: the framework
&lt;/h2&gt;

&lt;p&gt;OpenClaw is an open-source framework for building autonomous AI agents that can use computers. Not chatbots. Not autocomplete. Agents that open terminals, write files, make API calls, browse the web, and chain complex multi-step tasks together.&lt;/p&gt;

&lt;p&gt;Think of it as the orchestration layer. You point it at a task — "refactor this module," "set up monitoring for this service," "research these 10 competitors and write a summary" — and it breaks that into subtasks, executes them, handles errors, and reports back.&lt;/p&gt;

&lt;p&gt;The framework handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool use&lt;/strong&gt; — file I/O, shell commands, HTTP requests, browser automation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt; — persistent context across sessions so the agent remembers what it did yesterday&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent coordination&lt;/strong&gt; — multiple specialized agents collaborating on a task (one writes code, another reviews it, a third deploys it)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Gateway Protocol&lt;/strong&gt; — a standardized way for agents to discover and call external tools and services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It started as Claude Code's backbone, but now it's a foundation project. Model-agnostic in theory, though Claude still runs it best in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claws: the category
&lt;/h2&gt;

&lt;p&gt;Karpathy's contribution was naming the &lt;em&gt;category&lt;/em&gt;, not the framework. A "Claw" is any autonomous computer-using agent — whether it's built on OpenClaw, LangChain, CrewAI, or duct tape and bash scripts.&lt;/p&gt;

&lt;p&gt;His NanoClaw demo — ~4,000 lines of Python, fully auditable — showed that you don't need a massive framework to build one. And he's right. For tinkering, for learning, for running a single agent on your laptop while you watch it work, self-hosting is great.&lt;/p&gt;

&lt;p&gt;But here's where the conversation keeps going off the rails.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "I'll just run it on my Mac Mini" problem
&lt;/h2&gt;

&lt;p&gt;Every other comment in these threads is some variation of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Why would I need cloud hosting? I'll just run OpenClaw on my home server."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And for a Saturday afternoon project? Sure. But the moment you want agents running 24/7 — doing real work, handling real data, costing real money in API calls — you run into the same infrastructure problems that every production service hits:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Uptime is not optional&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your Mac Mini reboots for updates. Your ISP has an outage. Your cat steps on the power strip. Meanwhile, your agent was mid-task on a 45-minute code refactor and just lost all its context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Supervision is hard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What happens when an agent enters an infinite loop and burns through $200 in API tokens in 30 minutes? (This happened to someone just this week — I saw it on X.) You need something watching the watcher. Circuit breakers, spend caps, automatic restarts with state recovery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Multi-agent coordination requires real infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Running one agent is a process. Running five agents that talk to each other, share memory, and coordinate tasks? That's a distributed system. You need process isolation, message passing, shared state management, and failure recovery. On your laptop, one crashed agent takes down the others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Memory management is a bottleneck&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents have limited context windows. When they compact, they lose information. Persistent memory across sessions — what the agent learned yesterday affecting what it does today — requires a proper storage layer, not a JSON file on disk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Cost observability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you're burning $0.02/request across 500 requests/day across 3 agents, you need dashboards. You need per-agent cost tracking. You need alerts before you get a $400 surprise on your Anthropic bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I'm building cloud hosting for Claws
&lt;/h2&gt;

&lt;p&gt;I run 5 Elixir apps on Fly.io for under €50/month. The reason the economics work is the BEAM virtual machine — Erlang's runtime, which Elixir runs on.&lt;/p&gt;

&lt;p&gt;OTP supervision trees are basically purpose-built for the exact problems Claws have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Each agent gets its own supervised process&lt;/span&gt;
&lt;span class="n"&gt;children&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;AgentSupervisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;agent_id:&lt;/span&gt; &lt;span class="s2"&gt;"code-reviewer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model:&lt;/span&gt; &lt;span class="ss"&gt;:claude_sonnet&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;AgentSupervisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;agent_id:&lt;/span&gt; &lt;span class="s2"&gt;"deploy-bot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model:&lt;/span&gt; &lt;span class="ss"&gt;:claude_haiku&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;persistence:&lt;/span&gt; &lt;span class="ss"&gt;:postgres&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;SpendTracker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;budget_cents:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;alert_at:&lt;/span&gt; &lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="no"&gt;Supervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;strategy:&lt;/span&gt; &lt;span class="ss"&gt;:one_for_one&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If an agent crashes, the supervisor restarts it. If a supervisor crashes, its parent restarts &lt;em&gt;it&lt;/em&gt;. State is recovered from persistent storage. No systemd, no cron hacks, no Docker restart policies — just the runtime doing what it was designed for 30 years ago.&lt;/p&gt;

&lt;p&gt;This is what I'm building with OpenClawCloud. Multi-tenant Claw hosting on BEAM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process-per-agent isolation&lt;/strong&gt; — one tenant's runaway agent can't affect another's&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic supervision&lt;/strong&gt; — crash recovery with state persistence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spend tracking&lt;/strong&gt; — per-agent API cost monitoring with configurable caps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway Protocol support&lt;/strong&gt; — agents discover and use external tools through the standard protocol&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent memory&lt;/strong&gt; — agent context survives restarts, compactions, even deployments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's early. Really early. I have one paying subscriber and a handful of free users. But the timing feels right — people are building Claws faster than the infrastructure to run them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The self-hosting vs. managed hosting tradeoff
&lt;/h2&gt;

&lt;p&gt;I'm not going to pretend self-hosting is wrong. Karpathy's point about NanoClaw being ~4,000 lines of auditable code is a genuine trust argument. You can see exactly what it does. That matters.&lt;/p&gt;

&lt;p&gt;But it's the same tradeoff web developers made 15 years ago. You &lt;em&gt;can&lt;/em&gt; run your Rails app on a VPS you manage yourself. You &lt;em&gt;can&lt;/em&gt; handle your own backups, SSL certs, log aggregation, and 3 AM pages. Most people eventually decide they'd rather pay someone to handle that so they can focus on what the agent actually &lt;em&gt;does&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The infrastructure layer for Claws is going to be commoditized within a year. The question is whether you want to build it yourself or use something that already exists.&lt;/p&gt;

&lt;p&gt;I'm biased, obviously. But even if you never use OpenClawCloud: if you're running agents in production, please don't run them on your Mac Mini. At minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up spend caps on your API provider&lt;/li&gt;
&lt;li&gt;Use process supervision (systemd at minimum, OTP if you're lucky enough to be on BEAM)&lt;/li&gt;
&lt;li&gt;Persist agent state externally (not in-memory, not local JSON)&lt;/li&gt;
&lt;li&gt;Monitor costs per agent, not just in aggregate&lt;/li&gt;
&lt;li&gt;Have a kill switch&lt;/li&gt;
&lt;/ul&gt;
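
&lt;p&gt;To make the spend-cap and kill-switch items concrete, here's a minimal Elixir sketch (names are illustrative, not OpenClawCloud's actual API): a process that tracks cost per agent and rejects further calls once the budget is gone:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule SpendCap do
  use GenServer

  def start_link(budget_cents),
    do: GenServer.start_link(__MODULE__, budget_cents, name: __MODULE__)

  # Call this before every LLM API request; refuse to proceed on :error.
  def charge(agent_id, cents), do: GenServer.call(__MODULE__, {:charge, agent_id, cents})

  @impl true
  def init(budget), do: {:ok, %{budget: budget, spent: %{}}}

  @impl true
  def handle_call({:charge, agent, cents}, _from, state) do
    spent = Map.update(state.spent, agent, cents, &amp;(&amp;1 + cents))
    total = spent |&gt; Map.values() |&gt; Enum.sum()

    if total &gt; state.budget do
      # Kill switch: the request is rejected, per-agent totals stay intact.
      {:reply, {:error, :budget_exhausted}, state}
    else
      {:reply, :ok, %{state | spent: spent}}
    end
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A production version would persist the counters and alert before the cap is hit, but the charge-then-reject shape is the whole idea.&lt;/p&gt;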

&lt;p&gt;The Claws wave is real. The infrastructure to support it is still being built. That's the part I find interesting.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm João, a solo developer from Braga, Portugal. I build SaaS products with Elixir and ship them on Fly.io. OpenClawCloud is at &lt;a href="https://clawdcloud.net" rel="noopener noreferrer"&gt;clawdcloud.net&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>cloud</category>
      <category>opensource</category>
    </item>
    <item>
      <title>What Are "Claws"? And Why You Shouldn't Run Them on Your Mac Mini</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Sat, 21 Feb 2026 12:05:18 +0000</pubDate>
      <link>https://dev.to/setas/what-are-claws-and-why-you-shouldnt-run-them-on-your-mac-mini-4o1b</link>
      <guid>https://dev.to/setas/what-are-claws-and-why-you-shouldnt-run-them-on-your-mac-mini-4o1b</guid>
      <description>&lt;h1&gt;
  
  
  What Are "Claws"? And Why You Shouldn't Run Them on Your Mac Mini
&lt;/h1&gt;

&lt;p&gt;Andrej Karpathy just posted a mini-essay about buying a Mac Mini to tinker with what he calls "Claws" — persistent AI agent systems that sit on top of LLMs. He names OpenClaw, NanoClaw, zeroclaw, ironclaw, picoclaw. &lt;a href="https://simonwillison.net/2026/Feb/21/claws/" rel="noopener noreferrer"&gt;Simon Willison calls&lt;/a&gt; "Claw" a term of art for the entire category.&lt;/p&gt;

&lt;p&gt;When Karpathy names something, it sticks. He coined "vibe coding." This is the same energy.&lt;/p&gt;

&lt;p&gt;Here's his definition:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls and a kind of persistence to a next level."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I've been building managed infrastructure for exactly this category. Let me break down what Claws are, why running them on a Mac Mini has real tradeoffs, and what the alternative looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes a Claw Different from an Agent?
&lt;/h2&gt;

&lt;p&gt;Regular LLM agents run, do a thing, and stop. You prompt them, they respond, maybe they call a tool, done.&lt;/p&gt;

&lt;p&gt;Claws are persistent. They:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run continuously on hardware or a server&lt;/li&gt;
&lt;li&gt;Have their own scheduling — they do things without you asking&lt;/li&gt;
&lt;li&gt;Maintain context across sessions and conversations&lt;/li&gt;
&lt;li&gt;Communicate via messaging protocols (MCP, etc.)&lt;/li&gt;
&lt;li&gt;Orchestrate multiple agents with tool access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as the difference between running a script and running a service. A Claw is a service. It's always on, always watching, always ready to act.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mac Mini Angle
&lt;/h2&gt;

&lt;p&gt;Karpathy bought a Mac Mini specifically to run Claws. The Apple Store told him they're "selling like hotcakes and everyone is confused." Makes sense — decent hardware, small form factor, runs 24/7 at home.&lt;/p&gt;

&lt;p&gt;But here's where I have some thoughts, as someone who's been running persistent Elixir services for a while now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Self-Hosting Pain List
&lt;/h2&gt;

&lt;p&gt;I love self-hosting. I really do. But running a persistent AI agent system on a box under your desk means you're now responsible for:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Uptime.&lt;/strong&gt; Your Claw goes down when your power goes out, when your ISP hiccups, when macOS decides it needs to update at 3am. The whole point of a Claw is that it's always on. "Always on except when it isn't" is a rough spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Networking.&lt;/strong&gt; Your Claw needs to talk to the internet — receive webhooks, call APIs, expose endpoints. That means port forwarding, dynamic DNS, TLS certificates, and hoping your router cooperates. Your ISP probably gives you a dynamic IP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security.&lt;/strong&gt; You're running an AI agent with tool access on your home network. It can execute code, make API calls, access file systems. One misconfigured permission and your Claw can see everything on your LAN.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Updates and maintenance.&lt;/strong&gt; The Claw ecosystem is evolving fast. OpenClaw pushes updates regularly. You need to manage versions, handle breaking changes, keep dependencies current. On a personal Mac Mini, that's manual work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Process supervision.&lt;/strong&gt; What happens when a Claw process crashes? On a Mac Mini, it just... dies. You need to build your own restart logic, health checks, and monitoring. This is a solved problem in production infrastructure, but not on your desktop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling.&lt;/strong&gt; Today you run one Claw. Tomorrow you want three. Next month you want one per project. A Mac Mini has finite resources and no way to scale horizontally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Managed Hosting Makes Sense
&lt;/h2&gt;

&lt;p&gt;I'm building &lt;a href="https://clawcloud.net" rel="noopener noreferrer"&gt;OpenClawCloud&lt;/a&gt; because I think this pain list is going to hit most people who try to run Claws seriously.&lt;/p&gt;

&lt;p&gt;The architecture is built on Elixir and runs on Fly.io. Here's why that matters for Claws specifically:&lt;/p&gt;

&lt;h3&gt;
  
  
  Supervision Trees
&lt;/h3&gt;

&lt;p&gt;Elixir's OTP supervision is designed exactly for this — long-running processes that need to stay alive. If a Claw process crashes, the supervisor restarts it automatically. No cron jobs, no systemd hacking, no Docker restart policies. It's built into the runtime.&lt;/p&gt;
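
&lt;p&gt;A minimal, self-contained example (&lt;code&gt;ClawRunner&lt;/code&gt; is a placeholder for whatever module actually runs the Claw):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule ClawRunner do
  use GenServer
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  @impl true
  def init(opts), do: {:ok, opts}
end

# If the runner crashes, its supervisor restarts it. No external tooling.
{:ok, _sup} =
  Supervisor.start_link([{ClawRunner, []}],
    strategy: :one_for_one,
    max_restarts: 5,    # tolerate up to 5 crashes...
    max_seconds: 30     # ...within any 30-second window, then escalate
  )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;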

&lt;h3&gt;
  
  
  Process Isolation
&lt;/h3&gt;

&lt;p&gt;Each tenant's Claw runs in its own isolated process. One Claw crashing doesn't take down another. The BEAM VM was literally built for this — telecom-grade reliability for concurrent, independent processes. Ericsson designed it in the '80s to keep phone switches running. Turns out that's exactly what persistent AI agents need too.&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in Scheduling
&lt;/h3&gt;

&lt;p&gt;Claws need to do things on their own schedule. Elixir has &lt;code&gt;Process.send_after&lt;/code&gt;, &lt;code&gt;GenServer&lt;/code&gt; timers, and libraries like Oban for persistent job scheduling. No external cron needed. The agent's scheduler lives in the same runtime as the agent itself.&lt;/p&gt;
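
&lt;p&gt;A sketch of that pattern (the module name is illustrative): a &lt;code&gt;GenServer&lt;/code&gt; that re-arms its own timer with &lt;code&gt;Process.send_after/3&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule TickAgent do
  use GenServer

  @interval :timer.minutes(5)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    # Arm the first timer; the process wakes itself up from here on.
    Process.send_after(self(), :tick, @interval)
    {:ok, %{}}
  end

  @impl true
  def handle_info(:tick, state) do
    # Do the periodic work here, then re-arm the timer.
    Process.send_after(self(), :tick, @interval)
    {:noreply, state}
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For schedules that must survive restarts, a persistent job library like Oban takes over from there.&lt;/p&gt;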

&lt;h3&gt;
  
  
  Economics That Work
&lt;/h3&gt;

&lt;p&gt;I run 5 Elixir apps on Fly.io for under €50/month total. The infrastructure is efficient enough that hosting multiple Claws per machine is practical without burning through a credit card.&lt;/p&gt;

&lt;h2&gt;
  
  
  The State of the Ecosystem
&lt;/h2&gt;

&lt;p&gt;Karpathy mentions several projects, each taking a different approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt; — the full-featured option, though Karpathy himself admits he's "a bit sus'd" about running it directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NanoClaw&lt;/strong&gt; — ~4,000 lines of core code. Karpathy likes that it "fits into both my head and that of AI agents" — auditable and minimal, runs in containers by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;zeroclaw, ironclaw, picoclaw&lt;/strong&gt; — variations on the theme with different tradeoffs around size, security, and features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ecosystem hasn't consolidated yet. But the pattern is clear: people want persistent, tool-enabled AI agent systems that run autonomously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Is Going
&lt;/h2&gt;

&lt;p&gt;Karpathy naming this category matters. "Vibe coding" went from a tweet to a conference talk title in weeks. "Claws" as a term of art is going to follow the same trajectory. Simon Willison is already using it. It even comes with an established emoji: 🦞&lt;/p&gt;

&lt;p&gt;The interesting question isn't whether Claws are real — they obviously are. It's whether the infrastructure catches up. Right now, the default path is "buy hardware and figure it out yourself." That works for tinkering. For production use — agents managing your calendar, monitoring your infrastructure, handling customer requests — you need something more robust.&lt;/p&gt;

&lt;p&gt;That's the gap I'm building &lt;a href="https://clawcloud.net" rel="noopener noreferrer"&gt;OpenClawCloud&lt;/a&gt; to fill. You bring your Claw config, I handle deployment, uptime, and process supervision. No Mac Mini required.&lt;/p&gt;

&lt;p&gt;I'm a solo founder building this in Elixir from Braga, Portugal. It's early days, but the foundation is solid — and today, thanks to Karpathy, the category has a name.&lt;/p&gt;

&lt;p&gt;If you're running Claws on a Mac Mini and loving it? Respect. That's how all good infrastructure starts — someone tinkering at home until they need more.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm &lt;a href="https://x.com/joaosetas" rel="noopener noreferrer"&gt;@joaosetas&lt;/a&gt; on X. Building OpenClawCloud and other Elixir SaaS products in public.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I Run 5 Elixir Apps on Fly.io for Under €50/Month — Here's the Breakdown</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Wed, 18 Feb 2026 07:11:01 +0000</pubDate>
      <link>https://dev.to/setas/i-run-5-elixir-apps-on-flyio-for-under-eu50month-heres-the-breakdown-3dl2</link>
      <guid>https://dev.to/setas/i-run-5-elixir-apps-on-flyio-for-under-eu50month-heres-the-breakdown-3dl2</guid>
      <description>&lt;p&gt;I'm a solo founder running 5 SaaS products. All Elixir/Phoenix. All on Fly.io. Total infrastructure cost: under €50/month.&lt;/p&gt;

&lt;p&gt;Here's exactly how.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Products
&lt;/h2&gt;

&lt;p&gt;I'm building these under AIFirst, my solo software company based in Portugal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SondMe&lt;/strong&gt; — survey platform (Phoenix + LiveView)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Countermark&lt;/strong&gt; — bot detection without CAPTCHAs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClawCloud&lt;/strong&gt; — managed hosting for AI agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vertate&lt;/strong&gt; — verification platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-Inbox&lt;/strong&gt; — AI agent communication interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All built with the same stack: Elixir, Phoenix, LiveView, PostgreSQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Fly.io
&lt;/h2&gt;

&lt;p&gt;I evaluated several hosting options before settling on Fly.io. Here's why it won:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Pay for what you run
&lt;/h3&gt;

&lt;p&gt;No "instance hours" padding. I pay per machine, per second of uptime.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Auto-stop is a game-changer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[http_service]&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My low-traffic apps sleep when nobody's using them. They wake up on the first request — cold start is about 2 seconds for a Phoenix app. Staging environments cost literally nothing.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Elixir is a first-class citizen
&lt;/h3&gt;

&lt;p&gt;Fly.io has official Elixir support. &lt;code&gt;fly launch&lt;/code&gt; detects Phoenix, generates a working &lt;code&gt;Dockerfile&lt;/code&gt;, and handles releases. Libraries like &lt;code&gt;dns_cluster&lt;/code&gt; for distributed Erlang just work out of the box.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Multi-region with zero re-architecture
&lt;/h3&gt;

&lt;p&gt;Start with one region (Amsterdam for me — closest to Portugal). Need US coverage later? One &lt;code&gt;fly scale&lt;/code&gt; command. No infrastructure redesign.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Typical fly.toml
&lt;/h2&gt;

&lt;p&gt;Here's the config I use across most of my apps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"my-app"&lt;/span&gt;
&lt;span class="py"&gt;primary_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ams"&lt;/span&gt;

&lt;span class="nn"&gt;[build]&lt;/span&gt;

&lt;span class="nn"&gt;[env]&lt;/span&gt;
  &lt;span class="py"&gt;PHX_HOST&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"my-app.fly.dev"&lt;/span&gt;
  &lt;span class="py"&gt;PORT&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"8080"&lt;/span&gt;

&lt;span class="nn"&gt;[http_service]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;
  &lt;span class="py"&gt;force_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="nn"&gt;[[vm]]&lt;/span&gt;
  &lt;span class="py"&gt;size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"shared-cpu-1x"&lt;/span&gt;
  &lt;span class="py"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"256mb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;256MB RAM&lt;/strong&gt; — Phoenix is incredibly memory-efficient. A full app with LiveView, PubSub, and Oban (background jobs) runs comfortably in 256MB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;shared-cpu-1x&lt;/strong&gt; — perfect for early-stage apps. Shared CPU is cheap and Phoenix doesn't need much compute for typical web workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;min_machines_running = 1&lt;/strong&gt; for production, &lt;strong&gt;0&lt;/strong&gt; for staging.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cost Breakdown
&lt;/h2&gt;

&lt;p&gt;Here's the actual math:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;shared-cpu-1x machines (256MB)&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;~€32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fly Postgres (single node)&lt;/td&gt;
&lt;td&gt;3 clusters&lt;/td&gt;
&lt;td&gt;~€10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bandwidth&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSL certificates&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Free (auto)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~€42&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Some apps share a Postgres cluster. I started with separate databases for everything, then consolidated. Three clusters handle all five apps comfortably.&lt;/p&gt;

&lt;h3&gt;
  
  
  For comparison
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Heroku&lt;/strong&gt;: 5 apps × $7 Eco dyno = $35, plus databases = $50+ — and Eco dynos sleep unpredictably&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Railway&lt;/strong&gt;: Similar compute pricing, less mature Elixir support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS ECS&lt;/strong&gt;: Minimum viable setup easily $80+/month after load balancers, NAT gateways, and CloudWatch&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Makes Elixir Special Here
&lt;/h2&gt;

&lt;p&gt;This setup works because the BEAM VM is unusually efficient for web applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory&lt;/strong&gt;: A Phoenix app with LiveView, background jobs, and real-time PubSub runs in 256MB. A comparable Node.js/Express setup with Socket.io and Bull queues needs 512MB+ to breathe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrency&lt;/strong&gt;: Each request gets a lightweight BEAM process (~2KB of memory). I can handle thousands of concurrent WebSocket connections on a single 256MB machine.&lt;/p&gt;
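
&lt;p&gt;You can verify the footprint yourself from an IEx shell. A quick sketch (exact numbers vary by OTP version and architecture):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# Spawn an idle process and ask the VM how much memory it holds
pid = spawn(fn -&amp;gt; Process.sleep(:infinity) end)
Process.info(pid, :memory)
# returns {:memory, bytes} where bytes is typically a few KB,
# which is why thousands of processes fit in one 256MB machine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;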

&lt;p&gt;&lt;strong&gt;Resilience&lt;/strong&gt;: OTP supervision trees restart crashed processes in milliseconds. No health check polling. No container-level restarts. If a GenServer handling webhook processing dies, its supervisor brings it back before anyone notices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Share databases early
&lt;/h3&gt;

&lt;p&gt;I wasted money running separate Postgres instances for every app. Most early-stage apps can share a database cluster with schema-level isolation. This cut my database costs by more than half.&lt;/p&gt;
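
&lt;p&gt;With Postgres, schema-level isolation is one &lt;code&gt;CREATE SCHEMA&lt;/code&gt; per app, and Ecto can target a schema through the &lt;code&gt;:prefix&lt;/code&gt; option. A minimal sketch (the schema name and &lt;code&gt;User&lt;/code&gt; struct are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# One-time setup: a dedicated schema for each app in the shared cluster
Repo.query!("CREATE SCHEMA IF NOT EXISTS app_one")

# Repo operations accept :prefix to point at that schema
Repo.all(User, prefix: "app_one")
Repo.insert!(%User{email: "hello@example.com"}, prefix: "app_one")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;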

&lt;h3&gt;
  
  
  2. Use auto-stop from day one
&lt;/h3&gt;

&lt;p&gt;I ran machines 24/7 for a while before enabling auto-stop. For apps with fewer than 100 daily active users, there's no good reason to keep machines hot at night.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Set up internal networking
&lt;/h3&gt;

&lt;p&gt;Fly gives you a private WireGuard network between all your apps for free. I use it for internal API calls between services — no public internet, no extra latency, no auth overhead for service-to-service communication.&lt;/p&gt;
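
&lt;p&gt;In Elixir, calling another app over the private network is just an HTTP request to its &lt;code&gt;.internal&lt;/code&gt; hostname. A sketch assuming the Req client (the app name is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# "billing-api" resolves only inside Fly's private IPv6 network.
# Depending on your HTTP client, you may need to enable IPv6
# in its transport options.
Req.get!("http://billing-api.internal:8080/api/status")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;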

&lt;h3&gt;
  
  
  4. Monitor memory, not CPU
&lt;/h3&gt;

&lt;p&gt;For BEAM apps, memory is the constraining resource, not CPU. Set up alerts for when an app consistently uses &amp;gt;200MB of its 256MB allocation. That's your signal to bump to 512MB.&lt;/p&gt;
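
&lt;p&gt;When an alert fires, it helps to know which processes are eating the allocation. A sketch you can run from a remote IEx session:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# Total BEAM memory in MB
div(:erlang.memory(:total), 1024 * 1024)

# Top five processes by memory, to spot a leaking GenServer
Process.list()
|&amp;gt; Enum.map(fn pid -&amp;gt; {pid, Process.info(pid, :memory)} end)
|&amp;gt; Enum.reject(fn {_pid, info} -&amp;gt; is_nil(info) end)
|&amp;gt; Enum.sort_by(fn {_pid, {:memory, bytes}} -&amp;gt; -bytes end)
|&amp;gt; Enum.take(5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;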

&lt;h2&gt;
  
  
  The Solo Founder Advantage
&lt;/h2&gt;

&lt;p&gt;The real point isn't the €42. It's the flexibility.&lt;/p&gt;

&lt;p&gt;When I want to test a new product idea, I spin up a new app on Fly.io. Added cost: about €8/month. If the idea doesn't work after a month, I &lt;code&gt;fly apps destroy&lt;/code&gt; it and my bill drops right back down.&lt;/p&gt;

&lt;p&gt;There's no minimum infrastructure investment keeping bad ideas alive. No long-term cloud commitments. No Kubernetes cluster that costs the same whether I run 1 app or 10.&lt;/p&gt;

&lt;p&gt;Elixir + Fly.io lets infrastructure costs scale linearly — and down just as easily as up. For a solo founder bootstrapping multiple products with zero funding, that's everything.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm João, a solo developer building SaaS products from Portugal. I write about Elixir, infrastructure, and the solo founder journey. Follow for more.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>devops</category>
      <category>saas</category>
      <category>startup</category>
    </item>
    <item>
      <title>Why I Chose Elixir Over Go and Rust for My Cloud Platform</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Mon, 16 Feb 2026 14:24:50 +0000</pubDate>
      <link>https://dev.to/setas/why-i-chose-elixir-over-go-and-rust-for-my-cloud-platform-2058</link>
      <guid>https://dev.to/setas/why-i-chose-elixir-over-go-and-rust-for-my-cloud-platform-2058</guid>
      <description>&lt;p&gt;I'm building &lt;a href="https://clawdcloud.net" rel="noopener noreferrer"&gt;OpenClaw Cloud&lt;/a&gt;, a managed platform where each user gets their own personal AI assistant running 24/7 in the cloud. When I started, I had a real decision to make about the core technology.&lt;/p&gt;

&lt;p&gt;Go and Rust were serious contenders. Both are fast, well-supported, and have massive ecosystems. I ended up choosing Elixir — not because it's "better" in some absolute sense, but because it was the right fit for this specific problem. Here's the full reasoning, with honest tradeoffs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Managing Hundreds of Long-Lived Processes
&lt;/h2&gt;

&lt;p&gt;OpenClaw Cloud manages one dedicated bot instance per user. Each instance is a long-running process that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintains persistent WebSocket connections to chat platforms (Discord, Telegram, WhatsApp, Slack)&lt;/li&gt;
&lt;li&gt;Holds conversation state and context in memory&lt;/li&gt;
&lt;li&gt;Handles concurrent messages from multiple channels simultaneously&lt;/li&gt;
&lt;li&gt;Needs to be started, stopped, restarted, and monitored independently&lt;/li&gt;
&lt;li&gt;Must recover gracefully from crashes without affecting other users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't a typical request/response web app. It's a &lt;strong&gt;process orchestration&lt;/strong&gt; problem — hundreds of stateful, concurrent, long-lived workers that need supervision and lifecycle management.&lt;/p&gt;

&lt;p&gt;That framing is what drove the decision.&lt;/p&gt;




&lt;h2&gt;
  
  
  Concurrency Models: Three Very Different Approaches
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Go: Goroutines and Channels
&lt;/h3&gt;

&lt;p&gt;Go's concurrency model is elegant. Goroutines are cheap (a few KB of stack), and channels provide a clean way to communicate between them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Spinning up a worker per user in Go&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;bot&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewBotInstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// blocks, handles reconnection internally&lt;/span&gt;
    &lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is simple and works. Go would have been a perfectly fine choice for the raw concurrency part. The goroutine scheduler handles thousands of concurrent workers without breaking a sweat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it gets complicated&lt;/strong&gt;: Go doesn't have a built-in answer for &lt;em&gt;what happens when a goroutine crashes&lt;/em&gt;. You need to build your own supervisor logic — retry loops, health checks, graceful restarts. It's doable, but it's DIY. Every team ends up writing a slightly different version of process supervision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rust: Async with Tokio
&lt;/h3&gt;

&lt;p&gt;Rust with Tokio gives you async/await over a multi-threaded runtime. The performance is outstanding — near-zero overhead async I/O.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Spawning tasks with Tokio&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;bot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;BotInstance&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="nf"&gt;.run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// handles connections&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rust's async model is powerful, and you get memory safety guarantees at compile time. But the ownership model adds real friction when you're managing shared state across many concurrent tasks. You end up with &lt;code&gt;Arc&amp;lt;Mutex&amp;lt;T&amp;gt;&amp;gt;&lt;/code&gt; everywhere, lifetime annotations, and the borrow checker fighting you every time you pass context between tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest truth&lt;/strong&gt;: For a solo developer iterating fast on a product, Rust's compile-time overhead (both in build times and cognitive load) is significant. I love Rust for systems programming, but for a SaaS product where features change weekly, it slowed me down.&lt;/p&gt;

&lt;h3&gt;
  
  
  Elixir: Processes and OTP
&lt;/h3&gt;

&lt;p&gt;Elixir runs on the BEAM virtual machine, which was designed from the ground up for this exact problem — massive concurrency with isolated, lightweight processes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Each bot is a GenServer — a managed, supervised process&lt;/span&gt;
&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;InstanceWorker&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;GenServer&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(%{&lt;/span&gt;&lt;span class="ss"&gt;user_id:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;GenServer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;name:&lt;/span&gt; &lt;span class="n"&gt;via_tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="c1"&gt;# Connect to chat platforms, set up state&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;user_id:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;connections:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="ss"&gt;status:&lt;/span&gt; &lt;span class="ss"&gt;:starting&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;handle_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:health_check&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="c1"&gt;# Periodic self-check — reconnect if needed&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:noreply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maybe_reconnect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BEAM processes are &lt;em&gt;extremely&lt;/em&gt; lightweight (~2KB each), fully isolated (no shared memory), and communicate via message passing. But the key differentiator isn't just the process model — it's everything built on top of it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Supervision Trees: The Killer Feature
&lt;/h2&gt;

&lt;p&gt;This is where Elixir pulled decisively ahead for my use case.&lt;/p&gt;

&lt;p&gt;In OTP, every process lives inside a &lt;strong&gt;supervision tree&lt;/strong&gt;. Supervisors are processes that watch child processes and apply a restart strategy when things go wrong.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;InstanceSupervisor&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Horde&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;DynamicSupervisor&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;start_instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;child_spec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;InstanceWorker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;user_id:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;config:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="no"&gt;Horde&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;DynamicSupervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_child&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;child_spec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;stop_instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="no"&gt;Registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;InstanceRegistry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Horde&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;DynamicSupervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;terminate_child&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:not_found&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a bot instance crashes — maybe Discord's API returns an unexpected response, or a chat message triggers an unhandled edge case — the supervisor restarts it automatically. The other 200 bot instances running on the same node are completely unaffected because processes share no memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In Go&lt;/strong&gt;, I'd have to build all of this manually: a registry of running goroutines, health check loops, restart logic, graceful shutdown coordination. It's probably 1,000+ lines of infrastructure code that Elixir gives me for free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In Rust&lt;/strong&gt;, the situation is similar. Tokio has &lt;code&gt;JoinHandle&lt;/code&gt; for tracking spawned tasks, but building a full supervision tree with restart strategies, escalation policies, and distributed process registries is a major engineering effort.&lt;/p&gt;

&lt;p&gt;The OTP supervision model isn't just convenient — it changes how you think about failure. Instead of defensive programming ("catch every possible error"), you write the happy path and let the supervisor handle the rest. &lt;strong&gt;Let it crash&lt;/strong&gt; is a real philosophy, and it works remarkably well for managing many independent, failure-prone processes.&lt;/p&gt;
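
&lt;p&gt;The restart policy itself is just a couple of options on the supervisor. A sketch with illustrative values (not the actual OpenClaw configuration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;children = [
  {Openclaw.InstanceWorker, %{user_id: 42}}
]

# :one_for_one restarts only the crashed child; if a child crashes
# more than 3 times in 5 seconds, the supervisor gives up and
# escalates the failure to its own supervisor
Supervisor.start_link(children,
  strategy: :one_for_one,
  max_restarts: 3,
  max_seconds: 5
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;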




&lt;h2&gt;
  
  
  Hot Code Reloading: Zero-Downtime Deployments
&lt;/h2&gt;

&lt;p&gt;BEAM supports hot code swapping — you can deploy new code to a running system without restarting processes or dropping connections.&lt;/p&gt;

&lt;p&gt;For a platform where users have 24/7 always-on bot instances, this is huge. When I push an update to the platform code, I don't have to restart everyone's bot. The running processes can be updated in place, maintaining their state and connections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In production, Fly.io rolling deploys + BEAM hot code loading&lt;/span&gt;
&lt;span class="c1"&gt;# means existing connections stay alive during deployments&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, I use Fly.io's rolling deployments, which handle most of this, but the BEAM's ability to maintain state across code changes is an additional safety net that neither Go nor Rust can match at the VM level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go&lt;/strong&gt; requires a full process restart for any code change. You can do rolling restarts behind a load balancer, but every goroutine's state is lost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rust&lt;/strong&gt; requires recompilation and restart. The compile step alone takes minutes for a non-trivial project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-Time UI: Phoenix LiveView
&lt;/h2&gt;

&lt;p&gt;This isn't strictly a language comparison, but the web framework was part of the decision. Phoenix LiveView lets me build real-time, interactive UIs without writing JavaScript.&lt;/p&gt;

&lt;p&gt;The OpenClaw Cloud dashboard shows each user their bot's status, logs, and controls — all updating in real-time via WebSockets. When a bot instance starts, crashes, or reconnects, the UI reflects it instantly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# LiveView receives real-time updates via PubSub&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;handle_info&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="ss"&gt;:instance_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;status:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:noreply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:instance_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Building this in Go would mean a separate frontend (React, Vue, etc.) plus a WebSocket layer plus state synchronization logic. In Rust, same story — probably even more boilerplate with something like Axum + a JS frontend.&lt;/p&gt;

&lt;p&gt;LiveView collapses the frontend and backend into one coherent model. For a solo developer, that's a 2-3x productivity multiplier.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Go and Rust Win (Honestly)
&lt;/h2&gt;

&lt;p&gt;I'd be doing a disservice if I didn't acknowledge where Go and Rust genuinely outperform Elixir:&lt;/p&gt;

&lt;h3&gt;
  
  
  Go Wins
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Raw throughput for CPU-bound work&lt;/strong&gt;: Go compiles to native code. If I were building a platform that needed heavy computation (video processing, ML inference), Go would be faster out of the box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity of deployment&lt;/strong&gt;: Single static binary. No runtime dependency. &lt;code&gt;go build &amp;amp;&amp;amp; scp&lt;/code&gt;. It doesn't get simpler than that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ecosystem breadth&lt;/strong&gt;: Go has libraries for &lt;em&gt;everything&lt;/em&gt;. Cloud SDKs, Kubernetes tooling, CLI tools — the ecosystem is massive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hiring&lt;/strong&gt;: If I were building a team, finding Go developers is much easier than finding Elixir developers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Rust Wins
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance ceiling&lt;/strong&gt;: Rust is as fast as C/C++ with memory safety. For systems-level work, nothing else comes close.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory efficiency&lt;/strong&gt;: Zero-cost abstractions and no garbage collector mean predictable, minimal memory usage. Critical for embedded systems or extremely resource-constrained environments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type system&lt;/strong&gt;: Rust's type system catches entire categories of bugs at compile time. The &lt;code&gt;Result&lt;/code&gt; and &lt;code&gt;Option&lt;/code&gt; types make error handling explicit and exhaustive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebAssembly&lt;/strong&gt;: Rust has the best WASM story. If I needed client-side compiled code, Rust would be my first choice.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Elixir's Weaknesses
&lt;/h3&gt;

&lt;p&gt;Let me be upfront about the tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Raw CPU performance&lt;/strong&gt;: The BEAM is not fast for computation. It's optimized for I/O-bound, concurrent workloads. If I had heavy number-crunching, I'd need to reach for NIFs (native functions) or offload to a separate service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller ecosystem&lt;/strong&gt;: Hex (Elixir's package manager) has ~15,000 packages vs npm's 2M+ or Go's massive standard library. Sometimes you write something from scratch that would be a &lt;code&gt;go get&lt;/code&gt; away in Go.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller talent pool&lt;/strong&gt;: Finding Elixir developers is harder. This matters less for a solo founder but would matter if I were scaling a team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning curve&lt;/strong&gt;: OTP concepts (GenServer, Supervisor, Application) are powerful but take time to internalize. The functional programming paradigm is a shift for developers coming from OOP.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Elixir Won for This Specific Problem
&lt;/h2&gt;

&lt;p&gt;The decision came down to matching the technology to the problem domain:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Best Fit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hundreds of concurrent, long-lived processes&lt;/td&gt;
&lt;td&gt;BEAM (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automatic crash recovery per process&lt;/td&gt;
&lt;td&gt;OTP Supervisors (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time UI without separate frontend&lt;/td&gt;
&lt;td&gt;Phoenix LiveView (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zero-downtime deployments&lt;/td&gt;
&lt;td&gt;BEAM hot code reloading (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distributed process registry&lt;/td&gt;
&lt;td&gt;Horde (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Solo developer productivity&lt;/td&gt;
&lt;td&gt;LiveView + OTP = less code (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw computation speed&lt;/td&gt;
&lt;td&gt;Go or Rust&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maximum ecosystem breadth&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory-constrained environments&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a managed cloud platform orchestrating hundreds of stateful, long-running AI bot instances with real-time monitoring — Elixir wasn't just a good fit, it was almost purpose-built for the job.&lt;/p&gt;

&lt;p&gt;The BEAM was originally created by Ericsson in the 1980s to manage millions of concurrent telephone calls with 99.9999999% uptime. Managing a few hundred AI bots is a much simpler version of the same problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack in Practice
&lt;/h2&gt;

&lt;p&gt;For the curious, here's what the full OpenClaw Cloud stack looks like today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Elixir 1.17 / OTP 27&lt;/strong&gt; — core language and runtime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phoenix 1.8&lt;/strong&gt; with &lt;strong&gt;LiveView 1.1&lt;/strong&gt; — web framework and real-time UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horde&lt;/strong&gt; — distributed supervisor and registry for bot instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; via Ecto — data persistence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fly.io&lt;/strong&gt; — hosting (both platform and user instances)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stripe&lt;/strong&gt; — subscription billing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS + DaisyUI&lt;/strong&gt; — styling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentry&lt;/strong&gt; — error monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bandit&lt;/strong&gt; — HTTP server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total monthly infrastructure cost for running the platform: under $50/month on Fly.io.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Technology choices are always contextual. If I were building a CLI tool, I'd pick Go. If I were building a game engine, I'd pick Rust. But for a managed platform that supervises hundreds of concurrent, stateful, long-lived processes with real-time monitoring — Elixir and the BEAM are in a class of their own.&lt;/p&gt;

&lt;p&gt;The "let it crash" philosophy, supervision trees, lightweight processes, and LiveView for real-time UI made me more productive as a solo developer than I would have been in either Go or Rust. And when your competitive advantage is shipping fast with zero budget, productivity is everything.&lt;/p&gt;

&lt;p&gt;If you're evaluating languages for a similar problem — high concurrency, stateful processes, real-time features — give Elixir a serious look. The ecosystem is smaller but the core primitives are extraordinary.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm João, a solo developer from Portugal building &lt;a href="https://clawdcloud.net" rel="noopener noreferrer"&gt;OpenClaw Cloud&lt;/a&gt; and other SaaS products with Elixir. Follow me here or on X &lt;a href="https://x.com/joaosetas" rel="noopener noreferrer"&gt;@joaosetas&lt;/a&gt; for more build-in-public content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>cloud</category>
    </item>
    <item>
      <title>I Built a Managed Cloud Platform for Personal AI Assistants with Elixir</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Thu, 12 Feb 2026 13:09:21 +0000</pubDate>
      <link>https://dev.to/setas/i-built-a-managed-cloud-platform-for-personal-ai-assistants-with-elixir-5e5j</link>
      <guid>https://dev.to/setas/i-built-a-managed-cloud-platform-for-personal-ai-assistants-with-elixir-5e5j</guid>
      <description>&lt;p&gt;I'm João, a solo developer from Portugal, and I just launched &lt;strong&gt;OpenClaw Cloud&lt;/strong&gt; — a managed hosting platform that gives you your own personal AI assistant running 24/7 in the cloud, connected to all your chat apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I've been running my own AI assistant for a while — a bot that connects to WhatsApp, Discord, Telegram, Slack, and more. It's incredibly useful: I delegate tasks, ask questions, get summaries, and it's always available.&lt;/p&gt;

&lt;p&gt;But running it required maintaining a server, handling updates, managing Docker containers, and keeping everything alive. Most people who'd benefit from a personal AI assistant don't want to deal with that.&lt;/p&gt;

&lt;h2&gt;The Solution&lt;/h2&gt;

&lt;p&gt;OpenClaw Cloud takes care of all the infrastructure. You sign up, configure your bot (name, API keys, connected apps), and we deploy a dedicated instance for you on Fly.io. It runs 24/7 — no servers to manage, no Docker to wrestle with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you get:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🦞 Your own isolated AI assistant instance&lt;/li&gt;
&lt;li&gt;💬 Connect WhatsApp, Telegram, Discord, Slack, Signal, iMessage&lt;/li&gt;
&lt;li&gt;☁️ Always on — no need to keep your computer running&lt;/li&gt;
&lt;li&gt;📱 Access and control from any device&lt;/li&gt;
&lt;li&gt;🔄 Automatic updates and maintenance&lt;/li&gt;
&lt;li&gt;🔒 Your own isolated environment — enterprise-grade security&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The Tech Stack&lt;/h2&gt;

&lt;p&gt;This is a dev.to post, so let's talk tech:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Elixir ~1.17 / OTP 27&lt;/strong&gt; with &lt;strong&gt;Phoenix ~1.8&lt;/strong&gt; and &lt;strong&gt;LiveView ~1.1&lt;/strong&gt; — the entire UI is server-rendered with real-time updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; via Ecto for data persistence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horde&lt;/strong&gt; for distributed process supervision — each bot instance is a GenServer managed by a Horde DynamicSupervisor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fly.io&lt;/strong&gt; for hosting — both the platform and user instances run on Fly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; for local development, &lt;strong&gt;Fly Machines&lt;/strong&gt; in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stripe&lt;/strong&gt; for subscriptions via their embedded pricing table&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google OAuth + email/password&lt;/strong&gt; for authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS + DaisyUI&lt;/strong&gt; for the UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bandit&lt;/strong&gt; as the HTTP server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentry&lt;/strong&gt; for error monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Architecture Highlight: Instance Management&lt;/h3&gt;

&lt;p&gt;Each user's bot is a separate process managed by a &lt;code&gt;ContainerBackend&lt;/code&gt; behaviour:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In development, instances run as Docker containers&lt;/span&gt;
&lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Instances&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;ContainerBackend&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Docker&lt;/span&gt;

&lt;span class="c1"&gt;# In production, they're Fly.io machines&lt;/span&gt;
&lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Instances&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;ContainerBackend&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Fly&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
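
&lt;p&gt;The post doesn't show the behaviour itself, so here is a rough sketch of what such a contract could look like; the callback names and types are my assumptions, not the actual OpenClaw Cloud API:&lt;/p&gt;

```elixir
# Hypothetical sketch of the backend contract. Callback names are
# illustrative; the real OpenClaw Cloud behaviour may differ.
defmodule ContainerBackend do
  @callback create(config :: map()) :: {:ok, instance_id :: String.t()} | {:error, term()}
  @callback start(instance_id :: String.t()) :: :ok | {:error, term()}
  @callback stop(instance_id :: String.t()) :: :ok | {:error, term()}
end

# A stub implementation, e.g. for tests. The Docker and Fly backends
# would implement the same callbacks against their respective APIs.
defmodule StubBackend do
  @behaviour ContainerBackend

  @impl true
  def create(_config), do: {:ok, "instance-1"}

  @impl true
  def start(_instance_id), do: :ok

  @impl true
  def stop(_instance_id), do: :ok
end
```

&lt;p&gt;The environment-specific module can then be read from application config (for example via &lt;code&gt;Application.get_env/3&lt;/code&gt;), so the rest of the code only ever talks to the behaviour.&lt;/p&gt;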



&lt;p&gt;The &lt;code&gt;InstanceWorker&lt;/code&gt; GenServer manages each bot's lifecycle — creation, start, stop, restart — and reports status back to the LiveView dashboard in real time via Horde's distributed registry.&lt;/p&gt;
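
&lt;p&gt;As a single-node illustration of that lifecycle (the real worker registers through Horde, which this sketch omits; the module and function names are my inventions, not the production code):&lt;/p&gt;

```elixir
# Toy lifecycle worker: one GenServer per bot instance, holding the
# instance id and its current status in process state.
defmodule InstanceWorker do
  use GenServer

  def start_link(id), do: GenServer.start_link(__MODULE__, id)
  def status(pid), do: GenServer.call(pid, :status)
  def stop_instance(pid), do: GenServer.call(pid, :stop_instance)

  @impl true
  def init(id), do: {:ok, %{id: id, status: :running}}

  @impl true
  def handle_call(:status, _from, state), do: {:reply, state.status, state}

  def handle_call(:stop_instance, _from, state) do
    # In production this would call the ContainerBackend and broadcast
    # the new status to the LiveView dashboard.
    {:reply, :ok, %{state | status: :stopped}}
  end
end

{:ok, pid} = InstanceWorker.start_link("bot-1")
```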

&lt;h3&gt;Why Elixir?&lt;/h3&gt;

&lt;p&gt;Managing many concurrent, long-lived bot instances is exactly the kind of problem Elixir was built for. Each instance is a lightweight process, supervised and distributed across nodes. OTP's supervision trees mean that if a worker crashes, it gets restarted automatically. And LiveView gives me real-time UI updates without writing a single line of JavaScript.&lt;/p&gt;
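
&lt;p&gt;The restart behavior is easy to demonstrate with a toy supervised process (the module is illustrative, not from the OpenClaw codebase):&lt;/p&gt;

```elixir
# A minimal "let it crash" demo: a named counter process under a
# one_for_one supervisor.
defmodule Counter do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)
  def value, do: GenServer.call(__MODULE__, :value)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

{:ok, _sup} = Supervisor.start_link([Counter], strategy: :one_for_one)

# Kill the worker; the supervisor restarts it without any handling code.
old_pid = Process.whereis(Counter)
Process.exit(old_pid, :kill)
Process.sleep(50)

# A fresh process is now registered under the same name, with its state
# reset by init/1.
new_pid = Process.whereis(Counter)
```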

&lt;h2&gt;Current Status&lt;/h2&gt;

&lt;p&gt;OpenClaw Cloud is in &lt;strong&gt;early access&lt;/strong&gt; with a Hobby plan available. The onboarding flow walks you through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creating your account&lt;/li&gt;
&lt;li&gt;Setting up your Discord bot token and API keys&lt;/li&gt;
&lt;li&gt;Configuring intents and behavior&lt;/li&gt;
&lt;li&gt;Deploying your instance&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's built on top of &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;, an open-source personal AI assistant.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;More chat platform integrations&lt;/li&gt;
&lt;li&gt;Voice features (TTS via ElevenLabs/OpenAI)&lt;/li&gt;
&lt;li&gt;Memory and context persistence&lt;/li&gt;
&lt;li&gt;Custom personality configurations&lt;/li&gt;
&lt;li&gt;Usage analytics dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try It Out&lt;/h2&gt;

&lt;p&gt;Early access is live at &lt;a href="https://clawdcloud.net" rel="noopener noreferrer"&gt;clawdcloud.net&lt;/a&gt;. I'd love feedback from the dev community — what features would make this useful for you?&lt;/p&gt;

&lt;p&gt;If you're interested in the Elixir/Phoenix architecture, happy to go deeper on any part of the stack in future posts.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built solo from Braga, Portugal 🇵🇹&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
