<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: João Pedro Silva Setas</title>
    <description>The latest articles on DEV Community by João Pedro Silva Setas (@setas).</description>
    <link>https://dev.to/setas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3766706%2F44b843d1-96fc-4ba8-8820-fc213f0c0030.jpg</url>
      <title>DEV Community: João Pedro Silva Setas</title>
      <link>https://dev.to/setas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/setas"/>
    <language>en</language>
    <item>
      <title>What Happens When AI Agents Hallucinate? The boring part is the checkpoint.</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Wed, 08 Apr 2026 15:30:03 +0000</pubDate>
      <link>https://dev.to/setas/what-happens-when-ai-agents-hallucinate-the-boring-part-is-the-checkpoint-21eb</link>
      <guid>https://dev.to/setas/what-happens-when-ai-agents-hallucinate-the-boring-part-is-the-checkpoint-21eb</guid>
      <description>&lt;p&gt;Most agent-demo discourse treats hallucination like a model problem.&lt;/p&gt;

&lt;p&gt;Wrong answer in, wrong answer out.&lt;/p&gt;

&lt;p&gt;The worse failure in practice is simpler.&lt;/p&gt;

&lt;p&gt;A confident wrong output turns into company truth.&lt;/p&gt;

&lt;p&gt;Then it is no longer "a bad generation."&lt;/p&gt;

&lt;p&gt;It is copy. A metric. A product claim. A technical explanation. A decision someone is about to act on.&lt;/p&gt;

&lt;p&gt;I run a solo company with AI agent departments inside GitHub Copilot. The useful question for me is not how to eliminate hallucinations. I do not think that is realistic.&lt;/p&gt;

&lt;p&gt;The useful question is this:&lt;/p&gt;

&lt;p&gt;What stops wrong output from hardening into something real?&lt;/p&gt;

&lt;p&gt;The answer is boring.&lt;/p&gt;

&lt;p&gt;Review checkpoints. Memory discipline. Narrow rules about what an agent is allowed to assert without verification.&lt;/p&gt;

&lt;p&gt;That turned out to matter more than another clever prompt.&lt;/p&gt;

&lt;h2&gt;Hallucination gets more dangerous as the output gets closer to action&lt;/h2&gt;

&lt;p&gt;An agent drafting a rough idea is fine.&lt;/p&gt;

&lt;p&gt;An agent confidently restating a stale revenue number, inventing a product capability, or describing system internals it never checked is not.&lt;/p&gt;

&lt;p&gt;In my setup, I treat "hallucination" broadly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a product claim that outruns the actual build&lt;/li&gt;
&lt;li&gt;a stale business fact repeated as if it were current&lt;/li&gt;
&lt;li&gt;a plausible technical explanation that was never checked against the real system&lt;/li&gt;
&lt;li&gt;a compliance or trust statement that sounds right but was never reviewed by the right specialist&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That definition matters because bad output is not only about models inventing weird facts.&lt;/p&gt;

&lt;p&gt;It is about confident language outrunning verification.&lt;/p&gt;

&lt;h2&gt;1. Product claims need a checkpoint&lt;/h2&gt;

&lt;p&gt;The cleanest example right now is OpenClawCloud.&lt;/p&gt;

&lt;p&gt;The direction I care about is governed execution: vendor independence, bounded runs, review checkpoints, and failure containment.&lt;/p&gt;

&lt;p&gt;That is the thesis.&lt;/p&gt;

&lt;p&gt;But the repo rule is explicit: wording around sandboxing, approval gates, audit trails, credential isolation, and secure-by-default behavior stays THESIS or ROADMAP level until the build work proves it live.&lt;/p&gt;

&lt;p&gt;That sounds pedantic until you see the alternative.&lt;/p&gt;

&lt;p&gt;A draft can slide from "this is where the product is going" to "this is what the product does today" in one paragraph.&lt;/p&gt;

&lt;p&gt;Same idea.&lt;/p&gt;

&lt;p&gt;Very different claim.&lt;/p&gt;

&lt;p&gt;So when a draft touches trust, compliance, security, or policy, I route it through an internal legal/compliance review step before publication.&lt;/p&gt;

&lt;p&gt;The point is not to make the copy sound safer.&lt;/p&gt;

&lt;p&gt;The point is to stop the draft from inventing a product I have not shipped.&lt;/p&gt;

&lt;h2&gt;2. Stale facts need a checkpoint too&lt;/h2&gt;

&lt;p&gt;Some hallucinations are not fabricated out of thin air.&lt;/p&gt;

&lt;p&gt;They are old truths repeated at the wrong time.&lt;/p&gt;

&lt;p&gt;That is why I use memory-first checks for time-sensitive business facts.&lt;/p&gt;

&lt;p&gt;Revenue figures.&lt;/p&gt;

&lt;p&gt;Compliance status.&lt;/p&gt;

&lt;p&gt;Deal terms.&lt;/p&gt;

&lt;p&gt;Anything where "technically true last week" can become wrong enough to mislead today.&lt;/p&gt;

&lt;p&gt;The rule is not "trust memory blindly."&lt;/p&gt;

&lt;p&gt;The rule is "look it up before you restate it."&lt;/p&gt;

&lt;p&gt;That reduces a very common failure mode in agent systems: stale state getting repeated with fresh confidence.&lt;/p&gt;
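
A sketch of that rule in code may help. This is illustrative Python, not the real implementation: the store shape, categories, and age limits are stand-ins I chose for the example.

```python
from datetime import datetime, timedelta

# Hypothetical age limits per time-sensitive category (illustrative values).
MAX_AGE = {
    "revenue": timedelta(days=7),
    "compliance": timedelta(days=30),
    "deal_terms": timedelta(days=14),
}

def restate(fact: dict, now: datetime) -> str:
    """Return the stored value only if it is fresh enough to repeat."""
    age_limit = MAX_AGE.get(fact["category"])
    if age_limit is None:
        # Not time-sensitive: memory is trusted as-is.
        return fact["value"]
    if now - fact["verified_at"] > age_limit:
        # Stale: force a lookup instead of repeating old truth confidently.
        raise LookupError(f"stale {fact['category']} fact: look it up first")
    return fact["value"]
```

The point is the failure mode: a stale fact raises instead of being restated with fresh confidence.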

&lt;h2&gt;3. Technical explanations get smoother than reality&lt;/h2&gt;

&lt;p&gt;This is the easiest trap for content systems.&lt;/p&gt;

&lt;p&gt;An article about orchestration, memory, or agent handoffs can sound completely plausible while missing one important constraint.&lt;/p&gt;

&lt;p&gt;And if the paragraph reads cleanly, most people will not notice the miss.&lt;/p&gt;

&lt;p&gt;So public explanations of how my agent system works go through COO or CTO review.&lt;/p&gt;

&lt;p&gt;That keeps the description anchored to the real orchestration model instead of whatever smooth story the draft happened to produce.&lt;/p&gt;

&lt;p&gt;This matters especially for multi-agent systems, because the wrong explanation always sounds tempting.&lt;/p&gt;

&lt;p&gt;"The agents just call each other when needed" is smooth.&lt;/p&gt;

&lt;p&gt;It is also incomplete.&lt;/p&gt;

&lt;p&gt;The accurate framing is that the COO coordinates the execution flow and specialist review happens inside that orchestrated model.&lt;/p&gt;

&lt;p&gt;That is a better sentence because it is a truer one.&lt;/p&gt;

&lt;h2&gt;4. The point is not zero hallucinations&lt;/h2&gt;

&lt;p&gt;I do not think the useful goal is perfect output.&lt;/p&gt;

&lt;p&gt;The useful goal is that wrong output hits a review checkpoint before it becomes copy, policy, or an operating decision.&lt;/p&gt;

&lt;p&gt;That shift changes the design.&lt;/p&gt;

&lt;p&gt;You stop obsessing over whether the model can sound confident.&lt;/p&gt;

&lt;p&gt;You start caring about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who is allowed to approve which kind of statement&lt;/li&gt;
&lt;li&gt;when a lookup is required before a fact can be restated&lt;/li&gt;
&lt;li&gt;which outputs need specialist review&lt;/li&gt;
&lt;li&gt;how a draft gets stopped before it crosses from interesting to operational&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are less exciting questions than "how autonomous is your system?"&lt;/p&gt;

&lt;p&gt;They are much closer to the real product surface.&lt;/p&gt;
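
If I had to reduce that to code, it would be a small policy table rather than a smarter model. This is a sketch under my own assumptions; the output types, approvers, and defaults are illustrative, not the actual configuration:

```python
# Illustrative checkpoint policy: what must happen before an output type
# is allowed to become copy, policy, or a decision. Values are examples.
POLICY = {
    "product claim": {"approver": "Lawyer", "lookup_required": False},
    "business fact": {"approver": "CFO", "lookup_required": True},
    "system explanation": {"approver": "CTO", "lookup_required": False},
}

def checkpoint_for(output_type: str) -> dict:
    # Default-deny: an unknown output type gets routed to human review
    # rather than being published unchecked.
    return POLICY.get(output_type, {"approver": "founder", "lookup_required": True})
```

The interesting property is the default: anything the policy does not recognize falls back to human review instead of shipping.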

&lt;h2&gt;Why this changed how I think about OpenClawCloud&lt;/h2&gt;

&lt;p&gt;This is also why I keep coming back to governed execution.&lt;/p&gt;

&lt;p&gt;The market loves capability demos because they are easy to watch.&lt;/p&gt;

&lt;p&gt;But if an agent touches real work, the more important question is what happens when the output is confident and wrong.&lt;/p&gt;

&lt;p&gt;That is where review checkpoints, bounded execution, and clear intervention paths start to matter more than raw autonomy.&lt;/p&gt;

&lt;p&gt;For OpenClawCloud, I treat that as direction, not a shipped promise.&lt;/p&gt;

&lt;p&gt;The value I care about is not "the agent can do a lot."&lt;/p&gt;

&lt;p&gt;It is "wrong output does not get a free path into real systems."&lt;/p&gt;

&lt;p&gt;That is a much more boring story.&lt;/p&gt;

&lt;p&gt;It is also the one I trust.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Improver: How I Built an AI Agent That Upgrades Other AI Agents</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Tue, 31 Mar 2026 07:16:27 +0000</pubDate>
      <link>https://dev.to/setas/the-improver-how-i-built-an-ai-agent-that-upgrades-other-ai-agents-2l9j</link>
      <guid>https://dev.to/setas/the-improver-how-i-built-an-ai-agent-that-upgrades-other-ai-agents-2l9j</guid>
      <description>&lt;p&gt;Most multi-agent writeups stop at specialization.&lt;/p&gt;

&lt;p&gt;Planner. Coder. Reviewer. Maybe a memory layer. Maybe a routing loop.&lt;/p&gt;

&lt;p&gt;That part is interesting, but it was not the part that started compounding for me.&lt;/p&gt;

&lt;p&gt;The part that changed the system was this: who improves the agents after they make the same mistake twice?&lt;/p&gt;

&lt;p&gt;I run a solo company with AI agent departments. There is a CEO, CFO, COO, Marketing, Accountant, Lawyer, CTO, and an Improver that upgrades the rest. The specialists do the obvious work. The weird one is the Improver.&lt;/p&gt;

&lt;p&gt;It is the agent that reads mistakes, looks for recurring patterns, and edits the system itself.&lt;/p&gt;

&lt;p&gt;Not the product code.&lt;/p&gt;

&lt;p&gt;The operating system around the agents.&lt;/p&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;Because the useful version of self-improving agents is much more boring than the sci-fi version.&lt;/p&gt;

&lt;p&gt;And that is exactly why I trust it.&lt;/p&gt;

&lt;h2&gt;I did not need more agents. I needed better scar tissue.&lt;/h2&gt;

&lt;p&gt;The first version of the system already had specialist agents with decent prompts.&lt;/p&gt;

&lt;p&gt;The problem was not "I wish I had one more role."&lt;/p&gt;

&lt;p&gt;The problem was repetition.&lt;/p&gt;

&lt;p&gt;The same kinds of issues kept appearing in different forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;content that sounded technically correct but did not sound like me&lt;/li&gt;
&lt;li&gt;memory that stayed technically valid but got noisier every week&lt;/li&gt;
&lt;li&gt;tasks that were flagged as stale over and over without a real escalation path&lt;/li&gt;
&lt;li&gt;workflow instructions that were good enough for one run but not good enough to survive contact with the next one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each one was fixable manually.&lt;/p&gt;

&lt;p&gt;But manual fixes do not compound.&lt;/p&gt;

&lt;p&gt;If every mistake becomes a one-off correction, the system never gets better. It just gets babysat.&lt;/p&gt;

&lt;p&gt;So I added an Improver agent whose whole job is turning mistakes into infrastructure.&lt;/p&gt;

&lt;h2&gt;The raw input is not intuition. It is lessons.&lt;/h2&gt;

&lt;p&gt;The Improver does not wake up and freestyle changes.&lt;/p&gt;

&lt;p&gt;It works from a very explicit input: lesson entities stored in shared memory.&lt;/p&gt;

&lt;p&gt;After a complex task, agents log what went wrong, why it mattered, and what changed.&lt;/p&gt;

&lt;p&gt;The structure is intentionally plain:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lesson:2026-02-17:marketing-voice-authenticity
- Agent: Improver
- Category: process
- Summary: Marketing content was too generic
- Detail: Founder feedback showed the writing did not sound like a real engineer
- Action: Rewrote the voice guide and added a founder discovery protocol
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters because it gives the Improver something better than vibes.&lt;/p&gt;

&lt;p&gt;It gets actual failure patterns.&lt;/p&gt;

&lt;p&gt;It can group lessons by category: bug, process, knowledge, tool, decision.&lt;/p&gt;

&lt;p&gt;Then it can ask a useful question: is this a one-off, or is this a gap in the system?&lt;/p&gt;

&lt;p&gt;If three unrelated tasks keep producing the same sort of friction, that is usually not user error.&lt;/p&gt;

&lt;p&gt;It is missing infrastructure.&lt;/p&gt;
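
The grouping step is simple enough to sketch. Assuming lessons are (category, summary) pairs, which is a simplification of the real lesson entities, recurring friction shows up as a category count crossing a threshold:

```python
from collections import Counter

def systemic_gaps(lessons, threshold=3):
    """Return categories that recur enough to suggest missing infrastructure."""
    counts = Counter(category for category, _summary in lessons)
    return [cat for cat, n in counts.items() if n >= threshold]
```

Three is an arbitrary threshold here; the signal is recurrence across unrelated tasks, not the exact number.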

&lt;h2&gt;What the Improver is allowed to change&lt;/h2&gt;

&lt;p&gt;This agent has real edit authority, but the scope is narrow on purpose.&lt;/p&gt;

&lt;p&gt;Most of its work lives in the management repo, especially the files that define how the agents behave.&lt;/p&gt;

&lt;p&gt;Its change types are basically these:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Change type&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Typical file&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;New skill&lt;/td&gt;
&lt;td&gt;Stores reusable knowledge the system keeps needing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.github/skills/*/SKILL.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New prompt&lt;/td&gt;
&lt;td&gt;Captures a recurring workflow&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.github/prompts/*.prompt.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent update&lt;/td&gt;
&lt;td&gt;Tightens responsibilities, guardrails, or working style&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.github/agents/*.agent.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Doc update&lt;/td&gt;
&lt;td&gt;Adds missing operational context&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;, &lt;code&gt;AGENTS.md&lt;/code&gt;, project docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory curation&lt;/td&gt;
&lt;td&gt;Cleans duplicates, adds relations, prunes stale state&lt;/td&gt;
&lt;td&gt;shared knowledge graph&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What its remit does not include is even more important.&lt;/p&gt;

&lt;p&gt;It is not supposed to rewrite product code because it feels clever.&lt;/p&gt;

&lt;p&gt;It is not supposed to invent tax or legal rules.&lt;/p&gt;

&lt;p&gt;It is not supposed to change company identity, product positioning, or authority boundaries on its own.&lt;/p&gt;

&lt;p&gt;And it follows the same operating constraints and source-of-truth rules as the rest of the system.&lt;/p&gt;

&lt;p&gt;The useful version of self-improvement is constrained and auditable.&lt;/p&gt;

&lt;p&gt;Not open-ended.&lt;/p&gt;

&lt;h2&gt;The two trigger modes&lt;/h2&gt;

&lt;p&gt;The Improver runs in two main ways.&lt;/p&gt;

&lt;h3&gt;1. Scheduled review&lt;/h3&gt;

&lt;p&gt;There is a dedicated &lt;code&gt;/improve-agents&lt;/code&gt; prompt for periodic system review.&lt;/p&gt;

&lt;p&gt;That run audits the agent files, prompts, skills, and memory graph, then looks for gaps that should become reusable infrastructure.&lt;/p&gt;

&lt;p&gt;This is the slower, batch-style mode.&lt;/p&gt;

&lt;p&gt;Good for pattern detection.&lt;/p&gt;

&lt;h3&gt;2. Mid-task intervention&lt;/h3&gt;

&lt;p&gt;This is the more useful mode in practice.&lt;/p&gt;

&lt;p&gt;If another agent notices a real gap while working, it calls the Improver immediately.&lt;/p&gt;

&lt;p&gt;Not after the task. During it.&lt;/p&gt;

&lt;p&gt;That turns "we should fix this later" into "fix the system now, then continue."&lt;/p&gt;

&lt;p&gt;The difference sounds small, but it changes the system from retrospective learning to live correction.&lt;/p&gt;

&lt;h2&gt;Real changes the Improver already made&lt;/h2&gt;

&lt;p&gt;This is the part I care about most.&lt;/p&gt;

&lt;p&gt;The Improver is only interesting if the output is visible in the system afterward.&lt;/p&gt;

&lt;p&gt;Here are a few concrete changes it made from actual runs.&lt;/p&gt;

&lt;h3&gt;Marketing stopped sounding like marketing&lt;/h3&gt;

&lt;p&gt;On Feb 17, founder feedback was blunt: the content did not sound like a real engineer.&lt;/p&gt;

&lt;p&gt;That became a lesson.&lt;/p&gt;

&lt;p&gt;The Improver responded by rewriting the Marketing agent's voice guide, adding anti-patterns, adding a content quality gate, and forcing a founder discovery protocol instead of generic startup copy.&lt;/p&gt;

&lt;p&gt;That was a real upgrade.&lt;/p&gt;

&lt;p&gt;Not just "write better next time."&lt;/p&gt;

&lt;h3&gt;The system got a domain registry&lt;/h3&gt;

&lt;p&gt;On Feb 22, the instructions were updated to add a real Domain Registry with a separate Social URL column.&lt;/p&gt;

&lt;p&gt;That sounds administrative until a platform blocks one domain and not the fallback.&lt;/p&gt;

&lt;p&gt;OpenClawCloud is the live example. For public content, the correct social URL is &lt;code&gt;clawdcloud.net&lt;/code&gt;, not the blocked alternative.&lt;/p&gt;

&lt;p&gt;Without a registry, every agent has to remember that detail manually.&lt;/p&gt;

&lt;p&gt;With a registry, it becomes infrastructure.&lt;/p&gt;

&lt;h3&gt;Memory stopped growing like a junk drawer&lt;/h3&gt;

&lt;p&gt;Another improvement pass added memory hygiene rules.&lt;/p&gt;

&lt;p&gt;Standups and trend scans now have retention rules. Old noise gets pruned. Permanent lessons and decisions stay.&lt;/p&gt;

&lt;p&gt;That is not glamorous work, but stale memory is one of the fastest ways to make a multi-agent system look smart while behaving confused.&lt;/p&gt;

&lt;p&gt;Shared context only helps if it stays usable.&lt;/p&gt;

&lt;h3&gt;Chronic misses stopped getting polite excuses&lt;/h3&gt;

&lt;p&gt;One of the most useful upgrades came later: chronic miss escalation.&lt;/p&gt;

&lt;p&gt;If a task misses two or more deadlines, the COO is now supposed to re-scope it, demote it, kill it, or add a real root-cause note.&lt;/p&gt;

&lt;p&gt;No more infinite carryover with a softer ETA.&lt;/p&gt;

&lt;p&gt;That was an important change because agent systems are very good at sounding disciplined while quietly tolerating drift.&lt;/p&gt;

&lt;p&gt;The Improver is useful precisely when it gets less polite about that.&lt;/p&gt;
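
The escalation rule is mechanical enough to sketch. This is illustrative Python, not the real workflow; the field names and action set are stand-ins:

```python
# Allowed responses to a chronic miss: anything except a softer ETA.
ESCALATIONS = {"re-scope", "demote", "kill", "root-cause note"}

def review_task(task: dict) -> str:
    if task["missed_deadlines"] >= 2:
        action = task.get("escalation")
        if action not in ESCALATIONS:
            # Infinite carryover is no longer a legal outcome.
            raise ValueError(f"chronic miss on {task['name']}: pick an escalation")
        return action
    return "carry over"
```

A first miss still carries over quietly; a second one forces a decision.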

&lt;h2&gt;The hard part is not self-improvement. It is boundaries.&lt;/h2&gt;

&lt;p&gt;The question I get most often is some version of: does this not drift into chaos?&lt;/p&gt;

&lt;p&gt;It would, if the Improver were allowed to treat the whole company as editable text.&lt;/p&gt;

&lt;p&gt;That is why the boundaries matter more than the mechanism.&lt;/p&gt;

&lt;p&gt;The agent can improve prompts, skills, workflows, and memory hygiene.&lt;/p&gt;

&lt;p&gt;Its remit does not include declaring new business facts.&lt;/p&gt;

&lt;p&gt;Its remit does not include quietly changing product claims.&lt;/p&gt;

&lt;p&gt;Its remit does not include deciding that existing constraints or review triggers are optional now.&lt;/p&gt;

&lt;p&gt;And it is not supposed to widen its own authority because that seems efficient.&lt;/p&gt;

&lt;p&gt;In other words, the system can improve the procedures that shape work.&lt;/p&gt;

&lt;p&gt;It cannot rewrite the constitution.&lt;/p&gt;

&lt;p&gt;That is the only reason this feels useful instead of reckless.&lt;/p&gt;

&lt;h2&gt;What I like about it&lt;/h2&gt;

&lt;p&gt;The best thing about the Improver is that it turns post-mortems into runtime assets.&lt;/p&gt;

&lt;p&gt;A normal post-mortem ends as a paragraph in a doc nobody reads again.&lt;/p&gt;

&lt;p&gt;This loop is different.&lt;/p&gt;

&lt;p&gt;The mistake becomes a lesson.&lt;br&gt;
The lesson becomes an instruction change.&lt;br&gt;
The instruction change affects the next run.&lt;/p&gt;

&lt;p&gt;That is the compound effect.&lt;/p&gt;

&lt;p&gt;Not infinite autonomy.&lt;/p&gt;

&lt;p&gt;Just a system that gets slightly harder to fool every time it learns something real.&lt;/p&gt;

&lt;h2&gt;My take&lt;/h2&gt;

&lt;p&gt;I do not think self-improving agent systems are interesting because they sound futuristic.&lt;/p&gt;

&lt;p&gt;I think they are interesting when they make operations more boring.&lt;/p&gt;

&lt;p&gt;Better guardrails.&lt;br&gt;
Cleaner memory.&lt;br&gt;
Sharper prompts.&lt;br&gt;
Fewer repeated mistakes.&lt;/p&gt;

&lt;p&gt;That is what the Improver does for me.&lt;/p&gt;

&lt;p&gt;It is not an agent building a better world in the background.&lt;/p&gt;

&lt;p&gt;It is an agent that reads scar tissue and turns it into better constraints.&lt;/p&gt;

&lt;p&gt;And for real work, I trust that far more.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>My AI Agents Talk to Each Other. Here's the Inter-Agent Communication Protocol</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Tue, 24 Mar 2026 07:59:27 +0000</pubDate>
      <link>https://dev.to/setas/my-ai-agents-talk-to-each-other-heres-the-inter-agent-communication-protocol-36j3</link>
      <guid>https://dev.to/setas/my-ai-agents-talk-to-each-other-heres-the-inter-agent-communication-protocol-36j3</guid>
      <description>&lt;p&gt;Most multi-agent demos skip the boring part.&lt;/p&gt;

&lt;p&gt;They show a planner, a coder, maybe a reviewer, and a nice loop between them.&lt;/p&gt;

&lt;p&gt;What they usually do not show is this: how does one agent know when it must ask another one for help?&lt;/p&gt;

&lt;p&gt;That turned out to be the hard part in my system.&lt;/p&gt;

&lt;p&gt;I run a solo company with AI agent departments. There is a CEO, CFO, COO, Marketing, Accountant, Lawyer, CTO, and an Improver that upgrades the rest. They handle strategy, pricing, tax checks, content, technical reviews, and daily operations across five products.&lt;/p&gt;

&lt;p&gt;Giving them roles was easy.&lt;/p&gt;

&lt;p&gt;Making the handoffs reliable was not.&lt;/p&gt;

&lt;p&gt;Without a protocol, you get one of two bad outcomes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agents stay in their lane too hard and miss obvious cross-domain risks&lt;/li&gt;
&lt;li&gt;Agents ask everyone about everything and the system turns into a committee&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither scales.&lt;/p&gt;

&lt;p&gt;So I ended up writing a simple inter-agent communication protocol.&lt;/p&gt;

&lt;p&gt;Not a vague "collaborate when useful" instruction.&lt;/p&gt;

&lt;p&gt;An actual protocol with triggers, message format, loop prevention, and ownership rules.&lt;/p&gt;

&lt;h2&gt;The problem is not talking. It is knowing when to talk.&lt;/h2&gt;

&lt;p&gt;The first version of my system had specialist agents, but consultation was soft.&lt;/p&gt;

&lt;p&gt;Marketing could write a post about a product.&lt;br&gt;
The CFO could model pricing.&lt;br&gt;
The Lawyer could review GDPR risk.&lt;br&gt;
The CTO could look at technical architecture.&lt;/p&gt;

&lt;p&gt;The issue was not capability.&lt;/p&gt;

&lt;p&gt;The issue was consistency.&lt;/p&gt;

&lt;p&gt;Sometimes an agent would ask for help when it should not.&lt;br&gt;
Sometimes it would skip a review it clearly needed.&lt;br&gt;
Sometimes two agents would bounce the same question back and forth.&lt;/p&gt;

&lt;p&gt;That is when I realized the real problem was not prompt quality.&lt;/p&gt;

&lt;p&gt;It was routing.&lt;/p&gt;

&lt;p&gt;If a pricing decision has tax implications, the CFO must consult the Accountant.&lt;br&gt;
If a public post describes how the system works, Marketing must get a technical accuracy check.&lt;br&gt;
If a content draft makes legal claims, Lawyer review is mandatory.&lt;/p&gt;

&lt;p&gt;Once you define those triggers explicitly, the system gets much calmer.&lt;/p&gt;
&lt;h2&gt;The trigger table is the whole game&lt;/h2&gt;

&lt;p&gt;The core rule is simple: when work crosses into another domain, peer review becomes mandatory.&lt;/p&gt;

&lt;p&gt;That sounds obvious, but it only helps if the triggers are concrete.&lt;/p&gt;

&lt;p&gt;Here is the shape of the table I use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spending money, pricing, or margin assumptions -&amp;gt; consult CFO&lt;/li&gt;
&lt;li&gt;Tax, VAT (IVA), invoicing, or deductible expenses -&amp;gt; consult Accountant&lt;/li&gt;
&lt;li&gt;GDPR, contracts, terms, or liability -&amp;gt; consult Lawyer&lt;/li&gt;
&lt;li&gt;Architecture, infrastructure, or product internals -&amp;gt; consult CTO&lt;/li&gt;
&lt;li&gt;Revenue strategy or company direction -&amp;gt; consult CEO&lt;/li&gt;
&lt;li&gt;Launches, public messaging, or positioning -&amp;gt; consult Marketing&lt;/li&gt;
&lt;li&gt;Multi-step execution across teams -&amp;gt; consult COO&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That one table removed a lot of drift.&lt;/p&gt;

&lt;p&gt;Agents no longer need to guess whether a topic is "kind of legal" or "sort of technical."&lt;/p&gt;

&lt;p&gt;The trigger decides.&lt;/p&gt;
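
Mechanically, the table is just a routing map. This is a deliberately naive sketch (keyword matching on the task text, with illustrative keywords); the real triggers are judgment calls written into the agent files:

```python
# Illustrative trigger map: domain keyword -> specialist whose review
# becomes mandatory when the work touches that domain.
TRIGGERS = {
    "pricing": "CFO", "margin": "CFO", "spend": "CFO",
    "tax": "Accountant", "vat": "Accountant", "invoice": "Accountant",
    "gdpr": "Lawyer", "contract": "Lawyer", "liability": "Lawyer",
    "architecture": "CTO", "infrastructure": "CTO",
    "launch": "Marketing", "positioning": "Marketing",
}

def required_reviewers(task_text: str) -> set:
    text = task_text.lower()
    return {agent for keyword, agent in TRIGGERS.items() if keyword in text}
```

One task can trip several triggers at once, which is exactly the cross-domain case the protocol exists for.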
&lt;h2&gt;Every review request uses the same format&lt;/h2&gt;

&lt;p&gt;I did not want agents sending free-form messages to each other.&lt;/p&gt;

&lt;p&gt;Free-form sounds flexible until you realize every handoff starts losing context in a slightly different way.&lt;/p&gt;

&lt;p&gt;So every review request follows the same structure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Peer Review Request

From: [agent name]
Call chain: [Agent1 -&amp;gt; Agent2 -&amp;gt; Current]
Task: [what the founder asked for]
What I did: [current work so far]
What I need from you: [specific question]
Context: [only the facts needed for review]

Respond with:
1. APPROVED
2. CONCERNS
3. BLOCKING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That format does three useful things.&lt;/p&gt;

&lt;p&gt;First, it forces the requesting agent to explain what problem it is actually solving.&lt;/p&gt;

&lt;p&gt;Second, it gives the reviewer a narrow question instead of dumping the full task on them.&lt;/p&gt;

&lt;p&gt;Third, it makes the answer easy to incorporate back into the final result.&lt;/p&gt;

&lt;p&gt;The reviewer is not taking over ownership.&lt;/p&gt;

&lt;p&gt;It is a review, not a handoff.&lt;/p&gt;

&lt;h2&gt;The COO is the orchestrator, not just another peer&lt;/h2&gt;

&lt;p&gt;This part matters.&lt;/p&gt;

&lt;p&gt;From the outside, it can look like the agents just call each other directly.&lt;/p&gt;

&lt;p&gt;That is not how I think about it.&lt;/p&gt;

&lt;p&gt;The COO is the central coordinator of the system.&lt;/p&gt;

&lt;p&gt;That means the COO owns the execution flow, keeps track of what the founder actually asked for, and decides when work should branch into specialist review.&lt;/p&gt;

&lt;p&gt;Specialists still review each other's work.&lt;br&gt;
But the architecture is orchestrated, not social.&lt;/p&gt;

&lt;p&gt;That distinction matters because it keeps ownership clear.&lt;/p&gt;

&lt;p&gt;If Marketing asks Lawyer to review a product claim, Marketing still owns the post.&lt;br&gt;
If CFO asks Accountant to validate a tax assumption, CFO still owns the pricing output.&lt;br&gt;
If the founder asks for a daily standup, the COO still owns the final standup.&lt;/p&gt;

&lt;p&gt;Without that, every task becomes shared ownership.&lt;/p&gt;

&lt;p&gt;Shared ownership is where systems get fuzzy.&lt;/p&gt;

&lt;h2&gt;Two small rules prevent most loops&lt;/h2&gt;

&lt;p&gt;The protocol has two guardrails that matter more than they look.&lt;/p&gt;

&lt;h3&gt;1. No-callback rule&lt;/h3&gt;

&lt;p&gt;An agent cannot call someone already in the current chain.&lt;/p&gt;

&lt;p&gt;If the chain is &lt;code&gt;COO -&amp;gt; Marketing -&amp;gt; Lawyer&lt;/code&gt;, the Lawyer cannot bounce the question back to COO or Marketing.&lt;/p&gt;

&lt;p&gt;That kills the most annoying class of loop immediately.&lt;/p&gt;

&lt;h3&gt;2. Max depth 3&lt;/h3&gt;

&lt;p&gt;If the chain already has three agents, the current agent must answer directly.&lt;/p&gt;

&lt;p&gt;No more consultation.&lt;/p&gt;

&lt;p&gt;This is not mathematically pure. It is operational.&lt;/p&gt;

&lt;p&gt;You need a point where the system stops expanding and returns an answer.&lt;/p&gt;

&lt;p&gt;In practice, depth 3 has been enough.&lt;br&gt;
It gives room for a real cross-check without turning every task into a recursive meeting.&lt;/p&gt;
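
Both guardrails fit in a few lines. A minimal sketch, with names of my own choosing:

```python
MAX_DEPTH = 3  # after three agents, answer directly instead of consulting

def may_consult(chain: list, target: str) -> bool:
    """chain is the current call chain, e.g. ["COO", "Marketing"]."""
    if target in chain:
        return False  # no-callback rule: never bounce back up the chain
    if len(chain) >= MAX_DEPTH:
        return False  # depth cap: the chain stops expanding here
    return True
```

Checking the whole chain, not just the immediate caller, is what kills the longer A -&amp;gt; B -&amp;gt; C -&amp;gt; A loops too.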

&lt;h2&gt;What this catches in practice&lt;/h2&gt;

&lt;p&gt;The best part of the protocol is not elegance. It is the mistakes it catches.&lt;/p&gt;

&lt;p&gt;A few common examples:&lt;/p&gt;

&lt;h3&gt;Marketing describing technical systems&lt;/h3&gt;

&lt;p&gt;This used to be risky.&lt;/p&gt;

&lt;p&gt;It is very easy for a content agent to write something that sounds plausible about orchestration, memory, or infrastructure while getting one important detail wrong.&lt;/p&gt;

&lt;p&gt;Now the rule is explicit: any public content that describes how the system works gets a technical review before it goes out.&lt;/p&gt;

&lt;p&gt;That keeps the writing sharp without turning it into fiction.&lt;/p&gt;

&lt;h3&gt;CFO making a pricing argument that leaks into tax treatment&lt;/h3&gt;

&lt;p&gt;Pricing is not just pricing.&lt;/p&gt;

&lt;p&gt;It leaks into invoicing, VAT treatment, margin assumptions, and in some cases legal structure.&lt;/p&gt;

&lt;p&gt;The CFO can still own the business recommendation. But when tax treatment is part of the answer, the Accountant must review it.&lt;/p&gt;

&lt;h3&gt;Lawyer checking claims before publication&lt;/h3&gt;

&lt;p&gt;This one is simple and high leverage.&lt;/p&gt;

&lt;p&gt;If a post or landing page makes a trust, compliance, or security claim, it gets reviewed before publishing.&lt;/p&gt;

&lt;p&gt;That rule alone prevents a lot of avoidable embarrassment.&lt;/p&gt;

&lt;h2&gt;The communication matrix is less important than the boundaries&lt;/h2&gt;

&lt;p&gt;People like diagrams for this kind of thing.&lt;/p&gt;

&lt;p&gt;I do too.&lt;/p&gt;

&lt;p&gt;But the diagram is not the real system.&lt;/p&gt;

&lt;p&gt;The real system is a set of enforced boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who can review what&lt;/li&gt;
&lt;li&gt;when review becomes mandatory&lt;/li&gt;
&lt;li&gt;who owns the final answer&lt;/li&gt;
&lt;li&gt;when the chain stops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else is presentation.&lt;/p&gt;

&lt;p&gt;If you get those four things right, the system feels much more disciplined.&lt;/p&gt;

&lt;p&gt;If you leave them vague, even good agents start to look unreliable.&lt;/p&gt;

&lt;h2&gt;My take&lt;/h2&gt;

&lt;p&gt;The hard part of multi-agent systems is not specialization.&lt;/p&gt;

&lt;p&gt;It is coordination under constraints.&lt;/p&gt;

&lt;p&gt;Anyone can make a few agents call each other.&lt;/p&gt;

&lt;p&gt;The interesting part is deciding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;when consultation is required&lt;/li&gt;
&lt;li&gt;how context is passed cleanly&lt;/li&gt;
&lt;li&gt;how loops are prevented&lt;/li&gt;
&lt;li&gt;who still owns the result when multiple specialists touch it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why I ended up writing a protocol instead of just adding more prompt text.&lt;/p&gt;

&lt;p&gt;The protocol made the system less magical.&lt;/p&gt;

&lt;p&gt;It also made it more trustworthy.&lt;/p&gt;

&lt;p&gt;And in production, I will take trustworthy over magical every time.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Most AI agent demos optimize for capability. Production buyers pay for control.</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Thu, 19 Mar 2026 15:16:46 +0000</pubDate>
      <link>https://dev.to/setas/most-ai-agent-demos-optimize-for-capability-production-buyers-pay-for-control-1ea4</link>
      <guid>https://dev.to/setas/most-ai-agent-demos-optimize-for-capability-production-buyers-pay-for-control-1ea4</guid>
      <description>&lt;p&gt;Every week I see a new AI agent demo.&lt;/p&gt;

&lt;p&gt;Book the meeting. Send the email. Refactor the code. Triage the ticket. Trade the stock. Run the company.&lt;/p&gt;

&lt;p&gt;The demos are getting better. Some of them are genuinely impressive.&lt;/p&gt;

&lt;p&gt;But most of them are optimized for the wrong buyer.&lt;/p&gt;

&lt;p&gt;They are optimized for the person watching the demo, not for the person who has to run the system after the demo.&lt;/p&gt;

&lt;p&gt;That second person usually cares about different questions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What can this thing access?&lt;/li&gt;
&lt;li&gt;What happens when it gets stuck?&lt;/li&gt;
&lt;li&gt;How do I approve risky actions?&lt;/li&gt;
&lt;li&gt;What did it actually do?&lt;/li&gt;
&lt;li&gt;How do I stop it?&lt;/li&gt;
&lt;li&gt;How do I roll it back?&lt;/li&gt;
&lt;li&gt;Which secrets can it touch?&lt;/li&gt;
&lt;li&gt;How do I explain its behavior to my team?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the real product surface.&lt;/p&gt;

&lt;p&gt;Not just capability. Control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capability gets the screenshot. Control gets the budget.
&lt;/h2&gt;

&lt;p&gt;I think a lot of the current agent market is repeating a familiar pattern.&lt;br&gt;
The first wave proves that the interaction is possible. You show that an LLM can use tools, keep context, and complete a multi-step task. That gets attention fast because it feels new.&lt;/p&gt;

&lt;p&gt;Then reality shows up.&lt;/p&gt;

&lt;p&gt;The agent fails halfway through a workflow. Or it retries a step six times and burns API credits. Or it drafts something a human should have reviewed first. Or it keeps running after the useful part is already done. Or it touches a system that should have been off-limits.&lt;/p&gt;

&lt;p&gt;At that point, the question changes.&lt;/p&gt;

&lt;p&gt;It is no longer, "Can this agent do the task?"&lt;/p&gt;

&lt;p&gt;It becomes, "Can I trust this system in an environment that matters?"&lt;/p&gt;

&lt;p&gt;That is where most demos stop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability is necessary. It is not the whole answer.
&lt;/h2&gt;

&lt;p&gt;A lot of products respond to this by adding better visibility.&lt;/p&gt;

&lt;p&gt;You get traces, timelines, logs, token counts, screenshots, and event streams. I like all of that. You need it.&lt;/p&gt;

&lt;p&gt;But observability on its own is still passive.&lt;/p&gt;

&lt;p&gt;It tells you what happened.&lt;/p&gt;

&lt;p&gt;Production users usually need more than that. They need ways to shape what is allowed to happen in the first place.&lt;/p&gt;

&lt;p&gt;Watching an agent fail in high resolution is still failure.&lt;/p&gt;

&lt;p&gt;The control plane is the part that turns visibility into operational trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  The control-plane primitives I think matter
&lt;/h2&gt;

&lt;p&gt;If I were evaluating an agent platform for real work, these are the things I would care about first.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Narrow permissions by default
&lt;/h3&gt;

&lt;p&gt;An agent should not wake up with broad access to everything.&lt;/p&gt;

&lt;p&gt;It should have access to exactly the tools, environments, and credentials required for the job. Nothing more.&lt;/p&gt;

&lt;p&gt;If the task is reading support tickets, it does not also need production deploy access.&lt;/p&gt;

&lt;p&gt;If the task is drafting copy, it does not also need billing permissions.&lt;/p&gt;

&lt;p&gt;The default should be small blast radius.&lt;/p&gt;
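
&lt;p&gt;As a sketch, that default can be a literal allowlist, denied unless listed. The role and tool names here are invented for illustration:&lt;/p&gt;

```javascript
// Hypothetical sketch: per-role tool allowlists, denied by default.
// Role and tool names are invented for illustration.
const ALLOWED_TOOLS = {
  "ticket-triage": ["read_tickets", "add_ticket_note"],
  "copywriter": ["read_brand_guide", "draft_copy"],
};

function canUse(role, tool) {
  const allowed = ALLOWED_TOOLS[role];
  if (allowed === undefined) return false; // unknown role: no access at all
  return allowed.includes(tool);           // anything not listed is denied
}
```

&lt;p&gt;The ticket-triage role cannot deploy. The copywriter cannot touch billing. Not because the model is well behaved, but because the runtime says no.&lt;/p&gt;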

&lt;h3&gt;
  
  
  2. Review points for expensive or risky actions
&lt;/h3&gt;

&lt;p&gt;The most important feature in an autonomous system is often a well-placed pause.&lt;/p&gt;

&lt;p&gt;Some actions should be automatic. Some should require a human checkpoint.&lt;/p&gt;

&lt;p&gt;That could mean spending above a threshold, writing to production systems, touching customer data, or sending something externally.&lt;/p&gt;

&lt;p&gt;I do not see human review as a weakness in the product. I see it as part of the product.&lt;/p&gt;
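
&lt;p&gt;The checkpoint logic does not need to be clever. A hypothetical classifier, with invented fields and an invented threshold, is enough to show the shape:&lt;/p&gt;

```javascript
// Hypothetical sketch: decide whether an action runs automatically
// or pauses for a human checkpoint. Fields and threshold are invented.
const SPEND_LIMIT_USD = 50;

function requiresHumanReview(action) {
  if (action.costUsd > SPEND_LIMIT_USD) return true; // spending above a threshold
  if (action.writesProduction) return true;          // writing to production systems
  if (action.touchesCustomerData) return true;       // touching customer data
  if (action.sendsExternally) return true;           // sending something externally
  return false;                                      // everything else is automatic
}
```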

&lt;h3&gt;
  
  
  3. Auditability that is actually useful
&lt;/h3&gt;

&lt;p&gt;I want more than a generic activity log.&lt;/p&gt;

&lt;p&gt;I want to know which tool was called, under which boundaries, and what happened next.&lt;/p&gt;

&lt;p&gt;If something goes wrong, I should be able to reconstruct the path without guessing.&lt;/p&gt;

&lt;p&gt;That matters for debugging. It also matters for trust inside a team.&lt;/p&gt;
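
&lt;p&gt;Concretely, that means one structured record per tool call. A hypothetical shape, with invented field names:&lt;/p&gt;

```javascript
// Hypothetical sketch: one structured record per tool call, so the path
// can be reconstructed later without guessing. Field names are invented.
function auditRecord(agent, tool, args, boundary, result) {
  return {
    ts: new Date().toISOString(), // when it happened
    agent,                        // which agent acted
    tool,                         // which tool was called
    args,                         // with which inputs
    boundary,                     // under which permission boundary
    ok: result.ok,                // and what happened next
    summary: result.summary,
  };
}
```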

&lt;h3&gt;
  
  
  4. Recovery and rollback strategy
&lt;/h3&gt;

&lt;p&gt;People talk a lot about autonomous execution. They talk less about undo.&lt;/p&gt;

&lt;p&gt;But if an agent edits configuration, changes data, triggers a workflow, or mutates state, rollback matters.&lt;/p&gt;

&lt;p&gt;The system should not just be able to move forward. It should help me recover from a bad step without turning the whole incident into manual archaeology.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Credential boundaries
&lt;/h3&gt;

&lt;p&gt;This one is boring, which is exactly why it matters.&lt;/p&gt;

&lt;p&gt;Credentials should be isolated by environment, role, and task. Temporary access is better than broad standing access. Fine-grained scopes are better than one giant shared credential.&lt;/p&gt;

&lt;p&gt;The more agentic the workflow becomes, the more this matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Observability tied to action
&lt;/h3&gt;

&lt;p&gt;Yes, I still want traces and telemetry.&lt;/p&gt;

&lt;p&gt;But I want them connected to intervention. When I see a loop, I should be able to stop it. When I see cost drift, I should be able to tighten a boundary. When I see a repeated failure, I should be able to change how the runtime behaves.&lt;/p&gt;

&lt;p&gt;Good observability should make intervention simpler, not just diagnosis prettier.&lt;/p&gt;

&lt;h2&gt;
  
  
  This is why I think the category is really about trust
&lt;/h2&gt;

&lt;p&gt;I do not think production users are buying "an agent" in the abstract.&lt;/p&gt;

&lt;p&gt;They are buying a system they can trust around an agent.&lt;/p&gt;

&lt;p&gt;That trust does not come from a benchmark.&lt;/p&gt;

&lt;p&gt;It comes from constraints.&lt;/p&gt;

&lt;p&gt;It comes from knowing that the runtime has boundaries. That risky actions can be reviewed. That behavior can be inspected. That failures can be contained. That humans can step in cleanly.&lt;/p&gt;

&lt;p&gt;The winning products in this space will probably look less magical over time, not more.&lt;/p&gt;

&lt;p&gt;They will feel more operational. More boring. More inspectable.&lt;/p&gt;

&lt;p&gt;That is a good sign.&lt;/p&gt;

&lt;p&gt;In infrastructure, boring is often what people pay for.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I would position OpenClawCloud
&lt;/h2&gt;

&lt;p&gt;This is the direction I find most interesting for OpenClawCloud.&lt;/p&gt;

&lt;p&gt;Not "host your agents in the cloud" as a generic message.&lt;/p&gt;

&lt;p&gt;That is too weak.&lt;/p&gt;

&lt;p&gt;The stronger message is closer to this:&lt;/p&gt;

&lt;p&gt;OpenClawCloud should be for teams that do not just want agents that can act. They want agents they can supervise.&lt;/p&gt;

&lt;p&gt;The value is not raw autonomy.&lt;/p&gt;

&lt;p&gt;The value is a managed runtime built around operational trust.&lt;/p&gt;

&lt;p&gt;If I am a small team, I probably do not want to assemble review points, action history, credential isolation, recovery strategy, and runtime visibility from scratch around every agent workflow.&lt;/p&gt;

&lt;p&gt;I want those concerns handled in one place.&lt;/p&gt;

&lt;p&gt;That is the real operational burden.&lt;/p&gt;

&lt;p&gt;And it is where I think the product story gets much stronger.&lt;/p&gt;

&lt;h2&gt;
  
  
  My take
&lt;/h2&gt;

&lt;p&gt;Most agent demos today are selling capability.&lt;/p&gt;

&lt;p&gt;I think production buyers are already looking for control.&lt;/p&gt;

&lt;p&gt;That is where the serious budget goes.&lt;/p&gt;

&lt;p&gt;If your product helps me trust an agent in a real environment, I will pay attention.&lt;/p&gt;

&lt;p&gt;If it only helps me watch an impressive demo, I probably will not.&lt;/p&gt;

&lt;p&gt;That is the lens I am using for OpenClawCloud.&lt;/p&gt;

&lt;p&gt;If you are building or evaluating agent infrastructure, I think this is the right question to ask first:&lt;/p&gt;

&lt;p&gt;What is the control plane here?&lt;/p&gt;

&lt;p&gt;If you want to follow what I am building around that idea, take a look at clawdcloud.net.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>How 8 AI Agents Share a Brain — Building a Persistent Knowledge Graph with MCP</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Tue, 17 Mar 2026 15:29:06 +0000</pubDate>
      <link>https://dev.to/setas/how-8-ai-agents-share-a-brain-building-a-persistent-knowledge-graph-with-mcp-541l</link>
      <guid>https://dev.to/setas/how-8-ai-agents-share-a-brain-building-a-persistent-knowledge-graph-with-mcp-541l</guid>
      <description>&lt;p&gt;Every multi-agent demo looks smart until the agents need to remember something outside the current prompt.&lt;/p&gt;

&lt;p&gt;That is where most systems fall apart.&lt;/p&gt;

&lt;p&gt;A CEO agent can suggest strategy. A Marketing agent can draft a thread. A Lawyer agent can block a risky claim. But if each one only sees the current conversation, you do not have a company. You have eight clever goldfish.&lt;/p&gt;

&lt;p&gt;I run a solo company with 8 AI agents: CEO, CFO, COO, Marketing, Accountant, Lawyer, CTO, and an Improver that upgrades the others. The part that makes the system actually compound is not the prompts. It is the shared memory layer.&lt;/p&gt;

&lt;p&gt;I built that memory as a persistent knowledge graph behind an MCP server. Every agent reads from it. Every agent can add to it. That is how the system remembers decisions, deadlines, lessons, client context, and what already happened this week.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR

&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Multi-agent systems need shared memory or they keep rediscovering the same context&lt;/li&gt;
&lt;li&gt;I use a knowledge graph exposed through MCP so every agent reads and writes the same institutional memory&lt;/li&gt;
&lt;li&gt;The hard part was not the schema. It was making file-backed memory survive concurrent writes&lt;/li&gt;
&lt;li&gt;The fix was pragmatic: async mutex, atomic writes, auto-repair on load, and strict retention rules&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Problem: Agents Forget the Company Exists
&lt;/h2&gt;

&lt;p&gt;A single chat session can hold a lot of context. A real company cannot depend on that.&lt;/p&gt;

&lt;p&gt;The COO needs to know whether &lt;code&gt;/weekly-review&lt;/code&gt; already ran. Marketing needs to know which product URL is allowed on X. The Accountant needs the ENI tax regime details. The Improver needs past mistakes. If that context lives only in old chats or random markdown files, each agent spends half its time re-learning the same facts.&lt;/p&gt;

&lt;p&gt;That creates three failure modes fast.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;repeated work&lt;/strong&gt;. The same questions get answered again because nobody knows the answer already exists.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;contradictions&lt;/strong&gt;. Marketing says a feature is ready. CTO knows it is not. Without a shared source of truth, both answers sound plausible.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;no compounding&lt;/strong&gt;. The system makes mistakes, but the mistakes do not become part of the system.&lt;/p&gt;

&lt;p&gt;That last one mattered most to me. If an agent screws up and nothing durable changes, you are paying for the same lesson twice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Shared Brain Stores
&lt;/h2&gt;

&lt;p&gt;I kept the graph deliberately small. It stores things that change decisions, not raw documents.&lt;/p&gt;

&lt;p&gt;The core objects are entities and relations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SondMe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"entityType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"observations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Status: active"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Stack: Elixir/Phoenix"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Domain: sondme.com"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And relations are simple, active-voice edges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Marketing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Lawyer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"relationType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"consults"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, the graph stores a few categories really well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strategic decisions and their rationale&lt;/li&gt;
&lt;li&gt;Product status, launch dates, and URLs&lt;/li&gt;
&lt;li&gt;Prompt run trackers like &lt;code&gt;prompt-run:weekly-review&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Lessons learned after launches or incidents&lt;/li&gt;
&lt;li&gt;Deadlines and compliance reminders&lt;/li&gt;
&lt;li&gt;Client and pricing context when a deal structure matters later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just as important is what it does &lt;strong&gt;not&lt;/strong&gt; store.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw file contents&lt;/li&gt;
&lt;li&gt;Entire chat transcripts&lt;/li&gt;
&lt;li&gt;Every observation forever&lt;/li&gt;
&lt;li&gt;Anything that is better left in the repo as a document&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That boundary matters. If memory becomes a dump of everything, agents stop trusting it because signal gets buried in noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP Was the Right Boundary
&lt;/h2&gt;

&lt;p&gt;I did not want every agent reading arbitrary files directly and inventing its own storage conventions.&lt;/p&gt;

&lt;p&gt;The Model Context Protocol gave me a clean interface: memory becomes a tool, not a folder full of tribal knowledge.&lt;/p&gt;

&lt;p&gt;That changes the ergonomics a lot.&lt;/p&gt;

&lt;p&gt;Instead of "go search old notes and hope you find the right paragraph," the agent asks memory for a specific entity or adds an observation to an existing one. The protocol boundary also made it much easier to share the same memory across different agents and modes.&lt;/p&gt;

&lt;p&gt;It is the same reason APIs beat random database access. Fewer ways to be inconsistent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Version Was Simple and Fragile
&lt;/h2&gt;

&lt;p&gt;The storage format is JSONL. One JSON object per line. Easy to inspect, easy to back up, easy to repair by hand.&lt;/p&gt;

&lt;p&gt;That simplicity was useful early on. I could open the file and understand what the system knew without needing a graph database, admin UI, or migration layer.&lt;/p&gt;

&lt;p&gt;But the naïve version had a nasty problem.&lt;/p&gt;

&lt;p&gt;When multiple agents wrote at roughly the same time, the server would:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load the graph from disk&lt;/li&gt;
&lt;li&gt;Modify it in memory&lt;/li&gt;
&lt;li&gt;Write the whole graph back&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is fine in a single-writer world.&lt;/p&gt;

&lt;p&gt;A multi-agent system is not a single-writer world.&lt;/p&gt;

&lt;p&gt;If two write operations start from the same file state, the second write can wipe out the first one without throwing an obvious error. Worse, if a write is interrupted mid-flight, the JSONL file can end up partially corrupted.&lt;/p&gt;

&lt;p&gt;That means the shared brain becomes the failure point for the whole company.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bug That Forced a Real Architecture
&lt;/h2&gt;

&lt;p&gt;This bug showed up exactly where you would expect: parallel tool calls.&lt;/p&gt;

&lt;p&gt;One part of the system would create entities. Another would create relations. Both thought they were doing a legitimate read-modify-write cycle. They were. Just not safely.&lt;/p&gt;

&lt;p&gt;The result was classic concurrent state pain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lost writes&lt;/li&gt;
&lt;li&gt;Duplicate entities&lt;/li&gt;
&lt;li&gt;Broken JSON lines&lt;/li&gt;
&lt;li&gt;Agents reading stale or malformed memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the moment when "it works in a demo" stops being useful.&lt;/p&gt;

&lt;p&gt;I did not solve it with a giant rewrite. I used a pragmatic local fork of &lt;code&gt;@modelcontextprotocol/server-memory&lt;/code&gt; and added three protections.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Async mutex
&lt;/h3&gt;

&lt;p&gt;All mutating operations go through a single queue. One write at a time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Mutex&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;acquire&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locked&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;()();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is not glamorous. It is effective.&lt;/p&gt;
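
&lt;p&gt;The part the class does not show is usage. Every mutating handler wraps its read-modify-write cycle, and the release goes in a &lt;code&gt;finally&lt;/code&gt; block. A sketch, with the class repeated so the snippet stands alone:&lt;/p&gt;

```javascript
// The Mutex from above, repeated here so this snippet runs standalone.
class Mutex {
  constructor() {
    this.queue = [];
    this.locked = false;
  }
  async acquire() {
    return new Promise(resolve => {
      if (!this.locked) {
        this.locked = true;
        resolve();
      } else {
        this.queue.push(resolve);
      }
    });
  }
  release() {
    if (this.queue.length > 0) {
      this.queue.shift()();
    } else {
      this.locked = false;
    }
  }
}

// Hypothetical wrapper: every mutating operation runs one at a time.
const writeLock = new Mutex();

async function withWriteLock(fn) {
  await writeLock.acquire();
  try {
    return await fn();
  } finally {
    // Release even when fn throws, or every later write queues forever.
    writeLock.release();
  }
}
```

&lt;p&gt;Entity creation and relation creation both go through the same wrapper, so two parallel tool calls can no longer interleave their load-modify-save cycles.&lt;/p&gt;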

&lt;h3&gt;
  
  
  2. Atomic writes
&lt;/h3&gt;

&lt;p&gt;Every save writes to a temporary file first, then renames it over the original.&lt;/p&gt;

&lt;p&gt;That means a crash gives me either the old valid file or the new valid file. Not half of one and half of the other.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Auto-repair on load
&lt;/h3&gt;

&lt;p&gt;The loader wraps each line parse in a try/catch, skips corrupt lines, and deduplicates entities and relations.&lt;/p&gt;

&lt;p&gt;That turned memory corruption from a wake-up-and-debug event into a survivable incident.&lt;/p&gt;

&lt;p&gt;Not pretty. Very useful.&lt;/p&gt;
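
&lt;p&gt;A minimal sketch of that load path, simplified from the real fork:&lt;/p&gt;

```javascript
// Defensive JSONL load: parse each line in a try/catch, count and skip
// lines that fail to parse, and deduplicate entities by name.
function loadGraph(text) {
  const byName = new Map();
  let skipped = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const obj = JSON.parse(line);
      byName.set(obj.name, obj); // last write wins for duplicate names
    } catch (err) {
      skipped += 1;              // corrupt line: survive it, do not crash
    }
  }
  return { entities: Array.from(byName.values()), skipped };
}
```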

&lt;h2&gt;
  
  
  Why a Knowledge Graph Beats Shared Notes
&lt;/h2&gt;

&lt;p&gt;A flat shared notes file works until you need relationships.&lt;/p&gt;

&lt;p&gt;Once you have agents consulting each other, products sharing infrastructure, deadlines tied to prompts, and lessons attached to incidents, the graph model becomes much more natural.&lt;/p&gt;

&lt;p&gt;A few examples from my setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The COO can see that &lt;code&gt;prompt-run:monthly-accounting&lt;/code&gt; is overdue without searching past chats&lt;/li&gt;
&lt;li&gt;Marketing can check the product registry before using a URL in a post&lt;/li&gt;
&lt;li&gt;The Improver can scan lesson entities and spot recurring failures&lt;/li&gt;
&lt;li&gt;Client deal structures can be stored once and reused by CFO and Accountant later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The graph is doing two jobs at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is a memory layer&lt;/li&gt;
&lt;li&gt;It is a constraint layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That second part matters. Good memory is not just recall. It is preventing the system from making the same wrong move again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retention Rules Matter More Than People Expect
&lt;/h2&gt;

&lt;p&gt;The graph would be useless if it only grew.&lt;/p&gt;

&lt;p&gt;So I added retention rules.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standups: keep 7 days&lt;/li&gt;
&lt;li&gt;Trend scans: keep 7 days&lt;/li&gt;
&lt;li&gt;Campaigns: prune 30 days after completion&lt;/li&gt;
&lt;li&gt;Lessons and decisions: permanent&lt;/li&gt;
&lt;li&gt;Prompt trackers: permanent and tiny&lt;/li&gt;
&lt;/ul&gt;
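
&lt;p&gt;As a sketch, those rules can live as data and run as a prune pass. The entity fields here (&lt;code&gt;entityType&lt;/code&gt;, &lt;code&gt;createdAt&lt;/code&gt;) are assumptions for illustration, not my exact schema:&lt;/p&gt;

```javascript
// Hedged sketch: the retention rules above as data plus a prune pass.
// Entity shape (entityType, createdAt) is an assumption for illustration.
const RETENTION_DAYS = {
  standup: 7,
  "trend-scan": 7,
  campaign: 30,           // pruned 30 days on; completion date approximated by createdAt here
  lesson: Infinity,       // permanent
  decision: Infinity,     // permanent
  "prompt-run": Infinity, // permanent and tiny
};

function pruneEntities(entities, nowMs) {
  return entities.filter(e => {
    const maxDays = RETENTION_DAYS[e.entityType];
    if (maxDays === undefined) return true; // unknown categories are kept
    if (maxDays === Infinity) return true;  // permanent categories
    const ageDays = (nowMs - new Date(e.createdAt).getTime()) / 86400000;
    return !(ageDays > maxDays);            // past the window: drop it
  });
}
```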

&lt;p&gt;This sounds like housekeeping. It is actually part of system quality.&lt;/p&gt;

&lt;p&gt;If stale operational data hangs around forever, agents start mixing old state with current state. That is how you get false overdue alerts, outdated campaign assumptions, and dead leads showing up in fresh plans.&lt;/p&gt;

&lt;p&gt;Memory hygiene is part of reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed After Adding Shared Memory
&lt;/h2&gt;

&lt;p&gt;The best effect was not that agents became smarter.&lt;/p&gt;

&lt;p&gt;It was that they became less repetitive.&lt;/p&gt;

&lt;p&gt;The COO can run a standup without rediscovering the same recurring deadlines. Marketing can pick up the current positioning of a product without me re-explaining it. The Improver can look at actual accumulated mistakes instead of vague impressions.&lt;/p&gt;

&lt;p&gt;The system feels less like prompt orchestration and more like a company with institutional memory.&lt;/p&gt;

&lt;p&gt;That is the difference between a novelty and an operating model.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Would Do Differently
&lt;/h2&gt;

&lt;p&gt;If I were rebuilding this today, I would make two changes earlier.&lt;/p&gt;

&lt;p&gt;First, I would design retention rules on day one. I added them after feeling the pain.&lt;/p&gt;

&lt;p&gt;Second, I would move sooner toward a BEAM-native version of this memory server. The JavaScript fork works, but a single GenServer processing writes sequentially is much closer to the shape of the problem.&lt;/p&gt;

&lt;p&gt;The current version is stable enough to run the company. It is not the final form.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Takeaway
&lt;/h2&gt;

&lt;p&gt;The interesting part of multi-agent systems is not "can one agent call another."&lt;/p&gt;

&lt;p&gt;It is whether the whole system can remember, constrain itself, and improve from mistakes.&lt;/p&gt;

&lt;p&gt;Without shared memory, every agent is just renting intelligence by the prompt.&lt;/p&gt;

&lt;p&gt;With a durable shared brain, the system starts to compound.&lt;/p&gt;

&lt;p&gt;That is the part I would build first.&lt;/p&gt;




&lt;p&gt;I’m João, a solo founder from Portugal building SaaS products with Elixir and Phoenix. I write about the real mechanics of running a company with AI agents: what works, what breaks, and what I’d change next.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Why Erlang's Supervision Trees Are the Missing Piece for AI Agents</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Wed, 11 Mar 2026 15:25:13 +0000</pubDate>
      <link>https://dev.to/setas/why-erlangs-supervision-trees-are-the-missing-piece-for-ai-agents-1mjo</link>
      <guid>https://dev.to/setas/why-erlangs-supervision-trees-are-the-missing-piece-for-ai-agents-1mjo</guid>
      <description>&lt;p&gt;Every week, a new AI agent framework launches. LangChain, CrewAI, AutoGen, Magentic-One — the list grows faster than anyone can evaluate.&lt;/p&gt;

&lt;p&gt;They all solve the same problem: how do you make an LLM do multi-step tasks? Chain some prompts, give it tools, add memory. Ship it.&lt;/p&gt;

&lt;p&gt;But none of them answer the question that actually matters in production: &lt;strong&gt;what happens when your agent crashes at 3am?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I run 8 AI agents that manage my solo company — CEO, CFO, COO, Marketing, Accountant, Lawyer, CTO, and an Improver that upgrades the others. They share a persistent knowledge graph, consult each other automatically, and post content to social media while I sleep.&lt;/p&gt;

&lt;p&gt;They crash. Regularly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Agents Aren't Containers
&lt;/h2&gt;

&lt;p&gt;Here's the core problem most frameworks ignore: AI agents are deeply stateful.&lt;/p&gt;

&lt;p&gt;A web server is (mostly) stateless. Kill the container, spin up a new one from the same image. No data lost. Kubernetes was designed for exactly this pattern.&lt;/p&gt;

&lt;p&gt;AI agents are different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context accumulates&lt;/strong&gt; — an agent mid-task holds a conversation history, tool call results, intermediate reasoning. Lose that, and it starts over from scratch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failures are semantic, not just process failures&lt;/strong&gt; — "the agent entered an infinite loop and burned $50 in API tokens" is different from "the container was OOM-killed." You need supervision that understands &lt;em&gt;what&lt;/em&gt; went wrong, not just &lt;em&gt;that&lt;/em&gt; something stopped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coordination requires state&lt;/strong&gt; — agents that collaborate share context, delegate subtasks, track who's done what. Kill one, and the others are left with stale references.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Costs are real&lt;/strong&gt; — every crashed-and-restarted agent potentially re-runs expensive LLM calls. Crash recovery isn't just about uptime. It's about not burning money.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most frameworks deal with this by... not dealing with it. They assume the happy path. If something fails, you restart the whole script manually.&lt;/p&gt;

&lt;p&gt;That works for demos. It doesn't work when your agent is supposed to post a tweet at 14:00 UTC every day, rain or shine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Erlang Solved This in 1986
&lt;/h2&gt;

&lt;p&gt;In 1986, Joe Armstrong and the Ericsson team had a problem: build telephone switches that handle millions of concurrent calls with 99.999% uptime. That's 5.26 minutes of downtime per year.&lt;/p&gt;

&lt;p&gt;Their solution: don't prevent crashes. &lt;strong&gt;Expect them and recover automatically.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This led to OTP (Open Telecom Platform) and its killer feature: &lt;strong&gt;supervision trees&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The core idea is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every process has a &lt;strong&gt;supervisor&lt;/strong&gt; — a parent process whose only job is watching children&lt;/li&gt;
&lt;li&gt;When a child crashes, the supervisor restarts it according to a defined strategy&lt;/li&gt;
&lt;li&gt;Supervisors can supervise other supervisors — creating a tree of fault tolerance&lt;/li&gt;
&lt;li&gt;The restart happens in microseconds, not seconds&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's what a basic agent supervisor looks like in Elixir:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;AgentSupervisor&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Supervisor&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;Supervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;name:&lt;/span&gt; &lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;children&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;AgentWorker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;id:&lt;/span&gt; &lt;span class="ss"&gt;:ceo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;role:&lt;/span&gt; &lt;span class="ss"&gt;:strategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model:&lt;/span&gt; &lt;span class="ss"&gt;:claude_sonnet&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;AgentWorker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;id:&lt;/span&gt; &lt;span class="ss"&gt;:marketing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;role:&lt;/span&gt; &lt;span class="ss"&gt;:content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model:&lt;/span&gt; &lt;span class="ss"&gt;:claude_sonnet&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;AgentWorker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;id:&lt;/span&gt; &lt;span class="ss"&gt;:accountant&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;role:&lt;/span&gt; &lt;span class="ss"&gt;:tax&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model:&lt;/span&gt; &lt;span class="ss"&gt;:claude_haiku&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;MemoryServer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;path:&lt;/span&gt; &lt;span class="s2"&gt;"memory.jsonl"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;SchedulerWorker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;interval:&lt;/span&gt; &lt;span class="ss"&gt;:timer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="no"&gt;Supervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;strategy:&lt;/span&gt; &lt;span class="ss"&gt;:one_for_one&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three restart strategies cover the common failure patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;:one_for_one&lt;/code&gt;&lt;/strong&gt; — only restart the crashed process. Perfect for independent agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;:one_for_all&lt;/code&gt;&lt;/strong&gt; — restart all children if one crashes. Use for tightly coupled agent teams with shared state, where partial state is worse than a full restart.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;:rest_for_one&lt;/code&gt;&lt;/strong&gt; — restart the crashed process and everything started after it. Useful when later agents depend on earlier ones.&lt;/li&gt;
&lt;/ul&gt;
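
&lt;p&gt;The strategies reduce to a simple rule for which children get restarted. A sketch in JavaScript, since that's what my current system runs on — it mirrors OTP's semantics but is not OTP itself:&lt;/p&gt;

```javascript
// Which children a supervisor restarts after one crashes, per strategy.
// A sketch mirroring OTP semantics, not the OTP implementation.
function restartSet(children, crashed, strategy) {
  const i = children.indexOf(crashed);
  switch (strategy) {
    case "one_for_one":  return [crashed];          // just the casualty
    case "one_for_all":  return [...children];      // everyone
    case "rest_for_one": return children.slice(i);  // casualty + later siblings
    default: throw new Error("unknown strategy: " + strategy);
  }
}
```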

&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;Here's a real scenario from my system. My agents share a persistent knowledge graph stored as a JSONL file — one JSON object per line, each representing an entity or relation. Eight agents read and write to this file through a Model Context Protocol (MCP) memory server. Every strategic decision, client pipeline update, prompt run timestamp, and lesson learned goes here.&lt;/p&gt;

&lt;p&gt;The race condition was textbook. When multiple agents fire parallel tool calls — say, &lt;code&gt;create_entities&lt;/code&gt; and &lt;code&gt;create_relations&lt;/code&gt; in the same batch — both operations would:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read the entire JSONL file into memory&lt;/li&gt;
&lt;li&gt;Parse every line into an in-memory graph&lt;/li&gt;
&lt;li&gt;Append their new entities/relations&lt;/li&gt;
&lt;li&gt;Serialize the full graph back to disk&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 4 is the problem. Both operations read the &lt;em&gt;same&lt;/em&gt; file state. Both write back the &lt;em&gt;full&lt;/em&gt; graph plus their additions. The second write obliterates the first's additions entirely. No error, no warning — data just vanishes.&lt;/p&gt;
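
&lt;p&gt;The interleaving is easy to reproduce in miniature. A toy JavaScript sketch of the lost update, with an in-memory array standing in for the JSONL file:&lt;/p&gt;

```javascript
// Both writers read the same snapshot, then each writes back the full
// graph plus its own addition. The second write wins; the first vanishes.
let file = ["existing-entity"];
const read  = () => file.slice();
const write = (graph) => { file = graph; };

const snapA = read();          // writer A: step 1, read
const snapB = read();          // writer B reads the SAME state
snapA.push("new-entity");      // step 3, append
snapB.push("new-relation");
write(snapA);                  // step 4, writer A saves
write(snapB);                  // writer B overwrites A's save
// file is now ["existing-entity", "new-relation"]: no error, no warning
```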

&lt;p&gt;In a typical framework, this would mean:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent tries to read memory → gets a JSON parse error (if a write was interrupted mid-line)&lt;/li&gt;
&lt;li&gt;Agent crashes or returns garbage&lt;/li&gt;
&lt;li&gt;I wake up, see broken output, manually debug the JSONL file&lt;/li&gt;
&lt;li&gt;Fix the file, restart everything&lt;/li&gt;
&lt;li&gt;Repeat next time it happens&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With supervision trees:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Memory server process detects corruption on load&lt;/li&gt;
&lt;li&gt;Process crashes — intentionally. In Erlang, crashing is a feature, not a bug.&lt;/li&gt;
&lt;li&gt;Supervisor restarts the memory server in microseconds&lt;/li&gt;
&lt;li&gt;On restart, the init callback runs auto-repair: wraps each &lt;code&gt;JSON.parse&lt;/code&gt; in a try/catch, skips corrupt lines, deduplicates entities by name and relations by &lt;code&gt;from|type|to&lt;/code&gt; key&lt;/li&gt;
&lt;li&gt;Agents resume with clean data&lt;/li&gt;
&lt;li&gt;I'm asleep. Everything just works.&lt;/li&gt;
&lt;/ol&gt;
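
&lt;p&gt;Step 4's auto-repair is a defensive parse plus two dedupe passes. A sketch of that loader in JavaScript (the record shape — &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;from&lt;/code&gt;/&lt;code&gt;relationType&lt;/code&gt;/&lt;code&gt;to&lt;/code&gt; — follows my memory server's format; adjust to yours):&lt;/p&gt;

```javascript
// Repair-on-load: skip lines that fail JSON.parse, deduplicate entities
// by name and relations by the from|type|to key. Last occurrence wins.
function repairLoad(jsonl) {
  const entities = new Map();
  const relations = new Map();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let rec;
    try { rec = JSON.parse(line); } catch { continue; }  // corrupt line: skip
    if (rec.type === "entity") entities.set(rec.name, rec);
    if (rec.type === "relation")
      relations.set(rec.from + "|" + rec.relationType + "|" + rec.to, rec);
  }
  return { entities: [...entities.values()], relations: [...relations.values()] };
}
```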

&lt;p&gt;To fix the root cause, I forked the MCP memory server locally and made three additions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Async mutex&lt;/strong&gt; — a queue-based lock that serializes all write operations. When one &lt;code&gt;saveGraph()&lt;/code&gt; is running, subsequent calls wait their turn. This eliminates the read-modify-write race entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Atomic writes&lt;/strong&gt; — every save writes to a &lt;code&gt;.tmp&lt;/code&gt; file first, then renames it over the original. A crash mid-write gives you either the old complete file or the new complete file — never a half-written mess.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-repair on load&lt;/strong&gt; — the graph loader wraps each line's &lt;code&gt;JSON.parse&lt;/code&gt; in a try/catch. Corrupt lines get skipped with a warning. Duplicate entities (same name) and duplicate relations (same from/type/to triple) are collapsed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's roughly what the mutex pattern looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Mutex&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;acquire&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_locked&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;()();&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Every mutating operation goes through the lock:&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;createEntities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;acquire&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loadGraph&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// read&lt;/span&gt;
    &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;newEntities&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// modify&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;saveGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;            &lt;span class="c1"&gt;// write (atomic)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is exactly the kind of infrastructure problem that disappears on the BEAM. Erlang processes don't share memory. Each process has its own heap. There's no concurrent write to the same file because the memory server is a single GenServer processing messages sequentially from its mailbox — mutual exclusion is built into the execution model, not bolted on with a mutex.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;the supervision tree doesn't prevent the bug&lt;/strong&gt;. It makes the bug survivable. The corrupt write still happens occasionally (on the JavaScript version — the BEAM version wouldn't have this class of bug at all), but the system recovers before anyone notices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Each Process Is an Island
&lt;/h2&gt;

&lt;p&gt;Processes on the BEAM (Erlang's virtual machine) have properties that map perfectly to AI agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt; — each process has its own heap memory. A crash in one can't corrupt another. Your Marketing agent going haywire can't touch the Accountant's tax calculations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight&lt;/strong&gt; — each process is ~2KB. You can run hundreds of thousands on a single machine. An 8-agent system with tool workers, a memory server, and a scheduler process would fit comfortably on a machine with 256MB RAM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preemptive scheduling&lt;/strong&gt; — the BEAM VM enforces fair CPU sharing. One agent stuck in an expensive computation can't starve the others. Every agent gets its turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message passing&lt;/strong&gt; — agents communicate by sending immutable messages. No shared mutable state, no locks, no race conditions (except at I/O boundaries, which is where the mutex comes in).&lt;/li&gt;
&lt;/ul&gt;
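
&lt;p&gt;The mailbox model is worth internalizing even outside the BEAM. A deliberately tiny JavaScript approximation: one queue per agent, messages processed strictly in order, no state shared between agents. It has none of the BEAM's preemption or isolation guarantees; it only illustrates the shape.&lt;/p&gt;

```javascript
// Each agent owns its mailbox; nothing else touches its state directly.
// Other agents communicate only by appending messages.
class MailboxAgent {
  constructor(name) {
    this.name = name;
    this.mailbox = [];  // incoming messages, FIFO
    this.handled = [];  // processing log
  }
  send(msg) { this.mailbox.push(msg); }
  drain() {  // process pending messages one at a time, in arrival order
    while (this.mailbox.length) this.handled.push(this.mailbox.shift());
  }
}
```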

&lt;p&gt;Compare this to running AI agents as Python threads or async tasks. One unhandled exception can take down the entire process. One memory leak slowly poisons the whole system. One blocking call freezes everything.&lt;/p&gt;

&lt;p&gt;My current system runs on Node.js with a hand-rolled mutex and atomic file writes to paper over exactly these problems. It works — 91% scheduler success rate, auto-repairing memory, months of uptime. But every fix is fighting the runtime instead of working with it. On the BEAM, process isolation and sequential mailbox processing eliminate entire categories of bugs before you write a line of application code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Now
&lt;/h2&gt;

&lt;p&gt;AI agents are moving from demos to production. And production means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agents that run 24/7, not just during a demo&lt;/li&gt;
&lt;li&gt;Real money flowing through API calls ($0.01 per prompt adds up quickly when an agent loops)&lt;/li&gt;
&lt;li&gt;Users depending on outputs — posts that need to go out, invoices that need to be generated, compliance deadlines that can't be missed&lt;/li&gt;
&lt;li&gt;Multiple agents coordinating, where one failure cascades if not contained&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The industry is rediscovering problems that telecom solved decades ago. Ericsson's AXD 301 switch reportedly achieved 99.9999999% uptime — nine nines — using these exact patterns. Not because the hardware never failed, but because the software expected failure and recovered faster than users noticed.&lt;/p&gt;

&lt;p&gt;Your AI agent doesn't need nine nines. But it does need to survive a 3am crash without you waking up to fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Counterargument
&lt;/h2&gt;

&lt;p&gt;"But I'm not going to rewrite my Python agent in Elixir."&lt;/p&gt;

&lt;p&gt;Fair. And you don't have to. The supervision tree &lt;em&gt;pattern&lt;/em&gt; is more important than the language:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Wrap agents in health-check loops&lt;/strong&gt; that detect hangs and kill them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoint state regularly&lt;/strong&gt; so a restart doesn't lose everything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set budget caps&lt;/strong&gt; that pause agents before they burn your API credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor semantically&lt;/strong&gt; — is the agent making progress, or is it looping?&lt;/li&gt;
&lt;/ol&gt;
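
&lt;p&gt;Points 3 and 4 fit in a few lines regardless of stack. A JavaScript sketch; the names and thresholds are illustrative, not from any framework:&lt;/p&gt;

```javascript
// Halts an agent when it exceeds a spend cap (point 3) or keeps emitting
// the same output, a crude proxy for "looping, not progressing" (point 4).
class AgentGuard {
  constructor(capUsd, maxRepeats) {
    this.cap = capUsd;
    this.maxRepeats = maxRepeats;
    this.spent = 0;
    this.last = null;
    this.repeats = 0;
  }
  check(costUsd, output) {
    this.spent += costUsd;
    if (this.spent > this.cap) return "halt:budget";
    this.repeats = output === this.last ? this.repeats + 1 : 0;
    this.last = output;
    if (this.repeats >= this.maxRepeats) return "halt:looping";
    return "ok";
  }
}
```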

&lt;p&gt;But if you're choosing a foundation for a new agent system — especially one that needs to run multiple coordinating agents reliably — I'd argue the BEAM gives you a 40-year head start. These patterns aren't libraries you install. They're built into the runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Build Next
&lt;/h2&gt;

&lt;p&gt;If I were starting a new AI agent platform from scratch today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process-per-agent&lt;/strong&gt; with OTP supervisors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State checkpointing&lt;/strong&gt; to PostgreSQL on every tool call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-agent spend tracking&lt;/strong&gt; with configurable budget caps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PubSub for inter-agent messaging&lt;/strong&gt; — no external message queue needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telemetry hooks&lt;/strong&gt; for observability (OpenTelemetry + Sentry)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is roughly what I'm building with OpenClaw Cloud, and it's why I chose Elixir for the stack. Not because Elixir is trendy, but because the problem — running many stateful, failure-prone, communicating processes — is literally what the BEAM was designed for.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm João, a solo developer from Portugal building SaaS with Elixir and Phoenix. I recently wrote about &lt;a href="https://dev.to/setas/i-run-a-solo-company-with-ai-agent-departments-50nf"&gt;running a solo company with AI agent departments&lt;/a&gt; — this article is the technical deep-dive on why that system stays reliable. Find me on &lt;a href="https://x.com/joaosetas" rel="noopener noreferrer"&gt;X (@joaosetas)&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>I Run a Solo Company with AI Agent Departments</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Tue, 03 Mar 2026 10:41:20 +0000</pubDate>
      <link>https://dev.to/setas/i-run-a-solo-company-with-ai-agent-departments-50nf</link>
      <guid>https://dev.to/setas/i-run-a-solo-company-with-ai-agent-departments-50nf</guid>
      <description>&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I'm a solo founder running 5 SaaS products with 0 employees&lt;/li&gt;
&lt;li&gt;I built 8 AI agent "departments" using GitHub Copilot custom agents — CEO, CFO, COO, Lawyer, Accountant, Marketing, CTO, and an Improver that upgrades the others&lt;/li&gt;
&lt;li&gt;They share a persistent knowledge graph, consult each other automatically, and self-improve&lt;/li&gt;
&lt;li&gt;Here's how it actually works, with code snippets and honest tradeoffs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Premise
&lt;/h2&gt;

&lt;p&gt;I run a solo software company from Braga, Portugal. Five products. Zero employees. Zero funding.&lt;/p&gt;

&lt;p&gt;The products: &lt;a href="https://sondme.com" rel="noopener noreferrer"&gt;SondMe&lt;/a&gt; (radio monitoring), &lt;a href="https://countermark.ai" rel="noopener noreferrer"&gt;Countermark&lt;/a&gt; (bot detection), OpenClawCloud (AI agent hosting), Vertate (verification), and Agent-Inbox. All built with Elixir, Phoenix, and LiveView. All deployed on Fly.io for under €50/month total.&lt;/p&gt;

&lt;p&gt;The problem: even a solo founder needs to handle marketing, accounting, legal compliance, operations, financial planning, and tech decisions. Wearing all those hats meant things slipped. Deadlines got missed. Content didn't get posted. IVA filings almost got forgotten.&lt;/p&gt;

&lt;p&gt;So I built something weird: a full virtual company where every department is an AI agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Roster
&lt;/h2&gt;

&lt;p&gt;Each agent is a markdown file in &lt;code&gt;.github/agents/&lt;/code&gt; inside my management repo. GitHub Copilot loads the right agent based on which mode I'm working in. Here's the team:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;What It Actually Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CEO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strategy &amp;amp; trends&lt;/td&gt;
&lt;td&gt;Scans Hacker News and X for market signals. Validates product direction against trends.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CFO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Financial planning&lt;/td&gt;
&lt;td&gt;Pricing models, cash flow projections, cost analysis. Checks margins before I commit to anything.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;COO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Operations&lt;/td&gt;
&lt;td&gt;Runs daily standups. Maintains the sprint board. Orchestrates other agents.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Marketing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Content &amp;amp; growth&lt;/td&gt;
&lt;td&gt;Writes all social media content in my voice. Schedules posts. Runs engagement routines.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accountant&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tax &amp;amp; invoicing&lt;/td&gt;
&lt;td&gt;Portuguese IVA rules, IRS simplified regime, invoice requirements. Knows fiscal deadlines cold.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lawyer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compliance&lt;/td&gt;
&lt;td&gt;GDPR, contracts, Terms of Service. Reviews product claims before Marketing publishes them.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CTO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;Build-vs-buy decisions, DevOps, stack consistency across all 5 products.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Improver&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Meta-agent&lt;/td&gt;
&lt;td&gt;Reads past mistakes and upgrades the other agents. Creates new skills. The system evolves itself.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These aren't chatbots. Each agent has domain-specific instructions, access to real tools (MCP servers for X, dev.to, Sentry, scheduling, memory), and the authority to act autonomously.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works — The Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Agent Files
&lt;/h3&gt;

&lt;p&gt;Each agent is a &lt;code&gt;.agent.md&lt;/code&gt; file with structured instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Marketing Agent — AIFirst&lt;/span&gt;

&lt;span class="gu"&gt;## Core Responsibilities&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Content strategy and calendar
&lt;span class="p"&gt;-&lt;/span&gt; Social media posting (via X and dev.to MCP tools)
&lt;span class="p"&gt;-&lt;/span&gt; Community engagement
&lt;span class="p"&gt;-&lt;/span&gt; Launch planning

&lt;span class="gu"&gt;## Content Voice &amp;amp; Tone&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; First person singular ("I", never "we")
&lt;span class="p"&gt;-&lt;/span&gt; Technical substance over hype
&lt;span class="p"&gt;-&lt;/span&gt; Show the work — code, configs, real numbers
&lt;span class="p"&gt;-&lt;/span&gt; No: revolutionary, game-changing, leverage, synergy...

&lt;span class="gu"&gt;## Autonomous Execution&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Posts tweets directly via scheduler
&lt;span class="p"&gt;-&lt;/span&gt; Publishes dev.to articles (published: true)
&lt;span class="p"&gt;-&lt;/span&gt; Engagement: likes, replies, follows — every day
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: these aren't generic "be helpful" prompts. The Marketing agent knows my posting schedule, my voice quirks, which platforms I use, which URLs are blocked on X, and which products to rotate in the content calendar. The Accountant knows Portuguese ENI tax law, IVA quarterly deadlines, and the simplified IRS regime. Real domain expertise encoded in markdown.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shared Memory — The Knowledge Graph
&lt;/h3&gt;

&lt;p&gt;This is where it gets interesting. All agents share a &lt;strong&gt;persistent knowledge graph&lt;/strong&gt; via a Model Context Protocol (MCP) memory server. What one agent learns, every other agent can read.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────┐    ┌─────────────┐    ┌──────────┐
│ Marketing│───→│             │←───│ CFO      │
│          │    │  Knowledge  │    │          │
│ CEO      │───→│    Graph    │←───│Accountant│
│          │    │             │    │          │
│ Lawyer   │───→│ (memory.jsonl)│←──│ Improver │
└──────────┘    └─────────────┘    └──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Entities have types: &lt;code&gt;product&lt;/code&gt;, &lt;code&gt;decision&lt;/code&gt;, &lt;code&gt;deadline&lt;/code&gt;, &lt;code&gt;client&lt;/code&gt;, &lt;code&gt;metric&lt;/code&gt;, &lt;code&gt;lesson&lt;/code&gt;. Relations use active voice: &lt;code&gt;owns&lt;/code&gt;, &lt;code&gt;uses&lt;/code&gt;, &lt;code&gt;built-with&lt;/code&gt;, &lt;code&gt;depends-on&lt;/code&gt;.&lt;/p&gt;
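
&lt;p&gt;On disk that means one JSON object per line. A hypothetical pair of records — the field names follow the MCP memory server's conventions, but the content here is invented for illustration:&lt;/p&gt;

```javascript
// Two memory.jsonl lines: an entity and a relation.
const lines = [
  '{"type":"entity","name":"Countermark","entityType":"product",' +
    '"observations":["Bot detection SaaS"]}',
  '{"type":"relation","from":"Countermark","relationType":"built-with","to":"Elixir"}',
];
const records = lines.map((l) => JSON.parse(l));
```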

&lt;p&gt;Real example of what's stored:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strategic decisions and their rationale&lt;/li&gt;
&lt;li&gt;Product status, launch dates, key metrics&lt;/li&gt;
&lt;li&gt;Financial data (pricing decisions, cost benchmarks)&lt;/li&gt;
&lt;li&gt;Legal and compliance decisions&lt;/li&gt;
&lt;li&gt;Lessons learned from launches and incidents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The memory has retention rules too — standups older than 7 days get pruned, but lessons and decisions are permanent. It's the company's institutional memory.&lt;/p&gt;
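
&lt;p&gt;The pruning rule itself is one filter. A sketch; the record fields are assumptions about the shape, not the exact server code:&lt;/p&gt;

```javascript
// Retention: standups expire after 7 days; every other entity type
// (lesson, decision, ...) is kept permanently.
function prune(records, nowMs) {
  const weekMs = 7 * 24 * 60 * 60 * 1000;
  const stale = (r) => nowMs - Date.parse(r.date) >= weekMs;
  return records.filter((r) => r.entityType !== "standup" || !stale(r));
}
```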

&lt;h3&gt;
  
  
  Inter-Agent Communication
&lt;/h3&gt;

&lt;p&gt;Here's the part that surprised me most. Agents &lt;strong&gt;consult each other automatically&lt;/strong&gt; when their work crosses into another domain.&lt;/p&gt;

&lt;p&gt;The protocol works like this: each agent has a trigger table. When Marketing writes a product claim, it auto-calls the Lawyer for review. When CFO does pricing, it calls the Accountant to verify tax treatment. When CTO proposes infrastructure changes, it calls CFO to check the cost impact.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CEO ←→ CFO        Strategy ↔ Financial viability
CEO ←→ CTO        Strategy ↔ Technical feasibility
CFO ←→ Accountant Financial plans ↔ Tax compliance
Marketing ←→ Lawyer  Campaigns ↔ Legal compliance
COO → any          Orchestrator can call any agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The peer review request format looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Peer Review Request&lt;/span&gt;

&lt;span class="gs"&gt;**From**&lt;/span&gt;: Marketing
&lt;span class="gs"&gt;**Call chain**&lt;/span&gt;: COO → Marketing
&lt;span class="gs"&gt;**Task**&lt;/span&gt;: Draft product launch tweet for Countermark
&lt;span class="gs"&gt;**What I did**&lt;/span&gt;: Wrote tweet claiming "99% bot detection accuracy"
&lt;span class="gs"&gt;**What I need from you**&lt;/span&gt;: Is this claim substantiated?

Please respond with:
&lt;span class="p"&gt;1.&lt;/span&gt; ✅ APPROVED
&lt;span class="p"&gt;2.&lt;/span&gt; ⚠️ CONCERNS
&lt;span class="p"&gt;3.&lt;/span&gt; 🔴 BLOCKING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call-chain tracking prevents infinite loops — each consultation includes who's already been called, and there's a max depth of 3. If CFO calls Accountant, the Accountant can't call CFO back.&lt;/p&gt;
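
&lt;p&gt;The guard is a handful of lines. A sketch in JavaScript; the chain is just the list of agents already in the call stack:&lt;/p&gt;

```javascript
// A consultation is allowed only if the target isn't already in the call
// chain (no cycles) and the chain hasn't hit the depth cap.
function mayConsult(chain, target, maxDepth) {
  if (chain.includes(target)) return false;  // CFO → Accountant → CFO: refused
  if (chain.length >= maxDepth) return false;
  return true;
}
```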

&lt;h3&gt;
  
  
  The Daily Standup
&lt;/h3&gt;

&lt;p&gt;Every morning, the COO agent runs a standup that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Checks Sentry for errors across all 5 products&lt;/li&gt;
&lt;li&gt;Scans the sprint board for overdue tasks&lt;/li&gt;
&lt;li&gt;Checks if periodic prompts are overdue (weekly review, monthly accounting, quarterly IVA)&lt;/li&gt;
&lt;li&gt;Reads the knowledge graph for context&lt;/li&gt;
&lt;li&gt;Delegates tasks to other agents&lt;/li&gt;
&lt;li&gt;Produces a prioritized day plan&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's not a status meeting — it's an automated orchestration run that delegates work to the right specialist.&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-Improvement — The Improver Agent
&lt;/h3&gt;

&lt;p&gt;This is the weirdest (and possibly most valuable) part. There's a meta-agent called the Improver whose job is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read &lt;code&gt;lesson&lt;/code&gt; entities from memory (mistakes and learnings logged by other agents)&lt;/li&gt;
&lt;li&gt;Identify patterns across sessions&lt;/li&gt;
&lt;li&gt;Create new skills (reusable instruction files for specific domains)&lt;/li&gt;
&lt;li&gt;Update other agents' instructions when gaps are found&lt;/li&gt;
&lt;li&gt;Propose new agents when workload patterns suggest one is needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After every complex task, agents store a lesson:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Entity: lesson:2026-02-10:memory-corruption
Type: lesson
Observations:
&lt;span class="p"&gt;  -&lt;/span&gt; "Agent: CTO"
&lt;span class="p"&gt;  -&lt;/span&gt; "Category: bug"
&lt;span class="p"&gt;  -&lt;/span&gt; "Summary: Concurrent memory writes corrupted JSONL file"
&lt;span class="p"&gt;  -&lt;/span&gt; "Detail: Parallel tool calls to create_entities and create_relations
    caused race condition in the memory server"
&lt;span class="p"&gt;  -&lt;/span&gt; "Action: Added async mutex + atomic writes to local fork"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Improver reads these monthly and upgrades the system. The system literally improves itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Tradeoffs
&lt;/h2&gt;

&lt;p&gt;This isn't a "10x productivity" pitch. Here's what's actually hard:&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Windows Are Real
&lt;/h3&gt;

&lt;p&gt;Each agent operates within a context window. Long, complex tasks can exceed it. The solution: agents delegate heavy data-gathering to subagents to keep their own context focused. It works, but it's a constant architectural consideration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agents Hallucinate
&lt;/h3&gt;

&lt;p&gt;They do, and confidently. The Lawyer catches most compliance hallucinations before they reach production. The inter-agent review protocol exists because of this — multiple agents checking each other's work is the safety net.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory Corruption
&lt;/h3&gt;

&lt;p&gt;We hit this one early. The knowledge graph is stored as a JSONL file. When multiple agents made parallel tool calls (writing entities and relations simultaneously), the file got corrupted — partial writes, duplicate entries, broken JSON lines.&lt;/p&gt;

&lt;p&gt;The fix: I forked the upstream MCP memory server and added three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Async mutex&lt;/strong&gt; — prevents concurrent &lt;code&gt;saveGraph()&lt;/code&gt; calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Atomic writes&lt;/strong&gt; — writes to a &lt;code&gt;.tmp&lt;/code&gt; file then renames&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-repair on load&lt;/strong&gt; — skips corrupt lines and deduplicates&lt;/li&gt;
&lt;/ol&gt;
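
&lt;p&gt;For illustration, here's a minimal Elixir sketch of the same three-step pattern (module and function names are hypothetical; the real fix lives in the forked MCP server, which isn't written in Elixir). It assumes Elixir 1.18+ for the built-in &lt;code&gt;JSON&lt;/code&gt; module. Writes are serialized through a single process, go to a &lt;code&gt;.tmp&lt;/code&gt; file first, and corrupt lines are skipped on load:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule MemoryFile do
  use GenServer

  # A single GenServer serializes all writes (the BEAM stand-in for an async mutex).
  def start_link(path), do: GenServer.start_link(__MODULE__, path, name: __MODULE__)
  def save(graph), do: GenServer.call(__MODULE__, {:save, graph})

  @impl true
  def init(path), do: {:ok, path}

  @impl true
  def handle_call({:save, graph}, _from, path) do
    tmp = path &lt;&gt; ".tmp"
    File.write!(tmp, Enum.map_join(graph, "\n", &amp;JSON.encode!/1))
    File.rename!(tmp, path)   # atomic swap: readers never see a partial file
    {:reply, :ok, path}
  end

  # Auto-repair on load: drop lines that fail to parse, dedupe the rest.
  def load(path) do
    path
    |&gt; File.stream!()
    |&gt; Enum.flat_map(fn line -&gt;
      case JSON.decode(String.trim(line)) do
        {:ok, entity} -&gt; [entity]
        {:error, _} -&gt; []
      end
    end)
    |&gt; Enum.uniq()
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The point is how little machinery the fix needs: one serialization point, one rename, one lenient parser.&lt;/p&gt;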

&lt;h3&gt;
  
  
  It's Not a Replacement for Thinking
&lt;/h3&gt;

&lt;p&gt;The agents are good at executing within their domain. They're bad at knowing when the domain is wrong. Strategic pivots, gut-feel product decisions, "this just doesn't feel right" — that's still me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Month 2 Results
&lt;/h2&gt;

&lt;p&gt;After two months of running this system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Revenue&lt;/strong&gt;: €6.09 (one subscriber, acquired on day 2; no ads, no outreach)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: ~€42/month (Fly.io across all apps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content output&lt;/strong&gt;: 84+ tweets, 5 dev.to articles, multiple HN comments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time on marketing&lt;/strong&gt;: less than 1 hour per week (agents handle scheduling, drafting, and engagement)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt;: zero missed deadlines (IVA, IRS, Segurança Social all tracked)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The revenue is barely there. But I ship every week, the system keeps improving, and I'm building in public with a team that costs €0.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;The entire system lives in a single management repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.github/
  agents/
    ceo.agent.md
    cfo.agent.md
    coo.agent.md
    marketing.agent.md
    accountant.agent.md
    lawyer.agent.md
    cto.agent.md
    improver.agent.md
  copilot-instructions.md    # Global company identity + protocols
  skills/
    portuguese-tax/SKILL.md
    saas-pricing/SKILL.md
    seguranca-social/SKILL.md
  instructions/
    marketing.instructions.md
    ...
Marketing/
  social-media-sop.md
  social-media-strategy-2026.md
  drafts/
    week-2026-W09.md
    ideas.md
    ...
BOARD.md                     # Sprint board (COO-maintained)
Setas/
  Atividade.md               # Fiscal framework
  INSTRUCTIONS.md            # Operational manual
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;copilot-instructions.md&lt;/code&gt; file is loaded into every Copilot interaction. It defines the company identity, agent system, memory protocols, communication rules, and product registry. It's the constitution of the virtual company.&lt;/p&gt;

&lt;p&gt;Skills are reusable knowledge modules — &lt;code&gt;portuguese-tax/SKILL.md&lt;/code&gt; contains complete IVA scenarios, IRS regime rules, invoice requirements, and deadline calendars. The Accountant agent loads this skill automatically when handling tax questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;If I were starting fresh:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with 3 agents, not 8&lt;/strong&gt; — COO, Marketing, and Accountant cover 80% of the value. Add specialists when the workload justifies them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invest in memory early&lt;/strong&gt; — the knowledge graph is the most valuable part. It compounds over time. I wish I'd been more disciplined about what gets stored from day one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test agent outputs against each other&lt;/strong&gt; — the inter-agent review protocol was added after hallucinations caused problems. Build it in from the start.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;I'm not claiming AI agents replace human teams. They don't. What they do is let a solo founder operate with the &lt;em&gt;structure&lt;/em&gt; of a team — defined roles, communication protocols, institutional memory, and systematic improvement.&lt;/p&gt;

&lt;p&gt;The alternative was either hiring people I can't afford or continuing to drop balls. This gives me a middle path: structured execution with human judgment at the critical points.&lt;/p&gt;

&lt;p&gt;The system cost: €0 (GitHub Copilot is included in my existing subscription). The time to build: maybe 40 hours total over 2 months. The ongoing maintenance: the Improver handles most of it.&lt;/p&gt;

&lt;p&gt;If you're a solo founder drowning in operational overhead, this might be worth trying. Not because AI agents are magic — but because the &lt;em&gt;structure&lt;/em&gt; they enforce is valuable even when the agents themselves are imperfect.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm João, a solo developer from Portugal building SaaS products with Elixir. I write about the real experience of building in public — the numbers, the mistakes, and the weird experiments like this one. Follow me on &lt;a href="https://dev.to/joaosetas"&gt;dev.to&lt;/a&gt; or &lt;a href="https://x.com/joaosetas" rel="noopener noreferrer"&gt;X (@joaosetas)&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>startup</category>
    </item>
    <item>
      <title>What OpenClaw Actually Is — and Why Running Claws Needs a Cloud</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Sun, 22 Feb 2026 14:26:15 +0000</pubDate>
      <link>https://dev.to/setas/what-openclaw-actually-is-and-why-running-claws-needs-a-cloud-3e34</link>
      <guid>https://dev.to/setas/what-openclaw-actually-is-and-why-running-claws-needs-a-cloud-3e34</guid>
      <description>&lt;h1&gt;
  
  
  What OpenClaw Actually Is — and Why Running Claws Needs a Cloud
&lt;/h1&gt;

&lt;p&gt;I've seen the same question in three separate HN threads this week:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I don't even know what OpenClaw is."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Fair enough. The name has been everywhere — Karpathy coined "Claws" as a category, Peter Steinberger joined OpenAI to scale the framework, and suddenly there are 1,400+ comments on HN about it. But the signal-to-noise ratio is terrible. Half the discussion is people explaining it to each other incorrectly.&lt;/p&gt;

&lt;p&gt;So here's a straightforward breakdown from someone who's been building infrastructure for this exact category of software.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw: the framework
&lt;/h2&gt;

&lt;p&gt;OpenClaw is an open-source framework for building autonomous AI agents that can use computers. Not chatbots. Not autocomplete. Agents that open terminals, write files, make API calls, browse the web, and chain complex multi-step tasks together.&lt;/p&gt;

&lt;p&gt;Think of it as the orchestration layer. You point it at a task — "refactor this module," "set up monitoring for this service," "research these 10 competitors and write a summary" — and it breaks that into subtasks, executes them, handles errors, and reports back.&lt;/p&gt;

&lt;p&gt;The framework handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool use&lt;/strong&gt; — file I/O, shell commands, HTTP requests, browser automation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt; — persistent context across sessions so the agent remembers what it did yesterday&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent coordination&lt;/strong&gt; — multiple specialized agents collaborating on a task (one writes code, another reviews it, a third deploys it)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Gateway Protocol&lt;/strong&gt; — a standardized way for agents to discover and call external tools and services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It started as Claude Code's backbone, but now it's a foundation project. Model-agnostic in theory, though Claude still runs it best in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claws: the category
&lt;/h2&gt;

&lt;p&gt;Karpathy's contribution was naming the &lt;em&gt;category&lt;/em&gt;, not the framework. A "Claw" is any autonomous computer-using agent — whether it's built on OpenClaw, LangChain, CrewAI, or duct tape and bash scripts.&lt;/p&gt;

&lt;p&gt;His NanoClaw demo — ~4,000 lines of Python, fully auditable — showed that you don't need a massive framework to build one. And he's right. For tinkering, for learning, for running a single agent on your laptop while you watch it work, self-hosting is great.&lt;/p&gt;

&lt;p&gt;But here's where the conversation keeps going off the rails.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "I'll just run it on my Mac Mini" problem
&lt;/h2&gt;

&lt;p&gt;Every other comment in these threads is some variation of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Why would I need cloud hosting? I'll just run OpenClaw on my home server."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And for a Saturday afternoon project? Sure. But the moment you want agents running 24/7 — doing real work, handling real data, costing real money in API calls — you run into the same infrastructure problems that every production service hits:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Uptime is not optional&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your Mac Mini reboots for updates. Your ISP has an outage. Your cat steps on the power strip. Meanwhile, your agent was mid-task on a 45-minute code refactor and just lost all its context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Supervision is hard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What happens when an agent enters an infinite loop and burns through $200 in API tokens in 30 minutes? (This happened to someone just this week — I saw it on X.) You need something watching the watcher. Circuit breakers, spend caps, automatic restarts with state recovery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Multi-agent coordination requires real infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Running one agent is a process. Running five agents that talk to each other, share memory, and coordinate tasks? That's a distributed system. You need process isolation, message passing, shared state management, and failure recovery. On your laptop, one crashed agent takes down the others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Memory management is a bottleneck&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents have limited context windows. When they compact, they lose information. Persistent memory across sessions — what the agent learned yesterday affecting what it does today — requires a proper storage layer, not a JSON file on disk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Cost observability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you're burning $0.02/request across 500 requests/day across 3 agents, you need dashboards. You need per-agent cost tracking. You need alerts before you get a $400 surprise on your Anthropic bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I'm building cloud hosting for Claws
&lt;/h2&gt;

&lt;p&gt;I run 5 Elixir apps on Fly.io for under €50/month. The reason the economics work is the BEAM virtual machine — Erlang's runtime, which Elixir runs on.&lt;/p&gt;

&lt;p&gt;OTP supervision trees are basically purpose-built for the exact problems Claws have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Each agent gets its own supervised process&lt;/span&gt;
&lt;span class="n"&gt;children&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;AgentSupervisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;agent_id:&lt;/span&gt; &lt;span class="s2"&gt;"code-reviewer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model:&lt;/span&gt; &lt;span class="ss"&gt;:claude_sonnet&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;AgentSupervisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;agent_id:&lt;/span&gt; &lt;span class="s2"&gt;"deploy-bot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;model:&lt;/span&gt; &lt;span class="ss"&gt;:claude_haiku&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;persistence:&lt;/span&gt; &lt;span class="ss"&gt;:postgres&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;SpendTracker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;budget_cents:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;alert_at:&lt;/span&gt; &lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="no"&gt;Supervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;strategy:&lt;/span&gt; &lt;span class="ss"&gt;:one_for_one&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If an agent crashes, the supervisor restarts it. If a supervisor crashes, its parent restarts &lt;em&gt;it&lt;/em&gt;. State is recovered from persistent storage. No systemd, no cron hacks, no Docker restart policies — just the runtime doing what it was designed for 30 years ago.&lt;/p&gt;

&lt;p&gt;This is what I'm building with OpenClawCloud. Multi-tenant Claw hosting on BEAM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process-per-agent isolation&lt;/strong&gt; — one tenant's runaway agent can't affect another's&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic supervision&lt;/strong&gt; — crash recovery with state persistence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spend tracking&lt;/strong&gt; — per-agent API cost monitoring with configurable caps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway Protocol support&lt;/strong&gt; — agents discover and use external tools through the standard protocol&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent memory&lt;/strong&gt; — agent context survives restarts, compactions, even deployments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's early. Really early. I have one paying subscriber and a handful of free users. But the timing feels right — people are building Claws faster than the infrastructure to run them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The self-hosting vs. managed hosting tradeoff
&lt;/h2&gt;

&lt;p&gt;I'm not going to pretend self-hosting is wrong. Karpathy's point about NanoClaw being ~4,000 lines of auditable code is a genuine trust argument. You can see exactly what it does. That matters.&lt;/p&gt;

&lt;p&gt;But it's the same tradeoff web developers made 15 years ago. You &lt;em&gt;can&lt;/em&gt; run your Rails app on a VPS you manage yourself. You &lt;em&gt;can&lt;/em&gt; handle your own backups, SSL certs, log aggregation, and 3 AM pages. Most people eventually decide they'd rather pay someone to handle that so they can focus on what the agent actually &lt;em&gt;does&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The infrastructure layer for Claws is going to be commoditized within a year. The question is whether you want to build it yourself or use something that already exists.&lt;/p&gt;

&lt;p&gt;I'm biased, obviously. But even if you never use OpenClawCloud: if you're running agents in production, please don't run them on your Mac Mini. At minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up spend caps on your API provider&lt;/li&gt;
&lt;li&gt;Use process supervision (systemd at minimum, OTP if you're lucky enough to be on BEAM)&lt;/li&gt;
&lt;li&gt;Persist agent state externally (not in-memory, not local JSON)&lt;/li&gt;
&lt;li&gt;Monitor costs per agent, not just in aggregate&lt;/li&gt;
&lt;li&gt;Have a kill switch&lt;/li&gt;
&lt;/ul&gt;
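
&lt;p&gt;To make the spend-cap and kill-switch items concrete, here's a minimal Elixir sketch (names are illustrative, not OpenClawCloud's actual API): a process that tracks cost per agent and rejects further calls once the budget is gone:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule SpendCap do
  use GenServer

  def start_link(budget_cents),
    do: GenServer.start_link(__MODULE__, budget_cents, name: __MODULE__)

  # Call this before every LLM API request; refuse to proceed on :error.
  def charge(agent_id, cents), do: GenServer.call(__MODULE__, {:charge, agent_id, cents})

  @impl true
  def init(budget), do: {:ok, %{budget: budget, spent: %{}}}

  @impl true
  def handle_call({:charge, agent, cents}, _from, state) do
    spent = Map.update(state.spent, agent, cents, &amp;(&amp;1 + cents))
    total = spent |&gt; Map.values() |&gt; Enum.sum()

    if total &gt; state.budget do
      # Kill switch: the request is rejected, per-agent totals stay intact.
      {:reply, {:error, :budget_exhausted}, state}
    else
      {:reply, :ok, %{state | spent: spent}}
    end
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A production version would persist the counters and alert before the cap is hit, but the charge-then-reject shape is the whole idea.&lt;/p&gt;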

&lt;p&gt;The Claws wave is real. The infrastructure to support it is still being built. That's the part I find interesting.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm João, a solo developer from Braga, Portugal. I build SaaS products with Elixir and ship them on Fly.io. OpenClawCloud is at &lt;a href="https://clawdcloud.net" rel="noopener noreferrer"&gt;clawdcloud.net&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>cloud</category>
      <category>opensource</category>
    </item>
    <item>
      <title>What Are "Claws"? And Why You Shouldn't Run Them on Your Mac Mini</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Sat, 21 Feb 2026 12:05:18 +0000</pubDate>
      <link>https://dev.to/setas/what-are-claws-and-why-you-shouldnt-run-them-on-your-mac-mini-4o1b</link>
      <guid>https://dev.to/setas/what-are-claws-and-why-you-shouldnt-run-them-on-your-mac-mini-4o1b</guid>
      <description>&lt;h1&gt;
  
  
  What Are "Claws"? And Why You Shouldn't Run Them on Your Mac Mini
&lt;/h1&gt;

&lt;p&gt;Andrej Karpathy just posted a mini-essay about buying a Mac Mini to tinker with what he calls "Claws" — persistent AI agent systems that sit on top of LLMs. He names OpenClaw, NanoClaw, zeroclaw, ironclaw, picoclaw. &lt;a href="https://simonwillison.net/2026/Feb/21/claws/" rel="noopener noreferrer"&gt;Simon Willison calls&lt;/a&gt; "Claw" a term of art for the entire category.&lt;/p&gt;

&lt;p&gt;When Karpathy names something, it sticks. He coined "vibe coding." This is the same energy.&lt;/p&gt;

&lt;p&gt;Here's his definition:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls and a kind of persistence to a next level."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I've been building managed infrastructure for exactly this category. Let me break down what Claws are, why running them on a Mac Mini has real tradeoffs, and what the alternative looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes a Claw Different from an Agent?
&lt;/h2&gt;

&lt;p&gt;Regular LLM agents run, do a thing, and stop. You prompt them, they respond, maybe they call a tool, done.&lt;/p&gt;

&lt;p&gt;Claws are persistent. They:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run continuously on hardware or a server&lt;/li&gt;
&lt;li&gt;Have their own scheduling — they do things without you asking&lt;/li&gt;
&lt;li&gt;Maintain context across sessions and conversations&lt;/li&gt;
&lt;li&gt;Communicate via messaging protocols (MCP, etc.)&lt;/li&gt;
&lt;li&gt;Orchestrate multiple agents with tool access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as the difference between running a script and running a service. A Claw is a service. It's always on, always watching, always ready to act.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mac Mini Angle
&lt;/h2&gt;

&lt;p&gt;Karpathy bought a Mac Mini specifically to run Claws. The Apple Store told him they're "selling like hotcakes and everyone is confused." Makes sense — decent hardware, small form factor, runs 24/7 at home.&lt;/p&gt;

&lt;p&gt;But here's where I have some thoughts, as someone who's been running persistent Elixir services for a while now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Self-Hosting Pain List
&lt;/h2&gt;

&lt;p&gt;I love self-hosting. I really do. But running a persistent AI agent system on a box under your desk means you're now responsible for:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Uptime.&lt;/strong&gt; Your Claw goes down when your power goes out, when your ISP hiccups, when macOS decides it needs to update at 3am. The whole point of a Claw is that it's always on. "Always on except when it isn't" is a rough spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Networking.&lt;/strong&gt; Your Claw needs to talk to the internet — receive webhooks, call APIs, expose endpoints. That means port forwarding, dynamic DNS, TLS certificates, and hoping your router cooperates. Your ISP probably gives you a dynamic IP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security.&lt;/strong&gt; You're running an AI agent with tool access on your home network. It can execute code, make API calls, access file systems. One misconfigured permission and your Claw can see everything on your LAN.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Updates and maintenance.&lt;/strong&gt; The Claw ecosystem is evolving fast. OpenClaw pushes updates regularly. You need to manage versions, handle breaking changes, keep dependencies current. On a personal Mac Mini, that's manual work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Process supervision.&lt;/strong&gt; What happens when a Claw process crashes? On a Mac Mini, it just... dies. You need to build your own restart logic, health checks, and monitoring. This is a solved problem in production infrastructure, but not on your desktop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling.&lt;/strong&gt; Today you run one Claw. Tomorrow you want three. Next month you want one per project. A Mac Mini has finite resources and no way to scale horizontally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Managed Hosting Makes Sense
&lt;/h2&gt;

&lt;p&gt;I'm building &lt;a href="https://clawcloud.net" rel="noopener noreferrer"&gt;OpenClawCloud&lt;/a&gt; because I think this pain list is going to hit most people who try to run Claws seriously.&lt;/p&gt;

&lt;p&gt;The architecture is built on Elixir and runs on Fly.io. Here's why that matters for Claws specifically:&lt;/p&gt;

&lt;h3&gt;
  
  
  Supervision Trees
&lt;/h3&gt;

&lt;p&gt;Elixir's OTP supervision is designed exactly for this — long-running processes that need to stay alive. If a Claw process crashes, the supervisor restarts it automatically. No cron jobs, no systemd hacking, no Docker restart policies. It's built into the runtime.&lt;/p&gt;
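
&lt;p&gt;A minimal, self-contained example (&lt;code&gt;ClawRunner&lt;/code&gt; is a placeholder for whatever module actually runs the Claw):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule ClawRunner do
  use GenServer
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  @impl true
  def init(opts), do: {:ok, opts}
end

# If the runner crashes, its supervisor restarts it. No external tooling.
{:ok, _sup} =
  Supervisor.start_link([{ClawRunner, []}],
    strategy: :one_for_one,
    max_restarts: 5,    # tolerate up to 5 crashes...
    max_seconds: 30     # ...within any 30-second window, then escalate
  )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;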

&lt;h3&gt;
  
  
  Process Isolation
&lt;/h3&gt;

&lt;p&gt;Each tenant's Claw runs in its own isolated process. One Claw crashing doesn't take down another. The BEAM VM was literally built for this — telecom-grade reliability for concurrent, independent processes. Ericsson designed it in the '80s to keep phone switches running. Turns out that's exactly what persistent AI agents need too.&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in Scheduling
&lt;/h3&gt;

&lt;p&gt;Claws need to do things on their own schedule. Elixir has &lt;code&gt;Process.send_after&lt;/code&gt;, &lt;code&gt;GenServer&lt;/code&gt; timers, and libraries like Oban for persistent job scheduling. No external cron needed. The agent's scheduler lives in the same runtime as the agent itself.&lt;/p&gt;
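
&lt;p&gt;A sketch of that pattern (the module name is illustrative): a &lt;code&gt;GenServer&lt;/code&gt; that re-arms its own timer with &lt;code&gt;Process.send_after/3&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule TickAgent do
  use GenServer

  @interval :timer.minutes(5)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    # Arm the first timer; the process wakes itself up from here on.
    Process.send_after(self(), :tick, @interval)
    {:ok, %{}}
  end

  @impl true
  def handle_info(:tick, state) do
    # Do the periodic work here, then re-arm the timer.
    Process.send_after(self(), :tick, @interval)
    {:noreply, state}
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For schedules that must survive restarts, a persistent job library like Oban takes over from there.&lt;/p&gt;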

&lt;h3&gt;
  
  
  Economics That Work
&lt;/h3&gt;

&lt;p&gt;I run 5 Elixir apps on Fly.io for under €50/month total. The infrastructure is efficient enough that hosting multiple Claws per machine is practical without burning through a credit card.&lt;/p&gt;

&lt;h2&gt;
  
  
  The State of the Ecosystem
&lt;/h2&gt;

&lt;p&gt;Karpathy mentions several projects, each taking a different approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt; — the full-featured option, though Karpathy himself admits he's "a bit sus'd" about running it directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NanoClaw&lt;/strong&gt; — ~4,000 lines of core code. Karpathy likes that it "fits into both my head and that of AI agents" — auditable and minimal, runs in containers by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;zeroclaw, ironclaw, picoclaw&lt;/strong&gt; — variations on the theme with different tradeoffs around size, security, and features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ecosystem hasn't consolidated yet. But the pattern is clear: people want persistent, tool-enabled AI agent systems that run autonomously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Is Going
&lt;/h2&gt;

&lt;p&gt;Karpathy naming this category matters. "Vibe coding" went from a tweet to a conference talk title in weeks. "Claws" as a term of art is going to follow the same trajectory. Simon Willison is already using it. It even comes with an established emoji: 🦞&lt;/p&gt;

&lt;p&gt;The interesting question isn't whether Claws are real — they obviously are. It's whether the infrastructure catches up. Right now, the default path is "buy hardware and figure it out yourself." That works for tinkering. For production use — agents managing your calendar, monitoring your infrastructure, handling customer requests — you need something more robust.&lt;/p&gt;

&lt;p&gt;That's the gap I'm building &lt;a href="https://clawcloud.net" rel="noopener noreferrer"&gt;OpenClawCloud&lt;/a&gt; to fill. You bring your Claw config, I handle deployment, uptime, and process supervision. No Mac Mini required.&lt;/p&gt;

&lt;p&gt;I'm a solo founder building this in Elixir from Braga, Portugal. It's early days, but the foundation is solid — and today, thanks to Karpathy, the category has a name.&lt;/p&gt;

&lt;p&gt;If you're running Claws on a Mac Mini and loving it? Respect. That's how all good infrastructure starts — someone tinkering at home until they need more.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm &lt;a href="https://x.com/joaosetas" rel="noopener noreferrer"&gt;@joaosetas&lt;/a&gt; on X. Building OpenClawCloud and other Elixir SaaS products in public.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I Run 5 Elixir Apps on Fly.io for Under €50/Month — Here's the Breakdown</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Wed, 18 Feb 2026 07:11:01 +0000</pubDate>
      <link>https://dev.to/setas/i-run-5-elixir-apps-on-flyio-for-under-eu50month-heres-the-breakdown-3dl2</link>
      <guid>https://dev.to/setas/i-run-5-elixir-apps-on-flyio-for-under-eu50month-heres-the-breakdown-3dl2</guid>
      <description>&lt;p&gt;I'm a solo founder running 5 SaaS products. All Elixir/Phoenix. All on Fly.io. Total infrastructure cost: under €50/month.&lt;/p&gt;

&lt;p&gt;Here's exactly how.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Products
&lt;/h2&gt;

&lt;p&gt;I'm building these under AIFirst, my solo software company based in Portugal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SondMe&lt;/strong&gt; — survey platform (Phoenix + LiveView)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Countermark&lt;/strong&gt; — bot detection without CAPTCHAs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClawCloud&lt;/strong&gt; — managed hosting for AI agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vertate&lt;/strong&gt; — verification platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-Inbox&lt;/strong&gt; — AI agent communication interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All built with the same stack: Elixir, Phoenix, LiveView, PostgreSQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Fly.io
&lt;/h2&gt;

&lt;p&gt;I evaluated several hosting options before settling on Fly.io. Here's why it won:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Pay for what you run
&lt;/h3&gt;

&lt;p&gt;No "instance hours" padding. I pay per machine, per second of uptime.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Auto-stop is a game-changer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[http_service]&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My low-traffic apps sleep when nobody's using them. They wake up on the first request — cold start is about 2 seconds for a Phoenix app. Staging environments cost literally nothing.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Elixir is a first-class citizen
&lt;/h3&gt;

&lt;p&gt;Fly.io has official Elixir support. &lt;code&gt;fly launch&lt;/code&gt; detects Phoenix, generates a working &lt;code&gt;Dockerfile&lt;/code&gt;, and handles releases. Libraries like &lt;code&gt;dns_cluster&lt;/code&gt; for distributed Erlang just work out of the box.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Multi-region with zero re-architecture
&lt;/h3&gt;

&lt;p&gt;Start with one region (Amsterdam for me — closest to Portugal). Need US coverage later? One &lt;code&gt;fly scale&lt;/code&gt; command. No infrastructure redesign.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Typical fly.toml
&lt;/h2&gt;

&lt;p&gt;Here's the config I use across most of my apps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"my-app"&lt;/span&gt;
&lt;span class="py"&gt;primary_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ams"&lt;/span&gt;

&lt;span class="nn"&gt;[build]&lt;/span&gt;

&lt;span class="nn"&gt;[env]&lt;/span&gt;
  &lt;span class="py"&gt;PHX_HOST&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"my-app.fly.dev"&lt;/span&gt;
  &lt;span class="py"&gt;PORT&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"8080"&lt;/span&gt;

&lt;span class="nn"&gt;[http_service]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;
  &lt;span class="py"&gt;force_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="nn"&gt;[[vm]]&lt;/span&gt;
  &lt;span class="py"&gt;size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"shared-cpu-1x"&lt;/span&gt;
  &lt;span class="py"&gt;memory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"256mb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;256MB RAM&lt;/strong&gt; — Phoenix is incredibly memory-efficient. A full app with LiveView, PubSub, and Oban (background jobs) runs comfortably in 256MB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;shared-cpu-1x&lt;/strong&gt; — perfect for early-stage apps. Shared CPU is cheap and Phoenix doesn't need much compute for typical web workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;min_machines_running = 1&lt;/strong&gt; for production, &lt;strong&gt;0&lt;/strong&gt; for staging.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cost Breakdown
&lt;/h2&gt;

&lt;p&gt;Here's the actual math:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;shared-cpu-1x machines (256MB)&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;~€32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fly Postgres (single node)&lt;/td&gt;
&lt;td&gt;3 clusters&lt;/td&gt;
&lt;td&gt;~€10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bandwidth&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSL certificates&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Free (auto)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~€42&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Some apps share a Postgres cluster. I started with separate databases for everything, then consolidated. Three clusters handle all five apps comfortably.&lt;/p&gt;

&lt;h3&gt;
  
  
  For comparison
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Heroku&lt;/strong&gt;: 5 apps × $7 Eco dyno = $35, plus databases = $50+ — and Eco dynos sleep unpredictably&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Railway&lt;/strong&gt;: Similar compute pricing, less mature Elixir support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS ECS&lt;/strong&gt;: Minimum viable setup easily $80+/month after load balancers, NAT gateways, and CloudWatch&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Makes Elixir Special Here
&lt;/h2&gt;

&lt;p&gt;This setup works because the BEAM VM is unusually efficient for web applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory&lt;/strong&gt;: A Phoenix app with LiveView, background jobs, and real-time PubSub runs in 256MB. A comparable Node.js/Express setup with Socket.io and Bull queues needs 512MB+ to breathe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrency&lt;/strong&gt;: Each request gets a lightweight BEAM process (~2KB of memory). I can handle thousands of concurrent WebSocket connections on a single 256MB machine.&lt;/p&gt;
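
&lt;p&gt;You can verify the footprint yourself from an IEx shell. A quick sketch (exact numbers vary by OTP version and architecture):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# Spawn an idle process and ask the VM how much memory it holds
pid = spawn(fn -&amp;gt; Process.sleep(:infinity) end)
Process.info(pid, :memory)
# returns {:memory, bytes} where bytes is typically a few KB,
# which is why thousands of processes fit in one 256MB machine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;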

&lt;p&gt;&lt;strong&gt;Resilience&lt;/strong&gt;: OTP supervision trees restart crashed processes in milliseconds. No health check polling. No container-level restarts. If a GenServer handling webhook processing dies, its supervisor brings it back before anyone notices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Share databases early
&lt;/h3&gt;

&lt;p&gt;I wasted money running separate Postgres instances for every app. Most early-stage apps can share a database cluster with schema-level isolation. This cut my database costs by more than half.&lt;/p&gt;
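
&lt;p&gt;With Postgres, schema-level isolation is one &lt;code&gt;CREATE SCHEMA&lt;/code&gt; per app, and Ecto can target a schema through the &lt;code&gt;:prefix&lt;/code&gt; option. A minimal sketch (the schema name and &lt;code&gt;User&lt;/code&gt; struct are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# One-time setup: a dedicated schema for each app in the shared cluster
Repo.query!("CREATE SCHEMA IF NOT EXISTS app_one")

# Repo operations accept :prefix to point at that schema
Repo.all(User, prefix: "app_one")
Repo.insert!(%User{email: "hello@example.com"}, prefix: "app_one")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;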

&lt;h3&gt;
  
  
  2. Use auto-stop from day one
&lt;/h3&gt;

&lt;p&gt;I ran machines 24/7 for a while before enabling auto-stop. For apps with fewer than 100 daily active users, there's no good reason to keep machines hot at night.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Set up internal networking
&lt;/h3&gt;

&lt;p&gt;Fly gives you a private WireGuard network between all your apps for free. I use it for internal API calls between services — no public internet, no extra latency, no auth overhead for service-to-service communication.&lt;/p&gt;
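
&lt;p&gt;In Elixir, calling another app over the private network is just an HTTP request to its &lt;code&gt;.internal&lt;/code&gt; hostname. A sketch assuming the Req client (the app name is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# "billing-api" resolves only inside Fly's private IPv6 network.
# Depending on your HTTP client, you may need to enable IPv6
# in its transport options.
Req.get!("http://billing-api.internal:8080/api/status")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;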

&lt;h3&gt;
  
  
  4. Monitor memory, not CPU
&lt;/h3&gt;

&lt;p&gt;For BEAM apps, memory is the constraining resource, not CPU. Set up alerts for when an app consistently uses &amp;gt;200MB of its 256MB allocation. That's your signal to bump to 512MB.&lt;/p&gt;
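
&lt;p&gt;When an alert fires, it helps to know which processes are eating the allocation. A sketch you can run from a remote IEx session:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# Total BEAM memory in MB
div(:erlang.memory(:total), 1024 * 1024)

# Top five processes by memory, to spot a leaking GenServer
Process.list()
|&amp;gt; Enum.map(fn pid -&amp;gt; {pid, Process.info(pid, :memory)} end)
|&amp;gt; Enum.reject(fn {_pid, info} -&amp;gt; is_nil(info) end)
|&amp;gt; Enum.sort_by(fn {_pid, {:memory, bytes}} -&amp;gt; -bytes end)
|&amp;gt; Enum.take(5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;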

&lt;h2&gt;
  
  
  The Solo Founder Advantage
&lt;/h2&gt;

&lt;p&gt;The real point isn't the €42. It's the flexibility.&lt;/p&gt;

&lt;p&gt;When I want to test a new product idea, I spin up a new app on Fly.io. Added cost: about €8/month. If the idea doesn't work after a month, I &lt;code&gt;fly apps destroy&lt;/code&gt; it and my bill drops right back down.&lt;/p&gt;

&lt;p&gt;There's no minimum infrastructure investment keeping bad ideas alive. No long-term cloud commitments. No Kubernetes cluster that costs the same whether I run 1 app or 10.&lt;/p&gt;

&lt;p&gt;Elixir + Fly.io lets infrastructure costs scale linearly — and down just as easily as up. For a solo founder bootstrapping multiple products with zero funding, that's everything.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm João, a solo developer building SaaS products from Portugal. I write about Elixir, infrastructure, and the solo founder journey. Follow for more.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>devops</category>
      <category>saas</category>
      <category>startup</category>
    </item>
    <item>
      <title>Why I Chose Elixir Over Go and Rust for My Cloud Platform</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Mon, 16 Feb 2026 14:24:50 +0000</pubDate>
      <link>https://dev.to/setas/why-i-chose-elixir-over-go-and-rust-for-my-cloud-platform-2058</link>
      <guid>https://dev.to/setas/why-i-chose-elixir-over-go-and-rust-for-my-cloud-platform-2058</guid>
      <description>&lt;p&gt;I'm building &lt;a href="https://clawdcloud.net" rel="noopener noreferrer"&gt;OpenClaw Cloud&lt;/a&gt;, a managed platform where each user gets their own personal AI assistant running 24/7 in the cloud. When I started, I had a real decision to make about the core technology.&lt;/p&gt;

&lt;p&gt;Go and Rust were serious contenders. Both are fast, well-supported, and have massive ecosystems. I ended up choosing Elixir — not because it's "better" in some absolute sense, but because it was the right fit for this specific problem. Here's the full reasoning, with honest tradeoffs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Managing Hundreds of Long-Lived Processes
&lt;/h2&gt;

&lt;p&gt;OpenClaw Cloud manages one dedicated bot instance per user. Each instance is a long-running process that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintains persistent WebSocket connections to chat platforms (Discord, Telegram, WhatsApp, Slack)&lt;/li&gt;
&lt;li&gt;Holds conversation state and context in memory&lt;/li&gt;
&lt;li&gt;Handles concurrent messages from multiple channels simultaneously&lt;/li&gt;
&lt;li&gt;Needs to be started, stopped, restarted, and monitored independently&lt;/li&gt;
&lt;li&gt;Must recover gracefully from crashes without affecting other users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't a typical request/response web app. It's a &lt;strong&gt;process orchestration&lt;/strong&gt; problem — hundreds of stateful, concurrent, long-lived workers that need supervision and lifecycle management.&lt;/p&gt;

&lt;p&gt;That framing is what drove the decision.&lt;/p&gt;




&lt;h2&gt;
  
  
  Concurrency Models: Three Very Different Approaches
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Go: Goroutines and Channels
&lt;/h3&gt;

&lt;p&gt;Go's concurrency model is elegant. Goroutines are cheap (a few KB of stack), and channels provide a clean way to communicate between them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Spinning up a worker per user in Go&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;bot&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewBotInstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// blocks, handles reconnection internally&lt;/span&gt;
    &lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is simple and works. Go would have been a perfectly fine choice for the raw concurrency part. The goroutine scheduler handles thousands of concurrent workers without breaking a sweat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it gets complicated&lt;/strong&gt;: Go doesn't have a built-in answer for &lt;em&gt;what happens when a goroutine crashes&lt;/em&gt;. You need to build your own supervisor logic — retry loops, health checks, graceful restarts. It's doable, but it's DIY. Every team ends up writing a slightly different version of process supervision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rust: Async with Tokio
&lt;/h3&gt;

&lt;p&gt;Rust with Tokio gives you async/await over a multi-threaded runtime. The performance is outstanding — near-zero overhead async I/O.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Spawning tasks with Tokio&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;bot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;BotInstance&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="nf"&gt;.run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// handles connections&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rust's async model is powerful, and you get memory safety guarantees at compile time. But the ownership model adds real friction when you're managing shared state across many concurrent tasks. You end up with &lt;code&gt;Arc&amp;lt;Mutex&amp;lt;T&amp;gt;&amp;gt;&lt;/code&gt; everywhere, lifetime annotations, and the borrow checker fighting you every time you pass context between tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest truth&lt;/strong&gt;: For a solo developer iterating fast on a product, Rust's compile-time overhead (both in build times and cognitive load) is significant. I love Rust for systems programming, but for a SaaS product where features change weekly, it slowed me down.&lt;/p&gt;

&lt;h3&gt;
  
  
  Elixir: Processes and OTP
&lt;/h3&gt;

&lt;p&gt;Elixir runs on the BEAM virtual machine, which was designed from the ground up for this exact problem — massive concurrency with isolated, lightweight processes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Each bot is a GenServer — a managed, supervised process&lt;/span&gt;
&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;InstanceWorker&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;GenServer&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(%{&lt;/span&gt;&lt;span class="ss"&gt;user_id:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;GenServer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;name:&lt;/span&gt; &lt;span class="n"&gt;via_tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="c1"&gt;# Connect to chat platforms, set up state&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;user_id:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;connections:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="ss"&gt;status:&lt;/span&gt; &lt;span class="ss"&gt;:starting&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;handle_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:health_check&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="c1"&gt;# Periodic self-check — reconnect if needed&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:noreply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maybe_reconnect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BEAM processes are &lt;em&gt;extremely&lt;/em&gt; lightweight (~2KB each), fully isolated (no shared memory), and communicate via message passing. But the key differentiator isn't just the process model — it's everything built on top of it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Supervision Trees: The Killer Feature
&lt;/h2&gt;

&lt;p&gt;This is where Elixir pulled decisively ahead for my use case.&lt;/p&gt;

&lt;p&gt;In OTP, every process lives inside a &lt;strong&gt;supervision tree&lt;/strong&gt;. Supervisors are processes that watch child processes and apply a restart strategy when things go wrong.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;InstanceSupervisor&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Horde&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;DynamicSupervisor&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;start_instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;child_spec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;InstanceWorker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;user_id:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;config:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="no"&gt;Horde&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;DynamicSupervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_child&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;child_spec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;stop_instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="no"&gt;Registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;InstanceRegistry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Horde&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;DynamicSupervisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;terminate_child&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:not_found&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a bot instance crashes — maybe Discord's API returns an unexpected response, or a chat message triggers an unhandled edge case — the supervisor restarts it automatically. The other 200 bot instances running on the same node are completely unaffected because processes share no memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In Go&lt;/strong&gt;, I'd have to build all of this manually: a registry of running goroutines, health check loops, restart logic, graceful shutdown coordination. It's probably 1,000+ lines of infrastructure code that Elixir gives me for free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In Rust&lt;/strong&gt;, the situation is similar. Tokio has &lt;code&gt;JoinHandle&lt;/code&gt; for tracking spawned tasks, but building a full supervision tree with restart strategies, escalation policies, and distributed process registries is a major engineering effort.&lt;/p&gt;

&lt;p&gt;The OTP supervision model isn't just convenient — it changes how you think about failure. Instead of defensive programming ("catch every possible error"), you write the happy path and let the supervisor handle the rest. &lt;strong&gt;Let it crash&lt;/strong&gt; is a real philosophy, and it works remarkably well for managing many independent, failure-prone processes.&lt;/p&gt;
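
&lt;p&gt;The restart policy itself is just a couple of options on the supervisor. A sketch with illustrative values (not the actual OpenClaw configuration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;children = [
  {Openclaw.InstanceWorker, %{user_id: 42}}
]

# :one_for_one restarts only the crashed child; if a child crashes
# more than 3 times in 5 seconds, the supervisor gives up and
# escalates the failure to its own supervisor
Supervisor.start_link(children,
  strategy: :one_for_one,
  max_restarts: 3,
  max_seconds: 5
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;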




&lt;h2&gt;
  
  
  Hot Code Reloading: Zero-Downtime Deployments
&lt;/h2&gt;

&lt;p&gt;BEAM supports hot code swapping — you can deploy new code to a running system without restarting processes or dropping connections.&lt;/p&gt;

&lt;p&gt;For a platform where users have 24/7 always-on bot instances, this is huge. When I push an update to the platform code, I don't have to restart everyone's bot. The running processes can be updated in place, maintaining their state and connections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In production, Fly.io rolling deploys + BEAM hot code loading&lt;/span&gt;
&lt;span class="c1"&gt;# means existing connections stay alive during deployments&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, I use Fly.io's rolling deployments, which handle most of this, but the BEAM's ability to maintain state across code changes is an additional safety net that neither Go nor Rust can match at the VM level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go&lt;/strong&gt; requires a full process restart for any code change. You can do rolling restarts behind a load balancer, but every goroutine's state is lost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rust&lt;/strong&gt; requires recompilation and restart. The compile step alone takes minutes for a non-trivial project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-Time UI: Phoenix LiveView
&lt;/h2&gt;

&lt;p&gt;This isn't strictly a language comparison, but the web framework was part of the decision. Phoenix LiveView lets me build real-time, interactive UIs without writing JavaScript.&lt;/p&gt;

&lt;p&gt;The OpenClaw Cloud dashboard shows each user their bot's status, logs, and controls — all updating in real-time via WebSockets. When a bot instance starts, crashes, or reconnects, the UI reflects it instantly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# LiveView receives real-time updates via PubSub&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;handle_info&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="ss"&gt;:instance_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="ss"&gt;status:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:noreply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:instance_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Building this in Go would mean a separate frontend (React, Vue, etc.) plus a WebSocket layer plus state synchronization logic. In Rust, same story — probably even more boilerplate with something like Axum + a JS frontend.&lt;/p&gt;

&lt;p&gt;LiveView collapses the frontend and backend into one coherent model. For a solo developer, that's a 2-3x productivity multiplier.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Go and Rust Win (Honestly)
&lt;/h2&gt;

&lt;p&gt;I'd be doing a disservice if I didn't acknowledge where Go and Rust genuinely outperform Elixir:&lt;/p&gt;

&lt;h3&gt;
  
  
  Go Wins
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Raw throughput for CPU-bound work&lt;/strong&gt;: Go compiles to native code. If I were building a platform that needed heavy computation (video processing, ML inference), Go would be faster out of the box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity of deployment&lt;/strong&gt;: Single static binary. No runtime dependency. &lt;code&gt;go build &amp;amp;&amp;amp; scp&lt;/code&gt;. It doesn't get simpler than that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ecosystem breadth&lt;/strong&gt;: Go has libraries for &lt;em&gt;everything&lt;/em&gt;. Cloud SDKs, Kubernetes tooling, CLI tools — the ecosystem is massive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hiring&lt;/strong&gt;: If I were building a team, finding Go developers is much easier than finding Elixir developers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Rust Wins
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance ceiling&lt;/strong&gt;: Rust is as fast as C/C++ with memory safety. For systems-level work, nothing else comes close.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory efficiency&lt;/strong&gt;: Zero-cost abstractions and no garbage collector mean predictable, minimal memory usage. Critical for embedded systems or extremely resource-constrained environments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type system&lt;/strong&gt;: Rust's type system catches entire categories of bugs at compile time. The &lt;code&gt;Result&lt;/code&gt; and &lt;code&gt;Option&lt;/code&gt; types make error handling explicit and exhaustive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebAssembly&lt;/strong&gt;: Rust has the best WASM story. If I needed client-side compiled code, Rust would be my first choice.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Elixir's Weaknesses
&lt;/h3&gt;

&lt;p&gt;Let me be upfront about the tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Raw CPU performance&lt;/strong&gt;: The BEAM is not fast for computation. It's optimized for I/O-bound, concurrent workloads. If I had heavy number-crunching, I'd need to reach for NIFs (native functions) or offload to a separate service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller ecosystem&lt;/strong&gt;: Hex (Elixir's package manager) has ~15,000 packages vs npm's 2M+ or Go's massive standard library. Sometimes you write something from scratch that would be a &lt;code&gt;go get&lt;/code&gt; away in Go.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller talent pool&lt;/strong&gt;: Finding Elixir developers is harder. This matters less for a solo founder but would matter if I were scaling a team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning curve&lt;/strong&gt;: OTP concepts (GenServer, Supervisor, Application) are powerful but take time to internalize. The functional programming paradigm is a shift for developers coming from OOP.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Elixir Won for This Specific Problem
&lt;/h2&gt;

&lt;p&gt;The decision came down to matching the technology to the problem domain:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Best Fit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hundreds of concurrent, long-lived processes&lt;/td&gt;
&lt;td&gt;BEAM (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automatic crash recovery per process&lt;/td&gt;
&lt;td&gt;OTP Supervisors (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time UI without separate frontend&lt;/td&gt;
&lt;td&gt;Phoenix LiveView (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zero-downtime deployments&lt;/td&gt;
&lt;td&gt;BEAM hot code reloading (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distributed process registry&lt;/td&gt;
&lt;td&gt;Horde (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Solo developer productivity&lt;/td&gt;
&lt;td&gt;LiveView + OTP = less code (Elixir)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw computation speed&lt;/td&gt;
&lt;td&gt;Go or Rust&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maximum ecosystem breadth&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory-constrained environments&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a managed cloud platform orchestrating hundreds of stateful, long-running AI bot instances with real-time monitoring — Elixir wasn't just a good fit, it was almost purpose-built for the job.&lt;/p&gt;

&lt;p&gt;The BEAM was originally created by Ericsson in the 1980s to manage millions of concurrent telephone calls with 99.9999999% uptime. Managing a few hundred AI bots is a much simpler version of the same problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack in Practice
&lt;/h2&gt;

&lt;p&gt;For the curious, here's what the full OpenClaw Cloud stack looks like today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Elixir 1.17 / OTP 27&lt;/strong&gt; — core language and runtime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phoenix 1.8&lt;/strong&gt; with &lt;strong&gt;LiveView 1.1&lt;/strong&gt; — web framework and real-time UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horde&lt;/strong&gt; — distributed supervisor and registry for bot instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; via Ecto — data persistence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fly.io&lt;/strong&gt; — hosting (both platform and user instances)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stripe&lt;/strong&gt; — subscription billing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS + DaisyUI&lt;/strong&gt; — styling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentry&lt;/strong&gt; — error monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bandit&lt;/strong&gt; — HTTP server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total monthly infrastructure cost for running the platform: under $50/month on Fly.io.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Technology choices are always contextual. If I were building a CLI tool, I'd pick Go. If I were building a game engine, I'd pick Rust. But for a managed platform that supervises hundreds of concurrent, stateful, long-lived processes with real-time monitoring — Elixir and the BEAM are in a class of their own.&lt;/p&gt;

&lt;p&gt;The "let it crash" philosophy, supervision trees, lightweight processes, and LiveView for real-time UI made me more productive as a solo developer than I would have been in either Go or Rust. And when your competitive advantage is shipping fast with zero budget, productivity is everything.&lt;/p&gt;

&lt;p&gt;If you're evaluating languages for a similar problem — high concurrency, stateful processes, real-time features — give Elixir a serious look. The ecosystem is smaller but the core primitives are extraordinary.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm João, a solo developer from Portugal building &lt;a href="https://clawdcloud.net" rel="noopener noreferrer"&gt;OpenClaw Cloud&lt;/a&gt; and other SaaS products with Elixir. Follow me here or on X &lt;a href="https://x.com/joaosetas" rel="noopener noreferrer"&gt;@joaosetas&lt;/a&gt; for more build-in-public content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>cloud</category>
    </item>
    <item>
      <title>I Built a Managed Cloud Platform for Personal AI Assistants with Elixir</title>
      <dc:creator>João Pedro Silva Setas</dc:creator>
      <pubDate>Thu, 12 Feb 2026 13:09:21 +0000</pubDate>
      <link>https://dev.to/setas/i-built-a-managed-cloud-platform-for-personal-ai-assistants-with-elixir-5e5j</link>
      <guid>https://dev.to/setas/i-built-a-managed-cloud-platform-for-personal-ai-assistants-with-elixir-5e5j</guid>
      <description>&lt;p&gt;I'm João, a solo developer from Portugal, and I just launched &lt;strong&gt;OpenClaw Cloud&lt;/strong&gt; — a managed hosting platform that gives you your own personal AI assistant running 24/7 in the cloud, connected to all your chat apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I've been running my own AI assistant for a while — a bot that connects to WhatsApp, Discord, Telegram, Slack, and more. It's incredibly useful: I delegate tasks, ask questions, get summaries, and it's always available.&lt;/p&gt;

&lt;p&gt;But running it required maintaining a server, handling updates, managing Docker containers, and keeping everything alive. Most people who'd benefit from a personal AI assistant don't want to deal with that.&lt;/p&gt;

&lt;h2&gt;The Solution&lt;/h2&gt;

&lt;p&gt;OpenClaw Cloud takes care of all the infrastructure. You sign up, configure your bot (name, API keys, connected apps), and we deploy a dedicated instance for you on Fly.io. It runs 24/7 — no servers to manage, no Docker to wrestle with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you get:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🦞 Your own isolated AI assistant instance&lt;/li&gt;
&lt;li&gt;💬 Connect WhatsApp, Telegram, Discord, Slack, Signal, iMessage&lt;/li&gt;
&lt;li&gt;☁️ Always on — no need to keep your computer running&lt;/li&gt;
&lt;li&gt;📱 Access and control from any device&lt;/li&gt;
&lt;li&gt;🔄 Automatic updates and maintenance&lt;/li&gt;
&lt;li&gt;🔒 Your own isolated environment — enterprise-grade security&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The Tech Stack&lt;/h2&gt;

&lt;p&gt;This is a dev.to post, so let's talk tech:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Elixir ~1.17 / OTP 27&lt;/strong&gt; with &lt;strong&gt;Phoenix ~1.8&lt;/strong&gt; and &lt;strong&gt;LiveView ~1.1&lt;/strong&gt; — the entire UI is server-rendered with real-time updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; via Ecto for data persistence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horde&lt;/strong&gt; for distributed process supervision — each bot instance is a GenServer managed by a Horde DynamicSupervisor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fly.io&lt;/strong&gt; for hosting — both the platform and user instances run on Fly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; for local development, &lt;strong&gt;Fly Machines&lt;/strong&gt; in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stripe&lt;/strong&gt; for subscriptions via their embedded pricing table&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google OAuth + email/password&lt;/strong&gt; for authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS + DaisyUI&lt;/strong&gt; for the UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bandit&lt;/strong&gt; as the HTTP server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentry&lt;/strong&gt; for error monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Architecture Highlight: Instance Management&lt;/h3&gt;

&lt;p&gt;Each user's bot is a separate process managed by a &lt;code&gt;ContainerBackend&lt;/code&gt; behaviour:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In development, instances run as Docker containers&lt;/span&gt;
&lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Instances&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;ContainerBackend&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Docker&lt;/span&gt;

&lt;span class="c1"&gt;# In production, they're Fly.io machines&lt;/span&gt;
&lt;span class="no"&gt;Openclaw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Instances&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;ContainerBackend&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Fly&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
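
&lt;p&gt;The post doesn't show the behaviour itself, so here is a rough sketch of what such a contract could look like; the callback names and types are my assumptions, not the actual OpenClaw Cloud API:&lt;/p&gt;

```elixir
# Hypothetical sketch of the backend contract. Callback names are
# illustrative; the real OpenClaw Cloud behaviour may differ.
defmodule ContainerBackend do
  @callback create(config :: map()) :: {:ok, instance_id :: String.t()} | {:error, term()}
  @callback start(instance_id :: String.t()) :: :ok | {:error, term()}
  @callback stop(instance_id :: String.t()) :: :ok | {:error, term()}
end

# A stub implementation, e.g. for tests. The Docker and Fly backends
# would implement the same callbacks against their respective APIs.
defmodule StubBackend do
  @behaviour ContainerBackend

  @impl true
  def create(_config), do: {:ok, "instance-1"}

  @impl true
  def start(_instance_id), do: :ok

  @impl true
  def stop(_instance_id), do: :ok
end
```

&lt;p&gt;The environment-specific module can then be read from application config (for example via &lt;code&gt;Application.get_env/3&lt;/code&gt;), so the rest of the code only ever talks to the behaviour.&lt;/p&gt;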



&lt;p&gt;The &lt;code&gt;InstanceWorker&lt;/code&gt; GenServer manages each bot's lifecycle — creation, start, stop, restart — and reports status back to the LiveView dashboard in real time via Horde's distributed registry.&lt;/p&gt;
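
&lt;p&gt;As a single-node illustration of that lifecycle (the real worker registers through Horde, which this sketch omits; the module and function names are my inventions, not the production code):&lt;/p&gt;

```elixir
# Toy lifecycle worker: one GenServer per bot instance, holding the
# instance id and its current status in process state.
defmodule InstanceWorker do
  use GenServer

  def start_link(id), do: GenServer.start_link(__MODULE__, id)
  def status(pid), do: GenServer.call(pid, :status)
  def stop_instance(pid), do: GenServer.call(pid, :stop_instance)

  @impl true
  def init(id), do: {:ok, %{id: id, status: :running}}

  @impl true
  def handle_call(:status, _from, state), do: {:reply, state.status, state}

  def handle_call(:stop_instance, _from, state) do
    # In production this would call the ContainerBackend and broadcast
    # the new status to the LiveView dashboard.
    {:reply, :ok, %{state | status: :stopped}}
  end
end

{:ok, pid} = InstanceWorker.start_link("bot-1")
```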

&lt;h3&gt;Why Elixir?&lt;/h3&gt;

&lt;p&gt;Managing many concurrent, long-lived bot instances is exactly the kind of problem Elixir was built for. Each instance is a lightweight process, supervised and distributed across nodes. OTP's supervision trees mean that if a worker crashes, it gets restarted automatically. And LiveView gives me real-time UI updates without writing a single line of JavaScript.&lt;/p&gt;
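
&lt;p&gt;The restart behavior is easy to demonstrate with a toy supervised process (the module is illustrative, not from the OpenClaw codebase):&lt;/p&gt;

```elixir
# A minimal "let it crash" demo: a named counter process under a
# one_for_one supervisor.
defmodule Counter do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)
  def value, do: GenServer.call(__MODULE__, :value)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

{:ok, _sup} = Supervisor.start_link([Counter], strategy: :one_for_one)

# Kill the worker; the supervisor restarts it without any handling code.
old_pid = Process.whereis(Counter)
Process.exit(old_pid, :kill)
Process.sleep(50)

# A fresh process is now registered under the same name, with its state
# reset by init/1.
new_pid = Process.whereis(Counter)
```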

&lt;h2&gt;Current Status&lt;/h2&gt;

&lt;p&gt;OpenClaw Cloud is in &lt;strong&gt;early access&lt;/strong&gt; with a Hobby plan available. The onboarding flow walks you through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creating your account&lt;/li&gt;
&lt;li&gt;Setting up your Discord bot token and API keys&lt;/li&gt;
&lt;li&gt;Configuring intents and behavior&lt;/li&gt;
&lt;li&gt;Deploying your instance&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's built on top of &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;, an open-source personal AI assistant.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;More chat platform integrations&lt;/li&gt;
&lt;li&gt;Voice features (TTS via ElevenLabs/OpenAI)&lt;/li&gt;
&lt;li&gt;Memory and context persistence&lt;/li&gt;
&lt;li&gt;Custom personality configurations&lt;/li&gt;
&lt;li&gt;Usage analytics dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try It Out&lt;/h2&gt;

&lt;p&gt;Early access is live at &lt;a href="https://clawdcloud.net" rel="noopener noreferrer"&gt;clawdcloud.net&lt;/a&gt;. I'd love feedback from the dev community — what features would make this useful for you?&lt;/p&gt;

&lt;p&gt;If you're interested in the Elixir/Phoenix architecture, happy to go deeper on any part of the stack in future posts.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built solo from Braga, Portugal 🇵🇹&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
