<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Agustin Montoya</title>
    <description>The latest articles on DEV Community by Agustin Montoya (@triqual).</description>
    <link>https://dev.to/triqual</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3750195%2Fd8536ffe-b96c-4fea-9050-13db7172af79.png</url>
      <title>DEV Community: Agustin Montoya</title>
      <link>https://dev.to/triqual</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/triqual"/>
    <language>en</language>
    <item>
      <title>Building Voice Agents That Actually Speak Spanish</title>
      <dc:creator>Agustin Montoya</dc:creator>
      <pubDate>Fri, 27 Feb 2026 23:04:35 +0000</pubDate>
      <link>https://dev.to/triqual/building-voice-agents-that-actually-speak-spanish-2153</link>
      <guid>https://dev.to/triqual/building-voice-agents-that-actually-speak-spanish-2153</guid>
      <description>&lt;p&gt;I have been researching how to build voice agents, especially in Spanish. Here is what I got working so far: it saves client contacts, books and organizes appointments, and can transfer calls to other agents or real people.&lt;/p&gt;

&lt;p&gt;No phone number yet; I'll enable one when someone is actually interested. If you're curious, you can try the web widget at &lt;a href="https://voice.triqual.dev" rel="noopener noreferrer"&gt;voice.triqual.dev&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Anyone else building something similar?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>voiceai</category>
      <category>spanish</category>
      <category>buildinginpublic</category>
    </item>
    <item>
      <title>I Built a Desktop App That Coaches Me During Technical Interviews</title>
      <dc:creator>Agustin Montoya</dc:creator>
      <pubDate>Fri, 27 Feb 2026 23:03:46 +0000</pubDate>
      <link>https://dev.to/triqual/i-built-a-desktop-app-that-coaches-me-during-technical-interviews-4hd3</link>
      <guid>https://dev.to/triqual/i-built-a-desktop-app-that-coaches-me-during-technical-interviews-4hd3</guid>
      <description>&lt;p&gt;I built a thing: a desktop app that sits alongside Zoom or Meet while I conduct technical interviews. It listens live and tells me what to dig into next.&lt;/p&gt;

&lt;p&gt;Not a recording analyzer. It transcribes both sides in real time, flags knowledge gaps, and suggests follow-up questions as the conversation happens. 3 interviews done so far.&lt;/p&gt;

&lt;p&gt;Try it: &lt;a href="https://interview-companion.triqual.dev" rel="noopener noreferrer"&gt;interview-companion.triqual.dev&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anyone else building tooling around their own interview process?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>interview</category>
      <category>buildinginpublic</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Self-Healing Tests: What Happens When Your Test Suite Fixes Itself</title>
      <dc:creator>Agustin Montoya</dc:creator>
      <pubDate>Thu, 26 Feb 2026 17:25:13 +0000</pubDate>
      <link>https://dev.to/triqual/self-healing-tests-what-happens-when-your-test-suite-fixes-itself-ci5</link>
      <guid>https://dev.to/triqual/self-healing-tests-what-happens-when-your-test-suite-fixes-itself-ci5</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most test failures aren't bugs — they're maintenance debt from UI drift&lt;/li&gt;
&lt;li&gt;Self-healing tests detect DOM changes and update selectors automatically&lt;/li&gt;
&lt;li&gt;Triqual QA healed 12/14 broken tests after a login redesign; the 2 that stayed broken had real bugs&lt;/li&gt;
&lt;li&gt;False confidence is the real danger — a "healed" test once hid a billing bug for 3 weeks&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;14 tests broke overnight. Zero were bugs.&lt;/p&gt;

&lt;p&gt;A designer pushed a new login page after hours — new field labels, restructured DOM, shinier buttons. Nothing functional changed. But 14 tests were screaming.&lt;/p&gt;

&lt;p&gt;This is the QA engineer's Groundhog Day. UI changes, tests break, you spend your morning updating selectors instead of finding actual problems. The tests aren't protecting quality. They're creating busywork.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Math Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;You spend 3 hours writing a test. It runs 200 times over 6 months. It catches 1 bug. Total ROI: that one bug divided by all the hours you spent writing + maintaining it.&lt;/p&gt;

&lt;p&gt;Most teams are underwater on this equation. I've seen teams spend 40% of their QA time fixing broken tests, not testing new features. Forty percent.&lt;/p&gt;
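&lt;p&gt;The back-of-napkin version of that equation, using the numbers above plus an assumed maintenance figure (the 5 hours is my guess for illustration, not measured data):&lt;/p&gt;

```python
# Toy test-ROI math from the numbers in this post.
hours_writing = 3          # time to write the test
hours_maintaining = 5      # assumption: selector fixes over the same 6 months
bugs_caught = 1            # real defects it found across 200 runs

cost_per_bug = (hours_writing + hours_maintaining) / bugs_caught
print(f"cost: {cost_per_bug:.0f} engineer-hours per bug caught")  # cost: 8 engineer-hours per bug caught
```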

&lt;p&gt;The worst part? You're not fixing the product. You're fixing the test. The login form works fine. Your &lt;code&gt;data-testid="username-input"&lt;/code&gt; just moved from a &lt;code&gt;div&lt;/code&gt; to a &lt;code&gt;label&lt;/code&gt; wrapper. You're playing whack-a-mole with CSS selectors while real bugs sneak through.&lt;/p&gt;

&lt;p&gt;I got tired of being a selector maintenance bot.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Self-Healing Actually Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://plugin.triqual.dev" rel="noopener noreferrer"&gt;Triqual QA&lt;/a&gt; doesn't use magic. It uses pattern matching, AST diffing, and learned selectors.&lt;/p&gt;

&lt;p&gt;Here's what happens when a test fails:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Captures the DOM state at failure&lt;/li&gt;
&lt;li&gt;Compares against the last known good state&lt;/li&gt;
&lt;li&gt;Looks for structural similarities — did this button move inside a form? Did that input change from &lt;code&gt;id="email"&lt;/code&gt; to &lt;code&gt;name="email"&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;Generates candidate selector repairs&lt;/li&gt;
&lt;li&gt;Validates them against the current page&lt;/li&gt;
&lt;li&gt;Re-runs the test with the fix&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the test passes, it logs the repair and updates the test file. If it fails again, it flags it for human review — because a persistent failure might actually be a bug.&lt;/p&gt;
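&lt;p&gt;The heal-or-flag flow above can be sketched in a few lines. To be clear, this is an illustrative toy, not Triqual QA's actual code: pages are plain dicts, and the repair heuristic just retries a few attributes recorded from the last known good DOM.&lt;/p&gt;

```python
# Toy sketch of the heal-or-flag loop. "Pages" are dicts mapping
# selectors to elements; real DOM diffing is far richer than this.

def candidate_selectors(element_attrs):
    # Toy repair heuristic: retry by id, then name, then data-testid.
    for attr in ("id", "name", "data-testid"):
        value = element_attrs.get(attr)
        if value:
            yield f'[{attr}="{value}"]'

def heal_or_flag(broken_selector, known_good_attrs, live_page, run_test):
    """Validate candidate repairs against the live page; flag if none pass."""
    for candidate in candidate_selectors(known_good_attrs):
        if candidate in live_page:          # validate against the current page
            if run_test(candidate):         # re-run the test with the fix
                return ("healed", candidate)
    return ("needs-human-review", broken_selector)  # persistent failure: maybe a real bug

# A login input whose wrapper changed: the old selector is gone, name survived.
live_page = {'[name="email"]': "input"}
known_good = {"id": None, "name": "email"}
print(heal_or_flag("#email", known_good, live_page, lambda sel: True))
# ('healed', '[name="email"]')
```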

&lt;p&gt;That login page redesign? Triqual QA detected the DOM changes, updated 12 of the 14 broken selectors, and re-ran the suite in under 90 seconds. The 2 tests that still failed? One was a broken redirect after login. The other was a validation message that got removed by accident. Real bugs. Caught because the self-healing couldn't find a valid repair path.&lt;/p&gt;

&lt;p&gt;Tests that adapt to intentional changes and scream about unintentional ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dangerous Part: False Confidence
&lt;/h2&gt;

&lt;p&gt;I almost shipped broken code because of this.&lt;/p&gt;

&lt;p&gt;A checkout flow test kept "healing" itself. Every deploy, it would adjust to some minor change and pass. Green checkmark. Ship it.&lt;/p&gt;

&lt;p&gt;The problem? The test was healing by weakening the assertion. The "total price" element had moved during a refactor. Triqual QA couldn't find it reliably, so it generated a repair that skipped the price verification. Just clicked "Checkout" and called it success.&lt;/p&gt;

&lt;p&gt;The test passed for three weeks. We were charging customers wrong amounts and had no idea.&lt;/p&gt;

&lt;p&gt;This is the false confidence problem. When your tests fix themselves, you stop looking at them. You trust the green checkmark. You forget that healing is a heuristic, not a guarantee.&lt;/p&gt;

&lt;p&gt;We fixed this by adding confidence scoring. Every self-healed test gets a score: how similar was the repair to the original intent? Low confidence? Human review required. We also added assertion preservation — if it can't verify a specific value, it fails the test rather than guessing.&lt;/p&gt;
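&lt;p&gt;As a sketch (assumption-laden, not the real scoring model), the gate boils down to two rules: a repair that drops assertions fails outright, and a low-similarity repair never auto-merges:&lt;/p&gt;

```python
# Illustrative confidence gate for a self-healed test. The similarity
# score would come from comparing the repaired test to the original;
# here it's just an input.

def review_repair(original_assertions, healed_assertions, similarity, threshold=0.8):
    if len(healed_assertions) != len(original_assertions):
        return "fail"                  # assertion preservation: never weaken to pass
    if threshold > similarity:
        return "human-review"          # low-confidence repair needs eyes on it
    return "auto-merge"

print(review_repair(["total == 49.99"], [], similarity=0.9))        # fail
print(review_repair(["total == 49.99"], ["total == 49.99"], 0.5))   # human-review
print(review_repair(["total == 49.99"], ["total == 49.99"], 0.95))  # auto-merge
```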

&lt;h2&gt;
  
  
  A Catalog of Self-Healing Sins
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The ghost assertion&lt;/strong&gt;: A healed test that passed by removing validation logic. Cost us 3 weeks of incorrect billing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The false positive cascade&lt;/strong&gt;: One DOM change triggered 8 self-heals. 6 were correct. 2 healed onto the wrong elements (a "Cancel" button instead of "Submit"). The tests passed. The feature was broken.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The silent drift&lt;/strong&gt;: A test healed so many times over 4 months that it was testing a completely different flow than originally written. The code it was "protecting" had been deleted. Nobody noticed because the test kept passing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Self-healing is a power tool. It will cut your hand off if you stop paying attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hot Take
&lt;/h2&gt;

&lt;p&gt;Most test failures aren't bugs. They're maintenance debt.&lt;/p&gt;

&lt;p&gt;Your test suite is a liability that grows with every UI change. The more tests you write, the more debt you accumulate.&lt;/p&gt;

&lt;p&gt;Self-healing doesn't eliminate this debt. It automates the payments. You still pay — just with compute instead of engineer hours.&lt;/p&gt;

&lt;p&gt;I'll take that trade. I'd rather spend compute fixing selectors than have a human spend 3 hours doing it. That human can find actual bugs instead.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://plugin.triqual.dev" rel="noopener noreferrer"&gt;Triqual QA&lt;/a&gt; is built for developers who are tired of being selector maintenance bots. It generates tests, heals them when the UI changes, and flags the failures that actually matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How much time did you spend last week fixing tests instead of fixing code?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Your AI Agents Don’t Share Memory. That’s the Problem.</title>
      <dc:creator>Agustin Montoya</dc:creator>
      <pubDate>Thu, 26 Feb 2026 17:25:08 +0000</pubDate>
      <link>https://dev.to/triqual/your-ai-agents-dont-share-memory-thats-the-problem-3ga3</link>
      <guid>https://dev.to/triqual/your-ai-agents-dont-share-memory-thats-the-problem-3ga3</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most multi-agent setups treat each agent like a hermit — isolated, forgetful, unaware&lt;/li&gt;
&lt;li&gt;Agent A finds a bug pattern. Two weeks later, Agent B hits the same wall. Again.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://quoth.triqual.dev" rel="noopener noreferrer"&gt;Quoth&lt;/a&gt; gives agents shared knowledge + private memory — they learn collectively without stepping on each other&lt;/li&gt;
&lt;li&gt;I learned this the hard way after watching my own agents contradict each other for 3 months&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;It was a Tuesday afternoon. I watched my orchestrator agent decide to retry a flaky API call with exponential backoff.&lt;/p&gt;

&lt;p&gt;The problem? My QA agent had already tried that exact approach two days earlier. It failed. The QA agent documented it in its own memory file. The orchestrator couldn't see it. Didn't even know the other agent existed.&lt;/p&gt;

&lt;p&gt;So there I was, paying for the same lesson twice. The orchestrator burned through compute and 4 minutes just to reach a conclusion that was already written down — in a file it had no access to.&lt;/p&gt;

&lt;p&gt;This isn't a rare edge case. This is the default state of multi-agent systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Islands of Amnesia
&lt;/h2&gt;

&lt;p&gt;We build these elaborate agent setups — one for coding, one for research, one for QA, one for deployment. They look impressive in diagrams. Arrows everywhere. "Intelligent orchestration."&lt;/p&gt;

&lt;p&gt;But look under the hood and you'll find the same pattern: each agent has its own memory, its own context window, its own pile of notes it wrote to itself. They're not a team. They're roommates who never talk.&lt;/p&gt;

&lt;p&gt;The costs stack up fast:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duplicate work.&lt;/strong&gt; My ad pipeline agent spent 6 hours refining an image prompt strategy last month. This week, my content agent needed a similar approach for a different campaign. Started from zero. Re-learned what "negative prompting" means for our brand. Burned another 4 hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conflicting decisions.&lt;/strong&gt; The orchestrator picked one hosting provider for a deployment. The infrastructure agent — which had actually read the project requirements — had already ruled it out due to a dependency conflict. They didn't know about each other. The build failed. I found out 20 minutes later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge silos.&lt;/strong&gt; One agent learns something useful. That knowledge dies in its local memory when the session ends. The next agent starts fresh, makes the same mistakes, learns the same lessons. It's like hiring employees who never hand off their work.&lt;/p&gt;

&lt;p&gt;I tracked this for a month. My agents "learned" the same 14 patterns an average of 2.3 times each. That's roughly 32 learning cycles for 14 lessons, 18 of them redundant.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Shared Brains, Private Notebooks
&lt;/h2&gt;

&lt;p&gt;Here's what actually works: split memory into two layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared knowledge&lt;/strong&gt; is for things every agent should know. Bug patterns that keep showing up. Architectural decisions and why they were made. Which APIs are flaky. What tone works for your brand. This lives in one place, searchable by any agent that needs it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Private memory&lt;/strong&gt; is for agent-specific context. The QA agent's current test file. The ad pipeline's draft concepts. Temporary state that doesn't need to pollute the shared pool.&lt;/p&gt;
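&lt;p&gt;A minimal sketch of the split (illustrative only, not Quoth's API): one shared, searchable pool plus a private scratchpad per agent.&lt;/p&gt;

```python
# Two-layer agent memory: shared knowledge every agent can search,
# private per-agent state that stays out of the shared pool.

class TwoLayerMemory:
    def __init__(self):
        self.shared = []       # knowledge every agent can search
        self.private = {}      # per-agent scratch state

    def publish(self, agent, fact):
        self.shared.append({"by": agent, "fact": fact})

    def note(self, agent, key, value):
        self.private.setdefault(agent, {})[key] = value

    def search(self, term):
        return [e["fact"] for e in self.shared if term in e["fact"]]

mem = TwoLayerMemory()
mem.publish("qa-agent", "exponential backoff fails against the billing API")
mem.note("qa-agent", "current_test", "checkout_flow.spec.ts")
# Two weeks later, a different agent checks before rediscovering the lesson:
print(mem.search("backoff"))  # ['exponential backoff fails against the billing API']
```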

&lt;p&gt;&lt;a href="https://quoth.triqual.dev" rel="noopener noreferrer"&gt;Quoth&lt;/a&gt; is built around this split. When my QA agent finds a bug pattern, it publishes to shared knowledge. Two weeks later, when my ad pipeline agent hits something similar, it searches first. Finds the pattern. Adjusts. Doesn't waste time rediscovering what we already know.&lt;/p&gt;

&lt;p&gt;The orchestrator checks shared knowledge before making decisions. It found that "exponential backoff doesn't work for this API" note. Chose a different approach. Saved time and my sanity.&lt;/p&gt;

&lt;p&gt;This isn't about building a hive mind. It's about not paying for the same insight twice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Went Wrong
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Version 1: Agents auto-publishing everything.&lt;/strong&gt; I thought "more data is better." Wrong. My agents started dumping raw session logs into shared knowledge. 90% noise. Search results became useless. I spent more time filtering garbage than using the system.&lt;/p&gt;

&lt;p&gt;Fix: Agents now propose updates with reasoning and evidence. The shared layer curates. Not everything deserves to live forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version 2: Flat search.&lt;/strong&gt; Early searches returned anything that matched keywords. "Bug in auth" returned results about authentication bugs, authorization edge cases, and a random note about insect photography (thanks, ambiguous embeddings).&lt;/p&gt;

&lt;p&gt;Fix: Semantic search with context windows. Results are ranked by relevance to the current task, not just keyword overlap.&lt;/p&gt;
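&lt;p&gt;The ranking step that replaced flat keyword matching is, at its core, similarity scoring over embeddings. A toy sketch with hand-made two-dimensional vectors (real embeddings come from a model; the point is the ranking, not the numbers):&lt;/p&gt;

```python
import math

# Rank stored notes by cosine similarity to a query vector instead of
# by keyword overlap.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank(query_vec, notes):
    scored = [(cosine(query_vec, vec), text) for text, vec in notes]
    return [text for _, text in sorted(scored, reverse=True)]

notes = [
    ("auth token expiry bug", (0.9, 0.1)),
    ("insect photography tips", (0.0, 1.0)),
]
print(rank((1.0, 0.0), notes))  # auth note first, insects last
```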

&lt;p&gt;&lt;strong&gt;Version 3: The overwrite incident.&lt;/strong&gt; Two agents edited the same documentation simultaneously. One agent's findings clobbered the other's. I lost a day of QA insights because two processes raced.&lt;/p&gt;

&lt;p&gt;Fix: Versioned updates with conflict detection. Agents propose changes; the system merges or flags conflicts. No silent overwrites.&lt;/p&gt;
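&lt;p&gt;The no-silent-overwrites rule is basically optimistic concurrency. A sketch (illustrative, not the actual implementation): every entry carries a version, and a write based on a stale version is flagged instead of applied.&lt;/p&gt;

```python
# Versioned writes with conflict detection: a proposal built on a stale
# version is rejected rather than clobbering a sibling agent's update.

class VersionedStore:
    def __init__(self):
        self.docs = {}    # key: (version, value)

    def read(self, key):
        return self.docs.get(key, (0, None))

    def propose(self, key, base_version, value):
        current_version, _ = self.read(key)
        if base_version != current_version:
            return "conflict"                  # a sibling agent got there first
        self.docs[key] = (current_version + 1, value)
        return "applied"

store = VersionedStore()
v, _ = store.read("login-flow-notes")
print(store.propose("login-flow-notes", v, "QA findings"))      # applied
print(store.propose("login-flow-notes", v, "other findings"))   # conflict
```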

&lt;p&gt;Each failure taught me something: shared knowledge needs curation, not just aggregation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;I'm running 8 agents now. They share what they learn automatically. When the content agent writes a headline that converts well, the ad pipeline agent knows about it. When the QA agent identifies a brittle test pattern, the coding agent avoids it.&lt;/p&gt;

&lt;p&gt;The duplicate work is down ~70%. Session times dropped. Token usage flattened. Most importantly, my agents stopped contradicting each other.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://quoth.triqual.dev" rel="noopener noreferrer"&gt;Quoth&lt;/a&gt; is the multi-agent knowledge platform I built to solve this. Shared memory without the mess. If you're running multiple agents in silos, you're leaving money and time on the table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do your agents share what they learn?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>$0 in API Keys: Running AI Agents on a Subscription Instead</title>
      <dc:creator>Agustin Montoya</dc:creator>
      <pubDate>Thu, 26 Feb 2026 17:24:42 +0000</pubDate>
      <link>https://dev.to/triqual/0-in-api-keys-running-ai-agents-on-a-subscription-instead-320b</link>
      <guid>https://dev.to/triqual/0-in-api-keys-running-ai-agents-on-a-subscription-instead-320b</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I run 8 AI agents on exactly $0 in API keys&lt;/li&gt;
&lt;li&gt;Total infrastructure cost: ~$20/month&lt;/li&gt;
&lt;li&gt;Fixed cost beats unpredictable per-token billing every time&lt;/li&gt;
&lt;li&gt;API keys are the cloud computing bill of AI. Subscriptions are the flat-rate plan.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I stared at my dashboard, waiting for the $500 bill.&lt;/p&gt;

&lt;p&gt;It was my first month running autonomous agents. I had heard the horror stories: developers waking up to four-figure API bills because a loop went infinite, a cron job misfired, or someone fat-fingered a deployment. I had rate limit alerts set up. I had budgets configured. I had anxiety.&lt;/p&gt;

&lt;p&gt;The bill never came.&lt;/p&gt;

&lt;p&gt;Instead, I paid my regular $20 for infrastructure and exactly $0 for AI inference. No metered billing. No per-token calculations. No rate limit anxiety at 2 AM.&lt;/p&gt;

&lt;p&gt;Here's how — and why I think the API key approach is a trap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: API Key Economics Are Broken
&lt;/h2&gt;

&lt;p&gt;Everyone defaults to API keys. It's the path of least resistance. You grab a key, plug it into your agent framework, and you're off.&lt;/p&gt;

&lt;p&gt;Then reality hits.&lt;/p&gt;

&lt;p&gt;You're paying per million tokens. Sounds cheap until you're running 8 agents doing parallel web research, code reviews, test generation, and document analysis. Suddenly you're burning through millions of tokens daily. The math stops working.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hidden costs nobody talks about:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rate limit gymnastics&lt;/strong&gt;: You hit TPM limits, so you add exponential backoff. Then your agents slow down. Then you add caching layers. Then you're managing infrastructure to manage your infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failover complexity&lt;/strong&gt;: You set up multiple providers. Now you're tracking spend across three dashboards, handling different error formats, and praying your fallback logic actually works when Provider A chokes at 3 PM on a Tuesday.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bill shock&lt;/strong&gt;: Variable costs are a nightmare for side projects. My first projected month with API keys was $340. For a hobby project. Hard pass.&lt;/li&gt;
&lt;/ul&gt;
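&lt;p&gt;For reference, the "rate limit gymnastics" usually start with this pattern: exponential backoff with jitter and a cap. A generic sketch, not tied to any provider's SDK:&lt;/p&gt;

```python
import random

# Exponential backoff with a cap and 10% jitter: the delay doubles each
# attempt until it hits the ceiling.

def backoff_delays(attempts, base=1.0, cap=60.0):
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * 2 ** attempt)
        delays.append(delay + random.uniform(0, delay * 0.1))  # jitter
    return delays

print([round(d) for d in backoff_delays(6)])  # roughly 1, 2, 4, 8, 16, 32 seconds
```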

&lt;p&gt;API keys are the cloud computing bill of AI. Subscriptions are the flat-rate plan.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Alternative: Subscription-Based Inference
&lt;/h2&gt;

&lt;p&gt;I route everything through &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;, an open-source agent runtime that handles inference through a subscription model.&lt;/p&gt;

&lt;p&gt;Same models. Same quality. Predictable cost.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Small VPS&lt;/td&gt;
&lt;td&gt;~$15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Block storage&lt;/td&gt;
&lt;td&gt;~$2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database (free tier)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI inference&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$17-20&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Eight agents. Multiple concurrent tasks. Zero per-token billing.&lt;/p&gt;

&lt;p&gt;The agents don't know the difference. They make their calls, get their responses, and do their work. OpenClaw handles the routing. I handle the coffee.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Went Wrong
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Rate Limit Month&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Early on, I tried mixing approaches. Some agents on API keys, some on subscription. One month, the API-key agents went rogue. A web research agent got stuck in a citation loop, hammering the API for six hours before I noticed.&lt;/p&gt;

&lt;p&gt;Burned through my monthly quota in a day. Had to emergency-migrate everything to the subscription route. That was the last time I trusted variable billing with unattended automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Failover That Didn't&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I spent a weekend setting up "smart" failover logic. If Provider A failed, fall back to Provider B. It worked great in testing.&lt;/p&gt;

&lt;p&gt;In production, Provider A failed during a batch job. The failover triggered. Provider B was also degraded (turns out they share infrastructure). The cascade took down three hours of queued work. "Resilient" architecture became a single point of failure because I overcomplicated it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The API Key Management Tax&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At one point I had keys for: primary inference, backup inference, embeddings, image generation, and a "just in case" provider I never used. Rotating them was a quarterly nightmare. Tracking which agent used which key required a spreadsheet.&lt;/p&gt;

&lt;p&gt;Now I have one system. One credential file. One mental model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;If you're building with AI, cost predictability isn't a luxury — it's survival.&lt;/p&gt;

&lt;p&gt;Side projects die when the infrastructure bill exceeds the fun budget. Startups die when a viral moment turns into a five-figure surprise.&lt;/p&gt;

&lt;p&gt;The subscription model isn't perfect. You need to run your own gateway. You need to manage your own infrastructure. But that small VPS? It does one thing reliably, month after month, without billing surprises.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://triqual.dev" rel="noopener noreferrer"&gt;Triqual&lt;/a&gt; — our ecosystem of AI agents — runs entirely on this stack. QA, voice AI, research, content, ad pipeline. All of them. $20 total.&lt;/p&gt;

&lt;p&gt;I'm not saying API keys are evil. If you need sub-100ms latency or specific model versions, they're the right tool. But for 90% of agent workloads? You're paying a complexity tax you don't need to pay.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What's your monthly AI bill for running agents?&lt;/strong&gt; Drop a comment — I'm genuinely curious how others are solving this.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why We Built Bilingual Voice Agents Instead of English-Only</title>
      <dc:creator>Agustin Montoya</dc:creator>
      <pubDate>Thu, 26 Feb 2026 17:24:36 +0000</pubDate>
      <link>https://dev.to/triqual/why-we-built-bilingual-voice-agents-instead-of-english-only-1aep</link>
      <guid>https://dev.to/triqual/why-we-built-bilingual-voice-agents-instead-of-english-only-1aep</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most voice AI is built for English speakers and treats Spanish as an afterthought&lt;/li&gt;
&lt;li&gt;500+ million Spanish speakers globally, yet the market gets ignored by US startups&lt;/li&gt;
&lt;li&gt;Building bilingual from day 1 unlocked a cost arbitrage opportunity from Argentina&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://voice.triqual.dev" rel="noopener noreferrer"&gt;Voice AI by Triqual&lt;/a&gt; handles both languages natively — not as a translation layer&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I was on a demo call last month. A small logistics company in Mexico City explained they had tried three different voice AI platforms. All of them worked great in English. In Spanish? One butchered "José" into "Joe-say." Another couldn't handle the customer who switched mid-sentence from Spanish to English. The third just gave up entirely.&lt;/p&gt;

&lt;p&gt;They were ready to abandon voice AI completely.&lt;/p&gt;

&lt;p&gt;That's when it clicked. Everyone's building voice agents for San Francisco engineers. Meanwhile, there's a massive, underserved market right next door.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Bilingual Isn't a Nice-to-Have
&lt;/h2&gt;

&lt;p&gt;Spanish is the second most spoken native language on the planet. 500+ million people. The US alone has 42 million native Spanish speakers — more than Spain. Latin America's digital economy is growing 15% year over year. Mexico has the highest e-commerce growth rate in the Americas.&lt;/p&gt;

&lt;p&gt;Yet walk into any YC demo day and count the voice AI startups building for Spanish-first markets. I'll wait.&lt;/p&gt;

&lt;p&gt;Here's the uncomfortable truth: most voice AI treats non-English as a translation layer. They run speech-to-text in Spanish, translate to English, process the intent, translate back, then text-to-speech. It's a game of telephone with your customer's experience.&lt;/p&gt;

&lt;p&gt;The result? 2-3 second latency. Weird phrasing. Names butchered. Context lost.&lt;/p&gt;

&lt;p&gt;When your competitor is a human who speaks the language natively, that's not a fair fight.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Argentina Angle
&lt;/h2&gt;

&lt;p&gt;I'm based in Argentina. I think in Spanish, code-switch constantly, and build for people who do the same.&lt;/p&gt;

&lt;p&gt;This gave me something most US startups don't have: a built-in stress test for bilingual voice AI. I couldn't ignore Spanish even if I wanted to. My beta testers, my friends, my family — they all speak Spanish. English-only wasn't an option unless I wanted to build in a vacuum.&lt;/p&gt;

&lt;p&gt;It also created a cost arbitrage. Building from Argentina for a market (Latin America) that US competitors ignore means lower operational costs and less competition. I'm not competing with OpenAI's voice product. I'm competing with the local call center that charges $8/hour and speaks perfect Spanish. That's a winnable fight.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Actually Works
&lt;/h2&gt;

&lt;p&gt;I won't bore you with the stack. The point is architectural.&lt;/p&gt;

&lt;p&gt;Voice AI by Triqual processes Spanish and English as first-class citizens. No translation layer. The agent detects language on the fly and handles code-switching naturally — because that's how people actually talk.&lt;/p&gt;

&lt;p&gt;When a customer says "Necesito hablar con el manager about my refund," the agent doesn't glitch out. It understands. Context carries across languages.&lt;/p&gt;
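&lt;p&gt;A deliberately naive sketch of what per-word language tagging looks like (nothing like the production system, which works on audio, not text): score each word against small Spanish/English stopword sets so a mixed utterance keeps both labels instead of being forced into one language.&lt;/p&gt;

```python
# Toy code-switching detector: tag each word es/en, "?" when unknown.
SPANISH = {"necesito", "hablar", "con", "el", "la", "sobre", "mi"}
ENGLISH = {"about", "my", "the", "need", "to", "with", "refund"}

def tag_languages(utterance):
    tags = []
    for raw in utterance.lower().split():
        word = raw.strip("?!.,")
        if word in SPANISH:
            tags.append("es")
        elif word in ENGLISH:
            tags.append("en")
        else:
            tags.append("?")
    return tags

print(tag_languages("Necesito hablar con el manager about my refund"))
# ['es', 'es', 'es', 'es', '?', 'en', 'en', 'en']
```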

&lt;p&gt;Names are tricky. We spent weeks on pronunciation rules for Spanish names, regional accents, the whole spectrum. "Yolanda" shouldn't sound like a gringo trying to order at a taqueria.&lt;/p&gt;

&lt;p&gt;The latency? Under 800ms end-to-end. Because there's no translation layer adding friction.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Went Wrong
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Week 3:&lt;/strong&gt; First Spanish voice test. I asked it to say "Buenos días, soy el asistente de Triqual." It said "Buenos dias, soy el ass-is-tent of Tree-kwal." I wanted to cry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 7:&lt;/strong&gt; Accent detection was garbage. The agent worked fine with Mexican Spanish. Argentine Spanish? It heard "ll" sounds and just... panicked. Had to rebuild phoneme mapping from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 12:&lt;/strong&gt; Real customer call. Customer's name was "Ñoño." The agent pronounced it "N-yo-n-yo" instead of "Nyoh-nyoh." The customer hung up. Lost a potential deal over one letter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Month 4:&lt;/strong&gt; Tried a "unified voice" that spoke both languages. Turns out Spanish speakers can hear the subtle English phonetic influence and it creeps them out. Had to split to language-specific voice models. Doubled inference costs overnight.&lt;/p&gt;

&lt;p&gt;Each failure taught me something. Bilingual isn't just about translation. It's about cultural fluency. The rhythm of speech. The formality levels. When to use "tú" vs "usted."&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Portuguese is next. Brazil is 215 million people. Same problem, same opportunity.&lt;/p&gt;

&lt;p&gt;Also experimenting with regional accent models. Mexican Spanish vs Colombian Spanish vs Argentine Spanish. They're different. Treating "Spanish" as one language is like treating a New Yorker and a Texan as identical speakers.&lt;/p&gt;




&lt;p&gt;If you're curious what bilingual voice AI actually sounds like, check out &lt;a href="https://voice.triqual.dev" rel="noopener noreferrer"&gt;Voice AI by Triqual&lt;/a&gt;. Built for businesses that serve Spanish and English customers without making either group feel like an afterthought.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What languages are your AI agents speaking?&lt;/strong&gt; And more importantly — are they actually speaking them, or just translating?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>startup</category>
      <category>programming</category>
    </item>
    <item>
      <title>triqual.dev is Live — Building in Public, Week 1</title>
      <dc:creator>Agustin Montoya</dc:creator>
      <pubDate>Thu, 26 Feb 2026 11:04:39 +0000</pubDate>
      <link>https://dev.to/triqual/triqualdev-is-live-building-in-public-week-1-idi</link>
      <guid>https://dev.to/triqual/triqualdev-is-live-building-in-public-week-1-idi</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;triqual.dev shipped — central hub for all 6 products&lt;/li&gt;
&lt;li&gt;Built with Next.js 16 + Tailwind v4 in one session using 4 sub-agents&lt;/li&gt;
&lt;li&gt;Triqual is now the platform, not just the QA plugin&lt;/li&gt;
&lt;li&gt;Voice AI and Interview Companion are the hero products&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Last week I posted about the &lt;a href="https://triqual.hashnode.dev/we-built-a-fleet-of-ai-agents-heres-what-actually-works" rel="noopener noreferrer"&gt;8-agent fleet running on $20/month&lt;/a&gt;. That was the origin story. This week, the actual platform shipped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://triqual.dev" rel="noopener noreferrer"&gt;triqual.dev&lt;/a&gt; is live.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Got Built
&lt;/h2&gt;

&lt;p&gt;Next.js 16, Tailwind v4, Framer Motion. I split the build into 4 phases, each handled by a sub-agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scaffold&lt;/strong&gt; — project setup, design tokens, warm dark color system (&lt;code&gt;#110F0B&lt;/code&gt; base, gold accent &lt;code&gt;#C8A96E&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content&lt;/strong&gt; — hero, product cards, terminal mockup showing the ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactions&lt;/strong&gt; — horizontal scroll lane for products, elastic hover micro-interactions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Polish&lt;/strong&gt; — mobile hamburger, SEO metadata, favicon&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total compute: ~6 hours across sub-agents. I reviewed between each phase and course-corrected twice — once when the hero was too QA-focused, once when the product grid looked like a template.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Design Research That Changed Everything
&lt;/h2&gt;

&lt;p&gt;I ran a deep research session on 2026 landing page trends (Moonshot AI web search, 89/100 confidence score). The findings that shaped the design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Warm dark &amp;gt; cold black.&lt;/strong&gt; Full &lt;code&gt;#000&lt;/code&gt; is associated with 2023 crypto. Shifted to &lt;code&gt;#110F0B&lt;/code&gt; with 5-8% red warmth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serif + mono pair.&lt;/strong&gt; Playfair Display for headlines, Geist Mono for code. Says "craft" without saying it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversational copy.&lt;/strong&gt; Changed "Build AI. Test AI. Ship AI." → "We build AI agents that actually work." &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specific metrics &amp;gt; vague claims.&lt;/strong&gt; The terminal mockup shows real stats: "Voice AI — 3 agents active, 12 calls today."&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Product Lineup
&lt;/h2&gt;

&lt;p&gt;Six products, ordered by where the money is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Voice AI&lt;/strong&gt; (voice.triqual.dev) — bilingual AI voice agents. Hero product #1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interview Companion&lt;/strong&gt; (interview-companion.triqual.dev) — real-time interview analysis. Hero product #2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Studio&lt;/strong&gt; (studio.triqual.dev) — AI ad pipeline for small businesses without design.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quoth&lt;/strong&gt; (quoth.triqual.dev) — multi-agent knowledge platform. The brain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exolar&lt;/strong&gt; (exolar.triqual.dev) — QA analytics with AI failure clustering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Triqual QA&lt;/strong&gt; (plugin.triqual.dev) — autonomous test gen for Claude Code. Where it started.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The blog lives at &lt;strong&gt;labs.triqual.dev&lt;/strong&gt; (you're reading it).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Brand Shift
&lt;/h2&gt;

&lt;p&gt;For two months, Triqual was "the QA plugin." I built it to stop writing repetitive test boilerplate. It worked.&lt;/p&gt;

&lt;p&gt;But the plugin market has a low ceiling. You're at the mercy of the host platform. Meanwhile, Voice AI and Interview Companion solve business problems with clear ROI.&lt;/p&gt;

&lt;p&gt;So: Triqual is the platform now. The QA plugin is the gateway — gets devs in the door. Voice AI and Interview Companion are the revenue engines.&lt;/p&gt;

&lt;p&gt;Not a pivot. A clarification. The pieces were always there.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Went Wrong
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mobile nav was broken for 4 hours.&lt;/strong&gt; Hamburger opened but wouldn't close. Chrome DevTools didn't show it — caught it on my actual phone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hashnode SSL took forever.&lt;/strong&gt; DNS verified globally but Hashnode stuck on "pending verification" for 7+ hours. Still fighting it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spent an hour debugging a FOUC that was just a cached stylesheet.&lt;/strong&gt; Hard refresh fixed it. I felt stupid.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Product pages with pricing&lt;/li&gt;
&lt;li&gt;Case studies with real numbers&lt;/li&gt;
&lt;li&gt;Documentation that isn't "just read the code"&lt;/li&gt;
&lt;li&gt;Open self-serve access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No timelines. Ships when it ships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ &lt;a href="https://triqual.dev" rel="noopener noreferrer"&gt;triqual.dev&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building in public from Argentina. 8 agents, 3 nodes, $20/month. Follow along on &lt;a href="https://x.com/TriqualQA" rel="noopener noreferrer"&gt;X&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Multi-Agent AI: 5 Coordination Patterns I Learned the Hard Way</title>
      <dc:creator>Agustin Montoya</dc:creator>
      <pubDate>Thu, 26 Feb 2026 10:48:23 +0000</pubDate>
      <link>https://dev.to/triqual/multi-agent-ai-5-coordination-patterns-i-learned-the-hard-way-kbk</link>
      <guid>https://dev.to/triqual/multi-agent-ai-5-coordination-patterns-i-learned-the-hard-way-kbk</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct agent-to-agent calls create distributed monoliths. Use a message bus.&lt;/li&gt;
&lt;li&gt;Big agent tasks hallucinate. Small sequential spawns with review between each are faster.&lt;/li&gt;
&lt;li&gt;Your 2GB server can't run Chromium. Match workloads to hardware or watch things OOM.&lt;/li&gt;
&lt;li&gt;Shared knowledge + private memory. Not everything belongs in the same bucket.&lt;/li&gt;
&lt;li&gt;Agents go down. Build for it.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I run 8 AI agents across 3 machines. A $15/month EC2, a Mac Mini, and a WSL2 workstation with a GPU. They handle QA, voice AI, ad creative, knowledge management, and interview analysis.&lt;/p&gt;

&lt;p&gt;After two months of things breaking in creative ways, here are the coordination patterns that survived contact with reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Message Bus Over Direct Calls
&lt;/h2&gt;

&lt;p&gt;My first architecture: Agent A calls Agent B's endpoint. Agent B needs context from Agent C. Agent C is offline.&lt;/p&gt;

&lt;p&gt;Cascading failure. Everything dies.&lt;/p&gt;

&lt;p&gt;The fix was embarrassingly simple — a shared message bus. We built &lt;a href="https://quoth.triqual.dev" rel="noopener noreferrer"&gt;Quoth&lt;/a&gt;, a multi-agent knowledge platform. Agents publish messages to a shared channel. Other agents subscribe to what they care about. Messages persist until acknowledged.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agent:main → bus: "New interview starting, candidate-123" (priority: high)
agent:interviews → bus: (subscribes, picks it up when ready)
agent:main → bus: "Run QA on PR #234" (priority: normal)  
agent:attqa → bus: (offline, picks it up 3 hours later)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works&lt;/strong&gt;: Agents don't need to know each other's APIs, endpoints, or even if they're online. The bus decouples everything. An agent can be down for hours and catch up when it comes back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gotcha&lt;/strong&gt;: You need priority levels. Without them, a low-priority "update docs" message blocks a high-priority "production is broken" alert.&lt;/p&gt;
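&lt;p&gt;A minimal sketch of that priority idea (hypothetical code, not Quoth's actual API): one heap per topic, so a high-priority message always pops before a low-priority one, with a sequence counter preserving FIFO order within the same priority.&lt;/p&gt;

```python
# Priority-aware message bus sketch (hypothetical, not Quoth's real API).
import heapq
import itertools

class MessageBus:
    PRIORITIES = {"high": 0, "normal": 1, "low": 2}

    def __init__(self):
        self._queues = {}              # topic name -> heap of (priority, seq, payload)
        self._seq = itertools.count()  # tie-breaker: FIFO within one priority level

    def publish(self, topic, payload, priority="normal"):
        q = self._queues.setdefault(topic, [])
        heapq.heappush(q, (self.PRIORITIES[priority], next(self._seq), payload))

    def pull(self, topic):
        """Return the highest-priority pending message, or None if idle."""
        q = self._queues.get(topic, [])
        if not q:
            return None
        _, _, payload = heapq.heappop(q)
        return payload

bus = MessageBus()
bus.publish("attqa", "Update docs", priority="low")
bus.publish("attqa", "Production is broken", priority="high")
print(bus.pull("attqa"))  # "Production is broken" jumps the queue
```

&lt;p&gt;Real buses add persistence and acknowledgements on top, but the ordering rule is the same: priority first, then arrival order.&lt;/p&gt;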

&lt;h2&gt;
  
  
  2. Spawn Small, Review, Repeat
&lt;/h2&gt;

&lt;p&gt;I built a swarm skill that parallelizes work across sub-agents. First attempt: "Refactor all 6 modules and update the test suite."&lt;/p&gt;

&lt;p&gt;The result was a mess. Agents hallucinated imports that didn't exist, created circular dependencies, and duplicated work. Running more than 5-6 agents in parallel doesn't improve output — it degrades it.&lt;/p&gt;

&lt;p&gt;The pattern that works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spawn("Refactor module A: extract shared utils")
→ review output
→ spawn("Refactor module B: use new shared utils from A")  
→ review output
→ spawn("Update tests for A and B")
→ review output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sequential. Small. Reviewed between each step.&lt;/p&gt;

&lt;p&gt;It feels slower. It's not. Each spawn gets complete, updated context. No mid-flight corrections. No "wait, also change this" messages that may or may not arrive before the agent finishes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gotcha&lt;/strong&gt;: If requirements change while an agent is running, let it finish. Review. Spawn a new one with the updated requirements. Trying to steer a running agent is unreliable.&lt;/p&gt;
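&lt;p&gt;The loop above can be sketched in a few lines. Everything here is a placeholder — &lt;code&gt;spawn&lt;/code&gt; and &lt;code&gt;review&lt;/code&gt; stand in for whatever your agent framework provides, they are not a real API:&lt;/p&gt;

```python
# Sequential spawn-review loop (sketch; spawn/review are hypothetical hooks).
def run_pipeline(tasks, spawn, review):
    """Run small tasks one at a time, feeding each reviewed result forward."""
    context = []
    for task in tasks:
        output = spawn(task, context=context)  # small task, complete current context
        notes = review(output)                 # review gate between every step
        context.append(notes)                  # the next spawn sees updated context
    return context
```

&lt;p&gt;The point of the structure: every spawn starts from reviewed, up-to-date context, so there is never a reason to steer an agent mid-flight.&lt;/p&gt;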

&lt;h2&gt;
  
  
  3. Match Hardware to Workload
&lt;/h2&gt;

&lt;p&gt;This one cost me an afternoon and a crashed EC2 instance.&lt;/p&gt;

&lt;p&gt;Chromium takes ~500MB of RAM. My EC2 has 1.9GB total. I ran a Playwright script to take screenshots. Two browser instances later, the OOM killer nuked everything — including the main gateway process. All 3 agents on that node went dark.&lt;/p&gt;

&lt;p&gt;Now each node has an explicit role:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EC2 (2GB RAM)&lt;/strong&gt;: Orchestration, text processing, API calls. Never a browser.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mac Mini&lt;/strong&gt;: Browser automation, development workflows, QA testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WSL2 + RTX 3080&lt;/strong&gt;: GPU inference, image generation, heavy Playwright jobs.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Every script that uses Playwright starts with this
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="n"&gt;mem_gb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sysconf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SC_PAGE_SIZE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sysconf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SC_PHYS_PAGES&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mem_gb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Not enough RAM for browser automation. Aborting.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Crude but effective. I haven't crashed a node since.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gotcha&lt;/strong&gt;: It's not just RAM. CPU matters for video processing, disk I/O matters for large model files, and network latency matters for real-time voice. Profile your workloads, don't just count gigabytes.&lt;/p&gt;
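&lt;p&gt;One way to make the node roles explicit is a tiny router. This is a sketch under the assumption that roles are configured by hand, exactly as in the list above — the specs and field names are illustrative, not measured:&lt;/p&gt;

```python
# Naive workload-to-node router (sketch; node specs are hand-configured).
NODES = {
    "ec2":  {"ram_gb": 2,  "gpu": False, "browser_ok": False},  # never a browser
    "mac":  {"ram_gb": 16, "gpu": False, "browser_ok": True},
    "wsl2": {"ram_gb": 32, "gpu": True,  "browser_ok": True},
}

def pick_node(needs_browser=False, needs_gpu=False, min_ram_gb=1):
    """Return the first node whose declared role fits the workload."""
    for name, spec in NODES.items():
        if needs_browser and not spec["browser_ok"]:
            continue
        if needs_gpu and not spec["gpu"]:
            continue
        if spec["ram_gb"] >= min_ram_gb:
            return name
    raise RuntimeError("no node fits this workload")

print(pick_node(needs_browser=True))  # the Mac, never the 2GB EC2
```

&lt;p&gt;Crude, like the RAM check — but it turns "don't run Playwright on the EC2" from tribal knowledge into code.&lt;/p&gt;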

&lt;h2&gt;
  
  
  4. Shared Knowledge, Private Memory
&lt;/h2&gt;

&lt;p&gt;Every agent needs memory. The question is: what gets shared?&lt;/p&gt;

&lt;p&gt;Wrong approach: everything in one database. My QA agent's test failure patterns mixed with the ad pipeline's lead scoring data. Searches returned irrelevant noise.&lt;/p&gt;

&lt;p&gt;The split that works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Shared knowledge&lt;/strong&gt; (&lt;a href="https://quoth.triqual.dev" rel="noopener noreferrer"&gt;Quoth&lt;/a&gt;): Architecture decisions, API contracts, deployment procedures. Things any agent might need.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private memory&lt;/strong&gt; (local files): Session notes, work-in-progress, agent-specific context. Things only that agent cares about.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent has a &lt;code&gt;MEMORY.md&lt;/code&gt; (curated long-term) and daily &lt;code&gt;memory/YYYY-MM-DD.md&lt;/code&gt; files (raw logs). The shared knowledge bus handles cross-agent documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gotcha&lt;/strong&gt;: Agents will write shared docs from their own perspective. "The deployment process" means something different to the QA agent (run tests → deploy) versus the ad pipeline agent (generate assets → upload → deploy). Shared knowledge needs a review step — don't let agents auto-publish to shared indexes without validation.&lt;/p&gt;
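&lt;p&gt;The private side is deliberately boring — append-only markdown files. A minimal helper following the &lt;code&gt;memory/YYYY-MM-DD.md&lt;/code&gt; layout described above (the function name and directory argument are assumptions for illustration):&lt;/p&gt;

```python
# Daily private-memory helper (sketch; follows the memory/YYYY-MM-DD.md layout).
from datetime import date
from pathlib import Path

def append_private_memory(agent_dir, note):
    """Append a raw log line to today's daily memory file for one agent."""
    memory_dir = Path(agent_dir) / "memory"
    memory_dir.mkdir(parents=True, exist_ok=True)
    daily = memory_dir / f"{date.today().isoformat()}.md"
    with daily.open("a") as f:
        f.write(f"- {note}\n")
    return daily
```

&lt;p&gt;Anything that graduates from the daily logs into &lt;code&gt;MEMORY.md&lt;/code&gt; or the shared index goes through a review step first.&lt;/p&gt;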

&lt;h2&gt;
  
  
  5. Design for Agent Downtime
&lt;/h2&gt;

&lt;p&gt;Agents crash. Nodes lose network. Gateways restart. SSH connections drop. &lt;/p&gt;

&lt;p&gt;In any given week, at least one of my 8 agents is offline for some period. The Mac goes to sleep. The WSL2 instance loses its network bridge. The EC2 gets rate-limited.&lt;/p&gt;

&lt;p&gt;The system can't depend on 100% uptime from any agent. Three rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Messages persist&lt;/strong&gt;: If an agent is offline, messages queue. When it comes back, it catches up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No blocking dependencies&lt;/strong&gt;: Agent A can request work from Agent B, but A keeps working. If B never responds, A doesn't hang.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health checks with alerts&lt;/strong&gt;: A simple heartbeat (every 30 min). If an agent misses 3 heartbeats, alert. Don't wait for a user to notice.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Heartbeat check (runs on the orchestrator)
# 3 missed 30-minute heartbeats = 90 minutes of silence
for agent in fleet:
    last_seen = get_last_heartbeat(agent)
    minutes_silent = minutes_between(now(), last_seen)
    if minutes_silent &amp;gt; 90:
        alert(f"{agent.name} hasn't checked in for {minutes_silent} minutes")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The gotcha&lt;/strong&gt;: "Offline" isn't binary. An agent can respond to heartbeats but be stuck in an error loop, burning tokens on repeated 429 retries. Check for &lt;em&gt;useful&lt;/em&gt; activity, not just &lt;em&gt;any&lt;/em&gt; activity.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Honest Part
&lt;/h2&gt;

&lt;p&gt;These patterns weren't invented. They were extracted from failures. The message bus exists because direct calls failed. Small spawns exist because big ones hallucinated. The hardware matching exists because I crashed production.&lt;/p&gt;

&lt;p&gt;Two months in, the fleet handles work across QA, voice AI, ad creative, knowledge curation, and interview analysis. The total infrastructure cost is about $20/month. The actual AI inference costs $0 in API keys (Claude Max subscription through OpenClaw).&lt;/p&gt;

&lt;p&gt;It's not elegant. But it works.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building from Argentina. The code, the agents, and the ecosystem are at &lt;a href="https://triqual.dev" rel="noopener noreferrer"&gt;triqual.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
