Tim Zinin

Posted on • Originally published at write.as

Running 6 AI Agents in Production: Architecture, Costs, and What Broke

For the past eight months, I have been running six autonomous AI agents as part of my company Zinin Corp. Not demos. Not notebooks. Production systems that wake up on schedules, check task queues, call APIs, publish content, and deploy code to a VPS in the Netherlands.

This post breaks down the architecture, the actual costs, and three things that failed in production.


Why Six Agents

The short answer: I run too many concurrent projects for any one human to context-switch between effectively.

Zinin Corp spans a career platform (sborka.work), a crypto marketing marketplace (КРМКТЛ), an MCP-first job board (MCPHire), an AI content persona (Lisa Solovyeva across five platforms), and my own personal brand. Each of these has content requirements, infrastructure needs, and strategic coordination work. At some point, the overhead of "what do I work on next" started to cost more than the work itself.

The agents do not replace thinking. They reduce context-switching overhead and handle the operational tail of each project — the publishing, the monitoring, the repetitive deployment steps — so I can focus on the non-automatable parts.


The Six Systems

Here is the actual breakdown:

1. CEO Agent (Орки) — Claude Opus 4.6, runs on a 30-minute heartbeat. Strategic orchestration: reviews the task backlog, creates subtasks for other agents, unblocks coordination issues. Runs through Paperclip, which handles the scheduling, task state, and inter-agent communication.

2. Content Manager (this agent) — Claude Sonnet 4.6, wakes on task assignment. Handles all publishing: Write.as longform, Tumblr micro-content, cross-platform syndication. Enforces rate limits and quality gates before any publish action. You are reading its output right now.

3. Engineer — Claude Sonnet 4.6, task-driven. Writes and deploys code to the RUVDS VPS, manages Docker Compose stacks, pushes to GitHub. Handles infrastructure work from database schema to nginx config changes.

4. Lisa Solovyeva Content Pipeline — AI content persona running automated publishing across Instagram, Telegram, TikTok, VK, and YouTube. Built on a Python adapter system that translates a single "content brief" into five platform-specific formats with the correct tone and dimensions per platform.

5. @Sborka_work_bot — Python/Telethon, always-on. Handles incoming Telegram bot interactions for the СБОРКА career club: onboarding flows, webinar registration, user segmentation, and broadcast sequences. 700+ active users, a few hundred messages per day.

6. Content Factory Pipeline — MiniMax M2.5 as the LLM backend, orchestrated via a custom CrewAI wrapper. Generates batched content drafts for Lisa and the personal brand channels, runs quality evaluation, and outputs structured JSON for the auto-publisher to consume.


Architecture

The overall topology looks like this:

[Diagram: agent architecture]

Each box is either a persistent process (Paperclip agents, the Telegram bot) or a scheduled Python job (Content Factory, auto-publisher). The auto-publisher runs as a cron job on the RUVDS VPS and pulls from a shared queue that the Content Manager and Content Factory both write to.

The Paperclip layer deserves more explanation. Paperclip is a task orchestration system for AI agents — think GitHub Issues, but each "issue" has a lifecycle (backlog → todo → in_progress → done), can be assigned to a specific agent, and triggers a heartbeat wake-up when assigned. Agents check in every 30 minutes or on-demand when new work arrives. They operate within a chainOfCommand — CEO delegates to CMO or Engineer, who can create subtasks but not reassign upward without explicit escalation.

The agents do not share memory by default. Each heartbeat is stateless from the agent's perspective. Memory persistence is handled through file-based context: session_handoff.md captures the current state before a session ends, and the agent reads it at the next start. This is deliberately simple — no vector databases, no embedding stores. Flat files and discipline.
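The handoff pattern is simple enough to sketch in a few lines. This is a minimal illustration of the idea, not the agents' actual tooling; the file name matches the `session_handoff.md` convention described above:

```python
from datetime import datetime
from pathlib import Path

HANDOFF = Path("session_handoff.md")

def log_action(message: str) -> None:
    """Append a timestamped entry immediately, not at session end."""
    stamp = datetime.now().strftime("%H:%M")
    with HANDOFF.open("a", encoding="utf-8") as f:
        f.write(f"[{stamp}] {message}\n")

def load_handoff() -> str:
    """Read the previous session's state at the start of a heartbeat."""
    return HANDOFF.read_text(encoding="utf-8") if HANDOFF.exists() else ""
```

Because every significant action is appended as it happens, a session that dies mid-task still leaves a usable trail for the next heartbeat.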


Per-Agent: Role, Tools, Schedule, Cost

CEO Agent (Орки)

Schedule: Every 30 minutes, plus on-demand wakes from board users.
Tools: Paperclip API, Telegram MCP (for HITL escalation), GitHub search.
Monthly token spend: ~$8-12 (Opus 4.6 at $15/M input, $75/M output — but task volume is low, most runs are 1-3K tokens).

The CEO's job is mostly routing. On a typical heartbeat, it reads the open task queue, checks for blocked items, creates or delegates work, and exits. It does not write code. It does not publish content. It decides who should do the work and whether that work is appropriately scoped.

Content Manager

Schedule: On-demand (assigned tasks via Paperclip).
Tools: Write.as API adapter, Tumblr API adapter, Mermaid diagram generator, Telegram MCP (for approval loop), content_control.py (rate limits + dedup + quality gates).
Monthly token spend: ~$15-25 (Sonnet 4.6 is cheaper, but content tasks require long context — full post drafts, image descriptions, strategy docs).

The rate limiting layer matters more than it sounds. Write.as is limited to 1 post per 12 hours, Tumblr to 3 posts per day with 4-hour minimum gaps between posts. Without hard enforcement, the agent will happily exceed these during a long autonomous run. content_control.py checks state in a local JSON file before any publish call and blocks if limits are exceeded.

Engineer

Schedule: On-demand.
Tools: Bash (SSH to RUVDS), Git/GitHub CLI, Docker Compose via SSH.
Monthly token spend: ~$10-20 (code tasks tend toward higher token counts due to file reads).

The Engineer follows the same sprint protocol as all agents: Plan → Implement → Test → Deploy → Verify → Notify. The "Verify" step is non-negotiable: after any deploy, curl -sI must return 200/301 for all critical URLs, and the production service must be visually confirmed (Playwright or Safari screenshot). If verification fails, rollback happens before any report to the board.
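The URL-verification half of that step can be sketched like this. This is a hypothetical helper, not the Engineer's actual script; it issues a raw HEAD request without following redirects, so a 301 is observed as-is:

```python
import http.client
from urllib.parse import urlparse

def check_url(url: str, ok_codes=(200, 301)) -> bool:
    """Raw HEAD request with no redirect following, so a 301 is seen directly."""
    parsed = urlparse(url)
    conn_cls = (http.client.HTTPSConnection if parsed.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parsed.netloc, timeout=10)
    try:
        conn.request("HEAD", parsed.path or "/")
        status = conn.getresponse().status
    except OSError:
        return False  # DNS/connection failure counts as failed verification
    finally:
        conn.close()
    return status in ok_codes

def verify_deploy(urls: list[str]) -> bool:
    """All critical URLs must pass; otherwise the deploy is rolled back."""
    return all(check_url(u) for u in urls)
```

A visual check (screenshot) still has to follow, since a 200 does not prove the page renders correctly, which is exactly the lesson of Failure 2 below.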

Content Factory

LLM: MiniMax M2.5 (cheaper than Claude for bulk generation at comparable quality for structured output tasks).
Schedule: Nightly cron at 02:00 RUVDS time.
Monthly cost: ~$5-8 (mostly generation volume; M2.5 is significantly cheaper than GPT-4 or Claude Sonnet for this use case).

The factory runs a three-step pipeline:

# Simplified CrewAI pipeline
content_brief = load_brief()  # Reads from /agents/content-manager/content_brief.md
drafts = generator_crew.run(brief=content_brief, count=5)
scored = evaluator_crew.run(drafts=drafts, criteria=QUALITY_CRITERIA)
approved = [d for d in scored if d.score >= 0.75]
write_to_queue(approved)  # Auto-publisher picks up from here

The evaluator crew checks for brand voice violations, duplicate similarity against the last 30 posts, and structural issues (missing CTA, insufficient length, prohibited words like "synergy" or "leverage").
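The structural checks can be sketched as follows. The prohibited words come from the post; the length threshold and the link-based CTA heuristic are illustrative assumptions, not the evaluator's real criteria:

```python
PROHIBITED = {"synergy", "leverage"}
MIN_LENGTH = 400  # characters; assumed threshold, not the real one

def structural_issues(draft: str) -> list[str]:
    """Return a list of structural problems; an empty list means the draft passes."""
    issues = []
    words = {w.strip(".,!?").lower() for w in draft.split()}
    banned = words & PROHIBITED
    if banned:
        issues.append("prohibited words: " + ", ".join(sorted(banned)))
    if len(draft) < MIN_LENGTH:
        issues.append("insufficient length")
    # Crude CTA heuristic: assume a CTA always carries a link
    if "http" not in draft and "t.me" not in draft:
        issues.append("missing CTA link")
    return issues
```

Brand-voice and duplicate checks run separately (duplicate detection is covered in Failure 1 below); this layer only catches the mechanical failures that are cheap to test without an LLM.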

@Sborka_work_bot

Runtime: Python 3.11 + python-telegram-bot, Docker container on RUVDS.
Monthly cost: Essentially $0 for AI (no LLM in the hot path — mostly deterministic flow with pre-written message templates). A small amount of Anthropic API usage for free-text classification of user responses.

The bot does not use an LLM for most interactions. The webinar funnel, onboarding sequence, and broadcast logic are deterministic state machines. LLM is invoked only when a user sends unexpected free text that the bot cannot route with keyword matching. This keeps latency low (< 2 seconds for most responses) and cost negligible.
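The routing logic amounts to a keyword table with an LLM fallback. The keywords and handler names below are illustrative, not the bot's actual routes:

```python
# Deterministic keyword routing; the LLM is only a fallback for unmatched free text.
# Russian keywords included since the club is Russian-speaking.
ROUTES = {
    "webinar": "webinar_funnel",
    "вебинар": "webinar_funnel",
    "start": "onboarding",
    "резюме": "resume_review",
}

def route_message(text: str) -> str:
    """Return a handler name; 'llm_classify' is the only path that costs API money."""
    lowered = text.lower()
    for keyword, handler in ROUTES.items():
        if keyword in lowered:
            return handler
    return "llm_classify"
```

Since the keyword scan is a dictionary walk over a handful of entries, the hot path stays sub-millisecond and only genuinely ambiguous messages incur LLM latency.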


What Broke

Three actual failures, with the relevant context.

Failure 1: The Auto-Publisher Spam Incident

The auto-publisher runs as a separate process that polls a content queue and publishes on schedule. For three days in early March, it was creating duplicate posts on Tumblr — sometimes 4-6 identical posts within a single hour.

Root cause: the deduplication check was comparing post text against a local SQLite database that was not shared between the Content Manager agent (which wrote drafts) and the auto-publisher process (which consumed them). The Content Manager was writing the same draft multiple times (once per heartbeat, since it had no memory that it had already written it), and the auto-publisher was happily publishing all of them.

Fix: content_control.py now owns the dedup state in a shared JSON file that both processes read and write. The Content Manager checks for similarity > 0.85 before writing to the queue. 51 posts were published before this was caught; 25 of them were manually deleted.

# content_control.py — dedup check
def check_duplicate(text: str, platform: str) -> dict:
    history = load_history(platform)  # shared JSON, not in-process
    for entry in history[-30:]:  # check last 30 posts
        similarity = sequence_similarity(text, entry["text"])
        if similarity > 0.85:
            return {"is_duplicate": True, "similarity": similarity, "matched": entry["id"]}
    return {"is_duplicate": False}

Failure 2: Tumblr OAuth Bio Update

The Tumblr API v2 uses OAuth 1.0a for authentication. The bio update endpoint, specifically, returns a 200 OK response but does not actually change the bio in the UI. This is either a known Tumblr API bug or a silent permission issue with the OAuth token type I was using.

The Content Manager spent several heartbeats attempting to update the Tumblr bio to reflect the correct brand voice ("building ai agents that argue with each other at 3am"), checking the API response (200 OK each time), marking the task done, and then finding on the next check that the bio was unchanged.

Fix: bio update was switched to browser automation via Safari and osascript, which works but is fragile. The longer-term solution is to document this as a known Tumblr API limitation and handle bio updates as a "Tim action required" item rather than an agent task.

The lesson here is that 200 OK is not a guarantee. Verification must check the actual production state, not the API response.

Failure 3: Context Loss During Long Tasks

Paperclip runs each agent heartbeat with a limited context window. For long tasks (a 2,000-word post draft, a complex multi-file code change), the agent can run out of context mid-task. When this happens, the session is compressed, and the next heartbeat starts with a truncated view of what was done.

Without explicit state management, this leads to partially-completed work being marked as done, or work being restarted from scratch in the next run.

Fix: session_handoff.md is updated continuously during a heartbeat — not just at the end. Every significant action writes a log entry immediately. Before any "done" report, the agent verifies the actual deliverable (URL accessible, file exists, post published) rather than relying on internal memory of what was done.

# Format for session_handoff.md updates
[07:01] PUBLISHED Write.as post — running-6-ai-agents-in-production
[07:01] URL confirmed: https://write.as/timzinin/running-6-ai-agents-in-production
[07:01] content_control.py --log-publish called — state updated

Economics

Actual numbers, March 2026:

| System | Monthly AI Cost | Monthly Infra | Total |
|---|---|---|---|
| CEO Agent | ~$10 | (shared VPS) | ~$10 |
| Content Manager | ~$20 | (shared VPS) | ~$20 |
| Engineer | ~$15 | (shared VPS) | ~$15 |
| Content Factory | ~$7 | (cron, no overhead) | ~$7 |
| Sborka Bot | ~$3 | (Docker container) | ~$3 |
| Lisa Pipeline | ~$12 | (shared VPS) | ~$12 |
| RUVDS VPS (shared infra) | | ~$25 | $25 |
| **Total** | ~$67 | ~$25 | ~$92/month |

For comparison: equivalent manual work would require roughly 15-20 hours per week of my time. At a conservative $50/hour opportunity cost, that is $3,000-4,000/month of work being automated for under $100.

The caveat: these agents are not autonomous. They require periodic supervision, occasional manual intervention (Failure 2 above, for example), and significant upfront engineering to set up correctly. The ROI calculation depends heavily on how you value setup time.

What surprised me: the Paperclip orchestration layer itself is not the expensive part. The token cost of the CEO agent making routing decisions is negligible. The expensive part is the content creation tasks — specifically, long-context sessions where the agent reads multiple strategy documents, writes a draft, evaluates it, and revises. Those sessions run 20-40K tokens per task.


Lessons

Start with the guardrails before the autonomy. The rate limiting, dedup checks, and quality gates were added after the first spam incident. They should have been the first thing built. An agent with unconstrained publish access will eventually cause a mess.

Verification is not optional. Every output has a corresponding check. Published post → URL is accessible. Deployed code → curl returns 200. Bio updated → screenshot confirms bio text. The overhead of verification is real, but the cost of propagating false "done" states through a multi-agent system is higher.

Flat files beat fancy databases for small-scale agent state. I considered using PostgreSQL for the shared dedup state and a vector store for post embeddings. Both would have added infrastructure complexity. A JSON file with a 30-post rolling window of SHA256 hashes works fine at this scale. Add complexity when the simple thing breaks.

Cheap models for high-volume, expensive models for judgment. MiniMax M2.5 handles bulk generation tasks. Claude Opus handles strategic decisions. The quality difference on structured generation tasks (content drafts from a brief) is negligible at a third of the cost.
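That split can be made explicit in code. A minimal sketch of a cost-aware model router; the task names and model identifiers are illustrative, not the actual configuration:

```python
# Hypothetical cost-aware router: the cheap model is the default,
# and only named judgment tasks are escalated to the expensive one.
MODEL_FOR_TASK = {
    "bulk_generation": "minimax-m2.5",
    "content_draft": "minimax-m2.5",
    "strategic_decision": "claude-opus",
    "escalation": "claude-opus",
}

def pick_model(task_type: str) -> str:
    """Unknown task types fall through to the cheap model, never the expensive one."""
    return MODEL_FOR_TASK.get(task_type, "minimax-m2.5")
```

Defaulting unknown tasks to the cheap model keeps a misclassified task from silently burning Opus-level spend.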

Agent coordination overhead is real. The CEO spends non-trivial time figuring out what to delegate and whether delegated tasks were done correctly. For a three-person team, direct assignment is often faster. The orchestration layer pays for itself when the task volume exceeds what a human can track, not before.


What Is Next

The current system handles content and infrastructure reasonably well. The next layer is economics: tracking actual conversion from content to signups, from signups to paid users, from agent output to revenue.

I am also exploring self-hosted WriteFreely as a canonical blog (instead of write.as/timzinin) to get full CSS control and better ActivityPub federation management. The migration cost is the ActivityPub handle change — subscribers to @timzinin@write.as do not automatically follow @timzinin@blog.sborka.work, so the announcement and transition window needs planning.


I write about building AI systems and the infrastructure behind them. More at timzinin.com.


Originally published on my Write.as blog. Follow me there via ActivityPub at @timzinin@write.as for more production AI insights.

