
I'm building an AI Company Builder — a platform where you describe a business idea in plain text, and a team of 28 AI agents researches the market, validates viability, designs the product, writes the code, creates content, and runs marketing. All autonomously.
This is not a chatbot. This is not another wrapper around ChatGPT. This is a full-stack agent orchestration platform with a two-layer architecture, persistent knowledge vault, multi-model routing across 300+ models, and a built-in marketplace for buying and selling AI-powered businesses.
I want to share the architecture, the tech stack, and the decisions I made — because I haven't seen anyone build exactly this combination yet.
The problem
Right now, if you want to launch a business with AI, you're stitching together 5-10 tools manually: ChatGPT for strategy, Lovable for code, Jasper for content, Perplexity for research, Notion for knowledge, Zapier for automation. Each tool does one thing. None of them talk to each other. And none of them understand your business as a whole.
What if one platform did it all? Not by being mediocre at everything — but by orchestrating specialized agents, each expert in their domain, all sharing context through a persistent knowledge vault?
The architecture: two layers, not one
Most agent platforms put all agents on the same level. Marketing agent, coding agent, sales agent — flat list, no hierarchy. This works for simple automation but breaks down when you're building a complete business.
I went with a two-layer approach:
Business layer — 7 manager agents that understand your specific business. Product Manager (Max), Marketing Lead (Ivy), Sales Strategist (Sam), Financial Analyst (Finn), Customer Success (Joy), Legal Advisor (Lex), and a Business Generator (Chief) that creates the whole structure from your description. These agents know your niche, your competitors, your audience.
Tool layer — 21 universal agents that do the actual work. Architect (Atlas), Designer (Maya), Frontend Dev (Kai), Backend Dev (Dev), Security (Shield), Researcher (Nova), Writer (Sage), and 14 more. These don't know your business — they know their craft. The business layer delegates to them with full context.
The key insight: business agents are per-business instances, tool agents are shared. If you run 3 businesses simultaneously, each has its own Max and Ivy, but they all share the same Atlas and Kai. This scales without multiplying costs.
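In code, the split looks roughly like this (names and structure are my illustration of the idea, not the platform's actual source):

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    role: str

# Tool layer: one shared instance per craft, reused by every business.
SHARED_TOOL_AGENTS = {
    "architect": Agent("Atlas", "architect"),
    "frontend": Agent("Kai", "frontend_dev"),
}

class Business:
    def __init__(self, description: str):
        self.description = description
        # Business layer: fresh manager instances per business, so each
        # Max/Ivy accumulates context for *this* niche only.
        self.managers = {
            "product": Agent("Max", "product_manager"),
            "marketing": Agent("Ivy", "marketing_lead"),
        }

    def delegate(self, craft: str) -> Agent:
        # Managers hand work to the shared tool agents, passing context along.
        return SHARED_TOOL_AGENTS[craft]

chess = Business("online chess school for kids")
saas = Business("invoicing SaaS")
# chess and saas each have their own Max, but share one Atlas.
```

Running two businesses creates two `Max` instances but only ever one `Atlas` — that's where the cost scaling comes from.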
Model routing: 90% quality at 7% cost
Running everything on Claude Opus 4.6 would cost a fortune. Running everything on a cheap model would produce garbage. The answer is intelligent routing.
I use OpenRouter as a gateway to 300+ models, organized in 4 tiers:
| Tier | Models | Cost/1M tokens | Use case |
|---|---|---|---|
| Free | Llama 3.3 70B | $0 | Routing, classification |
| Budget | DeepSeek V3, Gemini Flash | $0.14-0.60 | Content writing, planning |
| Performance | MiniMax M2.7 | $0.30-1.20 | Coding, testing, debugging |
| Premium | Claude Sonnet/Opus 4.6 | $3-25 | Architecture, security, design |
MiniMax M2.7 is the secret weapon here. In real-world tests, it delivers 90% of Opus quality for 7% of the cost. It found all 6 bugs and all 10 security vulnerabilities that Opus found — the fixes were just slightly less thorough. For most coding tasks, that's more than enough.
The system also auto-escalates: if an agent fails 3 times on a cheaper model, it automatically upgrades to the next tier. And auto-downgrades: 10 consecutive successes on Sonnet? The system suggests trying M2.7 next time.
A full project milestone that costs $50-80 on all-Opus runs $10-12 with routing. That's 80-85% savings.
The research stack: not just chat, actual research
This is where I think most agent platforms fall short. They can write code and generate content — but they can't research. They don't know what's happening in the market right now.
My stack includes four self-hosted research tools:
Perplexica — open-source Perplexity alternative. AI-powered web search with cited sources. When Nova (researcher agent) needs to analyze a market, she searches the web through Perplexica and gets answers with real citations, not hallucinations.
SurfSense — open-source NotebookLM alternative. Upload documents, chat with them, get cited answers. Hybrid search (semantic + full text). Can even generate podcasts from documents.
AnythingLLM — RAG workspace for document analysis. Upload PDFs, DOCX, code files — agents query them with grounded answers.
Firecrawl — web scraping via MCP. Agents can scrape any URL into clean markdown, crawl entire websites, extract structured data.
The combination means agents can research a market, analyze competitors, scrape their pricing pages, summarize uploaded pitch decks, and cite every claim with a real source.
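As a sketch of how an agent like Nova calls these services: the request payloads are small JSON bodies sent to the self-hosted endpoints. The field names below are my assumptions about the APIs, not verified against either project's docs — check your instance before relying on them:

```python
def firecrawl_scrape_payload(url: str) -> dict:
    """Request body for a Firecrawl-style scrape call.
    Field names are an assumption -- consult your instance's API docs."""
    return {"url": url, "formats": ["markdown"]}

def perplexica_search_payload(query: str) -> dict:
    """Request body for a Perplexica-style cited web search.
    Shape is illustrative only."""
    return {"query": query, "focusMode": "webSearch"}

# Nova would POST these to the self-hosted services, then feed the
# markdown and cited answers back into the vault.
pricing_req = firecrawl_scrape_payload("https://competitor.example/pricing")
market_req = perplexica_search_payload("chess education market size 2025")
```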
The gate system: think before you build
Here's what nobody else does. Before my system commits resources to building something, it analyzes whether it's worth building.
You write: "Build an online chess school for kids 6-14. Analyze viability first. Only proceed if rating is above 7/10."
The system runs a full analysis:
| Criterion | Weight | Score |
|---|---|---|
| Market size | 20% | 8/10 |
| Competition level | 20% | 7/10 |
| Niche uniqueness | 15% | 9/10 |
| Revenue potential | 15% | 8/10 |
| Acquisition cost | 15% | 5/10 |
| Channel accessibility | 15% | 8/10 |
| Overall | 100% | 7.5/10 |
If it passes the threshold — development begins. If not — the system explains why and suggests modifications. "Focus on children 6-10 instead of 6-14 — less competition, higher willingness to pay. Adjusted score: 8.1/10."
This saves thousands of dollars and weeks of development on ideas that won't work.
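The gate itself is just a weighted sum over those criteria. A minimal sketch — my reconstruction; in the real system the per-criterion scores would come from the research agents, not be hand-typed:

```python
# Weights from the example analysis above (they sum to 100%).
CRITERIA_WEIGHTS = {
    "market_size": 0.20,
    "competition_level": 0.20,
    "niche_uniqueness": 0.15,
    "revenue_potential": 0.15,
    "acquisition_cost": 0.15,
    "channel_accessibility": 0.15,
}

def viability_gate(scores: dict[str, float], threshold: float = 7.0):
    """Weighted overall score (0-10) plus a proceed/stop decision."""
    overall = sum(w * scores[k] for k, w in CRITERIA_WEIGHTS.items())
    return round(overall, 2), overall >= threshold

overall, proceed = viability_gate({
    "market_size": 8, "competition_level": 7, "niche_uniqueness": 9,
    "revenue_potential": 8, "acquisition_cost": 5, "channel_accessibility": 8,
})
# With these inputs the weighted sum works out to 7.5, above the 7/10 bar.
```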
Persistent memory: the Obsidian Vault
Every research finding, every architectural decision, every bug fix, every content plan — saved as markdown notes in an Obsidian-compatible vault with git version control.
The vault isn't just storage. It's a living knowledge base:
- Auto-indexing: Vault Librarian agent (Libra) maintains indexes, tags notes, creates links between related decisions
- Git history: every change tracked, every note timestamped, full rollback capability
- Memory consolidation: Libra periodically merges scattered notes into coherent knowledge structures
- Cross-project learning: insights from one project automatically available in related projects
After 3 months of operation, the vault contains hundreds of notes — and the system is measurably smarter. Nova doesn't re-research topics she already investigated. Atlas references past ADRs when making new architecture decisions. The knowledge compounds.
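The write path for a single note is simple: markdown with YAML frontmatter, then a git commit. A sketch under my own assumptions — the frontmatter fields and commit message format here are illustrative, not the platform's actual schema:

```python
import shutil
import subprocess
import tempfile
from datetime import date
from pathlib import Path

def save_vault_note(vault: Path, rel_path: str, title: str,
                    tags: list[str], body: str) -> Path:
    """Write an Obsidian-style markdown note with YAML frontmatter,
    then commit it so the change is tracked in git history."""
    note = vault / rel_path
    note.parent.mkdir(parents=True, exist_ok=True)
    frontmatter = (
        "---\n"
        f"title: {title}\n"
        f"date: {date.today().isoformat()}\n"
        f"tags: [{', '.join(tags)}]\n"
        "---\n\n"
    )
    note.write_text(frontmatter + body, encoding="utf-8")
    if shutil.which("git"):  # skip quietly if git is unavailable
        subprocess.run(["git", "-C", str(vault), "add", rel_path],
                       check=False, capture_output=True)
        subprocess.run(["git", "-C", str(vault), "commit", "-m",
                        f"vault: add {rel_path}"],
                       check=False, capture_output=True)
    return note

note_path = save_vault_note(Path(tempfile.mkdtemp()),
                            "projects/demo/decisions/ADR-001.md",
                            "ADR-001: API framework", ["architecture", "adr"],
                            "Chose FastAPI for the gateway.")
```

The frontmatter tags are what lets an agent like Libra index and cross-link notes later.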
Event-driven architecture: everything is observable
Every agent action emits an event to Redis pub/sub:
```json
{
  "agent": "Atlas",
  "action": "created_adr",
  "model": "claude-opus-4.6",
  "tokens_in": 2400,
  "tokens_out": 5100,
  "cost_usd": 0.142,
  "duration_ms": 8300,
  "vault_note": "projects/chess/decisions/ADR-001.md"
}
```
Multiple services subscribe: the audit logger saves to immutable JSONL, the cost tracker aggregates spending, the vault manager auto-saves results, and the live activity stream pushes to the Web UI via WebSocket.
This gives you:
- Full audit trail for compliance (EU AI Act, GDPR)
- Real-time cost tracking with ROI calculation ("Your agents saved $28,000 in equivalent human labor this month")
- Live activity feed — watch your agents work in real-time
- Kill switch — instantly halt all agent activity if something goes wrong
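On the publishing side, emitting an event is a one-liner with redis-py. The channel name and helper below are mine, a sketch of the pattern rather than the actual service code:

```python
import json

AGENT_EVENTS_CHANNEL = "agent.events"  # channel name is illustrative

def emit_event(redis_client, event: dict) -> None:
    """Publish one agent event to Redis pub/sub; every subscriber --
    audit logger, cost tracker, vault manager, WebSocket stream --
    receives the same message independently."""
    redis_client.publish(AGENT_EVENTS_CHANNEL, json.dumps(event))

# A subscriber with redis-py looks like:
#   ps = redis.Redis().pubsub()
#   ps.subscribe(AGENT_EVENTS_CHANNEL)
#   for msg in ps.listen(): ...
```

Because pub/sub is fan-out, adding a new consumer (say, an alerting service) never requires touching the agents themselves.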
A2A protocol readiness
Google's Agent-to-Agent protocol (A2A) is becoming the standard for inter-platform agent communication. 50+ partners including Salesforce, SAP, and PayPal are building on it.
I'm building A2A compatibility from day one. Every agent has an Agent Card — a JSON file describing its capabilities. External agents can discover our agents, send tasks, and receive results through standardized endpoints.
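For illustration, an Agent Card might look like this — the field names follow the public A2A draft spec as I understand it, so treat the exact schema as an assumption and check the current version before implementing:

```json
{
  "name": "Atlas",
  "description": "Software architecture agent: ADRs, system design, tech selection",
  "url": "https://example.com/agents/atlas",
  "version": "1.0.0",
  "capabilities": { "streaming": true, "pushNotifications": false },
  "skills": [
    {
      "id": "create_adr",
      "name": "Create ADR",
      "description": "Write an architecture decision record for a given design problem"
    }
  ]
}
```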
Why this matters: in 2027-2028, your business agents will negotiate with supplier agents, customer agents will talk to support agents across platforms, and marketing agents will coordinate campaigns with influencer agents — all machine-to-machine. Building the protocol layer now means we're ready when this arrives.
What's coming next
The full platform has 14 milestones. I'm currently on the build phase, deploying infrastructure on a Hetzner VPS with Claude Code + GSD-2 running the development process.
What I'm building toward:
- YouTube content pipeline: from idea to published video, fully automated
- Business Exchange: marketplace for buying and selling AI-powered businesses
- Cross-business learning: anonymous patterns shared across all businesses on the platform
- 400+ integrations via Composio: Gmail, Slack, HubSpot, Notion, Jira — one MCP server
- 7 business templates: Online Education, SaaS, Agency, E-commerce, Content, Marketplace, Coaching
Tech stack summary
| Layer | Technology |
|---|---|
| Server | Ubuntu 24.04, Hetzner CPX31 |
| AI Engine | Claude Code CLI via OpenRouter |
| Orchestration | GSD-2 + Ruflo |
| API Gateway | FastAPI + Redis pub/sub |
| Interfaces | Telegram (aiogram), React Web UI, REST API |
| Research | Perplexica, SurfSense, AnythingLLM, Firecrawl |
| Integrations | Composio (400+ apps), MCP servers |
| Memory | Obsidian Vault + MCPVault + git |
| Payments | Stripe, Paddle (merchant of record) |
Why I'm sharing this
Two reasons. First, I genuinely believe this is where software is heading — from tools to autonomous business operators. The predictions from Dario Amodei, Sam Altman, and every major AI lab point to agents handling multi-week projects autonomously by 2028. Building the platform for this now is a bet on the near future.
Second, building in public keeps me honest. If you see flaws in the architecture, I want to know. If you're building something similar, let's compare notes. If you want to be an early user — I'll be opening access soon.
Follow the build: I'll be posting weekly updates here on dev.to with technical deep dives into each component.
What would you build if you had 28 AI agents at your command?