김이더

Posted on Mar 31

Paperclip: The Open-Source OS for Running 20 AI Agents Like a Company

#ai #opensource #agentorchestration #sideprojects

Project on GitHub. Official site at paperclip.ing.
More posts at radarlog.kr.

Have you ever had 20 Claude Code tabs open at the same time?

I have. One coding agent per side project, one for my RAG server, one for blog automation. Code review, testing, documentation — each in a separate tab. Somewhere past 12 terminals, it hit me.

I wasn't writing code anymore. I was managing agents.

The problem is there was nothing to manage them with. Each agent had no idea the others existed. No shared context. No unified cost tracking. Reboot your machine and all state is gone.

Then I found Paperclip.

Not an Agent Framework

When you first see Paperclip, it's easy to confuse it with something like LangChain or CrewAI. I did. Then I stopped at this line in the README:

"Not an agent framework. We don't tell you how to build agents. We tell you how to run a company made of them."

That's the key distinction. Paperclip doesn't build agents. It hires them and runs them as an organization.

LangChain designs the internal pipeline of a single agent. CrewAI chains multiple agents into a task pipeline. Paperclip sits above both and supplies the things you need to run a company: org charts, goal hierarchies, budgets, governance, and audit trails.

Think of it this way:

LangChain  = one employee's workflow manual
CrewAI     = a team's task board
Paperclip  = the entire company's org chart + HR + finance

It uses Claude Code, OpenClaw, Codex, Cursor — whatever agents you already have. Paperclip's job is to give those agents titles, assign goals, set budgets, and put them to work.

"If it can receive a heartbeat, it's hired."

That one line from the README captures the entire philosophy. Can it receive a scheduled ping? Then it's employable.

Why This Caught a Game Developer's Eye

When you build multiplayer games in UE5, you end up managing Dedicated Servers. Once you have 10 or 20 server instances running, you need an orchestration layer — something to monitor each instance's state, distribute load, and roll back when things break.

The moment I saw Paperclip, that architecture came flooding back.

In game server orchestration, each instance independently runs game logic. The orchestrator never touches the game logic itself. It collects state, balances load, and restarts dead instances. Paperclip works the same way. It doesn't interfere with agent internals. It tracks who's doing what, stops agents that exceed their budget, and rolls back failures.

Game server orchestration:
  Instance → runs GameMode → Orchestrator collects state + balances load

Paperclip:
  Agent → runs Task → Paperclip collects state + manages budget + enforces governance

The core principle is the same: separate what executes from what manages.

Skip this in game servers, and when one of your 20 instances crashes, you won't even know which one died. Skip this with AI agents, and one of them burns $140 in tokens while you're getting coffee.

Paperclip is the control plane that prevents this.

How It Actually Works

Setup is surprisingly simple.

npx paperclipai onboard --yes

One command. A Node.js server starts, an embedded PostgreSQL database sets itself up automatically, and a React dashboard opens at localhost:3100. No separate DB installation. No config files.

Inside the dashboard, you create a "company." This is where the company metaphor kicks in. You create a CEO agent, a CTO, engineers. Assign roles and goals. An org chart forms, with reporting lines.

Agents run on heartbeats. They wake on a schedule, check their task queue, execute work, and go back to sleep. Event triggers work too — task assignments and @-mentions wake them up.

Heartbeat cycle:
  Agent wakes → checks task queue → atomic task checkout → executes → reports back → sleeps
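To make the cycle concrete, here is a minimal single-process sketch in Python. It is not Paperclip's code: the `Agent` class, its fields, and the `heartbeat` method are hypothetical names chosen for illustration, and the "work" is a placeholder string.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """Hypothetical agent record; not Paperclip's actual schema."""
    name: str
    queue: deque = field(default_factory=deque)
    reports: list = field(default_factory=list)

    def heartbeat(self) -> Optional[str]:
        """Run one wake -> check queue -> checkout -> execute -> report cycle.

        Returns the task that was handled, or None if the agent had
        nothing to do and went straight back to sleep.
        """
        if not self.queue:
            return None                      # empty queue: sleep until next tick
        task = self.queue.popleft()          # checkout (single-threaded here)
        result = f"{self.name} finished: {task}"  # stand-in for real agent work
        self.reports.append(result)          # report back
        return task


coder = Agent("coder")
coder.queue.extend(["write tests", "fix bug"])

# Three scheduled ticks: two do work, the third finds an empty queue.
ticks = [coder.heartbeat() for _ in range(3)]
```

The point of the shape is that the agent does nothing between ticks. Whatever schedules the heartbeat, not the agent itself, decides when work happens.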

What impressed me most is atomic execution. Task checkout and budget deduction happen as a single atomic step, so two agents can't grab the same task and a budget overrun is blocked in that same step. As someone who's spent years chasing concurrency bugs in game servers, seeing this built in from day one tells me the architecture was designed right.
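One common way to get this kind of guarantee is a conditional UPDATE inside a transaction. The sketch below uses Python's built-in `sqlite3` as a stand-in for the embedded PostgreSQL; the table layout, column names, and the `checkout` function are assumptions for illustration, not Paperclip's actual schema or API.

```python
import sqlite3


class OverBudget(Exception):
    """Raised inside the transaction to force a rollback of the claim."""


def checkout(conn, task_id: int, agent: str, cost_cents: int) -> bool:
    """Claim a task and reserve its cost in one transaction."""
    try:
        with conn:  # commits on normal exit, rolls back on exception
            claimed = conn.execute(
                # Only succeeds if nobody owns the task yet.
                "UPDATE tasks SET owner = ? WHERE id = ? AND owner IS NULL",
                (agent, task_id),
            ).rowcount
            if claimed == 0:
                return False  # another agent got there first
            funded = conn.execute(
                # Only succeeds if the charge still fits under the cap.
                "UPDATE budgets SET spent = spent + ? "
                "WHERE agent = ? AND spent + ? <= cap",
                (cost_cents, agent, cost_cents),
            ).rowcount
            if funded == 0:
                raise OverBudget(agent)  # undo the claim as well
    except OverBudget:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute(
    "CREATE TABLE budgets (agent TEXT PRIMARY KEY, cap INTEGER, spent INTEGER)"
)
conn.execute("INSERT INTO tasks (id) VALUES (1), (2)")
conn.execute("INSERT INTO budgets VALUES ('coder', 500, 0), ('qa', 100, 0)")
conn.commit()

first = checkout(conn, 1, "coder", 300)   # claims task 1
second = checkout(conn, 1, "qa", 50)      # task 1 is already owned
third = checkout(conn, 2, "qa", 150)      # would exceed qa's cap of 100
task2_owner = conn.execute("SELECT owner FROM tasks WHERE id = 2").fetchone()[0]
```

The third call matters most: the claim on task 2 succeeds, the budget check fails, and the rollback releases the task again, so no task ends up owned by an agent that couldn't pay for it.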

Budget management is per-agent with monthly caps. Soft warning at 80%, auto-pause at 100%. Because Paperclip stops the agent rather than trusting it to stop itself, runaway token spending is bounded by the cap.

Per-agent monthly budget:
  CEO Agent    → $50/month
  Coder Agent  → $200/month
  QA Agent     → $30/month

  80% reached → soft warning
  100% reached → auto-pause
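The threshold logic above is simple enough to sketch in a few lines of Python. The `Budget` class and its method names are hypothetical; only the 80% and 100% thresholds come from the description above. Amounts are in cents to avoid floating-point rounding.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical per-agent monthly budget, tracked in cents."""
    cap_cents: int
    spent_cents: int = 0
    paused: bool = False

    def record(self, cost_cents: int) -> str:
        """Record a charge and return the resulting state."""
        if self.paused:
            return "paused"                       # paused agents spend nothing
        self.spent_cents += cost_cents
        if self.spent_cents >= self.cap_cents:    # 100% reached
            self.paused = True
            return "paused"
        if self.spent_cents * 100 >= self.cap_cents * 80:  # 80% reached
            return "warning"
        return "ok"


qa = Budget(cap_cents=3000)  # the $30/month QA agent from the example
states = [qa.record(c) for c in (1000, 1500, 600, 100)]
```

Note that this sketch records a charge after the fact, so the charge that trips the pause can land slightly over the cap. Blocking the overrun before it happens requires checking the cap in the same step as the deduction.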

All visible from the dashboard. Cost tracking by task, by project, by goal. A far cry from having 20 Claude Code tabs open and wondering why the API bill was so high this month.

What the Company Metaphor Actually Gives You

I initially thought the "company metaphor" was a marketing gimmick. Org charts? A CEO agent? Isn't that a bit much?

But when you actually use it, the metaphor changes how you think. You shift from "throwing prompts at an AI" to "managing a team."

From a prompt engineering perspective, you think:

"How do I craft the right prompt for this task?"

From a company management perspective, you think:

"Who should I assign this to, who reviews the output, and how much budget do I allocate?"

The second framing scales. You can manage 20 agents with the same mental model you used for 5. Tweaking individual prompts hits a wall around 3 agents.

Paperclip's goal-aware execution comes from this metaphor too. Every task carries the full ancestry of its parent goals. When an agent picks up a task, it knows not just what to do, but why, all the way up to the company mission. It isn't working from a task title alone.

In game dev terms, this is like UE5's Gameplay Ability System, where a GameplayEffect carries a context (instigator, effect causer, source object) along with it, so downstream code knows where the effect came from. Context doesn't break, so the agent is less likely to drift off in weird directions.
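The goal ancestry idea maps naturally onto a linked chain of parent pointers. Here is a minimal Python sketch; the `Goal` class and the example goal titles are made up for illustration and say nothing about how Paperclip stores goals internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Goal:
    """Hypothetical goal node; a task is just the leaf of the chain."""
    title: str
    parent: Optional["Goal"] = None

    def ancestry(self) -> list[str]:
        """Return titles from the top-level mission down to this node."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.title)
            node = node.parent
        return chain[::-1]  # mission first, task last


mission = Goal("Ship a profitable indie game")
project = Goal("Launch the multiplayer beta", mission)
task = Goal("Fix matchmaking timeout", project)

# The string an agent could receive alongside the task itself.
context = " > ".join(task.ancestry())
```

Handing the agent `context` instead of just `task.title` is the difference between "fix this timeout" and "fix this timeout because the beta launch depends on it".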

Then there's governance. Approval gates at every sensitive step. Config changes are versioned. Bad changes can be rolled back. This isn't marketing fluff — it's about whether you can recover when an agent makes a mistake. Flowtivity documented a real incident where an agent sent batch outreach to 23 leads instead of 3. Running agents autonomously without approval gates is like trusting the client in a multiplayer game without any anti-cheat.
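Approval gates and versioned config can be sketched together in a few lines of Python. Everything here is hypothetical: the `VersionedConfig` class, the `approved` flag, and the `outreach_batch_size` key are illustrations loosely modeled on the 3-versus-23 incident, not Paperclip's real interface.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedConfig:
    """Hypothetical config store that keeps every approved version."""
    history: list = field(default_factory=lambda: [{}])

    @property
    def current(self) -> dict:
        return self.history[-1]

    def propose(self, change: dict, approved: bool) -> bool:
        """Apply a change only if it passed the approval gate."""
        if not approved:
            return False                      # gate closed: nothing changes
        self.history.append({**self.current, **change})
        return True

    def rollback(self) -> dict:
        """Drop the latest version and return the one before it."""
        if len(self.history) > 1:
            self.history.pop()
        return self.current


cfg = VersionedConfig()
cfg.propose({"outreach_batch_size": 3}, approved=True)

# An unapproved change is rejected outright.
rejected = cfg.propose({"outreach_batch_size": 23}, approved=False)

# An approved-but-wrong change can still be undone afterwards.
cfg.propose({"outreach_batch_size": 23}, approved=True)
restored = cfg.rollback()
```

The two mechanisms cover different failures: the gate stops a bad change before it lands, and the version history is the recovery path when a bad change gets approved anyway.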

It's Early — And That's Fine

Let me be honest. Paperclip is still at v0.3. It launched in March 2026. It has over 38K GitHub stars and a rapidly growing community, but it's not production-ready for every use case.

Onboarding has friction. The project's own docs state a goal of "first task in under 5 minutes" — and admit it's not consistently achieved yet. API key configuration is confusing for non-developers.

And it's overkill for solo use. The README says it plainly: if one agent is enough, you don't need Paperclip. If you have 20, you definitely do. If you're somewhere in between, watch how fast your agent count is growing.

The "zero-human company" tagline is provocative. In practice, humans don't disappear. The work changes. Instead of crafting prompts, you're designing organizations, setting goals, building governance. Work doesn't vanish — the abstraction level of work goes up.

What makes this worth watching is clear. Individual agent capabilities improve every month. The bottleneck isn't how smart a single agent is — it's the infrastructure for running many agents as a coordinated system. Paperclip is aiming squarely at that gap.

There's also Clipmart on the roadmap — a marketplace for company templates. Download a pre-built org with agent configs, skills, and workflows in one click. If that lands well, the barrier to "starting an AI company" drops dramatically.