Truong Phung

📎 Paperclip Deep Dive 🤖: A Build Guide for an "AI Company" 🏢 Control Plane

Source: github.com/paperclipai/paperclip ("Open-source orchestration for zero-human companies.")

This guide distills the architecture, principles, and engineering choices behind Paperclip into an actionable blueprint you can use to build a similar system. It is written so you can read it top-to-bottom and walk away with a concrete plan.


Table of Contents

  1. 🤖 What Paperclip Actually Is
  2. 🧠 Core Mental Model: Control Plane, Not Framework
  3. 📐 The 10 Design Principles
  4. 🏗️ High-Level Architecture
  5. 🗃️ The Domain Model: How "A Company" Maps to Tables
  6. 💚 The Heartbeat: The Heart of the Runtime
  7. 🔌 Adapters: "Bring Your Own Agent"
  8. ✅ The Task System & Atomic Checkout
  9. ⚖️ Governance, Approvals & The Board
  10. 💰 Budgets & Cost Control
  11. 🧩 Plugin System: Capability-Gated Extensions
  12. 📡 MCP Server: Agents Talk to the API
  13. 🎓 Skills: Teaching Agents the API
  14. ⚙️ Tech Stack & Repository Layout
  15. 🌐 REST API Surface
  16. 🔒 Multi-Company Isolation & Portability
  17. 📋 Audit Trail & Activity Log
  18. 📏 Engineering Conventions
  19. 🗺️ Step-by-Step Build Plan
  20. ⚠️ Pitfalls, Tradeoffs & What To Skip First

🤖 1. What Paperclip Actually Is

Paperclip is a Node.js + React self-hosted application that lets you run a "company" of AI agents:

  • You define a company with goals/initiatives.
  • You hire agents (Claude Code, Codex, Cursor, custom CLI, HTTP bot; you pick the runtime).
  • You assign tasks (issues) and budgets.
  • A board operator (human) approves hires, strategic plans, and budget overrides.
  • A scheduler runs each agent on a heartbeat (a short execution window) and tracks cost, status, tool calls, and outputs.

The Paperclip slogan: "If OpenClaw is an employee, Paperclip is the company."

It looks like a task manager (Linear/Jira), but underneath it is an org chart, a budget engine, an approval queue, a multi-runtime executor, and an audit log, all designed for non-human workers.


🧠 2. Core Mental Model: Control Plane, Not Framework

This is the most important idea to internalize before building anything.

| Agent Framework (LangGraph, CrewAI, ...) | Control Plane (Paperclip) |
| --- | --- |
| Decides how an agent thinks | Decides what an agent works on |
| Owns the prompt + tool loop | Treats the agent loop as a black box |
| One process, in-memory | Many processes, durable state |
| You ship code | You ship a deployment |

Concrete consequences for design:

  • The system never runs a "react+plan+act" loop itself. That is the adapter's job.
  • The system does own: identity, scheduling, task ownership, cost ledger, approvals, audit, persistence.
  • The contract with an agent is shockingly small: "I can invoke you, get status, and cancel you."

If you start building a Paperclip-like system and find yourself writing prompt templates or tool-call parsers in the core, you have drifted into framework territory. Pull back.


📐 3. The 10 Design Principles

Lifted (and de-jargoned) from the spec:

  1. Unopinionated execution. The core does not care which model, prompt, or planner an agent uses. It launches a process and waits.
  2. Task-centric communication. Agents do not talk to each other directly. Delegation = task creation. Coordination = task comments. Status = field updates. This makes everything observable and replayable.
  3. Goal-traced work. Every task descends from a company initiative: Initiative → Project → Milestone → Issue → Sub-issue. No orphan work.
  4. Atomic task ownership. A task can be owned by exactly one agent at a time, enforced at the database layer (not in app code).
  5. Visible problem surfacing. Agents that get stuck must mark issues blocked and escalate. Silent retries are an anti-pattern.
  6. Human board authority. Every irreversible or high-risk action (hiring, big-spend, strategy approval, termination) requires a human approval record.
  7. Cost follows work. Costs are billed against the requesting task chain, not just the executing agent. This makes "who is expensive and why" answerable.
  8. Hard budget ceilings. Soft alert at 80%. At 100%, the agent is auto-paused and further invocations are blocked. No "best-effort."
  9. Progressive deployment. It must run on a laptop with embedded Postgres, then scale to self-hosted / cloud with the same code and the same schema.
  10. Plugin-extensible, not fork-extensible. Capabilities the core doesn't ship come from out-of-process plugins with declared, gated capabilities.

When you design your system, keep this list visible and bounce every PR against it.


🏗️ 4. High-Level Architecture

                            ┌────────────────────────────┐
                            │       React UI (Vite)      │
                            │  Org chart · Tasks · Costs │
                            └──────────────┬─────────────┘
                                           │ REST + SSE
                                           ▼
┌──────────────────────────────────────────────────────────────────┐
│                    Node.js Server (TypeScript / Express)         │
│                                                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐  │
│  │  REST API   │  │  Scheduler  │  │  Approvals  │  │ Plugins │  │
│  │ (handlers)  │  │ (heartbeat) │  │   engine    │  │  host   │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └────┬────┘  │
│         │                │                │              │       │
│         └────────────────┼────────────────┴──────────────┘       │
│                          ▼                                       │
│                 ┌──────────────────┐    ┌──────────────────┐     │
│                 │   Adapter Mgr    │───▶│   Agent runtime  │     │
│                 │ (claude_local,   │    │ (child process / │     │
│                 │  codex_local,    │    │  HTTP webhook)   │     │
│                 │  http, process)  │    └──────────────────┘     │
│                 └──────────────────┘                             │
└──────────────────────────┬───────────────────────────────────────┘
                           │
                           ▼
              ┌──────────────────────────┐
              │  PostgreSQL (or PGlite)  │
              │  companies · agents ·    │
              │  issues · heartbeats ·   │
              │  costs · approvals ·     │
              │  activity_log            │
              └──────────────────────────┘

      Sidecar (optional):
      ┌───────────────────────────┐
      │   MCP server (thin REST   │  ◀─── agents call here to read/write Paperclip
      │       wrapper)            │
      └───────────────────────────┘

The 12 subsystems the spec calls out, which double as the checklist for a feature-complete v1:

  1. Identity & Access
  2. Org Chart & Agents
  3. Work & Task System
  4. Heartbeat Execution
  5. Workspaces & Runtime
  6. Governance & Approvals
  7. Budget & Cost Control
  8. Routines & Schedules
  9. Plugins
  10. Secrets & Storage
  11. Activity & Events
  12. Company Portability (export/import)

🗃️ 5. The Domain Model

This is where most of the cleverness lives. The schema is small but every column matters.

🏢 Companies

companies(
  id uuid pk,
  name, description, status (active|paused|archived),
  pause_reason, paused_at,
  issue_prefix text not null,        -- e.g. "ACME"
  issue_counter int not null,        -- monotonic, used for ACME-123
  budget_monthly_cents int default 0,
  spent_monthly_cents int default 0,
  attachment_max_bytes,
  require_board_approval_for_new_agents bool
)

Why an issue_prefix + issue_counter? So tasks have human-friendly IDs (ACME-42) that are stable, sortable, and unique per company without leaking other tenants' counts.
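A minimal sketch of how such identifiers could be allocated, assuming an in-memory stand-in for the two columns on the companies row. `CompanyCounter` and `nextIssueIdentifier` are hypothetical names, not part of Paperclip. In a real deployment the increment would presumably be a single atomic UPDATE on the companies row, so that two concurrent inserts cannot receive the same number.

```typescript
// Hypothetical in-memory stand-in for issue_prefix and issue_counter.
interface CompanyCounter {
  issuePrefix: string; // e.g. "ACME"
  issueCounter: number; // last number handed out
}

// Returns the next identifier plus the updated counter, without mutating the input.
function nextIssueIdentifier(company: CompanyCounter): {
  identifier: string;
  issueNumber: number;
  company: CompanyCounter;
} {
  const issueNumber = company.issueCounter + 1;
  return {
    identifier: `${company.issuePrefix}-${issueNumber}`,
    issueNumber,
    company: { ...company, issueCounter: issueNumber },
  };
}

const acme: CompanyCounter = { issuePrefix: "ACME", issueCounter: 41 };
const allocated = nextIssueIdentifier(acme);
console.log(allocated.identifier); // ACME-42
```

Because each company carries its own counter, the numbers stay dense and sortable per tenant, and one company's identifiers reveal nothing about another's volume.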

🤖 Agents

agents(
  id, company_id, name, role, title, icon,
  status (active|paused|idle|running|error|pending_approval|terminated),
  reports_to uuid → agents.id null,            -- the org chart edge
  capabilities text,
  adapter_type text,                           -- claude_local | codex_local | http | ...
  adapter_config jsonb,                        -- adapter-specific
  runtime_config jsonb default {},             -- timeouts, cwd, env
  default_environment_id,
  context_mode (thin|fat) default thin,
  budget_monthly_cents int default 0,
  spent_monthly_cents int default 0
)

Why adapter_type + adapter_config (jsonb)? Lets you support N agent runtimes without N tables. The polymorphism lives in code (the adapter manager) and JSON, not in DDL.

📝 Issues (tasks)

issues(
  id, company_id, project_id, goal_id, parent_id,
  title, description,
  status (backlog|todo|in_progress|in_review|done|blocked|cancelled),
  priority (critical|high|medium|low),
  assignee_agent_id, assignee_user_id,

  -- Atomic checkout fields:
  checkout_run_id, execution_run_id,
  execution_agent_name_key, execution_locked_at,

  -- Provenance:
  created_by_agent_id, created_by_user_id,
  issue_number, identifier,                    -- e.g. ACME-42
  origin_kind, origin_id, origin_run_id, origin_fingerprint,
  request_depth int default 0,                 -- how deep the delegation chain is
  billing_code text                            -- "cost follows work"
)

💚 Heartbeat runs (one row per execution window)

heartbeat_runs(
  id, company_id, agent_id,
  invocation_source (scheduler|manual|callback),
  status (queued|running|succeeded|failed|cancelled|timed_out),
  started_at, finished_at, error,
  external_run_id text,                        -- adapter's run id, for resume
  context_snapshot jsonb                       -- what was passed in
)

💰 Cost events (the ledger)

cost_events(
  id, company_id, agent_id, issue_id, project_id, goal_id,
  billing_code,
  provider text, model text,
  input_tokens, output_tokens, cost_cents,
  occurred_at
)

⚖️ Approvals (governance queue)

approvals(
  id, company_id,
  type (hire_agent|approve_ceo_strategy|budget_override_required|request_board_approval),
  requested_by_agent_id, requested_by_user_id,
  status (pending|revision_requested|approved|rejected|cancelled),
  payload jsonb,                               -- the proposed change
  decision_note, decided_by_user_id, decided_at
)

📋 Activity log (the audit tape)

activity_log(
  id, company_id,
  actor_type (agent|user|system), actor_id,
  action text,                                 -- "issue.checked_out"
  entity_type, entity_id,
  details jsonb,
  created_at
)

🔍 Indexes that matter (don't skip)

agents(company_id, status)
agents(company_id, reports_to)                   -- org-chart traversal
issues(company_id, status)
issues(company_id, assignee_agent_id, status)    -- "what's on my plate"
issues(company_id, parent_id)                    -- subtasks
issues(company_id, project_id)
cost_events(company_id, occurred_at)
cost_events(company_id, agent_id, occurred_at)   -- per-agent rollups
heartbeat_runs(company_id, agent_id, started_at desc)
approvals(company_id, status, type)
activity_log(company_id, created_at desc)

Lesson: every index starts with company_id. Tenant isolation is a query-plan concern, not just an auth concern.


💚 6. The Heartbeat

The heartbeat is the runtime kernel. Everything else is plumbing around it.

🔄 Lifecycle of a single tick

1. Scheduler decides "agent A should run now"
       ↓
2. Insert heartbeat_runs row (status=queued)
       ↓
3. Adapter manager looks up agents.adapter_type
       ↓
4. Adapter.invoke(agentConfig, context):
        - Build prompt/context
        - Spawn child process OR fire HTTP webhook
        - Pass session_id from previous run if resumable
       ↓
5. Stream logs, status, tool calls back into the run row
       ↓
6. Wait until: exit | timeout | cancel
        - On timeout: send stop signal, wait graceSec, force-kill
       ↓
7. Persist: token usage, cost_events rows, output snippet, error
       ↓
8. Update heartbeat_runs (status=succeeded|failed|timed_out)
       ↓
9. Emit activity_log entry; broadcast SSE to UI
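The status bookkeeping in steps 2, 6, and 8 can be sketched as two pure functions. This is an illustration under assumptions, not Paperclip's code: `Outcome`, `startRun`, and `settleRun` are hypothetical names, and the real server would persist these changes to the heartbeat_runs row instead of returning new objects.

```typescript
type RunStatus = "queued" | "running" | "succeeded" | "failed" | "cancelled" | "timed_out";

// How the adapter call ended (hypothetical shape).
type Outcome =
  | { kind: "exit"; code: number }
  | { kind: "timeout" }
  | { kind: "cancel" };

interface HeartbeatRun {
  id: string;
  agentId: string;
  status: RunStatus;
  startedAt?: Date;
  finishedAt?: Date;
  error?: string;
}

// Step 2 -> 4: a queued run becomes running when the adapter is invoked.
function startRun(run: HeartbeatRun, now: Date): HeartbeatRun {
  if (run.status !== "queued") throw new Error(`cannot start a run in status ${run.status}`);
  return { ...run, status: "running", startedAt: now };
}

// Step 6 -> 8: map the way the run ended onto a terminal status.
function settleRun(run: HeartbeatRun, outcome: Outcome, now: Date): HeartbeatRun {
  if (run.status !== "running") throw new Error(`cannot settle a run in status ${run.status}`);
  switch (outcome.kind) {
    case "exit":
      return outcome.code === 0
        ? { ...run, status: "succeeded", finishedAt: now }
        : { ...run, status: "failed", finishedAt: now, error: `exit code ${outcome.code}` };
    case "timeout":
      return { ...run, status: "timed_out", finishedAt: now, error: "timed out" };
    case "cancel":
      return { ...run, status: "cancelled", finishedAt: now };
  }
}

const queuedRun: HeartbeatRun = { id: "run-1", agentId: "agent-a", status: "queued" };
const activeRun = startRun(queuedRun, new Date());
const finishedRun = settleRun(activeRun, { kind: "exit", code: 1 }, new Date());
console.log(finishedRun.status, finishedRun.error); // failed exit code 1
```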

⚡ Wakeup triggers (only four)

| Trigger | Meaning |
| --- | --- |
| timer | Cron-like, e.g. "every 5 minutes" |
| assignment | A new task was checked out to this agent |
| on_demand | Human or API pressed the "Run now" button |
| automation | System-internal trigger (future) |

🔁 Coalescing

"If an agent is already running, new wakeups are merged (coalesced) instead of launching duplicate runs."

This rule alone prevents 90% of the duplicate-spend bugs you'd otherwise hit.
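One way to picture the rule is a small per-agent gate, sketched here with in-memory state. `WakeupCoalescer` and its method names are hypothetical; a multi-process deployment would need to keep the "is running" flag in the database instead of a Map.

```typescript
type WakeupSource = "timer" | "assignment" | "on_demand" | "automation";

interface AgentRunState {
  running: boolean;
  mergedWakeups: WakeupSource[]; // wakeups that arrived while a run was active
}

class WakeupCoalescer {
  private readonly agents = new Map<string, AgentRunState>();

  // Returns "started" when a new run should launch, "coalesced" when the wakeup was merged.
  requestWakeup(agentId: string, source: WakeupSource): "started" | "coalesced" {
    const state = this.agents.get(agentId) ?? { running: false, mergedWakeups: [] };
    this.agents.set(agentId, state);
    if (state.running) {
      state.mergedWakeups.push(source);
      return "coalesced";
    }
    state.running = true;
    return "started";
  }

  // Called when the run ends; returns the wakeups that were merged into it.
  finishRun(agentId: string): WakeupSource[] {
    const state = this.agents.get(agentId);
    if (!state || !state.running) return [];
    const merged = state.mergedWakeups;
    state.running = false;
    state.mergedWakeups = [];
    return merged;
  }
}

const coalescer = new WakeupCoalescer();
console.log(coalescer.requestWakeup("agent-a", "timer")); // started
console.log(coalescer.requestWakeup("agent-a", "assignment")); // coalesced
```

Returning the merged wakeups from `finishRun` leaves the caller free to decide whether they justify one follow-up run; that policy is not specified in the text above.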

▶️ Session resumption

For adapters that support it (Claude CLI, Codex CLI), Paperclip stores the external_run_id / session ID in the heartbeat row. The next tick passes it back so the agent reloads its context. Operators can reset the session when context goes stale.

⚙️ Runtime config

runtime_config:
  cwd: /workspaces/acme-engineering
  timeoutSec: 1800        # max wall time per heartbeat
  graceSec: 30            # SIGTERM → SIGKILL window
  env:
    ANTHROPIC_API_KEY: ${secret:anthropic_key}
  promptTemplate: ...     # adapter-specific
  args: [...]

🛡️ Safety

"Local CLI adapters run unsandboxed on the host machine."

The spec is honest about this. Mitigations: per-agent OS user, restricted cwd, secrets managed by the host (not in prompts), and capability-gated plugins for anything the agent can't do directly.


🔌 7. Adapters: "Bring Your Own Agent"

The adapter is the only abstraction over agent runtimes. It is intentionally tiny.

interface Adapter {
  invoke(agentConfig: AgentConfig, context?: HeartbeatContext): Promise<RunHandle>;
  status(agentConfig: AgentConfig): Promise<AgentStatus>;
  cancel(agentConfig: AgentConfig): Promise<void>;
}

That's the whole contract. Three methods.
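To make the contract concrete, here is a hedged sketch of a do-nothing adapter and a registry keyed by adapter_type. The shapes of `AgentConfig`, `HeartbeatContext`, `RunHandle`, and `AgentStatus` are assumptions (the article does not show them), and `NoopAdapter` and `AdapterRegistry` are hypothetical names.

```typescript
// Hypothetical minimal shapes; the real types are not shown in this guide.
interface AgentConfig {
  agentId: string;
  adapterType: string;
  adapterConfig: Record<string, unknown>; // passed through untouched
}
interface HeartbeatContext { issueId?: string; sessionId?: string }
interface RunHandle { externalRunId: string }
type AgentStatus = "idle" | "running" | "error";

interface Adapter {
  invoke(agentConfig: AgentConfig, context?: HeartbeatContext): Promise<RunHandle>;
  status(agentConfig: AgentConfig): Promise<AgentStatus>;
  cancel(agentConfig: AgentConfig): Promise<void>;
}

// A do-nothing adapter, useful as a test double before any real runtime exists.
class NoopAdapter implements Adapter {
  private readonly running = new Set<string>();

  async invoke(agentConfig: AgentConfig): Promise<RunHandle> {
    this.running.add(agentConfig.agentId);
    return { externalRunId: `noop-${agentConfig.agentId}` };
  }
  async status(agentConfig: AgentConfig): Promise<AgentStatus> {
    return this.running.has(agentConfig.agentId) ? "running" : "idle";
  }
  async cancel(agentConfig: AgentConfig): Promise<void> {
    this.running.delete(agentConfig.agentId);
  }
}

// The manager only maps adapter_type to an implementation.
class AdapterRegistry {
  private readonly adapters = new Map<string, Adapter>();

  register(adapterType: string, adapter: Adapter): void {
    this.adapters.set(adapterType, adapter);
  }
  resolve(adapterType: string): Adapter {
    const adapter = this.adapters.get(adapterType);
    if (!adapter) throw new Error(`unknown adapter_type: ${adapterType}`);
    return adapter;
  }
}

const registry = new AdapterRegistry();
registry.register("noop", new NoopAdapter());
console.log(registry.resolve("noop") instanceof NoopAdapter); // true
```

Failing loudly on an unknown adapter_type fits the "fail visibly" convention: a misconfigured agent surfaces as an error instead of silently never running.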

🔌 Built-in adapters

| Adapter | Mechanism |
| --- | --- |
| process | Spawns an arbitrary CLI as a child process |
| http | POSTs to a webhook; agent lives wherever it lives |
| claude_local | Claude Code CLI, supports session resume |
| codex_local | OpenAI Codex CLI |
| cursor | Cursor headless mode |
| gemini-local, pi_local, opencode-local, hermes_local | Other local CLIs |
| openclaw_gateway | Calls a managed cloud service |

🏆 Why this design wins

  • Adding an agent runtime is a self-contained PR. Drop a folder under packages/adapters/<name>/. No core changes.
  • Most adapters are 100-300 lines. They're mostly: spawn process, wire stdin/stdout, parse final JSON, report cost.
  • Polymorphism in JSON, not types. adapter_config jsonb lets each adapter define its own shape; the manager just passes it through.

📊 Integration levels (acceptable degrees of "support")

| Level | What the adapter does |
| --- | --- |
| Minimum | Callable; reports exit code |
| Status | Reports success/failure/progress |
| Full | Reports cost, updates tasks, calls back into Paperclip API |

You don't need full instrumentation on day one. A new adapter can land at "Minimum" and be useful.


✅ 8. Task System & Atomic Checkout

The task system is what stops two agents from doing the same work at the same time. It is the second-most-important runtime concept after the heartbeat.

🌲 Hierarchy

Initiative   (board-level direction, e.g. "Reach $1M ARR")
  └── Project          (e.g. "Self-serve checkout")
       └── Milestone   (e.g. "Public beta")
            └── Issue   (e.g. "Add Stripe webhook handler")
                 └── Sub-issue

Every task traces up to an initiative; no work is "for nothing."

🔐 Atomic checkout (the magic SQL)

// Request
POST /issues/:issueId/checkout
{ "agentId": "uuid", "expectedStatuses": ["todo","backlog","blocked","in_review"] }

Server-side:

UPDATE issues
SET assignee_agent_id = :agentId,
    status            = 'in_progress',
    started_at        = COALESCE(started_at, now())
WHERE id = :issueId
  AND status = ANY (:expectedStatuses)
  AND (assignee_agent_id IS NULL OR assignee_agent_id = :agentId);

If the row count is 0, return 409 Conflict with the current owner/status. Otherwise the row is locked to that agent.

This single update is the entire concurrency story. No queues, no Redis locks, no leases. The DB row is the lock.
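The predicate in that UPDATE can be mirrored in a few lines of TypeScript, which is handy for unit-testing the rules before wiring up a database. This is an in-memory sketch with hypothetical names (`IssueRow`, `checkout`); it does not provide the atomicity itself, which in the real system comes from the database executing the single UPDATE against the row.

```typescript
type IssueStatus = "backlog" | "todo" | "in_progress" | "in_review" | "done" | "blocked" | "cancelled";

interface IssueRow {
  id: string;
  status: IssueStatus;
  assigneeAgentId: string | null;
  startedAt: Date | null;
}

type CheckoutResult =
  | { ok: true; issue: IssueRow }
  | { ok: false; httpStatus: 409; currentStatus: IssueStatus; currentOwner: string | null };

// Mirrors the WHERE clause: the status must be one the caller expects,
// and the row must be unowned or already owned by the same agent.
function checkout(issue: IssueRow, agentId: string, expectedStatuses: IssueStatus[], now: Date): CheckoutResult {
  const statusOk = expectedStatuses.includes(issue.status);
  const ownerOk = issue.assigneeAgentId === null || issue.assigneeAgentId === agentId;
  if (!statusOk || !ownerOk) {
    // Equivalent of "row count is 0": report the current owner and status.
    return { ok: false, httpStatus: 409, currentStatus: issue.status, currentOwner: issue.assigneeAgentId };
  }
  issue.assigneeAgentId = agentId;
  issue.status = "in_progress";
  issue.startedAt = issue.startedAt ?? now; // COALESCE(started_at, now())
  return { ok: true, issue };
}

// Fifty agents try to take the same issue; only the first attempt matches the predicate.
const contested: IssueRow = { id: "ACME-42", status: "todo", assigneeAgentId: null, startedAt: null };
const attempts = Array.from({ length: 50 }, (_, i) =>
  checkout(contested, `agent-${i}`, ["todo", "backlog"], new Date()),
);
console.log(attempts.filter((r) => r.ok).length); // 1
```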

🤝 Cross-team work & escalation rules

  • Any agent can create a task for any other agent (no permission walls; visibility is total).
  • The receiving agent must complete, block, or escalate. They cannot silently cancel a cross-team request.
  • Escalation goes up their own reports_to chain.

🏷️ Billing codes

When agent X delegates to agent Y, Y's cost_events are tagged with the billing code from X's task. Roll-ups answer "how much did Initiative #3 actually cost across the whole graph?"
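A roll-up of that kind is a plain group-by over the ledger. The sketch below assumes a trimmed-down `CostEvent` shape and made-up billing codes; in practice this would be a SQL aggregate over cost_events.

```typescript
interface CostEvent {
  agentId: string;
  issueId: string;
  billingCode: string;
  costCents: number;
}

// Sums cost per billing code, regardless of which agent executed the work.
function rollupByBillingCode(ledger: CostEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const entry of ledger) {
    totals.set(entry.billingCode, (totals.get(entry.billingCode) ?? 0) + entry.costCents);
  }
  return totals;
}

const ledger: CostEvent[] = [
  { agentId: "x", issueId: "ACME-1", billingCode: "initiative-3", costCents: 120 },
  // Agent y did this work on behalf of x's task, so it inherits x's billing code.
  { agentId: "y", issueId: "ACME-2", billingCode: "initiative-3", costCents: 80 },
  { agentId: "y", issueId: "ACME-9", billingCode: "initiative-7", costCents: 50 },
];
console.log(rollupByBillingCode(ledger).get("initiative-3")); // 200
```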

🔄 State machine

backlog ─→ todo ─→ in_progress ─→ in_review ─→ done   (terminal)
   │        │            │
   │        └─→ blocked ←┘
   │        │
   └─→ cancelled (terminal)

Side effects:
  → in_progress  : sets started_at if null
  → done         : sets completed_at
  → cancelled    : sets cancelled_at
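A table-driven version of this state machine might look like the sketch below. The transition table is one reading of the diagram plus the checkout request shown earlier (which accepts blocked and in_review issues back into in_progress); the exact rule set is an assumption, and `allowedTransitions` and `transition` are hypothetical names.

```typescript
type IssueStatus = "backlog" | "todo" | "in_progress" | "in_review" | "done" | "blocked" | "cancelled";

// Assumed rule set; extend the arrays if your workflow allows more moves.
const allowedTransitions: Record<IssueStatus, IssueStatus[]> = {
  backlog: ["todo", "cancelled"],
  todo: ["in_progress", "blocked"],
  in_progress: ["in_review", "blocked"],
  in_review: ["done", "in_progress"],
  blocked: ["in_progress"],
  done: [], // terminal
  cancelled: [], // terminal
};

interface Issue {
  status: IssueStatus;
  startedAt: Date | null;
  completedAt: Date | null;
  cancelledAt: Date | null;
}

// Validates the move and applies the timestamp side effects listed above.
function transition(issue: Issue, next: IssueStatus, now: Date): Issue {
  if (!allowedTransitions[issue.status].includes(next)) {
    throw new Error(`illegal transition ${issue.status} -> ${next}`);
  }
  const updated: Issue = { ...issue, status: next };
  if (next === "in_progress" && updated.startedAt === null) updated.startedAt = now;
  if (next === "done") updated.completedAt = now;
  if (next === "cancelled") updated.cancelledAt = now;
  return updated;
}

const draft: Issue = { status: "todo", startedAt: null, completedAt: null, cancelledAt: null };
const started = transition(draft, "in_progress", new Date());
console.log(started.status, started.startedAt !== null); // in_progress true
```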

⚖️ 9. Governance, Approvals & The Board

The "board" is a single human operator (in v1). They have unrestricted authority: pause, resume, override, terminate.

📥 Approval queue

The approvals table is a generic mechanism. Four request types ship by default:

| Type | Who proposes | What it gates |
| --- | --- | --- |
| hire_agent | CEO agent (or any agent if company requires) | Creating a new agent |
| approve_ceo_strategy | CEO agent | Initial org/task plan |
| budget_override_required | Any agent | Spending past hard limit |
| request_board_approval | Any agent | Anything escalated to a human |

Each approval carries a payload jsonb describing the proposed change. Approving the request is what causes the change; nothing is applied until a decision is recorded.
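The "apply on approval" rule can be sketched as a single decide function that runs the change only on the approved branch. `Approval`, `decide`, and `HirePayload` are hypothetical names for illustration; a real server would presumably update the approval row and apply the payload in one transaction.

```typescript
type ApprovalStatus = "pending" | "revision_requested" | "approved" | "rejected" | "cancelled";
type ApprovalType = "hire_agent" | "approve_ceo_strategy" | "budget_override_required" | "request_board_approval";

interface Approval<P> {
  id: string;
  type: ApprovalType;
  status: ApprovalStatus;
  payload: P; // the proposed change, not yet applied
  decidedByUserId?: string;
}

// Records the decision; the payload is applied only when the decision is "approved".
function decide<P>(
  approval: Approval<P>,
  decision: "approved" | "rejected",
  userId: string,
  apply: (payload: P) => void,
): Approval<P> {
  if (approval.status !== "pending") {
    throw new Error(`approval ${approval.id} is already ${approval.status}`);
  }
  if (decision === "approved") apply(approval.payload);
  return { ...approval, status: decision, decidedByUserId: userId };
}

interface HirePayload { name: string; role: string }
const roster: HirePayload[] = [];
const hireRequest: Approval<HirePayload> = {
  id: "ap-1",
  type: "hire_agent",
  status: "pending",
  payload: { name: "Ada", role: "engineer" },
};
const decided = decide(hireRequest, "approved", "board-user", (p) => roster.push(p));
console.log(decided.status, roster.length); // approved 1
```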

🚀 The bootstrap sequence

This is what happens when a user starts a new company:

1. Human creates Company + Initiatives
2. Human writes initial top-level tasks
3. Human creates a "CEO" agent from a default template
4. CEO agent runs, proposes:
     - org structure (sub-agents to hire)
     - task breakdown
     - hiring approvals
5. Board reviews + approves
6. CEO begins delegating; the company is alive

🔑 Decision authority

Agents can propose anything. Agents can execute only on tasks they own. Anything else routes through approvals. This is the rule that prevents an agent from, say, "deciding" to spawn 50 sub-agents and bankrupting the company.


💰 10. Budgets & Cost Control

Cost is treated like rate-limiting: a soft warning, then a hard wall.

📊 Reporting levels

| Level | Question it answers |
| --- | --- |
| Per-agent | "Is this agent expensive?" |
| Per-task | "Did this PR cost too much?" |
| Per-project | "What's our $ on Project X?" |
| Per-billing-code | "What did Initiative #3 cost end-to-end?" |
| Company-wide | "What did the company spend this month?" |

🚧 Enforcement

Soft alert default threshold: 80%
At 100%:
  - Set agent status to paused
  - Block new checkout/invocation for that agent
  - Emit high-priority activity event

The "auto-pause" is the entire mechanism. There is no graceful degradation, no "let it finish the current task." It stops.

⚙️ Budget configuration

  • Periods: daily | weekly | monthly | rolling
  • Per-agent and per-company budgets are independent. Both must allow the run.
  • "Unlimited" is a setting; if you want it, you set it explicitly.
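The enforcement thresholds and the "both budgets must allow the run" rule can be sketched as two small functions. Modeling "unlimited" as `null` is an assumption of this sketch, as are the names `Budget`, `budgetState`, and `canInvoke`; period handling (daily, weekly, monthly, rolling) is left out.

```typescript
type BudgetState = "ok" | "soft_alert" | "paused";

// null models an explicit "unlimited" setting (an assumption of this sketch).
interface Budget {
  budgetCents: number | null;
  spentCents: number;
}

function budgetState(budget: Budget, softAlertPercent = 80): BudgetState {
  if (budget.budgetCents === null) return "ok";
  if (budget.spentCents >= budget.budgetCents) return "paused"; // hard wall at 100%
  // Integer math avoids floating-point surprises at the exact threshold.
  if (budget.spentCents * 100 >= budget.budgetCents * softAlertPercent) return "soft_alert";
  return "ok";
}

// Per-agent and per-company budgets are independent; both must allow the run.
function canInvoke(agent: Budget, company: Budget): boolean {
  return budgetState(agent) !== "paused" && budgetState(company) !== "paused";
}

console.log(budgetState({ budgetCents: 10000, spentCents: 7999 })); // ok
console.log(budgetState({ budgetCents: 10000, spentCents: 8000 })); // soft_alert
console.log(budgetState({ budgetCents: 10000, spentCents: 10000 })); // paused
console.log(canInvoke({ budgetCents: null, spentCents: 500 }, { budgetCents: 1000, spentCents: 1000 })); // false
```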

💳 Cost ingestion

Agents (or their adapter) POST to:

POST /companies/:companyId/cost-events
{ agentId, issueId, provider, model, input_tokens, output_tokens, cost_cents, billing_code, occurred_at }

The server enforces the company scope, denormalizes into rollups, and runs the budget check. Cost events are append-only: no edits, no deletes.


🧩 11. Plugin System

Plugins extend Paperclip without forking it. The architecture is two pieces:

  • Worker: Node.js process running the plugin's logic. Out-of-process by design.
  • UI: React components mounted at named "slots" in the host UI.

🛠️ Worker contract

import { definePlugin } from "@paperclipai/plugin-sdk";

export default definePlugin({
  async setup(ctx) {
    ctx.data.register("widget.summary", async (params) => { ... });
    ctx.actions.register("widget.run",  async (input) => { ... });
    ctx.tools.register("widget.search", schema, async (input) => { ... });
    ctx.events.on("issue.checked_out", async (e) => { ... });
    ctx.jobs.register("daily.rollup",  async () => { ... });
  },
  onConfigChanged(newConfig) {},
  onShutdown() {},
  onValidateConfig(config) {},
  onWebhook(input) {},
  onHealth() {},
});

🔐 Capability gating

Every API on ctx requires a declared capability in the plugin manifest:

companies.read, issues.read, issues.create,
events.subscribe, jobs.schedule,
agent.sessions.create, agents.invoke,
ui.sidebar.register, ui.detailTab.register, ...

The host enforces them at call time. A plugin without issues.create cannot create an issue, even if it tries.
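Call-time enforcement can be pictured as a host API that checks the manifest before every operation. This is a sketch under assumptions: `PluginManifest` and `createHostApi` are hypothetical and do not reflect the actual plugin SDK; only the capability names come from the list above.

```typescript
interface PluginManifest {
  id: string;
  capabilities: string[]; // e.g. ["issues.read"]
}

// Builds a host-side API whose every method checks the declared capabilities first.
function createHostApi(manifest: PluginManifest, issueTitles: string[]) {
  const granted = new Set(manifest.capabilities);

  function requireCapability(capability: string): void {
    if (!granted.has(capability)) {
      throw new Error(`plugin ${manifest.id} lacks capability ${capability}`);
    }
  }

  return {
    listIssues(): string[] {
      requireCapability("issues.read");
      return issueTitles.slice();
    },
    createIssue(title: string): void {
      requireCapability("issues.create");
      issueTitles.push(title);
    },
  };
}

const sharedIssues = ["Add Stripe webhook handler"];
const readOnlyApi = createHostApi({ id: "summary-widget", capabilities: ["issues.read"] }, sharedIssues);
console.log(readOnlyApi.listIssues().length); // 1

try {
  readOnlyApi.createIssue("Sneaky issue");
} catch (err) {
  console.log((err as Error).message); // plugin summary-widget lacks capability issues.create
}
```

Because the check lives on the host side of the boundary, a plugin cannot bypass it by skipping a client-side helper.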

🖼️ UI slots

Plugins mount React into named slots:

page, sidebar, sidebarPanel, settingsPage, dashboardWidget,
globalToolbarButton, detailTab, taskDetailView,
projectSidebarItem, toolbarButton, contextMenuItem,
commentAnnotation, commentContextMenuItem

The UI side gets typed React hooks:

usePluginData<T>(key, params?)        // fetch worker data
usePluginAction(key)                   // invoke worker action
usePluginStream<T>(channel)            // SSE
useHostContext()                       // { companyId, entityId, entityType }

🧱 Why out-of-process?

  • A crashing plugin doesn't take down the server.
  • Plugins can be in any language that can speak the IPC protocol.
  • Capability gating is enforceable at the IPC boundary, not just by trust.

📡 12. MCP Server

packages/mcp-server is a thin Model Context Protocol wrapper around the REST API. It exists so that any MCP-aware agent runtime (Claude Code, Cursor, etc.) can read and write Paperclip without bespoke integration code.

Configured with:

PAPERCLIP_API_URL
PAPERCLIP_API_KEY
PAPERCLIP_COMPANY_ID    (optional)
PAPERCLIP_AGENT_ID      (optional)
PAPERCLIP_RUN_ID        (optional)

Tool surface (representative)

Read: getMe, listAgents, listIssues, getIssue, listComments, listProjects, listGoals, listApprovals, ...

Write: createIssue, updateIssue, checkoutIssue, addComment, suggestTask, requestConfirmation, decideApproval, ...

Escape hatch: paperclipApiRequest({ path, method, body }), restricted to /api paths and JSON bodies, lets agents reach endpoints that have no dedicated tool yet.

Lesson: the MCP server has no business logic. It is a translation layer. Single source of truth = the REST API. This is why it can stay tiny.


🎓 13. Skills

A skill is a markdown file (plus optional examples) that teaches an agent how to use the Paperclip API. It is adapter-agnostic: Claude, Codex, and custom runtimes all read the same SKILL.md.

The bundled skills (under /skills) include:

  • paperclip: the master skill covering task CRUD, status reporting, cost logging, comms rules.
  • paperclip-create-agent: how to propose hiring a new agent (writes to approvals).
  • paperclip-create-plugin: scaffolding a plugin.
  • paperclip-converting-plans-to-tasks: turning a CEO's plan into atomic issues.
  • paperclip-dev: meta-skill for editing Paperclip itself.
  • para-memory-files: managing persistent agent memory.

A skill is not code; it's prose + examples. The agent's runtime loads it as part of its system context. This means upgrading a skill upgrades every agent that uses it, with no redeploy.


⚙️ 14. Tech Stack & Repo Layout

| Concern | Choice |
| --- | --- |
| Backend | Node.js 20+, TypeScript, Express (REST only, no tRPC) |
| Frontend | React + Vite |
| DB | PostgreSQL; PGlite for local/dev, Supabase or Docker Postgres for prod |
| ORM | Drizzle (drizzle.config.ts in packages/db) |
| Auth | Better Auth |
| Tests | Vitest + Playwright |
| Package mgr | pnpm 9.15+ workspaces |
| License | MIT |

Top-level layout

.agents/skills/      # Agent skill definitions
.claude/skills/      # Claude-specific skills
.github/             # CI, templates
cli/                 # `npx paperclipai onboard` etc.
docker/              # Compose + Dockerfiles
docs/                # Public docs site
doc/                 # Internal SPEC.md, SPEC-implementation.md
evals/               # Agent eval framework
packages/
  adapters/          # claude-local, codex-local, cursor-local, ...
  adapter-utils/     # shared adapter helpers
  db/                # Drizzle schema + migrations
  mcp-server/        # MCP wrapper
  plugins/
    sdk/             # @paperclipai/plugin-sdk
    create-paperclip-plugin/
    sandbox-providers/e2b/
  shared/            # types, utils
patches/             # pnpm patch files
releases/            # release artifacts
report/              # reporting tools
scripts/             # one-off ops scripts
server/              # the Node server
  src/
  scripts/
skills/              # the bundled skills
tests/               # cross-package tests
ui/                  # the React app

One-command onboarding

npx paperclipai onboard --yes
# or:
git clone https://github.com/paperclipai/paperclip.git && cd paperclip
pnpm install
pnpm dev

pnpm dev boots: server (with PGlite embedded), UI (Vite), and a watcher.


🌐 15. REST API Surface

The full v1 surface, grouped. Use this as the spec for your server.

🏢 Companies

GET    /companies
POST   /companies
GET    /companies/:companyId
PATCH  /companies/:companyId
PATCH  /companies/:companyId/branding
POST   /companies/:companyId/archive

🎯 Goals

GET    /companies/:companyId/goals
POST   /companies/:companyId/goals
GET    /goals/:goalId
PATCH  /goals/:goalId
DELETE /goals/:goalId

🤖 Agents

GET    /companies/:companyId/agents
POST   /companies/:companyId/agents
GET    /agents/:agentId
PATCH  /agents/:agentId
POST   /agents/:agentId/pause
POST   /agents/:agentId/resume
POST   /agents/:agentId/terminate
POST   /agents/:agentId/keys                  # mint API key for the agent
POST   /agents/:agentId/heartbeat/invoke      # manual on-demand wakeup

📝 Issues

GET    /companies/:companyId/issues
POST   /companies/:companyId/issues
GET    /issues/:issueId
PATCH  /issues/:issueId
POST   /issues/:issueId/checkout              # atomic
POST   /issues/:issueId/release
POST   /issues/:issueId/admin/force-release   # board-only
POST   /issues/:issueId/comments
GET    /issues/:issueId/comments
POST   /companies/:companyId/issues/:issueId/attachments
GET    /issues/:issueId/attachments

💰 Costs & budgets

POST   /companies/:companyId/cost-events
GET    /companies/:companyId/costs/summary
GET    /companies/:companyId/costs/by-agent
GET    /companies/:companyId/costs/by-project
PATCH  /companies/:companyId/budgets
PATCH  /agents/:agentId/budgets

⚖️ Approvals

GET    /companies/:companyId/approvals?status=pending
POST   /companies/:companyId/approvals
POST   /approvals/:approvalId/approve
POST   /approvals/:approvalId/reject

📊 Activity & dashboard

GET    /companies/:companyId/activity
GET    /companies/:companyId/dashboard

Design notes

  • Every write that mutates state writes one row to activity_log in the same transaction.
  • Authorization is one model: the API key resolves to an actor (user, agent, or system) and a company scope. The same handler serves UI requests and agent requests; only the actor type differs.
  • No RPC, no GraphQL. Plain REST keeps the MCP wrapper trivially thin.
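The single authorization model described in these notes can be sketched as one function that resolves a key to an actor and then checks the company scope. The key store, the `authorize` name, and the choice of 401 versus 403 are assumptions for illustration.

```typescript
type ActorType = "user" | "agent" | "system";

interface Actor {
  actorType: ActorType;
  actorId: string;
  companyId: string; // the company scope carried by the key
}

type AuthResult =
  | { ok: true; actor: Actor }
  | { ok: false; httpStatus: 401 | 403; reason: string };

// Resolves the API key to an actor, then rejects any request for another company.
function authorize(apiKey: string | undefined, requestedCompanyId: string, keys: Map<string, Actor>): AuthResult {
  const actor = apiKey ? keys.get(apiKey) : undefined;
  if (!actor) return { ok: false, httpStatus: 401, reason: "unknown API key" };
  if (actor.companyId !== requestedCompanyId) {
    return { ok: false, httpStatus: 403, reason: "company scope mismatch" };
  }
  return { ok: true, actor };
}

// Hypothetical key store: one agent key and one human key for the same company.
const keyStore = new Map<string, Actor>([
  ["key-ceo", { actorType: "agent", actorId: "agent-ceo", companyId: "acme" }],
  ["key-board", { actorType: "user", actorId: "user-1", companyId: "acme" }],
]);

console.log(authorize("key-ceo", "acme", keyStore).ok); // true
console.log(authorize("key-ceo", "globex", keyStore).ok); // false
```

Handlers downstream only ever see an `Actor`, which is what lets the same route serve both the UI and agents.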

🔒 16. Multi-Company Isolation & Portability

The deployment is single-tenant for the operator (you run your own server), but multi-company within the deployment (one Paperclip can host several orgs).

Isolation is enforced three ways:

  1. Schema: every domain table has company_id and every index leads with it.
  2. Authorization: the actor's API key carries a company scope; handlers reject mismatches.
  3. Storage: secrets, attachments, plugin state are namespaced by company.

📦 Portability

  • Template export: schema only (org chart, roles, default tasks). Useful for "starter companies."
  • Snapshot export: full state including tasks, comments, costs. Secrets are scrubbed before serialization.
  • Imports are atomic; either the whole company appears or nothing does.

📋 17. Audit Trail & Activity Log

Every state mutation produces:

activity_log(
  actor_type ∈ {agent, user, system},
  actor_id,
  action       e.g. "issue.checked_out",
  entity_type, entity_id,
  details jsonb,
  created_at
)

Two consequences:

  • Replay: you can reconstruct any past state by walking the log.
  • Tool-call tracing: when an agent calls the MCP server, those calls become activity entries. "What did agent X actually do at 3:14am?" is a query, not an investigation.
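Replay amounts to folding the log up to a point in time. The sketch below assumes each issue entry records the new status under `details.status`, which is an assumption about the payload (the article does not specify what details contains); `issueStatusesAt` is a hypothetical helper and the action names other than "issue.checked_out" are made up.

```typescript
interface ActivityEntry {
  action: string; // e.g. "issue.checked_out"
  entityType: string;
  entityId: string;
  details: Record<string, unknown>;
  createdAt: number; // epoch milliseconds, for brevity
}

// Rebuilds each issue's status as of time `at` by replaying entries in order.
function issueStatusesAt(log: ActivityEntry[], at: number): Map<string, string> {
  const state = new Map<string, string>();
  const ordered = log
    .filter((entry) => entry.entityType === "issue" && entry.createdAt <= at)
    .sort((a, b) => a.createdAt - b.createdAt);
  for (const entry of ordered) {
    const status = entry.details["status"];
    if (typeof status === "string") state.set(entry.entityId, status);
  }
  return state;
}

const auditLog: ActivityEntry[] = [
  { action: "issue.created", entityType: "issue", entityId: "ACME-1", details: { status: "todo" }, createdAt: 100 },
  { action: "issue.checked_out", entityType: "issue", entityId: "ACME-1", details: { status: "in_progress" }, createdAt: 200 },
  { action: "issue.completed", entityType: "issue", entityId: "ACME-1", details: { status: "done" }, createdAt: 300 },
];
console.log(issueStatusesAt(auditLog, 250).get("ACME-1")); // in_progress
```

This only works if every mutation writes enough detail to rebuild the field in question, which is one reason to write the log row in the same transaction as the change.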

📏 18. Engineering Conventions

These are guardrails worth copying verbatim:

  1. Keep changes company-scoped. Every query, every cache key, every authorization check. No cross-tenant code paths exist.
  2. Contracts must be in sync. The DB schema, the OpenAPI spec, the TypeScript types, and the MCP tool definitions are generated from one source. Drift is a bug.
  3. Migrations are append-only. Never edit a migration after it has shipped. Use pnpm db:migrate to generate; never hand-write SQL into old files.
  4. One PR = one logical change.
  5. Each PR declares the model that wrote it. (Cute but useful telemetry.)
  6. All tests pass before merge. CI green. Code-review tool score = 5/5.
  7. Fail visibly. Agents that hit unexpected state mark tasks blocked; servers return errors; UIs show them. No silent fallbacks.
  8. Read SPEC-implementation.md when in doubt. When SPEC.md and the implementation spec disagree, implementation wins for v1.

🗺️ 19. Step-by-Step Build Plan

If you are building a Paperclip-like system from scratch, do it in this order. Each step is shippable on its own.

🌱 Phase 0: Skeleton (1-2 days)

  • pnpm monorepo with server/, ui/, packages/db, packages/shared.
  • Express server, Vite React app, Drizzle + PGlite for dev.
  • Health check endpoint, hello world UI.

Phase 1: Companies & Auth

  • companies table.
  • Better Auth for human users.
  • API-key model: every key is (actor_type, actor_id, company_id).
  • Middleware that resolves the key into an Actor and rejects on company mismatch.
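
The key-to-actor resolution can be sketched as two pure functions that an Express middleware would call. The in-memory key table, the status codes, and the function names are assumptions for illustration; a real implementation would look up a hashed key in the database.

```typescript
type ActorType = "user" | "agent" | "system";

interface Actor {
  type: ActorType;
  id: string;
  companyId: string;
}

type AuthResult =
  | { ok: true; actor: Actor }
  | { ok: false; status: 401 | 403; error: string };

// Stand-in for the API-key table: every key is (actor_type, actor_id, company_id).
const apiKeys = new Map<string, Actor>([
  ["key-agent-a", { type: "agent", id: "agent-1", companyId: "company-a" }],
  ["key-user-b", { type: "user", id: "user-9", companyId: "company-b" }],
]);

// Resolve the key into an Actor, then reject if the route targets another company.
function authorize(apiKey: string | undefined, routeCompanyId: string): AuthResult {
  const actor = apiKey === undefined ? undefined : apiKeys.get(apiKey);
  if (actor === undefined) {
    return { ok: false, status: 401, error: "unknown API key" };
  }
  if (actor.companyId !== routeCompanyId) {
    return { ok: false, status: 403, error: "company scope mismatch" };
  }
  return { ok: true, actor };
}

const sameCompany = authorize("key-agent-a", "company-a");
const otherCompany = authorize("key-agent-a", "company-b");
```

Because the check runs before any handler, UI requests and agent requests share one code path and differ only in `actor.type`.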

🏢 Phase 2: Org Chart

  • agents table with reports_to.
  • CRUD endpoints + UI org-chart view.
  • Status field with transitions, but no runtime yet; agents are just data.

Phase 3: Tasks

  • issues + goals + projects tables with the full hierarchy.
  • Implement atomic checkout with the exact SQL above. Write a regression test that races 50 concurrent checkouts and asserts exactly one wins.
  • Kanban / list UI.
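
The shape of that regression test can be sketched without a database. Here a synchronous check-and-write stands in for the single conditional UPDATE, and a deliberately broken read-then-write variant shows what the test is meant to catch. All names are hypothetical, and a real test would fire the 50 checkouts at Postgres through the API.

```typescript
interface Issue {
  id: string;
  assigneeAgentId: string | null;
}

// Atomic claim: check and write happen in one uninterruptible step, the in-memory
// analogue of one conditional UPDATE that reports 0 or 1 affected rows.
function atomicCheckout(issue: Issue, agentId: string): boolean {
  if (issue.assigneeAgentId !== null) return false;
  issue.assigneeAgentId = agentId;
  return true;
}

// Broken on purpose: read, yield, then write (like SELECT followed by UPDATE).
async function naiveCheckout(issue: Issue, agentId: string): Promise<boolean> {
  const free = issue.assigneeAgentId === null;
  await Promise.resolve(); // other contenders run here
  if (!free) return false;
  issue.assigneeAgentId = agentId;
  return true;
}

// Starts `contenders` checkouts against one issue and counts how many succeeded.
async function race(
  contenders: number,
  claim: (issue: Issue, agentId: string) => boolean | Promise<boolean>,
): Promise<number> {
  const issue: Issue = { id: "issue-1", assigneeAgentId: null };
  const results = await Promise.all(
    Array.from({ length: contenders }, async (_unused, i) => {
      await Promise.resolve(); // start every contender before any of them claims
      return claim(issue, `agent-${i}`);
    }),
  );
  return results.filter(Boolean).length;
}
```

Calling `race(50, atomicCheckout)` should resolve to exactly one winner, while `race(50, naiveCheckout)` lets several contenders believe they own the issue. That difference is the bug the regression test exists to prevent.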

💚 Phase 4: The Heartbeat (the moment your project becomes real)

  • heartbeat_runs table.
  • Adapter manager interface (3 methods: invoke, status, cancel).
  • Build one adapter first: process (just spawn a CLI you control). Don't start with Claude.
  • Scheduler:
    • Cron loop for timer triggers.
    • Hook on issue checkout → emit assignment wakeup.
    • "Run now" button → on_demand.
  • Coalescing: if a run is already in progress for an agent, drop new wakeups and mark them as merged.
  • Timeouts + grace + force-kill.
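
The adapter contract and the coalescing rule could look roughly like this. The source only names the three methods (invoke, status, cancel); the signatures, run statuses, and the fake adapter below are assumptions, and timeouts are left out to keep the sketch short.

```typescript
type Trigger = "timer" | "assignment" | "on_demand";
type RunStatus = "running" | "succeeded" | "failed" | "cancelled"; // hypothetical statuses

// Three-method adapter contract (signatures are assumptions).
interface Adapter {
  invoke(agentId: string, trigger: Trigger): string; // returns a run id
  status(runId: string): RunStatus;
  cancel(runId: string): void;
}

interface MergedWakeup {
  agentId: string;
  trigger: Trigger;
  mergedIntoRunId: string;
}

interface WakeResult {
  outcome: "started" | "merged";
  runId: string;
}

function createScheduler(adapter: Adapter) {
  const activeRuns = new Map<string, string>(); // agentId -> latest run id
  const mergedWakeups: MergedWakeup[] = [];

  function wake(agentId: string, trigger: Trigger): WakeResult {
    const current = activeRuns.get(agentId);
    if (current !== undefined && adapter.status(current) === "running") {
      // Coalesce: record the wakeup as merged instead of starting a second run.
      mergedWakeups.push({ agentId, trigger, mergedIntoRunId: current });
      return { outcome: "merged", runId: current };
    }
    const runId = adapter.invoke(agentId, trigger);
    activeRuns.set(agentId, runId);
    return { outcome: "started", runId };
  }

  return { wake, mergedWakeups };
}

// Fake adapter for local experiments; a real one would spawn a CLI process.
function createFakeAdapter(): Adapter & { finish(runId: string): void } {
  const runs = new Map<string, RunStatus>();
  let nextRun = 1;
  return {
    invoke(_agentId: string, _trigger: Trigger): string {
      const runId = `run-${nextRun++}`;
      runs.set(runId, "running");
      return runId;
    },
    status(runId: string): RunStatus {
      return runs.get(runId) ?? "failed";
    },
    cancel(runId: string): void {
      runs.set(runId, "cancelled");
    },
    finish(runId: string): void {
      runs.set(runId, "succeeded");
    },
  };
}

const adapter = createFakeAdapter();
const scheduler = createScheduler(adapter);
const first = scheduler.wake("agent-1", "timer");       // starts a run
const second = scheduler.wake("agent-1", "assignment"); // merged into the first run
adapter.finish(first.runId);
const third = scheduler.wake("agent-1", "on_demand");   // previous run ended, so a new one starts
```

Keeping merged wakeups as records rather than silently dropping them preserves the "fail visibly" convention: the activity log can still show why an agent was asked to wake.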

💰 Phase 5: Cost & Budgets

  • cost_events table.
  • Budget fields on companies and agents.
  • Ingestion endpoint with company-scope check.
  • On every cost insert: recompute spent / budget; if past 100%, pause agent + emit activity.
  • Dashboards: per-agent, per-task, per-project rollups (use the indexes you already built).
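
A minimal sketch of the auto-pause rule, assuming amounts are stored as integer cents and that the pause is recorded as an `agent.paused` activity. The field names and the action name are hypothetical.

```typescript
interface AgentBudget {
  agentId: string;
  budgetCents: number;
  spentCents: number;
  status: "active" | "paused";
}

interface CostEvent {
  agentId: string;
  costCents: number;
}

interface Activity {
  action: string;
  entityId: string;
  details: Record<string, unknown>;
}

// Applies one cost event: recompute spend, and pause the agent once spend passes the budget.
// In a real system the insert, the update, and the activity row would share one transaction.
function recordCost(agent: AgentBudget, event: CostEvent, activity: Activity[]): AgentBudget {
  const spentCents = agent.spentCents + event.costCents;
  const overBudget = spentCents > agent.budgetCents; // "past 100%"
  if (overBudget && agent.status === "active") {
    activity.push({
      action: "agent.paused", // hypothetical action name
      entityId: agent.agentId,
      details: { reason: "budget_exceeded", spentCents, budgetCents: agent.budgetCents },
    });
    return { ...agent, spentCents, status: "paused" };
  }
  return { ...agent, spentCents };
}

const activityLog: Activity[] = [];
const start: AgentBudget = { agentId: "agent-1", budgetCents: 1000, spentCents: 900, status: "active" };
const afterSmall = recordCost(start, { agentId: "agent-1", costCents: 50 }, activityLog);
const afterLarge = recordCost(afterSmall, { agentId: "agent-1", costCents: 100 }, activityLog);
```

Integer cents avoid floating-point drift in the rollups, and emitting the activity only on the active-to-paused transition keeps the log free of duplicate pause entries.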

⚖️ Phase 6: Approvals & Governance

  • approvals table; generic payload + type.
  • request_board_approval flow end-to-end.
  • "Hire agent" requires approval; approving the request creates the agent row.
  • Board UI with a single "approvals" inbox.
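
The hire flow can be sketched as one decision function. The payload fields and the id scheme are assumptions; only the idea of a typed approval with a generic payload, and `decided_by_user_id`, come from the text.

```typescript
interface AgentRow {
  id: string;
  companyId: string;
  name: string;
  role: string;
  reportsTo: string | null;
}

interface HirePayload {
  name: string;
  role: string;
  reportsTo: string | null;
}

interface Approval {
  id: string;
  companyId: string;
  type: "hire_agent";   // further approval types would extend this union
  payload: HirePayload; // stored as a generic jsonb payload in the table
  status: "pending" | "approved" | "rejected";
  decidedByUserId: string | null;
}

// Records the board decision; approving a hire is what creates the agent row.
// Both writes would share one database transaction in a real system.
function decide(
  approval: Approval,
  decision: "approved" | "rejected",
  userId: string,
  agents: AgentRow[],
): Approval {
  if (approval.status !== "pending") {
    throw new Error(`approval ${approval.id} is already ${approval.status}`);
  }
  if (decision === "approved" && approval.type === "hire_agent") {
    agents.push({
      id: `agent-${agents.length + 1}`, // hypothetical id scheme
      companyId: approval.companyId,
      ...approval.payload,
    });
  }
  return { ...approval, status: decision, decidedByUserId: userId };
}

const agents: AgentRow[] = [];
const request: Approval = {
  id: "approval-1",
  companyId: "company-a",
  type: "hire_agent",
  payload: { name: "cto", role: "engineering lead", reportsTo: null },
  status: "pending",
  decidedByUserId: null,
};
const decided = decide(request, "approved", "user-1", agents);
```

Rejecting a second decision on the same approval keeps the record immutable once the board has acted, which matters when the approvals inbox is also part of the audit trail.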

📋 Phase 7: Activity Log + SSE

  • Append activity_log in the same transaction as every mutation.
  • Server-sent events broadcast new activity to subscribed UIs.
  • "Recent activity" feed and per-entity history.
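
A sketch of the broadcast side, assuming one subscriber set per company and an event name of `activity` (both hypothetical). The frame layout (`id:`, `event:`, `data:`, then a blank line) is the standard server-sent events wire format.

```typescript
interface ActivityEvent {
  id: number;
  companyId: string;
  action: string;
  entityType: string;
  entityId: string;
}

type Send = (frame: string) => void;

// JSON.stringify never emits raw newlines, so a single data: line is enough.
function toSseFrame(event: ActivityEvent): string {
  return `id: ${event.id}\nevent: activity\ndata: ${JSON.stringify(event)}\n\n`;
}

function createBroadcaster() {
  const subscribers = new Map<string, Set<Send>>(); // companyId -> open connections

  return {
    // Registers a connection and returns an unsubscribe function.
    subscribe(companyId: string, send: Send): () => void {
      const set = subscribers.get(companyId) ?? new Set<Send>();
      set.add(send);
      subscribers.set(companyId, set);
      return () => {
        set.delete(send);
      };
    },
    // Delivers the event only to subscribers of the same company.
    publish(event: ActivityEvent): void {
      const frame = toSseFrame(event);
      subscribers.get(event.companyId)?.forEach((send) => send(frame));
    },
  };
}

const broadcaster = createBroadcaster();
const framesForA: string[] = [];
const framesForB: string[] = [];
broadcaster.subscribe("company-a", (frame) => framesForA.push(frame));
broadcaster.subscribe("company-b", (frame) => framesForB.push(frame));
broadcaster.publish({
  id: 1,
  companyId: "company-a",
  action: "issue.checked_out",
  entityType: "issue",
  entityId: "issue-1",
});
```

In an Express handler, `send` would wrap `res.write` on a response opened with `Content-Type: text/event-stream`, and the unsubscribe function would run when the request closes.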

🔌 Phase 8: More adapters

  • Wrap a real CLI (Claude Code or Codex). Reuse adapter-utils for stdio framing and JSON parsing.
  • Add http adapter for remote agents.
  • Now you can ship to early users.

📡 Phase 9: MCP Server

  • Standalone package that calls your REST API.
  • One MCP tool per important endpoint, plus the escape-hatch apiRequest.
  • Test it with Claude Code locally.

🎓 Phase 10: Skills

  • Pick the top 3 things agents do badly without guidance and write a SKILL.md file for each.
  • Distribute via .agents/skills/ and tell adapters to load them into the system context.

🧩 Phase 11: Plugins

  • Out-of-process worker SDK with definePlugin.
  • IPC: simplest is JSON over stdio with a request-id correlation.
  • Manifest with declared capabilities; host enforces at every IPC call.
  • UI slot system: a registry keyed by slot name, plugins mount React via iframe or shadow DOM.
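
The host side of the IPC loop might be sketched as below: one JSON message per line, a request id echoed back for correlation, and a capability check before every dispatch. The message shape, method names, and capability names are assumptions for illustration.

```typescript
interface PluginManifest {
  id: string;
  capabilities: string[];
}

interface IpcRequest {
  requestId: string;
  method: string;
  params: unknown;
}

interface HostMethod {
  requires: string; // capability the plugin must have declared
  run: (params: unknown) => unknown;
}

// Handles one line read from the plugin's stdout and returns the line to write back.
function handleLine(
  manifest: PluginManifest,
  methods: Map<string, HostMethod>,
  line: string,
): string {
  const request = JSON.parse(line) as IpcRequest;
  const reply = (body: object): string =>
    JSON.stringify({ requestId: request.requestId, ...body });

  const method = methods.get(request.method);
  if (method === undefined) {
    return reply({ ok: false, error: `unknown method: ${request.method}` });
  }
  // The host, not the plugin, decides which capability a method needs.
  if (manifest.capabilities.indexOf(method.requires) === -1) {
    return reply({ ok: false, error: `capability not declared: ${method.requires}` });
  }
  return reply({ ok: true, result: method.run(request.params) });
}

// Hypothetical plugin and host methods.
const manifest: PluginManifest = { id: "issue-sync", capabilities: ["issues.read"] };
const hostMethods = new Map<string, HostMethod>([
  ["issues.list", { requires: "issues.read", run: () => [{ id: "issue-1" }] }],
  ["secrets.get", { requires: "secrets.read", run: () => "never reached by this plugin" }],
]);

const allowed = handleLine(manifest, hostMethods,
  JSON.stringify({ requestId: "r1", method: "issues.list", params: {} }));
const denied = handleLine(manifest, hostMethods,
  JSON.stringify({ requestId: "r2", method: "secrets.get", params: { name: "token" } }));
```

Checking on every call, rather than once at load time, means a plugin cannot gain access by being updated in place after it was approved.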

📦 Phase 12: Portability

  • POST /companies/:id/export → JSON snapshot, with a secret_scrub pass.
  • POST /companies/import → atomic, transactional.

✨ Phase 13: Polish

  • One-command onboarding (npx <yourtool> onboard) that generates .env, runs migrations, opens browser.
  • Docker compose for "self-host on a box."
  • Telemetry (anonymous, opt-out).

⚠️ 20. Pitfalls and Tradeoffs

🚫 Things not to do, especially early

  • Don't build your own agent loop. The whole point is to be unopinionated. Wrap a CLI; ship.
  • Don't add tRPC / GraphQL. It makes the MCP wrapper non-trivial. Plain REST is the contract that survives.
  • Don't centralize prompts in the server. Prompts belong in adapters or skills. The core has zero opinion about model behavior.
  • Don't treat budgets as soft. "Best effort" budget enforcement is no enforcement. Build the auto-pause from day one.
  • Don't allow direct agent-to-agent calls. Force everything through tasks/comments. You'll thank yourself when debugging.
  • Don't put company_id on "most" tables. Put it on every table.
  • Don't sandbox plugins via trust. Out-of-process + capability manifest, or nothing.

⚖️ Honest tradeoffs Paperclip makes

| Tradeoff | What you get | What you lose |
| --- | --- | --- |
| Single human board operator (v1) | Simple authority model | No multi-stakeholder governance |
| REST + jsonb polymorphism | Easy to extend, MCP is trivial | Less compile-time safety than tRPC |
| Local CLI adapters unsandboxed | Maximum runtime freedom | You own the host security story |
| Atomic checkout via SQL | Dead simple, no extra services | Doesn't scale past a single Postgres |
| Skills as markdown | Hot-swappable; runtime-agnostic | Behavior depends on adapter discipline |
| Plugins out-of-process | Crash isolation; multi-language | Higher latency than in-proc |

🔀 Where to deviate if your domain differs

  • If your "agents" are humans-in-the-loop, keep the same model and assign work through assignee_user_id, which the schema already has.
  • If you need multi-board governance, generalize decided_by_user_id to a poll-style record on approvals.
  • If costs aren't $/tokens, generalize cost_events to usage_events with provider-defined units. Keep the rollup shape.
  • If you need horizontal scale, the bottleneck is the heartbeat scheduler. Move it to a leader-elected job runner; everything else (REST, DB) already scales.

💡 TL;DR for Building Your Own

  1. It's a control plane, not a framework. Three-method adapter contract. Don't pretend otherwise.
  2. Postgres schema is the architecture. Get companies / agents / issues / heartbeat_runs / cost_events / approvals / activity_log right and 80% of behavior falls out.
  3. The heartbeat is the kernel. Coalesce, timeout, persist runs, log activity.
  4. Atomic SQL UPDATE = your concurrency story.
  5. Hard budget ceilings, not soft ones.
  6. Tasks are the only communication channel between agents.
  7. REST + MCP + skills, in that order. Each is a thin layer over the previous.
  8. Plugins out-of-process, capability-gated.
  9. Every table, query, and index starts with company_id.
  10. Append-only audit log in the same transaction as every mutation.

Build those ten things and you have Paperclip. Everything else is polish.




