Truong Phung

Posted on Apr 30

📎 Paperclip Deep Dive 🤖 — A Build Guide for an "AI Company" 🏢 Control Plane


Source: github.com/paperclipai/paperclip — "Open-source orchestration for zero-human companies."

This guide distills the architecture, principles, and engineering choices behind Paperclip into an actionable blueprint you can use to build a similar system. It is written so you can read it top-to-bottom and walk away with a concrete plan.

🤖 What Paperclip Actually Is
🧠 Core Mental Model: Control Plane, Not Framework
📐 The 10 Design Principles
🏗️ High-Level Architecture
🗃️ The Domain Model — How "A Company" Maps to Tables
💚 The Heartbeat — The Heart of the Runtime
🔌 Adapters — "Bring Your Own Agent"
✅ The Task System & Atomic Checkout
⚖️ Governance, Approvals & The Board
💰 Budgets & Cost Control
🧩 Plugin System — Capability-Gated Extensions
📡 MCP Server — Agents Talk to the API
🎓 Skills — Teaching Agents the API
⚙️ Tech Stack & Repository Layout
🌐 REST API Surface
🔒 Multi-Company Isolation & Portability
📋 Audit Trail & Activity Log
📏 Engineering Conventions
🗺️ Step-by-Step Build Plan
⚠️ Pitfalls, Tradeoffs & What To Skip First

🤖 1. What Paperclip Actually Is

Paperclip is a Node.js + React self-hosted application that lets you run a "company" of AI agents:

You define a company with goals/initiatives.
You hire agents (Claude Code, Codex, Cursor, custom CLI, HTTP bot — you pick the runtime).
You assign tasks (issues) and budgets.
A board operator (human) approves hires, strategic plans, and budget overrides.
A scheduler runs each agent on a heartbeat (a short execution window) and tracks cost, status, tool calls, and outputs.

The Paperclip slogan: "If OpenClaw is an employee, Paperclip is the company."

It looks like a task manager (Linear/Jira) but underneath it is an org chart, a budget engine, an approval queue, a multi-runtime executor, and an audit log — all designed for non-human workers.

🧠 2. Core Mental Model: Control Plane, Not Framework

This is the most important idea to internalize before building anything.

| Agent Framework (LangGraph, CrewAI…) | Control Plane (Paperclip) |
| --- | --- |
| Decides how an agent thinks | Decides what an agent works on |
| Owns the prompt + tool loop | Treats the agent loop as a black box |
| One process, in-memory | Many processes, durable state |
| You ship code | You ship a deployment |

Concrete consequences for design:

The system never runs a "react+plan+act" loop itself. That is the adapter's job.
The system does own: identity, scheduling, task ownership, cost ledger, approvals, audit, persistence.
The contract with an agent is shockingly small: "I can invoke you, get status, and cancel you."

If you start building a Paperclip-like system and find yourself writing prompt templates or tool-call parsers in the core, you have drifted into framework territory — pull back.

📐 3. The 10 Design Principles

Lifted (and de-jargoned) from the spec:

1. Unopinionated execution. The core does not care which model, prompt, or planner an agent uses. It launches a process and waits.
2. Task-centric communication. Agents do not talk to each other directly. Delegation = task creation. Coordination = task comments. Status = field updates. This makes everything observable and replayable.
3. Goal-traced work. Every task descends from a company initiative: Initiative → Project → Milestone → Issue → Sub-issue. No orphan work.
4. Atomic task ownership. A task can be owned by exactly one agent at a time, enforced at the database layer (not in app code).
5. Visible problem surfacing. Agents that get stuck must mark issues blocked and escalate. Silent retries are an anti-pattern.
6. Human board authority. Every irreversible or high-risk action (hiring, big-spend, strategy approval, termination) requires a human approval record.
7. Cost follows work. Costs are billed against the requesting task chain, not just the executing agent. This makes "who is expensive and why" answerable.
8. Hard budget ceilings. Soft alert at 80%. At 100%, the agent is auto-paused and further invocations are blocked. No "best-effort."
9. Progressive deployment. It must run on a laptop with embedded Postgres, then scale to self-hosted / cloud with the same code and the same schema.
10. Plugin-extensible, not fork-extensible. Capabilities the core doesn't ship come from out-of-process plugins with declared, gated capabilities.

When you design your system, keep this list visible and check every PR against it.

🏗️ 4. High-Level Architecture

                            ┌────────────────────────────┐
                            │       React UI (Vite)      │
                            │  Org chart · Tasks · Costs │
                            └──────────────┬─────────────┘
                                           │ REST + SSE
                                           ▼
┌──────────────────────────────────────────────────────────────────┐
│                    Node.js Server (TypeScript / Express)         │
│                                                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐  │
│  │  REST API   │  │  Scheduler  │  │  Approvals  │  │ Plugins │  │
│  │ (handlers)  │  │ (heartbeat) │  │   engine    │  │  host   │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └────┬────┘  │
│         │                │                 │              │       │
│         └────────────────┼─────────────────┴──────────────┘       │
│                          ▼                                        │
│                 ┌──────────────────┐    ┌──────────────────┐      │
│                 │   Adapter Mgr    │───▶│   Agent runtime  │      │
│                 │ (claude_local,   │    │ (child process / │      │
│                 │  codex_local,    │    │  HTTP webhook)   │      │
│                 │  http, process)  │    └──────────────────┘      │
│                 └──────────────────┘                              │
└──────────────────────────┬───────────────────────────────────────┘
                           │
                           ▼
              ┌──────────────────────────┐
              │  PostgreSQL (or PGlite)  │
              │  companies · agents ·    │
              │  issues · heartbeats ·   │
              │  costs · approvals ·     │
              │  activity_log            │
              └──────────────────────────┘

      Sidecar (optional):
      ┌───────────────────────────┐
      │   MCP server (thin REST   │  ◀─── agents call here to read/write Paperclip
      │       wrapper)            │
      └───────────────────────────┘

The 12 subsystems the spec calls out — this is the checklist for "feature complete v1":

1. Identity & Access
2. Org Chart & Agents
3. Work & Task System
4. Heartbeat Execution
5. Workspaces & Runtime
6. Governance & Approvals
7. Budget & Cost Control
8. Routines & Schedules
9. Plugins
10. Secrets & Storage
11. Activity & Events
12. Company Portability (export/import)

🗃️ 5. The Domain Model

This is where most of the cleverness lives. The schema is small but every column matters.

🏢 Companies

companies(
  id uuid pk,
  name, description, status (active|paused|archived),
  pause_reason, paused_at,
  issue_prefix text not null,        -- e.g. "ACME"
  issue_counter int not null,        -- monotonic, used for ACME-123
  budget_monthly_cents int default 0,
  spent_monthly_cents int default 0,
  attachment_max_bytes,
  require_board_approval_for_new_agents bool
)

Why an issue_prefix + issue_counter? So tasks have human-friendly IDs (ACME-42) that are stable, sortable, and unique per company without leaking other tenants' counts.
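As a rough illustration of how the two columns combine, here is a minimal in-memory sketch. The `CompanyCounter` type and `nextIssueIdentifier` function are hypothetical names invented for this example, not Paperclip's code; a real server would do the increment inside the database so concurrent creates cannot collide.

```typescript
// Hypothetical in-memory stand-in for a companies row; only the two
// columns discussed above are modelled.
interface CompanyCounter {
  issuePrefix: string;  // e.g. "ACME"
  issueCounter: number; // last number handed out, monotonic per company
}

// Allocates the next identifier. In a real database this would be one
// `UPDATE ... SET issue_counter = issue_counter + 1 ... RETURNING ...`
// statement, so two concurrent creates never receive the same number.
function nextIssueIdentifier(
  company: CompanyCounter,
): { issueNumber: number; identifier: string } {
  company.issueCounter += 1;
  return {
    issueNumber: company.issueCounter,
    identifier: `${company.issuePrefix}-${company.issueCounter}`,
  };
}

const acme: CompanyCounter = { issuePrefix: "ACME", issueCounter: 41 };
const first = nextIssueIdentifier(acme);  // identifier is "ACME-42"
const second = nextIssueIdentifier(acme); // identifier is "ACME-43"
```

Because the counter lives on the company row, each tenant's numbering starts and advances independently.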

🤖 Agents

agents(
  id, company_id, name, role, title, icon,
  status (active|paused|idle|running|error|pending_approval|terminated),
  reports_to uuid → agents.id null,            -- the org chart edge
  capabilities text,
  adapter_type text,                           -- claude_local | codex_local | http | ...
  adapter_config jsonb,                        -- adapter-specific
  runtime_config jsonb default {},             -- timeouts, cwd, env
  default_environment_id,
  context_mode (thin|fat) default thin,
  budget_monthly_cents int default 0,
  spent_monthly_cents int default 0
)

Why adapter_type + adapter_config (jsonb)? Lets you support N agent runtimes without N tables. The polymorphism lives in code (the adapter manager) and JSON, not in DDL.
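One way to picture "polymorphism in code and JSON" is a registry keyed by adapter_type, where each entry interprets its own slice of adapter_config. This is a sketch under assumptions: the `AdapterDefinition` shape, the `describe` method, and the `command` / `url` config keys are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical shapes: the real adapter manager and config schemas are
// not shown in this guide.
type AdapterConfig = Record<string, unknown>;

interface AdapterDefinition {
  // Each adapter validates its own adapter_config and returns a
  // human-readable description of what it would launch.
  describe(config: AdapterConfig): string;
}

const adapterRegistry = new Map<string, AdapterDefinition>();

adapterRegistry.set("process", {
  describe(config) {
    const command = config["command"];
    if (typeof command !== "string") {
      throw new Error("process adapter requires adapter_config.command");
    }
    return `spawn ${command}`;
  },
});

adapterRegistry.set("http", {
  describe(config) {
    const url = config["url"];
    if (typeof url !== "string") {
      throw new Error("http adapter requires adapter_config.url");
    }
    return `POST ${url}`;
  },
});

// The manager dispatches on adapter_type only; it never inspects the JSON.
function describeAgentRuntime(adapterType: string, adapterConfig: AdapterConfig): string {
  const adapter = adapterRegistry.get(adapterType);
  if (!adapter) throw new Error(`unknown adapter_type: ${adapterType}`);
  return adapter.describe(adapterConfig);
}
```

Adding a runtime then means registering one more entry; the agents table and the dispatch function stay unchanged.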

📝 Issues (tasks)

issues(
  id, company_id, project_id, goal_id, parent_id,
  title, description,
  status (backlog|todo|in_progress|in_review|done|blocked|cancelled),
  priority (critical|high|medium|low),
  assignee_agent_id, assignee_user_id,

  -- Atomic checkout fields:
  checkout_run_id, execution_run_id,
  execution_agent_name_key, execution_locked_at,

  -- Provenance:
  created_by_agent_id, created_by_user_id,
  issue_number, identifier,                    -- e.g. ACME-42
  origin_kind, origin_id, origin_run_id, origin_fingerprint,
  request_depth int default 0,                 -- how deep the delegation chain is
  billing_code text                            -- "cost follows work"
)

💚 Heartbeat runs (one row per execution window)

heartbeat_runs(
  id, company_id, agent_id,
  invocation_source (scheduler|manual|callback),
  status (queued|running|succeeded|failed|cancelled|timed_out),
  started_at, finished_at, error,
  external_run_id text,                        -- adapter's run id, for resume
  context_snapshot jsonb                       -- what was passed in
)

💰 Cost events (the ledger)

cost_events(
  id, company_id, agent_id, issue_id, project_id, goal_id,
  billing_code,
  provider text, model text,
  input_tokens, output_tokens, cost_cents,
  occurred_at
)

⚖️ Approvals (governance queue)

approvals(
  id, company_id,
  type (hire_agent|approve_ceo_strategy|budget_override_required|request_board_approval),
  requested_by_agent_id, requested_by_user_id,
  status (pending|revision_requested|approved|rejected|cancelled),
  payload jsonb,                               -- the proposed change
  decision_note, decided_by_user_id, decided_at
)

📋 Activity log (the audit tape)

activity_log(
  id, company_id,
  actor_type (agent|user|system), actor_id,
  action text,                                 -- "issue.checked_out"
  entity_type, entity_id,
  details jsonb,
  created_at
)

🔍 Indexes that matter (don't skip)

agents(company_id, status)
agents(company_id, reports_to)                   -- org-chart traversal
issues(company_id, status)
issues(company_id, assignee_agent_id, status)    -- "what's on my plate"
issues(company_id, parent_id)                    -- subtasks
issues(company_id, project_id)
cost_events(company_id, occurred_at)
cost_events(company_id, agent_id, occurred_at)   -- per-agent rollups
heartbeat_runs(company_id, agent_id, started_at desc)
approvals(company_id, status, type)
activity_log(company_id, created_at desc)

Lesson: every index starts with company_id. Tenant isolation is a query-plan concern, not just an auth concern.

💚 6. The Heartbeat

The heartbeat is the runtime kernel. Everything else is plumbing around it.

🔄 Lifecycle of a single tick

1. Scheduler decides "agent A should run now"
       ↓
2. Insert heartbeat_runs row (status=queued)
       ↓
3. Adapter manager looks up agents.adapter_type
       ↓
4. Adapter.invoke(agentConfig, context):
        - Build prompt/context
        - Spawn child process OR fire HTTP webhook
        - Pass session_id from previous run if resumable
       ↓
5. Stream logs, status, tool calls back into the run row
       ↓
6. Wait until: exit | timeout | cancel
        - On timeout: send stop signal, wait graceSec, force-kill
       ↓
7. Persist: token usage, cost_events rows, output snippet, error
       ↓
8. Update heartbeat_runs (status=succeeded|failed|timed_out)
       ↓
9. Emit activity_log entry; broadcast SSE to UI

⚡ Wakeup triggers (only four)

| Trigger | Meaning |
| --- | --- |
| `timer` | Cron-like, e.g. "every 5 minutes" |
| `assignment` | A new task was checked out to this agent |
| `on_demand` | Human or API pressed the "Run now" button |
| `automation` | System-internal trigger (future) |

🔁 Coalescing

"If an agent is already running, new wakeups are merged (coalesced) instead of launching duplicate runs."

This rule alone prevents most of the duplicate-spend bugs you would otherwise hit, because a burst of triggers can never fan out into parallel runs of the same agent.
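The coalescing rule can be sketched as a small per-agent state machine. This is an illustrative in-memory version, not Paperclip's scheduler: `requestWakeup`, `finishRun`, and the `AgentRunState` shape are hypothetical names, and a real implementation would keep this state in the database.

```typescript
type WakeupTrigger = "timer" | "assignment" | "on_demand" | "automation";

interface AgentRunState {
  running: boolean;
  mergedWakeups: WakeupTrigger[]; // wakeups absorbed while a run was active
}

const runStates = new Map<string, AgentRunState>();

// Returns "started" if a new run should be launched, or "coalesced" if the
// wakeup was merged into the run that is already in flight.
function requestWakeup(agentId: string, trigger: WakeupTrigger): "started" | "coalesced" {
  const state: AgentRunState =
    runStates.get(agentId) ?? { running: false, mergedWakeups: [] };
  runStates.set(agentId, state);
  if (state.running) {
    state.mergedWakeups.push(trigger);
    return "coalesced";
  }
  state.running = true;
  return "started";
}

// Called when the run ends. Returns the wakeups that arrived mid-run so the
// scheduler can decide whether a single follow-up run is warranted.
function finishRun(agentId: string): WakeupTrigger[] {
  const state = runStates.get(agentId);
  if (!state) return [];
  state.running = false;
  const merged = state.mergedWakeups;
  state.mergedWakeups = [];
  return merged;
}
```

Whether merged wakeups trigger one follow-up run or are simply recorded is a policy choice this sketch leaves to the caller.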

▶️ Session resumption

For adapters that support it (Claude CLI, Codex CLI), Paperclip stores the external_run_id / session ID in the heartbeat row. The next tick passes it back so the agent reloads its context. Operators can reset the session when context goes stale.

⚙️ Runtime config

runtime_config:
  cwd: /workspaces/acme-engineering
  timeoutSec: 1800        # max wall time per heartbeat
  graceSec: 30            # SIGTERM → SIGKILL window
  env:
    ANTHROPIC_API_KEY: ${secret:anthropic_key}
  promptTemplate: ...     # adapter-specific
  args: [...]
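To make the interplay of timeoutSec and graceSec concrete, here is a small pure function that answers "which signal should the supervisor have sent by now?". The `stopActionFor` name and `RuntimeLimits` type are assumptions made for this sketch; actually delivering the signals to a child process is left out.

```typescript
interface RuntimeLimits {
  timeoutSec: number; // max wall time per heartbeat
  graceSec: number;   // window between the stop signal and the force-kill
}

type StopAction = "none" | "SIGTERM" | "SIGKILL";

// Pure decision function: given how long the run has been alive, which
// signal (if any) is due? Keeping it pure makes the policy easy to test
// separately from process management.
function stopActionFor(elapsedSec: number, limits: RuntimeLimits): StopAction {
  if (elapsedSec < limits.timeoutSec) return "none";
  if (elapsedSec < limits.timeoutSec + limits.graceSec) return "SIGTERM";
  return "SIGKILL";
}

// Values taken from the runtime_config example above.
const limits: RuntimeLimits = { timeoutSec: 1800, graceSec: 30 };
const midRun = stopActionFor(600, limits);        // still inside the window
const justTimedOut = stopActionFor(1800, limits); // polite stop first
const graceExpired = stopActionFor(1830, limits); // force-kill
```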

🛡️ Safety

"Local CLI adapters run unsandboxed on the host machine."

The spec is honest about this. Mitigations: per-agent OS user, restricted cwd, secrets managed by the host (not in prompts), and capability-gated plugins for anything the agent can't do directly.

🔌 7. Adapters — "Bring Your Own Agent"

The adapter is the only abstraction over agent runtimes. It is intentionally tiny.

interface Adapter {
  invoke(agentConfig: AgentConfig, context?: HeartbeatContext): Promise<RunHandle>;
  status(agentConfig: AgentConfig): Promise<AgentStatus>;
  cancel(agentConfig: AgentConfig): Promise<void>;
}

That's the whole contract. Three methods.

🔌 Built-in adapters

| Adapter | Mechanism |
| --- | --- |
| `process` | Spawns an arbitrary CLI as a child process |
| `http` | POSTs to a webhook; agent lives wherever it lives |
| `claude_local` | Claude Code CLI, supports session resume |
| `codex_local` | OpenAI Codex CLI |
| `cursor` | Cursor headless mode |
| `gemini-local`, `pi_local`, `opencode-local`, `hermes_local` | Other local CLIs |
| `openclaw_gateway` | Calls a managed cloud service |

🏆 Why this design wins

Adding an agent runtime is a self-contained PR. Drop a folder under packages/adapters/<name>/. No core changes.
Most adapters are 100–300 lines. They're mostly: spawn process, wire stdin/stdout, parse final JSON, report cost.
Polymorphism in JSON, not types. adapter_config jsonb lets each adapter define its own shape; the manager just passes it through.

📊 Integration levels (acceptable degrees of "support")

| Level | What the adapter does |
| --- | --- |
| Minimum | Callable; reports exit code |
| Status | Reports success/failure/progress |
| Full | Reports cost, updates tasks, calls back into Paperclip API |

You don't need full instrumentation on day one. A new adapter can land at "Minimum" and be useful.

✅ 8. Task System & Atomic Checkout

The task system is what stops two agents from doing the same work at the same time. It is the second-most-important runtime concept after the heartbeat.

🌲 Hierarchy

Initiative   (board-level direction, e.g. "Reach $1M ARR")
  └── Project          (e.g. "Self-serve checkout")
       └── Milestone   (e.g. "Public beta")
            └── Issue   (e.g. "Add Stripe webhook handler")
                 └── Sub-issue

Every task traces up to an initiative; no work is "for nothing."

🔐 Atomic checkout (the magic SQL)

// Request
POST /issues/:issueId/checkout
{ "agentId": "uuid", "expectedStatuses": ["todo","backlog","blocked","in_review"] }

Server-side:

UPDATE issues
SET assignee_agent_id = :agentId,
    status            = 'in_progress',
    started_at        = COALESCE(started_at, now())
WHERE id = :issueId
  AND status = ANY (:expectedStatuses)
  AND (assignee_agent_id IS NULL OR assignee_agent_id = :agentId);

If the row count is 0, return 409 Conflict with the current owner/status. Otherwise the row is locked to that agent.

This single update is the entire concurrency story. No queues, no Redis locks, no leases. The DB row is the lock.
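The semantics of that UPDATE can be mimicked in memory to see why exactly one caller wins. This is a sketch, not the server's handler: `checkoutIssue`, `IssueRow`, and `CheckoutResult` are hypothetical names. In Postgres the row lock taken by UPDATE is what makes the check-and-set atomic; here a single synchronous function plays that role.

```typescript
type IssueStatus =
  | "backlog" | "todo" | "in_progress" | "in_review"
  | "done" | "blocked" | "cancelled";

interface IssueRow {
  id: string;
  status: IssueStatus;
  assigneeAgentId: string | null;
  startedAt: Date | null;
}

type CheckoutResult =
  | { httpStatus: 200; issue: IssueRow }
  | { httpStatus: 409; currentOwner: string | null; currentStatus: IssueStatus };

// Mirrors the WHERE clause: the row changes only if its status is one the
// caller expected and it is unowned or already owned by the same agent.
function checkoutIssue(
  issue: IssueRow,
  agentId: string,
  expectedStatuses: IssueStatus[],
): CheckoutResult {
  const statusOk = expectedStatuses.includes(issue.status);
  const ownerOk = issue.assigneeAgentId === null || issue.assigneeAgentId === agentId;
  if (!statusOk || !ownerOk) {
    // Zero rows updated: report who holds the issue instead.
    return { httpStatus: 409, currentOwner: issue.assigneeAgentId, currentStatus: issue.status };
  }
  issue.assigneeAgentId = agentId;
  issue.status = "in_progress";
  issue.startedAt = issue.startedAt ?? new Date(); // COALESCE(started_at, now())
  return { httpStatus: 200, issue };
}

const issue: IssueRow = { id: "ACME-42", status: "todo", assigneeAgentId: null, startedAt: null };
const expected: IssueStatus[] = ["todo", "backlog", "blocked", "in_review"];
const win = checkoutIssue(issue, "agent-a", expected);  // first caller gets the issue
const lose = checkoutIssue(issue, "agent-b", expected); // second caller gets a conflict
```

The second call fails on both conditions: the status is now in_progress, which is not in the expected list, and the issue already has a different owner.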

🤝 Cross-team work & escalation rules

Any agent can create a task for any other agent (no permission walls — visibility is total).
The receiving agent must complete, block, or escalate. They cannot silently cancel a cross-team request.
Escalation goes up their own reports_to chain.

🏷️ Billing codes

When agent X delegates to agent Y, Y's cost_events are tagged with the billing code from X's task. Roll-ups answer "how much did Initiative #3 actually cost across the whole graph?"
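A roll-up by billing code is then a plain group-and-sum over the ledger. The sketch below uses an in-memory array and invented sample values ("INIT-3", "INIT-7", and the cent amounts are illustrative, not data from Paperclip); in the real system this would be a SQL aggregate over cost_events.

```typescript
interface CostEvent {
  agentId: string;
  billingCode: string;
  costCents: number;
}

// Sums cost per billing code, regardless of which agent incurred it.
function rollupByBillingCode(events: CostEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    totals.set(e.billingCode, (totals.get(e.billingCode) ?? 0) + e.costCents);
  }
  return totals;
}

// Agent X owns the original task; agent Y did delegated work that was
// tagged with the billing code inherited from X's task.
const ledger: CostEvent[] = [
  { agentId: "agent-x", billingCode: "INIT-3", costCents: 120 },
  { agentId: "agent-y", billingCode: "INIT-3", costCents: 340 },
  { agentId: "agent-y", billingCode: "INIT-7", costCents: 50 },
];
const totals = rollupByBillingCode(ledger);
```

Grouping the same ledger by agentId instead answers the per-agent question, which is why both columns are stored on every event.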

🔄 State machine

backlog ─→ todo ─→ in_progress ─→ in_review ─→ done   (terminal)
   │         │           │
   │         └─→ blocked ←┘
   │         │
   └─→ cancelled (terminal)

Side effects:
  → in_progress  : sets started_at if null
  → done         : sets completed_at
  → cancelled    : sets cancelled_at
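A transition table plus a guard function is one compact way to enforce such a state machine. The table below is an assumption: it encodes the arrows as read from the diagram, plus the moves into in_progress implied by the checkout example's expectedStatuses. The `transition` function and `IssueTimestamps` type are hypothetical names.

```typescript
type IssueStatus =
  | "backlog" | "todo" | "in_progress" | "in_review"
  | "done" | "blocked" | "cancelled";

// Assumed transition table (see the note above); adjust to your own rules.
const allowedTransitions: Record<IssueStatus, IssueStatus[]> = {
  backlog: ["todo", "in_progress", "cancelled"],
  todo: ["in_progress", "blocked"],
  in_progress: ["in_review", "blocked"],
  in_review: ["done", "in_progress"],
  blocked: ["in_progress"],
  done: [],      // terminal
  cancelled: [], // terminal
};

interface IssueTimestamps {
  status: IssueStatus;
  startedAt: Date | null;
  completedAt: Date | null;
  cancelledAt: Date | null;
}

// Applies a transition and its timestamp side effects, or throws if the
// move is not in the table.
function transition(issue: IssueTimestamps, to: IssueStatus, now: Date): void {
  if (!allowedTransitions[issue.status].includes(to)) {
    throw new Error(`illegal transition ${issue.status} -> ${to}`);
  }
  issue.status = to;
  if (to === "in_progress" && issue.startedAt === null) issue.startedAt = now;
  if (to === "done") issue.completedAt = now;
  if (to === "cancelled") issue.cancelledAt = now;
}
```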

⚖️ 9. Governance, Approvals & The Board

The "board" is a single human operator (in v1). They have unrestricted authority — pause, resume, override, terminate.

📥 Approval queue

The approvals table is a generic mechanism. Four request types ship by default:

| Type | Who proposes | What it gates |
| --- | --- | --- |
| `hire_agent` | CEO agent (or any agent if company requires) | Creating a new agent |
| `approve_ceo_strategy` | CEO agent | Initial org/task plan |
| `budget_override_required` | Any agent | Spending past hard limit |
| `request_board_approval` | Any agent | Anything escalated to a human |

Each approval carries a payload jsonb describing the proposed change. The decision is what applies the change: nothing in the payload takes effect until the request is approved.
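The "apply on decision" pattern can be sketched as follows. Everything here is illustrative: the payload shape (`agentName`, `role`), the `decide` function, and the in-memory `agentsTable` are assumptions, and a real handler would perform the status update and the agent insert in one database transaction.

```typescript
type ApprovalStatus = "pending" | "revision_requested" | "approved" | "rejected" | "cancelled";

interface HireApproval {
  id: string;
  type: "hire_agent";
  status: ApprovalStatus;
  payload: { agentName: string; role: string }; // hypothetical payload shape
  decidedByUserId: string | null;
}

// Stand-in for the agents table.
const agentsTable: { name: string; role: string }[] = [];

// Records the decision and, only on approval, applies the proposed change.
function decide(
  approval: HireApproval,
  decision: "approved" | "rejected",
  userId: string,
): void {
  if (approval.status !== "pending") {
    throw new Error(`approval ${approval.id} is already ${approval.status}`);
  }
  approval.status = decision;
  approval.decidedByUserId = userId;
  if (decision === "approved") {
    // The side effect lives here, not at request time.
    agentsTable.push({ name: approval.payload.agentName, role: approval.payload.role });
  }
}
```

Rejecting the same request leaves agentsTable untouched, and deciding twice throws, so a payload can be applied at most once.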

🚀 The bootstrap sequence

This is what happens when a user starts a new company:

1. Human creates Company + Initiatives
2. Human writes initial top-level tasks
3. Human creates a "CEO" agent from a default template
4. CEO agent runs, proposes:
     - org structure (sub-agents to hire)
     - task breakdown
     - hiring approvals
5. Board reviews + approves
6. CEO begins delegating; the company is alive

🔑 Decision authority

Agents can propose anything. Agents can execute only on tasks they own. Anything else routes through approvals. This is the rule that prevents an agent from, say, "deciding" to spawn 50 sub-agents and bankrupting the company.

💰 10. Budgets & Cost Control

Cost is treated like rate-limiting: a soft warning, then a hard wall.

📊 Reporting levels

| Level | Question it answers |
| --- | --- |
| Per-agent | "Is this agent expensive?" |
| Per-task | "Did this PR cost too much?" |
| Per-project | "What's our $ on Project X?" |
| Per-billing-code | "What did Initiative #3 cost end-to-end?" |
| Company-wide | "What did the company spend this month?" |

🚧 Enforcement

Soft alert default threshold: 80%
At 100%:
  - Set agent status to paused
  - Block new checkout/invocation for that agent
  - Emit high-priority activity event

The "auto-pause" is the entire mechanism. There is no graceful degradation, no "let it finish the current task." It stops.
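The soft-then-hard rule reduces to a three-way verdict. The sketch below is an assumption-laden illustration: `budgetVerdict` is a hypothetical name, and representing an explicitly unlimited budget as `null` is a choice made for this example rather than something the schema above specifies.

```typescript
type BudgetVerdict = "ok" | "warn" | "pause";

// spentCents and budgetCents are integer cents. `null` stands for an
// explicitly configured unlimited budget (an assumption of this sketch).
// softPercent is the soft-alert threshold as a whole percentage.
function budgetVerdict(
  spentCents: number,
  budgetCents: number | null,
  softPercent = 80,
): BudgetVerdict {
  if (budgetCents === null) return "ok";
  if (spentCents >= budgetCents) return "pause";
  // Integer arithmetic avoids floating-point surprises at the boundary.
  if (spentCents * 100 >= budgetCents * softPercent) return "warn";
  return "ok";
}

const comfortable = budgetVerdict(5_000, 10_000);  // half spent
const nearLimit = budgetVerdict(8_000, 10_000);    // exactly at the soft threshold
const exhausted = budgetVerdict(10_000, 10_000);   // at the hard ceiling
```

A "pause" verdict is where the server would set the agent's status to paused and reject further checkouts and invocations.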

⚙️ Budget configuration

Periods: daily | weekly | monthly | rolling
Per-agent and per-company budgets are independent. Both must allow the run.
"Unlimited" is a setting; if you want it, you set it explicitly.

💳 Cost ingestion

Agents (or their adapter) POST to:

POST /companies/:companyId/cost-events
{ agentId, issueId, provider, model, input_tokens, output_tokens, cost_cents, billing_code, occurred_at }

The server enforces the company scope, denormalizes into rollups, and runs the budget check. Cost events are append-only — no edits, no deletes.

🧩 11. Plugin System

Plugins extend Paperclip without forking it. The architecture is two pieces:

Worker: Node.js process running the plugin's logic. Out-of-process by design.
UI: React components mounted at named "slots" in the host UI.

🛠️ Worker contract

import { definePlugin } from "@paperclipai/plugin-sdk";

export default definePlugin({
  async setup(ctx) {
    ctx.data.register("widget.summary", async (params) => { ... });
    ctx.actions.register("widget.run",  async (input) => { ... });
    ctx.tools.register("widget.search", schema, async (input) => { ... });
    ctx.events.on("issue.checked_out", async (e) => { ... });
    ctx.jobs.register("daily.rollup",  async () => { ... });
  },
  onConfigChanged(newConfig) {},
  onShutdown() {},
  onValidateConfig(config) {},
  onWebhook(input) {},
  onHealth() {},
});

🔐 Capability gating

Every API on ctx requires a declared capability in the plugin manifest:

companies.read, issues.read, issues.create,
events.subscribe, jobs.schedule,
agent.sessions.create, agents.invoke,
ui.sidebar.register, ui.detailTab.register, ...

The host enforces them at call time. A plugin without issues.create cannot create an issue, even if it tries.
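Host-side enforcement can be as simple as a guard that runs before every handler. This is a sketch under assumptions: `PluginManifest`, `requireCapability`, and `handleCreateIssue` are hypothetical names, not the plugin SDK's API.

```typescript
interface PluginManifest {
  id: string;
  capabilities: string[]; // e.g. ["issues.read", "events.subscribe"]
}

// Throws unless the manifest declares the capability.
function requireCapability(manifest: PluginManifest, capability: string): void {
  if (!manifest.capabilities.includes(capability)) {
    throw new Error(`plugin ${manifest.id} lacks capability ${capability}`);
  }
}

// Hypothetical host-side handler for an "issues.create" call arriving
// from a plugin worker. The guard runs before any work is done.
function handleCreateIssue(manifest: PluginManifest, title: string): { title: string } {
  requireCapability(manifest, "issues.create");
  return { title };
}

const readOnlyPlugin: PluginManifest = { id: "reporter", capabilities: ["issues.read"] };
const writerPlugin: PluginManifest = { id: "triager", capabilities: ["issues.read", "issues.create"] };
```

Because the check lives in the host, a plugin cannot bypass it by skipping its own SDK wrapper.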

🖼️ UI slots

Plugins mount React into named slots:

page, sidebar, sidebarPanel, settingsPage, dashboardWidget,
globalToolbarButton, detailTab, taskDetailView,
projectSidebarItem, toolbarButton, contextMenuItem,
commentAnnotation, commentContextMenuItem

The UI side gets typed React hooks:

usePluginData<T>(key, params?)        // fetch worker data
usePluginAction(key)                   // invoke worker action
usePluginStream<T>(channel)            // SSE
useHostContext()                       // { companyId, entityId, entityType }

🧱 Why out-of-process?

A crashing plugin doesn't take down the server.
Plugins can be in any language that can speak the IPC protocol.
Capability gating is enforceable at the IPC boundary, not just by trust.

📡 12. MCP Server

packages/mcp-server is a thin Model Context Protocol wrapper around the REST API. It exists so that any MCP-aware agent runtime (Claude Code, Cursor, etc.) can read and write Paperclip without bespoke integration code.

Configured with:

PAPERCLIP_API_URL
PAPERCLIP_API_KEY
PAPERCLIP_COMPANY_ID    (optional)
PAPERCLIP_AGENT_ID      (optional)
PAPERCLIP_RUN_ID        (optional)

Tool surface (representative)

Read: getMe, listAgents, listIssues, getIssue, listComments, listProjects, listGoals, listApprovals, ...

Write: createIssue, updateIssue, checkoutIssue, addComment, suggestTask, requestConfirmation, decideApproval, ...

Escape hatch: paperclipApiRequest({ path, method, body }) — restricted to /api paths and JSON bodies, lets agents reach endpoints with no dedicated tool yet.
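A guard for such an escape hatch might look like the sketch below. The exact rules are assumptions made for illustration (the `rejectReason` name, the `..` segment check, and the set of allowed methods are not taken from Paperclip); only the "/api paths and JSON bodies" restriction comes from the description above.

```typescript
interface ApiRequestInput {
  path: string;
  method: "GET" | "POST" | "PATCH" | "DELETE";
  body?: unknown;
}

// Returns null when the request is acceptable, otherwise a reason string.
function rejectReason(input: ApiRequestInput): string | null {
  if (!input.path.startsWith("/api/")) return "path must start with /api/";
  if (input.path.split("/").includes("..")) return "path must not contain '..' segments";
  if (input.body !== undefined) {
    // Accept only JSON containers (objects and arrays), not raw strings.
    const isJsonContainer = typeof input.body === "object" && input.body !== null;
    if (!isJsonContainer) return "body must be a JSON object or array";
  }
  return null;
}
```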

Lesson: the MCP server has no business logic. It is a translation layer. Single source of truth = the REST API. This is why it can stay tiny.

🎓 13. Skills

A skill is a markdown file (plus optional examples) that teaches an agent how to use the Paperclip API. It is adapter-agnostic — Claude, Codex, custom, all read the same SKILL.md.

The bundled skills (under /skills) include:

paperclip — the master skill: task CRUD, status reporting, cost logging, comms rules.
paperclip-create-agent — how to propose hiring a new agent (writes to approvals).
paperclip-create-plugin — scaffolding a plugin.
paperclip-converting-plans-to-tasks — taking a CEO's plan into atomic issues.
paperclip-dev — meta-skill for editing Paperclip itself.
para-memory-files — managing persistent agent memory.

A skill is not code; it's prose + examples. The agent's runtime loads it as part of its system context. This means upgrading a skill upgrades every agent that uses it, no redeploy.

⚙️ 14. Tech Stack & Repo Layout

| Concern | Choice |
| --- | --- |
| Backend | Node.js 20+, TypeScript, Express (REST only, no tRPC) |
| Frontend | React + Vite |
| DB | PostgreSQL; PGlite for local/dev, Supabase or Docker Postgres for prod |
| ORM | Drizzle (`drizzle.config.ts` in `packages/db`) |
| Auth | Better Auth |
| Tests | Vitest + Playwright |
| Package mgr | pnpm 9.15+ workspaces |
| License | MIT |

Top-level layout

.agents/skills/      # Agent skill definitions
.claude/skills/      # Claude-specific skills
.github/             # CI, templates
cli/                 # `npx paperclipai onboard` etc.
docker/              # Compose + Dockerfiles
docs/                # Public docs site
doc/                 # Internal SPEC.md, SPEC-implementation.md
evals/               # Agent eval framework
packages/
  adapters/          # claude-local, codex-local, cursor-local, ...
  adapter-utils/     # shared adapter helpers
  db/                # Drizzle schema + migrations
  mcp-server/        # MCP wrapper
  plugins/
    sdk/             # @paperclipai/plugin-sdk
    create-paperclip-plugin/
    sandbox-providers/e2b/
  shared/            # types, utils
patches/             # pnpm patch files
releases/            # release artifacts
report/              # reporting tools
scripts/             # one-off ops scripts
server/              # the Node server
  src/
  scripts/
skills/              # the bundled skills
tests/               # cross-package tests
ui/                  # the React app

One-command onboarding

npx paperclipai onboard --yes
# or:
git clone https://github.com/paperclipai/paperclip.git && cd paperclip
pnpm install
pnpm dev

pnpm dev boots: server (with PGlite embedded), UI (Vite), and a watcher.

🌐 15. REST API Surface

The full v1 surface, grouped. Use this as the spec for your server.

🏢 Companies

GET    /companies
POST   /companies
GET    /companies/:companyId
PATCH  /companies/:companyId
PATCH  /companies/:companyId/branding
POST   /companies/:companyId/archive

🎯 Goals

GET    /companies/:companyId/goals
POST   /companies/:companyId/goals
GET    /goals/:goalId
PATCH  /goals/:goalId
DELETE /goals/:goalId

🤖 Agents

GET    /companies/:companyId/agents
POST   /companies/:companyId/agents
GET    /agents/:agentId
PATCH  /agents/:agentId
POST   /agents/:agentId/pause
POST   /agents/:agentId/resume
POST   /agents/:agentId/terminate
POST   /agents/:agentId/keys                  # mint API key for the agent
POST   /agents/:agentId/heartbeat/invoke      # manual on-demand wakeup

📝 Issues

GET    /companies/:companyId/issues
POST   /companies/:companyId/issues
GET    /issues/:issueId
PATCH  /issues/:issueId
POST   /issues/:issueId/checkout              # atomic
POST   /issues/:issueId/release
POST   /issues/:issueId/admin/force-release   # board-only
POST   /issues/:issueId/comments
GET    /issues/:issueId/comments
POST   /companies/:companyId/issues/:issueId/attachments
GET    /issues/:issueId/attachments

💰 Costs & budgets

POST   /companies/:companyId/cost-events
GET    /companies/:companyId/costs/summary
GET    /companies/:companyId/costs/by-agent
GET    /companies/:companyId/costs/by-project
PATCH  /companies/:companyId/budgets
PATCH  /agents/:agentId/budgets

⚖️ Approvals

GET    /companies/:companyId/approvals?status=pending
POST   /companies/:companyId/approvals
POST   /approvals/:approvalId/approve
POST   /approvals/:approvalId/reject

📊 Activity & dashboard

GET    /companies/:companyId/activity
GET    /companies/:companyId/dashboard

Design notes

Every write that mutates state writes one row to activity_log in the same transaction.

Authorization is one model: the API key resolves to an actor (user, agent, or system) and a company scope. The same handler serves UI requests and agent requests; only the actor type differs.

No RPC, no GraphQL. Plain REST keeps the MCP wrapper trivially thin.
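The single authorization model described in these notes can be sketched as one function that resolves a key to an actor and checks the company scope. The key store, the key strings, and the `authorize` function are hypothetical; a real server would look up a hashed key in the database and run this as middleware.

```typescript
type ActorType = "user" | "agent" | "system";

interface Actor {
  actorType: ActorType;
  actorId: string;
  companyId: string;
}

type AuthResult =
  | { ok: true; actor: Actor }
  | { ok: false; httpStatus: 401 | 403 };

// Hypothetical in-memory key store for illustration only.
const apiKeys = new Map<string, Actor>([
  ["key-board", { actorType: "user", actorId: "user-1", companyId: "acme" }],
  ["key-agent", { actorType: "agent", actorId: "agent-7", companyId: "acme" }],
]);

// The same check serves UI and agent requests; only actorType differs.
function authorize(apiKey: string, routeCompanyId: string): AuthResult {
  const actor = apiKeys.get(apiKey);
  if (!actor) return { ok: false, httpStatus: 401 };          // unknown key
  if (actor.companyId !== routeCompanyId) {
    return { ok: false, httpStatus: 403 };                     // wrong company
  }
  return { ok: true, actor };
}
```

Handlers downstream receive the resolved Actor and can record its actorType and actorId in the activity log.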

🔒 16. Multi-Company Isolation & Portability

The deployment is single-tenant for the operator (you run your own server), but multi-company within the deployment (one Paperclip can host several orgs).

Isolation is enforced three ways:

Schema: every domain table has company_id and every index leads with it.
Authorization: the actor's API key carries a company scope; handlers reject mismatches.
Storage: secrets, attachments, plugin state are namespaced by company.

📦 Portability

Template export: structure only (org chart, roles, default tasks). Useful for "starter companies."
Snapshot export: full state including tasks, comments, and costs, with secrets scrubbed before serialization.
Imports are atomic; either the whole company appears or nothing does.

📋 17. Audit Trail & Activity Log

Every state mutation produces:

activity_log(
  actor_type ∈ {agent, user, system},
  actor_id,
  action       e.g. "issue.checked_out",
  entity_type, entity_id,
  details jsonb,
  created_at
)

Two consequences:

Replay: walking the log in order reconstructs what changed, when, and by whom, to the extent that details captures each change.
Tool-call tracing: when an agent calls the MCP server, those calls become activity entries. "What did agent X actually do at 3:14am?" is a query, not an investigation.

📏 18. Engineering Conventions

These are guardrails worth copying verbatim:

Keep changes company-scoped. Every query, every cache key, every authorization check. No cross-tenant code paths exist.
Contracts must be in sync. The DB schema, the OpenAPI spec, the TypeScript types, and the MCP tool definitions are generated from one source. Drift is a bug.
Migrations are append-only. Never edit a migration after it has shipped. Use pnpm db:migrate to generate; never hand-write SQL into old files.
One PR = one logical change.
Each PR declares the model that wrote it. (Cute but useful telemetry.)
All tests pass before merge. CI green. Code-review tool score = 5/5.
Fail visibly. Agents that hit unexpected state mark tasks blocked; servers return errors; UIs show them. No silent fallbacks.
Read SPEC-implementation.md when in doubt. When SPEC.md and the implementation spec disagree, implementation wins for v1.

🗺️ 19. Step-by-Step Build Plan

If you are building a Paperclip-like system from scratch, do it in this order. Each step is shippable on its own.

🌱 Phase 0 — Skeleton (1-2 days)

pnpm monorepo with server/, ui/, packages/db, packages/shared.
Express server, Vite React app, Drizzle + PGlite for dev.
Health check endpoint, hello world UI.

🔐 Phase 1 — Companies & Auth

companies table.
Better Auth for human users.
API-key model: every key is (actor_type, actor_id, company_id).
Middleware that resolves the key into an Actor and rejects on company mismatch.

🏢 Phase 2 — Org Chart

agents table with reports_to.
CRUD endpoints + UI org-chart view.
Status field with transitions, but no runtime yet — agents are just data.

📝 Phase 3 — Tasks

issues + goals + projects tables with the full hierarchy.
Implement atomic checkout with the exact SQL above. Write a regression test that races 50 concurrent checkouts and asserts exactly one wins.
Kanban / list UI.

💚 Phase 4 — The Heartbeat (the moment your project becomes real)

heartbeat_runs table.
Adapter manager interface (3 methods: invoke, status, cancel).
Build one adapter first: process (just spawn a CLI you control). Don't start with Claude.
Scheduler:
- Cron loop for timer triggers.
- Hook on issue checkout → emit assignment wakeup.
- "Run now" button → on_demand.
Coalescing: if a run is already active for an agent, do not launch another; record each new wakeup as merged into the active run.
Timeouts + grace + force-kill.

💰 Phase 5 — Cost & Budgets

cost_events table.
Budget fields on companies and agents.
Ingestion endpoint with company-scope check.
On every cost insert: recompute spent / budget; if past 100%, pause agent + emit activity.
Dashboards: per-agent, per-task, per-project rollups (use the indexes you already built).

⚖️ Phase 6 — Approvals & Governance

approvals table; generic payload + type.
request_board_approval flow end-to-end.
"Hire agent" requires approval; approving the approval creates the agent row.
Board UI with a single "approvals" inbox.

📋 Phase 7 — Activity Log + SSE

Append activity_log in the same transaction as every mutation.
Server-sent events broadcast new activity to subscribed UIs.
"Recent activity" feed and per-entity history.

🔌 Phase 8 — More adapters

Wrap a real CLI (Claude Code or Codex). Reuse adapter-utils for stdio framing and JSON parsing.
Add http adapter for remote agents.
Now you can ship to early users.

📡 Phase 9 — MCP Server

Standalone package that calls your REST API.
One MCP tool per important endpoint, plus the escape-hatch apiRequest.
Test it with Claude Code locally.

🎓 Phase 10 — Skills

Pick the top 3 things agents do badly without guidance and write a SKILL.md file for each.
Distribute via .agents/skills/ and tell adapters to load them into the system context.

🧩 Phase 11 — Plugins

Out-of-process worker SDK with definePlugin.
IPC: simplest is JSON over stdio with a request-id correlation.
Manifest with declared capabilities; host enforces at every IPC call.
UI slot system: a registry keyed by slot name, plugins mount React via iframe or shadow DOM.
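The "JSON over stdio with request-id correlation" idea from this phase can be sketched without any process plumbing. `IpcClient` and its message shapes are hypothetical; the sketch assumes one JSON message per line and leaves out the actual child-process wiring, timeouts, and error handling.

```typescript
interface IpcResponse {
  id: number;
  result?: unknown;
  error?: string;
}

class IpcClient {
  private nextId = 1;
  private readonly pending = new Map<number, (res: IpcResponse) => void>();
  private readonly writeLine: (line: string) => void;

  // writeLine would write to the peer's stdin in a real host.
  constructor(writeLine: (line: string) => void) {
    this.writeLine = writeLine;
  }

  // Sends a request and remembers the callback under a fresh id.
  call(method: string, params: unknown, onResponse: (res: IpcResponse) => void): number {
    const id = this.nextId++;
    this.pending.set(id, onResponse);
    this.writeLine(JSON.stringify({ id, method, params }));
    return id;
  }

  // Feed each line read from the peer's stdout here. Responses may
  // arrive in any order; the id routes each to the right caller.
  handleLine(line: string): void {
    const res = JSON.parse(line) as IpcResponse;
    const callback = this.pending.get(res.id);
    if (!callback) return; // unknown or duplicate id: ignore
    this.pending.delete(res.id);
    callback(res);
  }
}
```

The host-side capability check fits naturally at the point where a request line is parsed, before the method is dispatched.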

📦 Phase 12 — Portability

POST /companies/:id/export → JSON snapshot, with a secret_scrub pass.
POST /companies/import → atomic, transactional.
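
A secret_scrub pass can be as simple as a recursive walk that redacts values under secret-looking keys. The key pattern and the `[REDACTED]` marker below are assumptions for illustration; a production scrubber would more likely work from an explicit list of secret fields than from name matching alone.

```typescript
// Hypothetical pattern: keys whose names suggest they hold credentials.
const SECRET_KEY_PATTERN = /(secret|token|password|api[_-]?key)/i;

// Returns a deep copy of `value` with secret-looking fields redacted.
// The input is not mutated.
function scrubSecrets(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => scrubSecrets(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SECRET_KEY_PATTERN.test(key) ? "[REDACTED]" : scrubSecrets(child);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Note that name matching is deliberately over-eager: a harmless field such as `tokensUsed` would also be redacted by this pattern, which is the safer failure mode for an export.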

✨ Phase 13 — Polish

One-command onboarding (npx <yourtool> onboard) that generates .env, runs migrations, opens browser.
Docker compose for "self-host on a box."
Telemetry (anonymous, opt-out).

⚠️ 20. Pitfalls and Tradeoffs

🚫 Things not to do, especially early

Don't build your own agent loop. The whole point is to be unopinionated. Wrap a CLI; ship.
Don't add tRPC / GraphQL. It makes the MCP wrapper non-trivial. Plain REST is the contract that survives.
Don't centralize prompts in the server. Prompts belong in adapters or skills. The core has zero opinion about model behavior.
Don't treat budgets as soft. "Best effort" budget enforcement is no enforcement. Build the auto-pause from day one.
Don't allow direct agent-to-agent calls. Force everything through tasks/comments. You'll thank yourself when debugging.
Don't put company_id on "most" tables. Put it on every table.
Don't sandbox plugins via trust. Out-of-process + capability manifest, or nothing.

⚖️ Honest tradeoffs Paperclip makes

Tradeoff	What you get	What you lose
Single human board operator (v1)	Simple authority model	No multi-stakeholder governance
REST + jsonb polymorphism	Easy to extend, MCP is trivial	Less compile-time safety than tRPC
Local CLI adapters unsandboxed	Maximum runtime freedom	You own the host security story
Atomic checkout via SQL	Dead simple, no extra services	Doesn't scale past a single Postgres
Skills as markdown	Hot-swappable; runtime-agnostic	Behavior depends on adapter discipline
Plugins out-of-process	Crash isolation; multi-language	Higher latency than in-proc

🔀 Where to deviate if your domain differs

If your "agents" are humans-in-the-loop, keep the same model and assign work through assignee_user_id, which you already have.
If you need multi-board governance, generalize decided_by_user_id to a poll-style record on approvals.
If costs aren't $/tokens, generalize cost_events to usage_events with provider-defined units. Keep the rollup shape.
If you need horizontal scale, the bottleneck is the heartbeat scheduler. Move it to a leader-elected job runner; everything else (REST, DB) already scales.
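
For the usage_events variant mentioned above, the rollup can keep the same per-agent shape and simply add a unit dimension. The field names and the `rollupUsage` helper are hypothetical; the only idea taken from the text is "provider-defined units, same rollup shape".

```typescript
// Hypothetical generalization of a cost event: quantity in a provider-defined unit.
interface UsageEvent {
  companyId: string;
  agentId: string;
  provider: string;
  unit: string; // e.g. "tokens", "gpu_seconds", "api_calls"
  quantity: number;
}

// Totals per agent and per unit, scoped to one company.
function rollupUsage(
  events: UsageEvent[],
  companyId: string,
): Record<string, Record<string, number>> {
  const totals: Record<string, Record<string, number>> = {};
  for (const ev of events) {
    if (ev.companyId !== companyId) continue; // never mix tenants in a rollup
    const perAgent = totals[ev.agentId] ?? (totals[ev.agentId] = {});
    perAgent[ev.unit] = (perAgent[ev.unit] ?? 0) + ev.quantity;
  }
  return totals;
}
```

Units are never summed across each other here; converting them to a common currency, if you need one, is a separate step with its own price table.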

💡 TL;DR for Building Your Own

It's a control plane, not a framework. Three-method adapter contract. Don't pretend otherwise.
Postgres schema is the architecture. Get companies / agents / issues / heartbeat_runs / cost_events / approvals / activity_log right and 80% of behavior falls out.
The heartbeat is the kernel. Coalesce, timeout, persist runs, log activity.
Atomic SQL UPDATE = your concurrency story.
Hard budget ceilings, not soft ones.
Tasks are the only communication channel between agents.
REST + MCP + skills, in that order. Each is a thin layer over the previous.
Plugins out-of-process, capability-gated.
Every table, query, and index starts with company_id.
Append-only audit log in the same transaction as every mutation.

Build those ten things and you have Paperclip. Everything else is polish.

If you found this helpful, let me know by leaving a 👍 or a comment! And if you think this post could help someone, feel free to share it. Thank you very much! 😃
