Source: github.com/paperclipai/paperclip, "Open-source orchestration for zero-human companies."
This guide distills the architecture, principles, and engineering choices behind Paperclip into an actionable blueprint you can use to build a similar system. It is written so you can read it top-to-bottom and walk away with a concrete plan.
Table of Contents
- What Paperclip Actually Is
- Core Mental Model: Control Plane, Not Framework
- The 10 Design Principles
- High-Level Architecture
- The Domain Model: How "A Company" Maps to Tables
- The Heartbeat: The Heart of the Runtime
- Adapters: "Bring Your Own Agent"
- The Task System & Atomic Checkout
- Governance, Approvals & The Board
- Budgets & Cost Control
- Plugin System: Capability-Gated Extensions
- MCP Server: Agents Talk to the API
- Skills: Teaching Agents the API
- Tech Stack & Repository Layout
- REST API Surface
- Multi-Company Isolation & Portability
- Audit Trail & Activity Log
- Engineering Conventions
- Step-by-Step Build Plan
- Pitfalls, Tradeoffs & What To Skip First
1. What Paperclip Actually Is
Paperclip is a Node.js + React self-hosted application that lets you run a "company" of AI agents:
- You define a company with goals/initiatives.
- You hire agents (Claude Code, Codex, Cursor, custom CLI, HTTP bot; you pick the runtime).
- You assign tasks (issues) and budgets.
- A board operator (human) approves hires, strategic plans, and budget overrides.
- A scheduler runs each agent on a heartbeat (a short execution window) and tracks cost, status, tool calls, and outputs.
The Paperclip slogan: "If OpenClaw is an employee, Paperclip is the company."
It looks like a task manager (Linear/Jira), but underneath it is an org chart, a budget engine, an approval queue, a multi-runtime executor, and an audit log, all designed for non-human workers.
2. Core Mental Model: Control Plane, Not Framework
This is the most important idea to internalize before building anything.
| Agent Framework (LangGraph, CrewAI, ...) | Control Plane (Paperclip) |
|---|---|
| Decides how an agent thinks | Decides what an agent works on |
| Owns the prompt + tool loop | Treats the agent loop as a black box |
| One process, in-memory | Many processes, durable state |
| You ship code | You ship a deployment |
Concrete consequences for design:
- The system never runs a "react+plan+act" loop itself. That is the adapter's job.
- The system does own: identity, scheduling, task ownership, cost ledger, approvals, audit, persistence.
- The contract with an agent is shockingly small: "I can invoke you, get status, and cancel you."
If you start building a Paperclip-like system and find yourself writing prompt templates or tool-call parsers in the core, you have drifted into framework territory; pull back.
3. The 10 Design Principles
Lifted (and de-jargoned) from the spec:
- Unopinionated execution. The core does not care which model, prompt, or planner an agent uses. It launches a process and waits.
- Task-centric communication. Agents do not talk to each other directly. Delegation = task creation. Coordination = task comments. Status = field updates. This makes everything observable and replayable.
- Goal-traced work. Every task descends from a company initiative: Initiative → Project → Milestone → Issue → Sub-issue. No orphan work.
- Atomic task ownership. A task can be owned by exactly one agent at a time, enforced at the database layer (not in app code).
- Visible problem surfacing. Agents that get stuck must mark issues `blocked` and escalate. Silent retries are an anti-pattern.
- Human board authority. Every irreversible or high-risk action (hiring, big-spend, strategy approval, termination) requires a human approval record.
- Cost follows work. Costs are billed against the requesting task chain, not just the executing agent. This makes "who is expensive and why" answerable.
- Hard budget ceilings. Soft alert at 80%. At 100%, the agent is auto-paused and further invocations are blocked. No "best-effort."
- Progressive deployment. It must run on a laptop with embedded Postgres, then scale to self-hosted / cloud with the same code and the same schema.
- Plugin-extensible, not fork-extensible. Capabilities the core doesn't ship come from out-of-process plugins with declared, gated capabilities.
When you design your system, keep this list visible and bounce every PR against it.
4. High-Level Architecture
+------------------------------+
|       React UI (Vite)        |
|  Org chart · Tasks · Costs   |
+--------------+---------------+
               | REST + SSE
               v
+------------------------------------------------------------------+
|              Node.js Server (TypeScript / Express)               |
|                                                                  |
|  +------------+  +-------------+  +-----------+  +---------+     |
|  | REST API   |  | Scheduler   |  | Approvals |  | Plugins |     |
|  | (handlers) |  | (heartbeat) |  | engine    |  | host    |     |
|  +-----+------+  +------+------+  +-----+-----+  +----+----+     |
|        |                |               |             |          |
|        +----------------+---------------+-------------+          |
|        |                                                         |
|        v                                                         |
|  +------------------+      +-------------------+                 |
|  | Adapter Mgr      |----->| Agent runtime     |                 |
|  | (claude_local,   |      | (child process /  |                 |
|  |  codex_local,    |      |  HTTP webhook)    |                 |
|  |  http, process)  |      +-------------------+                 |
|  +------------------+                                            |
+--------------------------------+---------------------------------+
                                 |
                                 v
                    +--------------------------+
                    | PostgreSQL (or PGlite)   |
                    | companies · agents ·     |
                    | issues · heartbeats ·    |
                    | costs · approvals ·      |
                    | activity_log             |
                    +--------------------------+
Sidecar (optional):
+---------------------------+
| MCP server (thin REST     | <--- agents call here to read/write Paperclip
| wrapper)                  |
+---------------------------+
The spec calls out 12 subsystems. This is the checklist for "feature complete v1":
- Identity & Access
- Org Chart & Agents
- Work & Task System
- Heartbeat Execution
- Workspaces & Runtime
- Governance & Approvals
- Budget & Cost Control
- Routines & Schedules
- Plugins
- Secrets & Storage
- Activity & Events
- Company Portability (export/import)
5. The Domain Model
This is where most of the cleverness lives. The schema is small but every column matters.
Companies
companies(
id uuid pk,
name, description, status (active|paused|archived),
pause_reason, paused_at,
issue_prefix text not null, -- e.g. "ACME"
issue_counter int not null, -- monotonic, used for ACME-123
budget_monthly_cents int default 0,
spent_monthly_cents int default 0,
attachment_max_bytes,
require_board_approval_for_new_agents bool
)
Why an `issue_prefix` + `issue_counter`? So tasks have human-friendly IDs (ACME-42) that are stable, sortable, and unique per company without leaking other tenants' counts.
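A minimal sketch of how the prefix and counter could combine into an identifier. The `CompanyCounter` type and `nextIdentifier` function are hypothetical names used only for illustration; in a real deployment the increment would presumably happen in the database, in the same transaction that inserts the issue.

```typescript
// Hypothetical in-memory stand-in for the two columns on `companies`.
interface CompanyCounter {
  issuePrefix: string;  // e.g. "ACME"
  issueCounter: number; // last number handed out for this company
}

// Bumps the per-company counter and returns the next human-friendly ID.
function nextIdentifier(company: CompanyCounter): string {
  company.issueCounter += 1;
  return `${company.issuePrefix}-${company.issueCounter}`;
}

const acme: CompanyCounter = { issuePrefix: "ACME", issueCounter: 41 };
const beta: CompanyCounter = { issuePrefix: "BETA", issueCounter: 0 };

const acmeId = nextIdentifier(acme); // "ACME-42"
const betaId = nextIdentifier(beta); // "BETA-1"
```

Because each company carries its own counter, one tenant's issue numbers reveal nothing about another tenant's volume.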
Agents
agents(
id, company_id, name, role, title, icon,
status (active|paused|idle|running|error|pending_approval|terminated),
reports_to uuid → agents.id null, -- the org chart edge
capabilities text,
adapter_type text, -- claude_local | codex_local | http | ...
adapter_config jsonb, -- adapter-specific
runtime_config jsonb default {}, -- timeouts, cwd, env
default_environment_id,
context_mode (thin|fat) default thin,
budget_monthly_cents int default 0,
spent_monthly_cents int default 0
)
Why `adapter_type` + `adapter_config` (jsonb)? Lets you support N agent runtimes without N tables. The polymorphism lives in code (the adapter manager) and JSON, not in DDL.
Issues (tasks)
issues(
id, company_id, project_id, goal_id, parent_id,
title, description,
status (backlog|todo|in_progress|in_review|done|blocked|cancelled),
priority (critical|high|medium|low),
assignee_agent_id, assignee_user_id,
-- Atomic checkout fields:
checkout_run_id, execution_run_id,
execution_agent_name_key, execution_locked_at,
-- Provenance:
created_by_agent_id, created_by_user_id,
issue_number, identifier, -- e.g. ACME-42
origin_kind, origin_id, origin_run_id, origin_fingerprint,
request_depth int default 0, -- how deep the delegation chain is
billing_code text -- "cost follows work"
)
Heartbeat runs (one row per execution window)
heartbeat_runs(
id, company_id, agent_id,
invocation_source (scheduler|manual|callback),
status (queued|running|succeeded|failed|cancelled|timed_out),
started_at, finished_at, error,
external_run_id text, -- adapter's run id, for resume
context_snapshot jsonb -- what was passed in
)
Cost events (the ledger)
cost_events(
id, company_id, agent_id, issue_id, project_id, goal_id,
billing_code,
provider text, model text,
input_tokens, output_tokens, cost_cents,
occurred_at
)
Approvals (governance queue)
approvals(
id, company_id,
type (hire_agent|approve_ceo_strategy|budget_override_required|request_board_approval),
requested_by_agent_id, requested_by_user_id,
status (pending|revision_requested|approved|rejected|cancelled),
payload jsonb, -- the proposed change
decision_note, decided_by_user_id, decided_at
)
Activity log (the audit tape)
activity_log(
id, company_id,
actor_type (agent|user|system), actor_id,
action text, -- "issue.checked_out"
entity_type, entity_id,
details jsonb,
created_at
)
Indexes that matter (don't skip)
agents(company_id, status)
agents(company_id, reports_to) -- org-chart traversal
issues(company_id, status)
issues(company_id, assignee_agent_id, status) -- "what's on my plate"
issues(company_id, parent_id) -- subtasks
issues(company_id, project_id)
cost_events(company_id, occurred_at)
cost_events(company_id, agent_id, occurred_at) -- per-agent rollups
heartbeat_runs(company_id, agent_id, started_at desc)
approvals(company_id, status, type)
activity_log(company_id, created_at desc)
Lesson: every index starts with `company_id`. Tenant isolation is a query-plan concern, not just an auth concern.
6. The Heartbeat
The heartbeat is the runtime kernel. Everything else is plumbing around it.
Lifecycle of a single tick
1. Scheduler decides "agent A should run now"
↓
2. Insert heartbeat_runs row (status=queued)
↓
3. Adapter manager looks up agents.adapter_type
↓
4. Adapter.invoke(agentConfig, context):
- Build prompt/context
- Spawn child process OR fire HTTP webhook
- Pass session_id from previous run if resumable
↓
5. Stream logs, status, tool calls back into the run row
↓
6. Wait until: exit | timeout | cancel
- On timeout: send stop signal, wait graceSec, force-kill
↓
7. Persist: token usage, cost_events rows, output snippet, error
↓
8. Update heartbeat_runs (status=succeeded|failed|timed_out)
↓
9. Emit activity_log entry; broadcast SSE to UI
Wakeup triggers (only four)
| Trigger | Meaning |
|---|---|
| `timer` | Cron-like, e.g. "every 5 minutes" |
| `assignment` | A new task was checked out to this agent |
| `on_demand` | Human or API pressed the "Run now" button |
| `automation` | System-internal trigger (future) |
Coalescing
"If an agent is already running, new wakeups are merged (coalesced) instead of launching duplicate runs."
This rule alone prevents most of the duplicate-spend bugs you would otherwise hit.
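The coalescing rule can be sketched as a small in-memory guard. This is an illustration under assumptions, not Paperclip's scheduler: the `WakeupCoalescer` class and its method names are hypothetical, and a real implementation would track running state in `heartbeat_runs` instead of process memory.

```typescript
type Trigger = "timer" | "assignment" | "on_demand" | "automation";
interface Wakeup { agentId: string; trigger: Trigger; }

// Hypothetical guard: at most one run per agent; extra wakeups are merged.
class WakeupCoalescer {
  private running = new Set<string>();
  private merged = new Map<string, number>();

  // "started" means the caller should launch a run; "coalesced" means it must not.
  request(w: Wakeup): "started" | "coalesced" {
    if (this.running.has(w.agentId)) {
      this.merged.set(w.agentId, (this.merged.get(w.agentId) || 0) + 1);
      return "coalesced";
    }
    this.running.add(w.agentId);
    return "started";
  }

  // Called when the run ends; returns how many wakeups were merged into it.
  finish(agentId: string): number {
    this.running.delete(agentId);
    const count = this.merged.get(agentId) || 0;
    this.merged.delete(agentId);
    return count;
  }
}

const coalescer = new WakeupCoalescer();
const first = coalescer.request({ agentId: "a1", trigger: "timer" });          // "started"
const second = coalescer.request({ agentId: "a1", trigger: "assignment" });    // "coalesced"
const otherAgent = coalescer.request({ agentId: "a2", trigger: "on_demand" }); // "started"
const mergedCount = coalescer.finish("a1");                                    // 1
```

Whether a merged wakeup should trigger one follow-up run after `finish` is a policy choice this sketch leaves open.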
Session resumption
For adapters that support it (Claude CLI, Codex CLI), Paperclip stores the external_run_id / session ID in the heartbeat row. The next tick passes it back so the agent reloads its context. Operators can reset the session when context goes stale.
Runtime config
runtime_config:
cwd: /workspaces/acme-engineering
timeoutSec: 1800 # max wall time per heartbeat
graceSec: 30 # SIGTERM → SIGKILL window
env:
ANTHROPIC_API_KEY: ${secret:anthropic_key}
promptTemplate: ... # adapter-specific
args: [...]
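How `timeoutSec` and `graceSec` interact can be shown as a pure decision function. This is a sketch only: `stopActionAt` is a hypothetical helper, and a real supervisor would act on timers and actually signal the child process.

```typescript
interface RuntimeLimits {
  timeoutSec: number; // max wall time per heartbeat
  graceSec: number;   // window between the polite stop and the forced kill
}

type StopAction = "none" | "SIGTERM" | "SIGKILL";

// Decides which signal (if any) is due after `elapsedSec` seconds of run time.
function stopActionAt(elapsedSec: number, limits: RuntimeLimits): StopAction {
  if (elapsedSec < limits.timeoutSec) return "none";
  if (elapsedSec < limits.timeoutSec + limits.graceSec) return "SIGTERM";
  return "SIGKILL";
}

const limits: RuntimeLimits = { timeoutSec: 1800, graceSec: 30 };
const beforeTimeout = stopActionAt(1799, limits); // "none"
const atTimeout = stopActionAt(1800, limits);     // "SIGTERM"
const afterGrace = stopActionAt(1830, limits);    // "SIGKILL"
```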
Safety
"Local CLI adapters run unsandboxed on the host machine."
The spec is honest about this. Mitigations: per-agent OS user, restricted cwd, secrets managed by the host (not in prompts), and capability-gated plugins for anything the agent can't do directly.
7. Adapters: "Bring Your Own Agent"
The adapter is the only abstraction over agent runtimes. It is intentionally tiny.
interface Adapter {
invoke(agentConfig: AgentConfig, context?: HeartbeatContext): Promise<RunHandle>;
status(agentConfig: AgentConfig): Promise<AgentStatus>;
cancel(agentConfig: AgentConfig): Promise<void>;
}
That's the whole contract. Three methods.
Built-in adapters
| Adapter | Mechanism |
|---|---|
| `process` | Spawns an arbitrary CLI as a child process |
| `http` | POSTs to a webhook; agent lives wherever it lives |
| `claude_local` | Claude Code CLI, supports session resume |
| `codex_local` | OpenAI Codex CLI |
| `cursor` | Cursor headless mode |
| `gemini-local`, `pi_local`, `opencode-local`, `hermes_local` | Other local CLIs |
| `openclaw_gateway` | Calls a managed cloud service |
Why this design wins
- Adding an agent runtime is a self-contained PR. Drop a folder under `packages/adapters/<name>/`. No core changes.
- Most adapters are 100-300 lines. They're mostly: spawn process, wire stdin/stdout, parse final JSON, report cost.
- Polymorphism in JSON, not types. `adapter_config jsonb` lets each adapter define its own shape; the manager just passes it through.
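The "polymorphism in JSON" idea can be sketched as a registry keyed by `adapter_type`, where each entry owns the shape of its own config. Everything here is an assumption for illustration: the `AdapterEntry` type, the `checkAgentConfig` helper, and the `command` and `url` config fields are hypothetical, not Paperclip's actual adapter schemas.

```typescript
// Each adapter validates its own adapter_config; the manager stays generic.
interface AdapterEntry {
  // Returns a list of problems; an empty list means the config is acceptable.
  validateConfig(config: Record<string, unknown>): string[];
}

const adapterRegistry = new Map<string, AdapterEntry>();

// Hypothetical config shapes, for illustration only.
adapterRegistry.set("process", {
  validateConfig: (c) => (typeof c.command === "string" ? [] : ["command must be a string"]),
});
adapterRegistry.set("http", {
  validateConfig: (c) => (typeof c.url === "string" ? [] : ["url must be a string"]),
});

// The manager looks up the adapter by type and passes the JSON straight through.
function checkAgentConfig(adapterType: string, adapterConfig: Record<string, unknown>): string[] {
  const entry = adapterRegistry.get(adapterType);
  if (!entry) return [`unknown adapter_type: ${adapterType}`];
  return entry.validateConfig(adapterConfig);
}

const okProblems = checkAgentConfig("process", { command: "echo" }); // []
const httpProblems = checkAgentConfig("http", {});                   // ["url must be a string"]
```

Adding a runtime then means adding one registry entry, which matches the "self-contained PR" claim above.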
Integration levels (acceptable degrees of "support")
| Level | What the adapter does |
|---|---|
| Minimum | Callable; reports exit code |
| Status | Reports success/failure/progress |
| Full | Reports cost, updates tasks, calls back into Paperclip API |
You don't need full instrumentation on day one. A new adapter can land at "Minimum" and be useful.
8. Task System & Atomic Checkout
The task system is what stops two agents from doing the same work at the same time. It is the second-most-important runtime concept after the heartbeat.
Hierarchy
Initiative (board-level direction, e.g. "Reach $1M ARR")
+-- Project (e.g. "Self-serve checkout")
    +-- Milestone (e.g. "Public beta")
        +-- Issue (e.g. "Add Stripe webhook handler")
            +-- Sub-issue
Every task traces up to an initiative; no work is "for nothing."
Atomic checkout (the magic SQL)
// Request
POST /issues/:issueId/checkout
{ "agentId": "uuid", "expectedStatuses": ["todo","backlog","blocked","in_review"] }
Server-side:
UPDATE issues
SET assignee_agent_id = :agentId,
status = 'in_progress',
started_at = COALESCE(started_at, now())
WHERE id = :issueId
AND status = ANY (:expectedStatuses)
AND (assignee_agent_id IS NULL OR assignee_agent_id = :agentId);
If the row count is 0, return 409 Conflict with the current owner/status. Otherwise the row is locked to that agent.
This single update is the entire concurrency story. No queues, no Redis locks, no leases. The DB row is the lock.
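To make the guard conditions concrete, here is an in-memory sketch that mirrors the `WHERE` clause of the update. It is only a model: the `IssueRow` and `CheckoutResult` types and the `checkout` function are hypothetical, and the real atomicity comes from Postgres executing a single `UPDATE`, not from this code.

```typescript
type IssueStatus = "backlog" | "todo" | "in_progress" | "in_review" | "done" | "blocked" | "cancelled";

interface IssueRow {
  id: string;
  status: IssueStatus;
  assigneeAgentId: string | null;
  startedAt: Date | null;
}

type CheckoutResult =
  | { httpStatus: 200; issue: IssueRow }
  | { httpStatus: 409; currentOwner: string | null; currentStatus: IssueStatus };

// Writes only if every guard in the WHERE clause holds; otherwise reports a conflict.
function checkout(row: IssueRow, agentId: string, expectedStatuses: IssueStatus[]): CheckoutResult {
  const statusOk = expectedStatuses.indexOf(row.status) !== -1;
  const ownerOk = row.assigneeAgentId === null || row.assigneeAgentId === agentId;
  if (!statusOk || !ownerOk) {
    // Equivalent to "row count is 0".
    return { httpStatus: 409, currentOwner: row.assigneeAgentId, currentStatus: row.status };
  }
  row.assigneeAgentId = agentId;
  row.status = "in_progress";
  row.startedAt = row.startedAt || new Date(); // COALESCE(started_at, now())
  return { httpStatus: 200, issue: row };
}

const issue: IssueRow = { id: "i-1", status: "todo", assigneeAgentId: null, startedAt: null };
const expected: IssueStatus[] = ["todo", "backlog", "blocked", "in_review"];
const winner = checkout(issue, "agent-a", expected); // httpStatus 200
const loser = checkout(issue, "agent-b", expected);  // httpStatus 409
```

The second caller loses on both guards: the status is no longer in the expected set and the row already has a different owner.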
Cross-team work & escalation rules
- Any agent can create a task for any other agent (no permission walls; visibility is total).
- The receiving agent must complete, block, or escalate. They cannot silently cancel a cross-team request.
- Escalation goes up their own `reports_to` chain.
Billing codes
When agent X delegates to agent Y, Y's cost_events are tagged with the billing code from X's task. Roll-ups answer "how much did Initiative #3 actually cost across the whole graph?"
State machine
backlog → todo → in_progress → in_review → done (terminal)
Side branches off the main chain: blocked (can return to the main chain), cancelled (terminal)
Side effects:
→ in_progress : sets started_at if null
→ done : sets completed_at
→ cancelled : sets cancelled_at
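The state machine and its side effects can be sketched as a transition table plus a pure function. The table below is an assumption: the exact edges into and out of `blocked`, and which states may be cancelled, are guesses made for illustration. `Stamps` and `transition` are hypothetical names.

```typescript
type Status = "backlog" | "todo" | "in_progress" | "in_review" | "done" | "blocked" | "cancelled";

// Assumed edges; only the main chain and the two terminal states come from the text.
const ALLOWED: Record<Status, Status[]> = {
  backlog: ["todo", "cancelled"],
  todo: ["in_progress", "blocked", "cancelled"],
  in_progress: ["in_review", "blocked", "done", "cancelled"],
  in_review: ["in_progress", "done", "cancelled"],
  blocked: ["todo", "in_progress", "cancelled"],
  done: [],
  cancelled: [],
};

interface Stamps {
  startedAt: number | null;
  completedAt: number | null;
  cancelledAt: number | null;
}

// Validates the move and applies the three timestamp side effects.
function transition(from: Status, to: Status, stamps: Stamps, now: number): Stamps {
  if (ALLOWED[from].indexOf(to) === -1) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  const next: Stamps = { ...stamps };
  if (to === "in_progress" && next.startedAt === null) next.startedAt = now;
  if (to === "done") next.completedAt = now;
  if (to === "cancelled") next.cancelledAt = now;
  return next;
}

const fresh: Stamps = { startedAt: null, completedAt: null, cancelledAt: null };
const started = transition("todo", "in_progress", fresh, 100);    // startedAt = 100
const resumed = transition("blocked", "in_progress", started, 200); // startedAt stays 100
const finished = transition("in_progress", "done", resumed, 300);   // completedAt = 300
```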
9. Governance, Approvals & The Board
The "board" is a single human operator (in v1). They have unrestricted authority: pause, resume, override, terminate.
Approval queue
The approvals table is a generic mechanism. Four request types ship by default:
| Type | Who proposes | What it gates |
|---|---|---|
| `hire_agent` | CEO agent (or any agent if company requires) | Creating a new agent |
| `approve_ceo_strategy` | CEO agent | Initial org/task plan |
| `budget_override_required` | Any agent | Spending past hard limit |
| `request_board_approval` | Any agent | Anything escalated to a human |
Each approval carries a `payload jsonb` describing the proposed change. Approving an approval is what causes the change; the request isn't applied until decided.
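The "approval causes the change" rule can be sketched as a decision function that applies the payload only on approval. This is an assumption-laden illustration: `decide`, `Applier`, and the `name` payload field are hypothetical, and a real server would apply the change and update the approval row in one transaction.

```typescript
type ApprovalType = "hire_agent" | "approve_ceo_strategy" | "budget_override_required" | "request_board_approval";
type ApprovalStatus = "pending" | "revision_requested" | "approved" | "rejected" | "cancelled";

interface Approval {
  id: string;
  type: ApprovalType;
  status: ApprovalStatus;
  payload: Record<string, unknown>; // the proposed change
  decidedByUserId: string | null;
}

type Applier = (payload: Record<string, unknown>) => void;

// Only a pending approval can be decided (an assumption); the payload is applied only on approval.
function decide(
  approval: Approval,
  decision: "approved" | "rejected",
  userId: string,
  appliers: Partial<Record<ApprovalType, Applier>>,
): Approval {
  if (approval.status !== "pending") {
    throw new Error(`approval ${approval.id} is already ${approval.status}`);
  }
  if (decision === "approved") {
    const apply = appliers[approval.type];
    if (apply) apply(approval.payload);
  }
  return { ...approval, status: decision, decidedByUserId: userId };
}

const hiredAgents: string[] = [];
const appliers = {
  hire_agent: (p: Record<string, unknown>) => { hiredAgents.push(String(p.name)); },
};
const hireRequest: Approval = {
  id: "ap-1", type: "hire_agent", status: "pending",
  payload: { name: "qa-bot" }, decidedByUserId: null,
};
const decided = decide(hireRequest, "approved", "user-1", appliers);
```

Until `decide` runs, `hiredAgents` stays empty, which is the point: proposing a hire changes nothing on its own.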
The bootstrap sequence
This is what happens when a user starts a new company:
1. Human creates Company + Initiatives
2. Human writes initial top-level tasks
3. Human creates a "CEO" agent from a default template
4. CEO agent runs, proposes:
- org structure (sub-agents to hire)
- task breakdown
- hiring approvals
5. Board reviews + approves
6. CEO begins delegating; the company is alive
Decision authority
Agents can propose anything. Agents can execute only on tasks they own. Anything else routes through approvals. This is the rule that prevents an agent from, say, "deciding" to spawn 50 sub-agents and bankrupting the company.
10. Budgets & Cost Control
Cost is treated like rate-limiting: a soft warning, then a hard wall.
Reporting levels
| Level | Question it answers |
|---|---|
| Per-agent | "Is this agent expensive?" |
| Per-task | "Did this PR cost too much?" |
| Per-project | "What's our $ on Project X?" |
| Per-billing-code | "What did Initiative #3 cost end-to-end?" |
| Company-wide | "What did the company spend this month?" |
Enforcement
Soft alert default threshold: 80%
At 100%:
- Set agent status to paused
- Block new checkout/invocation for that agent
- Emit high-priority activity event
The "auto-pause" is the entire mechanism. There is no graceful degradation, no "let it finish the current task." It stops.
Budget configuration
- Periods: `daily | weekly | monthly | rolling`
- Per-agent and per-company budgets are independent. Both must allow the run.
- "Unlimited" is a setting; if you want it, you set it explicitly.
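The enforcement thresholds and the "both budgets must allow the run" rule can be sketched with integer arithmetic. This is an illustration, not the actual implementation: `Budget`, `budgetState`, and `mayInvoke` are hypothetical names, and modelling "unlimited" as `null` is an assumption.

```typescript
interface Budget {
  budgetCents: number | null; // null = explicitly unlimited (assumed representation)
  spentCents: number;
}

type BudgetState = "ok" | "soft_alert" | "paused";

// Integer math avoids floating-point surprises at the exact threshold.
function budgetState(b: Budget, softPercent: number = 80): BudgetState {
  if (b.budgetCents === null) return "ok";
  if (b.spentCents >= b.budgetCents) return "paused";
  if (b.spentCents * 100 >= b.budgetCents * softPercent) return "soft_alert";
  return "ok";
}

// Per-agent and per-company budgets are independent; both must allow the run.
function mayInvoke(agent: Budget, company: Budget): boolean {
  return budgetState(agent) !== "paused" && budgetState(company) !== "paused";
}

const under = budgetState({ budgetCents: 10000, spentCents: 7999 });   // "ok"
const warning = budgetState({ budgetCents: 10000, spentCents: 8000 }); // "soft_alert"
const atLimit = budgetState({ budgetCents: 10000, spentCents: 10000 }); // "paused"
const blockedByCompany = mayInvoke(
  { budgetCents: null, spentCents: 500 },
  { budgetCents: 2000, spentCents: 2000 },
); // false
```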
Cost ingestion
Agents (or their adapter) POST to:
POST /companies/:companyId/cost-events
{ agentId, issueId, provider, model, input_tokens, output_tokens, cost_cents, billing_code, occurred_at }
The server enforces the company scope, denormalizes into rollups, and runs the budget check. Cost events are append-only: no edits, no deletes.
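A small sketch of an append-only ledger with rollups shows how "cost follows work" becomes a query. The `CostLedger` class and the sample billing codes are hypothetical; a real system would compute these rollups in SQL over `cost_events`.

```typescript
interface CostEvent {
  agentId: string;
  issueId: string;
  billingCode: string;
  costCents: number;
}

// Append-only: the class exposes no way to edit or delete an event.
class CostLedger {
  private readonly events: CostEvent[] = [];

  append(event: CostEvent): void {
    this.events.push({ ...event }); // copy so later caller mutations don't leak in
  }

  // Sums cost_cents grouped by agent or by billing code.
  totalBy(key: "agentId" | "billingCode"): Map<string, number> {
    const totals = new Map<string, number>();
    for (const e of this.events) {
      totals.set(e[key], (totals.get(e[key]) || 0) + e.costCents);
    }
    return totals;
  }
}

const ledger = new CostLedger();
// The CEO's own work on an initiative:
ledger.append({ agentId: "ceo", issueId: "ACME-1", billingCode: "INIT-3", costCents: 100 });
// Delegated work: the engineer's cost carries the requester's billing code.
ledger.append({ agentId: "eng", issueId: "ACME-2", billingCode: "INIT-3", costCents: 250 });
ledger.append({ agentId: "eng", issueId: "ACME-9", billingCode: "INIT-7", costCents: 40 });

const byCode = ledger.totalBy("billingCode"); // INIT-3 -> 350, INIT-7 -> 40
const byAgent = ledger.totalBy("agentId");    // ceo -> 100, eng -> 290
```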
11. Plugin System
Plugins extend Paperclip without forking it. The architecture is two pieces:
- Worker: Node.js process running the plugin's logic. Out-of-process by design.
- UI: React components mounted at named "slots" in the host UI.
Worker contract
import { definePlugin } from "@paperclipai/plugin-sdk";
export default definePlugin({
async setup(ctx) {
ctx.data.register("widget.summary", async (params) => { ... });
ctx.actions.register("widget.run", async (input) => { ... });
ctx.tools.register("widget.search", schema, async (input) => { ... });
ctx.events.on("issue.checked_out", async (e) => { ... });
ctx.jobs.register("daily.rollup", async () => { ... });
},
onConfigChanged(newConfig) {},
onShutdown() {},
onValidateConfig(config) {},
onWebhook(input) {},
onHealth() {},
});
Capability gating
Every API on ctx requires a declared capability in the plugin manifest:
companies.read, issues.read, issues.create,
events.subscribe, jobs.schedule,
agent.sessions.create, agents.invoke,
ui.sidebar.register, ui.detailTab.register, ...
The host enforces them at call time. A plugin without issues.create cannot create an issue, even if it tries.
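Call-time enforcement can be sketched as a wrapper that checks the manifest before running the host function. This is not the plugin SDK's code: `gate`, `manifestCapabilities`, and the `pluginCtx` object are hypothetical, and only three capability names from the list above are modelled.

```typescript
type Capability = "issues.read" | "issues.create" | "events.subscribe";

// Wraps a host function so it throws unless the plugin declared the capability.
function gate<A extends unknown[], R>(
  declared: ReadonlySet<Capability>,
  required: Capability,
  fn: (...args: A) => R,
): (...args: A) => R {
  return (...args: A): R => {
    if (!declared.has(required)) {
      throw new Error(`plugin lacks capability: ${required}`);
    }
    return fn(...args);
  };
}

// A plugin whose manifest declares read access only.
const manifestCapabilities: ReadonlySet<Capability> = new Set<Capability>(["issues.read"]);
const issueTitles: string[] = ["ACME-1"];

const pluginCtx = {
  listIssues: gate(manifestCapabilities, "issues.read", () => issueTitles.slice()),
  createIssue: gate(manifestCapabilities, "issues.create", (title: string) => {
    issueTitles.push(title);
    return title;
  }),
};

const visible = pluginCtx.listIssues(); // ["ACME-1"]
```

Calling `pluginCtx.createIssue("ACME-2")` here throws before the host function runs, so the issue list is never touched.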
UI slots
Plugins mount React into named slots:
page, sidebar, sidebarPanel, settingsPage, dashboardWidget,
globalToolbarButton, detailTab, taskDetailView,
projectSidebarItem, toolbarButton, contextMenuItem,
commentAnnotation, commentContextMenuItem
The UI side gets typed React hooks:
usePluginData<T>(key, params?) // fetch worker data
usePluginAction(key) // invoke worker action
usePluginStream<T>(channel) // SSE
useHostContext() // { companyId, entityId, entityType }
Why out-of-process?
- A crashing plugin doesn't take down the server.
- Plugins can be in any language that can speak the IPC protocol.
- Capability gating is enforceable at the IPC boundary, not just by trust.
12. MCP Server
packages/mcp-server is a thin Model Context Protocol wrapper around the REST API. It exists so that any MCP-aware agent runtime (Claude Code, Cursor, etc.) can read and write Paperclip without bespoke integration code.
Configured with:
PAPERCLIP_API_URL
PAPERCLIP_API_KEY
PAPERCLIP_COMPANY_ID (optional)
PAPERCLIP_AGENT_ID (optional)
PAPERCLIP_RUN_ID (optional)
Tool surface (representative)
Read: getMe, listAgents, listIssues, getIssue, listComments, listProjects, listGoals, listApprovals, ...
Write: createIssue, updateIssue, checkoutIssue, addComment, suggestTask, requestConfirmation, decideApproval, ...
Escape hatch: `paperclipApiRequest({ path, method, body })`, restricted to /api paths and JSON bodies, lets agents reach endpoints with no dedicated tool yet.
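The two stated restrictions on the escape hatch (only `/api` paths, only JSON bodies) could be checked along these lines. This is a sketch under assumptions: `validateApiRequest` is a hypothetical helper, and the extra rejection of `..` path segments is an added precaution that the source does not mention.

```typescript
interface ApiRequest {
  path: string;
  method: string;
  body?: unknown;
}

// Returns an error message, or null when the request may be forwarded.
function validateApiRequest(req: ApiRequest): string | null {
  if (req.path.indexOf("/api/") !== 0) return "path must start with /api/";
  // Assumption: block traversal segments so the prefix check cannot be bypassed.
  if (req.path.split("/").indexOf("..") !== -1) return "path must not contain '..' segments";
  if (req.body !== undefined) {
    try {
      // JSON.stringify yields undefined for functions and throws on cycles.
      if (typeof JSON.stringify(req.body) !== "string") return "body must be JSON-serializable";
    } catch (err) {
      return "body must be JSON-serializable";
    }
  }
  return null;
}

const allowed = validateApiRequest({ path: "/api/issues", method: "GET" });                   // null
const wrongPrefix = validateApiRequest({ path: "/admin/users", method: "GET" });              // error message
const traversal = validateApiRequest({ path: "/api/../secrets", method: "GET" });             // error message
```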
Lesson: the MCP server has no business logic. It is a translation layer. Single source of truth = the REST API. This is why it can stay tiny.
13. Skills
A skill is a markdown file (plus optional examples) that teaches an agent how to use the Paperclip API. It is adapter-agnostic: Claude, Codex, and custom runtimes all read the same SKILL.md.
The bundled skills (under /skills) include:
- `paperclip`: the master skill (task CRUD, status reporting, cost logging, comms rules).
- `paperclip-create-agent`: how to propose hiring a new agent (writes to `approvals`).
- `paperclip-create-plugin`: scaffolding a plugin.
- `paperclip-converting-plans-to-tasks`: taking a CEO's plan into atomic issues.
- `paperclip-dev`: meta-skill for editing Paperclip itself.
- `para-memory-files`: managing persistent agent memory.
A skill is not code; it's prose + examples. The agent's runtime loads it as part of its system context. This means upgrading a skill upgrades every agent that uses it, no redeploy.
14. Tech Stack & Repo Layout
| Concern | Choice |
|---|---|
| Backend | Node.js 20+, TypeScript, Express (REST only, no tRPC) |
| Frontend | React + Vite |
| DB | PostgreSQL; PGlite for local/dev, Supabase or Docker Postgres for prod |
| ORM | Drizzle (`drizzle.config.ts` in `packages/db`) |
| Auth | Better Auth |
| Tests | Vitest + Playwright |
| Package mgr | pnpm 9.15+ workspaces |
| License | MIT |
Top-level layout
.agents/skills/ # Agent skill definitions
.claude/skills/ # Claude-specific skills
.github/ # CI, templates
cli/ # `npx paperclipai onboard` etc.
docker/ # Compose + Dockerfiles
docs/ # Public docs site
doc/ # Internal SPEC.md, SPEC-implementation.md
evals/ # Agent eval framework
packages/
adapters/ # claude-local, codex-local, cursor-local, ...
adapter-utils/ # shared adapter helpers
db/ # Drizzle schema + migrations
mcp-server/ # MCP wrapper
plugins/
sdk/ # @paperclipai/plugin-sdk
create-paperclip-plugin/
sandbox-providers/e2b/
shared/ # types, utils
patches/ # pnpm patch files
releases/ # release artifacts
report/ # reporting tools
scripts/ # one-off ops scripts
server/ # the Node server
src/
scripts/
skills/ # the bundled skills
tests/ # cross-package tests
ui/ # the React app
One-command onboarding
npx paperclipai onboard --yes
# or:
git clone https://github.com/paperclipai/paperclip.git && cd paperclip
pnpm install
pnpm dev
pnpm dev boots: server (with PGlite embedded), UI (Vite), and a watcher.
15. REST API Surface
The full v1 surface, grouped. Use this as the spec for your server.
Companies
GET /companies
POST /companies
GET /companies/:companyId
PATCH /companies/:companyId
PATCH /companies/:companyId/branding
POST /companies/:companyId/archive
Goals
GET /companies/:companyId/goals
POST /companies/:companyId/goals
GET /goals/:goalId
PATCH /goals/:goalId
DELETE /goals/:goalId
Agents
GET /companies/:companyId/agents
POST /companies/:companyId/agents
GET /agents/:agentId
PATCH /agents/:agentId
POST /agents/:agentId/pause
POST /agents/:agentId/resume
POST /agents/:agentId/terminate
POST /agents/:agentId/keys # mint API key for the agent
POST /agents/:agentId/heartbeat/invoke # manual on-demand wakeup
Issues
GET /companies/:companyId/issues
POST /companies/:companyId/issues
GET /issues/:issueId
PATCH /issues/:issueId
POST /issues/:issueId/checkout # atomic
POST /issues/:issueId/release
POST /issues/:issueId/admin/force-release # board-only
POST /issues/:issueId/comments
GET /issues/:issueId/comments
POST /companies/:companyId/issues/:issueId/attachments
GET /issues/:issueId/attachments
Costs & budgets
POST /companies/:companyId/cost-events
GET /companies/:companyId/costs/summary
GET /companies/:companyId/costs/by-agent
GET /companies/:companyId/costs/by-project
PATCH /companies/:companyId/budgets
PATCH /agents/:agentId/budgets
Approvals
GET /companies/:companyId/approvals?status=pending
POST /companies/:companyId/approvals
POST /approvals/:approvalId/approve
POST /approvals/:approvalId/reject
Activity & dashboard
GET /companies/:companyId/activity
GET /companies/:companyId/dashboard
Design notes
- Every write that mutates state writes one row to `activity_log` in the same transaction.
- Authorization is one model: the API key resolves to an actor (user, agent, or system) and a company scope. The same handler serves UI requests and agent requests; only the actor type differs.
- No RPC, no GraphQL. Plain REST keeps the MCP wrapper trivially thin.
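The single authorization model can be sketched as one lookup plus one scope comparison. The key store, the `authorize` function, and the choice of 401 versus 403 status codes are assumptions made for illustration; a real server would hash keys and load them from the database.

```typescript
type ActorType = "user" | "agent" | "system";

interface Actor {
  type: ActorType;
  id: string;
  companyId: string; // the company scope carried by the key
}

// Hypothetical key store mapping API keys to actors.
const apiKeys = new Map<string, Actor>([
  ["key-board", { type: "user", id: "u-1", companyId: "acme" }],
  ["key-agent", { type: "agent", id: "a-7", companyId: "acme" }],
]);

type AuthResult =
  | { status: 200; actor: Actor }
  | { status: 401 }
  | { status: 403 };

// Same code path for humans and agents; only actor.type differs.
function authorize(apiKey: string, requestedCompanyId: string): AuthResult {
  const actor = apiKeys.get(apiKey);
  if (!actor) return { status: 401 };                                   // unknown key
  if (actor.companyId !== requestedCompanyId) return { status: 403 };   // company mismatch
  return { status: 200, actor };
}

const boardAccess = authorize("key-board", "acme");  // status 200, actor.type "user"
const crossTenant = authorize("key-agent", "globex"); // status 403
const unknownKey = authorize("nope", "acme");         // status 401
```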
16. Multi-Company Isolation & Portability
The deployment is single-tenant for the operator (you run your own server), but multi-company within the deployment (one Paperclip can host several orgs).
Isolation is enforced three ways:
- Schema: every domain table has `company_id` and every index leads with it.
- Authorization: the actor's API key carries a company scope; handlers reject mismatches.
- Storage: secrets, attachments, plugin state are namespaced by company.
Portability
- Template export: schema only (org chart, roles, default tasks). Useful for "starter companies."
- Snapshot export: full state including tasks, comments, and costs, with secrets scrubbed before serialization.
- Imports are atomic; either the whole company appears or nothing does.
17. Audit Trail & Activity Log
Every state mutation produces:
activity_log(
actor_type (agent|user|system),
actor_id,
action e.g. "issue.checked_out",
entity_type, entity_id,
details jsonb,
created_at
)
Two consequences:
- Replay: you can reconstruct any past state by walking the log.
- Tool-call tracing: when an agent calls the MCP server, those calls become activity entries. "What did agent X actually do at 3:14am?" is a query, not an investigation.
18. Engineering Conventions
These are guardrails worth copying verbatim:
- Keep changes company-scoped. Every query, every cache key, every authorization check. No cross-tenant code paths exist.
- Contracts must be in sync. The DB schema, the OpenAPI spec, the TypeScript types, and the MCP tool definitions are generated from one source. Drift is a bug.
- Migrations are append-only. Never edit a migration after it has shipped. Use `pnpm db:migrate` to generate; never hand-write SQL into old files.
- One PR = one logical change.
- Each PR declares the model that wrote it. (Cute but useful telemetry.)
- All tests pass before merge. CI green. Code-review tool score = 5/5.
- Fail visibly. Agents that hit unexpected state mark tasks `blocked`; servers return errors; UIs show them. No silent fallbacks.
- Read SPEC-implementation.md when in doubt. When `SPEC.md` and the implementation spec disagree, implementation wins for v1.
19. Step-by-Step Build Plan
If you are building a Paperclip-like system from scratch, do it in this order. Each step is shippable on its own.
Phase 0: Skeleton (1-2 days)
- pnpm monorepo with `server/`, `ui/`, `packages/db`, `packages/shared`.
- Express server, Vite React app, Drizzle + PGlite for dev.
- Health check endpoint, hello world UI.
Phase 1: Companies & Auth
- `companies` table.
- Better Auth for human users.
- API-key model: every key is `(actor_type, actor_id, company_id)`.
- Middleware that resolves the key into an `Actor` and rejects on company mismatch.
Phase 2: Org Chart
- `agents` table with `reports_to`.
- CRUD endpoints + UI org-chart view.
- Status field with transitions, but no runtime yet; agents are just data.
Phase 3: Tasks
- `issues` + `goals` + `projects` tables with the full hierarchy.
- Implement atomic checkout with the exact SQL above. Write a regression test that races 50 concurrent checkouts and asserts exactly one wins.
- Kanban / list UI.
Phase 4: The Heartbeat (the moment your project becomes real)
- `heartbeat_runs` table.
- Adapter manager interface (3 methods: `invoke`, `status`, `cancel`).
- Build one adapter first: `process` (just spawn a CLI you control). Don't start with Claude.
- Scheduler:
  - Cron loop for `timer` triggers.
  - Hook on issue checkout → emit `assignment` wakeup.
  - "Run now" button → `on_demand`.
- Coalescing: if a run is already `running` for an agent, drop new wakeups, mark them as merged.
- Timeouts + grace + force-kill.
Phase 5: Cost & Budgets
- `cost_events` table.
- Budget fields on `companies` and `agents`.
- Ingestion endpoint with company-scope check.
- On every cost insert: recompute spent / budget; if spend has reached 100%, pause agent + emit activity.
- Dashboards: per-agent, per-task, per-project rollups (use the indexes you already built).
Phase 6: Approvals & Governance
- `approvals` table; generic payload + type.
- `request_board_approval` flow end-to-end.
- "Hire agent" requires approval; approving the approval creates the agent row.
- Board UI with a single "approvals" inbox.
Phase 7: Activity Log + SSE
- Append to `activity_log` in the same transaction as every mutation.
- Server-sent events broadcast new activity to subscribed UIs.
- "Recent activity" feed and per-entity history.
Phase 8: More adapters
- Wrap a real CLI (Claude Code or Codex). Reuse `adapter-utils` for stdio framing and JSON parsing.
- Add `http` adapter for remote agents.
- Now you can ship to early users.
Phase 9: MCP Server
- Standalone package that calls your REST API.
- One MCP tool per important endpoint, plus the escape-hatch `apiRequest`.
- Test it with Claude Code locally.
Phase 10: Skills
- Pick the top 3 things agents do badly without guidance and write `SKILL.md`s for them.
- Distribute via `.agents/skills/` and tell adapters to load them into the system context.
Phase 11: Plugins
- Out-of-process worker SDK with `definePlugin`.
- IPC: simplest is JSON over stdio with a request-id correlation.
- Manifest with declared capabilities; host enforces at every IPC call.
- UI slot system: a registry keyed by slot name, plugins mount React via iframe or shadow DOM.
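Request-id correlation over a line-based channel can be sketched without any real process I/O. The `IpcClient` class and the message shapes below are hypothetical; in practice `send` would write a line to the plugin's stdin and `receive` would be fed lines read from its stdout.

```typescript
interface IpcResponse {
  id: number;
  result?: unknown;
  error?: string;
}

// Matches replies to requests by id, so replies may arrive in any order.
class IpcClient {
  private nextId = 1;
  private pending = new Map<number, (res: IpcResponse) => void>();

  constructor(private send: (line: string) => void) {}

  // Sends one JSON line and remembers who is waiting for the answer.
  call(method: string, params: unknown, onReply: (res: IpcResponse) => void): number {
    const id = this.nextId++;
    this.pending.set(id, onReply);
    this.send(JSON.stringify({ id, method, params }));
    return id;
  }

  // Handles one JSON line from the worker; returns false for unknown ids.
  receive(line: string): boolean {
    const res = JSON.parse(line) as IpcResponse;
    const waiter = this.pending.get(res.id);
    if (!waiter) return false;
    this.pending.delete(res.id);
    waiter(res);
    return true;
  }
}

const sentLines: string[] = [];
const replies: string[] = [];
const client = new IpcClient((line) => { sentLines.push(line); });

const listId = client.call("issues.list", {}, (r) => { replies.push(`list:${String(r.result)}`); });
const createId = client.call("issues.create", { title: "x" }, (r) => { replies.push(`create:${String(r.result)}`); });

// Replies arrive out of order and are still routed to the right caller.
client.receive(JSON.stringify({ id: createId, result: "created" }));
client.receive(JSON.stringify({ id: listId, result: "listed" }));
```

The host-side capability check fits naturally in the handler that receives each request line, before dispatching to the host API.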
Phase 12: Portability
- `POST /companies/:id/export`: JSON snapshot, with a `secret_scrub` pass.
- `POST /companies/import`: atomic, transactional.
Phase 13: Polish
- One-command onboarding (`npx <yourtool> onboard`) that generates `.env`, runs migrations, opens browser.
- Docker compose for "self-host on a box."
- Telemetry (anonymous, opt-out).
20. Pitfalls and Tradeoffs
Things to not do, especially early
- Don't build your own agent loop. The whole point is to be unopinionated. Wrap a CLI; ship.
- Don't add tRPC / GraphQL. It makes the MCP wrapper non-trivial. Plain REST is the contract that survives.
- Don't centralize prompts in the server. Prompts belong in adapters or skills. The core has zero opinion about model behavior.
- Don't treat budgets as soft. "Best effort" budget enforcement is no enforcement. Build the auto-pause from day one.
- Don't allow direct agent-to-agent calls. Force everything through tasks/comments. You'll thank yourself when debugging.
- Don't put `company_id` on "most" tables. Put it on every table.
- Don't sandbox plugins via trust. Out-of-process + capability manifest, or nothing.
Honest tradeoffs Paperclip makes
| Tradeoff | What you get | What you lose |
|---|---|---|
| Single human board operator (v1) | Simple authority model | No multi-stakeholder governance |
| REST + jsonb polymorphism | Easy to extend, MCP is trivial | Less compile-time safety than tRPC |
| Local CLI adapters unsandboxed | Maximum runtime freedom | You own the host security story |
| Atomic checkout via SQL | Dead simple, no extra services | Doesn't scale past a single Postgres |
| Skills as markdown | Hot-swappable; runtime-agnostic | Behavior depends on adapter discipline |
| Plugins out-of-process | Crash isolation; multi-language | Higher latency than in-proc |
Where to deviate if your domain differs
- If your "agents" are humans-in-the-loop, keep the same model and use `assignee_user_id`, which you already have.
- If you need multi-board governance, generalize `decided_by_user_id` to a poll-style record on `approvals`.
- If costs aren't $/tokens, generalize `cost_events` to `usage_events` with provider-defined units. Keep the rollup shape.
- If you need horizontal scale, the bottleneck is the heartbeat scheduler. Move it to a leader-elected job runner; everything else (REST, DB) already scales.
TL;DR for Building Your Own
- It's a control plane, not a framework. Three-method adapter contract. Don't pretend otherwise.
- Postgres schema is the architecture. Get `companies / agents / issues / heartbeat_runs / cost_events / approvals / activity_log` right and 80% of behavior falls out.
- The heartbeat is the kernel. Coalesce, timeout, persist runs, log activity.
- Atomic SQL UPDATE = your concurrency story.
- Hard budget ceilings, not soft ones.
- Tasks are the only communication channel between agents.
- REST + MCP + skills, in that order. Each is a thin layer over the previous.
- Plugins out-of-process, capability-gated.
- Every table, query, and index starts with `company_id`.
- Append-only audit log in the same transaction as every mutation.
Build those ten things and you have Paperclip. Everything else is polish.
Sources
- GitHub: paperclipai/paperclip
- paperclip.ing: project site
- SPEC.md (master)
- SPEC-implementation.md (master)
- docs/agents-runtime.md
- packages/adapters
- packages/mcp-server
- packages/plugins/sdk
- awesome-paperclip: community plugins