Eugen

Anatomy of a 118-Tool MCP Server: How We Organized the Chaos

In the last post I showed what an AI does with 118 MCP tools. The first question people asked was: "How do you organize all of that without going insane?"

The honest answer: we didn't start with 118 tools. We started with 25 and hit every organizational problem you'd expect. This is what we ended up with after three rewrites.

The file structure

Every tool lives in one of 19 files, grouped by domain:

tools/
  accountingReadTools.ts         8 tools
  accountingWriteTools.ts        5 tools
  entityReadTools.ts             9 tools
  entityWriteTools.ts            9 tools
  entityDeleteTools.ts           3 tools
  documentReadTools.ts           3 tools
  documentCreateTools.ts        11 tools
  documentLifecycleTools.ts     10 tools
  categoryTools.ts               5 tools
  financialAccountTools.ts       5 tools
  currencyTools.ts               6 tools
  teamTools.ts                   6 tools
  billingTools.ts                2 tools
  aiInsightTools.ts              3 tools
  folderTools.ts                 6 tools
  fileTools.ts                   8 tools
  linkTools.ts                   4 tools
  sharingAnalyticsTools.ts       8 tools
  recurringTransactionTools.ts   7 tools
  types.ts                       shared type

Each file exports a single function:

export type ToolRegistrar = (server: McpServer, authInfo: McpAuthInfoDto) => void;
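A registrar file is then just a function with that signature. A minimal sketch, with `McpServer` and `McpAuthInfoDto` reduced to stand-in shapes so the snippet is self-contained (the real types come from the MCP SDK and our auth layer, and the tool name is illustrative):

```typescript
// Stand-in shapes for illustration -- the real types come from the SDK.
interface McpServer {
  registerTool(
    name: string,
    config: object,
    handler: (params: unknown) => unknown,
  ): void;
}
interface McpAuthInfoDto { teamId: string; userId: string; scopes: string[]; }

type ToolRegistrar = (server: McpServer, authInfo: McpAuthInfoDto) => void;

// Each domain file closes over authInfo and registers its own tools.
export const registerCurrencyTools: ToolRegistrar = (server, authInfo) => {
  server.registerTool('list-currencies', { title: 'List Currencies' }, () => {
    // The handler reads authInfo.teamId here, never a teamId parameter.
    return { content: [{ type: 'text', text: `currencies for ${authInfo.teamId}` }] };
  });
};
```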

The entry point calls all 19 registrars:

function createMcpServer(authInfo: McpAuthInfoDto) {
  const server = new McpServer(
    { name: 'PaperLink', version: '1.0.0' },
    { instructions: MCP_INSTRUCTIONS }
  );

  registerAccountingReadTools(server, authInfo);
  registerAccountingWriteTools(server, authInfo);
  registerEntityReadTools(server, authInfo);
  // ... 16 more

  return server;
}

Adding a new domain is one file and one line in the entry point. No config, no registry, no plugin system.

Why read/write/delete are separate files

Version one had one file per domain: accountingTools.ts with all 13 tools. That worked until we added OAuth scopes.

Our OAuth consent screen lets users grant granular permissions: accounting:read, accounting:write, accounting:delete. A user might connect their AI assistant with read-only access to invoices but full write access to accounting.

When tools were in a single file, scope checks were scattered across the file with no clear boundary. Splitting by permission level made each file internally consistent: every tool in accountingReadTools.ts checks accounting:read, every tool in accountingWriteTools.ts checks accounting:write.

invoices:read    invoices:write    invoices:delete
accounting:read  accounting:write  accounting:delete
companies:read   companies:write   companies:delete
clients:read     clients:write     clients:delete
products:read    products:write    products:delete
estimates:read   estimates:write   estimates:delete
sharing:read     sharing:write
teams:read       teams:write
billing:read
ai:read          ai:write

25 scopes total. Each maps to a clear set of tools. You never wonder "does this tool need read or write?" because the filename tells you.

For smaller domains (categories, currencies, team), read and write stay in one file. We split when it starts to matter for permissions or when the file gets too long.

Tool naming: {verb}-{resource}

Most tools follow one pattern:

list-invoices
get-invoice
create-invoice
update-invoice
archive-invoice
restore-invoice
delete-invoice

Seven core verbs cover the CRUD lifecycle:

| Verb    | What it does         | Reversible         |
| ------- | -------------------- | ------------------ |
| list    | Paginated collection | -                  |
| get     | Single item by ID    | -                  |
| create  | New record           | Yes (archive)      |
| update  | Modify existing      | Yes (update again) |
| archive | Soft delete          | Yes (restore)      |
| restore | Undo archive         | Yes (archive)      |
| delete  | Hard delete          | No                 |

Then there are domain-specific verbs for operations that don't fit the CRUD model: change-invoice-status, convert-estimate-to-invoice, record-invoice-payment, pause-recurring-transaction, generate-document-insight. About a dozen specialized verbs for a dozen specialized operations.

Batch operations get a plural noun: create-transactions (not create-transaction-batch). The AI figures it out from the plural.

This matters more than you'd think. AI models pattern-match on tool names. If you name one tool fetchInvoices, another getClients, and a third list-products, the model has to learn three conventions. Consistent naming means the model can predict tool names it hasn't seen yet.
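The convention is even machine-checkable. A hypothetical sketch using a TypeScript template literal type (not something the server does, just an illustration of how predictable the names become):

```typescript
// The seven core verbs described above.
type CoreVerb = 'list' | 'get' | 'create' | 'update' | 'archive' | 'restore' | 'delete';

// Every core tool name matches `${verb}-${resource}`.
type ToolName = `${CoreVerb}-${string}`;

const valid: ToolName = 'archive-invoice';  // compiles
// const bad: ToolName = 'fetchInvoices';   // type error: breaks the convention

// Runtime helper with the same shape.
function toolName(verb: CoreVerb, resource: string): ToolName {
  return `${verb}-${resource}`;
}
```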

Zod schemas as SSOT

Every tool input is a Zod schema:

server.registerTool(
  'list-invoices',
  {
    title: 'List Invoices',
    description: 'List invoices for the authenticated team.',
    inputSchema: z.object({
      status: z.string().optional().describe('Filter by status name'),
      clientName: z.string().optional().describe('Filter by client name'),
      dateFrom: coerceDateString().optional().describe('From date (YYYY-MM-DD)'),
      dateTo: coerceDateString().optional().describe('To date (YYYY-MM-DD)'),
      limit: z.coerce.number().int().min(1).max(100).optional(),
      offset: z.coerce.number().int().min(0).optional(),
    }),
    annotations: { readOnlyHint: true },
  },
  async (params) => { /* ... */ }
);

The Zod schema does three things at once:

  1. Validation - bad input gets rejected before your handler runs
  2. TypeScript types - params are fully typed inside the handler
  3. JSON Schema - the MCP SDK generates the schema that AI clients see

No separate type definitions, no OpenAPI spec, no manual JSON Schema. One Zod object is the single source of truth.

The coercion problem

MCP sends parameters over JSON-RPC, and some clients serialize everything as strings. A number field might arrive as "10" instead of 10, and a boolean as "true" instead of true.

We wrote five coerce helpers to handle this:

coerceDateString()      // validates YYYY-MM-DD, accepts string
coerceAmount()          // positive number, min 0.01, max 999,999,999
coerceBoolean()         // "true"/"false" string -> boolean
coerceNullableString()  // "null" string -> null
coerceJsonArray(inner)  // JSON string -> parsed array
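The real helpers are built on Zod, but the underlying logic is simple. A plain-TypeScript sketch of two of them, shown without Zod to keep the snippet self-contained (hypothetical implementations, not the shipped code):

```typescript
// Sketch: accept a real boolean or the strings "true"/"false".
function coerceBooleanValue(v: unknown): boolean {
  if (typeof v === 'boolean') return v;
  if (v === 'true') return true;
  if (v === 'false') return false;
  throw new Error(`expected boolean, got ${JSON.stringify(v)}`);
}

// Sketch: validate a YYYY-MM-DD date string.
function coerceDateStringValue(v: unknown): string {
  if (typeof v === 'string' && /^\d{4}-\d{2}-\d{2}$/.test(v)) return v;
  throw new Error('expected a YYYY-MM-DD date string');
}
```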

coerceJsonArray is the interesting one. Our create-transactions tool accepts a batch of up to 50 transactions. The AI sends them as a JSON string inside a single parameter:

{
  "transactions": "[{\"amount\": 12.50, ...}, {\"amount\": 8.00, ...}]"
}

The coercer parses the string, validates each element against the inner schema, and gives you a typed array. One helper, used in one tool, but it saved us from building a separate batch API.
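A sketch of that coercer, again without Zod for self-containment (the `inner` validator function stands in for the inner schema):

```typescript
// Sketch: parse a JSON string, then validate each element with `inner`.
function coerceJsonArrayOf<T>(inner: (v: unknown) => T) {
  return (input: unknown): T[] => {
    const raw = typeof input === 'string' ? JSON.parse(input) : input;
    if (!Array.isArray(raw)) throw new Error('expected a JSON array');
    return raw.map(inner);
  };
}

// Usage: the batch of transactions arrives as one JSON string.
const parseTransactions = coerceJsonArrayOf((v) => {
  const o = v as { amount?: unknown };
  if (typeof o.amount !== 'number') throw new Error('amount must be a number');
  return { amount: o.amount };
});

parseTransactions('[{"amount": 12.5}, {"amount": 8}]');
// -> [{ amount: 12.5 }, { amount: 8 }]
```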

Tool annotations

Every tool carries metadata hints:

// Read tools
{ readOnlyHint: true }

// Create tools
{ destructiveHint: false, idempotentHint: false }

// Update tools (modifying data = destructive + idempotent)
{ destructiveHint: true, idempotentHint: true }

// Restore tools (reversing a soft delete = safe + idempotent)
{ destructiveHint: false, idempotentHint: true }

// Archive/delete tools
{ destructiveHint: true, idempotentHint: false }

MCP clients can use these to decide whether to auto-approve a tool call or ask the user first. A readOnlyHint: true tool is safe to run without confirmation. A destructiveHint: true tool should probably ask first.

We adopted this convention early and it paid off. Some MCP clients show a different UI for destructive operations, and ours just worked.
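With 118 tools, repeating these objects by hand invites drift. One option (hypothetical, not necessarily what we ship) is a preset map keyed by verb class:

```typescript
// One annotation preset per verb class; referenced from each registration.
const ANNOTATION_PRESETS = {
  read:    { readOnlyHint: true },
  create:  { destructiveHint: false, idempotentHint: false },
  update:  { destructiveHint: true,  idempotentHint: true },
  restore: { destructiveHint: false, idempotentHint: true },
  archive: { destructiveHint: true,  idempotentHint: false },
} as const;

// e.g. in a registration: annotations: ANNOTATION_PRESETS.update
```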

The handler pattern

All 118 tool handlers follow the same five-step structure:

async (params) => {
  // 1. Check scope
  if (!authInfo.scopes.includes(SCOPE_ACCOUNTING_WRITE)) {
    return {
      content: [{ type: 'text', text: 'Insufficient scope - accounting:write required.' }],
      isError: true,
    };
  }

  // 2. Get use case from DI container
  const useCase = mcpUseCases.getCreateTransactionViaMcpUseCase();

  // 3. Execute (teamId/userId always come from auth, never from params)
  const result = await useCase.execute({
    ...params,
    teamId: authInfo.teamId,
    userId: authInfo.userId,
    memberRole: authInfo.teamRole,
  });

  // 4. Handle failure
  if (!result.isSuccess) {
    return {
      content: [{ type: 'text', text: result.errors.join(', ') }],
      isError: true,
    };
  }

  // 5. Return summary + structured data
  const data = result.value;
  return {
    content: [
      { type: 'text', text: `Created transaction: ${data.description} - $${data.amount}` },
      { type: 'text', text: JSON.stringify(data, null, 2) },
    ],
  };
}

A few things to notice:

teamId and userId never come from parameters. They come from authInfo, which is populated from the OAuth token. The AI can't impersonate another user or access another team's data, even if it tries.

Two content blocks in every response. The first is a human-readable summary the AI can relay directly to the user. The second is the full JSON data the AI can parse and use for follow-up operations. This convention makes the AI's responses more natural without losing precision.

Result pattern, not exceptions. Every use case returns Result<T> with isSuccess, value, and errors. No try/catch in tool handlers, no exception-based flow control.
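A minimal sketch of that `Result<T>` shape (hypothetical; the real implementation likely carries more):

```typescript
// Discriminated union: success carries a value, failure carries errors.
type Result<T> =
  | { isSuccess: true; value: T; errors: string[] }
  | { isSuccess: false; errors: string[] };

function ok<T>(value: T): Result<T> {
  return { isSuccess: true, value, errors: [] };
}

function fail<T>(...errors: string[]): Result<T> {
  return { isSuccess: false, errors };
}

// Handlers branch on isSuccess instead of wrapping calls in try/catch.
function render(r: Result<number>): string {
  return r.isSuccess ? `value: ${r.value}` : r.errors.join(', ');
}
```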

Use Cases, not raw queries

This is where we broke from what most MCP servers do.

The typical approach: tool handler takes params, runs a database query, returns the result. Fast to build, works fine for 10 tools.

At 118 tools, that approach means 118 handlers with database logic, validation, authorization checks, and business rules all mixed together. We'd already built use cases for our web app, so we reused them instead.

Every MCP tool calls a Use Case:

Tool handler (presentation layer)
  -> MCP Use Case (application layer)
    -> Domain Use Case (business logic)
      -> Repository (data access)

The MCP Use Cases are thin wrappers that add AI-friendly enrichment. For example, CreateTransactionViaMcpUseCase calls the same CreateTransactionUseCase that the web UI uses, then resolves account names and category paths so the AI can confirm with "Added $12.50 to Food & Dining > Groceries in your Wise USD account" instead of raw UUIDs.

This matters because business rules stay in one place. When we added a rule that transactions over $10,000 require approval, it worked in the web UI, the API, and the MCP server simultaneously. Zero duplication.

DRY patterns that actually helped

Polymorphic use cases

Companies, clients, and products share the same lifecycle: create, update, archive, restore, delete. Instead of nine separate use cases:

// One use case handles all three entity types
useCase.execute('company', { id, teamId, userId, memberRole });
useCase.execute('client',  { id, teamId, userId, memberRole });
useCase.execute('product', { id, teamId, userId, memberRole });

Three entity types, three operations each, nine tool handlers - but only three use case classes.
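A sketch of the idea, with a hypothetical minimal repository shape (the real use cases take the full `{ id, teamId, userId, memberRole }` input and return a `Result`):

```typescript
type EntityType = 'company' | 'client' | 'product';

// Hypothetical minimal repository interface.
interface ArchivableRepository { archive(id: string): Promise<void>; }

// One class serves all three entity types -- the type selects the repository.
class ArchiveEntityUseCase {
  constructor(private readonly repos: Record<EntityType, ArchivableRepository>) {}

  async execute(type: EntityType, params: { id: string }): Promise<void> {
    await this.repos[type].archive(params.id);
  }
}
```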

Category path resolver

Several tools need to show a category as a human-readable path: "Food & Dining > Groceries" instead of a UUID. We had the same ancestor-walking loop copy-pasted across four use cases before extracting it:

const resolved = await resolveCategoryPath(categoryId, categoryRepository);
// { name: "Groceries", path: "Food & Dining > Groceries" }
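The extracted helper walks parent links until it reaches a root category. A sketch with a hypothetical repository shape:

```typescript
interface Category { id: string; name: string; parentId: string | null; }
interface CategoryRepository { findById(id: string): Promise<Category | null>; }

// Walk ancestors root-first and join them into a display path.
async function resolveCategoryPath(categoryId: string, repo: CategoryRepository) {
  const names: string[] = [];
  let current = await repo.findById(categoryId);
  const name = current?.name ?? '';
  while (current) {
    names.unshift(current.name);  // prepend so the root ends up first
    current = current.parentId ? await repo.findById(current.parentId) : null;
  }
  return { name, path: names.join(' > ') };
}
```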

Scope constants

All 25 scope strings defined once:

export const MCP_SCOPE_ACCOUNTING_READ = 'accounting:read';
export const MCP_SCOPE_ACCOUNTING_WRITE = 'accounting:write';
// ... 23 more

Not exciting, but it means "rename a scope" is a single-line change with TypeScript catching every reference.
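A related consolidation (hypothetical, shown for illustration) is a shared guard for the scope check that opens every handler, so the error payload stays uniform across all 118 tools:

```typescript
const MCP_SCOPE_ACCOUNTING_WRITE = 'accounting:write';

// Returns an MCP error payload, or null when the scope is present.
function requireScope(scopes: string[], needed: string) {
  if (scopes.includes(needed)) return null;
  return {
    content: [{ type: 'text' as const, text: `Insufficient scope - ${needed} required.` }],
    isError: true as const,
  };
}

// At the top of a handler:
// const denied = requireScope(authInfo.scopes, MCP_SCOPE_ACCOUNTING_WRITE);
// if (denied) return denied;
```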

What we'd do differently

Start with the verb table. We defined verbs (list, get, create...) after building 60 tools. The first 60 had inconsistencies we had to fix. If we'd started with the verb convention, we'd have saved the refactor.

Split by permission from day one. The read/write file split happened at tool 40 when OAuth scopes forced it. It would have been easier at tool 1.

Invest in coerce helpers early. The MCP string coercion problem bit us on every tool. We wrote the helpers at tool 30. Should have been tool 1.

The one-liner

If you want to see all 118 tools in action:

claude mcp add --transport http paperlink https://mcp.paperlink.online/api/mcp/mcp
