Rich Jeffries

Context-Optimized APIs: Designing MCP Servers for LLMs

We reduced 60 tools to 9.
Same functionality.
85% less context overhead.

REST conventions work brilliantly for human developers who read documentation once and remember endpoints forever.

But your API consumer isn't human anymore.

It's an LLM with a 200k context window that re-reads every tool description on every turn. And it's paying per token.

Read that again. Every tool description on every turn.

You need a different pattern.


The Problem: Tool Sprawl

MCP lets you extend AI assistants with custom tools. The natural instinct is to create granular endpoints:

memory_add
memory_get
memory_list
memory_update
memory_delete
memory_pin
memory_archive
memory_link
memory_unlink
memory_search
memory_embed
...

Multiply this across domains (projects, tasks, docs, files, database) and you hit 60+ tools fast. Each needs a description, parameter schema, and examples.

That's 12,000 tokens the LLM must process every single turn.

The result? Slower responses, higher costs, and an AI that picks memory_update when it meant memory_upsert because they look similar in a list of 60.
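The arithmetic behind that number is rough but easy to reproduce. Assuming ~200 tokens per tool for description, schema, and examples (an illustrative figure, not a measurement):

```python
# Rough back-of-envelope for tool list overhead. The ~200 tokens/tool
# figure is an assumption for illustration, not a measured value.
TOKENS_PER_TOOL = 200

def tool_list_overhead(tool_count: int, tokens_per_tool: int = TOKENS_PER_TOOL) -> int:
    """Tokens the model re-reads on every turn just to see its tools."""
    return tool_count * tokens_per_tool

print(tool_list_overhead(60))  # 12000 tokens per turn
print(tool_list_overhead(9))   # 1800 tokens per turn
```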


Real Example: Before and After

V1: The Granular Approach (Truncated)

{
  "tools": [
    { "name": "MemoriesAdd", "description": "Add a new memory to the system", "inputSchema": { "type": "object", "properties": { "projectKey": {}, "title": {}, "body": {}, "scope": {}, "memoryType": {}, "tags": {}, "importance": {}, "pinned": {}, "ttlIso": {}, "userId": {}, "chatId": {}, "sourceKind": {}, "sourceRef": {} }, "required": ["projectKey", "title", "body"] } },
    { "name": "MemoriesSearch", "description": "Search memories using hybrid FTS + semantic search", "inputSchema": { ... } },
    { "name": "MemoriesList", "description": "List memories with filtering and pagination", "inputSchema": { ... } },
    { "name": "MemoriesGet", "description": "Get a specific memory by ID", "inputSchema": { ... } },
    { "name": "MemoriesUpdate", "description": "Update an existing memory", "inputSchema": { ... } },
    { "name": "MemoriesPin", "description": "Pin or unpin a memory", "inputSchema": { ... } },
    { "name": "MemoriesArchive", "description": "Archive a memory (soft delete)", "inputSchema": { ... } },
    { "name": "MemoriesDelete", "description": "Permanently delete a memory", "inputSchema": { ... } },
    { "name": "MemoriesLink", "description": "Link two memories", "inputSchema": { ... } },
    { "name": "MemoriesUnlink", "description": "Remove a link between memories", "inputSchema": { ... } },
    { "name": "MemoriesRelated", "description": "Get related memories", "inputSchema": { ... } },
    { "name": "MemoriesPrune", "description": "Archive expired memories", "inputSchema": { ... } },
    { "name": "MemoriesEmbed", "description": "Generate embeddings", "inputSchema": { ... } },
    { "name": "MemoriesStats", "description": "Get memory statistics", "inputSchema": { ... } },
    { "name": "ProjectsList", "description": "List all projects", "inputSchema": { ... } },
    { "name": "ProjectsGet", "description": "Get a project by key", "inputSchema": { ... } },
    { "name": "DocsList", "description": "List docs for a project", "inputSchema": { ... } },
    { "name": "DocsSearch", "description": "Search docs via FTS", "inputSchema": { ... } },
    { "name": "FilesList", "description": "List files", "inputSchema": { ... } },
    { "name": "FilesRead", "description": "Read a file", "inputSchema": { ... } },
    { "name": "FilesWrite", "description": "Write a file", "inputSchema": { ... } },
    { "name": "DbTables", "description": "List SQLite tables", "inputSchema": { ... } },
    { "name": "DbQuery", "description": "Run a SELECT", "inputSchema": { ... } },
    { "name": "DbExec", "description": "Execute SQL", "inputSchema": { ... } }
    // ... and 35+ more
  ]
}

~12,000 tokens. Every. Single. Turn.

V2: The Domain Facade Approach (Complete)

{
  "tools": [
    {
      "name": "MemoryExecute",
      "description": "Neural memory system. Commands: add, get, list, search, update, pin, delete, archive, link, unlink, related, embed, stats, prune",
      "inputSchema": {
        "type": "object",
        "properties": {
          "cmd": { "type": "string" },
          "detail": { "enum": ["minimal", "standard", "full"] },
          "params": { "type": "object" }
        },
        "required": ["cmd"]
      }
    },
    { "name": "ProjectsExecute", "description": "Project management. Commands: list, get, upsert, archive, stats", "inputSchema": { ... } },
    { "name": "TasksExecute", "description": "Task tracking. Commands: list, get, upsert, delete, set_status", "inputSchema": { ... } },
    { "name": "DocsExecute", "description": "Documentation. Commands: list, get, upsert, delete, search, pin", "inputSchema": { ... } },
    { "name": "FilesExecute", "description": "File operations. Commands: list, get, put, delete, roundtrip_*", "inputSchema": { ... } },
    { "name": "DatabaseExecute", "description": "SQL access. Commands: query, exec, schema, tables, stats", "inputSchema": { ... } },
    { "name": "ArtifactsExecute", "description": "Content storage. Commands: get, search, upsert", "inputSchema": { ... } },
    { "name": "HydrationExecute", "description": "AI context. Commands: hydrate, persona_*, identity_*", "inputSchema": { ... } },
    { "name": "DeepSearch", "description": "External search: Google, GitHub, Wikipedia, HackerNews", "inputSchema": { ... } }
  ]
}

~2,000 tokens. Same functionality. That's the whole list.


The Pattern: One Tool Per Domain

Instead of 14 memory tools, expose 1 memory tool with 14 commands:

// Before: 14 tools, 14 descriptions, 14 schemas
MemoriesAdd({ title, body, ... })
MemoriesSearch({ query, topK, ... })
MemoriesPin({ id, pinned })
...

// After: 1 tool, 1 description, commands as a parameter
MemoryExecute({ cmd: "add", params: { title, body, ... }})
MemoryExecute({ cmd: "search", params: { query, topK, ... }})
MemoryExecute({ cmd: "pin", params: { id, pinned }})

The AI reasons about 9 domains instead of 60 verbs.

"I need to search memories" → MemoryExecute with cmd: "search". Done.


The Implementation

Each domain facade follows the same structure:

public async Task<DomainResponse> ExecuteAsync(DomainCommand command)
{
    return command.Cmd.ToLowerInvariant() switch
    {
        "add" => await AddAsync(command),
        "get" => await GetAsync(command),
        "list" => await ListAsync(command),
        "search" => await SearchAsync(command),
        "update" => await UpdateAsync(command),
        "delete" => await DeleteAsync(command),
        _ => DomainResponse.Failure(command.Cmd, "Unknown command")
    };
}

Consistent Envelopes

Request:

{
  "cmd": "search",
  "detail": "standard",
  "params": { "projectId": 1, "query": "authentication", "topK": 10 }
}

Response:

{
  "ok": true,
  "cmd": "search",
  "data": [...],
  "count": 10,
  "error": null
}

Echo back the command. The AI needs to correlate request/response when it's juggling multiple operations.

Detail Levels

Control response verbosity with a single parameter:

| Level | Returns | Use case |
| --- | --- | --- |
| minimal | ID, title only | Lists, counts, quick checks |
| standard | Key fields, excerpts | General use |
| full | Everything | Deep inspection, debugging |

The AI requests what it needs. No more parsing 50KB responses when you just wanted a count.
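One way to sketch detail-level shaping, assuming hypothetical field groups per level (a real server would map its own columns the same way):

```python
# Detail-level response shaping. The field groups below are assumed
# for illustration; map your own schema's columns the same way.
FIELDS = {
    "minimal": ("id", "title"),
    "standard": ("id", "title", "excerpt", "tags"),
}

def shape(record: dict, detail: str = "standard") -> dict:
    """Project a record down to the fields the requested level allows."""
    if detail == "full":
        return record
    keys = FIELDS.get(detail, FIELDS["standard"])
    return {k: v for k, v in record.items() if k in keys}

memory = {"id": 7, "title": "Auth notes", "excerpt": "JWT vs sessions",
          "tags": ["auth"], "body": "…50KB of text…"}
print(shape(memory, "minimal"))  # {'id': 7, 'title': 'Auth notes'}
```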


The 9 Tools

| Tool | Commands | Purpose |
| --- | --- | --- |
| MemoryExecute | add, get, list, search, update, pin, archive, delete, link, unlink, related, embed, stats, prune | Neural memory with hybrid search |
| ProjectsExecute | list, get, upsert, archive, stats, get_tree | Workspace management |
| TasksExecute | list, get, upsert, delete, set_status, add_note | Task tracking |
| DocsExecute | list, get, upsert, delete, search, pin, embed | Documentation |
| FilesExecute | list, get, put, delete, mkdir, roundtrip_* | File operations |
| DatabaseExecute | query, exec, schema, tables, stats | Direct SQL access |
| ArtifactsExecute | get, search, upsert | Content-addressed storage |
| HydrationExecute | hydrate, persona_*, identity_*, preferences_* | AI context loading |
| DeepSearch | (aggregated) | Google, GitHub, Wikipedia, HackerNews |

60+ operations. 9 tools. Same capability.


Why It Works

1. Reduced cognitive load. The AI thinks in domains, not verbs. "I need to work with memories" → one obvious choice.

2. Consistent interface. Learn the pattern once, apply everywhere. Every domain has list, get, search. Same envelope, same error codes.

3. Token efficiency. You describe "Memory" once, not memory_add, memory_get, memory_list, memory_update... 14 times.

4. Extensibility. New command? Add a case to the switch. No new tool registration, no schema changes, no documentation updates.

5. Fewer wrong choices. 9 options beats 60. The AI stops confusing MemoriesUpdate with MemoriesUpsert.


The Metrics

| Metric | Before (60 tools) | After (9 tools) |
| --- | --- | --- |
| Tool list tokens | ~12,000 | ~2,000 |
| Wrong tool selection | Frequent | Rare |
| Response latency | Higher | Lower |
| Monthly API costs | $$$ | $ |

Bonus: Manifest-Based Roundtripping

One more pattern worth mentioning: atomic multi-file editing.

The Problem

LLMs editing files one at a time:

PUT /file/a.cs → content
PUT /file/b.cs → content
PUT /file/c.cs → content

Three API calls. No atomicity. No conflict detection. If the user edits a file while the AI is working, you get silent overwrites.

The Solution

roundtrip_start({ paths: ["a.cs", "b.cs", "c.cs"] })
  → Returns: manifest (SHA256 hashes) + ZIP of originals

[AI edits files in ZIP]

roundtrip_preview({ manifestId, modifiedZip })
  → Returns: diff, conflict warnings

roundtrip_commit({ manifestId, zip, mode: "replace" })
  → Applies atomically

The manifest tracks original state:

{
  "manifestId": "rtp_2024-01-15T10-30-00Z_a1b2c3d4",
  "entries": [
    { "path": "src/auth/login.cs", "sha256": "abc123...", "size": 2048 },
    { "path": "src/auth/logout.cs", "sha256": "def456...", "size": 1024 }
  ]
}

Conflict detection on commit:

var currentSha256 = ComputeHash(physicalPath);
if (currentSha256 != manifestEntry.Sha256)
    conflicts.Add($"File modified externally: {virtualPath}");
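The same check sketched end-to-end in Python, assuming a manifest shaped like the JSON above (a list of `path` + `sha256` entries): hash each file as it exists now and compare with the hash recorded at `roundtrip_start`.

```python
# Manifest-based conflict detection, mirroring the C# check above.
# The manifest shape (path + sha256 entries) matches the JSON example.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file's current on-disk content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def find_conflicts(manifest: list[dict], root: Path) -> list[str]:
    """Return a warning for each file changed since roundtrip_start."""
    conflicts = []
    for entry in manifest:
        current = sha256_of(root / entry["path"])
        if current != entry["sha256"]:
            conflicts.append(f"File modified externally: {entry['path']}")
    return conflicts
```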

Commit modes:

| Mode | Existing | New | Use case |
| --- | --- | --- | --- |
| replace | Overwrite | Create | Full sync |
| add_only | Skip | Create | Safe scaffolding |
| update_only | Overwrite | Skip | Targeted fixes |

Single atomic operation. Bandwidth efficient. Conflict-safe. The manifest is your checkpoint: you know exactly what state you started from.
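The mode table reduces to a small per-file decision. A sketch, using a hypothetical helper that decides whether to write a file given the mode and whether it already exists on disk:

```python
# Per-file decision behind the commit-mode table. The helper name is
# hypothetical; the logic encodes the table directly.
def should_write(mode: str, exists: bool) -> bool:
    if mode == "replace":
        return True           # overwrite existing, create new
    if mode == "add_only":
        return not exists     # create new, skip existing
    if mode == "update_only":
        return exists         # overwrite existing, skip new
    raise ValueError(f"Unknown commit mode: {mode}")

assert should_write("add_only", exists=True) is False     # safe scaffolding
assert should_write("update_only", exists=False) is False  # targeted fixes
assert should_write("replace", exists=True) is True        # full sync
```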


When NOT to Use This

  • Simple servers with 3-5 tools. The overhead isn't worth it.
  • Stateless utilities where operations are truly independent.
  • Human-facing APIs. Developers prefer granular REST.

This pattern is specifically for LLM consumers with context constraints and per-token costs.


Conclusion

MCP is young. Best practices are still forming.

But one thing is clear: APIs designed for human developers don't automatically work for LLM consumers.

Humans read docs once. LLMs re-read every turn. Humans remember endpoints. LLMs pay per token. Humans like granular options. LLMs get confused by 60 similar verbs.

Context-Optimized APIs flip the design question. Instead of "what's most RESTful?", ask "what minimizes context overhead while maximizing capability?"

For us, the answer was domain facades: one tool per domain, commands as parameters, consistent envelopes, configurable detail levels.

60 tools → 9 tools. 12,000 tokens → 2,000 tokens. Same functionality.

The AI is faster, cheaper, and picks the right tool more often.

Sometimes the best API design is the one that respects your consumer's constraints.


I'd love to hear your thoughts, and any tips you might have for improving the utility of MCP.

Rich
