Rich Jeffries

Context-Optimized APIs: Designing MCP Servers for LLMs

We reduced 60 tools to 9.
Same functionality.
85% less context overhead.

REST conventions work brilliantly for human developers who read documentation once and remember endpoints forever.

But your API consumer isn't human anymore.

It's an LLM with a 200k context window that re-reads every tool description on every turn. And it's paying per token.

Read that again. Every tool description on every turn.

You need a different pattern.


The Problem: Tool Sprawl

MCP lets you extend AI assistants with custom tools. The natural instinct is to create granular endpoints:

memory_add
memory_get
memory_list
memory_update
memory_delete
memory_pin
memory_archive
memory_link
memory_unlink
memory_search
memory_embed
...

Multiply this across domains (projects, tasks, docs, files, database) and you hit 60+ tools fast. Each needs a description, parameter schema, and examples.

That's 12,000 tokens the LLM must process every single turn.

The result? Slower responses, higher costs, and an AI that picks memory_update when it meant memory_upsert because they look similar in a list of 60.
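The arithmetic behind that number is rough but easy to reproduce. Assuming ~200 tokens per tool for description, schema, and examples (an illustrative figure, not a measurement):

```python
# Rough back-of-envelope for tool list overhead. The ~200 tokens/tool
# figure is an assumption for illustration, not a measured value.
TOKENS_PER_TOOL = 200

def tool_list_overhead(tool_count: int, tokens_per_tool: int = TOKENS_PER_TOOL) -> int:
    """Tokens the model re-reads on every turn just to see its tools."""
    return tool_count * tokens_per_tool

print(tool_list_overhead(60))  # 12000 tokens per turn
print(tool_list_overhead(9))   # 1800 tokens per turn
```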


Real Example: Before and After

V1: The Granular Approach (Truncated)

{
  "tools": [
    { "name": "MemoriesAdd", "description": "Add a new memory to the system", "inputSchema": { "type": "object", "properties": { "projectKey": {}, "title": {}, "body": {}, "scope": {}, "memoryType": {}, "tags": {}, "importance": {}, "pinned": {}, "ttlIso": {}, "userId": {}, "chatId": {}, "sourceKind": {}, "sourceRef": {} }, "required": ["projectKey", "title", "body"] } },
    { "name": "MemoriesSearch", "description": "Search memories using hybrid FTS + semantic search", "inputSchema": { ... } },
    { "name": "MemoriesList", "description": "List memories with filtering and pagination", "inputSchema": { ... } },
    { "name": "MemoriesGet", "description": "Get a specific memory by ID", "inputSchema": { ... } },
    { "name": "MemoriesUpdate", "description": "Update an existing memory", "inputSchema": { ... } },
    { "name": "MemoriesPin", "description": "Pin or unpin a memory", "inputSchema": { ... } },
    { "name": "MemoriesArchive", "description": "Archive a memory (soft delete)", "inputSchema": { ... } },
    { "name": "MemoriesDelete", "description": "Permanently delete a memory", "inputSchema": { ... } },
    { "name": "MemoriesLink", "description": "Link two memories", "inputSchema": { ... } },
    { "name": "MemoriesUnlink", "description": "Remove a link between memories", "inputSchema": { ... } },
    { "name": "MemoriesRelated", "description": "Get related memories", "inputSchema": { ... } },
    { "name": "MemoriesPrune", "description": "Archive expired memories", "inputSchema": { ... } },
    { "name": "MemoriesEmbed", "description": "Generate embeddings", "inputSchema": { ... } },
    { "name": "MemoriesStats", "description": "Get memory statistics", "inputSchema": { ... } },
    { "name": "ProjectsList", "description": "List all projects", "inputSchema": { ... } },
    { "name": "ProjectsGet", "description": "Get a project by key", "inputSchema": { ... } },
    { "name": "DocsList", "description": "List docs for a project", "inputSchema": { ... } },
    { "name": "DocsSearch", "description": "Search docs via FTS", "inputSchema": { ... } },
    { "name": "FilesList", "description": "List files", "inputSchema": { ... } },
    { "name": "FilesRead", "description": "Read a file", "inputSchema": { ... } },
    { "name": "FilesWrite", "description": "Write a file", "inputSchema": { ... } },
    { "name": "DbTables", "description": "List SQLite tables", "inputSchema": { ... } },
    { "name": "DbQuery", "description": "Run a SELECT", "inputSchema": { ... } },
    { "name": "DbExec", "description": "Execute SQL", "inputSchema": { ... } }
    // ... and 35+ more
  ]
}

~12,000 tokens. Every. Single. Turn.

V2: The Domain Facade Approach (Complete)

{
  "tools": [
    {
      "name": "MemoryExecute",
      "description": "Neural memory system. Commands: add, get, list, search, update, pin, delete, archive, link, unlink, related, embed, stats, prune",
      "inputSchema": {
        "type": "object",
        "properties": {
          "cmd": { "type": "string" },
          "detail": { "enum": ["minimal", "standard", "full"] },
          "params": { "type": "object" }
        },
        "required": ["cmd"]
      }
    },
    { "name": "ProjectsExecute", "description": "Project management. Commands: list, get, upsert, archive, stats", "inputSchema": { ... } },
    { "name": "TasksExecute", "description": "Task tracking. Commands: list, get, upsert, delete, set_status", "inputSchema": { ... } },
    { "name": "DocsExecute", "description": "Documentation. Commands: list, get, upsert, delete, search, pin", "inputSchema": { ... } },
    { "name": "FilesExecute", "description": "File operations. Commands: list, get, put, delete, roundtrip_*", "inputSchema": { ... } },
    { "name": "DatabaseExecute", "description": "SQL access. Commands: query, exec, schema, tables, stats", "inputSchema": { ... } },
    { "name": "ArtifactsExecute", "description": "Content storage. Commands: get, search, upsert", "inputSchema": { ... } },
    { "name": "HydrationExecute", "description": "AI context. Commands: hydrate, persona_*, identity_*", "inputSchema": { ... } },
    { "name": "DeepSearch", "description": "External search: Google, GitHub, Wikipedia, HackerNews", "inputSchema": { ... } }
  ]
}

~2,000 tokens. Same functionality. That's the whole list.


The Pattern: One Tool Per Domain

Instead of 14 memory tools, expose 1 memory tool with 14 commands:

// Before: 14 tools, 14 descriptions, 14 schemas
MemoriesAdd({ title, body, ... })
MemoriesSearch({ query, topK, ... })
MemoriesPin({ id, pinned })
...

// After: 1 tool, 1 description, commands as a parameter
MemoryExecute({ cmd: "add", params: { title, body, ... }})
MemoryExecute({ cmd: "search", params: { query, topK, ... }})
MemoryExecute({ cmd: "pin", params: { id, pinned }})

The AI reasons about 9 domains instead of 60 verbs.

"I need to search memories" → MemoryExecute with cmd: "search". Done.


The Implementation

Each domain facade follows the same structure:

public async Task<DomainResponse> ExecuteAsync(DomainCommand command)
{
    return command.Cmd.ToLowerInvariant() switch
    {
        "add" => await AddAsync(command),
        "get" => await GetAsync(command),
        "list" => await ListAsync(command),
        "search" => await SearchAsync(command),
        "update" => await UpdateAsync(command),
        "delete" => await DeleteAsync(command),
        _ => DomainResponse.Failure(command.Cmd, "Unknown command")
    };
}

Consistent Envelopes

Request:

{
  "cmd": "search",
  "detail": "standard",
  "params": { "projectId": 1, "query": "authentication", "topK": 10 }
}

Response:

{
  "ok": true,
  "cmd": "search",
  "data": [...],
  "count": 10,
  "error": null
}

Echo back the command. The AI needs to correlate request/response when it's juggling multiple operations.

Detail Levels

Control response verbosity with a single parameter:

| Level | Returns | Use case |
| --- | --- | --- |
| minimal | ID, title only | Lists, counts, quick checks |
| standard | Key fields, excerpts | General use |
| full | Everything | Deep inspection, debugging |

The AI requests what it needs. No more parsing 50KB responses when you just wanted a count.
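One way to sketch detail-level shaping, assuming hypothetical field groups per level (a real server would map its own columns the same way):

```python
# Detail-level response shaping. The field groups below are assumed
# for illustration; map your own schema's columns the same way.
FIELDS = {
    "minimal": ("id", "title"),
    "standard": ("id", "title", "excerpt", "tags"),
}

def shape(record: dict, detail: str = "standard") -> dict:
    """Project a record down to the fields the requested level allows."""
    if detail == "full":
        return record
    keys = FIELDS.get(detail, FIELDS["standard"])
    return {k: v for k, v in record.items() if k in keys}

memory = {"id": 7, "title": "Auth notes", "excerpt": "JWT vs sessions",
          "tags": ["auth"], "body": "…50KB of text…"}
print(shape(memory, "minimal"))  # {'id': 7, 'title': 'Auth notes'}
```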


The 9 Tools

| Tool | Commands | Purpose |
| --- | --- | --- |
| MemoryExecute | add, get, list, search, update, pin, archive, delete, link, unlink, related, embed, stats, prune | Neural memory with hybrid search |
| ProjectsExecute | list, get, upsert, archive, stats, get_tree | Workspace management |
| TasksExecute | list, get, upsert, delete, set_status, add_note | Task tracking |
| DocsExecute | list, get, upsert, delete, search, pin, embed | Documentation |
| FilesExecute | list, get, put, delete, mkdir, roundtrip_* | File operations |
| DatabaseExecute | query, exec, schema, tables, stats | Direct SQL access |
| ArtifactsExecute | get, search, upsert | Content-addressed storage |
| HydrationExecute | hydrate, persona_*, identity_*, preferences_* | AI context loading |
| DeepSearch | (aggregated) | Google, GitHub, Wikipedia, HackerNews |

60+ operations. 9 tools. Same capability.


Why It Works

1. Reduced cognitive load. The AI thinks in domains, not verbs. "I need to work with memories" → one obvious choice.

2. Consistent interface. Learn the pattern once, apply everywhere. Every domain has list, get, search. Same envelope, same error codes.

3. Token efficiency. You describe "Memory" once, not memory_add, memory_get, memory_list, memory_update... 14 times.

4. Extensibility. New command? Add a case to the switch. No new tool registration, no schema changes, no documentation updates.

5. Fewer wrong choices. 9 options beats 60. The AI stops confusing MemoriesUpdate with MemoriesUpsert.


The Metrics

| Metric | Before (60 tools) | After (9 tools) |
| --- | --- | --- |
| Tool list tokens | ~12,000 | ~2,000 |
| Wrong tool selection | Frequent | Rare |
| Response latency | Higher | Lower |
| Monthly API costs | $$$ | $ |

Bonus: Manifest-Based Roundtripping

One more pattern worth mentioning: atomic multi-file editing.

The Problem

LLMs editing files one at a time:

PUT /file/a.cs → content
PUT /file/b.cs → content
PUT /file/c.cs → content

Three API calls. No atomicity. No conflict detection. If the user edits a file while the AI is working, you get silent overwrites.

The Solution

roundtrip_start({ paths: ["a.cs", "b.cs", "c.cs"] })
  → Returns: manifest (SHA256 hashes) + ZIP of originals

[AI edits files in ZIP]

roundtrip_preview({ manifestId, modifiedZip })
  → Returns: diff, conflict warnings

roundtrip_commit({ manifestId, zip, mode: "replace" })
  → Applies atomically

The manifest tracks original state:

{
  "manifestId": "rtp_2024-01-15T10-30-00Z_a1b2c3d4",
  "entries": [
    { "path": "src/auth/login.cs", "sha256": "abc123...", "size": 2048 },
    { "path": "src/auth/logout.cs", "sha256": "def456...", "size": 1024 }
  ]
}

Conflict detection on commit:

var currentSha256 = ComputeHash(physicalPath);
if (currentSha256 != manifestEntry.Sha256)
    conflicts.Add($"File modified externally: {virtualPath}");
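The same check sketched end-to-end in Python, assuming a manifest shaped like the JSON above (a list of `path` + `sha256` entries): hash each file as it exists now and compare with the hash recorded at `roundtrip_start`.

```python
# Manifest-based conflict detection, mirroring the C# check above.
# The manifest shape (path + sha256 entries) matches the JSON example.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file's current on-disk content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def find_conflicts(manifest: list[dict], root: Path) -> list[str]:
    """Return a warning for each file changed since roundtrip_start."""
    conflicts = []
    for entry in manifest:
        current = sha256_of(root / entry["path"])
        if current != entry["sha256"]:
            conflicts.append(f"File modified externally: {entry['path']}")
    return conflicts
```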

Commit modes:

| Mode | Existing | New | Use case |
| --- | --- | --- | --- |
| replace | Overwrite | Create | Full sync |
| add_only | Skip | Create | Safe scaffolding |
| update_only | Overwrite | Skip | Targeted fixes |

Single atomic operation. Bandwidth efficient. Conflict-safe. The manifest is your checkpoint: you know exactly what state you started from.
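The mode table reduces to a small per-file decision. A sketch, using a hypothetical helper that decides whether to write a file given the mode and whether it already exists on disk:

```python
# Per-file decision behind the commit-mode table. The helper name is
# hypothetical; the logic encodes the table directly.
def should_write(mode: str, exists: bool) -> bool:
    if mode == "replace":
        return True           # overwrite existing, create new
    if mode == "add_only":
        return not exists     # create new, skip existing
    if mode == "update_only":
        return exists         # overwrite existing, skip new
    raise ValueError(f"Unknown commit mode: {mode}")

assert should_write("add_only", exists=True) is False     # safe scaffolding
assert should_write("update_only", exists=False) is False  # targeted fixes
assert should_write("replace", exists=True) is True        # full sync
```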


When NOT to Use This

  • Simple servers with 3-5 tools. The overhead isn't worth it.
  • Stateless utilities where operations are truly independent.
  • Human-facing APIs. Developers prefer granular REST.

This pattern is specifically for LLM consumers with context constraints and per-token costs.


Conclusion

MCP is young. Best practices are still forming.

But one thing is clear: APIs designed for human developers don't automatically work for LLM consumers.

Humans read docs once. LLMs re-read every turn. Humans remember endpoints. LLMs pay per token. Humans like granular options. LLMs get confused by 60 similar verbs.

Context-Optimized APIs flip the design question. Instead of "what's most RESTful?", ask "what minimizes context overhead while maximizing capability?"

For us, the answer was domain facades: one tool per domain, commands as parameters, consistent envelopes, configurable detail levels.

60 tools → 9 tools. 12,000 tokens → 2,000 tokens. Same functionality.

The AI is faster, cheaper, and picks the right tool more often.

Sometimes the best API design is the one that respects your consumer's constraints.


I'd love to hear your thoughts, and any tips you might have for improving the utility of MCP.

Rich
