How do you give an AI agent 30+ tools without drowning the context window? Include everything and you waste tokens. Be selective and the agent can't do its job. Here's how I solved it with a 3-layer architecture.
The Problem: Too Many Tools
As your AI agent grows, so does its toolbox. My personal assistant now has 35+ tools — web search, email, calendar, weather, Git, host PC control, file management, code execution, and more.
Sending all 35 tool schemas to the LLM on every request causes two problems:
- Token cost explosion: 35 JSON function schemas easily consume 3,000+ tokens per turn
- Selection accuracy drops: The more tools available, the more likely the LLM picks the wrong one
But if you trim tools aggressively, the agent can't handle requests it should be able to handle.
The Solution: 3-Layer Architecture
┌───────────────┐
│ User Input │
└───────┬───────┘
↓
┌────────────────────────────────────┐
│ Tool Registry │
│ │
│ ┌────────────┐ ┌──────────────┐ │
│ │ Base Tools │ │ Toolkits │ │
│ │ (always ON)│ │(dynamic load)│ │
│ │ 13 general │ │ 8 packs │ │
│ └──────┬─────┘ └──────┬───────┘ │
│ │ │ │
│ │ ┌─────┴──────┐ │
│ │ │ Tasks │ │
│ │ │(individual)│ │
│ │ └─────┬──────┘ │
│ └───────┬───────┘ │
│ ↓ │
│ ┌───────────────────────┐ │
│ │ Selected Tools → LLM │ │
│ └───────────────────────┘ │
└────────────────────────────────────┘
Layer 1: Base Tools — Always Included
13 general-purpose tools that could be needed for any request:
web_search, read_file, write_file, list_files,
run_command, get_datetime, calculate, run_python_code,
pip_install, recall, forget,
host_list_files, vm_to_host, host_to_vm
Web search, file I/O, code execution, memory (recall/forget) — these are universal, so they're included in every LLM call regardless of the user's request.
Layer 2: Toolkits — Domain-Specific Tool Packs
Related tools grouped into packs, defined as JSON files:
toolkits/
├── calendar.json # create_event, list_events, update_event, delete_event
├── contacts.json # find_contact
├── email.json # send_email, read_email, search_email
├── git.json # git_clone, git_status, git_commit, git_push
├── host_pc.json # host_open_url, host_open_app, host_find_file, ...
├── meta.json # help, show_config, health
├── scheduler.json # create_task, list_tasks, cancel_task
└── weather.json # weather
Each toolkit JSON contains:
{
  "name": "weather",
  "tier": "free",
  "description": "Weather and forecast information...",
  "keywords": ["날씨", "weather", "forecast", "temperature", "rain", "umbrella"],
  "tasks": [
    {
      "type": "function",
      "function": {
        "name": "weather",
        "description": "Get weather info...",
        "parameters": { ... }
      }
    }
  ]
}
- description: Used for embedding similarity matching
- keywords: Fast keyword-based activation
- tasks: Actual OpenAI function calling schemas sent to the LLM
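Loading these packs is a small amount of glue code. Here's a minimal sketch (the Toolkit class and the load_toolkits name are illustrative, not the exact implementation; field names mirror the JSON above):

```python
import json
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class Toolkit:
    name: str
    description: str       # used for embedding similarity matching
    keywords: list         # fast keyword-based activation
    tasks: list            # OpenAI function-calling schemas sent to the LLM
    embedding: list = field(default_factory=list)  # filled in at startup

def load_toolkits(directory: str) -> list:
    """Read every *.json pack in the toolkits/ directory."""
    toolkits = []
    for path in sorted(Path(directory).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        toolkits.append(Toolkit(
            name=data["name"],
            description=data["description"],
            keywords=data["keywords"],
            tasks=data["tasks"],
        ))
    return toolkits
```

Adding a new domain is then just dropping another JSON file into the directory — no code changes.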
Layer 3: Tasks — Individual Tool Functions
A Task is a single tool function inside a Toolkit. The weather toolkit has 1 task (weather()), while calendar has 4 (create_event, list_events, update_event, delete_event).
Dynamic Routing: Which Toolkits to Activate?
The key question: given a user input, which toolkits are relevant?
Two-Stage Matching
def select_tools(user_input):
    selected = list(base_tools)  # always included (copy, so base_tools isn't mutated)

    # Stage 1: Keyword matching (fast, deterministic)
    matched = set()
    for toolkit in all_toolkits:
        if any(keyword in user_input for keyword in toolkit.keywords):
            selected += toolkit.tasks
            matched.add(toolkit.name)

    # Stage 2: Embedding similarity (catches what keywords miss)
    input_embedding = embed(user_input)
    for toolkit in all_toolkits:
        if toolkit.name not in matched:  # only toolkits Stage 1 didn't activate
            if cosine(input_embedding, toolkit.embedding) >= 0.40:
                selected += toolkit.tasks

    return selected
Stage 1: Keyword Matching
- "What's the weather today?" → "weather" keyword → weather toolkit activated
- "Open Chrome" → "열어줘" (Korean for "open") keyword → host_pc toolkit activated
- Fast and precise, but limited coverage
Stage 2: Embedding Similarity (BGE-M3)
- "Should I bring an umbrella?" → No keyword match, but semantically similar to weather toolkit → activated
- Threshold: 0.40 (prioritize recall — better to include extra tools than miss needed ones)
- Model: BGE-M3 (multilingual, runs locally via Ollama)
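The cosine() used in the routing code is plain vector math — here's a dependency-free version, assuming embeddings come back as plain float lists:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors (plain float lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # degenerate vector: treat as no similarity
    return dot / (norm_a * norm_b)
```

With vectors of BGE-M3's dimensionality you'd normally use NumPy for speed, but the math is the same.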
Pre-computed Embeddings
At server startup, all toolkit descriptions are embedded once:
def init():
    for toolkit in all_toolkits:
        toolkit.embedding = get_embedding(toolkit.description)

# At request time, only the user input needs embedding (1 API call)
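For completeness, here's a sketch of both pieces: the embedding call against a local Ollama server (the /api/embeddings endpoint with model and prompt is Ollama's API — adjust if your version exposes the newer /api/embed instead) and the startup pass that caches one vector per toolkit. precompute_embeddings is an illustrative name:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default port

def ollama_embedding(text, model="bge-m3"):
    """Fetch one embedding vector from a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def precompute_embeddings(toolkits, embed=ollama_embedding):
    """Embed every toolkit description once, at server startup."""
    for toolkit in toolkits:
        toolkit.embedding = embed(toolkit.description)
    return toolkits
```

Taking the embed function as a parameter also makes the routing layer testable without a running embedding server.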
Real Example
User: "If it rains tomorrow, plan an indoor workout and add it to my calendar"
[ToolRouter] 18/35 tools | activated: [weather(keyword:1.00), calendar(embed:0.52)]
- "rain" → weather toolkit via keyword
- "add to calendar" → calendar toolkit via embedding (similarity 0.52)
Base 13 + weather 1 + calendar 4 = 18 tools sent to LLM. The other 17 tools (git, email, contacts, etc.) are excluded → token savings + better accuracy.
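Back-of-the-envelope math on the savings (the ~85 tokens per schema is a rough assumption derived from the "3,000+ tokens for 35 schemas" figure earlier; your schemas will vary):

```python
TOKENS_PER_SCHEMA = 85  # rough assumption: ~3,000 tokens / 35 schemas

all_tools = 35
selected = 18  # base 13 + weather 1 + calendar 4

saved = (all_tools - selected) * TOKENS_PER_SCHEMA
print(f"~{saved} tokens saved on this turn")  # ~1445
```

Multiply that by every turn of every conversation and the routing layer pays for itself quickly.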
Design Decisions
Decision 1: Why not send all tools every time?
With 35+ tool schemas:
- Token cost increases (inference cost + response latency)
- LLM confuses similar tools (send_email vs host_run_command for sending mail)
- Especially severe with smaller models (8B parameters)
Decision 2: Why not use embeddings only?
Embedding-only approach:
- Even obvious keywords like "weather" require an embedding API call (unnecessary latency)
- If the embedding server goes down, everything breaks
→ Keywords first, embeddings as fallback: the two-stage design stays fast on obvious requests and keeps working (with reduced coverage) if the embedding server is down.
Decision 3: What threshold for similarity?
- 0.60: Precise but misses relevant toolkits
- 0.40: May over-activate but rarely misses
- Recall is the priority — extra tools in the context are relatively harmless (the LLM ignores them), but missing a needed tool means the agent simply can't do its job
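If you want to pick the threshold empirically rather than by feel, a tiny harness like this makes the trade-off concrete (recall_at_threshold is a hypothetical helper; samples are labeled query → expected-toolkit pairs you collect from real usage):

```python
def recall_at_threshold(samples, toolkit_names, threshold, similarity):
    """Fraction of expected toolkits that would be activated.

    samples: list of (query, [expected toolkit names])
    similarity(query, toolkit_name) -> float, e.g. cosine of embeddings
    """
    hits = total = 0
    for query, expected in samples:
        activated = {n for n in toolkit_names
                     if similarity(query, n) >= threshold}
        hits += len(activated & set(expected))
        total += len(expected)
    return hits / total if total else 1.0
```

Sweep threshold over, say, 0.30–0.70 on your labeled queries and chart recall against the average number of activated toolkits; the knee of that curve is your threshold.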
Conclusion
Tool management for AI agents comes down to one question: "Which tools should the LLM see for this specific request?"
The 3-layer answer:
- Base Tools: Universal → always ON
- Toolkits: Domain packs → dynamically activated via keyword + embedding
- Tasks: Individual functions inside toolkits
This architecture lets "What's the weather?" include only the weather task, while "Commit my code" includes only git tasks — saving tokens and improving accuracy across the board.