<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stacklok</title>
    <description>The latest articles on DEV Community by Stacklok (@stacklok).</description>
    <link>https://dev.to/stacklok</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9162%2Fff58e982-dcd1-478b-93be-574a981873f6.png</url>
      <title>DEV Community: Stacklok</title>
      <link>https://dev.to/stacklok</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stacklok"/>
    <language>en</language>
    <item>
      <title>Cut token waste across your entire team with the MCP Optimizer</title>
      <dc:creator>Alejandro Ponce de León</dc:creator>
      <pubDate>Wed, 11 Mar 2026 17:14:51 +0000</pubDate>
      <link>https://dev.to/stacklok/cut-token-waste-across-your-entire-team-with-the-mcp-optimizer-7e</link>
      <guid>https://dev.to/stacklok/cut-token-waste-across-your-entire-team-with-the-mcp-optimizer-7e</guid>
      <description>&lt;p&gt;You already cut your own token bill. Now imagine doing that for every member on your team, without them lifting a finger.&lt;/p&gt;

&lt;p&gt;Here's what you'll learn in this post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why per-person Optimizer setups don't scale, and what to do instead
&lt;/li&gt;
&lt;li&gt;How Stacklok's &lt;a href="https://stacklok.com/blog/introducing-virtual-mcp-server-unified-gateway-for-multi-mcp-workflows/" rel="noopener noreferrer"&gt;Virtual MCP Server (vMCP)&lt;/a&gt; delivers team-wide token savings from a single deployment
&lt;/li&gt;
&lt;li&gt;How AI agents benefit automatically, with no per-agent configuration required
&lt;/li&gt;
&lt;li&gt;How to deploy the Optimizer in Kubernetes in two steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8v141h8mzwkhsxd2b5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8v141h8mzwkhsxd2b5x.png" alt=" " width="800" height="505"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The MCP Optimizer dynamically finds and exposes the right tools to clients only when needed, via a unified vMCP Gateway endpoint.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem at scale
&lt;/h2&gt;

&lt;p&gt;If you read &lt;a href="https://stacklok.com/blog/cut-token-waste-from-your-ai-workflow-with-the-toolhive-mcp-optimizer/" rel="noopener noreferrer"&gt;Cut Token Waste from Your AI Workflow with the ToolHive MCP Optimizer&lt;/a&gt;, you know the local Optimizer works great — download it, run it, and watch your token bill drop by 60-85% per request in our benchmarks. But individual setups aren't enterprise setups. You can't ask every team member to install an embedding model, tune search parameters, and keep the whole thing running alongside their other tools. And you can't ask your platform team to verify that each of those setups is configured correctly and stays that way. You need a solution that everyone benefits from the moment they connect.&lt;/p&gt;

&lt;p&gt;Configuration drift is the first headache. One person runs a different embedding model than another. Someone tweaked the hybrid search ratio three weeks ago and forgot to tell anyone. Someone else doesn't even know the Optimizer needs configuring and wonders why their token bill is 3x everyone else's. Meanwhile, each machine burns CPU and memory running its own embedding inference — resources that could be doing literally anything else.&lt;/p&gt;

&lt;p&gt;AI agents amplify both the problem and the payoff. Agents that fan out across multiple MCP servers stuff the full tool catalog into the context window on every invocation. When an agent connects to five or six MCP servers, that catalog grows quickly. The token bill climbs, inference slows, and the LLM starts picking the wrong tools because it's drowning in descriptions.&lt;/p&gt;

&lt;p&gt;Multiply that by hundreds of agent runs a day. Without a centralized Optimizer, you'd have to manually wire it up for each agent and each server combination.&lt;/p&gt;
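&lt;p&gt;A quick back-of-envelope calculation makes the scale concrete. The numbers below are illustrative assumptions (the 70% reduction is a midpoint of the benchmarked 60-85% range), not measurements:&lt;/p&gt;

```python
# Back-of-envelope sketch with made-up, team-scale assumptions.
catalog_tokens = 20_000      # full tool catalog stuffed into each request
reduction = 0.70             # assumed midpoint of the 60-85% benchmark range
requests_per_day = 500       # agent runs across the whole team, per day

tokens_saved_per_day = int(catalog_tokens * reduction * requests_per_day)
print(tokens_saved_per_day)  # 7,000,000 tokens per day at these assumptions
```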

&lt;p&gt;What you actually want — for users and AI agents alike — is to configure it once, in one place, and have everyone benefit automatically. That's exactly what Stacklok now delivers through the vMCP and Operator.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Optimizer works
&lt;/h2&gt;

&lt;p&gt;The core idea is simple. Instead of sending your AI agent the full list of every tool from every MCP server (which can easily run to hundreds of descriptions), the Optimizer collapses them into two meta-tools: &lt;code&gt;find_tool&lt;/code&gt; and &lt;code&gt;call_tool&lt;/code&gt;. Here's how a request flows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Your agent receives a prompt&lt;/strong&gt; that requires tool use.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It calls &lt;code&gt;find_tool&lt;/code&gt;&lt;/strong&gt; with a natural language description of what it needs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Optimizer runs hybrid search&lt;/strong&gt; (semantic and keyword) against all registered tools.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Only the relevant tools come back&lt;/strong&gt; — typically 8 instead of 200+.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The agent calls &lt;code&gt;call_tool&lt;/code&gt;&lt;/strong&gt; to invoke the one it needs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your agent never sees the full tool catalog. It discovers tools on demand, pays only for the descriptions it actually needs, and the LLM stays focused on fewer, more relevant options.&lt;/p&gt;
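&lt;p&gt;The flow above can be sketched in a few lines of Python. This is a self-contained stand-in, not the Optimizer's implementation: real deployments use hybrid semantic-plus-keyword search, while this toy version ranks tools by keyword overlap:&lt;/p&gt;

```python
# Toy stand-in for the find_tool / call_tool meta-tool flow.
# Keyword overlap replaces the Optimizer's hybrid search.
TOOL_INDEX = {
    "github_create_issue": "create a new issue in a github repository",
    "github_merge_pr": "merge an open pull request in a github repository",
    "slack_post_message": "post a message to a slack channel",
}

def find_tool(query: str, limit: int = 2) -> list[str]:
    """Return tool names ranked by keyword overlap with the query."""
    words = set(query.lower().split())
    scored = [
        (len(words.intersection(desc.split())), name)
        for name, desc in TOOL_INDEX.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:limit] if score > 0]

def call_tool(name: str, arguments: dict) -> str:
    """Dispatch to the selected backend tool (stubbed here)."""
    return f"called {name} with {arguments}"

# The agent only ever sees these two meta-tools:
matches = find_tool("create a new issue in a github repository")
result = call_tool(matches[0], {"title": "Fix login bug"})
```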

&lt;p&gt;For a deeper dive into the mechanics and benchmarks, see &lt;a href="https://stacklok.com/blog/cut-token-waste-from-your-ai-workflow-with-the-toolhive-mcp-optimizer/" rel="noopener noreferrer"&gt;the original Optimizer blog post&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  All the power of vMCP, now with cost savings
&lt;/h2&gt;

&lt;p&gt;If you're already running Stacklok in Kubernetes, you're likely using vMCP, a unified gateway that aggregates multiple MCP servers behind a single endpoint. vMCP gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unified gateway&lt;/strong&gt;. One endpoint for all your MCP servers. Onboarding a new team member means sharing one URL, not configuring five connections.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication and authorization&lt;/strong&gt;. Centralized auth for incoming clients (OIDC, anonymous, etc.) and outgoing connections, so you can enforce access policies without modifying each MCP server.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aggregation and conflict resolution&lt;/strong&gt;. Automatic prefixing, priority ordering, or manual overrides when tool names collide across MCP servers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Optimizer adds one more layer on top:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token optimization&lt;/strong&gt;. Every tool behind the gateway gets indexed. Clients see only &lt;code&gt;find_tool&lt;/code&gt; and &lt;code&gt;call_tool&lt;/code&gt; instead of the full catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The savings are real. The &lt;a href="https://stacklok.com/blog/cut-token-waste-from-your-ai-workflow-with-the-toolhive-mcp-optimizer/" rel="noopener noreferrer"&gt;original Optimizer blog post&lt;/a&gt; walks through the benchmarks in detail, showing 60-85% token reductions per request. In a &lt;a href="https://stacklok.com/blog/stackloks-mcp-optimizer-vs-anthropics-tool-search-tool-a-head-to-head-comparison/" rel="noopener noreferrer"&gt;head-to-head comparison with Anthropic's tool search tool&lt;/a&gt;, the Optimizer matched or exceeded a first-party solution.&lt;/p&gt;

&lt;p&gt;Token savings aren't the only benefit. Fewer tool descriptions mean less noise for the LLM to wade through, which means better tool selection and fewer hallucinated tool calls. You're saving tokens and getting better results.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to deploy the Optimizer in Kubernetes
&lt;/h2&gt;

&lt;p&gt;The Kubernetes setup is deliberately minimal. You need two things: an &lt;code&gt;EmbeddingServer&lt;/code&gt; and a reference to it from your &lt;code&gt;VirtualMCPServer&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Deploy an EmbeddingServer
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;EmbeddingServer&lt;/code&gt; Custom Resource Definition (CRD) manages a shared embedding model for the whole team. With sensible defaults baked in, the minimal configuration is just this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;toolhive.stacklok.dev/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;EmbeddingServer&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;optimizer-embedding&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The operator defaults to &lt;code&gt;BAAI/bge-small-en-v1.5&lt;/code&gt; as the model and runs the &lt;a href="https://github.com/huggingface/text-embeddings-inference" rel="noopener noreferrer"&gt;HuggingFace Text Embeddings Inference&lt;/a&gt; server. You can increase the replica count via &lt;code&gt;spec.replicas&lt;/code&gt; to match your team's throughput needs. One shared instance serves every vMCP in the namespace. For all available configuration options, see the &lt;a href="https://docs.stacklok.com/toolhive/guides-vmcp/optimizer" rel="noopener noreferrer"&gt;Optimizer docs&lt;/a&gt;.&lt;/p&gt;
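&lt;p&gt;For example, a scaled-out variant of the manifest above might look like this (a sketch using the &lt;code&gt;spec.replicas&lt;/code&gt; field just mentioned; check the Optimizer docs for the full schema):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: toolhive.stacklok.dev/v1alpha1
kind: EmbeddingServer
metadata:
  name: optimizer-embedding
spec:
  replicas: 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;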

&lt;h3&gt;
  
  
  Step 2: Reference it from your VirtualMCPServer
&lt;/h3&gt;

&lt;p&gt;Add a single field to your existing &lt;code&gt;VirtualMCPServer&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;embeddingServerRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;optimizer-embedding&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the only change. When the operator sees &lt;code&gt;embeddingServerRef&lt;/code&gt; without an explicit &lt;code&gt;optimizer&lt;/code&gt; config block, it auto-populates the optimizer with sensible defaults and resolves the embedding server URL automatically. You don't need any manual wiring.&lt;/p&gt;
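&lt;p&gt;Put together, the reference sits directly in the &lt;code&gt;VirtualMCPServer&lt;/code&gt; spec. The sketch below assumes the &lt;code&gt;apiVersion&lt;/code&gt; and &lt;code&gt;kind&lt;/code&gt; mirror the &lt;code&gt;EmbeddingServer&lt;/code&gt; example; see the quickstart example in the resources section for a complete manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: toolhive.stacklok.dev/v1alpha1
kind: VirtualMCPServer
metadata:
  name: team-gateway
spec:
  # ...your existing backend and aggregation configuration...
  embeddingServerRef:
    name: optimizer-embedding
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;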

&lt;p&gt;For finer control — tuning search parameters, timeouts, and more — see the &lt;a href="https://docs.stacklok.com/toolhive/guides-vmcp/optimizer" rel="noopener noreferrer"&gt;Optimizer docs&lt;/a&gt; for the full reference.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost savings add up
&lt;/h2&gt;

&lt;p&gt;The per-request savings are compelling on their own, but they compound quickly across a team: every team member, every request, every day. At typical API pricing, those savings add up fast. Fewer tokens also mean faster responses for everyone in your organization.&lt;/p&gt;

&lt;p&gt;Beyond the raw savings, the Kubernetes approach gives you operational advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitOps-friendly&lt;/strong&gt;. &lt;code&gt;EmbeddingServer&lt;/code&gt; and &lt;code&gt;VirtualMCPServer&lt;/code&gt; configurations live in Git, get reviewed in PRs, and deploy through your existing CI/CD pipeline. That gives you full change history and rollback for compliance requirements.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One shared embedding server&lt;/strong&gt;. Instead of every machine running a local embedding model, one instance serves the whole team. Less resource waste, consistent behavior.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero end-user setup&lt;/strong&gt;. Users point their MCP client at the vMCP endpoint. The Optimizer is transparent; they don't need to know it's there.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralized security boundary&lt;/strong&gt;. All tool discovery flows through one place, giving you a single point to audit and control which tools your team can access.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;Here's everything referenced above and some extra resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Optimizer docs:&lt;/strong&gt; &lt;a href="https://docs.stacklok.com/toolhive/guides-vmcp/optimizer" rel="noopener noreferrer"&gt;Configuration guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vMCP blog post:&lt;/strong&gt; &lt;a href="https://stacklok.com/blog/introducing-virtual-mcp-server-unified-gateway-for-multi-mcp-workflows/" rel="noopener noreferrer"&gt;Introducing Virtual MCP Server: a unified gateway for multi-MCP workflows&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vMCP docs:&lt;/strong&gt; &lt;a href="https://docs.stacklok.com/toolhive/guides-vmcp" rel="noopener noreferrer"&gt;Virtual MCP Server configuration guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quickstart example:&lt;/strong&gt; &lt;a href="https://github.com/stacklok/toolhive/blob/main/examples/operator/virtual-mcps/vmcp_optimizer_quickstart.yaml" rel="noopener noreferrer"&gt;vmcp_optimizer_quickstart.yaml&lt;/a&gt;: deploys several MCP backends with a fully auto-configured optimizer
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All options example:&lt;/strong&gt; &lt;a href="https://github.com/stacklok/toolhive/blob/main/examples/operator/virtual-mcps/vmcp_optimizer_all_options.yaml" rel="noopener noreferrer"&gt;vmcp_optimizer_all_options.yaml&lt;/a&gt;: every tuning knob exposed
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Original Optimizer blog:&lt;/strong&gt; &lt;a href="https://stacklok.com/blog/cut-token-waste-from-your-ai-workflow-with-the-toolhive-mcp-optimizer/" rel="noopener noreferrer"&gt;Cut Token Waste from Your AI Workflow&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ToolHive GitHub:&lt;/strong&gt; &lt;a href="https://github.com/stacklok/toolhive" rel="noopener noreferrer"&gt;github.com/stacklok/toolhive&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to see what Stacklok can do for your organization? &lt;a href="https://stacklok.com/contact" rel="noopener noreferrer"&gt;Book a demo&lt;/a&gt; or get started right away with &lt;a href="https://github.com/stacklok/toolhive" rel="noopener noreferrer"&gt;ToolHive&lt;/a&gt;, our open source project. Join the conversation and engage directly with our team on &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>stacklok</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Build your first enterprise MCP server with GitHub Copilot</title>
      <dc:creator>Alejandro Ponce de León</dc:creator>
      <pubDate>Mon, 02 Feb 2026 08:53:54 +0000</pubDate>
      <link>https://dev.to/stacklok/build-your-first-enterprise-mcp-server-with-github-copilot-4gll</link>
      <guid>https://dev.to/stacklok/build-your-first-enterprise-mcp-server-with-github-copilot-4gll</guid>
      <description>&lt;p&gt;&lt;em&gt;Ever wondered how to bridge the gap between your company's private knowledge and AI assistants? You're about to vibecode your way there.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What all the fuss with MCP is about
&lt;/h2&gt;

&lt;p&gt;Back in November 2022, the world changed when OpenAI launched ChatGPT. It wasn't the first Large Language Model (LLM), but it was the most capable at the time, and most importantly, it was available for everyone to explore. To make a small analogy: it got to the moon first. LLMs sparked everyone's imagination and forever changed the way we work. Maybe that's a little far-fetched, but they definitely boosted productivity across many areas.&lt;/p&gt;

&lt;p&gt;Yet LLMs weren't (and still aren't) all-mighty. They've been trained on vast amounts of internet content, but they have two critical limitations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;They're not trained on private content.&lt;/strong&gt; No company wikis, internal docs, or how-tos.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They have a knowledge cutoff.&lt;/strong&gt; Their training stops at a fixed date, usually months in the past.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So if you ask ChatGPT something like "How was feature X designed in product Y, and how can I integrate it with my new feature Z?", it will have no idea what you're talking about. First, it almost certainly never saw the implementation details, since they fall under an organization's private content. Even if it had, the model is frozen in time; it doesn’t know what’s changed in the world since its training cutoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP to the rescue
&lt;/h2&gt;

&lt;p&gt;Fortunately, both problems can be solved with &lt;a href="https://huggingface.co/docs/smolagents/en/tutorials/tools" rel="noopener noreferrer"&gt;&lt;strong&gt;tools&lt;/strong&gt;&lt;/a&gt;. Tools empower LLMs with capabilities beyond their training. To solve the two issues above, we can create tools that tell the LLM: "When you're asked about product X at company Y, use tool Z to get the most up-to-date information." That tool might, for example, search an internal knowledge base.&lt;/p&gt;
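&lt;p&gt;Conceptually, a tool is just a well-described function that the client executes on the model's behalf. Here's a toy sketch of that idea; the names, data, and dispatch mechanism are illustrative, not the MCP wire format:&lt;/p&gt;

```python
# Toy illustration of tool calling: the model sees only the
# description; the client executes the matching function.
# Names and schema here are illustrative, not the MCP wire format.

def search_wiki(query: str) -> str:
    """Stand-in for a lookup against a private knowledge base."""
    docs = {"feature X design": "Feature X uses an event-driven pipeline."}
    return docs.get(query, "No results found.")

TOOLS = {
    "search_wiki": {
        "description": (
            "When asked about product internals at company Y, "
            "use this tool to fetch up-to-date private documentation."
        ),
        "function": search_wiki,
    }
}

# The client dispatches the model's tool call:
tool_call = {"name": "search_wiki", "arguments": {"query": "feature X design"}}
answer = TOOLS[tool_call["name"]]["function"](**tool_call["arguments"])
```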

&lt;p&gt;&lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;has quickly become the standard for tool calling.&lt;/strong&gt; Modern AI systems have two essential parts: the client (VS Code, Cursor, ChatGPT, Claude Code, etc.) and the model itself. Tools live on the client side. When the model doesn't know something, it calls a tool that the client executes. Originally introduced by Anthropic, MCP’s open design and community adoption have made it the clear industry standard, now supported by &lt;a href="https://techcrunch.com/2025/03/26/openai-adopts-rival-anthropics-standard-for-connecting-ai-models-to-data/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;, &lt;a href="https://techcrunch.com/2025/04/09/google-says-itll-embrace-anthropics-standard-for-connecting-ai-models-to-data/" rel="noopener noreferrer"&gt;Google&lt;/a&gt;, Microsoft, and others. That means you can write an MCP server once and use it with your favorite clients.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building your first MCP, the AI scrappy way
&lt;/h2&gt;

&lt;p&gt;Let's say your boss just tasked you with connecting your AI assistants to the corporate Confluence wiki. This is a perfect use case for MCP; you need to expose enterprise knowledge to AI tools in a standardized way.&lt;/p&gt;

&lt;p&gt;For this tutorial, we'll assume you already have a querying system in place, whether that's a Retrieval Augmented Generation (RAG) pipeline, a search API, or another knowledge retrieval mechanism. Our job is to wrap that existing system with an MCP server so AI assistants can access it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Our approach: vibecoding
&lt;/h3&gt;

&lt;p&gt;We're going to build this MCP server using what &lt;a href="https://x.com/karpathy/status/1886192184808149383?lang=en" rel="noopener noreferrer"&gt;Andrej Karpathy&lt;/a&gt; half-jokingly dubbed "&lt;strong&gt;vibecoding&lt;/strong&gt;": letting LLMs write most, if not all, of the code. The term spread like wildfire because, well, it works surprisingly well for certain tasks. It's not a silver bullet, but it's perfect for handling boilerplate code and getting something functional quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ingredients
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.13+
&lt;/li&gt;
&lt;li&gt;VS Code with Copilot
&lt;/li&gt;
&lt;li&gt;uv for package management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Copilot?
&lt;/h3&gt;

&lt;p&gt;While tools like Cursor, Windsurf, Codex, and Claude Code have gained wide popularity for their deep AI integration, GitHub Copilot remains the most widely available option for enterprise developers. It’s often already included in Microsoft or GitHub contracts, so it can be deployed without extra approvals. We’ll use Copilot here because it’s what most teams already have, and it gets the job done.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The initial prompt
&lt;/h4&gt;

&lt;p&gt;Getting started with AI-assisted development is all about setting clear expectations. Here's the first prompt I used to kick off the project. Being specific about tooling and goals helps guide the AI toward the implementation you actually want. After this initial prompt, we should have the scaffolding of the project and most of the implementation ready.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This is a new project called enterprise-mcp. It is a Python project using 3.13 or greater. The project is meant to be an MCP server that will access enterprise knowledge and make it available to LLMs. The project should:
- Use uv as package manager
- For adding packages use `uv add &amp;lt;package_name&amp;gt;`
- All configuration should be centralized in pyproject.toml file
- Use uv dependency groups when adding development dependencies like pytest, e.g. `uv add pytest --dev` or `uv add --group dev pytest`
- I would also like a Taskfile to centralize running commands, like `task format`, `task test`, or `task typecheck`
- Use `ruff` for linting and formatting
- Use `ty` for typechecking https://docs.astral.sh/ty/
- Use `async` functions wherever possible and `asyncio.gather` when parallelizing multiple tasks
- Use the official Python MCP SDK: https://github.com/modelcontextprotocol/python-sdk
- For now, make a single tool called search_enterprise_knowledge. Make sure the tool has appropriate descriptions that are descriptive enough for LLM usage
- Make the implementation with tests. I don't care so much about unit tests but about testing the overall functionality of the application
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Where the AI got confused
&lt;/h4&gt;

&lt;p&gt;Even with a detailed prompt, the first pass required some corrections. Still, we had a working implementation after the first prompt, which is impressive in itself. Two main issues emerged, both likely related to the AI's knowledge cutoff:&lt;/p&gt;

&lt;h5&gt;
  
  
  1. Misunderstanding the MCP SDK
&lt;/h5&gt;

&lt;p&gt;Instead of using the official Python SDK, Copilot attempted to semi-reimplement the MCP protocol from scratch, creating custom &lt;code&gt;list_tools&lt;/code&gt; and &lt;code&gt;call_tool&lt;/code&gt; endpoints. Since the MCP SDK is fairly recent, it wasn't in the training data, and crucially, the AI didn't check the documentation before implementing.&lt;/p&gt;

&lt;h5&gt;
  
  
  2. Using Mypy instead of Ty
&lt;/h5&gt;

&lt;p&gt;Similar story here. The AI defaulted to the more established Mypy rather than looking up the newer Ty package I'd specified.&lt;/p&gt;

&lt;h4&gt;
  
  
  Manual refinements
&lt;/h4&gt;

&lt;p&gt;Beyond fixing the AI's mistakes, I made some personal preference edits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structure of pyproject.toml.&lt;/strong&gt; To this day, no coding assistant nails my pyproject.toml preferences on the first try (it may well be a me problem rather than an AI problem). I referenced configurations from past projects I liked and adapted them here.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Taskfile.yml adjustments.&lt;/strong&gt; Same deal with the Taskfile.yml. That said, the AI got me 80-90% of the way there, which is pretty remarkable.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Iterating with prompt #2
&lt;/h4&gt;

&lt;p&gt;After the initial implementation and manual edits, a few minor improvements remained. Rather than handle them manually, I asked Copilot to finish the job, since it would take far less time than doing it myself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I have made some changes in my server.py to correctly use the Python SDK. I want you to:
1. Transform my server to a streamable HTTP server.
2. Add a comprehensive docstring for my handle_search method so that it's usable by LLMs whenever enterprise knowledge is needed.
Check the documentation of the Python SDK to know how to correctly transform the server to streamable HTTP: https://github.com/modelcontextprotocol/python-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Closing the loop
&lt;/h4&gt;

&lt;p&gt;The final step is updating project memory: the context file that helps future AI sessions (and human developers) understand your project quickly. This typically lives in &lt;code&gt;AGENTS.md&lt;/code&gt; or &lt;code&gt;CLAUDE.md&lt;/code&gt; at the project root, and most coding assistants recognize either. It's a good place to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Document the project structure, so the agent knows where to implement a new feature or fix a bug
&lt;/li&gt;
&lt;li&gt;Outline the project's best practices
&lt;/li&gt;
&lt;li&gt;Give instructions that can be repeated across runs, e.g., always run unit tests along with code linting
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Perfect, 3 final tasks after some manual modifications:
1. Make sure my commands `task format`, `task typecheck` and `task test` work and return without errors
2. Update the file AGENTS.md with relevant context information for coding agents. Take into account the best practices signaled at the beginning, like centralizing everything in pyproject.toml and using `task ..` commands to run relevant project commands. The code formatting and tests commands should be used every time a coding task is finished. Read the repo again for any other relevant information
3. Finally, update a README.md with a summary of the project and the development process
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key lessons
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Tools are not API endpoints
&lt;/h4&gt;

&lt;p&gt;This is crucial to understand when building MCP servers: an MCP tool is fundamentally different from an API endpoint, even though it's tempting to map them one-to-one.&lt;/p&gt;

&lt;p&gt;API endpoints are designed as small, atomic, reusable operations. They're the building blocks you compose together: one endpoint to fetch user data, another to update preferences, another to send notifications. Each is focused and modular, meant to serve multiple use cases across your application.&lt;/p&gt;

&lt;p&gt;MCP tools, by contrast, are meant to accomplish complete deterministic workflows or actions. Think of an API as giving you a toolbox of small buttons, each doing one thing, that you wire together. An MCP tool is a single big button that says "do the thing." It handles an entire task from start to finish.&lt;/p&gt;

&lt;p&gt;For example, instead of separate tools for "search documents," "filter by date," and "format results," you'd create one &lt;code&gt;search_enterprise_knowledge&lt;/code&gt; tool that handles the full workflow of finding, filtering, and returning relevant information in one shot.&lt;/p&gt;
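&lt;p&gt;Here's what that composite workflow might look like as plain Python, with stand-in data and a hypothetical date filter; the MCP plumbing is omitted for clarity:&lt;/p&gt;

```python
from datetime import date

# Hypothetical sketch: one tool that finds, filters, and formats
# in a single call, instead of three separate fine-grained tools.
DOCS = [
    {"title": "API Auth Guide", "updated": date(2024, 10, 20), "body": "OAuth 2.0 flow"},
    {"title": "Legacy SSO Notes", "updated": date(2019, 3, 1), "body": "SAML setup"},
]

def search_enterprise_knowledge(query: str, newer_than: date) -> str:
    """Find, filter, and format relevant documents in one shot."""
    hits = [
        d for d in DOCS
        if query.lower() in d["body"].lower() and d["updated"] > newer_than
    ]
    return "\n".join(f"{d['title']}: {d['body']}" for d in hits)

result = search_enterprise_knowledge("oauth", date(2024, 1, 1))
```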

&lt;h4&gt;
  
  
  You're still accountable
&lt;/h4&gt;

&lt;p&gt;Whatever code the AI produces, you own it. If it breaks in production, you can't blame Copilot or Claude. Humans remain accountable for the code we ship.&lt;/p&gt;

&lt;p&gt;This means you should always review what gets generated. Not necessarily line-by-line, but at minimum: understand what it does, verify it follows your standards, and run it through your normal quality checks. A quick sanity check is never wasted time, especially when you're the one who'll be called at 2am to fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing the MCP server
&lt;/h2&gt;

&lt;p&gt;For this first iteration, it's best to take variables like coding assistants and client configuration files out of the equation. The easiest way to do that is with the &lt;a href="https://modelcontextprotocol.io/docs/tools/inspector" rel="noopener noreferrer"&gt;&lt;strong&gt;MCP Inspector&lt;/strong&gt;&lt;/a&gt;, a tool from the MCP project for inspecting an MCP server and querying it directly. To run the inspector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @modelcontextprotocol/inspector 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example response
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduk1l72xrtqu7m7bvy28.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduk1l72xrtqu7m7bvy28.png" alt="The MCP Inspector connected to the enterprise knowledge MCP server" width="800" height="346"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Result 1
**Title:** API Documentation - Authentication
**Content:**
# API Authentication Guide
## Overview
Our REST API uses OAuth 2.0 for authentication.
## Getting Started
1. Register your application
2. Obtain client credentials
3. Request access token
4. Include token in requests

## Example

curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.company.com/v1/users
Access tokens expire after 1 hour.


**Metadata:**
- author: API Team
- created: 2024-02-01
- last_updated: 2024-10-20
- tags: ['api', 'authentication', 'oauth', 'documentation']
- source: confluence
**URL:** https://company.atlassian.net/wiki/spaces/API/pages/987654321
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;We've successfully built a working MCP server using Copilot and vibecoding, ready to access enterprise knowledge through a standardized protocol!&lt;/p&gt;

&lt;p&gt;By letting GitHub Copilot handle most of the boilerplate code, we created a functional Python MCP server with proper tooling, testing, and documentation, all while maintaining code quality and best practices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full code repository:&lt;/strong&gt; &lt;a href="https://github.com/aponcedeleonch/enterprise-mcp" rel="noopener noreferrer"&gt;https://github.com/aponcedeleonch/enterprise-mcp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next blog post, we're taking this further by introducing &lt;a href="https://docs.stacklok.com/toolhive" rel="noopener noreferrer"&gt;&lt;strong&gt;ToolHive&lt;/strong&gt;&lt;/a&gt;, a powerful platform that makes deploying and managing MCP servers effortless. ToolHive offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instant deployment&lt;/strong&gt; using Docker containers or source packages (Python, TypeScript, or Go)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure by default&lt;/strong&gt; with isolated containers, customizable permissions, and encrypted secrets management
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless integration&lt;/strong&gt; with GitHub Copilot, Cursor, and other popular AI clients
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise-ready features,&lt;/strong&gt; including OAuth-based authorization and Kubernetes deployment via the ToolHive Kubernetes Operator
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A curated registry&lt;/strong&gt; of verified MCP servers you can discover and run immediately, or create your own custom registry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stay tuned to learn how to evolve our enterprise MCP server from a prototype into a production-ready service!&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>githubcopilot</category>
      <category>ai</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Introducing mcp-tef - Testing Your MCP Tool Descriptions Before They Cause Problems</title>
      <dc:creator>Nigel Brown</dc:creator>
      <pubDate>Tue, 16 Dec 2025 20:08:11 +0000</pubDate>
      <link>https://dev.to/stacklok/introducing-mcp-tef-testing-your-mcp-tool-descriptions-before-they-cause-problems-fan</link>
      <guid>https://dev.to/stacklok/introducing-mcp-tef-testing-your-mcp-tool-descriptions-before-they-cause-problems-fan</guid>
      <description>&lt;h2&gt;
  
  
  Introducing mcp-tef - Testing Your MCP Tool Descriptions Before They Cause Problems
&lt;/h2&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;When you build MCP tools, vague or overlapping descriptions cause LLMs to select the wrong tools—or no tools at all. Testing in production frustrates users and damages trust. &lt;strong&gt;mcp-tef&lt;/strong&gt; is an open-source tool evaluation system that lets you test tool descriptions systematically before deployment, catching problems early with real LLM testing, similarity detection, and quality analysis.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Tool Description Failures in Production
&lt;/h2&gt;

&lt;p&gt;When you write an MCP tool, you provide a name and description. The LLM reads this description and decides whether to use your tool based on user prompts. But here's what goes wrong:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vague descriptions confuse LLMs.&lt;/strong&gt; A tool called &lt;code&gt;search&lt;/code&gt; with description "Search for things" gives the LLM no information about what can be searched, how to search it, or what it returns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overlapping descriptions cause conflicts.&lt;/strong&gt; You might have your own &lt;code&gt;create_issue&lt;/code&gt; tool, but then add a third-party GitHub MCP server that also has &lt;code&gt;create_issue&lt;/code&gt;. The LLM sees two tools with identical names doing similar things and can't determine which to select.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; The LLM either picks the wrong tool entirely or becomes so confused that it picks no tool at all. Users get frustrated, trust erodes, and you're debugging in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It gets worse with mixed environments.&lt;/strong&gt; The MCP ecosystem is growing fast. You're mixing custom tools with third-party MCP servers, and maybe multiple third-party servers together. Each has its own set of tools, and they all need to play nicely together. Without systematic testing, conflicts and confusion multiply.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters: The Cost of Getting It Wrong
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Testing in production is expensive.&lt;/strong&gt; By the time you realize your tool descriptions are broken, you've already frustrated users. You're fixing problems reactively instead of preventing them proactively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual testing doesn't scale.&lt;/strong&gt; How do you know if your fix actually works? How do you know if two descriptions are too similar? How do you test that the LLM will actually pick the right tool when a user asks a real question? You can't manually test every possible prompt against every combination of tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt; Test tool descriptions systematically before deployment, with real LLM testing and actionable feedback.&lt;/p&gt;




&lt;h2&gt;
  
  
  How mcp-tef Solves This
&lt;/h2&gt;

&lt;p&gt;mcp-tef is an open source (Apache 2.0 licensed) tool evaluation system that helps you create correct, non-clashing tool descriptions from the start. It provides three core capabilities:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tool evaluation
&lt;/h3&gt;

&lt;p&gt;Create test cases with real user prompts (queries), and mcp-tef tests whether the LLM picks the right tool. It provides metrics (precision, recall, F1 scores), validates parameter extraction, and analyzes confidence. If the LLM is highly confident but wrong, that's a "misleading" description that needs immediate attention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a test case&lt;/span&gt;
mtef test-case create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://localhost:8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"GitHub repository search"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"Find repositories related to MCP tools"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--expected-server&lt;/span&gt; &lt;span class="s2"&gt;"http://localhost:8080/github/mcp"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--expected-tool&lt;/span&gt; &lt;span class="s2"&gt;"search_repositories"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--servers&lt;/span&gt; &lt;span class="s2"&gt;"http://localhost:8080/github/mcp:streamable-http"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--insecure&lt;/span&gt;

✓ Test &lt;span class="k"&gt;case&lt;/span&gt; created successfully
ID: d2fcb4bf-8334-4339-a0a8-c1ead2deeea6

&lt;span class="c"&gt;# Run the test&lt;/span&gt;
mtef test-run execute d2fcb4bf-8334-4339-a0a8-c1ead2deeea6 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://localhost:8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-provider&lt;/span&gt; openrouter &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-name&lt;/span&gt; anthropic/claude-3.5-sonnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--api-key&lt;/span&gt; sk-or-v1-... &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--insecure&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;✓ Test run completed successfully
Status: completed
Classification: TP &lt;span class="o"&gt;(&lt;/span&gt;True Positive&lt;span class="o"&gt;)&lt;/span&gt;
Tool Match: Correct
Confidence: high &lt;span class="o"&gt;(&lt;/span&gt;robust description&lt;span class="o"&gt;)&lt;/span&gt;
Param Score: 10.0/10
Execution: 9,295 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
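&lt;p&gt;The precision, recall, and F1 numbers mcp-tef reports follow the standard definitions over test-run classifications (TP, FP, FN). A minimal Python sketch of the arithmetic, not mcp-tef's actual implementation:&lt;/p&gt;

```python
from collections import Counter

def selection_metrics(classifications):
    """Compute precision, recall, and F1 from test-run classifications:
    "TP" = right tool picked, "FP" = wrong tool picked,
    "FN" = no tool picked when one was expected."""
    counts = Counter(classifications)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Nine runs picked the right tool, one picked a wrong tool,
# and two picked no tool at all.
p, r, f1 = selection_metrics(["TP"] * 9 + ["FP"] + ["FN"] * 2)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```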



&lt;h3&gt;
  
  
  2. Similarity detection
&lt;/h3&gt;

&lt;p&gt;Uses embeddings to find tools with similar descriptions. Generates similarity matrices showing which tools overlap, and flags high-similarity pairs (e.g., 0.87 similarity) that might confuse the LLM. Provides specific recommendations for differentiation, including revised descriptions you can use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mtef similarity analyze &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://localhost:8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--server-urls&lt;/span&gt; &lt;span class="s2"&gt;"http://localhost:8080/fetch/mcp:streamable-http,http://localhost:8080/toolhive-doc-mcp/mcp:streamable-http,http://localhost:8080/mcp-optimizer/mcp:streamable-http,http://localhost:8080/github/mcp:streamable-http"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--threshold&lt;/span&gt; 0.85 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--insecure&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;✓ Analysis &lt;span class="nb"&gt;complete&lt;/span&gt;: 18 pairs flagged above 0.85 threshold
Analyzed 55 tools across 4 servers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
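&lt;p&gt;Conceptually, similarity detection reduces to pairwise cosine similarity between description embeddings, flagging pairs above the threshold. A toy sketch with bag-of-words vectors standing in for real learned embeddings (which would score near-synonyms far more accurately):&lt;/p&gt;

```python
import math
from itertools import combinations

def embed(text):
    """Toy bag-of-words 'embedding'; real systems use learned models."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def flag_similar(tools, threshold=0.85):
    """Return tool-name pairs whose description similarity meets the threshold."""
    vecs = {name: embed(desc) for name, desc in tools.items()}
    return [(a, b) for a, b in combinations(vecs, 2)
            if cosine(vecs[a], vecs[b]) >= threshold]

tools = {
    "search": "search for documents by keywords",
    "find_files": "search for files by name patterns",
    "create_issue": "create a new issue in the tracker",
}
print(flag_similar(tools, threshold=0.5))
```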



&lt;h3&gt;
  
  
  3. Tool quality analysis
&lt;/h3&gt;

&lt;p&gt;Scores tool descriptions on clarity, completeness, and conciseness (1-10 scale). Tells you what's missing, what's vague, and what could be improved. Provides suggested improved descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;mtef tool-quality &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://localhost:8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--server-urls&lt;/span&gt; &lt;span class="s2"&gt;"http://localhost:8080/toolhive-doc-mcp/mcp"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-provider&lt;/span&gt; openrouter &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-name&lt;/span&gt; anthropic/claude-3.5-sonnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--insecure&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--timeout&lt;/span&gt; 120
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ℹ Using mcp-tef at https://localhost:8000

Tool Quality Evaluation Results
&lt;span class="o"&gt;============================================================&lt;/span&gt;

┏━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Tool Name  ┃ Clarity ┃ Completeness ┃ Conciseness ┃
┡━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ query_docs │  7/10   │     6/10     │    9/10     │
│ get_chunk  │  6/10   │     4/10     │    8/10     │
└────────────┴─────────┴──────────────┴─────────────┘

✓ Evaluated 2 tool&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note on transport support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supported&lt;/strong&gt;: mcp-tef connects to MCP servers using the Streamable HTTP or SSE (deprecated) transports.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not supported&lt;/strong&gt;: mcp-tef does not support stdio servers directly, but you can use stdio-based MCP servers with &lt;a href="https://docs.stacklok.com/toolhive/guides-mcp/run-mcp-servers" rel="noopener noreferrer"&gt;ToolHive&lt;/a&gt;, which runs stdio servers and exposes them via a Streamable HTTP endpoint.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Using mcp-tef: CLI and HTTP API
&lt;/h2&gt;

&lt;p&gt;All the examples in this post use the &lt;code&gt;mtef&lt;/code&gt; CLI tool, but every operation can also be performed directly via HTTP API calls. The mcp-tef server exposes a REST API with OpenAPI documentation, so you can integrate it into your own workflows, CI/CD pipelines, or applications. The server provides interactive API documentation at &lt;code&gt;/docs&lt;/code&gt; and an OpenAPI specification at &lt;code&gt;/openapi.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Both approaches provide the same functionality—choose the one that fits your workflow.&lt;/p&gt;
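&lt;p&gt;Because the server publishes its spec at &lt;code&gt;/openapi.json&lt;/code&gt;, you can discover its operations programmatically. A sketch that walks an OpenAPI document; the fragment below is illustrative (the paths shown are hypothetical), and in practice you would fetch the real spec from your mcp-tef server:&lt;/p&gt;

```python
import json

# Illustrative OpenAPI fragment; fetch the real spec from your
# mcp-tef server's /openapi.json endpoint (these paths are made up).
spec_json = """
{
  "openapi": "3.1.0",
  "paths": {
    "/test-cases": {
      "post": {"summary": "Create a test case"},
      "get": {"summary": "List test cases"}
    },
    "/test-runs": {
      "post": {"summary": "Execute a test run"}
    }
  }
}
"""

def list_operations(spec):
    """Yield (METHOD, path, summary) for every operation in the spec."""
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            yield method.upper(), path, op.get("summary", "")

for method, path, summary in list_operations(json.loads(spec_json)):
    print(f"{method:4} {path:12} {summary}")
```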




&lt;h2&gt;
  
  
  Where You Use It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Your own MCP servers:&lt;/strong&gt; Test descriptions before deployment. Create test cases for common user prompts, run them through mcp-tef, iterate on descriptions until tests pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third-party MCP servers:&lt;/strong&gt; Evaluate tools before integrating. Test server tools in isolation, see how well they perform, make informed decisions about which servers to use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mixed environments:&lt;/strong&gt; Before mixing multiple servers together, run similarity detection. See which tools conflict, use mcp-tef's recommendations to understand how to differentiate them—maybe you'll need &lt;a href="https://dev.to/stacklok/introducing-virtual-mcp-server-unified-gateway-for-multi-mcp-workflows-17ee"&gt;vMCP's prefixing&lt;/a&gt;, or maybe you can improve descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Continuous testing:&lt;/strong&gt; As you add new tools or update descriptions, keep testing. Make mcp-tef part of your CI/CD pipeline. Catch problems before they reach users.&lt;/p&gt;
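&lt;p&gt;In CI, the gate can be as simple as failing the build when any test run is not a true positive. A hedged sketch that assumes you have exported run results as JSON; the result shape here mirrors the CLI output above but is an assumption, so adapt it to however you collect results:&lt;/p&gt;

```python
import json

def gate(results):
    """Return runs that should fail the build (anything not classified TP)."""
    return [r for r in results if r.get("classification") != "TP"]

# Assumed export shape for collected mcp-tef results.
results = json.loads("""[
  {"name": "GitHub repository search", "classification": "TP"},
  {"name": "Document search", "classification": "FP"}
]""")

failing = gate(results)
for r in failing:
    print(f"FAIL: {r['name']} classified as {r['classification']}")
print("gate:", "FAIL" if failing else "PASS")
```

&lt;p&gt;In a pipeline step you would exit non-zero whenever &lt;code&gt;failing&lt;/code&gt; is non-empty, so regressions block the merge.&lt;/p&gt;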

&lt;p&gt;&lt;strong&gt;LLM comparison and migration:&lt;/strong&gt; Validate that different models (e.g., Anthropic Claude vs. Ollama Llama) correctly select tools using the same test cases. Compare performance across providers to ensure tool descriptions work consistently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Example
&lt;/h2&gt;

&lt;p&gt;You're building a document management MCP server with a tool called &lt;code&gt;search&lt;/code&gt; and description: "Search for documents."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;mcp-tef flags it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clarity: 3/10
&lt;/li&gt;
&lt;li&gt;Missing: what can you search? Content? Filenames? Metadata? What does it return?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You improve it to:&lt;/strong&gt; "Search document CONTENT using keywords and boolean operators. Supports PDF, TXT, DOCX, and MD files. Returns ranked results with highlighted excerpts and relevance scores."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You test it:&lt;/strong&gt; Create a test case, run it, LLM correctly selects your tool. Great!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But then you add a third-party file system MCP server&lt;/strong&gt; with &lt;code&gt;find_files&lt;/code&gt;: "Find files by searching with patterns."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Similarity detection catches it:&lt;/strong&gt; 0.87 similarity. Recommendation: "Emphasize that &lt;code&gt;search&lt;/code&gt; searches CONTENT, while &lt;code&gt;find_files&lt;/code&gt; searches FILENAMES."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You differentiate them clearly:&lt;/strong&gt; Now the LLM can distinguish between searching document content and finding files by name. If the third-party server doesn't update their description, you can still use &lt;a href="https://dev.to/stacklok/introducing-virtual-mcp-server-unified-gateway-for-multi-mcp-workflows-17ee"&gt;vMCP&lt;/a&gt; to prefix them, but now the descriptions are also clear, so the LLM makes better choices.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;mcp-tef is open source and works with several providers: Anthropic, OpenAI, OpenRouter, and Ollama.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Required&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.13+
&lt;/li&gt;
&lt;li&gt;uv package manager (&lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;https://docs.astral.sh/uv/&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Optional&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama — for local LLM testing (no API keys needed)
&lt;/li&gt;
&lt;li&gt;Docker — if deploying via the CLI (mtef deploy)
&lt;/li&gt;
&lt;li&gt;API keys — for cloud LLM providers (e.g., OpenRouter) if not using Ollama&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Install:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv tool &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;"mcp-tef-cli@git+https://github.com/StacklokLabs/mcp-tef.git#subdirectory=cli"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deploy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mtef deploy &lt;span class="nt"&gt;--health-check&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Test your tools:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using the examples above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check quality&lt;/span&gt;
mtef tool-quality ...

&lt;span class="c"&gt;# Create test case&lt;/span&gt;
mtef test-case create &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"My first test"&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; ...

&lt;span class="c"&gt;# Run test&lt;/span&gt;
mtef test-run execute &amp;lt;test-case-id&amp;gt; ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The whole process takes just a few minutes. You'll immediately see if your descriptions work or if they need improvement.&lt;/p&gt;




&lt;h2&gt;
  
  
  How mcp-tef Works with vMCP and MCP Optimizer
&lt;/h2&gt;

&lt;p&gt;These tools are designed to work together, each solving different parts of the MCP ecosystem challenge:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;mcp-tef&lt;/strong&gt; helps you write better tool descriptions from the start. It tests whether descriptions are clear, complete, and differentiated. When descriptions are good, LLMs make better tool selection decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/stacklok/introducing-virtual-mcp-server-unified-gateway-for-multi-mcp-workflows-17ee"&gt;&lt;strong&gt;vMCP (Virtual MCP Server)&lt;/strong&gt;&lt;/a&gt; provides a unified gateway for multiple MCP servers, handling tool name conflicts through intelligent prefixing and routing. When you've tested your descriptions with mcp-tef, vMCP's prefixing works even better—the LLM can distinguish tools not just by name, but by their clear, well-differentiated descriptions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/stacklok/stackloks-mcp-optimizer-vs-anthropics-tool-search-tool-a-head-to-head-comparison-2f32"&gt;&lt;strong&gt;MCP Optimizer&lt;/strong&gt;&lt;/a&gt; intelligently routes requests to the right tools across your MCP ecosystem. With well-tested descriptions from mcp-tef, Optimizer has better information to work with, requiring fewer manual overrides and making smarter routing decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The workflow:&lt;/strong&gt; Use mcp-tef to test and improve your tool descriptions. Deploy with vMCP to handle multi-server coordination. Let MCP Optimizer route requests intelligently. Good descriptions make all these solutions work better together, creating a more reliable and maintainable system.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;mcp-tef helps you write better tool descriptions systematically, with real LLM testing and actionable feedback. But great descriptions work even better when combined with the right infrastructure tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaway:&lt;/strong&gt; Test your tool descriptions before deploying. Good descriptions lead to better tool selection, which leads to happier users. And when you combine well-tested descriptions with tools like &lt;a href="https://dev.to/stacklok/introducing-virtual-mcp-server-unified-gateway-for-multi-mcp-workflows-17ee"&gt;vMCP&lt;/a&gt; and &lt;a href="https://dev.to/stacklok/stackloks-mcp-optimizer-vs-anthropics-tool-search-tool-a-head-to-head-comparison-2f32"&gt;MCP Optimizer&lt;/a&gt;, you get a robust, maintainable MCP ecosystem that works reliably at scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Points Summary
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The problem&lt;/strong&gt;: Vague or overlapping tool descriptions confuse LLMs, leading to incorrect tool selection.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters&lt;/strong&gt;: Testing in production frustrates users; prevention is better than reactive fixes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The solution&lt;/strong&gt;: mcp-tef provides systematic testing with tool evaluation, similarity detection, and quality analysis.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Where to use it&lt;/strong&gt;: Your own servers, third-party servers, mixed environments, continuous testing, LLM comparison.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The goal&lt;/strong&gt;: Create descriptions that are correct and don't clash, making your entire MCP ecosystem work better.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Working together&lt;/strong&gt;: mcp-tef, vMCP, and MCP Optimizer complement each other. Good descriptions make infrastructure tools work even better.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Want to join in the MCP fun? Visit &lt;a href="https://toolhive.dev/" rel="noopener noreferrer"&gt;toolhive.dev&lt;/a&gt; and join the &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;ToolHive community on Discord&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>toolhive</category>
    </item>
    <item>
      <title>Introducing Virtual MCP Server: Unified Gateway for Multi-MCP Workflows</title>
      <dc:creator>Dan Barr</dc:creator>
      <pubDate>Thu, 11 Dec 2025 15:59:12 +0000</pubDate>
      <link>https://dev.to/stacklok/introducing-virtual-mcp-server-unified-gateway-for-multi-mcp-workflows-17ee</link>
      <guid>https://dev.to/stacklok/introducing-virtual-mcp-server-unified-gateway-for-multi-mcp-workflows-17ee</guid>
      <description>&lt;p&gt;If you're working with AI coding assistants like GitHub Copilot or Claude, you've probably encountered MCP (Model Context Protocol) servers. They're powerful, connecting your AI to GitHub, Jira, Slack, cloud providers, and more. But here's the problem: each connection requires separate configuration, authentication, and maintenance.&lt;/p&gt;

&lt;p&gt;Managing MCP server connections gets messy fast. That’s why we built the &lt;strong&gt;Virtual MCP Server (vMCP)&lt;/strong&gt; in ToolHive: it aggregates multiple MCP servers into a single unified endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem: connection overload
&lt;/h2&gt;

&lt;p&gt;Picture this: you're an engineer on a platform team. Your AI assistant needs access to GitHub for code, Jira for tickets, Slack for notifications, PagerDuty for incidents, Datadog for metrics, AWS for infrastructure, Confluence for docs, and your internal knowledge base. That's 8 separate MCP server connections, each exposing 10-20+ tools. Now your AI's context window is filling up with 80+ tool descriptions, burning tokens and degrading performance as the LLM struggles to select the right tools from an overwhelming list.&lt;/p&gt;

&lt;p&gt;Each MCP server connection requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Individual configuration in your AI client
&lt;/li&gt;
&lt;li&gt;Separate authentication credentials
&lt;/li&gt;
&lt;li&gt;Manual coordination when tasks span multiple systems
&lt;/li&gt;
&lt;li&gt;Repeated parameter entry (same repo, same channel, same database)
&lt;/li&gt;
&lt;li&gt;Tool filtering to avoid context bloat and wasted tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to investigate a production incident? You're manually running commands across 4 different systems and piecing together the results yourself. Deploying an app? You're orchestrating a sequence of operations: merge PR, wait for CI, get approval, deploy, notify team. It's tedious, error-prone, and not reusable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The solution: aggregate everything
&lt;/h2&gt;

&lt;p&gt;vMCP transforms those 8 connections into one. You configure a single MCP endpoint that aggregates all your backend servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before vMCP:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"servers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"github"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"jira"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"slack"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"pagerduty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"datadog"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"aws"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"confluence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"docs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With vMCP:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"servers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"company-tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://vmcp.company.com/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One connection. One authentication flow. All your tools available.&lt;/p&gt;

&lt;p&gt;And here’s the key: &lt;strong&gt;you can run as many vMCP instances as you need&lt;/strong&gt;. Your frontend team connects to one vMCP with their specific tools. Your platform team connects to another with infrastructure access. Each vMCP aggregates exactly the backends that each team needs, with appropriate security policies and permissions.&lt;/p&gt;

&lt;p&gt;This matters for two reasons: security (no more giving everyone access to everything) and efficiency (fewer tools means smaller context windows, which means lower token costs and better AI performance).&lt;/p&gt;

&lt;h2&gt;
  
  
  What vMCP does
&lt;/h2&gt;

&lt;p&gt;vMCP is part of the ToolHive Kubernetes Operator. It acts as an intelligent aggregation layer that sits between your AI client and your backend MCP servers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6k3iv7ipy29yk4cnywjp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6k3iv7ipy29yk4cnywjp.png" alt="Diagram of the basic vMCP architecture" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Multi-server aggregation with tool filtering
&lt;/h3&gt;

&lt;p&gt;All MCP tools appear through a single endpoint, &lt;strong&gt;but you cherry-pick exactly which tools to expose&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example: An engineer on the ToolHive team gets a single vMCP connection with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub’s &lt;code&gt;search_code&lt;/code&gt; tool (scoped to the &lt;code&gt;stacklok/toolhive&lt;/code&gt; repo only)
&lt;/li&gt;
&lt;li&gt;The ToolHive docs MCP server
&lt;/li&gt;
&lt;li&gt;An internal docs server hooked up to Google Drive and filtered to ToolHive design docs
&lt;/li&gt;
&lt;li&gt;Slack (only the &lt;code&gt;#toolhive-team&lt;/code&gt; channel)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No irrelevant tools cluttering the LLM's context. No wasted tokens on unused tool descriptions. Just the tools needed for their work, making it easier for the AI to select the right tool every time.&lt;/p&gt;

&lt;p&gt;When multiple MCP servers have tools with the same name (both GitHub and Jira have &lt;code&gt;create_issue&lt;/code&gt;), vMCP automatically prefixes them: &lt;code&gt;github_create_issue&lt;/code&gt; and &lt;code&gt;jira_create_issue&lt;/code&gt;. You can customize these names however you want.&lt;/p&gt;
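As a rough illustration, per-tool exposure and renaming could be expressed declaratively on the VirtualMCPServer resource. The field names below (`toolOverrides`, `exposeAs`, `groupRef`) are illustrative assumptions, not the exact schema; check the ToolHive documentation for the real spec.

```yaml
# Hypothetical sketch -- field names are assumptions, not the exact schema.
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: VirtualMCPServer
metadata:
  name: frontend-vmcp
spec:
  groupRef: frontend-tools      # assumed: the MCPGroup to aggregate
  toolOverrides:                # assumed: per-tool exposure and renaming
    - server: github
      tool: create_issue
      exposeAs: gh_new_issue
    - server: jira
      tool: create_issue
      exposeAs: jira_new_issue
```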

&lt;h3&gt;
  
  
  2. Declarative multi-system workflows
&lt;/h3&gt;

&lt;p&gt;Real tasks often require coordinating across multiple systems. vMCP lets you define deterministic workflows that execute in parallel with conditionals, error handling, and approval gates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Incident investigation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of manually jumping between 4 different systems, copy/pasting data, and aggregating the results, a single “composite tool” could:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;→ Query logs from logging system
→ Fetch metrics from monitoring platform  
→ Pull traces from tracing service
→ Check infrastructure status from cloud provider
→ Combine everything into a report
→ Create Jira ticket with findings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;vMCP executes all queries in parallel, automatically aggregates the data, and creates the ticket. Define the workflow once, use it for every incident.&lt;/p&gt;
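A declarative definition of that incident workflow might look something like the sketch below. The step and field names are illustrative assumptions rather than the actual vMCP workflow schema; the point is the shape: independent steps fan out in parallel, and the ticket-creation step waits on all of them.

```yaml
# Hypothetical sketch of a composite tool -- field names are assumptions.
name: investigate_incident
description: Gather incident context and file a ticket with the findings
steps:
  - id: logs                        # the first four steps have no
    tool: logging_query_logs        # dependencies, so they can run
  - id: metrics                     # in parallel
    tool: monitoring_fetch_metrics
  - id: traces
    tool: tracing_pull_traces
  - id: infra
    tool: cloud_check_status
  - id: report
    tool: jira_create_issue         # runs only after the queries complete
    dependsOn: [logs, metrics, traces, infra]
```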

&lt;p&gt;&lt;strong&gt;Example: App deployment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A typical deployment workflow handled end-to-end:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;→ Merge pull request in GitHub
→ Wait for CI tests to pass
→ Request human approval (using MCP elicitation)
→ Deploy (only if approved)
→ Notify team in Slack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Pre-configured defaults and guardrails
&lt;/h3&gt;

&lt;p&gt;Stop typing the same parameters repeatedly. Configure defaults once in vMCP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Every GitHub query requires specifying &lt;code&gt;repo: stacklok/toolhive&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; The repo is pre-configured. Engineers never specify it, and they can't accidentally query the wrong one.&lt;/p&gt;

&lt;p&gt;This isn’t just convenience, it’s about deterministic behavior and security. By pre-configuring parameters, you ensure tools behave consistently, and users can only access resources you’ve explicitly exposed. No more accidental queries to the wrong repo, Slack channels, databases, cloud regions, or anything else you reference repeatedly.&lt;/p&gt;
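Conceptually, a pinned parameter is just a default the caller can no longer override. A sketch of what that configuration could look like (the `toolDefaults` field name is an assumption for illustration):

```yaml
# Hypothetical sketch -- the field names are assumptions.
toolDefaults:
  - server: github
    tool: search_code
    parameters:
      repo: stacklok/toolhive   # pinned; callers never see or change it
```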

&lt;h3&gt;
  
  
  4. Tool customization and security policies
&lt;/h3&gt;

&lt;p&gt;Third-party MCP servers often expose generic, unrestricted tools. vMCP lets you wrap and restrict them without modifying upstream servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security policy enforcement:&lt;/strong&gt; Restrict a website fetch tool to internal domains only (&lt;code&gt;*.company.com&lt;/code&gt;), validate URLs before calling the backend, and provide clear error messages for violations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simplified interfaces:&lt;/strong&gt; That AWS EC2 tool with 20+ parameters? Create a wrapper that only exposes the 3 parameters your frontend team actually needs, with safe defaults for everything else.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Centralized authentication
&lt;/h3&gt;

&lt;p&gt;vMCP implements a two-boundary authentication model with a complete audit trail. Your AI client authenticates once to vMCP using the OAuth 2.1 methods defined in the official MCP spec. vMCP handles authorization to each backend independently based on its requirements.&lt;/p&gt;

&lt;p&gt;When it’s time to revoke access, disable the user in your identity provider, and all backend access is revoked instantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-world benefits
&lt;/h2&gt;

&lt;p&gt;Let's look at the incident investigation example with concrete numbers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without vMCP:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4 sequential manual commands
&lt;/li&gt;
&lt;li&gt;2-3 minutes per command
&lt;/li&gt;
&lt;li&gt;5-10 minutes aggregating and formatting
&lt;/li&gt;
&lt;li&gt;15-20 minutes total per incident
&lt;/li&gt;
&lt;li&gt;Results vary by engineer
&lt;/li&gt;
&lt;li&gt;Process isn't documented or reusable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With vMCP:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One command triggers the workflow
&lt;/li&gt;
&lt;li&gt;Parallel execution: 30 seconds
&lt;/li&gt;
&lt;li&gt;Automatic aggregation and formatting
&lt;/li&gt;
&lt;li&gt;Consistent results every time
&lt;/li&gt;
&lt;li&gt;Workflow is documented as code
&lt;/li&gt;
&lt;li&gt;Any team member can use it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a team handling 20 incidents per week, that's 5-6 hours saved. More importantly, the response is faster, more consistent, and doesn't require senior engineers to handle routine investigations.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;vMCP runs in Kubernetes alongside your backend MCP servers. You define three types of resources:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCPGroup:&lt;/strong&gt; Organizes backend servers logically (e.g., "platform-tools")&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCPServer:&lt;/strong&gt; Individual backend MCP servers (GitHub, Jira, etc.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VirtualMCPServer:&lt;/strong&gt; The aggregation layer that combines servers from a group&lt;/p&gt;

&lt;p&gt;The ToolHive operator discovers backends, resolves tool name conflicts, applies security policies, and exposes everything through a single endpoint. Your AI client connects to vMCP just like any other MCP server.&lt;/p&gt;

&lt;p&gt;Since each VirtualMCPServer is a separate Kubernetes resource, you can deploy as many as needed. One per team, one per environment, or organized however makes sense for your security model.&lt;/p&gt;
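Put together, a minimal manifest set might look like the following sketch. The three kinds come from the text above; the spec fields shown (the image, `groupRef`) are assumptions for illustration, so consult the quickstart for the exact schema.

```yaml
# Sketch only -- spec fields are illustrative assumptions.
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPGroup
metadata:
  name: platform-tools
---
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: github
spec:
  image: ghcr.io/github/github-mcp-server   # assumed image reference
  groupRef: platform-tools                  # assumed field
---
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: VirtualMCPServer
metadata:
  name: platform-vmcp
spec:
  groupRef: platform-tools    # aggregate every server in the group
```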

&lt;p&gt;For a working example, check out the &lt;a href="https://docs.stacklok.com/toolhive/tutorials/quickstart-vmcp" rel="noopener noreferrer"&gt;quickstart tutorial&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use vMCP
&lt;/h2&gt;

&lt;p&gt;vMCP makes sense when you're managing multiple MCP servers (typically 5+), curating a subset of MCP tools for specific teams and workflows, or need tasks that coordinate across systems. It's especially valuable for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Teams requiring centralized authentication and authorization
&lt;/li&gt;
&lt;li&gt;Workflows that should be reusable across the entire team
&lt;/li&gt;
&lt;li&gt;Security policies that need centralized enforcement
&lt;/li&gt;
&lt;li&gt;Reducing onboarding complexity for new engineers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're using a single MCP server for simple one-step operations, you probably don't need vMCP. It's built for managing complexity at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;p&gt;vMCP is available now as part of ToolHive. To try it out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the ToolHive Kubernetes Operator
&lt;/li&gt;
&lt;li&gt;Follow the &lt;a href="https://docs.stacklok.com/toolhive/tutorials/quickstart-vmcp" rel="noopener noreferrer"&gt;vMCP quickstart&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Connect your AI client to the aggregated endpoint&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We'd love to hear how you're using vMCP. What workflows are you building? Which MCP servers are you aggregating? Join the &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;ToolHive community on Discord&lt;/a&gt; and let us know.&lt;/p&gt;

&lt;p&gt;Looking to leverage vMCP within your enterprise organization? &lt;a href="https://calendly.com/stacklok/30min" rel="noopener noreferrer"&gt;Book a demo with us&lt;/a&gt;.  &lt;/p&gt;




&lt;p&gt;&lt;em&gt;ToolHive is an open-source MCP platform focused on security and enterprise operationalization. Learn more at &lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;toolhive.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>toolhive</category>
    </item>
    <item>
      <title>Stacklok's MCP Optimizer vs Anthropic's Tool Search Tool: A Head-to-Head Comparison</title>
      <dc:creator>Alejandro Ponce de León</dc:creator>
      <pubDate>Wed, 10 Dec 2025 15:36:56 +0000</pubDate>
      <link>https://dev.to/stacklok/stackloks-mcp-optimizer-vs-anthropics-tool-search-tool-a-head-to-head-comparison-2f32</link>
      <guid>https://dev.to/stacklok/stackloks-mcp-optimizer-vs-anthropics-tool-search-tool-a-head-to-head-comparison-2f32</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Both solutions tackle the critical problem of token bloat from excessive tool definitions. However, our testing with 2,792 tools reveals a stark performance gap: &lt;strong&gt;Stacklok MCP Optimizer achieves 94% accuracy&lt;/strong&gt; in selecting the right tools, while &lt;strong&gt;Anthropic's Tool Search Tool achieves only 34% accuracy&lt;/strong&gt;. If you're building production AI agents that need reliable tool selection without breaking the bank on tokens, these numbers matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Problem Both Are Solving&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When you connect AI agents to multiple Model Context Protocol (MCP) servers, tool definitions quickly consume massive portions of your context window, often before your actual conversation even begins. &lt;/p&gt;

&lt;p&gt;The reality? Most queries only need a handful of these tools. Loading all of them wastes tokens (read: money) and degrades model performance as the tool count grows.&lt;/p&gt;

&lt;p&gt;Both &lt;a href="https://dev.to/stacklok/cut-token-waste-from-your-ai-workflow-with-the-toolhive-mcp-optimizer-3oo6"&gt;Stacklok MCP Optimizer&lt;/a&gt; (launched October 28, 2025) and &lt;a href="https://www.anthropic.com/engineering/advanced-tool-use" rel="noopener noreferrer"&gt;Anthropic's Tool Search Tool&lt;/a&gt; (launched November 20, 2025 as part of their advanced tool use beta) address this by loading a single search tool that finds and loads only the necessary tools on demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why This Matters: Real Benefits and Trade-offs&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Upside&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Token savings are substantial.&lt;/strong&gt; We've observed up to 80% reductions in input tokens. In their internal testing, Anthropic reports their approach preserves 191,300 tokens of context compared to loading all tools upfront, an 85% reduction. In rate-limited enterprise environments, this translates directly to &lt;a href="https://docs.stacklok.com/toolhive/tutorials/mcp-optimizer" rel="noopener noreferrer"&gt;cost savings and faster response times.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Improved model performance.&lt;/strong&gt; Reducing token overhead doesn't just save money, it can improve model accuracy. Anthropic's internal testing showed substantial improvements with Tool Search Tool enabled: Opus 4 jumped from 49% to 74%, and Opus 4.5 improved from 79.5% to 88.1% on MCP evaluations. However, it's important to note that Anthropic's experiments and datasets are not publicly available, making direct comparisons challenging.&lt;/p&gt;

&lt;p&gt;Our own testing with MCP Optimizer across different model tiers revealed an interesting pattern: while state-of-the-art models like Claude Sonnet 4 maintained strong performance when benchmarking tool selection accuracy (94.6% → 93.4%), mid-tier and smaller models showed significant improvements. Gemini 2.5 Flash increased from 83.2% to 92.4%, and the gpt-oss-20B model nearly doubled its accuracy from 38% to 69.4%. This suggests that efficient tool loading particularly benefits models with tighter context constraints, making MCP Optimizer valuable across different deployment scenarios, from resource-constrained edge deployments to cost-optimized production systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Downside&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Risk of tool retrieval failure.&lt;/strong&gt; The benefits above assume the search tool successfully finds the right tool. But what happens when it doesn't? If the search misses, your task fails or produces unexpected behavior. While the agent can retry searches, this introduces latency and still consumes tokens. The critical question becomes: &lt;em&gt;How often does the search actually work in practice?&lt;/em&gt; This is precisely what our head-to-head comparison measures.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How Each Approach Works&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Both solutions introduce a lightweight search tool, but their algorithms differ significantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stacklok MCP Optimizer&lt;/strong&gt;: Combines semantic search with BM25 for hybrid tool discovery
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Tool Search Tool&lt;/strong&gt;: Offers two variants, BM25-only or regex-based pattern matching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The algorithmic difference has profound implications for real-world performance, as our testing reveals.&lt;/p&gt;
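To make the algorithmic difference concrete, here is a toy hybrid-retrieval sketch: a BM25 keyword score blended with a "semantic" similarity score, ranking tools by the combined result. This is not MCP Optimizer's actual implementation; real systems use embedding models for the semantic half, while a bag-of-words cosine stands in here so the example stays self-contained.

```python
# Toy hybrid retrieval: blend BM25 keyword scoring with a stand-in
# semantic score (bag-of-words cosine) and rank tools by the result.
import math
from collections import Counter

TOOLS = {
    "github_create_pull_request": "open a pull request between two branches",
    "slack_channels_list": "list all channels in a slack workspace",
    "jira_create_issue": "create a new issue in a jira project",
}

def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of `doc` for `query` against `corpus`."""
    q_terms, d_terms = query.split(), doc.split()
    avgdl = sum(len(d.split()) for d in corpus) / len(corpus)
    score = 0.0
    for t in set(q_terms):
        tf = d_terms.count(t)
        df = sum(1 for d in corpus if t in d.split())
        idf = math.log((len(corpus) - df + 0.5) / (df + 0.5) + 1)
        norm = tf + k1 * (1 - b + b * len(d_terms) / avgdl)
        score += idf * tf * (k1 + 1) / norm
    return score

def cosine(query, doc):
    """Bag-of-words cosine similarity (embedding stand-in)."""
    q, d = Counter(query.split()), Counter(doc.split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def hybrid_search(query, tools, alpha=0.5):
    """Rank tool names by a weighted blend of semantic and BM25 scores."""
    corpus = list(tools.values())
    scored = {name: alpha * cosine(query, desc)
                    + (1 - alpha) * bm25_score(query, desc, corpus)
              for name, desc in tools.items()}
    return sorted(scored, key=scored.get, reverse=True)

print(hybrid_search("create a pull request from feature-branch to main", TOOLS)[0])
# → github_create_pull_request
```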

&lt;h2&gt;
  
  
  &lt;strong&gt;The Head-to-Head Comparison&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We conducted a comprehensive evaluation to answer the question: &lt;em&gt;Which approach is more effective?&lt;/em&gt; (&lt;a href="https://github.com/StacklokLabs/mcp-optimizer/pull/148" rel="noopener noreferrer"&gt;Source code and full results&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Test Methodology&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Loaded 2,792 tools from various MCP servers using the &lt;a href="https://github.com/xfey/MCP-Zero?tab=readme-ov-file#dataset-mcp-tools" rel="noopener noreferrer"&gt;MCP-tools dataset&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For each tool, generated a synthetic query using an LLM that would naturally require that specific tool

&lt;ul&gt;
&lt;li&gt;Example: For GitHub's &lt;code&gt;create_pull_request&lt;/code&gt; tool → Generated query: "Create a pull request from feature-branch to main branch in the octocat/Hello-World repository on GitHub"
&lt;/li&gt;
&lt;li&gt;Example: Slack's &lt;code&gt;channels_list&lt;/code&gt; tool → Generated query: "Show me all channels in my Slack workspace"
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Used Claude Sonnet 4.5 to test whether each approach could correctly search and select the original tool that generated the query

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval Accuracy&lt;/strong&gt;: Does the correct tool appear anywhere in the search results returned by the search tool?
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selection Accuracy&lt;/strong&gt;: Is the correct tool actually selected by the model for use?
&lt;/li&gt;
&lt;li&gt;This direct mapping lets us objectively measure retrieval accuracy: we know the ground truth for every query. In the examples above, the correct tools would be GitHub's &lt;code&gt;create_pull_request&lt;/code&gt; and Slack's &lt;code&gt;channels_list&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
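The two metrics from the methodology are straightforward to compute once each test case records the ground-truth tool, the search results, and the model's pick. A minimal sketch (the case data below is made up for illustration):

```python
# Retrieval accuracy: does the ground-truth tool appear anywhere in the
# search results? Selection accuracy: did the model actually pick it?
def retrieval_accuracy(cases):
    hits = sum(1 for c in cases if c["expected"] in c["retrieved"])
    return hits / len(cases)

def selection_accuracy(cases):
    hits = sum(1 for c in cases if c["selected"] == c["expected"])
    return hits / len(cases)

cases = [
    {"expected": "github_create_pull_request",
     "retrieved": ["github_create_pull_request", "jira_create_issue"],
     "selected": "github_create_pull_request"},
    {"expected": "slack_channels_list",
     "retrieved": ["slack_post_message"],   # retrieval miss, so the
     "selected": "slack_post_message"},     # model cannot select correctly
]

print(retrieval_accuracy(cases), selection_accuracy(cases))  # → 0.5 0.5
```

Note how the second case shows why retrieval accuracy bounds selection accuracy: if the correct tool never appears in the results, no model can select it.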

&lt;h3&gt;
  
  
  &lt;strong&gt;Results&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnshjekx84tc9k5yps2a6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnshjekx84tc9k5yps2a6.png" alt="Accuracy comparison chart showing MCP Optimizer at 93.95% selection accuracy and 98.03% retrieval accuracy versus Tool Search Tool at 33.70%/47.85% (BM25) and 30.01%/39.00% (regex)" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The stark difference in selection accuracy between approaches primarily reflects retrieval effectiveness rather than model performance. Since all approaches used the same model (Claude Sonnet 4.5) for tool selection, the 94% vs 34% accuracy gap stems from MCP Optimizer's superior retrieval accuracy (98% vs 48%). Put simply: if the correct tool doesn't appear in the search results, even the best model cannot select it. MCP Optimizer's hybrid semantic + BM25 search successfully surfaces the correct tool in 98% of cases, giving the model the opportunity to make the right selection. In contrast, Tool Search Tool's lower retrieval rates mean the model often never sees the correct tool among its options.&lt;/p&gt;

&lt;p&gt;These results align with independent testing from other organizations. &lt;a href="https://blog.arcade.dev/anthropic-tool-search-4000-tools-test" rel="noopener noreferrer"&gt;Arcade reported&lt;/a&gt; that Anthropic's Tool Search achieved only 56% retrieval accuracy with regex and 64% with BM25 across 4,027 tools.&lt;/p&gt;

&lt;h4&gt;
  
  
  Runtime Performance Characteristics
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Average execution time&lt;/th&gt;
&lt;th&gt;Average tools retrieved&lt;/th&gt;
&lt;th&gt;Average input tokens*&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP Optimizer&lt;/td&gt;
&lt;td&gt;5.75 seconds&lt;/td&gt;
&lt;td&gt;5.2&lt;/td&gt;
&lt;td&gt;3296&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool Search Tool (BM25)&lt;/td&gt;
&lt;td&gt;12.05 seconds&lt;/td&gt;
&lt;td&gt;5.0&lt;/td&gt;
&lt;td&gt;2823&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool Search Tool (regex)&lt;/td&gt;
&lt;td&gt;13.55 seconds&lt;/td&gt;
&lt;td&gt;5.2&lt;/td&gt;
&lt;td&gt;3679&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;* &lt;em&gt;Average Input Tokens: The total number of tokens sent to the model per request, including system prompt, tool definitions, and user query.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Beyond accuracy, the operational characteristics of each approach reveal important trade-offs. Tool Search Tool (BM25) achieves the lowest token consumption at 2,823 tokens per request, which likely stems from retrieving slightly fewer tools on average (5.0 vs 5.2). However, MCP Optimizer's token count of 3,296 still represents substantial savings compared to attempting to load all 2,792 tools upfront, which would require 206,073 tokens and cause an error due to context window limitations.&lt;/p&gt;
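The savings claim is easy to verify from the numbers above:

```python
# Token savings of MCP Optimizer's per-request footprint versus
# loading all 2,792 tool definitions upfront (figures from the table
# and paragraph above).
all_tools_tokens = 206_073   # all tools loaded upfront
optimizer_tokens = 3_296     # MCP Optimizer average per request
savings = 1 - optimizer_tokens / all_tools_tokens
print(f"{savings:.1%}")  # → 98.4%
```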

&lt;p&gt;The execution time differences are noteworthy: MCP Optimizer completes searches in 5.75 seconds on average, while Tool Search Tool takes 12.05 (BM25) and 13.55 seconds (regex). However, this comparison requires context. MCP Optimizer was executed locally in our test environment, while Tool Search Tool operates as an internal Anthropic service with unknown infrastructure requirements and potential network latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  What This Means
&lt;/h3&gt;

&lt;p&gt;The numbers tell a clear story: &lt;strong&gt;MCP Optimizer consistently finds the correct tool 94% of the time&lt;/strong&gt;, while Tool Search Tool's accuracy hovers around 30-34% in environments with thousands of tools. For production systems where reliability and performance matter, this gap is significant.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Verdict&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Anthropic's Tool Search Tool correctly identifies a real problem facing production AI deployments. The concept of on-demand tool loading is sound, and the token savings are genuine. However, &lt;strong&gt;the current implementation isn't production-ready&lt;/strong&gt; for environments with large tool catalogs. Limited to Claude Sonnet 4.5 and Opus 4.5, it remains a proprietary solution exclusive to Anthropic's ecosystem.&lt;/p&gt;

&lt;p&gt;MCP Optimizer, on the other hand, delivers on the promise: reliable tool selection (94% accuracy) combined with significant token savings. Built into the ToolHive runtime as a free and open-source solution, it seamlessly integrates with all major &lt;a href="https://docs.stacklok.com/toolhive/reference/client-compatibility" rel="noopener noreferrer"&gt;AI clients&lt;/a&gt; including Claude Code, GitHub Copilot, Cursor, and others, providing vendor flexibility and broader compatibility across different AI platforms. For teams building AI agents that need to work consistently across hundreds or thousands of tools, this performance difference and deployment flexibility are critical.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Looking Forward&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The future of AI agents depends on solving context window constraints without sacrificing reliability.  For that future to arrive, we need tool selection systems that work reliably. MCP Optimizer proves that hybrid semantic + keyword search can deliver both token efficiency and production-grade accuracy. As Anthropic's Tool Search Tool matures beyond beta, we hope to see similar reliability gains.&lt;/p&gt;

&lt;p&gt;For now, if you're deploying AI agents in production and need dependable tool selection across extensive tool catalogs, the data points to MCP Optimizer as the more reliable choice.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Interested in learning more about MCP Optimizer? Check out the &lt;a href="https://docs.stacklok.com/toolhive/tutorials/mcp-optimizer" rel="noopener noreferrer"&gt;ToolHive documentation&lt;/a&gt; or visit &lt;a href="https://stacklok.com/" rel="noopener noreferrer"&gt;stacklok.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>anthropic</category>
      <category>stacklok</category>
    </item>
    <item>
      <title>Deploying an Okta-Authenticated BigQuery MCP Server on Kubernetes with ToolHive</title>
      <dc:creator>Yolanda Robla Mota</dc:creator>
      <pubDate>Wed, 19 Nov 2025 09:56:11 +0000</pubDate>
      <link>https://dev.to/stacklok/deploying-an-okta-authenticated-bigquery-mcp-server-on-kubernetes-with-toolhive-cf5</link>
      <guid>https://dev.to/stacklok/deploying-an-okta-authenticated-bigquery-mcp-server-on-kubernetes-with-toolhive-cf5</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/stacklok/how-to-use-okta-to-remotely-authenticate-to-your-bigquery-mcp-server-5a35"&gt;previous article&lt;/a&gt;, I showed how to connect Okta authentication to a BigQuery MCP server running locally. The objective was to build a workflow that was secure (with user-level attribution and least privilege roles), short-lived, and that would save you the pain of managing Google service-account keys. That setup worked perfectly for local development, but it wasn’t something I’d confidently hand off to production.&lt;br&gt;
This time, we’ll take that local prototype and transform it into a production-ready, cloud-native deployment running on Kubernetes, secured by Okta, and managed end-to-end by the &lt;strong&gt;ToolHive Operator&lt;/strong&gt;. We’ll even make it accessible remotely through &lt;strong&gt;ngrok&lt;/strong&gt;, so you can connect to it from anywhere using VS Code.&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting the Stage
&lt;/h2&gt;

&lt;p&gt;Before diving in, let’s make sure we have the right pieces in place. You’ll need a Kubernetes cluster (I’ll be using &lt;em&gt;kind&lt;/em&gt; for simplicity), along with &lt;em&gt;kubectl&lt;/em&gt; and &lt;em&gt;helm&lt;/em&gt;. You’ll also need an Okta account with an authorization server configured, and a Google Cloud project with BigQuery enabled.&lt;br&gt;
If you haven’t already, set up &lt;strong&gt;Workload Identity Federation&lt;/strong&gt; in your Google Cloud project. That’s what allows Google Cloud to trust Okta tokens and issue temporary credentials for BigQuery access.&lt;br&gt;
Finally, install the &lt;strong&gt;ToolHive CLI&lt;/strong&gt; (&lt;em&gt;thv&lt;/em&gt;) and sign up for an &lt;strong&gt;ngrok&lt;/strong&gt; account — we’ll use both to expose your service later on.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deploying the ToolHive Operator
&lt;/h2&gt;

&lt;p&gt;Let’s start by getting the ToolHive Operator running in our cluster. The operator is what manages the lifecycle of MCP servers — it handles the pods, proxies, authentication, and updates automatically.&lt;br&gt;
I’m using &lt;em&gt;kind&lt;/em&gt; to create a local cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kind create cluster --name toolhive
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, install the ToolHive CRDs and the operator itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade --install toolhive-operator-crds \
  oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds

helm upgrade --install toolhive-operator \
  oci://ghcr.io/stacklok/toolhive/toolhive-operator \
  --namespace toolhive-system --create-namespace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A quick check confirms the operator is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -n toolhive-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;toolhive-operator-7875c8c5cd-xxxxx   1/1     Running   0   30s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that, our cluster is ready to start managing MCP servers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storing the Okta Secret
&lt;/h2&gt;

&lt;p&gt;The next step is to give ToolHive access to your Okta client secret. This allows the proxy to validate incoming tokens. Instead of hardcoding secrets, Kubernetes encourages us to store them in a dedicated Secret resource.&lt;br&gt;
Here’s the YAML to create one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Secret
metadata:
  name: okta-client-secret
  namespace: default
type: Opaque
stringData:
  client-secret: &amp;lt;YOUR_OKTA_CLIENT_SECRET&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save that as &lt;em&gt;00-okta-client-secret.yaml&lt;/em&gt; and apply it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f 00-okta-client-secret.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting Up Token Exchange
&lt;/h2&gt;

&lt;p&gt;To allow Okta to exchange its tokens for Google Cloud credentials, we’ll define an &lt;em&gt;MCPExternalAuthConfig&lt;/em&gt; resource. This tells ToolHive how to talk to Google’s Security Token Service (STS) and request access tokens for BigQuery.&lt;br&gt;
Here’s the config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPExternalAuthConfig
metadata:
  name: bigquery-token-exchange
  namespace: default
spec:
  type: tokenExchange
  tokenExchange:
    tokenUrl: https://sts.googleapis.com/v1/token
    audience: //iam.googleapis.com/projects/&amp;lt;YOUR_PROJECT_NUMBER&amp;gt;/locations/global/workloadIdentityPools/okta-pool/providers/okta-provider
    subjectTokenType: id_token
    scopes:
      - https://www.googleapis.com/auth/bigquery
      - https://www.googleapis.com/auth/cloud-platform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f 01-external-auth-config.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configuration acts as a bridge between Okta and Google Cloud, handling the secure exchange behind the scenes.&lt;/p&gt;
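Under the hood, this is an RFC 8693 token exchange against Google's STS endpoint. The sketch below builds the request payload without sending it; the field names follow RFC 8693 and Google's STS API, the audience mirrors the MCPExternalAuthConfig above, and OKTA_ID_TOKEN stands in for a real Okta-issued ID token.

```python
# Sketch of the token exchange the proxy performs: an Okta ID token is
# traded for a short-lived Google access token via the STS endpoint.
STS_TOKEN_URL = "https://sts.googleapis.com/v1/token"

def build_sts_request(okta_id_token, project_number):
    """Build the RFC 8693 token-exchange form payload for Google STS."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": (
            f"//iam.googleapis.com/projects/{project_number}"
            "/locations/global/workloadIdentityPools/okta-pool"
            "/providers/okta-provider"
        ),
        "subject_token": okta_id_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:id_token",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "scope": "https://www.googleapis.com/auth/bigquery",
    }

payload = build_sts_request("OKTA_ID_TOKEN", "123456789")
print(payload["grant_type"])
```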

&lt;h2&gt;
  
  
  Deploying the BigQuery MCP Server
&lt;/h2&gt;

&lt;p&gt;Now we can create the MCP server that will connect VS Code to BigQuery. This configuration ties together the image, authentication, and proxy.&lt;br&gt;
The MCP server needs a public endpoint to use as its resource URL. For that, we can use a service like ngrok: configure a domain in the &lt;a href="https://dashboard.ngrok.com/domains" rel="noopener noreferrer"&gt;ngrok dashboard&lt;/a&gt;, or note your automatically generated “dev domain” if you’re on a free account. Set that domain in the custom resource, along with the other placeholder values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: database-toolbox-bigquery
  namespace: default
spec:
  image: us-central1-docker.pkg.dev/database-toolbox/toolbox/toolbox:0.19.1
  env:
    - name: BIGQUERY_PROJECT
      value: &amp;lt;YOUR_GCP_PROJECT_ID&amp;gt;
    - name: BIGQUERY_USE_CLIENT_OAUTH
      value: "true"

  args:
    - --prebuilt
    - bigquery
    - --address
    - 0.0.0.0

  transport: streamable-http
  proxyPort: 8000
  mcpPort: 5000

  oidcConfig:
    type: inline
    resourceUrl: https://&amp;lt;YOUR_NGROK_DOMAIN&amp;gt;.ngrok-free.app/mcp   # Replace with your ngrok URL
    inline:
      issuer: https://&amp;lt;YOUR_OKTA_DOMAIN&amp;gt;.okta.com/oauth2/&amp;lt;YOUR_AUTH_SERVER_ID&amp;gt;
      audience: //iam.googleapis.com/projects/&amp;lt;YOUR_PROJECT_NUMBER&amp;gt;/locations/global/workloadIdentityPools/okta-pool/providers/okta-provider
      clientId: &amp;lt;YOUR_OKTA_CLIENT_ID&amp;gt;
      clientSecretRef:
        name: okta-client-secret
        key: client-secret

  externalAuthConfigRef:
    name: bigquery-token-exchange

  resources:
    limits:
      cpu: "1"
      memory: "512Mi"
    requests:
      cpu: "100m"
      memory: "128Mi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f 02-mcp-server-bigquery.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kubernetes will create two pods: one running the MCP server, and another running the ToolHive proxy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exposing the Service Publicly
&lt;/h2&gt;

&lt;p&gt;Once the MCP server is running, we can expose it publicly so it’s reachable by authentication endpoints and clients. We’ll forward the service locally, create a tunnel through ngrok using ToolHive’s built-in support, and grab that domain before proceeding.&lt;br&gt;
Start by forwarding the proxy service locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n default svc/database-toolbox-bigquery-proxy-svc 8000:8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the MCP proxy accessible at &lt;a href="http://127.0.0.1:8000" rel="noopener noreferrer"&gt;http://127.0.0.1:8000&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now, use the ToolHive CLI to open a secure tunnel with ngrok:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;thv proxy tunnel http://127.0.0.1:8000 tunnel \
  --tunnel-provider ngrok \
  --provider-args '{"auth-token": "&amp;lt;YOUR_NGROK_AUTH_TOKEN&amp;gt;", "url": "https://&amp;lt;YOUR_NGROK_DOMAIN&amp;gt;.ngrok-free.app"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ToolHive will create the tunnel and print a line like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✔ Tunnel created
Public URL: https://&amp;lt;YOUR_NGROK_DOMAIN&amp;gt;.ngrok-free.app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want more background on this tunneling feature, the ToolHive team has a nice write-up: &lt;a href="https://dev.to/stacklok/exposing-a-kubernetes-hosted-mcp-server-with-toolhive-ngrok-with-basic-auth-23kn"&gt;Exposing a Kubernetes-Hosted MCP Server with ToolHive + ngrok (with Basic Auth)&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Verifying the Deployment
&lt;/h2&gt;

&lt;p&gt;After a few moments, confirm everything’s running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -n default -l toolhive-name=database-toolbox-bigquery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see two pods in the “Running” state — one for the server, one for the proxy.&lt;br&gt;
If you’d like to peek under the hood, tail the proxy logs to see the authentication and token exchange process in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl logs -n default -l app.kubernetes.io/instance=database-toolbox-bigquery-proxy --tail=50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see debug lines referencing token validation and the STS endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect from VS Code
&lt;/h2&gt;

&lt;p&gt;Once your MCP server is running, secured, and exposed via your public ngrok URL (for example: &lt;em&gt;&lt;a href="https://abc123.ngrok-free.app/mcp" rel="noopener noreferrer"&gt;https://abc123.ngrok-free.app/mcp&lt;/a&gt;&lt;/em&gt;), you’ll use VS Code’s MCP support to connect.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Open VS Code. Make sure you have the MCP / Copilot Chat extension installed and enabled.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open the Command Palette (&lt;em&gt;Ctrl+Shift+P or ⌘+Shift+P&lt;/em&gt;) and run “&lt;strong&gt;MCP: Add Server&lt;/strong&gt;” (or you can open the &lt;em&gt;mcp.json&lt;/em&gt; configuration manually).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When prompted, enter a JSON configuration like this:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "servers": {
    "toolbox": {
      "url": "https://&amp;lt;YOUR_NGROK_DOMAIN&amp;gt;.ngrok-free.app/mcp",
      "type": "http"
    }
  },
  "inputs": []
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "type": "http" indicates you’re connecting over HTTP transport.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;After saving this config, VS Code will attempt to connect to the MCP server. During this process it will prompt you to enter the &lt;strong&gt;Client ID&lt;/strong&gt; and the &lt;strong&gt;Client Secret&lt;/strong&gt; from your Okta app.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;These credentials allow VS Code to authenticate and authorize with the server according to the MCP/OIDC handshake.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the authentication completes, the server will appear in your MCP server list. You can open the Chat view, select the MCP tools (e.g., &lt;em&gt;query_bigquery, list_datasets&lt;/em&gt;, etc.), and issue queries or commands as needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Try a test query to confirm everything is working:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppvjr181vjwpzub099tn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppvjr181vjwpzub099tn.png" alt="BigQuery with VSCode" width="512" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;We’ve come a long way from a local Okta-authenticated server to a fully managed, cloud-ready Kubernetes deployment. Now you have a &lt;strong&gt;secure, scalable, and remote-accessible&lt;/strong&gt; BigQuery MCP server managed entirely by ToolHive.&lt;br&gt;
This setup combines Okta’s identity management, Google Cloud’s token exchange, and Kubernetes automation into a single cohesive workflow. The result is a developer-friendly environment that’s easy to scale and safe to expose beyond your local machine.&lt;br&gt;
If you’re interested in exploring further, join the &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;ToolHive Discord community&lt;/a&gt; to share what you’ve built. The possibilities with ToolHive, Okta, and Kubernetes together are just getting started.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>security</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>How to use Okta to remotely authenticate to your BigQuery MCP Server</title>
      <dc:creator>Yolanda Robla Mota</dc:creator>
      <pubDate>Thu, 06 Nov 2025 12:07:50 +0000</pubDate>
      <link>https://dev.to/stacklok/how-to-use-okta-to-remotely-authenticate-to-your-bigquery-mcp-server-5a35</link>
      <guid>https://dev.to/stacklok/how-to-use-okta-to-remotely-authenticate-to-your-bigquery-mcp-server-5a35</guid>
      <description>&lt;p&gt;This article builds on our &lt;a href="https://dev.to/stacklok/beyond-api-keys-token-exchange-identity-federation-mcp-servers-5dm8"&gt;previous post&lt;/a&gt;, where we explored the high-level architecture of token exchange, identity federation, and how to run MCP servers in a secure and IdP-agnostic way. Now we shift into the &lt;strong&gt;hands-on phase&lt;/strong&gt;: how to use ToolHive to enable an MCP server to query Google BigQuery for users authenticated via Okta. While we use Okta and Google Cloud as the example stack, this flow is adaptable to any IdP and any cloud provider with a compatible STS / federation service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario overview
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You run an MCP server that receives requests from users who are authenticated via Okta.&lt;/li&gt;
&lt;li&gt;The MCP server must execute queries in Google Cloud BigQuery.&lt;/li&gt;
&lt;li&gt;You don’t want to manage Google service-account keys, embed JSON credentials in config, or lose per-user audit.&lt;/li&gt;
&lt;li&gt;You want: user-level attribution, least-privilege roles, secure, short-lived access, and federation between Okta and Google Cloud.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this example, we’re implementing the IdP federation approach described as scenario “B” in the previous blog post. The diagram below shows how ToolHive, Okta, and Google Cloud interact in this flow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0xcpaqqelbznbpgu9gw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0xcpaqqelbznbpgu9gw.png" alt="IDP federation diagram" width="512" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you start, make sure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Okta admin access&lt;/strong&gt;: You’ll need permissions to create an OIDC app and an authorization server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Google Cloud project&lt;/strong&gt;: With BigQuery enabled and permissions to create a Workforce Identity Pool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ToolHive CLI&lt;/strong&gt;: &lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;download it from toolhive.dev&lt;/a&gt; and confirm it’s in your system path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container runtime&lt;/strong&gt;: Docker, Podman, or Rancher Desktop are supported.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An MCP client&lt;/strong&gt; such as Claude Code (or any other client supporting the MCP protocol).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Detailed configuration steps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Configure Okta as Identity Provider
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;In the Okta Admin Console, navigate to &lt;strong&gt;Applications → Applications&lt;/strong&gt; and click &lt;strong&gt;Create App Integration&lt;/strong&gt;. See &lt;a href="https://help.okta.com/en-us/content/topics/apps/apps_app_integration_wizard_oidc.htm" rel="noopener noreferrer"&gt;https://help.okta.com/en-us/content/topics/apps/apps_app_integration_wizard_oidc.htm
&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Choose &lt;strong&gt;OIDC – OpenID Connect&lt;/strong&gt; and then &lt;strong&gt;Web Application&lt;/strong&gt; for the app type.&lt;/li&gt;
&lt;li&gt;Configure the &lt;strong&gt;sign-in redirect URI&lt;/strong&gt; to &lt;a href="http://localhost:8666/callback" rel="noopener noreferrer"&gt;http://localhost:8666/callback&lt;/a&gt; (this is the callback needed for the MCP server that we will run later using ToolHive).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;IMPORTANT: Note the client ID and client secret; you’ll need them in later steps.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtr0jgbe98fiexu72szt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtr0jgbe98fiexu72szt.png" alt="Okta client" width="800" height="880"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create an Authorization Server in Okta
&lt;/h3&gt;

&lt;p&gt;Your OIDC app issues tokens via an Authorization Server. For the Workforce Federation and token exchange, you need one configured correctly.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Okta Admin Console, navigate to &lt;strong&gt;Security → API → Authorization Servers&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click Add Authorization Server.&lt;/li&gt;
&lt;li&gt;Name: &lt;strong&gt;BigQuery MCP Server&lt;/strong&gt; (or any descriptive name)&lt;/li&gt;
&lt;li&gt;Audience: set this to match the audience expected by your MCP server configuration (for example, &lt;strong&gt;mcpserver&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Click Save.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure an additional &lt;strong&gt;gcp.access&lt;/strong&gt; scope:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffnc2ioab9kfw90o0sr8o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffnc2ioab9kfw90o0sr8o.png" alt="Okta scopes" width="800" height="677"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And the access policies for the types of tokens to generate, including Token Exchange:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7f25dmq25vyxdku6815l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7f25dmq25vyxdku6815l.png" alt="Okta rules" width="800" height="1452"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With this setup, Okta will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Issue standards-compliant OIDC tokens to your MCP server through ToolHive.&lt;/li&gt;
&lt;li&gt;Include the claims Google Cloud expects during the token exchange.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;IMPORTANT: Note the issuer URL for the Authorization Server; you’ll need it in the next steps.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Create Workforce Identity Pool in Google Cloud
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;In the Google Cloud console, create a &lt;strong&gt;Workforce Identity Pool&lt;/strong&gt; and a matching provider, using the Issuer URL you noted in the previous step:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzqku8asoiz3nmawsfyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzqku8asoiz3nmawsfyc.png" alt="Workforce identity pool" width="800" height="717"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define custom audiences. The Okta client ID needs to be passed as an audience, so start by copying the default audience. Then select &lt;strong&gt;Allowed audiences&lt;/strong&gt;, add the default value, and include your Okta client ID as well.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F366pso8eb4y1lckuthdu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F366pso8eb4y1lckuthdu.png" alt="Allowed audiences" width="800" height="708"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Configure permissions for the Okta user so they can read BigQuery data. Repeat this for each user you want to map:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud projects add-iam-policy-binding &amp;lt;PROJECT_NAME&amp;gt; \
--member="principalSet://iam.googleapis.com/projects/&amp;lt;PROJECT_ID&amp;gt;/locations/global/workloadIdentityPools/okta-pool/attribute.email/&amp;lt;MAPPED_OKTA_EMAIL&amp;gt;" \
--role="roles/bigquery.dataViewer"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Deploy MCP server + proxy with remote authentication via ToolHive
&lt;/h3&gt;

&lt;p&gt;In this step, we bring together the MCP server and the remote authentication/federation flow. Using ToolHive, we’ll run the server and wrap it with a proxy that handles user authentication with Okta and token exchange into Google Cloud.&lt;/p&gt;

&lt;p&gt;Start by creating a group. ToolHive automatically manages clients registered to your default group, adding or removing MCP servers as you run them. Since this server will sit behind an authenticated proxy, we don’t want that auto-configuration behavior, so we’ll create a separate group for it instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;thv group create toolbox-group
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then start the open source &lt;a href="https://github.com/googleapis/genai-toolbox" rel="noopener noreferrer"&gt;MCP Toolbox for Databases&lt;/a&gt; server using the ToolHive CLI. ToolHive automatically pulls the server image using metadata from the ToolHive registry. You can view details about the image with &lt;code&gt;thv registry info database-toolbox&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;thv run --group toolbox-group database-toolbox \
--env BIGQUERY_PROJECT=&amp;lt;YOUR_PROJECT_ID&amp;gt; \
--env BIGQUERY_USE_CLIENT_OAUTH=true \
--proxy-port 6000 \
-- --prebuilt bigquery --address 0.0.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s what each parameter does:&lt;br&gt;
&lt;strong&gt;--group toolbox-group&lt;/strong&gt;: Name of the ToolHive group that the MCP server belongs to&lt;br&gt;
&lt;strong&gt;database-toolbox&lt;/strong&gt;: The MCP server image from the ToolHive registry&lt;br&gt;
&lt;strong&gt;--env BIGQUERY_PROJECT&lt;/strong&gt;: Your Google Cloud project ID containing BigQuery resources&lt;br&gt;
&lt;strong&gt;--env BIGQUERY_USE_CLIENT_OAUTH=true&lt;/strong&gt;: Use the OAuth flow instead of static service account credentials&lt;br&gt;
&lt;strong&gt;--proxy-port&lt;/strong&gt;: Port exposed on your host for the containerized MCP server&lt;br&gt;
&lt;strong&gt;--&lt;/strong&gt;: CLI arguments passed into the MCP server&lt;br&gt;
&lt;strong&gt;--prebuilt bigquery&lt;/strong&gt;: Use the prebuilt configuration for BigQuery&lt;br&gt;
&lt;strong&gt;--address 0.0.0.0&lt;/strong&gt;: Bind the server to all network interfaces so the proxy can reach it&lt;/p&gt;

&lt;p&gt;ToolHive spins up the MCP server container and HTTP proxy process, ready to handle BigQuery queries using the MCP protocol. Using &lt;a href="http://toolhive.dev" rel="noopener noreferrer"&gt;ToolHive&lt;/a&gt; ensures the server is containerized, isolated, and managed securely — avoiding the “run-it-manually” friction.&lt;/p&gt;

&lt;p&gt;Next, the &lt;code&gt;thv proxy&lt;/code&gt; command starts a proxy process that sits in front of the MCP server and handles all incoming requests. It prompts you to sign in with Okta, exchanges your Okta token for a Google Cloud access token, and then forwards your request to the MCP server using that token.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;thv proxy \
  --target-uri http://127.0.0.1:6000 \
  --remote-auth-client-id &amp;lt;OKTA_CLIENT_ID&amp;gt; \
  --remote-auth-client-secret &amp;lt;OKTA_CLIENT_SECRET&amp;gt; \
  --remote-auth okta \
  --remote-auth-issuer &amp;lt;AUTHORIZATION_SERVER_URL&amp;gt; \
  --remote-auth-callback-port 8666 \
  --remote-auth-scopes 'openid,profile,email,gcp.access' \
  --port 62614 \
  --token-exchange-url https://sts.googleapis.com/v1/token \
  --token-exchange-scopes 'https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/cloud-platform' \
  --token-exchange-audience //iam.googleapis.com/projects/&amp;lt;GOOGLE_PROJECT_NUMBER&amp;gt;/locations/global/workloadIdentityPools/okta-pool/providers/okta-provider
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s what each flag does:&lt;br&gt;
&lt;strong&gt;--target-uri&lt;/strong&gt;: Points to the MCP server’s proxy port (from the previous step)&lt;br&gt;
&lt;strong&gt;--remote-auth-client-id&lt;/strong&gt;: Client ID of your Okta app (from step 1)&lt;br&gt;
&lt;strong&gt;--remote-auth-client-secret&lt;/strong&gt;: Client secret of your Okta app (from step 1)&lt;br&gt;
&lt;strong&gt;--remote-auth okta&lt;/strong&gt;: Specifies the remote auth provider&lt;br&gt;
&lt;strong&gt;--remote-auth-issuer&lt;/strong&gt;: URL of the Okta authorization server’s issuer (from step 2)&lt;br&gt;
&lt;strong&gt;--remote-auth-callback-port&lt;/strong&gt;: Local port used for the OAuth callback (must match the callback URL used in step 1)&lt;br&gt;
&lt;strong&gt;--remote-auth-scopes&lt;/strong&gt;: Scopes requested from Okta during authentication&lt;br&gt;
&lt;strong&gt;--port&lt;/strong&gt;: Port the ToolHive proxy exposes to clients&lt;br&gt;
&lt;strong&gt;--token-exchange-url&lt;/strong&gt;: Google STS endpoint for exchanging tokens&lt;br&gt;
&lt;strong&gt;--token-exchange-scopes&lt;/strong&gt;: Google Cloud scopes required to access BigQuery and related APIs&lt;br&gt;
&lt;strong&gt;--token-exchange-audience&lt;/strong&gt;: Google Workload Identity Pool audience for Okta federation&lt;/p&gt;

&lt;p&gt;When your browser opens, sign in with Okta. The proxy uses your Okta credentials to generate ID tokens, exchange them for valid Google tokens with the right scopes, and then continues the request automatically.&lt;/p&gt;
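&lt;p&gt;Under the hood, that last hop is an &lt;a href="https://datatracker.ietf.org/doc/html/rfc8693" rel="noopener noreferrer"&gt;RFC 8693&lt;/a&gt; token exchange against the Google STS endpoint. The sketch below builds the kind of form body such a request carries, mirroring the &lt;code&gt;--token-exchange-*&lt;/code&gt; flags above. The field names come from RFC 8693, but the exact token-type URNs ToolHive sends are an assumption here, so treat this as an illustration rather than ToolHive’s literal request:&lt;/p&gt;

```python
# Illustrative sketch of an RFC 8693 token-exchange request body, mirroring
# the --token-exchange-* flags above. The ToolHive proxy builds and sends
# this for you; the token-type URNs shown are assumptions for illustration.
STS_URL = "https://sts.googleapis.com/v1/token"  # --token-exchange-url

def build_token_exchange_body(subject_token, audience, scopes):
    """Assemble the form fields for an RFC 8693 token-exchange POST."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,  # the Okta-issued token
        "subject_token_type": "urn:ietf:params:oauth:token-type:id_token",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "audience": audience,            # --token-exchange-audience
        "scope": " ".join(scopes),       # --token-exchange-scopes
    }

body = build_token_exchange_body(
    "<OKTA_TOKEN>",
    "//iam.googleapis.com/projects/<GOOGLE_PROJECT_NUMBER>/locations/global"
    "/workloadIdentityPools/okta-pool/providers/okta-provider",
    ["https://www.googleapis.com/auth/bigquery",
     "https://www.googleapis.com/auth/cloud-platform"],
)
```

&lt;p&gt;The STS response contains a short-lived Google access token, which the proxy attaches to the request it forwards to the MCP server.&lt;/p&gt;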
&lt;h3&gt;
  
  
  Step 5: Run the MCP server with Claude or another client
&lt;/h3&gt;

&lt;p&gt;Let’s use Claude Code as an example. Because ToolHive doesn’t automatically manage client configurations for proxied MCP servers, you’ll need to add it manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Add the authenticated ToolHive proxy
claude mcp add --scope user --transport http database-toolbox http://127.0.0.1:62614/mcp

# Run Claude Code
claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Toolbox MCP server uses the token provided by the ToolHive proxy and passes it to Google Cloud, giving you access to the resources available to your account.&lt;/p&gt;

&lt;p&gt;Any other MCP-compatible client can connect the same way. Just point it to the ToolHive proxy endpoint.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2lcbdunl8b64c8skcip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2lcbdunl8b64c8skcip.png" alt="Claude and MCP" width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this architecture is powerful
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple for clients&lt;/strong&gt;: Apps connect to the ToolHive proxy just like any other MCP server endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure authentication flow&lt;/strong&gt;: The proxy makes you log in through Okta, so every request carries a verified user identity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Federated access to Google Cloud&lt;/strong&gt;: Instead of embedding service account keys in your server, the proxy handles a token exchange so Google recognizes your identity through the workforce identity provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Least-privilege and auditable&lt;/strong&gt;: BigQuery jobs run under your federated Okta identity, so logs show “&lt;a href="mailto:user@domain.com"&gt;user@domain.com&lt;/a&gt; ran a BigQuery job” rather than “service-account X”.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separation of concerns&lt;/strong&gt;: The MCP server (Toolbox) focuses on data tools and queries, while the proxy handles auth, token exchange, and routing. It’s a cleaner, safer architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, it’s easy to get started with ToolHive, since it’s free and open source. I encourage you to visit &lt;a href="https://toolhive.dev/" rel="noopener noreferrer"&gt;toolhive.dev&lt;/a&gt;, where you can download the project and explore our docs.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>security</category>
      <category>devops</category>
    </item>
    <item>
      <title>Using Token Exchange with ToolHive and Okta for MCP Server to GraphQL Authentication</title>
      <dc:creator>Yolanda Robla Mota</dc:creator>
      <pubDate>Tue, 04 Nov 2025 16:37:21 +0000</pubDate>
      <link>https://dev.to/stacklok/using-token-exchange-with-toolhive-and-okta-for-mcp-server-to-graphql-authentication-3ehi</link>
      <guid>https://dev.to/stacklok/using-token-exchange-with-toolhive-and-okta-for-mcp-server-to-graphql-authentication-3ehi</guid>
      <description>&lt;p&gt;This article builds on our &lt;a href="https://dev.to/stacklok/beyond-api-keys-token-exchange-identity-federation-mcp-servers-5dm8"&gt;previous post&lt;/a&gt;, where we introduced the core concepts of token exchange and its role in secure authentication. Here, we delve into a practical application, demonstrating how to leverage Okta and ToolHive to facilitate token exchange for authenticating an MCP server with a GraphQL API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Environment
&lt;/h2&gt;

&lt;p&gt;This demo mimics a (hopefully!) real-world example where we run an API service and want to expose it through an MCP server. The back end API requires a token with &lt;em&gt;aud=backend&lt;/em&gt; and &lt;em&gt;scopes=[backend-api:read]&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;"Aud" (audience) in a token specifies the intended recipient of the token, indicating which service or application is meant to consume it. "Scopes" define the specific permissions or access rights granted by the token, detailing what actions the token holder is authorized to perform. Only tokens having the expected audience and the expected scopes authorize the caller to use the service.&lt;/p&gt;

&lt;p&gt;We don’t want to expose the back end service directly to the AI client, but only through the MCP server. We also want to maintain a clean audit trail showing us who accessed what.&lt;/p&gt;

&lt;p&gt;The MCP server requires a token with &lt;em&gt;aud=mcpserver&lt;/em&gt; and &lt;em&gt;scopes=mcp:tools:call&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;Both the API service and the MCP server are part of the same Okta realm, but we’ll use different Authorization Servers to ensure that the token the MCP server receives and the token the back end receives use different audiences.&lt;/p&gt;

&lt;p&gt;We’ll simulate the whole flow as a developer connecting to this setup by adding the MCP server to VSCode and calling the tools it provides.&lt;/p&gt;

&lt;p&gt;It should be noted that in this example, we’ll be using an &lt;a href="https://www.apollographql.com/docs" rel="noopener noreferrer"&gt;Apollo&lt;/a&gt;-based GraphQL service as the backend API service and the existing &lt;a href="https://www.apollographql.com/docs/apollo-mcp-server" rel="noopener noreferrer"&gt;Apollo MCP server&lt;/a&gt;, but the same setup applies to any kind of API services as long as they both use OAuth tokens from the same realm as the authentication mechanism. &lt;/p&gt;

&lt;p&gt;In order to follow along, you can clone the Apollo GraphQL service from &lt;a href="https://github.com/StacklokLabs/apollo-mcp-auth-demo" rel="noopener noreferrer"&gt;a demo repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Okta setup
&lt;/h2&gt;

&lt;p&gt;I’ve used the Okta integrator setup to prepare this demo, so the instructions cover the whole setup from the ground up, including creating the Authorization Servers. In a real-world environment, some of these steps may be unnecessary or need adjustment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Authorization Servers
&lt;/h3&gt;

&lt;p&gt;To logically separate the MCP server from the back end API service, we’ll configure two Okta Authorization Servers: one for the MCP server and client, and the other for the back end server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyrkj0nwo8nmto911fag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyrkj0nwo8nmto911fag.png" alt="Okta authorization servers" width="512" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create the Authorization Servers, then add the following scopes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On the mcpserver AS: &lt;em&gt;mcp:tools:call&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;On the backend AS: &lt;em&gt;backend-api:read&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Trust between authorization servers
&lt;/h4&gt;

&lt;p&gt;To enable token exchange between the two authorization servers (the one that issues tokens for access to the MCP server and the one that issues tokens for accessing the back end), we need to establish trust between them.&lt;/p&gt;

&lt;p&gt;Go to the back end AS and, on its Settings tab, add the mcpserver AS as a trusted server:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wycf3a7pyhzfjua0719.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wycf3a7pyhzfjua0719.png" alt="Okta trusted server" width="419" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Applications
&lt;/h3&gt;

&lt;p&gt;We’ll set up two Applications:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;em&gt;VSCode client&lt;/em&gt; to authenticate to the MCP server. We create a client directly to avoid Dynamic Client Registration. This will be an OIDC application with a client ID and a secret. It is important to match the redirect URIs that VSCode uses, so set them to &lt;a href="http://127.0.0.1:33418" rel="noopener noreferrer"&gt;http://127.0.0.1:33418&lt;/a&gt; and &lt;a href="https://vscode.dev/redirect" rel="noopener noreferrer"&gt;https://vscode.dev/redirect&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;A &lt;em&gt;toolhive client&lt;/em&gt; that will perform the Token Exchange. This is an API Services app type in Okta lingo. To create it:

&lt;ul&gt;
&lt;li&gt;Go to &lt;strong&gt;Applications → Create App Integration&lt;/strong&gt; and select &lt;strong&gt;API Services&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Name your application&lt;/li&gt;
&lt;li&gt;On the application’s General Settings page, uncheck “Require Demonstrating Proof of Possession”, as this is not yet supported by ToolHive&lt;/li&gt;
&lt;li&gt;Check the Token Exchange grant&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfkajllk1la4o93y0vve.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfkajllk1la4o93y0vve.png" alt="Token exchange grant" width="478" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Policies
&lt;/h3&gt;

&lt;p&gt;In order for applications to authenticate, we need to include them in policies, otherwise Okta will not issue tokens to the clients. We’ll define two policies: One that allows the MCP Client (VSCode) to request tokens with &lt;em&gt;mcp:tools:call&lt;/em&gt; and another one that allows the token exchange by the ToolHive process.&lt;/p&gt;

&lt;h4&gt;
  
  
  MCP client to MCP server
&lt;/h4&gt;

&lt;p&gt;This policy is to be defined on the mcpserver AS side. Select “Add New Access Policy”, then “Assign to the following Clients” and select the VSCode client. When the policy is created, click “Add Rule” in the policy, and in the “And the following scopes” section add both the “OpenID Connect” scopes and the &lt;em&gt;mcp:tools:call&lt;/em&gt; scope.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22mxe559jdbd901o637c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22mxe559jdbd901o637c.png" alt="Scopes" width="512" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  MCP server token exchange
&lt;/h4&gt;

&lt;p&gt;This policy is to be defined on the back end AS side. Select “Add New Access Policy”, then “Assign to the following Clients” and select the ToolHive client. When adding the rule, don’t forget to unroll “Advanced” under the “If Grant Type Is” section and add Token Exchange. Add “&lt;em&gt;backend-api:read&lt;/em&gt;” to the scopes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fed28ypvi5nioio3fpu5z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fed28ypvi5nioio3fpu5z.png" alt="Scopes" width="512" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8uarrmw0krmln9zc3vi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8uarrmw0krmln9zc3vi.png" alt="Token exchange" width="512" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Running the GraphQL server
&lt;/h3&gt;

&lt;p&gt;Let’s clone our server locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/StacklokLabs/apollo-mcp-auth-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let’s configure the IDP settings in the .env file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cp .env.example .env
vim .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using my Okta integrator account, the .env file looks as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Okta Configuration
# Your Okta domain (e.g., dev-123456.okta.com)
OKTA_DOMAIN=integrator-3683736.okta.com

# Your Okta issuer URL (authorization server)
# For default authorization server: https://your-domain.okta.com/oauth2/default
# For custom authorization server: https://your-domain.okta.com/oauth2/{authServerId}
OKTA_ISSUER=https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697

# JWT Validation Configuration
# Expected audience in JWT tokens (space-separated if multiple)
OKTA_AUDIENCE=backend
# Required scopes in JWT tokens (space-separated)
REQUIRED_SCOPES=backend-api:read

# Authentication Configuration
# Set to 'true' to require valid tokens for all requests (recommended)
# Set to 'false' to disable authentication requirement (for testing)
REQUIRE_AUTH=true

# Server Configuration
PORT=4000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we’re ready to start the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install
npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running ToolHive
&lt;/h3&gt;

&lt;p&gt;In our testing, we’re using the already existing Apollo MCP server with no modifications - all the heavy lifting is done by ToolHive. The Apollo MCP server is merely configured to accept the downstream authentication token in the &lt;em&gt;Authorization: Bearer&lt;/em&gt; HTTP header and forward it to the external API.&lt;br&gt;
The MCP server configuration can be found in the &lt;a href="https://github.com/StacklokLabs/apollo-mcp-auth-demo/blob/main/mcp-server-data/apollo-mcp-config.yaml" rel="noopener noreferrer"&gt;mcp-server-data directory&lt;/a&gt; in the demo repository.&lt;/p&gt;

&lt;p&gt;Because the unmodified MCP server also validates the incoming tokens, we need to set the &lt;em&gt;transport.auth.servers&lt;/em&gt; attribute in the config file to the &lt;em&gt;back end&lt;/em&gt; Authorization server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vim mcp-server-data/apollo-mcp-config.yaml

...
transport:
  type: sse
  port: 8000
  auth:
    servers:
      - https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can run the server with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;thv run \
--debug \
--foreground \
--transport streamable-http \
--name apollo \
--target-port 8000 \
--proxy-port 8000 \
--volume $(pwd)/mcp-server-data/apollo-mcp-config.yaml:/config.yaml \
--volume $(pwd)/mcp-server-data:/data \
--oidc-audience mcpserver \
--resource-url http://localhost:8000/mcp \
--oidc-issuer https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697 \
--oidc-jwks-url https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697/v1/keys \
--token-exchange-audience backend \
--token-exchange-client-id 0oawdgw7krVBSwzIx697 \
--token-exchange-client-secret O2zqVb-evhKgfBOD-PRVDs5HFyCXAnRZAwxAtQOH9oGt72aBrLBiwEVlyyTengj9 \
--token-exchange-scopes backend-api:read \
--token-exchange-url https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697/v1/token \
apollo-mcp-server -- /config.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s unpack the parameters:&lt;br&gt;
--oidc-audience mcpserver - When the OIDC token from VSCode arrives at ToolHive, ToolHive checks that the token’s aud claim matches this value and rejects the connection otherwise&lt;/p&gt;

&lt;p&gt;--resource-url &lt;a href="http://localhost:8000/mcp" rel="noopener noreferrer"&gt;http://localhost:8000/mcp&lt;/a&gt; - Setting the resource explicitly helps VSCode discover the proper Protected Resource Metadata endpoint as per the MCP specification, and in effect points VSCode at the Okta instance. It is typically not needed in e.g. Kubernetes environments, where the service name can be used&lt;/p&gt;

&lt;p&gt;--oidc-issuer &lt;a href="https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697" rel="noopener noreferrer"&gt;https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697&lt;/a&gt; - This is the issuer of the mcpserver Authorization Server (see the first screenshot of the document)&lt;/p&gt;

&lt;p&gt;--oidc-jwks-url &lt;a href="https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697/v1/keys" rel="noopener noreferrer"&gt;https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697/v1/keys&lt;/a&gt; - The JWKS endpoint of the mcpserver Authorization Server&lt;/p&gt;

&lt;p&gt;--token-exchange-audience 'backend' - We want ToolHive to take the incoming tokens and exchange them for tokens with audience of “backend”&lt;/p&gt;

&lt;p&gt;--token-exchange-client-id 0oawdgw7krVBSwzIx697 - The client ID of the “ToolHive client”, the one the token exchange policy is assigned to&lt;/p&gt;

&lt;p&gt;--token-exchange-client-secret O2zqVb-evhKgfBOD-PRVDs5HFyCXAnRZAwxAtQOH9oGt72aBrLBiwEVlyyTengj9 - The client secret of the ToolHive client. Outside demos, please use the --token-exchange-client-secret-file switch instead, or the TOOLHIVE_TOKEN_EXCHANGE_CLIENT_SECRET environment variable&lt;/p&gt;

&lt;p&gt;--token-exchange-scopes 'backend-api:read' - The scopes we request for the external token. Must match what’s in the policy&lt;/p&gt;

&lt;p&gt;--token-exchange-url &lt;a href="https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697/v1/token" rel="noopener noreferrer"&gt;https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697/v1/token&lt;/a&gt; - The token endpoint of the back end Authorization Server&lt;/p&gt;
&lt;p&gt;Note that the example above uses &lt;em&gt;thv run&lt;/em&gt;, but it’s equally possible to use token exchange from &lt;em&gt;thv proxy&lt;/em&gt;, which can then also provide authentication to the MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;thv proxy demo-mcp-server \
    --target-uri http://localhost:8091 \
    --port 3000 \
    --remote-auth \
    --remote-auth-client-id 0oawdhc2mlgHOwNvW697 \
    --remote-auth-client-secret Ag0Zj6ALuxxqascP6KJ-CA4uCRcOLmIKtQeR_o3ClGgxMxx0zcgZYYtg-TmHF6U- \
    --remote-auth-issuer https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697 \
    --remote-auth-scopes 'mcp:tools:call,openid,email' \
    --token-exchange-audience 'backend' \
    --token-exchange-client-id 0oawdgw7krVBSwzIx697 \
    --token-exchange-client-secret O2zqVb-evhKgfBOD-PRVDs5HFyCXAnRZAwxAtQOH9oGt72aBrLBiwEVlyyTengj9 \
    --token-exchange-scopes 'backend-api:read' \
    --token-exchange-url https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697/v1/token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Authentication from VSCode and putting it all together
&lt;/h3&gt;

&lt;p&gt;Once the server is running, it should automatically appear in the list of configured MCP servers in VSCode. Clicking Start will prompt authentication against Okta; the first time, you’ll also be prompted to enter the client ID and secret. Once Okta authenticates the user, VSCode receives the token and uses it to authenticate to the MCP server (ToolHive), which exchanges it for a token that enables calling the back end API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqacyukmvg7hdiuh9sbi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqacyukmvg7hdiuh9sbi.png" alt="VSCode" width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Past the initial setup on the IDP side, authentication and authorization to the MCP server fronted by ToolHive, and by extension to the back end service, is seamless. It allows partitioned access to the back end services and provides a cleaner audit trail.&lt;/p&gt;

&lt;p&gt;As the last step, we can invoke one of the MCP tools to verify the setup end-to-end:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0aakq5mtogxz8bdcq0d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0aakq5mtogxz8bdcq0d.png" alt="MCP tools" width="800" height="811"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As seen in the screenshot above, the GetCountry tool of the Apollo server was called and returned a reply! If we check the logs of the API server we ran earlier, we also see details of the token that was validated:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4frsoeoauh9o04zydipb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4frsoeoauh9o04zydipb.png" alt="Tool usage" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This token has a different audience than the one passed to ToolHive: if you recall the thv run parameters, the &lt;em&gt;--oidc-audience mcpserver&lt;/em&gt; argument specified that incoming tokens must set the &lt;em&gt;aud&lt;/em&gt; claim to &lt;em&gt;mcpserver&lt;/em&gt;, while the token that arrived at the back end API has the audience &lt;em&gt;backend&lt;/em&gt;. Looking closely at the issuer, we also see that this token was issued by the back end Authorization Server, while the tokens used to authenticate to ToolHive were issued by the mcpserver Authorization Server. This shows that the token exchange works correctly. In the next section, we’ll illustrate, for completeness’ sake, what the tokens look like exactly and how the whole flow works.&lt;/p&gt;
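&lt;p&gt;You can verify this yourself by decoding the payload segment of each token. A small Python sketch (inspection only - it deliberately skips signature verification, which real services must never do):&lt;/p&gt;

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying the signature.
    Inspection only; real services must verify signatures."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))

# Hand-built unsigned token mimicking the exchanged token's claims:
header = base64.urlsafe_b64encode(b'{"alg":"none"}').rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(
    b'{"iss":"https://idp.example.com/oauth2/default","aud":"backend"}'
).rstrip(b"=").decode()
claims = jwt_claims(header + "." + payload + ".")
print(claims["aud"])  # backend
```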

&lt;h2&gt;
  
  
  The token exchange under the hood
&lt;/h2&gt;

&lt;p&gt;The flow is described in the Mermaid diagram below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89fvfxlk28z6kfz5dohy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89fvfxlk28z6kfz5dohy.png" alt="Diagram" width="512" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The client authenticates to ToolHive, which exposes the interface and endpoints as the &lt;a href="https://modelcontextprotocol.io/docs/tutorials/security/authorization" rel="noopener noreferrer"&gt;MCP standard describes&lt;/a&gt;. The ToolHive authentication middleware verifies that the token was issued by the expected IDP and has the expected audience. After authentication, the token is passed to the Token Exchange middleware, which contacts the IDP and exchanges the token meant for the MCP server for a token meant for the external service.&lt;/p&gt;
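&lt;p&gt;Concretely, the exchange is a standard OAuth 2.0 Token Exchange (RFC 8693) request against the back end Authorization Server’s token endpoint. The sketch below only builds the form body such a request carries - the parameter names come from RFC 8693, the values are the demo placeholders, and nothing is actually sent:&lt;/p&gt;

```python
from urllib.parse import urlencode

# Build the RFC 8693 token exchange form body that would be POSTed to the
# back end Authorization Server's token endpoint. Sketch only: values are
# the demo placeholders, nothing is sent, and the client would authenticate
# separately (e.g. HTTP Basic auth with its client ID and secret).
def build_exchange_body(subject_token: str) -> str:
    return urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,  # the token the MCP client presented
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "audience": "backend",           # matches --token-exchange-audience
        "scope": "backend-api:read",     # matches --token-exchange-scopes
    })

body = build_exchange_body("incoming-mcpserver-token")
print(body)
```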

&lt;p&gt;The token issued to the client might look like this (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "iss": https://idp.example.com/oauth2/default",
    "aud": "mcp-server",
    "scp": [
        "backend-mcp:tools:call",
        "backend-mcp:tools:list",
    ],
    "sub": "user@example.com",
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While the exchanged token would have different scopes and a different audience, allowing the MCP server to authenticate to the back end service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "iss": https://idp.example.com/oauth2/default",
    "aud": "backend-server",
    "scp": [
        "backend-api:read",
    ],
    "sub": "user@example.com",
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This exchanged token is then injected into the &lt;em&gt;Authorization: Bearer&lt;/em&gt; HTTP header and passed on to the actual MCP server running under ToolHive, which can then use it to call the back end service.&lt;/p&gt;
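&lt;p&gt;The injection step itself is plain header rewriting - conceptually something like the following sketch (not ToolHive’s actual code):&lt;/p&gt;

```python
# Conceptual sketch of the injection step (illustrative, not ToolHive's code):
# the proxy copies the inbound headers and replaces Authorization with the
# exchanged, backend-audience token before forwarding the request.
def forward_headers(inbound: dict, exchanged_token: str) -> dict:
    outbound = dict(inbound)
    outbound["Authorization"] = "Bearer " + exchanged_token
    return outbound

headers = forward_headers(
    {"Authorization": "Bearer mcpserver-token", "Content-Type": "application/json"},
    "backend-token",
)
print(headers["Authorization"])  # Bearer backend-token
```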

&lt;h2&gt;
  
  
  Summary and benefits
&lt;/h2&gt;

&lt;p&gt;By leveraging token exchange, ToolHive enables MCP servers to authenticate to third-party APIs in a &lt;strong&gt;secure, efficient, and tenant-aware&lt;/strong&gt; way. MCP servers receive properly scoped, short-lived access tokens instead of embedding long-lived secrets or bespoke authentication logic. Each API call made upstream can be attributed to the &lt;strong&gt;individual user identity&lt;/strong&gt; rather than a generic service account, making audit trails clearer and more meaningful.&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://modelcontextprotocol.io/docs/tutorials/security/authorization" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/docs/tutorials/security/authorization&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developer.okta.com/docs/guides/set-up-token-exchange/main/" rel="noopener noreferrer"&gt;https://developer.okta.com/docs/guides/set-up-token-exchange/main/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>security</category>
      <category>devops</category>
    </item>
    <item>
      <title>Using Token Exchange with ToolHive and Okta for MCP Server to GraphQL Authentication</title>
      <dc:creator>Yolanda Robla Mota</dc:creator>
      <pubDate>Tue, 04 Nov 2025 16:37:21 +0000</pubDate>
      <link>https://dev.to/stacklok/using-token-exchange-with-toolhive-and-okta-for-mcp-server-to-graphql-authentication-12in</link>
      <guid>https://dev.to/stacklok/using-token-exchange-with-toolhive-and-okta-for-mcp-server-to-graphql-authentication-12in</guid>
      <description>&lt;p&gt;This article builds on our &lt;a href="https://dev.to/stacklok/beyond-api-keys-token-exchange-identity-federation-mcp-servers-5dm8"&gt;previous post&lt;/a&gt;, where we introduced the core concepts of token exchange and its role in secure authentication. Here, we delve into a practical application, demonstrating how to leverage Okta and ToolHive to facilitate token exchange for authenticating an MCP server with a GraphQL API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Environment
&lt;/h2&gt;

&lt;p&gt;This demo mimics a (hopefully!) real world example where we run an API service and we want to expose it with an MCP server. The back end API requires a token with &lt;em&gt;aud=backend&lt;/em&gt; and &lt;em&gt;scopes=[backend-api:read]&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;"Aud" (audience) in a token specifies the intended recipient of the token, indicating which service or application is meant to consume it. "Scopes" define the specific permissions or access rights granted by the token, detailing what actions the token holder is authorized to perform. Only tokens having the expected audience and the expected scopes authorize the caller to use the service.&lt;/p&gt;

&lt;p&gt;We don’t want to expose the back end service directly to the AI client, but only through the MCP server. We also want to maintain a clean audit trail showing us who accessed what.&lt;/p&gt;

&lt;p&gt;The MCP server requires a token with &lt;em&gt;aud=mcpserver&lt;/em&gt; and &lt;em&gt;scopes=mcp:tools:call&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;Both the API service and the MCP server are part of the same Okta realm, but we’ll use different Authorization Servers to ensure that the token the MCP server receives and the token the back end receives use different audiences.&lt;/p&gt;

&lt;p&gt;We’ll simulate the whole flow as a developer connecting to this setup by adding the MCP server to VSCode and calling the tools it provides.&lt;/p&gt;

&lt;p&gt;It should be noted that in this example, we’ll be using an &lt;a href="https://www.apollographql.com/docs" rel="noopener noreferrer"&gt;Apollo&lt;/a&gt;-based GraphQL service as the backend API service and the existing &lt;a href="https://www.apollographql.com/docs/apollo-mcp-server" rel="noopener noreferrer"&gt;Apollo MCP server&lt;/a&gt;, but the same setup applies to any kind of API services as long as they both use OAuth tokens from the same realm as the authentication mechanism. &lt;/p&gt;

&lt;p&gt;In order to follow along, you can clone the Apollo GraphQL service from &lt;a href="https://github.com/StacklokLabs/apollo-mcp-auth-demo" rel="noopener noreferrer"&gt;a demo repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Okta setup
&lt;/h2&gt;

&lt;p&gt;I’ve used the Okta integrator setup to prepare this demo, and therefore the instructions cover the whole setup from the ground up, including creating the Authorization Servers. In a real-world environment, parts of this are likely already in place or will need to be adjusted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Authorization Servers
&lt;/h3&gt;

&lt;p&gt;To logically separate the MCP server from the back end API service, we’ll configure two Okta Authorization servers - one for the MCP server and client and the other for the backend server. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyrkj0nwo8nmto911fag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyrkj0nwo8nmto911fag.png" alt="Okta authorization servers" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create the Authorization Servers and then the following scopes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mcpserver AS &lt;em&gt;mcp:tools:call&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;backend AS &lt;em&gt;backend-api:read&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Trust between authorization servers
&lt;/h4&gt;

&lt;p&gt;In order to enable token exchange between two authorization servers - the one that issues tokens for access to the MCP server and the one that issues tokens for accessing the back end, we need to establish trust between the two.&lt;/p&gt;

&lt;p&gt;Go to the back end AS and down at the settings tab, add the mcpserver AS as trusted:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wycf3a7pyhzfjua0719.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wycf3a7pyhzfjua0719.png" alt="Okta trusted server" width="419" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Applications
&lt;/h3&gt;

&lt;p&gt;We’ll set up two Applications:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;em&gt;VSCode client&lt;/em&gt; to authenticate to the MCP server. We create a client directly to avoid Dynamic Client registration. This will be an OIDC application with a client ID and a secret. It is important to match the Redirect URIs that VSCode uses. Set the Redirect URIs to &lt;a href="http://127.0.0.1:33418" rel="noopener noreferrer"&gt;http://127.0.0.1:33418&lt;/a&gt; and &lt;a href="https://vscode.dev/redirect" rel="noopener noreferrer"&gt;https://vscode.dev/redirect&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;em&gt;toolhive client&lt;/em&gt; that will perform the Token Exchange. This is an API Services type in Okta lingo. To create the application, go to Applications -&amp;gt; Create App Integration and select API Services&lt;/li&gt;
&lt;li&gt;Name your application&lt;/li&gt;
&lt;li&gt;On the application page, navigate to the General Settings tab and uncheck the “Require Demonstrating Proof of Possession” option, as this is not yet supported by ToolHive&lt;/li&gt;
&lt;li&gt;Check the Token Exchange grant&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfkajllk1la4o93y0vve.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfkajllk1la4o93y0vve.png" alt="Token exchange grant" width="478" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Policies
&lt;/h3&gt;

&lt;p&gt;For applications to authenticate, we need to include them in policies; otherwise, Okta will not issue tokens to the clients. We’ll define two policies: one that allows the MCP client (VSCode) to request tokens with &lt;em&gt;mcp:tools:call&lt;/em&gt;, and another that allows the token exchange by the ToolHive process.&lt;/p&gt;

&lt;h4&gt;
  
  
  MCP client to MCP server
&lt;/h4&gt;

&lt;p&gt;This policy is to be defined on the mcpserver AS side. Select “Add New Access Policy”, then “Assign to the following Clients” and select the VSCode client. When the policy is created, click “Add Rule” in the policy, and in the “And the following scopes” section add both the “OpenID Connect” scopes and the &lt;em&gt;mcp:tools:call&lt;/em&gt; scope.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22mxe559jdbd901o637c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22mxe559jdbd901o637c.png" alt="Scopes" width="512" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  MCP server token exchange
&lt;/h4&gt;

&lt;p&gt;This policy is to be defined on the back end AS side. Select “Add New Access Policy”, then “Assign to the following Clients” and select the ToolHive client. When adding the rule, don’t forget to unroll “Advanced” under the “If Grant Type Is” section and add Token Exchange. Add “&lt;em&gt;backend-api:read&lt;/em&gt;” to the scopes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fed28ypvi5nioio3fpu5z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fed28ypvi5nioio3fpu5z.png" alt="Scopes" width="512" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8uarrmw0krmln9zc3vi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8uarrmw0krmln9zc3vi.png" alt="Token exchange" width="512" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Running the GraphQL server
&lt;/h3&gt;

&lt;p&gt;Let’s clone our server locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/StacklokLabs/apollo-mcp-auth-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let’s configure the IDP settings in the .env file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cp .env.example .env
vim .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using my Okta integrator account, the .env file looks as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Okta Configuration
# Your Okta domain (e.g., dev-123456.okta.com)
OKTA_DOMAIN=integrator-3683736.okta.com

# Your Okta issuer URL (authorization server)
# For default authorization server: https://your-domain.okta.com/oauth2/default
# For custom authorization server: https://your-domain.okta.com/oauth2/{authServerId}
OKTA_ISSUER=https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697

# JWT Validation Configuration
# Expected audience in JWT tokens (space-separated if multiple)
OKTA_AUDIENCE=backend
# Required scopes in JWT tokens (space-separated)
REQUIRED_SCOPES=backend-api:read

# Authentication Configuration
# Set to 'true' to require valid tokens for all requests (recommended)
# Set to 'false' to disable authentication requirement (for testing)
REQUIRE_AUTH=true

# Server Configuration
PORT=4000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we’re ready to start the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install
npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running ToolHive
&lt;/h3&gt;

&lt;p&gt;In our testing, we’re using the already existing Apollo MCP server with no modifications - all the heavy lifting is done by ToolHive. The Apollo MCP server is merely configured to accept the downstream authentication token in the &lt;em&gt;Authorization: Bearer&lt;/em&gt; HTTP header and forward it to the external API.&lt;br&gt;
The MCP server configuration can be found in the &lt;a href="https://github.com/StacklokLabs/apollo-mcp-auth-demo/blob/main/mcp-server-data/apollo-mcp-config.yaml" rel="noopener noreferrer"&gt;mcp-server-data directory&lt;/a&gt; in the demo repository.&lt;/p&gt;

&lt;p&gt;Because the unmodified MCP server also validates the incoming tokens, we need to set the &lt;em&gt;transport.auth.servers&lt;/em&gt; attribute in the config file to the &lt;em&gt;back end&lt;/em&gt; Authorization server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vim mcp-server-data/apollo-mcp-config.yaml

...
transport:
  type: sse
  port: 8000
  auth:
    servers:
      - https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can run the server with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;thv run \
--debug \
--foreground \
--transport streamable-http \
--name apollo \
--target-port 8000 \
--proxy-port 8000 \
--volume $(pwd)/mcp-server-data/apollo-mcp-config.yaml:/config.yaml \
--volume $(pwd)/mcp-server-data:/data \
--oidc-audience mcpserver \
--resource-url http://localhost:8000/mcp \
--oidc-issuer https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697 \
--oidc-jwks-url https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697/v1/keys \
--token-exchange-audience backend \
--token-exchange-client-id 0oawdgw7krVBSwzIx697 \
--token-exchange-client-secret O2zqVb-evhKgfBOD-PRVDs5HFyCXAnRZAwxAtQOH9oGt72aBrLBiwEVlyyTengj9 \
--token-exchange-scopes backend-api:read \
--token-exchange-url https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697/v1/token \
apollo-mcp-server -- /config.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s unpack the parameters:&lt;br&gt;
--oidc-audience mcpserver - When the OIDC token from VSCode arrives at ToolHive, ToolHive checks that the token’s aud claim matches this value and rejects the connection otherwise&lt;/p&gt;

&lt;p&gt;--resource-url &lt;a href="http://localhost:8000/mcp" rel="noopener noreferrer"&gt;http://localhost:8000/mcp&lt;/a&gt; - Setting the resource explicitly helps VSCode discover the proper Protected Resource Metadata endpoint as per the MCP specification, and in effect points VSCode at the Okta instance. It is typically not needed in e.g. Kubernetes environments, where the service name can be used&lt;/p&gt;

&lt;p&gt;--oidc-issuer &lt;a href="https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697" rel="noopener noreferrer"&gt;https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697&lt;/a&gt; - This is the issuer of the mcpserver Authorization Server (see the first screenshot of the document)&lt;/p&gt;

&lt;p&gt;--oidc-jwks-url &lt;a href="https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697/v1/keys" rel="noopener noreferrer"&gt;https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697/v1/keys&lt;/a&gt; - The JWKS endpoint of the mcpserver Authorization Server&lt;/p&gt;

&lt;p&gt;--token-exchange-audience 'backend' - We want ToolHive to take the incoming tokens and exchange them for tokens with audience of “backend”&lt;/p&gt;

&lt;p&gt;--token-exchange-client-id 0oawdgw7krVBSwzIx697 - The client ID of the “ToolHive client”, the one the token exchange policy is assigned to&lt;/p&gt;

&lt;p&gt;--token-exchange-client-secret O2zqVb-evhKgfBOD-PRVDs5HFyCXAnRZAwxAtQOH9oGt72aBrLBiwEVlyyTengj9 - The client secret of the ToolHive client. Outside demos, please use the --token-exchange-client-secret-file switch instead, or the TOOLHIVE_TOKEN_EXCHANGE_CLIENT_SECRET environment variable&lt;/p&gt;

&lt;p&gt;--token-exchange-scopes 'backend-api:read' - The scopes we request for the external token. Must match what’s in the policy&lt;/p&gt;

&lt;p&gt;--token-exchange-url &lt;a href="https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697/v1/token" rel="noopener noreferrer"&gt;https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697/v1/token&lt;/a&gt; - The token endpoint of the back end Authorization Server&lt;/p&gt;
&lt;p&gt;Note that the example above uses &lt;em&gt;thv run&lt;/em&gt;, but it’s equally possible to use token exchange with &lt;em&gt;thv proxy&lt;/em&gt;, which can then also provide authentication to the MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;thv proxy demo-mcp-server \
    --target-uri http://localhost:8091 \
    --port 3000 \
    --remote-auth \
    --remote-auth-client-id 0oawdhc2mlgHOwNvW697 \
    --remote-auth-client-secret Ag0Zj6ALuxxqascP6KJ-CA4uCRcOLmIKtQeR_o3ClGgxMxx0zcgZYYtg-TmHF6U- \
    --remote-auth-issuer https://integrator-3683736.okta.com/oauth2/ausw8f1ut6X0WMjZN697 \
    --remote-auth-scopes 'mcp:tools:call,openid,email' \
    --token-exchange-audience 'backend' \
    --token-exchange-client-id 0oawdgw7krVBSwzIx697 \
    --token-exchange-client-secret O2zqVb-evhKgfBOD-PRVDs5HFyCXAnRZAwxAtQOH9oGt72aBrLBiwEVlyyTengj9 \
    --token-exchange-scopes 'backend-api:read' \
    --token-exchange-url https://integrator-3683736.okta.com/oauth2/auswdh3wurjeJ62La697/v1/token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Authentication from VSCode and putting it all together
&lt;/h3&gt;

&lt;p&gt;Once the server is running, it should automatically appear in the list of configured MCP servers in VSCode. Clicking Start will prompt you to authenticate against Okta; the first time, you’ll also be prompted to enter the client ID and secret. Once Okta authenticates you, VSCode receives the token and uses it to authenticate to the MCP server (ToolHive), which exchanges it for a token that enables calling the back end API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqacyukmvg7hdiuh9sbi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqacyukmvg7hdiuh9sbi.png" alt="VSCode" width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Past the initial setup on the IdP side, authentication and authorization to the MCP server fronted by ToolHive, and by extension to the back end service, is seamless. It allows partitioned access to the back end services and provides a cleaner audit trail.&lt;/p&gt;

&lt;p&gt;As the last step, we can invoke one of the MCP tools to verify the setup end-to-end:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0aakq5mtogxz8bdcq0d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0aakq5mtogxz8bdcq0d.png" alt="MCP tools" width="800" height="811"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As seen in the screenshot above, the GetCountry tool of the Apollo server was called and returned a reply! If we check the logs of the API server we ran earlier, we also see details of the token that was validated:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4frsoeoauh9o04zydipb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4frsoeoauh9o04zydipb.png" alt="Tool usage" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This token has a different audience than the one passed to ToolHive. If you recall the thv run parameters, the &lt;em&gt;--oidc-audience&lt;/em&gt; mcpserver argument specified that incoming tokens must set the &lt;em&gt;aud&lt;/em&gt; claim to &lt;em&gt;mcpserver&lt;/em&gt;, while the token that arrived at the back end API has audience &lt;em&gt;backend&lt;/em&gt;. Looking closely at the issuer, we also see that this token was issued by the back end Authorization Server, while the tokens used to authenticate to ToolHive were issued by the mcpserver Authorization Server. This shows that the token exchange works correctly. In the next section, we’ll illustrate for completeness’ sake exactly how the tokens look and how the whole flow works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The token exchange under the hood
&lt;/h2&gt;

&lt;p&gt;The flow is described in the Mermaid diagram below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89fvfxlk28z6kfz5dohy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89fvfxlk28z6kfz5dohy.png" alt="Diagram" width="512" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The client authenticates to ToolHive, which exposes the interface and endpoints that the &lt;a href="https://modelcontextprotocol.io/docs/tutorials/security/authorization" rel="noopener noreferrer"&gt;MCP standard describes&lt;/a&gt;. The ToolHive authentication middleware verifies that the token was issued by the expected IdP and has the expected audience. After authentication, the token is passed to the token exchange middleware, which contacts the IdP and exchanges the token meant for the MCP server for a token meant for the external service.&lt;/p&gt;
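&lt;p&gt;To make the exchange step concrete, here is a minimal Python sketch of the RFC 8693 request body that a token exchange middleware would send to the token endpoint. The token value is a placeholder, and the audience and scope simply mirror the flags shown earlier; this is an illustration, not ToolHive’s actual implementation.&lt;/p&gt;

```python
# Illustrative sketch of an RFC 8693 token exchange request body.
# The subject token, audience, and scopes are placeholders mirroring
# the thv run flags above, not values from a real deployment.
from urllib.parse import urlencode

def build_exchange_request(subject_token: str) -> dict:
    """Form parameters for exchanging an MCP-server token for a backend token."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "audience": "backend",        # matches --token-exchange-audience
        "scope": "backend-api:read",  # matches --token-exchange-scopes
    }

# A real client would POST this form-encoded body to the back end
# Authorization Server's /v1/token endpoint, authenticating with the
# ToolHive client ID and secret.
body = urlencode(build_exchange_request("eyJ..."))
print(body)
```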

&lt;p&gt;The token issued to the client might look like this (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "iss": "https://idp.example.com/oauth2/default",
    "aud": "mcp-server",
    "scp": [
        "backend-mcp:tools:call",
        "backend-mcp:tools:list"
    ],
    "sub": "user@example.com"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While the exchanged token would have different scopes and a different audience, allowing the MCP server to authenticate to the back end service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "iss": "https://idp.example.com/oauth2/default",
    "aud": "backend-server",
    "scp": [
        "backend-api:read"
    ],
    "sub": "user@example.com"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This exchanged token is then injected into the &lt;em&gt;Authorization: Bearer&lt;/em&gt; HTTP header and passed on to the actual MCP server running under ToolHive. The MCP server can then use the token to call the back end service.&lt;/p&gt;
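&lt;p&gt;The header injection itself amounts to something like this small Python helper (the function name is ours, purely illustrative):&lt;/p&gt;

```python
# Minimal sketch of attaching the exchanged token to the request that is
# forwarded to the MCP server. The helper name is illustrative, not part
# of ToolHive's codebase.
def with_exchanged_token(headers: dict, exchanged_token: str) -> dict:
    out = dict(headers)  # copy so the caller's headers are not mutated
    out["Authorization"] = f"Bearer {exchanged_token}"
    return out

forwarded = with_exchanged_token({"Accept": "application/json"}, "eyJ...")
print(forwarded["Authorization"])
```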

&lt;h2&gt;
  
  
  Summary and benefits
&lt;/h2&gt;

&lt;p&gt;By leveraging token exchange, ToolHive enables MCP servers to authenticate to third-party APIs in a &lt;strong&gt;secure, efficient, and tenant-aware&lt;/strong&gt; way. MCP servers receive properly scoped, short-lived access tokens instead of embedding long-lived secrets or bespoke authentication logic. Each API call made upstream can be attributed to the &lt;strong&gt;individual user identity&lt;/strong&gt; rather than a generic service account, making audit trails clearer and more meaningful.&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://modelcontextprotocol.io/docs/tutorials/security/authorization" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/docs/tutorials/security/authorization&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developer.okta.com/docs/guides/set-up-token-exchange/main/" rel="noopener noreferrer"&gt;https://developer.okta.com/docs/guides/set-up-token-exchange/main/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>security</category>
      <category>devops</category>
    </item>
    <item>
      <title>Beyond API Keys: Token Exchange, Identity Federation &amp; MCP Servers</title>
      <dc:creator>Yolanda Robla Mota</dc:creator>
      <pubDate>Thu, 30 Oct 2025 11:04:03 +0000</pubDate>
      <link>https://dev.to/stacklok/beyond-api-keys-token-exchange-identity-federation-mcp-servers-5dm8</link>
      <guid>https://dev.to/stacklok/beyond-api-keys-token-exchange-identity-federation-mcp-servers-5dm8</guid>
      <description>&lt;p&gt;Modern backend systems—especially in the era of AI agents, MCP servers, and multi-cloud architectures—are evolving far beyond static credentials and monolithic identity models. In this post we explore the architecture of token exchange, identity federation, and how a system like &lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;ToolHive&lt;/a&gt; enables secure deployment of MCP servers in this world.&lt;/p&gt;

&lt;h2&gt;
  
  
  The legacy problem: static credentials
&lt;/h2&gt;

&lt;p&gt;The MCP authorization specification focuses on how to authorize access to the MCP server itself. It doesn't specify how an MCP server should authenticate with the server it's connecting to. This leaves MCP server creators without clear guidance.&lt;/p&gt;

&lt;p&gt;In many deployments of MCP (Model Context Protocol) servers and tooling services today, developers still default to patterns like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A service-account JSON key or a long-lived API key embedded in configuration.&lt;/li&gt;
&lt;li&gt;All calls executed under a single “shared identity” with elevated permissions.&lt;/li&gt;
&lt;li&gt;If the key is compromised, the impact spans many users or tenants; rotating or tracking the key is operationally heavy.&lt;/li&gt;
&lt;li&gt;Least-privilege is often compromised because the shared identity needs broad access to avoid blocking tool invocation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach doesn’t align with how modern identity systems, federated services and cloud tools are designed. It’s less secure, harder to govern, and doesn’t scale across users or multi‐tenant environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step up: Short-lived tokens via an IdP
&lt;/h2&gt;

&lt;p&gt;A much better pattern emerges when you shift to short-lived tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A user (or service) authenticates via an Identity Provider (IdP) — for example, Okta or Azure AD.&lt;/li&gt;
&lt;li&gt;They receive a short-lived token (OIDC ID token or OAuth access token) that's scoped to their identity and minimal permissions.&lt;/li&gt;
&lt;li&gt;This token is used to authenticate to the MCP server (with the help of ToolHive), which validates it and establishes the user's identity.&lt;/li&gt;
&lt;li&gt;ToolHive then acquires a separate token for the downstream backend API—either through token exchange (if using the same IdP) or federation (if crossing identity domains).&lt;/li&gt;
&lt;li&gt;Your MCP server receives this backend-scoped token and uses it when calling downstream services or tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because tokens are scoped, time-limited, and mapped to a specific user context, you get better auditability, enforce least-privilege, and eliminate static credentials. Next, we’ll show you how to ensure that your MCP server always has the right credentials for its backend API without embedding secrets or handling complex auth flows.&lt;/p&gt;
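&lt;p&gt;The validation step in this flow boils down to checking the token’s claims against expected values. Here is a simplified Python sketch; a real gateway must also verify the token’s signature against the IdP’s JWKS, which is omitted here:&lt;/p&gt;

```python
# Simplified sketch of the claim checks a gateway performs on an incoming
# token before exchanging it: issuer, audience, and expiry. Signature
# verification against the IdP's JWKS is deliberately omitted.
import time

def validate_claims(claims: dict, expected_iss: str, expected_aud: str) -> bool:
    if claims.get("iss") != expected_iss:
        return False  # token came from the wrong issuer
    if claims.get("aud") != expected_aud:
        return False  # token was minted for a different audience
    if claims.get("exp", 0) <= time.time():
        return False  # token has expired
    return True

claims = {
    "iss": "https://idp.example.com/oauth2/default",
    "aud": "mcp-server",
    "exp": time.time() + 300,
    "sub": "user@example.com",
}
print(validate_claims(claims, "https://idp.example.com/oauth2/default", "mcp-server"))
```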

&lt;h2&gt;
  
  
  Token Exchange &amp;amp; Federation: crossing trust-boundaries
&lt;/h2&gt;

&lt;p&gt;Token exchange refers to the process where one security token (issued by one identity domain) is presented to a “Security Token Service” (STS) or similar endpoint, and in return you receive a new token valid for another domain, audience, or scope.&lt;br&gt;
The standard for this is &lt;a href="https://www.rfc-editor.org/rfc/rfc8693.html" rel="noopener noreferrer"&gt;RFC 8693&lt;/a&gt; (OAuth 2.0 Token Exchange), which lets you request a new token via the &lt;em&gt;urn:ietf:params:oauth:grant-type:token-exchange&lt;/em&gt; grant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use-cases for token exchange include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A token issued by your internal IdP being exchanged for a token valid for a cloud provider’s API.&lt;/li&gt;
&lt;li&gt;A token from one IdP being reused to obtain tokens in another trust domain without forcing the user to log in again.&lt;/li&gt;
&lt;li&gt;A service acting on behalf of a user, exchanging its own token for one with narrower scopes or different audiences.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Two common scenarios
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A) The downstream service uses the same IdP as the MCP server&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this case your identity provider (IdP) issues tokens for both the MCP server and the downstream resources. No cross-domain trust is needed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User authenticates via IdP → obtains a token for the MCP server.&lt;/li&gt;
&lt;li&gt;ToolHive validates the token and performs access control checks.&lt;/li&gt;
&lt;li&gt;ToolHive exchanges that token with the same IdP for a new token with the downstream service's audience and scopes.&lt;/li&gt;
&lt;li&gt;MCP server receives this exchanged token and uses it to call the downstream service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup is simpler and has fewer moving parts, since the exchange happens within the same IdP ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F80v5zykok3bry9locp99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F80v5zykok3bry9locp99.png" alt="Token exchange with single IDP" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The token issued to the client might look like this (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
   "iss": "https://idp.example.com/oauth2/default",
   "aud": "mcp-server",
   "scp": [
     "backend-mcp:tools:call",
     "backend-mcp:tools:list"
   ],
   "sub": "user@example.com"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While the exchanged token would have different scopes and a different audience, allowing the MCP server to authenticate to the back end service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "iss": "https://idp.example.com/oauth2/default",
    "aud": "backend-server",
    "scp": [
        "backend-api:read"
    ],
    "sub": "user@example.com"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;B) The downstream service uses a different IdP and you rely on federation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here you have two distinct identity/trust domains: one used by the MCP server (or its IdP) and another used by the back end resource. Instead of issuing separate credentials or having users login twice, you rely on federation and token exchange.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User authenticates via IdP A → receives a token for domain A that is presented to ToolHive&lt;/li&gt;
&lt;li&gt;ToolHive validates the token and performs access control checks.&lt;/li&gt;
&lt;li&gt;ToolHive presents the token to an STS or federation service (e.g., Google Cloud STS) → obtains a federated token valid for domain B (cloud provider).&lt;/li&gt;
&lt;li&gt;Downstream service validates the token from domain B and executes requests under that identity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach enables your system to be IdP-agnostic and cloud-agnostic: authenticate with any IdP, then federate into any trust-configured domain.&lt;/p&gt;
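&lt;p&gt;The federation call into Google Cloud STS is again an RFC 8693 token exchange. The sketch below builds the request body only; the project number, pool, and provider IDs are placeholders (matching the placeholder style of the token examples below), and a real client would POST this to the STS token endpoint:&lt;/p&gt;

```python
# Illustrative sketch of federating an IdP-A token into Google Cloud via
# the STS endpoint (scenario B). PROJECT_NUMBER, POOL_ID, and PROVIDER_ID
# are placeholders; field names follow RFC 8693.
def build_federation_request(idp_token: str) -> dict:
    """RFC 8693 form parameters for a workload identity federation exchange."""
    audience = ("//iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/"
                "workloadIdentityPools/POOL_ID/providers/PROVIDER_ID")
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token": idp_token,          # the token issued by IdP A
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
    }

req = build_federation_request("eyJ...")
print(req["audience"])
```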

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgf5czj7ibj0nmg8trmra.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgf5czj7ibj0nmg8trmra.png" alt="Flow diagram about federation" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The token issued to the client might look like this (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "iss": "https://idp.example.com/oauth2/default",
  "aud": "mcp-server",
  "sub": "user@example.com",
  "email": "user@example.com",
  "scp": [
    "mcp:tools:call",
    "mcp:tools:list"
  ],
  "exp": 1729641600,
  "iat": 1729638000
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exchanged federated access token would have a different issuer, audience, and scopes, allowing the MCP server to authenticate to the upstream service as the federated user identity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "iss": "https://sts.googleapis.com",
  "aud": "https://bigquery.googleapis.com/",
  "sub": "principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/subject/user@example.com",
  "email": "user@example.com",
  "scp": [
    "https://www.googleapis.com/auth/bigquery"
  ],
  "exp": 1729641600,
  "iat": 1729638000
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why this matters for MCP servers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;MCP servers are often deployed to call different services on behalf of users. If they rely on static credentials or simplistic “shared identity” models, you lose user-level attribution, least-privilege control, and auditability.&lt;/li&gt;
&lt;li&gt;By using token exchange + federation, you allow your MCP server to operate under the right identity context, even when the target service sits in a different trust domain.&lt;/li&gt;
&lt;li&gt;It also lets you design your architecture so the authentication piece (login, token issuance) is decoupled from the MCP server logic — the server can remain auth-agnostic and medium-agnostic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where ToolHive fits
&lt;/h2&gt;

&lt;p&gt;ToolHive simplifies deployment of MCP servers by handling the operational and security heavy-lifting.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You run your MCP servers in containers with minimal permissions and network access — ToolHive manages that.&lt;/li&gt;
&lt;li&gt;ToolHive acts as a gateway: it verifies the user's token (via your IdP), enforces access policies, then acquires the appropriate backend token—either through exchange or federation—before passing that to your MCP server.&lt;/li&gt;
&lt;li&gt;This separation means your MCP server remains auth-agnostic — ToolHive handles authN/authZ and you plug in any IdP or downstream STS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This blog post is the first in a series&lt;/strong&gt;. Over the coming posts we’ll dive into a set of &lt;strong&gt;practical examples using ToolHive&lt;/strong&gt; — showing how to wire up different IdPs, federate into different clouds, run MCP servers securely, and deal with real-world edge cases.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: ToolHive is an open source project, and we encourage you to download it (from &lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;toolhive.dev&lt;/a&gt;) and start using it. We value your feedback and would love to engage with you via our &lt;a href="https://github.com/stacklok/toolhive" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; and/or &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>Cut token waste from your AI workflow with the ToolHive MCP Optimizer</title>
      <dc:creator>Dan Barr</dc:creator>
      <pubDate>Tue, 28 Oct 2025 17:12:08 +0000</pubDate>
      <link>https://dev.to/stacklok/cut-token-waste-from-your-ai-workflow-with-the-toolhive-mcp-optimizer-3oo6</link>
      <guid>https://dev.to/stacklok/cut-token-waste-from-your-ai-workflow-with-the-toolhive-mcp-optimizer-3oo6</guid>
      <description>&lt;p&gt;If you’ve ever hit a rate limit in your AI assistant or felt the sting of regret after checking your usage bill, you’re not alone. Whether you’re exploring an open source repo or triaging issues for a sprint, running into token walls is disruptive. It breaks your flow and burns your time and money.&lt;/p&gt;

&lt;p&gt;Turns out, there’s a hidden cost in many of today’s AI-enhanced dev workflows: &lt;strong&gt;tool metadata bloat&lt;/strong&gt;. When dozens (or hundreds) of tools get injected into each prompt, it drives up token usage and slows down responses. Input tokens aren’t free, and cluttering the context window with irrelevant content degrades model performance.&lt;/p&gt;

&lt;p&gt;At Stacklok, we’ve been working with the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; and discovered something surprising. A significant chunk of the tokens burned during AI coding sessions doesn’t come from your prompt, or even the code. It comes from tool descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Optimizer&lt;/strong&gt;, now available in ToolHive, tackles this problem at the root. It reduces token waste by acting as a smart broker between your AI assistant and MCP servers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the waste comes from
&lt;/h2&gt;

&lt;p&gt;Let’s say you’ve installed MCP servers for GitHub, Grafana, and Notion. You ask your assistant:&lt;/p&gt;

&lt;p&gt;“List the 10 most recent issues from my GitHub repo.”&lt;/p&gt;

&lt;p&gt;That simple prompt uses &lt;strong&gt;102,000 tokens&lt;/strong&gt; &lt;em&gt;(total input &amp;amp; output)&lt;/em&gt;, not because the task is complex, but because the model receives metadata for &lt;strong&gt;114 tools&lt;/strong&gt;, most of which have nothing to do with the request.&lt;/p&gt;

&lt;p&gt;Other common prompts create similar waste:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;“Summarize my meeting notes from October 19, 2025”&lt;br&gt;
uses &lt;strong&gt;240,600 tokens&lt;/strong&gt;, again with &lt;strong&gt;114 tools&lt;/strong&gt; injected, even though only the Notion server is relevant&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Search dashboards related to RDS”&lt;br&gt;
consumes &lt;strong&gt;93,600 tokens&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In each case, only a small fraction of those tokens are relevant to the task. Even saying “hello” burns more than 46,000 tokens.&lt;/p&gt;

&lt;p&gt;Multiply that across even a few dozen prompts per day, and you’re burning &lt;strong&gt;millions of tokens&lt;/strong&gt; on context the model doesn’t need. That’s not just expensive, it’s disruptive. In rate-limited enterprise environments or time-sensitive projects, this inefficiency slows down responses, breaks flow, and cuts directly into productivity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing MCP Optimizer: Smarter tool selection for leaner prompts
&lt;/h2&gt;

&lt;p&gt;Instead of flooding the model with all available tools, MCP Optimizer introduces two lightweight primitives:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;find_tool&lt;/code&gt;: Searches for the most relevant tools using hybrid semantic + keyword search
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;call_tool&lt;/code&gt;: Routes the selected tool request to the appropriate MCP server&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s how it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You send a prompt that requires tool assistance (for example, interacting with a GitHub repo)
&lt;/li&gt;
&lt;li&gt;The assistant calls &lt;code&gt;find_tool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;MCP Optimizer returns the most relevant tools (up to 8 by default, but this is configurable)
&lt;/li&gt;
&lt;li&gt;Only those tools are included in the context
&lt;/li&gt;
&lt;li&gt;The assistant uses &lt;code&gt;call_tool&lt;/code&gt; to execute the task&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The results are dramatic. Using the GitHub, Grafana, and Notion MCP servers from the example above:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prompt&lt;/th&gt;
&lt;th&gt;MCP server used&lt;/th&gt;
&lt;th&gt;Without MCP Optimizer&lt;/th&gt;
&lt;th&gt;With MCP Optimizer&lt;/th&gt;
&lt;th&gt;Token reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hello&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Tokens*: 46.8k Tools sent: 114&lt;/td&gt;
&lt;td&gt;Tokens: 11.2k Tools sent: 3&lt;/td&gt;
&lt;td&gt;76%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;List the latest 10 issues from the stacklok/toolhive repository.&lt;/td&gt;
&lt;td&gt;GitHub&lt;/td&gt;
&lt;td&gt;Tokens: 102k Tools sent: 114&lt;/td&gt;
&lt;td&gt;Tokens: 32.4k Tools sent: 11&lt;/td&gt;
&lt;td&gt;68%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarize my meeting notes from Oct 19th 2025&lt;/td&gt;
&lt;td&gt;Notion&lt;/td&gt;
&lt;td&gt;Tokens: 240.6k Tools sent: 114&lt;/td&gt;
&lt;td&gt;Tokens: 86.8k Tools sent: 11&lt;/td&gt;
&lt;td&gt;64%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search the dashboards related to "RDS" in my Grafana workspace&lt;/td&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;td&gt;Tokens: 93.6k Tools sent: 114&lt;/td&gt;
&lt;td&gt;Tokens: 13.7k Tools sent: 11&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;* Total input &amp;amp; output tokens for the request&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;By sending only what’s needed, MCP Optimizer reduces total token usage, shortens response times, and prevents the assistant from thrashing through irrelevant tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy33zspy7ovfkbo418bx9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy33zspy7ovfkbo418bx9.png" alt="Bar chart comparing token usage before and after the MCP Optimizer" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No tokens wasted on excessive metadata. No LLMs spiraling as they try to reason through 100+ tools. Just fast, efficient execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it now
&lt;/h2&gt;

&lt;p&gt;MCP Optimizer is available today as an experimental feature in the ToolHive desktop app. Here’s how to get started:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://toolhive.dev/download/" rel="noopener noreferrer"&gt;Download ToolHive&lt;/a&gt; for your platform.
&lt;/li&gt;
&lt;li&gt;Follow the &lt;a href="https://docs.stacklok.com/toolhive/tutorials/quickstart-ui" rel="noopener noreferrer"&gt;Quickstart guide&lt;/a&gt; and &lt;a href="https://docs.stacklok.com/toolhive/guides-mcp" rel="noopener noreferrer"&gt;MCP usage guides&lt;/a&gt; to install a few MCP servers into the &lt;code&gt;default&lt;/code&gt; group (or another group of your choice).
&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Settings&lt;/strong&gt; (⚙️) screen, enable &lt;em&gt;MCP Optimizer&lt;/em&gt; under &lt;strong&gt;Experimental Features&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;On the &lt;strong&gt;MCP Servers&lt;/strong&gt; screen, click &lt;strong&gt;MCP Optimizer&lt;/strong&gt;, and enable optimization for the &lt;code&gt;default&lt;/code&gt; group.
&lt;/li&gt;
&lt;li&gt;Open the &lt;code&gt;default&lt;/code&gt; group and click &lt;strong&gt;Manage Clients&lt;/strong&gt; to connect your favorite AI client.
&lt;/li&gt;
&lt;li&gt;The optimizer discovers the MCP servers and tools in the default group, and ToolHive automatically connects your clients to the optimizer MCP server.
&lt;/li&gt;
&lt;li&gt;In your AI client, send prompts that require tool usage, like:
“Find a good first issue in the stacklok/toolhive repo to start working on.”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6kscupkt4mga0zq52kqu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6kscupkt4mga0zq52kqu.gif" alt=" " width="1328" height="708"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more, see the &lt;a href="https://docs.stacklok.com/toolhive/tutorials/mcp-optimizer" rel="noopener noreferrer"&gt;full tutorial&lt;/a&gt; in the ToolHive documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s next
&lt;/h2&gt;

&lt;p&gt;We’re building ToolHive and MCP Optimizer in the open, and your feedback helps shape what comes next.&lt;/p&gt;

&lt;p&gt;Explore the project at &lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;toolhive.dev&lt;/a&gt; and join our &lt;a href="https://discord.gg/stacklok" rel="noopener noreferrer"&gt;community on Discord&lt;/a&gt; to share your experiences, suggest features, and help make tool-driven AI workflows faster, safer, and more developer-friendly.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Simplify Your AI Agent Development: Test and Tune MCP Servers Instantly with the ToolHive Playground</title>
      <dc:creator>Samuele Verzi</dc:creator>
      <pubDate>Tue, 14 Oct 2025 12:20:05 +0000</pubDate>
      <link>https://dev.to/stacklok/simplify-your-ai-agent-development-test-and-tune-mcp-servers-instantly-with-the-toolhive-playground-5c3a</link>
      <guid>https://dev.to/stacklok/simplify-your-ai-agent-development-test-and-tune-mcp-servers-instantly-with-the-toolhive-playground-5c3a</guid>
      <description>&lt;p&gt;Developing capable AI agents means more than just connecting to a model. It requires testing, tuning, and managing the external tools and servers your agents rely on. That’s where the Model Context Protocol (MCP) comes in, enabling agents to interact with real-world systems through well-defined interfaces.&lt;/p&gt;

&lt;p&gt;But validating and iterating on those MCP servers can be tedious. The ToolHive playground streamlines that process by giving you a sandboxed, conversational environment to test and tune your MCP servers instantly, no complex configuration required. With the playground, you can move from debugging tools to building smarter, production-ready agents in record time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;ToolHive UI&lt;/a&gt; is an open-source project that makes it easy to test and manage MCP servers and their connection to AI clients. You can see the full source code on &lt;a href="https://github.com/stacklok/toolhive-studio" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is how you can leverage the ToolHive playground to simplify your AI agent workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the playground offers
&lt;/h2&gt;

&lt;p&gt;The playground delivers powerful capabilities, all wrapped in a single, unified interface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instant testing&lt;/strong&gt;: You can immediately validate MCP server functionality. Just enter your AI model API key (such as for &lt;strong&gt;Anthropic&lt;/strong&gt; or &lt;strong&gt;OpenAI&lt;/strong&gt;), select the MCP servers, and begin testing. This eliminates the need for external tooling just to confirm your MCP server works correctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detailed information&lt;/strong&gt;: Every interaction with your AI agent is meticulously logged. You see the tool's name, the exact input parameters passed to it, the execution status (success or failure), the raw response data, and the timing information. This visibility ensures you understand exactly how your MCP servers respond.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversational server management&lt;/strong&gt;: The playground's built-in MCP server (&lt;code&gt;toolhive mcp&lt;/code&gt;) lets you manage your infrastructure using simple natural language commands: no command line, no manual setup. It's integrated, clear management that feels like a conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local and remote server support&lt;/strong&gt;: ToolHive lets you run both local MCP servers (on your machine using Docker) and remote MCP servers (accessed via URL), giving you flexibility in how you deploy and test your tools.&lt;/li&gt;
&lt;/ul&gt;
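&lt;p&gt;To make the "detailed information" point concrete, here is a minimal Python sketch of what a per-call log record might look like; the field names are illustrative assumptions, not ToolHive's actual log schema:&lt;/p&gt;

```python
import time

# Hypothetical shape of a tool-call log record, mirroring the fields the
# playground surfaces: tool name, input parameters, status, response, timing.
# Field names here are illustrative, not ToolHive's real schema.
def make_tool_log(tool_name, params, run_tool):
    start = time.monotonic()
    try:
        response = run_tool(params)
        status = "success"
    except Exception as exc:
        response = str(exc)
        status = "failure"
    return {
        "tool": tool_name,
        "input": params,
        "status": status,
        "response": response,
        "duration_ms": round((time.monotonic() - start) * 1000, 2),
    }

# Example: a stubbed-out "list_servers" call that succeeds.
record = make_tool_log("list_servers", {}, lambda p: ["fetch", "filesystem"])
print(record["status"], record["tool"])
```

&lt;p&gt;Capturing both successes and failures in one record shape is what makes the playground's execution log useful for debugging.&lt;/p&gt;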

&lt;h2&gt;
  
  
  Getting started in the playground
&lt;/h2&gt;

&lt;p&gt;Starting with the playground is straightforward. You only need to complete a few simple setup steps:&lt;/p&gt;

&lt;h3&gt;
  
  
  Access the playground
&lt;/h3&gt;

&lt;p&gt;Click the &lt;strong&gt;playground&lt;/strong&gt; tab in the ToolHive UI navigation bar.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75u2e9i0sa2hknp9azwd.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75u2e9i0sa2hknp9azwd.webp" alt="ToolHive playground starting page" width="800" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure a provider
&lt;/h3&gt;

&lt;p&gt;Click &lt;strong&gt;Configure your API Keys&lt;/strong&gt; to set up access to your chosen AI model providers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft57on86tzsg5cyc2kv45.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft57on86tzsg5cyc2kv45.webp" alt="ToolHive playground API Keys configuration panel showing multiple provider options" width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can configure multiple accounts to test different models and providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt; (for GPT models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic&lt;/strong&gt; (for Claude models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google&lt;/strong&gt; (for Gemini models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;xAI&lt;/strong&gt; (for Grok models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter&lt;/strong&gt; (for access to multiple model providers)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Select MCP tools
&lt;/h3&gt;

&lt;p&gt;Click the tools icon to manage which MCP servers and tools are available to the AI model in the playground. Here, you can toggle the availability of tools from each server, and search or filter them. The &lt;strong&gt;&lt;code&gt;toolhive mcp&lt;/code&gt;&lt;/strong&gt; management server is enabled by default, providing infrastructure management capabilities for both your local and remote MCP servers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj1z3lkjueg53ige0v26.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbj1z3lkjueg53ige0v26.webp" alt="ToolHive playground tools selection panel with enabled MCP servers and searchable tools list" width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Start testing
&lt;/h3&gt;

&lt;p&gt;Once configured, you can start a conversation. The model will utilize all enabled MCP tools to respond to your queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5vap0cvon17wwik15dz.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5vap0cvon17wwik15dz.webp" alt="ToolHive playground main chat interface showing conversation with AI and MCP tool execution" width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing complex workflows
&lt;/h2&gt;

&lt;p&gt;The playground isn't just for simple server validation; it offers an end-to-end testing environment with the features you'd expect from a modern AI client, like rich media attachments and multi-server orchestration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-server orchestration
&lt;/h3&gt;

&lt;p&gt;You can combine multiple MCP servers to create powerful workflows. For example, enable both a &lt;a href="https://docs.stacklok.com/toolhive/guides-mcp/filesystem" rel="noopener noreferrer"&gt;filesystem MCP&lt;/a&gt; server and a data processing server simultaneously. The AI can intelligently coordinate between them to read files, process data, and write results—all through natural conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before testing&lt;/strong&gt;: Make sure your MCP servers are enabled in ToolHive, running, and also enabled in the playground's tool selection panel.&lt;/p&gt;

&lt;p&gt;Example workflow:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Read the JSON file from /projects/data/products.json, analyze the inventory levels, and create a summary report&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI will use the filesystem server to read the file, process the data using available tools, and provide structured insights.&lt;/p&gt;
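&lt;p&gt;As a rough illustration, the analysis step of that workflow boils down to something like the following Python sketch; the file schema (&lt;code&gt;name&lt;/code&gt; and &lt;code&gt;stock&lt;/code&gt; fields) and the low-stock threshold are hypothetical, not part of ToolHive:&lt;/p&gt;

```python
import json

# A minimal sketch of what the orchestrated workflow amounts to once the AI
# has read the file via the filesystem server: parse the JSON, check
# inventory levels, and build a summary. Field names and the threshold are
# illustrative assumptions.
def summarize_inventory(raw_json, low_stock_threshold=10):
    products = json.loads(raw_json)
    low = [p["name"] for p in products if low_stock_threshold > p["stock"]]
    total = sum(p["stock"] for p in products)
    return {
        "product_count": len(products),
        "total_units": total,
        "low_stock": sorted(low),
    }

sample = '[{"name": "widget", "stock": 3}, {"name": "gear", "stock": 42}]'
print(summarize_inventory(sample))
```

&lt;p&gt;In the playground, the AI reaches this kind of result through tool calls rather than hand-written code; the sketch only shows the shape of the analysis.&lt;/p&gt;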

&lt;h3&gt;
  
  
  Rich media attachments
&lt;/h3&gt;

&lt;p&gt;The playground supports attaching images and PDF documents directly in the conversation, just like any modern AI client. This capability is essential for testing document analysis, image processing, or multimodal workflows.&lt;/p&gt;

&lt;p&gt;Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Document processing&lt;/strong&gt;: Upload a PDF invoice and ask the AI to extract key information using your custom MCP tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image analysis&lt;/strong&gt;: Attach screenshots or diagrams and test how your MCP servers interact with visual data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data validation&lt;/strong&gt;: Share files that your MCP servers need to process and verify the output in real time&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  End-to-end testing
&lt;/h3&gt;

&lt;p&gt;Because the playground behaves like a production MCP client, you can validate complete user journeys before deployment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test tool discovery and selection by the AI&lt;/li&gt;
&lt;li&gt;Verify parameter passing and error handling&lt;/li&gt;
&lt;li&gt;Validate multi-step workflows that require tool chaining&lt;/li&gt;
&lt;li&gt;Confirm proper handling of different file formats and media types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This comprehensive testing environment means you can catch integration issues early, reducing the risk of problems when you connect your MCP servers to external AI clients like GitHub Copilot, Cursor, or other applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conversational power: managing servers with natural language
&lt;/h2&gt;

&lt;p&gt;The true elegance of the playground lies in managing your MCP infrastructure using the same chat interface you use to test its functionality.&lt;/p&gt;

&lt;p&gt;The built-in &lt;code&gt;toolhive mcp&lt;/code&gt; server enables powerful, conversational commands, offering a streamlined approach with significant benefits:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Benefit to you&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Unified interface&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manage infrastructure using the exact same conversational interface as testing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Contextual operations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The AI understands your current server state and can make intelligent decisions about which servers to manage.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reduced complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You don't need to switch between traditional command-line interfaces and the chat interface. Everything can be done through conversation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All management actions are logged alongside tool executions, providing clear visibility.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, to check the running state of all hosted servers, you can simply ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Can you list all my MCP servers and show their current status?&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI executes the &lt;code&gt;list_servers&lt;/code&gt; tool, providing immediate, structured feedback directly in the conversation panel:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpw6y38qzo9621cpgpzqg.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpw6y38qzo9621cpgpzqg.webp" alt="ToolHive playground showing list_servers tool execution results with server statuses" width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also carry out complex, maintenance-focused requests easily, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Start the fetch MCP server for me&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Stop all unhealthy MCP servers&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Show me the logs for the meta-mcp server&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Recommended practices for effective testing
&lt;/h2&gt;

&lt;p&gt;To get the most out of the playground, keep these best practices in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolated testing&lt;/strong&gt;: Test individual MCP servers one at a time to validate their core functionality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration testing&lt;/strong&gt;: Enable multiple servers to test how they work together and to surface tool conflicts. Use the same models as in production to ensure consistent behavior and expected tool calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance validation&lt;/strong&gt;: Monitor tool execution times under different loads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error handling&lt;/strong&gt;: Intentionally create error conditions to ensure your tools, and the AI's response, handle failures gracefully.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ToolHive playground transforms the intricate process of setting up, managing, and validating Model Context Protocol servers into an intuitive, seamless experience. It provides you with the visibility and control you need to confidently deploy secure and effective AI agents.&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://toolhive.dev" rel="noopener noreferrer"&gt;
    Try ToolHive UI Now
  &lt;/a&gt;
&lt;/p&gt;

</description>
      <category>toolhive</category>
      <category>agents</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
