DEV Community

Dan Barr for Stacklok

Introducing Virtual MCP Server: Unified Gateway for Multi-MCP Workflows

If you're working with AI coding assistants like GitHub Copilot or Claude, you've probably encountered MCP (Model Context Protocol) servers. They're powerful, connecting your AI to GitHub, Jira, Slack, cloud providers, and more. But here's the problem: each connection requires separate configuration, authentication, and maintenance.

Managing MCP server connections gets messy fast. That’s why we built the Virtual MCP Server (vMCP) in ToolHive: it aggregates multiple MCP servers into a single unified endpoint.

The problem: connection overload

Picture this: you're an engineer on a platform team. Your AI assistant needs access to GitHub for code, Jira for tickets, Slack for notifications, PagerDuty for incidents, Datadog for metrics, AWS for infrastructure, Confluence for docs, and your internal knowledge base. That's 8 separate MCP server connections, each exposing 10-20+ tools. Now your AI's context window is filling up with 80+ tool descriptions, burning tokens and degrading performance as the LLM struggles to select the right tools from an overwhelming list.

Each MCP server connection requires:

  • Individual configuration in your AI client
  • Separate authentication credentials
  • Manual coordination when tasks span multiple systems
  • Repeated parameter entry (same repo, same channel, same database)
  • Tool filtering to avoid context bloat and wasted tokens

Want to investigate a production incident? You're manually running commands across 4 different systems and piecing together the results yourself. Deploying an app? You're orchestrating a sequence of operations: merge PR, wait for CI, get approval, deploy, notify team. It's tedious, error-prone, and not reusable.

The solution: aggregate everything

vMCP transforms those 8 connections into one. You configure a single MCP endpoint that aggregates all your backend servers.

Before vMCP:

{
  "servers": {
    "github": { "url": "..." },
    "jira": { "url": "..." },
    "slack": { "url": "..." },
    "pagerduty": { "url": "..." },
    "datadog": { "url": "..." },
    "aws": { "url": "..." },
    "confluence": { "url": "..." },
    "docs": { "url": "..." }
  }
}

With vMCP:

{
  "servers": {
    "company-tools": {
      "url": "http://vmcp.company.com/mcp"
    }
  }
}

One connection. One authentication flow. All your tools available.

And here’s the key: you can run as many vMCP instances as you need. Your frontend team connects to one vMCP with their specific tools. Your platform team connects to another with infrastructure access. Each vMCP aggregates exactly the backends that each team needs, with appropriate security policies and permissions.

This matters for two reasons: security (no more giving everyone access to everything) and efficiency (fewer tools means smaller context windows, which means lower token costs and better AI performance).

What vMCP does

vMCP is part of the ToolHive Kubernetes Operator. It acts as an intelligent aggregation layer that sits between your AI client and your backend MCP servers.

Diagram of the basic vMCP architecture

1. Multi-server aggregation with tool filtering

All MCP tools appear through a single endpoint, but you cherry-pick exactly which tools to expose.

Example: An engineer on the ToolHive team gets a single vMCP connection with:

  • GitHub’s search_code tool (scoped to the stacklok/toolhive repo only)
  • The ToolHive docs MCP server
  • An internal docs server hooked up to Google Drive and filtered to ToolHive design docs
  • Slack (only the #toolhive-team channel)

No irrelevant tools cluttering the LLM's context. No wasted tokens on unused tool descriptions. Just the tools needed for their work, making it easier for the AI to select the right tool every time.

When multiple MCP servers have tools with the same name (both GitHub and Jira have create_issue), vMCP automatically prefixes them: github_create_issue and jira_create_issue. You can customize these names however you want.
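As a rough illustration of what filtering and renaming could look like, here's a hypothetical fragment of a vMCP configuration. The field names (`aggregation`, `filter`, `overrides`) are illustrative assumptions, not the exact ToolHive schema; check the operator's CRD reference for the real fields.

```yaml
# Hypothetical sketch: per-backend tool filtering and renaming.
# Field names are illustrative, not the exact ToolHive schema.
aggregation:
  tools:
    - server: github
      filter:
        include: [search_code, create_issue]  # expose only these tools
      overrides:
        create_issue:
          name: github_create_issue           # resolve the name clash with Jira
    - server: jira
      overrides:
        create_issue:
          name: jira_create_issue
```

The `include` list keeps the LLM's context lean, while the `overrides` block shows how an automatic prefix like `github_create_issue` could be replaced with any name you prefer.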

2. Declarative multi-system workflows

Real tasks often require coordinating across multiple systems. vMCP lets you define deterministic workflows whose independent steps run in parallel, with support for conditionals, error handling, and approval gates.

Example: Incident investigation

Instead of manually jumping between four different systems, copy/pasting data, and aggregating the results yourself, a single “composite tool” could:

→ Query logs from the logging system
→ Fetch metrics from the monitoring platform
→ Pull traces from the tracing service
→ Check infrastructure status from the cloud provider
→ Combine everything into a report
→ Create a Jira ticket with the findings

vMCP executes all queries in parallel, automatically aggregates the data, and creates the ticket. Define the workflow once, use it for every incident.
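Sketched as configuration, the incident workflow might look like the following. The step and field names (`compositeTools`, `dependsOn`, the backend tool names) are assumptions for illustration, not the exact vMCP schema.

```yaml
# Hypothetical composite-tool definition for the incident workflow.
# Field names and tool names are illustrative assumptions.
compositeTools:
  - name: investigate_incident
    description: Gather logs, metrics, traces, and infra status, then file a ticket
    steps:
      - id: logs                        # the first four steps have no
        tool: logging_query_logs        # dependencies, so vMCP can run
      - id: metrics                     # them in parallel
        tool: monitoring_fetch_metrics
      - id: traces
        tool: tracing_pull_traces
      - id: infra
        tool: cloud_check_status
      - id: ticket
        tool: jira_create_issue
        dependsOn: [logs, metrics, traces, infra]  # runs after all queries finish
```

The dependency graph is what makes the parallelism deterministic: steps with no `dependsOn` fan out concurrently, and the ticket step only fires once all four have returned.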

Example: App deployment

A typical deployment workflow handled end-to-end:

→ Merge pull request in GitHub
→ Wait for CI tests to pass
→ Request human approval (using MCP elicitation)
→ Deploy (only if approved)
→ Notify team in Slack
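The deployment steps above could be expressed in the same composite-tool style, this time with an approval gate and a conditional step. Again, every field and tool name here is an illustrative assumption rather than the real schema:

```yaml
# Hypothetical deployment workflow with an elicitation-based approval gate.
# Field names and tool names are illustrative assumptions.
compositeTools:
  - name: deploy_app
    steps:
      - id: merge
        tool: github_merge_pull_request
      - id: ci
        tool: github_wait_for_checks
        dependsOn: [merge]
      - id: approval
        type: elicitation              # pause and ask a human via MCP elicitation
        message: "CI passed. Approve deployment?"
        dependsOn: [ci]
      - id: deploy
        tool: deployer_rollout
        condition: steps.approval.approved  # only runs if the human approved
        dependsOn: [approval]
      - id: notify
        tool: slack_post_message
        dependsOn: [deploy]
```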

3. Pre-configured defaults and guardrails

Stop typing the same parameters repeatedly. Configure defaults once in vMCP.

Before: Every GitHub query requires specifying repo: stacklok/toolhive

After: The repo is pre-configured. Engineers never specify it, and they can't accidentally query the wrong one.

This isn’t just a convenience; it’s about deterministic behavior and security. By pre-configuring parameters, you ensure tools behave consistently, and users can only access resources you’ve explicitly exposed. No more accidental queries against the wrong repo, Slack channel, database, cloud region, or anything else you reference repeatedly.
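A pre-configured default might be declared like this. The `defaults` field name is an assumption for illustration; the idea is that vMCP injects these values server-side and hides the parameters from the client entirely.

```yaml
# Hypothetical sketch: parameters injected by vMCP and hidden from clients.
# Field names are illustrative assumptions.
aggregation:
  tools:
    - server: github
      defaults:
        repo: stacklok/toolhive   # always injected; clients can't override it
    - server: slack
      defaults:
        channel: "#toolhive-team"
```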

4. Tool customization and security policies

Third-party MCP servers often expose generic, unrestricted tools. vMCP lets you wrap and restrict them without modifying upstream servers.

Security policy enforcement: Restrict a website fetch tool to internal domains only (*.company.com), validate URLs before calling the backend, and provide clear error messages for violations.

Simplified interfaces: That AWS EC2 tool with 20+ parameters? Create a wrapper that only exposes the 3 parameters your frontend team actually needs, with safe defaults for everything else.
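The fetch-tool restriction described above might be expressed as a policy wrapper along these lines. The policy field names (`policies`, `allow`, `onViolation`) are hypothetical, but the shape shows the pattern: validate at the gateway, never modify the upstream server.

```yaml
# Hypothetical policy wrapper restricting a generic fetch tool to
# internal domains. Field names are illustrative assumptions.
aggregation:
  tools:
    - server: fetch
      policies:
        - match: fetch_url
          allow:
            urlPattern: "https://*.company.com/*"  # internal domains only
          onViolation:
            error: "Only *.company.com URLs are permitted through this gateway"
```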

5. Centralized authentication

vMCP implements a two-boundary authentication model with a complete audit trail. Your AI client authenticates once to vMCP using the OAuth 2.1 methods defined in the official MCP spec. vMCP handles authorization to each backend independently based on its requirements.

When it’s time to revoke access, disable the user in your identity provider, and all backend access is revoked instantly.
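The two-boundary model could be configured roughly like this: one block for how clients authenticate to vMCP, another for how vMCP authorizes to each backend. The field names (`incomingAuth`, `outgoingAuth`, `tokenExchange`) are assumptions sketched for illustration; see the ToolHive docs for the actual configuration.

```yaml
# Hypothetical two-boundary auth configuration. Field names and the
# identity-provider URL are illustrative assumptions.
spec:
  incomingAuth:
    type: oidc
    oidc:
      issuer: https://idp.company.com   # placeholder identity provider
      audience: vmcp
  outgoingAuth:
    backends:
      github:
        type: tokenExchange             # swap the user's token per backend
      jira:
        type: serviceAccount            # or use a shared service credential
```

Because the client-facing boundary is a single OIDC issuer, disabling a user there cuts off every backend at once, which is what makes instant revocation possible.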

Real-world benefits

Let's look at the incident investigation example with concrete numbers:

Without vMCP:

  • 4 sequential manual commands
  • 2-3 minutes per command
  • 5-10 minutes aggregating and formatting
  • 15-20 minutes total per incident
  • Results vary by engineer
  • Process isn't documented or reusable

With vMCP:

  • One command triggers the workflow
  • Parallel execution: 30 seconds
  • Automatic aggregation and formatting
  • Consistent results every time
  • Workflow is documented as code
  • Any team member can use it

For a team handling 20 incidents per week, that's 5-6 hours saved. More importantly, the response is faster, more consistent, and doesn't require senior engineers to handle routine investigations.

How it works

vMCP runs in Kubernetes alongside your backend MCP servers. You define three types of resources:

MCPGroup: Organizes backend servers logically (e.g., "platform-tools")

MCPServer: Individual backend MCP servers (GitHub, Jira, etc.)

VirtualMCPServer: The aggregation layer that combines servers from a group

The ToolHive operator discovers backends, resolves tool name conflicts, applies security policies, and exposes everything through a single endpoint. Your AI client connects to vMCP just like any other MCP server.

Since each VirtualMCPServer is a separate Kubernetes resource, you can deploy as many as needed. One per team, one per environment, or organized however makes sense for your security model.
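A minimal sketch of the three resource types might look like the following. The `apiVersion`, label key, and spec fields follow the general pattern of ToolHive's CRDs but are illustrative; the quickstart tutorial has the exact schema.

```yaml
# Minimal sketch of the three resource types. apiVersion, labels, and
# spec fields are illustrative assumptions, not the exact CRD schema.
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPGroup
metadata:
  name: platform-tools
---
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: github
  labels:
    toolhive.stacklok.dev/group: platform-tools  # group membership (assumed)
spec:
  image: ghcr.io/example/github-mcp:latest       # placeholder image
---
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: VirtualMCPServer
metadata:
  name: platform-vmcp
spec:
  groupRef:
    name: platform-tools   # aggregate every server in the group
```

Deploying a second team's gateway is then just another `VirtualMCPServer` pointing at a different `MCPGroup`.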

For a working example, check out the quickstart tutorial.

When to use vMCP

vMCP makes sense when you're managing multiple MCP servers (typically 5+), curating a subset of MCP tools for specific teams and workflows, or automating tasks that coordinate across systems. It's especially valuable for:

  • Teams requiring centralized authentication and authorization
  • Workflows that should be reusable across the entire team
  • Security policies that need centralized enforcement
  • Reducing onboarding complexity for new engineers

If you're using a single MCP server for simple one-step operations, you probably don't need vMCP. It's built for managing complexity at scale.

Get started

vMCP is available now as part of ToolHive. To try it out:

  1. Install the ToolHive Kubernetes Operator
  2. Follow the vMCP quickstart
  3. Connect your AI client to the aggregated endpoint

We'd love to hear how you're using vMCP. What workflows are you building? Which MCP servers are you aggregating? Join the ToolHive community on Discord and let us know.

Looking to leverage vMCP within your enterprise organization? Book a demo with us.

ToolHive is an open-source MCP platform focused on security and enterprise operationalization. Learn more at toolhive.dev.
