Introduction
Tool calling is what turns large language models from passive text generators into real, action-oriented agents. Instead of stopping at answers, an agent can search the web, inspect files, run code, and trigger business logic in a controlled way. In this guide, you will build a fully production-ready Research Assistant using Bifrost and its integration with the Model Context Protocol (MCP).
By the end of this tutorial, your agent will be able to:
- Search the web for up-to-date information
- Read from and write to the local filesystem
- Execute Python code for analysis
- Run under strict governance with full observability
What is Bifrost?
Bifrost is an open-source LLM gateway written in Go that provides a single, consistent interface across multiple AI providers such as OpenAI, Anthropic, and Amazon Bedrock. It acts as an intelligent routing and control layer, offering load balancing, semantic caching, governance primitives, and deep observability out of the box.
Why Bifrost for Tool-Calling Agents?
- Security-first by design - tool calls are never auto-executed
- Provider-agnostic - reuse the same tool schema across models
- Built-in governance - virtual keys, budgets, and rate limits
- Strong observability - traces, metrics, and logs by default
- Drop-in compatibility - works with existing OpenAI-style SDKs
Prerequisites
Before getting started, make sure you have:
- Node.js (for NPX) or Docker
- An API key for at least one LLM provider
- Familiarity with REST APIs and the command line
- Python 3.8+ (optional, for code execution tools)
Part 1 - Setting Up the Bifrost Gateway
Installation
Bifrost can be launched in two ways.
Option 1 - NPX (fastest way to start)
npx -y @maximhq/bifrost
To pin a specific transport version:
npx -y @maximhq/bifrost --transport-version v1.3.9
Option 2 - Docker
docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost
For persistent configuration:
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
Once running, Bifrost exposes a web UI at http://localhost:8080 for configuration, monitoring, and debugging.
Configuration Modes
Bifrost supports two mutually exclusive configuration styles.
Web UI Mode (recommended)
When no config file is present, Bifrost initializes an internal SQLite store. This enables:
- Live configuration updates
- No restarts on changes
- Visual management of providers and tools
- Built-in request analytics
File-Based Mode
Using a config.json file is useful for GitOps and fully automated environments. In this mode, configuration changes require restarts unless a config store is explicitly enabled.
This tutorial assumes Web UI mode.
Adding Your First Provider
From the UI:
- Open Providers
- Click Add Provider
- Choose a provider (for example OpenAI)
- Paste your API key
- Enable the models you want
Provider setup via API
curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "keys": [
      {
        "name": "openai-key",
        "value": "sk-...",
        "models": ["gpt-4o-mini"],
        "weight": 1.0
      }
    ]
  }'
Sanity Check
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello Bifrost"}]
  }'
If everything is working, you will receive a normal model response. Note the provider/model naming convention in the model field: the openai/ prefix tells Bifrost which provider to route the request to.
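Because Bifrost exposes an OpenAI-compatible /v1 endpoint, you can also point an existing OpenAI SDK at the gateway instead of using curl. The sketch below uses the official openai Python package; the api_key value is a placeholder, since real provider keys are managed inside Bifrost (if you enable virtual keys in Part 5, pass the virtual key there instead).

from openai import OpenAI

# Point the standard OpenAI client at the Bifrost gateway instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="bifrost-placeholder")

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # provider/model prefix drives Bifrost's routing
    messages=[{"role": "user", "content": "Hello Bifrost"}],
)
print(response.choices[0].message.content)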
Part 2 - Understanding MCP
What MCP Solves
The Model Context Protocol is an open standard that allows AI models to discover and call external tools at runtime. Bifrost operates as an MCP client that connects to MCP servers hosting those tools.
The Security Model
Bifrost enforces a strict, explicit execution flow:
- Discover tools from MCP servers
- Inject tool schemas into the model prompt
- Receive tool call suggestions from the model
- Explicitly execute approved tool calls
- Resume the conversation with results
At no point does Bifrost execute tools automatically.
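Step 3 of that flow is worth seeing concretely. A tool call arrives as a suggestion inside the assistant message, in the OpenAI-compatible schema that Bifrost exposes; nothing has run yet. The tool name, arguments, and id below are purely illustrative:

# Illustrative assistant message carrying a tool-call suggestion (nothing executed yet)
{
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",                 # placeholder id
            "type": "function",
            "function": {
                "name": "read_file",             # a tool advertised by an MCP server
                "arguments": "{\"path\": \"/tmp/notes.md\"}"
            }
        }
    ]
}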
Supported Connection Types
- STDIO for local processes
- HTTP for remote servers
- SSE for streaming responses
This guide uses STDIO for simplicity.
Part 3 - Building the Research Assistant
High-Level Architecture
The agent will use three tools:
- Filesystem access
- Web search
- Python execution
Filesystem Tool
curl -X POST http://localhost:8080/api/mcp/client \
  -H "Content-Type: application/json" \
  -d '{
    "name": "filesystem",
    "connection_type": "stdio",
    "stdio_config": {
      "command": ["npx", "@modelcontextprotocol/server-filesystem", "/tmp"],
      "args": []
    }
  }'
This restricts the agent's filesystem access to the /tmp directory only.
Web Search Tool
curl -X POST http://localhost:8080/api/mcp/client \
  -H "Content-Type: application/json" \
  -d '{
    "name": "web-search",
    "connection_type": "http",
    "connection_string": "http://your-mcp-search-server:8080"
  }'
Python Execution Tool
curl -X POST http://localhost:8080/api/mcp/client \
  -H "Content-Type: application/json" \
  -d '{
    "name": "python-executor",
    "connection_type": "stdio",
    "stdio_config": {
      "command": ["python", "-m", "mcp_python_server"],
      "args": []
    }
  }'
Only enable this in sandboxed environments.
Part 4 - Agent Logic
Stateless Tool Flow
- Request completion
- Inspect tool calls
- Execute tools explicitly
- Append results
- Continue conversation
Example Implementation (Python)
The full implementation maintains its own conversation state while using Bifrost only for stateless calls. This separation keeps execution safe and debuggable.
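The sketch below is one minimal way to implement that loop in Python with the requests library. The chat completions endpoint matches Part 1; the MCP tool-execution endpoint path and the shape of its response are assumptions here and should be checked against your Bifrost version's API reference. The per-tool loop is where you would plug in human approval or an allowlist before anything runs.

import requests

BIFROST = "http://localhost:8080"
MODEL = "openai/gpt-4o-mini"

def chat(messages):
    # One stateless completion call; all conversation state lives in `messages`.
    resp = requests.post(
        f"{BIFROST}/v1/chat/completions",
        json={"model": MODEL, "messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]

def execute_tool(tool_call):
    # Explicitly execute one approved tool call through Bifrost's MCP layer.
    # ASSUMPTION: endpoint path and response shape vary by version; consult the docs.
    resp = requests.post(f"{BIFROST}/v1/mcp/tool/execute", json=tool_call, timeout=120)
    resp.raise_for_status()
    return resp.json()  # expected here to be a tool-role message ready to append

def run_agent(task, max_turns=10):
    messages = [
        {"role": "system", "content": "You are a research assistant. Use tools when helpful."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_turns):
        message = chat(messages)           # 1. request completion
        messages.append(message)
        tool_calls = message.get("tool_calls") or []
        if not tool_calls:                 # 2. inspect tool calls
            return message.get("content")  # no tool work left: final answer
        for tool_call in tool_calls:       # 3. execute tools explicitly (add approval here)
            result = execute_tool(tool_call)
            messages.append(result)        # 4. append results
        # 5. the loop continues the conversation with tool results in context
    return "Stopped after reaching the turn limit."

if __name__ == "__main__":
    print(run_agent("Find recent news about MCP and save a short summary to /tmp/summary.md"))

Because every step is explicit, you can log, approve, or veto any individual tool call before it executes.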
Part 5 - Governance and Security
Virtual Keys
Virtual keys define:
- Allowed providers and models
- Budgets and reset periods
- Token and request limits
- Which MCP tools may run
They are the foundation of safe production deployments.
Enforcing Governance
Once enabled, every request must include a valid virtual key. Missing or invalid keys are rejected before reaching the model.
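In practice that means attaching the virtual key to every call. The header name used below is an assumption (confirm it in your deployment's governance settings), and the key value is a placeholder:

import requests

# Same sanity-check request as Part 1, now carrying a virtual key.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"x-bf-vk": "vk-research-dev"},  # header name and key are placeholders
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello with governance"}],
    },
    timeout=60,
)
print(resp.status_code)  # requests without a valid key never reach the model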
Part 6 - Observability
Bifrost records:
- Full request and response traces
- Tool execution details
- Latency, tokens, and cost
- Error conditions
You can explore everything live in the UI or query logs programmatically for dashboards and alerts.
Part 7 - Advanced Capabilities
Semantic Caching
Semantic caching allows similar requests to reuse prior responses, reducing cost and latency by up to 80 percent in repetitive workloads.
Provider Failover
Traffic can be split across providers with weighted routing and automatic fallback when failures occur.
Long-Running Sessions
Conversation history can be persisted, summarized, and restored to support extended research workflows.
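Since the agent from Part 4 owns its conversation state, a simple form of this is serializing the messages list between sessions; summarization can be done with an extra completion call before saving. A minimal sketch, with an illustrative file path:

import json
from pathlib import Path

SESSION_FILE = Path("/tmp/research_session.json")  # illustrative location

def save_session(messages):
    SESSION_FILE.write_text(json.dumps(messages, indent=2))

def load_session():
    return json.loads(SESSION_FILE.read_text()) if SESSION_FILE.exists() else []

messages = load_session()      # restore prior context
# ... run more agent turns, appending to `messages` ...
save_session(messages)         # persist for the next session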
Part 8 - Production Best Practices
- Restrict filesystem scope aggressively
- Add approval layers for sensitive tools
- Use separate virtual keys per environment
- Rotate secrets regularly
- Monitor token usage and latency continuously
Conclusion
You have built a production-grade Research Assistant powered by Bifrost and MCP. The final system combines:
- Secure, explicit tool execution
- Multi-provider flexibility
- Strong governance controls
- First-class observability
- Cost and performance optimizations
From here, you can extend the agent with custom MCP servers, enable automatic execution for trusted tools, or deploy it at scale on Kubernetes.
Next steps
- Explore the MCP ecosystem
- Experiment with agent auto-execution modes
- Deploy to production with enforced governance
- Build domain-specific tools
Happy building.