Introduction
Tool calling is what turns large language models from passive text generators into real, action-oriented agents. Instead of stopping at answers, an agent can search the web, inspect files, run code, and trigger business logic in a controlled way. In this guide, you will build a fully production-ready Research Assistant using Bifrost and its integration with the Model Context Protocol (MCP).
By the end of this tutorial, your agent will be able to:
- Search the web for up-to-date information
- Read from and write to the local filesystem
- Execute Python code for analysis
- Run under strict governance with full observability
What is Bifrost?
Bifrost is an open-source LLM gateway written in Go that provides a single, consistent interface across multiple AI providers such as OpenAI, Anthropic, and Amazon Bedrock. It acts as an intelligent routing and control layer, offering load balancing, semantic caching, governance primitives, and deep observability out of the box.
Why Bifrost for Tool-Calling Agents?
- Security-first by design - tool calls are never auto-executed
- Provider-agnostic - reuse the same tool schema across models
- Built-in governance - virtual keys, budgets, and rate limits
- Strong observability - traces, metrics, and logs by default
- Drop-in compatibility - works with existing OpenAI-style SDKs
Prerequisites
Before getting started, make sure you have:
- Node.js (for NPX) or Docker
- An API key for at least one LLM provider
- Familiarity with REST APIs and the command line
- Python 3.8+ (optional, for code execution tools)
Part 1 - Setting Up the Bifrost Gateway
Installation
Bifrost can be launched in two ways.
Option 1 - NPX (fastest way to start)
npx -y @maximhq/bifrost
To pin a specific transport version:
npx -y @maximhq/bifrost --transport-version v1.3.9
Option 2 - Docker
docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost
For persistent configuration:
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
Once running, Bifrost exposes a web UI at http://localhost:8080 for configuration, monitoring, and debugging.
Configuration Modes
Bifrost supports two mutually exclusive configuration styles.
Web UI Mode (recommended)
When no config file is present, Bifrost initializes an internal SQLite store. This enables:
- Live configuration updates
- No restarts on changes
- Visual management of providers and tools
- Built-in request analytics
File-Based Mode
Using a config.json file is useful for GitOps and fully automated environments. In this mode, configuration changes require restarts unless a config store is explicitly enabled.
This tutorial assumes Web UI mode.
Adding Your First Provider
From the UI:
- Open Providers
- Click Add Provider
- Choose a provider (for example OpenAI)
- Paste your API key
- Enable the models you want
Provider setup via API
curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "keys": [
      {
        "name": "openai-key",
        "value": "sk-...",
        "models": ["gpt-4o-mini"],
        "weight": 1.0
      }
    ]
  }'
Sanity Check
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello Bifrost"}]
  }'
If everything is working, you will receive a normal model response. Note the provider/model naming convention in the model field: the openai/ prefix tells Bifrost which provider to route the request to.
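Because Bifrost exposes an OpenAI-compatible /v1 endpoint, you can also point an existing OpenAI SDK at the gateway instead of using curl. The sketch below uses the official openai Python package; the api_key value is a placeholder, since real provider keys are managed inside Bifrost (if you enable virtual keys in Part 5, pass the virtual key there instead).

from openai import OpenAI

# Point the standard OpenAI client at the Bifrost gateway instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="bifrost-placeholder")

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # provider/model prefix drives Bifrost's routing
    messages=[{"role": "user", "content": "Hello Bifrost"}],
)
print(response.choices[0].message.content)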
Part 2 - Understanding MCP
What MCP Solves
The Model Context Protocol is an open standard that allows AI models to discover and call external tools at runtime. Bifrost operates as an MCP client that connects to MCP servers hosting those tools.
The Security Model
Bifrost enforces a strict, explicit execution flow:
- Discover tools from MCP servers
- Inject tool schemas into the model prompt
- Receive tool call suggestions from the model
- Explicitly execute approved tool calls
- Resume the conversation with results
At no point does Bifrost execute tools automatically.
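Step 3 of that flow is worth seeing concretely. A tool call arrives as a suggestion inside the assistant message, in the OpenAI-compatible schema that Bifrost exposes; nothing has run yet. The tool name, arguments, and id below are purely illustrative:

# Illustrative assistant message carrying a tool-call suggestion (nothing executed yet)
{
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",                 # placeholder id
            "type": "function",
            "function": {
                "name": "read_file",             # a tool advertised by an MCP server
                "arguments": "{\"path\": \"/tmp/notes.md\"}"
            }
        }
    ]
}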
Supported Connection Types
- STDIO for local processes
- HTTP for remote servers
- SSE for streaming responses
This guide uses STDIO for simplicity.
Part 3 - Building the Research Assistant
High-Level Architecture
The agent will use three tools:
- Filesystem access
- Web search
- Python execution
Filesystem Tool
curl -X POST http://localhost:8080/api/mcp/client \
  -H "Content-Type: application/json" \
  -d '{
    "name": "filesystem",
    "connection_type": "stdio",
    "stdio_config": {
      "command": ["npx", "@modelcontextprotocol/server-filesystem", "/tmp"],
      "args": []
    }
  }'
This restricts the agent's filesystem access to the /tmp directory only.
Web Search Tool
curl -X POST http://localhost:8080/api/mcp/client \
  -H "Content-Type: application/json" \
  -d '{
    "name": "web-search",
    "connection_type": "http",
    "connection_string": "http://your-mcp-search-server:8080"
  }'
Python Execution Tool
curl -X POST http://localhost:8080/api/mcp/client \
  -H "Content-Type: application/json" \
  -d '{
    "name": "python-executor",
    "connection_type": "stdio",
    "stdio_config": {
      "command": ["python", "-m", "mcp_python_server"],
      "args": []
    }
  }'
Only enable this in sandboxed environments.
Part 4 - Agent Logic
Stateless Tool Flow
- Request completion
- Inspect tool calls
- Execute tools explicitly
- Append results
- Continue conversation
Example Implementation (Python)
The full implementation maintains its own conversation state while using Bifrost only for stateless calls. This separation keeps execution safe and debuggable.
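The sketch below is one minimal way to implement that loop in Python with the requests library. The chat completions endpoint matches Part 1; the MCP tool-execution endpoint path and the shape of its response are assumptions here and should be checked against your Bifrost version's API reference. The per-tool loop is where you would plug in human approval or an allowlist before anything runs.

import requests

BIFROST = "http://localhost:8080"
MODEL = "openai/gpt-4o-mini"

def chat(messages):
    # One stateless completion call; all conversation state lives in `messages`.
    resp = requests.post(
        f"{BIFROST}/v1/chat/completions",
        json={"model": MODEL, "messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]

def execute_tool(tool_call):
    # Explicitly execute one approved tool call through Bifrost's MCP layer.
    # ASSUMPTION: endpoint path and response shape vary by version; consult the docs.
    resp = requests.post(f"{BIFROST}/v1/mcp/tool/execute", json=tool_call, timeout=120)
    resp.raise_for_status()
    return resp.json()  # expected here to be a tool-role message ready to append

def run_agent(task, max_turns=10):
    messages = [
        {"role": "system", "content": "You are a research assistant. Use tools when helpful."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_turns):
        message = chat(messages)           # 1. request completion
        messages.append(message)
        tool_calls = message.get("tool_calls") or []
        if not tool_calls:                 # 2. inspect tool calls
            return message.get("content")  # no tool work left: final answer
        for tool_call in tool_calls:       # 3. execute tools explicitly (add approval here)
            result = execute_tool(tool_call)
            messages.append(result)        # 4. append results
        # 5. the loop continues the conversation with tool results in context
    return "Stopped after reaching the turn limit."

if __name__ == "__main__":
    print(run_agent("Find recent news about MCP and save a short summary to /tmp/summary.md"))

Because every step is explicit, you can log, approve, or veto any individual tool call before it executes.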
Part 5 - Governance and Security
Virtual Keys
Virtual keys define:
- Allowed providers and models
- Budgets and reset periods
- Token and request limits
- Which MCP tools may run
They are the foundation of safe production deployments.
Enforcing Governance
Once enabled, every request must include a valid virtual key. Missing or invalid keys are rejected before reaching the model.
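In practice that means attaching the virtual key to every call. The header name used below is an assumption (confirm it in your deployment's governance settings), and the key value is a placeholder:

import requests

# Same sanity-check request as Part 1, now carrying a virtual key.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"x-bf-vk": "vk-research-dev"},  # header name and key are placeholders
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello with governance"}],
    },
    timeout=60,
)
print(resp.status_code)  # requests without a valid key never reach the model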
Part 6 - Observability
Bifrost records:
- Full request and response traces
- Tool execution details
- Latency, tokens, and cost
- Error conditions
You can explore everything live in the UI or query logs programmatically for dashboards and alerts.
Part 7 - Advanced Capabilities
Semantic Caching
Semantic caching allows similar requests to reuse prior responses, reducing cost and latency by up to 80 percent in repetitive workloads.
Provider Failover
Traffic can be split across providers with weighted routing and automatic fallback when failures occur.
Long-Running Sessions
Conversation history can be persisted, summarized, and restored to support extended research workflows.
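Since the agent from Part 4 owns its conversation state, a simple form of this is serializing the messages list between sessions; summarization can be done with an extra completion call before saving. A minimal sketch, with an illustrative file path:

import json
from pathlib import Path

SESSION_FILE = Path("/tmp/research_session.json")  # illustrative location

def save_session(messages):
    SESSION_FILE.write_text(json.dumps(messages, indent=2))

def load_session():
    return json.loads(SESSION_FILE.read_text()) if SESSION_FILE.exists() else []

messages = load_session()      # restore prior context
# ... run more agent turns, appending to `messages` ...
save_session(messages)         # persist for the next session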
Part 8 - Production Best Practices
- Restrict filesystem scope aggressively
- Add approval layers for sensitive tools
- Use separate virtual keys per environment
- Rotate secrets regularly
- Monitor token usage and latency continuously
Conclusion
You have built a production-grade Research Assistant powered by Bifrost and MCP. The final system combines:
- Secure, explicit tool execution
- Multi-provider flexibility
- Strong governance controls
- First-class observability
- Cost and performance optimizations
From here, you can extend the agent with custom MCP servers, enable automatic execution for trusted tools, or deploy it at scale on Kubernetes.
Next steps
- Explore the MCP ecosystem
- Experiment with agent auto-execution modes
- Deploy to production with enforced governance
- Build domain-specific tools
Happy building.