Kuldeep Paul
Building A Production-ready Research Assistant With Bifrost

Introduction

Tool calling is what turns large language models from passive text generators into action-oriented agents. Instead of stopping at answers, an agent can search the web, inspect files, run code, and trigger business logic in a controlled way. In this guide, you will build a production-ready Research Assistant using Bifrost and its integration with the Model Context Protocol (MCP).

By the end of this tutorial, your agent will be able to:

  • Search the web for up-to-date information
  • Read from and write to the local filesystem
  • Execute Python code for analysis
  • Run under strict governance with full observability

What is Bifrost?

Bifrost is an open-source LLM gateway written in Go that provides a single, consistent interface across multiple AI providers such as OpenAI, Anthropic, and Amazon Bedrock. It acts as an intelligent routing and control layer, offering load balancing, semantic caching, governance primitives, and deep observability out of the box.

Why Bifrost for Tool-Calling Agents?

  • Security-first by design - tool calls are never auto-executed
  • Provider-agnostic - reuse the same tool schema across models
  • Built-in governance - virtual keys, budgets, and rate limits
  • Strong observability - traces, metrics, and logs by default
  • Drop-in compatibility - works with existing OpenAI-style SDKs

Prerequisites

Before getting started, make sure you have:

  • Node.js (for NPX) or Docker
  • An API key for at least one LLM provider
  • Familiarity with REST APIs and the command line
  • Python 3.8+ (optional, for code execution tools)

Part 1 - Setting Up the Bifrost Gateway

Installation

Bifrost can be launched in two ways.

Option 1 - NPX (fastest way to start)

npx -y @maximhq/bifrost

To pin a specific transport version:

npx -y @maximhq/bifrost --transport-version v1.3.9

Option 2 - Docker

docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost

For persistent configuration:

docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost

Once running, Bifrost exposes a web UI at http://localhost:8080 for configuration, monitoring, and debugging.

Configuration Modes

Bifrost supports two mutually exclusive configuration styles.

Web UI Mode (recommended)

When no config file is present, Bifrost initializes an internal SQLite store. This enables:

  • Live configuration updates
  • No restarts on changes
  • Visual management of providers and tools
  • Built-in request analytics

File-Based Mode

Using a config.json file is useful for GitOps and fully automated environments. In this mode, configuration changes require restarts unless a config store is explicitly enabled.

This tutorial assumes Web UI mode.

Adding Your First Provider

From the UI:

  1. Open Providers
  2. Click Add Provider
  3. Choose a provider (for example OpenAI)
  4. Paste your API key
  5. Enable the models you want

Provider setup via API

curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "keys": [
      {
        "name": "openai-key",
        "value": "sk-...",
        "models": ["gpt-4o-mini"],
        "weight": 1.0
      }
    ]
  }'

Sanity Check

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello Bifrost"}]
  }'

If everything is working, you will receive a model response. The provider/model naming convention (for example, openai/gpt-4o-mini) is how Bifrost decides which provider to route each request to.
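Because Bifrost exposes an OpenAI-compatible API, you can also point an existing OpenAI-style SDK at the gateway instead of using curl. Here is a minimal sketch using the official openai Python package; the api_key value is a placeholder for a local setup without governance (with governance enabled you would pass a virtual key, covered in Part 5):

from openai import OpenAI

# Point the standard OpenAI client at the Bifrost gateway.
# The api_key below is a placeholder for a local, ungoverned setup.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # provider/model routing convention
    messages=[{"role": "user", "content": "Hello Bifrost"}],
)
print(response.choices[0].message.content)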


Part 2 - Understanding MCP

What MCP Solves

Model Context Protocol is an open standard that allows AI models to discover and call external tools at runtime. Bifrost operates as an MCP client that connects to MCP servers hosting those tools.

The Security Model

Bifrost enforces a strict, explicit execution flow:

  1. Discover tools from MCP servers
  2. Inject tool schemas into the model prompt
  3. Receive tool call suggestions from the model
  4. Explicitly execute approved tool calls
  5. Resume the conversation with results

At no point does Bifrost execute tools automatically.
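To make step 3 concrete, here is a rough sketch of what inspecting a tool-call suggestion looks like from the client side, assuming the OpenAI-compatible response shape Bifrost returns and that the MCP tools registered later in this guide have been discovered. Nothing runs until your code explicitly forwards an approved call for execution:

# Ask for a completion; Bifrost injects the discovered MCP tool schemas.
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Find recent articles on LLM gateways"}],
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        # The model only *suggests* a call; your code decides whether to run it.
        print("Suggested tool:", call.function.name)
        print("Arguments:", call.function.arguments)
else:
    print(message.content)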

Supported Connection Types

  • STDIO for local processes
  • HTTP for remote servers
  • SSE for streaming responses

This guide uses STDIO for simplicity.


Part 3 - Building the Research Assistant

High-Level Architecture

The agent will use three tools:

  1. Filesystem access
  2. Web search
  3. Python execution

Filesystem Tool

curl -X POST http://localhost:8080/api/mcp/client \
  -H "Content-Type: application/json" \
  -d '{
    "name": "filesystem",
    "connection_type": "stdio",
    "stdio_config": {
      "command": ["npx", "@modelcontextprotocol/server-filesystem", "/tmp"],
      "args": []
    }
  }'

This restricts the agent to /tmp only.

Web Search Tool

curl -X POST http://localhost:8080/api/mcp/client \
  -H "Content-Type: application/json" \
  -d '{
    "name": "web-search",
    "connection_type": "http",
    "connection_string": "http://your-mcp-search-server:8080"
  }'

Python Execution Tool

curl -X POST http://localhost:8080/api/mcp/client \
  -H "Content-Type: application/json" \
  -d '{
    "name": "python-executor",
    "connection_type": "stdio",
    "stdio_config": {
      "command": ["python", "-m", "mcp_python_server"],
      "args": []
    }
  }'

Only enable this in sandboxed environments.


Part 4 - Agent Logic

Stateless Tool Flow

  1. Request completion
  2. Inspect tool calls
  3. Execute tools explicitly
  4. Append results
  5. Continue conversation

Example Implementation (Python)

The agent maintains its own conversation state and uses Bifrost only for stateless completion and tool-execution calls. This separation keeps execution explicit, safe, and easy to debug.
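Below is a minimal sketch of that loop. It reuses the chat completions endpoint from Part 1; the tool-execution endpoint path is an assumption here (check the API reference for your Bifrost version), and the file path in the prompt is only an example:

import json
import requests

BIFROST = "http://localhost:8080"
MODEL = "openai/gpt-4o-mini"

def run_agent(user_prompt: str, max_turns: int = 5) -> str:
    # The agent owns the conversation state; Bifrost stays stateless.
    messages = [{"role": "user", "content": user_prompt}]

    for _ in range(max_turns):
        resp = requests.post(
            f"{BIFROST}/v1/chat/completions",
            json={"model": MODEL, "messages": messages},
        ).json()
        message = resp["choices"][0]["message"]
        messages.append(message)

        tool_calls = message.get("tool_calls") or []
        if not tool_calls:
            return message.get("content", "")  # final answer, no tools requested

        for call in tool_calls:
            # Explicit execution step: nothing has run until this request is made.
            # NOTE: the endpoint path below is an assumption; confirm it against
            # your Bifrost version's API reference.
            result = requests.post(
                f"{BIFROST}/v1/mcp/tool/execute",
                json=call,
            ).json()
            # Feed the tool result back so the model can continue reasoning.
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })

    return "Stopped after reaching the turn limit."

if __name__ == "__main__":
    print(run_agent("Research recent work on LLM gateways and save notes to /tmp/notes.md"))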


Part 5 - Governance and Security

Virtual Keys

Virtual keys define:

  • Allowed providers and models
  • Budgets and reset periods
  • Token and request limits
  • Which MCP tools may run

They are the foundation of safe production deployments.

Enforcing Governance

Once enabled, every request must include a valid virtual key. Missing or invalid keys are rejected before reaching the model.
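As a rough sketch, attaching a virtual key to a request might look like the following. The header name and key value are assumptions here, since the exact mechanism depends on your Bifrost version and governance configuration:

import requests

# Sketch only: the header name used for virtual keys is an assumption,
# and "vk-research-assistant-dev" is a hypothetical key value.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"x-bf-vk": "vk-research-assistant-dev"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Summarize today's findings"}],
    },
)
# A missing or invalid key is rejected before the request reaches the model.
print(resp.status_code)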


Part 6 - Observability

Bifrost records:

  • Full request and response traces
  • Tool execution details
  • Latency, tokens, and cost
  • Error conditions

You can explore everything live in the UI or query logs programmatically for dashboards and alerts.


Part 7 - Advanced Capabilities

Semantic Caching

Semantic caching allows similar requests to reuse prior responses, reducing cost and latency by up to 80 percent in repetitive workloads.
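Assuming semantic caching has been enabled for your deployment, a quick way to observe the effect is to time two near-identical requests against the same endpoint used earlier; the second, semantically similar question should come back noticeably faster:

import time
import requests

def timed_ask(question: str) -> float:
    # Send a chat completion and return the round-trip time in seconds.
    start = time.perf_counter()
    requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"model": "openai/gpt-4o-mini",
              "messages": [{"role": "user", "content": question}]},
    )
    return time.perf_counter() - start

print("cold:", timed_ask("What is the Model Context Protocol?"))
print("warm:", timed_ask("Explain the Model Context Protocol."))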

Provider Failover

Traffic can be split across providers with weighted routing and automatic fallback when failures occur.
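One way to approximate this is with the weight field already shown in the provider payload from Part 1. The sketch below reuses that payload shape to register a primary and a fallback key for the same provider; splitting traffic across different providers, and the exact fallback behavior, are configured in the UI and may involve fields not shown here:

import requests

# Reuses the /api/providers payload shape from Part 1: two keys for the same
# provider with different weights, so most traffic goes to the primary key.
requests.post(
    "http://localhost:8080/api/providers",
    json={
        "provider": "openai",
        "keys": [
            {"name": "openai-primary",  "value": "sk-...", "models": ["gpt-4o-mini"], "weight": 0.8},
            {"name": "openai-fallback", "value": "sk-...", "models": ["gpt-4o-mini"], "weight": 0.2},
        ],
    },
)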

Long-Running Sessions

Conversation history can be persisted, summarized, and restored to support extended research workflows.
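Since the agent, not Bifrost, owns conversation state in this design, persistence can start as simply as writing the message list to disk between sessions. A minimal sketch (the file path is just an example):

import json
from pathlib import Path

SESSION_FILE = Path("/tmp/research_session.json")

def save_session(messages: list) -> None:
    # Persist the agent-side conversation history between runs.
    SESSION_FILE.write_text(json.dumps(messages, indent=2))

def load_session() -> list:
    # Restore previous history, or start fresh if none exists.
    if SESSION_FILE.exists():
        return json.loads(SESSION_FILE.read_text())
    return []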


Part 8 - Production Best Practices

  • Restrict filesystem scope aggressively
  • Add approval layers for sensitive tools
  • Use separate virtual keys per environment
  • Rotate secrets regularly
  • Monitor token usage and latency continuously

Conclusion

You have built a production-grade Research Assistant powered by Bifrost and MCP. The final system combines:

  • Secure, explicit tool execution
  • Multi-provider flexibility
  • Strong governance controls
  • First-class observability
  • Cost and performance optimizations

From here, you can extend the agent with custom MCP servers, enable automatic execution for trusted tools, or deploy it at scale on Kubernetes.


Next steps

  • Explore the MCP ecosystem
  • Experiment with agent auto-execution modes
  • Deploy to production with enforced governance
  • Build domain-specific tools

Happy building.
