TL;DR
Tool calling enables AI models to interact with external systems, APIs, and databases by defining function schemas or connecting to Model Context Protocol (MCP) servers. Bifrost, Maxim AI's high-performance LLM gateway, simplifies tool calling implementation with its OpenAI-compatible API, support for 12+ LLM providers, and native MCP integration. This guide covers custom function definitions, MCP server connections, tool choice controls, and best practices for building production-ready AI agents that can execute real-world tasks.
Introduction: Why Tool Calling Matters for AI Applications
Modern AI applications need to do more than generate text. They need to query databases, call APIs, read files, update CRM records, and execute complex workflows across multiple systems. This capability, known as tool calling or function calling, transforms language models from conversational interfaces into action-oriented agents that can accomplish real tasks.
According to Anthropic's Model Context Protocol documentation, connecting AI systems to external tools has traditionally required custom integrations for each pairing, creating an "N×M integration problem" that becomes unmanageable at scale. Tool calling provides a standardized way to bridge this gap, enabling AI models to understand when and how to invoke external functions based on user requests.
The benefits of tool calling extend beyond simple API interactions. When properly implemented, tool calling enables:
- Real-time data access: AI agents can fetch current information from databases, APIs, and file systems instead of relying solely on training data
- Autonomous task execution: Models can break down complex requests into sequential API calls and execute multi-step workflows
- Domain-specific capabilities: Custom tools allow AI to perform specialized tasks like financial calculations, data analysis, or system administration
- Reduced hallucinations: By grounding responses in actual data from trusted sources, tool calling helps minimize factual errors
What is Bifrost and Why Use It for Tool Calling?
Bifrost is Maxim AI's high-performance LLM gateway that provides a unified interface for accessing multiple AI providers through a single OpenAI-compatible API. Built with production reliability in mind, Bifrost offers enterprise-grade features including automatic failovers, load balancing, semantic caching, and comprehensive observability.
For tool calling specifically, Bifrost provides several critical advantages:
Unified Tool Calling Interface: Bifrost normalizes tool calling syntax across different providers. Whether you're using OpenAI's GPT-4, Anthropic's Claude, or models from AWS Bedrock, you can define tools once using the OpenAI format and have them work consistently across all providers.
Native MCP Support: Bifrost includes first-class support for the Model Context Protocol (MCP), an open standard introduced by Anthropic that enables seamless integration between AI applications and external data sources. Instead of manually defining dozens of function schemas, you can connect to MCP servers that automatically expose tools, resources, and prompts to your AI models.
Zero-Configuration Deployment: Unlike complex orchestration frameworks, Bifrost starts with zero configuration. You can deploy it locally with Docker in seconds and begin making tool-enabled API calls immediately.
Production-Ready Observability: When running tool-calling agents in production, visibility into execution flows is critical. Bifrost's observability features provide distributed tracing, comprehensive logging, and Prometheus metrics that integrate seamlessly with Maxim AI's observability platform for end-to-end monitoring.
Understanding Tool Calling Fundamentals
Before diving into implementation details, it's important to understand how tool calling works at a conceptual level.
When you send a request to an AI model with tools defined, the model doesn't automatically execute those tools. Instead, the model:
- Analyzes the user's request and determines if any defined tools are relevant
- Returns tool call instructions specifying which function to invoke and what parameters to pass
- Waits for your application to execute the tool and return results
- Incorporates the results into its reasoning to formulate a final response
This design gives you full control over tool execution. The model acts as an intelligent router and parameter generator, but your application code handles the actual function calls. This separation is crucial for security, as it prevents models from directly accessing sensitive systems.
Function Calling with Custom Tools
The most straightforward approach to tool calling involves defining custom functions using the OpenAI tool schema format. This method gives you complete control over tool definitions and is ideal when you have a small, well-defined set of functions.
Defining Tool Schemas
Tools are defined as JSON objects that specify the function name, description, and parameter schema. The description is particularly important because the model uses it to determine when to call the function. Here's a practical example using a calculator tool:
```bash
curl --location 'http://localhost:8080/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "openai/gpt-4o-mini",
  "messages": [
    {"role": "user", "content": "What is 15 + 27? Use the calculator tool."}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "A calculator tool for basic arithmetic operations",
        "parameters": {
          "type": "object",
          "properties": {
            "operation": {
              "type": "string",
              "description": "The operation to perform",
              "enum": ["add", "subtract", "multiply", "divide"]
            },
            "a": {
              "type": "number",
              "description": "The first number"
            },
            "b": {
              "type": "number",
              "description": "The second number"
            }
          },
          "required": ["operation", "a", "b"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}'
```
This request defines a calculator tool with three parameters: the operation type and two numbers. The enum constraint on the operation parameter ensures the model only selects valid operations.
Understanding the Response Format
When the model decides to use a tool, it returns a response containing tool call objects:
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "calculator",
          "arguments": "{\"operation\":\"add\",\"a\":15,\"b\":27}"
        }
      }]
    }
  }]
}
```
The tool_calls array contains each function the model wants to invoke. Notice that the arguments are returned as a JSON string, which you'll need to parse before executing the function. The id field is crucial for maintaining conversation state when you send the function results back to the model.
Implementing the Tool Execution Loop
After receiving tool calls, your application needs to:
- Parse the function arguments
- Execute the actual function (in this case, perform the calculation)
- Send the results back to the model
- Receive the final response
This creates a multi-turn conversation where the model requests tools, your code executes them, and the model synthesizes the results into a natural language response. For more complex workflows involving multiple tool calls, this loop may repeat several times.
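To make the loop concrete, here is a minimal sketch in Python using the OpenAI SDK pointed at a local Bifrost instance. It reuses the calculator schema from the earlier request; the run_calculator helper, the base_url, and the placeholder API key are illustrative assumptions rather than Bifrost requirements.

```python
import json
from openai import OpenAI

# Assumes Bifrost is running locally and exposing its OpenAI-compatible endpoint
client = OpenAI(base_url="http://localhost:8080/v1", api_key="placeholder")

tools = [{
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "A calculator tool for basic arithmetic operations",
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {"type": "string", "enum": ["add", "subtract", "multiply", "divide"]},
                "a": {"type": "number"},
                "b": {"type": "number"},
            },
            "required": ["operation", "a", "b"],
        },
    },
}]

def run_calculator(operation: str, a: float, b: float) -> float:
    # Local implementation of the tool the model is allowed to request
    return {"add": a + b, "subtract": a - b, "multiply": a * b, "divide": a / b}[operation]

messages = [{"role": "user", "content": "What is 15 + 27? Use the calculator tool."}]

# Turn 1: the model responds with tool call instructions instead of an answer
response = client.chat.completions.create(model="openai/gpt-4o-mini", messages=messages, tools=tools)
message = response.choices[0].message

if message.tool_calls:
    messages.append(message)  # keep the assistant's tool-call turn in the history
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = run_calculator(**args)             # your code executes the actual function
        messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})

    # Turn 2: the model synthesizes the tool results into a natural-language answer
    final = client.chat.completions.create(model="openai/gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```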
Connecting to Model Context Protocol (MCP) Servers
While custom tool definitions work well for small numbers of functions, they become unwieldy when you need to connect AI models to dozens or hundreds of tools across multiple systems. This is where the Model Context Protocol provides a more scalable solution.
MCP is an open standard introduced by Anthropic in November 2024 that standardizes how AI systems integrate with external data sources and tools. As explained by Anthropic, MCP addresses the N×M integration problem by providing a universal protocol. Instead of building custom connectors for each AI application and data source combination, developers implement MCP once and unlock an entire ecosystem of integrations.
MCP Architecture Overview
MCP uses a client-server architecture where:
- MCP Servers expose tools, resources, and prompts from external systems (like Google Drive, Slack, databases, or custom APIs)
- MCP Clients (embedded in AI applications like Bifrost) connect to these servers and make tools available to language models
This modular design means you can add new capabilities to your AI agents by simply connecting to existing MCP servers, without writing any tool definition code.
Configuring MCP Clients in Bifrost
Bifrost provides three methods for configuring MCP clients: the web UI, the REST API, or configuration files. Each method suits a different stage of the development lifecycle.
Using the Web UI
For development and testing, the web UI provides the easiest way to configure MCP clients:
- Navigate to http://localhost:8080
- Click on "MCP Clients" in the sidebar
- Click "Add MCP Client"
- Enter your server details (name, connection type, and configuration)
- Save the configuration
This method is ideal when you're exploring new MCP servers or need to quickly prototype integrations.
Using the REST API
For programmatic configuration and CI/CD pipelines, use Bifrost's REST API:
```bash
curl --location 'http://localhost:8080/api/mcp/client' \
--header 'Content-Type: application/json' \
--data '{
  "name": "filesystem",
  "connection_type": "stdio",
  "stdio_config": {
    "command": ["npx", "@modelcontextprotocol/server-filesystem", "/tmp"],
    "args": []
  }
}'
```
This example configures an MCP server that provides filesystem access tools. The stdio connection type means the server runs as a local subprocess that communicates via standard input/output.
Using Configuration Files
For production deployments, define MCP clients in your config.json:
```json
{
  "mcp": {
    "client_configs": [
      {
        "name": "filesystem",
        "connection_type": "stdio",
        "stdio_config": {
          "command": ["npx", "@modelcontextprotocol/server-filesystem", "/tmp"],
          "args": []
        }
      },
      {
        "name": "youtube-search",
        "connection_type": "http",
        "connection_string": "http://your-youtube-mcp-url"
      }
    ]
  }
}
```
This configuration persists across restarts and can be version-controlled alongside your application code.
Available MCP Servers
The MCP ecosystem has grown rapidly since its introduction. The community has built thousands of MCP servers for popular services including:
- Google Drive - Access and manipulate documents, spreadsheets, and files
- Slack - Read messages, post updates, and manage channels
- GitHub - Search repositories, create issues, and review pull requests
- PostgreSQL - Query databases and execute SQL operations
- Puppeteer - Automate web browsing and scraping tasks
You can find a comprehensive registry of available MCP servers at the Model Context Protocol GitHub organization.
For detailed information on advanced MCP features including tool execution, agent mode, and filtering, consult the Bifrost MCP documentation.
Tool Choice Options: Controlling Function Execution
Bifrost provides fine-grained control over how and when models use tools through the tool_choice parameter. Understanding these options helps you balance automation with predictability.
Auto Mode (Default)
```json
"tool_choice": "auto"
```
In auto mode, the model decides whether to call functions based on the user's request. This provides maximum flexibility and allows the model to use its judgment about when tools are necessary. Use auto mode when you trust the model to make appropriate decisions about tool usage.
Forcing Specific Tools
```json
"tool_choice": {
  "type": "function",
  "function": {"name": "calculator"}
}
```
This forces the model to use a specific tool regardless of the user's request. This is useful when you've already determined that a particular function should be called and just need the model to generate appropriate parameters.
For example, if you're building a data analysis application where every query should use your custom analyze_data function, forcing that tool ensures consistent behavior.
Disabling Tool Usage
```json
"tool_choice": "none"
```
Setting tool choice to "none" prevents the model from calling any tools, even if they're defined in the request. This is useful when you want to temporarily disable tool calling without removing tool definitions, or when you want the model to provide a purely conversational response.
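These three modes map directly onto the same parameter when calling Bifrost through the OpenAI SDK. The sketch below assumes the local Bifrost endpoint plus the messages and tools from the calculator example; the placeholder API key is an assumption.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="placeholder")
# messages and tools are the same objects defined in the earlier calculator example

# Let the model decide whether a tool is needed (default)
auto = client.chat.completions.create(
    model="openai/gpt-4o-mini", messages=messages, tools=tools, tool_choice="auto"
)

# Force the calculator to be called, even if the user didn't ask for math
forced = client.chat.completions.create(
    model="openai/gpt-4o-mini", messages=messages, tools=tools,
    tool_choice={"type": "function", "function": {"name": "calculator"}},
)

# Keep the tool definitions but require a purely conversational reply
conversational = client.chat.completions.create(
    model="openai/gpt-4o-mini", messages=messages, tools=tools, tool_choice="none"
)
```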
Real-World Use Cases for Tool Calling
Understanding practical applications helps illustrate the power of tool calling. Here are several production use cases:
Customer Support Automation
AI agents can handle complex support queries by accessing customer data, order histories, and knowledge bases. For instance, an agent might:
- Call a `get_customer_profile` function to retrieve account details
- Use a `search_orders` function to find recent purchases
- Invoke a `check_shipping_status` function to get delivery updates
- Call a `create_ticket` function if the issue requires escalation
By chaining these tool calls, the agent provides personalized, accurate responses without requiring human intervention for routine inquiries. Companies like Comm100 have successfully deployed such systems, using Maxim AI's platform to ensure quality and reliability.
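A common way to wire tools like these into the execution loop is a dispatch table that maps each tool name to the application function that implements it. The handlers below are hypothetical stubs standing in for real CRM and order systems, not Bifrost or Maxim APIs:

```python
def get_customer_profile(customer_id: str) -> dict:
    # Hypothetical stand-in for a real CRM lookup
    return {"customer_id": customer_id, "name": "Jane Doe", "subscription": "pro"}

def search_orders(customer_id: str, limit: int = 5) -> list:
    # Hypothetical stand-in for an order-history query
    return [{"order_id": "A-1001", "status": "shipped"}][:limit]

# Tool name (as declared in the tools array) -> Python handler
TOOL_HANDLERS = {
    "get_customer_profile": get_customer_profile,
    "search_orders": search_orders,
}

def execute_tool(name: str, arguments: dict):
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        # Returning an error payload lets the model adjust instead of crashing the loop
        return {"error": f"Unknown tool: {name}"}
    return handler(**arguments)
```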
Sales Intelligence and CRM Updates
Sales teams benefit from AI agents that can update CRM systems based on conversation context. Consider a scenario where a sales representative discusses a deal with a prospect. An AI agent listening to the conversation might:
- Extract key information (company name, deal size, decision timeline)
- Call a `salesforce.searchAccount` function to find the relevant account
- Invoke a `salesforce.updateOpportunity` function to record the new information
- Use a `salesforce.createTask` function to schedule follow-up activities
This automation reduces manual data entry and ensures CRM systems stay current with minimal overhead.
Data Analysis Workflows
AI agents excel at conducting ad-hoc data analysis by querying databases and processing results. An analyst might ask "What were our top-selling products in Q4?" and the agent would:
- Call a `query_database` function to execute a SQL query
- Use a `calculate_statistics` function to compute summary metrics
- Invoke a `generate_visualization` function to create charts
- Call a `write_report` function to compile results
This workflow transforms natural language questions into actionable insights without requiring analysts to write code.
Document Processing Pipelines
Organizations processing large volumes of documents benefit from AI agents that can read, analyze, and route files. For example:
- Call a `read_pdf` function to extract text from uploaded documents
- Use a `classify_document` function to categorize content
- Invoke an `extract_entities` function to identify key information
- Call a `route_to_department` function to send documents to appropriate teams
MCP servers for filesystem access make these workflows particularly straightforward to implement.
Best Practices for Production Tool Calling
Deploying tool-calling agents in production requires careful attention to reliability, security, and observability.
Write Clear Tool Descriptions
The model relies heavily on function descriptions to determine when to call tools. Vague descriptions lead to inappropriate tool usage or missed opportunities. Compare these examples:
Weak: "Gets data"
Strong: "Retrieves customer account information including name, email, account status, and subscription tier. Use this when users ask about their account details or subscription."
The stronger description provides context about what data the function returns and when to use it, helping the model make better decisions.
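Put into a full tool definition, the stronger description might look like this. The get_customer_profile schema is a hypothetical example, not a built-in Bifrost tool:

```json
{
  "type": "function",
  "function": {
    "name": "get_customer_profile",
    "description": "Retrieves customer account information including name, email, account status, and subscription tier. Use this when users ask about their account details or subscription.",
    "parameters": {
      "type": "object",
      "properties": {
        "customer_id": {
          "type": "string",
          "description": "Unique identifier of the customer whose account should be looked up"
        }
      },
      "required": ["customer_id"]
    }
  }
}
```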
Implement Robust Error Handling
Tool calls can fail for many reasons: network issues, invalid parameters, authorization errors, or resource unavailability. Your application needs to handle these failures gracefully:
- Validate parameters before executing functions
- Catch exceptions and return meaningful error messages
- Implement retries for transient failures
- Log errors for debugging and monitoring
When a tool call fails, return an error message to the model explaining what went wrong. The model can often recover by adjusting parameters or trying alternative approaches.
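As a minimal sketch, a wrapper like the following (assuming the same SDK response objects as the earlier loop example) turns exceptions into structured error messages the model can reason about:

```python
import json

def safe_execute_tool(call, handlers):
    # Executes one tool call and always returns a well-formed tool message,
    # so a failure is reported back to the model instead of aborting the conversation
    try:
        args = json.loads(call.function.arguments)     # may raise on malformed JSON
        result = handlers[call.function.name](**args)  # may raise on bad params or downstream errors
        content = json.dumps(result)
    except Exception as exc:
        content = json.dumps({"error": str(exc)})
    return {"role": "tool", "tool_call_id": call.id, "content": content}
```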
Monitor Tool Usage and Performance
Understanding how your AI agents use tools in production is critical for optimization. Track metrics like:
- Tool call frequency: Which tools are used most often?
- Success rates: How often do tool calls succeed vs. fail?
- Execution latency: How long do tools take to execute?
- Cost per tool call: What's the resource consumption?
Maxim AI's observability platform provides comprehensive monitoring for AI agents, including detailed tracing of tool calls within larger conversation flows. This visibility helps identify performance bottlenecks, debug failures, and optimize tool selection logic.
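Even before wiring up a full observability stack, a lightweight timing wrapper in your execution loop can start capturing these numbers. This is a sketch only; the logger stands in for whatever metrics backend you use:

```python
import time
import logging

logger = logging.getLogger("tool_metrics")

def timed_execute(name, handler, **kwargs):
    # Records latency and success/failure for a single tool invocation
    start = time.perf_counter()
    try:
        result = handler(**kwargs)
        logger.info("tool=%s status=ok latency_ms=%.1f", name, (time.perf_counter() - start) * 1000)
        return result
    except Exception:
        logger.exception("tool=%s status=error latency_ms=%.1f", name, (time.perf_counter() - start) * 1000)
        raise
```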
Secure Tool Access
Tool calling introduces security considerations since AI models can potentially invoke sensitive operations. Follow these security best practices:
- Implement authentication and authorization checks before executing tools
- Validate and sanitize inputs to prevent injection attacks
- Use least-privilege principles when granting tool access
- Audit tool usage to detect anomalous behavior
- Rate limit tool calls to prevent abuse
For enterprise deployments, Bifrost Enterprise includes advanced governance features including fine-grained access control, budget management, and comprehensive audit logs.
Test Tool Interactions Thoroughly
Before deploying tool-calling agents, conduct extensive testing across various scenarios:
- Happy path testing: Verify tools work correctly with valid inputs
- Error condition testing: Ensure graceful handling of failures
- Edge case testing: Test boundary conditions and unusual inputs
- Integration testing: Verify tools interact correctly with external systems
- Load testing: Confirm performance under production traffic levels
Maxim AI's simulation and evaluation platform enables comprehensive testing of AI agents across hundreds of scenarios and user personas. By simulating diverse customer interactions, you can identify issues before they impact production users.
Integrating Tool Calling with Maxim AI's Observability
While Bifrost provides the infrastructure for tool calling, combining it with Maxim AI's observability suite creates a complete solution for monitoring and improving AI agents in production.
Distributed Tracing for Multi-Step Workflows
When AI agents execute complex workflows involving multiple tool calls, understanding the complete execution flow is essential for debugging. Maxim's distributed tracing captures:
- Conversation context: The original user request and conversation history
- Model reasoning: Why the model decided to call specific tools
- Tool execution: Parameters passed and results returned
- Final responses: How the model synthesized tool outputs
This end-to-end visibility helps identify where workflows fail or produce incorrect results.
Automated Quality Evaluation
Even with careful prompt engineering, AI agents sometimes make poor tool selection decisions. Maxim's automated evaluation framework allows you to:
- Define custom evaluators that check if agents selected appropriate tools
- Run periodic quality checks on production logs
- Track quality metrics over time to identify regressions
- Trigger alerts when quality drops below thresholds
For more details on evaluation strategies, see the comprehensive guide to AI agent evaluation.
Production Data Curation
As your AI agents interact with users, production logs become a valuable source of training data for improving tool selection. Maxim's data engine enables:
- Dataset curation from production interactions
- Human annotation of tool usage correctness
- Fine-tuning datasets for model customization
- Continuous improvement workflows
This closed-loop approach ensures your agents get better over time based on real-world usage patterns.
Advanced Tool Calling Patterns
Once you've mastered basic tool calling, several advanced patterns enable more sophisticated agent behaviors.
Sequential Tool Chaining
Many tasks require calling multiple tools in sequence, where each tool's output informs the next call. The model handles this orchestration automatically when you provide appropriate tool descriptions.
For example, a research agent might:
- Search for relevant documents
- Read the top documents
- Extract key facts
- Synthesize findings into a summary
By defining tools for each step and allowing the model to operate in auto mode, it can execute this multi-step workflow without explicit orchestration code.
Parallel Tool Execution
Some scenarios benefit from calling multiple tools concurrently. The model can return several tool calls in a single response, but your application code is responsible for executing them in parallel:
```python
import asyncio

async def execute_tools_parallel(tool_calls):
    # Run every requested tool concurrently; execute_tool is your own async handler
    tasks = [execute_tool(call) for call in tool_calls]
    return await asyncio.gather(*tasks)
```
This pattern significantly reduces latency when tools don't depend on each other's outputs.
Recursive Tool Calling
For particularly complex tasks, you might need recursive tool calling where tools can trigger additional tool calls. This requires careful implementation to prevent infinite loops (a bounded-loop sketch follows the list below):
- Set maximum recursion depth limits
- Track tool call history
- Implement circuit breakers for repeated failures
- Monitor execution budgets
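A depth limit is the simplest of these safeguards. The sketch below bounds the request/execute cycle at a fixed number of turns, reusing the client conventions and the safe_execute_tool helper from the earlier examples; the limit value is an arbitrary assumption:

```python
MAX_TURNS = 5  # hard ceiling on how many tool-calling rounds a single request may trigger

def run_agent(client, model, messages, tools, handlers):
    # Repeats the request/execute cycle until the model stops asking for tools
    # or the turn budget runs out
    for _ in range(MAX_TURNS):
        response = client.chat.completions.create(model=model, messages=messages, tools=tools)
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # final natural-language answer
        messages.append(message)
        for call in message.tool_calls:
            messages.append(safe_execute_tool(call, handlers))
    return "Stopped: maximum number of tool-calling turns reached."
```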
Code Execution with MCP
Anthropic's recent work on code execution with MCP introduces an efficient pattern where agents write code to call tools instead of using direct tool calling. This approach reduces context window consumption by:
- Loading tools on demand rather than including all definitions upfront
- Filtering data before returning results to the model
- Executing complex logic in a single step
This pattern is particularly effective when working with dozens or hundreds of tools across multiple MCP servers.
Conclusion: Building Reliable AI Agents with Tool Calling
Tool calling transforms language models from conversational interfaces into capable agents that can accomplish real-world tasks. By enabling AI to interact with databases, APIs, and external systems, tool calling unlocks applications ranging from customer support automation to complex data analysis workflows.
Bifrost simplifies tool calling implementation by providing a unified, OpenAI-compatible API that works across multiple providers, native MCP support for connecting to pre-built integrations, and production-ready features including observability, failover, and caching.
To start building tool-enabled AI agents:
- Deploy Bifrost locally and experiment with custom tool definitions
- Explore the MCP server ecosystem for pre-built integrations
- Implement robust error handling and monitoring for production deployments
- Integrate with Maxim AI's observability platform for comprehensive agent monitoring
For teams building production AI applications, combining Bifrost's infrastructure capabilities with Maxim AI's end-to-end platform for simulation, evaluation, and observability provides the foundation for shipping reliable AI agents faster. Whether you're building customer support bots, sales intelligence tools, or data analysis assistants, tool calling with Bifrost enables your AI to move from conversation to action.
Ready to implement tool calling in your AI applications? Try Bifrost Enterprise free for 14 days or schedule a demo to see how Maxim AI can help you build, test, and deploy reliable AI agents.