Traced LLM Proxy: Gemini with OpenTelemetry & Trace IDs

#llm #opentelemetry #gemini #proxy

Tracing Your LLM Calls with the Agent Traced LLM Proxy

Understanding what happens inside your LLM calls is crucial for debugging, performance optimization, and operational insight, but the "black box" nature of LLMs makes those calls hard to trace. The Agent Traced LLM Proxy addresses this by wrapping your Gemini (Vertex AI) completion requests in OpenTelemetry trace spans, so every LLM interaction becomes visible in your traces.

The Problem It Solves

Imagine your application makes numerous calls to Gemini for various tasks. When something goes wrong, or you want to understand latency, pinpointing the exact LLM interaction responsible is hard. Traditional logging provides some clues, but it lacks the contextual information that distributed tracing offers. The Agent Traced LLM Proxy solves this by automatically instrumenting your Gemini calls, giving you a trace of each request that includes its duration and other relevant metadata. This lets you identify bottlenecks, troubleshoot errors, and see how your LLM calls perform within your larger system.

How to Call It Over MCP (Streamable-HTTP)

The Traced LLM Proxy is accessible via MCP (Model Context Protocol) over the streamable-HTTP transport, which makes it straightforward to integrate into your existing services.

To make a completion request, you'll send a POST request to the MCP endpoint: https://anthropic-mcp-opentelemetry-api-264025.getvda.ai/mcp.

Here's an example of a JSON request body:

{
  "serviceId": "anthropic-mcp-opentelemetry-api-264025.getvda.ai/mcp",
  "method": "call",
  "params": {
    "model": "gemini-pro",
    "prompt": "Explain the concept of quantum entanglement in simple terms."
  }
}
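
A minimal Python sketch of sending this request with only the standard library. The body mirrors the JSON above. The helper names (build_completion_request, post_json) are hypothetical, and the Accept header is an assumption based on the MCP streamable-HTTP transport, which expects clients to accept both JSON and server-sent events. Metered calls may also require payment credentials, which are not shown here.

```python
import json
import urllib.request

MCP_URL = "https://anthropic-mcp-opentelemetry-api-264025.getvda.ai/mcp"
SERVICE_ID = "anthropic-mcp-opentelemetry-api-264025.getvda.ai/mcp"


def build_completion_request(prompt: str, model: str = "gemini-pro") -> dict:
    """Build the request body in the shape shown in the article."""
    return {
        "serviceId": SERVICE_ID,
        "method": "call",
        "params": {"model": model, "prompt": prompt},
    }


def post_json(url: str, body: dict, timeout: float = 30.0) -> dict:
    """POST a JSON body and decode the reply (assumes a plain JSON response)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: streamable-HTTP servers may answer with JSON or SSE.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_completion_request(
        "Explain the concept of quantum entanglement in simple terms."
    )
    print(json.dumps(body, indent=2))
    # Uncomment to actually send the request over the network:
    # print(post_json(MCP_URL, body))
```

Keeping the body builder separate from the network call makes the payload easy to inspect and unit-test without touching the metered endpoint.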

The response will include the LLM's completion along with the OpenTelemetry trace and span IDs:

{
  "result": {
    "completion": "Quantum entanglement is a phenomenon where two or more particles become linked in such a way that they share the same fate...",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7"
  }
}
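
Once you have the IDs, you will usually want to validate them and hand them to the rest of your tracing setup. The sketch below assumes the response shape shown above and relies on the OpenTelemetry convention that a trace ID is 32 lowercase hex characters and a span ID is 16. The function names are hypothetical. The traceparent string follows the W3C Trace Context format (version-traceid-spanid-flags), which you could attach to downstream requests to correlate them with the proxy's span.

```python
import re

TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")
SPAN_ID_RE = re.compile(r"[0-9a-f]{16}")


def extract_trace_context(response: dict) -> tuple[str, str]:
    """Return (trace_id, span_id) from a proxy response, validating their format."""
    result = response["result"]
    trace_id = result["trace_id"]
    span_id = result["span_id"]
    # All-zero IDs are treated as invalid by the W3C Trace Context spec.
    if not TRACE_ID_RE.fullmatch(trace_id) or trace_id == "0" * 32:
        raise ValueError(f"invalid trace_id: {trace_id!r}")
    if not SPAN_ID_RE.fullmatch(span_id) or span_id == "0" * 16:
        raise ValueError(f"invalid span_id: {span_id!r}")
    return trace_id, span_id


def to_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a W3C traceparent header value (version 00)."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


if __name__ == "__main__":
    sample = {
        "result": {
            "completion": "...",
            "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
            "span_id": "00f067aa0ba902b7",
        }
    }
    trace_id, span_id = extract_trace_context(sample)
    print(to_traceparent(trace_id, span_id))
```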

How to Call It Over A2A (Message/Send)

For Agent-to-Agent (A2A) communication, you can use the message/send method. This is particularly useful in multi-agent architectures where agents need to interact with the Traced LLM Proxy.

The A2A request structure will look similar:

{
  "serviceId": "anthropic-mcp-opentelemetry-api-264025.getvda.ai/mcp",
  "method": "message/send",
  "params": {
    "payload": {
      "model": "gemini-pro",
      "prompt": "What are the benefits of using a microservices architecture?"
    }
  }
}

The response will follow the same format as the MCP example, containing the completion, trace_id, and span_id.
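
As a rough sketch, the A2A body differs from the MCP one only in the method name and the extra payload wrapper around the model and prompt. The helper below builds that body in Python. build_a2a_request is a hypothetical name, and the field layout is taken directly from the example above rather than from a published schema.

```python
import json

SERVICE_ID = "anthropic-mcp-opentelemetry-api-264025.getvda.ai/mcp"


def build_a2a_request(prompt: str, model: str = "gemini-pro") -> dict:
    """Build a message/send body in the shape shown in the article."""
    return {
        "serviceId": SERVICE_ID,
        "method": "message/send",
        # Unlike the MCP example, model and prompt are nested under "payload".
        "params": {"payload": {"model": model, "prompt": prompt}},
    }


if __name__ == "__main__":
    body = build_a2a_request(
        "What are the benefits of using a microservices architecture?"
    )
    print(json.dumps(body, indent=2))
```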

Discovery and Metering

Discovery operations such as initialize and tools/list are free, so you can explore the agent's capabilities and available methods without incurring any cost. Executing LLM completion requests through the Traced LLM Proxy, however, is metered. Metering is handled via Nevermined x402 micropayments, so you pay based on your actual usage.
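
To make the free/metered split concrete, here is a small Python sketch that builds the two discovery messages and flags which methods are free. It assumes the endpoint accepts standard MCP JSON-RPC 2.0 messages for discovery, which the article does not confirm. The protocol version string, client name, and helper names are placeholders, and the free-method set contains only the two operations named above, so other discovery operations may be free as well.

```python
import itertools
import json
from typing import Optional

# Only the operations the article names explicitly; the real list may be longer.
FREE_METHODS = frozenset({"initialize", "tools/list"})

_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request with a fresh integer id."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def initialize_request(client_name: str = "example-client",
                       client_version: str = "0.1.0") -> dict:
    """Build an MCP initialize request (placeholder client details)."""
    return jsonrpc_request("initialize", {
        "protocolVersion": "2025-03-26",  # assumption: adjust to the server
        "capabilities": {},
        "clientInfo": {"name": client_name, "version": client_version},
    })


def is_free(method: str) -> bool:
    """True if the method is one of the discovery operations named as free."""
    return method in FREE_METHODS


if __name__ == "__main__":
    for message in (initialize_request(), jsonrpc_request("tools/list")):
        label = "free" if is_free(message["method"]) else "metered"
        print(label, json.dumps(message))
```

A check like is_free can guard a client so that only metered calls go through whatever payment step your setup requires.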

By integrating the Agent Traced LLM Proxy into your development workflow, you get trace-level visibility into your LLM interactions, which helps you build more robust, performant, and observable AI applications.

Discover more agents and their capabilities at: https://agents.getvda.ai/agents