Guyoung Studio

Posted on May 31

BoxAgnts Introduction (7) — OpenAI API and Anthropic API

#ai #agents #openai #claude

The 2025 AI model market is in full bloom. But each provider has its own API format, authentication method, and streaming protocol. BoxAgnts' design goal: users switch models by changing just one parameter, with all internal logic remaining unchanged.

This article dissects this abstraction across four levels:

Unified Interface: How the LlmProvider trait defines a "model provider"
Three Major API Format Comparisons: Format differences between Anthropic, OpenAI, and Google Gemini
Format Conversion: How to translate between three completely different message formats
Engineering Practices: Thinking configuration, error handling, ProviderQuirks, API Key management

Unified Interface: The LlmProvider Trait

Everything starts with the interface definition:

// boxagnts-api/src/provider.rs
#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn id(&self) -> &ProviderId;                              // Unique identifier
    fn name(&self) -> &str;                                   // Human-readable name

    async fn create_message(                                  // Non-streaming request
        &self,
        request: ProviderRequest,
    ) -> Result<ProviderResponse, ProviderError>;

    async fn create_message_stream(                           // Streaming request
        &self,
        request: ProviderRequest,
    ) -> Result<
        Pin<Box<dyn Stream<Item = Result<StreamEvent, ProviderError>> + Send>>,
        ProviderError,
    >;

    async fn list_models(&self) -> Result<Vec<ModelInfo>, ProviderError>;  // Model list
    async fn check_connectivity(&self) -> Result<ProviderStatus, ProviderError>; // Health check
    fn capabilities(&self) -> ProviderCapabilities;           // Capability declaration
}

Both input and output use provider-agnostic unified types:

pub struct ProviderRequest {
    pub model: String,
    pub messages: Vec<Message>,          // Unified conversation format
    pub system_prompt: Option<SystemPrompt>,
    pub tools: Vec<ToolDefinition>,      // Unified tool definitions
    pub max_tokens: u32,
    pub temperature: Option<f64>,
    pub thinking: Option<ThinkingConfig>, // Deep thinking configuration
    pub provider_options: Value,          // Provider-specific parameters
}

pub struct ProviderResponse {
    pub id: String,
    pub content: Vec<ContentBlock>,      // Unified content blocks
    pub stop_reason: StopReason,         // Unified stop reason
    pub usage: UsageInfo,                // Token usage
    pub model: String,
}

The core value of the normalization layer: whether the underlying model is Claude, GPT, or Gemini, upper-layer code only ever sees ProviderRequest and ProviderResponse.
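To make the idea concrete, here is a minimal sketch of calling code that depends only on a trait. It is deliberately simplified: the trait is synchronous, and `Request`, `Response`, `EchoA`, `EchoB`, and `ask` are hypothetical stand-ins rather than BoxAgnts types.

```rust
// Simplified, synchronous stand-ins for the article's unified types (assumptions).
struct Request {
    model: String,
    prompt: String,
}

struct Response {
    model: String,
    text: String,
}

trait Provider {
    fn name(&self) -> &str;
    fn create_message(&self, request: &Request) -> Result<Response, String>;
}

// Two toy providers that differ only in how they produce text.
struct EchoA;
struct EchoB;

impl Provider for EchoA {
    fn name(&self) -> &str {
        "provider-a"
    }
    fn create_message(&self, request: &Request) -> Result<Response, String> {
        Ok(Response { model: request.model.clone(), text: format!("A:{}", request.prompt) })
    }
}

impl Provider for EchoB {
    fn name(&self) -> &str {
        "provider-b"
    }
    fn create_message(&self, request: &Request) -> Result<Response, String> {
        Ok(Response { model: request.model.clone(), text: format!("B:{}", request.prompt) })
    }
}

// Upper-layer code: it sees the trait and the unified types, nothing else.
fn ask(provider: &dyn Provider, model: &str, prompt: &str) -> Result<String, String> {
    let request = Request { model: model.to_string(), prompt: prompt.to_string() };
    let response = provider.create_message(&request)?;
    Ok(format!("[{} via {}] {}", response.model, provider.name(), response.text))
}
```

Swapping `EchoA` for `EchoB` changes nothing in `ask`, which is the property the real trait is meant to give the query loop.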

ProviderRegistry: Unified Entry Point for 40+ Providers

// boxagnts-api/src/registry.rs
pub struct ProviderRegistry {
    providers: HashMap<ProviderId, Arc<dyn LlmProvider>>,
    default_provider_id: ProviderId,
}

fn provider_from_key(provider_id: &str, key: String) -> Option<Arc<dyn LlmProvider>> {
    match provider_id {
        // Native implementations — each with its own API format
        "anthropic" => Some(Arc::new(AnthropicProvider::from_config(...))),
        "openai"    => Some(Arc::new(OpenAiProvider::new(key))),
        "google"    => Some(Arc::new(GoogleProvider::new(key))),
        "github-copilot" => Some(Arc::new(CopilotProvider::new(key))),
        "cohere"    => Some(Arc::new(CohereProvider::new(key))),

        // OpenAI-compatible providers: share the same conversion logic, only base_url changes
        "deepseek" | "groq" | "ollama" | "mistral" | "xai"
        | "perplexity" | "openrouter" | "siliconflow" | "moonshot"
        | "zhipu" | "stepfun" | "fireworks" | "llamacpp"
        | "sambanova" | "huggingface" | "nvidia" | "cerebras"
        // ... 30+ OpenAI-compatible providers in total
            => Some(Arc::new(...)), // arm body elided in this excerpt
        _ => None,
    }
}
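The registry idea can be sketched with the standard library alone. Everything here is illustrative: `Provider`, `OpenAiCompat`, `Registry`, and `compat_base_url` are hypothetical names, and the base URLs are assumptions based on commonly documented endpoints, not values taken from the BoxAgnts source.

```rust
use std::collections::HashMap;
use std::sync::Arc;

trait Provider: Send + Sync {
    fn base_url(&self) -> &str;
}

// One struct serves every OpenAI-compatible vendor; only the URL differs.
struct OpenAiCompat {
    base_url: String,
}

impl Provider for OpenAiCompat {
    fn base_url(&self) -> &str {
        &self.base_url
    }
}

// Hypothetical base-URL table (assumed values).
fn compat_base_url(provider_id: &str) -> Option<&'static str> {
    match provider_id {
        "deepseek" => Some("https://api.deepseek.com"),
        "groq" => Some("https://api.groq.com/openai/v1"),
        "ollama" => Some("http://localhost:11434/v1"),
        _ => None,
    }
}

struct Registry {
    providers: HashMap<String, Arc<dyn Provider>>,
    default_id: String,
}

impl Registry {
    fn new(default_id: &str) -> Self {
        Registry { providers: HashMap::new(), default_id: default_id.to_string() }
    }

    // Registers a compatible provider; returns false for unknown ids.
    fn register_compat(&mut self, id: &str) -> bool {
        match compat_base_url(id) {
            Some(url) => {
                let provider = OpenAiCompat { base_url: url.to_string() };
                self.providers.insert(id.to_string(), Arc::new(provider));
                true
            }
            None => false,
        }
    }

    // Looks up a provider by id, falling back to the default id.
    fn get(&self, id: Option<&str>) -> Option<Arc<dyn Provider>> {
        let key = id.unwrap_or(self.default_id.as_str());
        self.providers.get(key).cloned()
    }
}
```

In this sketch, supporting another compatible vendor is a one-line addition to the URL table, which mirrors the low extension cost the article describes.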

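The providers fall into three implementation strategies: near-zero conversion (Anthropic), independent format conversion (OpenAI, Google, and the other native providers), and shared OpenAI-compatible conversion: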

Type	Representative	Conversion Strategy	Count
Native Anthropic	claude-sonnet-4-5	Near-zero conversion (internal format = Anthropic format)	1
Native OpenAI	gpt-4o, o3	ProviderRequest → Chat Completions	1
Native Google	gemini-2.5-flash	ProviderRequest → generateContent	1
OpenAI Compatible	deepseek, groq, ollama, etc.	Same logic as OpenAI, only URL changes	30+
Other Native	github-copilot, cohere	Independent format conversion	3+

Differences Between Three Major API Formats

Anthropic, OpenAI, and Google Gemini expose three APIs whose message formats differ substantially. Understanding these differences is what makes the value of the conversion layer clear.

3.1 System Prompt

Feature	Anthropic	OpenAI	Google Gemini
Location	Top-level `"system"` field	messages[0], `role:"system"`	Top-level `"systemInstruction"` field
Type	string or ContentBlock array	string or array of text parts	content parts array only

// Anthropic — top-level standalone field
{"model": "claude-sonnet-4-5", "system": "You are helpful.", "messages": [...]}

// OpenAI — embedded in messages array
{"model": "gpt-4o", "messages": [{"role":"system","content":"You are helpful."}, ...]}

// Google — uses systemInstruction field, structure differs from messages
{
  "systemInstruction": {"parts": [{"text": "You are helpful."}]},
  "contents": [{"role": "user", "parts": [{"text": "Hello"}]}]
}

3.2 Tool Definitions

Feature	Anthropic	OpenAI	Google
Field	`"tools": [{name, description, input_schema}]`	`"tools": [{type:"function", function:{...}}]`	`"tools": [{functionDeclarations: [{name, description, parameters}]}]`
Wrapping Layers	0	1	1, with different nesting names

3.3 Tool Call Responses

// Anthropic — native block in content array
{"content": [{"type":"tool_use", "id":"toolu_01A", "name":"read", "input": {...}}]}

// OpenAI: tool_calls array on the assistant message, arguments is a JSON string
{"tool_calls": [{"id":"call_abc", "type":"function", "function": {"name":"read", "arguments": "{\"path\":\"...\"}"}}]}

// Google — functionCall embedded in parts, args is JSON object
{"candidates": [{"content": {"parts": [{"functionCall": {"name":"read", "args": {...}}}]}}]}

3.4 Tool Result Format

// Anthropic — tool_result is a block in the user message content array
{"role":"user", "content": [{"type":"tool_result", "tool_use_id":"toolu_01A", "content":"..."}]}

// OpenAI — requires a separate role: "tool" message
{"role":"tool", "tool_call_id":"call_abc", "content":"..."}

// Google — functionResponse embedded in user content parts
{"role":"user", "parts": [{"functionResponse": {"name":"read", "response": {...}}}]}

3.5 Role Naming

Anthropic	OpenAI	Google
`user`	`user`	`user`
`assistant`	`assistant`	`model`

Google uses model instead of assistant. The difference is easy to overlook and a common source of errors when converting conversation history.
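The role mapping in the table above is small enough to sketch in full. `WireFormat`, `wire_role`, and `role_from_wire` are hypothetical names used for illustration only.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum WireFormat {
    Anthropic,
    OpenAi,
    Google,
}

// Unified role -> the string each API expects on the wire.
fn wire_role(format: WireFormat, role: Role) -> &'static str {
    match (format, role) {
        (_, Role::User) => "user",
        (WireFormat::Google, Role::Assistant) => "model",
        (_, Role::Assistant) => "assistant",
    }
}

// Wire string -> unified role; None for anything unrecognized.
fn role_from_wire(format: WireFormat, value: &str) -> Option<Role> {
    match (format, value) {
        (_, "user") => Some(Role::User),
        (WireFormat::Google, "model") => Some(Role::Assistant),
        (WireFormat::Anthropic, "assistant") | (WireFormat::OpenAi, "assistant") => {
            Some(Role::Assistant)
        }
        _ => None,
    }
}
```

Keeping both directions in one place makes it harder to forget the Google special case when parsing responses.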

Conversion Layer Implementation: OpenAI Provider as Example

OpenAiProvider is the most complete example of the conversion layer:

// boxagnts-api/src/providers/openai.rs
impl OpenAiProvider {
    fn to_openai_messages(
        messages: &[Message],
        system_prompt: Option<&SystemPrompt>,
    ) -> Vec<Value> {
        let mut result: Vec<Value> = Vec::new();

        // Step 1: system prompt → role: "system" message
        if let Some(sys) = system_prompt {
            // `sys_text`: `sys` flattened to a plain string (extraction elided in this excerpt)
            result.push(json!({"role": "system", "content": sys_text}));
        }

        for msg in messages {
            match msg.role {
                Role::User => {
                    // User messages may mix text and tool_result blocks
                    // tool_result needs to be split into separate role: "tool" messages
                    Self::append_user_messages(&mut result, &msg.content);
                }
                Role::Assistant => {
                    let (text, tool_calls) = Self::assistant_content_to_openai(&msg.content);
                    result.push(json!({
                        "role": "assistant",
                        "content": text,
                        "tool_calls": tool_calls
                    }));
                }
            }
        }
        result
    }

    fn to_openai_tools(tools: &[ToolDefinition]) -> Vec<Value> {
        tools.iter().map(|td| {
            json!({
                "type": "function",
                "function": {
                    "name": td.name,
                    "description": td.description,
                    "parameters": td.input_schema
                }
            })
        }).collect()
    }
}
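The body of `append_user_messages` is not shown, so the following is only a sketch of the splitting step its comments describe. `Block`, `OpenAiMessage`, and `split_user_message` are hypothetical, and emitting tool messages before the remaining user text is an assumption (it keeps each tool result directly after the assistant turn that requested it).

```rust
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Text(String),
    ToolResult { tool_use_id: String, content: String },
}

#[derive(Debug, PartialEq)]
enum OpenAiMessage {
    User { content: String },
    Tool { tool_call_id: String, content: String },
}

// Splits one unified user message into OpenAI-style messages:
// each tool_result becomes its own role "tool" message, and any
// remaining text is joined into a single role "user" message.
fn split_user_message(blocks: &[Block]) -> Vec<OpenAiMessage> {
    let mut out = Vec::new();
    let mut text_parts: Vec<&str> = Vec::new();

    for block in blocks {
        match block {
            Block::ToolResult { tool_use_id, content } => out.push(OpenAiMessage::Tool {
                tool_call_id: tool_use_id.clone(),
                content: content.clone(),
            }),
            Block::Text(text) => text_parts.push(text.as_str()),
        }
    }

    if !text_parts.is_empty() {
        out.push(OpenAiMessage::User { content: text_parts.join("\n") });
    }
    out
}
```

A message containing only tool results produces no `User` entry at all, which avoids sending an empty user turn.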

The most complex part of the conversion is tool_use_id sanitization: Anthropic-style tool IDs (e.g., toolu_01Bx...) may contain characters that OpenAI does not accept.
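The article does not show the sanitization rule, so this sketch assumes one: keep ASCII letters, digits, `_` and `-`, replace everything else with `_`, and cap the length. The function name, the allowed character set, the length cap, and the `call_0` fallback are all assumptions.

```rust
// Maps an arbitrary tool ID onto an assumed safe alphabet.
// Disallowed characters become '_' and the result is truncated to `max_len`.
fn sanitize_tool_id(id: &str, max_len: usize) -> String {
    let cleaned: String = id
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' || c == '-' { c } else { '_' })
        .take(max_len)
        .collect();

    // An empty ID would be rejected anyway, so substitute a placeholder.
    if cleaned.is_empty() {
        "call_0".to_string()
    } else {
        cleaned
    }
}
```

Whatever rule is used, it has to be applied identically to the ID in the assistant's `tool_calls` entry and to the `tool_call_id` of the matching tool message, otherwise the pair no longer lines up. Note that replacing characters can make two different IDs collide, which a real implementation would need to guard against.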

Google Gemini Provider: Full Adaptation of a Third Format

GoogleProvider shows how to handle an API format that is different from both Anthropic and OpenAI:

// boxagnts-api/src/providers/google.rs
// URL pattern completely different from OpenAI's /v1/chat/completions
fn generate_url(&self, model: &str) -> String {
    format!(
        "{}/v1beta/models/{}:generateContent?key={}",
        self.base_url, model, self.api_key  // API Key in URL query parameters!
    )
}

Key differences from OpenAI:

Difference	Google Gemini	OpenAI
API Key Location	URL query parameter `?key=` (the API also accepts an `x-goog-api-key` header)	HTTP Header `Authorization: Bearer`
Endpoint Format	`/v1beta/models/{model}:generateContent`	`/v1/chat/completions`
Streaming Endpoint	`/v1beta/models/{model}:streamGenerateContent?alt=sse`	`/v1/chat/completions` + `stream:true`
Message Roles	`user` / `model` (not assistant)	`user` / `assistant`
Tool Results	`functionResponse` in parts	Separate `role: tool` message
Image Input	`inlineData` base64	`image_url` content part (URL or base64 data URL)
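The endpoint and authentication rows of this table can be sketched as plain string builders. `gemini_url`, `openai_request`, and `redact_key` are hypothetical helpers; the paths follow the table above, and the host names in the usage note are assumptions about typical base URLs.

```rust
// Gemini: the model and the action are part of the path, the key is a query parameter.
fn gemini_url(base_url: &str, model: &str, api_key: &str, stream: bool) -> String {
    let base = base_url.trim_end_matches('/');
    if stream {
        format!("{base}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={api_key}")
    } else {
        format!("{base}/v1beta/models/{model}:generateContent?key={api_key}")
    }
}

// OpenAI: one fixed path; the key travels in a header instead.
// Returns (url, header name, header value).
fn openai_request(base_url: &str, api_key: &str) -> (String, String, String) {
    let base = base_url.trim_end_matches('/');
    (
        format!("{base}/v1/chat/completions"),
        "Authorization".to_string(),
        format!("Bearer {api_key}"),
    )
}

// Because the Gemini key sits in the URL, hide it before logging the URL.
fn redact_key(url: &str, api_key: &str) -> String {
    if api_key.is_empty() {
        return url.to_string();
    }
    url.replace(api_key, "***")
}
```

For example, `gemini_url("https://generativelanguage.googleapis.com", "gemini-2.5-flash", key, true)` yields the streaming endpoint, and passing the result through `redact_key` keeps the secret out of request logs.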

Thinking Configuration: Model Differences in Deep Reasoning

ThinkingConfig is the normalized deep thinking configuration — but different providers handle it completely differently:

// Normalized configuration
pub struct ThinkingConfig {
    pub budget_tokens: u32,   // Thinking token budget
}

// When building ProviderRequest, decides whether to pass based on provider capabilities
let provider_request = ProviderRequest {
    // ...
    thinking: if caps.thinking {
        effective_thinking_budget
            .map(|b| ThinkingConfig::enabled(b))
    } else {
        None  // This provider doesn't support thinking, don't pass
    },
};

Provider	Thinking Support	How It's Passed
Anthropic (Claude 3.7+)	✓	`"thinking": {"type": "enabled", "budget_tokens": N}`
Google (Gemini 2.5+)	✓	`"thinkingConfig": {"thinkingBudget": N}`
OpenAI (o1/o3 series)	Partial	Via `reasoning_effort` parameter
Other OpenAI Compatible	Mostly unsupported	Not passed
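As a rough illustration of the table above, the sketch below turns one normalized budget into the fragment each provider family expects. `Family` and `thinking_fragment` are hypothetical, and the token thresholds used to pick a `reasoning_effort` level are arbitrary assumptions, not values from BoxAgnts or OpenAI.

```rust
#[derive(Clone, Copy)]
struct ThinkingConfig {
    budget_tokens: u32,
}

#[derive(Clone, Copy)]
enum Family {
    Anthropic,
    Google,
    OpenAiReasoning,
    OpenAiCompatible,
}

// Returns the JSON fragment to merge into the request body,
// or None when nothing should be sent.
fn thinking_fragment(family: Family, config: Option<ThinkingConfig>) -> Option<String> {
    let config = config?;
    match family {
        Family::Anthropic => Some(format!(
            r#""thinking": {{"type": "enabled", "budget_tokens": {}}}"#,
            config.budget_tokens
        )),
        Family::Google => Some(format!(
            r#""thinkingConfig": {{"thinkingBudget": {}}}"#,
            config.budget_tokens
        )),
        Family::OpenAiReasoning => {
            // Arbitrary thresholds, chosen only for illustration.
            let effort = if config.budget_tokens < 2_000 {
                "low"
            } else if config.budget_tokens < 10_000 {
                "medium"
            } else {
                "high"
            };
            Some(format!(r#""reasoning_effort": "{}""#, effort))
        }
        // Mostly unsupported: send nothing rather than risk a 400.
        Family::OpenAiCompatible => None,
    }
}
```

The point of the sketch is the shape of the mapping: a token budget translates directly for two providers, only approximately for a third, and not at all for the rest.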

Whether a provider supports thinking comes from ProviderCapabilities, which each provider declares and which is consulted at request construction time:

pub struct ProviderCapabilities {
    pub thinking: bool,              // Whether deep thinking is supported
    pub prompt_caching: bool,        // Whether prompt caching is supported
    pub image_input: bool,           // Whether image input is supported
    pub native_tool_use: bool,       // Whether native tool calling exists
    pub supports_streaming: bool,    // Whether streaming responses are supported
    // ...
}

ProviderQuirks: Each Provider's "Little Quirks"

OpenAI-compatible providers' APIs are roughly compatible, but all have subtle differences. ProviderQuirks handles these:

pub struct ProviderQuirks {
    /// Specific error message patterns for context overflow
    pub overflow_patterns: Vec<String>,
    /// Local services that don't require API Keys (e.g., Ollama, LM Studio)
    pub no_api_key_required: bool,
    /// Whether streaming responses include usage info
    pub include_usage_in_stream: bool,
    /// Providers like DeepSeek need the reasoning_content field
    pub reasoning_field: Option<String>,
}

For example, DeepSeek's streaming response returns reasoning content with a field name different from OpenAI's — adapted via reasoning_field. Ollama's context overflow error message is "exceeds the available context size", while LM Studio's is "greater than the context length" — adapted via overflow_patterns.
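Matching overflow errors against per-provider patterns might look like the sketch below. The struct is a trimmed copy of the one above, `is_context_overflow` is a hypothetical helper, and case-insensitive substring matching is an assumption about how the patterns are compared.

```rust
// Trimmed copy of the quirks struct, keeping only the field used here.
struct ProviderQuirks {
    overflow_patterns: Vec<String>,
}

// True when the provider's error text contains any known overflow pattern.
fn is_context_overflow(quirks: &ProviderQuirks, error_message: &str) -> bool {
    let haystack = error_message.to_lowercase();
    quirks
        .overflow_patterns
        .iter()
        .any(|pattern| haystack.contains(pattern.to_lowercase().as_str()))
}
```

A match would then be reported as `ProviderError::ContextOverflow` instead of a generic invalid-request error, so the caller can react to it specifically.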

Unified Streaming Processing

Streaming responses are also completely different across the three APIs:

Feature	Anthropic (SSE)	OpenAI (SSE)	Google (SSE)
Event Granularity	High: 6 event types (start/delta/stop × 2)	Low: each chunk is a complete delta	Medium: pushed by chunk, but structure is flat
Tool call Increment	Fragmented send of `input_json_delta`	Fragmented `arguments` string deltas in `tool_calls`	Single send of complete `functionCall`
Termination Signal	`message_stop` event	`data: [DONE]` marker	Stream ends naturally
Need to Reassemble by index	Yes (reassemble by index for multiple tool_use)	Yes (by `tool_calls[].index`)	No fragments to reassemble (each `functionCall` arrives complete)

All three formats are normalized to the same StreamEvent enum:

pub enum StreamEvent {
    MessageStart { id, model, usage },
    ContentBlockStart { index, content_block },
    TextDelta { text },
    ThinkingDelta { thinking },
    InputJsonDelta { index, partial_json },
    ContentBlockStop { index },
    MessageDelta { stop_reason, usage },
    MessageStop,
}
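Once everything is normalized, reassembling tool-call arguments by index becomes a small state machine. The sketch below uses a reduced, hypothetical version of the enum (three variants, with `tool_name` standing in for the content block), and `ToolCallAssembler` is an illustrative name.

```rust
use std::collections::BTreeMap;

// Reduced stand-in for the StreamEvent enum above (assumption).
enum StreamEvent {
    ContentBlockStart { index: usize, tool_name: String },
    InputJsonDelta { index: usize, partial_json: String },
    ContentBlockStop { index: usize },
}

// Buffers JSON fragments per block index until the block closes.
#[derive(Default)]
struct ToolCallAssembler {
    pending: BTreeMap<usize, (String, String)>, // index -> (tool name, JSON so far)
}

impl ToolCallAssembler {
    // Feeds one event; returns (tool name, complete JSON) when a block finishes.
    fn push(&mut self, event: StreamEvent) -> Option<(String, String)> {
        match event {
            StreamEvent::ContentBlockStart { index, tool_name } => {
                self.pending.insert(index, (tool_name, String::new()));
                None
            }
            StreamEvent::InputJsonDelta { index, partial_json } => {
                // Fragments for an unknown index are ignored in this sketch.
                if let Some((_, buffer)) = self.pending.get_mut(&index) {
                    buffer.push_str(&partial_json);
                }
                None
            }
            StreamEvent::ContentBlockStop { index } => self.pending.remove(&index),
        }
    }
}
```

Keying the buffers by index is what lets fragments of two parallel tool calls interleave on the wire without corrupting each other.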

Error Handling: From Provider Differences to Unified Semantics

Each provider also reports errors in its own format, so they are mapped to a single unified error type:

// Unified error types
pub enum ProviderError {
    Auth { ... },             // Authentication failure
    RateLimited { ... },      // Rate limiting
    ContextOverflow { ... },  // Context exceeds window (matched via ProviderQuirks)
    InvalidRequest { ... },   // Invalid request parameters
    ServerError { ... },      // Server error
    StreamError { ... },      // Stream interruption
    Other { ... },            // Unknown error
}

In the query loop, specific errors trigger specific recovery strategies:

RateLimited / Overloaded → Switch to fallback_model
ContextOverflow → Trigger auto_compact
StreamError (stall) → Retry (max 2 times, 45s timeout)
Auth → Unrecoverable, return error
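The recovery table above maps naturally onto a single `match`. This is a sketch under assumptions: the error enum is reduced to field-less variants, `Recovery` and `plan_recovery` are hypothetical names, and treating every other error as unrecoverable is a guess, since the article does not say.

```rust
// Reduced stand-in for ProviderError (payloads dropped for brevity).
#[derive(Debug)]
enum ProviderError {
    Auth,
    RateLimited,
    ContextOverflow,
    StreamError,
    Other(String),
}

#[derive(Debug, PartialEq)]
enum Recovery {
    SwitchToFallbackModel,
    AutoCompact,
    Retry { attempt: u32 },
    Fail,
}

// The article states a maximum of 2 retries for stalled streams.
const MAX_STREAM_RETRIES: u32 = 2;

// Chooses the next step for the query loop after a failed request.
fn plan_recovery(error: &ProviderError, stream_retries_so_far: u32) -> Recovery {
    match error {
        ProviderError::RateLimited => Recovery::SwitchToFallbackModel,
        ProviderError::ContextOverflow => Recovery::AutoCompact,
        ProviderError::StreamError if stream_retries_so_far < MAX_STREAM_RETRIES => {
            Recovery::Retry { attempt: stream_retries_so_far + 1 }
        }
        // Retries exhausted, bad credentials, or anything unrecognized.
        ProviderError::StreamError | ProviderError::Auth | ProviderError::Other(_) => {
            Recovery::Fail
        }
    }
}
```

Because the decision depends only on the unified error type, the same policy applies regardless of which provider produced the failure.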

Tiered API Key Management

BoxAgnts defines environment variable name mappings for each provider:

// boxagnts-workspace/src/config.rs
pub fn api_key_env_vars_for_provider(provider_id: &str) -> &'static [&'static str] {
    match provider_id {
        "anthropic" => &["ANTHROPIC_API_KEY"],
        "openai" => &["OPENAI_API_KEY"],
        "google" => &["GOOGLE_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"],
        "deepseek" => &["DEEPSEEK_API_KEY"],
        "mistral" => &["MISTRAL_API_KEY"],
        "xai" => &["XAI_API_KEY"],
        "zhipu" => &["ZHIPU_API_KEY"],
        // ... 40+ provider environment variables
        _ => &[], // fallback arm assumed; the excerpt elides the remaining arms
    }
}

Keys are resolved in priority order: environment variables first, then the user config JSON. There is no built-in default key. This design supports scenarios such as multi-tenancy, CI/CD, and local development.
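The lookup order described above can be sketched as a small function. `resolve_api_key` is a hypothetical helper, the user config is modeled as a plain map from provider ID to key, and the environment is injected as a closure so the logic can be exercised without touching real environment variables.

```rust
use std::collections::HashMap;

// Resolves a key: first non-empty environment variable wins,
// then the user config entry, otherwise None (no built-in default).
fn resolve_api_key<F>(
    env_var_names: &[&str],
    env_lookup: F,
    user_config: &HashMap<String, String>,
    provider_id: &str,
) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    env_var_names
        .iter()
        .find_map(|&name| env_lookup(name).filter(|value| !value.is_empty()))
        .or_else(|| user_config.get(provider_id).cloned())
}
```

In real use the closure would typically be `|name| std::env::var(name).ok()`, and `env_var_names` would come from `api_key_env_vars_for_provider`, so a provider with several accepted variable names is checked in the order they are listed.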

Summary

BoxAgnts' model abstraction layer addresses one core problem: a single codebase that adapts to every provider API.

┌───────────────────────────────────────────────────────────┐
│  boxagnts-query (Agent reasoning loop)                    │
│  Only uses ProviderRequest / ProviderResponse             │
└─────────────────────────────┬─────────────────────────────┘
                              │
┌─────────────────────────────▼─────────────────────────────┐
│  LlmProvider trait                                        │
│  + ProviderRegistry (40+ providers)                       │
├─────────────┬─────────────┬──────────────┬────────────────┤
│ Anthropic   │ OpenAI      │ Google       │ OpenAiCompat   │
│ Provider    │ Provider    │ Provider     │ (30+ vendors)  │
│ (Near-zero  │ (Full       │ (Independent │ (Shares OpenAI │
│ conversion) │ format      │ format       │ conversion     │
│             │ conversion) │ conversion)  │ + Quirks)      │
└─────────────┴─────────────┴──────────────┴────────────────┘

Three key capabilities:

User freedom: Switch models by just changing the --model parameter
Code unaffected: run_query_loop() has no idea what's underneath
Extremely low extension cost: Adding a new OpenAI-compatible provider takes about 3 lines of code

This is not a simple "adapter pattern" — it's a production-grade abstraction validated against 40+ real-world APIs.

Related Resources

Boxagnts: https://github.com/guyoung/boxagnts
Anthropic API: https://docs.anthropic.com/en/api/messages
OpenAI API: https://platform.openai.com/docs/api-reference/chat
Google Gemini API: https://ai.google.dev/gemini-api/docs
