DEV Community

Guyoung Studio

BoxAgnts Introduction (7) — OpenAI API and Anthropic API

The AI model market of 2025 is crowded, and each provider has its own API format, authentication method, and streaming protocol. BoxAgnts' design goal is that users switch models by changing a single parameter while all internal logic stays unchanged.

This article dissects this abstraction across four levels:

  1. Unified Interface: How the LlmProvider trait defines a "model provider"
  2. Three Major API Format Comparisons: Format differences between Anthropic, OpenAI, and Google Gemini
  3. Format Conversion: How to translate between three completely different message formats
  4. Engineering Practices: Thinking configuration, error handling, ProviderQuirks, API key management

Unified Interface: The LlmProvider Trait

Everything starts with the interface definition:

// boxagnts-api/src/provider.rs
#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn id(&self) -> &ProviderId;                              // Unique identifier
    fn name(&self) -> &str;                                   // Human-readable name

    async fn create_message(                                  // Non-streaming request
        &self,
        request: ProviderRequest,
    ) -> Result<ProviderResponse, ProviderError>;

    async fn create_message_stream(                           // Streaming request
        &self,
        request: ProviderRequest,
    ) -> Result<
        Pin<Box<dyn Stream<Item = Result<StreamEvent, ProviderError>> + Send>>,
        ProviderError,
    >;

    async fn list_models(&self) -> Result<Vec<ModelInfo>, ProviderError>;  // Model list
    async fn check_connectivity(&self) -> Result<ProviderStatus, ProviderError>; // Health check
    fn capabilities(&self) -> ProviderCapabilities;           // Capability declaration
}

Both input and output use provider-agnostic unified types:

pub struct ProviderRequest {
    pub model: String,
    pub messages: Vec<Message>,          // Unified conversation format
    pub system_prompt: Option<SystemPrompt>,
    pub tools: Vec<ToolDefinition>,      // Unified tool definitions
    pub max_tokens: u32,
    pub temperature: Option<f64>,
    pub thinking: Option<ThinkingConfig>, // Deep thinking configuration
    pub provider_options: Value,          // Provider-specific parameters
}

pub struct ProviderResponse {
    pub id: String,
    pub content: Vec<ContentBlock>,      // Unified content blocks
    pub stop_reason: StopReason,         // Unified stop reason
    pub usage: UsageInfo,                // Token usage
    pub model: String,
}

The core value of the normalization layer: whether the underlying model is Claude, GPT, or Gemini, upper-layer code only sees ProviderRequest and ProviderResponse.


ProviderRegistry: Unified Entry for 40+ Providers

// boxagnts-api/src/registry.rs
pub struct ProviderRegistry {
    providers: HashMap<ProviderId, Arc<dyn LlmProvider>>,
    default_provider_id: ProviderId,
}

fn provider_from_key(provider_id: &str, key: String) -> Option<Arc<dyn LlmProvider>> {
    match provider_id {
        // Native implementations — each with its own API format
        "anthropic" => Some(Arc::new(AnthropicProvider::from_config(...))),
        "openai"    => Some(Arc::new(OpenAiProvider::new(key))),
        "google"    => Some(Arc::new(GoogleProvider::new(key))),
        "github-copilot" => Some(Arc::new(CopilotProvider::new(key))),
        "cohere"    => Some(Arc::new(CohereProvider::new(key))),

        // OpenAI-compatible providers: share the same conversion logic, only base_url changes
        "deepseek" | "groq" | "ollama" | "mistral" | "xai"
        | "perplexity" | "openrouter" | "siliconflow" | "moonshot"
        | "zhipu" | "stepfun" | "fireworks" | "llamacpp"
        | "sambanova" | "huggingface" | "nvidia" | "cerebras"
        // ... 30+ OpenAI-compatible providers in total
        => todo!("shared OpenAI-compatible constructor, elided in the original"),
        _ => None,
    }
}

Three implementation strategies:

| Type | Representative | Conversion Strategy | Count |
| --- | --- | --- | --- |
| Native Anthropic | claude-sonnet-4-5 | Near-zero conversion (internal format = Anthropic format) | 1 |
| Native OpenAI | gpt-4o, o3 | ProviderRequest → Chat Completions | 1 |
| Native Google | gemini-2.5-flash | ProviderRequest → generateContent | 1 |
| OpenAI Compatible | deepseek, groq, ollama, etc. | Same logic as OpenAI, only URL changes | 30+ |
| Other Native | github-copilot, cohere | Independent format conversion | 3+ |
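
The "same logic, only URL changes" strategy can be sketched with a much smaller registry. This is a minimal illustration, not the BoxAgnts code: the synchronous `Provider` trait, `OpenAiCompat`, `Registry`, and `register_compat` are hypothetical names, and the `/chat/completions` suffix assumes the base URL already ends at the version segment.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified, synchronous stand-in for the LlmProvider trait (hypothetical).
pub trait Provider: Send + Sync {
    fn name(&self) -> &str;
    fn endpoint(&self) -> String;
}

/// One struct serves every OpenAI-compatible vendor; only base_url differs.
pub struct OpenAiCompat {
    name: String,
    base_url: String,
}

impl Provider for OpenAiCompat {
    fn name(&self) -> &str {
        &self.name
    }

    fn endpoint(&self) -> String {
        format!("{}/chat/completions", self.base_url)
    }
}

pub struct Registry {
    providers: HashMap<String, Arc<dyn Provider>>,
}

impl Registry {
    pub fn new() -> Self {
        Self { providers: HashMap::new() }
    }

    /// Registering a compatible vendor is a single table entry.
    pub fn register_compat(&mut self, id: &str, base_url: &str) {
        let provider = OpenAiCompat {
            name: id.to_string(),
            base_url: base_url.trim_end_matches('/').to_string(),
        };
        self.providers.insert(id.to_string(), Arc::new(provider));
    }

    /// Unknown ids yield None, mirroring the `_ => None` arm above.
    pub fn get(&self, id: &str) -> Option<Arc<dyn Provider>> {
        self.providers.get(id).cloned()
    }
}
```

Under these assumptions, supporting another compatible vendor is one `register_compat` call with its id and base URL; callers only ever hold an `Arc<dyn Provider>`.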

Differences Between Three Major API Formats

The Anthropic, OpenAI, and Google Gemini APIs differ widely in message format. Understanding these differences is essential to understanding the value of the conversion layer.

3.1 System Prompt

| Feature | Anthropic | OpenAI | Google Gemini |
| --- | --- | --- | --- |
| Location | Top-level "system" field | messages[0], role:"system" | Top-level "systemInstruction" field |
| Type | string or ContentBlock array | string only | content parts array only |

// Anthropic — top-level standalone field
{"model": "claude-sonnet-4-5", "system": "You are helpful.", "messages": [...]}

// OpenAI — embedded in messages array
{"model": "gpt-4o", "messages": [{"role":"system","content":"You are helpful."}, ...]}

// Google — uses systemInstruction field, structure differs from messages
{
  "systemInstruction": {"parts": [{"text": "You are helpful."}]},
  "contents": [{"role": "user", "parts": [{"text": "Hello"}]}]
}

3.2 Tool Definitions

| Feature | Anthropic | OpenAI | Google |
| --- | --- | --- | --- |
| Field | "tools": [{name, description, input_schema}] | "tools": [{type:"function", function:{...}}] | "tools": [{functionDeclarations: [{name, description, parameters}]}] |
| Wrapping layers | 0 | 1 | 1, with different nesting names |

3.3 Tool Call Responses

// Anthropic — native block in content array
{"content": [{"type":"tool_use", "id":"toolu_01A", "name":"read", "input": {...}}]}

// OpenAI — standalone tool_calls array, arguments is JSON string
{"tool_calls": [{"id":"call_abc", "function": {"name":"read", "arguments": "{\"path\":\"...\"}"}}]}

// Google — functionCall embedded in parts, args is JSON object
{"candidates": [{"content": {"parts": [{"functionCall": {"name":"read", "args": {...}}}]}}]}

3.4 Tool Result Format

// Anthropic — tool_result is a block in the user message content array
{"role":"user", "content": [{"type":"tool_result", "tool_use_id":"toolu_01A", "content":"..."}]}

// OpenAI — requires a separate role: "tool" message
{"role":"tool", "tool_call_id":"call_abc", "content":"..."}

// Google — functionResponse embedded in user content parts
{"role":"user", "parts": [{"functionResponse": {"name":"read", "response": {...}}}]}
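
Because Anthropic-style user messages can mix text with tool_result blocks while OpenAI expects each result as its own role: "tool" message, a converter has to split them. The sketch below shows one way to do that with simplified, hypothetical types (`Block`, `OpenAiMsg`, `split_user_message`); emitting tool messages before the remaining user text is an assumption, chosen so the results directly follow the assistant message that requested them.

```rust
/// Simplified unified content block (hypothetical; the real type has more variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Text(String),
    ToolResult { tool_use_id: String, content: String },
}

/// Simplified OpenAI-side message (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum OpenAiMsg {
    User { content: String },
    Tool { tool_call_id: String, content: String },
}

/// Split one unified user message into OpenAI-style messages.
/// Each tool_result becomes its own `tool` message; leftover text is
/// joined into a single trailing `user` message.
pub fn split_user_message(blocks: &[Block]) -> Vec<OpenAiMsg> {
    let mut out = Vec::new();
    let mut text_parts: Vec<&str> = Vec::new();

    for block in blocks {
        match block {
            Block::ToolResult { tool_use_id, content } => out.push(OpenAiMsg::Tool {
                tool_call_id: tool_use_id.clone(),
                content: content.clone(),
            }),
            Block::Text(text) => text_parts.push(text.as_str()),
        }
    }

    if !text_parts.is_empty() {
        out.push(OpenAiMsg::User { content: text_parts.join("\n") });
    }
    out
}
```

A user message holding two text blocks and one tool_result therefore becomes two OpenAI messages: one `tool` message followed by one `user` message.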

3.5 Role Naming

| Anthropic | OpenAI | Google |
| --- | --- | --- |
| user | user | user |
| assistant | assistant | model |

Google uses model instead of assistant. This difference is easy to overlook and a frequent source of errors.
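
Keeping the role mapping in one function makes the Gemini exception hard to miss. A minimal sketch, assuming a two-variant `Role` enum as in the conversion code later in this article; `Wire` and `wire_role` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

/// Which wire format a request is being serialized for (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Wire {
    Anthropic,
    OpenAi,
    Gemini,
}

/// Map a unified role to the string each API expects.
pub fn wire_role(wire: Wire, role: Role) -> &'static str {
    match (wire, role) {
        (_, Role::User) => "user",
        // Gemini is the only one of the three that says "model".
        (Wire::Gemini, Role::Assistant) => "model",
        (_, Role::Assistant) => "assistant",
    }
}
```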


Conversion Layer Implementation: OpenAI Provider as Example

OpenAiProvider is the most complete example of the conversion layer:

// boxagnts-api/src/providers/openai.rs
impl OpenAiProvider {
    fn to_openai_messages(
        messages: &[Message],
        system_prompt: Option<&SystemPrompt>,
    ) -> Vec<Value> {
        let mut result: Vec<Value> = Vec::new();

        // Step 1: system prompt → role: "system" message
        if let Some(sys) = system_prompt {
            // sys_text: the plain-text form of `sys` (its derivation is elided in the original)
            result.push(json!({"role": "system", "content": sys_text}));
        }

        for msg in messages {
            match msg.role {
                Role::User => {
                    // User messages may mix text and tool_result blocks
                    // tool_result needs to be split into separate role: "tool" messages
                    Self::append_user_messages(&mut result, &msg.content);
                }
                Role::Assistant => {
                    let (text, tool_calls) = Self::assistant_content_to_openai(&msg.content);
                    result.push(json!({
                        "role": "assistant",
                        "content": text,
                        "tool_calls": tool_calls
                    }));
                }
            }
        }
        result
    }

    fn to_openai_tools(tools: &[ToolDefinition]) -> Vec<Value> {
        tools.iter().map(|td| {
            json!({
                "type": "function",
                "function": {
                    "name": td.name,
                    "description": td.description,
                    "parameters": td.input_schema
                }
            })
        }).collect()
    }
}

The most complex part is tool_use_id sanitization: Anthropic's tool IDs (e.g., toolu_01Bx...) may contain characters that OpenAI does not accept.
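
The article does not show the sanitization rule, so the following is only a sketch of the general idea. The allowed character set (ASCII letters, digits, `_`, `-`) and the length cap are assumptions, and `sanitize_tool_id` is a hypothetical name.

```rust
/// Replace every character outside an assumed safe set with '_' and cap the length.
/// The mapping is deterministic, so the same original id yields the same
/// sanitized id in both the tool call and its matching tool result.
pub fn sanitize_tool_id(id: &str, max_len: usize) -> String {
    id.chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '_' || c == '-' {
                c
            } else {
                '_'
            }
        })
        .take(max_len)
        .collect()
}
```

Determinism matters more than the exact character set: the call and its result must still pair up after conversion. Two different ids could collide after replacement or truncation, which a production implementation would need to handle.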


Google Gemini Provider: Full Adaptation of a Third Format

GoogleProvider shows how to handle an API format that is different from both Anthropic and OpenAI:

// boxagnts-api/src/providers/google.rs
// URL pattern completely different from OpenAI's /v1/chat/completions
fn generate_url(&self, model: &str) -> String {
    format!(
        "{}/v1beta/models/{}:generateContent?key={}",
        self.base_url, model, self.api_key  // API Key in URL query parameters!
    )
}

Key differences from OpenAI:

| Difference | Google Gemini | OpenAI |
| --- | --- | --- |
| API key location | URL query parameter ?key= | HTTP header Authorization: Bearer |
| Endpoint format | /v1beta/models/{model}:generateContent | /v1/chat/completions |
| Streaming endpoint | /v1beta/models/{model}:streamGenerateContent?alt=sse | /v1/chat/completions + stream:true |
| Message roles | user / model (not assistant) | user / assistant |
| Tool results | functionResponse in parts | Separate role: tool message |
| Image input | inlineData base64 | image_url or content parts |
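
The two Gemini endpoints in the table can be built by one helper. This sketch follows the URL patterns shown above; `gemini_url` and `redact_key` are hypothetical names, the key is assumed to need no URL encoding, and the redaction helper is an addition motivated by the key living in the query string, where it can leak into logs.

```rust
/// Build the generateContent or streamGenerateContent URL for a model.
pub fn gemini_url(base_url: &str, model: &str, api_key: &str, stream: bool) -> String {
    let base = base_url.trim_end_matches('/');
    if stream {
        format!("{base}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={api_key}")
    } else {
        format!("{base}/v1beta/models/{model}:generateContent?key={api_key}")
    }
}

/// Mask the value of the `key=` query parameter before a URL is logged.
pub fn redact_key(url: &str) -> String {
    match url.find("key=") {
        Some(pos) => {
            let start = pos + "key=".len();
            // The value runs until the next '&' or the end of the URL.
            let end = url[start..].find('&').map_or(url.len(), |i| start + i);
            format!("{}REDACTED{}", &url[..start], &url[end..])
        }
        None => url.to_string(),
    }
}
```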

Thinking Configuration: Model Differences in Deep Reasoning

ThinkingConfig is the normalized deep-thinking configuration, but each provider handles it differently:

// Normalized configuration
pub struct ThinkingConfig {
    pub budget_tokens: u32,   // Thinking token budget
}

// When building ProviderRequest, decides whether to pass based on provider capabilities
let provider_request = ProviderRequest {
    // ...
    thinking: if caps.thinking {
        effective_thinking_budget
            .map(|b| ThinkingConfig::enabled(b))
    } else {
        None  // This provider doesn't support thinking, don't pass
    },
};

| Provider | Thinking Support | How It's Passed |
| --- | --- | --- |
| Anthropic (Claude 3.7+) | Yes | "thinking": {"type": "enabled", "budget_tokens": N} |
| Google (Gemini 2.5+) | Yes | "thinkingConfig": {"thinkingBudget": N} |
| OpenAI (o1/o3 series) | Partial | Via reasoning_effort parameter |
| Other OpenAI Compatible | Mostly unsupported | Not passed |
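
One way to picture the per-provider translation is a function that turns a token budget into the request fragment each API expects. This is a sketch only: `ThinkingStyle` and `thinking_fragment` are hypothetical names, the fragments are built as strings to keep the example dependency-free, and the budget thresholds used to pick a reasoning_effort level are invented for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ThinkingStyle {
    Anthropic,
    Gemini,
    OpenAiReasoning,
    Unsupported,
}

/// Render the provider-specific JSON fragment for a thinking budget.
/// Returns None when the provider should not receive the setting at all.
pub fn thinking_fragment(style: ThinkingStyle, budget_tokens: u32) -> Option<String> {
    match style {
        ThinkingStyle::Anthropic => Some(format!(
            r#""thinking": {{"type": "enabled", "budget_tokens": {budget_tokens}}}"#
        )),
        ThinkingStyle::Gemini => Some(format!(
            r#""thinkingConfig": {{"thinkingBudget": {budget_tokens}}}"#
        )),
        ThinkingStyle::OpenAiReasoning => {
            // Assumed thresholds: a budget has no direct equivalent here,
            // so it is bucketed into a coarse effort level.
            let effort = match budget_tokens {
                0..=4_095 => "low",
                4_096..=16_383 => "medium",
                _ => "high",
            };
            Some(format!(r#""reasoning_effort": "{effort}""#))
        }
        ThinkingStyle::Unsupported => None,
    }
}
```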

At request construction time, ProviderCapabilities declares each provider's capabilities:

pub struct ProviderCapabilities {
    pub thinking: bool,              // Whether deep thinking is supported
    pub prompt_caching: bool,        // Whether prompt caching is supported
    pub image_input: bool,           // Whether image input is supported
    pub native_tool_use: bool,       // Whether native tool calling exists
    pub supports_streaming: bool,    // Whether streaming responses are supported
    // ...
}

ProviderQuirks: Each Provider's "Little Quirks"

OpenAI-compatible providers' APIs are roughly compatible, but all have subtle differences. ProviderQuirks handles these:

pub struct ProviderQuirks {
    /// Specific error message patterns for context overflow
    pub overflow_patterns: Vec<String>,
    /// Local services that don't require API Keys (e.g., Ollama, LM Studio)
    pub no_api_key_required: bool,
    /// Whether streaming responses include usage info
    pub include_usage_in_stream: bool,
    /// Providers like DeepSeek need the reasoning_content field
    pub reasoning_field: Option<String>,
}

For example, DeepSeek's streaming response returns reasoning content under a field name that differs from OpenAI's, which is adapted via reasoning_field. Ollama's context overflow error message is "exceeds the available context size", while LM Studio's is "greater than the context length"; both are matched via overflow_patterns.
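
Pattern-based overflow detection can be as small as a substring scan. The sketch below uses a trimmed-down `Quirks` struct with only the overflow_patterns field; `is_context_overflow` is a hypothetical name, and matching case-insensitively is an assumption.

```rust
/// Trimmed-down stand-in for ProviderQuirks (only the field used here).
pub struct Quirks {
    pub overflow_patterns: Vec<String>,
}

/// Return true when the provider's error text matches any known overflow pattern.
pub fn is_context_overflow(error_message: &str, quirks: &Quirks) -> bool {
    let haystack = error_message.to_lowercase();
    quirks
        .overflow_patterns
        .iter()
        .any(|pattern| haystack.contains(&pattern.to_lowercase()))
}
```

With the two patterns quoted above, an error from either Ollama or LM Studio would be classified as a context overflow, while an unrelated error such as an invalid key would not.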


Unified Streaming Processing

Streaming responses are also completely different across the three APIs:

| Feature | Anthropic (SSE) | OpenAI (SSE) | Google (SSE) |
| --- | --- | --- | --- |
| Event granularity | High: 6 event types (start/delta/stop × 2) | Low: each chunk is a complete delta | Medium: pushed by chunk, but structure is flat |
| Tool call increments | Fragments sent as input_json_delta | arguments string sent in fragments, keyed by tool call index | Complete functionCall sent at once |
| Termination signal | message_stop event | data: [DONE] marker | Stream ends naturally |
| Reassembly by index needed | Yes (reassemble by index for multiple tool_use) | Yes | Yes |

All three formats are normalized to the same StreamEvent enum:

pub enum StreamEvent {
    MessageStart { id, model, usage },
    ContentBlockStart { index, content_block },
    TextDelta { text },
    ThinkingDelta { thinking },
    InputJsonDelta { index, partial_json },
    ContentBlockStop { index },
    MessageDelta { stop_reason, usage },
    MessageStop,
}
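
Consumers of the normalized stream still have to stitch InputJsonDelta fragments back together per content block. A minimal sketch of that reassembly follows; `Event` is a reduced, hypothetical version of StreamEvent with assumed field types, and `ToolArgAssembler` is an illustrative name.

```rust
use std::collections::BTreeMap;

/// Reduced version of StreamEvent with assumed field types.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    TextDelta { text: String },
    InputJsonDelta { index: usize, partial_json: String },
    ContentBlockStop { index: usize },
}

/// Accumulates partial tool-argument JSON per content block index.
#[derive(Default)]
pub struct ToolArgAssembler {
    open: BTreeMap<usize, String>,
    done: Vec<(usize, String)>,
}

impl ToolArgAssembler {
    pub fn push(&mut self, event: &Event) {
        match event {
            Event::InputJsonDelta { index, partial_json } => {
                // Fragments for different tool calls may interleave, so key by index.
                self.open.entry(*index).or_default().push_str(partial_json);
            }
            Event::ContentBlockStop { index } => {
                // A block is complete once its stop event arrives.
                if let Some(json) = self.open.remove(index) {
                    self.done.push((*index, json));
                }
            }
            Event::TextDelta { .. } => {}
        }
    }

    /// Completed (index, arguments JSON) pairs, in completion order.
    pub fn finished(&self) -> &[(usize, String)] {
        &self.done
    }
}
```

The assembled string is only parsed as JSON after the block closes, since any prefix of it may be syntactically incomplete.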

Error Handling: From Provider Differences to Unified Semantics

Each provider's error format also differs, so errors are mapped onto one unified type:

// Unified error types
pub enum ProviderError {
    Auth { ... },             // Authentication failure
    RateLimited { ... },      // Rate limiting
    ContextOverflow { ... },  // Context exceeds window (matched via ProviderQuirks)
    InvalidRequest { ... },   // Invalid request parameters
    ServerError { ... },      // Server error
    StreamError { ... },      // Stream interruption
    Other { ... },            // Unknown error
}

In the query loop, specific errors trigger specific recovery strategies:

RateLimited / Overloaded → Switch to fallback_model
ContextOverflow → Trigger auto_compact
StreamError (stall) → Retry (max 2 times, 45s timeout)
Auth → Unrecoverable, return error
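
The mapping from error to recovery strategy reads naturally as a single match. The sketch below uses field-less variants for brevity; `Recovery` and `plan_recovery` are hypothetical names, and treating every error not listed above as unrecoverable is an assumption.

```rust
use std::time::Duration;

/// Field-less stand-in for the unified error type.
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    Auth,
    RateLimited,
    ContextOverflow,
    InvalidRequest,
    ServerError,
    StreamError,
    Other,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Recovery {
    SwitchToFallbackModel,
    AutoCompact,
    Retry { timeout: Duration },
    Fail,
}

pub const MAX_STREAM_RETRIES: u32 = 2;
pub const STALL_TIMEOUT: Duration = Duration::from_secs(45);

/// Decide what the query loop should do next for a given error.
pub fn plan_recovery(err: &ProviderError, retries_so_far: u32) -> Recovery {
    match err {
        ProviderError::RateLimited => Recovery::SwitchToFallbackModel,
        ProviderError::ContextOverflow => Recovery::AutoCompact,
        ProviderError::StreamError if retries_so_far < MAX_STREAM_RETRIES => {
            Recovery::Retry { timeout: STALL_TIMEOUT }
        }
        // Auth is unrecoverable per the list above; the remaining cases
        // (including exhausted stream retries) fail here by assumption.
        _ => Recovery::Fail,
    }
}
```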

Tiered API Key Management

BoxAgnts defines environment variable name mappings for each provider:

// boxagnts-workspace/src/config.rs
pub fn api_key_env_vars_for_provider(provider_id: &str) -> &'static [&'static str] {
    match provider_id {
        "anthropic" => &["ANTHROPIC_API_KEY"],
        "openai" => &["OPENAI_API_KEY"],
        "google" => &["GOOGLE_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"],
        "deepseek" => &["DEEPSEEK_API_KEY"],
        "mistral" => &["MISTRAL_API_KEY"],
        "xai" => &["XAI_API_KEY"],
        "zhipu" => &["ZHIPU_API_KEY"],
        // ... 40+ provider environment variables
    }
}

Keys are resolved in three tiers of priority: environment variables first, then the user config JSON, and no built-in default after that. This design supports different scenarios such as multi-tenancy, CI/CD, and local development.
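
The priority order can be sketched as a short lookup chain. `resolve_api_key` is a hypothetical name; the environment is injected as a closure so the logic can be exercised without touching real variables, the config is modeled as a plain map from provider id to key, and skipping empty environment values is an assumption.

```rust
use std::collections::HashMap;

/// Resolve a provider's API key: environment variables first (in the order
/// given), then the user config; returns None when neither has a value.
pub fn resolve_api_key<F>(
    env_vars: &[&str],
    env_lookup: F,
    config_keys: &HashMap<String, String>,
    provider_id: &str,
) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    env_vars
        .iter()
        .copied()
        .filter_map(|name| env_lookup(name))
        // Assumption: an empty variable counts as unset.
        .find(|value| !value.trim().is_empty())
        .or_else(|| config_keys.get(provider_id).cloned())
}
```

In real use the lookup closure would be something like `|name| std::env::var(name).ok()`, and the variable list would come from api_key_env_vars_for_provider.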


Summary

BoxAgnts' model abstraction layer solves the essential problem of "one set of code adapting to all APIs":

┌───────────────────────────────────────────────────────────────┐
│ boxagnts-query (Agent reasoning loop)                         │
│ Only uses ProviderRequest / ProviderResponse                  │
└───────────────────────────────┬───────────────────────────────┘
                                │
┌───────────────────────────────▼───────────────────────────────┐
│ LlmProvider trait                                             │
│ + ProviderRegistry (40+ providers)                            │
├───────────────┬───────────────┬───────────────┬───────────────┤
│ Anthropic     │ OpenAI        │ Google        │ OpenAiCompat  │
│ Provider      │ Provider      │ Provider      │ (30+ vendors) │
│ (Near-zero    │ (Full format  │ (Independent  │ (Shares       │
│ conversion)   │ conversion)   │ format        │ OpenAI        │
│               │               │ conversion)   │ conversion    │
│               │               │               │ + Quirks)     │
└───────────────┴───────────────┴───────────────┴───────────────┘

Three key capabilities:

  1. User freedom: Switch models by just changing the --model parameter
  2. Code unaffected: run_query_loop() has no idea what's underneath
  3. Extremely low extension cost: Adding a new OpenAI-compatible provider takes about 3 lines of code

This is more than a simple "adapter pattern": it is a production-grade abstraction validated against 40+ real-world APIs.
