Most comparison articles give you benchmark scores. This one gives you the practical details benchmarks don't cover: response format differences, streaming implementations, and cost considerations I encountered while building a tool that lets users pick their own LLM provider.
Why one codebase, four APIs?
I build browser-based dev tools. One of my projects needed to support multiple LLM providers so users could choose whichever API they prefer. The user picks their provider, enters their own API key, and the tool handles the rest.
Sounds simple. It turned out to be more nuanced than expected.
Supporting OpenAI, Google Gemini, Anthropic Claude, and any OpenAI-compatible endpoint (like local Ollama or LM Studio) from a single codebase taught me a lot about the practical differences between these APIs.
Response format differences
Every provider returns responses in a slightly different structure.
OpenAI (Chat Completions API):
{
  "choices": [{"message": {"content": "..."}}]
}
OpenAI also offers a newer Responses API (/v1/responses) which returns a different format:
{
  "output": [{"type": "message", "content": [{"type": "output_text", "text": "..."}]}]
}
Claude:
{
  "content": [{"type": "text", "text": "..."}]
}
Gemini:
{
  "candidates": [{"content": {"parts": [{"text": "..."}]}}]
}
This seems trivial until you realize your entire downstream pipeline depends on extracting that text reliably. I ended up writing a normalizer function early on:
function extractText(provider, response) {
  switch (provider) {
    case 'openai':
      // Chat Completions API format
      return response.choices?.[0]?.message?.content ?? '';
    case 'openai-responses':
      // Responses API format
      return response.output
        ?.filter(b => b.type === 'message')
        .flatMap(b => b.content)
        .filter(c => c.type === 'output_text')
        .map(c => c.text)
        .join('\n') ?? '';
    case 'claude':
      return response.content
        ?.filter(b => b.type === 'text')
        .map(b => b.text)
        .join('\n') ?? '';
    case 'gemini':
      return response.candidates?.[0]?.content?.parts
        ?.map(p => p.text)
        .join('\n') ?? '';
    default:
      // OpenAI-compatible fallback
      return response.choices?.[0]?.message?.content ?? '';
  }
}
Write this normalizer first. Before you build anything else. Trust me.
Streaming implementations
All four providers support streaming, but each has its own implementation worth understanding.
OpenAI and compatible endpoints signal the end of a stream with a literal data: [DONE] line. Claude uses an event: message_stop event. Gemini has its own SSE format with no explicit end-of-stream sentinel; the stream simply closes.
The chunk structure is different too. OpenAI sends delta.content. Claude sends delta.text inside a content_block_delta event. Gemini sends partial text inside candidates[0].content.parts.
If you're building a UI that shows streaming text, you'll need a parser for each provider's format.
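The per-chunk extraction can be sketched much like the non-streaming normalizer. The field paths follow the chunk shapes described above; extractDelta is my own name for illustration, not any SDK's API, and a real implementation would also need to split and JSON-parse the raw SSE data: lines first:

```javascript
// Sketch: pull the incremental text out of one already-parsed SSE payload.
// Assumes `chunk` is the JSON object from a single `data:` line.
function extractDelta(provider, chunk) {
  switch (provider) {
    case 'openai':
      // Chat Completions streaming: choices[0].delta.content
      return chunk.choices?.[0]?.delta?.content ?? '';
    case 'claude':
      // Only content_block_delta events carry text; message_stop etc. do not
      return chunk.type === 'content_block_delta'
        ? chunk.delta?.text ?? ''
        : '';
    case 'gemini':
      // Gemini streams partial candidates with content.parts
      return chunk.candidates?.[0]?.content?.parts
        ?.map(p => p.text ?? '')
        .join('') ?? '';
    default:
      // OpenAI-compatible fallback
      return chunk.choices?.[0]?.delta?.content ?? '';
  }
}
```

Appending whatever this returns to your UI buffer keeps the rendering code identical across providers.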
System prompt handling
OpenAI Chat Completions API accepts a system role in the messages array:
{"role": "system", "content": "You are a helpful assistant."}
OpenAI's newer Responses API uses a top-level instructions parameter instead:
{
  "instructions": "You are a helpful assistant.",
  "input": [...]
}
Claude takes the system prompt as a separate top-level parameter:
{
  "system": "You are a helpful assistant.",
  "messages": [...]
}
Gemini uses system_instruction as a separate field:
{
  "system_instruction": {"parts": [{"text": "You are a helpful assistant."}]},
  "contents": [...]
}
If you're abstracting this behind a single interface, you need to intercept the system message and route it to the correct location before sending the request. This ensures the model properly receives your system prompt regardless of provider.
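That routing logic can be sketched as a small request builder. The shapes follow the request formats shown above; buildBody is illustrative and omits required fields like model name and max tokens:

```javascript
// Sketch: put the system prompt where each provider expects it.
// `messages` is assumed to already be in the target provider's message shape.
function buildBody(provider, systemPrompt, messages) {
  switch (provider) {
    case 'openai':
      // Chat Completions: prepend a system-role message
      return { messages: [{ role: 'system', content: systemPrompt }, ...messages] };
    case 'openai-responses':
      // Responses API: top-level `instructions` parameter
      return { instructions: systemPrompt, input: messages };
    case 'claude':
      // Separate top-level `system` parameter
      return { system: systemPrompt, messages };
    case 'gemini':
      // Separate `system_instruction` field alongside `contents`
      return {
        system_instruction: { parts: [{ text: systemPrompt }] },
        contents: messages,
      };
    default:
      // OpenAI-compatible fallback
      return { messages: [{ role: 'system', content: systemPrompt }, ...messages] };
  }
}
```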
Token counting and cost
Each provider has its own pricing structure, so it's worth understanding the differences.
OpenAI charges separately for input and output tokens. Claude does the same, with pricing tiers that vary by model. Gemini offers a rate-limited free tier and a paid tier.
An interesting observation: the same prompt can produce different output lengths depending on the provider. Each model has its own default verbosity level. This means your cost per request varies even when the input is identical.
If you're letting users bring their own API key, make this transparent. Show estimated token counts before sending the request if possible.
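If you can't ship each provider's real tokenizer to the browser, a common rough heuristic is about four characters per token for English text. This is my own ballpark for display purposes only, not something any provider documents for billing:

```javascript
// Very rough client-side token estimate: ~4 characters per token is a
// common rule of thumb for English. Real tokenizers differ per provider
// and per language, so show this as "estimated", never as a billed count.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}
```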
Error handling differences
Each API returns errors differently.
OpenAI returns error.message with HTTP status codes you'd expect (429 for rate limit, 401 for bad key).
Claude returns errors in an error.type and error.message structure. Rate limits come back as rate_limit_error.
Gemini sometimes returns 200 OK with an error inside the response body, so it's important to check the response content as well as the HTTP status code.
// Check both the HTTP status and the response body
if (!response.ok) {
  throw new Error(`HTTP ${response.status}`);
}
const data = await response.json();
// Gemini can return 200 OK with an error in the body
if (data.error) {
  throw new Error(data.error.message);
}
What I'd do differently
If I started over today, here's what I'd do from day one:
Normalize everything immediately. Keep provider-specific response formats contained in an adapter layer so your application logic stays clean.
Test with the cheapest model from each provider. Save your token budget by using GPT-4o-mini (or the newer GPT-5 series mini models), Claude Haiku, and Gemini Flash during development.
OpenAI-compatible is your best friend. If a provider supports the OpenAI format (and many do, including local tools like Ollama and LM Studio), treat them all as one integration. That covers 80% of providers with one code path.
Stream from the start. Adding streaming to a synchronous architecture later requires significant refactoring. Build for streaming on day one even if you don't need it yet.
Log raw responses during development. Having the raw API response saved makes debugging much faster when investigating unexpected behavior.
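To make the "OpenAI-compatible is your best friend" point concrete: in that shared code path, usually the only thing that varies is the base URL. The local ports below are the defaults Ollama and LM Studio use, to the best of my knowledge; treat the table as a sketch:

```javascript
// One integration, many providers: only the base URL changes.
// Ports for local tools are their documented defaults (assumption).
const OPENAI_COMPATIBLE_BASES = {
  openai:   'https://api.openai.com/v1',
  ollama:   'http://localhost:11434/v1', // Ollama's OpenAI-compatible endpoint
  lmstudio: 'http://localhost:1234/v1',  // LM Studio's local server
};

function chatCompletionsUrl(provider) {
  return `${OPENAI_COMPATIBLE_BASES[provider]}/chat/completions`;
}
```

Everything else, request body, streaming parser, error handling, stays identical across these providers.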
The bigger picture
The LLM API landscape in 2026 has evolved significantly since 2024.
Most providers now support function calling, structured outputs, and vision. The baseline quality is high enough that the "best" model depends more on your specific use case than on benchmark rankings. OpenAI now offers both the Chat Completions API and the newer Responses API, adding another dimension to consider when integrating.
At the same time, each provider continues to add features with their own implementations. MCP, tool use, multimodal inputs, and structured outputs all have differences across providers, which makes a good abstraction layer increasingly valuable.
The key advantage in this environment isn't picking the "best" LLM. It's building clean abstractions that let you switch between providers without rewriting your application.