"I Wired DeepSeek V4 Into Claude Code and Codex CLI Without Touching the Tools"

DeepSeek V4 dropped, the benchmarks looked aggressive, and the price-per-million-tokens looked even more aggressive. The first thing I wanted to know wasn't "is it as good as Claude Opus 4.6 or GPT-5.4?" — it was "can my actual coding agents use it without me rewriting half my workflow?"

Because that's the part nobody benchmarks. A model can be the cheapest reasoner on the leaderboard and still be useless to me if Claude Code, Codex CLI, and Gemini CLI can't talk to it the way they expect to.

Here's what I learned getting DeepSeek V4 into a real AI coding workflow without forking any of the CLIs.

The protocol problem nobody warns you about

Every AI coding tool has hard-coded assumptions about which API protocol it speaks:

  • Claude Code speaks Anthropic's Messages API. It expects x-api-key, anthropic-version, content blocks, cache_control, the whole shape.
  • Codex CLI speaks OpenAI's Responses API. It expects Authorization: Bearer, tools, tool_choice, streaming SSE in OpenAI's specific format.
  • Gemini CLI speaks Google's GenerativeLanguage API.

DeepSeek's solution is genuinely thoughtful — they expose both an OpenAI-compatible endpoint at https://api.deepseek.com and an Anthropic-compatible endpoint at https://api.deepseek.com/anthropic. Same model, two protocols.
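
To make the two shapes concrete, here's the same request hitting both endpoints. This is a sketch: the model name is the one from my tier mapping further down, and you'd need a real DEEPSEEK_API_KEY in the environment.

// Same model, two wire formats. Sketch only.
const key = process.env.DEEPSEEK_API_KEY;

// OpenAI-compatible: Bearer auth, chat-completions shape.
await fetch('https://api.deepseek.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${key}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
        model: 'deepseek-v4-flash',
        messages: [{ role: 'user', content: 'Say hi.' }]
    })
});

// Anthropic-compatible: x-api-key + anthropic-version, Messages shape.
await fetch('https://api.deepseek.com/anthropic/v1/messages', {
    method: 'POST',
    headers: {
        'x-api-key': key,
        'anthropic-version': '2023-06-01',
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        model: 'deepseek-v4-flash',
        max_tokens: 256,
        messages: [{ role: 'user', content: 'Say hi.' }]
    })
});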

Great. So in theory you can point Claude Code at the Anthropic endpoint and Codex at the OpenAI endpoint and you're done.

In practice you can't, because Claude Code reads exactly one ANTHROPIC_BASE_URL and Codex reads exactly one OPENAI_BASE_URL. The moment you want to use both Claude (via Anthropic-direct) and DeepSeek (via DeepSeek-Anthropic-compat) for the same agent, you have to pick. Switching means restarting the tool with new env vars. Every. Single. Time.

That's the moment a local gateway stops being optional.

The setup: one localhost, three tools, four providers

The thing I built — CliGate — runs as a local proxy on localhost:7860. Each AI coding tool points at it once and never thinks about provider URLs again.

# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:7860
export ANTHROPIC_AUTH_TOKEN=any-string

# Codex CLI
export OPENAI_BASE_URL=http://localhost:7860/v1
export OPENAI_API_KEY=any-string

# Gemini CLI
export GOOGLE_GEMINI_BASE_URL=http://localhost:7860

The gateway then routes each request to whichever provider you configured for that tier — Anthropic, OpenAI, Gemini, Azure OpenAI, GLM, or DeepSeek — and translates the protocol on the way out and back.
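
The "translates the protocol" part is the interesting bit. Here's a deliberately minimal sketch of the Anthropic-to-OpenAI direction. It's nowhere near the full mapping (which has to handle tool calls, streaming, and tool-result blocks), just the core reshaping:

// Minimal sketch: reshape an Anthropic Messages body into an OpenAI
// chat-completions body. Lossy on purpose; illustration only.
function anthropicToChatCompletions(body) {
    const messages = [];
    if (body.system) messages.push({ role: 'system', content: body.system });
    for (const m of body.messages) {
        // Anthropic content is either a string or an array of content blocks.
        const text = typeof m.content === 'string'
            ? m.content
            : m.content.filter(b => b.type === 'text').map(b => b.text).join('\n');
        messages.push({ role: m.role, content: text });
    }
    return { model: body.model, max_tokens: body.max_tokens, messages };
}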

Adding DeepSeek as a first-class provider

Here's the actual provider implementation that landed last week. The interesting part is what I didn't have to write:

import { OpenAIProvider } from './openai.js';

const DEFAULT_BASE_URL = 'https://api.deepseek.com';
const ANTHROPIC_API_VERSION = '2023-06-01';

export class DeepSeekProvider extends OpenAIProvider {
    constructor(config) {
        super({ ...config, baseUrl: config.baseUrl || DEFAULT_BASE_URL });
        this.type = 'deepseek';
        // Ride chat-completions for Codex/Responses; not a native Responses provider.
        this.sendResponsesRequest = undefined;
    }

    // DeepSeek mounts its Anthropic-compatible API under /anthropic.
    _buildAnthropicBaseUrl() {
        return `${this.baseUrl}/anthropic`;
    }

    // Claude Code's Messages-API traffic can be forwarded as-is; the
    // endpoint speaks the same protocol, so no translation is needed.
    async sendAnthropicRequest(body) {
        return fetch(`${this._buildAnthropicBaseUrl()}/v1/messages`, {
            method: 'POST',
            headers: {
                'x-api-key': this.apiKey,
                'anthropic-version': ANTHROPIC_API_VERSION,
                'Content-Type': 'application/json'
            },
            body: JSON.stringify(body)
        });
    }
}

That's basically it. Forty lines.

The reason it's forty lines and not four hundred: DeepSeek's Anthropic-compatible endpoint is genuinely Anthropic-compatible. I don't have to translate Claude Code's messages into something else and back — I can just forward them. For OpenAI-compatible traffic from Codex, the existing OpenAIProvider already handles chat-completions; I extend it and override the base URL.

The one subtle thing — this.sendResponsesRequest = undefined — matters because DeepSeek does not implement OpenAI's newer Responses API. If I left that inherited, the gateway would try to call /v1/responses and get 404s. By unsetting it, the gateway falls back to chat-completions, which DeepSeek does support. That single line is the kind of detail that separates "it works in my demo" from "it works for a coding agent that does 50 tool calls in a session."
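
To show why unsetting the method is enough, here's roughly the shape the dispatch has to take. Every name here is mine, not CliGate's actual internals, and the two translators are stubbed down to the bare minimum:

// Sketch of a capability-based dispatch. If the provider has a native
// Responses handler, use it; otherwise go through chat-completions.
async function dispatchResponses(provider, body) {
    if (typeof provider.sendResponsesRequest === 'function') {
        return provider.sendResponsesRequest(body);   // native path (e.g. OpenAI)
    }
    const chatResponse = await provider.sendChatRequest(toChatCompletions(body));
    return fromChatCompletions(chatResponse);
}

// Hypothetical, minimal translators; real ones have to handle tools,
// streaming, reasoning items, and multi-part input.
function toChatCompletions(responsesBody) {
    return {
        model: responsesBody.model,
        messages: [{ role: 'user', content: String(responsesBody.input ?? '') }]
    };
}

function fromChatCompletions(chatResponse) {
    return { output_text: chatResponse.choices?.[0]?.message?.content ?? '' };
}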

The model tier mapping

CliGate maps incoming model names to four tiers (flagship, standard, fast, reasoning), and each provider declares which of its models fills each tier:

deepseek: {
    flagship:  'deepseek-v4-pro',
    standard:  'deepseek-v4-flash',
    fast:      'deepseek-v4-flash',
    reasoning: 'deepseek-v4-pro',
}

So when Claude Code asks for claude-sonnet-4-6 and you've routed standard traffic to DeepSeek, the gateway translates that into a deepseek-v4-flash call — without Claude Code knowing anything changed. The agent thinks it's still talking to Anthropic.
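
Put together, the routing decision is two small lookups. The regex rules and names in this sketch are mine, but the values match the tier map above:

// Illustrative only: incoming model name -> tier -> provider model.
const tierModels = {
    deepseek: { flagship: 'deepseek-v4-pro', standard: 'deepseek-v4-flash',
                fast: 'deepseek-v4-flash', reasoning: 'deepseek-v4-pro' }
};
const providerForTier = { standard: 'deepseek', fast: 'deepseek' };  // flagship/reasoning stay on Claude

function resolveTier(model) {
    if (/opus/i.test(model)) return 'flagship';
    if (/haiku/i.test(model)) return 'fast';
    return 'standard';   // sonnet and anything unrecognized
}

const tier = resolveTier('claude-sonnet-4-6');           // 'standard'
console.log(tierModels[providerForTier[tier]][tier]);    // 'deepseek-v4-flash'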

This is the part that actually matters for cost. DeepSeek V4 Flash is priced at:

  • Input: $0.27 per million tokens
  • Cache hit input: $0.07 per million tokens
  • Output: $1.10 per million tokens

Compared to a flagship model in the $3–$15 range, you can move the bulk of your boilerplate-tier coding traffic — file edits, simple refactors, test scaffolding — to DeepSeek Flash and only escalate to Claude or GPT-5 for the gnarly reasoning tasks. Same agent, three different brains, picked per request.
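
Back-of-the-envelope, with made-up but realistic session sizes and the prices above:

// Rough per-session cost, in millions of tokens. Illustrative numbers.
const session = { inputM: 2.0, outputM: 0.15 };

const flashCost = session.inputM * 0.27 + session.outputM * 1.10;   // ~$0.71
const flagshipCost = session.inputM * 3 + session.outputM * 15;     // ~$8.25 at $3/$15
console.log((flagshipCost / flashCost).toFixed(1));                 // ~11.7x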

What actually surprised me

Two things.

First, tool calling worked on the Anthropic-compat endpoint without modification. I expected to have to sanitize tool schemas the way I do for Azure OpenAI (which rejects $schema, const, etc.). DeepSeek's Anthropic-compat layer accepts what Claude Code emits. That's a meaningful effort on their end.
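
For comparison, this is the kind of sanitizer I was braced for having to write. It's a sketch of the usual workaround, not code DeepSeek turned out to need:

// Recursively strip JSON Schema keywords that stricter backends reject.
// Sketch only; the keyword list varies by provider.
function sanitizeSchema(schema) {
    if (Array.isArray(schema)) return schema.map(sanitizeSchema);
    if (schema === null || typeof schema !== 'object') return schema;
    const out = {};
    for (const [key, value] of Object.entries(schema)) {
        if (key === '$schema') continue;                        // drop outright
        if (key === 'const') { out.enum = [value]; continue; }  // common rewrite
        out[key] = sanitizeSchema(value);
    }
    return out;
}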

Second, the cache pricing changes my mental model. DeepSeek bills cache reads at roughly a quarter of the normal input rate. For a coding agent doing repeated tool calls within the same session — where most of the prompt is the same system + history + repo context — caching turns into the dominant economic factor. It's no longer "which model is cheapest per token", it's "which model's cache hit rate × cache price is lowest."
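
The arithmetic behind that, using the Flash prices above (the hit rates are assumptions, not measurements):

// Effective input price per million tokens as a function of cache hit rate.
const missPrice = 0.27, hitPrice = 0.07;   // DeepSeek V4 Flash, $/M input
const effective = (hitRate) => hitRate * hitPrice + (1 - hitRate) * missPrice;

console.log(effective(0.0).toFixed(3));    // 0.270 -- cold session
console.log(effective(0.8).toFixed(3));    // 0.110 -- long agent session
console.log(effective(0.95).toFixed(3));   // 0.080 -- mostly repeated context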

That's a different optimization problem than the one the leaderboards are solving.

Why I'm not abandoning Claude or GPT-5

To be clear: this isn't a "DeepSeek replaces everything" post. After a week of routing traffic, my mental split is:

  • Hard reasoning, ambiguous specs, large refactors → Claude Opus 4.6 or GPT-5.4. They still win when the problem isn't well-formed.
  • Boilerplate code generation, formatting, test scaffolding, doc writing → DeepSeek V4 Flash. The quality is fine and the cost is roughly an order of magnitude lower.
  • Dirt-cheap classification, intent routing, log triage → DeepSeek Flash with aggressive caching.

The gateway is what makes this split practical instead of theoretical. Without it, every "use DeepSeek for this task" decision means restarting Claude Code with new env vars. With it, the routing happens server-side and the tools never know.

Try it

Source: github.com/codeking-ai/cligate

Add a DeepSeek API key in the dashboard, route the standard and fast tiers to DeepSeek, leave flagship and reasoning on Claude or GPT-5. Run Claude Code or Codex CLI as normal.

I'd genuinely like to hear how others are splitting workloads across model providers right now. Are you doing it at the agent level, the request level, or just paying for one flagship and calling it done? The answer changed for me twice this quarter and I don't think I've found the final shape yet.
