DeepSeek V4 is the hottest LLM story of 2026. GPT-4-class performance at 1/10th the price. OpenAI-compatible API. Open weights for self-hosting. And a July 24 endpoint deprecation deadline that many developers are quietly missing.
This is the guide I wish existed when I started migrating our agent stack to DeepSeek. Let's cover everything.
What is DeepSeek V4?
DeepSeek V4 (API model: deepseek-chat) is the flagship model from DeepSeek AI. Key specs:
- Architecture: Mixture-of-Experts (MoE) — only activates a fraction of parameters per call
- Context window: 128K tokens
- API: OpenAI-compatible (drop-in replacement)
- Modalities: Text only (no vision in deepseek-chat)
- Availability: Cloud API + open weights
Benchmarks vs GPT-4o
| Benchmark | DeepSeek V4 | GPT-4o | Claude Sonnet 4 |
|---|---|---|---|
| MMLU | 88.5% | 88.7% | 88.3% |
| HumanEval (coding) | 90.2% | 90.2% | 92.0% |
| MATH | 84.1% | 76.6% | 78.3% |
| SWE-bench Verified | 42.0% | 38.0% | 49.0% |
Takeaway: Matches GPT-4o on coding, beats it on math. Claude leads on agentic tasks. For price-performance? DeepSeek wins by a mile.
Pricing — The Real Story
| Model | Input $/1M | Output $/1M |
|---|---|---|
| DeepSeek V4 | $0.27 | $1.10 |
| GPT-4o | $2.50 | $10.00 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Gemini 2.5 Pro | $1.25 | $10.00 |
Real example: processing 500K input tokens/day (~15M/month) costs about $4/month with DeepSeek vs $37.50/month with GPT-4o.
💡 Cache tip: DeepSeek's context caching drops cached input to $0.07/1M tokens. For agents with large system prompts, this cuts 60-80% of input costs.
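A quick back-of-envelope check of those numbers (prices taken from the table above; the 500K-tokens/day figure assumes input tokens only):

```python
def monthly_input_cost(tokens_per_day: int, price_per_1m: float, days: int = 30) -> float:
    """Monthly input-token cost given a per-1M-token price."""
    return tokens_per_day * days * price_per_1m / 1_000_000

deepseek = monthly_input_cost(500_000, 0.27)  # standard DeepSeek input price
gpt4o    = monthly_input_cost(500_000, 2.50)  # GPT-4o input price
cached   = monthly_input_cost(500_000, 0.07)  # if every input token hit DeepSeek's cache

print(f"DeepSeek: ${deepseek:.2f}  GPT-4o: ${gpt4o:.2f}  fully cached: ${cached:.2f}")
```

The fully-cached figure is a best case; real agents land somewhere between the two DeepSeek numbers depending on how much of each request is a repeated system prompt.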
API Quickstart (5 Minutes)
Get your key at platform.deepseek.com → API Keys.
pip install openai # DeepSeek uses the OpenAI SDK
from openai import OpenAI
client = OpenAI(
api_key="your-deepseek-api-key",
base_url="https://api.deepseek.com"
)
response = client.chat.completions.create(
model="deepseek-chat",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain MoE architecture in 3 sentences."}
]
)
print(response.choices[0].message.content)
That's it. If you're already using the OpenAI SDK, swap in your DeepSeek key and the base_url — everything else is identical.
Function Calling
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather",
"parameters": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]
}
}
}]
response = client.chat.completions.create(
model="deepseek-chat",
messages=[{"role": "user", "content": "Weather in Tokyo?"}],
tools=tools,
tool_choice="auto"
)
tc = response.choices[0].message.tool_calls[0]
print(tc.function.name, tc.function.arguments)  # arguments arrives as a JSON string
⚠️ Migration: Deprecated Endpoints (July 24 Deadline)
This one is catching people off guard. DeepSeek is sunsetting old model names on July 24, 2026.
| Deprecated | Replace With |
|---|---|
| deepseek-chat-v3 | deepseek-chat |
| deepseek-chat-v3-0324 | deepseek-chat |
| deepseek-reasoner-r1 | deepseek-reasoner |
| deepseek-reasoner-r1-0528 | deepseek-reasoner |
Scan your codebase now:
import os
DEPRECATED = [
"deepseek-chat-v3",
"deepseek-chat-v3-0324",
"deepseek-reasoner-r1",
"deepseek-reasoner-r1-0528",
]
for root, _, files in os.walk("."):
    for f in files:
        if f.endswith(".py"):
            path = os.path.join(root, f)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                content = fh.read()
            for dep in DEPRECATED:
                if dep in content:
                    print(f"FOUND: {dep} in {path}")
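If the scan turns anything up, the same mapping can drive an automatic rewrite. A minimal sketch — note the longest names must be replaced first, or `deepseek-chat-v3-0324` gets partially mangled by the `deepseek-chat-v3` substring:

```python
REPLACEMENTS = {
    "deepseek-chat-v3": "deepseek-chat",
    "deepseek-chat-v3-0324": "deepseek-chat",
    "deepseek-reasoner-r1": "deepseek-reasoner",
    "deepseek-reasoner-r1-0528": "deepseek-reasoner",
}

def migrate_text(text: str) -> str:
    # Longest-first ordering so dated names aren't clobbered by their prefixes.
    for old in sorted(REPLACEMENTS, key=len, reverse=True):
        text = text.replace(old, REPLACEMENTS[old])
    return text

def migrate_file(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        src = f.read()
    new = migrate_text(src)
    if new != src:
        with open(path, "w", encoding="utf-8") as f:
            f.write(new)
        print(f"rewrote {path}")
```

Run it on a branch and review the diff — a blind string replace will also touch comments and docs, which is usually what you want here anyway.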
Run Locally with Ollama
For privacy-sensitive workloads or zero-cost development:
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull DeepSeek R1 7B (~4.7 GB)
ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b
Then use via OpenAI-compatible API:
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama"
)
response = client.chat.completions.create(
model="deepseek-r1:7b",
messages=[{"role": "user", "content": "Hello!"}]
)
Hardware guide:
- 7B → 8 GB VRAM (good dev machine)
- 14B → 16 GB VRAM (near cloud quality)
- 32B → 24 GB VRAM (strongest local option)
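Those tiers line up with a rough rule of thumb: a quantized model needs about 0.67 GB per billion parameters for weights (consistent with the ~4.7 GB download for the 7B model) plus a couple of GB for KV cache and runtime overhead. A sketch of that estimate — the constants are approximations, not Ollama's actual accounting:

```python
def estimate_vram_gb(params_billions: float,
                     gb_per_billion: float = 0.67,  # ~4-bit quantized weights
                     overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: quantized weights plus fixed runtime overhead."""
    return params_billions * gb_per_billion + overhead_gb

for size in (7, 14, 32):
    print(f"{size}B -> ~{estimate_vram_gb(size):.1f} GB")
```

The estimates come in under the recommended tiers above, which is the point: you want headroom for longer contexts, which grow the KV cache well past the fixed overhead assumed here.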
Building AI Agents with DeepSeek V4
Minimal agentic loop:
import json
from openai import OpenAI
client = OpenAI(api_key="your-key", base_url="https://api.deepseek.com")
def run_agent(user_input, max_turns=5):
messages = [{"role": "user", "content": user_input}]
for _ in range(max_turns):
resp = client.chat.completions.create(
model="deepseek-chat",
messages=messages,
tools=tools,
tool_choice="auto"
)
msg = resp.choices[0].message
messages.append(msg)
if not msg.tool_calls:
return msg.content # Final answer
for tc in msg.tool_calls:
result = execute_tool(tc) # Your tool logic
messages.append({
"role": "tool",
"tool_call_id": tc.id,
"content": result
})
return "Max turns reached"
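The loop leaves `execute_tool` to you. A minimal version maps the tool name onto a local Python function and parses the JSON argument string — here reusing the `get_weather` schema from earlier, with a hypothetical stub as the implementation:

```python
import json
from types import SimpleNamespace

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # hypothetical stub -- swap in a real lookup

def execute_tool(tc) -> str:
    """Dispatch a model tool call to the matching local function."""
    handlers = {"get_weather": get_weather}
    args = json.loads(tc.function.arguments)  # arguments is a JSON string
    return handlers[tc.function.name](**args)

# Quick check with a fake tool call shaped like the SDK object:
fake = SimpleNamespace(function=SimpleNamespace(
    name="get_weather", arguments='{"city": "Tokyo"}'))
print(execute_tool(fake))  # → Sunny in Tokyo
```

Returning a string keeps the `"role": "tool"` message valid; serialize structured results with `json.dumps` before returning them.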
LangChain integration (drop-in)
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="deepseek-chat",
api_key="your-key",
base_url="https://api.deepseek.com",
temperature=0
)
# Works with all LangChain agents, chains, and tools unchanged
Limitations (Be Honest With Yourself)
- ❌ No vision — use GPT-4o or Gemini Flash for images
- ❌ No audio/video — text only
- ⚠️ No official SLA — configure LiteLLM fallback for production
- ⚠️ GDPR concerns — requests processed in China; use Mistral or on-premise for EU data
Production fallback pattern (LiteLLM)
import litellm
response = litellm.completion(
model="deepseek/deepseek-chat",
messages=messages,
fallbacks=["gpt-4o-mini", "claude-haiku-3-5"]
)
When to Use DeepSeek V4
| Use Case | Verdict |
|---|---|
| Text agents / chatbots | ✅ Best price-performance |
| Coding assistant | ✅ HumanEval 90%+ |
| High-volume production | ✅ Set up fallback |
| Math / reasoning | ✅ Beats GPT-4o |
| Vision / multimodal | ❌ Use GPT-4o |
| 1M+ token context | ❌ Use Gemini 2.5 Pro |
| EU GDPR-sensitive | ⚠️ Use Mistral or self-host |
Bottom line: If your workload is text-only, DeepSeek V4 is the most cost-efficient production-grade LLM in 2026. The 10x price advantage over GPT-4o on equivalent quality is simply hard to ignore.
Find DeepSeek, LiteLLM, LangChain, and 420+ other AI agent tools at AgDex.ai — the AI tools directory for developers.