DEV Community

Agdex AI

DeepSeek V4: The Complete Developer Guide (API, Pricing, Local Setup & Migration 2026)

DeepSeek V4 is the hottest LLM story of 2026. GPT-4-class performance at 1/10th the price. OpenAI-compatible API. Open weights for self-hosting. And a July 24 endpoint deprecation deadline that many developers are quietly missing.

This is the guide I wish existed when I started migrating our agent stack to DeepSeek. Let's cover everything.


What is DeepSeek V4?

DeepSeek V4 (API model: deepseek-chat) is the flagship model from DeepSeek AI. Key specs:

  • Architecture: Mixture-of-Experts (MoE) — only activates a fraction of parameters per call
  • Context window: 128K tokens
  • API: OpenAI-compatible (drop-in replacement)
  • Modalities: Text only (no vision in deepseek-chat)
  • Availability: Cloud API + open weights
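
The MoE bullet is worth unpacking: a router scores every expert for each token, and only the top-k experts actually run, so compute per call stays small even though total parameter count is huge. A toy sketch of top-k routing (illustrative only — the scores and expert count are made up, and this is not DeepSeek's actual implementation):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(router_scores)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, weights))

# 8 experts, but only 2 run for this token:
scores = [0.1, 2.3, -0.5, 1.8, 0.0, -1.2, 0.7, 0.3]
print(route_top_k(scores, k=2))  # experts 1 and 3 fire, weights sum to 1
```

That's the whole trick behind "GPT-4-class quality at a fraction of the inference cost": you pay for k experts per token, not all of them.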

Benchmarks vs GPT-4o

| Benchmark | DeepSeek V4 | GPT-4o | Claude Sonnet 4 |
|---|---|---|---|
| MMLU | 88.5% | 88.7% | 88.3% |
| HumanEval (coding) | 90.2% | 90.2% | 92.0% |
| MATH | 84.1% | 76.6% | 78.3% |
| SWE-bench Verified | 42.0% | 38.0% | 49.0% |

Takeaway: Matches GPT-4o on coding, beats it on math. Claude leads on agentic tasks. For price-performance? DeepSeek wins by a mile.


Pricing — The Real Story

| Model | Input $/1M | Output $/1M |
|---|---|---|
| DeepSeek V4 | $0.27 | $1.10 |
| GPT-4o | $2.50 | $10.00 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Gemini 2.5 Pro | $1.25 | $10.00 |

Real example: processing 500K input tokens/day (~15M/month) costs about $4/month with DeepSeek vs $37.50/month with GPT-4o.

💡 Cache tip: DeepSeek's context caching drops cached input to $0.07/1M tokens. For agents with large system prompts, this cuts 60-80% of input costs.
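
These numbers are easy to verify yourself. A quick cost helper, with prices hardcoded from the tables in this post (30-day month assumed, input tokens only):

```python
def monthly_input_cost(tokens_per_day, price_per_1m, days=30):
    """Monthly spend on input tokens at a given $/1M-token rate."""
    return tokens_per_day * days / 1_000_000 * price_per_1m

deepseek = monthly_input_cost(500_000, 0.27)  # ~$4.05
gpt4o    = monthly_input_cost(500_000, 2.50)  # $37.50

print(f"DeepSeek: ${deepseek:.2f}/mo, GPT-4o: ${gpt4o:.2f}/mo")
# Cached input at $0.07 vs $0.27 is a ~74% discount on every cache hit,
# which is where the 60-80% savings figure for prompt-heavy agents comes from:
print(f"Cache-hit discount: {1 - 0.07 / 0.27:.0%}")
```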


API Quickstart (5 Minutes)

Get your key at platform.deepseek.com → API Keys.

```shell
pip install openai  # DeepSeek uses the OpenAI SDK
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain MoE architecture in 3 sentences."}
    ]
)
print(response.choices[0].message.content)
```

That's it. If you're already using the OpenAI SDK, change api_key + base_url. Everything else is identical.
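
Since those two values are the only difference, it's worth keeping them in environment variables so the switch stays reversible. A small sketch (the env var names and `client_kwargs` helper are my own convention, nothing official):

```python
import os

PROVIDERS = {
    "deepseek": {"base_url": "https://api.deepseek.com",  "key_var": "DEEPSEEK_API_KEY", "model": "deepseek-chat"},
    "openai":   {"base_url": "https://api.openai.com/v1", "key_var": "OPENAI_API_KEY",   "model": "gpt-4o"},
}

def client_kwargs(provider="deepseek"):
    """Build the kwargs for OpenAI(...) plus the default model name."""
    cfg = PROVIDERS[provider]
    return {"api_key": os.getenv(cfg["key_var"], ""), "base_url": cfg["base_url"]}, cfg["model"]

kwargs, model = client_kwargs("deepseek")
# client = OpenAI(**kwargs)
# client.chat.completions.create(model=model, messages=[...])
```

Flip one string (or one env var) and the same code runs against either provider.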

Function Calling

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)
tc = response.choices[0].message.tool_calls[0]
print(tc.function.name, tc.function.arguments)
```
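
One common gotcha: `tc.function.arguments` comes back as a JSON *string*, not a dict, so parse it before dispatching. A minimal dispatcher (the canned weather payload is obviously a stand-in for real tool logic):

```python
import json

def dispatch_tool(name, arguments_json):
    """Parse the model's JSON arguments string and call the matching local function."""
    args = json.loads(arguments_json)  # arguments arrive as a string
    if name == "get_weather":
        # Stand-in for a real weather API call
        return json.dumps({"city": args["city"], "temp_c": 21, "sky": "clear"})
    raise ValueError(f"Unknown tool: {name}")

# e.g. dispatch_tool(tc.function.name, tc.function.arguments)
print(dispatch_tool("get_weather", '{"city": "Tokyo"}'))
```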

⚠️ Migration: Deprecated Endpoints (July 24 Deadline)

This one is catching people off guard. DeepSeek is sunsetting old model names on July 24, 2026.

| Deprecated | Replace with |
|---|---|
| `deepseek-chat-v3` | `deepseek-chat` |
| `deepseek-chat-v3-0324` | `deepseek-chat` |
| `deepseek-reasoner-r1` | `deepseek-reasoner` |
| `deepseek-reasoner-r1-0528` | `deepseek-reasoner` |

Scan your codebase now:

```python
import os

DEPRECATED = [
    "deepseek-chat-v3",
    "deepseek-chat-v3-0324",
    "deepseek-reasoner-r1",
    "deepseek-reasoner-r1-0528",
]

for root, _, files in os.walk("."):
    for f in files:
        if f.endswith(".py"):
            path = os.path.join(root, f)
            # errors="ignore" so one odd file doesn't crash the scan
            with open(path, encoding="utf-8", errors="ignore") as fh:
                content = fh.read()
            for dep in DEPRECATED:
                if dep in content:
                    print(f"FOUND: {dep} in {path}")
```
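
If you can't update every call site before the deadline, a thin alias shim at your API boundary buys time. The mapping mirrors the table above; `resolve_model` is my own helper name, not a DeepSeek API:

```python
DEPRECATED_TO_CURRENT = {
    "deepseek-chat-v3": "deepseek-chat",
    "deepseek-chat-v3-0324": "deepseek-chat",
    "deepseek-reasoner-r1": "deepseek-reasoner",
    "deepseek-reasoner-r1-0528": "deepseek-reasoner",
}

def resolve_model(name):
    """Map a deprecated model name to its replacement; pass current names through."""
    if name in DEPRECATED_TO_CURRENT:
        new = DEPRECATED_TO_CURRENT[name]
        print(f"WARNING: {name} is deprecated, using {new}")
        return new
    return name

# Wrap your call sites: model=resolve_model(model)
```

The warning print gives you a log trail of which code paths still use old names.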

Run Locally with Ollama

For privacy-sensitive workloads or zero-cost development:

```shell
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull DeepSeek R1 7B (~4.7 GB)
ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b
```

Then use via OpenAI-compatible API:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Ollama ignores the key; any non-empty string works
)
response = client.chat.completions.create(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Hardware guide:

  • 7B → 8 GB VRAM (good dev machine)
  • 14B → 16 GB VRAM (near cloud quality)
  • 32B → 24 GB VRAM (strongest local option)
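
Those VRAM figures follow from simple arithmetic: at 4-bit quantization, weights take roughly params × 0.5 bytes, plus runtime overhead for the KV cache and buffers. A back-of-envelope estimator (the 1.3× overhead factor is my own ballpark, not an Ollama spec — real usage varies with context length):

```python
def est_vram_gb(params_billion, bits=4, overhead=1.3):
    """Rough VRAM estimate for a quantized model."""
    weights_gb = params_billion * bits / 8  # e.g. 7B at 4-bit ≈ 3.5 GB of weights
    return weights_gb * overhead

for size in (7, 14, 32):
    print(f"{size}B ≈ {est_vram_gb(size):.1f} GB")
```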

Building AI Agents with DeepSeek V4

Minimal agentic loop:

```python
import json
from openai import OpenAI

client = OpenAI(api_key="your-key", base_url="https://api.deepseek.com")

def run_agent(user_input, max_turns=5):
    # `tools` is the list from the Function Calling section above
    messages = [{"role": "user", "content": user_input}]

    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        msg = resp.choices[0].message
        messages.append(msg)

        if not msg.tool_calls:
            return msg.content  # Final answer

        for tc in msg.tool_calls:
            result = execute_tool(tc)  # Your tool logic
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result
            })

    return "Max turns reached"
```
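
`execute_tool` is left as "your tool logic" above. Here's one minimal way to fill it in, reusing the `get_weather` tool from the Function Calling section (canned data; the helper name and shape are my own choice):

```python
import json

def execute_tool(tc):
    """Run the local function named by a tool call; return a string for the tool message."""
    args = json.loads(tc.function.arguments)
    if tc.function.name == "get_weather":
        # Replace with a real API lookup in production
        return json.dumps({"city": args["city"], "temp_c": 18})
    return json.dumps({"error": f"unknown tool {tc.function.name}"})
```

Returning a string (not a dict) matters: the `content` field of a tool message must be text.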

LangChain integration (drop-in)

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="deepseek-chat",
    api_key="your-key",
    base_url="https://api.deepseek.com",
    temperature=0
)
# Works with all LangChain agents, chains, and tools unchanged
```

Limitations (Be Honest With Yourself)

  • No vision — use GPT-4o or Gemini Flash for images
  • No audio/video — text only
  • ⚠️ No official SLA — configure LiteLLM fallback for production
  • ⚠️ GDPR concerns — requests processed in China; use Mistral or on-premise for EU data

Production fallback pattern (LiteLLM)

```python
import litellm

response = litellm.completion(
    model="deepseek/deepseek-chat",
    messages=messages,
    fallbacks=["gpt-4o-mini", "claude-haiku-3-5"]
)
```
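
LiteLLM handles the retry and fallback plumbing for you, but the pattern itself is simple enough to sketch provider-agnostically: try each callable in order and return the first success (illustrative only; production code should catch provider-specific exceptions, not bare `Exception`):

```python
def with_fallbacks(providers, *args, **kwargs):
    """Try each provider callable in order; return the first successful result."""
    errors = []
    for call in providers:
        try:
            return call(*args, **kwargs)
        except Exception as e:  # narrow this to provider-specific errors in production
            errors.append(e)
    raise RuntimeError(f"All providers failed: {errors}")

# Usage sketch: with_fallbacks([deepseek_call, gpt4o_mini_call], messages=messages)
```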

When to Use DeepSeek V4

| Use case | Verdict |
|---|---|
| Text agents / chatbots | ✅ Best price-performance |
| Coding assistant | ✅ HumanEval 90%+ |
| High-volume production | ✅ Set up fallback |
| Math / reasoning | ✅ Beats GPT-4o |
| Vision / multimodal | ❌ Use GPT-4o |
| 1M+ token context | ❌ Use Gemini 2.5 Pro |
| EU GDPR-sensitive | ⚠️ Use Mistral or self-host |

Bottom line: If your workload is text-only, DeepSeek V4 is the most cost-efficient production-grade LLM in 2026. The 10x price advantage over GPT-4o on equivalent quality is simply hard to ignore.


Find DeepSeek, LiteLLM, LangChain, and 420+ other AI agent tools at AgDex.ai — the AI tools directory for developers.
