Walse

Posted on Apr 24 • Originally published at apidog.com

How to Use DeepSeek V4: Web Chat, API, and Self-Hosted

DeepSeek V4 was released on April 23, 2026, with four checkpoints, a live API, and MIT-licensed weights on Hugging Face. There is no single standard way to adopt it. The best option depends on what you need: instant access, production API integration, or on-prem deployment. This article covers all three paths with concrete steps, pros and cons, and production-ready prompt workflows you can reuse directly.


Want a product overview? Read what DeepSeek V4 is first. The full API walkthrough is in the DeepSeek V4 API guide. For free options, see how to use DeepSeek V4 for free. Ready to test real requests? Download Apidog and start building an API collection.

TL;DR

Fastest path: chat.deepseek.com (free web chat; V4-Pro by default; three reasoning modes).
Production path: https://api.deepseek.com/v1/chat/completions with the model deepseek-v4-pro or deepseek-v4-flash.
Self-host: download the weights from Hugging Face and run the scripts under /inference in the repo.
Use Non-Think for routing and classification, Think High for code and analysis, and Think Max only when accuracy is critical.
Sampling: temperature=1.0, top_p=1.0 (DeepSeek's recommendation).
Use Apidog as your API client. Since the format is OpenAI-compatible, you can replay the same requests against DeepSeek, OpenAI, and Anthropic.

Choose a Path Based on Your Workload

The main options, matched to what you need:

Path	Cost	Setup time	Best for
chat.deepseek.com	Free	30 seconds	Quick, ad-hoc tests
DeepSeek API	Per token	5 minutes	Production, agents, batch jobs
V4-Flash self-hosted	Hardware cost	A few hours	On-prem, offline inference
V4-Pro self-hosted	Cluster cost	One day	Research, fine-tuning
OpenRouter / aggregators	Per token	2 minutes	Multi-provider fallback

Path 1: Try V4 via Web Chat

Open chat.deepseek.com.
Log in (email, Google, or WeChat).
The default model is V4-Pro. Switch between Non-Think, Think High, and Think Max in the composer.
Send your prompt.

Features: file upload, web search, and 1M-token context support. Rate limits apply per account; heavy use can slow responses down, but an outright block is rare.

Good for: debugging stack traces, summarizing ~200-page PDFs, comparing prompt output against GPT-5.5/Claude.
Not suited for: automation or batch replay.

Path 2: DeepSeek API Integration

This is the main production path. The request format is OpenAI-compatible, which makes migration and scaling straightforward.

Step 1: Get an API Key

Sign up at platform.deepseek.com.
Add a payment method (minimum top-up: $2).
Create an API key under the API Keys menu. Copy and store it right away; it is shown only once.

Set an environment variable for use with the CLI or SDKs:

export DEEPSEEK_API_KEY="sk-..."
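A minimal Python sketch that reads the key from that variable at startup and fails fast when it is missing, instead of hard-coding the key. The helper name load_deepseek_key is hypothetical, not part of any SDK.

```python
import os


def load_deepseek_key(var_name: str = "DEEPSEEK_API_KEY") -> str:
    """Read the API key from the environment; raise early if it is absent or blank."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f'{var_name} is not set; run: export {var_name}="sk-..."')
    return key
```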

Step 2: Send a Minimal Request

Use the OpenAI-compatible endpoint as the default.

curl https://api.deepseek.com/v1/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "user", "content": "Refactor this Python function to async. Reply with code only."}
    ],
    "thinking_mode": "thinking"
  }'

Use deepseek-v4-flash for a cheaper variant. thinking_mode accepts thinking (the default) or non-thinking (faster); the Python example below also uses thinking_max for the Think Max tier.
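If you cannot install the OpenAI SDK, the curl call above can be reproduced with Python's standard library. This is a minimal sketch: build_chat_request is a hypothetical helper, and thinking_mode is passed in the body exactly as in the curl example. The function only builds the request, so you can inspect it before sending.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.deepseek.com/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "deepseek-v4-pro",
                       thinking_mode: str = "thinking",
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    key = api_key if api_key is not None else os.environ["DEEPSEEK_API_KEY"]
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Mirrors the curl example; not a standard OpenAI field.
        "thinking_mode": thinking_mode,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it (requires network access and a valid key):
# with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```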

Python Client Example

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a concise senior engineer."},
        {"role": "user", "content": "Explain the CSA+HCA hybrid attention stack."},
    ],
    # thinking_mode is not a standard OpenAI parameter, so the SDK sends it via extra_body
    extra_body={"thinking_mode": "thinking_max"},
    temperature=1.0,
    top_p=1.0,
)

print(response.choices[0].message.content)

Node.js Client Example

// Save as .mjs (or set "type": "module" in package.json) so top-level await works.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a fizzbuzz in Rust." }],
  temperature: 1.0,
  top_p: 1.0,
});

console.log(response.choices[0].message.content);

For the full parameter and response reference, see the DeepSeek V4 API guide.

Path 3: Fast Iteration with Apidog

curl is fine for one-off tests, but repeated iteration is more efficient with an API client such as Apidog.

Download Apidog for Mac, Windows, or Linux.
Create a new API project and add a POST request to https://api.deepseek.com/v1/chat/completions.
Header: Authorization: Bearer {{DEEPSEEK_API_KEY}}. Keep the API key in an environment variable, not in the request body.
Paste the JSON payload, save it, and replay it with one click whenever you want to test a change.
Compare Non-Think and Think Max output in the built-in response viewer.

A single Apidog collection can hold GPT-5.5, Claude, and DeepSeek V4 requests side by side, which makes A/B testing and cost monitoring transparent.
Already using Apidog for other AI APIs? Switch the base URL to the DeepSeek endpoint and the collection can be reused as-is. See the GPT-5.5 API guide for a parallel reference.

Path 4: Self-Host V4-Flash

For compliance, air-gapped environments, or better unit economics, take advantage of the MIT license and run the model yourself.

Minimum Hardware

V4-Flash: 2–4 H100/H200/MI300X GPUs (FP8), or a single 80GB GPU (INT4, small batch sizes)
V4-Pro: 16–32 H100s (production inference cluster)
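A rough back-of-envelope way to reason about GPU counts, as a sketch only: weight size is approximated as parameters × bytes per parameter, and a fixed fraction of each GPU (the hypothetical headroom value) is reserved for KV cache and activations. Real requirements also depend on context length, batch size, and the serving engine, so treat the result as a floor. Both function names are hypothetical.

```python
import math


def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight size in GB: billions of params x bytes per param (FP8 = 8 bits, INT4 = 4)."""
    return params_billion * bits_per_param / 8


def min_gpus_for_weights(weights_size_gb: float, gpu_mem_gb: float,
                         headroom: float = 0.2) -> int:
    """Lower bound on GPUs needed to hold the weights, reserving `headroom`
    (a fraction of each GPU's memory) for KV cache and activations."""
    if weights_size_gb <= 0 or gpu_mem_gb <= 0:
        raise ValueError("sizes must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_gpu = gpu_mem_gb * (1 - headroom)
    return math.ceil(weights_size_gb / usable_per_gpu)
```

For example, a hypothetical 100-billion-parameter model in FP8 needs about 100 GB for weights, which on 80GB GPUs with 20% headroom works out to at least two GPUs.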

Download the Model Weights

pip install -U "huggingface_hub[cli]"
huggingface-cli login  # optional, helps with rate limit

huggingface-cli download deepseek-ai/DeepSeek-V4-Flash \
  --local-dir ./models/deepseek-v4-flash

Approximate download size: V4-Flash ~500GB (FP8); V4-Pro runs to multiple terabytes.

Run Local Inference

pip install "vllm>=0.9.0"

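# Serve the weights downloaded above instead of fetching them again from the Hub
vllm serve ./models/deepseek-v4-flash \
  --served-model-name deepseek-ai/DeepSeek-V4-Flash \
  --tensor-parallel-size 4 \
  --max-model-len 1048576 \
  --dtype auto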

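Once the server is running, point your OpenAI client at http://localhost:8000/v1. The Apidog collection stays the same apart from the base URL and the model name, which must match the name the server lists at /v1/models.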
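The same switch can be made in code. A minimal sketch, assuming a hypothetical ENDPOINTS registry in which only the base URL, model name, and key source differ between the hosted API and a local vLLM server. The local model name must match what the server reports at /v1/models; "EMPTY" is a placeholder key, since the OpenAI SDK requires a non-empty value even when the local server does not check it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Endpoint:
    base_url: str
    model: str
    api_key_env: Optional[str]  # None for a local server started without an API key


# Hypothetical registry: the request code stays identical across targets.
ENDPOINTS: Dict[str, Endpoint] = {
    "deepseek": Endpoint("https://api.deepseek.com/v1", "deepseek-v4-flash", "DEEPSEEK_API_KEY"),
    # Must match the model name listed at http://localhost:8000/v1/models
    "local-vllm": Endpoint("http://localhost:8000/v1", "deepseek-ai/DeepSeek-V4-Flash", None),
}


def client_kwargs(target: str, env: Dict[str, str]) -> Dict[str, str]:
    """Keyword arguments for openai.OpenAI(...) for the chosen target."""
    ep = ENDPOINTS[target]
    key = env[ep.api_key_env] if ep.api_key_env else "EMPTY"
    return {"base_url": ep.base_url, "api_key": key}
```

Usage: client = OpenAI(**client_kwargs("local-vllm", dict(os.environ))), then pass model=ENDPOINTS["local-vllm"].model to client.chat.completions.create.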

DeepSeek V4 Prompting Tips

Set the reasoning mode explicitly. Specify thinking_mode on every request.
Use the system prompt for persona only. Put task instructions in the user message, not in the system message.
For coding tasks, provide a test harness. Include test cases; the model is more reliable at producing code that passes them.
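The three tips above can be folded into one request builder. This is a minimal sketch: build_messages and build_request are hypothetical helpers, and the thinking_mode values follow the ones used earlier in this article.

```python
from typing import Dict, List


def build_messages(persona: str, task: str, tests: str = "") -> List[Dict[str, str]]:
    """Persona goes in the system message; the task and optional tests go in the user message."""
    user_content = task
    if tests:
        user_content += "\n\nThe code must pass these tests:\n" + tests
    return [
        {"role": "system", "content": persona},
        {"role": "user", "content": user_content},
    ]


def build_request(persona: str, task: str, tests: str = "",
                  model: str = "deepseek-v4-pro",
                  thinking_mode: str = "thinking") -> Dict[str, object]:
    """Request body that always sets thinking_mode explicitly."""
    return {
        "model": model,
        "messages": build_messages(persona, task, tests),
        "thinking_mode": thinking_mode,
    }
```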

For long context (hundreds of thousands of tokens), place the most relevant data near the beginning and end of the prompt window. V4's hybrid attention is efficient, but recency and primacy bias still apply.
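One way to apply this is a "sandwich" layout: the question and key facts at the top, bulk documents in the middle, and the question repeated at the end. A minimal sketch; sandwich_prompt and the document tag format are hypothetical choices, not a DeepSeek requirement.

```python
from typing import List


def sandwich_prompt(question: str, documents: List[str], key_facts: str = "") -> str:
    """Question and key facts first, bulk documents in the middle, question again last."""
    head = f"Question: {question}"
    if key_facts:
        head += f"\nKey facts:\n{key_facts}"
    parts = [head]
    for i, doc in enumerate(documents, start=1):
        parts.append(f'<document index="{i}">\n{doc}\n</document>')
    parts.append(f"Using the documents above, answer: {question}")
    return "\n\n".join(parts)
```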

Cost Control

Default to V4-Flash. Upgrade to V4-Pro only when the quality gap is significant.
Default to Non-Think. Move up to Think High when you need more reasoning, and reserve Think Max for critical work.
Set max_tokens. Capping output avoids wasting tokens and context; most answers fit within 2,000 tokens.
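The cheapest-first policy above can be encoded as a small selector. A minimal sketch: choose_tier is hypothetical, and the mapping of "thinking" to Think High and "thinking_max" to Think Max is an assumption based on the values used earlier in this article.

```python
from typing import Dict


def choose_tier(needs_reasoning: bool = False, critical: bool = False,
                quality_gap: bool = False, max_tokens: int = 2000) -> Dict[str, object]:
    """Default to V4-Flash + Non-Think with capped output; escalate only on demand."""
    model = "deepseek-v4-pro" if quality_gap else "deepseek-v4-flash"
    if critical:
        mode = "thinking_max"  # Think Max (assumed value)
    elif needs_reasoning:
        mode = "thinking"      # Think High (assumed mapping)
    else:
        mode = "non-thinking"  # Non-Think
    return {"model": model, "thinking_mode": mode, "max_tokens": max_tokens}
```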

In Apidog, define DEEPSEEK_API_KEY per environment to keep test and production keys separate. Apidog tracks token counts per response, so overly long prompts are quick to spot.
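Outside Apidog, the same check can run in your own code by reading the OpenAI-style usage object on each response. A minimal sketch; usage_report and the 50,000-token prompt budget are hypothetical.

```python
from typing import Any, Dict


def usage_report(response: Dict[str, Any], prompt_budget: int = 50_000) -> Dict[str, Any]:
    """Summarize the usage fields of a chat completion and flag oversized prompts."""
    usage = response.get("usage") or {}
    prompt = int(usage.get("prompt_tokens", 0))
    completion = int(usage.get("completion_tokens", 0))
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": int(usage.get("total_tokens", prompt + completion)),
        "prompt_over_budget": prompt > prompt_budget,
    }
```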

Migrating from DeepSeek V3, GPT, or Claude

From deepseek-chat/deepseek-reasoner: change the model ID to deepseek-v4-pro or deepseek-v4-flash. Deadline: July 24, 2026.
From OpenAI GPT-5.x: change the base URL to https://api.deepseek.com/v1 and update the model ID. The request format is compatible. See the reference for details.
From Anthropic Claude: use the https://api.deepseek.com/anthropic endpoint (Anthropic message format), or convert your requests to the OpenAI format.

FAQ

Do I need a paid account to use V4? Web chat is free. The API requires a minimum top-up of $2. See how to use DeepSeek V4 for free for free-tier options.

What is the best default model? Start with V4-Flash in Non-Think mode, and step up only when needed.

Can I run V4 on a MacBook? V4-Flash runs on an M3 Max/M4 Max (128GB unified memory, INT4), but slowly. V4-Pro is not supported. For light experiments, use the API or web chat.

Are tools and function calling supported? Yes. The OpenAI-compatible endpoint supports the tools array, and tool_calls in the response look the same as OpenAI's. An Anthropic-format endpoint is also available.
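Because tool_calls follow the OpenAI shape, the dispatch loop is the same as for OpenAI. A minimal sketch: get_weather is a hypothetical tool, and run_tool_calls turns each tool_call in an assistant message into the role="tool" message you send back on the next request.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical tool definition in the OpenAI "tools" format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def run_tool_calls(message: Dict[str, Any],
                   registry: Dict[str, Callable[..., Any]]) -> List[Dict[str, str]]:
    """Execute each tool_call in an assistant message and build the tool replies."""
    results = []
    for call in message.get("tool_calls") or []:
        fn = registry[call["function"]["name"]]
        # Arguments arrive as a JSON string, as in the OpenAI format.
        args = json.loads(call["function"]["arguments"] or "{}")
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results
```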

How do I stream responses? Set stream: true in the JSON body. The response is an SSE stream compatible with OpenAI streaming clients.
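If you consume the raw SSE stream without an SDK, each event is a "data:" line carrying a JSON chunk, and the stream ends with "data: [DONE]". A minimal sketch, assuming the OpenAI chunk shape (choices[].delta.content); collect_stream_text is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style SSE 'data:' lines until [DONE]."""
    pieces = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and ':' comment/keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                pieces.append(content)
    return "".join(pieces)
```

With the OpenAI SDK instead, pass stream=True and read chunk.choices[0].delta.content from each chunk.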

Are there rate limits? The hosted API has per-tier rate limits; check api-docs.deepseek.com. When self-hosting, the limit is your hardware.
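When a hosted rate limit is hit, retrying with exponential backoff is a common pattern. A minimal sketch, assuming the server signals rate limiting with HTTP 429 and that your request function raises the hypothetical RateLimited exception in that case. With the OpenAI SDK you could catch openai.RateLimitError instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raise this from your request function when the server answers HTTP 429."""


def with_backoff(call: Callable[[], T], max_retries: int = 5, base_delay: float = 1.0,
                 max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry call() on RateLimited with capped exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")  # the loop always returns or raises
```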
