Moonshot AI’s Kimi K2.6 announcement positions it as the new open-source state of the art for coding, long-horizon execution, and agent swarms. The API is OpenAI-compatible, hosted at https://api.moonshot.ai/v1, and documented at platform.moonshot.ai. If you have the OpenAI SDK installed, you can be sending real requests in about five minutes.
This guide covers authentication, your first request, streaming, tool calling, vision and video input, thinking mode, using Agent Swarm with 300 sub-agents, and how to test every endpoint with Apidog before writing integration code.
💡 Fast path: Test the Kimi K2.6 API visually in Apidog before committing any integration code. One import, one Bearer token, and you’re making real streamed requests with full history and schema validation. Download Apidog free.
TL;DR: Kimi K2.6 API in 60 seconds
- Base URL: https://api.moonshot.ai/v1
- Endpoint: POST /chat/completions
- Model IDs: kimi-k2.6, kimi-k2.6-thinking
- Auth: Authorization: Bearer $KIMI_API_KEY
- Format: OpenAI chat completions schema (messages, tools, stream, etc.)
- Context: 262,144 input tokens, up to 98,304 output tokens for reasoning
- Defaults: temperature: 1.0, top_p: 1.0
Minimal curl example:
curl https://api.moonshot.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $KIMI_API_KEY" \
-d '{
"model": "kimi-k2.6",
"messages": [{"role": "user", "content": "Write a Python function that reverses a string."}]
}'
The rest of this guide details advanced usage, including Agent Swarm and the 4,000-step execution cap.
What you can actually do with this API
- Run coding agents on tasks for 12+ hours (see Qwen3.5-0.8B Mac inference demo: 4,000+ tool calls, 15→193 tokens/sec throughput).
- Manage infrastructure autonomously over multi-day sessions with incident response.
- Achieve long-horizon reliability across Rust, Go, Python, and Zig.
- Orchestrate agent swarms up to 300 sub-agents and 4,000+ coordinated steps.
- Generate full-stack apps (auth, database, transactions) from a single prompt.
- Build vision + Python pipelines (e.g., MathVision with Python: 93.2%).
If you’re building tools like Claude Code, Cursor Composer 2, or similar, the K2.6 API is a direct model-layer swap.
Step 1: Get an API key
- Go to platform.moonshot.ai or platform.kimi.ai and sign up (email or Google OAuth).
- Verify your account (international users may need SMS).
- Add billing (new accounts usually get a small free balance).
- Go to API Keys, click Create Key, and copy it immediately (it’s shown once).
- Export your key:
export KIMI_API_KEY="sk-..."
Add it to .zshrc, .bashrc, or a secret manager. Never commit it.
For cost-free development options, see How to Use Kimi K2.6 for Free.
Step 2: Pick your SDK
The API is OpenAI-compatible. Use official OpenAI SDKs—just change the base URL.
| Option | Install | Best for |
|---|---|---|
| curl | built-in | Quick tests, CI |
| OpenAI Python | pip install openai | Python services |
| OpenAI Node | npm install openai | JS/TS apps |
Python example:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("KIMI_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
Node.js example:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.KIMI_API_KEY,
  baseURL: "https://api.moonshot.ai/v1",
});

const response = await client.chat.completions.create({
  model: "kimi-k2.6",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});
console.log(response.choices[0].message.content);
curl example:
curl https://api.moonshot.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $KIMI_API_KEY" \
-d '{
"model": "kimi-k2.6",
"messages": [{"role": "user", "content": "What is the capital of France?"}]
}'
All three return the same response format.
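For reference, the shared response shape looks like this (abridged; the id and usage values are illustrative, following the OpenAI chat-completion schema):
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "kimi-k2.6",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The capital of France is Paris." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 15, "completion_tokens": 9, "total_tokens": 24 }
}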
Step 3: Understand the request body
The schema matches OpenAI chat completions:
{
  "model": "kimi-k2.6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Your prompt here." }
  ],
  "temperature": 1.0,
  "top_p": 1.0,
  "max_tokens": 8192,
  "stream": false,
  "tools": [],
  "tool_choice": "auto",
  "thinking": { "type": "disabled" }
}
Moonshot-specific notes:
- Defaults are high (temperature: 1.0, top_p: 1.0). Don't use OpenAI's low-temp habits for code generation here.
- thinking toggles the reasoning trace on kimi-k2.6-thinking. Use {"type": "disabled"} for fast answers.
Step 4: Streaming
For UI or long generations, always use streaming. Max output can reach 98,304 tokens.
Python streaming:
stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Write a 500-word essay on MoE models."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
Node.js streaming:
const stream = await client.chat.completions.create({
  model: "kimi-k2.6",
  messages: [{ role: "user", content: "Write a 500-word essay on MoE models." }],
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
Streaming works with tool calls; arguments arrive as JSON deltas.
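A minimal sketch of reassembling those deltas, assuming the OpenAI SDK's streaming format (tool calls arrive indexed, with the name once and the arguments in fragments):
tool_calls = {}
for chunk in stream:
    delta = chunk.choices[0].delta
    if not delta.tool_calls:
        continue
    for tc in delta.tool_calls:
        # Each delta carries an index into the tool-call list.
        entry = tool_calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function and tc.function.name:
            entry["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments  # JSON arrives in pieces
# Parse entry["arguments"] with json.loads once the stream finishes.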
Step 5: Tool calling
Kimi K2.6 uses the OpenAI function-calling format. In partner testing it scored 50.0% on Toolathlon with 96.60% tool-invocation success.
Define tools:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]
First call (model decides):
import json

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

resp = client.chat.completions.create(
    model="kimi-k2.6",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

msg = resp.choices[0].message
messages.append(msg)

if msg.tool_calls:
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        # fetch_weather is your own implementation; return something JSON-serializable.
        result = fetch_weather(args["location"], args.get("unit", "celsius"))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
Second call (final answer):
final = client.chat.completions.create(
    model="kimi-k2.6",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
K2.6 supports multi-step tool chains, enabling long-running agents. For other frameworks, see Claude Code workflows.
Step 6: Vision input
K2.6 supports images in user messages via OpenAI’s image_url format.
Example:
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ],
)
Local files (base64):
import base64

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")
image_url = f"data:image/jpeg;base64,{b64}"
# Use it exactly like a remote URL:
# {"type": "image_url", "image_url": {"url": image_url}}
For OCR/diagrams, combine text instructions with the image. For math, use a Python interpreter tool.
Step 7: Video input
Pass a video URL or frame sequence:
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what happens in this video."},
                {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}}
            ]
        }
    ],
)
Short clips (<30s) work in a single call. For longer video, use streaming.
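If you extract frames yourself, one option is to send them as an image sequence. This is a sketch using the standard image_url parts; the frame URLs are hypothetical:
# Hypothetical pre-extracted frame URLs.
frames = [f"https://example.com/frames/{i:03d}.jpg" for i in range(8)]

content = [{"type": "text", "text": "Summarize what happens across these frames."}]
content += [{"type": "image_url", "image_url": {"url": u}} for u in frames]

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": content}],
)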
Step 8: Thinking mode
kimi-k2.6-thinking produces a visible reasoning trace.
Thinking on (default for thinking model):
response = client.chat.completions.create(
    model="kimi-k2.6-thinking",
    messages=[{"role": "user", "content": "Prove sqrt(2) is irrational."}],
)
Thinking off:
response = client.chat.completions.create(
    model="kimi-k2.6-thinking",
    messages=[{"role": "user", "content": "Quick: what's 17 * 23?"}],
    extra_body={"thinking": {"type": "disabled"}},
)
The reasoning trace is returned in a reasoning field; hide it from end users as needed.
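A hedged way to split the trace from the answer (the exact attribute name can vary by SDK version, so probe defensively):
msg = response.choices[0].message
trace = getattr(msg, "reasoning", None)  # reasoning field per above; verify the name in your SDK
answer = msg.content  # what you actually show to end users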
Step 9: Agent Swarm
Agent Swarm supports up to 300 sub-agents and 4,000+ steps.
Invoke via agent parameter:
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{
        "role": "user",
        "content": "Build a 5-page marketing site for a coffee brand with responsive design and a newsletter signup."
    }],
    extra_body={
        "agent": {
            "type": "swarm",
            "max_agents": 30,
            "max_steps": 4000
        }
    },
)
Production tips:
- Use streaming to monitor and kill long runs as needed (see the watchdog sketch below).
- Cap max_agents (10–30 is safer than 300 for typical tasks).
- Log usage and set budgets; long swarm runs consume tokens quickly.
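A minimal watchdog sketch for the first tip, assuming a swarm run streams like any other completion. The agent block is the one from the example above; the 30-minute cap is an arbitrary choice:
import time

DEADLINE_S = 1800  # abort runs longer than 30 minutes (tune for your task)
start = time.monotonic()

stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Build a 5-page marketing site for a coffee brand."}],
    stream=True,
    extra_body={"agent": {"type": "swarm", "max_agents": 30, "max_steps": 4000}},
)
for chunk in stream:
    if time.monotonic() - start > DEADLINE_S:
        stream.close()  # closes the HTTP stream, ending the run client-side
        break
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)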
Step 10: Test everything with Apidog
Every endpoint and body format can be tested visually before coding.
Quick Apidog setup
- Download Apidog and create a new project.
- Create a kimi-prod environment:
BASE_URL = https://api.moonshot.ai/v1
KIMI_API_KEY = sk-...
- Add an API request: POST {{BASE_URL}}/chat/completions
- Set headers:
Authorization: Bearer {{KIMI_API_KEY}}
Content-Type: application/json
- Example body (streaming):
{ "model": "kimi-k2.6", "messages": [{ "role": "user", "content": "Hello, Kimi K2.6!" }], "stream": true }
- Click Send. Tokens stream in real time.
What Apidog adds
- Schema validation against the OpenAI chat spec (see missing fields instantly)
- Request history and replay
- Environment switching (dev, staging, prod)
- Team sharing via project export
- Mock servers for offline/incident testing
- SSE stream support for Kimi format
- VS Code extension available
If you’re moving from Postman, see API testing without Postman.
Error handling that won’t fight you
Moonshot uses standard HTTP status codes:
- 400: Malformed body or wrong model name
- 401: Auth failure
- 429: Rate limit/quota exhausted
- 500: Server error (retry with backoff)
- 529: Overloaded (retry after delay)
Retry wrapper (Python):
import os
import time
from openai import OpenAI, RateLimitError, APIStatusError

client = OpenAI(
    api_key=os.getenv("KIMI_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

def call_kimi(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="kimi-k2.6",
                messages=messages,
            )
        except RateLimitError:
            time.sleep(2 ** attempt)  # exponential backoff on 429s
        except APIStatusError as e:
            # APIStatusError carries status_code; retry 5xx, re-raise the rest.
            if e.status_code >= 500 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)
            else:
                raise
    raise RuntimeError("Kimi K2.6 failed after retries")
For mid-stream disconnects, track tokens and restart with "continue from here" if needed. Long streams (up to 98,304 tokens) are normal.
Cost control
Pricing: kimi.com/membership/pricing
Tips:
- Cap max_tokens (2,048 is plenty for chat).
- Cache system prompts to benefit from prompt caching.
- Log prompt_tokens, completion_tokens, and total_tokens; pipe them to metrics and alerting (sketch below).
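For the logging tip, the counts come back on every non-streaming response; a minimal sketch:
resp = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Hello"}],
)
u = resp.usage
# Ship these to your metrics pipeline instead of printing.
print(f"prompt={u.prompt_tokens} completion={u.completion_tokens} total={u.total_tokens}")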
Production pattern: a GitHub-issue fixer
Agent structure for reading a GitHub issue, locating code, proposing a fix, and running tests:
from openai import OpenAI
import os, json

client = OpenAI(
    api_key=os.getenv("KIMI_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

tools = [
    {"type": "function", "function": {
        "name": "read_file",
        "description": "Read a file in the repo.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"]
        }
    }},
    {"type": "function", "function": {
        "name": "search_code",
        "description": "Ripgrep the codebase for a pattern.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }},
    {"type": "function", "function": {
        "name": "run_tests",
        "description": "Run the project test suite.",
        "parameters": {"type": "object", "properties": {}}
    }},
]

def tool_dispatch(name, args):
    # run_ripgrep and run_pytest are your own shell wrappers; implement them
    # to return their output as a string.
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    if name == "search_code":
        return run_ripgrep(args["query"])
    if name == "run_tests":
        return run_pytest()
    raise ValueError(f"Unknown tool: {name}")

messages = [
    {"role": "system", "content": "You are a senior engineer. Fix the described bug."},
    {"role": "user", "content": "Issue: login form submits twice on slow networks."}
]

while True:
    resp = client.chat.completions.create(
        model="kimi-k2.6",
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)
        break
    for call in msg.tool_calls:
        result = tool_dispatch(call.function.name, json.loads(call.function.arguments))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })
Scale to Agent Swarm by adding the extra_body swarm config. It can be combined with human-in-the-loop stacks.
FAQ
Do I need a Moonshot-specific SDK?
No. Use OpenAI Python/Node SDKs after changing base_url.
Is the API rate-limited?
Yes. Limits depend on your tier and usage; check the dashboard.
Does Kimi K2.6 work with LangChain, LlamaIndex, Vercel AI SDK?
Yes, if the framework accepts an OpenAI-compatible base URL.
Does Kimi K2.6 support JSON mode?
Yes. Use response_format: {"type": "json_object"} or a strict schema.
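Minimal example, assuming the OpenAI-style response_format parameter named above:
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Return a JSON object with keys city and country for Tokyo."}],
    response_format={"type": "json_object"},
)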
Context window size?
262,144 input tokens, 98,304 output tokens for reasoning.
Fine-tuning via API?
Not yet. Fine-tune by running open weights on your hardware.
kimi-k2.6 vs kimi-k2.6-thinking?
kimi-k2.6: fast agent. kimi-k2.6-thinking: exposes reasoning steps; tuned for math/logic/planning.
Is there a free tier?
See our Kimi K2.6 free access guide.
Summary
The Kimi K2.6 API drops into any OpenAI-compatible toolchain: just change the base URL and API key. You get a 262K context window, Agent Swarm, 96.60% tool-invocation success, and open-source weights if you want to self-host.
For new integrations, use Apidog to visually construct and validate endpoints. This catches schema errors, streaming bugs, and auth issues before they hit your codebase. Then port working requests into your Python/Node services.
References and further reading
- Official announcement: Kimi K2.6 — Moonshot AI blog
- API quickstart: platform.kimi.ai
- API platform: platform.moonshot.ai
- Kimi Code terminal agent: kimi.com/code
- Pricing: kimi.com/membership/pricing
- Open weights: huggingface.co/moonshotai/Kimi-K2.6
- Related Apidog guides: What is Kimi K2.6, Kimi K2.6 for free, Qwen 3.6 free on OpenRouter, Qwen3.5-Omni API, Apidog inside VS Code, API testing without Postman, API testing for 50+ engineers, Claude Code workflows, Cursor Composer 2.