Moonshot AI’s Kimi K2.6 announcement positions it as the new open-source state of the art for coding, long-horizon execution, and agent swarms. The API is OpenAI-compatible, hosted at https://api.moonshot.ai/v1, and documented at platform.moonshot.ai. If you have the OpenAI SDK installed, you can be sending real requests in about five minutes.
This guide covers authentication, your first request, streaming, tool calling, vision and video input, thinking mode, using Agent Swarm with 300 sub-agents, and how to test every endpoint with Apidog before writing integration code.
💡 Fast path: Test the Kimi K2.6 API visually in Apidog before committing any integration code. One import, one Bearer token, and you’re making real streamed requests with full history and schema validation. Download Apidog free.
TL;DR: Kimi K2.6 API in 60 seconds
- Base URL: https://api.moonshot.ai/v1
- Endpoint: POST /chat/completions
- Model IDs: kimi-k2.6, kimi-k2.6-thinking
- Auth: Authorization: Bearer $KIMI_API_KEY
- Format: OpenAI chat completions schema (messages, tools, stream, etc.)
- Context: 262,144 input tokens, up to 98,304 output tokens for reasoning
- Defaults: temperature: 1.0, top_p: 1.0
Minimal curl example:
curl https://api.moonshot.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $KIMI_API_KEY" \
-d '{
"model": "kimi-k2.6",
"messages": [{"role": "user", "content": "Write a Python function that reverses a string."}]
}'
The rest of this guide details advanced usage, including Agent Swarm and the 4,000-step execution cap.
What you can actually do with this API
- Run coding agents on tasks for 12+ hours (see Qwen3.5-0.8B Mac inference demo: 4,000+ tool calls, 15→193 tokens/sec throughput).
- Manage infrastructure autonomously over multi-day sessions with incident response.
- Achieve long-horizon reliability across Rust, Go, Python, and Zig.
- Orchestrate agent swarms up to 300 sub-agents and 4,000+ coordinated steps.
- Generate full-stack apps (auth, database, transactions) from a single prompt.
- Build vision + Python pipelines (e.g., MathVision with Python: 93.2%).
If you’re building tools like Claude Code, Cursor Composer 2, or similar, the K2.6 API is a direct model-layer swap.
Step 1: Get an API key
- Go to platform.moonshot.ai or platform.kimi.ai and sign up (email or Google OAuth).
- Verify your account (international users may need SMS).
- Add billing (new accounts usually get a small free balance).
- Go to API Keys, click Create Key, and copy it immediately (it’s shown once).
- Export your key:
export KIMI_API_KEY="sk-..."
Add it to .zshrc, .bashrc, or a secret manager. Never commit it.
For cost-free development options, see How to Use Kimi K2.6 for Free.
Step 2: Pick your SDK
The API is OpenAI-compatible. Use official OpenAI SDKs—just change the base URL.
| Option | Install | Best for |
|---|---|---|
| curl | built-in | Quick tests, CI |
| OpenAI Python | pip install openai | Python services |
| OpenAI Node | npm install openai | JS/TS apps |
Python example:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("KIMI_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
Node.js example:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.KIMI_API_KEY,
  baseURL: "https://api.moonshot.ai/v1",
});

const response = await client.chat.completions.create({
  model: "kimi-k2.6",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});
console.log(response.choices[0].message.content);
curl example:
curl https://api.moonshot.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $KIMI_API_KEY" \
-d '{
"model": "kimi-k2.6",
"messages": [{"role": "user", "content": "What is the capital of France?"}]
}'
All three return the same response format.
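For reference, the shared response shape looks like this (abridged; the id and usage values are illustrative, following the OpenAI chat-completion schema):
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "kimi-k2.6",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The capital of France is Paris." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 15, "completion_tokens": 9, "total_tokens": 24 }
}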
Step 3: Understand the request body
The schema matches OpenAI chat completions:
{
  "model": "kimi-k2.6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Your prompt here." }
  ],
  "temperature": 1.0,
  "top_p": 1.0,
  "max_tokens": 8192,
  "stream": false,
  "tools": [],
  "tool_choice": "auto",
  "thinking": { "type": "disabled" }
}
Moonshot-specific notes:
- Defaults are high (temperature: 1.0, top_p: 1.0). Don't use OpenAI's low-temp habits for code generation here.
- thinking toggles the reasoning trace on kimi-k2.6-thinking. Use {"type": "disabled"} for fast answers.
Step 4: Streaming
For UI or long generations, always use streaming. Max output can reach 98,304 tokens.
Python streaming:
stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Write a 500-word essay on MoE models."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
Node.js streaming:
const stream = await client.chat.completions.create({
  model: "kimi-k2.6",
  messages: [{ role: "user", content: "Write a 500-word essay on MoE models." }],
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
Streaming works with tool calls; arguments arrive as JSON deltas.
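A minimal sketch of reassembling those deltas, assuming the OpenAI SDK's streaming format (tool calls arrive indexed, with the name once and the arguments in fragments):
tool_calls = {}
for chunk in stream:
    delta = chunk.choices[0].delta
    if not delta.tool_calls:
        continue
    for tc in delta.tool_calls:
        # Each delta carries an index into the tool-call list.
        entry = tool_calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function and tc.function.name:
            entry["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments  # JSON arrives in pieces
# Parse entry["arguments"] with json.loads once the stream finishes.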
Step 5: Tool calling
Kimi K2.6 uses the OpenAI function-calling format. In partner testing it scored 50.0% on Toolathlon with 96.60% tool-invocation success.
Define tools:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]
First call (model decides):
import json

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

resp = client.chat.completions.create(
    model="kimi-k2.6",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

msg = resp.choices[0].message
messages.append(msg)

if msg.tool_calls:
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        # fetch_weather is your own implementation; return something JSON-serializable.
        result = fetch_weather(args["location"], args.get("unit", "celsius"))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
Second call (final answer):
final = client.chat.completions.create(
    model="kimi-k2.6",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
K2.6 supports multi-step tool chains, enabling long-running agents. For other frameworks, see Claude Code workflows.
Step 6: Vision input
K2.6 supports images in user messages via OpenAI’s image_url format.
Example:
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ],
)
Local files (base64):
import base64

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")
image_url = f"data:image/jpeg;base64,{b64}"
# Use it exactly like a remote URL:
# {"type": "image_url", "image_url": {"url": image_url}}
For OCR/diagrams, combine text instructions with the image. For math, use a Python interpreter tool.
Step 7: Video input
Pass a video URL or frame sequence:
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what happens in this video."},
                {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}}
            ]
        }
    ],
)
Short clips (<30s) work in a single call. For longer video, use streaming.
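If you extract frames yourself, one option is to send them as an image sequence. This is a sketch using the standard image_url parts; the frame URLs are hypothetical:
# Hypothetical pre-extracted frame URLs.
frames = [f"https://example.com/frames/{i:03d}.jpg" for i in range(8)]

content = [{"type": "text", "text": "Summarize what happens across these frames."}]
content += [{"type": "image_url", "image_url": {"url": u}} for u in frames]

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": content}],
)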
Step 8: Thinking mode
kimi-k2.6-thinking produces a visible reasoning trace.
Thinking on (default for thinking model):
response = client.chat.completions.create(
    model="kimi-k2.6-thinking",
    messages=[{"role": "user", "content": "Prove sqrt(2) is irrational."}],
)
Thinking off:
response = client.chat.completions.create(
    model="kimi-k2.6-thinking",
    messages=[{"role": "user", "content": "Quick: what's 17 * 23?"}],
    extra_body={"thinking": {"type": "disabled"}},
)
The reasoning trace is returned in a reasoning field; hide it from end users as needed.
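A hedged way to split the trace from the answer (the exact attribute name can vary by SDK version, so probe defensively):
msg = response.choices[0].message
trace = getattr(msg, "reasoning", None)  # reasoning field per above; verify the name in your SDK
answer = msg.content  # what you actually show to end users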
Step 9: Agent Swarm
Agent Swarm supports up to 300 sub-agents and 4,000+ steps.
Invoke via agent parameter:
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{
        "role": "user",
        "content": "Build a 5-page marketing site for a coffee brand with responsive design and a newsletter signup."
    }],
    extra_body={
        "agent": {
            "type": "swarm",
            "max_agents": 30,
            "max_steps": 4000
        }
    },
)
Production tips:
- Use streaming to monitor and kill long runs as needed (see the watchdog sketch below).
- Cap max_agents (10–30 is safer than 300 for typical tasks).
- Log usage and set budgets; long swarm runs consume tokens quickly.
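A minimal watchdog sketch for the first tip, assuming a swarm run streams like any other completion. The agent block is the one from the example above; the 30-minute cap is an arbitrary choice:
import time

DEADLINE_S = 1800  # abort runs longer than 30 minutes (tune for your task)
start = time.monotonic()

stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Build a 5-page marketing site for a coffee brand."}],
    stream=True,
    extra_body={"agent": {"type": "swarm", "max_agents": 30, "max_steps": 4000}},
)
for chunk in stream:
    if time.monotonic() - start > DEADLINE_S:
        stream.close()  # closes the HTTP stream, ending the run client-side
        break
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)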
Step 10: Test everything with Apidog
Every endpoint and body format can be tested visually before coding.
Quick Apidog setup
- Download Apidog and create a new project.
- Create a kimi-prod environment:
BASE_URL = https://api.moonshot.ai/v1
KIMI_API_KEY = sk-...
- Add an API request: POST {{BASE_URL}}/chat/completions
- Set headers:
Authorization: Bearer {{KIMI_API_KEY}}
Content-Type: application/json
- Example body (streaming):
{ "model": "kimi-k2.6", "messages": [{ "role": "user", "content": "Hello, Kimi K2.6!" }], "stream": true }
- Click Send. Tokens stream in real time.
What Apidog adds
- Schema validation against the OpenAI chat spec (see missing fields instantly)
- Request history and replay
- Environment switching (dev, staging, prod)
- Team sharing via project export
- Mock servers for offline/incident testing
- SSE stream support for Kimi format
- VS Code extension available
If you’re moving from Postman, see API testing without Postman.
Error handling that won’t fight you
Moonshot uses standard HTTP status codes:
- 400: Malformed body or wrong model name
- 401: Auth failure
- 429: Rate limit/quota exhausted
- 500: Server error (retry with backoff)
- 529: Overloaded (retry after delay)
Retry wrapper (Python):
import os
import time
from openai import OpenAI, RateLimitError, APIStatusError

client = OpenAI(
    api_key=os.getenv("KIMI_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

def call_kimi(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="kimi-k2.6",
                messages=messages,
            )
        except RateLimitError:
            time.sleep(2 ** attempt)  # exponential backoff on 429s
        except APIStatusError as e:
            # APIStatusError carries status_code; retry 5xx, re-raise the rest.
            if e.status_code >= 500 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)
            else:
                raise
    raise RuntimeError("Kimi K2.6 failed after retries")
For mid-stream disconnects, track tokens and restart with "continue from here" if needed. Long streams (up to 98,304 tokens) are normal.
Cost control
Pricing: kimi.com/membership/pricing
Tips:
- Cap max_tokens (2,048 is plenty for chat).
- Cache system prompts to benefit from prompt caching.
- Log prompt_tokens, completion_tokens, and total_tokens; pipe them to metrics and alerting (sketch below).
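For the logging tip, the counts come back on every non-streaming response; a minimal sketch:
resp = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Hello"}],
)
u = resp.usage
# Ship these to your metrics pipeline instead of printing.
print(f"prompt={u.prompt_tokens} completion={u.completion_tokens} total={u.total_tokens}")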
Production pattern: a GitHub-issue fixer
Agent structure for reading a GitHub issue, locating code, proposing a fix, and running tests:
from openai import OpenAI
import os, json

client = OpenAI(
    api_key=os.getenv("KIMI_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

tools = [
    {"type": "function", "function": {
        "name": "read_file",
        "description": "Read a file in the repo.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"]
        }
    }},
    {"type": "function", "function": {
        "name": "search_code",
        "description": "Ripgrep the codebase for a pattern.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }},
    {"type": "function", "function": {
        "name": "run_tests",
        "description": "Run the project test suite.",
        "parameters": {"type": "object", "properties": {}}
    }},
]

def tool_dispatch(name, args):
    # run_ripgrep and run_pytest are your own shell wrappers; implement them
    # to return their output as a string.
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    if name == "search_code":
        return run_ripgrep(args["query"])
    if name == "run_tests":
        return run_pytest()
    raise ValueError(f"Unknown tool: {name}")

messages = [
    {"role": "system", "content": "You are a senior engineer. Fix the described bug."},
    {"role": "user", "content": "Issue: login form submits twice on slow networks."}
]

while True:
    resp = client.chat.completions.create(
        model="kimi-k2.6",
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)
        break
    for call in msg.tool_calls:
        result = tool_dispatch(call.function.name, json.loads(call.function.arguments))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })
Scale to Agent Swarm by adding the extra_body swarm config. It can be combined with human-in-the-loop stacks.
FAQ
Do I need a Moonshot-specific SDK?
No. Use OpenAI Python/Node SDKs after changing base_url.
Is the API rate-limited?
Yes. Limits depend on your tier and usage; check the dashboard.
Does Kimi K2.6 work with LangChain, LlamaIndex, Vercel AI SDK?
Yes, if the framework accepts an OpenAI-compatible base URL.
Does Kimi K2.6 support JSON mode?
Yes. Use response_format: {"type": "json_object"} or a strict schema.
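Minimal example, assuming the OpenAI-style response_format parameter named above:
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Return a JSON object with keys city and country for Tokyo."}],
    response_format={"type": "json_object"},
)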
Context window size?
262,144 input tokens, 98,304 output tokens for reasoning.
Fine-tuning via API?
Not yet. Fine-tune by running open weights on your hardware.
kimi-k2.6 vs kimi-k2.6-thinking?
kimi-k2.6: fast agent. kimi-k2.6-thinking: exposes reasoning steps; tuned for math/logic/planning.
Is there a free tier?
See our Kimi K2.6 free access guide.
Summary
The Kimi K2.6 API drops into any OpenAI-compatible toolchain: just change the base URL and API key. You get a 262K context window, Agent Swarm, 96.60% tool-invocation success, and open-source weights if you want to self-host.
For new integrations, use Apidog to visually construct and validate endpoints. This catches schema errors, streaming bugs, and auth issues before they hit your codebase. Then port working requests into your Python/Node services.
References and further reading
- Official announcement: Kimi K2.6 — Moonshot AI blog
- API quickstart: platform.kimi.ai
- API platform: platform.moonshot.ai
- Kimi Code terminal agent: kimi.com/code
- Pricing: kimi.com/membership/pricing
- Open weights: huggingface.co/moonshotai/Kimi-K2.6
- Related Apidog guides: What is Kimi K2.6, Kimi K2.6 for free, Qwen 3.6 free on OpenRouter, Qwen3.5-Omni API, Apidog inside VS Code, API testing without Postman, API testing for 50+ engineers, Claude Code workflows, Cursor Composer 2.