DEV Community

Preecha
How to Use the Kimi K2.6 API?

Moonshot AI’s Kimi K2.6 announcement positions it as the new open-source state of the art for coding, long-horizon execution, and agent swarms. The API is OpenAI-compatible, hosted at https://api.moonshot.ai/v1, and documented on the Moonshot developer platform. If you have the OpenAI SDK installed, you can be sending real requests in about five minutes.


This guide covers authentication, your first request, streaming, tool calling, vision and video input, thinking mode, using Agent Swarm with 300 sub-agents, and how to test every endpoint with Apidog before writing integration code.

💡 Fast path: Test the Kimi K2.6 API visually in Apidog before committing any integration code. One import, one Bearer token, and you’re making real streamed requests with full history and schema validation. Download Apidog free.

Kimi API Visual Test in Apidog

TL;DR: Kimi K2.6 API in 60 seconds

  • Base URL: https://api.moonshot.ai/v1
  • Endpoint: POST /chat/completions
  • Model IDs: kimi-k2.6, kimi-k2.6-thinking
  • Auth: Authorization: Bearer $KIMI_API_KEY
  • Format: OpenAI chat completions schema (messages, tools, stream, etc.)
  • Context: 262,144 input tokens, up to 98,304 output tokens for reasoning
  • Defaults: temperature: 1.0, top_p: 1.0

Minimal curl example:

curl https://api.moonshot.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $KIMI_API_KEY" \
  -d '{
    "model": "kimi-k2.6",
    "messages": [{"role": "user", "content": "Write a Python function that reverses a string."}]
  }'

The rest of this guide details advanced usage, including Agent Swarm and the 4,000-step execution cap.


What you can actually do with this API

  • Run coding agents on tasks for 12+ hours (see Qwen3.5-0.8B Mac inference demo: 4,000+ tool calls, 15→193 tokens/sec throughput).
  • Manage infrastructure autonomously over multi-day sessions with incident response.
  • Achieve long-horizon reliability across Rust, Go, Python, and Zig.
  • Orchestrate agent swarms up to 300 sub-agents and 4,000+ coordinated steps.
  • Generate full-stack apps (auth, database, transactions) from a single prompt.
  • Build vision + Python pipelines (e.g., MathVision with Python: 93.2%).

If you’re building tools like Claude Code, Cursor Composer 2, or similar, the K2.6 API is a direct model-layer swap.


Step 1: Get an API key

  1. Go to platform.moonshot.ai or platform.kimi.ai and sign up (email or Google OAuth).
  2. Verify your account (international users may need SMS).
  3. Add billing (new accounts usually get a small free balance).
  4. Go to API Keys, click Create Key, and copy it immediately (it’s shown once).
  5. Export your key:

    export KIMI_API_KEY="sk-..."
    

    Add to .zshrc, .bashrc, or a secret manager. Never commit it.

For cost-free development options, see How to Use Kimi K2.6 for Free.


Step 2: Pick your SDK

The API is OpenAI-compatible. Use official OpenAI SDKs—just change the base URL.

Option         Install              Best for
-------------  -------------------  ------------------
curl           built-in             Quick tests, CI
OpenAI Python  pip install openai   Python services
OpenAI Node    npm install openai   JS/TS apps

Python example:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("KIMI_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)

Node.js example:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.KIMI_API_KEY,
  baseURL: "https://api.moonshot.ai/v1",
});

const response = await client.chat.completions.create({
  model: "kimi-k2.6",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

console.log(response.choices[0].message.content);

curl example:

curl https://api.moonshot.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $KIMI_API_KEY" \
  -d '{
    "model": "kimi-k2.6",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

All three return the same response format.
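As a reference, a typical chat completion response (field values illustrative, not verbatim) has this shape:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "kimi-k2.6",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The capital of France is Paris." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21 }
}
```

Your answer text lives at choices[0].message.content, and the usage block is what you should log for cost tracking (see the Cost control section).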


Step 3: Understand the request body

The schema matches OpenAI chat completions:

{
  "model": "kimi-k2.6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Your prompt here." }
  ],
  "temperature": 1.0,
  "top_p": 1.0,
  "max_tokens": 8192,
  "stream": false,
  "tools": [],
  "tool_choice": "auto",
  "thinking": { "type": "disabled" }
}

Moonshot-specific notes:

  • Defaults are high (temperature: 1.0, top_p: 1.0). If you want OpenAI-style low-temperature behavior for code generation, set it explicitly.
  • thinking toggles the reasoning trace on kimi-k2.6-thinking. Use {"type": "disabled"} for fast answers.
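Since the defaults are high, it helps to pin sampling parameters explicitly rather than inherit them. A minimal sketch (the helper and the specific values here are illustrative choices, not official recommendations):

```python
# Sketch: assemble chat-completion kwargs, pinning sampling parameters
# explicitly instead of relying on Moonshot's high defaults
# (temperature=1.0, top_p=1.0). Values below are illustrative.

def build_request(prompt: str, for_code: bool = False) -> dict:
    """Build kwargs for client.chat.completions.create()."""
    kwargs = {
        "model": "kimi-k2.6",
        "messages": [{"role": "user", "content": prompt}],
    }
    if for_code:
        # Lower temperature for more deterministic code output.
        kwargs["temperature"] = 0.3
        kwargs["top_p"] = 0.95
    return kwargs

req = build_request("Write a binary search in Python.", for_code=True)
print(req["temperature"])  # 0.3
```

Pass the dict through with client.chat.completions.create(**req).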

Step 4: Streaming

For UI or long generations, always use streaming. Max output can reach 98,304 tokens.

Python streaming:

stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Write a 500-word essay on MoE models."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Node.js streaming:

const stream = await client.chat.completions.create({
  model: "kimi-k2.6",
  messages: [{ role: "user", content: "Write a 500-word essay on MoE models." }],
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

Streaming works with tool calls; arguments arrive as JSON deltas.
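Because arguments arrive as string fragments, you must concatenate them per tool-call index before parsing. A sketch, using simulated delta dicts that mirror the OpenAI streaming shape (the real objects are SDK types, not dicts):

```python
import json

# Sketch: merge streamed tool-call argument fragments before json.loads.
# The deltas below are simulated dicts mirroring the OpenAI delta shape.

def accumulate_tool_calls(deltas):
    """Merge streamed tool-call deltas into {index: {"name", "arguments"}}."""
    calls = {}
    for d in deltas:
        entry = calls.setdefault(d["index"], {"name": "", "arguments": ""})
        if d.get("name"):
            entry["name"] = d["name"]
        # Argument JSON arrives in pieces; concatenate in order.
        entry["arguments"] += d.get("arguments", "")
    return calls

# Simulated fragments for one get_weather call:
deltas = [
    {"index": 0, "name": "get_weather", "arguments": '{"loca'},
    {"index": 0, "arguments": 'tion": "Tokyo"}'},
]
calls = accumulate_tool_calls(deltas)
args = json.loads(calls[0]["arguments"])
print(args["location"])  # Tokyo
```

Only call json.loads once the stream finishes (or the tool call's finish reason arrives); partial argument strings are not valid JSON.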


Step 5: Tool calling

Kimi K2.6 uses the OpenAI function-calling format. Toolathlon score: 50.0%, 96.60% invocation success in partner testing.

Define tools:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

First call (model decides):

import json

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
resp = client.chat.completions.create(
    model="kimi-k2.6",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
msg = resp.choices[0].message
messages.append(msg)

if msg.tool_calls:
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = fetch_weather(args["location"], args.get("unit", "celsius"))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })

Second call (final answer):

final = client.chat.completions.create(
    model="kimi-k2.6",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)

K2.6 supports multi-step tool chains, enabling long-running agents. For other frameworks, see Claude Code workflows.


Step 6: Vision input

K2.6 supports images in user messages via OpenAI’s image_url format.

Example:

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ],
)

Local files (base64):

import base64
with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")
image_url = f"data:image/jpeg;base64,{b64}"

For OCR/diagrams, combine text instructions with the image. For math, use a Python interpreter tool.


Step 7: Video input

Pass a video URL or frame sequence:

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what happens in this video."},
                {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}}
            ]
        }
    ],
)

Short clips (<30s) work in a single call. For longer video, use streaming.


Step 8: Thinking mode

kimi-k2.6-thinking produces a visible reasoning trace.

Thinking on (default for thinking model):

response = client.chat.completions.create(
    model="kimi-k2.6-thinking",
    messages=[{"role": "user", "content": "Prove sqrt(2) is irrational."}],
)

Thinking off:

response = client.chat.completions.create(
    model="kimi-k2.6-thinking",
    messages=[{"role": "user", "content": "Quick: what's 17 * 23?"}],
    extra_body={"thinking": {"type": "disabled"}},
)

The reasoning trace is returned in a reasoning field on the message. Hide it from end users as needed.


Step 9: Agent Swarm

Agent Swarm supports up to 300 sub-agents and 4,000+ steps.

Invoke via agent parameter:

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{
        "role": "user",
        "content": "Build a 5-page marketing site for a coffee brand with responsive design and a newsletter signup."
    }],
    extra_body={
        "agent": {
            "type": "swarm",
            "max_agents": 30,
            "max_steps": 4000
        }
    },
)

Production tips:

  • Use streaming to monitor and kill long runs as needed.
  • Cap max_agents (10–30 is safer than 300 for typical tasks).
  • Log usage and set budgets; long swarm runs consume tokens quickly.

Step 10: Test everything with Apidog

Every endpoint and body format can be tested visually before coding.

Apidog Kimi API Testing

Quick Apidog setup

  1. Download Apidog and create a new project.
  2. Create a kimi-prod environment:
    • BASE_URL = https://api.moonshot.ai/v1
    • KIMI_API_KEY = sk-...
  3. Add API request: POST {{BASE_URL}}/chat/completions
  4. Set headers:
    • Authorization: Bearer {{KIMI_API_KEY}}
    • Content-Type: application/json
  5. Example body (streaming):

    {
      "model": "kimi-k2.6",
      "messages": [{ "role": "user", "content": "Hello, Kimi K2.6!" }],
      "stream": true
    }
    
  6. Click Send. Tokens stream in real time.

What Apidog adds

  • Schema validation against the OpenAI chat spec (see missing fields instantly)
  • Request history and replay
  • Environment switching (dev, staging, prod)
  • Team sharing via project export
  • Mock servers for offline/incident testing
  • SSE stream support for Kimi format
  • VS Code extension available

If you’re moving from Postman, see API testing without Postman.


Error handling that won’t fight you

Moonshot uses standard HTTP status codes:

  • 400: Malformed body or wrong model name
  • 401: Auth failure
  • 429: Rate limit/quota exhausted
  • 500: Server error (retry with backoff)
  • 529: Overloaded (retry after delay)

Retry wrapper (Python):

import time
from openai import RateLimitError, APIStatusError

def call_kimi(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="kimi-k2.6",
                messages=messages,
            )
        except RateLimitError:
            # Exponential backoff on 429s.
            time.sleep(2 ** attempt)
        except APIStatusError as e:
            # Retry 5xx responses; re-raise client errors immediately.
            if e.status_code >= 500 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)
            else:
                raise
    raise RuntimeError("Kimi K2.6 failed after retries")

For mid-stream disconnects, track tokens and restart with "continue from here" if needed. Long streams (up to 98,304 tokens) are normal.
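One way to implement "continue from here": keep the partial text you have already received, replay it as an assistant turn, and append a continuation instruction. A sketch (the continuation wording is illustrative, and you may still need to stitch the two halves together):

```python
# Sketch of a "continue from here" restart after a dropped stream.
# The partial text collected so far is replayed as an assistant turn,
# then a short continuation instruction is appended (wording illustrative).

def build_resume_messages(original_messages, partial_text):
    """Return a message list asking the model to continue its own output."""
    return original_messages + [
        {"role": "assistant", "content": partial_text},
        {"role": "user", "content": "Continue exactly where you left off. Do not repeat earlier text."},
    ]

msgs = build_resume_messages(
    [{"role": "user", "content": "Write a long essay on MoE models."}],
    partial_text="Mixture-of-experts models route each token",
)
print(len(msgs))  # 3
```

Feed msgs into a fresh streaming call and concatenate the new output onto the saved partial text.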


Cost control

Pricing: kimi.com/membership/pricing

Tips:

  • Cap max_tokens (2,048 is plenty for chat).
  • Cache system prompts to benefit from prompt caching.
  • Log prompt_tokens, completion_tokens, total_tokens—pipe to metrics and alerting.
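The logging tip above can be sketched as a small cost estimator. The per-million-token prices here are placeholders, not real rates; substitute the numbers from the pricing page:

```python
# Minimal usage logger. The per-million-token prices are PLACEHOLDERS;
# substitute real rates from kimi.com/membership/pricing.

PRICE_PER_M_INPUT = 1.0   # USD per 1M prompt tokens (placeholder)
PRICE_PER_M_OUTPUT = 3.0  # USD per 1M completion tokens (placeholder)

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from a response's usage block."""
    return (prompt_tokens * PRICE_PER_M_INPUT
            + completion_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# After each call you would log:
#   usage = response.usage
#   estimate_cost(usage.prompt_tokens, usage.completion_tokens)
cost = estimate_cost(50_000, 10_000)
print(f"${cost:.4f}")  # $0.0800
```

Pipe these numbers to your metrics stack so budget alerts fire before a long swarm run surprises you.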

Production pattern: a GitHub-issue fixer

Agent structure for reading a GitHub issue, locating code, proposing a fix, and running tests:

from openai import OpenAI
import os, json

client = OpenAI(
    api_key=os.getenv("KIMI_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

tools = [
    {"type": "function", "function": {
        "name": "read_file",
        "description": "Read a file in the repo.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"]
        }
    }},
    {"type": "function", "function": {
        "name": "search_code",
        "description": "Ripgrep the codebase for a pattern.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }},
    {"type": "function", "function": {
        "name": "run_tests",
        "description": "Run the project test suite.",
        "parameters": {"type": "object", "properties": {}}
    }},
]

def tool_dispatch(name, args):
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    if name == "search_code":
        return run_ripgrep(args["query"])
    if name == "run_tests":
        return run_pytest()
    raise ValueError(f"Unknown tool: {name}")

messages = [
    {"role": "system", "content": "You are a senior engineer. Fix the described bug."},
    {"role": "user", "content": "Issue: login form submits twice on slow networks."}
]

while True:
    resp = client.chat.completions.create(
        model="kimi-k2.6",
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)

    if not msg.tool_calls:
        print(msg.content)
        break

    for call in msg.tool_calls:
        result = tool_dispatch(call.function.name, json.loads(call.function.arguments))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })

Scale to Agent Swarm by adding the extra_body swarm config. It can also be combined with human-in-the-loop stacks.


FAQ

Do I need a Moonshot-specific SDK?

No. Use OpenAI Python/Node SDKs after changing base_url.

Is the API rate-limited?

Yes. Limits depend on your tier and usage; check dashboard.

Does Kimi K2.6 work with LangChain, LlamaIndex, Vercel AI SDK?

Yes, if the framework accepts an OpenAI-compatible base URL.

Does Kimi K2.6 support JSON mode?

Yes. Use response_format: {"type": "json_object"} or strict schema.
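A hedged sketch of a JSON-mode request body (the prompt and field values are illustrative; response_format follows the OpenAI convention described above):

```python
import json  # used to parse the JSON-mode response content

# Sketch: requesting JSON mode via response_format.
# Payload shape follows the OpenAI convention; content is illustrative.
payload = {
    "model": "kimi-k2.6",
    "messages": [
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "Give city and country for Tokyo."},
    ],
    "response_format": {"type": "json_object"},
}

# With JSON mode on, the content string should parse cleanly:
#   data = json.loads(response.choices[0].message.content)
print(payload["response_format"]["type"])  # json_object
```

Pairing the response_format flag with an explicit system instruction, as above, is the usual belt-and-braces approach to JSON output.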

Context window size?

262,144 input tokens, 98,304 output tokens for reasoning.

Fine-tuning via API?

Not yet. Fine-tune by running open weights on your hardware.

kimi-k2.6 vs kimi-k2.6-thinking?

kimi-k2.6: fast agent. kimi-k2.6-thinking: exposes reasoning steps; tuned for math/logic/planning.

Is there a free tier?

See our Kimi K2.6 free access guide.


Summary

Kimi K2.6 API drops into any OpenAI-compatible toolchain: just change the base URL and API key. You get a 262K context window, Agent Swarm, 96.60% tool invocation, and open-source weights if you want to self-host.

For new integrations, use Apidog to visually construct and validate endpoints. This catches schema errors, streaming bugs, and auth issues before they hit your codebase. Then port working requests into your Python/Node services.

