Thanawat Wongchai

Posted on Apr 8 • Originally published at apidog.com

How to Use the GLM-5.1 API: A Complete Guide with Code Examples

TL;DR

GLM-5.1 is available through the BigModel API at https://open.bigmodel.cn/api/paas/v4/. The API is OpenAI-compatible: it uses the same endpoints, request format, and streaming pattern. You need a BigModel account, an API key, and the model name glm-5.1. This guide covers authentication, your first request, streaming, tool calling, and testing your integration with Apidog.

Introduction

GLM-5.1 is Z.AI's flagship agentic model, released in April 2026. It ranks first on SWE-Bench Pro and leads GLM-5 on the major coding benchmarks. If you are building an AI coding assistant, an autonomous agent, or any application that needs long-horizon task execution, GLM-5.1 is worth integrating.

Good news for developers: the API is OpenAI-compatible. If you have already used GPT-4 through the OpenAI SDK, moving to GLM-5.1 only takes a new base URL and model name. There is no new SDK to learn and no new response format to handle.

💡 The main challenge with agentic APIs is testing. An agent can make hundreds of tool calls over several minutes, and running those tests against the live API burns quota. Apidog Test Scenarios address this: you can mock the entire request sequence, simulate the response for each state, and verify that your integration handles streaming, tool calling, and errors correctly before going to production.

Prerequisites

Before you start, you need:

A BigModel account (sign up for free at bigmodel.cn)
An API key from the BigModel console (under API Keys)
Python 3.8+ or Node.js 18+
The OpenAI SDK, or a standard HTTP client such as requests or fetch

Set your API key as an environment variable:

export BIGMODEL_API_KEY="your_api_key_here"

Do not hardcode the API key in source code.

Authentication

Every request must include a Bearer token:

Authorization: Bearer YOUR_API_KEY

The API key has the form xxxxxxxx.xxxxxxxxxxxxxxxx (two parts separated by a dot) and goes in the header the same way as an OpenAI key.
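
As a minimal sketch of the two points above, the helper below reads the key from the environment and builds the request headers. The names load_api_key and build_headers are hypothetical, and the dot check only reflects the key format described here; it is an assumption, not an official validation rule.

```python
import os


def load_api_key(env=None):
    """Read the BigModel key from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("BIGMODEL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("BIGMODEL_API_KEY is not set")
    # Assumption: a key has two non-empty parts separated by a dot, as described above.
    first, sep, second = key.partition(".")
    if not (sep and first and second):
        raise ValueError("API key does not match the expected two-part form")
    return key


def build_headers(api_key):
    """Return the headers every chat completions request needs."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early with a clear message tends to be easier to debug than a 401 from the server several calls into an agent run.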

Base URL

https://open.bigmodel.cn/api/paas/v4/

Chat completions endpoint:

POST https://open.bigmodel.cn/api/paas/v4/chat/completions
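
One detail worth checking if you build endpoint URLs yourself: the trailing slash on the base URL matters when paths are joined with Python's urllib.parse.urljoin. The sketch below illustrates this; the endpoint helper is a hypothetical name.

```python
from urllib.parse import urljoin

BASE_URL = "https://open.bigmodel.cn/api/paas/v4/"


def endpoint(path, base_url=BASE_URL):
    """Join a relative path onto the base URL (hypothetical helper)."""
    # Normalize so the last path segment of the base is never replaced.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, path.lstrip("/"))


print(endpoint("chat/completions"))
# https://open.bigmodel.cn/api/paas/v4/chat/completions

# Without the trailing slash, urljoin replaces the final "v4" segment:
print(urljoin("https://open.bigmodel.cn/api/paas/v4", "chat/completions"))
# https://open.bigmodel.cn/api/paas/chat/completions
```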

Your First Request

Using curl

curl https://open.bigmodel.cn/api/paas/v4/chat/completions \
  -H "Authorization: Bearer $BIGMODEL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes."
      }
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'

Using Python (requests)

import os
import requests

api_key = os.environ["BIGMODEL_API_KEY"]

response = requests.post(
    "https://open.bigmodel.cn/api/paas/v4/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "glm-5.1",
        "messages": [
            {
                "role": "user",
                "content": "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes."
            }
        ],
        "max_tokens": 1024,
        "temperature": 0.7
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

Using the OpenAI SDK (recommended)

The OpenAI Python SDK works as-is once you change base_url:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BIGMODEL_API_KEY"],
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes."
        }
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)

The OpenAI SDK handles retries, timeouts, and response parsing for you.

Response Format

The response format matches OpenAI's:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1744000000,
  "model": "glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "def sieve_of_eratosthenes(n):\n    ..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 215,
    "total_tokens": 247
  }
}

Read the reply text from result["choices"][0]["message"]["content"].

The usage field reports the number of tokens consumed. Track it to monitor your quota (GLM-5.1 charges 3x quota during peak hours, 14:00-18:00 UTC+8).
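
A minimal sketch of tracking usage across requests, assuming responses have the shape shown above. UsageTracker and extract_text are hypothetical names, and the sample dict is a trimmed copy of the example response, not live API output.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running token totals across requests (hypothetical helper)."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def record(self, result):
        # usage may be missing on error responses, so default to zeros
        usage = result.get("usage") or {}
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.requests += 1

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


def extract_text(result):
    """Return the assistant text of the first choice, or "" if there is none."""
    choices = result.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("message", {}).get("content") or ""


sample = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "def sieve_of_eratosthenes(n):\n    ..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 32, "completion_tokens": 215, "total_tokens": 247},
}

tracker = UsageTracker()
tracker.record(sample)
print(extract_text(sample).splitlines()[0])  # def sieve_of_eratosthenes(n):
print(tracker.total_tokens)                  # 247
```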

Streaming Responses

For long code-generation tasks, use streaming mode to receive tokens incrementally as they are generated:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BIGMODEL_API_KEY"],
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

stream = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {
            "role": "user",
            "content": "Explain how a B-tree index works in a database, with a code example."
        }
    ],
    stream=True,
    max_tokens=2048
)

for chunk in stream:
    # Guard against chunks that arrive with an empty choices list
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()

Each chunk carries only the new tokens. The final chunk has finish_reason set to "stop" or "length".
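
To make the chunk structure concrete, here is a small sketch that joins the deltas back into one string and reports the finish reason. It works on plain dicts so it runs without network access; collect_stream is a hypothetical name and the chunks are hand-written in the shape described above, not captured from the API.

```python
def collect_stream(chunks):
    """Join streamed deltas into one string and report the finish reason."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        choice = choices[0]
        content = (choice.get("delta") or {}).get("content")
        if content:
            parts.append(content)
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# Hand-written chunks for illustration
fake_chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "A B-tree "}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "keeps keys sorted."}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "length"}]},
]

text, reason = collect_stream(fake_chunks)
print(text)  # A B-tree keeps keys sorted.
if reason == "length":
    print("Output was cut off by max_tokens; consider raising it.")
```

With the OpenAI SDK, passing chunk.model_dump() for each chunk should produce dicts of this shape.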

Streaming with raw requests

If you are not using the OpenAI SDK:

import os
import json
import requests

api_key = os.environ["BIGMODEL_API_KEY"]

response = requests.post(
    "https://open.bigmodel.cn/api/paas/v4/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "glm-5.1",
        "messages": [{"role": "user", "content": "Write a merge sort in Python."}],
        "stream": True,
        "max_tokens": 1024
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = line[6:]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0]["delta"]
            # content can be absent or null, so check its value, not just the key
            if delta.get("content"):
                print(delta["content"], end="", flush=True)
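
Separating the line parsing from the HTTP call makes the streaming logic testable without a network connection. The sketch below assumes the `data: ...` / `[DONE]` framing handled in the loop above; iter_sse_chunks is a hypothetical name and the sample lines are hand-written.

```python
import json


def iter_sse_chunks(lines):
    """Yield decoded JSON chunks from an iterable of raw SSE lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


# Hand-written sample lines for illustration
sample_lines = [
    b'data: {"choices": [{"delta": {"content": "def "}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "merge_sort"}}]}',
    b"data: [DONE]",
    b'data: {"choices": [{"delta": {"content": "ignored"}}]}',
]

text = "".join(
    chunk["choices"][0]["delta"].get("content") or ""
    for chunk in iter_sse_chunks(sample_lines)
)
print(text)  # def merge_sort
```

In a real request you would pass response.iter_lines() in place of sample_lines.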

Tool Calling

GLM-5.1 supports tool calling (function calling) for agent workflows that need to run code, look up information, or call external APIs.

Defining Tools

import os
import json
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BIGMODEL_API_KEY"],
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python code and return the output. Use this to test, profile, or benchmark code.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "The Python code to execute"
                    }
                },
                "required": ["code"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "File path to read"
                    }
                },
                "required": ["path"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {
            "role": "user",
            "content": "Write a function to compute Fibonacci numbers, test it for n=10, and show me the output."
        }
    ],
    tools=tools,
    tool_choice="auto"
)

message = response.choices[0].message
print(f"Finish reason: {response.choices[0].finish_reason}")

if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"\nTool called: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")
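
The arguments field arrives as a JSON string generated by the model, so it can be malformed or missing a required key. A minimal sketch of checking it before execution follows. parse_tool_arguments is a hypothetical helper, and it only checks the required list, not the full JSON Schema.

```python
import json

# Trimmed copy of the run_python definition above, repeated so the sketch is self-contained
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "parameters": {
                "type": "object",
                "properties": {"code": {"type": "string"}},
                "required": ["code"],
            },
        },
    }
]


def parse_tool_arguments(name, raw_arguments, tools=TOOLS):
    """Decode a tool call's JSON arguments and check required keys.

    Returns (args, error); exactly one of the two is None.
    """
    schemas = {t["function"]["name"]: t["function"]["parameters"] for t in tools}
    if name not in schemas:
        return None, f"Unknown tool: {name}"
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, f"Invalid JSON arguments: {exc.msg}"
    if not isinstance(args, dict):
        return None, "Arguments must be a JSON object"
    missing = [key for key in schemas[name].get("required", []) if key not in args]
    if missing:
        return None, f"Missing required arguments: {', '.join(missing)}"
    return args, None


print(parse_tool_arguments("run_python", '{"code": "print(2+2)"}'))
# ({'code': 'print(2+2)'}, None)
print(parse_tool_arguments("run_python", "{}"))
# (None, 'Missing required arguments: code')
```

Returning the error as a string lets you send it back to the model as the tool result, which gives it a chance to correct the call.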

Handling Tool Call Responses

When the model requests a tool call, execute it and send the result back in the next message:

# Continues from the previous block: reuses client, tools, and the json import
import subprocess

def execute_tool(tool_call):
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)

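    if name == "run_python":
        # Runs model-generated code on the local machine; sandbox this in real use
        try:
            result = subprocess.run(
                ["python3", "-c", args["code"]],
                capture_output=True,
                text=True,
                timeout=10
            )
        except subprocess.TimeoutExpired:
            return "Error: execution timed out after 10 seconds"
        return result.stdout or result.stderr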

    elif name == "read_file":
        try:
            with open(args["path"]) as f:
                return f.read()
        except FileNotFoundError:
            return f"Error: file {args['path']} not found"

    return f"Unknown tool: {name}"


def run_agent_loop(user_message, tools, max_iterations=20):
    messages = [{"role": "user", "content": user_message}]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="glm-5.1",
            messages=messages,
            tools=tools,
            tool_choice="auto",
            max_tokens=4096
        )

        message = response.choices[0].message
        messages.append(message.model_dump())

        if response.choices[0].finish_reason == "stop":
            return message.content

        if response.choices[0].finish_reason == "tool_calls":
            for tool_call in message.tool_calls:
                tool_result = execute_tool(tool_call)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": tool_result
                })

    return "Max iterations reached"


result = run_agent_loop(
    "Write a quicksort implementation, test it with a random list of 1000 integers, and report the time.",
    tools
)
print(result)

With this pattern, the agent loops automatically through tool calls, code execution, and replies until the task is done.
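
Because a loop like this can consume a lot of quota, it helps to exercise the control flow against a stand-in client first. The sketch below is one way to do that under stated assumptions: ScriptedClient, make_response, and run_loop are hypothetical names, the fake objects expose only the attributes the loop reads, and the canned responses are invented for illustration.

```python
import json
from types import SimpleNamespace


def make_response(content=None, tool_calls=None, finish_reason="stop"):
    """Build an object with the attributes the loop reads from an SDK response."""
    message = SimpleNamespace(role="assistant", content=content, tool_calls=tool_calls)
    return SimpleNamespace(choices=[SimpleNamespace(message=message, finish_reason=finish_reason)])


class ScriptedClient:
    """Stand-in for the OpenAI client that replays canned responses in order."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = []  # snapshot of the messages sent on each request
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs):
        self.calls.append(list(kwargs["messages"]))
        return self._responses.pop(0)


def run_loop(client, user_message, execute_tool, max_iterations=5):
    """Same control flow as run_agent_loop, with client and tool runner injected."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = client.chat.completions.create(model="glm-5.1", messages=messages)
        choice = response.choices[0]
        assistant = {"role": "assistant", "content": choice.message.content}
        if choice.message.tool_calls:
            assistant["tool_calls"] = [
                {"id": c.id, "type": "function",
                 "function": {"name": c.function.name, "arguments": c.function.arguments}}
                for c in choice.message.tool_calls
            ]
        messages.append(assistant)
        if choice.finish_reason == "stop":
            return choice.message.content
        if choice.finish_reason == "tool_calls":
            for c in choice.message.tool_calls:
                messages.append({"role": "tool", "tool_call_id": c.id,
                                 "content": execute_tool(c)})
    return "Max iterations reached"


tool_call = SimpleNamespace(
    id="call_abc",
    function=SimpleNamespace(name="run_python", arguments=json.dumps({"code": "print(2+2)"})),
)
client = ScriptedClient([
    make_response(tool_calls=[tool_call], finish_reason="tool_calls"),
    make_response(content="The result is 4."),
])

answer = run_loop(client, "What is 2+2?", execute_tool=lambda call: "4\n")
print(answer)               # The result is 4.
print(client.calls[1][-1])  # the tool message sent back on the second request
```

Injecting the client and the tool runner as parameters is the design choice that makes this testable; the loop itself does not need to change when you switch to the real client.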

Key Parameters

Parameter	Type	Default	Description
`model`	string	required	Use `"glm-5.1"`
`messages`	array	required	Conversation history
`max_tokens`	integer	1024	Maximum output tokens (up to 163,840)
`temperature`	float	0.95	Sampling randomness (0.0-1.0)
`top_p`	float	0.7	Nucleus sampling; 0.7 recommended for code
`stream`	boolean	false	Enables streaming
`tools`	array	null	Function definitions for tool calling
`tool_choice`	string/object	"auto"	`"auto"`, `"none"`, or a specific tool
`stop`	string/array	null	Custom stop sequences

Recommended settings for coding tasks:

{
    "model": "glm-5.1",
    "temperature": 1.0,
    "top_p": 0.95,
    "max_tokens": 163840  # large output budget for long agentic runs
}

Z.AI uses these values for its own benchmarks. For more deterministic output, lower temperature to 0.2-0.4.
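
A minimal sketch of a payload builder that applies the defaults and limits from the table before a request is sent. build_payload is a hypothetical helper; the top_p range check is an assumption, since the table does not state one.

```python
MAX_OUTPUT_TOKENS = 163_840  # upper bound from the table above


def build_payload(messages, *, max_tokens=1024, temperature=0.95, top_p=0.7,
                  stream=False, tools=None):
    """Assemble a chat completions payload and reject out-of-range values."""
    if not messages:
        raise ValueError("messages must not be empty")
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    # Assumption: top_p follows the usual (0, 1] convention for nucleus sampling.
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0.0, 1.0]")
    payload = {
        "model": "glm-5.1",
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    if tools:
        payload["tools"] = tools
        payload["tool_choice"] = "auto"
    return payload


payload = build_payload(
    [{"role": "user", "content": "Write a merge sort in Python."}],
    temperature=0.3,   # lower temperature for more repeatable output
    max_tokens=4096,
)
print(payload["temperature"], payload["max_tokens"])  # 0.3 4096
```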

Using GLM-5.1 with Coding Assistants

The Z.AI Coding Plan lets you point coding assistants such as Claude Code, Cline, and Kilo Code at GLM-5.1 through the BigModel API. Setup is simple, and it costs less than Claude Opus or GPT-5.4.

Claude Code Setup

In the Claude Code config file (~/.claude/settings.json):

{
  "model": "glm-5.1",
  "baseURL": "https://open.bigmodel.cn/api/paas/v4/",
  "apiKey": "your_bigmodel_api_key"
}

Cline / Roo Code Setup

In VS Code settings or the Cline extension config:

{
  "cline.apiProvider": "openai",
  "cline.openAIBaseURL": "https://open.bigmodel.cn/api/paas/v4/",
  "cline.openAIApiKey": "your_bigmodel_api_key",
  "cline.openAIModelId": "glm-5.1"
}

Quota Usage

GLM-5.1 uses Z.AI's quota system:

Peak hours (14:00-18:00 UTC+8): 3x quota per request
Off-peak: 2x quota per request
Promotion through April 2026: 1x during off-peak hours

Schedule heavy agentic workloads for off-peak hours to reduce quota consumption.
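
The rules above can be sketched as a small scheduling helper. quota_multiplier and next_off_peak are hypothetical names, and treating the window as including 14:00 and excluding 18:00 is an assumption, since the boundaries are not specified.

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))


def is_peak(when):
    """True if a timezone-aware datetime falls in the 14:00-18:00 UTC+8 window."""
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    # Assumption: the window includes 14:00 and excludes 18:00.
    return 14 <= when.astimezone(UTC8).hour < 18


def quota_multiplier(when, promo_active=False):
    """Return the per-request quota multiplier at the given time."""
    if is_peak(when):
        return 3
    return 1 if promo_active else 2


def next_off_peak(when):
    """Return `when` if it is off-peak, otherwise 18:00 UTC+8 on the same day."""
    local = when.astimezone(UTC8)
    if is_peak(when):
        return local.replace(hour=18, minute=0, second=0, microsecond=0)
    return local


print(quota_multiplier(datetime(2026, 4, 8, 7, 30, tzinfo=timezone.utc)))  # 3 (15:30 in UTC+8)
print(quota_multiplier(datetime(2026, 4, 8, 12, 0, tzinfo=timezone.utc)))  # 2 (20:00 in UTC+8)
```

A batch runner could sleep until next_off_peak(datetime.now(timezone.utc)) before starting a long agentic job.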

Testing the GLM-5.1 API with Apidog

Testing an agentic API integration means handling several response types: normal completions, streaming, tool calls, tool results, and errors. Testing against the live API burns quota and requires a network connection.

Apidog's Smart Mock lets you mock every state without consuming quota.

Setting Up the Mock Endpoint

In Apidog, create an endpoint: POST https://open.bigmodel.cn/api/paas/v4/chat/completions
Add a Mock Expectation for the success case:

{
  "id": "chatcmpl-test123",
  "object": "chat.completion",
  "created": 1744000000,
  "model": "glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "def sieve(n): ..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 120,
    "total_tokens": 152
  }
}

Add an Expectation for tool calling:

{
  "id": "chatcmpl-tool456",
  "object": "chat.completion",
  "created": 1744000001,
  "model": "glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc",
            "type": "function",
            "function": {
              "name": "run_python",
              "arguments": "{\"code\": \"print(2+2)\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 48,
    "completion_tokens": 35,
    "total_tokens": 83
  }
}

Add a 429 (rate limit) response:

{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

Testing the Full Agent Loop

Use Apidog Test Scenarios to build a multi-step workflow, for example:

Step 1: POST /chat/completions with the user message; expect 200 and finish_reason == "tool_calls"
Step 2: POST again with the tool result in messages; expect 200 and finish_reason == "stop"
Step 3: Extract the final content and assert that it is valid code

Test Scenarios can pass values between steps, such as request_id or tool_call_id. This mirrors a real agent loop and helps catch integration bugs before production.
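
The same three steps can be written as plain assertions over the mock bodies, which is a rough picture of what each scenario step checks. This is a sketch, not Apidog's scripting API: build_followup_messages is a hypothetical helper and the two response dicts are trimmed copies of the mock expectations in this section.

```python
import ast
import json

step1_response = {
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_abc",
                "type": "function",
                "function": {"name": "run_python", "arguments": "{\"code\": \"print(2+2)\"}"},
            }],
        },
        "finish_reason": "tool_calls",
    }]
}
step2_response = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "def sieve(n): ..."},
        "finish_reason": "stop",
    }]
}


def build_followup_messages(history, response, tool_outputs):
    """Append the assistant turn plus one tool message per tool call.

    tool_outputs maps each tool_call_id to that call's output string.
    """
    assistant = response["choices"][0]["message"]
    messages = history + [assistant]
    for call in assistant.get("tool_calls") or []:
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": tool_outputs[call["id"]],
        })
    return messages


history = [{"role": "user", "content": "What is 2+2?"}]

# Step 1: the first response must ask for a tool call
assert step1_response["choices"][0]["finish_reason"] == "tool_calls"
call = step1_response["choices"][0]["message"]["tool_calls"][0]
assert json.loads(call["function"]["arguments"]) == {"code": "print(2+2)"}

# Step 2: the follow-up request carries the tool_call_id from step 1
followup = build_followup_messages(history, step1_response, {"call_abc": "4\n"})
assert followup[-1]["tool_call_id"] == call["id"]
assert step2_response["choices"][0]["finish_reason"] == "stop"

# Step 3: the final content parses as Python (ast.parse raises SyntaxError otherwise)
final = step2_response["choices"][0]["message"]["content"]
ast.parse(final)
print("all steps passed")
```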

Error Handling

The API returns standard HTTP status codes:

Status	Meaning	Action
200	Success	Process the response normally
400	Bad request	Check the request format
401	Unauthorized	Check the API key
429	Rate limited	Wait for the interval in the `Retry-After` header, then retry
500	Server error	Retry with exponential backoff
503	Service unavailable	Retry as for 500

Example retry logic:

import os
import time
import requests

def call_with_retry(payload, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://open.bigmodel.cn/api/paas/v4/chat/completions",
                headers={"Authorization": f"Bearer {os.environ['BIGMODEL_API_KEY']}",
                         "Content-Type": "application/json"},
                json=payload,
                timeout=120
            )

            if response.status_code == 429:
                retry_after = int(response.headers.get("Retry-After", 60))
                print(f"Rate limited. Waiting {retry_after}s...")
                time.sleep(retry_after)
                continue

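            # Retry server errors with exponential backoff, as the table above advises
            if response.status_code in (500, 503):
                wait = 2 ** attempt
                print(f"Server error {response.status_code}. Retrying in {wait}s...")
                time.sleep(wait)
                continue

            response.raise_for_status()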
            return response.json()

        except requests.exceptions.Timeout:
            wait = 2 ** attempt
            print(f"Timeout on attempt {attempt + 1}. Retrying in {wait}s...")
            time.sleep(wait)

    raise Exception("Max retries exceeded")

For agentic workloads where each step takes 30-60 seconds, set the timeout to 120-300 seconds.
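
Two refinements to the retry logic can be sketched separately. HTTP allows Retry-After to hold either a number of seconds or an HTTP date, so a plain int() conversion can fail on the date form. Adding random jitter to the backoff also keeps many clients from retrying at the same instant. Both helpers below are hypothetical names, and whether the BigModel API ever sends the date form is not something this guide confirms.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default=60.0, now=None):
    """Convert a Retry-After header value to a wait time in seconds."""
    if value is None:
        return default
    try:
        return max(0.0, float(value))  # delay-seconds form, e.g. "30"
    except ValueError:
        pass
    try:
        target = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (target - now).total_seconds())


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))


print(parse_retry_after("30"))  # 30.0
print(parse_retry_after(None))  # 60.0
```

In call_with_retry, these would replace the int(...) conversion and the fixed 2 ** attempt wait.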

Conclusion

Because the GLM-5.1 API is OpenAI-compatible, you can plug it into an existing system in minutes. The only differences are the endpoint (open.bigmodel.cn) and a quota system in place of per-token billing.

For agentic applications that make many tool calls in a single session, GLM-5.1's strength is long-horizon optimization. Pair it with Apidog Smart Mock and Test Scenarios to verify that your integration handles every edge case before production.

For a deeper look at GLM-5.1 and its benchmarks, see the GLM-5.1 model overview. To learn more about testing agentic AI workflows with Apidog, see How AI Agent Memory Works.

FAQ

Is the GLM-5.1 API OpenAI-compatible?

Yes. The request, response, streaming, and tool calling formats all match the OpenAI chat completions API. You can use the OpenAI Python SDK by changing the base URL to https://open.bigmodel.cn/api/paas/v4/

What model name should I use in API requests?

Use "glm-5.1". You do not need to include a full version string.

How is the GLM-5.1 API priced?

The BigModel API uses a quota system. GLM-5.1 charges 3x quota during peak hours (14:00-18:00 UTC+8) and 2x during off-peak hours. A promotion through April 2026 lowers the off-peak rate to 1x.

What is the maximum context length?

The input context is 200,000 tokens and the maximum output is 163,840 tokens. Set max_tokens high (32,768+) for long agentic runs.

Does GLM-5.1 support tool calling?

Yes. Define tools in the tools array (type: "function"), then handle finish_reason: "tool_calls" in your agent loop.

How can I test the GLM-5.1 API without using quota?

Use Apidog's Smart Mock. Set up a mock response for each state (success, tool call, rate limit, error) and run your tests against the mock until you are ready for production.

Where can I find the GLM-5.1 weights?

The open-source weights are on HuggingFace at zai-org/GLM-5.1 (MIT License). They can be served locally with vLLM or SGLang.
