Gophernment

Posted on Jul 1

Harness Engineering 101: What's Swept Under the Rug in Agentic AI

#llm #ai #beginners #tutorial

The previous article walked through going "from a bare LLM → Agentic AI" in 7 layers.
This time we look at how each layer works on the inside, and at what can break.

When we use Claude Code, Cursor, or Hermes, we see the AI work in steps:

think → call tool → look at result → think again → call tool → done

What we don't see is everything that breaks along the way, and the person (or the code) constantly dealing with that breakage.

That is Harness Engineering: the discipline of building the "frame" that wraps an LLM and handles everything needed for an agent to actually work in a world where nothing is perfect.

1. The Loop: The Heart of the Harness

This is the loop that every agent runs:

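while not done and budget_ok:
    response = llm.chat(messages, tools)

    if response.has_tool_calls():
        # The model asked for tools: run each one and feed the result back.
        for tool in response.tool_calls:
            result = execute_tool(tool)
            messages.append(result)
    else:
        # No tool calls means the model has produced its final answer.
        return response.text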

At a glance it looks like an ordinary while loop, but this is where everything can break.

True story: when the AI thinks a tool call succeeded... but it didn't

I once had Hermes look for condos on the LED website (the Legal Execution Department's asset auction site).

The AI's plan: open the site → fill in the form → click submit → read the results

The AI called the tool browser_click(ref="submit_button"), and the tool returned "clicked".

The AI was delighted: "Done! Got the results." Then it tried to read results from a page that had never actually loaded.

What happened? The tool returned "clicked", but the page had not finished submitting: JavaScript was still running, the DOM had not changed, and the CAPTCHA had not been validated.

What the harness must handle: after browser_click there has to be a browser_snapshot to confirm that the page really changed. If the page did not change, the harness must retry or switch strategy.

This is what a harness does: it does not take a tool call at its word. It verifies.
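The verify step can be sketched in a few lines. This is only an illustration under assumptions: `click` and `snapshot` are hypothetical callables standing in for the browser_click and browser_snapshot tools, and "the page changed" is reduced to "the snapshot text differs from the one taken before the click".

```python
import time
from typing import Callable


def click_and_verify(
    click: Callable[[], str],
    snapshot: Callable[[], str],
    attempts: int = 3,
    wait_s: float = 0.0,
) -> bool:
    """Return True only if the page snapshot changes after a click."""
    before = snapshot()
    for _ in range(attempts):
        click()  # the tool's own "clicked" reply is not treated as proof
        time.sleep(wait_s)  # a real harness would give the page time to settle
        if snapshot() != before:
            return True  # the page really changed
    return False  # caller should switch strategy instead of trusting "clicked"
```

A False return is the signal for the harness to try a different approach rather than let the model read a page that never loaded.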

2. Token Budget: Like Watching the Money in Your Wallet

Every time the AI calls a tool, the context window grows, because it has to keep:

[user message] → [assistant tool_call] → [tool result] → [assistant tool_call] → [tool result] → ...

If the AI runs for 50 rounds, the context can reach 100K+ tokens.

The problems:

💰 Cost: Claude Sonnet at $3/M input tokens means a 100K-token context costs $0.30 per API call
🧠 The LLM gets lost: once the context is too long, it starts forgetting the first instructions
⏱️ Slow: the longer the context, the longer inference takes
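The cost figure above is for a single call, but the whole history is resent on every round, so the bill adds up across the loop. A small sketch of that arithmetic, assuming a flat $3/M input price, no prompt caching, and a context that grows by the same (hypothetical) number of tokens each round:

```python
def input_cost_usd(tokens: int, usd_per_million: float = 3.0) -> float:
    """Input cost of one API call that sends `tokens` tokens."""
    return tokens * usd_per_million / 1_000_000


def cumulative_cost_usd(
    rounds: int, tokens_per_round: int, usd_per_million: float = 3.0
) -> float:
    """Total input cost when round k resends k * tokens_per_round tokens."""
    total_tokens = sum(k * tokens_per_round for k in range(1, rounds + 1))
    return input_cost_usd(total_tokens, usd_per_million)
```

With 2,000 tokens added per round, round 50 sends a 100K-token context, and the 50 calls together come to roughly $7.65 of input tokens under these assumptions.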

What the harness must handle:

if token_count > threshold:
    compress_context()  # drop old tool results, keep only the gist

Story: a team looking after one agent once watched it run 50+ tool calls with a 150K-token context. It started going in circles: re-reading files, editing the same thing over and over, forgetting what it had already done. The harness compressed the context automatically, down to 30K tokens, and the AI came back to its senses and carried on right away.

3. Tool Error Recovery: When a Tool Breaks

Tool calls do not always succeed:

- terminal("git push"): Permission denied
- browser_click("submit"): Page did not change
- web_search("Go 1.27"): CAPTCHA blocked
- read_file("config.yaml"): File not found

The AI has to know that a tool failed, and it needs a recovery strategy.
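One way to make sure the model always learns about a failure is to turn every exception into a structured result instead of letting it crash the loop. A minimal sketch, where the `run_tool` wrapper and the `ok` / `output` / `error` keys are illustrative names, not taken from any particular harness:

```python
from typing import Any, Callable, Dict


def run_tool(fn: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Run a tool and always return a result dict, even on failure."""
    try:
        return {"ok": True, "output": fn(**kwargs)}
    except Exception as exc:  # broad on purpose: any tool failure is reported
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


def read_file(path: str) -> str:
    """Example tool: raises FileNotFoundError when the path does not exist."""
    with open(path, encoding="utf-8") as f:
        return f.read()
```

The error string goes back into the conversation as the tool result, so the model can see "File not found" and pick a recovery step instead of assuming success.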

True story: CAPTCHA taught us how to retry

Hermes had to search for assets on the LED site. The first attempt went the usual way: fill in the form → click submit → blocked by CAPTCHA.

The harness pattern actually used:

Attempt 1: form.submit() → CAPTCHA block ❌
Attempt 2: read the CAPTCHA first → submit → the CAPTCHA changed during submit ❌
Attempt 3: fill in every form field quietly, using JavaScript to set the input values directly so the site never notices anyone typing (no `onChange` events fire, because those events trigger an AJAX call that loads district data, which refreshes the CAPTCHA before submit) → read the CAPTCHA as the very last step → press the submit button with `button.click()` instead of `form.submit()` → ✅ success!

The harness did not just "retry" blindly. It changed strategy on each attempt:

Attempt	Strategy	Result
1	form.submit()	❌ CAPTCHA
2	CAPTCHA before submit	❌ race condition
3	button.click() + CAPTCHA last	✅

This is adaptive retry, not just calling again with the same parameters.
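The difference from a plain retry loop is that each attempt runs a different strategy. A rough sketch of that shape, assuming each strategy is a zero-argument callable that returns True on success (the strategy names and the `adaptive_retry` helper are hypothetical):

```python
from typing import Callable, List, Optional, Tuple

Strategy = Tuple[str, Callable[[], bool]]


def adaptive_retry(strategies: List[Strategy]) -> Tuple[Optional[str], List[str]]:
    """Try each strategy once, in order; return the winner and the failures."""
    failed: List[str] = []
    for name, attempt in strategies:
        try:
            if attempt():
                return name, failed
        except Exception:
            pass  # an exception counts as a failed attempt
        failed.append(name)
    return None, failed  # every strategy failed
```

The list of failed strategy names is worth keeping: feeding it back to the model helps it avoid proposing an approach that has already been ruled out.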

4. Context Compaction: Cutting What Isn't Needed

Suppose the AI calls read_file("main.go") and gets back 500 lines of code.

In the following loops the AI reads 3 more files, runs tests, edits code, and runs tests again. The context keeps growing.

But in round 10 the AI does not need to "remember" all 500 lines of main.go. It may only need to know that "main.go has a function main that calls RunServer".

What the harness must cut:

📄 main.go (500 lines), round 1: keep everything
📄 main.go (500 lines), round 10: keep only the summary "defines main(), calls RunServer()"

This technique is called context compaction: the harness uses a small (cheap) LLM to read old tool results, summarize them, and keep only the summaries.

True story: Hermes has a compression.threshold setting in its config, default 0.50 (50% of the context window). Once the context passes the halfway mark it compresses automatically, from 100K down to 20K tokens, saving $0.24 per API call (80K fewer input tokens at $3/M).
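Here is a minimal sketch of the idea, not Hermes's actual implementation. It assumes messages are plain dicts with `role` and `content`, and it uses a `first_line` function as a stand-in for the small summarizer model. The 0.50 default simply mirrors the threshold quoted above.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def should_compact(used_tokens: int, window_tokens: int, threshold: float = 0.50) -> bool:
    """True once the context uses more than `threshold` of the window."""
    return used_tokens > window_tokens * threshold


def compact(
    messages: List[Message],
    summarize: Callable[[str], str],
    keep_recent: int = 1,
) -> List[Message]:
    """Replace all but the newest `keep_recent` tool results with summaries."""
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    old = set(tool_idx[: max(len(tool_idx) - keep_recent, 0)])
    return [
        {**m, "content": summarize(m["content"])} if i in old else m
        for i, m in enumerate(messages)
    ]


def first_line(text: str) -> str:
    """Stand-in for a small summarizer model: keep only the first line."""
    return text.splitlines()[0] if text else ""
```

Keeping the most recent tool results verbatim is a deliberate choice here: the model is usually still working with them, while older results only need to survive as a reminder of what was already seen.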

5. Role Alternation: The Rule Few People Know

Many LLM chat APIs enforce a hard rule: messages in a conversation must alternate roles.

✅ user → assistant → user → assistant → user
❌ user → assistant → assistant → user    ← breaks!

It sounds simple, but inside an agent loop it is anything but:

[user: "Build an API"]
[assistant: tool_call read_file]   ← assistant
[tool result: "file has 200 lines"]  ← tool (neither user nor assistant)
[assistant: tool_call write_file]  ← assistant right after assistant! ❌

What the harness must fix: merge consecutive tool calls into a single assistant message.

# instead of sending one tool call at a time, the harness batches them before sending
messages = [
    {"role": "user", "content": "Build an API"},
    {"role": "assistant", "tool_calls": [read_file, write_file, run_test]},  # 3 calls merged
    {"role": "tool", "results": [...]},
    {"role": "assistant", "content": "Done."}
]

True story: an earlier version of Hermes had a bug where role alternation broke on /stop. The AI was in the middle of a tool call when the user pressed stop, the harness did not merge the pending tool calls, and two assistant messages were sent back to back → API error. It took 3 hours of debugging to find.
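One defensive pattern for this kind of bug is to close every unanswered tool call before sending anything else. The sketch below is an assumption-heavy illustration. It uses an OpenAI-style message shape (assistant messages carry `tool_calls` with an `id`, tool messages carry a `tool_call_id`) and assumes the interrupt lands right after the latest assistant turn. Other APIs represent tool results differently.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def pending_tool_call_ids(messages: List[Message]) -> List[str]:
    """IDs of tool calls that were requested but never answered."""
    pending: List[str] = []
    for m in messages:
        if m["role"] == "assistant":
            pending.extend(tc["id"] for tc in m.get("tool_calls", []))
        elif m["role"] == "tool" and m["tool_call_id"] in pending:
            pending.remove(m["tool_call_id"])
    return pending


def close_pending_tool_calls(messages: List[Message]) -> List[Message]:
    """Append a 'cancelled' result for every unanswered tool call."""
    closed = list(messages)  # leave the caller's history untouched
    for call_id in pending_tool_call_ids(messages):
        closed.append(
            {"role": "tool", "tool_call_id": call_id, "content": "cancelled by user"}
        )
    return closed
```

Running `close_pending_tool_calls` on the history at /stop time means the next request never contains an assistant turn with dangling tool calls.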

6. Tool Schema: Translating Code into "LLM Language"

We have a function in Python:

def read_file(path: str, offset: int = 1, limit: int = 500) -> dict:
    """Read a text file with line numbers."""

The LLM does not understand Python, so the harness has to convert it into a JSON schema:

{
  "name": "read_file",
  "description": "Read a text file with line numbers.",
  "parameters": {
    "type": "object",
    "properties": {
      "path": {"type": "string", "description": "Path to the file"},
      "offset": {"type": "integer", "default": 1},
      "limit": {"type": "integer", "default": 500}
    },
    "required": ["path"]
  }
}

And when the LLM replies:

{"name": "read_file", "arguments": {"path": "/home/user/main.go"}}

The harness has to convert that back into read_file(path="/home/user/main.go") and then actually call it.

It sounds trivial, but with 50 tools, each with 5-10 parameters, the harness manages all of that schema for us without us having to think about it.
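Both directions can be sketched with the standard library's `inspect` module. This is a simplified illustration, not how any specific harness does it: `tool_schema` and `dispatch` are hypothetical helper names, only a few primitive types are mapped, and `read_file` has a stub body so the example runs on its own.

```python
import inspect
from typing import Any, Callable, Dict

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a function-calling style schema from a signature and docstring."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to "string".
        prop: Dict[str, Any] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def dispatch(call: Dict[str, Any], registry: Dict[str, Callable[..., Any]]) -> Any:
    """Turn the model's {"name", "arguments"} reply back into a real call."""
    return registry[call["name"]](**call["arguments"])


def read_file(path: str, offset: int = 1, limit: int = 500) -> dict:
    """Read a text file with line numbers."""
    return {"path": path, "offset": offset, "limit": limit}  # stub body for the demo
```

Generating the schema from the signature keeps the two in sync: changing a default in the Python function changes what the model is told, with no hand-edited JSON to forget.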

7. Interrupts: Stopping the AI Without Breaking It

The user changes their mind midway: "Stop! I don't want that anymore."

The harness must:

- Stop the loop immediately
- Cancel tool calls that are still running (for example, a terminal running a build)
- Keep role alternation intact
- Come back ready for the next instruction

Example: the user issues /stop while the AI is cloning a large repo and git clone is already 80% done. The harness has to kill the process, clean up the partially cloned files, and come back ready to respond, all in under 1 second.
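The kill-and-cleanup part can be sketched with `subprocess` and `shutil` from the standard library. The `stop_tool` helper and its `grace_s` parameter are hypothetical. The assumption is that the tool runs as a child process and writes into a directory the harness is free to delete.

```python
import shutil
import subprocess
from pathlib import Path


def stop_tool(proc: subprocess.Popen, partial_dir: Path, grace_s: float = 0.5) -> int:
    """Stop a running tool process and remove its partial output."""
    proc.terminate()  # polite request first (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # force kill if the process ignored the request
        proc.wait()
    shutil.rmtree(partial_dir, ignore_errors=True)  # drop the half-finished clone
    return proc.returncode
```

The short grace period keeps the whole stop within the sub-second budget while still giving a well-behaved process the chance to exit on its own.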

🧩 Harness = Engineering, Not AI

In short: Harness Engineering is ordinary software engineering, designed to manage a far-from-ordinary LLM.

Problem	Fix	Not AI, just engineering
Context too long	Compaction	Memory management
Tool call fails	Retry + adaptive strategy	Error handling
Repeated role	Merge tool calls	Message routing
Schema mismatch	Auto-generate JSON schema	Serialization
User says stop	Graceful interrupt	Process management

The LLM is the brain. The harness is the nervous system, the muscles, and the immune system wrapped around that brain.

Build Your Own: The Bare Minimum

Want to try building your own harness?

# minimal harness, about 30 lines
def run_agent(user_input, tools, max_rounds=10):
    messages = [{"role": "user", "content": user_input}]

    for _ in range(max_rounds):
        response = llm.chat(messages, tools)

        if not response.has_tool_calls():
            return response.text  # ✅ done

        # gather every tool call into one message (role alternation)
        tool_results = []
        for tc in response.tool_calls:
            result = execute(tc.name, tc.args)
            tool_results.append(result)

        messages.append({"role": "assistant", "tool_calls": response.tool_calls})
        messages.append({"role": "tool", "results": tool_results})

        # Token budget: compress past 80% (of an assumed 100K-token window)
        if count_tokens(messages) > 80000:
            messages = compress(messages)

    return "Max rounds exceeded"

From these 30 lines you will run into the same problems that Kiro, Antigravity, and Hermes run into, and that is where Harness Engineering begins.

📚 Further reading:

Previous article: from chatbot to Agentic AI

Anthropic: Building effective agents
