Thanawat Wongchai

Posted on May 12 • Originally published at apidog.com

How to Track OpenAI API Costs by Feature: A Cost Allocation Guide

Your OpenAI invoice might say you spent $4,237 last month, but it won't tell you that $3,100 came from a misbehaving summarization endpoint, $700 came from a customer who pays only $50/month, and another $437 came from a feature nobody uses. The billing dashboard gives you a total, but not the context an engineering team needs to make decisions about pricing, capacity, and roadmap.


This guide shows how to attribute OpenAI API spend systematically: tag every request with metadata, log tokens and cost per request, aggregate spend by feature, route, and customer, set a budget limit per key, and test the wrapper before deploying it. The goal is to turn AI spend from a "mystery number" into a product cost you can manage.

💡 Apidog offers request-level visibility and scenario testing, which you can use to check that your cost-tracking wrapper works correctly before it goes live. Use Apidog to replay tagged requests, inspect the log format, and confirm that every API call sends the metadata your data warehouse needs.

Summary (TL;DR)

Tag every OpenAI API call with structured metadata such as feature, route, customer_id, and environment. Then write one structured log entry per request containing token usage and the computed cost, ship the logs to a data warehouse, and aggregate by tag.

What to do right away:

Build one central wrapper for every OpenAI call
Require every call to pass feature, route, customer_id, and environment
Read response.usage and compute cost_usd
Ship JSON logs to BigQuery, ClickHouse, Snowflake, or Postgres
Set a budget limit per project key in the OpenAI dashboard
Create alerts from the warehouse, for example when spend per feature exceeds 3x its average
Test end-to-end with Apidog before trusting the numbers on your dashboard

Introduction

You deploy a new AI feature on Tuesday. By Friday the CFO asks why the OpenAI bill is up 40%. You open the dashboard and see only that total spend has increased, with no indication of which feature, customer, or route caused it. In the end the team has to guess.

This is a common problem for teams running LLMs in production. The OpenAI billing page works for accounting but falls short for engineering attribution. You see daily totals and a breakdown by model, but not the request pattern, customer, route, or feature that generated the spend.

The fix is to build your own attribution layer:

Wrap every API call in a single wrapper
Attach metadata every time
Record token usage and cost per request
Send events to a warehouse
Aggregate with SQL
Set alerts and budget guardrails

For pricing context, see the GPT-5.5 pricing breakdown. For billing attribution on the developer tooling side, see GitHub Copilot usage billing for API teams. For the OpenAI API itself, see the official OpenAI API reference.

Why the OpenAI billing dashboard is not enough

The OpenAI billing dashboard shows daily spend, a model breakdown, and usage limits. That is enough if you have one application, one customer, and one feature. Once you have multiple features, customers, environments, or teams, it no longer supports product decisions.

Here is what is missing:

1. Total spend without context

The dashboard may tell you that you spent $312 on GPT-5.5 yesterday, but not whether it came from:

A single customer calling support chat 50,000 times
A background job re-summarizing the entire knowledge base
A developer test script left running
A new feature whose prompt is longer than it needs to be

All of these look the same on an aggregate graph.

2. No breakdown by feature

OpenAI tags requests by API key and model, but it knows nothing about your app's feature, route, customer_id, or environment. If you want those dimensions, you have to build them yourself.

3. Reporting has latency

Usage data often arrives several minutes to several hours late. If a faulty loop starts burning tokens at 10:00, you may only see it on the dashboard after a lot of money is already gone. A production system needs its own near-real-time logs and alerts.

4. Alerts are not granular enough

OpenAI offers an organization budget and soft email alerts, but no thresholds per feature, per route, or per customer. A rule such as "alert if support-chat spends more than $50 in one hour" is something you have to build yourself.

5. No customer attribution

If you sell B2B SaaS with AI features, you need to know how much cost each customer generates so you can price, cap quotas, or upsell correctly. The dashboard cannot answer "how much did customer X cost us in OpenAI spend this month?"

6. Project keys only help partway

OpenAI project keys separate usage by project, but they still give you nothing per feature, customer, or route. The OpenAI usage API returns aggregates per project, not request-level data.

This is the problem that the Dev.to article "OpenAI Tells You What You Spent. Not Where. So I Built a Dashboard" points out clearly: you can't manage what you can't measure.

The cost attribution data model

Start with one rule: every OpenAI request must produce exactly one event in the warehouse.

That event is the core unit of analysis. If the schema is right, every dashboard, alert, quota, and forecast becomes a SQL query.

Recommended minimum schema:

Column	Type	Example	Why it matters
`request_id`	uuid	`7a91...`	dedupe, retry, tracing
`timestamp`	timestamptz	`2026-05-06T14:23:01Z`	time-series queries, anomaly detection
`feature`	text	`support-chat`	the feature that triggered the request
`route`	text	`/api/v1/chat/answer`	HTTP route or background job ID
`customer_id`	text	`cust_4291`	cost per customer, gross margin
`environment`	text	`prod`, `staging`, `dev`	separates production cost from dev/test
`model`	text	`gpt-5.5`, `gpt-5.4-mini`	price differs by model
`prompt_tokens`	int	`15234`	input token count
`completion_tokens`	int	`812`	output token count
`reasoning_tokens`	int	`4500`	reasoning tokens are billed at the output rate
`cached_tokens`	int	`12000`	prompt cache usage
`latency_ms`	int	`2341`	ties cost to UX
`cost_usd`	numeric(10,6)	`0.045672`	cost computed at write time
`prompt_cache_key`	text	`system-v3`	cache hit analysis per feature
`error_code`	text	`null`, `429`	prevents miscounting retries

Compute the cost when you write the event, not later at query time. Model prices change, and you want historical cost to reflect the rate that applied on the day the request happened.

Example pricing table and cost function:

PRICING = {  # USD per 1M tokens, as of May 2026
    "gpt-5.5":      {"input": 5.00,  "cached": 2.50,  "output": 30.00},
    "gpt-5.5-pro":  {"input": 30.00, "cached": 15.00, "output": 180.00},
    "gpt-5.4":      {"input": 2.50,  "cached": 1.25, "output": 15.00},
    "gpt-5.4-mini": {"input": 0.25,  "cached": 0.125, "output": 2.00},
}

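def compute_cost_usd(model, prompt_tokens, cached_tokens, completion_tokens, reasoning_tokens):
    """Return the request cost in USD using the PRICING rates for `model`."""
    rates = PRICING[model]
    # Cached tokens are a subset of prompt_tokens and are billed at the cached rate.
    uncached = max(0, prompt_tokens - cached_tokens)

    input_cost  = (uncached * rates["input"]) / 1_000_000
    cache_cost  = (cached_tokens * rates["cached"]) / 1_000_000
    # reasoning_tokens is a breakdown of completion_tokens in the usage object,
    # so it is already included there; adding it again would double-count it.
    # The parameter is kept so callers can still pass and log it.
    output_cost = (completion_tokens * rates["output"]) / 1_000_000

    return round(input_cost + cache_cost + output_cost, 6)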

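Note: reasoning tokens are billed at the output rate. The OpenAI API reports them in usage.completion_tokens_details.reasoning_tokens, which is a breakdown of completion_tokens rather than an extra count. Bill completion_tokens once at the output rate, and log reasoning_tokens separately so you can see how much of the output spend goes to reasoning/thinking mode. For more detail, see the GPT-5.5 pricing breakdown.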

Build an OpenAI wrapper for cost attribution

The principle: production code must never call the OpenAI SDK directly. Every call goes through one central wrapper.

Python example:

import time
import uuid
import json
import logging
from openai import OpenAI

client = OpenAI()
logger = logging.getLogger("llm.cost")

def call_with_attribution(
    *,
    feature,
    route,
    customer_id,
    environment,
    model,
    messages,
    **openai_kwargs
):
    request_id = str(uuid.uuid4())
    started = time.time()
    error_code = None
    response = None

    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **openai_kwargs
        )
        return response

    except Exception as e:
        error_code = getattr(e, "code", "unknown_error")
        raise

    finally:
        latency_ms = int((time.time() - started) * 1000)
        u = response.usage if response else None

        prompt_tokens = getattr(u, "prompt_tokens", 0)
        completion_tokens = getattr(u, "completion_tokens", 0)

        cached_tokens = (
            getattr(getattr(u, "prompt_tokens_details", None), "cached_tokens", 0)
            or 0
        )

        reasoning_tokens = (
            getattr(getattr(u, "completion_tokens_details", None), "reasoning_tokens", 0)
            or 0
        )

        cost_usd = compute_cost_usd(
            model,
            prompt_tokens,
            cached_tokens,
            completion_tokens,
            reasoning_tokens
        )

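        logger.info(json.dumps({
            "event": "openai.request",
            # UTC timestamp for the schema's timestamp column
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "request_id": request_id,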
            "feature": feature,
            "route": route,
            "customer_id": customer_id,
            "environment": environment,
            "model": model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "reasoning_tokens": reasoning_tokens,
            "cached_tokens": cached_tokens,
            "latency_ms": latency_ms,
            "cost_usd": cost_usd,
            "error_code": error_code,
        }))

Example usage:

response = call_with_attribution(
    feature="support-chat",
    route="/api/v1/chat/answer",
    customer_id="cust_4291",
    environment="prod",
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You are a support assistant."},
        {"role": "user", "content": "How do I reset my API key?"}
    ],
    temperature=0.2,
)

Key guidelines:

feature, route, customer_id, and environment must be required arguments
Do not default them to "unknown", because that creates an attribution black hole
Log one JSON line per request
Send the logs through your existing pipeline, such as Vector, Fluent Bit, Logstash, or an OTLP collector
The destination can be BigQuery, ClickHouse, Snowflake, or Postgres
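One way to enforce the first two guidelines is a small validation step that the wrapper runs before making the call. This is a minimal sketch, not part of the original wrapper: validate_tags, REQUIRED_TAGS, and the PLACEHOLDERS set are hypothetical names chosen for illustration.

```python
REQUIRED_TAGS = ("feature", "route", "customer_id", "environment")
# Values that would silently create an attribution black hole (assumed list).
PLACEHOLDERS = {"", "unknown", "none", "null"}

def validate_tags(**tags):
    """Raise ValueError if a required tag is missing or is a placeholder value."""
    for name in REQUIRED_TAGS:
        value = tags.get(name)
        if not isinstance(value, str) or value.strip().lower() in PLACEHOLDERS:
            raise ValueError(f"missing or placeholder attribution tag: {name}={value!r}")
    # Return only the known tags so stray keys never reach the log event.
    return {name: tags[name] for name in REQUIRED_TAGS}
```

Failing loudly at the call site keeps bad tags out of the warehouse, where they are much harder to repair after the fact.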

For Node.js the pattern is the same: wrap the OpenAI SDK, accept the metadata, read response.usage, compute the cost, and then write a JSON event or publish it to Kafka, NATS, or Pub/Sub.

Wire up cost tracking and test it with Apidog

Once you have the schema and the wrapper, roll them out step by step.

1. Replace direct OpenAI calls with the wrapper

Search the codebase:

grep -R "OpenAI(" .
grep -R "chat.completions.create" .

Then change every call site to use:

call_with_attribution(...)

Do not let feature metadata be injected from an opaque global context. Pass it from the call site, because that is where the code knows best which feature or route triggered the request.

2. Emit structured logs

A JSON line format is recommended:

{"event":"openai.request","feature":"support-chat","route":"/api/v1/chat/answer","customer_id":"cust_4291","model":"gpt-5.5","cost_usd":0.045672}

Set the logger level to INFO for these events, and do not mix them with debug logs.
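A minimal sketch of that logger setup with Python's standard logging module follows. The "llm.cost" logger name comes from the wrapper; configure_cost_logger and the in-memory buffer are illustrative assumptions, and in practice the stream would be stdout or a file that your log shipper tails.

```python
import io
import json
import logging

def configure_cost_logger(stream):
    """Attach a one-JSON-object-per-line handler to the 'llm.cost' logger."""
    logger = logging.getLogger("llm.cost")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep cost events out of the root/debug log
    for old in list(logger.handlers):
        logger.removeHandler(old)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))  # message is already JSON
    logger.addHandler(handler)
    return logger

buffer = io.StringIO()  # stand-in for stdout or a shipped log file
cost_logger = configure_cost_logger(buffer)
cost_logger.info(json.dumps({"event": "openai.request", "feature": "support-chat", "cost_usd": 0.045672}))
cost_logger.debug("below INFO, so this line is not written")
lines = buffer.getvalue().splitlines()
```

Using a bare "%(message)s" format means each output line is exactly the JSON payload, which most log shippers can parse without extra rules.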

3. Aggregate in the data warehouse

Example query for cost per feature:

SELECT
  feature,
  DATE_TRUNC(timestamp, DAY) AS day,
  COUNT(*) AS requests,
  SUM(cost_usd) AS spend_usd,
  SUM(prompt_tokens + completion_tokens) AS tokens,
  AVG(latency_ms) AS avg_latency_ms,
  SUM(cached_tokens) / NULLIF(SUM(prompt_tokens), 0) AS cache_hit_rate
FROM openai_events
WHERE environment = 'prod'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY feature, day
ORDER BY day DESC, spend_usd DESC;

Example query for customer margin:

SELECT
  customer_id,
  COUNT(*) AS requests,
  SUM(cost_usd) AS llm_cost_usd
FROM openai_events
WHERE environment = 'prod'
  AND timestamp >= TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), MONTH)
GROUP BY customer_id
ORDER BY llm_cost_usd DESC;
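The query above returns cost only; turning it into a margin requires revenue per customer from your billing system. The sketch below assumes both are already loaded as plain dictionaries. gross_margin_by_customer and the sample figures are hypothetical, and the result covers LLM cost only, not other costs of serving the customer.

```python
def gross_margin_by_customer(llm_cost_usd, revenue_usd):
    """Return {customer_id: margin fraction}; None when revenue is zero or unknown."""
    margins = {}
    for customer_id, cost in llm_cost_usd.items():
        revenue = revenue_usd.get(customer_id, 0.0)
        margins[customer_id] = (revenue - cost) / revenue if revenue > 0 else None
    return margins

# Hypothetical month-to-date figures for illustration.
costs = {"cust_4291": 700.0, "cust_1000": 10.0, "cust_trial": 3.0}
revenue = {"cust_4291": 50.0, "cust_1000": 200.0}
margins = gross_margin_by_customer(costs, revenue)
```

A negative value, as for cust_4291 here, marks a customer whose LLM usage costs more than they pay.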

4. Build three core dashboard views

Use Grafana, Metabase, Looker, or Superset to build:

Spend by feature over time
Spend by customer over time
Top 20 most expensive routes yesterday

This is the operational dashboard that engineering and product should look at every day.

5. Test the wrapper with Apidog before deploying

Many teams build the wrapper and skip verification. The schema then breaks silently and the dashboard shows wrong numbers.

Use Apidog to test end-to-end:

Create a scenario that sends requests to your AI endpoint
Send a controlled customer_id, feature, and input
Check that the response succeeds
Check that the log payload contains feature, route, customer_id, and environment
Assert that prompt_tokens > 0
Assert that cost_usd > 0
Run it again in staging and prod using Apidog environment variables
Replay tagged requests and check that retries are not double-counted

Example field values to assert on in the test:

{
  "feature": "support-chat",
  "route": "/api/v1/chat/answer",
  "customer_id": "cust_test_001",
  "environment": "staging"
}
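The same checks can also run as a plain Python test against a captured log line. This is a sketch under assumptions: check_event is a hypothetical helper, and the event dictionary stands in for one parsed JSON line from your log pipeline.

```python
REQUIRED_FIELDS = ("feature", "route", "customer_id", "environment")

def check_event(event, expected):
    """Return a list of problems found in one logged event (empty list means pass)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if event.get(field) != expected.get(field):
            problems.append(f"{field}: expected {expected.get(field)!r}, got {event.get(field)!r}")
    if not event.get("prompt_tokens", 0) > 0:
        problems.append("prompt_tokens must be > 0")
    if not event.get("cost_usd", 0) > 0:
        problems.append("cost_usd must be > 0")
    return problems

expected = {
    "feature": "support-chat",
    "route": "/api/v1/chat/answer",
    "customer_id": "cust_test_001",
    "environment": "staging",
}
good_event = dict(expected, prompt_tokens=42, cost_usd=0.000321)
bad_event = dict(expected, environment="prod", prompt_tokens=0, cost_usd=0.0)
```

Returning a list of problems instead of raising on the first one makes a failed scenario easier to diagnose in a single run.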

For more on API testing, see API testing tools for QA engineers. If your team follows a contract-first workflow, see contract-first API development.

6. Set budget limits and alerts

Create OpenAI project keys split by environment or feature, for example:

prod-support-chat
prod-summarization
staging-all

Then set a hard limit in the OpenAI dashboard to guard against runaway spend.

Add alerts from the warehouse on top, for example a query that runs every 10 minutes:

SELECT
  feature,
  SUM(cost_usd) AS spend_last_hour
FROM openai_events
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND environment = 'prod'
GROUP BY feature
HAVING spend_last_hour > 50;

Send the alert to Slack, PagerDuty, or Opsgenie, depending on the stack your team uses.
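The fixed $50 threshold above can be complemented by the relative rule from the summary (spend per feature above 3x its average). The sketch below assumes the query results have already been loaded into dictionaries. features_over_baseline and the sample numbers are hypothetical.

```python
from statistics import mean

def features_over_baseline(spend_last_hour, hourly_history, multiplier=3.0):
    """Return features whose last-hour spend exceeds multiplier x their average hourly spend."""
    flagged = []
    for feature, spend in spend_last_hour.items():
        history = hourly_history.get(feature, [])
        if not history:
            continue  # no baseline yet; rely on the fixed threshold instead
        if spend > multiplier * mean(history):
            flagged.append(feature)
    return sorted(flagged)

# Hypothetical inputs: last hour's spend and recent hourly spend per feature, in USD.
spend_last_hour = {"support-chat": 40.0, "summarization": 5.0, "new-feature": 9.0}
hourly_history = {"support-chat": [10.0, 12.0, 8.0], "summarization": [4.0, 6.0, 5.0]}
flagged = features_over_baseline(spend_last_hour, hourly_history)
```

A relative rule catches a small feature that suddenly triples, which a single absolute threshold sized for the largest feature would miss.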

Advanced techniques and pro tips

Prompt caching

GPT-5.5 bills cached tokens at 50% of the input rate. Put the system prompt and any stable prefix at the front, and put request-specific data at the end of the prompt.
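A minimal sketch of that ordering, assuming a chat-style messages list: build_messages and its parameters are hypothetical names. The point is only that everything identical across requests comes first, so the shared prefix stays byte-for-byte the same.

```python
def build_messages(system_prompt, static_examples, request_context, user_input):
    """Order messages so the stable prefix comes first and per-request data comes last."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(static_examples)  # identical across requests
    # Anything that varies per request goes at the very end.
    messages.append({"role": "user", "content": f"{request_context}\n\n{user_input}"})
    return messages

SYSTEM = "You are a support assistant."
EXAMPLES = [
    {"role": "user", "content": "Example question"},
    {"role": "assistant", "content": "Example answer"},
]
first = build_messages(SYSTEM, EXAMPLES, "customer: cust_4291", "How do I reset my API key?")
second = build_messages(SYSTEM, EXAMPLES, "customer: cust_1000", "Where is my invoice?")
```

Here the two requests differ only in their final message, so the leading messages form a common prefix that a prefix cache can reuse.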

Track cache hits per feature:

SELECT
  feature,
  SUM(cached_tokens) / NULLIF(SUM(prompt_tokens), 0) AS cache_hit_rate
FROM openai_events
WHERE environment = 'prod'
GROUP BY feature
ORDER BY cache_hit_rate ASC;

If the cache hit rate drops after a deploy, a prompt change may be pushing costs up. See the official OpenAI prompt caching documentation for the detailed rules.

Batch API for offline work

Work that does not need a synchronous answer, such as nightly summarization, eval runs, embedding backfills, or document reprocessing, should use the Batch API to get the 50% discount.

Add a field such as batch_job_id to the event so you can trace it back to the originating workload.

Reasoning effort tuning

If you use reasoning/thinking mode, check whether the feature really needs high effort.

Approach:

Run an A/B test across low, medium, and high
Measure a quality metric
Measure reasoning_tokens and cost_usd
Use the cheapest level that still clears the quality bar
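The last step can be expressed as a small selection function. This is a sketch under assumptions: cheapest_passing_effort is a hypothetical helper, and the A/B results below are made-up numbers that only show the shape of the data.

```python
def cheapest_passing_effort(results, quality_bar):
    """Return the effort level with the lowest average cost whose quality meets the bar."""
    passing = [r for r in results if r["quality"] >= quality_bar]
    if not passing:
        return None  # nothing clears the bar; revisit the prompt or the model
    return min(passing, key=lambda r: r["avg_cost_usd"])["effort"]

# Hypothetical A/B results, for illustration only.
results = [
    {"effort": "low", "quality": 0.81, "avg_cost_usd": 0.004},
    {"effort": "medium", "quality": 0.90, "avg_cost_usd": 0.011},
    {"effort": "high", "quality": 0.92, "avg_cost_usd": 0.030},
]
choice = cheapest_passing_effort(results, quality_bar=0.88)
```

With these sample numbers, medium is selected because low misses the bar and high costs more for a small quality gain.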

For more detail, see How to use the GPT-5.5 API.

Context window discipline

Prompts that are longer than necessary raise cost directly. If you use RAG, cap the retrieval budget instead of stuffing the whole knowledge base into the context.
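A retrieval budget can be as simple as taking ranked chunks until a token cap is reached. The sketch below makes two assumptions: select_chunks and estimate_tokens are hypothetical helpers, and the four-characters-per-token estimate is a rough heuristic that a real tokenizer should replace.

```python
def estimate_tokens(text):
    """Rough estimate (about 4 characters per token); use a real tokenizer in practice."""
    return max(1, len(text) // 4)

def select_chunks(ranked_chunks, token_budget):
    """Keep chunks in relevance order until the next one would exceed the budget."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > token_budget:
            break
        selected.append(chunk)
        used += cost
    return selected, used

# Three retrieved chunks of about 100 estimated tokens each, most relevant first.
chunks = ["a" * 400, "b" * 400, "c" * 400]
selected, used = select_chunks(chunks, token_budget=250)
```

Stopping at the first chunk that does not fit keeps the selection in strict relevance order; skipping it and trying smaller chunks is an alternative if recall matters more.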

Track average prompt tokens:

SELECT
  feature,
  AVG(prompt_tokens) AS avg_prompt_tokens
FROM openai_events
WHERE environment = 'prod'
GROUP BY feature
ORDER BY avg_prompt_tokens DESC;

If avg_prompt_tokens rises week after week with no feature change, your prompts are bloating.

Watch for the token cliff at 272K

OpenAI applies a 2x multiplier on input and a 1.5x multiplier on output for requests that exceed 272K tokens. Add a guardrail to the wrapper:

if prompt_tokens > 250_000:
    logger.warning(json.dumps({
        "event": "openai.large_prompt_warning",
        "request_id": request_id,
        "feature": feature,
        "route": route,
        "prompt_tokens": prompt_tokens,
    }))

For pricing details, see the GPT-5.5 pricing post.
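To see how large the cliff is, the multipliers can be folded into a cost estimate. This sketch rests on two assumptions that the text does not settle: the threshold is compared against prompt tokens, and the multipliers apply to the whole request instead of only the tokens beyond 272K. long_context_cost_usd is a hypothetical helper and ignores cached tokens for brevity.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # tokens, the figure quoted in this section
INPUT_MULTIPLIER = 2.0
OUTPUT_MULTIPLIER = 1.5

def long_context_cost_usd(prompt_tokens, output_tokens, input_rate, output_rate):
    """Estimate cost in USD; rates are USD per 1M tokens.

    Assumes the multipliers apply to the entire request once the prompt
    exceeds the threshold (an assumption, check the pricing page).
    """
    over = prompt_tokens > LONG_CONTEXT_THRESHOLD
    in_mult = INPUT_MULTIPLIER if over else 1.0
    out_mult = OUTPUT_MULTIPLIER if over else 1.0
    cost = (prompt_tokens * input_rate * in_mult
            + output_tokens * output_rate * out_mult) / 1_000_000
    return round(cost, 6)

# One token over the threshold, using the gpt-5.5 rates from the pricing table.
just_under = long_context_cost_usd(272_000, 1_000, input_rate=5.00, output_rate=30.00)
just_over = long_context_cost_usd(272_001, 1_000, input_rate=5.00, output_rate=30.00)
```

Under these assumptions a single extra prompt token roughly doubles the request cost, which is why the warning above fires at 250K, before the threshold is reached.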

Cap spend per customer

For B2B SaaS, create a quota per customer_id.

Example logic:

def check_customer_ai_quota(customer_id):
    # get_monthly_llm_spend, get_customer_limit, and QuotaExceeded are
    # application-specific helpers assumed to be defined elsewhere.
    monthly_spend = get_monthly_llm_spend(customer_id)

    if monthly_spend >= get_customer_limit(customer_id):
        raise QuotaExceeded("Monthly AI quota exceeded")

When the quota is exceeded, return a 429 with a clear message and a billing CTA. This turns an AI feature from a margin risk into a product that can be profitable.
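The helpers in the logic above are left undefined, so here is one self-contained way they might fit together. Everything in this sketch is an assumption made for illustration: SpendLedger keeps month-to-date spend in memory (a real system would query the warehouse or a cache), and handle_request with its "/billing" path is a hypothetical stand-in for your HTTP layer.

```python
from collections import defaultdict

class QuotaExceeded(Exception):
    """Raised when a customer's month-to-date LLM spend reaches their limit."""

class SpendLedger:
    """In-memory month-to-date spend per customer (illustrative only)."""

    def __init__(self, limits_usd, default_limit_usd=0.0):
        self.limits_usd = limits_usd
        self.default_limit_usd = default_limit_usd
        self.spend_usd = defaultdict(float)

    def record(self, customer_id, cost_usd):
        self.spend_usd[customer_id] += cost_usd

    def check(self, customer_id):
        limit = self.limits_usd.get(customer_id, self.default_limit_usd)
        if self.spend_usd[customer_id] >= limit:
            raise QuotaExceeded(f"Monthly AI quota exceeded for {customer_id}")

def handle_request(ledger, customer_id):
    """Return an (HTTP status, body) pair; 429 when the quota is used up."""
    try:
        ledger.check(customer_id)
    except QuotaExceeded as exc:
        return 429, {"error": str(exc), "upgrade_url": "/billing"}  # hypothetical CTA path
    return 200, {"ok": True}

ledger = SpendLedger({"cust_4291": 10.0})
before = handle_request(ledger, "cust_4291")
ledger.record("cust_4291", 10.0)
after = handle_request(ledger, "cust_4291")
```

The check runs before the model call, so a customer at the limit costs nothing further, and the response body carries the upgrade path.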

Common mistakes to avoid

Counting reasoning tokens as input when they are billed as output
Relying on the OpenAI dashboard for real-time alerts even though its data lags
Tagging at the SDK level instead of the call site, which hides the real feature
Forgetting to tag background jobs such as cron, queue workers, and webhooks
Using customer_id = null; use internal or system instead
Sampling logs to reduce data volume, even though attribution needs every request
Not deduping retries, so cost gets counted twice
Not versioning the pricing table, so historical cost changes when new prices are loaded
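For the last item, one option is to store rates with an effective date and look them up by request date. This is a minimal sketch: PRICING_HISTORY and rates_on are hypothetical names, both effective dates are invented, and the older gpt-5.4-mini rates are made up purely to illustrate a price change (the newer entry copies the pricing table earlier in this guide).

```python
from datetime import date

# (effective_from, rates) per model, oldest first. USD per 1M tokens.
# Dates and the first entry are hypothetical; the second entry mirrors the guide's table.
PRICING_HISTORY = {
    "gpt-5.4-mini": [
        (date(2026, 1, 1), {"input": 0.30, "cached": 0.15, "output": 2.40}),
        (date(2026, 5, 1), {"input": 0.25, "cached": 0.125, "output": 2.00}),
    ],
}

def rates_on(model, day):
    """Return the rates that were in effect for `model` on `day`."""
    current = None
    for effective_from, rates in PRICING_HISTORY[model]:
        if effective_from <= day:
            current = rates  # later entries override earlier ones
    if current is None:
        raise LookupError(f"no pricing for {model} on {day}")
    return current

march_rates = rates_on("gpt-5.4-mini", date(2026, 3, 15))
may_rates = rates_on("gpt-5.4-mini", date(2026, 5, 12))
```

Because old entries are never overwritten, recomputing a March event after a May price change still yields the March cost.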

Alternatives and tools

You do not have to build everything yourself. This table helps you choose an approach:

Approach	What it does well	Cost	When to use it
OpenAI usage API	basic, no setup, matches the invoice	free	one project, one feature, no need for customer attribution
Helicone	proxy, dashboard, caching, cost per user	free tier; paid plans from $20/month	you want a hosted dashboard quickly and can accept a proxy
Langfuse	open source, self-host/cloud, tracing + cost	self-hosting is free; cloud from $29/month	you want observability and cost in one tool
LangSmith	LangChain integration, eval + cost	paid plans from $39/user/month	you already use LangChain
Custom warehouse	full control, fits your data stack, no proxy	engineering time	large workloads, custom dimensions, strict retention requirements

Trade-offs:

A proxy such as Helicone adds a dependency in the critical path
Self-hosted observability such as Langfuse gives you more control, but you have to run the infrastructure
A custom warehouse suits larger teams that want to integrate with an existing data stack
The OpenAI usage API works well for reconciliation but is not enough for product attribution

For further reading, see the Helicone team's guide to LLM cost tracking and Langfuse on cost tracking.

If you are doing this at the platform level, see API platforms for microservices architecture to learn how a cost-attribution wrapper fits into a service-mesh strategy.

Real-world case studies

B2B SaaS that needs LLM cost per customer

A company sells a sales intelligence product in which each customer triggers GPT-5.5 to generate summaries. Before attribution, the company knew only that it was spending $80,000/month on OpenAI.

After adding customer attribution, it found that 12% of customers generated 71% of the cost. The team adjusted its pricing tiers, added a soft quota to the lower plans, and charged overage per seat. The gross margin on the AI feature rose from 41% to 73% in one quarter.

Internal developer assistant

An engineering organization gives every developer access to an internal GPT-5.5 chat assistant, tagging customer_id with dev_email.

The platform team found that 3 developers accounted for 50% of internal spend. Two of them had left agent loops running without realizing it, and shutting the loops down saved $1,800/month. The third was a legitimate heavy user, so the data helped justify a higher quota for them.

Forecasting spend before a feature launch

A product team wanted to launch a summarization feature but did not know what it would cost. With per-feature data, the team could build a forecast from:

average prompt tokens per call
average output tokens per call
expected calls per active user
expected active users
model price

The forecast came out at $0.04 per active user per day, or $1.20 per month. The team priced the feature at $5/user/month, and finance approved it because the unit economics were clear.
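The forecast is plain arithmetic over those five inputs. In the sketch below, the function names and the input values (1,000 prompt tokens, 500 output tokens, 2 calls per user per day, 5,000 active users) are hypothetical; they were chosen so that, at the gpt-5.5 rates from the pricing table, the result lands on the $0.04 per day figure. The article does not state the team's actual inputs.

```python
def forecast_cost_per_user(avg_prompt_tokens, avg_output_tokens, calls_per_user_per_day,
                           input_rate, output_rate, days_per_month=30):
    """Return (daily, monthly) LLM cost per active user in USD; rates are USD per 1M tokens."""
    cost_per_call = (avg_prompt_tokens * input_rate
                     + avg_output_tokens * output_rate) / 1_000_000
    daily = cost_per_call * calls_per_user_per_day
    return round(daily, 6), round(daily * days_per_month, 6)

def forecast_monthly_total(monthly_cost_per_user, expected_active_users):
    """Return the total monthly LLM cost in USD for the expected user base."""
    return round(monthly_cost_per_user * expected_active_users, 2)

# Hypothetical inputs, for illustration only.
daily, monthly = forecast_cost_per_user(1_000, 500, 2, input_rate=5.00, output_rate=30.00)
total = forecast_monthly_total(monthly, expected_active_users=5_000)
```

Once the feature is live, the same function can be fed measured averages from the warehouse to check the forecast against reality.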

Conclusion

The OpenAI billing dashboard answers "how much did we spend?" but not "who or what caused it?" If you run LLMs in production, you need your own attribution layer.

What to do:

Tag every request with feature, route, customer_id, and environment
Compute the cost when you write the event
Use project keys split by environment or feature
Set a hard limit in the OpenAI dashboard
Create alerts from the warehouse
Test the wrapper with Apidog before deploying
Review reasoning effort, prompt size, and cache hit rate periodically

Download Apidog and use it to verify your cost-attribution wrapper end-to-end: send tagged requests, inspect the log payload, and replay scenarios across environments so you can trust the data in your warehouse.

For related articles, see the GPT-5.5 pricing breakdown and GitHub Copilot usage billing for API teams.

Frequently asked questions (FAQ)

Do reasoning tokens count as input or output?

They count as output. The OpenAI API reports them in usage.completion_tokens_details.reasoning_tokens as a breakdown of completion_tokens, so they are already included in that total. Bill completion_tokens at the output rate and do not add reasoning_tokens a second time. See the GPT-5.5 pricing breakdown for details.

How accurate is `response.usage` compared with the OpenAI dashboard?

The token counts in response.usage should match the dashboard. Discrepancies usually come from an outdated pricing table in your own system. Version the rates per model and update them when OpenAI changes its prices.

Are OpenAI project keys enough on their own?

Not for product attribution, because a project key only gives you the project dimension. Use project keys for environment segmentation and budget limits, and do feature, customer, and route attribution in the application layer.

Will retries and rate limits cause cost to be double-counted?

A request that fails before the model runs usually has no usage, so it should not be counted as cost. A request that succeeded and was then retried by the application can be counted twice if you do not dedupe. Reuse the same request_id for idempotent retries.
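A minimal sketch of that dedupe step, assuming retries reuse the same request_id as recommended above: dedupe_events is a hypothetical helper operating on parsed log events. It collapses duplicate records only; attempts that each returned usage were each billed by the provider, so preventing the duplicate call in the first place still matters.

```python
def dedupe_events(events):
    """Keep one event per request_id, preferring a successful attempt over a failed one."""
    by_id = {}
    for event in events:
        kept = by_id.get(event["request_id"])
        replace_failed = (
            kept is not None
            and kept.get("error_code") is not None
            and event.get("error_code") is None
        )
        if kept is None or replace_failed:
            by_id[event["request_id"]] = event
    return list(by_id.values())

# Hypothetical events: request "a" failed once with a 429, then was logged twice on success.
events = [
    {"request_id": "a", "error_code": "429", "cost_usd": 0.0},
    {"request_id": "a", "error_code": None, "cost_usd": 0.02},
    {"request_id": "a", "error_code": None, "cost_usd": 0.02},
    {"request_id": "b", "error_code": None, "cost_usd": 0.01},
]
unique = dedupe_events(events)
total_cost = round(sum(e["cost_usd"] for e in unique), 6)
```

In a warehouse the same rule is usually expressed as a window function or a merge keyed on request_id.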

Is the OpenAI usage API fast enough for alerts?

It is not suited to real-time alerts because it lags by several minutes to tens of minutes. Use your own warehouse or log pipeline for alerts and kill switches, and use the usage API for monthly reconciliation.

Should I sample request logs to reduce data volume?

No. One request is one JSON line, and that volume is small compared with the accuracy you need for customer and route attribution.

Can I use this approach with other LLM providers?

Yes. Add a provider column with values such as openai, anthropic, google, or deepseek, and keep a separate pricing table per provider. The wrappers may differ, but the warehouse schema can be shared. For an example of another provider's pricing, see DeepSeek V4 API pricing.

Does it work for embeddings and image generation?

Yes, but each endpoint needs its own cost function. Embeddings, for example, are billed by input token, while image generation is billed per image by resolution. Adding an endpoint column with values such as chat, embeddings, or image is recommended.
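One way to organize endpoint-specific cost functions is a small dispatch table keyed by the endpoint column. The sketch below is illustrative only: the function names are hypothetical, and every rate is a placeholder that does not reflect any provider's published prices.

```python
# Placeholder rates for illustration; replace with your provider's published prices.
EMBEDDING_RATE_PER_1M = 0.10  # USD per 1M input tokens (hypothetical)
IMAGE_RATE_PER_IMAGE = {"1024x1024": 0.04, "1792x1024": 0.08}  # USD per image (hypothetical)

def embeddings_cost_usd(input_tokens):
    """Embeddings are billed on input tokens only."""
    return round(input_tokens * EMBEDDING_RATE_PER_1M / 1_000_000, 6)

def image_cost_usd(resolution, count=1):
    """Image generation is billed per image, with the rate set by resolution."""
    return round(IMAGE_RATE_PER_IMAGE[resolution] * count, 6)

# Dispatch on the endpoint column so the event writer stays generic.
COST_FUNCTIONS = {
    "embeddings": lambda usage: embeddings_cost_usd(usage["input_tokens"]),
    "image": lambda usage: image_cost_usd(usage["resolution"], usage.get("count", 1)),
}

embed_cost = COST_FUNCTIONS["embeddings"]({"input_tokens": 2_000_000})
image_cost = COST_FUNCTIONS["image"]({"resolution": "1024x1024", "count": 3})
```

An unknown endpoint or resolution raises KeyError here, which is preferable to silently logging a cost of zero.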