🔍 The Problem
Most OpenAI-compatible APIs send token usage only in the final chunk of a stream.
So if a user:
- refreshes
- closes the tab
👉 the stream stops
👉 usage data never arrives
But the user has already seen part of the response.
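The failure mode is easy to reproduce with a small self-contained simulation. The chunk shape below loosely mirrors the OpenAI streaming format, but `fake_stream` and `consume` are illustrative names, not part of any client library:

```python
# Minimal simulation of an OpenAI-style stream: usage arrives only
# in the final chunk, so an interrupted consumer never sees it.

def fake_stream():
    # Content chunks carry text; only the last chunk carries usage.
    yield {"choices": [{"delta": {"content": "Hello"}}], "usage": None}
    yield {"choices": [{"delta": {"content": " world"}}], "usage": None}
    yield {"choices": [{"delta": {}}], "usage": {"total_tokens": 7}}

def consume(stream, interrupt_after=None):
    """Collect content; return (text, usage). Stops early to model a
    client disconnect when interrupt_after is set."""
    text, usage = "", None
    for i, chunk in enumerate(stream):
        delta = chunk["choices"][0]["delta"]
        text += delta.get("content", "")
        if chunk["usage"] is not None:
            usage = chunk["usage"]
        if interrupt_after is not None and i + 1 >= interrupt_after:
            break  # user refreshed / closed the tab
    return text, usage

full_text, full_usage = consume(fake_stream())                     # usage present
part_text, part_usage = consume(fake_stream(), interrupt_after=2)  # usage lost
```

Interrupt the loop one chunk early and the user has the full visible text, but `usage` is `None`.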
⚠️ Why This Matters
- ❌ tokens are not recorded
- ❌ users get partial responses for free
- ❌ the gap can be deliberately abused
- ❌ billing becomes inaccurate
🛠️ The Fix
I added a fallback inside the streaming loop's `finally` block, so it runs whether the stream completes or is interrupted.
👉 If no usage data is received:
- calculate tokens manually using tiktoken
- count prompt + generated output
```python
import tiktoken

# Fallback: the stream ended without a usage chunk, so count tokens ourselves.
if not tokens_used:
    enc = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)
    output_tokens = len(enc.encode(full_content))
    tokens_used = prompt_tokens + output_tokens
```
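To show the fallback in context, here is a minimal sketch of the whole metering loop. `meter_stream` is a hypothetical helper, and `count_tokens` is a whitespace stand-in for tiktoken's encoder so the sketch runs dependency-free; the real code uses `tiktoken` as shown above:

```python
def count_tokens(text):
    # Whitespace stand-in for tiktoken's encoder (illustration only).
    return len(text.split())

def meter_stream(chunks, messages):
    """Consume a stream of {'content': ..., 'usage': ...} chunks and
    always return (full_content, tokens_used), even when the provider's
    usage chunk never arrives."""
    full_content = ""
    tokens_used = None
    try:
        for chunk in chunks:
            full_content += chunk.get("content", "")
            if chunk.get("usage"):
                tokens_used = chunk["usage"]["total_tokens"]
    finally:
        # Runs on normal completion AND on interruption.
        if not tokens_used:
            prompt_tokens = sum(count_tokens(m["content"]) for m in messages)
            output_tokens = count_tokens(full_content)
            tokens_used = prompt_tokens + output_tokens
    return full_content, tokens_used
```

If the usage chunk arrives, it wins; if the stream dies first, the `finally` branch reconstructs a count from the prompt plus whatever output was already sent.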
✔️ Result
- ✔️ tokens are tracked even if the stream is interrupted
- ✔️ no free-token exploits
- ✔️ accurate usage tracking
💡 Takeaway
Streaming APIs don't guarantee usage data.
If you're building anything with token limits or billing, don't depend only on the final chunk.