The API Gateway vs AI Gateway distinction is the key insight here. I've seen teams try to bolt token tracking onto Kong or Nginx and it always ends up as a fragile custom plugin that breaks on streaming responses. The moment you need to count tokens mid-stream or enforce per-team budgets at the request level, generic reverse proxies fall apart. One thing I'd add: semantic caching is where the real cost savings hide. Most teams have 20-30% duplicate or near-duplicate prompts across users, and caching those at the gateway layer cuts spend without any application code changes.