The Problem With Talking Directly to LLMs
Most teams start by wiring their app straight to the OpenAI API. It works — until you need to add auth, rate limiting, observability, or swap out the model provider. Now you're rewriting application code instead of config.
An AI Gateway solves this: one entry point, one place to govern traffic, and swappable providers. Kong Gateway is a mature choice here. It has been proxying and governing API traffic for years, and the AI Proxy plugin extends that to LLM traffic.
This post walks through the key ideas. For the full step-by-step guide, head over to the tutorial on Hashnode.
What We're Building
A Kong Gateway 3.14 data plane running on Kubernetes (kind locally), connected to a Kong Konnect control plane. The AI Proxy plugin sits on a route and handles forwarding to OpenAI — your app just talks to Kong.
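A Konnect-connected data plane on Kubernetes is usually installed with the `kong/kong` Helm chart in DB-less data-plane mode. The sketch below shows only the connection-related values and rests on several assumptions: the cluster certificate and key downloaded from Konnect live in a Kubernetes TLS secret named `kong-cluster-cert` (a hypothetical name), and `<cp-endpoint>` / `<tp-endpoint>` are placeholders for the control-plane and telemetry endpoints Konnect shows for your control plane. The tutorial's complete values file may differ.

```yaml
# values.yaml: sketch of a Konnect data plane; <...> are placeholders
image:
  repository: kong/kong-gateway
  tag: "3.14"

# The chart mounts this secret at /etc/secrets/kong-cluster-cert
secretVolumes:
  - kong-cluster-cert

admin:
  enabled: false        # config arrives from Konnect, not a local Admin API

env:
  role: data_plane
  database: "off"       # data planes run DB-less
  konnect_mode: "on"
  cluster_mtls: pki
  cluster_control_plane: <cp-endpoint>:443
  cluster_server_name: <cp-endpoint>
  cluster_telemetry_endpoint: <tp-endpoint>:443
  cluster_telemetry_server_name: <tp-endpoint>
  cluster_cert: /etc/secrets/kong-cluster-cert/tls.crt
  cluster_cert_key: /etc/secrets/kong-cluster-cert/tls.key
  lua_ssl_trusted_certificate: system

ingressController:
  enabled: false        # Konnect, not the ingress controller, owns config here
```

Each key under `env` becomes a `KONG_`-prefixed environment variable in the Kong container, which is how the chart passes data-plane settings through.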
```text
Your app
  → POST /ai/chat (Kong proxy)
  → AI Proxy plugin attaches API key
  → OpenAI API
  → response back to your app
```
Your app never holds an OpenAI key; Kong does. Rate limiting, logging, and model swapping become gateway plugins and config rather than application code.
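To make the logging point concrete: in the decK state-file format covered in the next section, attaching Kong's bundled `file-log` plugin to the AI route writes one JSON log line per request. A sketch, assuming the route is named `openai-chat-route` and that logging to the container's stdout suits your cluster's log collection:

```yaml
# Sketch: top-level plugins section of a decK state file
plugins:
  - name: file-log
    route: openai-chat-route   # scope the plugin to the AI route only
    config:
      path: /dev/stdout        # one JSON object per request, on stdout
```

No application code changes: after a sync, every call through /ai/chat is logged by the gateway.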
The Key Bit: decK Config as Code
The most interesting part of this setup is using decK to define the service, route, and plugin as a YAML state file — then syncing it to Konnect, which pushes it down to the data plane automatically.
```yaml
# kong-ai.yaml
_format_version: "3.0"
services:
  - name: openai-service
    url: https://api.openai.com
    routes:
      - name: openai-chat-route
        paths:
          - /ai/chat
        plugins:
          - name: ai-proxy
            config:
              route_type: llm/v1/chat
              auth:
                header_name: Authorization
                # decK reads the key from the DECK_OPENAI_API_KEY env var at sync time
                header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
              model:
                provider: openai
                name: gpt-4o
                options:
                  max_tokens: 512
```
A single sync command updates the Konnect control plane, which then pushes the config to every connected data plane:
```bash
deck gateway sync kong-ai.yaml \
  --konnect-token "$KONNECT_TOKEN" \
  --konnect-control-plane-name "kong-ai-tutorial"
```
Once it's live, a single HTTPie call confirms the whole chain is working:
```bash
http POST localhost:8080/ai/chat \
  Content-Type:application/json \
  messages:='[{"role": "user", "content": "What is Kong Gateway in one sentence?"}]'
```
The response comes back in the standard OpenAI chat format because AI Proxy normalises it, so your app's parsing code stays the same even if you later swap the underlying model.
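Swapping the model is a small edit to the plugin config followed by another `deck gateway sync`. A sketch of the changed `model` block, assuming you want OpenAI's `gpt-4o-mini` in place of `gpt-4o`; the request your app sends to /ai/chat does not change:

```yaml
# Sketch: only the model block of the ai-proxy config changes
model:
  provider: openai
  name: gpt-4o-mini   # previously gpt-4o
  options:
    max_tokens: 512
```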
Try It Yourself
The full tutorial covers:
- Creating a kind cluster with correct port mappings
- Setting up a Konnect control plane and downloading cluster certs
- Creating a System Account with an Admin role and an access token (the right way to handle automated access)
- Installing Kong 3.14 via Helm with a complete values file
- Full decK state file for the AI Proxy plugin
- Troubleshooting guide for the common failure modes
👉 Kong AI Gateway on Kubernetes: Proxy OpenAI via Konnect
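As a taste of the first item on that list: a kind cluster config that forwards host port 8080 into the cluster. This sketch assumes Kong's proxy Service is exposed as NodePort 30080, an arbitrary port chosen here (in the `kong/kong` chart that would be `proxy.type: NodePort` with `proxy.http.nodePort: 30080`); the tutorial's actual ports may differ.

```yaml
# kind-config.yaml: sketch only; 30080 is an assumed NodePort
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraPortMappings:
      # localhost:8080 on the host -> node port 30080 (Kong proxy Service)
      - containerPort: 30080
        hostPort: 8080
        protocol: TCP
```

Create the cluster with `kind create cluster --config kind-config.yaml`; port mappings can only be set at creation time, which is why getting them right up front matters.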
What's Next
Once you have this baseline, the interesting extensions are:
- Adding rate limiting per consumer so you don't blow your OpenAI budget
- Routing to multiple providers (OpenAI + Anthropic) behind a single endpoint
- JWT auth so only your services can hit the AI routes
All of those are just more decK config. That's the point.
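For example, here is a sketch of JWT auth plus per-consumer rate limiting on the AI route, merged into the same state file. The consumer `chat-backend`, its issuer key, and the `DECK_CHAT_BACKEND_JWT_SECRET` variable are hypothetical, and the limit of 20 requests per minute is arbitrary:

```yaml
# Sketch: hypothetical consumer and secret, merged into kong-ai.yaml
consumers:
  - username: chat-backend              # hypothetical calling service
    jwt_secrets:
      - key: chat-backend-issuer        # matched against the token's iss claim
        algorithm: HS256
        secret: ${{ env "DECK_CHAT_BACKEND_JWT_SECRET" }}

plugins:
  # Only requests carrying a valid, unexpired JWT reach the AI route
  - name: jwt
    route: openai-chat-route
    config:
      claims_to_verify:
        - exp
  # Per-consumer budget, counted in requests (not LLM tokens)
  - name: rate-limiting
    route: openai-chat-route
    config:
      minute: 20
      limit_by: consumer
      policy: local
```

The bundled `rate-limiting` plugin counts requests; limiting by LLM tokens is the job of Kong's separate AI Rate Limiting Advanced plugin, which is an Enterprise feature.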
✏️ Drafted with KewBot (AI), edited and approved by Drew.