Andrew Kew

Proxy OpenAI Through Kong AI Gateway on Kubernetes

The Problem With Talking Directly to LLMs

Most teams start by wiring their app straight to the OpenAI API. It works — until you need to add auth, rate limiting, observability, or swap out the model provider. Now you're rewriting application code instead of config.

An AI Gateway solves this: one entry point, one place to govern traffic, and swappable providers. Kong Gateway is a mature choice here. It has handled routing, auth, and rate limiting for APIs for years, and the AI Proxy plugin extends that to LLMs.

This post walks through the key ideas. For the full step-by-step guide, head over to the tutorial on Hashnode.


What We're Building

A Kong Gateway 3.14 data plane running on Kubernetes (kind locally), connected to a Kong Konnect control plane. The AI Proxy plugin sits on a route and handles forwarding to OpenAI — your app just talks to Kong.

Your app
  → POST /ai/chat (Kong proxy)
    → AI Proxy plugin attaches API key
      → OpenAI API
        → response back to your app

Your app never holds an OpenAI key. Kong does. Rate limiting, logging, and model-swapping become gateway configuration rather than application code.


The Key Bit: decK Config as Code

The most interesting part of this setup is using decK to define the service, route, and plugin as a YAML state file — then syncing it to Konnect, which pushes it down to the data plane automatically.

# kong-ai.yaml
_format_version: "3.0"

services:
  - name: openai-service
    url: https://api.openai.com
    routes:
      - name: openai-chat-route
        paths:
          - /ai/chat
        plugins:
          - name: ai-proxy
            config:
              route_type: llm/v1/chat
              auth:
                header_name: Authorization
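                # decK does not expand shell-style variables; it reads DECK_-prefixed
                # environment variables, so export DECK_OPENAI_API_KEY before syncing.
                header_value: 'Bearer ${{ env "DECK_OPENAI_API_KEY" }}'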
              model:
                provider: openai
                name: gpt-4o
                options:
                  max_tokens: 512

One sync command and Konnect pushes the config to every connected data plane:

deck gateway sync kong-ai.yaml \
  --konnect-token "$KONNECT_TOKEN" \
  --konnect-control-plane-name "kong-ai-tutorial"
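
Because a sync applies changes straight away, it can help to preview them first. The sketch below wraps the shared Konnect flags in a small shell function, so that `deck gateway diff` (which only reports differences) and `deck gateway sync` always target the same control plane. The function name `deck_konnect` and the `KONNECT_CP_NAME` and `KONG_DECK_CMD` variables are illustrative assumptions, not part of the tutorial.

```shell
# Hypothetical helper: run a decK gateway subcommand against one Konnect control plane.
# Usage: deck_konnect <diff|sync> <state-file>
deck_konnect() {
  subcommand="$1"
  state_file="$2"
  # KONG_DECK_CMD is an assumed override for the binary; it defaults to the real CLI.
  "${KONG_DECK_CMD:-deck}" gateway "$subcommand" "$state_file" \
    --konnect-token "$KONNECT_TOKEN" \
    --konnect-control-plane-name "${KONNECT_CP_NAME:-kong-ai-tutorial}"
}
```

Running `deck_konnect diff kong-ai.yaml` shows what would change, and `deck_konnect sync kong-ai.yaml` then applies it.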

Once it's live, a single HTTPie call confirms the whole chain is working:

http POST localhost:8080/ai/chat \
  Content-Type:application/json \
  messages:='[{"role": "user", "content": "What is Kong Gateway in one sentence?"}]'

The response comes back in the standard OpenAI chat format because Kong normalises it, so your client code stays the same even if you swap the underlying model later.
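
For scripts or CI jobs where HTTPie is not installed, the same check can be sketched with curl. The helper names `chat_body` and `ask_kong` and the `KONG_PROXY_URL` variable are illustrative assumptions. The escaping is deliberately minimal (backslashes and double quotes only), so prompts containing newlines or control characters would need a real JSON encoder. Note that the request carries no Authorization header, since Kong attaches the OpenAI key.

```shell
# Build the JSON body for the chat route (OpenAI-style messages array).
# Only backslashes and double quotes are escaped; this is a sketch, not a full JSON encoder.
chat_body() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"messages":[{"role":"user","content":"%s"}]}\n' "$escaped"
}

# Send one prompt through Kong. KONG_PROXY_URL is an assumed variable that
# defaults to the address used in the HTTPie call above.
ask_kong() {
  chat_body "$1" | curl -sS -X POST "${KONG_PROXY_URL:-http://localhost:8080}/ai/chat" \
    -H 'Content-Type: application/json' \
    --data-binary @-
}
```

Calling `ask_kong "What is Kong Gateway in one sentence?"` prints the raw JSON response. If jq is available, piping that output through `jq -r '.choices[0].message.content'` should print only the reply text.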


Try It Yourself

The full tutorial covers:

  • Creating a kind cluster with correct port mappings
  • Setting up a Konnect control plane and downloading cluster certs
  • Creating a System Account + Admin Role + PAT (the right way to handle automated access)
  • Installing Kong 3.14 via Helm with a complete values file
  • Full decK state file for the AI Proxy plugin
  • Troubleshooting guide for the common failure modes

👉 Kong AI Gateway on Kubernetes: Proxy OpenAI via Konnect


What's Next

Once you have this baseline, the interesting extensions are:

  • Adding rate limiting per consumer so you don't blow your OpenAI budget
  • Routing to multiple providers (OpenAI + Anthropic) behind a single endpoint
  • JWT auth so only your services can hit the AI routes

All of those are just more decK config. That's the point.


✏️ Drafted with KewBot (AI), edited and approved by Drew.
