## TL;DR
I built a local AI gateway using Envoy, Rust, and Kubernetes to understand how AI traffic actually works.
It broke multiple times. I fixed it. I learned a lot.
## Why I Did This
I wanted to understand how AI gateways actually work.
Not the diagrams.
Not the marketing slides.
The real system — the code, the flow, the failures.
So I built one.
Three weeks later, I had something working.
But getting there meant debugging cryptic errors, chasing version mismatches, and nearly giving up a few times.
Here's what I learned.
## What I Built

A local AI Gateway that looks like this:

```
curl → agentgateway proxy → Rust module → httpbun (mock LLM) → response
```
Everything runs locally using kind (Kubernetes in Docker):
- No cloud costs
- No API keys
- Fully reproducible
Components:
- Envoy → handles traffic
- kgateway + agentgateway → control plane
- Rust module → request/response transformation
- httpbun → fake OpenAI-compatible LLM
This isn't production-ready.
It's a learning lab — and it taught me more than any tutorial ever could.
## Why Even Build This?
AI traffic isn't like regular API traffic.
When calling an LLM, you often need to:
- Inject system prompts
- Mask sensitive data
- Route requests to different models
- Track tokens and cost usage
Traditional API gateways don't handle this well.
That's where kgateway comes in — it lets you extend Envoy with custom logic using Rust.
That's what I wanted to explore.
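To make the idea concrete, here is a rough sketch of one such transformation (system-prompt injection) in plain Rust. This is not the Envoy dynamic module SDK API; the function name and the naive string splicing are my own simplifications, and a real module would parse the body as JSON rather than match on raw text:

```rust
/// Splice a system message into an OpenAI-style chat request body.
/// Sketch only: assumes the body contains a literal `"messages":[` with
/// no whitespace, which holds for compact JSON but not pretty-printed.
fn inject_system_prompt(body: &str, prompt: &str) -> String {
    match body.find("\"messages\":[") {
        Some(pos) => {
            let insert_at = pos + "\"messages\":[".len();
            let system = format!("{{\"role\":\"system\",\"content\":\"{}\"}},", prompt);
            let mut out = String::with_capacity(body.len() + system.len());
            out.push_str(&body[..insert_at]); // everything up to the array start
            out.push_str(&system);            // the injected system message
            out.push_str(&body[insert_at..]); // the original messages
            out
        }
        // Leave bodies we don't recognize untouched rather than corrupt them.
        None => body.to_string(),
    }
}
```

The same hook point is where masking, model routing, and token accounting would live.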
## The Stack
| Tool | Role |
|---|---|
| kind | Local Kubernetes cluster |
| kgateway + agentgateway | Gateway control plane |
| Envoy | Data plane proxy |
| Rust | Custom transformation logic |
| httpbun | Mock LLM backend |
Everything is open source. Everything runs locally.
## Architecture

*Request flow through the AI Gateway. Numbers show the sequence from client request to mock LLM response. (Source: draw.io)*

This diagram looks simple — but getting each step to work correctly took hours of debugging.
## The Problems That Almost Broke Me

### 1. Rust Versions Move Fast

One day everything worked. The next day:

```
error: feature `edition2024` is required
```

A dependency (`getrandom`) needed a newer Rust version than I had.

**Fix:** Upgraded Rust in my Dockerfile (1.75 → 1.85).

**Lesson:** Pin versions — or be ready to chase updates.
### 2. The "Undefined Symbol" Nightmare

Envoy crashed with:

```
undefined symbol: envoy_dynamic_module_callback_http_add_response_header
```

Everything looked correct.

**Root cause:** My SDK didn't match the Envoy version.

**Fix:** Used the official SDK directly from Envoy source.

**Lesson:** Version mismatches in Envoy dynamic modules will break everything. No shortcuts.
### 3. The `filter_config` Mystery

Envoy kept throwing:

```
error parsing filter config: EOF while parsing a value
```

Tried everything:

- `{}`
- `"{}"`
- YAML tricks…

Nothing worked.

**Fix:**

```yaml
filter_config:
  "@type": type.googleapis.com/google.protobuf.StringValue
  value: "{}"
```

**Lesson:** Sometimes the docs do have the answer — you just haven't found it yet.
## The Moment It Worked

Then I ran:

```bash
curl -X POST http://localhost:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'
```

And got:

```json
{
  "choices": [{
    "message": {
      "content": "This is a mock chat response from httpbun."
    }
  }]
}
```
That moment hits differently.
Everything connected:

- Rust module
- Gateway routing
- Mock LLM response
## Why I Used a Mock LLM

Real LLMs:

- Cost money
- Require API keys
- Add latency

So I used httpbun, which mimics OpenAI APIs locally.

This made the project:

- Fully local
- Reproducible
- Beginner-friendly
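There's no magic in a mock like this. Stripped to its essence, an OpenAI-compatible mock is just an HTTP server that returns a canned completion. Here's a toy, dependency-free Rust version; the port, function name, and response text are arbitrary placeholders of mine, not what httpbun actually serves:

```rust
use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread;

// A toy stand-in for httpbun's role in the lab: accept one HTTP request
// and answer with a canned OpenAI-style chat completion.
fn serve_mock_llm_once(port: u16) -> thread::JoinHandle<()> {
    let listener = TcpListener::bind(("127.0.0.1", port)).expect("bind failed");
    thread::spawn(move || {
        if let Ok((mut stream, _)) = listener.accept() {
            // Read (and ignore) whatever request came in; a real mock
            // would at least check the method and path.
            let mut buf = [0u8; 4096];
            let _ = stream.read(&mut buf);
            let body = r#"{"choices":[{"message":{"content":"mock response"}}]}"#;
            let resp = format!(
                "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
                body.len(),
                body
            );
            let _ = stream.write_all(resp.as_bytes());
        }
    })
}
```

Point the gateway's backend at a server like this and the whole request path can be exercised offline.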
## What I Learned

### For Platform Engineers

- Envoy dynamic modules are powerful — but strict
- Version alignment is critical
- Gateway API is worth learning deeply

### For Documentation Engineers

- Broken systems reveal real documentation gaps
- Every error is a learning opportunity
- Keeping a debug log is invaluable

### For Everyone

- Read the docs
- Match versions exactly
- Start with mocks before real integrations
## The Code

👉 link

Includes:

- Kubernetes manifests
- Rust source code
- Docker setup
- Quick start guide
You can run everything locally in ~10 minutes.
## What's Next

To make this production-ready:

- Replace httpbun with a real LLM (Ollama / OpenAI)
- Add auth + rate limiting
- Build more advanced Rust transformations
## Final Thoughts
Building from scratch forces understanding.
You don't just "use" tools — you see how they break, how they connect, and why they exist.
That's where real learning happens.
If you're curious about AI infrastructure:
Build something. Break it. Fix it. Write about it.
