floworkos

Posted on Jun 11

Flow Router: Build Your Own LLM Gateway with OpenAI-Compatible Routing in One Binary

#llm #routing #openai #gateway

Most teams end up juggling multiple API keys, vendor lock-in fears, and cost surprises when scaling LLM applications. Flow Router solves this differently: instead of choosing one provider, you get a single OpenAI-compatible endpoint that intelligently routes requests to whatever backs your stack.

The Problem It Solves

If you've integrated OpenAI into your application, you've hit at least one of these friction points:

Vendor lock-in: Switching providers means rewriting client code
Cost unpredictability: No way to automatically fall back to cheaper models without manual intervention
Multi-provider complexity: Managing keys, rate limits, and format differences across Anthropic, Gemini, local Llama, and others becomes a coordination nightmare
Local LLM adoption friction: You've got a local model running, but your existing agents and tools expect OpenAI format

Flow Router's core idea is elegant: one unified gateway that translates requests to whatever you want underneath.

What's Under the Hood

Single Binary, No Infrastructure Overhead

Flow Router ships as a standalone Go binary (http://127.0.0.1:2402/v1 by default). No Docker containers to orchestrate, no Python runtime to manage, no database setup. This matters more than it sounds—especially if you're running on a Raspberry Pi or in a resource-constrained environment. Cold start: milliseconds. Memory footprint: single-digit MB.

Your entire stack can look like this:

Your App → Flow Router (port 2402) → OpenAI / Anthropic / Local Llama / Azure

Transparent Format Translation

The web still runs on incompatible APIs. Anthropic's format differs from Anthropic's format, which differs from Gemini's. Flow Router handles this translation layer:

Accepts standard OpenAI /chat/completions requests
Detects the target provider in routing rules
Converts request/response shapes automatically
Your application code never knows the difference

In practice:

# Your app sends this (OpenAI format)
curl http://127.0.0.1:2402/v1/chat/completions \
  -H "Authorization: Bearer any-key" \
  -d '{"model": "gpt-4", "messages": [...]}'

# Flow Router routes to your chosen provider (Anthropic, local, etc.)
# and returns OpenAI-formatted responses

Intelligent Fallback Chains

Cost optimization and resilience are the same problem. Flow Router lets you define chains like:

Try gpt-4-turbo (expensive, low latency, highest quality)
Fall back to claude-3 if rate-limited or quota-exceeded
Fall back to mistral-local if both are down (fully offline)
Return error only if all fail

You can weight these by:

Priority: Try in order
Round-robin: Distribute load evenly
Cost-optimal: Minimize spend while meeting latency/quality thresholds

Real example use case: your agent runs 100 short reasoning tasks daily and 5 complex ones. Route shorts to Mistral (cheap), complex ones to GPT-4. One routing rule, no code changes.

Token Optimization

LLM billing is token-based. Flow Router includes a token-saver that:

Caches identical requests (with TTL config)
Strips redundant whitespace and reformats before sending
Tracks token usage per model, per prompt
Helps you measure: "Does switching to a smaller model for this task cost us in quality?"

The knowledge base uses FTS5 full-text search (built-in, SQLite-based) so you can embed domain knowledge without external search infrastructure:

Knowledge Brain (FTS5) ← Your documents
                      ↓
                   Query against RAG context
                      ↓
                   Augment LLM prompt

Subscription Cloaking & Multi-Tenancy

If you're routing through your own OpenAI organization account, Flow Router can:

Accept client-provided API keys via headers
Mask them server-side to prevent leaks
Route each request to the correct account/org
Track usage per client for billing

This is critical if you're building a small AI SaaS without wanting to become an API gateway vendor yourself.

Optional P2P Mesh (Offline Resilience)

In edge deployments, Flow Router can form a peer-to-peer mesh. If your primary gateway node goes down, requests route to peers still online. The knowledge base (FTS5) replicates across the mesh, so even isolated nodes can serve cached queries.

Trade-off: P2P adds latency and complexity. It's optional; most teams don't enable it.

Real-World Trade-Offs

Strengths:

Genuinely lightweight (Go binary, no runtime)
Transparent to existing OpenAI clients (drop-in compatibility)
Flexible fallback logic without code changes
Local-first philosophy (runs offline, P2P optional)

Honest Limitations:

Streaming support exists but is less tested than standard OpenAI calls
Format translation is deterministic but may not 100% preserve edge-case API quirks across all providers
Token optimization cache adds latency (slight) to first request; subsequent hits are instant
Knowledge base is SQLite/FTS5; scales to millions of documents but isn't Elasticsearch

You're trading some API completeness for operational simplicity and sovereignty.

Getting Started

# Download the binary
wget https://github.com/flowrouter/flowrouter/releases/latest

# Define your routing rules (YAML)
cat > config.yaml <<EOF
chains:
  default:
    - model: gpt-4
      provider: openai
      key: ${OPENAI_KEY}
    - model: claude-3-sonnet
      provider: anthropic
      key: ${ANTHROPIC_KEY}
    - model: mistral-7b
      provider: local
      url: http://127.0.0.1:8000/v1
EOF

# Run
./flowrouter --config config.yaml

Your application immediately gains:

One endpoint for all models
Automatic fallbacks
Token tracking
Knowledge brain for RAG
Zero additional infrastructure

When to Use It

Flow Router is a fit if you:

Run multiple LLM providers and want unified routing
Are building multi-tenant LLM services
Want to run local models alongside cloud APIs
Care about operational sovereignty (no SaaS gateway)
Need knowledge base + chat in one system

It's not a fit if:

You're deeply invested in a single provider's ecosystem
You need real-time streaming with complex fallback logic
Your knowledge base is already in Pinecone/Weaviate

Closing

Flow Router is honest about what it is: a pragmatic routing layer for teams that want LLM flexibility without vendor lock-in or infrastructure sprawl. One binary, OpenAI-compatible, offline-capable. That simplicity is its strength.

Flowork is open source — both products:

🤖 Flowork Agent (the self-hosted agent OS): https://github.com/flowork-os/Flowork_Agent
🛣️ Flow Router (the sovereign LLM gateway): https://github.com/flowork-os/flowork_Router

DEV Community