DEV Community

floworkos
floworkos

Posted on

Flow Router: Build Your Own LLM Gateway with OpenAI-Compatible Routing in One Binary

Most teams end up juggling multiple API keys, vendor lock-in fears, and cost surprises when scaling LLM applications. Flow Router solves this differently: instead of choosing one provider, you get a single OpenAI-compatible endpoint that intelligently routes requests to whatever backs your stack.

The Problem It Solves

If you've integrated OpenAI into your application, you've hit at least one of these friction points:

  • Vendor lock-in: Switching providers means rewriting client code
  • Cost unpredictability: No way to automatically fall back to cheaper models without manual intervention
  • Multi-provider complexity: Managing keys, rate limits, and format differences across Anthropic, Gemini, local Llama, and others becomes a coordination nightmare
  • Local LLM adoption friction: You've got a local model running, but your existing agents and tools expect OpenAI format

Flow Router's core idea is elegant: one unified gateway that translates requests to whatever you want underneath.

What's Under the Hood

Single Binary, No Infrastructure Overhead

Flow Router ships as a standalone Go binary (http://127.0.0.1:2402/v1 by default). No Docker containers to orchestrate, no Python runtime to manage, no database setup. This matters more than it sounds—especially if you're running on a Raspberry Pi or in a resource-constrained environment. Cold start: milliseconds. Memory footprint: single-digit MB.

Your entire stack can look like this:

Your App → Flow Router (port 2402) → OpenAI / Anthropic / Local Llama / Azure
Enter fullscreen mode Exit fullscreen mode

Transparent Format Translation

The web still runs on incompatible APIs. Anthropic's format differs from Anthropic's format, which differs from Gemini's. Flow Router handles this translation layer:

  • Accepts standard OpenAI /chat/completions requests
  • Detects the target provider in routing rules
  • Converts request/response shapes automatically
  • Your application code never knows the difference

In practice:

# Your app sends this (OpenAI format)
curl http://127.0.0.1:2402/v1/chat/completions \
  -H "Authorization: Bearer any-key" \
  -d '{"model": "gpt-4", "messages": [...]}'

# Flow Router routes to your chosen provider (Anthropic, local, etc.)
# and returns OpenAI-formatted responses
Enter fullscreen mode Exit fullscreen mode

Intelligent Fallback Chains

Cost optimization and resilience are the same problem. Flow Router lets you define chains like:

  1. Try gpt-4-turbo (expensive, low latency, highest quality)
  2. Fall back to claude-3 if rate-limited or quota-exceeded
  3. Fall back to mistral-local if both are down (fully offline)
  4. Return error only if all fail

You can weight these by:

  • Priority: Try in order
  • Round-robin: Distribute load evenly
  • Cost-optimal: Minimize spend while meeting latency/quality thresholds

Real example use case: your agent runs 100 short reasoning tasks daily and 5 complex ones. Route shorts to Mistral (cheap), complex ones to GPT-4. One routing rule, no code changes.

Token Optimization

LLM billing is token-based. Flow Router includes a token-saver that:

  • Caches identical requests (with TTL config)
  • Strips redundant whitespace and reformats before sending
  • Tracks token usage per model, per prompt
  • Helps you measure: "Does switching to a smaller model for this task cost us in quality?"

The knowledge base uses FTS5 full-text search (built-in, SQLite-based) so you can embed domain knowledge without external search infrastructure:

Knowledge Brain (FTS5) ← Your documents
                      ↓
                   Query against RAG context
                      ↓
                   Augment LLM prompt
Enter fullscreen mode Exit fullscreen mode

Subscription Cloaking & Multi-Tenancy

If you're routing through your own OpenAI organization account, Flow Router can:

  • Accept client-provided API keys via headers
  • Mask them server-side to prevent leaks
  • Route each request to the correct account/org
  • Track usage per client for billing

This is critical if you're building a small AI SaaS without wanting to become an API gateway vendor yourself.

Optional P2P Mesh (Offline Resilience)

In edge deployments, Flow Router can form a peer-to-peer mesh. If your primary gateway node goes down, requests route to peers still online. The knowledge base (FTS5) replicates across the mesh, so even isolated nodes can serve cached queries.

Trade-off: P2P adds latency and complexity. It's optional; most teams don't enable it.

Real-World Trade-Offs

Strengths:

  • Genuinely lightweight (Go binary, no runtime)
  • Transparent to existing OpenAI clients (drop-in compatibility)
  • Flexible fallback logic without code changes
  • Local-first philosophy (runs offline, P2P optional)

Honest Limitations:

  • Streaming support exists but is less tested than standard OpenAI calls
  • Format translation is deterministic but may not 100% preserve edge-case API quirks across all providers
  • Token optimization cache adds latency (slight) to first request; subsequent hits are instant
  • Knowledge base is SQLite/FTS5; scales to millions of documents but isn't Elasticsearch

You're trading some API completeness for operational simplicity and sovereignty.

Getting Started

# Download the binary
wget https://github.com/flowrouter/flowrouter/releases/latest

# Define your routing rules (YAML)
cat > config.yaml <<EOF
chains:
  default:
    - model: gpt-4
      provider: openai
      key: ${OPENAI_KEY}
    - model: claude-3-sonnet
      provider: anthropic
      key: ${ANTHROPIC_KEY}
    - model: mistral-7b
      provider: local
      url: http://127.0.0.1:8000/v1
EOF

# Run
./flowrouter --config config.yaml
Enter fullscreen mode Exit fullscreen mode

Your application immediately gains:

  • One endpoint for all models
  • Automatic fallbacks
  • Token tracking
  • Knowledge brain for RAG
  • Zero additional infrastructure

When to Use It

Flow Router is a fit if you:

  • Run multiple LLM providers and want unified routing
  • Are building multi-tenant LLM services
  • Want to run local models alongside cloud APIs
  • Care about operational sovereignty (no SaaS gateway)
  • Need knowledge base + chat in one system

It's not a fit if:

  • You're deeply invested in a single provider's ecosystem
  • You need real-time streaming with complex fallback logic
  • Your knowledge base is already in Pinecone/Weaviate

Closing

Flow Router is honest about what it is: a pragmatic routing layer for teams that want LLM flexibility without vendor lock-in or infrastructure sprawl. One binary, OpenAI-compatible, offline-capable. That simplicity is its strength.


Flowork is open source — both products:

Top comments (0)