Most teams end up juggling multiple API keys, vendor lock-in fears, and cost surprises when scaling LLM applications. Flow Router solves this differently: instead of choosing one provider, you get a single OpenAI-compatible endpoint that intelligently routes requests to whatever backs your stack.
The Problem It Solves
If you've integrated OpenAI into your application, you've hit at least one of these friction points:
- Vendor lock-in: Switching providers means rewriting client code
- Cost unpredictability: No way to automatically fall back to cheaper models without manual intervention
- Multi-provider complexity: Managing keys, rate limits, and format differences across Anthropic, Gemini, local Llama, and others becomes a coordination nightmare
- Local LLM adoption friction: You've got a local model running, but your existing agents and tools expect OpenAI format
Flow Router's core idea is elegant: one unified gateway that translates requests to whatever you want underneath.
What's Under the Hood
Single Binary, No Infrastructure Overhead
Flow Router ships as a standalone Go binary (http://127.0.0.1:2402/v1 by default). No Docker containers to orchestrate, no Python runtime to manage, no database setup. This matters more than it sounds—especially if you're running on a Raspberry Pi or in a resource-constrained environment. Cold start: milliseconds. Memory footprint: single-digit MB.
Your entire stack can look like this:
Your App → Flow Router (port 2402) → OpenAI / Anthropic / Local Llama / Azure
Transparent Format Translation
The web still runs on incompatible APIs. Anthropic's format differs from Anthropic's format, which differs from Gemini's. Flow Router handles this translation layer:
- Accepts standard OpenAI
/chat/completionsrequests - Detects the target provider in routing rules
- Converts request/response shapes automatically
- Your application code never knows the difference
In practice:
# Your app sends this (OpenAI format)
curl http://127.0.0.1:2402/v1/chat/completions \
-H "Authorization: Bearer any-key" \
-d '{"model": "gpt-4", "messages": [...]}'
# Flow Router routes to your chosen provider (Anthropic, local, etc.)
# and returns OpenAI-formatted responses
Intelligent Fallback Chains
Cost optimization and resilience are the same problem. Flow Router lets you define chains like:
- Try
gpt-4-turbo(expensive, low latency, highest quality) - Fall back to
claude-3if rate-limited or quota-exceeded - Fall back to
mistral-localif both are down (fully offline) - Return error only if all fail
You can weight these by:
- Priority: Try in order
- Round-robin: Distribute load evenly
- Cost-optimal: Minimize spend while meeting latency/quality thresholds
Real example use case: your agent runs 100 short reasoning tasks daily and 5 complex ones. Route shorts to Mistral (cheap), complex ones to GPT-4. One routing rule, no code changes.
Token Optimization
LLM billing is token-based. Flow Router includes a token-saver that:
- Caches identical requests (with TTL config)
- Strips redundant whitespace and reformats before sending
- Tracks token usage per model, per prompt
- Helps you measure: "Does switching to a smaller model for this task cost us in quality?"
The knowledge base uses FTS5 full-text search (built-in, SQLite-based) so you can embed domain knowledge without external search infrastructure:
Knowledge Brain (FTS5) ← Your documents
↓
Query against RAG context
↓
Augment LLM prompt
Subscription Cloaking & Multi-Tenancy
If you're routing through your own OpenAI organization account, Flow Router can:
- Accept client-provided API keys via headers
- Mask them server-side to prevent leaks
- Route each request to the correct account/org
- Track usage per client for billing
This is critical if you're building a small AI SaaS without wanting to become an API gateway vendor yourself.
Optional P2P Mesh (Offline Resilience)
In edge deployments, Flow Router can form a peer-to-peer mesh. If your primary gateway node goes down, requests route to peers still online. The knowledge base (FTS5) replicates across the mesh, so even isolated nodes can serve cached queries.
Trade-off: P2P adds latency and complexity. It's optional; most teams don't enable it.
Real-World Trade-Offs
Strengths:
- Genuinely lightweight (Go binary, no runtime)
- Transparent to existing OpenAI clients (drop-in compatibility)
- Flexible fallback logic without code changes
- Local-first philosophy (runs offline, P2P optional)
Honest Limitations:
- Streaming support exists but is less tested than standard OpenAI calls
- Format translation is deterministic but may not 100% preserve edge-case API quirks across all providers
- Token optimization cache adds latency (slight) to first request; subsequent hits are instant
- Knowledge base is SQLite/FTS5; scales to millions of documents but isn't Elasticsearch
You're trading some API completeness for operational simplicity and sovereignty.
Getting Started
# Download the binary
wget https://github.com/flowrouter/flowrouter/releases/latest
# Define your routing rules (YAML)
cat > config.yaml <<EOF
chains:
default:
- model: gpt-4
provider: openai
key: ${OPENAI_KEY}
- model: claude-3-sonnet
provider: anthropic
key: ${ANTHROPIC_KEY}
- model: mistral-7b
provider: local
url: http://127.0.0.1:8000/v1
EOF
# Run
./flowrouter --config config.yaml
Your application immediately gains:
- One endpoint for all models
- Automatic fallbacks
- Token tracking
- Knowledge brain for RAG
- Zero additional infrastructure
When to Use It
Flow Router is a fit if you:
- Run multiple LLM providers and want unified routing
- Are building multi-tenant LLM services
- Want to run local models alongside cloud APIs
- Care about operational sovereignty (no SaaS gateway)
- Need knowledge base + chat in one system
It's not a fit if:
- You're deeply invested in a single provider's ecosystem
- You need real-time streaming with complex fallback logic
- Your knowledge base is already in Pinecone/Weaviate
Closing
Flow Router is honest about what it is: a pragmatic routing layer for teams that want LLM flexibility without vendor lock-in or infrastructure sprawl. One binary, OpenAI-compatible, offline-capable. That simplicity is its strength.
Flowork is open source — both products:
- 🤖 Flowork Agent (the self-hosted agent OS): https://github.com/flowork-os/Flowork_Agent
- 🛣️ Flow Router (the sovereign LLM gateway): https://github.com/flowork-os/flowork_Router
Top comments (0)