Eros De Grande (ErosLifeStyle)

Posted on Jun 23 • Originally published at github.com

Claude Code kept going down. So I built a router.

#ai #opensource #python #devops

The Hook

Last month, mid-sprint, my Claude subscription decided to take a vacation. Three days of "Service Unavailable" while I had a feature to ship. That's when I decided: no more single points of failure in my AI stack.

Background: Why Claude Code Alone Isn't Enough

Claude Code is excellent. The context window, the reasoning, the tool use. But it's a subscription service with its own limits: rate caps, availability windows, and that annoying moment when your plan doesn't cover what you need right now.

MiniMax offers a solid alternative, but their API behaves differently. Different endpoints, different parameters, different quirks. Switching manually between them is a workflow killer.

I needed a proxy that could sit in front of both, route intelligently based on what I'm doing, and fail over automatically when something breaks.

What I Built: Four Routing Modes

The router operates in four distinct modes:

Forced Claude - Everything goes to Claude. Use when you need maximum reasoning capability.
Forced MiniMax - Everything goes to MiniMax. Cheaper tasks, bulk processing.
Automatic - The proxy decides based on task complexity. Simple queries to MiniMax, complex ones to Claude.
Interactive (Smart Mode) - The proxy categorizes tasks in real-time and routes accordingly, with manual override available.

The interactive mode is where it gets interesting. I built a lightweight classifier that distinguishes between:

T0/T1 tasks: Quick edits, single-file changes, documentation updates
T2 tasks: Multi-file refactors, architecture decisions, complex debugging

Architecture: Three Services + One Proxy

The system runs as:

ai-router-proxy.py    # The HTTP proxy (port 8080)
ai-router-switcher.sh # Service switcher script
ai-health-checker.sh  # Health monitoring daemon

Each runs as a systemd service:

~/.config/systemd/user/ai-router-proxy.service
~/.config/systemd/user/ai-router-switcher.service
~/.config/user/ai-router-health-checker.service

The proxy listens locally, forwards to whichever backend is active, and handles the translation between API formats.

Smart Mode: How It Categorizes Tasks

The routing logic examines the incoming request to determine complexity:

Request length and structure
Presence of code patterns (imports, function definitions, class declarations)
Explicit mode hints in the request
Token count estimates

Simple prompts with minimal context go to MiniMax. Anything resembling a full refactor or complex reasoning task gets routed to Claude.

Triple Defense: Service Resilience

One service failing shouldn't break your workflow. The resilience stack:

systemd restart policies - Each service auto-restarts on failure with backoff
cron health checks - Every 5 minutes, a script verifies both backends and switches if needed
pre-request hooks - The proxy checks backend health before each request

If Claude goes down, the system detects it within 5 minutes (cron interval) or immediately (hook). If MiniMax fails, same story. You keep working.

The Core Proxy Logic

Here's the essential routing function from ai-router-proxy.py:

def handle_request(req_data, mode):
    if mode == "claude":
        return forward(req_data, CLAUDE_ENDPOINT)
    elif mode == "minimax":
        return forward(req_data, MINIMAX_ENDPOINT)
    elif mode == "auto":
        complexity = classify(req_data)
        target = CLAUDE_ENDPOINT if complexity > 0.7 else MINIMAX_ENDPOINT
        return forward(req_data, target)
    elif mode == "interactive":
        task_type = detect_task_type(req_data)
        if task_type in ["edit", "doc", "refactor"]:
            return forward(req_data, MINIMAX_ENDPOINT)
        return forward(req_data, CLAUDE_ENDPOINT)

Ten lines that handle four modes, with the complexity classifier being the real intelligence.

How it stacks up vs the alternatives

Real-World Use Cases

Documentation writing - MiniMax handles the bulk. When I need to explain an architecture decision, Claude takes over.
Bug hunting - Automatic mode detects the complexity and routes to Claude for the heavy lifting.
Code review - Interactive mode identifies review patterns and routes accordingly.
Bulk refactoring - Forced MiniMax mode for repetitive changes where I just need speed.

Performance and Results

After a month of daily use:

Switchover time when a backend fails: under 10 seconds
Routing accuracy in auto mode: approximately 85% (based on manual overrides)
Cost savings from MiniMax for simple tasks: roughly 40% compared to Claude-only
Zero workflow interruptions from backend issues: 31 days and counting

What I Learned

Building a resilient AI proxy taught me three things:

Failover is easy; failback is hard - Detecting when a service comes back and switching back requires careful state management.
Classification is the hard part - The routing logic is simple; knowing when to route is the actual problem.
Local proxies are underrated - A simple HTTP proxy solved what I thought would require a complex service mesh.

Links and Call to Action

The full project is open source:

Repository: https://github.com/eroslifestyle/ai-router-switch
Documentation: https://eroslifestyle.github.io/ai-router-switch/
Banner/Assets: https://github.com/eroslifestyle/ai-router-switch/tree/main/assets
Discussion: https://github.com/eroslifestyle/ai-router-switch/discussions/2

If you've been burned by AI service downtime, or if you want to optimize costs without sacrificing capability, clone the repo and run the installer. The setup script handles the systemd units, the proxy configuration, and the health checks.

Questions, contributions, and feature requests welcome in the discussions.

DEV Community