DEV Community

Cover image for How I built a terminal AI agent that never hits rate limits (open source, Python)
Arnau Moyano
Arnau Moyano

Posted on

How I built a terminal AI agent that never hits rate limits (open source, Python)

A month ago I was building a side project and kept
hitting the same wall: I'd start a task with OpenAI,
hit the rate limit, manually switch to Anthropic,
hit a different limit, then open yet another tab to
configure Gemini. Three API dashboards open, three
different billing pages, and my actual project sitting
there waiting.

I didn't want to pay for multiple APIs just to keep
working. So I built something to fix it.

What I built

HelloChusquis is an open source terminal AI agent that automatically switches between 35+ AI providers when one hits rate limits or goes down.

One config file. Zero manual switching.

pip install hellochusquis
hellochusquis --quick
Enter fullscreen mode Exit fullscreen mode

The agent tries your first provider, and if it fails or hits limits, silently falls back to the next one. You never see an error — the task just completes.

The hardest part

The trickiest bug was getting the agent to execute commands correctly during multi-step plans. The agent would generate a plan, start executing, and then lose access to its tools halfway through. Step 1 worked, steps 2-6 failed with "Unknown tool" errors.

The problem: tools were available in the initial context but weren't being passed through each step of the execution loop. Once I fixed the context propagation, multi-step tasks like "search the web for AI news and summarize the top 3 stories" started working end to end.

How the fallback works

The core is a ProviderPool class that tracks each provider's state:

@dataclass
class Provider:
    name: str
    base_url: str
    api_key: str
    model: str
    exhausted: bool = False
    exhausted_at: datetime = None

class ProviderPool:
    def chat(self, messages, tools=None):
        available = self._available()
        for provider in available:
            try:
                return self._call(provider, messages, tools)
            except Exception as e:
                self._handle_error(provider, e)
        raise RuntimeError("All providers failed")
Enter fullscreen mode Exit fullscreen mode

When a provider returns a 429, 402, or 503, it gets marked as exhausted with a timestamp. After a configurable window (default 1 hour), it resets automatically. It's essentially a circuit breaker pattern applied to LLM providers.

What it can do

Beyond the fallback, HelloChusquis has grown into

 a full terminal agent:

  • 128 integrations (Stripe, Supabase, AWS, Discord...)
  • Browser automation with human-like mouse movement
  • Web UI with voice I/O
  • Auto-Tool Builder: describe an integration, it generates the plugin
  • REST API mode
  • Persistent memory across sessions

Try it

pip install hellochusquis
hellochusquis --quick  # 60 second setup
Enter fullscreen mode Exit fullscreen mode

GitHub: github.com/aminoy77/HelloChusquis

Open source, MIT license, free forever.

If you've hit the same rate limit frustration, I'd
love to hear how you're handling it — or what you'd
want HelloChusquis to do that it doesn't yet.

Top comments (0)