How I built a terminal AI agent that never hits rate limits (open source, Python)

#ai #python #opensource #terminal

A month ago I was building a side project and kept
hitting the same wall: I'd start a task with OpenAI,
hit the rate limit, manually switch to Anthropic,
hit a different limit, then open yet another tab to
configure Gemini. Three API dashboards open, three
different billing pages, and my actual project sitting
there waiting.

I didn't want to pay for multiple APIs just to keep
working. So I built something to fix it.

What I built

HelloChusquis is an open source terminal AI agent that automatically switches between 35+ AI providers when one hits rate limits or goes down.

One config file. Zero manual switching.

pip install hellochusquis
hellochusquis --quick

The agent tries your first provider, and if it fails or hits limits, silently falls back to the next one. As long as at least one provider is still available, you never see an error; the task just completes.

The hardest part

The trickiest bug was getting the agent to execute commands correctly during multi-step plans. The agent would generate a plan, start executing, and then lose access to its tools halfway through. Step 1 worked, steps 2-6 failed with "Unknown tool" errors.

The problem: tools were available in the initial context but weren't being passed through each step of the execution loop. Once I fixed the context propagation, multi-step tasks like "search the web for AI news and summarize the top 3 stories" started working end to end.
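To make the fix concrete, here is a minimal sketch of an execution loop that hands the same tool registry to every step. It is not HelloChusquis's actual code: `run_step`, `run_plan`, the `Tools` type, and the plan format are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry: tool name -> callable taking and returning text.
Tools = Dict[str, Callable[[str], str]]


def run_step(step: dict, tools: Tools) -> str:
    """Execute one plan step using the registry that was passed in."""
    name = step["tool"]
    if name not in tools:
        # This is the failure mode from the bug: the step cannot see its tools.
        raise KeyError(f"Unknown tool: {name}")
    return tools[name](step["input"])


def run_plan(plan: List[dict], tools: Tools) -> List[str]:
    """Run every step with the same registry, not just the first one."""
    results = []
    for step in plan:
        # Pass `tools` explicitly on every iteration of the loop.
        results.append(run_step(step, tools))
    return results


# Illustrative stand-ins for real tools.
tools = {
    "search": lambda query: f"results for {query}",
    "summarize": lambda text: text.upper(),
}
plan = [
    {"tool": "search", "input": "AI news"},
    {"tool": "summarize", "input": "top 3 stories"},
]
outputs = run_plan(plan, tools)
```

Passing the registry as an argument, rather than reading it from state that only exists at the start of the run, makes it hard for a later step to end up without its tools.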

How the fallback works

The core is a ProviderPool class that tracks each provider's state:

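from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Provider:
    name: str
    base_url: str
    api_key: str
    model: str
    exhausted: bool = False
    exhausted_at: Optional[datetime] = None  # set when the provider is marked exhausted

class ProviderPool:
    # _available, _call, and _handle_error are not shown in this excerpt.
    def chat(self, messages, tools=None):
        # Try each non-exhausted provider in order and return the first success.
        available = self._available()
        for provider in available:
            try:
                return self._call(provider, messages, tools)
            except Exception as e:
                # Record the failure (e.g. mark the provider exhausted) and move on.
                self._handle_error(provider, e)
        raise RuntimeError("All providers failed")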

When a provider returns a 429, 402, or 503, it gets marked as exhausted with a timestamp. After a configurable window (default 1 hour), it resets automatically. It's essentially a circuit breaker pattern applied to LLM providers.
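The exhaust-and-reset behavior described above can be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: `ProviderError`, `ProviderState`, `Breaker`, and the `reset_after` parameter are hypothetical names, and the current time is passed in explicitly only to keep the example deterministic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Status codes the post says trigger the "exhausted" state.
EXHAUSTING_STATUS = {429, 402, 503}


class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


@dataclass
class ProviderState:
    name: str
    exhausted: bool = False
    exhausted_at: Optional[datetime] = None


class Breaker:
    def __init__(self, providers: List[ProviderState],
                 reset_after: timedelta = timedelta(hours=1)):
        self.providers = providers
        self.reset_after = reset_after  # assumed default window of 1 hour

    def available(self, now: datetime) -> List[ProviderState]:
        """Return usable providers, resetting any whose window has elapsed."""
        for p in self.providers:
            if (p.exhausted and p.exhausted_at is not None
                    and now - p.exhausted_at >= self.reset_after):
                p.exhausted, p.exhausted_at = False, None
        return [p for p in self.providers if not p.exhausted]

    def handle_error(self, provider: ProviderState, error: Exception,
                     now: datetime) -> None:
        """Mark the provider exhausted, with a timestamp, on a listed status."""
        if getattr(error, "status", None) in EXHAUSTING_STATUS:
            provider.exhausted = True
            provider.exhausted_at = now


start = datetime(2024, 1, 1, 12, 0)
a, b = ProviderState("a"), ProviderState("b")
breaker = Breaker([a, b])
breaker.handle_error(a, ProviderError(429), now=start)
names_during = [p.name for p in breaker.available(start + timedelta(minutes=30))]
names_after = [p.name for p in breaker.available(start + timedelta(hours=1))]
```

In this sketch, provider `a` is skipped 30 minutes after its 429 and becomes eligible again once the full window has passed. Errors with other status codes leave the provider's state untouched.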

What it can do

Beyond the fallback, HelloChusquis has grown into a full terminal agent:

128 integrations (Stripe, Supabase, AWS, Discord...)
Browser automation with human-like mouse movement
Web UI with voice I/O
Auto-Tool Builder: describe an integration, it generates the plugin
REST API mode
Persistent memory across sessions

Try it

pip install hellochusquis
hellochusquis --quick  # 60 second setup

GitHub: github.com/aminoy77/HelloChusquis

Open source, MIT license, free forever.

If you've hit the same rate limit frustration, I'd
love to hear how you're handling it — or what you'd
want HelloChusquis to do that it doesn't yet.
