How to Stop browser-use From Choking When Your Primary LLM Hits a 429
If you're running browser-use in production — actual automated form submissions, multi-step web agents, anything that has to keep working while you're asleep — you've hit this wall: your LLM returns a 429, and the whole agent dies.
Most tutorials show you how to get browser-use working. Nobody talks about keeping it working when your free quota runs out at 2am and there are 40 jobs left in the queue.
Here's the actual fix.
Why the Default Setup Is Fragile
Out of the box, browser-use scripts look like this:
import os

from browser_use import Agent
from browser_use.llm.openai.chat import ChatOpenAI

llm = ChatOpenAI(
    model='anthropic/claude-sonnet-4',
    base_url='https://openrouter.ai/api/v1',
    api_key=os.environ['OPENROUTER_API_KEY'],
)

agent = Agent(task=task, llm=llm, browser=browser)
await agent.run()
This hardcodes one LLM. When that LLM returns a 429 (rate limit), 503 (overloaded), or a quota error, the agent crashes. Your queue stops. You find out in the morning.
This is fine for demos. It's not fine for anything you want to run continuously.
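The failure mode is easy to simulate with a stand-in for the agent (all names here are illustrative, not browser-use APIs): one unhandled 429 and every remaining job in the queue is abandoned.

```python
class RateLimitError(Exception):
    pass

def run_agent(job):
    # Stand-in for agent.run(): the third call hits a quota
    if job == 'job-3':
        raise RateLimitError('429 Too Many Requests')
    return f'{job}: done'

queue = [f'job-{i}' for i in range(1, 6)]
completed = []
try:
    for job in queue:
        completed.append(run_agent(job))
except RateLimitError:
    pass  # in the naive script this is an unhandled crash

print(completed)  # → ['job-1: done', 'job-2: done'] -- jobs 3 through 5 never run
```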
The Three-Tier Fallback Pattern
The fix is a _get_llm() factory function that tries backends in order and only fails if all of them fail:
import os

from browser_use.llm import ChatGoogle, ChatDeepSeek, ChatOpenRouter, ChatOllama

def _get_llm(exclude=None):
    """Return the first available LLM from the fallback chain."""
    exclude = exclude or []
    backends = [
        {
            'name': 'gemini-flash',
            'cls': ChatGoogle,
            'kwargs': {'model': 'gemini-2.5-flash'},
            'env_check': 'GEMINI_API_KEY',
        },
        {
            'name': 'openrouter',
            'cls': ChatOpenRouter,
            'kwargs': {
                'model': 'anthropic/claude-sonnet-4',
                'api_key': os.environ.get('OPENROUTER_API_KEY', ''),
            },
            'env_check': 'OPENROUTER_API_KEY',
        },
        {
            'name': 'deepseek',
            'cls': ChatDeepSeek,
            'kwargs': {
                'model': 'deepseek-chat',
                'api_key': os.environ.get('DEEPSEEK_API_KEY', ''),
            },
            'env_check': 'DEEPSEEK_API_KEY',
        },
        {
            'name': 'ollama',
            'cls': ChatOllama,
            'kwargs': {'model': 'browser-agent'},  # local, no quota
            'env_check': None,
        },
    ]
    for backend in backends:
        name = backend['name']
        if name in exclude:
            continue
        env_key = backend.get('env_check')
        if env_key and not os.environ.get(env_key):
            continue
        try:
            llm = backend['cls'](**backend['kwargs'])
            print(f"LLM backend: {name}")
            return llm, name
        except Exception as e:
            print(f"Backend {name} init failed: {e}")
            continue
    raise RuntimeError("All LLM backends unavailable")
Return both the LLM and the backend name. You'll need the name to skip it on retry.
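The exclude-and-fall-through mechanism is independent of browser-use, so it can be sketched with plain callables standing in for the wrapper classes (all names below are illustrative):

```python
def first_available(backends, exclude=()):
    """Try (name, factory) pairs in order; skip excluded or failing ones."""
    for name, factory in backends:
        if name in exclude:
            continue
        try:
            return factory(), name
        except Exception:
            continue  # init failed; fall through to the next backend
    raise RuntimeError("All LLM backends unavailable")

def flaky():
    raise ConnectionError("simulated init failure")

backends = [
    ('primary', flaky),
    ('secondary', lambda: 'llm-b'),
    ('local', lambda: 'llm-c'),
]

llm, name = first_available(backends)
# → ('llm-b', 'secondary'): primary failed at init, so it was skipped
llm2, name2 = first_available(backends, exclude=['secondary'])
# → ('llm-c', 'local'): the retry loop excludes the backend that just 429'd
```

Returning the tuple is what makes the exclude list work on the next call.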
Making the Retry Loop 429-Aware
The retry loop in most browser-use scripts catches generic exceptions. Add a specific check for rate-limit errors and swap backends instead of just waiting:
import asyncio

async def apply_to_job(url, opp_data, max_retries=3):
    llm, backend_name = _get_llm()
    tried_backends = []
    for attempt in range(max_retries):
        try:
            agent = Agent(task=build_task(url, opp_data), llm=llm, ...)
            result = await agent.run()
            return result
        except Exception as e:
            error_msg = str(e)
            is_rate_limit = any(x in error_msg for x in [
                '429', 'ResourceExhausted', 'quota', 'rate_limit', 'Too Many Requests',
            ])
            if is_rate_limit and attempt < max_retries - 1:
                print(f"Rate limit on {backend_name}, switching backends...")
                tried_backends.append(backend_name)
                try:
                    llm, backend_name = _get_llm(exclude=tried_backends)
                    continue  # retry the loop with the new LLM
                except RuntimeError:
                    print("All backends exhausted")
                    raise
            # Non-rate-limit errors: log and retry with same LLM
            if attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)
            else:
                raise
The key distinction: rate-limit errors switch backends. Other errors (form not found, selector timeout, agent confusion) retry with the same backend after exponential backoff.
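That substring check is worth isolating into a helper so the retry loop and any logging share a single definition (a sketch; the marker list mirrors the one in the loop above, and message-substring matching is crude but provider-agnostic):

```python
RATE_LIMIT_MARKERS = (
    '429', 'ResourceExhausted', 'quota', 'rate_limit', 'Too Many Requests',
)

def is_rate_limit_error(exc: Exception) -> bool:
    """Classify an exception by message substring."""
    msg = str(exc)
    return any(marker in msg for marker in RATE_LIMIT_MARKERS)

print(is_rate_limit_error(RuntimeError('429 Too Many Requests')))   # → True
print(is_rate_limit_error(TimeoutError('selector #submit not found')))  # → False
```

A `False` result sends the error down the exponential-backoff path instead of the backend-switch path.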
Backend Priority Reasoning
The ordering matters. Here's why this order works:
1. Gemini Flash first. Free tier, high request quota, fast. Gemini 2.5 Flash handles multi-step web reasoning well. The catch: free quota resets daily, so it will eventually 429 too — but that's what the chain is for.
2. OpenRouter second. Paid but reliable. Claude Sonnet via OpenRouter is the gold standard for complex form navigation. Use it as the fallback when Gemini is exhausted, not the primary.
3. DeepSeek third. Cheap and surprisingly capable for structured form-filling tasks. API is less stable than OpenRouter but usually available.
4. Ollama last. Local model, zero quota, never 429s. The catch is context length and capability — a local 8B model will struggle on complex multi-step forms. But it can handle simple checkboxes, dropdowns, and "submit this form" tasks that make up most of the queue. The browser-agent Modelfile with temperature 0.3 and a system prompt tuned for web interaction makes a significant difference over a plain base model.
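The Modelfile referenced above would look roughly like this. Only the model name `browser-agent` and `temperature 0.3` come from the text; the base model and system prompt are illustrative:

```
FROM llama3.1:8b
PARAMETER temperature 0.3
SYSTEM """You are a web automation agent. Given the current page state and a
goal, respond with exactly one next action: click, type, select, or submit."""
```

Build it with `ollama create browser-agent -f Modelfile`, and the `ChatOllama(model='browser-agent')` line in the factory picks it up.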
The Gotcha: browser-use Requires Its Own LLM Wrappers
One thing that burned me: browser-use has its own LLM wrapper classes (ChatOpenRouter, ChatOllama, ChatGoogle from browser_use.llm) that are not the same as LangChain's ChatOpenAI. If you're using browser-use's Agent class, you need to use browser-use's wrappers — not LangChain directly.
The common mistake is this:
# Wrong — this is LangChain's ChatOpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="anthropic/claude-sonnet-4", base_url="https://openrouter.ai/api/v1")
It'll work sometimes, because LangChain classes often satisfy browser-use's expected interface via duck typing, but it bypasses browser-use's internal retry handling and can cause subtle failures.
Use the right imports:
# Right
from browser_use.llm import ChatOpenRouter, ChatOllama, ChatGoogle, ChatDeepSeek
One More Layer: Email Fallback
If all LLM backends fail and the opportunity has a direct contact email, don't give up — just email the application:
async def apply_to_job(url, opp_data, max_retries=3):
    try:
        # ... above retry loop ...
    except RuntimeError as e:
        if 'backends unavailable' in str(e) and opp_data.get('contact_email'):
            return await email_application_fallback(opp_data)
        raise

async def email_application_fallback(opp_data):
    """When browser automation fails, fall back to direct email."""
    from send_email import send
    send(
        to=opp_data['contact_email'],
        subject=f"Application: {opp_data['role']} - Nathan Hamlett",
        body=build_cover_letter(opp_data),
        attachments=[get_resume_path(opp_data)],
    )
    return {'method': 'email', 'status': 'sent', 'to': opp_data['contact_email']}

Note the string check matches the message _get_llm() actually raises ("All LLM backends unavailable") — matching on a message you never raise silently disables the fallback.
This is the last line of defense. The queue keeps moving even when the entire LLM layer is down.
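The send helper above is project-specific, but assembling the message is all stdlib. A minimal sketch using email.message (sender address and SMTP host are placeholders; attachment handling omitted for brevity):

```python
from email.message import EmailMessage

def build_application_email(opp_data: dict, body: str) -> EmailMessage:
    """Assemble the fallback application email."""
    msg = EmailMessage()
    msg['To'] = opp_data['contact_email']
    msg['From'] = 'me@example.com'  # placeholder sender
    msg['Subject'] = f"Application: {opp_data['role']}"
    msg.set_content(body)
    return msg

msg = build_application_email(
    {'contact_email': 'hiring@example.com', 'role': 'Backend Engineer'},
    'Please find my application attached.',
)
print(msg['Subject'])  # → Application: Backend Engineer

# Sending would use smtplib against your provider (placeholder host):
# import smtplib
# with smtplib.SMTP('smtp.example.com', 587) as s:
#     s.starttls()
#     s.login(user, password)
#     s.send_message(msg)
```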
Summary
Production browser-use pipelines need three things the tutorials don't cover:
- Multi-backend LLM factory with ordered fallback (Gemini → OpenRouter → DeepSeek → Ollama)
- 429-aware retry loop that switches backends on rate limit errors, not just waits
- Non-LLM fallback for when the whole automation layer is unavailable
The pattern takes about 30 minutes to implement. The alternative is waking up to a dead queue and 40 missed applications.
Running a multi-agent job automation system on WSL2/systemd. Using Claude (via OpenClaw) as the orchestration layer. Notes from actually doing this in production.