DEV Community

Josh Green

Free LLMs on OpenRouter Keep Going 404. I Fixed It With 120 Lines of Python

I built a small pipeline on OpenClaw to stay on top of 3D printing news.

Nothing fancy — a Python script that pulls from YouTube, RSS feeds, and Reddit, uses a free LLM to summarize what's worth reading, and emails me a digest. I use OpenRouter's free tier because I'm cheap and the models are good enough for summarization.

It worked great. For about two weeks.

Then I started getting errors.


The problem nobody talks about

Here's something I didn't fully appreciate until it bit me: free models on OpenRouter change constantly. Models get added, removed, rate-limited into uselessness, or quietly replaced with different versions. If you hardcode your model list — which every tutorial tells you to do — you're building on sand.

One morning I woke up to this:

[06:03] LLM HTTP 404 [openai/gpt-oss-120b:free]: model not found
[06:03] LLM HTTP 429 [nousresearch/hermes-3-llama-3.1-405b:free]: rate limited
[06:03] LLM HTTP 404 [mistralai/mistral-small-3.1-24b-instruct:free]: model not found
[06:03] All free models exhausted — returning empty

Three of my six hardcoded models were dead. The pipeline silently produced nothing. I missed a week of content before I noticed.

Hardcoded lists are technical debt. Free model availability is a moving target. These two facts collide badly.


The fix: treat the model list as a live data source

OpenRouter has a public endpoint — no auth required — that returns their full model catalog:

GET https://openrouter.ai/api/v1/models

It returns ~346 models right now. Filtering to free ones with decent context windows gives you 10-15 candidates. The question is: which ones are actually worth using?
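That filtering step is simple enough to sketch. The catalog response carries a `data` array where each entry has an `id`, a `context_length`, and string-valued `pricing` fields; treating zero prompt/completion cost as "free" is my assumption here, as is the 32K context floor:

```python
import json
import urllib.request

CATALOG_URL = "https://openrouter.ai/api/v1/models"

def filter_free(catalog, min_context=32_000):
    """Keep free models with a context window big enough for summarization."""
    free = []
    for m in catalog.get("data", []):
        pricing = m.get("pricing", {})
        # Free variants report "0" for both prompt and completion pricing
        is_free = pricing.get("prompt") == "0" and pricing.get("completion") == "0"
        if is_free and (m.get("context_length") or 0) >= min_context:
            free.append(m["id"])
    return free

def fetch_free_models(min_context=32_000):
    # No auth header needed -- this endpoint is public
    with urllib.request.urlopen(CATALOG_URL, timeout=30) as resp:
        catalog = json.loads(resp.read().decode("utf-8"))
    return filter_free(catalog, min_context)
```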

I wanted to rank them. My criteria:

  1. Context window — longer is better for summarization. A 262K context model can swallow an entire article thread without chunking.
  2. Model size — bigger models write better. A 70B model beats a 7B model for prose quality.
  3. Historical reliability — has this model actually worked when I've called it before?

That last one is the one nobody tracks. So I built tracking.


model-registry.py — the discovery layer

The registry script runs once every 6 hours. It:

  1. Checks if the cache (~/.openclaw/free-models.json) is fresh — if yes, exits in <100ms (just a file stat)
  2. If stale, hits the OpenRouter catalog and scores every free model:
def score_model(model_id, context_length):
    context_score = min(context_length / 1000, 200)  # caps at 200
    size_score = get_size_score(model_id)             # regex: 405b=200, 70b=140, 8b=50...
    return context_score + size_score
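The `get_size_score` helper is just a regex over the model id. A hypothetical version consistent with the comment's scale (405b → 200, 70b → 140, 8b → 50; the intermediate buckets are my guess):

```python
import re

def get_size_score(model_id):
    """Pull the parameter count (e.g. '70b') out of the model id and bucket it.

    Sketch only: ids like 'qwen3-next-80b-a3b' contain more than one
    '<number>b' token, so we take the largest as the dense parameter count.
    """
    sizes = re.findall(r"(\d+(?:\.\d+)?)b\b", model_id.lower())
    if not sizes:
        return 40  # no size in the id: assume something small
    params = max(float(s) for s in sizes)
    if params >= 400: return 200
    if params >= 100: return 160
    if params >= 70:  return 140
    if params >= 30:  return 100
    if params >= 8:   return 50
    return 30
```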
  3. Takes the top 10, writes them to free-models.json
  4. Logs a diff — "Added: X, Removed: Y since last run"
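The freshness check in step 1 really is just a file stat. A minimal sketch, with the cache path and 6-hour threshold from the article (the helper name is mine):

```python
import sys
import time
from pathlib import Path

CACHE = Path.home() / ".openclaw" / "free-models.json"

def cache_is_fresh(path, max_age_s=21_600):
    """True if the cache exists and was modified within the last max_age_s seconds."""
    try:
        return (time.time() - path.stat().st_mtime) < max_age_s
    except FileNotFoundError:
        return False

def main():
    if cache_is_fresh(CACHE):
        sys.exit(0)  # fresh cache: nothing to do, exit after a single stat
    # ...otherwise fetch the catalog, rescore, and rewrite CACHE
```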

The diff log is where it gets interesting. On my first run after building this, I discovered two models I'd never heard of that scored in my top 6. One of them — qwen/qwen3-next-80b-a3b-instruct:free — has a 262K context window and an 80B parameter count. It's now my primary model. It wasn't in any tutorial I'd read.


model-metrics.py — the performance layer

HTTP 200 doesn't mean the model was useful. A model can return 200 with three sentences of hallucinated nonsense that breaks your JSON parser downstream.

So I added tracking at two levels:

Level 1 — HTTP success:

t0 = time.time()
try:
    resp = urllib.request.urlopen(req, timeout=90)
    content = resp.read().decode("utf-8")
    record_metric(model_id, task, success=True,
                  latency_ms=int((time.time() - t0) * 1000),
                  output_len=len(content))
except urllib.error.HTTPError as e:
    record_metric(model_id, task, success=False,
                  latency_ms=int((time.time() - t0) * 1000),
                  error_code=str(e.code))

Level 2 — parse success (parse_ok):

After every call that expects structured JSON, I record whether the downstream parsing succeeded:

response = call_free_llm(prompt, task="claim_extraction")
try:
    data = json.loads(response)
    update_parse_ok(True)   # output was actually usable
    return data
except json.JSONDecodeError:
    update_parse_ok(False)  # model returned garbage

parse_ok is the metric I care about most. It answers: was this model actually useful, not just technically responsive?

After a week of pipeline runs, I get a table like this:

Model                                      calls  ok%  p_ok%  avg_ms  errors
meta-llama/llama-3.3-70b-instruct:free       47   94%   88%   1240ms
qwen/qwen3-next-80b-a3b-instruct:free        31   97%   91%   1180ms
openai/gpt-oss-120b:free                     12   58%   42%   1890ms  5×404
nousresearch/hermes-3-llama-3.1-405b:free    8    62%   55%   2100ms  3×404
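A table like that is just a GROUP BY over the per-call rows. Assuming a SQLite `calls` table with `model_id`, `success`, `parse_ok`, and `latency_ms` columns (my schema, not necessarily the author's), the aggregation looks like:

```python
import sqlite3

SUMMARY_SQL = """
SELECT model_id,
       COUNT(*)                      AS calls,
       ROUND(100.0 * AVG(success))   AS ok_pct,
       ROUND(100.0 * AVG(parse_ok))  AS p_ok_pct,
       ROUND(AVG(latency_ms))        AS avg_ms
FROM calls
GROUP BY model_id
ORDER BY ok_pct DESC
"""

def summarize(db_path):
    """One (model, calls, ok%, parse_ok%, avg latency) row per model."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(SUMMARY_SQL).fetchall()
```

Note that `AVG(parse_ok)` ignores NULL rows, so calls that never reached the parsing stage don't drag down the parse rate.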

The last two models look fine on paper (they're large, they have long context) but they're dying constantly. Their scores get penalized:

def score_penalty(stats_entry):
    ok = stats_entry["ok_pct"]
    if ok < 50: return 0.3   # heavy penalty
    if ok < 70: return 0.7
    if ok < 85: return 0.9
    return 1.0               # no penalty

final_score = catalog_score * score_penalty(historical_stats)

When the registry next refreshes, those models sink to the bottom of the fallback chain. Automatically. Without me touching anything.
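The fallback chain itself is just a loop over the ranked list. A sketch against OpenRouter's OpenAI-compatible chat completions endpoint (the helper name is mine, and error handling is trimmed to HTTP errors; a real version would also catch timeouts and log to the metrics store):

```python
import json
import urllib.error
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def call_with_fallback(models, prompt, api_key, timeout=90):
    """Try each free model in ranked order; return the first reply, or ''."""
    for model_id in models:
        payload = json.dumps({
            "model": model_id,
            "messages": [{"role": "user", "content": prompt}],
        }).encode("utf-8")
        req = urllib.request.Request(API_URL, data=payload, headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        })
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                body = json.loads(resp.read().decode("utf-8"))
            return body["choices"][0]["message"]["content"]
        except urllib.error.HTTPError as e:
            print(f"LLM HTTP {e.code} [{model_id}]: trying next model")
    return ""  # all free models exhausted -- matches the article's empty return
```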


The result

The pipeline now:

  • Discovers new free models within 6 hours of them appearing on OpenRouter
  • Drops dead models from the rotation within one pipeline run
  • Prioritizes models with proven parse reliability, not just raw specs
  • Costs $0.00 extra — one public HTTP GET every 6 hours

The whole thing is ~250 lines across two files. No pip dependencies for the registry itself (stdlib only — json, urllib, sqlite3). The metrics use SQLite so they survive reboots and redeploys.


Grab the code

model-registry.py and model-metrics.py — both standalone, drop them next to any script that calls OpenRouter:

# Replace your hardcoded list with this:
import json
from pathlib import Path

REGISTRY_PATH = Path.home() / ".openclaw" / "free-models.json"
_FALLBACK = ["meta-llama/llama-3.3-70b-instruct:free", ...]

def load_free_models():
    try:
        data = json.loads(REGISTRY_PATH.read_text())
        models = [m["id"] for m in data.get("models", [])]
        if len(models) >= 2:
            return models
    except Exception:
        pass
    return list(_FALLBACK)

FREE_MODELS = load_free_models()

Run the registry as a preflight step before any pipeline that uses free models. If the cache is fresh, it exits immediately. If it's stale, it updates in ~1 second.

python3 model-registry.py --max-age 21600   # refresh if >6h old
python3 your-pipeline.py                     # now uses fresh model list

The thing I keep thinking about: I built this to follow 3D printing news, specifically the RepRap machines that print their own parts. Foraging for that news made me realize I needed this algorithm, and now the algorithm finds me better news about the von Neumann machines themselves. It's turtles all the way down, but at least they're free turtles.


Full code on GitHub Gist:
