Claude Opus is brilliant. It's also not free.
After my AI bounty hunting agent started scanning bounty boards around the clock, I ran into an uncomfortable math problem: analyzing every GitHub issue with Claude Opus costs real money. Most bounties are junk — wrong skill, already claimed, outdated, repo abandoned. I was paying to read bad job postings.
The fix was obvious once I thought about it: use a free local model to do the dirty work first.
This is part 3 of my ongoing AI bounty hunting series (part 1, part 2). Today I'm showing you the triage layer I added between "raw bounty feed" and "call the expensive API."
The Problem With Using One Model for Everything
When a bounty scanner runs at scale, it touches hundreds of issues per day. Each one needs:
- Is this actually claimable? (not already assigned, not closed)
- What tech stack is required? (matches skills?)
- Is the issue description clear enough to code against?
- What's the reward-to-complexity ratio?
Running all four of those questions through Claude Opus at $15/million tokens adds up fast. Most issues fail on question 1 or 2 — they're disqualified before we even need nuanced analysis.
That's exactly what a local model is good at: coarse filtering at zero cost.
The Hybrid Architecture
[Bounty Feed] → [codestral:22b local triage]
                         │
             ┌───────────┴───────────────┐
             ▼                           ▼
        [discard]          [Claude Opus deep analysis]
                                         │
                                         ▼
                             [prioritized shortlist]
The local model handles the first 80% of the funnel. Claude Opus only sees the bounties worth $50+ that match my skills and haven't been touched by other devs.
In practice, this means Opus processes roughly 1 in 10 bounties instead of all of them.
Setting Up Ollama + codestral:22b
If you don't have Ollama yet:
# macOS (Apple Silicon)
brew install ollama
ollama serve &
# Pull the model (~12GB, so grab a coffee)
ollama pull codestral:22b
Verify it's running:
curl -s http://localhost:11434/api/generate \
-d '{"model":"codestral:22b","prompt":"Say hello","stream":false}' \
| python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
Why codestral specifically? It's Mistral's code-focused model. For reading GitHub issues — which are often half technical spec, half bug report — it handles code snippets and stack traces well without hallucinating fake library names. The 22b version runs comfortably on a MacBook Pro M3 with 36GB RAM in about 3-8 seconds per query.
The Triage Script
Here's the actual Python script I use. It pulls issues from a bounty feed and sends them through codestral for pre-screening:
import requests
import json
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "codestral:22b"
TRIAGE_PROMPT = """You are a software bounty triage assistant.
Given a GitHub issue, answer these questions with YES/NO and a brief reason:
1. CLAIMABLE: Is this issue open and available for external contributors?
2. CLEAR: Is the problem description specific enough to code against?
3. MATCH: Does this issue involve Python, TypeScript, or JavaScript?
4. COMPLEXITY: Rate as LOW / MEDIUM / HIGH based on scope described.
Issue title: {title}
Issue body:
{body}
Respond in JSON: {{"claimable": true/false, "clear": true/false, "match": true/false, "complexity": "LOW|MEDIUM|HIGH", "summary": "one sentence"}}"""
def triage_issue(title: str, body: str, reward: int) -> dict:
    prompt = TRIAGE_PROMPT.format(
        title=title,
        body=body[:2000]  # truncate — local model has smaller context
    )
    response = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.1}  # low temp for consistent JSON
    }, timeout=120)
    raw = response.json()["response"].strip()
    # Extract JSON from response (model sometimes wraps it in a markdown fence)
    if "```" in raw:
        raw = raw.split("```")[1].removeprefix("json").strip()
    result = json.loads(raw)
    result["reward"] = reward
    result["title"] = title
    return result
def should_escalate(triage: dict) -> bool:
    """Decide if this bounty deserves Claude Opus analysis."""
    return (
        triage["claimable"]
        and triage["clear"]
        and triage["match"]
        and triage["complexity"] in ("LOW", "MEDIUM")
        and triage["reward"] >= 50
    )
# Example usage
issues = [
    {
        "title": "Fix race condition in WebSocket reconnect handler",
        "body": "When the WebSocket drops and reconnects within 500ms, the message queue processes duplicate events. Steps to reproduce: ...",
        "reward": 150
    },
    {
        "title": "Update README",
        "body": "The README needs updating.",
        "reward": 25
    },
]

for issue in issues:
    result = triage_issue(issue["title"], issue["body"], issue["reward"])
    print(f"\n{'='*50}")
    print(f"Title: {result['title']}")
    print(f"Summary: {result['summary']}")
    print(f"Escalate to Opus: {should_escalate(result)}")
    print(f"Raw: {json.dumps(result, indent=2)}")
Run it:
python3 triage.py
Output for the race condition issue:
{
  "claimable": true,
  "clear": true,
  "match": true,
  "complexity": "MEDIUM",
  "reward": 150,
  "summary": "Race condition in WebSocket reconnect causing duplicate message processing"
}
Escalate to Opus: True
Output for the README issue:
{
  "claimable": true,
  "clear": false,
  "match": false,
  "complexity": "LOW",
  "reward": 25,
  "summary": "Vague documentation update request with no specifics"
}
Escalate to Opus: False
The README issue never touches Claude. Done.
Integrating With the Bounty Pipeline
In my actual pipeline, this script sits between the Opire API scanner and the Claude analysis step:
#!/bin/bash
# Full pipeline
echo "Step 1: Fetch live bounties from Opire API..."
python3 fetch_bounties.py > /tmp/raw_bounties.json
echo "Step 2: Local triage with codestral..."
python3 triage.py /tmp/raw_bounties.json > /tmp/shortlist.json
COUNT=$(python3 -c "import json; data=json.load(open('/tmp/shortlist.json')); print(len(data))")
TOTAL=$(python3 -c "import json; data=json.load(open('/tmp/raw_bounties.json')); print(len(data))")
echo "Filtered: $COUNT/$TOTAL passed triage"
echo "Step 3: Deep analysis with Claude Opus (shortlist only)..."
python3 analyze_shortlist.py /tmp/shortlist.json
On a typical run over Opire's 30-40 live bounties, roughly 5-8 make it past triage. Claude Opus analyzes those 5-8 instead of all 40.
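The file-driven version of `triage.py` that the pipeline invokes isn't shown above. A minimal sketch of how it might bridge the two scripts, assuming `/tmp/raw_bounties.json` is a list of `{title, body, reward}` objects and reusing `triage_issue` and `should_escalate` from earlier (passed in as parameters here to keep the filter logic testable):

```python
import json
import sys

def build_shortlist(issues, triage_fn, escalate_fn):
    """Run local triage over raw issues, keeping only the escalation-worthy ones.

    triage_fn and escalate_fn are the triage_issue and should_escalate
    functions from the script above.
    """
    shortlist = []
    for issue in issues:
        result = triage_fn(issue["title"], issue["body"], issue["reward"])
        if escalate_fn(result):
            shortlist.append(result)
    return shortlist

def main(in_path):
    # e.g. python3 triage.py /tmp/raw_bounties.json > /tmp/shortlist.json
    with open(in_path) as f:
        raw = json.load(f)
    json.dump(build_shortlist(raw, triage_issue, should_escalate),
              sys.stdout, indent=2)
```

Writing the shortlist as JSON to stdout is what lets the bash pipeline's redirect and `len()` count work unchanged.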
The Numbers
Before this change, a full analysis pass cost roughly $0.18-0.25 per run (40 issues × ~400 tokens each × Opus pricing). Running this multiple times per day added up.
After adding the local triage layer:
- codestral pass: ~$0.00 (local compute)
- Opus pass: ~$0.02-0.04 (5-8 issues instead of 40)
- Reduction: ~85%
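The savings math is easy to sanity-check. A rough back-of-envelope, assuming ~400 input tokens per issue at Opus's $15/M input rate (output tokens ignored for simplicity):

```python
OPUS_PER_TOKEN = 15 / 1_000_000  # $15 per million input tokens
TOKENS_PER_ISSUE = 400           # rough average prompt size per issue

def opus_cost(n_issues: int) -> float:
    """Approximate input-token cost of sending n issues to Opus."""
    return n_issues * TOKENS_PER_ISSUE * OPUS_PER_TOKEN

before = opus_cost(40)  # every issue hits Opus
after = opus_cost(6)    # only triage survivors (midpoint of 5-8)
print(f"before ${before:.2f}, after ${after:.2f}, saved {1 - after / before:.0%}")
```

With 6 of 40 issues escalated, that works out to the ~85% reduction above; output tokens would shift the absolute numbers but not the ratio.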
The latency actually improved too. codestral at 3-5 seconds per issue in parallel is faster than waiting for API rate limits and network round-trips.
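The parallelism is nothing fancier than a thread pool around the blocking HTTP calls. A minimal sketch, assuming the `triage_issue` function from above (note that a default Ollama install serializes requests to one model, so actual concurrency depends on your server configuration):

```python
from concurrent.futures import ThreadPoolExecutor

def triage_parallel(issues, triage_fn, max_workers=4):
    """Triage issues concurrently; each worker blocks on one Ollama HTTP call."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(triage_fn, i["title"], i["body"], i["reward"])
            for i in issues
        ]
        # Collect in submission order so results line up with the input list
        return [f.result() for f in futures]
```

Threads (not processes) are the right tool here because the workers spend their time waiting on I/O, not computing.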
What Local Models Are Actually Good At
One thing I've learned running both models in the same pipeline: they have different strengths, and that's the point.
codestral:22b is genuinely good at:
- Parsing structured info from messy text
- Making binary yes/no decisions with clear criteria
- Extracting tech stack signals from issue descriptions
- Running in tight loops without API throttling
It's not great at:
- Nuanced reasoning about architectural tradeoffs
- Estimating effort on complex multi-file changes
- Generating production-quality code patches
That's fine. Triage doesn't need nuance. It needs speed and zero marginal cost.
Claude Opus gets the interesting cases — the ones worth $100+ that require actual thinking. That's exactly how you'd use a senior engineer: don't have them read every job posting, have them evaluate the ones that already cleared basic screening.
Next Steps
I'm currently working on a feedback loop: when a bounty that passed triage turns out to be a bust (repo dead, issue confusing on closer read), that signal goes back to tune the triage prompt. The local model gets smarter about what "clear enough to code against" actually means for each platform.
The goal is a triage accuracy rate above 80% — meaning 8 out of 10 issues that pass local screening are genuinely worth Opus's time.
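Measuring that target only requires logging an outcome for each escalated bounty. A minimal sketch of the bookkeeping (the `worth_it` label is my own invention for illustration, not anything a bounty platform provides):

```python
def triage_precision(outcomes):
    """Fraction of escalated bounties that were genuinely worth Opus's time.

    outcomes: one dict per bounty that passed local triage,
    e.g. {"title": ..., "worth_it": True}.
    """
    if not outcomes:
        return 0.0
    return sum(o["worth_it"] for o in outcomes) / len(outcomes)

# 8 good escalations out of 10 hits the 80% target exactly
```

Tracking this per platform is what would let the feedback loop tune the triage prompt differently for, say, Opire versus raw GitHub issues.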
If you're running any AI-assisted development workflow and paying for API calls, audit where those calls are going. A local model running on hardware you already own is the most underused tool in the stack.
The code above is intentionally simple — a starting point. Run it, break it, make it fit your stack.