Claude Opus is brilliant. It's also not free.
After my AI bounty hunting agent started scanning bounty boards around the clock, I ran into an uncomfortable math problem: analyzing every GitHub issue with Claude Opus costs real money. Most bounties are junk — wrong skill, already claimed, outdated, repo abandoned. I was paying to read bad job postings.
The fix was obvious once I thought about it: use a free local model to do the dirty work first.
This is part 3 of my ongoing AI bounty hunting series (part 1, part 2). Today I'm showing you the triage layer I added between "raw bounty feed" and "call the expensive API."
The Problem With Using One Model for Everything
When a bounty scanner runs at scale, it touches hundreds of issues per day. Each one needs:
- Is this actually claimable? (not already assigned, not closed)
- What tech stack is required? (matches skills?)
- Is the issue description clear enough to code against?
- What's the reward-to-complexity ratio?
Running all four of those questions through Claude Opus at $15/million tokens adds up fast. Most issues fail on question 1 or 2 — they're disqualified before we even need nuanced analysis.
That's exactly what a local model is good at: coarse filtering at zero cost.
The Hybrid Architecture
[Bounty Feed] → [codestral:22b local triage]
                         │
             ┌───────────┴───────────────┐
             ▼                           ▼
        [discard]          [Claude Opus deep analysis]
                                         │
                                         ▼
                             [prioritized shortlist]
The local model handles the first 80% of the funnel. Claude Opus only sees the bounties worth $50+ that match my skills and haven't been touched by other devs.
In practice, this means Opus processes roughly 1 in 10 bounties instead of all of them.
Setting Up Ollama + codestral:22b
If you don't have Ollama yet:
# macOS (Apple Silicon)
brew install ollama
ollama serve &
# Pull the model (~12GB, so grab a coffee)
ollama pull codestral:22b
Verify it's running:
curl -s http://localhost:11434/api/generate \
-d '{"model":"codestral:22b","prompt":"Say hello","stream":false}' \
| python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
Why codestral specifically? It's Mistral's code-focused model. For reading GitHub issues — which are often half technical spec, half bug report — it handles code snippets and stack traces well without hallucinating fake library names. The 22b version runs comfortably on a MacBook Pro M3 with 36GB RAM in about 3-8 seconds per query.
The Triage Script
Here's the actual Python script I use. It pulls issues from a bounty feed and sends them through codestral for pre-screening:
import requests
import json
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "codestral:22b"
TRIAGE_PROMPT = """You are a software bounty triage assistant.
Given a GitHub issue, answer these questions with YES/NO and a brief reason:
1. CLAIMABLE: Is this issue open and available for external contributors?
2. CLEAR: Is the problem description specific enough to code against?
3. MATCH: Does this issue involve Python, TypeScript, or JavaScript?
4. COMPLEXITY: Rate as LOW / MEDIUM / HIGH based on scope described.
Issue title: {title}
Issue body:
{body}
Respond in JSON: {{"claimable": true/false, "clear": true/false, "match": true/false, "complexity": "LOW|MEDIUM|HIGH", "summary": "one sentence"}}"""
def triage_issue(title: str, body: str, reward: int) -> dict:
    prompt = TRIAGE_PROMPT.format(
        title=title,
        body=body[:2000]  # truncate — local model has smaller context
    )
    response = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.1}  # low temp for consistent JSON
    }, timeout=120)
    raw = response.json()["response"].strip()
    # Extract JSON from response (model sometimes wraps it in a markdown fence)
    if "```" in raw:
        raw = raw.split("```")[1].removeprefix("json").strip()
    result = json.loads(raw)
    result["reward"] = reward
    result["title"] = title
    return result
def should_escalate(triage: dict) -> bool:
    """Decide if this bounty deserves Claude Opus analysis."""
    return (
        triage["claimable"]
        and triage["clear"]
        and triage["match"]
        and triage["complexity"] in ("LOW", "MEDIUM")
        and triage["reward"] >= 50
    )
# Example usage
issues = [
    {
        "title": "Fix race condition in WebSocket reconnect handler",
        "body": "When the WebSocket drops and reconnects within 500ms, the message queue processes duplicate events. Steps to reproduce: ...",
        "reward": 150
    },
    {
        "title": "Update README",
        "body": "The README needs updating.",
        "reward": 25
    },
]

for issue in issues:
    result = triage_issue(issue["title"], issue["body"], issue["reward"])
    print(f"\n{'='*50}")
    print(f"Title: {result['title']}")
    print(f"Summary: {result['summary']}")
    print(f"Escalate to Opus: {should_escalate(result)}")
    print(f"Raw: {json.dumps(result, indent=2)}")
Run it:
python3 triage.py
Output for the race condition issue:
{
  "claimable": true,
  "clear": true,
  "match": true,
  "complexity": "MEDIUM",
  "reward": 150,
  "summary": "Race condition in WebSocket reconnect causing duplicate message processing"
}
Escalate to Opus: True
Output for the README issue:
{
  "claimable": true,
  "clear": false,
  "match": false,
  "complexity": "LOW",
  "reward": 25,
  "summary": "Vague documentation update request with no specifics"
}
Escalate to Opus: False
The README issue never touches Claude. Done.
Integrating With the Bounty Pipeline
In my actual pipeline, this script sits between the Opire API scanner and the Claude analysis step:
#!/bin/bash
# Full pipeline
echo "Step 1: Fetch live bounties from Opire API..."
python3 fetch_bounties.py > /tmp/raw_bounties.json
echo "Step 2: Local triage with codestral..."
python3 triage.py /tmp/raw_bounties.json > /tmp/shortlist.json
COUNT=$(python3 -c "import json; data=json.load(open('/tmp/shortlist.json')); print(len(data))")
TOTAL=$(python3 -c "import json; data=json.load(open('/tmp/raw_bounties.json')); print(len(data))")
echo "Filtered: $COUNT/$TOTAL passed triage"
echo "Step 3: Deep analysis with Claude Opus (shortlist only)..."
python3 analyze_shortlist.py /tmp/shortlist.json
On a typical run over Opire's 30-40 live bounties, roughly 5-8 make it past triage. Claude Opus analyzes those 5-8 instead of all 40.
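The file-driven version of `triage.py` that the pipeline invokes isn't shown above. A minimal sketch of how it might bridge the two scripts, assuming `/tmp/raw_bounties.json` is a list of `{title, body, reward}` objects and reusing `triage_issue` and `should_escalate` from earlier (passed in as parameters here to keep the filter logic testable):

```python
import json
import sys

def build_shortlist(issues, triage_fn, escalate_fn):
    """Run local triage over raw issues, keeping only the escalation-worthy ones.

    triage_fn and escalate_fn are the triage_issue and should_escalate
    functions from the script above.
    """
    shortlist = []
    for issue in issues:
        result = triage_fn(issue["title"], issue["body"], issue["reward"])
        if escalate_fn(result):
            shortlist.append(result)
    return shortlist

def main(in_path):
    # e.g. python3 triage.py /tmp/raw_bounties.json > /tmp/shortlist.json
    with open(in_path) as f:
        raw = json.load(f)
    json.dump(build_shortlist(raw, triage_issue, should_escalate),
              sys.stdout, indent=2)
```

Writing the shortlist as JSON to stdout is what lets the bash pipeline's redirect and `len()` count work unchanged.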
The Numbers
Before this change, a full analysis pass cost roughly $0.18-0.25 per run (40 issues × ~400 tokens each × Opus pricing). Running this multiple times per day added up.
After adding the local triage layer:
- codestral pass: ~$0.00 (local compute)
- Opus pass: ~$0.02-0.04 (5-8 issues instead of 40)
- Reduction: ~85%
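The savings math is easy to sanity-check. A rough back-of-envelope, assuming ~400 input tokens per issue at Opus's $15/M input rate (output tokens ignored for simplicity):

```python
OPUS_PER_TOKEN = 15 / 1_000_000  # $15 per million input tokens
TOKENS_PER_ISSUE = 400           # rough average prompt size per issue

def opus_cost(n_issues: int) -> float:
    """Approximate input-token cost of sending n issues to Opus."""
    return n_issues * TOKENS_PER_ISSUE * OPUS_PER_TOKEN

before = opus_cost(40)  # every issue hits Opus
after = opus_cost(6)    # only triage survivors (midpoint of 5-8)
print(f"before ${before:.2f}, after ${after:.2f}, saved {1 - after / before:.0%}")
```

With 6 of 40 issues escalated, that works out to the ~85% reduction above; output tokens would shift the absolute numbers but not the ratio.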
The latency actually improved too. codestral at 3-5 seconds per issue in parallel is faster than waiting for API rate limits and network round-trips.
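The parallelism is nothing fancier than a thread pool around the blocking HTTP calls. A minimal sketch, assuming the `triage_issue` function from above (note that a default Ollama install serializes requests to one model, so actual concurrency depends on your server configuration):

```python
from concurrent.futures import ThreadPoolExecutor

def triage_parallel(issues, triage_fn, max_workers=4):
    """Triage issues concurrently; each worker blocks on one Ollama HTTP call."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(triage_fn, i["title"], i["body"], i["reward"])
            for i in issues
        ]
        # Collect in submission order so results line up with the input list
        return [f.result() for f in futures]
```

Threads (not processes) are the right tool here because the workers spend their time waiting on I/O, not computing.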
What Local Models Are Actually Good At
One thing I've learned running both models in the same pipeline: they have different strengths, and that's the point.
codestral:22b is genuinely good at:
- Parsing structured info from messy text
- Making binary yes/no decisions with clear criteria
- Extracting tech stack signals from issue descriptions
- Running in tight loops without API throttling
It's not great at:
- Nuanced reasoning about architectural tradeoffs
- Estimating effort on complex multi-file changes
- Generating production-quality code patches
That's fine. Triage doesn't need nuance. It needs speed and zero marginal cost.
Claude Opus gets the interesting cases — the ones worth $100+ that require actual thinking. That's exactly how you'd use a senior engineer: don't have them read every job posting, have them evaluate the ones that already cleared basic screening.
Next Steps
I'm currently working on a feedback loop: when a bounty that passed triage turns out to be a bust (repo dead, issue confusing on closer read), that signal goes back to tune the triage prompt. The local model gets smarter about what "clear enough to code against" actually means for each platform.
The goal is a triage accuracy rate above 80% — meaning 8 out of 10 issues that pass local screening are genuinely worth Opus's time.
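Measuring that target only requires logging an outcome for each escalated bounty. A minimal sketch of the bookkeeping (the `worth_it` label is my own invention for illustration, not anything a bounty platform provides):

```python
def triage_precision(outcomes):
    """Fraction of escalated bounties that were genuinely worth Opus's time.

    outcomes: one dict per bounty that passed local triage,
    e.g. {"title": ..., "worth_it": True}.
    """
    if not outcomes:
        return 0.0
    return sum(o["worth_it"] for o in outcomes) / len(outcomes)

# 8 good escalations out of 10 hits the 80% target exactly
```

Tracking this per platform is what would let the feedback loop tune the triage prompt differently for, say, Opire versus raw GitHub issues.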
If you're running any AI-assisted development workflow and paying for API calls, audit where those calls are going. A local model running on hardware you already own is the most underused tool in the stack.
The code above is intentionally simple — a starting point. Run it, break it, make it fit your stack.