Chris Kilner
83k tokens to fix a few tests!? No thanks

Claude burned 83,000 tokens fixing test failures after a refactor — raw pytest output, coverage noise, ruff warnings, all re-fed every loop.

It worked. But it was absurdly expensive.

The problem isn’t the model — it’s the context.

So I made cq (python-code-quality on PyPI). It runs 10+ quality tools and surfaces exactly one issue at a time.

Minimal context

Instead of dumping everything into the prompt, cq:

  • runs tools in priority order
  • stops at the first failure
  • emits a single, focused fix request
> cq check . -o llm
src/myproject/utils.py:21 F841: Local variable `unused_variable` is assigned to but never used

18:     min_dist = float("inf")
19:     nearest_city = None
20:     for city in cities:
21:         unused_variable = 67
22:         dist = calc_dist(current_city, city)

Please fix only this issue. After fixing, run `cq check . -o llm` to verify.

That’s it. No test logs, no coverage spam, no unrelated warnings.

If the error looks like a caller/callee mismatch, cq also fetches the callee's signature, potentially saving an extra tool call.
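A signature fetch like that needs nothing beyond the standard library. Here is a hedged sketch using `inspect.signature`; `calc_dist` is a hypothetical callee, and this is not cq's actual lookup code.

```python
# Sketch: render a callee's signature so it can be appended to the fix
# request for a caller/callee mismatch. calc_dist is hypothetical.
import inspect

def calc_dist(a, b, *, metric="euclidean"):
    """Hypothetical callee the caller may be invoking incorrectly."""

def signature_hint(func) -> str:
    """Render 'name(params)' for inclusion in the LLM prompt."""
    return f"{func.__name__}{inspect.signature(func)}"

print(signature_hint(calc_dist))  # calc_dist(a, b, *, metric='euclidean')
```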

The minimal loop

smallest complete context → smallest capable model → fewest tool calls → successful edit

Small, focused context means you can use a small, cheap model and get the fix in 1 second. No tool calling needed (if you apply the edit yourself):

cq check . -o llm | ollama run qwen3:4b --think=false \
  'show a unified diff to correct this code. Add a one line explanation'
--- a/src/myapp/calculator.py
+++ b/src/myapp/calculator.py
@@ -1,5 +1,5 @@
 def evaluate(expression):
-    return eval(expression)
+    import ast
+    return ast.literal_eval(expression)

Explanation: Replaced eval() with ast.literal_eval() to safely evaluate strings as Python literals.

Apply the fix. Run cq again. Repeat.

Or with Claude Code:

cq check . -o llm | claude -p "fix this"

Tool ordering

In -o llm mode, the tools are run sequentially, and we stop at the first error.

In other modes, we run in parallel and cache results for fast re-runs.
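The parallel mode can be sketched with `concurrent.futures` and a result cache. This is an assumed illustration, not cq's real cache (which would presumably also key on file contents so edits invalidate entries):

```python
# Assumed sketch of the parallel mode: tools run concurrently and
# results are cached by (tool, path) so re-runs are fast.
from concurrent.futures import ThreadPoolExecutor

CACHE: dict[tuple[str, str], float] = {}

def run_tool(tool: str, path: str) -> float:
    key = (tool, path)
    if key not in CACHE:
        CACHE[key] = 1.0  # pretend the tool ran and everything passed
    return CACHE[key]

def run_all(tools: list[str], path: str) -> dict[str, float]:
    with ThreadPoolExecutor(max_workers=len(tools)) as pool:
        futures = {tool: pool.submit(run_tool, tool, path) for tool in tools}
        return {tool: fut.result() for tool, fut in futures.items()}

print(run_all(["ruff", "ty", "pytest"], "."))
```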

cq check .
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ Tool             ┃     Time ┃                    Metric ┃ Score   ┃ Status   ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ compile          │    0.47s │                   compile │ 1.000   │ OK       │
│ ruff             │    0.22s │                      lint │ 1.000   │ OK       │
│ ty               │    0.80s │                type_check │ 1.000   │ OK       │
│ bandit           │    0.53s │                  security │ 1.000   │ OK       │
│ pytest           │    2.11s │                     tests │ 1.000   │ OK       │
│ radon-cc         │    0.34s │                simplicity │ 0.982   │ OK       │
│ radon-mi         │    0.41s │           maintainability │ 0.848   │ OK       │
│ radon-hal        │    0.36s │             file_bug_free │ 0.810   │ OK       │
│ radon-hal        │          │            file_smallness │ 0.655   │ OK       │
│ radon-hal        │          │        functions_bug_free │ 0.808   │ OK       │
│ radon-hal        │          │       functions_smallness │ 0.808   │ OK       │
│ vulture          │    0.37s │                 dead_code │ 1.000   │ OK       │
│ interrogate      │    0.38s │              doc_coverage │ 0.853   │ OK       │
│                  │          │                     Score │ 0.945   │          │
└──────────────────┴──────────┴───────────────────────────┴─────────┴──────────┘

Claude Code stop hook

If you want to auto-run, add a hook to your project's .claude/settings.json:

{
  "hooks": {
    "Stop": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "cq check . -o score && echo 'CQ: all clear' || cq check . -o llm; true"
      }]
    }]
  }
}
  • pass → tiny output
  • fail → targeted fix prompt
  • loop continues with minimal context

For manual use, create .claude/commands/cq-fix.md:

$(cq check . -o llm)

/cq-fix embeds the live output directly into the prompt.

Install

uv tool install python-code-quality

Help

cq check --help

 Usage: cq check [OPTIONS] [PATH]

 Feed the results from 11+ code quality tools to an LLM. Try: cq check . -o llm

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│   path      [PATH]  Path to Python file or project directory [default: .]                                                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --output       -o      [table|score|json|llm|raw]  Output mode: table (default), score, json, llm                                   │
│ --log-level            TEXT                        Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) [default: CRITICAL]        │
│ --clear-cache                                      Clear cached tool results before running                                         │
│ --workers              INTEGER                     Max parallel workers (default: one per tool, use 1 for sequential) [default: 0]  │
│ --language     -l      TEXT                        Override language detection (e.g. python, typescript, rust) # FUTURE             │
│ --only                 TEXT                        Comma-separated tool IDs to run (e.g. ruff,ty,pytest)                            │
│ --skip                 TEXT                        Comma-separated tool IDs to skip (e.g. bandit,vulture)                           │
│ --exclude              TEXT                        Comma-separated paths to exclude (e.g. demo,docs)                                │
│ --help                                             Show this message and exit.                                                      │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Notes

  • Python only (for now), but the approach generalizes
  • No agent/tool orchestration required — just a shell pipeline
  • Works with local models or hosted ones

Repo: github.com/rhiza-fr/py-cq — MIT, actively maintained.

Enjoy!
