Two weeks ago I shipped core/reasoning/budget.py to test whether per-call dynamic token budgets could cut JAMES's reasoning cost by 60-80% on gemma4:e4b. It was built as an experiment: an A/B sweep, raw JSON results, and an env flag that defaults to OFF.
The hypothesis flipped.
🎯 Finding 1: the cap was a ceiling, not a floor.
gemma4:e4b naturally stops well below 4096 tokens on every workload tier. Lowering the cap from 4096 to 200/800 changed eval_count by +0%/+8%/-2%, with done_reason=stop in every cell and zero quality regression. The cap that PR #399 lifted was permission to finish, not a source of waste.
Real wins: latency down 17.5% / 7.3% on the sub / light tiers (from KV-cache sizing), a ~20x memory cut on sub, and a hard safety bound (cap=200). It ships behind JAMES_ADAPTIVE_BUDGET=1 (default OFF).
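For readers wondering what "env-flag default-OFF" looks like in practice, here is a minimal sketch of an env-gated per-tier budget. Only JAMES_ADAPTIVE_BUDGET, the 4096 baseline, and the 200 (sub) / 800 (light) caps come from this post; the helper names, the tier keys, and the "unknown tiers keep the full cap" fallback are hypothetical, and the real budget.py may be structured differently.

```python
import os

# Hypothetical per-tier caps. 4096 is the pre-experiment cap; 200 (sub) and
# 800 (light) are the adaptive caps quoted in Finding 1.
DEFAULT_CAP = 4096
TIER_CAPS = {"sub": 200, "light": 800}


def adaptive_budget_enabled() -> bool:
    """Default OFF: only the exact value "1" turns the feature on."""
    return os.environ.get("JAMES_ADAPTIVE_BUDGET", "0") == "1"


def token_budget(tier: str) -> int:
    """Return the max output tokens for one call (hypothetical helper)."""
    if not adaptive_budget_enabled():
        return DEFAULT_CAP  # flag off: behave exactly as before the experiment
    # Tiers without an adaptive cap keep the full 4096; the model stops naturally anyway.
    return TIER_CAPS.get(tier, DEFAULT_CAP)
```

With the flag unset, every tier gets 4096; with JAMES_ADAPTIVE_BUDGET=1, a "sub" call gets 200. Assuming the model is served by Ollama, the returned value would be the natural thing to pass as the `num_predict` option; how JAMES actually wires it through is not shown here.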
🎯 Finding 2: 7-tier monotonic natural-stop gradient.
Combining the free-form workloads with 4 cognitive middleware stages on the same fixture gives this natural-stop length per tier (tokens):
• substitution (verbatim): 62
• light synth (e-commerce): 235
• query_rewriter: ~370
• planner: ~690
• reflect: ~910
• verify: ~970
• heavy synth (4-step): 1681
That is a 27x dynamic range (1681 / 62), with cross-sweep noise under 5% per tier. This is the quantitative form of Robin's claim that the "workload gradient is multi-tier monotonic on a single model." Natural-stop length IS the workload measurement.
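Both properties claimed for the gradient, strict monotonicity and the 27x range, can be checked mechanically. A minimal sketch using the per-tier lengths from Finding 2 (the "~" values entered as their approximate numbers). The relative-difference definition in `within_noise` is an assumption for illustration, not necessarily how the sweeps computed their <5% noise figure.

```python
# Natural-stop lengths (tokens) per tier, from Finding 2.
# Approximate ("~") values are entered as the quoted numbers.
GRADIENT = [
    ("substitution verbatim", 62),
    ("light synth e-commerce", 235),
    ("query_rewriter", 370),
    ("planner", 690),
    ("reflect", 910),
    ("verify", 970),
    ("heavy synth 4-step", 1681),
]


def is_monotonic(lengths):
    """True if every tier stops strictly later than the tier before it."""
    return all(a < b for a, b in zip(lengths, lengths[1:]))


def dynamic_range(lengths):
    """Ratio of the longest natural stop to the shortest."""
    return max(lengths) / min(lengths)


def within_noise(sweep_a, sweep_b, tolerance=0.05):
    """Assumed cross-sweep check: each tier's relative change is below `tolerance`."""
    return all(abs(a - b) / a < tolerance for a, b in zip(sweep_a, sweep_b))


lengths = [n for _, n in GRADIENT]
print(is_monotonic(lengths))             # True
print(round(dynamic_range(lengths), 1))  # 27.1
```

A useful side effect of writing it this way: if a future model or fixture breaks the ordering, `is_monotonic` flips to False before anyone re-reads the table.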
🎯 Finding 3: verify is a high-clustering cognitive stage. Mechanism 2 needs a second axis.
At T=0.2, verify produces only 2-3 unique responses across 20 baseline calls (~12.5%), and this is stable across two sweeps. The other cognitive stages at the same workload tier produce 20/20 unique responses. verify emits structured JSON, so its answer space is a small, finite set.
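The clustering figure is a unique-response ratio: distinct outputs divided by calls (2-3 out of 20 averages to ~12.5%). A minimal sketch of that measurement; the optional JSON canonicalization (sorted keys, normalized whitespace) is an assumption added for illustration and is off by default, so the plain version compares raw strings.

```python
import json
from typing import Iterable


def canonical(response: str) -> str:
    """Normalize JSON so key order and whitespace don't count as differences.

    Non-JSON responses are compared after stripping surrounding whitespace.
    """
    try:
        return json.dumps(json.loads(response), sort_keys=True)
    except json.JSONDecodeError:
        return response.strip()


def unique_ratio(responses: Iterable[str], normalize_json: bool = False) -> float:
    """Fraction of calls that produced a distinct response."""
    items = list(responses)
    if not items:
        raise ValueError("need at least one response")
    keys = {canonical(r) if normalize_json else r for r in items}
    return len(keys) / len(items)
```

On 20 calls, a structured stage whose outputs collapse to 2 distinct JSON objects scores 0.10, while free-form stages with 20 distinct answers score 1.0. Turning on `normalize_json` can only lower the ratio, which is why it is opt-in here.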
Mechanism 2 (answer convergence) now has two axes: workload weight (sub 1/20 → heavy 20/20 unique) AND task type (structured-JSON stages cluster independently of workload). Ali's "ceiling vs path" framing extends here cleanly.
🎯 Process finding: falsification → revision → confirmation.
The first cognitive sweep at CAP_LIGHT=800 truncated reflect (926) and verify (984) in 19/20 calls each, with quality down 40-75%. The data drove the bump (800 → 1200); the re-sweep PASSed (0/20 truncation, 20/20 quality).
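The pass criteria quoted here (0/20 truncation, 20/20 quality) are easy to encode as a gate. A minimal sketch, assuming Ollama-style per-call records where `done_reason` is "stop" for a natural stop and "length" when the token cap was hit; the `CallResult` shape and the 20% headroom rule are hypothetical illustrations, not the rule used for the 800 → 1200 bump.

```python
from dataclasses import dataclass


@dataclass
class CallResult:
    """One model call, reduced to the fields this check needs (hypothetical shape)."""
    done_reason: str  # "stop" = natural stop, "length" = hit the token cap
    eval_count: int   # generated tokens
    passed: bool      # did the output pass the quality check?


def truncation_count(results):
    """Number of calls that ran into the cap instead of stopping naturally."""
    return sum(r.done_reason == "length" for r in results)


def sweep_passes(results):
    """Gate mirroring the quoted criteria: zero truncation and every call passes quality."""
    return truncation_count(results) == 0 and all(r.passed for r in results)


def cap_has_headroom(cap, natural_stop, margin=0.2):
    """Hypothetical rule: the cap should exceed the natural stop by `margin`."""
    return cap >= natural_stop * (1 + margin)
```

With the illustrative 20% margin, verify's 984 tokens rule out 800 (needs at least ~1181) but fit under 1200. The margin is arbitrary; the point is that the cap is chosen from measured natural-stop lengths rather than guessed.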
Three-author joint-piece status:
Headline locked (Ali + Robin + JAMES): "Substitution is free. Synthesis costs in proportion to what it has to invent."
New sub-clauses:
β’ "β¦and inversely to parameter count." (Robin, 2 evidence layers)
β’ "β¦and the gradient is multi-tier monotonic β 7 tiers, 27x range." (JAMES)
β’ "β¦and answer convergence has a task-type axis." (JAMES, cross-sweep)
Three stacks: Robin (26b MoE), JAMES (e4b cognitive), Ali (mid-June Gemini).
Citable archive (Zenodo DOI): https://doi.org/10.5281/zenodo.20363998
PRs #461 / #463: https://github.com/Hashevolution/James-RAG-Evol/pull/461
@robin Converse @ali Afana: three axes locked.