DEV Community

MxGuru
MxGuru

Posted on

Two Localizers, Both Wrong: Bounding a Quantization Cost That Wouldn't Close

Part 2 of the quantization series. Spent two days and $12 hunting for the right localizer after Part 1 showed the per-layer drift metric lies. Both candidates — token-level logit-divergence at wrong tokens, AWQ-clipping on the surfaced layers — came back empty. Honest finding: an 8B model on a 12GB card costs ~12.7% PPL on wikitext-2, the gap is diffuse and proportional, no clever subset-targeted fix closes it. One process habit (a no-op control reproducing the baseline to 4 decimals) caught a silent bug that would have shipped a wrong 'AWQ-clipping wins' claim.

Top comments (0)