DEV Community

JEONSEWON

"Why don't you just use a better embedder?" — the most reasonable suggestion I keep refusing

Every time I show the E3 finding — that my semantic layer barely separates genuinely-distinct outputs on real, same-topic data — someone gives the same sensible advice: just swap in a stronger embedding model. It's a good instinct. I keep saying no, and the reason is more interesting than the suggestion.

The semantic layer's job is to confirm whether two outputs are really redundant. On synthetic data it worked. On real same-topic text, outputs share so much vocabulary that almost everything scores "similar," and my threshold (φ=0.514) can't tell waste from normal progress. So yes — a different embedder might separate them better.
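
To make the failure mode concrete, here is a minimal sketch of a threshold check of this kind. It assumes the semantic layer compares two output embeddings by cosine similarity and calls them redundant when the score reaches φ. Both that reading of φ and the toy vectors are illustrative assumptions, not the code in the repo.

```python
import math

# Threshold quoted in the post. Treating it as a cosine-similarity cutoff
# is an assumption made for this sketch.
PHI = 0.514


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_redundant(emb_a, emb_b, phi=PHI):
    """Flag a pair as redundant when its similarity reaches the threshold."""
    return cosine_similarity(emb_a, emb_b) >= phi


# Hypothetical toy embeddings: two same-topic outputs and one off-topic output.
same_topic_a = [0.9, 0.4, 0.1]
same_topic_b = [0.8, 0.5, 0.3]
off_topic = [-0.2, 0.1, 0.9]

print(is_redundant(same_topic_a, same_topic_b))
print(is_redundant(same_topic_a, off_topic))
```

When every same-topic pair lands well above the cutoff, as the first toy pair does, the check returns the same answer for wasted repetition and for legitimate progress, which is the E3 problem in miniature.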

Here's the trap. I have exactly three real pairs that expose this. If I start swapping embedders and tuning until those three separate cleanly, I haven't fixed the detector — I've fitted it to three examples. The number would look great and mean nothing, because I'd have chosen the instrument after seeing what makes the result pretty. That is the precise mistake that killed my first project: a signal that looked strong because it was quietly shaped to the data in front of me.
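
The size of that trap can be estimated with a toy calculation. The sketch below assumes a candidate embedder with no real skill, whose similarity scores on genuinely-distinct pairs are independent and uniform on [0, 1]. That model, the function names, and the candidate counts are all hypothetical; only φ comes from the post.

```python
PHI = 0.514  # threshold from the post


def p_clean_by_chance(n_pairs, phi=PHI):
    """Chance that a skill-less embedder scores all n_pairs below phi.

    Assumes independent scores drawn uniformly from [0, 1].
    """
    return phi ** n_pairs


def p_any_candidate_looks_clean(n_candidates, n_pairs, phi=PHI):
    """Chance that at least one of n_candidates skill-less embedders
    separates every pair cleanly, purely by luck."""
    return 1 - (1 - p_clean_by_chance(n_pairs, phi)) ** n_candidates


print(p_clean_by_chance(3))
print(p_any_candidate_looks_clean(20, 3))
print(p_any_candidate_looks_clean(20, 30))
```

Under these assumptions a single useless embedder passes the three-pair test about 14% of the time, and trying twenty of them makes a lucky "winner" very likely. With thirty pairs the same search almost never succeeds by chance, which is the argument for collecting more traces before choosing an instrument.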

Freezing the embedding model isn't stubbornness about this particular model. It's refusing to change the ruler after seeing the measurement. The honest fix for E3 isn't a better embedder chosen against three examples — it's real traces across many topics and domains, enough to see the actual distribution, and then a pre-registered recalibration I commit to before looking.
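
One way to make "commit before looking" checkable is to publish a hash of the recalibration plan before any new traces are opened. The sketch below is one possible mechanism, not the procedure used in the repo. Every field name and value in `plan` is hypothetical apart from φ.

```python
import hashlib
import json


def commitment_hash(plan):
    """SHA-256 of a canonical JSON encoding of the plan.

    Sorting keys makes the hash independent of dict ordering.
    """
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_commitment(plan, published_hash):
    """Check that the plan used at analysis time is the one committed earlier."""
    return commitment_hash(plan) == published_hash


# Hypothetical plan, written down before any new traces are inspected.
plan = {
    "embedder": "frozen-embedder-v1",  # placeholder name, not a real model
    "phi": 0.514,
    "recalibration_rule": "decided and written here in advance",
    "min_traces": 500,  # illustrative number only
}

published = commitment_hash(plan)
print(published)

# Any later change to the plan, such as nudging phi, breaks the match.
tampered = dict(plan, phi=0.6)
print(matches_commitment(plan, published))
print(matches_commitment(tampered, published))
```

Posting the hash somewhere timestamped, for example in a commit message, lets anyone confirm later that the ruler was fixed before the measurement.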

Which is why the bottleneck isn't a model on Hugging Face. It's traces from people running real multi-agent systems. The better embedder might genuinely be the answer, but I only get one honest shot at finding that out, so I'm not spending it on n=3.

Code, the E3 log, and the frozen params:
github.com/JEONSEWON/Clew-by-Custos
