DEV Community

Tabor Bachelor

A Score Nobody Can Act On Is Just Anxiety: Building Relevyn's "What To Fix" Engine

Last time I wrote about querying four different LLMs to measure whether a brand shows up when someone asks AI for a recommendation. That's the easy half of the problem. The harder half: once you know you're invisible, what do you actually do about it?

A score with no next step is just a source of dread. So the engine underneath Relevyn's "Fix This Week" panel turned out to be more interesting to build than the scanner — and it broke in more interesting ways.

Model selection is a latency problem before it's a quality problem

The obvious move: throw the same big model at everything. We started by generating each of the three "Fix This Week" content briefs with a larger reasoning model, since it writes better-structured recommendations than a smaller one. It also routinely blew past our serverless function's timeout window when generating three of them back to back.

Swapping the brief-generation step to a faster, cheaper model fixed the timeouts immediately, and the quality difference didn't actually matter at that stage — a brief is a short, structured recommendation, not the finished asset. We kept the larger model for the one place quality is load-bearing: the actual 700–900 word draft a user downloads and publishes. Not every step in a pipeline deserves the same model.
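A minimal sketch of that per-step routing, assuming a simple lookup table. The model names, token limits, and timeouts here are placeholders for illustration, not the values Relevyn actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    model: str               # hypothetical model identifier
    max_output_tokens: int   # illustrative ceiling, not a real setting
    timeout_seconds: float   # illustrative per-call budget


# One entry per pipeline step. The brief is short and structured, so it gets
# the fast model; the downloadable draft is where quality matters most.
ROUTES = {
    "brief": ModelChoice("small-fast-model", 1200, 8.0),
    "draft": ModelChoice("large-reasoning-model", 4000, 45.0),
}


def choose_model(step: str) -> ModelChoice:
    """Return the model settings for a pipeline step, failing loudly on typos."""
    try:
        return ROUTES[step]
    except KeyError:
        raise ValueError(f"unknown pipeline step: {step!r}") from None
```

Keeping the choice in one table means a latency problem in one step can be fixed by editing a single line rather than hunting through call sites.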

Truncated JSON doesn't look like an error. It looks like a crash.

Content briefs come back as structured JSON so the frontend can render them into cards. Early on we set a conservative token limit to keep costs down — and started seeing intermittent parsing failures that looked like backend bugs. They weren't. The model was running out of tokens mid-object, so the JSON just stopped, unparseable — closer to a hallucinated syntax error than an honest failure message.
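One way to make that failure legible is to classify a parse error as probable truncation before it surfaces as a generic crash. The sketch below is a heuristic, not what the Relevyn backend necessarily does: `looks_truncated`, `parse_brief`, and `TruncatedBriefError` are hypothetical names. Most provider APIs also report why generation stopped, which is a more direct signal when it is available.

```python
import json


class TruncatedBriefError(ValueError):
    """Raised when the model output appears to have been cut off mid-object."""


def looks_truncated(raw: str) -> bool:
    """Heuristic: True if raw ends inside a string or with unclosed braces/brackets."""
    depth = 0
    in_string = False
    escaped = False
    for ch in raw:
        if in_string:
            if escaped:
                escaped = False      # this character was escaped; skip it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
    return in_string or depth > 0


def parse_brief(raw: str):
    """Parse a brief, turning likely truncation into an explicit error."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        if looks_truncated(raw):
            raise TruncatedBriefError(
                "output ended mid-object; the token limit is probably too low"
            ) from exc
        raise  # malformed for some other reason; keep the original error
```

With this in place, a cut-off brief shows up in logs as a token-limit problem instead of an anonymous parsing failure.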

Two fixes: raise the token ceiling enough that a full brief never gets cut off, and strip markdown code fences before parsing, because even when a prompt explicitly says "return raw JSON only," the model still wraps its output in triple backticks a meaningful fraction of the time. Defensive stripping is now the first step of every parser downstream of an LLM call in this codebase.
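The fence-stripping step can be sketched roughly as follows. This is an illustrative version, assuming the whole response is either bare JSON or a single fenced block; `strip_code_fences` and `parse_llm_json` are hypothetical helper names, not functions from the Relevyn codebase.

```python
import json
import re

# Built from a single backtick so this example contains no literal fence.
TICKS = "`" * 3

# Matches a response that is one fenced block, with an optional language tag
# such as "json" after the opening backticks.
_FENCED = re.compile(
    r"^\s*" + TICKS + r"[A-Za-z0-9_-]*[ \t]*\r?\n(.*?)\r?\n?" + TICKS + r"\s*$",
    re.DOTALL,
)


def strip_code_fences(raw: str) -> str:
    """Return the payload inside a fenced block, or the trimmed input if unfenced."""
    match = _FENCED.match(raw)
    return match.group(1).strip() if match else raw.strip()


def parse_llm_json(raw: str):
    """Strip any wrapping fence first, then parse the remainder as JSON."""
    return json.loads(strip_code_fences(raw))
```

Unfenced output passes through unchanged, so the helper is safe to call on every response regardless of whether the model obeyed the prompt.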

Pre-generate before anyone asks

Waiting on three LLM calls the moment a user clicks "view plan" is a bad experience even when nothing goes wrong. So all three content briefs generate during the scan itself and get cached — by the time someone opens the panel, they're reading a stored result, not waiting on one. If the cache is somehow empty, on-demand generation is the fallback, not the default path.
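A rough sketch of that pre-generate-then-fall-back flow, assuming an in-memory dictionary as the cache and a `generate` callable that stands in for the three LLM calls. `BriefCache`, `warm`, and `get` are hypothetical names; a real deployment would presumably use a persistent store.

```python
from typing import Callable, Dict, List

Brief = dict
BriefGenerator = Callable[[str], List[Brief]]


class BriefCache:
    """Stores briefs per scan; generation on read is only a fallback."""

    def __init__(self, generate: BriefGenerator) -> None:
        self._generate = generate
        self._store: Dict[str, List[Brief]] = {}

    def warm(self, scan_id: str) -> None:
        """Called during the scan itself, before anyone opens the panel."""
        self._store[scan_id] = self._generate(scan_id)

    def get(self, scan_id: str) -> List[Brief]:
        """Return stored briefs; generate on demand only if the cache is empty."""
        cached = self._store.get(scan_id)
        if cached:
            return cached
        briefs = self._generate(scan_id)  # fallback path, not the default
        self._store[scan_id] = briefs
        return briefs
```

The scan pipeline would call `warm` as its last step, and the panel endpoint would only ever call `get`.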

The pattern generalizes past this one feature: if you know a user is going to want something 90% of the time, generate it before they ask instead of after.

The actual point

None of this is about AI writing your content for you. It's that "you're invisible to ChatGPT" is a useless sentence on its own. The only version of this worth building is one that ends in a specific paragraph you could publish this afternoon.

Free to check where you stand: relevyn.com
