The AI Feature Nobody Asked For (And Why I Built It Anyway)
Here's a thing I've learned after three decades writing code: clients don't always know what they need until they see it working.
Last month, I shipped an AI feature for a client that wasn't in the original spec. Not even close. They wanted a basic chat interface for their support docs — pretty straightforward, right? Feed the LLM their documentation, let customers ask questions, call it a day. Standard RAG pattern. We've all built a dozen of these by now.
But something about this one bugged me.
The Problem They Didn't Mention
Their docs were... fine. Technically accurate. But they'd been written over five years by different people, and you could feel it. Tone shifted. Examples got stale. Some sections were way too detailed while others basically said "figure it out yourself."
I'm watching the LLM try to answer questions, and it's pulling from three different docs that contradict each other. The AI's doing its best to synthesize, but honestly? Garbage in, garbage out.
After 30 years of writing code, I still get surprised by how often the real problem isn't the one you're paid to solve.
What I Built Instead
So I added a layer they didn't ask for — a documentation health monitor. Every time the RAG system pulls a chunk to answer a question, it also tracks:
- How often that chunk gets used
- Whether users ask follow-up questions (sign of confusion)
- Whether multiple contradicting chunks get pulled together
- Gaps where users ask questions but no good docs exist
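The last signal, coverage gaps, is the easiest one to approximate: if the best retrieval score for a query comes back low, no doc really answers it. Here's a minimal sketch of that idea — the threshold and the score scale are placeholder assumptions, not the client system's actual values:

```python
def find_coverage_gaps(query_log, threshold=0.5):
    """Return queries whose best retrieval score suggests no doc covers them.

    query_log: list of (query, best_score) pairs, scores in [0, 1].
    The 0.5 cutoff is a placeholder -- tune it against real user feedback.
    """
    return [query for query, best_score in query_log if best_score < threshold]
```

Dump that list into the dashboard and you have a prioritized "docs we should write" queue, straight from user behavior.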
Then I built a simple dashboard showing them where their docs were actually failing. Heat maps. Conflict reports. The works.
Took me maybe two extra days. Didn't charge for it.
The Client's Reaction
You know that moment when someone sees something they didn't know they needed? That happened.
They've now got a backlog of doc improvements driven by actual user behavior, not guesses. Their support chat works better because they're fixing the source material. And honestly — this is the part that clicked for me — the AI stopped being a band-aid over bad docs and started being a lens that showed them where to improve.
Here's what I keep telling my team: AI features shouldn't just answer questions. They should surface the questions you should've been asking.
The Engineering Part
Technically, this wasn't rocket science. I'm tracking:
```python
# Simplified version
def track_retrieval(query, chunks, user_satisfaction):
    for chunk in chunks:
        analytics.log({
            'doc_id': chunk.source_id,
            'query': query,
            'rank': chunk.rank,
            'follow_up_needed': user_satisfaction < 0.7,
            'timestamp': now()
        })

    # Check for conflicts
    if has_contradicting_chunks(chunks):
        alert_doc_team(chunks)
```
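I haven't shown `has_contradicting_chunks` above; the real version depends on your retrieval stack. One crude but workable heuristic — entirely my sketch, not the client's implementation — is to flag chunk pairs that talk about the same thing (high word overlap) but cite different numbers:

```python
import re

def _terms(text):
    # Lowercased words only; numbers are handled separately
    return set(re.findall(r"[a-z]+", text.lower()))

def _numbers(text):
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def has_contradicting_chunks(chunks, overlap=0.6):
    """Flag chunk pairs that discuss the same topic but cite different values.

    Heuristic: high Jaccard word overlap (same subject) plus disjoint numeric
    values (different answers). `chunks` is a list of strings here; a real
    system would use chunk objects and smarter similarity (e.g. embeddings).
    """
    for i, a in enumerate(chunks):
        for b in chunks[i + 1:]:
            ta, tb = _terms(a), _terms(b)
            if not ta or not tb:
                continue
            jaccard = len(ta & tb) / len(ta | tb)
            na, nb = _numbers(a), _numbers(b)
            if jaccard >= overlap and na and nb and na.isdisjoint(nb):
                return True
    return False
```

It catches the classic doc-rot case — "the timeout is 30 minutes" in one page, "the timeout is 60 minutes" in another — which is exactly the kind of conflict that was tripping up the chat.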
Basic analytics. The magic isn't in the code — it's in what you choose to measure.
I've been building with LLMs for a few years now, and the pattern I keep seeing: the most valuable features aren't the AI outputs. They're the insights you get from watching the AI work.
What This Means for Your Projects
If you're integrating AI into an existing workflow, ask yourself: what could the AI tell you about the system it's plugging into?
- RAG over messy data? Track where the conflicts are.
- Code generation? Log what patterns devs accept vs. reject.
- Content recommendations? Watch what people skip.
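For the code-generation bullet, the capture can be as small as one event per suggestion. A sketch with hypothetical names (`log_suggestion` and the event fields are mine, not any particular tool's API):

```python
import json
from datetime import datetime, timezone

def log_suggestion(suggestion_id, pattern, accepted, sink):
    """Append one accept/reject event as a JSON line to `sink`.

    `pattern` is whatever taxonomy you care about ("retry-wrapper",
    "list-comprehension", ...). Aggregating accept rates per pattern
    shows which generated idioms your devs actually trust.
    """
    event = {
        "suggestion_id": suggestion_id,
        "pattern": pattern,
        "accepted": accepted,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    sink.write(json.dumps(event) + "\n")
```

A week of these events tells you more about your codebase's conventions than most style guides do.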
The AI's already doing the work. You're just not capturing the metadata.
And here's the thing — clients love this stuff. They hired you to build a chat interface, but you're delivering strategic insights about their business. That's the kind of value that turns one-off projects into long-term relationships.
One Last Thing
You might be thinking: "But isn't this scope creep? Extra maintenance? Technical debt?"
Maybe. Sort of. But I'd rather ship something that actually solves the underlying problem than just tick off the requirements. After three decades, I'm done building exactly what was asked for if I can see it won't work.
Your mileage may vary. But I sleep better knowing I built something useful.
Have you ever shipped an AI feature that solved a problem the client didn't know they had? I'd love to hear about it in the comments.