DEV Community

pickuma

Originally published at pickuma.com

Turning Support Tickets Into Product Insight With AI

Your support queue already knows what's wrong with your product. Every refund request, every "how do I..." email, every angry thread is a data point about where the experience breaks. The problem is that this signal lives in a format nobody can analyze: thousands of one-off conversations, each handled and closed in isolation. By the time a pattern is obvious enough for a human to notice it unaided, you've usually shipped the same broken flow to a few more cohorts of users.

We spent a week running a backlog of roughly 2,400 closed tickets through an LLM pipeline to see how much of that gap is closeable with off-the-shelf tooling. The short version: the clustering and summarization are genuinely useful, the tagging needs a human in the loop, and the part that actually changes your roadmap has nothing to do with the model at all.

Why raw tickets resist analysis

The obvious move is to slap categories on tickets and count them. Most help desks already support this, and most teams already ignore the output, because manual tagging fails in two predictable ways.

First, agents tag for triage speed, not analysis. A ticket about a failed export gets filed under "Billing" because that's the team it was routed to, not because billing is the root cause. Second, the taxonomy is frozen at the moment you wrote it. A category list built in 2024 has no bucket for the bug you introduced last Tuesday, so the fastest-growing problem in your queue is invisible — it's smeared across "Other" and "General."

LLMs help here precisely because they don't need a fixed taxonomy up front. You can embed each ticket, cluster the embeddings, and let the groupings emerge from the actual language users used. In our run, this surfaced a cluster of 60-some tickets that all described the same single-sign-on timeout in different words — "keeps logging me out," "session expired," "have to sign in twice." No human-authored category would have caught all three phrasings, and the volume was high enough to justify a fix that had been sitting unprioritized for months.
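As a rough illustration of the embed-then-cluster idea, here is a minimal sketch that runs on the standard library alone. It makes several assumptions that are not from our run: `toy_embed` is a word-count stand-in for a real embedding model (so it groups on shared words, not on meaning, and would miss paraphrases like the three above), and the greedy grouping, function names, and 0.3 threshold are all hypothetical. Swapping in real embeddings means replacing only the `embed` argument.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(texts, embed=toy_embed, threshold=0.3):
    """Put each text in the first cluster whose seed vector is similar enough.

    Returns a list of clusters, each a list of indices into `texts`.
    """
    seeds = []     # vector of the first member of each cluster
    clusters = []  # member indices, parallel to `seeds`
    for i, text in enumerate(texts):
        vec = embed(text)
        for c, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                clusters[c].append(i)
                break
        else:
            # No existing cluster matched, so this text starts a new one.
            seeds.append(vec)
            clusters.append([i])
    return clusters


tickets = [
    "session expired after login",
    "my session expired again after sso login",
    "export to csv fails",
    "csv export fails with an error",
]
print(greedy_cluster(tickets))  # [[0, 1], [2, 3]]
```

No category list appears anywhere in the sketch: the groups fall out of the similarity threshold, which is the property that lets a brand-new problem form its own cluster.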

Support tickets are some of the most PII-dense text your company holds: names, emails, billing details, sometimes screenshots of private data. Before piping them into any model, strip or redact identifiers and confirm the provider's data-retention terms. Using an API tier that excludes your data from training is the floor, not the ceiling — check whether your support tool's terms even permit export to a third-party model in the first place.
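A minimal sketch of a redaction pass, assuming regular expressions are enough for the structured identifiers. The `ORD-` order-number format and the placeholder tokens are hypothetical, and personal names are deliberately left out: they generally need a named-entity tool rather than a pattern.

```python
import re

# Simple email pattern; good enough for redaction, not for validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical order-number format; adjust to whatever your system emits.
ORDER_RE = re.compile(r"\bORD-\d{4,}\b")


def redact(text):
    """Replace emails and order numbers with fixed placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = ORDER_RE.sub("[ORDER_ID]", text)
    return text


print(redact("Refund ORD-88213 please, reach me at jo.smith@example.com."))
# Refund [ORDER_ID] please, reach me at [EMAIL].
```

Using a fixed token rather than deleting the match keeps the sentence readable and gives every ticket the same token in place of a unique identifier.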

A pipeline that actually produces insight

The workflow that held up across our test has four stages. None of them is exotic, and the value is in the sequence, not any single step.

1. Normalize and redact. Pull the ticket body and the first customer message (skip the agent replies — they add noise and length). Run a redaction pass to remove emails, names, and order numbers. This both protects users and stops the model from clustering on irrelevant tokens like specific account IDs.

2. Embed and cluster. Generate an embedding per ticket and group them. We clustered on cosine similarity between embeddings and got coherent groups at around 40 clusters for 2,400 tickets (about 60 tickets per cluster on average). That is small enough to review and large enough to separate distinct problems. This is the step that replaces the frozen taxonomy.

3. Summarize each cluster, not each ticket. This is the move most teams miss. Don't ask the model to summarize 2,400 tickets one at a time — that just gives you 2,400 summaries you still can't read. Feed it a sample of 15–20 tickets from a single cluster and ask for the shared underlying issue, the variations in how users describe it, and the apparent severity. One paragraph per cluster is something a product manager will actually read on a Monday morning.

4. Quantify and rank. Attach the raw count, the time trend (is this cluster growing?), and any available revenue or plan-tier data to each summary. "42 tickets, up 3x month-over-month, 60% from paid accounts" is a roadmap input. "Users are confused about exports" is not.
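Steps 3 and 4 can be sketched roughly as follows. This is an illustration under assumptions, not the code from our run: the `Ticket` fields, the prompt wording, and the function names are hypothetical, and the actual model call is left out because it depends on your provider.

```python
import random
from dataclasses import dataclass


@dataclass
class Ticket:
    text: str
    paid: bool
    this_month: bool  # False means the ticket arrived in the previous month


def build_summary_prompt(tickets, sample_size=15, seed=0):
    """Sample tickets from ONE cluster and wrap them in a summary prompt."""
    rng = random.Random(seed)  # seeded so reruns pick the same sample
    sample = rng.sample(tickets, min(sample_size, len(tickets)))
    body = "\n".join(f"- {t.text}" for t in sample)
    return (
        "These support tickets were grouped together. In one paragraph, "
        "describe the shared underlying issue, the variations in how users "
        "describe it, and the apparent severity.\n" + body
    )


def cluster_stats(tickets):
    """This month's count, month-over-month ratio, and paid share."""
    current = [t for t in tickets if t.this_month]
    previous = len(tickets) - len(current)
    return {
        "count": len(current),
        # None when there is no previous-month baseline to compare against.
        "growth": len(current) / previous if previous else None,
        "paid_share": sum(t.paid for t in current) / len(current) if current else 0.0,
    }


cluster = (
    [Ticket(f"export fails #{i}", paid=i < 3, this_month=True) for i in range(6)]
    + [Ticket(f"old export issue #{i}", paid=False, this_month=False) for i in range(2)]
)
print(cluster_stats(cluster))  # {'count': 6, 'growth': 3.0, 'paid_share': 0.5}
prompt = build_summary_prompt(cluster, sample_size=5)
```

Sorting clusters by `count` or `growth` gives a first-pass ranking; the numbers are attached to the summary, not produced by the model.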

The tagging caveat is worth stating plainly: when we let the model assign clusters to a fixed product taxonomy automatically, it was confidently wrong often enough that we stopped trusting the auto-labels. The reliable pattern was model-proposes, human-confirms: the model drafts the cluster label and severity, and a person spends ten minutes a week sanity-checking the top ten clusters. That review is cheap because you're reading ten short summaries out of 40, not 2,400 tickets.
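One way to make model-proposes, human-confirms concrete is to store the model's draft and the reviewer's decision as separate fields, so an unreviewed label can never pass for a confirmed one. The sketch below is a hypothetical data model, not what we ran; the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterLabel:
    cluster_id: int
    count: int
    proposed_label: str                    # drafted by the model
    confirmed_label: Optional[str] = None  # set only by a human reviewer


def review_queue(labels, top_n=10):
    """Unconfirmed clusters, largest first, capped at top_n per review."""
    pending = [item for item in labels if item.confirmed_label is None]
    return sorted(pending, key=lambda item: item.count, reverse=True)[:top_n]


def confirm(label, corrected=None):
    """Reviewer accepts the model's draft or overrides it with a correction."""
    label.confirmed_label = corrected or label.proposed_label
    return label


labels = [
    ClusterLabel(1, 64, "SSO timeout"),
    ClusterLabel(2, 12, "Billing"),
    ClusterLabel(3, 30, "Export failure"),
]
print([item.cluster_id for item in review_queue(labels, top_n=2)])  # [1, 3]
confirm(labels[0])                               # accept the draft as-is
confirm(labels[1], corrected="Failed export")    # override a wrong draft
print([item.cluster_id for item in review_queue(labels)])  # [3]
```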

Run the pipeline on a rolling 30-day window rather than your entire history. A one-time analysis of all tickets tells you what was broken on average over two years. A 30-day window tells you what broke in the last few weeks, which is the thing you can still act on before it spreads.
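The window itself is a small filter applied before anything is embedded. A minimal sketch, assuming each ticket is a dict with a timezone-aware `created_at` field (the field name and ticket shape are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def rolling_window(tickets, now, days=30):
    """Return tickets whose created_at falls within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [t for t in tickets if cutoff <= t["created_at"] <= now]


now = datetime(2025, 3, 31, tzinfo=timezone.utc)
tickets = [
    {"id": 1, "created_at": datetime(2025, 3, 28, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime(2025, 3, 2, tzinfo=timezone.utc)},
    {"id": 3, "created_at": datetime(2024, 11, 15, tzinfo=timezone.utc)},
]
recent = rolling_window(tickets, now)
print([t["id"] for t in recent])  # [1, 2]
```

Passing `now` in explicitly, rather than reading the clock inside the function, makes a given month's run reproducible.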

Closing the loop so insight reaches the roadmap

A cluster summary that lives in a notebook nobody opens is the same dead signal as the original ticket queue, just compressed. The step that makes this pay off is routing the output into wherever your product decisions actually get made.

The pattern that worked: each monthly run writes its ranked clusters into a shared database, one row per theme, with the count, trend, severity, and a link back to representative tickets. The product team reviews that table the same way it reviews any other backlog input, and a theme that recurs for three months with rising volume becomes a roadmap item with evidence attached. The link back to real tickets matters: it lets an engineer read five actual user messages before deciding how to fix the thing, which beats acting on a model's paraphrase alone.
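A sketch of what that table could look like, using SQLite from the Python standard library. The schema, column names, sample rows, and the `helpdesk.example` URLs are all hypothetical. `recurring_rising` is one possible reading of "recurs for three months with rising volume": it checks a theme's most recent recorded runs and does not verify that the months are consecutive.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS themes (
    run_month    TEXT NOT NULL,     -- e.g. '2025-03'
    theme        TEXT NOT NULL,
    ticket_count INTEGER NOT NULL,
    trend        REAL,              -- month-over-month ratio, NULL if unknown
    severity     TEXT,
    sample_urls  TEXT,              -- newline-separated links to real tickets
    PRIMARY KEY (run_month, theme)
)
"""


def recurring_rising(conn, months=3):
    """Themes whose count rose across their last `months` recorded runs."""
    rows = conn.execute(
        "SELECT theme, ticket_count FROM themes ORDER BY theme, run_month"
    ).fetchall()
    history = {}
    for theme, count in rows:
        history.setdefault(theme, []).append(count)
    result = []
    for theme, counts in history.items():
        recent = counts[-months:]
        # Require enough runs, and a strict increase between each pair.
        if len(recent) == months and all(a < b for a, b in zip(recent, recent[1:])):
            result.append(theme)
    return result


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO themes VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2025-01", "SSO timeout", 20, None, "high", "https://helpdesk.example/t/101"),
        ("2025-02", "SSO timeout", 41, 2.05, "high", "https://helpdesk.example/t/187"),
        ("2025-03", "SSO timeout", 64, 1.56, "high", "https://helpdesk.example/t/240"),
        ("2025-01", "Export confusion", 30, None, "low", "https://helpdesk.example/t/090"),
        ("2025-02", "Export confusion", 25, 0.83, "low", "https://helpdesk.example/t/150"),
        ("2025-03", "Export confusion", 28, 1.12, "low", "https://helpdesk.example/t/233"),
    ],
)
print(recurring_rising(conn))  # ['SSO timeout']
```

A plain table is a deliberate choice here: it can be sorted, filtered, and linked from whatever tool the product review already uses.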

The honest limit here is that the AI does the part that was always tedious — reading and grouping thousands of messages — but not the part that was always hard: deciding which problem is worth fixing. The model will happily rank a high-volume cluster of low-stakes complaints above a small cluster of churning enterprise accounts. Volume is an input to that judgment, not a substitute for it. Keep a human deciding what matters, and let the pipeline make sure they're deciding with the full picture instead of whatever ten tickets happened to land in their inbox this week.


