Neelagiri65

Posted on Jun 21 • Originally published at nativerse-ventures.com

The App Store's silent giants: AI assistants reply to almost none of their reviewers

#ios #appstore #ai #datascience

An App Store rating looks like a verdict. It behaves more like a monument, built over years and slow to move. It says very little about how this month's users feel.

I took the 12 most-rated Productivity apps on the US App Store, 32 million ratings between them, and split the headline star into the two numbers it hides: how far recent sentiment has fallen below the lifetime average, and whether the developer replies when users complain.

How it is measured

Population truth. Lifetime ratings and the star histogram come from Apple's full ratings data, every rating an app has ever received.
Recent sentiment. A fixed window of the most recent reviews by date, so an app whose reviews were captured thousands deep is not averaged over several years and then compared against an app with only a few hundred. Same window for everyone.
Developer response. Reply share and median latency over that recent window.
Complaints are bucketed with a rule-based taxonomy. It is a heuristic, not a trained classifier, and I treat it as one.
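
A minimal sketch of how the recent-window figures above could be computed. It is not the pipeline behind this article. The `Review` record, the `recent_metrics` function and the default window of 200 reviews are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Review:
    # Hypothetical record shape; real App Store exports will differ.
    posted_at: datetime
    stars: int
    replied_at: Optional[datetime] = None  # None when the developer never replied


def recent_metrics(reviews, window=200):
    """Compute recent average, reply share and median reply latency.

    `window` is an assumed size: the same number of newest reviews
    is used for every app so that capture depth does not skew the result.
    """
    recent = sorted(reviews, key=lambda r: r.posted_at, reverse=True)[:window]
    if not recent:
        raise ValueError("no reviews to measure")

    replied = [r for r in recent if r.replied_at is not None]
    latencies_days = [
        (r.replied_at - r.posted_at).total_seconds() / 86400 for r in replied
    ]
    return {
        "recent_avg": mean(r.stars for r in recent),
        "reply_share": len(replied) / len(recent),
        # Latency is undefined when nothing in the window was answered.
        "median_latency_days": median(latencies_days) if latencies_days else None,
    }
```

Sorting by date before slicing is the point of the design: the window is defined by recency, not by how many reviews happened to be captured for a given app.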

What turned up

The AI assistants now own this chart, and they reply to almost no one.

App	Lifetime	Recent	Reply share
ChatGPT	4.8	4.18	0%
Claude	4.7	3.06	0%
Grok	4.9	3.77	0%
Perplexity	4.8	3.60	0%
Google Gemini	4.7	3.65	13%
Dropbox	4.8	2.75	58%
Gmail	4.7	2.40	26%
Google Drive	4.8	3.90	23%
Microsoft Authenticator	4.7	2.18	1%

The older tools are the ones still in the trenches: Dropbox answers 58% of recent reviewers, Gmail 26%, Drive 23%. The steepest recent drops belong to Microsoft Authenticator (4.7 to 2.18), Gmail (4.7 to 2.40) and Dropbox (4.8 to 2.75).
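
The drops quoted above are simply lifetime minus recent. A short sketch that recomputes them from the table rows; the variable and function names are illustrative only.

```python
# Rows copied from the table above: (app, lifetime, recent, reply share in %).
ROWS = [
    ("ChatGPT", 4.8, 4.18, 0),
    ("Claude", 4.7, 3.06, 0),
    ("Grok", 4.9, 3.77, 0),
    ("Perplexity", 4.8, 3.60, 0),
    ("Google Gemini", 4.7, 3.65, 13),
    ("Dropbox", 4.8, 2.75, 58),
    ("Gmail", 4.7, 2.40, 26),
    ("Google Drive", 4.8, 3.90, 23),
    ("Microsoft Authenticator", 4.7, 2.18, 1),
]


def steepest_drops(rows):
    """Return (app, drop) pairs sorted from largest to smallest drop."""
    drops = [(app, round(lifetime - recent, 2)) for app, lifetime, recent, _ in rows]
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


for app, drop in steepest_drops(ROWS)[:3]:
    print(f"{app}: {drop:.2f} stars below lifetime")
```

On these rows the three largest gaps come out as Microsoft Authenticator, Gmail and Dropbox, matching the ordering in the text.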

Plotted on two axes, backlash against response, every app falls into one of four archetypes: Firefighters, Ghost Ships, Complacent Giants and Resilient Leaders. Eight of the twelve are Ghost Ships, taking a recent hit in near silence.
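
A two-axis split like this can be sketched as a simple quadrant rule. Everything specific here is an assumption: the article does not state its cutoffs, so `drop_cut` and `reply_cut` are placeholder values, and the mapping of the two quieter quadrants to Complacent Giants and Resilient Leaders is one reading of the names. Only the Ghost Ship quadrant (a recent hit, near silence) is described in the text.

```python
def archetype(drop, reply_share, drop_cut=1.0, reply_cut=0.2):
    """Place an app in one of four quadrants.

    drop        -- lifetime rating minus recent rating, in stars
    reply_share -- fraction of recent reviews with a developer reply (0..1)
    drop_cut, reply_cut -- hypothetical thresholds, not the article's
    """
    high_backlash = drop >= drop_cut
    responsive = reply_share >= reply_cut

    if high_backlash and responsive:
        return "Firefighter"
    if high_backlash:
        return "Ghost Ship"
    if responsive:
        return "Resilient Leader"  # assumed quadrant: small drop, replies
    return "Complacent Giant"      # assumed quadrant: small drop, silence


# Made-up inputs, purely to show the four outcomes.
print(archetype(drop=2.0, reply_share=0.50))
print(archetype(drop=2.0, reply_share=0.00))
print(archetype(drop=0.3, reply_share=0.50))
print(archetype(drop=0.3, reply_share=0.00))
```

Because the result depends entirely on where the two cutoffs sit, an app near either threshold can change quadrant with a small shift in its numbers.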

The honest limits

Recent reviewers self-select toward the dissatisfied. A person who hits a bug is far more likely to leave a review than a contented one, so a low recent average blends genuine decline with that bias, and this data cannot cleanly separate the two. I tie no drop to a specific app release, because the version data is too sparse to support that claim. The lifetime figure is population truth; the recent figure is a biased sample; I never present one as the other.

The full interactive Friction Matrix, the per-app complaint archetypes, and the method in detail are here: https://nativerse-ventures.com/productivity-friction-matrix

Independent research from the Nativerse lab. Figures are public App Store data, cited, not invented.

Top comments (5)

AI Buddy • Jun 21

The 0% reply rate for AI assistants matches what I've seen building a small Chrome extension. Once you cross a few thousand users, the volume of "doesn't work" reviews outpaces anything you can reply to manually — and most of those reviews are actually the same three problems worded differently.

The Dropbox / Gmail / Drive number makes sense because those products have ticket systems and SLAs, so replying to a review is a side-effect of a process that exists for other reasons. AI assistants don't have that — every reply is a one-off, and it's easy to let the queue rot.

I'd push back slightly on the "Ghost Ships" framing though. For AI assistants specifically, the failure mode is usually not silence but copy-paste support replies that don't address what the user actually said. A "thanks for the feedback, we'll look into it" reply counts toward the metric but doesn't fix anything. Hard to capture in a dashboard, but it's worth distinguishing.

Neelagiri65 • Jun 22

@cwsaibuddy This is a genuinely useful note. The ticket-system framing is better than mine. Dropbox, Gmail and Drive reply because a support process already exists and a review is just one more inbox feeding into it. AI assistants have no such queue, so every reply is a one-off and the backlog wins.

Your Ghost Ships pushback is right, and it is the next thing I am building. The current metric only asks whether a reply exists, not whether it answers anything. My research already flags templated replies by their shared openings, so I will add a reply-quality split.

The "same three problems worded differently" point is the other half. If a few clusters cover most recent complaints, answering the cluster once beats answering every review, and that's a fairer bar than raw reply rate. Thanks again for taking the time.

AI Buddy • Jun 22

thanks, this landed. the ticket system framing is the one I keep coming back to. look at the reply rates of those same three apps: they track the org chart more than product maturity. if I had to pick one chart that would predict a 0% reply rate, it'd be a headcount chart of the company.

the cluster reply bar is the right one to add. raw reply rate is gamed by 'thanks for the feedback' macros and a quality split would expose that. one question: are you flagging templated replies by shared openings, or something more semantic? I would have naively started with shared openings and would love to hear what didn't work before you moved on.

the ghost ships update is genuinely useful. once reply quality is split, a 60% reply rate with 90% templated is worse than 30% with 80% substantive, and that gap is what everyone in this space should be benchmarking on.

Nazar Boyko • Jun 21

Splitting one star into two numbers is a smart frame, since a lifetime average barely moves and tells you almost nothing about how this month feels. What I liked most is that you flagged the selection bias yourself. People who hit a bug rush to leave a review while happy users stay quiet, so a low recent score is part real decline and part who bothered to show up. Keeping the reply rate as its own axis is what makes the four buckets actually useful, because silence is a choice the team made, not noise in the data.

Neelagiri65 • Jun 21

@nazar_boyko Thank you for the thoughtful input. I would really appreciate your view on which other perspectives would make this more interesting and more useful for you.