DEV Community

Neelagiri65

Originally published at nativerse-ventures.com

The App Store's silent giants: AI assistants reply to almost none of their reviewers

An App Store rating looks like a verdict. It behaves more like a monument, built over years and slow to move. It says very little about how this month's users feel.

I took the 12 most-rated Productivity apps on the US App Store, 32 million ratings between them, and split each app's headline star rating into the two numbers it hides: how far recent sentiment has fallen below the lifetime average, and whether the developer replies when users complain.

How it is measured

  • Population truth. Lifetime ratings and the star histogram come from Apple's full ratings data, every rating an app has ever received.
  • Recent sentiment. The average star rating over a fixed window of the most recent reviews by date. Every app gets the same window, so an app whose reviews were captured thousands deep is not judged on a multi-year average while a rival is judged on a few hundred.
  • Developer response. The share of reviews in that recent window that received a developer reply, and the median time it took to reply.
  • Complaint taxonomy. Complaints are bucketed with a rule-based taxonomy. It is a heuristic, not a trained classifier, and I treat it as one.
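The measurements above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the pipeline behind this post: the `Review` record, the count-based window of 500 reviews and the keyword rules in `RULES` are all hypothetical, since the post does not publish its schema, window size or taxonomy.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import Optional


# Hypothetical review record; the post does not describe its schema.
@dataclass
class Review:
    posted: datetime
    stars: int
    text: str
    replied: Optional[datetime] = None  # when the developer replied, if ever


# Hypothetical window size; the post does not publish the real one.
WINDOW = 500


def recent_window(reviews: list[Review], size: int = WINDOW) -> list[Review]:
    """The `size` most recent reviews by date; the same cut for every app."""
    return sorted(reviews, key=lambda r: r.posted, reverse=True)[:size]


def window_metrics(reviews: list[Review], size: int = WINDOW) -> dict:
    """Recent average, reply share and median reply latency (in hours)."""
    window = recent_window(reviews, size)
    if not window:
        raise ValueError("no reviews in window")
    answered = [r for r in window if r.replied is not None]
    latencies = [(r.replied - r.posted).total_seconds() / 3600 for r in answered]
    return {
        "recent_avg": mean(r.stars for r in window),
        "reply_share": len(answered) / len(window),
        "median_latency_h": median(latencies) if latencies else None,
    }


# Hypothetical keyword rules: first matching bucket wins, the rest is "other".
RULES = {
    "crash_or_bug": ("crash", "bug", "freez", "doesn't work"),
    "login": ("log in", "login", "sign in", "password"),
    "pricing": ("price", "subscription", "refund", "paywall"),
}


def bucket(text: str) -> str:
    """Assign a complaint to the first bucket whose keyword it contains."""
    lowered = text.lower()
    for label, keywords in RULES.items():
        if any(k in lowered for k in keywords):
            return label
    return "other"
```

Cutting by review count rather than calendar span is one possible reading of "same window for everyone"; an app with fewer reviews than the window simply contributes all of them.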

What turned up

The AI assistants now own this chart, and they reply to almost no one.

Averages are in stars; reply share is measured over the recent window. Nine of the twelve apps are listed.

App                       Lifetime avg  Recent avg  Reply share
ChatGPT                   4.8           4.18        0%
Claude                    4.7           3.06        0%
Grok                      4.9           3.77        0%
Perplexity                4.8           3.60        0%
Google Gemini             4.7           3.65        13%
Dropbox                   4.8           2.75        58%
Gmail                     4.7           2.40        26%
Google Drive              4.8           3.90        23%
Microsoft Authenticator   4.7           2.18        1%

The older tools are the ones still in the trenches: Dropbox answers 58% of recent reviewers, Gmail 26%, Drive 23%. The steepest recent drops belong to Microsoft Authenticator (4.7 to 2.18), Gmail (4.7 to 2.40) and Dropbox (4.8 to 2.75).
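The ranking of drops can be checked directly against the table above. A small sketch using only the nine published rows (the three apps not shown are left out); because the lifetime figures are rounded to one decimal, each computed drop is approximate.

```python
# Lifetime and recent averages as published in the table above.
SCORES = {
    "ChatGPT": (4.8, 4.18),
    "Claude": (4.7, 3.06),
    "Grok": (4.9, 3.77),
    "Perplexity": (4.8, 3.60),
    "Google Gemini": (4.7, 3.65),
    "Dropbox": (4.8, 2.75),
    "Gmail": (4.7, 2.40),
    "Google Drive": (4.8, 3.90),
    "Microsoft Authenticator": (4.7, 2.18),
}


def drop(lifetime: float, recent: float) -> float:
    """Stars lost between the lifetime average and the recent average."""
    return round(lifetime - recent, 2)


def steepest_drops(scores: dict, n: int = 3) -> list:
    """The n apps with the largest lifetime-to-recent drop, largest first."""
    ranked = sorted(scores.items(), key=lambda kv: drop(*kv[1]), reverse=True)
    return [(app, drop(*pair)) for app, pair in ranked[:n]]


print(steepest_drops(SCORES))
# [('Microsoft Authenticator', 2.52), ('Gmail', 2.3), ('Dropbox', 2.05)]
```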

Plot recent backlash (the fall from lifetime to recent average) against developer response, and every app lands in one of four archetypes: Firefighters, Ghost Ships, Complacent Giants and Resilient Leaders. Eight of the twelve are Ghost Ships, taking a recent hit in near silence.
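The four-way split amounts to a quadrant test on those two axes. A minimal sketch, assuming hypothetical cut points of a 1.0-star drop and a 20% reply share (the post does not publish its thresholds), and assuming the names map to quadrants as their wording suggests; only Ghost Ships (a recent hit, near silence) is defined in the post itself. With guessed thresholds it is not expected to reproduce the post's exact counts.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    # Hypothetical cut points; the post does not publish the ones it uses.
    drop: float = 1.0          # stars lost from lifetime to recent average
    reply_share: float = 0.20  # fraction of recent reviews answered


def archetype(drop: float, reply_share: float, t: Thresholds = Thresholds()) -> str:
    """Place one app in a quadrant of the backlash-vs-response plane.

    Only "Ghost Ship" (large drop, little response) is defined by the post;
    the other three labels are an assumed reading of the names.
    """
    hit = drop >= t.drop
    responsive = reply_share >= t.reply_share
    if hit and responsive:
        return "Firefighter"
    if hit:
        return "Ghost Ship"
    if responsive:
        return "Resilient Leader"
    return "Complacent Giant"


# Microsoft Authenticator from the table: 4.7 -> 2.18 with a 1% reply share.
print(archetype(4.7 - 2.18, 0.01))  # Ghost Ship
```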

The honest limits

Recent reviewers self-select toward the dissatisfied. A person who hits a bug is far more likely to leave a review than a contented one, so a low recent average blends genuine decline with that bias, and this data cannot cleanly separate the two. I tie no drop to a specific app release, because the version data is too sparse to support that claim. The lifetime figure is population truth; the recent figure is a biased sample; I never present one as the other.
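A toy calculation makes the bias argument concrete. Every number below is invented for illustration: a user base whose satisfaction never changes, in which unhappy users are simply more likely to write a review. Weighting each star level by its share of users times its hypothetical review propensity gives the average that a sample of written reviews would show.

```python
# Hypothetical numbers for illustration only; not taken from any app.
HISTOGRAM = {1: 0.05, 2: 0.03, 3: 0.05, 4: 0.12, 5: 0.75}   # share of users at each star level
PROPENSITY = {1: 0.20, 2: 0.15, 3: 0.08, 4: 0.03, 5: 0.02}  # chance a user at that level writes a review


def true_average(hist: dict) -> float:
    """Average satisfaction across all users, reviewers or not."""
    return sum(s * p for s, p in hist.items()) / sum(hist.values())


def observed_average(hist: dict, propensity: dict) -> float:
    """Expected average of the reviews that actually get written."""
    weights = {s: hist[s] * propensity[s] for s in hist}
    return sum(s * w for s, w in weights.items()) / sum(weights.values())


print(round(true_average(HISTOGRAM), 2))                  # 4.49
print(round(observed_average(HISTOGRAM, PROPENSITY), 2))  # 3.25
```

Here the written reviews average about 3.25 while the users behind them still average 4.49, a gap produced with no decline at all. If every star level reviewed at the same rate, the two averages would coincide, which is why a recent average alone cannot separate genuine decline from self-selection.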

The full interactive Friction Matrix, the per-app complaint archetypes, and the method in detail are here: https://nativerse-ventures.com/productivity-friction-matrix

Independent research from the Nativerse lab. Figures are public App Store data, cited, not invented.

Top comments (2)

Nazar Boyko

Splitting one star into two numbers is a smart frame, since a lifetime average barely moves and tells you almost nothing about how this month feels. What I liked most is that you flagged the selection bias yourself. People who hit a bug rush to leave a review while happy users stay quiet, so a low recent score is part real decline and part who bothered to show up. Keeping the reply rate as its own axis is what makes the four buckets actually useful, because silence is a choice the team made, not noise in the data.

AI Buddy

The 0% reply rate for AI assistants matches what I've seen building a small Chrome extension. Once you cross a few thousand users, the volume of "doesn't work" reviews outpaces anything you can reply to manually — and most of those reviews are actually the same three problems worded differently.

The Dropbox / Gmail / Drive number makes sense because those products have ticket systems and SLAs, so replying to a review is a side-effect of a process that exists for other reasons. AI assistants don't have that — every reply is a one-off, and it's easy to let the queue rot.

I'd push back slightly on the "Ghost Ships" framing, though. For AI assistants specifically, the failure mode is usually not silence but copy-paste support replies that don't address what the user actually said. A "thanks for the feedback, we'll look into it" reply counts toward the metric but doesn't fix anything. Hard to capture in a dashboard, but it's worth distinguishing.