Uya

Originally published at zenn.dev

I Went Looking for the Basis of 'N Characters Per Minute Is Fast' — There Wasn't One. Setting Read-Aloud Thresholds Honestly

📝 Originally published in Japanese on Zenn. This is the English version.
Canonical: https://zenn.dev/uya0526_design/articles/satellite3_metrics-rationale

📚 This is satellite article #3 in my "Read-Aloud Speed Meter dev log" series. For the whole picture, see the main article.

Where This Sits

The read-aloud speed meter converts speaking speed into an evaluation label like "slightly fast," and stagnation rate into one like "few." Those labels ultimately become the foundation for Claude Haiku's feedback.

So — on what basis did I draw the thresholds (the dividing lines)? This article digs into that "basis."

The short answer from my research:

I couldn't find a paper that defines an academic threshold for "N characters/min = fast/slow."

This is a record of how I drew the lines honestly once I'd learned there was no firm basis. More than the metric numbers themselves, I believe being transparent about why I chose those numbers is what makes an evaluation app trustworthy.

💡 I'm an ex-Java engineer learning TypeScript in public. This one is mostly about design decisions.


Why Obsess Over the "Basis"?

An evaluation app passes judgment on the user: "your reading is slightly fast." Once you're passing judgment, if you can't explain "why we can say that," it's just guesswork.

This app in particular passes the labels straight to Claude Haiku to generate coaching. If the foundational label has an unclear basis, the feedback built on top of it is a castle on sand. So I decided to nail down the basis for the thresholds first.

Two things to research:

  • The judgment basis for speaking speed (characters/min)
  • The judgment basis for stagnation rate (the proportion of silence)

As it turned out, these two had completely different kinds of basis.


Speed Thresholds: No Academic Threshold → Draw From General Rules of Thumb

What I found

For speaking speed, I first looked for academic backing. Here's what I found:

  • Speaking speed has traditionally been measured in morae, but prior research that looks at it from the listener's perspective is scarce.
  • Japanese is considered a "fast language" with many syllables per second, yet research suggests that syllable rate trades off against information density, so the overall information-transfer rate comes out similar across languages.

But none of this serves as a basis for judging whether an individual's reading is fast or slow. I couldn't find an academic threshold that says "N characters/min is fast."

The basis I adopted: the announcer standard

So I adopted a rule-of-thumb basis. The premise: the speed expected of an announcer is 300 characters/min.

  • The NHK announcer standard is "300 chars/min." It's shared knowledge across program staff and serves as a guide for writing scripts.
  • Delivery has reportedly sped up in recent years: it is said to be around 380 chars/min at NHK and 400 at commercial broadcasters.
  • In settings that demand fast delivery, like sports play-by-play, over 400 chars/min is common.

Announcers are language professionals. So if an ordinary reader reaches this speed, I treat that as already on the "slightly fast" side.

Thresholds (adopted version)

Speed label   | Chars/min
--------------|--------------
Slow          | up to 149
Slightly slow | 150 – 199
Standard      | 200 – 299
Slightly fast | 300 – 350
Fast          | 351 and above

This is where an app-specific design decision comes in. General tables usually put "the NHK standard, ~300 = standard," but this app intentionally shifts that, placing the professional standard of 300 as the start of "slightly fast" for an ordinary person.

The reason is that this app's users are ordinary readers. If I put the professional level at "standard," almost everyone gets a "slow" verdict, and the coaching pushes in a discouraging direction. "If an ordinary person reaches the professional level, that's already on the fast side" is a more natural framing for an encouragement tool.

Mapping into code: These thresholds live as constants in thresholds.ts. The explanation of the numbers' basis (300 = pro standard, etc.) isn't in code comments but in this article and the LEARNING_LOG. In Java terms, it's the same as "extract config values into constants + keep the intent in the design docs."
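
To make the table concrete, here is a minimal sketch of what such constants could look like. The names (SPEED_THRESHOLDS, charsPerMinute, labelSpeed) and the rounding step are my assumptions for illustration, not the app's actual thresholds.ts.

```typescript
// Hypothetical sketch: names and structure are illustrative, not the app's real code.
type SpeedLabel = "slow" | "slightly slow" | "standard" | "slightly fast" | "fast";

interface SpeedThreshold {
  min: number; // inclusive lower bound in chars/min
  label: SpeedLabel;
}

// Ordered from the highest band down, so the first match wins.
const SPEED_THRESHOLDS: ReadonlyArray<SpeedThreshold> = [
  { min: 351, label: "fast" },
  { min: 300, label: "slightly fast" }, // 300 = announcer rule of thumb
  { min: 200, label: "standard" },
  { min: 150, label: "slightly slow" },
  { min: 0, label: "slow" },
];

// Characters per minute from a character count and a duration in seconds.
function charsPerMinute(charCount: number, durationSec: number): number {
  if (durationSec <= 0) {
    throw new RangeError("durationSec must be positive");
  }
  return (charCount / durationSec) * 60;
}

function labelSpeed(cpm: number): SpeedLabel {
  // Assumption: the table uses whole numbers, so round before comparing.
  const rounded = Math.round(cpm);
  for (const t of SPEED_THRESHOLDS) {
    if (rounded >= t.min) {
      return t.label;
    }
  }
  return "slow"; // only reached for negative input
}
```

For example, 150 characters read in 30 seconds gives charsPerMinute(150, 30) = 300, which labelSpeed maps to "slightly fast". Keeping the bands in one ordered array means a threshold change touches a single constant.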

Writing about it honestly in the article and UI

When presenting the thresholds, I took care not to create the impression that "Claude is judging on strict academic grounds."

  • Since there's no academic threshold, write honestly that I adopted a rule-of-thumb basis (announcer standard, 300 chars/min).
  • Spell out the intent of the custom placement — putting the pro standard of 300 as the ordinary person's "slightly fast" start.
  • Note that, more rigorously, chars/min is a simplified metric and mora/min is the usual measure.

Rather than dressing it up as "strict science," I figured showing the process — "I drew the lines this way, on this basis" — earns more trust for an evaluation app.
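
To illustrate the chars/min versus mora/min point above, here is a small sketch that counts morae in a kana-only string. It assumes the text has already been converted to its kana reading (kanji are not handled), and countMorae is a hypothetical helper, not something the app contains.

```typescript
// Small kana that merge with the preceding kana into a single mora.
// Note: the small tsu (sokuon) and the long-vowel mark each count as one mora,
// so they are deliberately not in this list.
const SMALL_KANA = "ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ";

// Hiragana, katakana, and the long-vowel mark.
const KANA_PATTERN = /[\u3041-\u3096\u30A1-\u30FA\u30FC]/;

// Counts morae in a kana-only string. Assumption: kanji have already been
// replaced by their reading; any non-kana character is simply ignored.
function countMorae(kana: string): number {
  let count = 0;
  for (let i = 0; i < kana.length; i++) {
    const ch = kana.charAt(i);
    if (KANA_PATTERN.test(ch) && SMALL_KANA.indexOf(ch) === -1) {
      count += 1;
    }
  }
  return count;
}
```

With this sketch, "きょう" is 3 characters but 2 morae, and "ニュース" is 4 characters but 3 morae. That gap is why chars/min is only an approximation of mora/min.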


Stagnation-Rate Thresholds: There Isn't Even a "Ruler"

Even less basis than speed

For stagnation rate (the proportion of silence across the whole utterance), the conclusion was even tougher than for speed:

A basis for a numeric threshold such as "stagnation under X% is good" is even harder to find than for speed. Speed at least has a "numeric ruler"; in this domain the ruler itself doesn't exist.

It's also thorny as a general principle. A speaker not taking pauses is not itself regarded as evidence of fluent speech. In fact, research holds that pauses serve important functions:

  • Speech rate (including pauses) and articulation rate (the speaking portion only) are distinct concepts, and speech rate, the one that includes pauses, correlates more strongly with auditory impression.
  • Pauses don't just mark word/phrase/sentence boundaries — they serve the important function of giving the listener a chance to digest information.

In other words, "fewer pauses = better" doesn't simply hold.

Accepting it as an "in-house metric"

Here I made a design decision. For stagnation rate, I decided to claim no academic or general backing and accept it frankly as an in-house heuristic for this app.

Stagnation label | Stagnation rate (% after labelMetrics)
-----------------|---------------------------------------
Few              | up to 0.9%
Somewhat few     | 1.0 – 5.9%
Normal           | 6.0 – 9.9%
Somewhat many    | 10.0 – 19.9%
Many             | 20.0% and above
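
The same bands can be sketched as a lookup. The function name labelStagnation is hypothetical. Rounding to one decimal place is also my assumption, chosen so that a value such as 0.95% lands in a defined band instead of the gap between 0.9 and 1.0.

```typescript
// Hypothetical sketch: names are illustrative, not the app's real code.
type StagnationLabel = "few" | "somewhat few" | "normal" | "somewhat many" | "many";

interface StagnationThreshold {
  minPercent: number; // inclusive lower bound, in percent
  label: StagnationLabel;
}

// Ordered from the highest band down, so the first match wins.
const STAGNATION_THRESHOLDS: ReadonlyArray<StagnationThreshold> = [
  { minPercent: 20.0, label: "many" },
  { minPercent: 10.0, label: "somewhat many" },
  { minPercent: 6.0, label: "normal" },
  { minPercent: 1.0, label: "somewhat few" },
  { minPercent: 0, label: "few" },
];

function labelStagnation(percent: number): StagnationLabel {
  // Assumption: the table is expressed to one decimal place.
  const rounded = Math.round(percent * 10) / 10;
  for (const t of STAGNATION_THRESHOLDS) {
    if (rounded >= t.minPercent) {
      return t.label;
    }
  }
  return "few"; // only reached for negative input
}
```

Note that this encodes the in-house heuristic exactly as the table states it, including the choice to put 0% in the best band.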

The basis for putting "0% on the good side" is cultural context

Let me get ahead of the question a reader will naturally have: "Why put 0% (speaking without breaks) on the good side?"

The basis isn't academic — it's the cultural context of Japanese. Japanese has an expression, "tate-ita ni mizu" (literally "water down a standing board" — speaking fluently and without interruption), reflecting a disposition to see uninterrupted, flowing speech as "fluent." I drew the line leaning toward that connotation.

That said, this isn't "the correct answer." As noted above, research finds that pauses give the listener a chance to digest information, so 0% isn't always good. I judged this to be a domain that isn't academically settled and where people's perceptions differ. Within a deadline-bound Phase 1 I didn't chase it further, and as the designer I adopted the connotation on the "tate-ita ni mizu" side.

⚠️ This metric's limits: As it stands, the stagnation rate does not distinguish the meaningful pauses of Japanese speech (ma). A 0.5-second stop that lands appropriately at a punctuation mark and a 0.5-second stumble where the reader got stuck are both counted as the same "silence." It really ought to be assessed qualitatively, paired with the appropriateness of each pause's position, and this is a research item not yet listed in the README's Phase 2. The app's footer makes this limit explicit too.
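
The limit described above is easy to see in a sketch. The Segment shape and stagnationRatePercent are hypothetical. I'm also assuming the rate is simply total silence divided by total duration, following the definition "the proportion of silence across the whole utterance"; the app may well apply a minimum silence length or other filtering that isn't shown here.

```typescript
// Hypothetical segment shape; the app's real data model may differ.
interface Segment {
  kind: "speech" | "silence";
  durationSec: number;
  note?: string; // human annotation only; the metric never reads it
}

// Stagnation rate in percent: total silence / total duration * 100.
function stagnationRatePercent(segments: Segment[]): number {
  let total = 0;
  let silence = 0;
  for (const s of segments) {
    total += s.durationSec;
    if (s.kind === "silence") {
      silence += s.durationSec;
    }
  }
  if (total === 0) {
    return 0;
  }
  return (silence * 100) / total;
}

// A deliberate pause at punctuation...
const deliberatePause: Segment[] = [
  { kind: "speech", durationSec: 4.5 },
  { kind: "silence", durationSec: 0.5, note: "pause at punctuation" },
  { kind: "speech", durationSec: 5.0 },
];

// ...and a stumble of the same length.
const stumble: Segment[] = [
  { kind: "speech", durationSec: 4.5 },
  { kind: "silence", durationSec: 0.5, note: "stuck mid-word" },
  { kind: "speech", durationSec: 5.0 },
];
```

Both readings come out at the same rate (0.5 seconds of silence in 10 seconds), because the note field never enters the calculation. Telling them apart would need some positional signal, such as whether the silence falls at a punctuation mark.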


The Two Metrics Had Different Kinds of Basis

To summarize, even for the same "evaluation-metric threshold," how I grounded them was completely different:

Metric          | Academic threshold     | Basis adopted                             | How I handle it in the article
----------------|------------------------|-------------------------------------------|-------------------------------
Speed           | None                   | Rule of thumb (announcer 300 chars/min)   | Present an industry-custom value as basis; spell out the custom placement
Stagnation rate | None (no ruler at all) | App-specific heuristic + cultural context | Claim no academic backing; honestly label it an in-house metric

This is the point I most want to land in this article. When you can't find a basis, don't force a "this is scientific" facade. Disclosing the process — "there's no academic threshold, so I drew from a rule of thumb," "I accepted this as an in-house metric" — is, I think, actually stronger for an evaluation app.


What I Decided Myself / What I Asked AI For

Area            | Detail
----------------|-------
My decisions    | The final thresholds, the custom placement of the announcer standard, the call to treat stagnation as an in-house metric, adopting the "tate-ita ni mizu" context, stating the MVP limits
Asked AI for    | Presenting tentative boundary tables, helping survey related research and industry-custom values
Verified myself | Checking sources, the final judgment on the validity of the basis (not taking AI's research at face value)

The threshold numbers and their basis were all decided by me. AI helped with the survey and tentative drafts, but "which basis to adopt and how to place it" is the designer's judgment.


Wrapping Up

This was a summary of "on what basis" I drew the read-aloud evaluation thresholds. Three takeaways:

  1. There was no academic threshold for speed, so I grounded it in a rule of thumb (announcer 300 chars/min) and custom-placed the pro standard as the ordinary person's "slightly fast" start.
  2. Stagnation rate has no ruler at all, so I accepted it as an app-specific heuristic; the basis for putting 0% on the good side rests on the cultural context of "tate-ita ni mizu."
  3. When there's no basis, don't dress it up as scientific — disclose the process. That, I believe, is what makes an evaluation app honest.

An evaluation tool's value isn't decided by the precision of the numbers alone. It also depends on whether you can be transparent with the user about how you justified those numbers. I recorded even the dead ends of the basis hunt because that was the thing I most wanted to convey.

The detailed development log is in the repository's LEARNING_LOG.

Main references

Sources I consulted while organizing the basis for this article.

  • Ayumi Marushima, "A Study on the Cognition of Speech Rate," Studies in Linguistics online inaugural issue (2008) [in Japanese]
  • Ayumi Marushima, "A Study on the Tempo of Spoken Language," Studies in Linguistics online no. 2 (2009) [in Japanese]
  • Ayumi Marushima, "An Experimental-Phonetics Study of Speech Rate from the Speaker's and Listener's Perspectives," doctoral dissertation, University of Tsukuba (2015) [in Japanese]
  • Reporting and shared industry knowledge on NHK / commercial-broadcaster announcer speech rates (Diamond Online, etc.)

💡 As this article argues, I believe an academic "speed threshold" does not exist. The above aren't sources for the thresholds themselves, but background knowledge for "how to think about speech rate and pauses." I've checked each against the original.

Next time is the finale — I look back at the "AI-collaborative development" that underpinned all these design decisions → satellite #4, "Reflections on AI-collaborative development & a collection of stumbles" (https://dev.to/uya0526design/not-did-you-use-ai-but-are-you-the-one-driving-reflections-on-building-a-real-product-through-464m).


This article is part of my public learning journey using AI tools (Claude / Cursor). The thresholds and the judgment of their basis are mine. I collaborate with AI on the article's structure, outline, and draft prose, and I review and revise every line before publishing.
