Why I Built a Readability Analyzer That Sends Your Text Nowhere
Most productivity tools that analyze your writing send your text to a server. That's true of Grammarly. It's true of most AI writing assistants. And it's worth thinking about, because writers paste a lot of sensitive material into these tools — drafts of internal reports, client work, things under NDA, early chapters of books they haven't published yet.
ProseScore doesn't send your text anywhere. Here's why that was a deliberate choice, and what it took to make it work.
The problem with sending your text to a server
When you paste something into a web-based writing tool, you're implicitly trusting that tool with whatever you wrote. That might be fine for a recipe. It's different for a confidential internal memo, a legal brief draft, or a chapter from a novel you've been working on for two years.
The data minimization argument is simple: if you don't need a server, don't have one. A server is a liability — it's a place where data can be retained, subpoenaed, breached, or sold. Readability analysis doesn't require a server. It requires math. So I didn't build one.
What ProseScore actually does
The entire analysis runs in the browser. All of it — the 8 readability formulas, AFINN-based sentiment scoring, TF-IDF keyword extraction — executes synchronously in a Web Worker. The UI stays responsive even on long documents because the analysis runs off the main thread.
The code path looks roughly like this:
// Everything runs here, client-side
const result = analyzeText(inputText);
// No network call. No tracking. Just math.
There are no fetch() calls anywhere in the analysis path. The only network request the page makes is the initial page load. After that, you can take your browser offline and it keeps working — because it's not waiting for anything.
The Web Worker approach was worth the extra complexity. Without it, analyzing a 5,000-word document would lock the UI for a noticeable fraction of a second. With it, the interface stays interactive while the analysis runs in the background.
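The wiring is the standard postMessage pattern. This is a minimal sketch, not ProseScore's actual code — the worker filename and the render callback are illustrative, and the snippet guards on the Worker global so it's inert outside a browser:

```javascript
// Minimal sketch of the off-main-thread pattern. "analyzer.worker.js"
// and onResult are illustrative names, not ProseScore's real API.
function startAnalysis(text, onResult) {
  if (typeof Worker === 'undefined') return null; // not in a browser
  const worker = new Worker('analyzer.worker.js');
  worker.onmessage = (e) => {
    onResult(e.data);    // analysis result posted back from the worker
    worker.terminate();  // one-shot: done, free the thread
  };
  worker.postMessage(text); // structured-clone the text to the worker
  return worker;
}

// Inside analyzer.worker.js, the handler is just:
//   self.onmessage = (e) => self.postMessage(analyzeText(e.data));
```

The main thread never blocks: it hands the text over, keeps painting, and renders whenever the result message arrives.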
Why 8 readability formulas work offline
Not all NLP tasks are created equal. Sentiment analysis at scale usually means embeddings, which means models, which means GPU time on a server somewhere. That's a legitimate reason to send text to a server. (ProseScore sidesteps it by using AFINN, a simple word-score lexicon that runs fine on a laptop CPU.)
Readability scoring is different. Flesch-Kincaid, Flesch Reading Ease, Gunning Fog, SMOG, Coleman-Liau, ARI, Dale-Chall, Linsear Write — those are the eight, and they're pure algorithmic formulas. They operate on word counts, sentence lengths, and syllable counts. No ML models. No embeddings. No hardware requirements beyond whatever CPU is in the person's laptop.
The irony is that readability scoring is one of the few NLP tasks that was basically designed for offline computation. These formulas were developed in an era before networked computers. They're deterministic. Given the same input, every browser produces the same score.
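As a concrete example, Flesch-Kincaid reduces to three counts and three published constants. Everything else in the pipeline exists to produce those counts:

```javascript
// Flesch-Kincaid Grade Level: the published formula, verbatim.
// Inputs are totals over the whole document.
function fleschKincaidGrade(words, sentences, syllables) {
  return 0.39 * (words / sentences)     // average sentence length
       + 11.8 * (syllables / words)     // average word complexity
       - 15.59;                         // calibration constant
}
```

Given the same three counts, every browser, every time, produces the same grade — which is exactly the determinism the paragraph above describes.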
ProseScore computes all 8 formulas and derives a consensus grade level from the results. Running eight formulas instead of one doesn't meaningfully change the performance profile — they're all operating on the same precomputed word/sentence/syllable counts.
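One reasonable way to derive a consensus — and this is a sketch, not necessarily ProseScore's exact aggregation — is the median of the per-formula grade estimates, which is robust to a single formula disagreeing with the pack:

```javascript
// Median of grade estimates as a consensus. Illustrative:
// the real aggregation in ProseScore may differ.
function consensusGrade(grades) {
  const sorted = [...grades].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 !== 0
    ? sorted[mid]                          // odd count: middle value
    : (sorted[mid - 1] + sorted[mid]) / 2; // even count: average the two
}
```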
What surprised me
Two things caught me off guard during the build.
The first was syllable counting. It sounds trivial. It isn't. English doesn't have a clean algorithmic rule for syllables — the exceptions are numerous, and naive implementations that count vowel runs get wrong answers constantly. I ended up with a heuristic that handles common patterns and exceptions, and it's good enough for readability scoring, but it's not perfect. Don't use it to settle poetry arguments.
The second was deriving the consensus grade level. Each of the 8 formulas uses a different scale. Flesch-Kincaid outputs a US grade level. Flesch Reading Ease runs 0-100 in the opposite direction (higher = easier). SMOG and Gunning Fog have their own calibrations. To combine them into a single consensus grade, I had to understand what each formula was actually measuring, not just run the equations. That took longer than the implementation itself.
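The Flesch Reading Ease case illustrates the normalization problem: before it can be combined with grade-level outputs, its 0-100 "higher is easier" score has to be mapped onto a grade. The band edges below follow the commonly cited interpretation table for the formula; treat the exact mapping as an approximation, not ProseScore's code:

```javascript
// Map a Flesch Reading Ease score (higher = easier) onto an
// approximate US grade level, per the standard interpretation
// table. Approximate by construction.
function readingEaseToGrade(score) {
  if (score >= 90) return 5;    // very easy: 5th grade
  if (score >= 80) return 6;
  if (score >= 70) return 7;
  if (score >= 60) return 8.5;  // 8th-9th grade
  if (score >= 50) return 10;   // 10th-12th grade
  if (score >= 30) return 13;   // college
  return 16;                    // college graduate
}
```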
The catch
No server means no persistence. There's no history. No way to compare your score on this draft against the version you edited last Tuesday. No sharing. No team dashboards. If you close the tab, the analysis is gone.
For personal writing and private documents, that's fine — it's actually the point. For teams that want to track readability trends across a documentation repo over time, ProseScore isn't the right tool. That use case requires a server. I'm not pretending otherwise.
The trade-off is intentional: ProseScore does one thing — analyze text you paste into it, right now, without that text leaving your browser. It doesn't try to be everything.
The tool is at prosescore.ckmtools.dev — open source, MIT license. If you're curious about the implementation, the full source is at github.com/ckmtools/prosescore.