Your blog post gets 2,000 impressions but only 200 reads. Your documentation has a 70% bounce rate. Your newsletter open rate is fine, but click-through is dismal.
The problem often isn't what you're saying — it's how you're saying it.
Readability research going back to the 1940s has produced formulas that predict whether a given audience will understand a piece of text. Tools like Hemingway Editor expose a few of these, but if you want to automate readability scoring in your own apps, CI pipelines, or content tools, you've been stuck cobbling together multiple packages with overlapping (and conflicting) dependencies.
textlens solves this. It ships 8 readability formulas, sentiment analysis, keyword extraction, and more — all in a single zero-dependency TypeScript package.
Let's walk through everything it can do.
Quick Setup
npm install textlens
The fastest way to get a complete picture of any text:
import { analyze } from 'textlens';
const report = analyze(`
Climate change poses significant challenges to global agriculture.
Rising temperatures alter growing seasons and increase drought frequency.
Farmers must adapt their practices to maintain crop yields.
New irrigation techniques and drought-resistant varieties offer some solutions.
However, systemic changes in food production remain necessary.
`);
console.log(report.readability.consensusGrade); // ~9
console.log(report.sentiment.label); // 'negative'
console.log(report.readingTime.minutes); // 1
console.log(report.keywords[0].word); // 'drought'
analyze() runs every analysis in one call, but each function is also exported individually, so you can import only what you need. Let's break them down.
The 8 Readability Formulas (and When to Use Each)
Every formula takes a different approach to the same question: "How hard is this to read?" Some count syllables, some count characters, some use word lists. Using multiple formulas and averaging them gives a far more reliable score than any single metric.
1. Flesch Reading Ease
The most widely recognized readability score. Higher = easier.
import { readability } from 'textlens';
const text = 'The cat sat on the mat. It was a good day.';
const r = readability(text);
console.log(r.fleschReadingEase.score);
// 116 (Very Easy)
console.log(r.fleschReadingEase.interpretation);
// "Very Easy"
Score ranges:
| Score | Level | Audience |
|-------|-------|----------|
| 90-100 | Very Easy | 5th grader |
| 80-89 | Easy | 6th grader |
| 70-79 | Fairly Easy | 7th grader |
| 60-69 | Standard | 8th-9th grader |
| 50-59 | Fairly Difficult | High school |
| 30-49 | Difficult | College |
| 0-29 | Very Confusing | Graduate |
Best for: General audience content, blog posts, marketing copy.
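If you want to sanity-check a score by hand, the published Flesch formula is a one-liner. This is the standard 1948 formula, not a claim about textlens's internal implementation:

```typescript
// Flesch Reading Ease: higher = easier. Published constants from Flesch (1948).
function fleschReadingEase(words: number, sentences: number, syllables: number): number {
  return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words);
}

// "The cat sat on the mat. It was a good day." → 11 words, 2 sentences, 11 syllables
console.log(fleschReadingEase(11, 2, 11)); // ≈ 116.65 — scores can exceed 100 for very simple text
```

This also explains the 116 above: all-monosyllable sentences push the score past the nominal 0-100 range.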
2. Flesch-Kincaid Grade Level
Built on the same inputs as Flesch Reading Ease, but recalibrated to output a US grade level instead of a score.
console.log(r.fleschKincaidGrade.grade);
// 2 (2nd grade)
console.log(r.fleschKincaidGrade.interpretation);
// "2nd grade"
Best for: Education content, government documents (the US military uses this to evaluate training manuals).
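The published Flesch-Kincaid formula uses the same word, sentence, and syllable counts as Reading Ease, just with different constants (again, the standard formula, not textlens internals):

```typescript
// Flesch-Kincaid Grade Level: same inputs as Reading Ease, different constants.
function fleschKincaidGrade(words: number, sentences: number, syllables: number): number {
  return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59;
}

// e.g. 100 words, 5 sentences, 150 syllables
console.log(fleschKincaidGrade(100, 5, 150)); // ≈ 9.91
```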
3. Gunning Fog Index
Penalizes "complex words" — words with 3+ syllables. Developed for business writing.
console.log(r.gunningFog.grade);
// Grade level — ideal target: 7-8 for public content
Best for: Business communications, corporate docs. A Fog index above 12 means most readers will struggle.
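The published Fog formula combines average sentence length with the percentage of complex words (a sketch of the standard formula, not textlens's code):

```typescript
// Gunning Fog: 0.4 * (avg sentence length + percent complex words).
function gunningFog(words: number, sentences: number, complexWords: number): number {
  return 0.4 * (words / sentences + 100 * (complexWords / words));
}

// 100 words, 5 sentences, 10 complex words
console.log(gunningFog(100, 5, 10)); // ≈ 12 — at the "most readers struggle" threshold
```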
4. Coleman-Liau Index
Unique approach: uses character counts instead of syllables. This makes it more reliable for technical writing where unusual words might trip up syllable-counting algorithms.
console.log(r.colemanLiau.grade);
Best for: Technical documentation, academic papers, any content with domain-specific terminology.
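The published Coleman-Liau formula works from letters and sentences per 100 words (standard constants, not textlens internals):

```typescript
// Coleman-Liau: L = avg letters per 100 words, S = avg sentences per 100 words.
function colemanLiau(letters: number, words: number, sentences: number): number {
  const L = (letters / words) * 100;
  const S = (sentences / words) * 100;
  return 0.0588 * L - 0.296 * S - 15.8;
}

// 450 letters, 100 words, 5 sentences
console.log(colemanLiau(450, 100, 5)); // ≈ 9.18
```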
5. Automated Readability Index (ARI)
The fastest formula — uses only character and word counts. Originally designed for real-time monitoring on typewriters.
console.log(r.automatedReadability.grade);
Best for: Real-time analysis, large-scale content processing where speed matters.
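The speed claim comes from ARI's inputs: no syllable counting at all, just characters, words, and sentences (the published formula, sketched here for reference):

```typescript
// ARI: characters and words only — no syllable counting needed.
function automatedReadability(chars: number, words: number, sentences: number): number {
  return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43;
}

// 450 characters, 100 words, 5 sentences
console.log(automatedReadability(450, 100, 5)); // ≈ 9.77
```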
6. SMOG Index
"Simple Measure of Gobbledygook." Estimates the years of education needed to understand a text, using only polysyllabic word counts.
console.log(r.smog.grade);
// Note: most accurate with 30+ sentences
Best for: Medical/health content, patient-facing documents. The gold standard in health literacy research.
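The published SMOG formula depends only on polysyllabic words (3+ syllables), scaled to a 30-sentence sample — which is why accuracy drops for short texts (standard constants, not textlens's code):

```typescript
// SMOG: based solely on polysyllabic words in a 30-sentence sample.
function smog(polysyllables: number, sentences: number): number {
  return 1.043 * Math.sqrt(polysyllables * (30 / sentences)) + 3.1291;
}

// 15 polysyllabic words across 30 sentences
console.log(smog(15, 30)); // ≈ 7.17
```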
7. Dale-Chall Readability Score
Instead of counting syllables, this formula checks words against a list of ~3,000 words that most 4th graders know. Any word NOT on that list is "difficult."
console.log(r.daleChall.score);
console.log(r.daleChall.interpretation);
// e.g., "9th-10th grade"
Best for: Content aimed at general audiences. Catches jargon that other formulas miss.
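The published Dale-Chall formula combines the percentage of "difficult" words with average sentence length, plus an adjustment constant when difficult words exceed 5% (a reference sketch, not textlens internals):

```typescript
// Dale-Chall: "difficult" = any word not on the ~3,000-word familiar list.
function daleChall(pctDifficultWords: number, avgSentenceLength: number): number {
  let score = 0.1579 * pctDifficultWords + 0.0496 * avgSentenceLength;
  if (pctDifficultWords > 5) score += 3.6365; // adjustment when >5% difficult words
  return score;
}

// 10% difficult words, 15 words per sentence on average
console.log(daleChall(10, 15)); // ≈ 5.96
```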
8. Linsear Write Formula
Originally developed by the US Air Force for technical manuals. Weights "easy" words (≤2 syllables) and "hard" words (≥3 syllables) differently.
console.log(r.linsearWrite.grade);
Best for: Technical documentation, instruction manuals, standard operating procedures.
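The standard Linsear Write procedure (sketched here from the published algorithm, not textlens's code) weights a ~100-word sample and then halves the result:

```typescript
// Linsear Write over a ~100-word sample: easy words (≤2 syllables) count 1,
// hard words (3+ syllables) count 3; divide by sentences, then adjust.
function linsearWrite(easyWords: number, hardWords: number, sentences: number): number {
  const r = (easyWords + 3 * hardWords) / sentences;
  return r > 20 ? r / 2 : r / 2 - 1;
}

// 85 easy words, 15 hard words, 5 sentences
console.log(linsearWrite(85, 15, 5)); // 13
```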
The Consensus Grade
No single formula is perfect. textlens computes a weighted average across all grade-level formulas to give you one reliable number:
console.log(r.consensusGrade);
// e.g., 9 — target 6-8 for general audiences
This is the number to optimize for. If your content targets a general audience, aim for grades 6-8. Technical docs can go up to 10-12.
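The exact weighting textlens applies is internal to the library, but the idea is a simple aggregate. A rough unweighted sketch with hypothetical per-formula grades:

```typescript
// Hypothetical per-formula grade levels; textlens's actual weights are internal.
const grades = [8.2, 10.1, 9.3, 9.8, 8.9];

// Unweighted mean, rounded to a whole grade
const consensus = Math.round(grades.reduce((sum, g) => sum + g, 0) / grades.length);
console.log(consensus); // 9
```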
Sentiment Analysis
textlens includes AFINN-165 lexicon-based sentiment analysis. It scores every word against a dictionary of ~3,300 English words with known positive/negative associations.
import { sentiment } from 'textlens';
// Positive text
const positive = sentiment(
'This framework is excellent and the documentation is incredibly helpful.'
);
console.log(positive.label); // 'positive'
console.log(positive.score); // 0.5 (normalized -1 to +1)
console.log(positive.positive); // ['excellent', 'incredibly', 'helpful']
console.log(positive.negative); // []
// Negative text
const negative = sentiment(
'The API is broken and the error messages are terrible and confusing.'
);
console.log(negative.label); // 'negative'
console.log(negative.score); // -0.46
console.log(negative.negative); // ['broken', 'terrible', 'confusing']
// Mixed text
const mixed = sentiment(
'The interface looks great but performance is awful.'
);
console.log(mixed.label); // result depends on word weights
console.log(mixed.confidence); // 0.25 — low confidence = mixed signals
Practical use cases:
- Flag negative tone in documentation before publishing
- Monitor sentiment in user feedback or reviews
- Detect unintentionally harsh language in automated emails
Heads up: Lexicon-based sentiment analysis works well for straightforward text but can't detect sarcasm, irony, or context-dependent meaning. It's a useful signal, not a definitive judgment.
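Under the hood, lexicon scoring is conceptually simple. A toy sketch — the four-word lexicon is made up (the real AFINN-165 list assigns each word an integer from -5 to +5), and the normalization here is chosen purely for illustration:

```typescript
// Tiny made-up lexicon; real AFINN-165 covers ~3,300 words scored -5..+5.
const lexicon: Record<string, number> = { excellent: 3, helpful: 2, broken: -1, terrible: -3 };

function sentimentScore(text: string): number {
  const tokens = text.toLowerCase().match(/[a-z']+/g) ?? [];
  const hits = tokens.filter(t => t in lexicon);
  if (hits.length === 0) return 0;
  const raw = hits.reduce((sum, t) => sum + lexicon[t], 0);
  return raw / (hits.length * 5); // normalize to roughly -1..+1
}

console.log(sentimentScore('The docs are excellent and helpful')); // 0.5
```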
Keyword Extraction
Extract the most important terms from any text using TF-IDF scoring with automatic stop word filtering:
import { keywords } from 'textlens';
const article = `
TypeScript has become the standard for large-scale JavaScript applications.
TypeScript adds static typing to JavaScript, catching errors at compile time.
The TypeScript compiler produces clean JavaScript output.
Many frameworks now ship TypeScript types by default.
`;
const kw = keywords(article, { topN: 5, minLength: 3 });
kw.forEach(k => {
console.log(`${k.word}: score=${k.score.toFixed(2)}, count=${k.count}, density=${(k.density * 100).toFixed(1)}%`);
});
// typescript: score=0.85, count=4, density=12.5%
// javascript: score=0.64, count=3, density=9.4%
// ...
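Full TF-IDF weighs term frequency against how common a term is across documents; within a single text, the frequency component plus stop-word filtering does most of the work. A toy frequency-only sketch (hypothetical stop list, not textlens's implementation):

```typescript
// Illustrative stop list; textlens ships its own ~300-word list.
const STOP = new Set(['the', 'a', 'an', 'to', 'and', 'of', 'at', 'by', 'now']);

function topTerms(text: string, topN: number): Array<{ word: string; count: number }> {
  const counts = new Map<string, number>();
  for (const tok of text.toLowerCase().match(/[a-z][a-z-]*/g) ?? []) {
    if (tok.length < 3 || STOP.has(tok)) continue; // drop stop words and short tokens
    counts.set(tok, (counts.get(tok) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([word, count]) => ({ word, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, topN);
}

console.log(topTerms('TypeScript compiles to JavaScript. TypeScript adds types.', 2));
```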
Keyword Density Analysis
For SEO work, you often need n-gram frequency analysis. The density() function gives you unigrams, bigrams, and trigrams:
import { density } from 'textlens';
const d = density(article);
// Top single words
console.log(d.unigrams.slice(0, 3));
// [{ text: 'typescript', count: 4, density: 0.125 }, ...]
// Top two-word phrases
console.log(d.bigrams.slice(0, 3));
// [{ text: 'typescript types', count: 1, density: 0.03 }, ...]
// Top three-word phrases
console.log(d.trigrams.slice(0, 3));
This is exactly what SEO tools charge monthly subscriptions for — running locally, for free.
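Bigram counting itself is just a sliding window — a minimal sketch of the idea (illustrative, not textlens's code):

```typescript
// Count two-word phrases with a sliding window over the token stream.
function bigramCounts(text: string): Map<string, number> {
  const words = text.toLowerCase().match(/[a-z]+/g) ?? [];
  const counts = new Map<string, number>();
  for (let i = 0; i < words.length - 1; i++) {
    const gram = `${words[i]} ${words[i + 1]}`;
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  return counts;
}

console.log(bigramCounts('static typing catches errors; static typing helps').get('static typing')); // 2
```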
CLI Usage
textlens ships a CLI for quick analysis from the terminal:
# Analyze a file
npx textlens README.md
# Pipe content in
echo "Your text here" | npx textlens
# JSON output for scripting
npx textlens article.md --json
# Just the keywords
npx textlens article.md --keywords 5
# Just sentiment
npx textlens article.md --sentiment
# SEO score targeting a keyword
npx textlens article.md --seo "typescript readability"
# Auto-summarize
npx textlens article.md --summary 3
# Everything at once
npx textlens article.md --all
Example output for npx textlens article.md:
📊 Text Statistics
Words: 842 Sentences: 38 Paragraphs: 12
Avg sentence length: 22.2 words
Avg word length: 5.1 chars
📖 Readability
Consensus Grade: 8
Flesch Reading Ease: 62 (Standard)
Flesch-Kincaid Grade: 8.2
Gunning Fog: 10.1
Coleman-Liau: 9.3
⏱ Reading Time: 4 min
In a CI Pipeline
Add a readability gate to your GitHub Actions:
- name: Check docs readability
  run: |
    GRADE=$(npx textlens docs/guide.md --json | jq '.readability.consensusGrade')
    if [ "$GRADE" -gt 12 ]; then
      echo "::error::Readability grade $GRADE exceeds limit of 12"
      exit 1
    fi
Why Zero Dependencies?
Every npm install with a tree of transitive dependencies is a supply chain risk you're accepting. textlens implements everything from scratch:
- Syllable counting: Rule-based algorithm (~95% accuracy)
- Sentiment lexicon: AFINN-165 baked into the source
- Stop words: ~300 common English words for filtering
- Dale-Chall word list: ~3,000 words embedded directly
The result: one package, no dependency tree, no breaking changes from upstream, no node_modules bloat.
textlens: 199 KB total, 0 dependencies
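For a sense of what a rule-based syllable counter looks like, here's a common heuristic — count vowel groups, discount a silent trailing "e", floor at one. This is a generic sketch of the technique, not textlens's exact rules:

```typescript
// Heuristic syllable counter: vowel groups, minus silent trailing 'e', minimum 1.
function countSyllables(word: string): number {
  const w = word.toLowerCase().replace(/[^a-z]/g, '');
  if (w.length === 0) return 0;
  let n = (w.match(/[aeiouy]+/g) ?? []).length;
  if (w.endsWith('e') && !w.endsWith('le') && n > 1) n--; // silent 'e', but keep '-le' endings
  return Math.max(1, n);
}

console.log(countSyllables('cat'));         // 1
console.log(countSyllables('table'));       // 2
console.log(countSyllables('readability')); // 5
```

Exceptions like "queue" or "business" are why rule-based counters top out around 95% accuracy.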
Links
- npm: npmjs.com/package/textlens
- GitHub: github.com/ckmtools/textlens
- License: MIT
If textlens saves you time, consider giving it a star on GitHub. Issues and PRs welcome.