DEV Community

ckmtools

How to Extract Keywords and Score Content Quality in Node.js

Content teams at companies like HubSpot and Clearscope pay hundreds per month for keyword extraction and content scoring tools. These tools analyze your text, extract the key topics, score the content for quality, and tell you what to improve.

You can build the same pipeline in Node.js with open-source tools. No API keys, no monthly fees, no external services.

This tutorial shows how to extract keywords, compute content quality scores, and build a content analysis pipeline using textlens.

Install

npm install textlens

Zero dependencies. Works in Node.js 16+, ships ESM and CommonJS with TypeScript types.

Step 1: Extract Keywords with TF-IDF

TF-IDF (Term Frequency-Inverse Document Frequency) identifies words that appear frequently in your text but are rare in English overall, which filters out common stopwords automatically. It's one of the oldest and most widely used weightings in information retrieval and search ranking.
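The core idea can be sketched in a few lines of plain JavaScript. This is illustrative only; textlens's exact weighting and smoothing may differ:

```javascript
// Illustrative TF-IDF sketch (not textlens internals): a term scores high
// when it's frequent in one document but rare across the corpus.
function tfidf(term, doc, corpus) {
  const words = doc.toLowerCase().split(/\W+/).filter(Boolean);
  const tf = words.filter(w => w === term).length / words.length;
  const docsWithTerm = corpus.filter(d => d.toLowerCase().includes(term)).length;
  const idf = Math.log((1 + corpus.length) / (1 + docsWithTerm)); // smoothed
  return tf * idf;
}

const corpus = [
  'React changed component architecture.',
  'Vue offered a gentler learning curve.',
  'The weather is nice today.',
];
console.log(tfidf('component', corpus[0], corpus)); // > 0: rare in the corpus
```

A word that appears in every document gets a low IDF and drops toward zero, which is exactly why stopwords fall out of the ranking.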

const { keywords } = require('textlens');

const article = `
  JavaScript frameworks have evolved rapidly over the past decade.
  React changed how developers think about component architecture.
  Vue offered a gentler learning curve. Svelte compiled components
  at build time, eliminating the virtual DOM entirely. Each framework
  solves different problems for different teams.
`;

const kw = keywords(article, { topN: 5, minLength: 4 });
console.log(kw);

Output:

[
  { word: 'framework', score: 3.8, count: 2, density: 4.5 },
  { word: 'component', score: 3.2, count: 2, density: 4.5 },
  { word: 'developers', score: 2.1, count: 1, density: 2.3 },
  { word: 'svelte', score: 2.1, count: 1, density: 2.3 },
  { word: 'architecture', score: 1.9, count: 1, density: 2.3 }
]

Each keyword has a score (its TF-IDF weight), a count (raw frequency), and a density (percentage of total words). For SEO, a density of 1-3% for your target keyword is a commonly cited sweet spot.
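As a quick sanity check, you can flag a target keyword that falls outside that band, working from the output shape shown above. This is a sketch; the thresholds are the rules of thumb just mentioned, not a textlens feature:

```javascript
// Sketch: classify a target keyword's density using the { word, density }
// objects returned by keywords(). The 1-3% thresholds are rules of thumb.
function densityCheck(kwList, target) {
  const hit = kwList.find(k => k.word === target);
  if (!hit) return `'${target}' not found, content may be off-topic`;
  if (hit.density < 1) return `'${target}' is under-used (${hit.density}%)`;
  if (hit.density > 3) return `'${target}' risks keyword stuffing (${hit.density}%)`;
  return `'${target}' density looks healthy (${hit.density}%)`;
}

const sample = [{ word: 'framework', score: 3.8, count: 2, density: 4.5 }];
console.log(densityCheck(sample, 'framework')); // 4.5% is above the 3% ceiling
```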

Options

Option      Default   Description
topN        10        Number of keywords to return
minLength   3         Minimum word length

Step 2: Analyze Keyword Density with N-grams

Single-word keywords miss phrases like "content marketing" or "machine learning." The density() function extracts unigrams (single words), bigrams (two-word phrases), and trigrams (three-word phrases):

const { density } = require('textlens');

const result = density(article);

console.log('Top single words:', result.unigrams.slice(0, 3));
console.log('Top phrases:', result.bigrams.slice(0, 3));
console.log('Top 3-word phrases:', result.trigrams.slice(0, 3));

Output:

// Unigrams
[
  { text: 'framework', count: 2, density: 4.5 },
  { text: 'component', count: 2, density: 4.5 },
  { text: 'developers', count: 1, density: 2.3 }
]

// Bigrams
[
  { text: 'javascript frameworks', count: 1, density: 2.3 },
  { text: 'component architecture', count: 1, density: 2.3 },
  { text: 'learning curve', count: 1, density: 2.3 }
]

Bigrams are especially useful for SEO. They reveal the actual phrases your content covers, not just isolated words.
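The mechanics behind those phrases are simple: slide a fixed-size window across the token list. Here's an illustrative plain-JavaScript sketch (not textlens internals):

```javascript
// Sketch of n-gram formation: every run of n consecutive tokens is a phrase.
function ngrams(text, n) {
  const tokens = text.toLowerCase().split(/\W+/).filter(Boolean);
  const out = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(' '));
  }
  return out;
}

console.log(ngrams('content marketing drives growth', 2));
// ['content marketing', 'marketing drives', 'drives growth']
```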

Step 3: Score Content Quality with SEO Scoring

textlens includes an seoScore() function that evaluates your content on four dimensions, each worth 25 points:

const { seoScore } = require('textlens');

const result = seoScore(article, { targetKeyword: 'javascript frameworks' });

console.log(`Overall: ${result.score}/100 (${result.grade})`);
console.log('Issues:', result.issues);
console.log('Suggestions:', result.suggestions);
console.log('Details:', result.details);

Output:

{
  score: 68,
  grade: 'C',
  issues: ['Content length is below 300 words'],
  suggestions: ['Add more content to reach at least 300 words'],
  details: {
    readabilityScore: 22,      // out of 25
    contentLengthScore: 10,    // out of 25 (too short)
    keywordScore: 20,          // out of 25
    sentenceVarietyScore: 16   // out of 25
  }
}

The Four Quality Dimensions

Readability (25 pts): Is the grade level appropriate? Targets grade 7 by default. Content above grade 12 loses points.

Content Length (25 pts): Is there enough substance? Under 300 words is thin content. 300-2500 is the sweet spot. Over 5000 words gets a suggestion to split.

Keyword Usage (25 pts): If you provide a targetKeyword, the scorer checks whether it appears at 1-3% density. Too low = not on topic. Too high = keyword stuffing.

Sentence Variety (25 pts): Do you mix short and long sentences? Monotonous sentence lengths (all long or all short) lose points. Variety keeps readers engaged.
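One plausible way to measure variety (not necessarily how textlens scores it) is the spread of sentence lengths, e.g. the standard deviation of words per sentence:

```javascript
// Sketch: standard deviation of sentence word counts. 0 means every
// sentence is the same length; higher means more variety.
function sentenceVariety(text) {
  const lengths = text
    .split(/[.!?]+/)
    .map(s => s.trim())
    .filter(Boolean)
    .map(s => s.split(/\s+/).length);
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((a, b) => a + (b - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance);
}

console.log(sentenceVariety('Same length here. Same length too.')); // 0
```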

Step 4: Add Sentiment Analysis

Content tone matters. A product announcement shouldn't read as negative. A security advisory shouldn't read as enthusiastic.

const { sentiment } = require('textlens');

const result = sentiment(article);
console.log(result.label);      // 'positive', 'negative', or 'neutral'
console.log(result.confidence); // 0.0 to 1.0
console.log(result.positive);   // matched positive words
console.log(result.negative);   // matched negative words

textlens uses the AFINN-165 lexicon (~3,300 English words scored from -5 to +5). No external API needed.
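The scoring mechanism itself is straightforward: sum per-word lexicon scores and label the text by the sign of the total. A miniature sketch with a made-up four-word lexicon:

```javascript
// Miniature AFINN-style scorer. MINI_AFINN is a made-up four-entry lexicon
// for illustration; the real AFINN-165 scores ~3,300 words from -5 to +5.
const MINI_AFINN = { great: 3, good: 2, bad: -3, terrible: -3 };

function afinnLabel(text) {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  const total = words.reduce((sum, w) => sum + (MINI_AFINN[w] || 0), 0);
  return total > 0 ? 'positive' : total < 0 ? 'negative' : 'neutral';
}

console.log(afinnLabel('This release is great')); // positive
console.log(afinnLabel('A terrible regression')); // negative
```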

Putting It Together: A Content Pipeline

Here's a complete content analysis function that combines all four steps:

const { analyze, seoScore } = require('textlens');
const { readFileSync } = require('fs');

function scoreContent(filePath, targetKeyword) {
  const text = readFileSync(filePath, 'utf8');
  const analysis = analyze(text);
  const seo = seoScore(text, { targetKeyword });

  return {
    file: filePath,
    wordCount: analysis.statistics.words,
    readingTime: `${analysis.readingTime.minutes} min`,
    gradeLevel: analysis.readability.consensusGrade,
    sentiment: analysis.sentiment.label,
    topKeywords: analysis.keywords.slice(0, 5).map(k => k.word),
    seoScore: seo.score,
    seoGrade: seo.grade,
    issues: seo.issues,
    suggestions: seo.suggestions,
  };
}

// Score a single article
const report = scoreContent('article.md', 'javascript');
console.log(JSON.stringify(report, null, 2));

Batch Processing

Score every markdown file in a directory:

const { readdirSync } = require('fs');
const path = require('path');

const dir = process.argv[2] || 'content';
const keyword = process.argv[3];
const files = readdirSync(dir).filter(f => f.endsWith('.md'));

console.log(`Scoring ${files.length} files...\n`);

for (const file of files) {
  const report = scoreContent(path.join(dir, file), keyword);
  const status = report.seoScore >= 70 ? 'PASS' : 'FAIL';
  console.log(`${status} ${file} — ${report.seoGrade} (${report.seoScore}/100) — Grade ${report.gradeLevel}`);
}

Run it:

node score.js content/ "javascript tutorial"

Output:

Scoring 4 files...

PASS intro.md — B (78/100) — Grade 7.2
FAIL advanced.md — D (52/100) — Grade 13.1
PASS getting-started.md — A (85/100) — Grade 6.8
PASS faq.md — B (72/100) — Grade 8.4
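In CI you'd probably want the run to fail when any file scores below the threshold. Here's a small sketch; the report objects mirror the scoreContent() output above:

```javascript
// Sketch: derive an exit code from the batch results so CI can gate on it.
function ciGate(reports, threshold = 70) {
  const failed = reports.filter(r => r.seoScore < threshold);
  for (const r of failed) {
    console.error(`FAIL ${r.file} (${r.seoScore}/100)`);
  }
  return failed.length > 0 ? 1 : 0;
}

const reports = [
  { file: 'intro.md', seoScore: 78 },
  { file: 'advanced.md', seoScore: 52 },
];
console.log(ciGate(reports)); // 1 (advanced.md is below 70)
```

Assign the result to process.exitCode at the end of score.js and the pipeline fails the build automatically.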

CLI Alternative

Don't want to write code? Use the textlens CLI:

# Full analysis
npx textlens article.md --all

# Keywords only
npx textlens article.md --keywords 10

# SEO score with target keyword
npx textlens article.md --seo "javascript frameworks"

# JSON output for scripting
npx textlens article.md --json | jq '.keywords[].word'

Compared to Paid Tools

Feature                textlens     Clearscope   SurferSEO
Keyword extraction     Yes          Yes          Yes
Content scoring        Yes          Yes          Yes
Readability analysis   8 formulas   Basic        Basic
Sentiment analysis     Yes          No           No
Competitor analysis    No           Yes          Yes
SERP data              No           Yes          Yes
Price                  Free (MIT)   $170/mo      $89/mo
Self-hosted            Yes          No           No
API/CLI                Yes          API only     API only

textlens doesn't replace Clearscope or SurferSEO if you need SERP data and competitor analysis. But for keyword extraction, content scoring, and readability — it does the job for free.

Links

This is part of the textlens series — tutorials on text analysis in JavaScript and TypeScript.
