DEV Community

ckmtools

How to Extract Keywords and Score Content Quality in Node.js

Content teams at companies like HubSpot and Clearscope pay hundreds per month for keyword extraction and content scoring tools. These tools analyze your text, extract the key topics, score the content for quality, and tell you what to improve.

You can build the same pipeline in Node.js with open-source tools. No API keys, no monthly fees, no external services.

This tutorial shows how to extract keywords, compute content quality scores, and build a content analysis pipeline using textlens.

Install

npm install textlens

Zero dependencies. Works in Node.js 16+, ships ESM and CommonJS with TypeScript types.

Step 1: Extract Keywords with TF-IDF

TF-IDF (Term Frequency-Inverse Document Frequency) identifies words that appear frequently in your text but are rare in English overall, which filters out common stopwords automatically. It's one of the oldest and most widely used weightings in information retrieval and search ranking.
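The core idea can be sketched in a few lines of plain JavaScript. This is illustrative only; textlens's exact weighting and smoothing may differ:

```javascript
// Illustrative TF-IDF sketch (not textlens internals): a term scores high
// when it's frequent in one document but rare across the corpus.
function tfidf(term, doc, corpus) {
  const words = doc.toLowerCase().split(/\W+/).filter(Boolean);
  const tf = words.filter(w => w === term).length / words.length;
  const docsWithTerm = corpus.filter(d => d.toLowerCase().includes(term)).length;
  const idf = Math.log((1 + corpus.length) / (1 + docsWithTerm)); // smoothed
  return tf * idf;
}

const corpus = [
  'React changed component architecture.',
  'Vue offered a gentler learning curve.',
  'The weather is nice today.',
];
console.log(tfidf('component', corpus[0], corpus)); // > 0: rare in the corpus
```

A word that appears in every document gets a low IDF and drops toward zero, which is exactly why stopwords fall out of the ranking.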

const { keywords } = require('textlens');

const article = `
  JavaScript frameworks have evolved rapidly over the past decade.
  React changed how developers think about component architecture.
  Vue offered a gentler learning curve. Svelte compiled components
  at build time, eliminating the virtual DOM entirely. Each framework
  solves different problems for different teams.
`;

const kw = keywords(article, { topN: 5, minLength: 4 });
console.log(kw);

Output:

[
  { word: 'framework', score: 3.8, count: 2, density: 4.5 },
  { word: 'component', score: 3.2, count: 2, density: 4.5 },
  { word: 'developers', score: 2.1, count: 1, density: 2.3 },
  { word: 'svelte', score: 2.1, count: 1, density: 2.3 },
  { word: 'architecture', score: 1.9, count: 1, density: 2.3 }
]

Each keyword has a score (its TF-IDF weight), a count (raw frequency), and a density (percentage of total words). For SEO, a density of 1-3% for your target keyword is a commonly cited sweet spot.
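As a quick sanity check, you can flag a target keyword that falls outside that band, working from the output shape shown above. This is a sketch; the thresholds are the rules of thumb just mentioned, not a textlens feature:

```javascript
// Sketch: classify a target keyword's density using the { word, density }
// objects returned by keywords(). The 1-3% thresholds are rules of thumb.
function densityCheck(kwList, target) {
  const hit = kwList.find(k => k.word === target);
  if (!hit) return `'${target}' not found, content may be off-topic`;
  if (hit.density < 1) return `'${target}' is under-used (${hit.density}%)`;
  if (hit.density > 3) return `'${target}' risks keyword stuffing (${hit.density}%)`;
  return `'${target}' density looks healthy (${hit.density}%)`;
}

const sample = [{ word: 'framework', score: 3.8, count: 2, density: 4.5 }];
console.log(densityCheck(sample, 'framework')); // 4.5% is above the 3% ceiling
```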

Options

Option      Default   Description
topN        10        Number of keywords to return
minLength   3         Minimum word length

Step 2: Analyze Keyword Density with N-grams

Single-word keywords miss phrases like "content marketing" or "machine learning." The density() function extracts unigrams (single words), bigrams (two-word phrases), and trigrams (three-word phrases):

const { density } = require('textlens');

const result = density(article);

console.log('Top single words:', result.unigrams.slice(0, 3));
console.log('Top phrases:', result.bigrams.slice(0, 3));
console.log('Top 3-word phrases:', result.trigrams.slice(0, 3));

Output:

// Unigrams
[
  { text: 'framework', count: 2, density: 4.5 },
  { text: 'component', count: 2, density: 4.5 },
  { text: 'developers', count: 1, density: 2.3 }
]

// Bigrams
[
  { text: 'javascript frameworks', count: 1, density: 2.3 },
  { text: 'component architecture', count: 1, density: 2.3 },
  { text: 'learning curve', count: 1, density: 2.3 }
]

Bigrams are especially useful for SEO. They reveal the actual phrases your content covers, not just isolated words.
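The mechanics behind those phrases are simple: slide a fixed-size window across the token list. Here's an illustrative plain-JavaScript sketch (not textlens internals):

```javascript
// Sketch of n-gram formation: every run of n consecutive tokens is a phrase.
function ngrams(text, n) {
  const tokens = text.toLowerCase().split(/\W+/).filter(Boolean);
  const out = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(' '));
  }
  return out;
}

console.log(ngrams('content marketing drives growth', 2));
// ['content marketing', 'marketing drives', 'drives growth']
```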

Step 3: Score Content Quality with SEO Scoring

textlens includes an seoScore() function that evaluates your content on four dimensions, each worth 25 points:

const { seoScore } = require('textlens');

const result = seoScore(article, { targetKeyword: 'javascript frameworks' });

console.log(`Overall: ${result.score}/100 (${result.grade})`);
console.log('Issues:', result.issues);
console.log('Suggestions:', result.suggestions);
console.log('Details:', result.details);

Output:

{
  score: 68,
  grade: 'C',
  issues: ['Content length is below 300 words'],
  suggestions: ['Add more content to reach at least 300 words'],
  details: {
    readabilityScore: 22,      // out of 25
    contentLengthScore: 10,    // out of 25 (too short)
    keywordScore: 20,          // out of 25
    sentenceVarietyScore: 16   // out of 25
  }
}

The Four Quality Dimensions

Readability (25 pts): Is the grade level appropriate? Targets grade 7 by default. Content above grade 12 loses points.

Content Length (25 pts): Is there enough substance? Under 300 words is thin content. 300-2500 is the sweet spot. Over 5000 words gets a suggestion to split.

Keyword Usage (25 pts): If you provide a targetKeyword, the scorer checks whether it appears at 1-3% density. Too low = not on topic. Too high = keyword stuffing.

Sentence Variety (25 pts): Do you mix short and long sentences? Monotonous sentence lengths (all long or all short) lose points. Variety keeps readers engaged.
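One plausible way to measure variety (not necessarily how textlens scores it) is the spread of sentence lengths, e.g. the standard deviation of words per sentence:

```javascript
// Sketch: standard deviation of sentence word counts. 0 means every
// sentence is the same length; higher means more variety.
function sentenceVariety(text) {
  const lengths = text
    .split(/[.!?]+/)
    .map(s => s.trim())
    .filter(Boolean)
    .map(s => s.split(/\s+/).length);
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((a, b) => a + (b - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance);
}

console.log(sentenceVariety('Same length here. Same length too.')); // 0
```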

Step 4: Add Sentiment Analysis

Content tone matters. A product announcement shouldn't read as negative. A security advisory shouldn't read as enthusiastic.

const { sentiment } = require('textlens');

const result = sentiment(article);
console.log(result.label);      // 'positive', 'negative', or 'neutral'
console.log(result.confidence); // 0.0 to 1.0
console.log(result.positive);   // matched positive words
console.log(result.negative);   // matched negative words

textlens uses the AFINN-165 lexicon (~3,300 English words scored from -5 to +5). No external API needed.
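The scoring mechanism itself is straightforward: sum per-word lexicon scores and label the text by the sign of the total. A miniature sketch with a made-up four-word lexicon:

```javascript
// Miniature AFINN-style scorer. MINI_AFINN is a made-up four-entry lexicon
// for illustration; the real AFINN-165 scores ~3,300 words from -5 to +5.
const MINI_AFINN = { great: 3, good: 2, bad: -3, terrible: -3 };

function afinnLabel(text) {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  const total = words.reduce((sum, w) => sum + (MINI_AFINN[w] || 0), 0);
  return total > 0 ? 'positive' : total < 0 ? 'negative' : 'neutral';
}

console.log(afinnLabel('This release is great')); // positive
console.log(afinnLabel('A terrible regression')); // negative
```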

Putting It Together: A Content Pipeline

Here's a complete content analysis function that combines all four steps:

const { analyze, seoScore } = require('textlens');
const { readFileSync } = require('fs');

function scoreContent(filePath, targetKeyword) {
  const text = readFileSync(filePath, 'utf8');
  const analysis = analyze(text);
  const seo = seoScore(text, { targetKeyword });

  return {
    file: filePath,
    wordCount: analysis.statistics.words,
    readingTime: `${analysis.readingTime.minutes} min`,
    gradeLevel: analysis.readability.consensusGrade,
    sentiment: analysis.sentiment.label,
    topKeywords: analysis.keywords.slice(0, 5).map(k => k.word),
    seoScore: seo.score,
    seoGrade: seo.grade,
    issues: seo.issues,
    suggestions: seo.suggestions,
  };
}

// Score a single article
const report = scoreContent('article.md', 'javascript');
console.log(JSON.stringify(report, null, 2));

Batch Processing

Score every markdown file in a directory:

const { readdirSync } = require('fs');
const path = require('path');

const dir = process.argv[2] || 'content';
const keyword = process.argv[3];
const files = readdirSync(dir).filter(f => f.endsWith('.md'));

console.log(`Scoring ${files.length} files...\n`);

for (const file of files) {
  const report = scoreContent(path.join(dir, file), keyword);
  const status = report.seoScore >= 70 ? 'PASS' : 'FAIL';
  console.log(`${status} ${file} — ${report.seoGrade} (${report.seoScore}/100) — Grade ${report.gradeLevel}`);
}

Run it:

node score.js content/ "javascript tutorial"

Output:

Scoring 4 files...

PASS intro.md — B (78/100) — Grade 7.2
FAIL advanced.md — D (52/100) — Grade 13.1
PASS getting-started.md — A (85/100) — Grade 6.8
PASS faq.md — B (72/100) — Grade 8.4
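In CI you'd probably want the run to fail when any file scores below the threshold. Here's a small sketch; the report objects mirror the scoreContent() output above:

```javascript
// Sketch: derive an exit code from the batch results so CI can gate on it.
function ciGate(reports, threshold = 70) {
  const failed = reports.filter(r => r.seoScore < threshold);
  for (const r of failed) {
    console.error(`FAIL ${r.file} (${r.seoScore}/100)`);
  }
  return failed.length > 0 ? 1 : 0;
}

const reports = [
  { file: 'intro.md', seoScore: 78 },
  { file: 'advanced.md', seoScore: 52 },
];
console.log(ciGate(reports)); // 1 (advanced.md is below 70)
```

Assign the result to process.exitCode at the end of score.js and the pipeline fails the build automatically.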

CLI Alternative

Don't want to write code? Use the textlens CLI:

# Full analysis
npx textlens article.md --all

# Keywords only
npx textlens article.md --keywords 10

# SEO score with target keyword
npx textlens article.md --seo "javascript frameworks"

# JSON output for scripting
npx textlens article.md --json | jq '.keywords[].word'

Compared to Paid Tools

Feature                textlens     Clearscope   SurferSEO
Keyword extraction     Yes          Yes          Yes
Content scoring        Yes          Yes          Yes
Readability analysis   8 formulas   Basic        Basic
Sentiment analysis     Yes          No           No
Competitor analysis    No           Yes          Yes
SERP data              No           Yes          Yes
Price                  Free (MIT)   $170/mo      $89/mo
Self-hosted            Yes          No           No
API/CLI                Yes          API only     API only

textlens doesn't replace Clearscope or SurferSEO if you need SERP data and competitor analysis. But for keyword extraction, content scoring, and readability — it does the job for free.

Links

This is part of the textlens series — tutorials on text analysis in JavaScript and TypeScript.
