Content teams at companies like HubSpot and Clearscope pay hundreds per month for keyword extraction and content scoring tools. These tools analyze your text, extract the key topics, score the content for quality, and tell you what to improve.
You can build the same pipeline in Node.js with open-source tools. No API keys, no monthly fees, no external services.
This tutorial shows how to extract keywords, compute content quality scores, and build a content analysis pipeline using textlens.
Install
npm install textlens
Zero dependencies. Works in Node.js 16+, ships ESM and CommonJS with TypeScript types.
Step 1: Extract Keywords with TF-IDF
TF-IDF (Term Frequency-Inverse Document Frequency) identifies words that appear frequently in your text but are uncommon in general English (stopwords are filtered out). It's a classic information-retrieval weighting scheme that search engines relied on long before modern ranking models.
const { keywords } = require('textlens');
const article = `
JavaScript frameworks have evolved rapidly over the past decade.
React changed how developers think about component architecture.
Vue offered a gentler learning curve. Svelte compiled components
at build time, eliminating the virtual DOM entirely. Each framework
solves different problems for different teams.
`;
const kw = keywords(article, { topN: 5, minLength: 4 });
console.log(kw);
Output:
[
{ word: 'framework', score: 3.8, count: 2, density: 4.5 },
{ word: 'component', score: 3.2, count: 2, density: 4.5 },
{ word: 'developers', score: 2.1, count: 1, density: 2.3 },
{ word: 'svelte', score: 2.1, count: 1, density: 2.3 },
{ word: 'architecture', score: 1.9, count: 1, density: 2.3 }
]
Each keyword has a score (TF-IDF weight), count (raw frequency), and density (percentage of total words). For SEO, a keyword density of 1-3% for your target keyword is generally considered the sweet spot.
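To make the score and density fields concrete, here's a minimal sketch of how TF-IDF-style weighting can be computed by hand. This is illustrative, not textlens internals: the stopword set, the tfidf helper, and the document frequencies are all made up for the example.

```javascript
// Minimal TF-IDF sketch (illustrative only, not how textlens works internally).
const STOPWORDS = new Set(['the', 'a', 'an', 'of', 'and', 'this', 'compare']);

// Hypothetical document frequencies from a background corpus (fraction of
// documents containing each word). Rarer words get a bigger IDF boost.
const DOC_FREQ = { framework: 0.02, component: 0.03, developers: 0.12 };

function tfidf(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  const counts = {};
  for (const w of words) {
    if (!STOPWORDS.has(w)) counts[w] = (counts[w] || 0) + 1;
  }
  const total = words.length;
  return Object.entries(counts).map(([word, count]) => ({
    word,
    count,
    density: +(100 * count / total).toFixed(1),                 // % of all words
    score: +(count * Math.log(1 / (DOC_FREQ[word] || 0.5))).toFixed(2), // TF x IDF
  })).sort((a, b) => b.score - a.score);
}

console.log(tfidf('Framework developers compare framework component design'));
```

A word used twice in a short text ("framework") scores highest because its term frequency and its rarity both push the weight up.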
Options
| Option | Default | Description |
|---|---|---|
| topN | 10 | Number of keywords to return |
| minLength | 3 | Minimum word length |
Step 2: Analyze Keyword Density with N-grams
Single-word keywords miss phrases like "content marketing" or "machine learning." The density() function extracts unigrams (single words), bigrams (two-word phrases), and trigrams (three-word phrases):
const { density } = require('textlens');
const result = density(article);
console.log('Top single words:', result.unigrams.slice(0, 3));
console.log('Top phrases:', result.bigrams.slice(0, 3));
console.log('Top 3-word phrases:', result.trigrams.slice(0, 3));
Output:
// Unigrams
[
{ text: 'framework', count: 2, density: 4.5 },
{ text: 'component', count: 2, density: 4.5 },
{ text: 'developers', count: 1, density: 2.3 }
]
// Bigrams
[
{ text: 'javascript frameworks', count: 1, density: 2.3 },
{ text: 'component architecture', count: 1, density: 2.3 },
{ text: 'learning curve', count: 1, density: 2.3 }
]
Bigrams are especially useful for SEO. They reveal the actual phrases your content covers, not just isolated words.
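Bigram extraction itself is simple to picture. Here's a rough standalone sketch of the idea (the bigrams helper is hypothetical, written for this example rather than taken from textlens):

```javascript
// Sketch: extract two-word phrases and their density (illustrative only).
function bigrams(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  const counts = new Map();
  // Slide a two-word window across the text, counting each phrase.
  for (let i = 0; i < words.length - 1; i++) {
    const phrase = `${words[i]} ${words[i + 1]}`;
    counts.set(phrase, (counts.get(phrase) || 0) + 1);
  }
  const total = Math.max(words.length - 1, 1); // number of bigram slots
  return [...counts.entries()]
    .map(([text, count]) => ({ text, count, density: +(100 * count / total).toFixed(1) }))
    .sort((a, b) => b.count - a.count);
}

console.log(bigrams('content marketing drives content marketing results').slice(0, 2));
```

Repeated phrases like "content marketing" bubble to the top even when neither word is individually the most frequent.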
Step 3: Score Content Quality with SEO Scoring
textlens includes an seoScore() function that evaluates your content on four dimensions, each worth 25 points:
const { seoScore } = require('textlens');
const result = seoScore(article, { targetKeyword: 'javascript frameworks' });
console.log(`Overall: ${result.score}/100 (${result.grade})`);
console.log('Issues:', result.issues);
console.log('Suggestions:', result.suggestions);
console.log('Details:', result.details);
Output:
{
score: 68,
grade: 'C',
issues: ['Content length is below 300 words'],
suggestions: ['Add more content to reach at least 300 words'],
details: {
readabilityScore: 22, // out of 25
contentLengthScore: 10, // out of 25 (too short)
keywordScore: 20, // out of 25
sentenceVarietyScore: 16 // out of 25
}
}
The Four Quality Dimensions
Readability (25 pts): Is the grade level appropriate? Targets grade 7 by default. Content above grade 12 loses points.
Content Length (25 pts): Is there enough substance? Under 300 words is thin content. 300-2500 is the sweet spot. Over 5000 words gets a suggestion to split.
Keyword Usage (25 pts): If you provide a targetKeyword, the scorer checks whether it appears at 1-3% density. Too low = not on topic. Too high = keyword stuffing.
Sentence Variety (25 pts): Do you mix short and long sentences? Monotonous sentence lengths (all long or all short) lose points. Variety keeps readers engaged.
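As a rough illustration of how two of these dimensions could be turned into 25-point scores, here's a sketch using the thresholds described above (under 300 words is thin, 300-2500 is the sweet spot, 1-3% keyword density is on target). The exact curves and cutoffs inside textlens may differ; lengthScore and keywordScore are hypothetical helpers:

```javascript
// Hypothetical scoring rubric based on the thresholds described above.
function lengthScore(wordCount) {
  if (wordCount < 300) return Math.round(25 * (wordCount / 300)); // thin content
  if (wordCount <= 2500) return 25;                               // sweet spot
  return 20; // very long: still scores well, but consider splitting
}

function keywordScore(densityPct) {
  if (densityPct >= 1 && densityPct <= 3) return 25; // on target
  if (densityPct === 0) return 0;                    // keyword missing
  return densityPct < 1 ? 15 : 5;                    // too sparse / stuffing
}

console.log(lengthScore(150));  // a short draft loses roughly half the points
console.log(keywordScore(2.0)); // healthy density gets full marks
```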
Step 4: Add Sentiment Analysis
Content tone matters. A product announcement shouldn't read as negative. A security advisory shouldn't read as enthusiastic.
const { sentiment } = require('textlens');
const result = sentiment(article);
console.log(result.label); // 'positive', 'negative', or 'neutral'
console.log(result.confidence); // 0.0 to 1.0
console.log(result.positive); // matched positive words
console.log(result.negative); // matched negative words
textlens uses the AFINN-165 lexicon (~3,300 English words scored from -5 to +5). No external API needed.
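The lexicon approach is easy to picture. Here's a toy version with a handful of made-up AFINN-style entries (not the real AFINN-165 data, and not textlens's implementation):

```javascript
// Toy lexicon-based sentiment in the spirit of AFINN scoring.
// These five entries are invented for the example.
const LEXICON = { great: 3, love: 3, gentle: 1, broken: -2, terrible: -3 };

function toySentiment(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  const positive = words.filter(w => (LEXICON[w] || 0) > 0);
  const negative = words.filter(w => (LEXICON[w] || 0) < 0);
  const score = words.reduce((sum, w) => sum + (LEXICON[w] || 0), 0);
  const label = score > 0 ? 'positive' : score < 0 ? 'negative' : 'neutral';
  return { label, score, positive, negative };
}

console.log(toySentiment('I love this great framework'));
```

Summing per-word scores and thresholding at zero is the whole trick; the real lexicon just has thousands of entries instead of five.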
Putting It Together: A Content Pipeline
Here's a complete content analysis function that combines all four steps:
const { analyze, seoScore } = require('textlens');
const { readFileSync } = require('fs');
function scoreContent(filePath, targetKeyword) {
const text = readFileSync(filePath, 'utf8');
const analysis = analyze(text);
const seo = seoScore(text, { targetKeyword });
return {
file: filePath,
wordCount: analysis.statistics.words,
readingTime: `${analysis.readingTime.minutes} min`,
gradeLevel: analysis.readability.consensusGrade,
sentiment: analysis.sentiment.label,
topKeywords: analysis.keywords.slice(0, 5).map(k => k.word),
seoScore: seo.score,
seoGrade: seo.grade,
issues: seo.issues,
suggestions: seo.suggestions,
};
}
// Score a single article
const report = scoreContent('article.md', 'javascript');
console.log(JSON.stringify(report, null, 2));
Batch Processing
Score every markdown file in a directory:
const { readdirSync } = require('fs');
const path = require('path');
const dir = process.argv[2] || 'content';
const keyword = process.argv[3];
const files = readdirSync(dir).filter(f => f.endsWith('.md'));
console.log(`Scoring ${files.length} files...\n`);
for (const file of files) {
const report = scoreContent(path.join(dir, file), keyword);
const status = report.seoScore >= 70 ? 'PASS' : 'FAIL';
console.log(`${status} ${file} — ${report.seoGrade} (${report.seoScore}/100) — Grade ${report.gradeLevel}`);
}
Run it:
node score.js content/ "javascript tutorial"
Output:
Scoring 4 files...
PASS intro.md — B (78/100) — Grade 7.2
FAIL advanced.md — D (52/100) — Grade 13.1
PASS getting-started.md — A (85/100) — Grade 6.8
PASS faq.md — B (72/100) — Grade 8.4
CLI Alternative
Don't want to write code? Use the textlens CLI:
# Full analysis
npx textlens article.md --all
# Keywords only
npx textlens article.md --keywords 10
# SEO score with target keyword
npx textlens article.md --seo "javascript frameworks"
# JSON output for scripting
npx textlens article.md --json | jq '.keywords[].word'
Compared to Paid Tools
| Feature | textlens | Clearscope | SurferSEO |
|---|---|---|---|
| Keyword extraction | Yes | Yes | Yes |
| Content scoring | Yes | Yes | Yes |
| Readability analysis | 8 formulas | Basic | Basic |
| Sentiment analysis | Yes | No | No |
| Competitor analysis | No | Yes | Yes |
| SERP data | No | Yes | Yes |
| Price | Free (MIT) | $170/mo | $89/mo |
| Self-hosted | Yes | No | No |
| API/CLI | Yes | API only | API only |
textlens doesn't replace Clearscope or SurferSEO if you need SERP data and competitor analysis. But for keyword extraction, content scoring, and readability — it does the job for free.
Links
- textlens on npm — npm install textlens
- GitHub: ckmtools/textlens
- Docs: ckmtools.dev/textlens
This is part of the textlens series — tutorials on text analysis in JavaScript and TypeScript.