If I wanted to analyze several articles on the same topic, the process was always the same:
- open multiple tabs
- read each article
- copy headings
- copy paragraphs
- organize everything manually
It was slow and repetitive.
So I decided to automate it.
That's how content-scraper-cli was created.
The Idea
Instead of manually collecting information from multiple articles, I wanted a tool that could:
- Take a list of URLs
- Extract the important parts of each page
- Organize everything in a structured format
The result is a JSON file that can be used as research input for AI tools or content analysis.
Installing the CLI
You can install the tool globally with npm:
npm install -g content-scraper-cli
Then run it from your terminal:
content-scraper
The CLI will ask for:
- the URLs you want to analyze
- the name of the JSON output file
Example:
Enter URLs separated by comma:
https://blog.com/article-1, https://site.com/post-2
💾 Output JSON file name:
research-data
After processing the pages, the tool generates a JSON file with structured information from each article.
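As a rough illustration of what happens to that comma-separated input, here is a minimal sketch of parsing it into a clean URL list. This is not the CLI's actual source: the function name parseUrlList and the rules it applies (trim, drop empty or invalid entries, keep only http/https, remove duplicates) are assumptions made for the example.

```typescript
// Hypothetical helper, not taken from content-scraper-cli's source.
// Turns "url1, url2, ..." into a de-duplicated list of http(s) URLs.
function parseUrlList(input: string): string[] {
  const seen = new Set<string>();

  for (const raw of input.split(",")) {
    const candidate = raw.trim();
    if (candidate === "") continue; // ignore empty entries such as ", ,"

    try {
      // The URL constructor throws on strings it cannot parse.
      const url = new URL(candidate);
      if (url.protocol === "http:" || url.protocol === "https:") {
        seen.add(candidate);
      }
    } catch {
      // Not a valid absolute URL: skip it.
    }
  }

  return Array.from(seen);
}

const urls = parseUrlList(
  "https://blog.com/article-1, https://site.com/post-2"
);
console.log(urls);
```

Validating up front like this means a typo in one URL does not stop the rest of the list from being processed.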
What the Tool Extracts
For every URL, the CLI extracts:
Metadata
- title
- meta description
- author (if available)
- publication date
- site language
- meta keywords
Content Structure
- H1 headings
- H2 headings
- H3 headings
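To make the heading step concrete, here is a dependency-free sketch that pulls H1 to H3 text out of an HTML string with a regular expression. It is only an illustration: the function name extractHeadings is made up, and a regex is a deliberate simplification that will miss edge cases a real HTML parser handles. How the CLI itself parses pages is not described here.

```typescript
// Hypothetical sketch: regex-based heading extraction, not the CLI's code.
type HeadingLevel = "h1" | "h2" | "h3";

function extractHeadings(html: string): Record<HeadingLevel, string[]> {
  const result: Record<HeadingLevel, string[]> = { h1: [], h2: [], h3: [] };

  // Matches <h1>...</h1>, <h2>...</h2>, <h3>...</h3> with optional attributes.
  const pattern = /<(h[1-3])\b[^>]*>([\s\S]*?)<\/\1>/gi;

  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const level = match[1].toLowerCase() as HeadingLevel;
    const text = match[2]
      .replace(/<[^>]+>/g, "") // drop inline tags such as <em> or <a>
      .replace(/\s+/g, " ")    // collapse whitespace and line breaks
      .trim();
    if (text !== "") result[level].push(text);
  }

  return result;
}

const sample = "<h1>Guide</h1><h2>Setup <em>steps</em></h2><h4>Ignored</h4>";
console.log(extractHeadings(sample));
```

The returned object has one array per heading level, which makes it easy to compare outlines across articles.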
Content
- paragraphs with meaningful length
- lists (ul / ol)
Statistics
- total paragraphs
- total word count
This makes it easy to analyze how different articles are structured.
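The content and statistics steps can be sketched in a few lines. Everything here is an assumption for illustration: the 40-character cutoff for a "meaningful" paragraph, the names keepMeaningfulParagraphs and computeStats, and the shape of the stats object are not taken from the CLI.

```typescript
// Hypothetical sketch of paragraph filtering and statistics.
// The 40-character threshold is an assumed value, not the CLI's.
const MIN_PARAGRAPH_CHARS = 40;

interface ContentStats {
  totalParagraphs: number;
  totalWords: number;
}

// Normalizes whitespace and drops paragraphs that are too short
// to carry real content (bylines, captions, "Read more" links).
function keepMeaningfulParagraphs(
  paragraphs: string[],
  minChars: number = MIN_PARAGRAPH_CHARS
): string[] {
  return paragraphs
    .map((p) => p.replace(/\s+/g, " ").trim())
    .filter((p) => p.length >= minChars);
}

// Counts paragraphs and whitespace-separated words.
function computeStats(paragraphs: string[]): ContentStats {
  const totalWords = paragraphs.reduce(
    (sum, p) => sum + p.split(/\s+/).filter((w) => w.length > 0).length,
    0
  );
  return { totalParagraphs: paragraphs.length, totalWords };
}

const kept = keepMeaningfulParagraphs([
  "Short.",
  "This paragraph is long enough to be kept by the filter.",
]);
console.log(kept.length, computeStats(kept));
```

Filtering before counting keeps the word total focused on body text rather than navigation or boilerplate.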
Example Output
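The output file contains structured data like this (field names are in Spanish: generado_en = generated at, fuentes = sources, titulo = title, estructura = structure, parrafos = paragraphs, listas = lists):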
{
  "generado_en": "...",
  "total_fuentes": 3,
  "fuentes": [
    {
      "url": "...",
      "titulo": "...",
      "estructura": {
        "h1": [],
        "h2": [],
        "h3": []
      },
      "parrafos": [],
      "listas": []
    }
  ]
}
This dataset can then be used for:
- content research
- SEO analysis
- AI article generation workflows
- studying how top articles structure their content
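For any of these uses, it helps to give the file a type and reduce it to a quick outline summary. The sketch below uses only the field names visible in the example output. The element types are assumptions, since the example shows empty arrays: headings and paragraphs are treated as strings and listas is left as unknown. The function name summarizeOutline is hypothetical.

```typescript
// Types inferred from the example output; element types are assumed.
interface SourceEntry {
  url: string;
  titulo: string;
  estructura: { h1: string[]; h2: string[]; h3: string[] };
  parrafos: string[];
  listas: unknown[]; // shape not shown in the example
}

interface ScrapeOutput {
  generado_en: string;
  total_fuentes: number;
  fuentes: SourceEntry[];
}

// Hypothetical helper: one summary line per scraped article.
function summarizeOutline(data: ScrapeOutput): string[] {
  return data.fuentes.map(
    (s) =>
      `${s.titulo}: ${s.estructura.h2.length} H2, ` +
      `${s.estructura.h3.length} H3, ${s.parrafos.length} paragraphs`
  );
}

const example: ScrapeOutput = {
  generado_en: "...",
  total_fuentes: 1,
  fuentes: [
    {
      url: "...",
      titulo: "Post A",
      estructura: { h1: ["Post A"], h2: ["Intro", "Setup"], h3: ["Step 1"] },
      parrafos: ["First.", "Second.", "Third."],
      listas: [],
    },
  ],
};
console.log(summarizeOutline(example));
```

With a real file, the object would come from JSON.parse applied to the contents of the generated file, for example read with readFileSync from Node's fs module.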
Why I Built It
The goal was simple:
Make content research faster.
Instead of manually reading and copying data from multiple pages, the CLI collects the structure of several articles in seconds.
Notes
The tool works best with:
- blogs
- news websites
- long-form articles
Some sites with strong anti-bot protection may block requests.
The CLI intentionally does not extract images and does not rely on paid APIs.
Everything runs locally.
Open Source
The project is open source and available on npm.
If you want to try it:
npm install -g content-scraper-cli
