Never scraped a website before? This guide gets you from zero to extracting real data in 10 minutes.
Step 1: Set Up (1 minute)
mkdir my-first-scraper && cd my-first-scraper
npm init -y
npm install cheerio
Step 2: Write the Scraper (3 minutes)
Create scraper.js (the built-in fetch requires Node 18 or newer):
const cheerio = require('cheerio');

async function scrape() {
  // Fetch a webpage
  const url = 'https://news.ycombinator.com';
  const res = await fetch(url);
  const html = await res.text();

  // Parse the HTML
  const $ = cheerio.load(html);

  // Extract data
  const stories = [];
  $('.titleline > a').each((i, el) => {
    stories.push({
      rank: i + 1,
      title: $(el).text(),
      url: $(el).attr('href')
    });
  });

  // Display results
  console.log(`Found ${stories.length} stories:\n`);
  stories.slice(0, 10).forEach(s => {
    console.log(`${s.rank}. ${s.title}`);
  });

  return stories;
}

scrape();
Step 3: Run It (1 minute)
node scraper.js
Output:
Found 30 stories:
1. Show HN: I built a terminal-only Bluesky client
2. ArXiv Declares Independence from Cornell
3. OpenCode – The open source AI coding agent
...
Congratulations! You just scraped Hacker News.
Step 4: Save as JSON (2 minutes)
const fs = require('fs');

// Replace the bare scrape() call at the bottom of scraper.js with:
scrape().then(stories => {
  fs.writeFileSync('stories.json', JSON.stringify(stories, null, 2));
  console.log('Saved to stories.json');
});

(Note: a bare `await scrape()` at the top level of a CommonJS file is a syntax error, so chain on the returned promise instead.)
Step 5: Save as CSV (2 minutes)
// Inside the same .then() callback, after (or instead of) the JSON write:
const csv = stories.map(s => `${s.rank},"${s.title}","${s.url}"`).join('\n');
fs.writeFileSync('stories.csv', 'rank,title,url\n' + csv);
console.log('Saved to stories.csv');
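One caveat: a title containing a double quote would break the naive quoting above. A minimal escaping sketch (the `csvEscape` and `toCsv` helpers are illustrative names, not part of the tutorial's code) doubles embedded quotes, per standard CSV convention:

```javascript
// Wrap a value in quotes, doubling any embedded quotes so titles
// containing `"` or commas stay intact in the CSV.
function csvEscape(value) {
  return `"${String(value).replace(/"/g, '""')}"`;
}

// Build the whole CSV string from the scraped stories array.
function toCsv(stories) {
  const header = 'rank,title,url';
  const rows = stories.map(s =>
    [s.rank, csvEscape(s.title), csvEscape(s.url)].join(',')
  );
  return header + '\n' + rows.join('\n');
}
```

You would then write `toCsv(stories)` to the file instead of building the string inline.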
What You Learned
- fetch() — get webpage HTML
- cheerio.load() — parse HTML
- $('selector') — find elements (same as jQuery)
- $(el).text() — get text content
- $(el).attr('href') — get attribute
Next Steps
- Stop Parsing HTML — 7 Sites That Return JSON
- 5 Patterns That Never Break
- Anti-Bot Guide
- 77 Free Scrapers
Want me to build a scraper for you instead? $20 flat rate. Any site, any format. Email: Spinov001@gmail.com