DEV Community

Алексей Спинов

Web Scraping Cheat Sheet: Every Tool, API, and Pattern in One Place

Bookmark this. Everything you need for web scraping in one article.

Tools

| Tool | Use Case | Install |
|------|----------|---------|
| Cheerio | HTML parsing | npm i cheerio |
| Playwright | Browser automation | npm i playwright |
| xml2js | XML/RSS parsing | npm i xml2js |
| xlsx | Excel output | npm i xlsx |

Free APIs (No Key)

| API | URL Pattern |
|-----|-------------|
| Reddit | reddit.com/r/SUB.json |
| YouTube | youtubei/v1/search |
| Shopify | store.com/products.json |
| HN | hn.algolia.com/api/v1/search |
| Wikipedia | en.wikipedia.org/w/api.php |
| arXiv | export.arxiv.org/api/query |
| npm | registry.npmjs.org/-/v1/search |
| DuckDuckGo | api.duckduckgo.com/?q=X&format=json |
| Bluesky | public.api.bsky.app/xrpc/ |
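
A minimal sketch of how a few of these URL patterns can be turned into requests. The helper names (`hnSearchUrl`, `redditListingUrl`, `npmSearchUrl`, `getJson`) and the User-Agent string are hypothetical and not part of any library; the query parameters (`query`/`page`, `limit`, `text`/`size`) are the ones these public endpoints accept, and the global `fetch` assumes Node 18 or newer.

```javascript
// Hypothetical helpers: endpoint paths come from the table above.

// Hacker News search via Algolia: ?query=<text>&page=<n>
function hnSearchUrl(query, page = 0) {
  const url = new URL("https://hn.algolia.com/api/v1/search");
  url.searchParams.set("query", query);
  url.searchParams.set("page", String(page));
  return url.toString();
}

// Reddit listing: append .json to a subreddit URL, ?limit=<n>
function redditListingUrl(subreddit, limit = 25) {
  const url = new URL(`https://www.reddit.com/r/${encodeURIComponent(subreddit)}.json`);
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

// npm registry search: ?text=<query>&size=<n>
function npmSearchUrl(text, size = 20) {
  const url = new URL("https://registry.npmjs.org/-/v1/search");
  url.searchParams.set("text", text);
  url.searchParams.set("size", String(size));
  return url.toString();
}

// Fetch a URL and parse the JSON body; throws on non-2xx responses.
// Assumes the global fetch available in Node 18+.
async function getJson(url) {
  const res = await fetch(url, {
    headers: { "User-Agent": "cheatsheet-example/1.0" }, // illustrative value
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}
```

Usage would look like `const results = await getJson(hnSearchUrl("web scraping"));`. Building URLs with `URL` and `searchParams` keeps query strings correctly encoded without manual string concatenation.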

Anti-Bot Checklist

  • [ ] Set User-Agent header
  • [ ] Add random delays (2-5s)
  • [ ] Rotate user agents
  • [ ] Handle 429 with exponential backoff
  • [ ] Use Promise.allSettled for parallel requests
  • [ ] Validate output data
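
The request-side items of the checklist can be sketched roughly as follows. This is one possible arrangement, not a definitive implementation: `politeFetch`, `scrapeAll`, `crawlSequentially`, and the `fetchImpl` option are hypothetical names, the User-Agent strings are illustrative, and the default `fetch` assumes Node 18 or newer.

```javascript
// Illustrative User-Agent strings; swap in whatever fits your use case.
const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
];

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const randomBetween = (min, max) => min + Math.random() * (max - min);
const pickUserAgent = () =>
  USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Fetch with a rotated User-Agent; on HTTP 429, wait and retry.
// fetchImpl is injectable so the retry logic can be exercised without a network.
async function politeFetch(url, { retries = 4, baseMs = 1000, fetchImpl = fetch } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, {
      headers: { "User-Agent": pickUserAgent() },
    });
    if (res.status !== 429) return res;
    if (attempt >= retries) {
      throw new Error(`Still rate-limited after ${retries} retries: ${url}`);
    }
    await sleep(backoffDelay(attempt, baseMs));
  }
}

// Parallel: one failed URL does not reject the whole batch.
async function scrapeAll(urls, options) {
  const results = await Promise.allSettled(urls.map((u) => politeFetch(u, options)));
  return {
    ok: results.filter((r) => r.status === "fulfilled").map((r) => r.value),
    failed: results.filter((r) => r.status === "rejected").map((r) => r.reason),
  };
}

// Sequential: random 2-5 s pause between requests.
async function crawlSequentially(urls, options) {
  const responses = [];
  for (const u of urls) {
    responses.push(await politeFetch(u, options));
    await sleep(randomBetween(2000, 5000));
  }
  return responses;
}
```

`Promise.allSettled` is used instead of `Promise.all` so that a single rejected request leaves the other results intact; the caller can then log or retry only the entries in `failed`.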

Output Formats

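const fs = require("fs");
const XLSX = require("xlsx"); // npm i xlsx

// `data` is assumed to be an array of flat objects, e.g. [{ title: "...", url: "..." }]

// JSON: pretty-printed with 2-space indentation
fs.writeFileSync("out.json", JSON.stringify(data, null, 2));

// CSV: quote any value containing a comma, double quote, or newline
const esc = v => /[",\n]/.test(String(v)) ? `"${String(v).replace(/"/g, '""')}"` : String(v);
const csv = data.map(d => Object.values(d).map(esc).join(",")).join("\n");
fs.writeFileSync("out.csv", csv);

// Excel: convert the array of objects to a worksheet and save it in a workbook
const ws = XLSX.utils.json_to_sheet(data);
const wb = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(wb, ws, "Data");
XLSX.writeFile(wb, "out.xlsx");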
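
The CSV one-liner above emits no header row and relies on every record having the same keys in the same order. A minimal sketch that adds a header and keeps columns aligned when records differ, assuming flat records; `csvField` and `toCsv` are hypothetical helper names, not part of any library.

```javascript
// Quote a value if it contains a comma, double quote, or line break;
// embedded double quotes are doubled. null/undefined become empty fields.
function csvField(value) {
  const s = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Build CSV text with a header row. Columns are the union of all keys,
// in first-seen order, so records with missing keys stay aligned.
function toCsv(records) {
  const columns = [...new Set(records.flatMap((r) => Object.keys(r)))];
  const lines = [columns, ...records.map((r) => columns.map((c) => r[c]))];
  return lines.map((line) => line.map(csvField).join(",")).join("\n");
}
```

It would be written out the same way as before: `fs.writeFileSync("out.csv", toCsv(data));`.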

All 100+ Guides

Complete Index


Need a scraper built? $20. Any site, any format, 24h delivery. Email: Spinov001@gmail.com | Pricing
