Bookmark this. Everything you need for web scraping in one article.
Tools
| Tool | Use Case | Install |
|---|---|---|
| Cheerio | HTML parsing | npm i cheerio |
| Playwright | Browser automation | npm i playwright |
| xml2js | XML/RSS parsing | npm i xml2js |
| xlsx | Excel output | npm i xlsx |
Free APIs (No Key)
| API | URL Pattern |
|---|---|
| Reddit | reddit.com/r/SUB.json |
| YouTube | www.youtube.com/youtubei/v1/search (internal API, POST) |
| Shopify | store.com/products.json |
| HN | hn.algolia.com/api/v1/search |
| Wikipedia | en.wikipedia.org/w/api.php |
| arXiv | export.arxiv.org/api/query |
| npm | registry.npmjs.org/-/v1/search |
| DuckDuckGo | api.duckduckgo.com/?q=X&format=json |
| Bluesky | public.api.bsky.app/xrpc/ |
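As a quick illustration of the table above, here is a minimal sketch that queries the HN Algolia search endpoint. It assumes Node 18+ (global `fetch`). The `searchHN` name, the `tags=story` filter, the User-Agent string and the selected output fields are illustrative choices. `fetchImpl` is injectable so the parsing logic can be tested without network access.

```javascript
// Minimal sketch: search HN stories through the Algolia API (no key needed).
// `fetchImpl` defaults to Node's global fetch but can be swapped for a stub.
async function searchHN(query, { fetchImpl = globalThis.fetch, hitsPerPage = 20 } = {}) {
  const url = new URL("https://hn.algolia.com/api/v1/search");
  url.searchParams.set("query", query);
  url.searchParams.set("tags", "story"); // stories only, no comments
  url.searchParams.set("hitsPerPage", String(hitsPerPage));

  const res = await fetchImpl(url.toString(), {
    // Hypothetical UA string: identify your scraper and a contact address
    headers: { "User-Agent": "my-scraper/1.0 (contact@example.com)" },
  });
  if (!res.ok) throw new Error(`HN API returned HTTP ${res.status}`);

  const body = await res.json();
  // Keep a few fields; text posts (Ask HN) have no external url,
  // so fall back to the HN discussion page.
  return body.hits.map(hit => ({
    id: hit.objectID,
    title: hit.title,
    url: hit.url ?? `https://news.ycombinator.com/item?id=${hit.objectID}`,
    points: hit.points,
  }));
}
```

Usage: `searchHN("web scraping").then(stories => console.log(stories.slice(0, 5)));`. The same shape (build the URL, check `res.ok`, map the JSON to flat records) carries over to the other endpoints in the table.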
Anti-Bot Checklist
- [ ] Set User-Agent header
- [ ] Add random delays (2-5 s) between requests
- [ ] Rotate user agents
- [ ] Handle HTTP 429 (Too Many Requests) with exponential backoff
- [ ] Use Promise.allSettled for parallel requests, so one failure doesn't discard the rest
- [ ] Validate output data
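The checklist above can be sketched as a few small helpers. This is a minimal sketch assuming Node 18+ (global `fetch`). The function names, the three User-Agent strings, the 1 s backoff base, the retry count and the default `validate` rule are hypothetical defaults, not tuned values. `fetchImpl` and `sleepFn` are injectable so the logic can be tested without real requests or waits.

```javascript
// Hypothetical UA pool; keep real, current browser strings in practice.
const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
  "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
];

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Random integer delay in [minMs, maxMs], 2-5 s by default.
const randomDelay = (minMs = 2000, maxMs = 5000) =>
  minMs + Math.floor(Math.random() * (maxMs - minMs + 1));

// A (possibly different) User-Agent for every request.
const pickUserAgent = () => USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];

// Fetch with a rotated User-Agent. On HTTP 429, wait Retry-After seconds if the
// server sends a numeric value, otherwise baseMs * 2^attempt (1 s, 2 s, 4 s, ...).
async function fetchWithBackoff(url, { retries = 4, baseMs = 1000, fetchImpl = globalThis.fetch, sleepFn = sleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, { headers: { "User-Agent": pickUserAgent() } });
    if (res.status !== 429) {
      if (!res.ok) throw new Error(`${url}: HTTP ${res.status}`);
      return res;
    }
    if (attempt >= retries) throw new Error(`${url}: still rate-limited after ${retries} retries`);
    const retryAfter = Number(res.headers.get("retry-after"));
    const waitMs = Number.isFinite(retryAfter) && retryAfter > 0 ? retryAfter * 1000 : baseMs * 2 ** attempt;
    await sleepFn(waitMs);
  }
}

// Run requests in parallel, but start each one a random 2-5 s after the previous.
// allSettled keeps the successes even if some URLs fail; records that fail
// `validate` are reported instead of silently written out.
async function scrapeAll(urls, parse, opts = {}) {
  const { sleepFn = sleep, delay = randomDelay, validate = rec => rec != null } = opts;
  let offset = 0;
  const tasks = urls.map(url => {
    const startAt = offset;
    offset += delay();
    return (async () => {
      await sleepFn(startAt);
      const res = await fetchWithBackoff(url, opts);
      return parse(await res.text());
    })();
  });
  const results = await Promise.allSettled(tasks);

  const ok = [];
  const failed = [];
  results.forEach((r, i) => {
    if (r.status === "rejected") {
      failed.push({ url: urls[i], reason: r.reason instanceof Error ? r.reason.message : String(r.reason) });
    } else if (!validate(r.value)) {
      failed.push({ url: urls[i], reason: "failed validation" });
    } else {
      ok.push(r.value);
    }
  });
  return { ok, failed };
}
```

Usage: `scrapeAll(urls, html => ({ size: html.length })).then(({ ok, failed }) => console.log(ok.length, failed));`. Staggering start times keeps some parallelism while spacing requests out; if a site rate-limits hard, awaiting each request in sequence is the simpler choice.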
Output Formats
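```javascript
const fs = require("fs");
const XLSX = require("xlsx"); // npm i xlsx
// `data` is the scraped array of flat objects, e.g. [{ title, url, price }]

// JSON (pretty-printed)
fs.writeFileSync("out.json", JSON.stringify(data, null, 2));

// CSV: header row + quoted fields, so commas, quotes and newlines in values survive
const cols = Object.keys(data[0] ?? {});
const esc = v => `"${String(v ?? "").replace(/"/g, '""')}"`;
const csv = [cols, ...data.map(d => cols.map(c => d[c]))].map(row => row.map(esc).join(",")).join("\n");
fs.writeFileSync("out.csv", csv);

// Excel
const ws = XLSX.utils.json_to_sheet(data);
const wb = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(wb, ws, "Data");
XLSX.writeFile(wb, "out.xlsx");
```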
Need a scraper built? $20. Any site, any format, 24h delivery. Email: Spinov001@gmail.com