DEV Community

# webscraping

Posts

How I Built a Real Chinese Product Review Aggregator (and Why English Reviews Are Broken)

Comments
1 min read
Feeding Raw HTML to Your LLM Is a Token Tax. I Measured It on 10 Real Pages — Median 7.4, and It Hits Every Scheduled Run

2
Comments 1
8 min read
How I scraped Welcome to the Jungle Jobs and what the dataset actually looks like

Comments
4 min read
What I learned scraping Website Contact: schema, gotchas and the tooling that worked

Comments
3 min read
Sample dataset analysis: a 100-row snapshot of Sitemap

Comments
3 min read
The German Web Scraping Market: €190M and Growing

Comments
4 min read
DSGVO-Compliant Web Scraping: What German Businesses Need to Know

Comments
4 min read
Automating Web Intelligence with Python: A Practical Guide

Comments
4 min read
I Built a Web Scraper API That Handles JS Rendering, CAPTCHAs, and Proxies

Comments
2 min read
xcrawl-scraper v1.0.1 — Node.js SDK for Web Scraping

1
Comments
1 min read
Giving n8n AI Workflows Fresh Web Data Without Babysitting Scrapers

Comments 2
5 min read
Raw HTML is where LLM context goes to die

1
Comments
5 min read
Scraping Chinese Social Platforms for LLM Training Data: A Practical Multi-Source Pipeline (Python, 2026)

Comments
7 min read
What to do when websites change and your spider doesn't know

1
Comments
6 min read
Web Scraping in 2024: What's Legal, What's Not, and What Works

Comments
6 min read