roberto degani
How I Built a Web Scraping API That Handles 100K Requests/Day for Free

I needed to scrape product prices across 50+ competitor sites. Hiring a scraping service? $500/month minimum. Building it myself on traditional infrastructure? The bandwidth costs alone would kill me.

Then I discovered Cloudflare Workers. Three months later, I'm handling 100K+ requests daily, all within the free tier.

The Problem I Solved

My e-commerce startup needed real-time price monitoring across competitors. Manual checks? Impossible. Existing APIs? Either overpriced or inconsistent. So I built the Degani Web Scraper API—deployed on Cloudflare Workers—and it's been running lean ever since.

The wins:

  • Extracts structured data from any webpage in milliseconds
  • Handles 100K daily requests within Cloudflare's free tier
  • Zero infrastructure to manage
  • Rate limiting built-in
  • Returns clean JSON—not HTML soup

What This API Actually Does

The endpoint structure is simple. You POST to:

https://degani-web-scraper.deganiagency.workers.dev

And it gives you clean data back. Here's what you can extract:

POST /extract - Full DOM extraction with CSS selectors
POST /meta - Meta tags, titles, descriptions
POST /links - All anchor tags with href validation
POST /images - Image URLs with alt text
POST /text - Plain text content, cleaned

Real Use Cases (And Why They Matter)

1. Price Monitoring for E-commerce

Monitor competitor pricing in real-time. I use this daily to catch pricing wars before they start.

curl -X POST https://degani-web-scraper.deganiagency.workers.dev/extract \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://competitor.com/products",
    "selectors": {
      "price": ".product-price",
      "title": ".product-title",
      "stock": ".stock-status"
    }
  }'

Response:

{
  "price": "$29.99",
  "title": "Premium Widget",
  "stock": "In Stock"
}
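To actually catch a pricing war, I diff each scraped price against the last one I stored. Here's a minimal sketch of that comparison; parsing `"$29.99"`-style strings is an assumption based on the example response above, and `parsePrice`/`priceChanged` are my own helper names, not part of the API:

```javascript
// Parse a price string like "$29.99" or "$1,299.00" into a number.
function parsePrice(text) {
  const match = text.replace(/,/g, '').match(/-?\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// True when the newly scraped price differs from the stored one
// beyond a small tolerance (guards against float noise).
function priceChanged(previous, current, tolerance = 0.001) {
  const a = parsePrice(previous);
  const b = parsePrice(current);
  if (a === null || b === null) return false;
  return Math.abs(a - b) > tolerance;
}
```

Run the scrape on a schedule, call `priceChanged` against yesterday's value, and alert only when it returns true.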

2. Lead Generation & B2B Prospecting

Extract company info from directories—contact names, emails, phone numbers. Perfect for building prospect lists.

const response = await fetch(
  'https://degani-web-scraper.deganiagency.workers.dev/extract',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url: 'https://directory.com/companies',
      selectors: {
        company: '.company-name',
        contact: '.contact-email',
        phone: '.phone-number'
      }
    })
  }
);

const data = await response.json();
console.log(data); // { company: "Acme Inc", contact: "...", phone: "..." }

3. SEO Audit Data Collection

Extract meta tags, headers, and structured data. Feed it into your SEO tools.

curl -X POST https://degani-web-scraper.deganiagency.workers.dev/meta \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

The response includes:

{
  "title": "Page Title",
  "description": "Meta description",
  "og_image": "https://...",
  "canonical": "https://...",
  "h1": ["Main Heading"],
  "language": "en"
}

Why Cloudflare Workers?

When I started, I was going to run this on a VPS. Then I calculated:

  • Bandwidth for 100K requests/day = ~$40-100/month
  • VPS cost = $10-20/month
  • Maintenance headaches = priceless

Cloudflare Workers changed the equation:

  • Distributed globally (sub-100ms response times)
  • No servers to manage
  • Auto-scales
  • Free tier handles 100K requests/day

It's not magic—it's smart infrastructure.
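Since rate limiting is built in, a polite client should back off when it gets throttled. Here's a minimal retry sketch; the assumption that throttled requests come back as HTTP 429 is mine, and `doFetch` is injectable so the logic can be exercised without hitting the network:

```javascript
// Retry with exponential backoff on 429 (throttled) and 5xx responses.
// `doFetch` is a zero-arg function returning a Promise of a Response-like object.
async function fetchWithBackoff(doFetch, { retries = 3, baseDelayMs = 500 } = {}) {
  let res;
  for (let attempt = 0; attempt <= retries; attempt++) {
    res = await doFetch();
    // Success or a non-retryable client error: return immediately.
    if (res.status !== 429 && res.status < 500) return res;
    if (attempt === retries) break;
    // Delays of 500ms, 1s, 2s, ... between attempts.
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
  return res;
}
```

Call it as `fetchWithBackoff(() => fetch(endpoint, opts))` and your scraper degrades gracefully instead of hammering the API during a spike.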

Extraction in Action

Let's say you're scraping a product listing page:

const scrapeProducts = async (pageUrl) => {
  const response = await fetch(
    'https://degani-web-scraper.deganiagency.workers.dev/extract',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        url: pageUrl,
        selectors: {
          products: {
            name: '.product-name',
            price: '.price',
            rating: '.stars',
            url: 'a.product-link'
          }
        }
      })
    }
  );

  return response.json();
};

// Usage
const products = await scrapeProducts('https://example.com/products');
// { products: [...] }

No parsing HTML by hand. No fighting regex. Just clean JSON.

Getting Started

  1. Hit the API endpoint with your target URL
  2. Define CSS selectors for what you need
  3. Get back structured JSON
  4. Build something awesome
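If you want those four steps wrapped in code, here's a minimal client helper. It mirrors the `/extract` examples above; the error handling and the injectable `fetchImpl` (handy for testing) are my additions, not part of the API:

```javascript
const ENDPOINT = 'https://degani-web-scraper.deganiagency.workers.dev/extract';

// POST a URL plus CSS selectors, get structured JSON back.
// `fetchImpl` defaults to the global fetch but can be stubbed in tests.
async function extract(url, selectors, fetchImpl = fetch) {
  const res = await fetchImpl(ENDPOINT, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, selectors }),
  });
  if (!res.ok) throw new Error(`extract failed: HTTP ${res.status}`);
  return res.json();
}
```

Usage: `const data = await extract('https://example.com/products', { price: '.price' });`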

The full API docs are available at https://rapidapi.com/deganiagency/api/web-scraper-extractor

What I Learned Building This

Lesson 1: Cloudflare Workers handle concurrency beautifully. I was worried about request spikes. Never happened.

Lesson 2: CSS selectors are powerful. Most sites use consistent class names; in my experience, the same selectors kept working on roughly 90% of the pages I targeted.

Lesson 3: Respect robots.txt and rate limits. The API enforces sensible defaults, but always check target sites' ToS.
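For Lesson 3, here's what a minimal robots.txt check can look like. This is a sketch, not a full RFC 9309 parser: it only honors `Disallow:` prefix rules in the `User-agent: *` group, and `isAllowed` is a name I made up for illustration:

```javascript
// Minimal robots.txt check: returns false when `path` matches a
// Disallow prefix in the `User-agent: *` group. Ignores Allow rules,
// wildcards, and per-bot groups. A sketch, not a spec-complete parser.
function isAllowed(robotsTxt, path) {
  let inStarGroup = false;
  const disallowed = [];
  for (const rawLine of robotsTxt.split('\n')) {
    const [key, ...rest] = rawLine.trim().split(':');
    const value = rest.join(':').trim();
    if (/^user-agent$/i.test(key.trim())) {
      inStarGroup = value === '*';
    } else if (inStarGroup && /^disallow$/i.test(key.trim()) && value) {
      disallowed.push(value);
    }
  }
  return !disallowed.some((prefix) => path.startsWith(prefix));
}
```

Fetch `https://target.com/robots.txt` once, cache it, and gate your scrape calls on `isAllowed` before firing.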

What's Next

I'm adding:

  • Screenshot capture (headless browser support)
  • JavaScript rendering (for SPA content)
  • Automatic selector optimization

Already using the web scraper API? Drop a comment—I'd love to hear what you're building.

Got alternative solutions you prefer? Let's discuss. This is a tool that solves real problems for people who can't afford $500/month scrapers.


Try it free: https://rapidapi.com/deganiagency/api/web-scraper-extractor
Source: https://degani-web-scraper.deganiagency.workers.dev
Perfect for: Price monitoring, lead gen, SEO audits, competitive analysis
