Ozor

How to Scrape Any Website Without Getting Blocked (JavaScript)

If you've ever tried web scraping, you know the pain: CAPTCHAs, IP bans, rate limits, dynamic JavaScript rendering, and anti-bot detection. Most scraping attempts fail not because of bad code, but because websites actively fight scrapers.

Here's how to scrape reliably — without managing proxies, headless browsers, or getting your IP blacklisted.

The Problem with Traditional Scraping

// This works... until it doesn't
const res = await fetch('https://example.com/products');
const html = await res.text();
// ❌ 403 Forbidden
// ❌ Empty HTML (JS-rendered content)
// ❌ CAPTCHA page
// ❌ Your IP is now banned

Traditional scraping breaks because:

  • JavaScript-rendered pages return empty HTML to fetch/axios
  • Anti-bot systems (Cloudflare, Akamai) detect automated requests
  • IP rate limiting blocks repeated requests from the same IP
  • CAPTCHAs stop automated access entirely

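Before reaching for heavier tooling, it helps to know which of these failure modes you're actually hitting. Here's a minimal heuristic sketch for classifying a raw response; the function name, status codes, and thresholds are my own assumptions, not part of any library, and real anti-bot pages vary widely:

```javascript
// Classify a raw fetch result into the failure modes listed above.
// Heuristic sketch only: markers and thresholds are assumptions.
function classifyFailure(status, html) {
  if (status === 403 || status === 401) return 'blocked (anti-bot or IP ban)';
  if (status === 429) return 'rate limited';
  if (/captcha/i.test(html)) return 'CAPTCHA challenge';
  if (html.trim().length < 500) return 'empty shell (JS-rendered content)';
  return 'ok';
}

console.log(classifyFailure(403, ''));                     // blocked (anti-bot or IP ban)
console.log(classifyFailure(200, '<div id="app"></div>')); // empty shell (JS-rendered content)
```

Logging the classification for each failed request makes it much easier to decide whether you need rendering, proxies, or both.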
The Solution: Use a Scraping API

Instead of fighting each website's defenses, delegate the hard parts to a scraping service that handles rendering, proxies, and anti-bot bypass:

const API_KEY = 'your-api-key';
const BASE = 'https://agent-gateway-kappa.vercel.app/api';

async function scrape(url) {
  const res = await fetch(
    `${BASE}/scrape?url=${encodeURIComponent(url)}&format=markdown`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  return res.json();
}

const data = await scrape('https://news.ycombinator.com');
console.log(data.content); // Clean markdown content

This returns fully rendered content — even from JavaScript-heavy SPAs.

Get a Free API Key

curl -X POST https://agent-gateway-kappa.vercel.app/api/keys/create

You get 200 free credits instantly. No signup, no credit card.

5 Practical Scraping Recipes

1. Extract Article Text from Any URL

async function extractArticle(url) {
  const res = await fetch(
    `${BASE}/scrape?url=${encodeURIComponent(url)}&format=text`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  const data = await res.json();

  return {
    title: data.title,
    content: data.content,
    wordCount: data.content.split(/\s+/).length,
    url: data.url
  };
}

const article = await extractArticle('https://blog.example.com/post');
console.log(`"${article.title}" — ${article.wordCount} words`);

2. Monitor a Page for Changes

import { createHash } from 'crypto';

// Stores the last-seen content hash for each URL
const cache = new Map();

const WATCHED_URLS = [
  'https://competitor.com/pricing',
  'https://example.com/status',
];

async function checkForChanges() {
  for (const url of WATCHED_URLS) {
    const res = await fetch(
      `${BASE}/scrape?url=${encodeURIComponent(url)}&format=text`,
      { headers: { 'X-API-Key': API_KEY } }
    );
    const { content } = await res.json();
    const hash = createHash('sha256').update(content).digest('hex');

    const previous = cache.get(url);
    if (previous && previous !== hash) {
      console.log(`⚠️ CHANGE DETECTED: ${url}`);
      // Send alert via webhook, email, Slack, etc.
    }
    cache.set(url, hash);
  }
}

// Run every 30 minutes
setInterval(checkForChanges, 30 * 60 * 1000);

3. Scrape Product Prices from Multiple Sites

async function scrapePrice(url) {
  const res = await fetch(
    `${BASE}/scrape?url=${encodeURIComponent(url)}&format=markdown`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  const { content } = await res.json();

  // Extract price pattern from markdown content
  const priceMatch = content.match(/\$[\d,]+\.?\d*/);
  return priceMatch ? priceMatch[0] : 'Price not found';
}

const sites = [
  'https://store-a.com/product/123',
  'https://store-b.com/item/456',
];

const prices = await Promise.all(
  sites.map(url => scrapePrice(url))
);
console.table(prices);
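The regex above returns a string like `$1,299.99`, which you can't compare or sort numerically. A small helper (my own naming, assuming US-style formatting with a dollar sign, commas, and a decimal point) normalizes matches to numbers:

```javascript
// Convert the first price-like substring into a number.
// Sketch only: assumes "$1,299.99"-style US formatting.
function parsePrice(text) {
  const match = text.match(/\$([\d,]+\.?\d*)/);
  if (!match) return null;
  return Number(match[1].replace(/,/g, ''));
}

console.log(parsePrice('Now only $1,299.99!')); // 1299.99
console.log(parsePrice('no price here'));       // null
```

With numeric prices you can track the cheapest site over time instead of just printing strings.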

4. Build a Content Aggregator

async function aggregateContent(sources) {
  const results = [];

  for (const source of sources) {
    const res = await fetch(
      `${BASE}/scrape?url=${encodeURIComponent(source.url)}&format=markdown`,
      { headers: { 'X-API-Key': API_KEY } }
    );
    const data = await res.json();

    results.push({
      source: source.name,
      title: data.title,
      preview: data.content.substring(0, 200) + '...',
      scrapedAt: new Date().toISOString()
    });
  }

  return results;
}

const feed = await aggregateContent([
  { name: 'HN', url: 'https://news.ycombinator.com' },
  { name: 'Product Hunt', url: 'https://www.producthunt.com' },
  { name: 'Dev.to', url: 'https://dev.to' },
]);

console.log(JSON.stringify(feed, null, 2));

5. Extract Structured Data (Emails, Links, Metadata)

async function extractMetadata(url) {
  const res = await fetch(
    `${BASE}/scrape?url=${encodeURIComponent(url)}&format=markdown`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  const { content, title } = await res.json();

  // Extract all links
  const links = [...content.matchAll(/\[([^\]]+)\]\(([^)]+)\)/g)]
    .map(m => ({ text: m[1], href: m[2] }));

  // Extract emails
  const emails = [...content.matchAll(/[\w.-]+@[\w.-]+\.\w+/g)]
    .map(m => m[0]);

  return { title, linkCount: links.length, links: links.slice(0, 10), emails };
}

const meta = await extractMetadata('https://example.com/contact');
console.log(`Found ${meta.linkCount} links and ${meta.emails.length} emails`);

Adding Screenshots for Visual Monitoring

Combine scraping with screenshots to get both content and visual snapshots:

async function fullPageAudit(url) {
  const [scrapeRes, screenshotRes] = await Promise.all([
    fetch(`${BASE}/scrape?url=${encodeURIComponent(url)}&format=text`, {
      headers: { 'X-API-Key': API_KEY }
    }),
    fetch(`${BASE}/screenshot?url=${encodeURIComponent(url)}&width=1280`, {
      headers: { 'X-API-Key': API_KEY }
    })
  ]);

  const content = await scrapeRes.json();
  const screenshot = await screenshotRes.json();

  return {
    title: content.title,
    wordCount: content.content.split(/\s+/).length,
    screenshotUrl: screenshot.url,
    timestamp: new Date().toISOString()
  };
}

Python Version

import requests

API_KEY = 'your-api-key'
BASE = 'https://agent-gateway-kappa.vercel.app/api'

def scrape(url, fmt='markdown'):
    res = requests.get(
        f'{BASE}/scrape',
        params={'url': url, 'format': fmt},
        headers={'X-API-Key': API_KEY},
        timeout=60,  # rendering JS-heavy pages can be slow
    )
    res.raise_for_status()
    return res.json()

data = scrape('https://news.ycombinator.com')
print(data['title'])
print(data['content'][:500])

cURL One-Liner

# Quick scrape from the terminal
curl "https://agent-gateway-kappa.vercel.app/api/scrape?url=https://example.com&format=text" \
  -H "X-API-Key: YOUR_KEY"

Why This Beats DIY Scraping

| Feature         | DIY (Puppeteer/Playwright) | Scraping API    |
| --------------- | -------------------------- | --------------- |
| Setup time      | Hours (Chrome, deps)       | 0 minutes       |
| JS rendering    | Manual config              | Built-in        |
| Anti-bot bypass | You handle it              | Built-in        |
| Proxy rotation  | You pay + manage           | Built-in        |
| Server costs    | $20-100/mo for Chrome      | Pay per request |
| Maintenance     | Constant                   | Zero            |

Rate Limiting & Best Practices

  • Be respectful: Don't hammer sites with 100 requests/second
  • Cache results: Store scraped data locally to avoid redundant calls
  • Check robots.txt: Respect site policies (the API handles this for you)
  • Use appropriate formats: text for content extraction, markdown for structured data, html for full page
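The "cache results" advice can be sketched as a small TTL wrapper around the `scrape` function from earlier. This is a minimal in-memory sketch (the helper name and shape are mine, not part of the API):

```javascript
// Minimal in-memory TTL cache to avoid redundant scrape calls.
// Wraps any async fetcher keyed by URL; entries expire after ttlMs.
function withCache(fetcher, ttlMs = 30 * 60 * 1000) {
  const store = new Map();
  return async function (url) {
    const hit = store.get(url);
    if (hit && Date.now() - hit.at < ttlMs) return hit.value;
    const value = await fetcher(url);
    store.set(url, { value, at: Date.now() });
    return value;
  };
}

// const cachedScrape = withCache(scrape);
// await cachedScrape(url); // network call, spends a credit
// await cachedScrape(url); // served from cache, free
```

For anything beyond a single process, swap the `Map` for Redis or a database, but the wrapping pattern stays the same.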

Get Started

  1. Get your free API key (200 credits, no signup):

curl -X POST https://agent-gateway-kappa.vercel.app/api/keys/create

  2. Start scraping:

curl "https://agent-gateway-kappa.vercel.app/api/scrape?url=https://example.com&format=markdown" \
  -H "X-API-Key: YOUR_KEY"

Full docs: API Documentation
