If you've ever tried web scraping, you know the pain: CAPTCHAs, IP bans, rate limits, dynamic JavaScript rendering, and anti-bot detection. Most scraping attempts fail not because of bad code, but because websites actively fight scrapers.
Here's how to scrape reliably — without managing proxies, headless browsers, or getting your IP blacklisted.
## The Problem with Traditional Scraping

```js
// This works... until it doesn't
const res = await fetch('https://example.com/products');
const html = await res.text();
// ❌ 403 Forbidden
// ❌ Empty HTML (JS-rendered content)
// ❌ CAPTCHA page
// ❌ Your IP is now banned
```

Traditional scraping breaks because:

- **JavaScript-rendered pages** return empty HTML to `fetch`/`axios`
- **Anti-bot systems** (Cloudflare, Akamai) detect automated requests
- **IP rate limiting** blocks repeated requests from the same IP
- **CAPTCHAs** stop automated access entirely
## The Solution: Use a Scraping API

Instead of fighting each website's defenses, delegate the hard parts to a scraping service that handles rendering, proxies, and anti-bot bypass:

```js
const API_KEY = 'your-api-key';
const BASE = 'https://agent-gateway-kappa.vercel.app/api';

async function scrape(url) {
  const res = await fetch(
    `${BASE}/scrape?url=${encodeURIComponent(url)}&format=markdown`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  return res.json();
}

const data = await scrape('https://news.ycombinator.com');
console.log(data.content); // Clean markdown content
```
This returns fully rendered content — even from JavaScript-heavy SPAs.
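In production you'll also want to survive transient failures (timeouts, 5xx responses). A minimal retry sketch with exponential backoff — the attempt count and delays are illustrative choices, not guidance from the API:

```js
// Retry an async operation with simple exponential backoff.
// Attempt counts and delays are arbitrary; tune for your workload.
async function withRetry(fn, attempts = 3, baseDelayMs = 500) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait baseDelayMs, 2x, 4x... between attempts
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Wrap any call to the `scrape` helper above, e.g. `await withRetry(() => scrape('https://example.com'))`.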
## Get a Free API Key

```bash
curl -X POST https://agent-gateway-kappa.vercel.app/api/keys/create
```

You get 200 free credits instantly. No signup, no credit card.
## 5 Practical Scraping Recipes

### 1. Extract Article Text from Any URL

```js
async function extractArticle(url) {
  const res = await fetch(
    `${BASE}/scrape?url=${encodeURIComponent(url)}&format=text`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  const data = await res.json();
  return {
    title: data.title,
    content: data.content,
    wordCount: data.content.split(/\s+/).length,
    url: data.url
  };
}

const article = await extractArticle('https://blog.example.com/post');
console.log(`"${article.title}" — ${article.wordCount} words`);
```
### 2. Monitor a Page for Changes

```js
import { createHash } from 'crypto';

const WATCHED_URLS = [
  'https://competitor.com/pricing',
  'https://example.com/status',
];

// In-memory store of the last-seen content hash per URL
const cache = new Map();

async function checkForChanges() {
  for (const url of WATCHED_URLS) {
    const res = await fetch(
      `${BASE}/scrape?url=${encodeURIComponent(url)}&format=text`,
      { headers: { 'X-API-Key': API_KEY } }
    );
    const { content } = await res.json();
    const hash = createHash('sha256').update(content).digest('hex');
    const previous = cache.get(url);
    if (previous && previous !== hash) {
      console.log(`⚠️ CHANGE DETECTED: ${url}`);
      // Send alert via webhook, email, Slack, etc.
    }
    cache.set(url, hash);
  }
}

// Run every 30 minutes
setInterval(checkForChanges, 30 * 60 * 1000);
```
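The alert itself can be a single POST to an incoming webhook. A hedged sketch — the webhook URL is a placeholder you'd get from Slack (or any service that accepts a JSON `text` payload):

```js
// Build a human-readable alert message for a changed URL
function formatChangeAlert(url, detectedAt) {
  return `⚠️ Change detected on ${url} at ${detectedAt.toISOString()}`;
}

// Post the alert to a Slack-style incoming webhook (placeholder URL)
async function sendAlert(url) {
  const WEBHOOK_URL = 'https://hooks.slack.com/services/XXX/YYY/ZZZ'; // placeholder
  await fetch(WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: formatChangeAlert(url, new Date()) })
  });
}
```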
### 3. Scrape Product Prices from Multiple Sites

```js
async function scrapePrice(url) {
  const res = await fetch(
    `${BASE}/scrape?url=${encodeURIComponent(url)}&format=markdown`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  const { content } = await res.json();
  // Extract the first dollar-amount pattern from the markdown content
  const priceMatch = content.match(/\$[\d,]+\.?\d*/);
  return priceMatch ? priceMatch[0] : 'Price not found';
}

const sites = [
  'https://store-a.com/product/123',
  'https://store-b.com/item/456',
];

const prices = await Promise.all(
  sites.map(url => scrapePrice(url))
);

console.table(prices);
```
### 4. Build a Content Aggregator

```js
async function aggregateContent(sources) {
  const results = [];
  for (const source of sources) {
    const res = await fetch(
      `${BASE}/scrape?url=${encodeURIComponent(source.url)}&format=markdown`,
      { headers: { 'X-API-Key': API_KEY } }
    );
    const data = await res.json();
    results.push({
      source: source.name,
      title: data.title,
      preview: data.content.substring(0, 200) + '...',
      scrapedAt: new Date().toISOString()
    });
  }
  return results;
}

const feed = await aggregateContent([
  { name: 'HN', url: 'https://news.ycombinator.com' },
  { name: 'Product Hunt', url: 'https://www.producthunt.com' },
  { name: 'Dev.to', url: 'https://dev.to' },
]);
console.log(JSON.stringify(feed, null, 2));
```
### 5. Extract Structured Data (Emails, Links, Metadata)

```js
async function extractMetadata(url) {
  const res = await fetch(
    `${BASE}/scrape?url=${encodeURIComponent(url)}&format=markdown`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  const { content, title } = await res.json();

  // Extract all markdown links: [text](href)
  const links = [...content.matchAll(/\[([^\]]+)\]\(([^)]+)\)/g)]
    .map(m => ({ text: m[1], href: m[2] }));

  // Extract email addresses
  const emails = [...content.matchAll(/[\w.-]+@[\w.-]+\.\w+/g)]
    .map(m => m[0]);

  return { title, linkCount: links.length, links: links.slice(0, 10), emails };
}

const meta = await extractMetadata('https://example.com/contact');
console.log(`Found ${meta.linkCount} links and ${meta.emails.length} emails`);
```
## Adding Screenshots for Visual Monitoring

Combine scraping with screenshots to get both content and visual snapshots:

```js
async function fullPageAudit(url) {
  const [scrapeRes, screenshotRes] = await Promise.all([
    fetch(`${BASE}/scrape?url=${encodeURIComponent(url)}&format=text`, {
      headers: { 'X-API-Key': API_KEY }
    }),
    fetch(`${BASE}/screenshot?url=${encodeURIComponent(url)}&width=1280`, {
      headers: { 'X-API-Key': API_KEY }
    })
  ]);
  const content = await scrapeRes.json();
  const screenshot = await screenshotRes.json();
  return {
    title: content.title,
    wordCount: content.content.split(/\s+/).length,
    screenshotUrl: screenshot.url,
    timestamp: new Date().toISOString()
  };
}
```
## Python Version

```python
import requests

API_KEY = 'your-api-key'
BASE = 'https://agent-gateway-kappa.vercel.app/api'

def scrape(url, fmt='markdown'):
    res = requests.get(
        f'{BASE}/scrape',
        params={'url': url, 'format': fmt},
        headers={'X-API-Key': API_KEY}
    )
    return res.json()

data = scrape('https://news.ycombinator.com')
print(data['title'])
print(data['content'][:500])
```
## cURL One-Liner

```bash
# Quick scrape from the terminal
curl "https://agent-gateway-kappa.vercel.app/api/scrape?url=https://example.com&format=text" \
  -H "X-API-Key: YOUR_KEY"
```
## Why This Beats DIY Scraping
| Feature | DIY (Puppeteer/Playwright) | Scraping API |
|---|---|---|
| Setup time | Hours (Chrome, deps) | 0 minutes |
| JS rendering | Manual config | Built-in |
| Anti-bot bypass | You handle it | Built-in |
| Proxy rotation | You pay + manage | Built-in |
| Server costs | $20-100/mo for Chrome | Pay per request |
| Maintenance | Constant | Zero |
## Rate Limiting & Best Practices

- **Be respectful:** Don't hammer sites with 100 requests/second
- **Cache results:** Store scraped data locally to avoid redundant calls
- **Check robots.txt:** Respect site policies (the API handles this for you)
- **Use appropriate formats:** `text` for content extraction, `markdown` for structured data, `html` for the full page
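The caching advice can be a thin wrapper around any fetcher. A minimal in-memory sketch — the TTL value is arbitrary, and the fetcher is injected so you can pass in the `scrape` helper from earlier:

```js
// Cache results in memory with a time-to-live, so repeated calls
// for the same URL within the TTL don't spend API credits.
function makeCachedScraper(fetcher, ttlMs = 10 * 60 * 1000) {
  const cache = new Map(); // url -> { data, expiresAt }
  return async function cachedScrape(url) {
    const hit = cache.get(url);
    if (hit && hit.expiresAt > Date.now()) return hit.data;
    const data = await fetcher(url);
    cache.set(url, { data, expiresAt: Date.now() + ttlMs });
    return data;
  };
}
```

Usage: `const cachedScrape = makeCachedScraper(scrape);` — then call `cachedScrape(url)` everywhere you would have called `scrape(url)`.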
## Get Started

1. Get your free API key (200 credits, no signup):

   ```bash
   curl -X POST https://agent-gateway-kappa.vercel.app/api/keys/create
   ```

2. Start scraping:

   ```bash
   curl "https://agent-gateway-kappa.vercel.app/api/scrape?url=https://example.com&format=markdown" \
     -H "X-API-Key: YOUR_KEY"
   ```
Full docs: API Documentation