Last month, I launched my first profitable API on RapidAPI - an Apollo Lead Scraper that's now serving 10k+ requests daily. Here's the honest breakdown of how I built it, the mistakes I made, and what I'd do differently.
## The problem with DIY scrapers
I used to maintain 6 different web scrapers for lead generation across my side projects. Every few weeks, one would break because:
- Apollo updated their anti-bot detection
- Rate limits changed unexpectedly
- My IP got blocked from too many requests
- Parsing logic broke after UI updates
The breaking point came when I spent a whole weekend fixing scrapers instead of building features. That's when I asked myself: what if I centralized this into a robust API that other developers could use too?
## Architecture: Express → Railway → RapidAPI
I went with a simple but scalable stack:
- **Backend:** Node.js/Express with Playwright for browser automation
- **Hosting:** Railway (honestly the best developer experience I've had)
- **Distribution:** RapidAPI for instant global reach
- **Monitoring:** Custom dashboard + Railway metrics
Here's my project structure:
```
apollo-scraper-api/
├── src/
│   ├── scrapers/
│   │   └── apollo.js
│   ├── middleware/
│   │   └── rateLimiter.js
│   ├── utils/
│   │   └── validator.js
│   └── app.js
├── Dockerfile
└── railway.json
```
The key insight: treat it like enterprise software from day one. No "I'll refactor later" shortcuts.
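The `rateLimiter.js` middleware in that tree never gets shown in full below, so here's a minimal sketch of the kind of limiter it could be: a fixed-window, in-memory counter keyed by API key. All the names and limits here are my illustrative assumptions, not the production code.

```javascript
// Hypothetical sketch of src/middleware/rateLimiter.js (illustrative
// names and limits): fixed-window, in-memory limiter keyed by API key.
const WINDOW_MS = 60_000;   // 1-minute window
const MAX_REQUESTS = 30;    // requests allowed per window per key

const buckets = new Map();  // key -> { count, windowStart }

function rateLimiter(req, res, next) {
  const key = req.headers['x-api-key'] || req.ip;
  const now = Date.now();
  const bucket = buckets.get(key);

  // First request for this key, or the previous window has expired
  if (!bucket || now - bucket.windowStart >= WINDOW_MS) {
    buckets.set(key, { count: 1, windowStart: now });
    return next();
  }

  // Window still open and quota exhausted: reject with 429
  if (bucket.count >= MAX_REQUESTS) {
    return res.status(429).json({ success: false, error: 'Rate limit exceeded' });
  }

  bucket.count += 1;
  next();
}

module.exports = rateLimiter;
```

In production you'd back this with Redis so limits survive restarts and apply across instances, but the shape of the middleware stays the same.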
## One real endpoint walkthrough with code
Let me show you the /search-people endpoint that generates 80% of my revenue:
```javascript
// src/app.js
app.post('/api/v1/search-people', validateRequest, rateLimiter, async (req, res) => {
  const { company_names, job_titles, location, limit = 25 } = req.body;

  try {
    const results = await apolloScraper.searchPeople({
      company_names: Array.isArray(company_names) ? company_names : [company_names],
      job_titles: Array.isArray(job_titles) ? job_titles : [job_titles],
      location,
      limit: Math.min(limit, 100) // Cap at 100
    });

    res.json({
      success: true,
      count: results.length,
      data: results,
      credits_used: Math.ceil(results.length / 10)
    });
  } catch (error) {
    logger.error('Search failed:', error);
    res.status(500).json({
      success: false,
      error: 'Search temporarily unavailable'
    });
  }
});
```
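The `validateRequest` middleware at the front of that chain (`src/utils/validator.js` in the tree) isn't shown either. A minimal sketch, under my own assumed validation rules, would reject requests with no search criteria or a malformed limit before they ever reach the scraper:

```javascript
// Hypothetical sketch of src/utils/validator.js (rules are illustrative
// assumptions): fail fast with a 400 before spinning up a browser.
function validateRequest(req, res, next) {
  const { company_names, job_titles, limit } = req.body || {};

  // Require at least one search criterion
  if (!company_names && !job_titles) {
    return res.status(400).json({
      success: false,
      error: 'Provide at least one of company_names or job_titles'
    });
  }

  // If a limit is given, it must be a positive integer
  if (limit !== undefined && (!Number.isInteger(limit) || limit < 1)) {
    return res.status(400).json({
      success: false,
      error: 'limit must be a positive integer'
    });
  }

  next();
}

module.exports = validateRequest;
```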
The scraper core uses Playwright with rotating user agents:
```javascript
// src/scrapers/apollo.js
const playwright = require('playwright');

class ApolloScraper {
  constructor() {
    this.userAgents = [
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      // ... more agents
    ];
  }

  async searchPeople({ company_names, job_titles, location, limit }) {
    const browser = await playwright.chromium.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });

    const context = await browser.newContext({
      userAgent: this.getRandomUserAgent(),
      viewport: { width: 1920, height: 1080 },
      extraHTTPHeaders: {
        'Accept-Language': 'en-US,en;q=0.9'
      }
    });

    const page = await context.newPage();

    try {
      // Build search URL
      const searchParams = new URLSearchParams({
        finder_table_layout: 'list',
        person_titles: job_titles.join(','),
        organization_names: company_names.join(','),
        person_locations: location || '',
        per_page: limit
      });

      await page.goto(`https://app.apollo.io/#/people?${searchParams}`);
      await page.waitForSelector('[data-cy="person-name"]', { timeout: 10000 });

      // Extract data
      const people = await page.$$eval('[data-cy="person-row"]', (rows) => {
        return rows.map(row => ({
          name: row.querySelector('[data-cy="person-name"]')?.textContent?.trim(),
          title: row.querySelector('[data-cy="person-title"]')?.textContent?.trim(),
          company: row.querySelector('[data-cy="person-company"]')?.textContent?.trim(),
          location: row.querySelector('[data-cy="person-location"]')?.textContent?.trim(),
          email: row.querySelector('[data-cy="person-email"]')?.textContent?.trim(),
          linkedin_url: row.querySelector('a[href*="linkedin"]')?.href
        })).filter(person => person.name);
      });

      return people;
    } finally {
      await browser.close();
    }
  }
}
```
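The `getRandomUserAgent` helper the class calls isn't in the snippet; the obvious minimal version just picks uniformly from the pool. Here's that logic as a standalone function (a sketch, taking the pool as a parameter rather than reading `this.userAgents`):

```javascript
// Minimal sketch of the getRandomUserAgent helper: pick one entry
// uniformly at random from the configured user-agent pool.
function getRandomUserAgent(userAgents) {
  return userAgents[Math.floor(Math.random() * userAgents.length)];
}
```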
## Pricing strategy for API monetization
This took me 3 iterations to get right. My current model:
- Free tier: 100 requests/month
- Basic ($29/month): 2,500 requests
- Pro ($99/month): 10,000 requests
- Enterprise ($299/month): 50,000 requests + priority support
Key insight: I charge per successful result, not per API call. If my scraper fails, users don't get charged. This builds massive trust.
The sweet spot was pricing 60% below what companies pay for Apollo's official plans, while providing more flexible data access.
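The per-result billing falls out of the `credits_used` formula in the endpoint above: one credit per ten results, rounded up. Pulled out as a function, the behavior that builds the trust is visible at the edge case:

```javascript
// Per-result credit billing, mirroring the credits_used formula in the
// endpoint: 1 credit per 10 results, rounded up. Zero results means zero
// credits, which is exactly why a failed scrape costs the user nothing.
function creditsUsed(resultCount) {
  return Math.ceil(resultCount / 10);
}
```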
## Lessons learned
1. Observability from day one: I wish I'd added comprehensive logging earlier. When things break at 3 AM, you need detailed traces.
2. Rate limiting is crucial: Not just for your API, but for the upstream service. I got Apollo.io mad at me early on.
3. Railway > Heroku for APIs: Railway's automatic deployments and built-in metrics saved me weeks of DevOps work.
4. RapidAPI's discovery is real: 70% of my users found me through their marketplace, not my own marketing.
5. Cache everything: I cache successful results for 24 hours. Same search query = instant response + happy users.
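The 24-hour cache from lesson 5 can be sketched as an in-memory map keyed on the normalized search parameters (in production you'd likely use Redis; names here are illustrative):

```javascript
// Hypothetical sketch of the 24-hour result cache: key on a normalized
// serialization of the search params so equivalent queries hit the same
// entry regardless of property order.
const TTL_MS = 24 * 60 * 60 * 1000;
const cache = new Map();

function cacheKey(params) {
  // Passing a sorted key array as the replacer makes property order stable
  return JSON.stringify(params, Object.keys(params).sort());
}

function getCached(params, now = Date.now()) {
  const entry = cache.get(cacheKey(params));
  if (!entry || now - entry.storedAt > TTL_MS) return null; // miss or expired
  return entry.data;
}

function setCached(params, data, now = Date.now()) {
  cache.set(cacheKey(params), { data, storedAt: now });
}
```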
The biggest mistake? Not building a proper retry mechanism initially. Network failures would kill entire scraping jobs.
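The retry mechanism I eventually added is the standard exponential-backoff wrapper; a sketch of the pattern (not the production code) looks like this:

```javascript
// Sketch of retry-with-exponential-backoff for flaky scrape jobs:
// re-run the async function, doubling the delay between attempts,
// and rethrow the last error only once all attempts are exhausted.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Backoff schedule: 500ms, 1000ms, 2000ms, ...
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Wrapping each scrape in `withRetry(() => apolloScraper.searchPeople(params))` means one transient network failure no longer kills the whole job.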
## What's next
I'm adding email finder capabilities and LinkedIn profile enrichment. The goal is becoming the go-to alternative for developers who need reliable lead data without Apollo's restrictions.
Want to try it out? I've got a generous free tier running at https://rapidapi.com/donnydev/api/apollo-lead-scraper
Would love to hear about your API building experiences in the comments! What's the biggest challenge you've faced productionizing a side project?