Last month, I launched my first profitable data API after getting frustrated with the existing email extraction tools. Here's the complete breakdown of how I went from idea to $500 MRR in 30 days.
## The problem with DIY scrapers
I was building a lead generation tool for my consulting business and needed to extract emails from company websites. The existing solutions were either:
- Too expensive ($200+/month for basic plans)
- Unreliable (failed on modern SPAs)
- Limited (couldn't handle JavaScript-heavy sites)
- Slow (30+ second response times)
After spending two weeks wrestling with Beautiful Soup and Puppeteer, I realized I was solving a problem many developers face. That's when I decided to build a robust API and monetize it.
## Architecture: Express → Railway → RapidAPI
Here's my tech stack:
The `package.json` dependencies:

```json
{
  "express": "^4.18.2",
  "puppeteer": "^21.5.0",
  "cheerio": "^1.0.0-rc.12",
  "validator": "^13.11.0",
  "rate-limiter-flexible": "^3.0.4",
  "helmet": "^7.1.0"
}
```
Why Express? Fast development, huge ecosystem, and easy deployment.
Why Railway? One-click deployments, automatic SSL, and affordable scaling. Their free tier was perfect for testing.
Why RapidAPI? Built-in marketplace, handles billing/auth, and gives you instant credibility.
The architecture is straightforward:
- RapidAPI forwards requests to my Railway-hosted Express server
- Puppeteer launches a headless Chrome instance
- Page content gets parsed with Cheerio
- Emails are extracted, validated, and returned as JSON
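That flow can be sketched end to end with the browser step stubbed out. To be clear, `handleExtract` and `fetchText` are names I'm using for illustration here, not the real codebase — in production the fetch step is Puppeteer:

```javascript
// Minimal sketch of the request pipeline: validate → fetch → parse → respond.
// `fetchText` stands in for the Puppeteer step so the flow is easy to follow.
const EMAIL_RE = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

async function handleExtract(url, fetchText) {
  // 1. Validate input before doing any expensive work
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return { status: 400, body: { error: 'Valid URL required' } };
  }

  // 2. Fetch the rendered page text (headless Chrome in production)
  const text = await fetchText(parsed.href);

  // 3. Extract and dedupe email addresses
  const emails = [...new Set(text.match(EMAIL_RE) || [])];

  // 4. Shape the JSON response
  return { status: 200, body: { url: parsed.href, emails, count: emails.length } };
}
```

Everything besides step 2 is cheap, synchronous logic, which keeps the expensive browser work isolated and swappable.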
## One real endpoint walkthrough with code
Let me show you the core `/extract` endpoint:

```javascript
app.post('/extract', async (req, res) => {
  const { url, deep_scan = false } = req.body;

  // Validation
  if (!url || !validator.isURL(url)) {
    return res.status(400).json({ error: 'Valid URL required' });
  }

  let browser;
  try {
    browser = await puppeteer.launch({
      headless: 'new',
      args: ['--no-sandbox', '--disable-dev-shm-usage']
    });
    const page = await browser.newPage();
    await page.setUserAgent('Mozilla/5.0 (compatible; EmailExtractor/1.0)');

    // Navigate with timeout
    await page.goto(url, { waitUntil: 'networkidle0', timeout: 15000 });

    // Extract emails from current page
    let emails = await extractEmailsFromPage(page);

    // Deep scan: follow internal links
    if (deep_scan && emails.length < 3) {
      // Compute the hostname in Node and pass it in: page variables
      // aren't available inside the browser-side $$eval callback
      const hostname = new URL(page.url()).hostname;
      const links = await page.$$eval('a[href]', (anchors, host) =>
        anchors
          .map(a => a.href)
          .filter(href => href.includes(host))
          .slice(0, 5), // Limit to prevent abuse
        hostname
      );

      for (const link of links) {
        try {
          await page.goto(link, { waitUntil: 'domcontentloaded', timeout: 8000 });
          const pageEmails = await extractEmailsFromPage(page);
          emails = [...new Set([...emails, ...pageEmails])]; // Dedupe
        } catch (err) {
          // Silently continue if individual pages fail
        }
      }
    }

    res.json({
      url,
      emails: emails.slice(0, 50), // Limit results
      count: emails.length,
      deep_scan,
      processed_at: new Date().toISOString()
    });
  } catch (error) {
    console.error('Extraction failed:', error);
    res.status(500).json({
      error: 'Failed to extract emails',
      message: error.message
    });
  } finally {
    if (browser) await browser.close();
  }
});
```
```javascript
async function extractEmailsFromPage(page) {
  return await page.evaluate(() => {
    const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
    const text = document.body.innerText;
    const emails = text.match(emailRegex) || [];

    // Filter out common false positives
    return emails.filter(email =>
      !email.includes('example.com') &&
      !email.includes('test.com') &&
      !email.includes('placeholder')
    );
  });
}
```
The key insights here:
- Always validate inputs first
- Use proper timeout handling for web scraping
- Implement result limits to prevent abuse
- Clean up resources (browser instances) in `finally` blocks
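The false-positive filter above is purely string-based. A slightly stricter post-filter — this is a sketch with blocklists I'm making up, not the production code — also catches image filenames like `logo@2x.png`, which match the email regex but aren't addresses:

```javascript
// Stricter post-filter for scraped email candidates (illustrative sketch).
const BLOCKED_DOMAINS = ['example.com', 'test.com', 'yourdomain.com'];
const IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif', '.svg', '.webp'];

function cleanEmails(candidates) {
  const seen = new Set();
  const result = [];
  for (const raw of candidates) {
    const email = raw.toLowerCase();           // normalize case before deduping
    const domain = email.split('@')[1] || '';
    if (seen.has(email)) continue;                                   // duplicate
    if (BLOCKED_DOMAINS.includes(domain)) continue;                  // placeholder domain
    if (IMAGE_EXTENSIONS.some(ext => email.endsWith(ext))) continue; // "logo@2x.png"-style match
    seen.add(email);
    result.push(email);
  }
  return result;
}
```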
## Pricing strategy for API monetization
I studied competitor pricing and went with a freemium model:
- Free: 50 requests/month
- Basic ($9/month): 1,000 requests + deep scan
- Pro ($29/month): 5,000 requests + priority support
- Enterprise ($99/month): 25,000 requests + custom features
The sweet spot was the Pro plan. Most developers need more than 1,000 requests but don't want to pay enterprise pricing.
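RapidAPI enforces quotas at the gateway, but it helps to keep the tiers in one internal table for dashboards and sanity checks. A sketch (the field names are mine; the free tier's deep-scan flag is an assumption):

```javascript
// Plan quotas mirroring the tiers above (sketch; RapidAPI does the actual
// metering, so this table only backs internal checks and reporting).
const PLANS = {
  free:       { price: 0,  monthlyRequests: 50,    deepScan: false },
  basic:      { price: 9,  monthlyRequests: 1000,  deepScan: true  },
  pro:        { price: 29, monthlyRequests: 5000,  deepScan: true  },
  enterprise: { price: 99, monthlyRequests: 25000, deepScan: true  },
};

function quotaRemaining(plan, usedThisMonth) {
  const tier = PLANS[plan];
  if (!tier) throw new Error(`Unknown plan: ${plan}`);
  return Math.max(0, tier.monthlyRequests - usedThisMonth);
}
```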
Pricing lessons:
- Start higher than you think (I initially priced Basic at $5)
- Offer clear value jumps between tiers
- Include one "expensive" tier to make others look reasonable
## Lessons learned

### 1. Monitoring is everything
I use Railway's built-in metrics plus custom logging. Memory leaks from unclosed browser instances killed my server twice in the first week.
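The kind of check that catches those leaks early is a periodic health snapshot — this is a sketch of the idea, not the production logging:

```javascript
// Periodic health snapshot (sketch): heap/RSS in MB plus a counter of live
// Puppeteer browsers, so a leak shows up in the logs before the server dies.
let openBrowsers = 0; // increment after puppeteer.launch(), decrement after close()

function healthSnapshot() {
  const mem = process.memoryUsage();
  return {
    rssMb: Math.round(mem.rss / 1024 / 1024),
    heapUsedMb: Math.round(mem.heapUsed / 1024 / 1024),
    openBrowsers,
    at: new Date().toISOString(),
  };
}

// e.g. setInterval(() => console.log(JSON.stringify(healthSnapshot())), 60_000);
```

If `openBrowsers` trends upward while traffic is flat, a code path is skipping the `finally` cleanup.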
### 2. Rate limiting saves money
Without proper limits, one user can spin up dozens of Puppeteer instances and crash your server:
```javascript
const { RateLimiterMemory } = require('rate-limiter-flexible');

// 10 requests per 60 seconds, keyed by RapidAPI user (falling back to IP)
const rateLimiter = new RateLimiterMemory({ points: 10, duration: 60 });

app.use((req, res, next) => {
  rateLimiter.consume(req.headers['x-rapidapi-user'] || req.ip)
    .then(() => next())
    .catch(() => res.status(429).json({ error: 'Too many requests' }));
});
```
### 3. Error handling is user experience
Return meaningful error messages. "Internal server error" helps nobody.
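One pattern that helps is mapping known scraping failures to specific statuses and actionable messages. A sketch — the message strings and status choices are illustrative, not the API's exact responses:

```javascript
// Map known failure modes to useful responses (sketch). Navigation timeouts
// and DNS failures are the two most common errors Puppeteer surfaces here;
// anything unrecognized still falls through to a generic 500.
function toErrorResponse(error) {
  const msg = String((error && error.message) || error);
  if (/timeout/i.test(msg)) {
    return { status: 504, body: { error: 'Target site took too long to respond. Try again, or disable deep_scan.' } };
  }
  if (/net::ERR_NAME_NOT_RESOLVED/.test(msg)) {
    return { status: 422, body: { error: 'Domain could not be resolved. Check the URL.' } };
  }
  return { status: 500, body: { error: 'Failed to extract emails', message: msg } };
}
```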
### 4. Documentation drives adoption
I spent 40% of my time writing clear API docs with examples. It shows in the conversion rate.
## What's next?
The API is profitable and growing. Next features:
- Email validation/verification
- Social media profile extraction
- Webhook support for async processing
Building APIs taught me that solving developer problems can be incredibly rewarding—both personally and financially.
Want to try it out? Check out the Email Extractor API and let me know what you think. I'm always looking for feedback from fellow developers!
What APIs are you building? Drop a comment—I'd love to hear about your projects.