Donny

Apollo Lead Scraper API — Complete Guide

Last month, I launched my first profitable API on RapidAPI: an Apollo Lead Scraper that now serves 10k+ requests daily. Here's the honest breakdown of how I built it, the mistakes I made, and what I'd do differently.

The problem with DIY scrapers

I used to maintain 6 different web scrapers for lead generation across my side projects. Every few weeks, one would break because:

  • Apollo updated their anti-bot detection
  • Rate limits changed unexpectedly
  • My IP got blocked from too many requests
  • Parsing logic broke after UI updates

The breaking point came when I spent a whole weekend fixing scrapers instead of building features. That's when I asked myself: what if I centralized all of this into one robust API that other developers could use too?

Architecture: Express → Railway → RapidAPI

I went with a simple but scalable stack:

Backend: Node.js/Express with Playwright for browser automation
Hosting: Railway (honestly the best developer experience I've had)
Distribution: RapidAPI for instant global reach
Monitoring: Custom dashboard + Railway metrics

Here's my project structure:

apollo-scraper-api/
├── src/
│   ├── scrapers/
│   │   └── apollo.js
│   ├── middleware/
│   │   └── rateLimiter.js
│   ├── utils/
│   │   └── validator.js
│   └── app.js
├── Dockerfile
└── railway.json
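The post doesn't show `src/middleware/rateLimiter.js`, so here's a minimal sketch of what a middleware in that slot might look like; the window size, request cap, and the `x-rapidapi-user` key are all assumptions, not the real implementation:

```javascript
// src/middleware/rateLimiter.js — hypothetical sketch: a fixed-window,
// per-key in-memory limiter (values below are illustrative).
const WINDOW_MS = 60 * 1000; // 1-minute window (assumed)
const MAX_REQUESTS = 30;     // per key per window (assumed)

const windows = new Map();   // key -> { start, count }

function rateLimiter(req, res, next) {
  const key = req.headers['x-rapidapi-user'] || req.ip;
  const now = Date.now();
  const entry = windows.get(key);

  // New key, or the previous window has expired: start a fresh window
  if (!entry || now - entry.start >= WINDOW_MS) {
    windows.set(key, { start: now, count: 1 });
    return next();
  }

  // Window still open and budget exhausted: reject
  if (entry.count >= MAX_REQUESTS) {
    return res.status(429).json({ success: false, error: 'Rate limit exceeded' });
  }

  entry.count += 1;
  next();
}

module.exports = rateLimiter;
```

An in-memory `Map` is fine for a single Railway instance; with multiple replicas you'd back this with something shared like Redis.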

The key insight: treat it like enterprise software from day one. No "I'll refactor later" shortcuts.

One real endpoint walkthrough with code

Let me show you the /search-people endpoint that generates 80% of my revenue:

// src/app.js
const express = require('express');
const validateRequest = require('./utils/validator');
const rateLimiter = require('./middleware/rateLimiter');
const ApolloScraper = require('./scrapers/apollo');

const app = express();
app.use(express.json()); // parse JSON bodies before the route handlers run

const apolloScraper = new ApolloScraper();
const logger = console; // stand-in; swap in your real logger (pino, winston, ...)

app.post('/api/v1/search-people', validateRequest, rateLimiter, async (req, res) => {
  const { company_names, job_titles, location, limit = 25 } = req.body;

  try {
    const results = await apolloScraper.searchPeople({
      // Accept both a single string and an array for convenience
      company_names: Array.isArray(company_names) ? company_names : [company_names],
      job_titles: Array.isArray(job_titles) ? job_titles : [job_titles],
      location,
      limit: Math.min(limit, 100) // cap at 100 to bound scrape time
    });

    res.json({
      success: true,
      count: results.length,
      data: results,
      credits_used: Math.ceil(results.length / 10)
    });
  } catch (error) {
    // Log the real error internally, return a generic message to callers
    logger.error('Search failed:', error);
    res.status(500).json({
      success: false,
      error: 'Search temporarily unavailable'
    });
  }
});
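`validateRequest` is referenced but not shown either. A minimal sketch of what such a middleware might check, assuming it only validates the fields the endpoint reads (the specific rules here are illustrative):

```javascript
// src/utils/validator.js — hypothetical sketch; the real checks aren't
// shown in the post.
function validateRequest(req, res, next) {
  const { company_names, job_titles, limit } = req.body || {};

  // Require at least one search dimension
  if (!company_names && !job_titles) {
    return res.status(400).json({
      success: false,
      error: 'Provide company_names and/or job_titles'
    });
  }

  // limit, if present, must be a positive integer
  if (limit !== undefined && (!Number.isInteger(limit) || limit < 1)) {
    return res.status(400).json({
      success: false,
      error: 'limit must be a positive integer'
    });
  }

  next();
}

module.exports = validateRequest;
```

Rejecting bad input before the rate limiter and scraper run keeps malformed requests from burning a browser launch.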

The scraper core uses Playwright with rotating user agents:

// src/scrapers/apollo.js
const playwright = require('playwright');

class ApolloScraper {
  constructor() {
    this.userAgents = [
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      // ... more agents
    ];
  }

  getRandomUserAgent() {
    // Pick a fresh agent for every browser context
    return this.userAgents[Math.floor(Math.random() * this.userAgents.length)];
  }

  async searchPeople({ company_names, job_titles, location, limit }) {
    const browser = await playwright.chromium.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });

    const context = await browser.newContext({
      userAgent: this.getRandomUserAgent(),
      viewport: { width: 1920, height: 1080 },
      extraHTTPHeaders: {
        'Accept-Language': 'en-US,en;q=0.9'
      }
    });

    const page = await context.newPage();

    try {
      // Build search URL
      const searchParams = new URLSearchParams({
        finder_table_layout: 'list',
        person_titles: job_titles.join(','),
        organization_names: company_names.join(','),
        person_locations: location || '',
        per_page: String(limit)
      });

      await page.goto(`https://app.apollo.io/#/people?${searchParams}`);
      await page.waitForSelector('[data-cy="person-name"]', { timeout: 10000 });

      // Extract one record per result row; drop rows with no name
      const people = await page.$$eval('[data-cy="person-row"]', (rows) => {
        return rows.map(row => ({
          name: row.querySelector('[data-cy="person-name"]')?.textContent?.trim(),
          title: row.querySelector('[data-cy="person-title"]')?.textContent?.trim(),
          company: row.querySelector('[data-cy="person-company"]')?.textContent?.trim(),
          location: row.querySelector('[data-cy="person-location"]')?.textContent?.trim(),
          email: row.querySelector('[data-cy="person-email"]')?.textContent?.trim(),
          linkedin_url: row.querySelector('a[href*="linkedin"]')?.href
        })).filter(person => person.name);
      });

      return people;
    } finally {
      await browser.close(); // always release the browser, even on failure
    }
  }
}

module.exports = ApolloScraper;

Pricing strategy for API monetization

This took me 3 iterations to get right. My current model:

  • Free tier: 100 requests/month
  • Basic ($29/month): 2,500 requests
  • Pro ($99/month): 10,000 requests
  • Enterprise ($299/month): 50,000 requests + priority support

Key insight: I charge per successful result, not per API call. If my scraper fails, users don't get charged. This builds massive trust.

The sweet spot was pricing 60% below what companies pay for Apollo's official plans, while providing more flexible data access.
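In code, "charge per successful result" is just the credits_used calculation from the endpoint above: credits derive from results actually returned, so a failed call costs nothing. A sketch:

```javascript
// Billing sketch: credits come from results actually returned, so a
// failed scrape (results = null) or an empty result set costs nothing.
function creditsFor(results) {
  if (!results || results.length === 0) return 0;
  return Math.ceil(results.length / 10); // 1 credit per 10 people, matching the endpoint
}
```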

Lessons learned

1. Observability from day one: I wish I'd added comprehensive logging earlier. When things break at 3 AM, you need detailed traces.

2. Rate limiting is crucial: Not just for your API, but for the upstream service. I got Apollo.io mad at me early on.

3. Railway > Heroku for APIs: Railway's automatic deployments and built-in metrics saved me weeks of DevOps work.

4. RapidAPI's discovery is real: 70% of my users found me through their marketplace, not my own marketing.

5. Cache everything: I cache successful results for 24 hours. Same search query = instant response + happy users.
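The post doesn't show the cache layer, but a 24-hour in-memory cache keyed on the normalized search params could be as small as this sketch (key normalization via sorted JSON is an assumption):

```javascript
// Hypothetical 24-hour result cache. Sorting the keys before
// serializing makes {a, b} and {b, a} hit the same entry.
const TTL_MS = 24 * 60 * 60 * 1000;
const cache = new Map(); // key -> { expires, value }

function cacheKey(params) {
  // JSON.stringify with a key array emits keys in that (sorted) order
  return JSON.stringify(params, Object.keys(params).sort());
}

function getCached(params) {
  const entry = cache.get(cacheKey(params));
  if (!entry || Date.now() > entry.expires) return null;
  return entry.value;
}

function setCached(params, value) {
  cache.set(cacheKey(params), { expires: Date.now() + TTL_MS, value });
}
```

As with rate limiting, this only works per instance; a shared cache (e.g. Redis) is the next step once you run replicas.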

The biggest mistake? Not building a proper retry mechanism initially. Network failures would kill entire scraping jobs.
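The retry mechanism that fixes this can be as small as an exponential-backoff wrapper; the attempt count and delays below are illustrative, not the production values:

```javascript
// Hypothetical retry helper: re-run a flaky async operation with
// exponential backoff (500ms, 1000ms, 2000ms, ...) between attempts.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError; // every attempt failed: surface the last error
}
```

Wrapping each scrape, e.g. `await withRetry(() => apolloScraper.searchPeople(params))`, means a single network hiccup no longer kills the whole job.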

What's next

I'm adding email finder capabilities and LinkedIn profile enrichment. The goal is becoming the go-to alternative for developers who need reliable lead data without Apollo's restrictions.

Want to try it out? I've got a generous free tier running at https://rapidapi.com/donnydev/api/apollo-lead-scraper

Would love to hear about your API building experiences in the comments! What's the biggest challenge you've faced productionizing a side project?
