DEV Community

luisgustvo

Integrating CapSolver with Maxun for Seamless Web Data Extraction

In the world of web scraping, Maxun has emerged as a practical open-source, no-code platform for automating data collection. Its robot-based approach allows developers and teams to build scraping pipelines efficiently. However, as many developers know, CAPTCHAs often present a significant hurdle in automated workflows.

To maintain reliable data extraction, integrating a CAPTCHA solving service like CapSolver can help. This guide explores how to combine Maxun's flexibility with CapSolver's automated solving capabilities to handle protected websites.


What is Maxun?

Maxun is an open-source platform designed to simplify web data extraction. It allows users to "train" robots to perform scraping tasks through a visual interface or via its TypeScript/Node.js SDK.

Core Capabilities

  • Visual Robot Builder: Create extraction workflows without writing code.
  • Developer SDK: A robust SDK for programmatic control over robots.
  • Flexible Deployment: Supports both self-hosted (Docker) and cloud-based environments.
  • Smart Selectors: Automatically identifies elements to improve scraper stability.
  • Built-in Scheduling: Run extraction tasks using cron-based schedules.

SDK Class    Functionality
Extract      Structured data extraction using CSS selectors or LLMs.
Scrape       Converts pages into Markdown, HTML, or static screenshots.
Crawl        Discovers and processes multiple pages via sitemaps or link following.
Search       Extracts content directly from search engine results.

Why Use CapSolver with Maxun?

When scraping sites protected by anti-bot mechanisms, CAPTCHAs can interrupt your Maxun robots. CapSolver provides an API-based solution to bypass these challenges.

Supported Verification Types

CapSolver handles various common challenges, including:

  • reCAPTCHA
  • Cloudflare Turnstile
  • AWS WAF

By integrating these two tools, you can ensure that your data extraction remains automated and requires less manual intervention when encountering blocked pages.


Getting Started

Prerequisites

  • Node.js with npm
  • A CapSolver API key
  • A Maxun API key (cloud or self-hosted instance)

Installation

Install the Maxun SDK and Axios for API requests:

npm install maxun-sdk axios

Environment Configuration

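Set up your .env file with the necessary credentials. Node.js does not read .env files on its own, so load the file with node --env-file=.env (Node 20.6 or later) or a package such as dotenv: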

CAPSOLVER_API_KEY=your_capsolver_key
MAXUN_API_KEY=your_maxun_key
MAXUN_BASE_URL=https://app.maxun.dev/api/sdk # Use your local URL if self-hosted
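
A small sketch of how these variables might be validated at startup, so that a missing key fails fast instead of surfacing later as an authentication error. The Config shape and the loadConfig name are illustrative assumptions, not part of either SDK, and the fallback URL is simply the cloud value from the example above.

```typescript
// Hypothetical helper: Config and loadConfig are illustrative names.
interface Config {
  capsolverApiKey: string;
  maxunApiKey: string;
  maxunBaseUrl: string;
}

function loadConfig(env: Record<string, string | undefined>): Config {
  // Both API keys are required; the base URL has a sensible default.
  const required = ['CAPSOLVER_API_KEY', 'MAXUN_API_KEY'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    capsolverApiKey: env['CAPSOLVER_API_KEY'] as string,
    maxunApiKey: env['MAXUN_API_KEY'] as string,
    // Fall back to the cloud URL from the example .env when unset.
    maxunBaseUrl: env['MAXUN_BASE_URL'] || 'https://app.maxun.dev/api/sdk',
  };
}
```

In application code this would typically be called once as loadConfig(process.env), and the resulting object passed to the services that need it.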

Implementation: CapSolver Service

Below is a TypeScript service to handle CAPTCHA solving logic. This service uses a polling mechanism to retrieve results from CapSolver's asynchronous API.

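import axios, { AxiosInstance } from 'axios';

// Fields CapSolver may return in a solution, depending on the task type
interface TaskResult {
  gRecaptchaResponse?: string;
  token?: string;
}

class CapSolverService {
  private client: AxiosInstance;
  private apiKey: string;

  constructor(apiKey: string) {
    this.apiKey = apiKey;
    this.client = axios.create({
      baseURL: 'https://api.capsolver.com',
      headers: { 'Content-Type': 'application/json' },
    });
  }

  // Poll getTaskResult every 2 seconds until the task is ready, fails, or attempts run out
  private async pollResult(taskId: string, attempts = 30): Promise<TaskResult> {
    for (let i = 0; i < attempts; i++) {
      await new Promise(r => setTimeout(r, 2000));
      const res = await this.client.post('/getTaskResult', {
        clientKey: this.apiKey,
        taskId,
      });

      // CapSolver signals API-level errors with a non-zero errorId
      if (res.data.errorId) throw new Error(res.data.errorDescription || 'getTaskResult failed');
      if (res.data.status === 'ready') return res.data.solution;
      if (res.data.status === 'failed') throw new Error('Task failed');
    }
    throw new Error(`Timed out waiting for task ${taskId}`);
  }

  async solveReCaptchaV2(url: string, siteKey: string): Promise<string> {
    const res = await this.client.post('/createTask', {
      clientKey: this.apiKey,
      task: { type: 'ReCaptchaV2TaskProxyLess', websiteURL: url, websiteKey: siteKey },
    });
    // Stop early if the task was not created, instead of polling an undefined taskId
    if (res.data.errorId || !res.data.taskId) {
      throw new Error(res.data.errorDescription || 'createTask failed');
    }
    const solution = await this.pollResult(res.data.taskId);
    // Fail loudly rather than returning an empty token the target site will reject
    if (!solution.gRecaptchaResponse) throw new Error('No gRecaptchaResponse in solution');
    return solution.gRecaptchaResponse;
  }
}

export { CapSolverService };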
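
The TaskResult interface above carries a token field next to gRecaptchaResponse, which suggests the service is meant to grow beyond reCAPTCHA v2. The sketch below shows one way the request payload and token extraction could be generalized to a second CAPTCHA type. buildTask and extractToken are hypothetical helpers, and the AntiTurnstileTaskProxyLess type name and the token field are assumptions based on CapSolver's Turnstile task; confirm both against the current CapSolver documentation before relying on them.

```typescript
type CaptchaKind = 'recaptchaV2' | 'turnstile';

interface CapSolverTask {
  type: string;
  websiteURL: string;
  websiteKey: string;
}

interface Solution {
  gRecaptchaResponse?: string;
  token?: string;
}

// Build the `task` object sent to /createTask for the given CAPTCHA kind.
// The Turnstile type name is an assumption; check the CapSolver docs.
function buildTask(kind: CaptchaKind, url: string, siteKey: string): CapSolverTask {
  const type = kind === 'turnstile' ? 'AntiTurnstileTaskProxyLess' : 'ReCaptchaV2TaskProxyLess';
  return { type, websiteURL: url, websiteKey: siteKey };
}

// Pull the usable token out of a solution, whichever field carries it.
function extractToken(solution: Solution): string {
  const token = solution.gRecaptchaResponse || solution.token;
  if (!token) throw new Error('Solution contained no token');
  return token;
}
```

Keeping payload construction separate from the HTTP calls means the polling logic stays unchanged when a new CAPTCHA type is added.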

Integration Patterns

1. Pre-Authentication with Maxun

For many sites, you need to solve the CAPTCHA and establish a session before the scraper can access the data.

import { Scrape } from 'maxun-sdk';
import { CapSolverService } from './CapSolverService';
import axios from 'axios';

async function runScraper() {
  const capSolver = new CapSolverService(process.env.CAPSOLVER_API_KEY!);
  const scraper = new Scrape({ apiKey: process.env.MAXUN_API_KEY! });

  const targetUrl = 'https://example.com/data';
  const siteKey = 'SITE_KEY_HERE';

  // Solve the CAPTCHA first; the token is only valid for a short time
  const token = await capSolver.solveReCaptchaV2(targetUrl, siteKey);

  // Example: submit the token to the site to get a session.
  // The /verify path is a placeholder; use the endpoint your target site exposes.
  const authRes = await axios.post(`${targetUrl}/verify`, { 'g-recaptcha-response': token });
  // In Node, axios returns set-cookie as an array of raw header strings (or undefined)
  const cookies = authRes.headers['set-cookie'];
  if (!cookies) console.warn('No session cookies returned by the verify request');

  // Run Maxun robot with the session
  const robot = await scraper.create('data-robot', targetUrl);
  // Note: Pass cookies to the robot if supported by your Maxun version
  const result = await robot.run();

  console.log(result.data);
}

runScraper().catch(console.error);
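
In Node, axios exposes set-cookie as an array of raw header strings, each carrying attributes such as Path and HttpOnly. A sketch of reducing that array to a single Cookie request header value follows. toCookieHeader is a hypothetical helper, and how the resulting string is handed to a Maxun robot depends on your Maxun version and is not shown here.

```typescript
// Reduce raw Set-Cookie header lines to one Cookie request header value.
// Only the leading name=value pair of each line is kept; attributes such as
// Path, Expires, and HttpOnly are dropped.
function toCookieHeader(setCookie: string[] | undefined): string {
  if (!setCookie) return '';
  return setCookie
    .map((line) => (line.split(';')[0] || '').trim())
    .filter((pair) => pair.indexOf('=') > 0)
    .join('; ');
}
```

For example, the array ['sid=abc; Path=/; HttpOnly', 'theme=dark; Max-Age=3600'] becomes the string sid=abc; theme=dark.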

2. Handling Parallel Tasks

When running multiple robots, you can manage CAPTCHA solving concurrently to improve throughput.

async function processBatch(urls: string[]) {
  const tasks = urls.map(async (url) => {
    // Logic for individual CAPTCHA solving and scraping
  });
  return Promise.all(tasks);
}
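
Promise.all on its own starts every task at once and rejects as soon as one of them fails, which is rarely what a scraping batch wants. One possible refinement, sketched under assumptions, is a small worker pool that caps concurrency and records failures per URL. The handle callback stands in for the per-URL solve-and-scrape logic, and both BatchResult and the default concurrency of 3 are illustrative choices.

```typescript
interface BatchResult<T> {
  url: string;
  ok: boolean;
  value?: T;
  error?: string;
}

// Process URLs with at most `concurrency` handlers running at the same time.
// A failure for one URL is recorded in its result and does not stop the batch.
async function processBatch<T>(
  urls: string[],
  handle: (url: string) => Promise<T>,
  concurrency = 3,
): Promise<BatchResult<T>[]> {
  const results: BatchResult<T>[] = new Array(urls.length);
  let next = 0;

  async function worker(): Promise<void> {
    // Each worker claims the next unprocessed index until none remain.
    while (next < urls.length) {
      const index = next++;
      const url = urls[index] as string;
      try {
        results[index] = { url, ok: true, value: await handle(url) };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        results[index] = { url, ok: false, error: message };
      }
    }
  }

  const workers: Promise<void>[] = [];
  for (let i = 0; i < Math.min(concurrency, urls.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

Results come back in the same order as the input URLs, so failed entries can be filtered out and retried in a second pass.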

Best Practices

  1. Error Handling: Always implement retries for CAPTCHA solving, as network issues or timeouts can occur.
  2. Balance Monitoring: Check your CapSolver balance programmatically to avoid workflow interruptions.
  3. Token Management: CAPTCHA tokens usually have a short lifespan (around 90-120 seconds). Ensure you use them immediately after solving.
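
The three practices above can be sketched as small helpers. All names here (withRetry, hasEnoughBalance, isTokenFresh, PostJson) are hypothetical, the 90-second default is taken from the lower bound mentioned above, and the balance check assumes CapSolver's getBalance endpoint responds with errorId and balance fields. The HTTP call is injected as a function so the sketch does not depend on a particular client.

```typescript
// 1. Retry an async operation with exponential backoff between attempts.
async function withRetry<T>(op: () => Promise<T>, retries = 3, baseDelayMs = 1000): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        const delay = baseDelayMs * Math.pow(2, attempt);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// 2. Check the account balance before starting a batch.
// `post` is any function that POSTs JSON to the CapSolver API and returns the
// parsed response body, for example a thin wrapper around an axios instance.
type PostJson = (path: string, body: object) => Promise<any>;

async function hasEnoughBalance(post: PostJson, apiKey: string, minimum: number): Promise<boolean> {
  const data = await post('/getBalance', { clientKey: apiKey });
  if (data.errorId) throw new Error(data.errorDescription || 'getBalance failed');
  return Number(data.balance) >= minimum;
}

// 3. Track when a token was solved and refuse to use it once it is too old.
interface TimedToken {
  value: string;
  solvedAt: number; // epoch milliseconds
}

function isTokenFresh(token: TimedToken, maxAgeMs = 90000, now = Date.now()): boolean {
  return now - token.solvedAt < maxAgeMs;
}
```

A typical flow would check the balance once per batch, wrap each solve call in withRetry, and re-solve whenever isTokenFresh returns false before submitting.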

Conclusion

Combining Maxun with CapSolver provides a scalable way to handle web data extraction even when faced with modern anti-bot protections. Keeping the CAPTCHA solving logic separate from the extraction process also keeps the code clean and easier to maintain.

Tip: New users can use the code MAXUN at CapSolver for a 6% bonus on their first deposit.


FAQ

Is Maxun free?
Yes, Maxun is open-source and can be self-hosted for free. A managed cloud service is also available.

What CAPTCHAs does CapSolver support?
It supports reCAPTCHA, Cloudflare Turnstile, AWS WAF, and several others.

How do I find a site key?
You can usually find it in the HTML source code by searching for data-sitekey or checking network requests to the CAPTCHA provider.
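
As a rough illustration of the first approach, the sketch below searches an HTML string for a data-sitekey attribute. findSiteKey is a hypothetical helper; it only covers keys embedded as an attribute in static markup, so keys injected by JavaScript at runtime still need the network-request approach.

```typescript
// Return the first data-sitekey attribute value found in the HTML, or null.
function findSiteKey(html: string): string | null {
  const match = html.match(/data-sitekey\s*=\s*["']([^"']+)["']/i);
  return match && match[1] ? match[1] : null;
}
```

The returned value is what the siteKey argument of solveReCaptchaV2 expects.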
