Maxun is an open-source, no-code platform for automating web data collection. Its robot-based approach lets developers and teams build scraping pipelines efficiently, but CAPTCHAs often interrupt automated workflows.
To maintain reliable data extraction, integrating a CAPTCHA solving service like CapSolver can help. This guide explores how to combine Maxun's flexibility with CapSolver's automated solving capabilities to handle protected websites.
What is Maxun?
Maxun is an open-source platform designed to simplify web data extraction. It allows users to "train" robots to perform scraping tasks through a visual interface or via its TypeScript/Node.js SDK.
Core Capabilities
- Visual Robot Builder: Create extraction workflows without writing code.
- Developer SDK: A robust SDK for programmatic control over robots.
- Flexible Deployment: Supports both self-hosted (Docker) and cloud-based environments.
- Smart Selectors: Automatically identifies elements to improve scraper stability.
- Built-in Scheduling: Run extraction tasks using cron-based schedules.
| SDK Class | Functionality |
|---|---|
| Extract | Structured data extraction using CSS selectors or LLMs. |
| Scrape | Converts pages into Markdown, HTML, or static screenshots. |
| Crawl | Discovers and processes multiple pages via sitemaps or link following. |
| Search | Extracts content directly from search engine results. |
Why Use CapSolver with Maxun?
When scraping sites protected by anti-bot mechanisms, CAPTCHAs can interrupt your Maxun robots. CapSolver provides an API-based solution to bypass these challenges.
Supported Verification Types
CapSolver handles various common challenges, including reCAPTCHA, Cloudflare Turnstile, and AWS WAF.
By integrating these two tools, you can ensure that your data extraction remains automated and requires less manual intervention when encountering blocked pages.
Getting Started
Prerequisites
- Node.js (v18+)
- A CapSolver API Key
- A Maxun instance (Cloud or Self-hosted)
Installation
Install the Maxun SDK and Axios for API requests:
npm install maxun-sdk axios
Environment Configuration
Set up your .env file with the necessary credentials:
CAPSOLVER_API_KEY=your_capsolver_key
MAXUN_API_KEY=your_maxun_key
MAXUN_BASE_URL=https://app.maxun.dev/api/sdk # Use your local URL if self-hosted
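A small loader can fail fast when one of these variables is missing. The sketch below is only one way to structure it: `loadConfig` and `AppConfig` are illustrative names, not part of either SDK. It takes the environment as a parameter, so you would call it with `process.env` after the `.env` file has been loaded (for example with the `dotenv` package, which is not part of the install command above).

```typescript
// Hypothetical helper: validates the variables from the .env example above.
interface AppConfig {
  capsolverApiKey: string;
  maxunApiKey: string;
  maxunBaseUrl: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  const required = ['CAPSOLVER_API_KEY', 'MAXUN_API_KEY'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    capsolverApiKey: env['CAPSOLVER_API_KEY'] as string,
    maxunApiKey: env['MAXUN_API_KEY'] as string,
    // Assumption: fall back to the cloud URL shown in the .env example.
    maxunBaseUrl: env['MAXUN_BASE_URL'] ?? 'https://app.maxun.dev/api/sdk',
  };
}
```

Self-hosted setups would set `MAXUN_BASE_URL` explicitly rather than rely on the fallback.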
Implementation: CapSolver Service
Below is a TypeScript service to handle CAPTCHA solving logic. This service uses a polling mechanism to retrieve results from CapSolver's asynchronous API.
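import axios, { AxiosInstance } from 'axios';

// Fields a CapSolver solution may contain; which one is set depends on the task type.
interface TaskResult {
  gRecaptchaResponse?: string;
  token?: string;
}

class CapSolverService {
  private client: AxiosInstance;
  private apiKey: string;

  constructor(apiKey: string) {
    this.apiKey = apiKey;
    this.client = axios.create({
      baseURL: 'https://api.capsolver.com',
      headers: { 'Content-Type': 'application/json' },
    });
  }

  // Polls getTaskResult every 2 seconds until the task is ready, fails, or attempts run out.
  private async pollResult(taskId: string, attempts = 30): Promise<TaskResult> {
    for (let i = 0; i < attempts; i++) {
      await new Promise(r => setTimeout(r, 2000));
      const res = await this.client.post('/getTaskResult', {
        clientKey: this.apiKey,
        taskId,
      });
      if (res.data.status === 'ready') return res.data.solution;
      if (res.data.status === 'failed') throw new Error(res.data.errorDescription || 'Task failed');
    }
    throw new Error(`Timeout: task ${taskId} not ready after ${attempts} attempts`);
  }

  async solveReCaptchaV2(url: string, siteKey: string): Promise<string> {
    const res = await this.client.post('/createTask', {
      clientKey: this.apiKey,
      task: { type: 'ReCaptchaV2TaskProxyLess', websiteURL: url, websiteKey: siteKey },
    });
    // A non-zero errorId means the task was not created (for example, an invalid key).
    if (res.data.errorId) throw new Error(res.data.errorDescription || 'createTask failed');
    const solution = await this.pollResult(res.data.taskId);
    // Fail loudly instead of returning an empty token the target site would reject.
    if (!solution.gRecaptchaResponse) throw new Error('Solution contained no reCAPTCHA token');
    return solution.gRecaptchaResponse;
  }
}

export { CapSolverService };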
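To exercise the polling behaviour without calling CapSolver, the loop can be separated from the HTTP request. This is a sketch under that assumption, not a change to the service above: `pollUntilReady` and `PollStatus` are hypothetical names, and the status check is passed in as a function so a test can supply a fake.

```typescript
// Hypothetical, transport-agnostic version of the polling loop.
type PollStatus<T> =
  | { status: 'processing' }
  | { status: 'ready'; solution: T }
  | { status: 'failed'; reason?: string };

async function pollUntilReady<T>(
  check: () => Promise<PollStatus<T>>,
  attempts = 30,
  delayMs = 2000,
): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    // Wait before each check, mirroring the service above.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    const res = await check();
    if (res.status === 'ready') return res.solution;
    if (res.status === 'failed') throw new Error(res.reason ?? 'Task failed');
  }
  throw new Error(`Timed out after ${attempts} attempts`);
}
```

In the service, `check` would wrap the `/getTaskResult` request and map its response onto `PollStatus`.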
Integration Patterns
1. Pre-Authentication with Maxun
For many sites, you need to solve the CAPTCHA and establish a session before the scraper can access the data.
import { Scrape } from 'maxun-sdk';
import { CapSolverService } from './CapSolverService';
import axios from 'axios';

async function runScraper() {
  const capSolver = new CapSolverService(process.env.CAPSOLVER_API_KEY!);
  const scraper = new Scrape({ apiKey: process.env.MAXUN_API_KEY! });
  const targetUrl = 'https://example.com/data';
  const siteKey = 'SITE_KEY_HERE';

  // Solve CAPTCHA
  const token = await capSolver.solveReCaptchaV2(targetUrl, siteKey);

  // Example: Submit token to the site to get a session
  const authRes = await axios.post(`${targetUrl}/verify`, { 'g-recaptcha-response': token });
  const cookies = authRes.headers['set-cookie'];

  // Run Maxun robot with the session
  const robot = await scraper.create('data-robot', targetUrl);
  // Note: Pass cookies to the robot if supported by your Maxun version
  const result = await robot.run();
  console.log(result.data);
}
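In Node.js, axios exposes `set-cookie` as an array of raw `Set-Cookie` strings, each carrying attributes such as `Path` or `HttpOnly`. If your Maxun version or a follow-up request accepts a `Cookie` header, the name/value pairs have to be extracted first. A minimal sketch, where `toCookieHeader` is an illustrative helper and not part of Maxun or axios:

```typescript
// Hypothetical helper: turns raw Set-Cookie strings into one Cookie header value.
function toCookieHeader(setCookie: string[] | undefined): string {
  return (setCookie ?? [])
    .map((raw) => {
      // Keep only the leading "name=value" part and drop the attributes.
      const [pair = ''] = raw.split(';');
      return pair.trim();
    })
    .filter((pair) => pair.includes('='))
    .join('; ');
}

const header = toCookieHeader(['sid=abc; Path=/; HttpOnly', 'theme=dark; Max-Age=3600']);
console.log(header); // sid=abc; theme=dark
```

This ignores cookie scoping (domain, path, expiry), so it only suits short-lived sessions against a single host.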
2. Handling Parallel Tasks
When running multiple robots, you can manage CAPTCHA solving concurrently to improve throughput.
async function processBatch(urls: string[]) {
  const tasks = urls.map(async (url) => {
    // Logic for individual CAPTCHA solving and scraping
  });
  return Promise.all(tasks);
}
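`Promise.all` over every URL starts all tasks at once and rejects as soon as any one of them fails. For larger batches it may be preferable to cap the number of tasks in flight and collect failures per URL. The sketch below shows one way to do that: `mapWithLimit` and `Settled` are hypothetical names, and the right limit depends on your CapSolver plan and the target site.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit` in flight.
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  const results = new Array<Settled<R>>(items.length);
  let next = 0;

  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an index before awaiting
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (error) {
        // Record the failure and keep going with the remaining items.
        results[index] = { ok: false, error };
      }
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

A call such as `mapWithLimit(urls, 3, solveAndScrape)` would then process three URLs at a time, where `solveAndScrape` stands for your own per-URL function.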
Best Practices
- Error Handling: Always implement retries for CAPTCHA solving, as network issues or timeouts can occur.
- Balance Monitoring: Check your CapSolver balance programmatically to avoid workflow interruptions.
- Token Management: CAPTCHA tokens usually have a short lifespan (around 90-120 seconds). Ensure you use them immediately after solving.
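The retry and token-lifetime points above can be sketched with two small helpers. Both are illustrative: `withRetry`, `TimedToken`, and `isTokenFresh` are hypothetical names, the backoff values are arbitrary defaults, and the 90-second cutoff is simply the lower end of the range quoted above.

```typescript
// Hypothetical helper: retries `fn` with exponential backoff between attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Delay doubles each time: baseDelayMs, 2x, 4x, ...
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Hypothetical shape: a token plus the time (ms since epoch) it was solved.
interface TimedToken {
  value: string;
  solvedAt: number;
}

// Treats tokens older than maxAgeMs as expired.
function isTokenFresh(token: TimedToken, now = Date.now(), maxAgeMs = 90000): boolean {
  return now - token.solvedAt < maxAgeMs;
}
```

Wrapping a solve call as `withRetry(() => capSolver.solveReCaptchaV2(url, siteKey))` and checking `isTokenFresh` just before submission keeps stale tokens from being sent.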
Conclusion
Combining Maxun with CapSolver provides a scalable way to handle web data extraction even when faced with modern anti-bot protections. Keeping the CAPTCHA solving logic separate from the extraction process also keeps the code clean and maintainable.
Tip: New users can use the code MAXUN at CapSolver for a 6% bonus on their first deposit.
FAQ
Is Maxun free?
Yes, Maxun is open-source and can be self-hosted for free. They also offer a managed cloud service.
What CAPTCHAs does CapSolver support?
It supports reCAPTCHA, Cloudflare Turnstile, AWS WAF, and several others.
How do I find a site key?
You can usually find it in the HTML source code by searching for data-sitekey or checking network requests to the CAPTCHA provider.
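When the key is present in the static HTML, a regular expression is often enough to pull it out. A minimal sketch, assuming the page embeds the key as a quoted `data-sitekey` attribute; `findSiteKey` is an illustrative helper, and it will not find keys that are injected by JavaScript after the page loads:

```typescript
// Hypothetical helper: returns the first data-sitekey value found in raw HTML, or null.
function findSiteKey(html: string): string | null {
  const match = html.match(/data-sitekey\s*=\s*["']([^"']+)["']/i);
  return match?.[1] ?? null;
}

const sample = '<div class="g-recaptcha" data-sitekey="EXAMPLE_SITE_KEY"></div>';
console.log(findSiteKey(sample)); // EXAMPLE_SITE_KEY
```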