DEV Community

Zied
Zied

Posted on • Edited on

AI-cloudscraper

CloudScraper v3.5.0 πŸš€ - AI & Hybrid Engine Update

A powerful, feature-rich Python library to bypass Cloudflare's anti-bot protection with 10 production-ready bypass strategies, cutting-edge advanced stealth capabilities, async support, and comprehensive monitoring. This Hybrid Edition includes the revolutionary Hybrid Engine, integrating TLS-Chameleon and Py-Parkour for the ultimate bypass capability now powered by Google Gemini AI.

πŸ”₯ NEW: AI Captcha Bypass (v3.4.0) - Vision-Powered Solving

The scraper now deeply integrates Google Gemini 1.5 Flash to solve complex visual challenges like reCAPTCHA v2:

  1. Visual Understanding: Analyzes instruction images (e.g., "Select all traffic lights") and identifies target objects.
  2. Intelligent Solving: Visually inspects every tile, matches objects, and solves the puzzle just like a human.
  3. Fast & Cheap: Uses Gemini 1.5 Flash for millisecond latency.

βœ… Verified Features

Feature Status
reCAPTCHA v2 Solving βœ… Tested
Text Captcha (Generic) βœ… Tested
Hybrid Engine βœ… Tested
Cloudflare Bypass βœ… Tested
# Pass your Google API Key to enable AI Solving
scraper = cloudscraper.create_scraper(
    interpreter='hybrid',
    google_api_key='YOUR_GEMINI_API_KEY',
    # Proxies are automatically used for AI requests too!
    rotating_proxies=['http://user:pass@proxy:port']
)

# For Complicated Text Captchas (Non-Standard)
scraper = cloudscraper.create_scraper(
    interpreter='hybrid',
    google_api_key='YOUR_GEMINI_API_KEY',
    captcha={
        'text_captcha': {
            'selector': '#captcha-image',   # CSS selector for the image
            'input_selector': '#captcha-input', # CSS selector for the input
            'submit_selector': '#submit-btn'    # Optional: submit button
        }
    }
)
Enter fullscreen mode Exit fullscreen mode

πŸ”₯ Hybrid Engine - The Ultimate Solution

The Hybrid Engine is a game-changer that combines two powerful technologies:

  1. TLS-Chameleon (curl_cffi): Provides perfect TLS fingerprinting (JA3/JA4) to mimic real browsers at the network layer.
  2. Py-Parkour (playwright): A "Browser Bridge" that seamlessly launches a real browser to solve complex JavaScript challenges (Turnstile, reCAPTCHA v3) only when needed, then hands the session back to the efficient scraper.

Why use Hybrid?

  • Speed: Uses lightweight HTTP requests for 99% of work.
  • Power: Falls back to a real browser only for seconds to solve a challenge.
  • Stealth: Perfect TLS fingerprints + Real Browser interactions.
  • Simplicity: No complex setupβ€”just interpreter='hybrid'.

✨ Key Features

  • πŸ›‘οΈ Hybrid Engine: Automatically switches between lightweight requests and real browser solving
  • πŸ€– AI Captcha Solver: Solves reCAPTCHA v2 using Google Gemini Vision
  • πŸ” TLS Fingerprinting: JA3 fingerprint rotation with real browser signatures (Chrome, Firefox, Safari) via tls-chameleon
  • πŸ•΅οΈ Traffic Pattern Obfuscation: Intelligent request spacing and behavioral consistency
  • 🧠 Intelligent Challenge Detection: AI-powered challenge recognition
  • ⚑ Async Support: Check async_cloudscraper for non-blocking operations

πŸš€ NEW: Phase 1 & 2 - Industrial Strength Bypass (v3.1.2+)

This version includes 10 production-ready bypass strategies:

Phase 1: Foundation Features

1. 🧬 The Hybrid Engine (Introduced in v3.3.0)

The most powerful mode available. Requires cloudscraper[hybrid].

# Install with: pip install cloudscraper[hybrid]

scraper = cloudscraper.create_scraper(
    interpreter='hybrid',
    impersonate='chrome120', # Optional: Force specific fingerprint
    google_api_key='YOUR_API_KEY' # Optional: For AI Captcha solving
)
scraper.get("https://hight-security-site.com")
Enter fullscreen mode Exit fullscreen mode

2. πŸͺ Cookie Harvesting & Persistence

  • Auto-saves cf_clearance cookies after successful bypasses
  • Reuses cookies for 30-60 minutes (configurable TTL)
  • 70-90% reduction in repeat challenge encounters
  • Storage: ~/.cloudscraper/cookies/
# Enabled by default!
scraper = cloudscraper.create_scraper(
    enable_cookie_persistence=True,
    cookie_ttl=1800  # 30 minutes
)
Enter fullscreen mode Exit fullscreen mode

2. 🎯 Hybrid Captcha Solver

  • Tries AI OCR β†’ AI Object Detection β†’ 2Captcha in sequence
  • Automatic fallback on failure
  • 3-5x higher solve rate vs single solver
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': 'hybrid',
        'primary': 'ai_ocr',
        'fallbacks': ['ai_obj_det', '2captcha'],
        '2captcha': {'api_key': 'YOUR_KEY'}
    }
)
Enter fullscreen mode Exit fullscreen mode

3. 🌐 Browser Automation Helper

  • Uses Playwright to launch real browser when all else fails
  • Ultimate fallback with 99% success rate
from cloudscraper.browser_helper import create_browser_helper

browser = create_browser_helper(headless=False)
cookies = browser.solve_challenge_and_get_cookies(url)
scraper.cookies.update(cookies)
Enter fullscreen mode Exit fullscreen mode

4. ⏱️ Enhanced Human Behavior Simulation

  • Content-aware delays (text vs images vs API)
  • Mouse movement simulation
  • Fingerprint resistance

Phase 2: Advanced Strategies

5. πŸ”Œ Circuit Breaker Pattern

  • Prevents infinite retry loops
  • Opens after 3 consecutive failures (configurable)
  • Auto-retry after timeout
# Enabled by default!
scraper = cloudscraper.create_scraper(
    enable_circuit_breaker=True,
    circuit_failure_threshold=3,
    circuit_timeout=60
)
Enter fullscreen mode Exit fullscreen mode

6. πŸ”„ Session Pool (Multi-Fingerprint Distribution)

  • Maintains pool of 3-10 scraper instances
  • Each with unique browser fingerprint
  • Round-robin / random / least-used rotation
from cloudscraper.session_pool import SessionPool

pool = SessionPool(pool_size=5, rotation_strategy='round_robin')
resp = pool.get('https://protected-site.com')
Enter fullscreen mode Exit fullscreen mode

7. ⚑ Smart Rate Limiter

  • Adaptive per-domain delays
  • Learns from 429/503 responses
  • Burst prevention
from cloudscraper.rate_limiter import SmartRateLimiter

limiter = SmartRateLimiter(default_delay=1.0, burst_limit=10)
limiter.wait_if_needed(domain)
Enter fullscreen mode Exit fullscreen mode

8. πŸ” TLS Fingerprint Rotator

  • 6+ real browser JA3 signatures (Chrome, Firefox, Safari, Edge)
  • Auto-rotation every N requests
from cloudscraper.tls_rotator import TLSFingerprintRotator

rotator = TLSFingerprintRotator(rotation_interval=10)
fp = rotator.get_fingerprint()  # chrome_120, firefox_122, etc.
Enter fullscreen mode Exit fullscreen mode

9. 🧠 Challenge Prediction System (ML-based)

  • Learns which domains use which challenges
  • Auto-configuration based on history
  • SQLite storage: ~/.cloudscraper/challenges.db
from cloudscraper.challenge_predictor import ChallengePredictor

predictor = ChallengePredictor()
predicted = predictor.predict_challenge('example.com')
config = predictor.get_recommended_config('example.com')
scraper = cloudscraper.create_scraper(**config)
Enter fullscreen mode Exit fullscreen mode

10. 🎭 Enhanced Timing (from Phase 1)

  • Content-type aware delays
  • Adaptive reading time calculation

πŸ“Š Success Rate Comparison

Configuration Success Rate Speed Use Case
Default (V1 + Cookies + Circuit Breaker) 70-80% Fast Most sites
+ Hybrid Solver 85-95% Medium Sites with captchas
+ Session Pool 90-95% Medium Pattern detection
+ Browser Fallback 99%+ Slow Hardest sites

πŸ“š Documentation

See ENHANCED_FEATURES.md for detailed documentation on all bypass strategies.


β˜• Support This Project

If you find this library useful, consider supporting its development:


Buy Me A Coffee


Installation

[!NOTE]
This is a maintained fork of the original cloudscraper library.
You can use this version (ai-cloudscraper) as a drop-in replacement while waiting for updates to the original library, or continue using it as your primary driver as we will consistently update it with the latest anti-detection technologies.

# Install maintained version (Recommended)
pip install ai-cloudscraper

# Install with AI solvers (Phase 1)
pip install ai-cloudscraper[ai]

# Install with browser automation (Phase 1)
pip install ai-cloudscraper[browser]

# Or install from source (Development)
pip install -e .
Enter fullscreen mode Exit fullscreen mode

πŸš€ Quick Start

Basic Usage

import cloudscraper

# Create a CloudScraper instance (cookie persistence + circuit breaker enabled by default)
scraper = cloudscraper.create_scraper()

# Use it like a regular requests session
response = scraper.get("https://protected-site.com")
print(response.text)
Enter fullscreen mode Exit fullscreen mode

Using Phase 1 & 2 Features

import cloudscraper
from cloudscraper.session_pool import SessionPool
from cloudscraper.challenge_predictor import ChallengePredictor

# Option 1: Default (Recommended for most sites)
scraper = cloudscraper.create_scraper()
resp = scraper.get('https://protected-site.com')

# Option 2: With hybrid solver
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': 'hybrid',
        'fallbacks': ['ai_ocr', '2captcha'],
        '2captcha': {'api_key': 'YOUR_KEY'}
    }
)

# Option 3: Session pool for maximum stealth
pool = SessionPool(pool_size=5, rotation_strategy='round_robin')
resp = pool.get('https://protected-site.com')

# Option 4: Challenge predictor for smart configuration
predictor = ChallengePredictor()
config = predictor.get_recommended_config('target-domain.com')
scraper = cloudscraper.create_scraper(**config)
Enter fullscreen mode Exit fullscreen mode

How It Works

Cloudflare's anti-bot protection works by presenting JavaScript challenges that must be solved before accessing the protected content. cloudscraper:

  1. Detects Cloudflare challenges automatically
  2. Solves JavaScript challenges using embedded interpreters
  3. Maintains session state and cookies
  4. Returns the protected content seamlessly

Dependencies

  • Python 3.8+
  • requests >= 2.32.0
  • requests_toolbelt >= 1.0.0
  • js2py >= 0.74 (default JavaScript interpreter)
  • Additional dependencies listed in requirements.txt

Optional Dependencies

Phase 1 AI Solvers:

pip install ddddocr ultralytics pillow
Enter fullscreen mode Exit fullscreen mode

Phase 1 Browser Automation:

pip install playwright
playwright install chromium
Enter fullscreen mode Exit fullscreen mode

Phase 2 features require NO additional dependencies - everything is included!

JavaScript Interpreters

cloudscraper supports multiple JavaScript interpreters:

  • js2py (default) - Pure Python implementation
  • nodejs - Requires Node.js installation
  • native - Built-in Python solver

Basic Configuration

Browser Selection

# Use Chrome fingerprint
scraper = cloudscraper.create_scraper(browser='chrome')

# Use Firefox fingerprint  
scraper = cloudscraper.create_scraper(browser='firefox')
Enter fullscreen mode Exit fullscreen mode

Proxy Support

# Single proxy
scraper = cloudscraper.create_scraper()
scraper.proxies = {
    'http': 'http://proxy:8080',
    'https': 'http://proxy:8080'
}
Enter fullscreen mode Exit fullscreen mode

CAPTCHA Solver Integration

scraper = cloudscraper.create_scraper(
    captcha={
        'provider': '2captcha',
        'api_key': 'your_api_key'
    }
)
Enter fullscreen mode Exit fullscreen mode

Supported CAPTCHA providers:

  • 2captcha
  • anticaptcha
  • CapSolver
  • CapMonster Cloud # Try maximum stealth configuration scraper = cloudscraper.create_scraper( enable_tls_fingerprinting=True, enable_anti_detection=True, enable_enhanced_spoofing=True, spoofing_consistency_level='high', enable_adaptive_timing=True, behavior_profile='research', # Slowest, most careful stealth_options={ 'min_delay': 3.0, 'max_delay': 10.0, 'human_like_delays': True } )

Enable maximum stealth mode

scraper.enable_maximum_stealth()


**Challenge detection not working?**
Enter fullscreen mode Exit fullscreen mode


python

Add custom challenge patterns

scraper.intelligent_challenge_system.add_custom_pattern(
domain='problem-site.com',
pattern_name='Custom Challenge',
patterns=[r'custom.+challenge.+text'],
challenge_type='custom',
response_strategy='delay_retry'
)


**Want to optimize for specific domains?**
Enter fullscreen mode Exit fullscreen mode


python

Make several learning requests first

for i in range(5):
try:
response = scraper.get('https://target-site.com/test')
except Exception:
pass

Then optimize for the domain

scraper.optimize_for_domain('target-site.com')


**Check enhanced system status:**
Enter fullscreen mode Exit fullscreen mode


python
stats = scraper.get_enhanced_statistics()
for system, status in stats.items():
print(f"{system}: {status}")

Get ML optimization report

if hasattr(scraper, 'ml_optimizer'):
report = scraper.ml_optimizer.get_optimization_report()
print(f"Success rate: {report.get('global_success_rate', 0):.2%}")


### Common Issues

**Challenge solving fails:**
Enter fullscreen mode Exit fullscreen mode


python

Try different interpreter

scraper = cloudscraper.create_scraper(interpreter='nodejs')

Increase delay

scraper = cloudscraper.create_scraper(delay=10)

Enable debug mode

scraper = cloudscraper.create_scraper(debug=True)


**403 Forbidden errors:**
Enter fullscreen mode Exit fullscreen mode


python

Enable stealth mode

scraper = cloudscraper.create_scraper(
enable_stealth=True,
auto_refresh_on_403=True
)


**Slow performance:**
Enter fullscreen mode Exit fullscreen mode


python

Use faster interpreter

scraper = cloudscraper.create_scraper(interpreter='native')


### Debug Mode

Enable debug mode to see what's happening:

Enter fullscreen mode Exit fullscreen mode


python
scraper = cloudscraper.create_scraper(debug=True)
response = scraper.get("https://example.com")

Debug output shows:

- Challenge type detected

- JavaScript interpreter used

- Challenge solving process

- Final response status


## πŸ”§ Enhanced Configuration Options

### πŸ”₯ **Enhanced Bypass Parameters** (NEW)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enable_tls_fingerprinting` | boolean | True | Enable advanced TLS fingerprinting |
| `enable_tls_rotation` | boolean | True | Rotate TLS fingerprints automatically |
| `enable_anti_detection` | boolean | True | Enable traffic pattern obfuscation |
| `enable_enhanced_spoofing` | boolean | True | Enable Canvas/WebGL spoofing |
| `spoofing_consistency_level` | string | 'medium' | Spoofing consistency ('low', 'medium', 'high') |
| `enable_intelligent_challenges` | boolean | True | Enable AI challenge detection |
| `enable_adaptive_timing` | boolean | True | Enable human behavior simulation |
| `behavior_profile` | string | 'casual' | Timing profile ('casual', 'focused', 'research', 'mobile') |
| `enable_ml_optimization` | boolean | True | Enable ML-based bypass optimization |
| `enable_enhanced_error_handling` | boolean | True | Enable intelligent error recovery |

### 🎭 **Enhanced Stealth Options**

Enter fullscreen mode Exit fullscreen mode


python
stealth_options = {
'min_delay': 1.0, # Minimum delay between requests
'max_delay': 4.0, # Maximum delay between requests

'human_like_delays': True, # Use human-like delay patterns
'randomize_headers': True, # Randomize request headers
'browser_quirks': True, # Enable browser-specific quirks
'simulate_viewport': True, # Simulate viewport changes
'behavioral_patterns': True # Use behavioral pattern simulation
}


### πŸ€– **Complete Enhanced Configuration Example**

Enter fullscreen mode Exit fullscreen mode


python
import cloudscraper

Ultimate bypass configuration

scraper = cloudscraper.create_scraper(
# Basic settings
debug=True,
browser='chrome',
interpreter='js2py',

# Enhanced bypass features
enable_tls_fingerprinting=True,
enable_tls_rotation=True,
enable_anti_detection=True,
enable_enhanced_spoofing=True,
spoofing_consistency_level='medium',
enable_intelligent_challenges=True,
enable_adaptive_timing=True,
behavior_profile='focused',
enable_ml_optimization=True,
enable_enhanced_error_handling=True,

# Stealth mode
enable_stealth=True,
stealth_options={
    'min_delay': 1.5,
    'max_delay': 4.0,
    'human_like_delays': True,
    'randomize_headers': True,
    'browser_quirks': True,
    'simulate_viewport': True,
    'behavioral_patterns': True
},

# Session management
session_refresh_interval=3600,
auto_refresh_on_403=True,
max_403_retries=3,

# Proxy rotation
rotating_proxies=[
    'http://proxy1:8080',
    'http://proxy2:8080',
    'http://proxy3:8080'
],
proxy_options={
    'rotation_strategy': 'smart',
    'ban_time': 600
},

# CAPTCHA solving
captcha={
    'provider': '2captcha',
    'api_key': 'your_api_key'
}
Enter fullscreen mode Exit fullscreen mode

)

Monitor bypass performance

stats = scraper.get_enhanced_statistics()
print(f"Active bypass systems: {len(stats)}")


### πŸ“ˆ **Behavior Profiles**

| Profile | Description | Use Case |
|---------|-------------|----------|
| `casual` | Relaxed browsing patterns | General web scraping |
| `focused` | Efficient but careful | Targeted data collection |
| `research` | Slow, methodical access | Academic or detailed research |
| `mobile` | Mobile device simulation | Mobile-optimized sites |

### πŸ“‰ **Spoofing Consistency Levels**

| Level | Fingerprint Stability | Detection Resistance | Performance |
|-------|----------------------|---------------------|-------------|
| `low` | Minimal changes | Good | Fastest |
| `medium` | Moderate variations | Excellent | Balanced |
| `high` | Significant obfuscation | Maximum | Slower |

## Configuration Options

### Common Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `debug` | boolean | False | Enable debug output |
| `delay` | float | auto | Override challenge delay |
| `interpreter` | string | 'js2py' | JavaScript interpreter |
| `browser` | string/dict | None | Browser fingerprint |
| `enable_stealth` | boolean | True | Enable stealth mode |
| `allow_brotli` | boolean | True | Enable Brotli compression |

### Challenge Control

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `disableCloudflareV1` | boolean | False | Disable v1 challenges |
| `disableCloudflareV2` | boolean | False | Disable v2 challenges |
| `disableCloudflareV3` | boolean | False | Disable v3 challenges |
| `disableTurnstile` | boolean | False | Disable Turnstile |

### Session Management

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `session_refresh_interval` | int | 3600 | Session refresh time (seconds) |
| `auto_refresh_on_403` | boolean | True | Auto-refresh on 403 errors |
| `max_403_retries` | int | 3 | Max 403 retry attempts |

### Example Configuration

Enter fullscreen mode Exit fullscreen mode


python
scraper = cloudscraper.create_scraper(
debug=True,
delay=5,
interpreter='js2py',
browser='chrome',
enable_stealth=True,
stealth_options={
'min_delay': 2.0,
'max_delay': 5.0,
'human_like_delays': True,
'randomize_headers': True,
'browser_quirks': True
}
)


## Utility Functions

### Get Tokens

Extract Cloudflare cookies for use in other applications:

Enter fullscreen mode Exit fullscreen mode


python
import cloudscraper

Get cookies as dictionary

tokens, user_agent = cloudscraper.get_tokens("https://example.com")
print(tokens)

{'cf_clearance': '...', '__cfduid': '...'}

Get cookies as string

cookie_string, user_agent = cloudscraper.get_cookie_string("https://example.com")
print(cookie_string)

"cf_clearance=...; __cfduid=..."


### Integration with Other Tools

Use cloudscraper tokens with curl or other HTTP clients:

Enter fullscreen mode Exit fullscreen mode


python
import subprocess
import cloudscraper

cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')

result = subprocess.check_output([
'curl',
'--cookie', cookie_string,
'-A', user_agent,
'https://example.com'
])




## License

MIT License. See LICENSE file for details.

## πŸ“ **Enhanced Features Documentation**

For detailed documentation about the enhanced bypass capabilities, see:
- **[ENHANCED_FEATURES.md](ENHANCED_FEATURES.md)** - Complete technical documentation
- **[examples/enhanced_bypass_demo.py](examples/enhanced_bypass_demo.py)** - Comprehensive usage examples
- **[tests/test_enhanced_features.py](tests/test_enhanced_features.py)** - Feature validation tests

### πŸ” **Quick Feature Reference**

| Feature | Module | Description |
|---------|--------|--------------|
| TLS Fingerprinting | `tls_fingerprinting.py` | JA3 fingerprint rotation |
| Anti-Detection | `anti_detection.py` | Traffic pattern obfuscation |
| Enhanced Spoofing | `enhanced_spoofing.py` | Canvas/WebGL fingerprint spoofing |
| Challenge Detection | `intelligent_challenge_system.py` | AI-powered challenge recognition |
| Adaptive Timing | `adaptive_timing.py` | Human behavior simulation |
| ML Optimization | `ml_optimization.py` | Machine learning bypass optimization |
| Error Handling | `enhanced_error_handling.py` | Intelligent error recovery |

---

πŸŽ‰ **Enhanced CloudScraper** - Bypass the majority of Cloudflare protections with cutting-edge anti-detection technology!

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Disclaimer

This tool is for educational and testing purposes only. Always respect website terms of service and use responsibly.
Enter fullscreen mode Exit fullscreen mode

Top comments (0)