CloudScraper v3.5.0 - AI & Hybrid Engine Update
A feature-rich Python library for bypassing Cloudflare's anti-bot protection, with 10 production-ready bypass strategies, advanced stealth capabilities, async support, and comprehensive monitoring. This Hybrid Edition adds the Hybrid Engine, which integrates TLS-Chameleon and Py-Parkour, and is now powered by Google Gemini AI.
NEW: AI Captcha Bypass (v3.4.0) - Vision-Powered Solving
The scraper now deeply integrates Google Gemini 1.5 Flash to solve complex visual challenges like reCAPTCHA v2:
- Visual Understanding: Analyzes instruction images (e.g., "Select all traffic lights") and identifies target objects.
- Intelligent Solving: Visually inspects every tile, matches objects, and solves the puzzle just like a human.
- Fast & Cheap: Uses Gemini 1.5 Flash, a lightweight model chosen for low latency and low cost per solve.
Verified Features
| Feature | Status |
|---|---|
| reCAPTCHA v2 Solving | Tested |
| Text Captcha (Generic) | Tested |
| Hybrid Engine | Tested |
| Cloudflare Bypass | Tested |
# Pass your Google API Key to enable AI Solving
scraper = cloudscraper.create_scraper(
interpreter='hybrid',
google_api_key='YOUR_GEMINI_API_KEY',
# Proxies are automatically used for AI requests too!
rotating_proxies=['http://user:pass@proxy:port']
)
# For Complicated Text Captchas (Non-Standard)
scraper = cloudscraper.create_scraper(
interpreter='hybrid',
google_api_key='YOUR_GEMINI_API_KEY',
captcha={
'text_captcha': {
'selector': '#captcha-image', # CSS selector for the image
'input_selector': '#captcha-input', # CSS selector for the input
'submit_selector': '#submit-btn' # Optional: submit button
}
}
)
Hybrid Engine - The Ultimate Solution
The Hybrid Engine is a game-changer that combines two powerful technologies:
- TLS-Chameleon (curl_cffi): Provides perfect TLS fingerprinting (JA3/JA4) to mimic real browsers at the network layer.
- Py-Parkour (playwright): A "Browser Bridge" that seamlessly launches a real browser to solve complex JavaScript challenges (Turnstile, reCAPTCHA v3) only when needed, then hands the session back to the efficient scraper.
Why use Hybrid?
- Speed: Uses lightweight HTTP requests for 99% of work.
- Power: Falls back to a real browser only for seconds to solve a challenge.
- Stealth: Perfect TLS fingerprints + Real Browser interactions.
- Simplicity: No complex setup, just interpreter='hybrid'.
Key Features
- Hybrid Engine: Automatically switches between lightweight requests and real browser solving
- AI Captcha Solver: Solves reCAPTCHA v2 using Google Gemini Vision
- TLS Fingerprinting: JA3 fingerprint rotation with real browser signatures (Chrome, Firefox, Safari) via tls-chameleon
- Traffic Pattern Obfuscation: Intelligent request spacing and behavioral consistency
- Intelligent Challenge Detection: AI-powered challenge recognition
- Async Support: Check async_cloudscraper for non-blocking operations
NEW: Phase 1 & 2 - Industrial Strength Bypass (v3.1.2+)
This version includes 10 production-ready bypass strategies:
Phase 1: Foundation Features
1. The Hybrid Engine (Introduced in v3.3.0)
The most powerful mode available. Requires cloudscraper[hybrid].
# Install with: pip install cloudscraper[hybrid]
scraper = cloudscraper.create_scraper(
interpreter='hybrid',
impersonate='chrome120', # Optional: Force specific fingerprint
google_api_key='YOUR_API_KEY' # Optional: For AI Captcha solving
)
scraper.get("https://hight-security-site.com")
2. Cookie Harvesting & Persistence
- Auto-saves cf_clearance cookies after successful bypasses
- Reuses cookies for 30-60 minutes (configurable TTL)
- 70-90% reduction in repeat challenge encounters
- Storage: ~/.cloudscraper/cookies/
# Enabled by default!
scraper = cloudscraper.create_scraper(
enable_cookie_persistence=True,
cookie_ttl=1800 # 30 minutes
)
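The reuse window can be pictured as a small TTL cache keyed by domain. The following is a minimal sketch of that idea, not the library's implementation: `CookieCache` and its methods are hypothetical names, and it keeps cookies in memory instead of writing them under ~/.cloudscraper/cookies/.

```python
import time


class CookieCache:
    """Hypothetical in-memory TTL cache for clearance cookies, keyed by domain."""

    def __init__(self, ttl=1800, clock=time.time):
        self.ttl = ttl          # seconds a stored cookie set stays valid
        self.clock = clock      # injectable clock, handy for testing
        self._store = {}        # domain -> (saved_at, cookies)

    def save(self, domain, cookies):
        self._store[domain] = (self.clock(), dict(cookies))

    def load(self, domain):
        """Return cached cookies, or None if missing or older than the TTL."""
        entry = self._store.get(domain)
        if entry is None:
            return None
        saved_at, cookies = entry
        if self.clock() - saved_at > self.ttl:
            del self._store[domain]   # expired: force a fresh challenge solve
            return None
        return cookies
```

A caller would check `load()` before each request and only run a bypass when it returns None.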
2. Hybrid Captcha Solver
- Tries AI OCR → AI Object Detection → 2Captcha in sequence
- Automatic fallback on failure
- 3-5x higher solve rate vs single solver
scraper = cloudscraper.create_scraper(
captcha={
'provider': 'hybrid',
'primary': 'ai_ocr',
'fallbacks': ['ai_obj_det', '2captcha'],
'2captcha': {'api_key': 'YOUR_KEY'}
}
)
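The sequential fallback described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the library's code: `solve_with_fallbacks` and the three stand-in solvers are hypothetical, and a solver is assumed to signal failure by raising or returning None.

```python
def solve_with_fallbacks(challenge, solvers):
    """Try each (name, solver) pair in order; return the first successful answer."""
    errors = {}
    for name, solver in solvers:
        try:
            answer = solver(challenge)
        except Exception as exc:      # any solver error triggers the next fallback
            errors[name] = str(exc)
            continue
        if answer is not None:
            return name, answer
        errors[name] = "no answer"
    raise RuntimeError(f"all solvers failed: {errors}")


# Stand-in solvers for illustration only.
def ai_ocr(challenge):
    raise ValueError("low confidence")


def ai_obj_det(challenge):
    return None


def paid_service(challenge):
    return "solved:" + challenge


winner = solve_with_fallbacks(
    "captcha-123",
    [("ai_ocr", ai_ocr), ("ai_obj_det", ai_obj_det), ("2captcha", paid_service)],
)
print(winner)
```

Ordering the cheap local solvers first means the paid service is only billed when they fail.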
3. Browser Automation Helper
- Uses Playwright to launch real browser when all else fails
- Ultimate fallback with 99% success rate
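import cloudscraper
from cloudscraper.browser_helper import create_browser_helper
url = 'https://protected-site.com'  # placeholder target
scraper = cloudscraper.create_scraper()
browser = create_browser_helper(headless=False)
cookies = browser.solve_challenge_and_get_cookies(url)
# Hand the browser's cookies to the lightweight scraper session
scraper.cookies.update(cookies)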
4. Enhanced Human Behavior Simulation
- Content-aware delays (text vs images vs API)
- Mouse movement simulation
- Fingerprint resistance
Phase 2: Advanced Strategies
5. Circuit Breaker Pattern
- Prevents infinite retry loops
- Opens after 3 consecutive failures (configurable)
- Auto-retry after timeout
# Enabled by default!
scraper = cloudscraper.create_scraper(
enable_circuit_breaker=True,
circuit_failure_threshold=3,
circuit_timeout=60
)
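For readers unfamiliar with the pattern, here is a minimal, generic circuit breaker. It is a sketch of the technique, not the library's internals: the class and method names are hypothetical, and only the two parameters (failure threshold and timeout) mirror the options shown above.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a timeout."""

    def __init__(self, failure_threshold=3, timeout=60, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Once the timeout has elapsed, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()   # (re)open and restart the timeout
```

A request loop would call `allow_request()` before each attempt and report the outcome with `record_success()` or `record_failure()`.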
6. Session Pool (Multi-Fingerprint Distribution)
- Maintains pool of 3-10 scraper instances
- Each with unique browser fingerprint
- Round-robin / random / least-used rotation
from cloudscraper.session_pool import SessionPool
pool = SessionPool(pool_size=5, rotation_strategy='round_robin')
resp = pool.get('https://protected-site.com')
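The three rotation strategies can be illustrated with a small generic pool. This sketch is an assumption about how such rotation might work, not the `SessionPool` source: `RotatingPool` and `acquire` are hypothetical names, and plain strings stand in for scraper instances.

```python
import itertools
import random


class RotatingPool:
    """Hypothetical pool that hands out members by a named rotation strategy."""

    def __init__(self, members, rotation_strategy="round_robin", rng=None):
        if not members:
            raise ValueError("pool needs at least one member")
        self.members = list(members)
        self.strategy = rotation_strategy
        self.use_counts = {i: 0 for i in range(len(self.members))}
        self._cycle = itertools.cycle(range(len(self.members)))
        self._rng = rng or random.Random()

    def acquire(self):
        if self.strategy == "round_robin":
            index = next(self._cycle)
        elif self.strategy == "random":
            index = self._rng.randrange(len(self.members))
        elif self.strategy == "least_used":
            # min() returns the first (lowest) index among ties.
            index = min(self.use_counts, key=self.use_counts.get)
        else:
            raise ValueError(f"unknown strategy: {self.strategy}")
        self.use_counts[index] += 1
        return self.members[index]
```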
7. Smart Rate Limiter
- Adaptive per-domain delays
- Learns from 429/503 responses
- Burst prevention
from cloudscraper.rate_limiter import SmartRateLimiter
limiter = SmartRateLimiter(default_delay=1.0, burst_limit=10)
domain = 'example.com'  # placeholder domain to throttle
limiter.wait_if_needed(domain)
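One plausible way to "learn" from 429/503 responses is multiplicative backoff with gradual recovery. The sketch below assumes that approach; `AdaptiveDelay`, its parameters, and the backoff and recovery factors are hypothetical and are not taken from `SmartRateLimiter`.

```python
from collections import defaultdict


class AdaptiveDelay:
    """Hypothetical per-domain delay that grows on 429/503 and decays on success."""

    def __init__(self, default_delay=1.0, max_delay=60.0, backoff=2.0, recovery=0.9):
        self.default_delay = default_delay
        self.max_delay = max_delay
        self.backoff = backoff      # multiplier applied after a throttling response
        self.recovery = recovery    # multiplier applied after a normal response
        self._delays = defaultdict(lambda: default_delay)

    def delay_for(self, domain):
        return self._delays[domain]

    def record_response(self, domain, status_code):
        current = self._delays[domain]
        if status_code in (429, 503):
            self._delays[domain] = min(current * self.backoff, self.max_delay)
        else:
            # Ease back toward the default, never below it.
            self._delays[domain] = max(current * self.recovery, self.default_delay)
```

The caller would sleep for `delay_for(domain)` seconds before each request and feed every status code back through `record_response()`.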
8. TLS Fingerprint Rotator
- 6+ real browser JA3 signatures (Chrome, Firefox, Safari, Edge)
- Auto-rotation every N requests
from cloudscraper.tls_rotator import TLSFingerprintRotator
rotator = TLSFingerprintRotator(rotation_interval=10)
fp = rotator.get_fingerprint() # chrome_120, firefox_122, etc.
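"Auto-rotation every N requests" boils down to simple integer arithmetic over a list of profiles. The sketch below shows only that scheduling logic under hypothetical names (`EveryNRotator`, `get`); actually presenting a different TLS fingerprint requires a TLS stack that supports it, which this example does not attempt.

```python
class EveryNRotator:
    """Hypothetical rotator: returns the same item N times, then moves to the next."""

    def __init__(self, items, rotation_interval=10):
        if not items or rotation_interval < 1:
            raise ValueError("need at least one item and a positive interval")
        self.items = list(items)
        self.rotation_interval = rotation_interval
        self._served = 0   # total number of items handed out so far

    def get(self):
        index = (self._served // self.rotation_interval) % len(self.items)
        self._served += 1
        return self.items[index]
```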
9. Challenge Prediction System (ML-based)
- Learns which domains use which challenges
- Auto-configuration based on history
- SQLite storage: ~/.cloudscraper/challenges.db
from cloudscraper.challenge_predictor import ChallengePredictor
predictor = ChallengePredictor()
predicted = predictor.predict_challenge('example.com')
config = predictor.get_recommended_config('example.com')
scraper = cloudscraper.create_scraper(**config)
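To make the "learn which domains use which challenges" idea concrete, here is a deliberately simple stand-in: a frequency count stored in SQLite. It is not the library's predictor and involves no machine learning; `ChallengeHistory`, the table layout, and the challenge labels are all hypothetical.

```python
import sqlite3


class ChallengeHistory:
    """Hypothetical history store: predicts the most frequently seen challenge per domain."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen (domain TEXT, challenge TEXT)"
        )

    def record(self, domain, challenge):
        self.db.execute("INSERT INTO seen VALUES (?, ?)", (domain, challenge))
        self.db.commit()

    def predict(self, domain):
        """Return the most common challenge for the domain, or None if unseen."""
        row = self.db.execute(
            "SELECT challenge FROM seen WHERE domain = ? "
            "GROUP BY challenge ORDER BY COUNT(*) DESC, challenge LIMIT 1",
            (domain,),
        ).fetchone()
        return row[0] if row else None
```

Passing a file path instead of `":memory:"` would persist the history between runs.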
10. Enhanced Timing (from Phase 1)
- Content-type aware delays
- Adaptive reading time calculation
Success Rate Comparison
| Configuration | Success Rate | Speed | Use Case |
|---|---|---|---|
| Default (V1 + Cookies + Circuit Breaker) | 70-80% | Fast | Most sites |
| + Hybrid Solver | 85-95% | Medium | Sites with captchas |
| + Session Pool | 90-95% | Medium | Pattern detection |
| + Browser Fallback | 99%+ | Slow | Hardest sites |
Documentation
See ENHANCED_FEATURES.md for detailed documentation on all bypass strategies.
Support This Project
If you find this library useful, consider supporting its development.
Installation
Note: This is a maintained fork of the original cloudscraper library.
You can use this version (ai-cloudscraper) as a drop-in replacement while waiting for updates to the original library, or keep it as your primary driver; it will be updated regularly with the latest anti-detection techniques.
# Install maintained version (Recommended)
pip install ai-cloudscraper
# Install with AI solvers (Phase 1)
pip install ai-cloudscraper[ai]
# Install with browser automation (Phase 1)
pip install ai-cloudscraper[browser]
# Or install from source (Development)
pip install -e .
Quick Start
Basic Usage
import cloudscraper
# Create a CloudScraper instance (cookie persistence + circuit breaker enabled by default)
scraper = cloudscraper.create_scraper()
# Use it like a regular requests session
response = scraper.get("https://protected-site.com")
print(response.text)
Using Phase 1 & 2 Features
import cloudscraper
from cloudscraper.session_pool import SessionPool
from cloudscraper.challenge_predictor import ChallengePredictor
# Option 1: Default (Recommended for most sites)
scraper = cloudscraper.create_scraper()
resp = scraper.get('https://protected-site.com')
# Option 2: With hybrid solver
scraper = cloudscraper.create_scraper(
captcha={
'provider': 'hybrid',
'fallbacks': ['ai_ocr', '2captcha'],
'2captcha': {'api_key': 'YOUR_KEY'}
}
)
# Option 3: Session pool for maximum stealth
pool = SessionPool(pool_size=5, rotation_strategy='round_robin')
resp = pool.get('https://protected-site.com')
# Option 4: Challenge predictor for smart configuration
predictor = ChallengePredictor()
config = predictor.get_recommended_config('target-domain.com')
scraper = cloudscraper.create_scraper(**config)
How It Works
Cloudflare's anti-bot protection works by presenting JavaScript challenges that must be solved before accessing the protected content. cloudscraper:
- Detects Cloudflare challenges automatically
- Solves JavaScript challenges using embedded interpreters
- Maintains session state and cookies
- Returns the protected content seamlessly
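The first step, detection, can be approximated with a simple heuristic on the response. The sketch below is an assumption for illustration, not the library's detector: the function name, the status codes, and the body markers are hypothetical choices, and real challenge pages vary.

```python
import re

# Illustrative markers only; actual challenge pages change over time.
CHALLENGE_MARKERS = re.compile(
    r"cf-chl|challenge-platform|Just a moment", re.IGNORECASE
)


def looks_like_challenge(status_code, headers, body):
    """Heuristic: a Cloudflare-served error status whose body carries a challenge marker."""
    # Note: a plain dict lookup is case-sensitive, unlike real response headers.
    server = headers.get("Server", "").lower()
    return (
        status_code in (403, 429, 503)
        and server.startswith("cloudflare")
        and bool(CHALLENGE_MARKERS.search(body))
    )
```

Requiring all three signals keeps ordinary 403 or 503 pages from other servers out of the solving path.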
Dependencies
- Python 3.8+
- requests >= 2.32.0
- requests_toolbelt >= 1.0.0
- js2py >= 0.74 (default JavaScript interpreter)
- Additional dependencies listed in requirements.txt
Optional Dependencies
Phase 1 AI Solvers:
pip install ddddocr ultralytics pillow
Phase 1 Browser Automation:
pip install playwright
playwright install chromium
Phase 2 features require NO additional dependencies - everything is included!
JavaScript Interpreters
cloudscraper supports multiple JavaScript interpreters:
- js2py (default) - Pure Python implementation
- nodejs - Requires Node.js installation
- native - Built-in Python solver
Basic Configuration
Browser Selection
# Use Chrome fingerprint
scraper = cloudscraper.create_scraper(browser='chrome')
# Use Firefox fingerprint
scraper = cloudscraper.create_scraper(browser='firefox')
Proxy Support
# Single proxy
scraper = cloudscraper.create_scraper()
scraper.proxies = {
'http': 'http://proxy:8080',
'https': 'http://proxy:8080'
}
CAPTCHA Solver Integration
scraper = cloudscraper.create_scraper(
captcha={
'provider': '2captcha',
'api_key': 'your_api_key'
}
)
Supported CAPTCHA providers:
- 2captcha
- anticaptcha
- CapSolver
- CapMonster Cloud

```python
# Try maximum stealth configuration
scraper = cloudscraper.create_scraper(
    enable_tls_fingerprinting=True,
    enable_anti_detection=True,
    enable_enhanced_spoofing=True,
    spoofing_consistency_level='high',
    enable_adaptive_timing=True,
    behavior_profile='research',  # Slowest, most careful
    stealth_options={
        'min_delay': 3.0,
        'max_delay': 10.0,
        'human_like_delays': True
    }
)

# Enable maximum stealth mode
scraper.enable_maximum_stealth()
```
**Challenge detection not working?**
```python
# Add custom challenge patterns
scraper.intelligent_challenge_system.add_custom_pattern(
    domain='problem-site.com',
    pattern_name='Custom Challenge',
    patterns=[r'custom.+challenge.+text'],
    challenge_type='custom',
    response_strategy='delay_retry'
)
```
**Want to optimize for specific domains?**
```python
# Make several learning requests first
for i in range(5):
    try:
        response = scraper.get('https://target-site.com/test')
    except Exception:
        pass

# Then optimize for the domain
scraper.optimize_for_domain('target-site.com')
```
**Check enhanced system status:**
```python
stats = scraper.get_enhanced_statistics()
for system, status in stats.items():
    print(f"{system}: {status}")

# Get ML optimization report
if hasattr(scraper, 'ml_optimizer'):
    report = scraper.ml_optimizer.get_optimization_report()
    print(f"Success rate: {report.get('global_success_rate', 0):.2%}")
```
### Common Issues
**Challenge solving fails:**
```python
# Try different interpreter
scraper = cloudscraper.create_scraper(interpreter='nodejs')

# Increase delay
scraper = cloudscraper.create_scraper(delay=10)

# Enable debug mode
scraper = cloudscraper.create_scraper(debug=True)
```
**403 Forbidden errors:**
```python
# Enable stealth mode
scraper = cloudscraper.create_scraper(
    enable_stealth=True,
    auto_refresh_on_403=True
)
```
**Slow performance:**
```python
# Use faster interpreter
scraper = cloudscraper.create_scraper(interpreter='native')
```
### Debug Mode
Enable debug mode to see what's happening:
```python
scraper = cloudscraper.create_scraper(debug=True)
response = scraper.get("https://example.com")
```
Debug output shows:
- Challenge type detected
- JavaScript interpreter used
- Challenge solving process
- Final response status
## Enhanced Configuration Options
### **Enhanced Bypass Parameters** (NEW)
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enable_tls_fingerprinting` | boolean | True | Enable advanced TLS fingerprinting |
| `enable_tls_rotation` | boolean | True | Rotate TLS fingerprints automatically |
| `enable_anti_detection` | boolean | True | Enable traffic pattern obfuscation |
| `enable_enhanced_spoofing` | boolean | True | Enable Canvas/WebGL spoofing |
| `spoofing_consistency_level` | string | 'medium' | Spoofing consistency ('low', 'medium', 'high') |
| `enable_intelligent_challenges` | boolean | True | Enable AI challenge detection |
| `enable_adaptive_timing` | boolean | True | Enable human behavior simulation |
| `behavior_profile` | string | 'casual' | Timing profile ('casual', 'focused', 'research', 'mobile') |
| `enable_ml_optimization` | boolean | True | Enable ML-based bypass optimization |
| `enable_enhanced_error_handling` | boolean | True | Enable intelligent error recovery |
### **Enhanced Stealth Options**
```python
stealth_options = {
    'min_delay': 1.0,             # Minimum delay between requests
    'max_delay': 4.0,             # Maximum delay between requests
    'human_like_delays': True,    # Use human-like delay patterns
    'randomize_headers': True,    # Randomize request headers
    'browser_quirks': True,       # Enable browser-specific quirks
    'simulate_viewport': True,    # Simulate viewport changes
    'behavioral_patterns': True   # Use behavioral pattern simulation
}
```
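As a rough illustration of what a "human-like" delay between `min_delay` and `max_delay` could look like, the sketch below draws from a triangular distribution skewed toward short pauses. This is an assumed model chosen for the example, not the distribution the library uses, and `human_like_delay` is a hypothetical helper.

```python
import random


def human_like_delay(min_delay=1.0, max_delay=4.0, rng=random):
    """Pick a delay in [min_delay, max_delay], biased toward the shorter end."""
    # A triangular distribution with its mode at min_delay makes short pauses
    # common and long pauses occasional, unlike a flat uniform draw.
    return rng.triangular(min_delay, max_delay, min_delay)
```

A caller would pass the result to `time.sleep()` before each request; supplying a seeded `random.Random` as `rng` makes the sequence reproducible.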
### **Complete Enhanced Configuration Example**
```python
import cloudscraper

# Ultimate bypass configuration
scraper = cloudscraper.create_scraper(
    # Basic settings
    debug=True,
    browser='chrome',
    interpreter='js2py',

    # Enhanced bypass features
    enable_tls_fingerprinting=True,
    enable_tls_rotation=True,
    enable_anti_detection=True,
    enable_enhanced_spoofing=True,
    spoofing_consistency_level='medium',
    enable_intelligent_challenges=True,
    enable_adaptive_timing=True,
    behavior_profile='focused',
    enable_ml_optimization=True,
    enable_enhanced_error_handling=True,

    # Stealth mode
    enable_stealth=True,
    stealth_options={
        'min_delay': 1.5,
        'max_delay': 4.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True,
        'simulate_viewport': True,
        'behavioral_patterns': True
    },

    # Session management
    session_refresh_interval=3600,
    auto_refresh_on_403=True,
    max_403_retries=3,

    # Proxy rotation
    rotating_proxies=[
        'http://proxy1:8080',
        'http://proxy2:8080',
        'http://proxy3:8080'
    ],
    proxy_options={
        'rotation_strategy': 'smart',
        'ban_time': 600
    },

    # CAPTCHA solving
    captcha={
        'provider': '2captcha',
        'api_key': 'your_api_key'
    }
)

# Monitor bypass performance
stats = scraper.get_enhanced_statistics()
print(f"Active bypass systems: {len(stats)}")
```
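The `ban_time` option in the proxy settings suggests that a failing proxy is sidelined for a fixed period. A minimal sketch of that behavior follows; it uses plain round-robin rather than the 'smart' strategy, whose logic is not described here, and `ProxyRotator` with its methods is a hypothetical name.

```python
import time


class ProxyRotator:
    """Hypothetical rotator that skips proxies banned within the last `ban_time` seconds."""

    def __init__(self, proxies, ban_time=600, clock=time.monotonic):
        self.proxies = list(proxies)
        self.ban_time = ban_time
        self.clock = clock
        self._banned_until = {}   # proxy -> time when the ban expires
        self._next = 0

    def ban(self, proxy):
        self._banned_until[proxy] = self.clock() + self.ban_time

    def get(self):
        """Return the next non-banned proxy in round-robin order, or None."""
        now = self.clock()
        for offset in range(len(self.proxies)):
            index = (self._next + offset) % len(self.proxies)
            proxy = self.proxies[index]
            if self._banned_until.get(proxy, 0) <= now:
                self._next = index + 1
                return proxy
        return None
```

Returning None when every proxy is banned lets the caller decide whether to wait or fail fast.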
### **Behavior Profiles**
| Profile | Description | Use Case |
|---------|-------------|----------|
| `casual` | Relaxed browsing patterns | General web scraping |
| `focused` | Efficient but careful | Targeted data collection |
| `research` | Slow, methodical access | Academic or detailed research |
| `mobile` | Mobile device simulation | Mobile-optimized sites |
### **Spoofing Consistency Levels**
| Level | Fingerprint Stability | Detection Resistance | Performance |
|-------|----------------------|---------------------|-------------|
| `low` | Minimal changes | Good | Fastest |
| `medium` | Moderate variations | Excellent | Balanced |
| `high` | Significant obfuscation | Maximum | Slower |
## Configuration Options
### Common Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `debug` | boolean | False | Enable debug output |
| `delay` | float | auto | Override challenge delay |
| `interpreter` | string | 'js2py' | JavaScript interpreter |
| `browser` | string/dict | None | Browser fingerprint |
| `enable_stealth` | boolean | True | Enable stealth mode |
| `allow_brotli` | boolean | True | Enable Brotli compression |
### Challenge Control
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `disableCloudflareV1` | boolean | False | Disable v1 challenges |
| `disableCloudflareV2` | boolean | False | Disable v2 challenges |
| `disableCloudflareV3` | boolean | False | Disable v3 challenges |
| `disableTurnstile` | boolean | False | Disable Turnstile |
### Session Management
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `session_refresh_interval` | int | 3600 | Session refresh time (seconds) |
| `auto_refresh_on_403` | boolean | True | Auto-refresh on 403 errors |
| `max_403_retries` | int | 3 | Max 403 retry attempts |
### Example Configuration
```python
scraper = cloudscraper.create_scraper(
    debug=True,
    delay=5,
    interpreter='js2py',
    browser='chrome',
    enable_stealth=True,
    stealth_options={
        'min_delay': 2.0,
        'max_delay': 5.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True
    }
)
```
## Utility Functions
### Get Tokens
Extract Cloudflare cookies for use in other applications:
```python
import cloudscraper

# Get cookies as dictionary
tokens, user_agent = cloudscraper.get_tokens("https://example.com")
print(tokens)
# {'cf_clearance': '...', '__cfduid': '...'}

# Get cookies as string
cookie_string, user_agent = cloudscraper.get_cookie_string("https://example.com")
print(cookie_string)
# "cf_clearance=...; __cfduid=..."
```
### Integration with Other Tools
Use cloudscraper tokens with curl or other HTTP clients:
```python
import subprocess
import cloudscraper

cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')

result = subprocess.check_output([
    'curl',
    '--cookie', cookie_string,
    '-A', user_agent,
    'https://example.com'
])
```
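The same handoff works with any HTTP client, as long as the cookie and the User-Agent are sent together. Below is a sketch using only the standard library; `build_request` is a hypothetical helper and the cookie and User-Agent values are placeholders that would normally come from `cloudscraper.get_cookie_string()`.

```python
import urllib.request


def build_request(url, cookie_string, user_agent):
    """Build a urllib request that reuses a clearance cookie and its User-Agent."""
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie_string, "User-Agent": user_agent},
    )


# Placeholder values for illustration.
req = build_request("https://example.com", "cf_clearance=...", "Mozilla/5.0")
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```

The User-Agent must match the one used when the tokens were obtained, otherwise the clearance cookie is likely to be rejected.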
## License
MIT License. See LICENSE file for details.
## **Enhanced Features Documentation**
For detailed documentation about the enhanced bypass capabilities, see:
- **[ENHANCED_FEATURES.md](ENHANCED_FEATURES.md)** - Complete technical documentation
- **[examples/enhanced_bypass_demo.py](examples/enhanced_bypass_demo.py)** - Comprehensive usage examples
- **[tests/test_enhanced_features.py](tests/test_enhanced_features.py)** - Feature validation tests
### **Quick Feature Reference**
| Feature | Module | Description |
|---------|--------|--------------|
| TLS Fingerprinting | `tls_fingerprinting.py` | JA3 fingerprint rotation |
| Anti-Detection | `anti_detection.py` | Traffic pattern obfuscation |
| Enhanced Spoofing | `enhanced_spoofing.py` | Canvas/WebGL fingerprint spoofing |
| Challenge Detection | `intelligent_challenge_system.py` | AI-powered challenge recognition |
| Adaptive Timing | `adaptive_timing.py` | Human behavior simulation |
| ML Optimization | `ml_optimization.py` | Machine learning bypass optimization |
| Error Handling | `enhanced_error_handling.py` | Intelligent error recovery |
---
**Enhanced CloudScraper** - Bypass the majority of Cloudflare protections with cutting-edge anti-detection technology!
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Disclaimer
This tool is for educational and testing purposes only. Always respect website terms of service and use responsibly.
