Bright Data (formerly Luminati Networks) is one of the largest proxy and web data collection platforms in the world. With over 72 million residential IPs across 195 countries, it's the enterprise choice for serious data collection. This guide walks you through setting up and using Bright Data effectively.
What is Bright Data?
Bright Data offers several products:
- Residential Proxies — real user IPs from ISPs worldwide
- Datacenter Proxies — fast, shared or dedicated datacenter IPs
- ISP Proxies — static residential IPs
- Mobile Proxies — real mobile carrier IPs
- Web Unlocker — automated proxy + CAPTCHA solving
- Scraping Browser — headless browser with built-in unblocking
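All of these products are reached through the same gateway; the zone segment of the proxy username selects which one you use. A minimal sketch of that naming scheme (the customer ID and zone names below are placeholders — they match whatever you named your zones in the dashboard):

```python
def bright_data_username(customer_id: str, zone: str) -> str:
    """Build the proxy username that routes traffic through a given zone."""
    return f"brd-customer-{customer_id}-zone-{zone}"

# Hypothetical customer ID and zone names, for illustration only
for zone in ["residential", "datacenter", "unblocker"]:
    print(bright_data_username("abc123", zone))
```

The same host and port serve every product; only this username changes, which is why the snippets later in this guide differ mainly in how they build it.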
Getting Started
First, sign up at Bright Data's website and create a proxy zone:
1. Create an account (free trial available)
2. Navigate to the Proxy Manager
3. Create a zone (residential, datacenter, or ISP)
4. Note your zone credentials (username, password, host, port)
Basic Proxy Setup in Python
```python
import requests

# Bright Data proxy configuration
BD_HOST = "brd.superproxy.io"
BD_PORT = 22225
BD_USER = "brd-customer-YOUR_ID-zone-YOUR_ZONE"
BD_PASS = "YOUR_PASSWORD"

proxies = {
    "http": f"http://{BD_USER}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
    "https": f"http://{BD_USER}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
}

response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    verify=False,  # skips TLS verification; install Bright Data's CA certificate instead for production use
)
print(f"Your proxy IP: {response.json()['origin']}")
```
Country-Targeted Requests
One of Bright Data's strongest features is geo-targeting:
```python
def make_geo_request(url, country_code):
    """Make a request through a specific country's IP."""
    username = f"brd-customer-YOUR_ID-zone-residential-country-{country_code}"
    proxies = {
        "http": f"http://{username}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
        "https": f"http://{username}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
    }
    return requests.get(url, proxies=proxies, verify=False)

# Check prices from different countries
countries = ["us", "gb", "de", "jp", "br"]
for cc in countries:
    resp = make_geo_request("https://store.steampowered.com/app/1091500/", cc)
    print(f"{cc.upper()}: Status {resp.status_code}, {len(resp.text)} bytes")
```
Using the Web Unlocker
The Web Unlocker automatically handles CAPTCHAs, JavaScript rendering, and fingerprinting:
```python
def unlocker_request(url):
    """Use Bright Data Web Unlocker for protected sites."""
    username = "brd-customer-YOUR_ID-zone-unblocker"
    proxies = {
        "http": f"http://{username}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
        "https": f"http://{username}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
    }
    return requests.get(
        url,
        proxies=proxies,
        verify=False,
        timeout=60,  # the Unlocker may take longer while it solves challenges
    )

# Scrape a protected site
response = unlocker_request("https://www.amazon.com/dp/B09V3KXJPB")
print(f"Status: {response.status_code}")
print(f"Content length: {len(response.text)}")
```
Scraping Browser (Headless)
For sites requiring full browser rendering:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def setup_bright_browser():
    """Connect to Bright Data's Scraping Browser via Selenium."""
    SBR_WEBDRIVER = "https://brd-customer-YOUR_ID-zone-scraping_browser:YOUR_PASS@brd.superproxy.io:9515"
    options = Options()
    options.add_argument("--headless")
    return webdriver.Remote(command_executor=SBR_WEBDRIVER, options=options)

def scrape_with_browser(url):
    driver = setup_bright_browser()
    try:
        driver.get(url)
        return {"title": driver.title, "html_length": len(driver.page_source)}
    finally:
        driver.quit()

result = scrape_with_browser("https://www.zillow.com/homedetails/123")
print(result)
```
Session Management
```python
import random
import time

import requests

class BrightDataSession:
    def __init__(self, customer_id, zone, password):
        self.customer_id = customer_id
        self.zone = zone
        self.password = password
        self.session_id = None

    def new_session(self):
        """Create a sticky session (same IP for multiple requests)."""
        self.session_id = random.randint(10000, 99999)
        return self

    def get_proxies(self, country=None):
        username = f"brd-customer-{self.customer_id}-zone-{self.zone}"
        if country:
            username += f"-country-{country}"
        if self.session_id:
            username += f"-session-{self.session_id}"
        return {
            "http": f"http://{username}:{self.password}@brd.superproxy.io:22225",
            "https": f"http://{username}:{self.password}@brd.superproxy.io:22225",
        }

    def get(self, url, **kwargs):
        return requests.get(url, proxies=self.get_proxies(), verify=False, **kwargs)

# Sticky session example: all three requests should report the same IP
session = BrightDataSession("YOUR_ID", "residential", "YOUR_PASS")
session.new_session()
for i in range(3):
    resp = session.get("https://httpbin.org/ip")
    print(f"Request {i+1}: {resp.json()['origin']}")
    time.sleep(1)
```
Cost Optimization Tips
Bright Data charges per GB transferred. Here's how to minimize costs:
```python
def optimized_request(url, session):
    """Cost-optimized request with minimal data transfer."""
    headers = {
        "Accept": "text/html",
        "Accept-Encoding": "gzip, deflate",  # compressed transfer means fewer billed bytes
    }
    return session.get(url, headers=headers, timeout=30)
```
Cost-saving strategies:
- Use datacenter proxies when residential isn't needed (10x cheaper)
- Cache responses aggressively
- Only download what you need (set appropriate Accept headers)
- Use the API instead of scraping when available
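Caching in particular pays off quickly, since every byte that crosses the proxy is billed. A minimal sketch of a disk-backed cache (the directory name and TTL are illustrative choices, not part of Bright Data's API):

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path("bd_cache")  # illustrative cache location
CACHE_TTL = 3600              # seconds before an entry is considered stale

def cached_get(url, fetch, ttl=CACHE_TTL):
    """Return a cached body for url, calling fetch(url) only on a miss."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        entry = json.loads(path.read_text())
        if time.time() - entry["fetched_at"] < ttl:
            return entry["body"]  # cache hit: zero proxy bandwidth used
    body = fetch(url)             # cache miss: this call goes through the proxy
    path.write_text(json.dumps({"fetched_at": time.time(), "body": body}))
    return body
```

You would pass something like `lambda u: session.get(u).text` as `fetch`; repeated requests for the same URL within the TTL then cost nothing.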
Bright Data vs Alternatives
| Feature | Bright Data | ScraperAPI | ThorData |
|---|---|---|---|
| Residential IPs | 72M+ | Included | Large pool |
| Pricing | Per GB | Per request | Per GB |
| CAPTCHA solving | Yes | Yes | Yes |
| JS rendering | Yes | Yes | Yes |
| Best for | Enterprise | Simple APIs | Residential |
For simpler use cases, ScraperAPI offers a pay-per-request model that makes costs easier to predict. ThorData is a strong option for residential proxy needs at competitive rates. And ScrapeOps provides monitoring dashboards regardless of which proxy you choose.
Error Handling
```python
def robust_request(url, session, max_retries=3):
    """Retry with session rotation on 403 and backoff on 429."""
    for attempt in range(max_retries):
        try:
            response = session.get(url, timeout=30)
            if response.status_code == 200:
                return response
            elif response.status_code == 403:
                # Blocked: rotate to a fresh sticky session (new exit IP)
                session.new_session()
                print(f"Rotating session (attempt {attempt + 1})")
            elif response.status_code == 429:
                # Rate limited: back off progressively
                time.sleep(5 * (attempt + 1))
        except requests.exceptions.Timeout:
            print(f"Timeout on attempt {attempt + 1}")
            time.sleep(2)
    return None
```
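The fixed backoff steps above work, but exponential backoff with jitter spreads retries better when many workers hit the same rate limit. A small, proxy-agnostic sketch (nothing here is Bright Data specific):

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Upper bounds grow 1s, 2s, 4s, 8s, ... and are capped at 60s
for attempt in range(7):
    print(f"attempt {attempt}: sleep up to {min(60.0, 2.0 ** attempt):.0f}s")
```

Replacing `time.sleep(5 * (attempt + 1))` with `time.sleep(backoff_delay(attempt))` keeps a fleet of scrapers from retrying in lockstep.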
Conclusion
Bright Data is the most comprehensive proxy platform available, with options ranging from simple HTTP proxies to full scraping browsers. The learning curve is steeper than simpler services, but the flexibility and scale are unmatched for enterprise data collection. Start with the Web Unlocker for the easiest experience, then optimize with specific proxy types as your needs grow.
Happy scraping!