Bright Data (formerly Luminati Networks) is one of the largest proxy and web data collection platforms in the world. With over 72 million residential IPs across 195 countries, it's the enterprise choice for serious data collection. This guide walks you through setting up and using Bright Data effectively.
What is Bright Data?
Bright Data offers several products:
- Residential Proxies — real user IPs from ISPs worldwide
- Datacenter Proxies — fast, shared or dedicated datacenter IPs
- ISP Proxies — static residential IPs
- Mobile Proxies — real mobile carrier IPs
- Web Unlocker — automated proxy + CAPTCHA solving
- Scraping Browser — headless browser with built-in unblocking
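All of these products are reached through the same gateway; the zone segment of the proxy username selects which one you use. A minimal sketch of that naming scheme (the customer ID and zone names below are placeholders — they match whatever you named your zones in the dashboard):

```python
def bright_data_username(customer_id: str, zone: str) -> str:
    """Build the proxy username that routes traffic through a given zone."""
    return f"brd-customer-{customer_id}-zone-{zone}"

# Hypothetical customer ID and zone names, for illustration only
for zone in ["residential", "datacenter", "unblocker"]:
    print(bright_data_username("abc123", zone))
```

The same host and port serve every product; only this username changes, which is why the snippets later in this guide differ mainly in how they build it.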
Getting Started
First, sign up at Bright Data's website and create a proxy zone:
1. Create an account (free trial available)
2. Navigate to the Proxy Manager
3. Create a zone (residential, datacenter, or ISP)
4. Note your zone credentials (username, password, host, port)
Basic Proxy Setup in Python
```python
import requests

# Bright Data proxy configuration
BD_HOST = "brd.superproxy.io"
BD_PORT = 22225
BD_USER = "brd-customer-YOUR_ID-zone-YOUR_ZONE"
BD_PASS = "YOUR_PASSWORD"

proxies = {
    "http": f"http://{BD_USER}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
    "https": f"http://{BD_USER}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
}

response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    verify=False,  # skips TLS verification; install Bright Data's CA certificate instead for production use
)
print(f"Your proxy IP: {response.json()['origin']}")
```
Country-Targeted Requests
One of Bright Data's strongest features is geo-targeting:
```python
def make_geo_request(url, country_code):
    """Make a request through a specific country's IP."""
    username = f"brd-customer-YOUR_ID-zone-residential-country-{country_code}"
    proxies = {
        "http": f"http://{username}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
        "https": f"http://{username}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
    }
    return requests.get(url, proxies=proxies, verify=False)

# Check prices from different countries
countries = ["us", "gb", "de", "jp", "br"]
for cc in countries:
    resp = make_geo_request("https://store.steampowered.com/app/1091500/", cc)
    print(f"{cc.upper()}: Status {resp.status_code}, {len(resp.text)} bytes")
```
Using the Web Unlocker
The Web Unlocker automatically handles CAPTCHAs, JavaScript rendering, and fingerprinting:
```python
def unlocker_request(url):
    """Use Bright Data Web Unlocker for protected sites."""
    username = "brd-customer-YOUR_ID-zone-unblocker"
    proxies = {
        "http": f"http://{username}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
        "https": f"http://{username}:{BD_PASS}@{BD_HOST}:{BD_PORT}",
    }
    return requests.get(
        url,
        proxies=proxies,
        verify=False,
        timeout=60,  # the Unlocker may take longer while it solves challenges
    )

# Scrape a protected site
response = unlocker_request("https://www.amazon.com/dp/B09V3KXJPB")
print(f"Status: {response.status_code}")
print(f"Content length: {len(response.text)}")
```
Scraping Browser (Headless)
For sites requiring full browser rendering:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def setup_bright_browser():
    """Connect to Bright Data's Scraping Browser via Selenium."""
    SBR_WEBDRIVER = "https://brd-customer-YOUR_ID-zone-scraping_browser:YOUR_PASS@brd.superproxy.io:9515"
    options = Options()
    options.add_argument("--headless")
    return webdriver.Remote(command_executor=SBR_WEBDRIVER, options=options)

def scrape_with_browser(url):
    driver = setup_bright_browser()
    try:
        driver.get(url)
        return {"title": driver.title, "html_length": len(driver.page_source)}
    finally:
        driver.quit()

result = scrape_with_browser("https://www.zillow.com/homedetails/123")
print(result)
```
Session Management
```python
import random
import time

import requests

class BrightDataSession:
    def __init__(self, customer_id, zone, password):
        self.customer_id = customer_id
        self.zone = zone
        self.password = password
        self.session_id = None

    def new_session(self):
        """Create a sticky session (same IP for multiple requests)."""
        self.session_id = random.randint(10000, 99999)
        return self

    def get_proxies(self, country=None):
        username = f"brd-customer-{self.customer_id}-zone-{self.zone}"
        if country:
            username += f"-country-{country}"
        if self.session_id:
            username += f"-session-{self.session_id}"
        return {
            "http": f"http://{username}:{self.password}@brd.superproxy.io:22225",
            "https": f"http://{username}:{self.password}@brd.superproxy.io:22225",
        }

    def get(self, url, **kwargs):
        return requests.get(url, proxies=self.get_proxies(), verify=False, **kwargs)

# Sticky session example: all three requests should report the same IP
session = BrightDataSession("YOUR_ID", "residential", "YOUR_PASS")
session.new_session()
for i in range(3):
    resp = session.get("https://httpbin.org/ip")
    print(f"Request {i+1}: {resp.json()['origin']}")
    time.sleep(1)
```
Cost Optimization Tips
Bright Data charges per GB transferred. Here's how to minimize costs:
```python
def optimized_request(url, session):
    """Cost-optimized request with minimal data transfer."""
    headers = {
        "Accept": "text/html",
        "Accept-Encoding": "gzip, deflate",  # compressed transfer means fewer billed bytes
    }
    return session.get(url, headers=headers, timeout=30)
```
Cost-saving strategies:
- Use datacenter proxies when residential isn't needed (10x cheaper)
- Cache responses aggressively
- Only download what you need (set appropriate Accept headers)
- Use the API instead of scraping when available
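Caching in particular pays off quickly, since every byte that crosses the proxy is billed. A minimal sketch of a disk-backed cache (the directory name and TTL are illustrative choices, not part of Bright Data's API):

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path("bd_cache")  # illustrative cache location
CACHE_TTL = 3600              # seconds before an entry is considered stale

def cached_get(url, fetch, ttl=CACHE_TTL):
    """Return a cached body for url, calling fetch(url) only on a miss."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        entry = json.loads(path.read_text())
        if time.time() - entry["fetched_at"] < ttl:
            return entry["body"]  # cache hit: zero proxy bandwidth used
    body = fetch(url)             # cache miss: this call goes through the proxy
    path.write_text(json.dumps({"fetched_at": time.time(), "body": body}))
    return body
```

You would pass something like `lambda u: session.get(u).text` as `fetch`; repeated requests for the same URL within the TTL then cost nothing.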
Bright Data vs Alternatives
| Feature | Bright Data | ScraperAPI | ThorData |
|---|---|---|---|
| Residential IPs | 72M+ | Included | Large pool |
| Pricing | Per GB | Per request | Per GB |
| CAPTCHA solving | Yes | Yes | Yes |
| JS rendering | Yes | Yes | Yes |
| Best for | Enterprise | Simple APIs | Residential |
For simpler use cases, ScraperAPI offers a pay-per-request model that makes costs easier to predict. ThorData is a strong option for residential proxy needs at competitive rates. And ScrapeOps provides monitoring dashboards regardless of which proxy you choose.
Error Handling
```python
def robust_request(url, session, max_retries=3):
    """Retry with session rotation on 403 and backoff on 429."""
    for attempt in range(max_retries):
        try:
            response = session.get(url, timeout=30)
            if response.status_code == 200:
                return response
            elif response.status_code == 403:
                # Blocked: rotate to a fresh sticky session (new exit IP)
                session.new_session()
                print(f"Rotating session (attempt {attempt + 1})")
            elif response.status_code == 429:
                # Rate limited: back off progressively
                time.sleep(5 * (attempt + 1))
        except requests.exceptions.Timeout:
            print(f"Timeout on attempt {attempt + 1}")
            time.sleep(2)
    return None
```
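The fixed backoff steps above work, but exponential backoff with jitter spreads retries better when many workers hit the same rate limit. A small, proxy-agnostic sketch (nothing here is Bright Data specific):

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Upper bounds grow 1s, 2s, 4s, 8s, ... and are capped at 60s
for attempt in range(7):
    print(f"attempt {attempt}: sleep up to {min(60.0, 2.0 ** attempt):.0f}s")
```

Replacing `time.sleep(5 * (attempt + 1))` with `time.sleep(backoff_delay(attempt))` keeps a fleet of scrapers from retrying in lockstep.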
Conclusion
Bright Data is the most comprehensive proxy platform available, with options ranging from simple HTTP proxies to full scraping browsers. The learning curve is steeper than simpler services, but the flexibility and scale are unmatched for enterprise data collection. Start with the Web Unlocker for the easiest experience, then optimize with specific proxy types as your needs grow.
Happy scraping!