Mohammad Waseem

Circumventing IP Bans in Web Scraping with React and Open Source Tools

Web scraping is an essential technique for data gathering, but it often runs into the obstacle of IP bans, especially when targeting popular or protected sites. As a security researcher, I developed an approach using React and open source tools to mitigate IP bans effectively.

Understanding the Challenge

Many websites deploy IP-based rate limiting or other aggressive security measures that block an address once it exceeds a request threshold. A scraper that sends every request from a single IP hits that threshold quickly and gets banned repeatedly, disrupting data collection.
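
To make the threshold idea concrete, here is a minimal sketch of how a server might count requests per IP in a sliding window. The class name, the limit of 5 requests, and the 60-second window are illustrative assumptions, not values taken from any particular site.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-IP limiter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Return True if the request is allowed, False if the IP is over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because the counter is keyed by IP, every request from one address draws on the same budget, which is why a single-IP scraper gets blocked so quickly.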

Strategy Overview

Our goal was to distribute requests across multiple IP addresses dynamically, so that they appear to originate from different users. React provides the frontend that triggers and monitors scraping jobs, while a backend routes the actual requests through a rotating proxy network. Open source tools such as ProxyBroker and Tor handle IP management; ScraperAPI, a commercial service rather than an open source tool, is a hosted alternative.

Building the Solution

Setting up a Proxy Network

We chose Tor for its ease of integration and open source nature. Tor exposes a local SOCKS proxy whose exit IP changes as circuits rotate, while ProxyBroker discovers and verifies a pool of public proxies.

# Install Tor and ProxyBroker
sudo apt-get install tor
pip install proxybroker
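
Once the tor service is running, it listens as a SOCKS proxy, by default on 127.0.0.1:9050. Below is a minimal sketch of sending a request through it, assuming the default port and that requests is installed with SOCKS support (pip install "requests[socks]"). The helper names tor_proxies and fetch_via_tor are hypothetical.

```python
import requests


def tor_proxies(host="127.0.0.1", port=9050):
    """Build a requests-style proxy mapping for a local Tor SOCKS listener.

    The socks5h scheme asks the proxy to resolve hostnames, so DNS lookups
    also go through Tor rather than the local resolver.
    """
    address = f"socks5h://{host}:{port}"
    return {"http": address, "https": address}


def fetch_via_tor(url, timeout=30):
    """Fetch a URL through Tor; requires a running tor service."""
    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    response.raise_for_status()
    return response.text
```

Tor circuits are slower than direct connections, so the timeout here is deliberately generous.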

ProxyBroker, which runs independently of Tor, gathers public proxies from a number of provider lists and checks that they work:

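import asyncio
from proxybroker import Broker

async def show_proxies(queue):
    # Broker.find() puts each working proxy on the queue, then None when done.
    while (proxy := await queue.get()) is not None:
        print(proxy)

async def main():
    queue = asyncio.Queue()
    broker = Broker(queue)
    # Find and verify up to 10 HTTP/HTTPS proxies while printing them.
    await asyncio.gather(
        broker.find(types=['HTTP', 'HTTPS'], limit=10),
        show_proxies(queue),
    )

asyncio.run(main())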

This script prints each proxy that ProxyBroker has found and verified; collecting them into a list instead of printing yields a pool that can be rotated during scraping.

Integrating with React

React does not scrape anything itself. The component only calls our backend API, which performs the scraping and manages proxy rotation.

// React component to trigger scraping requests
import React, { useState } from 'react';

function ScrapeTrigger() {
    const [status, setStatus] = useState('Idle');

    const startScraping = async () => {
        setStatus('In Progress');
        try {
            const response = await fetch('/api/start-scrape');
            if (response.ok) {
                setStatus('Completed');
            } else {
                throw new Error('Error in scraping');
            }
        } catch (err) {
            setStatus('Failed');
        }
    };

    return (
        <div>
            <button onClick={startScraping}>Start Scraping</button>
            <p>Status: {status}</p>
        </div>
    );
}

export default ScrapeTrigger;

This component communicates with our backend to initiate proxy-rotated scraping.
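
The article does not show the backend that answers /api/start-scrape. As a rough sketch only, here is what such an endpoint could look like using Python's standard library. run_scrape is a hypothetical stand-in for the proxy-rotated scraping job, and a real service would more likely use a web framework and run the job in the background.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_scrape():
    """Hypothetical placeholder for the proxy-rotated scraping job."""
    return {"pages_fetched": 0}


class ScrapeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The React component issues a plain GET to this path.
        if self.path != "/api/start-scrape":
            self.send_error(404, "Not found")
            return
        try:
            body, status = json.dumps(run_scrape()).encode(), 200
        except Exception as exc:  # report failure so the UI shows "Failed"
            body, status = json.dumps({"error": str(exc)}).encode(), 500
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port=8000):
    """Create (but do not start) the server; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), ScrapeHandler)
```

Because the handler returns 200 on success and 500 on failure, the response.ok check in the component maps directly onto the Completed and Failed states.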

Backend Proxy Rotation

The backend (Python in this example, though Node.js works equally well) cycles through the proxy list on each request, avoiding consecutive hits from the same IP:

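import itertools
import requests

# Placeholder addresses; replace with proxies from your verified pool.
proxies = [
    {'http': 'http://proxy1:port', 'https': 'http://proxy1:port'},
    {'http': 'http://proxy2:port', 'https': 'http://proxy2:port'},
    # More proxies
]

# Round-robin iterator, so consecutive requests never reuse a proxy
# (as long as the pool has at least two entries).
proxy_cycle = itertools.cycle(proxies)

def get_next_proxy():
    return next(proxy_cycle)

def fetch_data(url):
    proxy = get_next_proxy()
    response = requests.get(url, proxies=proxy, timeout=10)
    response.raise_for_status()  # surface HTTP errors such as 403 or 429
    return response.text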

This method distributes requests and reduces the likelihood of IP bans.
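
Rotation alone does not handle proxies that die or get banned mid-run. One possible extension, sketched here under assumptions, is to retry a failed request on the next proxy and retire a proxy after repeated failures. ProxyPool, max_failures, and the injectable fetch callable are hypothetical names introduced for illustration; in practice fetch would wrap a call such as requests.get(url, proxies=..., timeout=10).

```python
class ProxyPool:
    """Round-robin proxy pool that retires proxies after repeated failures."""

    def __init__(self, proxies, max_failures=2):
        self._proxies = list(proxies)  # proxy URLs as strings
        self._failures = {p: 0 for p in self._proxies}
        self._max_failures = max_failures
        self._index = 0

    def next_proxy(self):
        if not self._proxies:
            raise RuntimeError("no healthy proxies left")
        # Order is approximate round-robin once proxies have been retired.
        proxy = self._proxies[self._index % len(self._proxies)]
        self._index += 1
        return proxy

    def report_failure(self, proxy):
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures and proxy in self._proxies:
            self._proxies.remove(proxy)

    def report_success(self, proxy):
        self._failures[proxy] = 0


def fetch_with_retry(url, pool, fetch, attempts=3):
    """Try up to `attempts` proxies; `fetch(url, proxy)` must raise on failure."""
    last_error = None
    for _ in range(attempts):
        proxy = pool.next_proxy()
        try:
            result = fetch(url, proxy)
        except Exception as exc:
            pool.report_failure(proxy)
            last_error = exc
        else:
            pool.report_success(proxy)
            return result
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

Passing fetch in as a parameter keeps the retry logic independent of the HTTP library and makes it easy to exercise with a fake.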

Additional Tips

  • Implement request throttling to mimic human-like behavior.
  • Use headless browsers like Puppeteer with proxy rotation for complex sites.
  • Monitor proxy health and update the pool regularly.
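
The throttling tip can be sketched as a small helper that enforces a randomized minimum gap between requests. The delay bounds below are arbitrary assumptions, and the sleep and clock parameters exist only so the helper can be tested without real waiting.

```python
import random
import time


class Throttle:
    """Sleep so that consecutive calls are separated by a random delay."""

    def __init__(self, min_delay=1.0, max_delay=3.0,
                 sleep=time.sleep, clock=time.monotonic):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._sleep = sleep
        self._clock = clock
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until a randomly chosen gap since the previous call has passed."""
        gap = random.uniform(self.min_delay, self.max_delay)
        now = self._clock()
        if self._last is not None:
            remaining = gap - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
        return gap
```

Calling wait() on a shared Throttle instance before each fetch_data(url) call spaces requests out irregularly instead of at a fixed, machine-like interval.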

Final Thoughts

Combining React with robust backend proxy management creates a scalable, resilient scraping system that minimizes bans. Always ensure your scraping respects robots.txt and legal considerations.
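
Checking robots.txt can be automated with Python's standard library. The following is a minimal sketch; build_robots_checker is a hypothetical helper name, and it assumes the robots.txt text has already been downloaded.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent="*"):
    """Return a function that reports whether a URL may be fetched.

    `robots_txt` is the text of the site's robots.txt, already downloaded.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed
```

To load the file straight from a live site instead, RobotFileParser also provides set_url() and read().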

This approach balances open source flexibility and technical sophistication, providing a durable solution against IP-based security measures.


