
Mohammad Waseem

Overcoming IP Bans When Scraping with Zero Budget Using JavaScript

In the realm of web scraping, getting your IP banned is a common obstacle that can halt your data collection efforts altogether. The challenge intensifies when you are working with little or no budget, since traditional solutions like rotating proxy services can be costly. For a DevOps specialist, leveraging existing infrastructure and scripting skills is an efficient way to mitigate IP bans.

Understanding the Problem
IP bans typically occur after excessive requests or suspicious activity from a single IP address. To keep scraping effectively, you need to introduce variability into your requests so they mimic genuine user behavior, without incurring costs.

Solution Approach: Implementing a local, lightweight proxy rotation system combined with randomized request headers and delays can help you avoid detection. Here’s a step-by-step guide using JavaScript with Node.js, assuming you have minimal resources. The examples rely on the node-fetch and https-proxy-agent packages (npm install node-fetch https-proxy-agent).

Step 1: Use Public Proxy Lists
Access free proxy lists available online. These are often scattered across forums, GitHub repositories, or websites dedicated to open proxies. Be aware that free proxies can be unreliable or slow, but they are viable for zero-budget projects.

Example of fetching a list:

const fetch = require('node-fetch');

async function getProxies() {
    const response = await fetch('https://raw.githubusercontent.com/clarketm/proxy-list/main/proxy.json');
    const proxies = await response.json();
    return proxies; // returns array of proxy objects
}

getProxies().then(proxies => {
    console.log(`Fetched ${proxies.length} proxies`);
});
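Because free proxies die quickly, it can pay to filter the list before using it. The sketch below is an optional addition (not part of the original steps) that probes each proxy with a short timeout and keeps only the ones that respond; the probe URL, the 5-second timeout, and the assumption that each entry exposes ip and port fields (as in the snippet above) are all adjustable choices, not requirements.

const fetch = require('node-fetch');
const { HttpsProxyAgent } = require('https-proxy-agent');

// Probe a single proxy by fetching a lightweight page through it.
// AbortController (global in Node 15+) enforces a timeout so dead
// proxies don't hang the whole check.
async function isProxyAlive(proxyUrl, timeoutMs = 5000) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
        const response = await fetch('https://example.com', {
            agent: new HttpsProxyAgent(proxyUrl),
            signal: controller.signal
        });
        return response.ok;
    } catch (err) {
        return false; // dead, blocked, or too slow
    } finally {
        clearTimeout(timer);
    }
}

// Keep only the proxies that pass the probe.
// For very large lists you may want to check in batches instead of all at once.
async function filterWorkingProxies(proxies) {
    const checks = proxies.map(async p => {
        const url = `http://${p.ip}:${p.port}`;
        return (await isProxyAlive(url)) ? p : null;
    });
    return (await Promise.all(checks)).filter(Boolean);
}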

Step 2: Implement Proxy Rotation with Randomization
Create a function to pick a random proxy and assign it to your request.

// Pick a random proxy from the pool and format it as a proxy URL
function getRandomProxy(proxies) {
    const proxy = proxies[Math.floor(Math.random() * proxies.length)];
    return `http://${proxy.ip}:${proxy.port}`;
}

Step 3: Introduce Random Headers and Request Delays
Add variability to mimic human behavior.

// Pick a browser-like User-Agent at random so requests don't all look identical
function getRandomHeaders() {
    const userAgents = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
        'Mozilla/5.0 (X11; Linux x86_64)'
    ];
    const userAgent = userAgents[Math.floor(Math.random() * userAgents.length)];
    return {
        'User-Agent': userAgent,
        'Accept-Language': 'en-US,en;q=0.9'
    };
}

// Promise-based sleep for adding delays between requests
function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

Step 4: Scraping Routine with Rotation and Randomization
Combine all the pieces in the scraping routine. The version below retries with a different random proxy until a request succeeds or the attempt limit is reached.

const fetch = require('node-fetch');
const { HttpsProxyAgent } = require('https-proxy-agent');

async function scrape(url, proxies, maxAttempts = 10) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const proxy = getRandomProxy(proxies);
        const headers = getRandomHeaders();
        const delay = Math.random() * 3000 + 2000; // wait 2-5 seconds between attempts
        await sleep(delay);
        try {
            const response = await fetch(url, {
                headers,
                agent: new HttpsProxyAgent(proxy)
            });
            if (response.ok) {
                console.log(`Data received via proxy: ${proxy}`);
                return await response.text(); // process the data as needed
            }
            console.warn(`Request failed with status: ${response.status}`);
        } catch (err) {
            console.warn(`Error with proxy ${proxy}: ${err.message}`);
        }
        // On failure, the loop retries with a different random proxy
    }
    throw new Error(`All ${maxAttempts} attempts failed for ${url}`);
}

// Usage
getProxies().then(async proxies => {
    const html = await scrape('https://example.com', proxies);
    // process the HTML as needed
}).catch(err => console.error(err.message));

Additional Tips:

  • Rotate proxies frequently and avoid reusing the same proxy for consecutive requests.
  • Incorporate random delays to emulate human browsing.
  • Detect and handle IP bans or CAPTCHAs as they occur (see the sketch after this list).
  • Regularly update your proxy list for reliability.
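On the ban-handling tip, here is a rough sketch of what detection could look like: treat HTTP 403 or 429 responses, or an obvious "captcha" marker in the body, as a signal to drop the offending proxy and slow down. The status codes and keyword check are heuristics I'm assuming here, not something every site follows.

// Heuristic check: does this response look like a ban or CAPTCHA page?
// 403/429 and a "captcha" marker in the body are common signals, but
// every site differs, so treat this as a starting point.
async function looksBlocked(response) {
    if (response.status === 403 || response.status === 429) return true;
    const body = await response.clone().text();
    return body.toLowerCase().includes('captcha');
}

// Drop a misbehaving proxy from the pool so it isn't picked again.
function dropProxy(proxies, badProxyUrl) {
    return proxies.filter(p => `http://${p.ip}:${p.port}` !== badProxyUrl);
}

In the Step 4 loop you could call looksBlocked(response) after each request and, when it returns true, remove the current proxy with dropProxy and sleep for a longer interval before the next attempt.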

Conclusion:
While free proxies and scripting are imperfect, they offer a cost-effective way for developers to evade IP bans during scraping tasks. Keep in mind that this approach is a temporary measure; for long-term solutions, consider more reliable infrastructure as your project scales.

Disclaimer: Ensure your scraping practices comply with target website terms of service and legal regulations to avoid potential legal issues.


🛠️ QA Tip

To test this safely without using real user data, I use TempoMail USA.
