Mohammad Waseem

Circumventing IP Bans During Web Scraping with TypeScript in Legacy Codebases

IP banning is a common obstacle in web scraping: it interrupts data collection pipelines and hampers productivity. Security researchers and developers often face the challenge of maintaining access while still respecting server policies. Doing so inside a legacy TypeScript codebase takes a deliberate approach. This article explores practical methods for mitigating IP bans using TypeScript and a few established best practices.

Understanding the Problem

Many target websites employ rate limiting and IP blocking to deter automated scraping. These defenses detect patterns such as high request frequency, repetitive User-Agent strings, or otherwise abnormal traffic. Working around them comes down to mimicking human-like browsing, rotating IPs, and managing request headers.
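
Before reaching for countermeasures, it helps to recognize when a block has already happened. Below is a minimal sketch, assuming axios; the isLikelyBlocked helper and the status codes it checks are illustrative, since every site signals blocking differently.

import axios, { AxiosError } from 'axios';

// Hypothetical helper: 403 and 429 are the most common "blocked" or
// "throttled" responses, but adjust the list for the site you are targeting.
function isLikelyBlocked(status: number): boolean {
  return status === 403 || status === 429;
}

async function probe(url: string): Promise<void> {
  try {
    // validateStatus: () => true keeps axios from throwing on 4xx/5xx responses.
    const response = await axios.get(url, { validateStatus: () => true });
    if (isLikelyBlocked(response.status)) {
      console.warn(`Got ${response.status} - this IP may be banned or rate-limited.`);
    }
  } catch (error) {
    // Connection resets and timeouts can also be a sign of network-level blocking.
    console.error('Request failed:', (error as AxiosError).message);
  }
}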

Strategies for Bypassing IP Bans

1. Use Proxy Rotation

A robust proxy rotation system distributes requests across multiple IP addresses, reducing the likelihood that any single address accumulates enough traffic to get banned.

import axios, { AxiosInstance } from 'axios';

class ProxyManager {
  private proxies: string[];
  private currentIndex: number;

  constructor(proxies: string[]) {
    this.proxies = proxies;
    this.currentIndex = 0;
  }

  getNextProxy(): string {
    const proxy = this.proxies[this.currentIndex];
    this.currentIndex = (this.currentIndex + 1) % this.proxies.length;
    return proxy;
  }
}

// Placeholder endpoints - substitute your own proxy servers here.
const proxies = ['http://proxy1.example.com:8080', 'http://proxy2.example.com:8080', 'http://proxy3.example.com:8080'];
const proxyManager = new ProxyManager(proxies);

async function fetchWithProxy(url: string) {
  const currentProxy = proxyManager.getNextProxy();
  // Parse the proxy URL properly instead of splitting the string by hand.
  const { hostname, port } = new URL(currentProxy);
  const axiosInstance: AxiosInstance = axios.create({
    proxy: {
      host: hostname,
      port: parseInt(port, 10)
    },
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
  });
  const response = await axiosInstance.get(url);
  return response.data;
}

2. Mimic Human Behavior

Introducing delays between requests and randomizing headers helps make traffic less detectable.

function sleep(ms: number) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function scrape(url: string) {
  for (let i = 0; i < 10; i++) { // Loop for multiple requests
    await fetchWithProxy(url);
    // Random delay between 1-3 seconds
    const delay = Math.random() * 2000 + 1000;
    await sleep(delay);
  }
}

3. Rotate User Agents

Rotate through a pool of User-Agent strings so that no single browser fingerprint dominates your traffic.

const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...',
  'Mozilla/5.0 (X11; Linux x86_64)...'
];

function getRandomUserAgent() {
  const index = Math.floor(Math.random() * userAgents.length);
  return userAgents[index];
}

async function fetchWithHeaders(url: string) {
  const headers = {
    'User-Agent': getRandomUserAgent(),
    'Accept-Language': 'en-US,en;q=0.9'
  };
  // Pass the randomized headers along with the request
  const response = await axios.get(url, { headers });
  return response.data;
}

Managing Legacy Code

In older TypeScript codebases, integrating these strategies can be challenging due to tight coupling or outdated dependencies. Focus on modular improvements: create dedicated modules or classes for proxy management, request delays, and header rotation. This keeps the system maintainable.
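
One way to do this is to hide the moving parts behind a single facade, so legacy call sites only have to swap one function call. The sketch below is illustrative rather than prescriptive (the ScraperClient name and its wiring are assumptions, not an existing API), and it reuses the ProxyManager, sleep, and getRandomUserAgent helpers defined earlier in this article.

import axios from 'axios';

// Hypothetical facade bundling proxy rotation, header rotation, and pacing,
// so legacy code only needs to call client.fetch(url).
class ScraperClient {
  constructor(
    private proxyManager: ProxyManager,
    private minDelayMs = 1000,
    private maxDelayMs = 3000
  ) {}

  async fetch(url: string): Promise<string> {
    // Pace every request with a random delay in [minDelayMs, maxDelayMs].
    const delay = Math.random() * (this.maxDelayMs - this.minDelayMs) + this.minDelayMs;
    await sleep(delay);

    const proxyUrl = new URL(this.proxyManager.getNextProxy());
    const response = await axios.get(url, {
      proxy: { host: proxyUrl.hostname, port: parseInt(proxyUrl.port, 10) },
      headers: {
        'User-Agent': getRandomUserAgent(),
        'Accept-Language': 'en-US,en;q=0.9'
      }
    });
    return response.data;
  }
}

// Existing call sites can then replace a raw axios.get with client.fetch.
const client = new ScraperClient(proxyManager);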

Also, consider updating the request handling to modern standards by replacing deprecated axios configurations or polyfilling certain features if needed.
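
As a concrete example, axios's built-in proxy option is often swapped for an explicit agent when targets are served over HTTPS. Below is a minimal sketch, assuming the third-party https-proxy-agent package is installed (the import shape matches recent versions; older versions export a default function instead):

import axios from 'axios';
import { HttpsProxyAgent } from 'https-proxy-agent';

// Route the request through the proxy via an explicit agent and disable
// axios's own proxy handling so the two mechanisms do not conflict.
async function fetchViaAgent(url: string, proxyUrl: string) {
  const response = await axios.get(url, {
    httpsAgent: new HttpsProxyAgent(proxyUrl),
    proxy: false
  });
  return response.data;
}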

Final Tips

  • Always respect robots.txt and legal boundaries.
  • Monitor response status codes and headers for hints of IP bans or throttling (see the backoff sketch after this list).
  • Combine multiple tactics: proxy rotation, delays, headers, and session management.
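
For the second tip, one practical pattern is to back off when the server asks you to. Below is a minimal sketch, assuming axios and the sleep helper from earlier; the Retry-After header is standard but not every site sends it, so the fallback delay is an arbitrary illustrative value.

import axios from 'axios';

// Retry once after a 429, waiting for the server-specified Retry-After
// period when present, otherwise for a fixed fallback delay.
async function fetchWithBackoff(url: string, fallbackMs = 5000): Promise<string> {
  // validateStatus: () => true keeps axios from throwing on 4xx/5xx responses.
  const response = await axios.get(url, { validateStatus: () => true });
  if (response.status === 429) {
    const retryAfterSeconds = Number(response.headers['retry-after']);
    const waitMs = Number.isFinite(retryAfterSeconds) ? retryAfterSeconds * 1000 : fallbackMs;
    await sleep(waitMs);
    return (await axios.get(url)).data;
  }
  return response.data;
}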

By adopting these practices, security researchers and developers can reduce the risk of being IP banned during web scraping, even within legacy TypeScript environments. Responsible scraping techniques support sustainable, efficient data collection while keeping legal and ethical risk to a minimum.


