Miller James

How to Detect Whether an IP Is a Proxy: Practical Methods, Code Examples, and Real-World Use Cases

Proxy detection has become a critical capability for modern web applications. Whether you're protecting an e-commerce platform from fraudulent orders, validating advertising traffic authenticity, or defending APIs against automated abuse, understanding how to identify proxy-originating traffic is essential.

This guide provides practical, code-ready techniques for detecting proxy IPs, covering everything from ASN lookups to behavioral analysis patterns.

Introduction

Why should developers and businesses care about proxy detection? The answer lies in trust and verification. When a request arrives at your server, the IP address is often your first data point for assessing legitimacy.

Common scenarios where proxy detection matters:

  • Security and access control: Preventing account takeover attempts that route through anonymizing proxies
  • Fraud prevention: Identifying payment fraud where attackers mask their real location
  • Ad verification: Ensuring ad impressions come from genuine users, not bot farms
  • Rate limiting: Detecting when a single actor uses multiple IPs to bypass limits
  • Compliance: Enforcing geographic restrictions for content licensing

The challenge is that proxy technology has evolved significantly. A decade ago, detecting proxies was relatively straightforward. Today, residential proxy networks can make traffic appear to originate from legitimate ISP customers, making detection far more nuanced.

Proxy Types Overview

Understanding the different proxy categories is fundamental to detection. Each type has distinct technical characteristics that inform detection strategies.

Datacenter Proxies

These originate from servers hosted in commercial data centers. Key characteristics:

  • ASN ownership: Typically owned by hosting companies (AWS, DigitalOcean, OVH, Hetzner)
  • IP ranges: Allocated in large, contiguous blocks
  • Reverse DNS: Often resolves to generic hostnames like server-123.provider.com
  • Latency: Very low and consistent (typically 1-20ms to major endpoints)
  • Detection difficulty: Relatively easy via ASN lookup

Residential Proxies

Traffic routed through real ISP-assigned IP addresses, often via SDK integrations in consumer apps or browser extensions.

  • ASN ownership: Consumer ISPs (Comcast, BT, Deutsche Telekom)
  • IP ranges: Scattered, non-contiguous allocations
  • Reverse DNS: Typical ISP patterns like cpe-203-0-113-42.socal.res.rr.com
  • Latency: Higher and more variable (reflecting real home network conditions)
  • Detection difficulty: Challenging—requires behavioral and pattern analysis

Mobile Proxies

Traffic routed through mobile carrier networks (4G/5G connections).

  • ASN ownership: Mobile carriers (Verizon Wireless, T-Mobile, Vodafone)
  • IP characteristics: Frequently use CGNAT, meaning many users share one IP
  • Rotation: IPs change frequently as devices move between towers
  • Detection difficulty: Very challenging due to legitimate CGNAT usage

ISP Proxies

A hybrid category—datacenter infrastructure but using IP ranges leased from or registered as ISP allocations.

  • ASN ownership: Appears as ISP, but hosting location reveals datacenter
  • Characteristics: Combines datacenter stability with ISP-like ASN classification
  • Detection difficulty: Moderate—cross-referencing ASN type with hosting indicators helps
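
One way to apply that cross-referencing, sketched against the same free ip-api.com endpoint used later in this article (the status, isp, as, and hosting fields are real; treating the hosting flag as decisive is a simplifying assumption):

import requests

def classify_isp_proxy_candidate(ip: str) -> dict:
    """Cross-reference an ISP-looking org name with ip-api's hosting flag."""
    resp = requests.get(
        f"http://ip-api.com/json/{ip}",
        params={"fields": "status,isp,as,hosting"},
        timeout=5
    )
    data = resp.json()
    if data.get("status") != "success":
        return {"ip": ip, "error": "lookup failed"}
    # An ISP-style name combined with hosting=True is the ISP-proxy signature
    return {
        "ip": ip,
        "asn": data.get("as"),
        "isp": data.get("isp"),
        "isp_proxy_suspect": bool(data.get("hosting"))
    }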

How to Detect Whether an IP Is a Proxy

Let's examine each detection method with technical depth, implementation code, and honest assessment of limitations.

1. ASN Lookup

The Autonomous System Number identifies the network operator responsible for an IP range. This is often the first and most reliable signal.

Principle: Datacenter IPs belong to hosting ASNs, while legitimate users typically connect through ISP or mobile carrier ASNs.

Implementation using IPinfo (free tier available):

import requests

def get_asn_info(ip: str, token: str = "") -> dict:
    """
    Query ASN information for an IP address.
    Returns org name, ASN, and inferred type.
    """
    url = f"https://ipinfo.io/{ip}/json"
    params = {"token": token} if token else {}

    try:
        response = requests.get(url, params=params, timeout=5)
        response.raise_for_status()
        data = response.json()

        org = data.get("org", "")

        # Heuristic: check for common datacenter indicators
        datacenter_keywords = [
            "hosting", "cloud", "server", "datacenter", "data center",
            "digitalocean", "amazon", "google cloud", "microsoft",
            "ovh", "hetzner", "linode", "vultr", "choopa"
        ]

        is_likely_datacenter = any(
            kw in org.lower() for kw in datacenter_keywords
        )

        return {
            "ip": ip,
            "asn": org.split()[0] if org else None,
            "org": org,
            "is_datacenter": is_likely_datacenter,
            "country": data.get("country"),
            "city": data.get("city")
        }
    except requests.RequestException as e:
        return {"ip": ip, "error": str(e)}

# Example usage
result = get_asn_info("8.8.8.8")
print(result)

When it works well: Identifying datacenter proxies, VPN endpoints from commercial providers.

Limitations: Residential proxies use legitimate ISP ASNs, making this method ineffective against them. Also, some legitimate traffic (corporate VPNs, cloud-based services) originates from datacenter ASNs.

2. Reverse DNS Lookup

The PTR record for an IP often reveals information about the network type and purpose.

Principle: ISPs typically configure reverse DNS with identifiable patterns. Datacenter IPs often have generic or hosting-related hostnames.

import socket

def reverse_dns_lookup(ip: str) -> dict:
    """
    Perform reverse DNS lookup and analyze the hostname pattern.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)

        # Analyze hostname patterns
        hostname_lower = hostname.lower()

        indicators = {
            "is_residential_pattern": any(p in hostname_lower for p in [
                "dsl", "cable", "res", "home", "dynamic", "pool", "dhcp"
            ]),
            "is_datacenter_pattern": any(p in hostname_lower for p in [
                "server", "cloud", "host", "dedicated", "vps", "static"
            ]),
            "is_mobile_pattern": any(p in hostname_lower for p in [
                "mobile", "wireless", "4g", "5g", "lte", "cell"
            ])
        }

        return {
            "ip": ip,
            "hostname": hostname,
            **indicators
        }
    except socket.herror:
        return {"ip": ip, "hostname": None, "no_ptr": True}
    except socket.gaierror as e:
        return {"ip": ip, "error": str(e)}

# Example
print(reverse_dns_lookup("8.8.8.8"))

When it works well: Distinguishing ISP residential connections from datacenter hosting.

Limitations: Many IPs lack PTR records entirely. Sophisticated proxy operators can configure misleading reverse DNS. Some legitimate services also lack proper PTR configuration.

3. Open Port Detection

Proxy servers often expose characteristic ports. Scanning for these can identify misconfigured or intentionally open proxies.

Principle: SOCKS proxies typically use ports 1080, 1081, 4145. HTTP proxies commonly use 3128, 8080, 8888. Open ports suggest proxy server presence.

import socket
from concurrent.futures import ThreadPoolExecutor, as_completed

COMMON_PROXY_PORTS = [
    (1080, "SOCKS"),
    (1081, "SOCKS"),
    (3128, "HTTP/Squid"),
    (8080, "HTTP"),
    (8888, "HTTP"),
    (8118, "Privoxy"),
    (9050, "Tor SOCKS"),
    (9051, "Tor Control"),
]

def check_port(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Check if a specific port is open."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        result = sock.connect_ex((ip, port))
        return result == 0
    except socket.error:
        return False
    finally:
        sock.close()

def scan_proxy_ports(ip: str) -> dict:
    """
    Scan common proxy ports on the target IP.
    Returns list of open proxy-related ports.
    """
    open_ports = []

    with ThreadPoolExecutor(max_workers=8) as executor:
        futures = {
            executor.submit(check_port, ip, port): (port, name)
            for port, name in COMMON_PROXY_PORTS
        }

        for future in as_completed(futures):
            port, name = futures[future]
            if future.result():
                open_ports.append({"port": port, "service": name})

    return {
        "ip": ip,
        "open_proxy_ports": open_ports,
        "proxy_port_detected": len(open_ports) > 0
    }

# Example (be careful with scanning policies)
# result = scan_proxy_ports("example.com")

When it works well: Detecting open/misconfigured proxy servers, identifying Tor exit nodes.

Limitations: Most commercial proxy services don't expose ports directly to arbitrary scanners. Port scanning may violate terms of service or laws depending on jurisdiction. Firewalls block most scans.

4. IP Reputation APIs

Specialized services maintain databases of known proxy, VPN, and malicious IPs based on various signals.

Principle: These services aggregate data from honeypots, abuse reports, traffic analysis, and other sources to score IP addresses.

import requests

def check_ip_reputation_abuseipdb(ip: str, api_key: str) -> dict:
    """
    Check IP reputation using AbuseIPDB.
    Free tier: 1000 checks/day
    """
    url = "https://api.abuseipdb.com/api/v2/check"
    headers = {
        "Accept": "application/json",
        "Key": api_key
    }
    params = {
        "ipAddress": ip,
        "maxAgeInDays": 90
    }

    try:
        response = requests.get(url, headers=headers, params=params, timeout=10)
        response.raise_for_status()
        data = response.json().get("data", {})

        return {
            "ip": ip,
            "abuse_confidence_score": data.get("abuseConfidenceScore", 0),
            "total_reports": data.get("totalReports", 0),
            "is_public": data.get("isPublic", False),
            "is_tor": data.get("isTor", False),
            "usage_type": data.get("usageType", "unknown")
        }
    except requests.RequestException as e:
        return {"ip": ip, "error": str(e)}

# Alternative: Using IP-API (no key required, limited)
def check_ip_basic(ip: str) -> dict:
    """Basic IP info check using ip-api.com (free, no key)."""
    try:
        response = requests.get(
            f"http://ip-api.com/json/{ip}?fields=status,country,isp,org,as,proxy,hosting",
            timeout=5
        )
        data = response.json()

        if data.get("status") == "success":
            return {
                "ip": ip,
                "isp": data.get("isp"),
                "org": data.get("org"),
                "as": data.get("as"),
                "is_proxy": data.get("proxy", False),
                "is_hosting": data.get("hosting", False)
            }
        return {"ip": ip, "error": "lookup failed"}
    except requests.RequestException as e:
        return {"ip": ip, "error": str(e)}

When it works well: Identifying known VPN endpoints, Tor exit nodes, previously flagged abusive IPs.

Limitations: Residential proxies often aren't in these databases. Data can be stale. False positives affect legitimate users on shared IPs or after IP reassignment.

5. TLS Fingerprinting

The TLS handshake reveals client characteristics that can indicate automated or proxy-relayed traffic.

Principle: Different TLS implementations (browsers, curl, Python requests, proxy software) produce distinct fingerprints based on cipher suites, extensions, and ordering.

This typically requires server-side implementation. Here's a conceptual approach:

# Server-side concept (would need a WSGI/ASGI middleware or raw socket handling)
# This is illustrative - actual implementation requires low-level TLS access

"""
TLS Fingerprinting Signals:
1. JA3 fingerprint - hash of TLS client hello parameters
2. Cipher suite ordering
3. Supported extensions
4. ALPN protocols offered

Example JA3 fingerprints (illustrative, not exhaustive):
- Chrome 120: specific 32-char hash
- Firefox 121: different hash
- Python requests: another hash
- curl: yet another hash

Detection logic:
- If User-Agent claims "Chrome" but JA3 matches "Python requests" → suspicious
- If JA3 matches known proxy software → flag
"""

def analyze_tls_mismatch(user_agent: str, ja3_fingerprint: str, 
                          known_fingerprints: dict) -> dict:
    """
    Compare claimed User-Agent against TLS fingerprint.
    known_fingerprints: mapping of fingerprints to client types
    """
    claimed_browser = None
    if "chrome" in user_agent.lower():
        claimed_browser = "chrome"
    elif "firefox" in user_agent.lower():
        claimed_browser = "firefox"
    elif "safari" in user_agent.lower():
        claimed_browser = "safari"

    actual_client = known_fingerprints.get(ja3_fingerprint, "unknown")

    mismatch = (
        claimed_browser is not None and 
        actual_client != "unknown" and
        claimed_browser not in actual_client.lower()
    )

    return {
        "claimed_browser": claimed_browser,
        "tls_client_type": actual_client,
        "fingerprint_mismatch": mismatch
    }

When it works well: Detecting automation tools, identifying when User-Agent is spoofed while TLS fingerprint reveals true client.

Limitations: Requires server-side TLS introspection capability. Sophisticated proxies can relay TLS connections transparently or use browser-like fingerprints.

6. Latency Pattern Analysis

Network timing characteristics differ between datacenter, residential, and mobile connections.

Principle: Datacenter proxies typically show very low, consistent latency. Residential connections have higher latency with more jitter. Mobile connections show distinctive patterns.

import subprocess
import statistics
import re

def measure_latency_pattern(ip: str, count: int = 10) -> dict:
    """
    Measure latency statistics to infer connection type.
    Uses system ping command for reliability.
    """
    try:
        # Run ping command (Linux-style flags; -c count and -W timeout differ on macOS/Windows)
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", "2", ip],
            capture_output=True,
            text=True,
            timeout=30
        )

        # Parse RTT values from output
        rtt_pattern = r"time=(\d+\.?\d*)\s*ms"
        rtts = [float(m) for m in re.findall(rtt_pattern, result.stdout)]

        if len(rtts) < 3:
            return {"ip": ip, "error": "insufficient responses"}

        avg_rtt = statistics.mean(rtts)
        std_dev = statistics.stdev(rtts) if len(rtts) > 1 else 0
        jitter = std_dev / avg_rtt if avg_rtt > 0 else 0  # coefficient of variation

        # Heuristic classification
        # Datacenter: low latency (<30ms), low jitter (<0.1)
        # Residential: medium latency (30-100ms), medium jitter (0.1-0.3)
        # Mobile/satellite: high latency or high jitter

        if avg_rtt < 30 and jitter < 0.1:
            likely_type = "datacenter"
        elif avg_rtt < 100 and jitter < 0.3:
            likely_type = "residential"
        else:
            likely_type = "mobile_or_congested"

        return {
            "ip": ip,
            "avg_rtt_ms": round(avg_rtt, 2),
            "std_dev_ms": round(std_dev, 2),
            "jitter_coefficient": round(jitter, 3),
            "samples": len(rtts),
            "inferred_type": likely_type
        }

    except subprocess.TimeoutExpired:
        return {"ip": ip, "error": "timeout"}
    except Exception as e:
        return {"ip": ip, "error": str(e)}

# Example
# result = measure_latency_pattern("8.8.8.8")

When it works well: Distinguishing datacenter from residential when ASN alone is ambiguous.

Limitations: Network conditions vary. Geographic distance affects latency. Not reliable as a sole signal. ICMP may be blocked.

7. User Behavior Signals

Beyond the IP itself, request patterns and browser fingerprints provide valuable signals.

Principle: Legitimate users exhibit human-like behavior patterns. Proxy-using bots often show anomalous timing, consistency, or capability mismatches.

from datetime import datetime, timedelta
from collections import defaultdict

class BehaviorAnalyzer:
    """
    Track and analyze request patterns per IP.
    Designed for integration with web frameworks.
    """

    def __init__(self):
        self.request_times = defaultdict(list)
        self.fingerprints = defaultdict(set)

    def record_request(self, ip: str, fingerprint: str, timestamp: datetime = None):
        """Record a request from an IP with its browser fingerprint."""
        ts = timestamp or datetime.utcnow()
        self.request_times[ip].append(ts)
        self.fingerprints[ip].add(fingerprint)

        # Keep only last hour of data
        cutoff = ts - timedelta(hours=1)
        self.request_times[ip] = [
            t for t in self.request_times[ip] if t > cutoff
        ]

    def analyze_ip(self, ip: str) -> dict:
        """Analyze behavior patterns for an IP."""
        times = self.request_times.get(ip, [])
        fps = self.fingerprints.get(ip, set())

        if len(times) < 2:
            return {"ip": ip, "insufficient_data": True}

        # Calculate request intervals
        intervals = [
            (times[i+1] - times[i]).total_seconds()
            for i in range(len(times) - 1)
        ]

        avg_interval = sum(intervals) / len(intervals)

        # Check for machine-like regularity
        if intervals:
            variance = sum((i - avg_interval)**2 for i in intervals) / len(intervals)
            regularity = variance ** 0.5 / avg_interval if avg_interval > 0 else 0
        else:
            regularity = 0

        signals = {
            "ip": ip,
            "requests_per_hour": len(times),
            "avg_interval_seconds": round(avg_interval, 2),
            "interval_regularity": round(regularity, 3),
            "unique_fingerprints": len(fps),
            # Flags
            "high_rate": len(times) > 100,  # >100 req/hour
            "too_regular": regularity < 0.1 and len(times) > 10,  # machine-like
            "fingerprint_rotation": len(fps) > 3  # multiple fingerprints = suspicious
        }

        return signals

# Usage in a web application
analyzer = BehaviorAnalyzer()

def on_request(ip, browser_fingerprint):
    analyzer.record_request(ip, browser_fingerprint)
    behavior = analyzer.analyze_ip(ip)

    if behavior.get("high_rate") or behavior.get("too_regular"):
        # Flag for review or rate limit
        pass

When it works well: Identifying automated traffic regardless of IP type. Catching proxy rotation (same behavior, changing IPs).

Limitations: Requires accumulating data over time. Privacy considerations with fingerprinting. Sophisticated bots can mimic human patterns.

8. Proxy Pool Abnormal Patterns

When attackers use proxy pools, detectable patterns emerge at scale.

Principle: Proxy pools exhibit characteristics like: IPs appearing briefly then disappearing, unusual geographic distribution, IPs sharing behavioral patterns across different "users."

from collections import defaultdict
from datetime import datetime, timedelta

class ProxyPoolDetector:
    """
    Detect proxy pool usage patterns across your traffic.
    """

    def __init__(self):
        self.ip_first_seen = {}
        self.ip_last_seen = {}
        self.ip_request_count = defaultdict(int)
        self.subnet_activity = defaultdict(set)  # /24 subnet -> set of IPs

    def record_ip(self, ip: str, timestamp: datetime = None):
        ts = timestamp or datetime.utcnow()

        if ip not in self.ip_first_seen:
            self.ip_first_seen[ip] = ts
        self.ip_last_seen[ip] = ts
        self.ip_request_count[ip] += 1

        # Track /24 subnet
        subnet = ".".join(ip.split(".")[:3])
        self.subnet_activity[subnet].add(ip)

    def get_pool_indicators(self, lookback_hours: int = 24) -> dict:
        """
        Analyze traffic for proxy pool indicators.
        """
        now = datetime.utcnow()
        cutoff = now - timedelta(hours=lookback_hours)

        # Filter to recent IPs
        recent_ips = {
            ip for ip, ts in self.ip_last_seen.items()
            if ts > cutoff
        }

        # Calculate metrics
        ephemeral_ips = []  # IPs seen briefly
        for ip in recent_ips:
            first = self.ip_first_seen[ip]
            last = self.ip_last_seen[ip]
            duration = (last - first).total_seconds()

            # IP appeared, made few requests, disappeared
            if duration < 300 and self.ip_request_count[ip] < 5:
                ephemeral_ips.append(ip)

        # Subnets with many unique IPs (possible rotation within datacenter blocks)
        suspicious_subnets = [
            (subnet, len(ips))
            for subnet, ips in self.subnet_activity.items()
            if len(ips) > 10  # More than 10 unique IPs from same /24
        ]

        return {
            "total_unique_ips": len(recent_ips),
            "ephemeral_ip_count": len(ephemeral_ips),
            "ephemeral_ratio": len(ephemeral_ips) / len(recent_ips) if recent_ips else 0,
            "suspicious_subnets": sorted(suspicious_subnets, key=lambda x: -x[1])[:10],
            "pool_likelihood": "high" if len(ephemeral_ips) / max(len(recent_ips), 1) > 0.3 else "low"
        }

When it works well: Detecting coordinated attacks using commercial proxy services.

Limitations: Requires significant traffic volume to establish patterns. Legitimate CDNs and mobile carriers also show IP diversity.

End-to-End Detection Pipeline

Combining multiple signals provides more reliable detection than any single method. Here's a comprehensive pipeline:

flowchart TD
    A[Incoming Request] --> B[Extract IP Address]
    B --> C[ASN Lookup]
    C --> D{Datacenter ASN?}
    D -->|Yes| E[High Risk Score +40]
    D -->|No| F[Low Risk Score +0]

    E --> G[Reverse DNS Check]
    F --> G

    G --> H{Hosting Pattern?}
    H -->|Yes| I[Risk Score +20]
    H -->|No| J[No Change]

    I --> K[IP Reputation Check]
    J --> K

    K --> L{Known Proxy/VPN?}
    L -->|Yes| M[Risk Score +30]
    L -->|No| N[No Change]

    M --> O[Behavioral Analysis]
    N --> O

    O --> P{Anomalous Behavior?}
    P -->|Yes| Q[Risk Score +25]
    P -->|No| R[No Change]

    Q --> S[Calculate Final Score]
    R --> S

    S --> T{Score >= 50?}
    T -->|Yes| U[Flag as Likely Proxy]
    T -->|No| V[Allow with Monitoring]

    U --> W[Apply Mitigation]
    V --> X[Log for Analysis]

Scoring Logic Summary:

Signal             | Points | Condition
-------------------|--------|-----------------------------------------------
Datacenter ASN     | +40    | ASN belongs to hosting provider
Hosting rDNS       | +20    | Reverse DNS indicates server/hosting
Known Proxy DB     | +30    | IP flagged in reputation database
Behavioral Anomaly | +25    | High rate, regularity, or fingerprint rotation
Open Proxy Ports   | +15    | SOCKS/HTTP proxy ports accessible

Threshold recommendations:

  • Score >= 70: Block or require additional verification
  • Score 50-69: Increase friction (CAPTCHA, rate limit)
  • Score < 50: Allow with standard monitoring
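
As a sanity check, here are the table and the thresholds above translated directly into code (the boolean signal names are assumptions for illustration, not an established API):

def score_request(signals: dict) -> tuple[int, str]:
    """Combine weighted detection signals into a score and an action."""
    weights = {
        "datacenter_asn": 40,
        "hosting_rdns": 20,
        "known_proxy_db": 30,
        "behavioral_anomaly": 25,
        "open_proxy_ports": 15,
    }
    score = min(100, sum(w for name, w in weights.items() if signals.get(name)))
    if score >= 70:
        action = "block_or_verify"
    elif score >= 50:
        action = "add_friction"  # CAPTCHA, tighter rate limits
    else:
        action = "allow_and_monitor"
    return score, action

# Datacenter ASN + reputation hit = 70 -> block_or_verify
print(score_request({"datacenter_asn": True, "known_proxy_db": True}))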

Case Studies

Case Study 1: E-commerce Fraud Prevention

Scenario: An online retailer notices a spike in orders that are later flagged as fraudulent chargebacks. Investigation reveals orders placed with stolen credit cards.

Detection Approach:

  1. ASN Analysis: Cross-reference order IPs with ASN data. Flag orders from known hosting ASNs (DigitalOcean, Vultr, etc.) for review.

  2. Velocity Checks: Monitor for multiple orders from the same IP or /24 subnet within short timeframes, even if using different accounts.

  3. Geographic Mismatch: Compare IP geolocation with billing address and shipping destination. Significant mismatches increase risk score.

  4. Behavioral Signals: Track time-on-site, pages visited, and checkout speed. Fraudulent sessions often navigate directly to purchase without typical browsing patterns.

Outcome Logic: Orders with datacenter IPs + geographic mismatch + rapid checkout pattern → automatic hold for manual review.
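
A minimal sketch of that hold rule, assuming upstream checks have already populated these fields (the field names and the 60-second checkout cutoff are illustrative assumptions):

def should_hold_order(order: dict) -> bool:
    """Hold for manual review when all three risk signals line up."""
    return (
        order.get("ip_is_datacenter", False)
        and order.get("geo_mismatch", False)          # IP country != billing country
        and order.get("checkout_seconds", 9999) < 60  # assumed rapid-checkout cutoff
    )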

Case Study 2: Ad Verification

Scenario: An advertising network needs to verify that impressions come from genuine users, not bots running through proxy networks.

Detection Approach:

  1. ASN Classification: Build an allowlist of legitimate ISP and mobile carrier ASNs. Traffic from datacenter ASNs receives zero or reduced credit.

  2. TLS Fingerprinting: Compare User-Agent claims against TLS fingerprints. If traffic claims to be Chrome on Windows but TLS indicates Python or headless browser, discount the impression.

  3. Impression Timing: Analyze timing patterns. Bot farms often generate impressions with unnatural regularity. Genuine users show variable timing.

  4. Session Depth: Track user journey. Real users scroll, click, and engage. Proxy-routed bot traffic often generates impressions without meaningful interaction.

Outcome Logic: Impression from residential ASN + matching TLS fingerprint + organic engagement pattern → full credit. Datacenter ASN + fingerprint mismatch → flagged as invalid traffic (IVT).
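
Expressed as code, under the assumption that ASN classification, TLS comparison, and engagement tracking already produce these fields (all names are hypothetical):

def classify_impression(imp: dict) -> str:
    """Apply the outcome logic above to a single impression."""
    if imp.get("asn_type") == "datacenter" or imp.get("tls_ua_mismatch"):
        return "invalid_traffic"  # IVT: zero or reduced credit
    if imp.get("asn_type") in ("isp", "mobile") and imp.get("organic_engagement"):
        return "full_credit"
    return "review"  # ambiguous traffic: sample or partially credit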

Case Study 3: API Rate Limit Evasion

Scenario: A SaaS API provider notices certain accounts consuming resources far beyond normal usage, despite rate limits. Investigation suggests distributed requests through proxy pools.

Detection Approach:

  1. Account-Level Analysis: Group requests by authenticated account, not just IP. Single account making requests from 100+ unique IPs suggests proxy usage.

  2. IP Correlation: Identify IPs that appear briefly across multiple accounts. This pattern indicates shared proxy pool infrastructure.

  3. Request Fingerprinting: Analyze request characteristics (headers, timing, payload patterns). Different accounts showing identical request signatures suggest automation.

  4. Subnet Analysis: Monitor for unusual concentration of traffic from specific IP ranges that historically haven't been associated with the account's declared location or industry.

Outcome Logic: Account with >50 unique IPs/day + ephemeral IP pattern + consistent request signatures → flag for terms of service review and potential suspension.
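
A sketch of the account-level check, assuming one day's request log with an account and IP per event (the event shape is assumed; the 50-IP threshold mirrors the rule above):

from collections import defaultdict

def flag_proxy_pool_accounts(events: list, ip_threshold: int = 50) -> set:
    """Flag accounts whose daily unique-IP count exceeds the threshold."""
    ips_per_account = defaultdict(set)
    for event in events:  # each event: {"account": str, "ip": str}
        ips_per_account[event["account"]].add(event["ip"])
    return {
        account for account, ips in ips_per_account.items()
        if len(ips) > ip_threshold
    }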

Sample Code: Build Your Own Lightweight Proxy Detector

Here's a complete, runnable proxy detection script using only public APIs and standard libraries:

#!/usr/bin/env python3
"""
Lightweight Proxy Detector
Combines multiple signals to score IP addresses for proxy likelihood.
Uses only public APIs - no API keys required for basic functionality.
"""

import socket
import subprocess
import re
import statistics
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional

import requests

@dataclass
class DetectionResult:
    ip: str
    asn_info: dict
    rdns_info: dict
    latency_info: dict
    reputation_info: dict
    risk_score: int
    risk_level: str
    signals: list

class ProxyDetector:
    """Multi-signal proxy detection with configurable thresholds."""

    # Known datacenter ASN keywords
    DATACENTER_KEYWORDS = [
        "amazon", "aws", "digitalocean", "google cloud", "microsoft azure",
        "linode", "vultr", "ovh", "hetzner", "choopa", "hosting",
        "server", "cloud", "datacenter", "data center"
    ]

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "ProxyDetector/1.0"
        })

    def check_asn(self, ip: str) -> dict:
        """Query ASN information using ip-api.com (free, no key)."""
        try:
            response = self.session.get(
                f"http://ip-api.com/json/{ip}",
                params={"fields": "status,isp,org,as,proxy,hosting"},
                timeout=5
            )
            data = response.json()

            if data.get("status") != "success":
                return {"error": "lookup failed"}

            org_lower = (data.get("org", "") + data.get("isp", "")).lower()
            is_datacenter = any(kw in org_lower for kw in self.DATACENTER_KEYWORDS)

            return {
                "asn": data.get("as", "").split()[0] if data.get("as") else None,
                "org": data.get("org"),
                "isp": data.get("isp"),
                "is_datacenter": is_datacenter,
                "api_proxy_flag": data.get("proxy", False),
                "api_hosting_flag": data.get("hosting", False)
            }
        except Exception as e:
            return {"error": str(e)}

    def check_rdns(self, ip: str) -> dict:
        """Perform reverse DNS lookup."""
        try:
            hostname, _, _ = socket.gethostbyaddr(ip)
            hostname_lower = hostname.lower()

            residential_patterns = ["dsl", "cable", "res", "home", "dynamic", "pool", "dhcp", "dial"]
            hosting_patterns = ["server", "cloud", "host", "vps", "dedicated", "static"]

            return {
                "hostname": hostname,
                "is_residential_pattern": any(p in hostname_lower for p in residential_patterns),
                "is_hosting_pattern": any(p in hostname_lower for p in hosting_patterns)
            }
        except socket.herror:
            return {"hostname": None, "no_ptr": True}
        except Exception as e:
            return {"error": str(e)}

    def check_latency(self, ip: str, count: int = 5) -> dict:
        """Measure latency characteristics."""
        try:
            result = subprocess.run(
                ["ping", "-c", str(count), "-W", "2", ip],
                capture_output=True,
                text=True,
                timeout=20
            )

            rtts = [float(m) for m in re.findall(r"time=(\d+\.?\d*)", result.stdout)]

            if len(rtts) < 2:
                return {"error": "insufficient ping responses"}

            avg_rtt = statistics.mean(rtts)
            std_dev = statistics.stdev(rtts)
            jitter_coef = std_dev / avg_rtt if avg_rtt > 0 else 0

            # Infer connection type from latency characteristics
            if avg_rtt < 25 and jitter_coef < 0.15:
                inferred = "datacenter_likely"
            elif avg_rtt < 80:
                inferred = "residential_likely"
            else:
                inferred = "high_latency"

            return {
                "avg_rtt_ms": round(avg_rtt, 1),
                "std_dev_ms": round(std_dev, 1),
                "jitter_coefficient": round(jitter_coef, 3),
                "inferred_type": inferred
            }
        except subprocess.TimeoutExpired:
            return {"error": "ping timeout"}
        except FileNotFoundError:
            return {"error": "ping not available"}
        except Exception as e:
            return {"error": str(e)}

    def calculate_score(self, asn: dict, rdns: dict, latency: dict) -> tuple[int, list]:
        """Calculate risk score based on all signals."""
        score = 0
        signals = []

        # ASN signals (up to 70 points if all three fire)
        if asn.get("is_datacenter"):
            score += 35
            signals.append("datacenter_asn")
        if asn.get("api_proxy_flag"):
            score += 25
            signals.append("api_flagged_proxy")
        if asn.get("api_hosting_flag"):
            score += 10
            signals.append("api_flagged_hosting")

        # rDNS signals (max 20 points)
        if rdns.get("is_hosting_pattern"):
            score += 15
            signals.append("hosting_rdns_pattern")
        if rdns.get("no_ptr"):
            score += 5
            signals.append("no_ptr_record")
        if rdns.get("is_residential_pattern"):
            score -= 10  # Reduce score for residential indicators
            signals.append("residential_rdns_pattern")

        # Latency signals (max 15 points)
        if latency.get("inferred_type") == "datacenter_likely":
            score += 15
            signals.append("datacenter_latency_pattern")

        # Ensure score stays in reasonable range
        score = max(0, min(100, score))

        return score, signals

    def detect(self, ip: str) -> DetectionResult:
        """Run full detection pipeline on an IP address."""

        # Run checks (some in parallel)
        with ThreadPoolExecutor(max_workers=3) as executor:
            asn_future = executor.submit(self.check_asn, ip)
            rdns_future = executor.submit(self.check_rdns, ip)
            latency_future = executor.submit(self.check_latency, ip)

            asn_info = asn_future.result()
            rdns_info = rdns_future.result()
            latency_info = latency_future.result()

        # Calculate composite score
        score, signals = self.calculate_score(asn_info, rdns_info, latency_info)

        # Determine risk level
        if score >= 60:
            risk_level = "HIGH"
        elif score >= 35:
            risk_level = "MEDIUM"
        else:
            risk_level = "LOW"

        return DetectionResult(
            ip=ip,
            asn_info=asn_info,
            rdns_info=rdns_info,
            latency_info=latency_info,
            reputation_info={},  # Would add API-key-based checks here
            risk_score=score,
            risk_level=risk_level,
            signals=signals
        )

def main():
    """Example usage."""
    detector = ProxyDetector()

    # Test IPs (use IPs you have permission to test)
    test_ips = [
        "8.8.8.8",      # Google DNS (datacenter)
        "1.1.1.1",      # Cloudflare DNS (datacenter)
    ]

    print("=" * 60)
    print("Lightweight Proxy Detector")
    print("=" * 60)

    for ip in test_ips:
        print(f"\nAnalyzing: {ip}")
        print("-" * 40)

        result = detector.detect(ip)

        print(f"ASN: {result.asn_info.get('org', 'Unknown')}")
        print(f"rDNS: {result.rdns_info.get('hostname', 'No PTR')}")
        print(f"Latency: {result.latency_info.get('avg_rtt_ms', 'N/A')} ms")
        print(f"Risk Score: {result.risk_score}/100")
        print(f"Risk Level: {result.risk_level}")
        print(f"Signals: {', '.join(result.signals) or 'None'}")

if __name__ == "__main__":
    main()

Running the detector:

# Ensure requests is installed
pip install requests

# Run the detector
python proxy_detector.py

Limitations and False Positives

No proxy detection system is perfect. Understanding the edge cases is crucial for avoiding false positives that frustrate legitimate users.

CGNAT (Carrier-Grade NAT)

Mobile carriers and some ISPs use CGNAT, where thousands of users share a single public IP. This means:

  • High request volume from one IP may be legitimate
  • IP reputation can be unfairly penalized by one bad actor
  • Cannot distinguish individual users by IP alone

Mitigation: Use additional signals (device fingerprinting, account history) for users behind CGNAT ranges.
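
One way to implement that, sketched under the assumption that you maintain a list of known CGNAT egress prefixes and already compute a device fingerprint: key rate limits on a composite of IP, fingerprint, and account instead of IP alone.

import hashlib

def rate_limit_key(ip: str, fingerprint: str, account_id: str,
                   cgnat_prefixes: set) -> str:
    """Use a composite key behind CGNAT so one abuser doesn't throttle thousands."""
    prefix = ".".join(ip.split(".")[:2])  # coarse /16 match; real lists vary
    if prefix in cgnat_prefixes:
        raw = f"{ip}|{fingerprint}|{account_id}"
    else:
        raw = ip  # outside CGNAT, the IP alone is a reasonable key
    return hashlib.sha256(raw.encode()).hexdigest()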

VPN vs. Proxy Ambiguity

Corporate VPNs and privacy-focused consumers use VPNs legitimately. Flagging all VPN traffic as suspicious creates friction for:

  • Remote workers accessing corporate resources
  • Privacy-conscious users
  • Users in restrictive regions

Mitigation: Consider the context. An API designed for enterprise use should expect VPN traffic. Consumer e-commerce might warrant extra scrutiny.

Enterprise NAT and Shared IPs

Large organizations route all traffic through limited egress IPs. A Fortune 500 company might have 50,000 employees behind a handful of IPs.

Mitigation: Maintain allowlists for known enterprise ranges. Consider request patterns at the IP level rather than applying per-user rate limits.

Cloud-Based Mobile Devices

Cloud phone services (used for testing, accessibility, and legitimate business purposes) run mobile device emulators in datacenters. They appear as datacenter traffic but represent legitimate mobile use cases.

Mitigation: Be cautious about blanket datacenter blocking for mobile-optimized services.

IPv6 Considerations

IPv6 adoption introduces new challenges:

  • Larger address space makes IP reputation databases less comprehensive
  • Privacy extensions rotate IPv6 addresses frequently
  • Some proxy detection techniques require adaptation

Mitigation: Focus on /64 prefix patterns rather than individual addresses for IPv6 traffic analysis.
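
A small helper for that /64 grouping, using only the standard library:

import ipaddress

def ipv6_group_key(ip: str) -> str:
    """Collapse an IPv6 address to its /64 prefix for reputation tracking."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return ip  # IPv4 addresses pass through unchanged

print(ipv6_group_key("2001:db8::abcd:1234"))  # -> 2001:db8::/64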

Conclusion

Detecting proxy traffic is an ongoing arms race. As detection methods improve, proxy providers develop countermeasures. The most effective approach combines multiple signals, acknowledges limitations, and adapts to evolving patterns.

Key takeaways:

  1. Layer your detection: No single signal is definitive. Combine ASN, rDNS, behavior, and reputation data.

  2. Context matters: What's suspicious for a payment endpoint might be normal for a content API.

  3. Accept uncertainty: Design systems that handle ambiguity gracefully—graduated responses rather than binary blocks.

  4. Respect privacy: Proxy detection exists to prevent abuse, not to eliminate user privacy. Balance security needs with user rights.

  5. Stay current: The proxy landscape evolves. P2P residential networks, decentralized VPNs, and encrypted transport are expanding. Continuously update your detection strategies.

As privacy technologies advance and more traffic routes through intermediaries, the challenge will increasingly shift from "detecting proxies" to "assessing request legitimacy" through holistic behavioral analysis rather than IP-centric detection alone.

