Sharon

Posted on Sep 2

Stopping Malicious Web Crawlers from Wasting Your Bandwidth with SafeLine WAF

#safeline #waf #cybersecurity #beginners

1. Background

Automated bots and malicious crawlers can silently eat up your server’s bandwidth by hammering your site with repeated requests. When you check your cloud provider’s dashboard, you may notice that most of your traffic comes from just a handful of IP addresses.
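If you have raw access logs rather than just a dashboard, a few lines of Python can confirm which addresses dominate. This is a minimal sketch, assuming each log line starts with the client IP (as in the common and combined access-log formats); the function name `top_talkers` and the sample lines are made up for illustration.

```python
from collections import Counter


def top_talkers(log_lines, n=5):
    """Return the n most frequent client IPs as (ip, count) pairs.

    Assumes each line starts with the client IP followed by a space.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        ip = line.split(" ", 1)[0]
        counts[ip] += 1
    return counts.most_common(n)


# Made-up sample lines using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - [02/Sep/2025:10:00:00 +0000] "GET /hello?a=a HTTP/1.1" 200 512',
    '203.0.113.7 - - [02/Sep/2025:10:00:00 +0000] "GET /hello?a=b HTTP/1.1" 200 512',
    '198.51.100.4 - - [02/Sep/2025:10:00:01 +0000] "GET / HTTP/1.1" 200 1024',
    '203.0.113.7 - - [02/Sep/2025:10:00:01 +0000] "GET /hello?a=c HTTP/1.1" 200 512',
]

for ip, count in top_talkers(sample, n=2):
    print(ip, count)
```

To run it against a real file, pass the open file object instead of `sample`, for example `top_talkers(open("access.log"))`, where the path is whatever your server writes to.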

A quick fix is to limit how often those IPs can send requests. But here’s the catch:

Managing an IP frequency table isn’t part of your business logic.
Developers usually don’t want the overhead of maintaining this manually.
Handling it in distributed or concurrent environments can become both complex and costly.

That’s where SafeLine WAF by Chaitin comes in. In addition to protecting against web attacks, SafeLine offers rate limiting, port forwarding, and manual IP allow/deny rules, giving you fine-grained control over traffic without extra development effort.

2. Installing SafeLine

bash -c "$(curl -fsSLk https://waf.chaitin.com/release/latest/manager.sh)" -- --en

Full installation guide: SafeLine Docs – Install

3. Logging into SafeLine

Open your browser and visit:

https://<safeline-ip>:9443/

If you don’t know the admin account yet, reset it with:

docker exec safeline-mgt resetadmin

Output example:

[SafeLine] Initial username：admin  
[SafeLine] Initial password：**********  
[SafeLine] Done

Now log in with those credentials and you’re in ✅

4. Configuring Your Site and Rate Limiting

4.1 Site Configuration

SafeLine lets you configure your site easily:

Upload TLS certificates and keys automatically
Set multiple forwarding ports
Skip manual Nginx tweaks

4.2 Rate Limiting

You can customize rate-limiting rules based on your needs. A common strategy is:

Limit: 100 requests per 10 seconds
Block Duration: 10 minutes

💡 If you’re testing or experience false positives, you can lift the block manually.
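To make the rule concrete, here is a toy model of "100 requests per 10 seconds, block for 10 minutes". It is only a sketch of the semantics, not SafeLine's implementation: the class `RateLimitRule`, the sliding-window counting, and the choice to block on the 101st request are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class RateLimitRule:
    """Toy model: at most `limit` requests per `window` seconds per IP;
    an IP that exceeds the limit is blocked for `block_for` seconds."""

    def __init__(self, limit=100, window=10.0, block_for=600.0):
        self.limit = limit
        self.window = window
        self.block_for = block_for
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._blocked_until = {}         # ip -> time at which the block expires

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` (seconds) passes."""
        if now < self._blocked_until.get(ip, 0.0):
            return False
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.limit:
            self._blocked_until[ip] = now + self.block_for
            hits.clear()
            return False
        return True

    def unblock(self, ip):
        """Lift a block manually, e.g. after a false positive."""
        self._blocked_until.pop(ip, None)
        self._hits.pop(ip, None)


rule = RateLimitRule()
# 101 requests from one IP within about one second.
results = [rule.allow("203.0.113.7", now=i * 0.01) for i in range(101)]
print(results.count(True), results.count(False))
```

Even this single-process version needs expiry logic and per-IP state; sharing it across several servers would also need a common store, which is the overhead a WAF takes off your hands.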

5. Testing and Advanced Considerations

5.1 Testing with a Simple Crawler

Here’s a quick Python script you can use to simulate crawler traffic:

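# Requires the third-party "requests" package (pip install requests).
import random

import requests


def send_request(url, request_method="GET", header=None, data=None):
    """Send one HTTP request; return the response, or None on failure."""
    if header is None:
        header = {"User-Agent": "Mozilla/5.0"}
    try:
        return requests.request(request_method, url, headers=header, data=data, timeout=5)
    except requests.RequestException as err:
        print(err)
        return None


if __name__ == '__main__':
    # Send more than 100 requests so a 100-request limit can actually trigger.
    for i in range(200):
        char = random.choice('abcdefghijklmnopqrstuvwxyz')
        resp = send_request("http://a.com/hello?a=" + char)
        # resp is None when the request failed, so guard before using it.
        if resp is not None:
            print(resp.status_code, resp.content)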

Once the request rate exceeds the configured limit, SafeLine automatically blocks further requests from that IP for the configured block duration.
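To see exactly when the block kicked in, it helps to tally the status codes from a test run. The helper below is a small sketch: `summarize_statuses` is a hypothetical name, and the `429` in the sample data is only a placeholder, since this article does not state which status code SafeLine returns for rate-limited requests.

```python
from collections import Counter


def summarize_statuses(statuses, ok_status=200):
    """Tally status codes and find the first request that was not OK.

    `statuses` must be a list. Returns (counts, first_non_ok_index);
    the index is None if every request returned `ok_status`.
    """
    counts = Counter(statuses)
    first_non_ok = next(
        (i for i, code in enumerate(statuses) if code != ok_status), None
    )
    return counts, first_non_ok


# Made-up run: 100 normal responses, then 20 rejected ones.
statuses = [200] * 100 + [429] * 20
counts, first_non_ok = summarize_statuses(statuses)
print(dict(counts), first_non_ok)
```

With a real crawler run, you would collect `resp.status_code` from each successful request into a list and pass that list in.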

5.2 What If Crawlers Fake the X-Forwarded-For Header?

Some bots spoof the X-Forwarded-For header to disguise their IP. SafeLine makes this easy to handle:

Go to Applications → Advanced → Get Attack IP From
Select Socket Connection

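This way, the IP is taken directly from the TCP connection, which a client cannot change by editing request headers. Note that this assumes clients connect to SafeLine directly; if a CDN or another reverse proxy sits in front, the socket address is the proxy's, not the visitor's.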
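The difference between the two sources is easy to demonstrate. The sketch below is not SafeLine code: `client_ip` is a hypothetical helper, and the header values and addresses are invented. It shows how one bot that rotates a fake X-Forwarded-For value looks like many clients when the header is trusted, but like a single client when the socket address is used.

```python
def client_ip(headers, peer_ip, trust_forwarded_for):
    """Pick the address a per-IP rate limiter would count against.

    headers: dict of request headers (lower-cased keys assumed).
    peer_ip: source address of the TCP connection.
    trust_forwarded_for: if True, use the first X-Forwarded-For entry.
    """
    if trust_forwarded_for:
        forwarded = headers.get("x-forwarded-for", "")
        first = forwarded.split(",")[0].strip()
        if first:
            return first
    return peer_ip


# One bot (TCP source 203.0.113.7) sends a different fake header each time.
requests_seen = [
    ({"x-forwarded-for": f"10.0.0.{i}"}, "203.0.113.7") for i in range(1, 6)
]

by_header = {client_ip(h, peer, True) for h, peer in requests_seen}
by_socket = {client_ip(h, peer, False) for h, peer in requests_seen}
print(len(by_header), len(by_socket))
```

A limiter keyed on the header would see five separate clients with one request each, so the bot would never reach the limit; keyed on the socket address, all five requests count against the same IP.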

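And if a crawler goes one step further and fakes the TCP source IP?
→ The TCP handshake never completes: the server's SYN-ACK reply goes to the forged address, not to the crawler, so no connection is established. No handshake, no request. The crawler is dead in the water.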

Conclusion

By enabling SafeLine’s rate limiting, you can stop malicious crawlers from draining your bandwidth and keep your resources focused on real users. With just a few simple steps, SafeLine protects your site automatically, without forcing you to reinvent the wheel in your own code.

🔗 Explore more: SafeLine Docs
💬 Join the community: Discord
