1. Background
Automated bots and malicious crawlers tend to hit a website at high frequency and keep doing so for a long time. When you open the management console of your cloud server, you often find that most of the network traffic comes from one or a few IPs. This situation is straightforward to handle by rate limiting the source IPs on the server.
However, IP rate limiting usually has nothing to do with business logic, and developers typically do not want to maintain an IP access frequency table themselves. Tracking every visitor by hand is also costly to develop once the service is distributed and handles concurrent requests.
Chaitin’s SafeLine WAF addresses these issues. Alongside its core function of defending against web attacks, SafeLine offers rate limiting, port forwarding, and manual IP blacklists and whitelists.
2. Installing SafeLine
The official website provides several installation methods, including online installation and offline installation. For details, refer to:
SafeLine WAF Installation Guide
3. Logging In and Entering the SafeLine Management Interface
4. Configuring Sites and Rate Limiting
4.1 SafeLine Site Configuration
SafeLine’s site configuration is comprehensive, including uploading the TLS certificate and private key, specifying multiple forwarding ports, and more, so developers do not need to configure nginx forwarding themselves.
4.2 Configuring Rate Limiting
You can customize the blocking strategy. A recommended setting is 100 requests per 10 seconds, with a 10-minute block once the limit is exceeded.
Note: If you are only testing for personal use, or you run into false positives, you can disable blocking manually.
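To make the rule concrete, here is a minimal sketch of what "100 requests per 10 seconds, blocking for 10 minutes" means for a single IP. It is an in-memory illustration only, not SafeLine's implementation; the class name `SlidingWindowLimiter` and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-IP limiter: at most `limit` requests per `window` seconds,
    then block the IP for `block_for` seconds. Illustrative only."""

    def __init__(self, limit=100, window=10.0, block_for=600.0,
                 clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.block_for = block_for
        self.clock = clock                 # injectable so the logic can be tested
        self.hits = defaultdict(deque)     # ip -> timestamps of recent requests
        self.blocked_until = {}            # ip -> time at which the block expires

    def allow(self, ip):
        now = self.clock()
        # Still inside an active block: reject without counting the request.
        if self.blocked_until.get(ip, 0.0) > now:
            return False
        self.blocked_until.pop(ip, None)
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            # Limit exceeded: start the block and reset the counter.
            self.blocked_until[ip] = now + self.block_for
            recent.clear()
            return False
        recent.append(now)
        return True
```

With the default arguments, the 101st request from one IP inside 10 seconds is rejected, and so is everything from that IP for the next 10 minutes. Note that this state lives in one process; sharing it across several servers is exactly the maintenance cost described in the background section.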
5. Testing and Others
5.1 Testing
A simple backend server exposes a "hello" interface that takes an "a" parameter. Here is a simple crawler script for testing:
import random

import requests


def send_request(url, request_method="GET", header=None, data=None):
    """Send one HTTP request; return the response, or None on failure."""
    if header is None:
        header = {"User-Agent": "Mozilla/5.0"}
    try:
        return requests.request(request_method, url, headers=header, data=data)
    except requests.RequestException as err:
        print(err)
        return None


if __name__ == '__main__':
    # Fire 100 requests in quick succession with a random "a" parameter.
    for i in range(100):
        char = random.choice('abcdefghijklmnopqrstuvwxyz')
        resp = send_request("http://a.com/hello?a=" + char)
        # send_request returns None when the request itself failed.
        if resp is not None:
            print(resp.content)
Output examples:
b'{"a":"u"}'
b'{"a":"m"}'
b'{"a":"y"}'
b'{"a":"o"}'
b'<!DOCTYPE html>\n\n<html lang="zh">\n <head>\n .... (followed by a long HTML text)
At this point, when you revisit the page, you will find that it has been blocked.
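In a test script it can be handy to notice the switch from normal replies to the interception page automatically. The sketch below relies only on what the sample output shows: normal replies are JSON objects, while the blocked reply is an HTML page. The helper name `looks_blocked` is hypothetical, and the check is an assumption about this particular "hello" backend, not a documented SafeLine signal.

```python
import json


def looks_blocked(body: bytes) -> bool:
    """Return True if `body` does not look like the backend's JSON reply.

    Assumption: the "hello" interface always answers with a JSON object,
    so anything else (such as an HTML page) is treated as an interception page.
    """
    try:
        return not isinstance(json.loads(body), dict)
    except ValueError:
        # Covers both invalid JSON and bodies that cannot be decoded as text.
        return True
```

For example, calling `looks_blocked(resp.content)` inside the test loop lets the script stop as soon as the first blocked response arrives.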
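5.2 What if Crawlers Spoof the X-Forwarded-For Header?
SafeLine lets you choose how the source IP is obtained under ‘General Settings’, so you can decide whether it is taken from the TCP connection or from a request header such as X-Forwarded-For.
If the crawler instead spoofs the TCP source IP field, the server's handshake reply goes to the forged address, so the TCP handshake never completes and no HTTP connection is established. The crawler loses the ability to scrape anything, and its traffic is discarded before nginx ever handles a request.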
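Why the source IP setting matters can be shown with a short sketch of a common, general approach: trust X-Forwarded-For only when the connection comes from a proxy you operate, and otherwise use the TCP peer address. This is not SafeLine's internal logic; the function `client_ip` and its parameters are hypothetical.

```python
import ipaddress


def client_ip(peer_ip, xff_header, trusted_proxies):
    """Pick the IP to rate limit.

    peer_ip:         address of the TCP connection.
    xff_header:      raw X-Forwarded-For value, or None.
    trusted_proxies: iterable of CIDR strings for proxies you operate.
    """
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(ip):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False
        return any(addr in net for net in trusted)

    # Directly connected client: ignore whatever header it sent.
    if not xff_header or not is_trusted(peer_ip):
        return peer_ip
    # Walk the chain right to left; the first untrusted hop is the client.
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return peer_ip
```

A crawler that connects directly and sends `X-Forwarded-For: 1.1.1.1` is still counted under its real connection address, because that connection does not come from a trusted proxy.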
This guide should help you effectively manage bandwidth usage caused by crawlers with SafeLine WAF.