Resolving Fail2ban False Positives: Nginx 403 & WebSocket Paths

#fail2ban #nginx #serversecurity #websocket

One day, out of the blue, a user of our service sent a message saying, "I can't access the site." At first, I thought it was a simple network issue or a temporary server load. However, I was surprised to find that the website wouldn't open at all from a specific network, and even SSH access was blocked – a serious situation. Checking the server status, all metrics like CPU, memory, and disk were normal, and I could connect without issues from other IPs, suggesting that only specific IPs were being completely blocked. As I delved into the cause of this problem, I realized that fail2ban, the security system I had set up myself, was unexpectedly mistaking legitimate users for attackers. In this post, I'd like to share in detail how I resolved this complex and perplexing situation.

What this post covers

- Understand cases where fail2ban falsely detects legitimate 403 responses.
- Identify the operating principles and problems of the nginx-forbidden filter.
- Learn how to protect administrator IPs using fail2ban's ignoreip setting.
- Learn how to use fail2ban's ignoreregex setting to exclude 403 responses for specific paths.
- Discover commands to immediately unban IPs that were incorrectly blocked by fail2ban.

Target Audience: Operators who have experienced issues where fail2ban and Nginx together blocked legitimate connections.
Difficulty: Intermediate

Complete Blocking of Website and SSH Access for Specific IPs

While operating the service, I received an inquiry: "SSH works, but I can't access the site." Initially, I thought it was a simple network error or a client-side issue, but I confirmed that multiple users were experiencing the same phenomenon and that the website was completely inaccessible from specific IP ranges. In more severe cases, not only the website but also SSH access was blocked, making it difficult to access the server. When monitoring the overall server status, all metrics such as CPU usage, memory, and disk I/O were stable, and the service was being provided without issues from other legitimate IPs. This strongly suggested that the problem was not with the server itself, but rather a system-level block was being applied to specific IPs. It felt as if an invisible barrier had been placed only on certain IPs. This situation could critically impact user experience, so I immediately began investigating the cause.

Fail2ban nginx-forbidden Filter False Positive on WebSocket Path

While tracing the root cause, I discovered that fail2ban, the server's own security layer, was the culprit. Specifically, the nginx-forbidden filter was producing false positives. We were running a WebSocket endpoint at the /ws path on a test vhost, and this path was intentionally configured to return HTTP 403 Forbidden to any IP not on the whitelist. When a legitimate user opens a web page, the browser attempts to establish a WebSocket connection to /ws. If the user's IP is not on the whitelist, Nginx returns a 403 response as designed. The problem was that the nginx-forbidden jail watched Nginx's access-denied (403) log entries and banned an IP after only two matches (maxretry=2). A single page view was enough: the browser would try to connect to /ws several times, and the 403 responses piled up quickly. As a result, legitimate customer IPs were mistaken for attackers and banned. Worse, the ban applied not only to the web ports (80/443) but also to the SSH port (22), so a banned user lost access to the website, and a banned administrator lost SSH access to the server as well.
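To make the failure mode concrete, here is a minimal Python sketch of the counting logic described above. It is a simplified model, not fail2ban's actual implementation: maxretry=2 comes from this post, while the findtime value, the sample IPs, and the event list are assumptions made up for illustration.

```python
from collections import defaultdict, deque

MAXRETRY = 2    # value from this post
FINDTIME = 600  # seconds; assumed for illustration, not stated in the post


def find_banned(events, maxretry=MAXRETRY, findtime=FINDTIME):
    """Return {ip: ban_time} for a time-ordered list of (ts, ip, status)."""
    recent = defaultdict(deque)  # ip -> timestamps of recent 403 responses
    banned = {}
    for ts, ip, status in events:
        if status != 403 or ip in banned:
            continue
        window = recent[ip]
        window.append(ts)
        # Forget failures that fell out of the findtime window.
        while window and ts - window[0] > findtime:
            window.popleft()
        if len(window) >= maxretry:
            banned[ip] = ts
    return banned


# Hypothetical traffic: one page view whose browser retries /ws once,
# plus a second client that only ever receives a single 403.
events = [
    (0, "203.0.113.10", 200),  # GET /
    (1, "203.0.113.10", 403),  # GET /ws (not whitelisted)
    (4, "203.0.113.10", 403),  # GET /ws (browser retry)
    (5, "198.51.100.7", 403),  # one isolated 403
]
banned = find_banned(events)
print(banned)
```

In this model the first client is banned four seconds after opening the page, without doing anything a normal browser would not do.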

Administrator IP Protection and WebSocket Path Exception Handling Applied

To resolve this issue, I took two main actions.

First, I added my static IP to fail2ban's ignoreip setting to prevent the administrator's IP from being accidentally blocked. This is a minimum safety measure to ensure that administrators can access the server under any circumstances. I added the following to the jail.local file:

    # jail.local: prevent administrator lockout
    [DEFAULT]
    ignoreip = 127.0.0.1/8 myStaticIP

This setting ensures that localhost and my specified static IP are always excluded from fail2ban's blocking rules.

Second, as a more fundamental solution, I modified the nginx-forbidden filter's ignoreregex setting to exclude 403 responses occurring on the /ws path from the ban count. This prevents fail2ban from mistaking intended 403 responses for attacks. I added or modified the following content in the /etc/fail2ban/filter.d/nginx-forbidden.local file:

    # Exclude /ws from nginx-forbidden filter
    # /etc/fail2ban/filter.d/nginx-forbidden.local
    ignoreregex = ^ .* "GET /ws

This regular expression ensures that log entries for GET /ws do not count toward a ban. After making changes, you must restart the fail2ban service to apply the settings.
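Before restarting the service, it can help to check what the ignoreregex pattern actually matches. The sketch below uses Python's re module as a stand-in for fail2ban's own matching. The sample lines are hypothetical, written in the style of an nginx error-log entry, and the sketch assumes fail2ban has already removed the leading timestamp so that the remaining text starts with a space. Neither the log format nor that assumption is confirmed by this post, so treat the result as an approximation.

```python
import re

# Pattern copied from the ignoreregex setting in this post.
IGNORE_WS = re.compile(r'^ .* "GET /ws')


def is_ignored(line_without_timestamp):
    """True if the line would be excluded from the ban count."""
    return IGNORE_WS.search(line_without_timestamp) is not None


# Hypothetical log lines with the timestamp already removed (assumption).
ws_line = (' [error] 1234#0: *1 access forbidden by rule, '
           'client: 203.0.113.10, server: test.example.com, '
           'request: "GET /ws HTTP/1.1", host: "test.example.com"')
admin_line = (' [error] 1234#0: *2 access forbidden by rule, '
              'client: 203.0.113.10, server: test.example.com, '
              'request: "GET /admin HTTP/1.1", host: "test.example.com"')
prefix_line = (' [error] 1234#0: *3 access forbidden by rule, '
               'client: 203.0.113.10, server: test.example.com, '
               'request: "GET /ws-internal HTTP/1.1", host: "test.example.com"')
access_style_line = '203.0.113.10 - - "GET /ws HTTP/1.1" 403 153'

for name, line in [("ws", ws_line), ("admin", admin_line),
                   ("prefix", prefix_line), ("access", access_style_line)]:
    print(name, is_ignored(line))
```

Two properties of the pattern are worth keeping in mind. It is a prefix match, so any path that begins with /ws is excluded too. It also requires the matched text to start with a space, so a line that begins directly with the client IP would not be ignored.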

Configuration Verification and Immediate Unbanning of Blocked IPs

After changing the settings, I accessed the website from the same network environment where the problem had occurred and confirmed that the service was working normally. As expected, site access was smooth, and the fail2ban logs showed that 403 responses from the /ws path no longer led to bans. fail2ban now distinguishes between legitimate 403 responses and malicious attempts, as intended.

If some IPs are still blocked, you can unban them immediately with the following command:

    sudo fail2ban-client set nginx-forbidden unbanip blockedIP

I used this command to manually unban the blocked IPs, which let users access the service again right away. Through this process, I resolved the fail2ban filter's false positive issue and restored user access.
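When several addresses were banned by mistake, the unban command above can be generated for each of them. The following is a minimal sketch under a few assumptions: the helper names and sample IPs are made up for illustration, the jail name is the one used in this post, and the script defaults to a dry run that only prints the commands.

```python
import ipaddress
import subprocess

JAIL = "nginx-forbidden"  # jail name used in this post


def unban_command(ip, jail=JAIL):
    """Build the fail2ban-client unban command for a single IP."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    return ["sudo", "fail2ban-client", "set", jail, "unbanip", ip]


def unban_all(ips, jail=JAIL, dry_run=True):
    """Return the commands; execute them only when dry_run is False."""
    commands = [unban_command(ip, jail) for ip in ips]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical list of wrongly banned addresses.
commands = unban_all(["203.0.113.10", "198.51.100.7"])
for cmd in commands:
    print(" ".join(cmd))
```

Validating each address first means a typo is rejected before anything is passed to sudo.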

Conclusion

I personally experienced that while fail2ban is a powerful security tool, problems can arise when the filter cannot distinguish between 'abnormal traffic' and 'intended legitimate 403 responses'. If there are paths that return 403 or 404 even though they are normal operations, such as health check paths or WebSocket handshakes, you must handle them as exceptions using the ignoreregex setting. Otherwise, the security device might end up blocking legitimate customers. I also realized once again the importance of always adding administrator IPs to ignoreip to establish a minimum safety measure for accessing the server under any circumstances.