This week, one of our sites was brought down by a Denial of Service attack. Well, it wasn’t really a DoS attack – I’m sure denial of service wasn’t the attackers’ aim – but it caused a Denial of Service nonetheless.
It was clear a bot was spamming the site with several thousand requests per minute, but that in itself was well within the threshold we’d expect the site to handle – this is a modern Umbraco site with load-balancing/auto-scaling and a well-optimized codebase, after all.
So, what was going on?
The problem was the nature of the traffic – form submissions. Shedloads of form submissions, hitting both Umbraco Forms and our custom surface controllers, and causing errors.
It was those errors that contributed to a significant degradation in performance, and they left us with some challenges to overcome:
1. Umbraco Forms
Umbraco Forms has a honeypot field built-in – this is a hidden text field that looks tempting to bots, so they fill it in. If a form is submitted with a value in that field, Umbraco Forms will reject the submission. The bot hitting our site was totally falling for the honeypot - great! So what’s the problem? When Umbraco Forms rejects that submission it also logs that rejection.
Logging is a non-trivial concern in performance-critical applications. In Umbraco, every log entry is a disk write operation by default. So, although the site would normally be able to handle this level of traffic easily, these POST requests are resulting in a disk write (which in an Azure Web App is actually a network request under the hood) – gobbling up IO and slowing the site down.
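Umbraco uses Serilog under the hood, and log verbosity can be tuned per namespace in appsettings.json if you ever need to relieve that pressure. A sketch – the Umbraco.Forms source context is an assumption (check the SourceContext in your own log entries), and as the Considerations below explain, silencing the errors wasn’t actually the fix we wanted:

```json
{
  "Serilog": {
    "MinimumLevel": {
      "Default": "Information",
      "Override": {
        // Assumed source context – raise the level so rejection noise skips the file sink
        "Umbraco.Forms": "Error"
      }
    }
  }
}
```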
2. Surface controllers
We had a bigger problem with our surface controllers. Surface controllers use a hidden ufprt field to store an encrypted value for routing form submissions. If there’s a problem with this field, i.e. Umbraco can’t decrypt it, an error is thrown.
The bot spamming us was stuffing that field with SQL in the vain hope of carrying out an SQL injection. Of course, Umbraco couldn’t decrypt that into a valid ufprt value, so it threw an error.
So, just like with Umbraco Forms, we have an error being logged for each of these requests – with the added overhead of an exception being thrown.
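For anyone unfamiliar with ufprt, it’s the hidden field Umbraco renders into every surface controller form so the POST can be routed back to the right action. A minimal sketch – the controller and action names here are placeholders, not our real code:

```cshtml
@* BeginUmbracoForm renders the hidden ufprt routing field automatically *@
@using (Html.BeginUmbracoForm<ContactSurfaceController>("Submit"))
{
    <input type="email" name="Email" />
    <button type="submit">Send</button>
}

@* The rendered markup includes something like:
   <input name="ufprt" type="hidden" value="(encrypted routing token)" />
   If that value can't be decrypted on postback, Umbraco can't route the
   request to the action, and an exception is thrown instead. *@
```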
2.1 reCAPTCHA
There was another problem with our surface controllers: reCAPTCHA.
Like a lot of other .NET projects, we use an attribute on our controllers to handle reCAPTCHA validation. The problem is that this validation was running before Umbraco tried to decrypt the ufprt field – so as well as the exception and the extra writes to disk, every one of these requests was also making an outbound HTTP call to the reCAPTCHA API, tying up precious threads and connections while it waited for a response.
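Our attribute follows the usual pattern: an ASP.NET Core action filter that posts the client’s token to Google’s siteverify endpoint. The sketch below is illustrative rather than our production code – the class name, secret handling and result shape are all simplified – but it shows why every bot POST turned into an outbound HTTP call before Umbraco even looked at ufprt:

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.Filters;

// Illustrative reCAPTCHA validation filter – names and config handling are placeholders.
public class ValidateRecaptchaAttribute : ActionFilterAttribute
{
    private static readonly HttpClient Http = new();

    public override async Task OnActionExecutionAsync(
        ActionExecutingContext context, ActionExecutionDelegate next)
    {
        var request = context.HttpContext.Request;
        var token = request.HasFormContentType
            ? request.Form["g-recaptcha-response"].ToString()
            : string.Empty;

        // One outbound HTTP call per POST – this is the cost the bot traffic multiplied.
        var response = await Http.PostAsync(
            "https://www.google.com/recaptcha/api/siteverify",
            new FormUrlEncodedContent(new Dictionary<string, string>
            {
                ["secret"] = "<your-secret-key>", // would normally come from configuration
                ["response"] = token
            }));

        var result = await response.Content.ReadFromJsonAsync<RecaptchaResult>();
        if (result is not { Success: true })
        {
            // Short-circuit: the surface controller action never runs.
            context.Result = new BadRequestResult();
            return;
        }

        await next();
    }

    private sealed record RecaptchaResult(bool Success);
}
```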
We're serving all these requests!
On top of the errors themselves, we’re then returning a 500 error page for most of these requests.
Error pages on this site are dynamic, server-rendered pages, so serving them uses compute – it’s not a big deal, but we don’t want to be wasting those resources on bot requests.
Considerations
Sure, we could look at each of the individual problems above and solve for them but…
- I still want to know about errors, so I need logging.
- I still want to use reCAPTCHA to validate form submissions, especially from bots.
- I still want to serve dynamic error pages (though there’s an argument to be made for better caching).
For legitimate traffic, I still want all of this to happen. For illegitimate traffic I don’t want our app to be responding at all - it’s a waste of resources. In fact, W3C’s Web Sustainability Guidelines recommend filtering suspicious activity for this reason.
We should be blocking this traffic at the edge, and that’s a job for the WAF – in this case, Cloudflare.
The Solution
It’s time for a new WAF rule.
Fortunately, there’s a really simple rule that we can deploy in Cloudflare to block this traffic.
I set up a new Rate Limiting Rule with an expression that matches all POST requests, then rate limited it to 25 POST requests per IP address every 10 seconds before issuing a challenge.
No legitimate user is going to be making more than 25 POST requests in 10 seconds. You might even think that’s a little high, but we can only rate-limit by IP address, and multiple users will legitimately be sharing a single IP.
Here’s what that looks like in the portal:
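In text form, the rule boils down to something like this – a sketch based on the settings above, with the matching expression written in Cloudflare’s standard rules language:

```
When incoming requests match:   (http.request.method eq "POST")
With the same characteristics:  IP
When rate exceeds:              25 requests per 10 seconds
Then take action:               Challenge
```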
The Result
3 million requests have been blocked since I deployed that rule yesterday and, most importantly, the site has remained online and performant for legitimate users throughout.
Win.

