Kazuya

AWS re:Invent 2025 - A day in the life of an AWS WAF administrator (NET317)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - A day in the life of an AWS WAF administrator (NET317)

In this video, Tzoori Tamam from AWS and Dan Avidan from HSBC present a level 300 session on AWS WAF administration. Tzoori walks through protecting a new application using CloudFront with WAF, covering architecture decisions, origin cloaking with VPC origins and OAC, and leveraging the new flat rate pricing plans with pre-configured protection packs. He demonstrates the built-in dashboards, handling false positives using labels, and cost optimization strategies. Dan shares HSBC's journey building a serverless edge protection platform supporting hundreds of CloudFront distributions across multiple markets, protecting both AWS and non-AWS origins. He discusses challenges including tuning Amazon managed rules, scaling logging during DDoS attacks, and bridging the gap between security and application teams. The session emphasizes starting with manageable policies, continuous tuning, enabling Layer 7 DDoS protection, implementing bot control with Web Bot Auth for AI agents, and treating WAF configurations as code for automation.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: A Day in the Life of a WAF Administrator

Hi. Welcome to re:Invent 2025. I hope you have enjoyed your first day and hopefully you'll enjoy the rest of the week. My name is Tzoori Tamam. I'm a Principal Solutions Architect Specialist for Edge Services at AWS. With me is Dan Avidan from HSBC. He'll introduce himself a bit later in the presentation. Today we're going to cover a level 300 talk about a typical, or maybe not so typical, day in a WAF administrator's life.

Thumbnail 30

It will be a bit hectic and a bit technical. The fact that it's a level 300 talk means that we assume you already have some foundational knowledge, so we won't dive into what rules or actions are. We'll dive a bit into labels and maybe some of the newer features, but we do assume some technical level from the start. I apologize for the pace. I talk fast naturally, and we have a lot to pack into 60 minutes, so I'll be even faster. You can watch this later on YouTube at 0.75 speed and it will make more sense, but for now, it's going to be rushed.

Thumbnail 100

We'll build this talk as a story. I'll pretend to be a WAF administrator and take you through one of my days. Later, we'll have an actual story from Dan at HSBC, who will discuss how they journeyed through adopting AWS WAF. So with that, we'll start waking up. It's morning, a new day, a new web application. I just got a notification this morning on the way to work that there's a new application out there.

Thumbnail 110

Thumbnail 130

Morning Challenge: Protecting a New Bot Robot Selling Application

My company probably misunderstood the hype around bots and AI, and they built an actual bot robot selling website and mobile app. There's this new application that now I, as the WAF administrator, need to go out and protect. Initially, this is the architecture. It's built to serve web users and mobile applications sending typical API calls, as well as unknown automated clients such as good bots and bad bots. It's just out there and everybody can use it. The traffic will initially hit CloudFront, our CDN service, as the ingress point, and from there, CloudFront will route the traffic into multiple backend and frontend components.

Thumbnail 190

We'll have an Application Load Balancer act as the load balancer for the web components. We'll have an API Gateway that will route API calls to different Lambda workers and Lambda integrations. We'll have static content from a single page application, so the HTML, CSS, and JavaScript files are all hosted on an S3 bucket, and CloudFront will take care of all that. This is my headache. I got my morning coffee, and now I need to decide how to protect this newly launched application. There are multiple questions we need to ask ourselves when we decide how to protect a specific application.

Thumbnail 200

We need to ask ourselves where. Where do we protect it? We saw multiple components. Where do we put WAF? Do we put it at the very edge on CloudFront? Do we have separate WAFs on the API Gateway and on the ALB? Where should we put it? It really depends. It depends on who's going to manage it and how much time they have. If we want to spend time and really tune each and every policy for the different components of the application, it probably makes sense to have different Web Application Firewall policies on top of the ALB and the API Gateway.

But we don't have that much time to manage all of these applications separately. More importantly, we would want to block attacks as they come in at the very edge of the network. So we will probably enable it on CloudFront. Then we need to ask when. When do we need this application protected? If we have time, we can plan ahead and start things in staging, test it out, and tune it. Then when we're ready to launch into production, we can start turning on the different rules. But I got this call this morning, and the when is now. The application is already out there. I have to protect it.

So in this case, typically we have to make some compromises. We never have all the time in the world, so typically there is some urgency when we deploy WAF. We'll need to make some compromises, at least initially. We'll need to decide that some things will go on immediately in blocking mode, while other things will go on in count mode so we can tune them as traffic starts to flow through the application. But again, this remains to be seen. We need to turn it on and see what happens.

Another question is who will manage this web application firewall policy? Is it me, the security person? Is it the application team? If each application component has its own WAF policy, each of these people needs to manage their own WAF. This really changes how much complexity I can put on them. If they need to manage something they don't know really well, they would typically do simple things. On the other hand, if the application people are managing the web application firewall policy, they know when they introduce new changes to the application, and they know what those changes are, so they can tune the web application firewall policy according to those application changes.

So again, we need to balance things out. We need to understand and decide what goes where and who manages what. And finally, the what? A WAF policy is a complex thing. It can do many different things. Obviously, we can go and turn on all of the different toggles. Probably it will be a mess. Probably some things will get blocked that weren't meant to be blocked, and we'll have to roll back and do something else. So the what is, what are we putting in place day one and what are we planning to put in place day 100, and how do we plan to roll that out? So what will my policy contain? We don't have time for everything today, so we'll focus on these two questions on the where and on the what for this talk.

Thumbnail 400

Thumbnail 410

Deciding Where to Deploy WAF and Securing Origins with CloudFront

As for the where, one thing we can do, and that's typically the best option, is to put a policy on top of CloudFront. Why? Well, initially we want to block traffic at the very edge, especially if we're talking about volumetric attacks. We do not want them to reach all the way into our origins, into our VPC. There are multiple reasons for why we don't want to do that, but remember that with CloudFront we have 750 plus points of presence, all of them with hundreds of gigabits per second capacity. It's a very well-built mechanism designed to withstand these peaks in traffic, so blocking bad traffic at that point makes a lot of sense.

On the other hand, we might also want to consider building smaller, more granular web application firewall policies, putting them in place on top of the different application components, and maybe letting the application people handle them. Some designs we see have coarse-grained control at the very front on top of CloudFront, with finer-grained controls in smaller WAF policies running on top of the underlying services: ALB, API Gateway, Lambda, or whatever. At the very least, we do want to see WAF running on CloudFront. That would be the best place to start blocking incoming attacks.

Thumbnail 500

Once we have put CloudFront on top of the application, there is one consideration we have to think of, and that is how do we make sure that no one bypasses our WAF. For that, we have to utilize mechanisms known as origin cloaking or origin masquerading. Basically, we want to make sure that only traffic that originated from WAF can hit our origins, be it an S3 bucket or an API Gateway or an ALB.

When it comes to protecting the most common use case, where we have a VPC-based application and we need to make sure that only traffic coming from CloudFront with WAF can hit it, the best course of action would be to enable VPC origins. It's kind of a new feature we have had for a few months, so I don't know how many of you already use it and know it. But it's the best way to protect your origins. Basically, by enabling VPC origins, we turn the load balancers into internal resources. They no longer have public IP addresses. They're no longer accessible from the internet. The only thing that can contact these load balancers is the specific CloudFront distribution that we assigned to them. It's not even all of CloudFront; it's a very specific CloudFront distribution, and that's called VPC origin. That's the best case scenario.

In case we have an S3 bucket, the best way to protect that bucket without making it public is to enable Origin Access Control, or OAC, on it. Basically, that means that whenever CloudFront communicates to that bucket, it uses IAM policies. It assumes a role, really, to get access to that bucket just like any other policy that you would put on a bucket. The S3 bucket will now have the CloudFront role assigned and allowed to fetch the objects from the bucket.
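To make the OAC setup more concrete, here is a minimal sketch of the kind of bucket policy it relies on. In practice, with OAC CloudFront signs its requests to S3, and the bucket policy grants read access to the CloudFront service principal, restricted to one distribution. The bucket name, account ID, and distribution ID used below are hypothetical placeholders, not values from the talk.

```python
def oac_bucket_policy(bucket_name: str, account_id: str, distribution_id: str) -> dict:
    """Build an S3 bucket policy that lets a single CloudFront distribution read objects.

    All three arguments are placeholders supplied by the caller.
    """
    distribution_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                # The CloudFront service itself is the principal, not a public user.
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # Only requests made on behalf of this one distribution are allowed.
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }
```

Calling `oac_bucket_policy("example-spa-assets", "111122223333", "EDFDVBD6EXAMPLE")` returns a dictionary that can be serialized with `json.dumps` and attached to the bucket. The bucket itself stays private.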

This solution will allow you to cloak and mask and protect your resources, basically forcing traffic to go through CloudFront with no other way in. In case you don't have one of these components in the application, maybe you're running an on-premises application or your application is running on another cloud, which we'll discuss later. There are still other ways you can utilize to make sure that no one can contact your origin without going through CloudFront.

Basically, you would want to limit access at the IP level with managed and unmanaged prefix lists. There's a JSON file available on the web as well as a managed prefix list on every VPC that you can use in your security groups to make sure that only CloudFront incoming IP addresses are allowed in. However, that's not good enough because any CloudFront distribution can access the application.
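As a rough illustration of the unmanaged route, the sketch below filters the published AWS IP ranges document (`https://ip-ranges.amazonaws.com/ip-ranges.json`) for the CloudFront origin-facing ranges. It assumes the document has already been downloaded and parsed into a dictionary; the function name is made up for this example. Inside a VPC, the managed prefix list is usually simpler because AWS keeps it up to date.

```python
def cloudfront_origin_facing_cidrs(ip_ranges: dict) -> list[str]:
    """Return the IPv4 CIDRs that CloudFront uses when connecting to origins.

    `ip_ranges` is the parsed content of the AWS ip-ranges.json document.
    """
    return sorted(
        entry["ip_prefix"]
        for entry in ip_ranges.get("prefixes", [])
        if entry.get("service") == "CLOUDFRONT_ORIGIN_FACING"
    )


# Tiny made-up sample in the same shape as ip-ranges.json (documentation ranges only).
SAMPLE_IP_RANGES = {
    "prefixes": [
        {"ip_prefix": "198.51.100.0/24", "region": "GLOBAL", "service": "CLOUDFRONT_ORIGIN_FACING"},
        {"ip_prefix": "192.0.2.0/24", "region": "GLOBAL", "service": "CLOUDFRONT"},
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "EC2"},
    ]
}
```

The resulting list can feed a firewall allow list on an on-premises origin. As the talk notes next, this only proves the request came from CloudFront in general, not from your own distribution.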

For that reason, we typically want to add a secret header to your CloudFront distribution configuration. Preferably something that rotates every once in a while. That secret header is added into every single request going out of CloudFront towards the origin, and it needs to be enforced at the origin, maybe with an ALB rule or maybe with an NGINX rule. Someone needs to look at the incoming headers and make sure that only requests coming with that secret header are being allowed through.
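A minimal sketch of the origin-side check might look like the following. The header name `x-origin-verify` and the idea of accepting both the current and the previous secret during rotation are assumptions made for this example; the talk only says the header should be secret, rotated, and enforced at the origin.

```python
import hmac


def is_from_our_distribution(
    headers: dict[str, str],
    accepted_secrets: list[str],
    header_name: str = "x-origin-verify",  # hypothetical header name
) -> bool:
    """Return True only if the request carries one of the accepted secret values.

    `accepted_secrets` would typically hold the current secret plus the previous
    one, so requests keep working while a rotation is rolling out.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    presented = lowered.get(header_name.lower())
    if presented is None:
        return False
    # Constant-time comparison against every accepted value.
    matches = [
        hmac.compare_digest(presented.encode(), secret.encode())
        for secret in accepted_secrets
    ]
    return any(matches)
```

An application behind NGINX or an ALB would normally enforce this in the proxy or listener rule itself; the function above only shows the logic that such a rule implements.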

Thumbnail 700

Building Your First WAF Policy: From CloudFront Flat Rate Plans to Essential Protection Packs

That's just a side discussion to keep a stable and best practice application architecture. Let's move on to the what. How to start building an AWS WAF policy used to be one of my biggest conversation topics with customers. I've been dealing with customers about WAF for more years than I would like to admit, but recently it became super easy.

One easy way to enable a WAF policy on top of your CloudFront is to go into the CloudFront console. I don't know how many of you saw that this is kind of new. We've had it out for about two weeks. Just pick a flat rate pricing plan that includes WAF in it, and it will already include components of WAF for you.

Thumbnail 750

If we zoom in for a second, we'll see the different plans coming all the way from zero for a very simple configuration still including CloudFront, including DNS, including a simple WAF configuration, all the way to the different pricing plans. By choosing one of these plans, you basically get a pre-made policy which is typically good enough. Now these are big words, typically good enough. It's way better than nothing, and it's a very good start to allow you to start massaging your rules into the policy.

Thumbnail 790

It will allow you to start where it's safe and manageable and work your way up. We'll talk about manageability and security balance a bit later. Zooming into the business offering in the middle there just to see everything that it includes, I'm just hijacking the discussion for a second for marketing. It does include quite a bit. It includes DDoS protection, it includes storage in S3, it includes DNS requests. It's a really nice package, and you can have multiples of these in your different distributions.

Let's say you have fifty distributions. Each of them can have one of these packages attached. So in case you're dealing with an application that you know nothing about, or you yourself don't know enough to start out with a solid WAF configuration, starting from the CloudFront console would be a very good way to begin.

Thumbnail 830

Once you start configuring or enabling WAF on CloudFront, there are very simple questions you can answer by ticking a box. Do you want SQL protection? Do you want rate limiting rules? Do you want to run in count mode or monitor mode only? Very simple toggles. You don't even have to know WAF, or the application owners that you delegate to don't have to know really anything about WAF. It's good enough to start with.

Thumbnail 880

You definitely can, and probably should, build on top of this and add more controls as you go forward. So this is all in the CloudFront console, and it makes it really easy to begin building WAF on top of that. When you start from the WAF console, the experience is a bit different. This is a change we made not too many months ago, so I don't know how many of you are familiar with it.

Basically, when you create a new web ACL or protection pack as we call them now, you're being asked simple questions about the application. What kind of an application is it? Is it content delivery? Is it an API? Is it a mix of them? Is it retail? You just check a box and you start building your application configuration from there.

Thumbnail 920

You're basically answering questions about your application, which you or the application owner should know. From there, you can pick one of these pre-made protection packs that include a good enough security policy. It contains more than the CloudFront deployed policy does, on the assumption that if you are already on the WAF console, you probably know a bit more and can be let into more advanced features. So again, there are three options. There is a basic one with rate limiting and IP reputation services enabled, and bot control in count mode. You can move on to the essential pack, which offers more features. However, more is not necessarily always better. We obviously want to strive for a very robust, feature-rich policy that protects you against different kinds of threats happening on the Internet. But it may incur some costs on you, operational costs.

Typically, when we start too strong with WAF, and I've seen it happen multiple times with all WAFs, when you come on too strong at the first sign of trouble, like you're getting false positives and blocked requests, there's too much to handle and you just switch it off. The hurdle you have to jump through in order to re-enable it after getting burned is super high. So I would suggest starting low and building on top of what you already have. Once you establish trust with how the mechanism works, know where the logs are, and know how to tune things, you can start building from that foundation.

Thumbnail 1020

Once you configure the application policy through WAF, there are a few other things you can configure, such as rate limit thresholds. It could be anything that fits your application. We start here with 1,000 requests per IP per 5 minutes, but you can tune that to something as low as 10 requests or as high as 2 billion requests, if that's what your application needs. So we start out with a good normal number, but you can tune that. And this part here is super simple but super important to enable: logging. You just have to say you want logging by ticking a box. The same goes for the CloudFront console: it's super easy to enable WAF logging, and it sends logs to CloudWatch Logs.
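For readers who manage WAF as code, the rate limit described above could be expressed roughly as the rule below, written as a Python dictionary in the shape the WAFv2 API uses for rules. The rule name and metric name are hypothetical, the limit bounds come from the talk, and the list of allowed evaluation windows is quoted from memory of the WAFv2 documentation, so verify it before relying on it.

```python
ALLOWED_WINDOWS_SEC = (60, 120, 300, 600)  # assumption: windows accepted by WAFv2


def rate_limit_rule(limit: int = 1000, window_sec: int = 300, priority: int = 1) -> dict:
    """Build a rate-based rule that blocks an IP exceeding `limit` requests per window."""
    if not 10 <= limit <= 2_000_000_000:
        raise ValueError("limit must be between 10 and 2,000,000,000 requests")
    if window_sec not in ALLOWED_WINDOWS_SEC:
        raise ValueError(f"window_sec must be one of {ALLOWED_WINDOWS_SEC}")
    return {
        "Name": "rate-limit-per-ip",  # hypothetical rule name
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,
                "EvaluationWindowSec": window_sec,
                "AggregateKeyType": "IP",  # count requests per client IP
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "rate-limit-per-ip",
        },
    }
```

Starting with `"Action": {"Count": {}}` instead of `Block` is one way to observe what a threshold would catch before enforcing it.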

Going back to the flat rate pricing plan that we announced a few weeks ago, logs are included in that pricing plan, so it's no longer a concern. CloudWatch Logs, which is the easiest way to manage WAF logs, is included in the pricing plan, so you can just go ahead, check the box, and enable that. Even without the pricing plan, on the good old pay-as-you-go model, we've decreased the price of CloudWatch Logs with WAF. We now provide 500 megabytes of free logs for every million WAF requests that you send through, so it really helps reduce the cost in case you don't go with the pricing plan.

Thumbnail 1120

So summarizing, we know what we're going for with the essential pack, and we know what we're getting. We're getting some IP-based blocks, we're getting geolocation-based blocks, we're getting some very good core HTTP compliance protection rules. We're getting what we call known bad inputs or known vulnerabilities like Log4J, all with a simple click. We're even getting the new layer 7 DDoS rule in count mode, so it's already enabled. It's in the right place in the policy at the very top. And all that remains is just to monitor what it does and enable it when we're ready.

Thumbnail 1150

Thumbnail 1190

Coffee Break: Monitoring Traffic with Built-in Dashboards and CloudWatch Logs

Time for a well-earned coffee. I got there in the morning, got the news of the new application, and built with a few clicks the WAF policy that fits my needs. Now it's time to look at the logs. This is what makes me feel good—looking at dashboards all day. I really like that. Imagine that is me. That's the best I could do with AI. I'm a WAF guy, not an AI guy. So out of the box with AWS WAF, you have these built-in dashboards. Again, it used to be a pain to spin up dashboards in WAF, or I would say a relative pain, where you have to know what you're doing and what kind of queries you need to run to build the CloudWatch dashboard widget.

Starting last June, we launched this new UI with super useful dashboard widgets that really allow you to see where traffic is coming from, which rules are being hit, what kind of bot traffic you're seeing, what kind of DDoS traffic you're seeing, and dive into different insights like top IP addresses, top URIs, top user agents, and top fingerprints. All of this useful stuff is enabled by default. The most important thing is that you can start navigating the logs directly from the console, so there's no need to run tiring CloudWatch log queries. It's all built in.

Thumbnail 1240

You can click an IP address on the top IP list and filter by that IP to start seeing what these IPs are doing in your application. Each of these lines is basically clickable. You can see the full request details, the IP address, the reputation, the query string parameters, and the labels that we accumulated for that specific request. Everything is actionable, so you can take anything out of it, put it back in the filter, and ask for more requests with this URL or more requests with this fingerprint. You can still use the good old CloudWatch Logs Insights queries if you need to build something more comprehensive or run more advanced queries.
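As an example of the "good old" approach, the sketch below builds a CloudWatch Logs Insights query that lists the clients and rules responsible for the most blocked requests. The field names (`action`, `httpRequest.clientIp`, `terminatingRuleId`) follow the WAF log format; the function name and default limit are assumptions made for this example.

```python
def top_blocked_clients_query(limit: int = 20) -> str:
    """Return a Logs Insights query for the top blocked client IPs per terminating rule."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return "\n".join(
        [
            "fields @timestamp, httpRequest.clientIp, httpRequest.uri, terminatingRuleId",
            '| filter action = "BLOCK"',
            "| stats count(*) as blockedRequests by httpRequest.clientIp, terminatingRuleId",
            "| sort blockedRequests desc",
            f"| limit {limit}",
        ]
    )
```

The returned string can be pasted into the Logs Insights console against the WAF log group, or passed as the query string when starting a query through an SDK.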

Obviously, this is all true if you're using CloudWatch logs, but it makes sense because it's cheaper and included in the pricing plan. If you're using other destinations for logging, you can use the same kind of schemes to build your own dashboards and run Athena queries on top of S3, just like you used to. We didn't change that.

Thumbnail 1330

Lunch Interrupted: Handling False Positives and Tuning with Labels

Looking at the logs, I feel relaxed. I had my coffee. It's time for the lunch break, so I'm just sitting down to have my lunch when I start to hear yells from upstairs in marketing. They say something about 403 customers complaining, probably something they misheard. So I start looking at the dashboard and I see these 403s. I see multiple blocks coming from different IP addresses, all with the cross-site scripting in body rule being hit. I have a sneaking suspicion it's going to be a false positive that I have to mitigate.

So I start diving into the logs more. I see different IP addresses, and I look at these IP addresses and see that they're basically benign. They're doing good things. They're logging in, they're purchasing bots, they're posting on the forum. But they're also triggering this one rule on a very specific web page where we are detecting a cross-site scripting attempt in the body. I'm assuming this is a false positive that we have to deal with.

Thumbnail 1390

So the first rule in dealing with false positives is to stay calm. Don't turn off logging, don't roll back the change. Try and understand exactly what pattern triggers the false positive. Once you do that, it could be a specific URI, it could be a specific parameter, it could be anything really in the request. By looking and navigating through the dashboard, you can really pinpoint it and start building a rule around that.

Thumbnail 1420

So now I'm calm again. First, we go and switch off the rule that triggered the false positive. In our case, it was a cross-site scripting attempt in the request body, so we go and turn that off or switch it to count. Right, so now no more users are getting blocked, but then again we're not enforcing cross-site scripting in body anymore, which is not a good thing.

Thumbnail 1440

So we are building a different rule right after the rule that triggered the false positive. That rule is built in a way that says still block cross-site scripting attacks based on this label, and we'll talk about the label in a second. Still mitigate cross-site scripting attacks, but don't do it if the URI is slash missing robots form. Because that was the one that I saw in my logs. So by that, I basically turned cross-site scripting in body back on into being enforced, but I exempted the specific URI from that rule. All other attacks on that URI are still valid, so this is a very good fine-grained way to mitigate false positives.
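The exemption rule described above could be sketched as follows, again as a Python dictionary in the WAFv2 rule shape. It assumes the managed cross-site scripting body rule has been overridden to count, so it still emits its label without blocking, and that this custom rule sits after the managed rule group. The exact URI is a placeholder passed in by the caller, since the transcript only gives it in spoken form; the rule and metric names are hypothetical.

```python
# Label emitted by the core rule set's cross-site scripting body rule.
XSS_BODY_LABEL = "awswaf:managed:aws:core-rule-set:CrossSiteScripting_Body"


def block_xss_body_except(exempt_uri_path: str, priority: int) -> dict:
    """Block requests labeled as XSS-in-body unless they target the exempt URI path."""
    return {
        "Name": "block-xss-body-except-exempt-uri",  # hypothetical rule name
        "Priority": priority,
        "Statement": {
            "AndStatement": {
                "Statements": [
                    # Condition 1: the managed rule (now in count mode) matched.
                    {"LabelMatchStatement": {"Scope": "LABEL", "Key": XSS_BODY_LABEL}},
                    # Condition 2: the request is NOT for the page with the false positive.
                    {
                        "NotStatement": {
                            "Statement": {
                                "ByteMatchStatement": {
                                    "SearchString": exempt_uri_path,
                                    "FieldToMatch": {"UriPath": {}},
                                    "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                                    "PositionalConstraint": "EXACTLY",
                                }
                            }
                        }
                    },
                ]
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "block-xss-body-except-exempt-uri",
        },
    }
```

Because only the label match is exempted for that one path, every other rule in the policy still inspects requests to the same page.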

Thumbnail 1490

So once we do that, we start tuning. Tuning helps us avoid chaos. So tune your policy often, tune it early, don't wait for something to happen. So review the logs, make sure that you're doing the right things, enough traffic is being blocked.

Thumbnail 1510

Thumbnail 1530

And enough traffic is being let through. When you do make changes, make changes gradually. Don't add five different rules on the same day. Too much is not necessarily better with WAF. Add rules gradually and monitor their effect. Manage your traffic with labels. There's a little starter with labels. I don't know how many of you know labels, but labels are my and your best friend when it comes to WAF.

Thumbnail 1540

Labels are metadata that we add into every single request that goes through WAF. We can add them with managed rules. By default, all managed rules add different labels. For example, these are the labels being emitted by the core rule set—a part of them, there are more. We can add labels on our custom rules as part of the action. We can use them as a match or scope down condition on rules. Let's say I want to not inspect certain traffic from a certain IP address. I can create a custom rule that adds a label to all traffic from that IP address and then use that label to scope down another rule to not inspect that specific traffic.
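The IP address example above can be sketched as two rules: one that only counts and attaches a label, and a managed rule group scoped down to skip anything carrying that label. The label name, rule names, and IP set ARN are hypothetical; the core rule set is used here simply as a stand-in for "another rule".

```python
TRUSTED_LABEL = "trusted:partner-ip"  # hypothetical label name


def _visibility(metric_name: str) -> dict:
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }


def label_trusted_ips_rule(ip_set_arn: str, priority: int) -> dict:
    """Count (do not block) traffic from the IP set and attach a label to it."""
    return {
        "Name": "label-trusted-ips",
        "Priority": priority,
        "Statement": {"IPSetReferenceStatement": {"ARN": ip_set_arn}},
        "Action": {"Count": {}},
        "RuleLabels": [{"Name": TRUSTED_LABEL}],
        "VisibilityConfig": _visibility("label-trusted-ips"),
    }


def core_rule_set_skipping_trusted(priority: int) -> dict:
    """Run the core rule set only on requests that do NOT carry the trusted label."""
    return {
        "Name": "core-rule-set-scoped",
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": "AWSManagedRulesCommonRuleSet",
                "ScopeDownStatement": {
                    "NotStatement": {
                        "Statement": {
                            "LabelMatchStatement": {"Scope": "LABEL", "Key": TRUSTED_LABEL}
                        }
                    }
                },
            }
        },
        "OverrideAction": {"None": {}},
        "VisibilityConfig": _visibility("core-rule-set-scoped"),
    }
```

The labeling rule needs the lower priority number so it is evaluated first; otherwise the label is not there yet when the scope-down statement looks for it.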

Labels will help you investigate because they are visible in your logs. They are visible in CloudWatch metrics, and they're also usable if you want to filter out certain rules or certain traffic patterns from your logs because they're too noisy or cost too much or whatever. Labels are a super useful tool. They're like the multi-tool of WAF and can really be used to do anything. Add them always, then think about what to do with them. Labels are your best friends.

Thumbnail 1620

Thumbnail 1640

Afternoon Coffee: Optimizing WAF Costs and Managing Log Retention

Lastly, and I know I'm not running late, don't worry. I'm from Israel, so typically at 2 p.m. we're having a coffee. I don't know how it is in the US, but at 2 p.m. we break for coffee. Typically it's a strong black coffee. I would use that time to optimize my costs. So I tuned my policy. I know that I'm blocking the bad stuff. I know that I'm letting through the good stuff. It's already deployed, but now I'm making sure that I'm not making any mistakes when it comes to cost.

In order to control our costs, we first need to make sure that the right things are getting blocked at the right places with WAF. Make sure that the Layer 7 DDoS rule is at the very top because whenever it blocks traffic, it does not incur any costs. Blocked requests are free in Layer 7 DDoS even without a pricing plan. Make sure that this rule goes first and that other paid rules such as bot control are scoped down and only get traffic that we think they should inspect. Also look at logs and make sure that logs are not being kept for too long. Set up a specific log retention period, whatever your business needs, be it 14 days or 7 years, but make sure that you stick to that and don't keep logs forever.
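A small helper can make the retention decision explicit. The sketch below rounds a desired retention period up to the nearest value CloudWatch Logs accepts and returns the arguments for a retention policy call, for example boto3's `put_retention_policy`. The list of allowed values is quoted from memory of the CloudWatch Logs documentation and should be verified; the function name is made up for this example.

```python
# Assumption: retention periods (in days) accepted by CloudWatch Logs.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def retention_request(log_group_name: str, desired_days: int) -> dict:
    """Return keyword arguments for setting retention on a WAF log group.

    Rounds `desired_days` up to the nearest allowed value so logs are never
    dropped earlier than the business requirement.
    """
    # WAF only delivers to log groups whose names start with this prefix.
    if not log_group_name.startswith("aws-waf-logs-"):
        raise ValueError("WAF log group names must start with 'aws-waf-logs-'")
    if desired_days < 1:
        raise ValueError("desired_days must be at least 1")
    for allowed in ALLOWED_RETENTION_DAYS:
        if allowed >= desired_days:
            return {"logGroupName": log_group_name, "retentionInDays": allowed}
    raise ValueError("desired_days exceeds the longest supported retention period")
```

With boto3 installed and credentials configured, the result could be applied with `boto3.client("logs").put_retention_policy(**retention_request("aws-waf-logs-example", 14))`, where the log group name is a placeholder.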

Thumbnail 1720

Use labels again to filter out noisy logs and switch to flat rate plans. Flat rate plans are really predictable with no overages. You know what you're getting every month. When it comes to logs, I will rush through with it because CloudWatch logs are the easiest to configure and typically the cheapest, especially if you have low rates of traffic. When it comes to higher rates, you might want to consider other ways to send out WAF logs. You might want to consider Kinesis Data Firehose, which is outside of the pricing plan and probably the cheapest way for very high volumes of traffic. It can also be used to send traffic to different locations like S3 or an external third party vendor like a SaaS offering.
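Filtering noisy logs by label can be sketched as a logging filter in the shape the WAFv2 logging configuration uses: keep everything by default, and drop records that carry any label from a given list. The function name and the example label are hypothetical.

```python
def drop_noisy_labels_filter(noisy_labels: list[str]) -> dict:
    """Build a logging filter that keeps all records except those with a noisy label."""
    if not noisy_labels:
        raise ValueError("provide at least one fully qualified label name")
    return {
        "DefaultBehavior": "KEEP",
        "Filters": [
            {
                "Behavior": "DROP",
                # A record is dropped if it matches any one of the conditions.
                "Requirement": "MEETS_ANY",
                "Conditions": [
                    {"LabelNameCondition": {"LabelName": label}}
                    for label in noisy_labels
                ],
            }
        ],
    }


# Hypothetical label attached by a custom rule to health check traffic.
EXAMPLE_NOISY_LABEL = "awswaf:111122223333:webacl:example-acl:noisy:health-check"
```

The returned dictionary would go under the `LoggingFilter` key of the logging configuration. Dropped records still count in CloudWatch metrics; they just stop taking up log volume.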

Thumbnail 1780

Lastly, you can send WAF logs directly to an S3 bucket. Behind the scenes it does use CloudWatch logs, so at high volumes you might incur some vending costs that you don't expect. But it's the easiest to configure. Preferably, use CloudWatch logs. If not that, use Kinesis Firehose. If you need S3, use Kinesis Firehose to send logs towards S3. Finally, if you're using pricing plans, it's really easy to see if you're about to breach the contract that you signed. You can see, based on what you already used in the month, how soon you will breach the quota that you got. For example, out of 500 million requests, how many you already sent. The same goes for data transfer. But it's fine if you're about to breach as long as it's not a consistent breach, like every month you're breaching.
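To illustrate the kind of tracking described above, here is a back-of-the-envelope projection. It assumes traffic stays at the same daily average for the rest of the month, which is a simplification introduced for this example and not necessarily how the console dashboard calculates it. The 500 million request quota is the figure used in the talk.

```python
def projected_month_end_usage(used_so_far: int, day_of_month: int, days_in_month: int) -> float:
    """Linearly extrapolate month-end usage from usage so far."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be between 1 and days_in_month")
    return used_so_far / day_of_month * days_in_month


def will_exceed_quota(used_so_far: int, day_of_month: int, days_in_month: int, quota: int) -> bool:
    """Return True if the linear projection ends the month above the quota."""
    return projected_month_end_usage(used_so_far, day_of_month, days_in_month) > quota
```

For example, 300 million requests by day 15 of a 30-day month projects to 600 million, which is above a 500 million quota. A single month like that is fine according to the talk; a pattern of them suggests moving to a bigger plan.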

If you breach your quota in any given month, there will be no overages. If this is happening more often than not, and you're breaching your quotas, eventually you will experience some throttling. However, we have a very clear dashboard in the UI allowing you to keep track of where you're going and whether that plan is the right plan for you. This is the end of my coffee break, and I'll hand it over to Dan.

Thumbnail 1840

Thumbnail 1850

HSBC's Journey: Digital Enablers in a Global Banking Environment

Hello everyone. Hopefully you're having a great re:Invent so far. I'm Dan Avidan from HSBC Retail Bank, International Wealth and Premier Banking (IWPB), and I'm going to walk you through our journey. First, in terms of who we are and where we sit within the organization, we're digital enablers. We provide things like common frameworks and platforms pertaining to mobile authentication and authorization, business process orchestration, and of course what we're going to talk about today, which is our edge routing and protection platform. The services that we provide specifically allow the various value streams to concentrate on their specific business verticals without having to worry about things such as DDoS protection.

Thumbnail 1880

HSBC is one of the biggest banks in the world, and we operate in a large number of markets. The vast majority of them can adopt AWS, but there are others where we cannot. For example, highly regulated markets that have data residency requirements like China, where we cannot use AWS, or markets like Egypt where we cannot use any cloud providers. We need to make sure that we operate across multiple cloud providers. We operate in a highly regulated environment where every downtime has to be reported to regulators, so protecting the brand and protecting against DDoS attacks is of the utmost importance.

There are some interesting anecdotes. For example, in markets where we own white label brands such as Marks and Spencer Money in the UK, or in Asia Pacific where we own competing brands, we need to keep separation between them to meet anti-competition laws. For instance, Hang Seng Bank versus Hong Kong retail bank. We own not only banking but insurance products as well. All in all, this ultimately creates a fairly diverse traffic pattern to our sites, which we obviously need to account for in our various systems.

Thumbnail 2010

We have a regulatory obligation to allow our customers to transact. If people can't access their money, it obviously erodes trust in the financial industry, which regulators feel very strongly about. Peak times tend to coincide with geopolitical tensions and natural disasters, which makes it challenging as well. In terms of our team specifically, we operate in a large number of markets with multiple non-production and production AWS accounts. We support many value streams, and each one of those value streams has multiple applications. If you do the math, it ultimately ends up being a huge attack surface, which means the protections really have to be top notch.

Thumbnail 2070

We get hundreds of terabytes worth of traffic each month, billions upon billions of requests. We have hundreds of CloudFront distributions, multiple web ACLs, and again we're supporting multiple cloud providers. Unfortunately, the reality is that we're literally under daily attacks. It's not something that happens once in a blue moon. It's literally around the clock that there's some sort of attack going on. I'm not going to dwell on this slide too much because these banking challenges are not specific to us, but it is interesting to know that we're seeing regulators more and more interested in this. Not so much in the US, but certainly more in Europe, from the European Central Bank, and in Asia Pacific. This is now a mandatory regulatory requirement, so it's quite interesting to see regulators getting involved in mandating the use of these protections.

Thumbnail 2100

In terms of when the journey started, interestingly enough it was also at re:Invent, back in 2018, which coincided with the need to create a new cloud platform.

I was personally really inspired by all things serverless, so it was a personal crusade to try and do it completely serverlessly, which we did. Of course, the migration to cloud has taken longer than we anticipated. We're still operating in a hybrid cloud environment, which requires special considerations. The intention was to specifically shift security left to try and abstract third-party service providers and our origins from sensitive security tokens, making sure that they don't have to worry about edge protection. This is something that we're providing, and we're doing it all through compute at the edge completely serverlessly.

Thumbnail 2150

HSBC's Serverless Architecture and the Evolving Threat Landscape

So this is the architecture, and nothing too exotic. Shield provides the layer 4 mitigation, which passes requests to CloudFront. We inspect the request, and if not blocked, it passes over to Lambda. You can see there's no EC2 reverse proxy. It's all done serverlessly. This predates CloudFront Functions. Ideally, now some of these we would migrate to CloudFront Functions. The other thing to take away is that we're not protecting just AWS origins. We created a generic abstracted platform that provides protection to a wide variety of origins, specifically on-premises and even other cloud providers, for example, some GCP-based origins.

Thumbnail 2200

So some challenges. One is that it's an ever-evolving threat landscape. Back when we started, DDoS was really not so much of a problem. If it did occur, then AWS Shield would mitigate the layer 4 type attacks, and you get to claim credit without really having to do much for it. AWS Shield would kick in, mitigate the attack, and you're done. Then there are layer 7 attacks, malicious requests to your domain name followed by /etc/passwd, hoping to get lucky. Then there are high-volume attacks, which we in the industry responded to with rate-based rules. Then of course there are low and slow type attacks. And of course, the concern these days is agentic traffic. For example, if I have a genuine agent that I really want to transfer money on my behalf, how do I distinguish my legitimate agent from a malicious agent?

The reality is that unfortunately, executives don't really understand this type of thing. Even if you're not a tier one bank, your management has probably told you things like, well, make sure that you're getting paged every time we're under attack. No, the team is never going to get any sleep. We're constantly under attack. You want to get paged, for example, when the origins are suffering, when the origins are under duress.

There's going to be a misconception, for example, that you invest in WAF once and it's done. Whatever money I invested this year, I don't have to spend it again next year. No, the reality is that you have to continuously invest. You continuously have to make sure that you enhance WAF to protect against new threats. Specifically, given the serverless nature of our platform, when there were attacks that did breach our perimeters, we got hit, for example, for doing it serverlessly, saying, well, if you were to use EC2, you wouldn't have those issues. Why are you using Lambda?

The key takeaway is that no amount of compute in the world is going to mitigate a really high-volume DDoS attack. It's not about the compute type you use. The key is to have really strong protection in place to thwart the attacks. The other reality is that engineers or application teams don't really understand this. It's not something that unfortunately they deal with on a daily basis. They probably understand API Gateway really well, Kubernetes, EC2. WAF, not really. Plus, they're obviously under pressure to provide new business capabilities. WAF is probably not one of them.

Bridging the Gap: Challenges in WAF Implementation and Tuning at Scale

Cyber, on the other hand, understands security really well but doesn't really understand the applications and how to protect them. This is where our team comes in, to bridge the gap between cyber and the application teams. From an application perspective, if you're using AWS Firewall Manager organizational policies like you should, say with OWASP Top 10 type rules, then you have to think about how to integrate application-specific or AWS account-specific rules into those policies. Do I do it at build time? Do I do it at run time? There's some consideration there.

AWS WAF, unlike, say, API Gateway, unfortunately doesn't have deployment stages, so you really have to think about how you are going to roll out your new rules. Maybe you've done a really good job consolidating onto a single CloudFront distribution and a single web ACL.

But if that one web ACL now does something it shouldn't, all of a sudden the entire organization goes down. So you have to think, am I going to do it through blue-green web ACLs that I'm toggling between, or maybe I just artificially create more web ACLs than I need, just to minimize the blast radius for my organization.

It's counterintuitive, but detecting actual attacks can actually be complicated. Maybe the business had a really successful campaign that they didn't notify you about, and you're getting a lot of requests that seem really distributed. Well, is it genuine user traffic, or is it an attack? So you have to think, am I really going to know when I'm being attacked or is it genuine traffic.

If you're using the AWS managed rules like you should, you have to think about how to tune them. For example, if you have an AWS WAF rule that restricts requests to a certain size, what happens if all of a sudden one of your applications needs to support a higher threshold than that, say for document uploads? You really want to be able to tune for that specific journey, for that specific application, to minimize the impact. And this tuning can take time.
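One common way to tune for a single journey is to switch the noisy managed rule to count and then re-apply the block through its label, exempting only the affected path. The following is a minimal sketch, not HSBC's actual configuration: it builds the two rule definitions as plain dictionaries in the WAFv2 JSON shape. The `/documents/upload` path, rule names, and priorities are hypothetical, and the label string is an assumption that should be checked against the managed rule group's documentation.

```python
import json

# Label the Core Rule Set is assumed to add when its body-size rule matches.
SIZE_LABEL = "awswaf:managed:aws:core-rule-set:SizeRestrictions_Body"


def visibility(metric_name):
    """Standard visibility block that every WAFv2 rule needs."""
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }


def tuned_size_rules(exempt_path_prefix):
    """Return two rules: the managed group with the size rule in count mode,
    then a custom rule that blocks on the label everywhere except one path."""
    managed = {
        "Name": "core-rule-set",
        "Priority": 10,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": "AWSManagedRulesCommonRuleSet",
                "RuleActionOverrides": [
                    {"Name": "SizeRestrictions_BODY", "ActionToUse": {"Count": {}}}
                ],
            }
        },
        "OverrideAction": {"None": {}},
        "VisibilityConfig": visibility("core-rule-set"),
    }
    not_upload_path = {
        "NotStatement": {
            "Statement": {
                "ByteMatchStatement": {
                    "SearchString": exempt_path_prefix,
                    "FieldToMatch": {"UriPath": {}},
                    "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                    "PositionalConstraint": "STARTS_WITH",
                }
            }
        }
    }
    block_elsewhere = {
        "Name": "block-large-body-except-uploads",
        "Priority": 11,  # must be evaluated after the managed group adds the label
        "Statement": {
            "AndStatement": {
                "Statements": [
                    {"LabelMatchStatement": {"Scope": "LABEL", "Key": SIZE_LABEL}},
                    not_upload_path,
                ]
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": visibility("block-large-body-except-uploads"),
    }
    return [managed, block_elsewhere]


print(json.dumps(tuned_size_rules("/documents/upload"), indent=2))
```

The ordering matters: the label only exists after the managed rule group has run, so the custom rule needs the higher priority number.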

Thumbnail 2500

Here is a specific example of tuning an Amazon managed rule. It took us quite a bit of time, including support from AWS. I'm not going to go through it line by line, but you can see the type of changes that we needed to make, ranging from application changes, to the way in which we acquire the AWS WAF tokens, to some changes even on the AWS side by the service team. Shout out to the service team. Thank you. Quite amazing agility, where AWS rolled out some enhancements across the entire estate within a sprint. So yes, an agility that we can definitely aspire to. And eventually it was time well spent, because we were able to achieve the business-accepted level of false positives.

Don't be scared away from it. It was ultimately very valuable because it allowed us to create a cookie-cutter report. After the initial tuning periods in the first two markets, we were able to roll it out successfully to the other 35 markets.

Thumbnail 2560

Lessons Learned from HSBC: Automation, Continuous Investment, and Proactive Protection

Some challenges on the WAF side. You really have to think, like Tzoori mentioned, about how you protect your on-premises origins. For AWS origins, VPC origins used to come with a restriction that is no longer there, so we're excited to roll them out more widely: VPC origins no longer need to be within the same AWS account as CloudFront.

If you're adding CloudFront distributions on a regular basis, you need to remember that they don't automatically enroll in Shield Advanced automatic mitigation. So if you're adding a CloudFront distribution, you need to make sure that you enroll it so that it's protected. And actually, I only recently realized that while WAF can add custom labels, and while those labels don't trigger CloudFront Functions or Lambda@Edge, WAF can in fact inject headers as well, which you can then do quite interesting things with inside CloudFront Functions or Lambda@Edge, so quite useful.
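As an illustration of the header injection idea, here is a minimal sketch of a count-mode rule that asks WAF to insert a custom request header whenever a given label is present. The rule name, header name, and header value are hypothetical; the dictionary follows the WAFv2 rule JSON shape, and the `x-amzn-waf-` prefix reflects how AWS WAF names the headers it inserts.

```python
import json


def header_tagging_rule(label_key, header_name, header_value, priority):
    """Count rule: when a request carries `label_key`, let evaluation continue
    but insert a custom request header for downstream edge code to read."""
    return {
        "Name": f"tag-{header_name}",
        "Priority": priority,
        "Statement": {"LabelMatchStatement": {"Scope": "LABEL", "Key": label_key}},
        "Action": {
            "Count": {
                "CustomRequestHandling": {
                    "InsertHeaders": [{"Name": header_name, "Value": header_value}]
                }
            }
        },
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": f"tag-{header_name}",
        },
    }


def forwarded_header_name(header_name):
    """Name the origin or edge function should look for: AWS WAF prefixes
    the header names it inserts with "x-amzn-waf-"."""
    return "x-amzn-waf-" + header_name.lower()


rule = header_tagging_rule(
    label_key="awswaf:managed:aws:bot-control:bot:verified",
    header_name="verified-bot",  # hypothetical header name
    header_value="true",
    priority=50,
)
print(json.dumps(rule, indent=2))
print(forwarded_header_name("verified-bot"))
```

An edge function could then branch on that header, for example to route verified bots to a cached variant of the page.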

Thumbnail 2630

We learned the hard way that, at the time, our logging solution was not scaling to DDoS attacks, so that's something you want to be thinking about. I'm sure you have a working logging solution, but you want to make sure that it can scale up during a DDoS attack. It's quite important, or else how are you going to analyze the attack and be able to add, for example, IP-based rules to mitigate it? Yes, CloudWatch Logs Insights is there, but for our traffic we noticed that queries can run for quite a long time. It's not the cheapest service, so we, for example, are exploring using Amazon OpenSearch Service instead.

Thumbnail 2670

Some final lessons learned. I appreciate it's like saying sell stocks high and buy them low, but what I mean is: protect against tomorrow's attack, not yesterday's. Have you added any new infrastructure? Have you added any new applications? Do applications have new capabilities that you really want to add new WAF rules for? WAF does come with an API. If you're not aware of the WAF automation framework, you can mine the logs for interesting events. If any IPs are generating 404s or an unusually large number of 500 errors, why not automatically block them, say, for a 24-hour period?
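The log-mining idea can be sketched in a few lines. This is an illustrative sketch rather than the automation HSBC runs: it assumes you have already parsed access logs that include response status (for example CloudFront or load balancer logs) into `(client_ip, status)` pairs, and the threshold is a hypothetical value you would tune.

```python
import ipaddress
from collections import Counter
from datetime import datetime, timedelta, timezone


def ips_to_block(records, threshold, now, block_for=timedelta(hours=24)):
    """Return {cidr: expiry} for every IP whose count of 404 or 5xx
    responses reached `threshold`. `records` yields (client_ip, status)."""
    errors = Counter(
        ip for ip, status in records if status == 404 or 500 <= status <= 599
    )
    expiry = now + block_for
    return {
        # ip_network turns a bare address into a /32 (IPv4) or /128 (IPv6) CIDR
        str(ipaddress.ip_network(ip)): expiry
        for ip, count in errors.items()
        if count >= threshold
    }


def active_cidrs(blocklist, now):
    """CIDRs whose block has not expired yet, ready to write to an IP set."""
    return sorted(cidr for cidr, expiry in blocklist.items() if expiry > now)


now = datetime(2025, 12, 1, tzinfo=timezone.utc)
sample = [("198.51.100.7", 404)] * 20 + [("203.0.113.9", 200)] * 500
blocklist = ips_to_block(sample, threshold=10, now=now)
print(active_cidrs(blocklist, now))
```

The resulting list could then be written to a WAF IP set that a block rule references, for example through the WAFv2 `update_ip_set` API, with a scheduled job re-running `active_cidrs` so expired entries drop off.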

Know who your TAMs and solutions architects are. We derive tremendous value from them. They can enroll you in AWS WAF betas and various previews. Continue to invest in your SOC and your WAF team. It's not something where you set WAF and forget it. It requires continuous investment. Of course, the AWS SRT team is always there, but you really want to be able to independently analyze and mine your logs yourself. You want to be able to add, for example, JA4 fingerprint blocking rules yourself. Automation is absolutely key. You don't want to be there at 2 a.m. figuring out how to revert a rule.
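For reference, a fingerprint blocking rule of the kind mentioned here can be expressed as a byte match on the JA4 fingerprint field. This is a minimal sketch in the WAFv2 rule JSON shape; the fingerprint value, rule name, and priority are placeholders, and in practice the fingerprint would come from your own log analysis.

```python
import json


def ja4_block_rule(fingerprint, priority):
    """Block requests whose TLS client JA4 fingerprint matches exactly.
    Requests for which no fingerprint could be computed do not match."""
    name = "block-ja4-" + fingerprint[:10]
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "ByteMatchStatement": {
                "SearchString": fingerprint,
                "FieldToMatch": {"JA4Fingerprint": {"FallbackBehavior": "NO_MATCH"}},
                "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                "PositionalConstraint": "EXACTLY",
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


# Placeholder fingerprint for illustration only.
rule = ja4_block_rule("t13d1516h2_8daaf6152771_e5627efa2ab1", priority=5)
print(json.dumps(rule, indent=2))
```

Because a fingerprint can be shared by legitimate clients using the same TLS stack, it is usually worth running such a rule in count mode first and checking what it matches.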

Thumbnail 2750

In terms of what's coming up for us next, it's ultimately to capitalize on the investment that we've made in WAF so far, making sure that we consolidate our various endpoints and thousands upon thousands of domains onto our top-level domains so that they can benefit from the very protection that we spoke about.

Thanks, Dan. I know HSBC is a big bank, and a lot of the challenges they encountered are not your challenges. But learning from them, and seeing that it's possible even at their scale to use WAF as a layer to protect all of the applications they have, thousands and thousands of them, is really good practice.

Thumbnail 2790

Thumbnail 2820

Looking Ahead: Bot Control, Layer 7 DDoS Protection, and Best Practices for WAF Evolution

Now, as the day grows long, I tend to be more philosophical, and I want to look into the future and see what else is possible. As Dan said, we have to look ahead when it comes to AWS WAF. We can't be reactive. We have to be proactive and be faster than, or at least as fast as, the attackers that look to harm us. That means practicing new rules, adding new rules, configuring, tuning, and tightening the security. I will talk about three different things for the last few minutes: bot control, Layer 7 DDoS protection, and the general use of new rules, both custom rules and managed rules.

Thumbnail 2840

When it comes to bot control, many companies completely misunderstand what bots really mean. For bot control, you can use the managed rule that we have to first identify common bots, good and bad. Specifically, allowing in search engines or SEO engines is an important thing that can be achieved with common bot control, or maybe blocking noisy health checkers that come from the web just because they're self-identifying. That's an easy thing you can do with common bot control. Detecting and finding the more sophisticated bots, what we call targeted bots, the bots that were built specifically to harm your application, is where we typically enable targeted bot control, the next level of bot control, where we can track abnormal behaviors. It looks not only at a specific request but throughout the session lifetime, seeing which sessions are more volumetric or differ dramatically from normal user sessions, so we can start challenging them with a CAPTCHA or a JavaScript challenge, or just plainly block them if we see that they're acting too suspiciously.
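The difference between the two levels shows up as a single setting on the managed rule group. Below is a minimal sketch, in the WAFv2 rule JSON shape, of a helper that builds the Bot Control rule at either inspection level and starts it in count mode; the rule name and priority are hypothetical.

```python
import json


def bot_control_rule(inspection_level, priority, count_only=True):
    """Bot Control managed rule group at the COMMON or TARGETED level.
    With count_only=True the whole group is overridden to count, which is a
    cautious way to observe its effect before letting it block."""
    if inspection_level not in ("COMMON", "TARGETED"):
        raise ValueError("inspection_level must be COMMON or TARGETED")
    return {
        "Name": "bot-control",
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": "AWSManagedRulesBotControlRuleSet",
                "ManagedRuleGroupConfigs": [
                    {
                        "AWSManagedRulesBotControlRuleSet": {
                            "InspectionLevel": inspection_level
                        }
                    }
                ],
            }
        },
        "OverrideAction": {"Count": {}} if count_only else {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "bot-control",
        },
    }


print(json.dumps(bot_control_rule("COMMON", priority=30), indent=2))
```

Starting at `COMMON` in count mode and moving to `TARGETED` once the labels in the logs look right keeps the rollout aligned with the "manageable policy first" advice in this session.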

Now, in the age of AI and AI bots, a lot of your incoming traffic, and when I say a lot, sometimes more than 50 percent of it, might be generated by AI agents. Some of them are benign, some of them not so much, all of them very noisy, and you have to make sure that you are only allowing the good ones in. When utilizing bot control, make sure that you upgrade to the latest version, version 4. You can now use Web Bot Auth to make sure that only the allowed agents, the verified agents, are let into your application, and all other agents probably should somehow register and verify with this mechanism. It's not yet a standard, but this is the mechanism most web providers and agent registries use to make sure that only good bots are being allowed through.

Thumbnail 2970

When it comes to Layer 7 DDoS, this is a new rule, so if you're not familiar with it, get familiar with it. There will be a QR code at the end that you can scan to learn more about it. By enabling the essentials pack, we got this included in the business pricing plan, and we can start using it. It was created with the override checkbox enabled, and there is a specific dashboard for DDoS, so once we see in the dashboard that it causes no false positives (and it doesn't, it's a very safe rule to use), we can turn the override off.

You can clear that checkbox and see that the rule only acts during DDoS attacks. It is super silent, doing nothing during peacetime. Only when there is an actual event does it start targeting the offending actors by specific IP addresses, countries, AS numbers, user agents, or fingerprints. Only those will get a challenge, or eventually get blocked if they do not behave. Make sure that you use this super important, very powerful tool when it comes to evolving your WAF.

Thumbnail 3040

When adding new rules, make sure that you track your traffic and traffic changes. You might go viral and see more traffic, or you might go viral in a bad way where attackers look to harm you. Make sure that you keep track of all of the traffic, not only the traffic that you block, but also the traffic that you allow in. Are different rules being hit? Are different rules not being hit all of a sudden? Maybe you are missing something. Maybe some attacker evolved and now you are not finding them anymore. Make sure that you track traffic all the time.

On the flip side, strike the right balance between a secure policy and a manageable policy. The most secure computer is locked in a safe, disconnected from the world, and that is not usable. Strike the right balance for you. I do not know how much time you have to invest in WAF, but it is an investment in time. Add the rules that you get the most value out of, even if it is not the perfect policy. That is good enough. The best policy is the one that you can manage. Otherwise, you will just turn it off and do nothing with it, which is way worse.

Thumbnail 3140

Strike the right balance and tighten the right rules. Make sure that the rate-based rules that you have encountered or introduced into the policy not only track IP addresses but also track fingerprints, methods, and labels. Make full use of what WAF has to offer. This is an ongoing effort. You make it tighter and tighter and tighter as the attacker becomes more and more sophisticated.
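To illustrate aggregating on more than the source IP, here is a minimal sketch of a helper that builds a rate-based rule with custom aggregation keys in the WAFv2 rule JSON shape. The rule names, limits, and priorities are hypothetical values, not recommendations.

```python
import json

VALID_WINDOWS = (60, 120, 300, 600)  # evaluation windows WAF accepts, in seconds


def rate_based_rule(name, priority, limit, custom_keys, window_sec=300):
    """Rate-based block rule. A lone IP key uses the plain IP aggregation
    type; anything else is expressed through CUSTOM_KEYS."""
    if window_sec not in VALID_WINDOWS:
        raise ValueError(f"window_sec must be one of {VALID_WINDOWS}")
    statement = {"Limit": limit, "EvaluationWindowSec": window_sec}
    if custom_keys == [{"IP": {}}]:
        statement["AggregateKeyType"] = "IP"
    else:
        statement["AggregateKeyType"] = "CUSTOM_KEYS"
        statement["CustomKeys"] = custom_keys
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {"RateBasedStatement": statement},
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


# Count each (IP, HTTP method) pair separately, e.g. to be stricter on POST floods.
per_ip_and_method = rate_based_rule(
    "rate-ip-method", 20, limit=300, custom_keys=[{"IP": {}}, {"HTTPMethod": {}}]
)
# Aggregate on the TLS fingerprint so rotating IP addresses does not reset the count.
per_fingerprint = rate_based_rule(
    "rate-ja4",
    21,
    limit=1000,
    custom_keys=[{"JA4Fingerprint": {"FallbackBehavior": "NO_MATCH"}}],
)
print(json.dumps([per_ip_and_method, per_fingerprint], indent=2))
```

A label namespace can be used as a key in the same way, so that requests already labeled by an earlier rule are counted per label.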

Another good practice is to maintain a staging environment. Dan mentioned it a bit. Keep track of your changes and roll them out gradually. Make sure that you know what to expect from them before you launch them into production. Build your label logic in a way that allows you to test it out in your staging environments. If you have the luxury of it, great. In some scenarios, you can even add the same rules to the same policy in count mode, even managed rules. You can add multiples of them and see how your production traffic would have behaved were they in blocking mode and not in count mode.

Thumbnail 3200

If you do not have the capacity to run a full staging environment, at least do that. Add multiple versions of the same rule into the policy with different actions on them. Order them correctly and see what would happen if you switch a rule into block mode. Again, labels are your friends.
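Once a count-mode copy of a rule is in place, the WAF logs can tell you what it would have blocked. The sketch below assumes log records already parsed into dictionaries with the `action` and `nonTerminatingMatchingRules` fields that AWS WAF logs carry for custom rules; the rule ID `size-limit-v2-shadow` is hypothetical, and matches inside managed rule groups are reported in a different part of the log record that this sketch does not cover.

```python
from collections import Counter


def would_have_blocked(log_records, shadow_rule_ids):
    """Count allowed requests that matched a count-mode ("shadow") rule,
    i.e. requests that would be blocked if that rule were switched to block."""
    hits = Counter()
    for record in log_records:
        if record.get("action") != "ALLOW":
            continue  # already blocked or challenged by something else
        for match in record.get("nonTerminatingMatchingRules", []):
            if match.get("ruleId") in shadow_rule_ids and match.get("action") == "COUNT":
                hits[match["ruleId"]] += 1
    return hits


sample_logs = [
    {"action": "ALLOW",
     "nonTerminatingMatchingRules": [{"ruleId": "size-limit-v2-shadow", "action": "COUNT"}]},
    {"action": "BLOCK",
     "nonTerminatingMatchingRules": [{"ruleId": "size-limit-v2-shadow", "action": "COUNT"}]},
    {"action": "ALLOW", "nonTerminatingMatchingRules": []},
]
print(would_have_blocked(sample_logs, {"size-limit-v2-shadow"}))
```

Comparing these counts with a sample of the matching requests gives a rough false-positive estimate before the shadow rule is promoted.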

Thumbnail 3210

Thumbnail 3220

Thumbnail 3250

Lastly, remember that AWS WAF configurations are code. If you look at the WAF console, there is this JSON button up there. The entire WAF configuration is just a JSON document. Use it. Use this power. Use the API that WAF provides to automate and use infrastructure as code tooling. It allows for better versioning and better change management. It allows for automation when it comes to new deployments. If you have the resource to do it, and with AI writing most of your automations now, it is super simple and it will save you a lot of pain. Day one, everything is quick and dirty. Day one hundred, probably you should be doing better.
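Treating the configuration as code also means it can be checked before it is deployed. As a small illustration, the sketch below lints a web ACL JSON document for a few structural mistakes; the checks and messages are this example's own choices, not an official validator, and the sample document is made up.

```python
import json


def lint_web_acl(web_acl):
    """Return a list of human-readable problems found in a web ACL document."""
    problems = []
    if "DefaultAction" not in web_acl:
        problems.append("missing DefaultAction")
    seen_names, seen_priorities = set(), set()
    for rule in web_acl.get("Rules", []):
        name = rule.get("Name", "<unnamed>")
        if name in seen_names:
            problems.append(f"duplicate rule name: {name}")
        seen_names.add(name)
        priority = rule.get("Priority")
        if priority in seen_priorities:
            problems.append(f"duplicate priority {priority}: {name}")
        seen_priorities.add(priority)
        # Ordinary rules carry Action; rule group references carry OverrideAction.
        if ("Action" in rule) == ("OverrideAction" in rule):
            problems.append(f"{name}: needs exactly one of Action or OverrideAction")
    return problems


document = json.loads("""
{
  "Name": "example-acl",
  "DefaultAction": {"Allow": {}},
  "Rules": [
    {"Name": "block-bad-ips", "Priority": 1, "Action": {"Block": {}}},
    {"Name": "core-rule-set", "Priority": 1, "OverrideAction": {"None": {}}}
  ]
}
""")
for problem in lint_web_acl(document):
    print(problem)
```

A check like this fits naturally into a pull-request pipeline, alongside versioning the JSON itself, so a bad change is caught before it reaches the WAF API.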

Thumbnail 3260

The day is done. The application is good for now. Tomorrow is a new day, obviously, and on the drive home, I am reflecting on the lessons learned from this busy day. We talked about architecture and cloaking your origins. We talked about a good enough initial policy. Build it however you want, manually with the pricing plan packages, with the WAF configuration protection packs, or whatever, but do something on day one. Even a policy with a single rule in count mode and logging is way better than no policy. At least you get visibility.

Do the one thing you know you can manage and then grow from there. But make sure you do follow up, and do follow up quickly and repeatedly. The web is a bad place. I am not here to scare you. I am just here to show that WAF is an ongoing effort.

I've been doing WAF for too many years now. Utilize the new pricing plans. They're super predictable and super easy, and they contain a bunch of value for a fixed price. Make sure that your WAF configurations evolve as attackers evolve: introduce new rules, upgrade your rule versions if they're managed rules, keep on testing, keep on viewing the logs, and keep on looking for what changed in traffic and rule hits.

Thumbnail 3360

What rules are being hit and what rules are being missed are equally important. Automate and test your changes if you can, so that rolling them out is a repeatable, predictable, and easily rolled back process. WAF configurations are code. Treat them like code.

Now I promised three QR codes. Don't scan them; take a picture of them instead. One is a blog about the new flat rate pricing plans. They're new, so go and learn about them. We didn't cover them in much depth, but they're super useful. The two other QR codes deal with Layer 7 DDoS mitigation with our new managed rule. The one in the middle is the launch blog, and the one on the right-hand side is more of a deep dive into how to configure it and how to tune it if you need to.

With that, I really appreciate your time. I know it's a busy hour and a long day. Some of you are still jet lagged or worse. Thank you for your time. We would really appreciate the feedback if you can leave it in the app, and have a great rest of the week. Thank you all.


; This article is entirely auto-generated using Amazon Bedrock.
