Kazuya

AWS re:Invent 2025 - Managing Bots vs Humans with CloudFront and AWS WAF (NET324)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Managing Bots vs Humans with CloudFront and AWS WAF (NET324)

In this video, Eitav Arditti and Nick McCord demonstrate managing bot versus human traffic using Amazon CloudFront and AWS WAF through a pet adoption platform demo. They show how to implement traffic differentiation strategies: starting with basic WAF rate limiting and bot control, then evolving to non-terminating actions that add custom headers (X-is-bot) to route bots to cached content instead of blocking them outright. Using CloudFront Functions with origin modification, they implement intelligent responses including a cache-busting mechanism for 1% of bot traffic to gather insights via a honeypot Lambda function. They capture JA3 fingerprints, user agents, and IP data through async console logging to CloudWatch, enabling correlation between API requests and image rendering. The architecture leverages WAF for labeling, CloudFront Functions for traffic routing, and caching policies to serve different content to bots while maintaining legitimate user experience and reducing costs.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Introduction: Managing Bots vs Humans with Amazon CloudFront and AWS WAF

Hello, everybody. I'm so glad to see you today. Who is having a good time so far? Yes, exactly. This is the spirit. Today we're going to talk about how to manage bots versus humans using Amazon CloudFront and AWS WAF. My name is Eitav Arditti. I'm an Edge specialist at AWS, meaning I work on a daily basis with different Edge customers and see the different patterns out there. I was fortunate enough to be an Edge customer for the last 10 years, so I'm very happy to share my experience today with you and later discuss your own challenges.

Together with me on the stage is Nick. Hi everybody, I'm Nick McCord, Startup Solutions Architect. My day job is working with startups, and I chose to focus on working with services that allow you to go global in minutes. I thought the concept of that was interesting. I've helped dozens of customers of all sizes with their CDN, WAF, and DDoS problems, as well as published a few blogs and AWS solutions.

A bit about the operations here. This is considered a Code Talk, level 300 session. What does that mean? We're going to deep dive. Yes, we're going to have a quick, short ramp-up, but the entire session is going to cover quite complex concepts, and we'll try to pace them as we go. If you have any questions, we will have some time after this session.

We're going to present a lot of code and architecture diagrams. During the entire session, both Nick and myself will have a live demo running to showcase how the different changes within CloudFront and WAF are actually implemented and actually impact our platform that we built for you. And last, Q&A. This session is recorded, so we are not going to do a Q&A during the session, but both Nick and myself will be here for the next 15 minutes after the session, so please reach out later on and we'll be happy to discuss.

Understanding the Bot Landscape: Humans, Good Bots, Bad Bots, and AI Agents

After the operations, we can start to discuss our topic for today. So, humans. Who among us is human? I surely hope that all of us here are humans, but nowadays it's quite challenging to tell. When we try to build a product, we aim to build it for humans, the actual legitimate users who will use our product. But since the internet began, we have also seen a tremendous trend of good bots.

Good bots are usually used to operate the internet. They crawl, index, and populate our websites and are eventually good for our business. But wherever there is good, there is always bad. And this time we have bad bots. Bad bots can also try to harm your infrastructure. DDoS is the most common attack that we see out there. Bad bots can also try to replicate your IP, try to steal your content, and maybe publish it somewhere else. We'll discuss those two in the session.

And last, we must say AI in every single session, right? So, AI bots. We see an increased trend in the last two years of AI bots, and this is tricky because it's combining all the human, bad, and good bots. Some of those AI bots, those agents, are working for humans, but some of them are just trying to showcase that they are humans. So in today's session, we're going to discuss all those four topics.

And as promised, a quick edge ramp-up. Amazon CloudFront is the Amazon CDN service that's most often discussed when you're talking about the edge. Amazon CloudFront is built from more than 750 points of presence, which we like to call POPs, around the globe. Whether you want to serve cached content, which is the typical CDN use case, or serve your APIs closer to the actual customers, we see a lot of customers using Amazon CloudFront. In this specific session, we are going to talk about Amazon CloudFront, CloudFront Functions, and AWS Lambda@Edge, which are our edge compute offerings at AWS.

We're going to discuss AWS WAF as the protection layer on top of CloudFront. And last, we're going to talk about Amazon CloudWatch as a unified observability service to showcase the entire changes during the session.

AWS Edge Architecture: Request Journey Through CloudFront and WAF

So how do we make all of these services work together? I think a good place to start is going to be the journey, breaking down the journey of the request, which falls into four different stages. The first is the viewer request, which before the request even hits the CloudFront cache is going to be evaluated against your AWS WAF rules, which is the protection logic you have in place broadly across your site but more specifically for your edge cache.

Next, we have our origin request. This is often commonly referred to as a cache miss, and this will happen if the content that the user has requested is not in the cache or if the TTL, time to live, has expired and needs to be refreshed. So CloudFront makes the request on your behalf to your designated origin, which then responds with an origin response. This populates the CloudFront cache, which can then be used for the viewer request and for subsequent requests from viewers that come in.

This has the advantages of lower latency as well as a better cost profile. In addition to this, we have access logs using CloudWatch, Kinesis, Firehose, and S3. This is going to be capturing different aspects about the request once it's been sent back to the user. This could include the total processing time of this entire flow, different headers, query strings, and the various fields and aspects that are being captured by CloudFront for each request.

We also have real-time logs, which is going to be making use of the Kinesis data stream. This is actually going to give you those logs in seconds as opposed to the minutes that you would see with standard logs. This can be further filtered down so that you're using sampling or doing it on specific behaviors or caching policies so that you don't have the additional cost of all of your logs being ready immediately if you don't need it.

Moving more deeply into WAF, here we have a protection pack, formerly known as a web ACL, and this is going to be the scaffolding for our rules, our rule groups, and everything that we want to do with layer seven protection using WAF. We have various protected resources, all of them accessible over HTTP. These could be your API endpoints or your Cognito user pools; WAF is compatible with various services. The smallest unit within AWS WAF is going to be a rule, and this is going to house business logic and an action.

The business logic could be geo-based, IP rules, you could be looking at query strings. It's really going to be dependent on your workload, what you want to evaluate and what you want to protect about your workload. These rules can then be bundled up into a rule group, and what this does is allow you to have portability across different web ACLs or protection packs in order to make it a little bit easier in terms of managing the overhead of rules across multiple different WAF protected resources.

It's important to note that these are in priority. So if you hit a terminating rule on rule one and you either block or allow that request, then rule two in the rule group here is not going to be evaluated. So it's important to think about what you want that flow to be. What are the IPs or the regions that you want to block outright and you don't want to have any more advanced evaluation of, versus the ones that need to go through that entire workflow.
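To make the ordering concrete, here is a minimal sketch of how two prioritized rules might be expressed in code, shaped like the input to the WAFV2 CreateWebACL API (for example via the AWS SDK for JavaScript). The names, ARN, and metric settings are illustrative, not the demo's actual configuration.

```javascript
// Hedged sketch: two prioritized rules in a web ACL, WAFV2 API shape.
// Priority 0 is a terminating Block rule; anything it matches never reaches priority 1.
const rules = [
  {
    Name: "block-known-bad-ips",
    Priority: 0,
    Statement: {
      // Placeholder ARN for an IP set you maintain yourself
      IPSetReferenceStatement: { ARN: "arn:aws:wafv2:us-east-1:111122223333:global/ipset/blocked-ips/EXAMPLE-ID" },
    },
    Action: { Block: {} }, // terminating: evaluation stops here on a match
    VisibilityConfig: { SampledRequestsEnabled: true, CloudWatchMetricsEnabled: true, MetricName: "blockKnownBadIps" },
  },
  {
    Name: "aws-ip-reputation",
    Priority: 1,
    Statement: {
      // One of the AWS managed rule groups mentioned below
      ManagedRuleGroupStatement: { VendorName: "AWS", Name: "AWSManagedRulesAmazonIpReputationList" },
    },
    OverrideAction: { None: {} }, // rule-group statements use OverrideAction instead of Action
    VisibilityConfig: { SampledRequestsEnabled: true, CloudWatchMetricsEnabled: true, MetricName: "ipReputation" },
  },
];
```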

So how this ends up looking is for your users, they should be passing seamlessly through. It works every time, and they're never going to have an interruption or production impact. Threat actors are going to be constantly trying to navigate around this, which is why WAF is considered something that's not set and forget. You have to continually evaluate the rules that you're adding to your WAF. You have to look at the behaviors that are coming in and then make updates to these.

The Pet Adoption Platform: A Real-World Use Case Under Bot Attack

Amazon does help with that. We have our managed rule groups for common use cases such as IP reputation lists or the OWASP top ten, which is in our core rule set. But this is a constant journey where you're going to be evaluating and understanding what is actually hitting your workload. So a bit of background, how many animal people do we have in the audience? A few, okay, fellow animal lovers, this will play out, I promise. We built a platform out for animals, and this is largely driven by the 2024 ASPCA report where 5.8 million dogs and cats in rescues and shelters were reported for the year.

I know we say a lot of big numbers at re:Invent, so I'll put that in a little bit of context. If you were to adopt a dog every hour for 662 years, then you'd get to that number, or fill about 100 NFL stadiums. Additionally, the data shows the length of time in shelters and rescues is increasing, and last year alone they euthanized over 600,000 animals.

Now why are we qualified to do this? I've been an extensive participant in fostering. We both are rescue dads, and as I have mentioned, it would not be re:Invent without a little bit of AI. So the dogs are much closer, they probably could have done a better job on us.

What we ended up doing is building out a platform for aggregating this data from the rescues and shelters. Our platform is far better because we have these relationships. It's more up to date. We have a larger pool of animals than you can find directly with the shelters and rescues.

Just to switch over to the UI here, we have basic search functionality, so you can look for a particular pet type. You have maintenance level. For some reason, I continually find the high maintenance ones. You have the max age, and then for demonstration purposes, we have added a bot toggle. This is basically to differentiate between the experience a bot is going to have on our website versus a legitimate human, as well as an admin panel where we can see logs on the adoption requests that are being submitted and logs that are coming in from CloudFront.

Swapping back over here, we've got a whole bunch of adoption requests. Our platform is a huge success. Well, wait a second. The adopter names are just being numerically incremented, and it appears that someone is trying to take advantage of the data that we have and is saturating our adoption pipeline with these fake adoption requests.

As much as we might think this is the outcome of that, more than likely there is another competitor in the market that is going to be taking this data and doing it for profit as opposed to the nonprofit that we are operating as. So to summarize, we've got our pet adoption platform, tons of traffic, but only certain pets are being adopted, and we operate off three main KPIs which we'll continue to come back to. The first is going to be website traffic. The second is the application review time. We have a healthy bench of adoption reviewers, and then finally is going to be the time to adopt.

Well, now what do we do? AWS edge to the rescue. So, QR code — you're welcome to play along with us. You all mostly look human, but for demonstration purposes, we are going to assume you're bots. As we go through the different phases, you're going to see the experience a bot gets when hitting our site, so you'll see how that plays out. On the back end, Eitav and I are going to be helping each other out with AWS CDK, so we'll be doing deployments to automate this, but I'll also be walking through the console to show you just how easy some of these very simple steps are, and through some of the code that applies to the functions.

We did use a little help. We've set up a pretty sophisticated bot using Tor Cannon, and that is going to be both manufacturing traffic to our website, which makes the graphs a little bit prettier, but it'll also demonstrate some of the complex nature of bots that you'll end up potentially seeing with your business.

So what's our baseline architecture? Very straightforward. An S3 bucket stores the animal images, fronted by our cache; on a new pull (a cache miss), we repopulate the cache from S3. Adoption requests go to our load balancer backed by serverless AWS Lambda, and then we have DynamoDB for storing those adoption requests.

So if we look at where it stands right now, there's no WAF. If you take away one thing, it's adopt a dog. I'm kidding. Turn WAF on. So we go over here and we look at where our phase starts. We have a good bit of traffic coming in, and you'll notice that the number of CloudFront requests and the number of adoption requests are very linear. At the moment we can't distinguish humans from bots, and we aren't doing anything to protect the platform.

Phase One: Implementing WAF with Rate Limiting and Bot Control

So phase one, let's turn on WAF. What does that look like, though? We are going to add rate limiting, which is based on a time frame that you specify — it can be as low as one minute and up to several. The rate limit can key on the URI, the IP, headers, and various other aspects of the request that you can use as the counter for rate limiting. In addition, we're using JA3 and JA4 fingerprinting, which relates to SSL and TLS connections. This is a hash created from a combination of request characteristics, and it's very helpful for catching subtle tricks of request manipulation, as opposed to looking only at the geography the request is coming from, the IP, or the user agent.

Finally, we have bot control. AWS offers a managed rule group for this, which comes in two flavors: common and targeted. It works by monitoring, blocking, and rate limiting scrapers, scanners, crawlers, and SEO bots. As requests go through the WAF pipeline of rules that we discussed, it adds labels to them, so you can distinguish the good bots — the SEO crawlers that you want indexing your site — from the ones it deems to be bad.
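As a rough sketch of what phase one could look like in code (again in the WAFV2 API shape, with illustrative limits, priorities, and names), a rate-based rule and the Bot Control managed rule group might be defined like this:

```javascript
// Hedged sketch of the phase-one rules: a rate-based rule plus the Bot Control managed
// rule group. Limits, priorities, and metric names are illustrative, not the demo's values.
const phaseOneRules = [
  {
    Name: "rate-limit-per-ip",
    Priority: 10,
    Statement: {
      RateBasedStatement: {
        Limit: 500,              // max requests per aggregation key within the window
        EvaluationWindowSec: 60, // windows as short as one minute are supported
        AggregateKeyType: "IP",  // other request aspects can be used via custom keys
      },
    },
    Action: { Block: {} },
    VisibilityConfig: { SampledRequestsEnabled: true, CloudWatchMetricsEnabled: true, MetricName: "rateLimit" },
  },
  {
    Name: "bot-control",
    Priority: 20,
    Statement: {
      ManagedRuleGroupStatement: {
        VendorName: "AWS",
        Name: "AWSManagedRulesBotControlRuleSet",
        ManagedRuleGroupConfigs: [
          { AWSManagedRulesBotControlRuleSet: { InspectionLevel: "COMMON" } }, // or "TARGETED"
        ],
      },
    },
    // The rule group labels requests (awswaf:managed:aws:bot-control:*) that later rules can match on.
    OverrideAction: { None: {} },
    VisibilityConfig: { SampledRequestsEnabled: true, CloudWatchMetricsEnabled: true, MetricName: "botControl" },
  },
];
```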

So what does this look like in our new architecture? We now have WAF coming in. To walk you through the pseudo logic that exists before we get into the console, a request comes in, WAF checks: are you a bot? If not, humans can progress through. Now let's see what this looks like in the UI. Here we have our CloudFront distribution, and you can see that we have a WAF attached, but right now it's not blocking anything. So we're going to make a very simple change by going over to our manage rules. We have the IP reputation list, rate limiting here, bot control, and add bot label. We've made it easy for ourselves and consolidated all of these to basically add a label to the request, which we then act on at the end. That's the take action here, and we're going to change it from count to block and save the rule.

Now if we go over to our UI, I'm a normal human and I can keep searching for all of the pets that I would like. And then a bot—oh no. Now, any thoughts on how this might reflect in our logs? Well, if it's a simple bot, they'll just give up and find a new target. However, ours are a bit craftier. If we go into what this has done for WAF blocking, we've actually seen an increase in the amount of requests. What we've seen is that for the people using bots on our website, they have detected that they are being blocked with 400s, and they're trying alternative methods. They're using different request types, different paths, and saturating across different numbers of IPs. We see the CloudFront requests go up, so they see it's blocked and they're trying harder. They're not going to give up easily. What this is actually doing is we have more traffic flowing, which is a greater cost to us. We have more requests being submitted, and it's harder for us to track. So we've actually made it a little bit more expensive for us to do this.

Phase Two: Using Non-Terminating Actions and Custom Headers for Smarter Bot Management

How does this impact our KPIs? We have an increase in website traffic, but not in a good way. Our application review time has increased because they're still submitting fake requests. The time to adoption hasn't really changed. So how can we be smarter as the implementers of the WAF logic? Well, instead of giving them a very clear stop sign saying you're not allowed here, we can change from the terminating block action to non-terminating actions. There are CAPTCHA and Challenge actions, which some of you may be familiar with. We're specifically going to be using count, and then we're going to enhance that by adding a custom request header to the requests that come in and are deemed to be bots.
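A hedged sketch of that change: the rule stops blocking and instead counts the request while inserting a custom header. AWS WAF prefixes inserted headers with x-amzn-waf-, so a header named is-bot arrives at CloudFront as x-amzn-waf-is-bot (the demo's exact header name may differ; the overview refers to it as X-is-bot). The label-match statement below is an assumption about how the Bot Control signal is consumed.

```javascript
// Hedged sketch: a non-terminating Count action that tags likely bots with a custom header.
const labelBotsRule = {
  Name: "label-bots",
  Priority: 30,
  Statement: {
    // Match anything Bot Control labeled under its bot namespace (assumption for illustration)
    LabelMatchStatement: { Scope: "NAMESPACE", Key: "awswaf:managed:aws:bot-control:bot:" },
  },
  Action: {
    Count: {
      // WAF inserts this as "x-amzn-waf-is-bot: true" on the request sent onward to CloudFront
      CustomRequestHandling: {
        InsertHeaders: [{ Name: "is-bot", Value: "true" }],
      },
    },
  },
  VisibilityConfig: { SampledRequestsEnabled: true, CloudWatchMetricsEnabled: true, MetricName: "labelBots" },
};
```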

So a bot is going to come in, we'll add a header to it, and then make use of a caching policy so that if that request is deemed to be a bot, we are only going to show it a segment of our platform. This way we can narrow down the amount of animals being scraped and we have control over what is being pulled from our site. This makes the data less valuable to the bot.

What does this look like in the architecture? We now have two different paths. The request comes in, and instead of outright blocking it, we add the X-is-bot header set to true; otherwise we do nothing. Then, as the request hits our caching policy, the bot is going to be redirected to our CloudFront cache. Swapping over here, we're going to look at our distribution. Right now we have the different caching behaviors. We're going to our search path pattern and edit. We've preconfigured the bot policy here, and if we take a look at that policy, we can see the different aspects of the query strings that are part of it. I get to copy this. AWS WAF is going to prefix the headers that are added as part of these WAF rules with x-amzn-waf-. Then we go back over here and save this to our distribution. And then in the WAF rule, for our action, instead of blocking, we'll move over to count and add the custom request header.
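For context, a cache policy that keys on the WAF-inserted header could look roughly like this, shaped like the CloudFront CreateCachePolicy API input. The TTLs and the exact header name are assumptions for illustration.

```javascript
// Hedged sketch of a cache policy that includes the WAF-inserted bot header in the cache key,
// so bots and humans get separate cached variants of the same search path.
const botAwareCachePolicy = {
  CachePolicyConfig: {
    Name: "bot-aware-search-cache",
    MinTTL: 1,
    DefaultTTL: 86400,
    MaxTTL: 31536000,
    ParametersInCacheKeyAndForwardedToOrigin: {
      EnableAcceptEncodingGzip: true,
      EnableAcceptEncodingBrotli: true,
      HeadersConfig: {
        HeaderBehavior: "whitelist",
        Headers: { Quantity: 1, Items: ["x-amzn-waf-is-bot"] }, // assumed header name
      },
      QueryStringsConfig: { QueryStringBehavior: "all" }, // search params stay in the key
      CookiesConfig: { CookieBehavior: "none" },
    },
  },
};
```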

So now if we go back to what the experience for our bot looks like, humans still have access. Swap over to a bot. They have access, but the same access. I can keep searching, keep searching, and it's going to be the same content that the bot is getting, whereas if I'm a human, I can see hamsters, I can see bunnies, I can see all of the available animals on our platform. So now we've restricted what the bot is able to see only to a much smaller portion.

What does this mean for our KPIs? Well, overall, after the deployed changes, CloudFront requests go down, nothing is blocked by WAF anymore, and our Lambda invocations have decreased. Website traffic is now at an acceptable and appropriate level for what we would expect for our platform. Our application review time improved: we can work with the volunteers and say, hey, this segment of animals is the one that sits in the cache and is going to have known bad adoption forms. And the time to adoption for the vast majority of our animals has decreased.

Advanced Bot Handling: Origin Modification and CloudFront Functions

So while those are great metrics and improvements in our application, does it actually solve what we set out to solve? If I go back to our platform and repeat what Nick just mentioned — humans get new pets every time, but bots get the same pet — this puts us in a different challenge, because now bots are getting the same cached response and they'll probably try to adopt the same pet over and over again. So we just moved the problem from affecting all the different pets to affecting only a specific pet, and both Nick and myself love all the pets. So we need to find a solution that somehow doesn't impact any of the pets during our research.

We're going to present a couple more approaches on how you can customize even further the bots' traffic. So we believe that different traffic deserves different responses. And we're going to use a couple of concepts within CloudFront Functions that will help us provide different responses to bots, but still doesn't signal back to the bots that they are being blocked. First, we're going to use origin modification. Origin modification is quite a new feature within CloudFront Functions that allows you to override the traffic from one origin to a different origin. This is usually being used for A/B testing, where you want to split the traffic between different origins, but this can also be used if you get a signal from upstream, like the WAF upstream, and route the traffic to a different origin.

Moreover, we're not just going to reroute the bot traffic to a new origin, we're going to route traffic to a new origin that's using an intelligent response that will allow us to respond with a different path.

This intelligent response will allow us to get more insights about the attackers and maybe understand why they tried to attack us. In addition, we are going to use a cache-busting mechanism. Nick implemented caching for bots, meaning bots are going to get the same response over and over. But this has a caveat: if you respond to bots from the cache only, you lose the insights about the bots.

So using CloudFront Functions, we are going to implement a way to override some of the caching policies and control them according to your business logic. In our example, we are going to cache bust for only 1% of the traffic, but you can do it for whatever percentage you need. Since many of these features are built on CloudFront Functions, I am going to spend a couple of minutes on CloudFront Functions.

CloudFront Functions are a lightweight compute runtime running at the edge. It's basically a simple JavaScript file that you write, and it runs between the WAF and your region — meaning the WAF evaluation happens first, and only then does the CloudFront Function run. This allows us to read the headers that the WAF is adding.

The functions are based on a simple JavaScript. So you can use variables like strings and numbers. You can define your own functions. You can use JavaScript native methods such as split, slice, lowercase, and many other JavaScript native methods. You can use more advanced methods like buffers and crypto if you want to create your own authentication within the edge. And you can also do asynchronous operations using Promise and async/await. Of course, every JavaScript program should support console.log.

Every time you call console.log within a CloudFront Function, the output is automatically shipped to CloudWatch Logs in us-east-1, so you can check those logs later. And you control the return value, which is quite important. If you return the request object, this tells CloudFront to continue and proxy the request on to the origin. But if you return a response object, this tells CloudFront to terminate the request and skip the origin altogether. So if you want to do very quick blocking at the edge after the WAF, you can do it using CloudFront Functions.
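A minimal sketch of those return semantics, assuming a viewer-request function and the x-amzn-waf-is-bot header from earlier (the path check is purely illustrative):

```javascript
// Minimal sketch of a viewer-request function and its return semantics.
function handler(event) {
  const request = event.request;
  const botHeader = request.headers['x-amzn-waf-is-bot']; // assumed header name
  const isBot = botHeader && botHeader.value === 'true';

  // console.log output is shipped automatically to CloudWatch Logs in us-east-1
  console.log(JSON.stringify({ uri: request.uri, clientIp: event.viewer.ip, isBot: isBot }));

  if (isBot && request.uri.startsWith('/admin')) {
    // Returning a response object terminates at the edge: cache and origin are skipped
    return { statusCode: 403, statusDescription: 'Forbidden' };
  }

  // Returning the request object tells CloudFront to continue toward the cache and origin
  return request;
}
```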

The event structure is important to understand. We have the context, which is like a general metadata of the distribution. So we have a request ID, which is a CloudFront request ID and is unique per request. We have the distribution ID and the event type. CloudFront Functions can be hooked to the viewer request and the viewer response. So you can write your own code according to the different event type.

We have the viewer within the event, and this is basically the device that got connected to CloudFront. In the vast majority of the cases, it will be our mobile devices, our laptops, or server-to-server connections, and this will represent the actual IP of those devices. We have the request, and the request is also modifiable, meaning you can read the query string, the URI, and the cookies, but you can also manipulate those using CloudFront Functions.

Just like the request, you also have the response, where we can manipulate the status code, status description, headers, and cookies, and overwrite the body altogether. And as I mentioned before, it's quite important to understand whether you want to terminate the request or just proxy it on to the region. In order to use origin modification, we are going to use a new helper: you import cf from the 'cloudfront' module in your code. This helper exposes three different methods to us. The first one is updateRequestOrigin, which allows you to override any kind of origin parameter — the host, the entire URL, the headers, the TLS configuration. You can simply override whatever you need. This will later allow us to route between the bot and human origins.

You can also use selectRequestOriginById, so you can deploy a pre-made, complete origin configuration; you just state the ID that you want to use, and the configuration is imported from there. And you can use request origin groups, which give you a native failover mechanism. Within an origin group you define two different origins, and CloudFront will try to fetch the data from the first origin. If that origin fails to respond — for example, it returns an HTTP 500 or 400-level error — CloudFront automatically retries against the second origin, and this happens without your clients being aware of it. This is a native failover mechanism in CloudFront.
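A short sketch of how those helpers are used inside a viewer-request function, assuming the cloudfront-js-2.0 runtime. The origin ID is a placeholder, and the updateRequestOrigin() property names are an assumption — check the CloudFront Functions documentation for the exact parameter shape.

```javascript
// Hedged sketch of the origin-modification helpers.
import cf from 'cloudfront';

function handler(event) {
  const request = event.request;
  const botHeader = request.headers['x-amzn-waf-is-bot']; // assumed header name

  if (botHeader && botHeader.value === 'true') {
    // Route to an origin that was pre-created on the distribution; the ID is a placeholder.
    cf.selectRequestOriginById('bot-honeypot-origin');

    // Alternative: override individual origin settings in place, e.g. (property names
    // are an assumption — see the CloudFront Functions docs for the exact shape):
    // cf.updateRequestOrigin({ domainName: 'honeypot.example.com' });
  }

  return request;
}
```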

Live Implementation: Traffic Routing, Honeypot Strategy, and Enhanced Bot Intelligence

Now, let's look at how it's all going to be connected together and how the architecture is going to change. We are going to introduce a new router that we call a traffic router. As I mentioned, this happens after the WAF but before our origins. This will allow us to create the custom mechanism that our business needs. When a request hits CloudFront, we will first get to WAF. If WAF detects a bot, it will add the same header that was added before. But instead of just going to the cache, this new header is now exposable within CloudFront Functions. We are going to implement a change to route all bots to a different origin, and for 1% of the bots, we're going to eliminate the caching. As I mentioned before, you can do 1% or more than that. After that, we'll redirect the traffic to the cache.

Let's see it in action. In order to associate the function for CloudFront, you will need to choose your behavior. If you scroll down in the behaviors, you will have four different hooks where you can associate a function or Lambda@Edge. I'm going to associate it for the viewer request. I'm going to choose a CloudFront Function, and I'm going to choose a function that I pre-made. We will see the code in a second. I just deployed it very quickly, and this is all you need to do in order to have compute running on the edge.

Let's check the actual code and how we implemented the different aspects that we see for CloudFront Functions. This is basically the handler, in about 20 lines. First, I extract the request from the event — this is the request and these are the headers. I have created my own bot-detection function that checks the headers. The function is right here, and it's simple: it checks whether the WAF header that we added before equals true, meaning this block of code will only run for bots. For human users, it's skipped altogether.

For bots, we randomize a number between 0 and 100, and only for 1% of the traffic do we apply a cache buster. Again, we could do 50% and run some kind of A/B test with it. The way I implemented the cache buster is by randomizing a new value and overriding a header with it. The cache policy works such that, from a caching perspective, each time we randomize the number CloudFront sees a whole new request and keeps the cache entries separate. On row 24, I've used the selectRequestOriginById method with the ID of an origin that I pre-created for the bots. I'll copy the ID here, and let's see where it's configured.
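Putting the walkthrough together, a hedged reconstruction of that traffic-router function might look like the following. The header names, origin ID, and 1% threshold mirror the description above but are not the exact demo code, and the cache-buster header only has an effect if the cache policy includes it in the cache key.

```javascript
// Hedged reconstruction of the traffic-router function described above; not the exact demo code.
import cf from 'cloudfront';

const BOT_HEADER = 'x-amzn-waf-is-bot';    // assumption: header inserted by the WAF count rule
const HONEYPOT_ORIGIN_ID = 'bot-honeypot'; // assumption: origin ID pre-created on the distribution

function isBotDetected(headers) {
  // Only run the bot branch when the WAF marked the request
  return headers[BOT_HEADER] && headers[BOT_HEADER].value === 'true';
}

function handler(event) {
  const request = event.request;

  if (isBotDetected(request.headers)) {
    // For ~1% of bot traffic, bust the cache: a random header value (included in the
    // cache key) makes CloudFront treat this as a brand-new request and go to origin.
    if (Math.floor(Math.random() * 100) < 1) {
      request.headers['x-cache-buster'] = { value: String(Math.random()) };
    }

    // Send every bot request to the pre-created honeypot origin (a Lambda function URL).
    cf.selectRequestOriginById(HONEYPOT_ORIGIN_ID);
  }

  // Humans fall straight through to the normal cache/origin flow
  return request;
}
```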

I'm going back to the CloudFront distribution. I can see in the origins tab that I now have the ID that I used before, and this is a Lambda function URL. You can create your own origin, have them pre-made and deployed, and simply map them in CloudFront.

Let me show you how this impacted our platform. I am currently a regular human, and I'm searching for pets with everything working as expected. The moment I switch to being a bot, I start to get bot pets. Those bot pets are not going to be impacted if any kind of bot tries to adopt them. You can also check this out using your mobile phone with the same URL from before, and hopefully you will see the same different bots that I see.

Checking out the network here, you will see that the response is from CloudFront. This means it's being served from the cache; only 1% of the traffic busts the cache. If I check the admin panel, I already have a couple of flags here. As I mentioned, we are not just going to respond with a different response to the bot; we want to fetch some information about those bots. So we created a simple logging UI, and as you can see, we now have information on the route the bot is trying to hit — the bot API — and the different user agents. In this example, some are using PhantomJS, but some are using real-looking user agents. We have the JA3 fingerprint, and this is all the metadata that I wanted to capture.

Let's see how it impacts our CloudWatch. I'm going to check the honeypot routing. Let me refresh it and look at the last 50 minutes. We can see here that our cache hit rate decreased a bit, and that happened because 1% of the bot traffic now results in cache misses. But we introduced a new function. It might be a bit small, but on the third row we have the honeypot Lambda invocations, and you can see it being hit about 4 times. So out of all the traffic the bots are sending, only 4 requests actually reached the honeypot.

Here we have a more simplified way to look at it. The vast majority of our metrics remain the same. We have the same amount of requests coming into the system and almost the same cache hit rate, because only 1% of the traffic is a cache miss, and this is on purpose thanks to our CloudFront Function. We introduced only a very small amount of new Lambda traffic that serves new content to the bots, so the cost impact is going to be very minimal, and it can also be adjusted: if 1% isn't enough, you can cache bust 5% of the traffic instead. This is highly customizable for your own business needs.

As always, this is not enough; we wanted more. So we thought about how we could get new insights about our bots. Instead of just responding to the bots with new JSON or a new path, we want to respond with images. As we saw in the platform, the images are rendered by the browser. So we thought: how cool would it be if the browser rendering the images sent us new metrics, so we could correlate between the bots making the fetch requests to the APIs and the bots actually rendering our images.

So even on our static routes, which are usually just used for caching and proxying to S3, we introduced a new CloudFront Function. This function is quite simple: whenever a bot renders an image, the CloudFront Function does an async console log and ships the metadata to CloudWatch Logs.

We serve the same image, so the bots don't even know that something happened behind the scenes. You can see it in the demo. As before, to associate a new function, I go back to my distribution, then to my behaviors. Now I'm going to pick a different behavior, which is the static PNGs path pattern. I'm going to add this new function and save it. Let's look at the function. The code itself is even simpler than the previous one. This is my handler. On the first three lines, I extract the request metadata. Then it simply logs a JSON with all the information from the request that is important for me, as a business, to better understand and analyze the traffic that is coming in.

In this case, we want to correlate the search request and the image request. So I log the same metadata, such as the user agent and the client IP, but also the JA3 fingerprint that is pre-calculated by CloudFront, so you get it free of charge on every request. I also log the device type, so I know if it's a desktop, mobile, iOS, Android, and so on, and I have the geolocation. Again, this can be highly customized to your own business needs. So I go back to the platform and make requests as a normal human; everything works fine. Then I switch to bots, and those bots still get their own images, but each time one of those images is requested, we should now have a log for it.
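A hedged sketch of that image-path logging function is below. It assumes the CloudFront-generated viewer headers (JA3 fingerprint, device type, geolocation) are available on the viewer request, as described in the talk; the field names in the logged JSON are illustrative.

```javascript
// Hedged sketch of the image-path logging function; header names are the standard
// cloudfront-viewer-* / cloudfront-is-*-viewer headers, assumed to be present as described.
function handler(event) {
  const request = event.request;
  const headers = request.headers;

  // Read an optional header without throwing when it is absent
  const value = (name) => (headers[name] ? headers[name].value : undefined);

  // The logged JSON is shipped to CloudWatch Logs in us-east-1, where it can be
  // correlated with the API (search) logs by IP, user agent, and JA3 fingerprint.
  console.log(JSON.stringify({
    type: 'image-render',
    uri: request.uri,
    clientIp: event.viewer.ip,
    userAgent: value('user-agent'),
    ja3: value('cloudfront-viewer-ja3-fingerprint'),
    country: value('cloudfront-viewer-country'),
    isMobile: value('cloudfront-is-mobile-viewer'),
    isDesktop: value('cloudfront-is-desktop-viewer'),
  }));

  // Serve the same image as always so the bot sees nothing unusual
  return request;
}
```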

Let's see if it's there. It already fetched some logs, yes. I'll zoom in a bit so you can see that instead of just having my API log, I now have the log for the bots' images. I can see that it's a different client IP that is actually rendering the data versus the IPs that are fetching our APIs. The same goes for the user agents, and the same for the JA3 fingerprints. So this tells us that there is probably one machine making the requests and fetching the API from us, but it is publishing our content across the globe, and there are actual custom browsers out there rendering the images.

Key Takeaways: Traffic Awareness, Simplification, and Creating Synergy

So now we can make our rules even more sophisticated, because if we see a JA3 fingerprint that repeats as a rendered image or as an API search, we can block that specific JA3 fingerprint or create a new rate limit for it. Basically, we just exposed many more metrics and gave your business the ability to control the traffic accordingly. We thought it would be best to simplify the architecture a bit. I saw many of you taking pictures of the architecture that we just showcased, and that's great, but we want to talk about the high-level concept, because every product is built a bit differently, and I want to discuss how you can implement it in your own business.
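For example, a follow-up rule that blocks a JA3 fingerprint seen repeatedly in those logs could be sketched like this in the WAFV2 rule shape (the fingerprint value is a placeholder, and how SearchString is encoded depends on how you call the API):

```javascript
// Hedged sketch: block a JA3 fingerprint that keeps showing up in the honeypot and image logs.
const blockRepeatedJa3 = {
  Name: "block-repeat-ja3",
  Priority: 5,
  Statement: {
    ByteMatchStatement: {
      FieldToMatch: { JA3Fingerprint: { FallbackBehavior: "NO_MATCH" } },
      SearchString: "REPLACE_WITH_OBSERVED_JA3_HASH",     // placeholder, not a real fingerprint
      PositionalConstraint: "EXACTLY",                     // JA3 matching requires an exact match
      TextTransformations: [{ Priority: 0, Type: "NONE" }],
    },
  },
  Action: { Block: {} },
  VisibilityConfig: { SampledRequestsEnabled: true, CloudWatchMetricsEnabled: true, MetricName: "blockRepeatJa3" },
};
```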

We used WAF in our case to label the traffic, unlike the vast majority of WAF usage, which is blocking. If you use WAF for labeling, everything behind it can react to those labels. We used CloudFront Functions to react to them, but if you have your own origins behind the scenes, they can be exposed to those labels as well. Within the CloudFront Function, we used origin modification, so the function knew whether the request was a bot or not and could react differently.

After the modification, 99% of the traffic still got served by the cache, because that is what we wanted as a business. The remaining 1% got split between two different origins. This is a simpler architecture than you mostly see in other patterns, but again, you don't need to replicate it step by step — just take the concepts that apply to your use case.

So the takeaway that I want you to take back home when thinking about your architecture is split into three parts. First and foremost, you need to be traffic aware. You should ask yourself: do you really know what goes on within your system? In our case, we used labels for bots versus humans, but this can be for any business-related topic. It can be different tenants, different tenant tiers like premium or standard. This can even be sales-related, like if you want to route traffic from the EU to an EU region, you can do it using CloudFront.

So think within your engineering team and your product team about what kind of metadata is relevant to your business. We showcased bots, but this can literally be anything else. Second, simplify. All the changes that Nick and I made during this live demo were done using only CloudFront and AWS WAF. We haven't introduced any change to our application. So think about how much of your complexity you can move to the edge without impacting your engineering efforts. Consider whether there is a workload that you're currently running in your own regions that you could simplify and decouple to run at the edge.

And last, create synergy. Using AWS services is not just about selecting one component and using it in isolation. In our use case, we used WAF, CloudFront Functions, and the CloudFront cache. Think about each of those components and the value it brings you on its own; now you can multiply that value by connecting the components together. This re:Invent week is full of content delivery sessions. We have packed them all onto one slide for you so you can take a screenshot. I know some of them have already happened, but most are recorded, so you can catch up later on the content you want.

Moreover, we curated three articles for you. The first two are different articles that will deep dive into the concepts that we discussed today about how the flow of requests happens within CloudFront and the different approaches to using origin routing. The last QR code is actually about a pricing change that was released two weeks ago. This is called a flat-rate pricing plan, and it ranges from $0 a month up to $1,000 a month as a fixed price. You get CloudFront and WAF out of the box, so this gives you a very quick way to start using those different components that we just discussed with a very low price point and scale from there.

My ask for you before you leave this room is to do something good for the pets out there. If you open the AWS Events app and provide feedback for this session, then for every piece of feedback we receive, we are going to donate to a non-profit pet organization. We really appreciate your honest feedback for this session. It will help us shape this session later, and it will also allow us to help other pets out there. So please use the AWS Events app to provide feedback. Thank you, folks — this was Nick and myself. Have a happy re:Invent week. Thank you.


This article is entirely auto-generated using Amazon Bedrock.
