🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Scaling Multi-Tenant SaaS Delivery with Amazon CloudFront (NET316)
In this video, Sagar Desarda and Bhagirath Gaonkar from AWS, along with Ryan Neal from Netlify, demonstrate how Amazon CloudFront SaaS Manager solves multi-tenant content delivery challenges at scale. They explain tenant isolation strategies (siloed, pooled, and tier-based), introduce key constructs like multi-tenant distributions, tenant parameters, cache policies, and connection groups, and showcase HTTP validation for automated SSL provisioning. The console demo walks through creating multi-tenant distributions and configuring tenant-specific settings, while Netlify's real-world implementation demonstrates automated tenant onboarding in under 10 seconds using APIs, serving 6 million websites with isolated security and routing per tenant.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: The Challenge of Scaling SaaS at the Edge
Good morning, everyone, and welcome to day 3 of re:Invent. What I'm about to describe right now is something that we keep hearing from SaaS customers over and over again. A SaaS platform lands a big global enterprise customer, and overnight they need to support thousands of domains with strict security and uptime demands. As SaaS architects, you've all felt that pressure to deliver a fast, reliable, and scalable experience without tripping over operational complexity. So today we'll explore how Amazon CloudFront, and especially CloudFront SaaS Manager, offers an answer to that challenge.
My name is Sagar Desarda. I lead teams at AWS that work with our SaaS customers, particularly those building data intensive and AI driven platforms on AWS. I'm joined today by Bhagirath Gaonkar. He's our product manager from our CloudFront team, and he led the launch for CloudFront SaaS Manager. So if you have any tough questions after, he's our guy to ask. We are excited to have one of our customers join us today. We have Ryan Neal from Netlify. Ryan was the first engineer hired at Netlify, so if anyone can tell us what life looked like before and after CloudFront SaaS Manager, it's going to be him.
So we are thrilled to have all of you here. Let's dive in. Here's what we'll cover today. I'll start with why running SaaS at the edge is hard and what makes multi-tenant delivery so challenging as you scale globally. Then we'll look at tenant isolation strategies and how those choices impact performance and security. From there, I'll dive into CloudFront SaaS Manager and walk you through how it simplifies your tenant onboarding experience. Bhagirath will then take over and run a quick demo in the AWS management console.
Next, we'll have Ryan show us how he's automated this tenant onboarding experience for Netlify, which enables him to onboard new tenants in seconds, literally seconds. We'll wrap up with a quick call to action, and yes, that's the part where I get to give you all a little bit of homework. This is a level 300 session, so we'll assume that you have a solid understanding of Amazon CloudFront fundamentals like cache behaviors, origins, and distributions and request flow. We're going to build on that foundation, show how that architecture comes together, and give you some implementation guidance that you can take back.
Why Multi-Tenant Content Delivery is Operationally Complex
Amazon CloudFront is AWS's content delivery network that enables SaaS providers to deliver multi-tenant applications with low-latency, edge-optimized routing, and it supports tenant isolation at scale. By integrating Lambda@Edge and CloudFront Functions, you can execute real-time logic at the edge. You can customize authentication, customize your routing, and serve content based on tenant context without sacrificing performance.
But in multi-tenant platforms, we don't just need speed. We need to ensure that every tenant's data, routing, and experience is isolated and secure. So the question becomes, what does multi-tenancy even mean when we're serving content at the edge? Each of your customers becomes a tenant with their own domain, certificate, cache policy, compliance requirements, and routing needs. You're no longer managing content delivery for one brand; you're now managing delivery as a service.
In a SaaS platform, CloudFront becomes your edge layer, not just for you, but for every tenant that you're going to support. You can use one or a few CloudFront distributions to serve thousands of tenants. This means that the edge is not just about caching static assets anymore. It's about tenant isolation, domain management, and security at scale. When you're small, this looks simple, maybe a handful of tenants with one shared configuration. But when you're serving thousands of tenants from one distribution, everything multiplies: your certificates, your DNS entries, your cache policies, your custom headers. Suddenly the edge becomes your operational bottleneck. It takes just one misconfigured header or SSL certificate to cause chaos. That's not because the CDN isn't capable, but because you're pushing it to act like an application platform.
Approach one is a single-tenant approach where every customer receives a dedicated CloudFront distribution with their own SSL certificate. However, the problem with this approach is that CloudFront distributions are not meant to mutate thousands of times in a day. Each update to the distribution takes minutes to propagate globally, so if you treat each tenant as a new distribution, your deployment pipeline really slows down.
With approach two, if you use a shared distribution, you now face the opposite problem. How do you isolate tenants logically while sharing one CloudFront distribution? How do you achieve both speed and isolation? Things can go wrong. But why is this hard at the edge? Let's dig deeper into this challenge.
SaaS at the edge is hard because your control plane and data plane multiply together. When your control plane pushes a single change, that change affects thousands of customers. When your data plane handles a million requests, those requests belong to thousands of tenants, each with their own unique rules. You need precision, automation, and isolation. Without that, one small misconfiguration could invalidate everyone's cache or break SSL for multiple tenants. To top it all off, you're trying to do this in one CloudFront distribution, which is a blessing for simplicity but hard to scale.
The Streaming Profile Problem: Understanding Cross-Tenant Isolation
In CloudFront, as you all know, each viewer request comes in with a host header, and that's how CloudFront knows which tenant to serve. CloudFront will inspect that header, apply the right routing logic, and forward the request to the correct origin. The problem is that in a multi-tenant architecture you're no longer managing one behavior. You're managing thousands of micro-behaviors within one CloudFront distribution.
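To make that host-header routing concrete, here is a minimal Python sketch of the kind of tenant-resolution logic an edge function might perform. All domain and origin names below are invented for illustration; CloudFront SaaS Manager performs this mapping for you natively.

```python
# Hypothetical tenant table: Host header -> tenant origin.
# Domains and origin hostnames are placeholders, not real endpoints.
TENANT_ORIGINS = {
    "acme.example-saas.com": "acme-origin.s3.amazonaws.com",
    "globex.example-saas.com": "globex-origin.s3.amazonaws.com",
}

DEFAULT_ORIGIN = "shared-origin.s3.amazonaws.com"

def resolve_origin(host_header: str) -> str:
    """Map an inbound Host header to the tenant's origin, falling back
    to a shared default when the domain is unknown."""
    return TENANT_ORIGINS.get(host_header.lower(), DEFAULT_ORIGIN)
```

The point of the sketch is the multiplication problem: every tenant adds an entry (plus a certificate, cache policy, and security rules) to this table, and at thousands of tenants the table itself becomes the thing you have to operate.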
It's like when your whole family shares one streaming profile. Everyone has their own taste. My daughter loves animated shows, my wife is into K-dramas, and I just want to watch a good action movie in peace. But because it's one shared profile, the streaming platform thinks you're all the same person, and now my recommendations are a wild mix of Peppa Pig and serial killers and Korean dramas that I've never heard of. The worst part is that with one wrong click, someone watches one random episode of a reality show, and boom, the recommendations are ruined for everyone.
That's exactly what we're trying to avoid in SaaS. Because in a real multi-tenant system, if one tenant's weird recommendations spill into another tenant's experience, we don't just get a messy homepage. We get cross-tenant data exposure, shared cache pollution, and performance contention. You could be getting a very angry customer escalation at two in the morning, and none of us want that.
This is where SaaS architecture really begins. Isolation is the foundation of SaaS reliability. How we isolate the tenants logically, physically, or by resource boundaries determines everything that we do downstream. How do we route the requests? How do we enforce access? How do we scale? How do we detect noisy neighbors? How do we make sure that one tenant does not accidentally impact the experience for your other tenants?
Tenant Isolation Strategies: Siloed, Pooled, and Tier-Based Approaches
With that as context, I'll talk about tenant isolation strategies that form the backbone of modern multi-tenant platforms and architectures on AWS. There is no one-size-fits-all model for tenant isolation. You can choose your pattern based on your scale, your complexity, and your tenant expectations. Some teams prefer siloed isolation with dedicated distributions. They offer stronger security boundaries, but they come at a higher cost and management overhead. The upside is that you get maximum isolation, no noisy neighbors, predictable performance, and the highest security. The downside is that it's expensive and does not scale well: what happens when you have hundreds or thousands of tenants? You're essentially running a mini SaaS for each customer.
This is typically reserved for your high-value or premium clients who have strict security, compliance, or performance requirements. Other teams prefer pooled isolation. In this model, tenants share resources in a pool, but you use logical isolation through mechanisms like namespaces, tenant IDs, and per-tenant quotas. It's like a co-living space with a rulebook where everyone shares the kitchen, but each tenant has their own locker to keep their stuff. This model is highly efficient and scalable, but the trade-off is that a noisy neighbor can sometimes impact performance for other tenants if the isolation mechanisms are not well tuned.
Modern orchestration frameworks like Kubernetes provide fine-grained control over how tenants share resources. You can isolate workloads based on namespaces, role-based access control, network policies, and per-tenant quotas. You can combine this with AWS App Mesh or IAM boundaries to achieve logical isolation that is both efficient and secure. This is where modern cloud-native design really earns its value, giving you scale without sacrificing safety. Finally, there is tier-based isolation, which combines both approaches. High-risk or high-value premium customers or tenants get siloed resources while the rest are pooled. This is a way to optimize for cost and performance without sacrificing security or SLAs for tenants that demand it.
In practice, which strategy you use is not just about technology. It is a business decision that balances your risk, cost, and customer expectations. The takeaway here is that your isolation strategy is foundational. It affects how you scale, how you handle failures, and ultimately how happy your customers are. If done right, this is what allows you to run multi-tenant SaaS at massive scale without sacrificing control or security. Once you've decided how to isolate the tenants, the next thing you're looking at is how to route them efficiently.
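As a rough sketch of how a tier-based strategy might be encoded, the snippet below routes enterprise or compliance-sensitive tenants to siloed resources and everyone else to a pool. The tier names and the decision rule are illustrative assumptions, not a prescribed policy.

```python
from dataclasses import dataclass

@dataclass
class Tenant:
    name: str
    tier: str                    # e.g. "basic", "premium", or "enterprise"
    strict_compliance: bool = False

def isolation_model(t: Tenant) -> str:
    # Tier-based isolation: siloed resources for high-value or high-risk
    # tenants, pooled resources for everyone else.
    if t.tier == "enterprise" or t.strict_compliance:
        return "siloed"
    return "pooled"
```

In practice the predicate would encode whatever business rules you balance risk, cost, and customer expectations against, which is exactly why the decision belongs in one explicit place rather than scattered across per-tenant configuration.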
Introducing CloudFront SaaS Manager: A New Mental Model for Multi-Tenancy
At the core of every SaaS platform running on AWS lies a critical responsibility: tenant resolution, which means mapping an inbound request to a tenant configuration at the edge without hitting your origin unnecessarily. Routing is where the edge really earns its value. You want every request to find the right tenant origin in microseconds with near-zero latency overhead. So far we've seen how to route and isolate tenants at the edge, but as the tenant count grows, so does complexity. You're now managing domains, certificates, and WAF rules, and this is all happening per tenant at scale.
As our customers have been building and scaling SaaS platforms on AWS, we kept hearing a consistent set of architectural challenges from them. Those patterns became the foundation for how we designed CloudFront SaaS Manager. First, you have hundreds or thousands of customers sharing the same platform, but each one needs some degree of uniqueness. You want to reuse your shared infrastructure and resources efficiently, but you still want to isolate behaviors per tenant. The white box here shows your shared infrastructure configuration, and ideally all of this would live in one CloudFront distribution.
On top of that, you have different tiers of customers: basic tier, premium tier, and enterprise tier. All of these tiers come with their own unique requirements. Some have stricter security requirements, varying SLAs, different subscription tiers, and unique caching needs. All of this needs to be enforced independently. You need a way to manage each tenant's configuration separately without creating massive sprawl of CloudFront distributions. Trust me, no one wants to manage hundreds of CloudFront distributions—that would be a nightmare.
So the mental model here is that we need a single config for your shared infrastructure, and then you want to be able to maintain separate tenant-specific configurations while enforcing isolation. Ideally, you want both your shared infrastructure config and your tenant-specific configurations consolidated in a single SaaS configuration, keeping things simple, centralized, and manageable. Those tenant configs, the boxes that you see at the bottom, can simply inherit whatever they need from your main SaaS config, which is your shared infrastructure config. You don't have to redefine those rules over and over again.
With CloudFront SaaS Manager, we take that mental model and turn it into concrete constructs. You have your multi-tenant distribution, your distribution tenants, and tenant-scoped parameters and rules. Together, these constructs let you safely isolate tenant behavior, enforce security policies, and help you scale your platform. This is the foundation that makes per-tenant isolation not just possible, but also seamless with CloudFront SaaS Manager.
Multi-Tenant Distributions and Distribution Tenants Explained
What you're seeing here is a multi-tenant distribution, which is essentially a single CloudFront distribution configured to serve multiple tenants of a SaaS application simultaneously. You can now use one distribution that handles tenant-specific behavior through routing and caching logic at the edge. This multi-tenant distribution acts as your baseline config for your entire SaaS config. The tenant config can then apply targeted overrides for behaviors, routing, and security policies that may differ per tenant or for a group of tenants based on how you're set up.
We define distribution tenants as individual customers or logical entities that a single CloudFront distribution serves. Even though multiple tenants share the same distribution, every tenant's request, cache entries, and experience are all logically isolated. You identify the tenants using domains, subdomains, headers, cookies, and URL paths, and you can use that to route the traffic accordingly.
This setup allows you to scale efficiently, supporting hundreds or even thousands of tenants while still enabling tenant-specific logic at the edge. This includes real-world examples like tenant-aware authentication workflows, customized branding or personalization for your tenants, and tier-based performance rules. This lets you deliver fast, isolated experiences for every tenant, all from a single CloudFront distribution.
This approach removes a lot of operational friction. Developers can stay focused on building tenant logic, and SaaS Manager can take care of provisioning and keeping configurations consistent, with no more drift or manual cleanup. Because everything runs through APIs and event-driven workflows, onboarding a new tenant goes from hours of manual setup to just minutes or even seconds. It's fully automated, repeatable, and auditable.
Key Components: Parameters, Cache Policies, Connection Groups, and Certificate Automation
Now that we've looked at how CloudFront SaaS Manager enables multi-tenant distributions, I'm going to step through an actual onboarding flow. When a new SaaS customer is onboarded, it kicks off the tenant onboarding service at your end. Your service starts spinning up tenant-specific resources like compute, storage, database instances, or even per-tenant configuration objects, supporting the tenant isolation and scalability expected of hardened SaaS platforms.
The next step is to configure CloudFront rules, and this is where we start making the distribution behave differently per tenant. One of the ways we express those per-tenant differences inside a multi-tenant distribution is through parameters. Parameters are tenant-specific key-value pairs that let you customize routing and set your origin behaviors dynamically without touching the underlying code for each tenant. Before parameters, if you had routing requirements based on user attributes, you would have faced real challenges.
You would typically use either Lambda@Edge or CloudFront Functions, managing all of that for every tenant in a single CloudFront function or Lambda@Edge function and dealing with all the operational overhead, not to mention the extra cost associated with running these functions. Now, you can configure these parameters as variables in the multi-tenant distribution. As you can see here, I've defined a parameter called tenant1 in my multi-tenant distribution, which is my main config. Then I use that parameter as I configure my distribution tenant. In the distribution tenant config, I'm assigning a specific value, in this case "golf carts", to that same tenant1 parameter. This lets me leave my distribution config untouched, and by setting the parameter value in the tenant config, CloudFront SaaS Manager can resolve the updated origin path at runtime for that specific tenant.
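Conceptually, parameter resolution behaves like template substitution: the multi-tenant distribution holds a templated origin path, and each distribution tenant supplies only its own value. The Python sketch below illustrates the idea; the `{{name}}` template syntax, the path layout, and the `golf-carts` value are illustrative, not CloudFront's exact notation.

```python
# Origin path template defined once at the multi-tenant distribution level.
ORIGIN_PATH_TEMPLATE = "/content/{{tenant1}}"

def resolve_origin_path(template: str, tenant_params: dict) -> str:
    """Substitute tenant-specific parameter values into the shared
    origin path template, the way SaaS Manager resolves parameters
    at request time."""
    path = template
    for name, value in tenant_params.items():
        path = path.replace("{{" + name + "}}", value)
    return path

# Each distribution tenant only supplies values; the template is untouched.
golf_carts_path = resolve_origin_path(ORIGIN_PATH_TEMPLATE, {"tenant1": "golf-carts"})
```

The design win is that onboarding a tenant never mutates the shared distribution config, so you avoid the slow global propagation that per-tenant distribution updates would trigger.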
Next, let's look at how you control what gets cached and how it gets cached for each tenant. That's where cache policies come in. A cache policy in CloudFront defines which parts of the request, like headers, query strings, or cookies, are included in the cache key. For SaaS platforms, this means you can cache content separately for each tenant by including tenant identifiers. Take scenario one as an example. With cache policies, I can carve out separate cache entries for each tenant by including the X-Tenant-ID request header as part of my cache key. So tenant 123's analytics data stays completely isolated from tenant 456's, and both still get the benefit of caching repeated requests for their own users.
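A minimal sketch of why this isolates tenants: once the X-Tenant-ID header (the header name from the scenario above) is part of the cache key, two tenants requesting the same path land on different cache entries. Everything else here, including the tuple-based key and the dict-as-cache, is illustrative.

```python
def cache_key(path: str, headers: dict, keyed_headers=("X-Tenant-ID",)):
    # Only the headers named in the cache policy contribute to the key;
    # all other headers are ignored for caching purposes.
    return (path,) + tuple(headers.get(h, "") for h in keyed_headers)

cache = {}
# Same path, different tenants -> different cache entries.
cache[cache_key("/analytics", {"X-Tenant-ID": "123"})] = "tenant 123 analytics"
cache[cache_key("/analytics", {"X-Tenant-ID": "456"})] = "tenant 456 analytics"
```

The inverse also matters: a header left out of the cache policy cannot pollute another tenant's entries, which is what "shared cache pollution" failure modes come down to.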
Next, I'm going to talk about another important concept that we introduced in CloudFront SaaS Manager. It's called connection groups. Think of connection groups as logical partitions inside your CloudFront distribution that control how end user connections are managed in a multi-tenant setup. Traditionally, CloudFront reuses the TCP connection across all end user requests to improve performance. But in a SaaS environment, that means requests from different tenants could all be in the same connection pool. With connection groups, you can now set boundaries so that the connection pools can be isolated for a group of tenants or even on a per-tenant basis if you have the right use case for it.
In this example, both tenants use the same CloudFront distribution, but each one is routed through a different connection group. Tenant A is locked to a small set of static anycast IPs, which is perfect for customers who have strict firewall IP allowlists. Tenant B uses a dual-stack connection group so they can reach CloudFront over both IPv4 and IPv6. Even though the distribution is shared, CloudFront SaaS Manager makes sure that each tenant connects through its own network path that's tuned to its requirements. You don't need to do any config cloning or forking, and there's no fear of cross-tenant networking side effects.
Step 3 is where a new tenant is onboarded and CloudFront requests a certificate from AWS Certificate Manager, or ACM. As a SaaS provider, you often face the limitation that you might control a few domains, but the majority of the domains are actually controlled by your end customers. This makes standard certificate provisioning tricky because you can't rely on DNS validation since you don't control their DNS. CloudFront now solves this problem by integrating automatically with ACM. To address the DNS challenge, we introduced HTTP validation, which lets you fully automate certificate provisioning even for domains where you do not control the DNS. You can now provision certificates end to end without involving your end customers at all, and that keeps your entire tenant onboarding flow fully automated. In a bit, Ryan is going to dive into the APIs and show us how he's got this set up for Netlify.
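The intuition behind HTTP validation can be sketched in a few lines: because the customer's traffic already flows through your edge, you can answer the certificate authority's validation request on their behalf, with no action from the customer. The token registry, path, and return values below are invented placeholders; the real flow is handled between ACM and CloudFront.

```python
# domain -> (validation_path, token), registered when the certificate is
# requested. Paths and tokens are placeholders, not ACM's real layout.
VALIDATION_TOKENS = {}

def register_validation(domain: str, path: str, token: str) -> None:
    VALIDATION_TOKENS[domain] = (path, token)

def handle_request(domain: str, path: str):
    """Serve the validation token when the CA probes the well-known
    path; return None to fall through to normal tenant routing."""
    issued = VALIDATION_TOKENS.get(domain)
    if issued and issued[0] == path:
        return issued[1]
    return None

# Simulate onboarding a customer-controlled domain.
register_validation("customer.example.com", "/.well-known/validation/abc", "token-123")
```

This is why HTTP validation removes the customer from the loop: DNS validation needs a record only the customer can create, while the HTTP challenge lands on infrastructure you already operate.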
The final step, step 4, involves DNS. At this point you can set up Route 53 records or other DNS mappings pointing the new tenant's custom domain to CloudFront. That covers the key concepts we spoke about: multi-tenant distributions, parameters, cache policies, and connection groups. Now, to bring it all to life, Bhagirath is going to walk us through a quick demo in the AWS console before we bring Ryan on stage. Thank you, Sagar.
Console Demo: Creating Multi-Tenant Distributions and Configuring Tenants
All right. In this demo, what I'm going to walk you through are three things. First, we're going to create that multi-tenant shared settings configuration that Sagar just spoke about. We'll create our first tenant. And then also look at how we're going to associate connection groups with specific settings for routing to that particular tenant. Let's get started. I'm going to go into the AWS console into CloudFront. I'm going to click on create distribution. That gives me options. I'm going to select the multi-tenant architecture.
At this point, you'll notice that there's a dropdown for a wildcard certificate. That becomes really interesting because a lot of our SaaS customers told us that they usually rent subdomains to their customers, whether they're their free tier customers or they're enterprise customers. They all get a free subdomain. So when you add a certificate at the multi-tenant distribution level, that gets inherited to all of your tenants. Typically this is a wildcard certificate so that you can rent subdomains. It could be a SAN certificate. It could be any kind of certificate you want, but the most common use case is a wildcard certificate.
Next up we're going to talk about the origins and the parameterization that we learned about. Parameters are essentially variables. These values are defined at the tenant level, but the definition of where the variable is and how it's going to be exposed is defined at the multi-tenant distribution level. This is particularly useful, especially when you have patterns. In my example here, each customer has a separate path or folder within my S3 bucket for their content. I can define the path by a variable in this multi-tenant distribution, and when I create my tenant, I can tell CloudFront what's the path for each tenant.
I didn't change any of the recommended origin settings, but if I had specific settings I wanted, I could set them at this stage. Now you've got the baseline, the origins, and the certificate set up. Let's talk about security. At this point you have the option of choosing an existing WAF Web ACL and associating it with this multi-tenant distribution, or you can create a new one. Either option works. In this example I've chosen an existing Web ACL. It has the settings I want inherited by all my tenant domains, and that's what I'm associating here. I'm going to review these settings, and we've completed our multi-tenant distribution setup.
Next, let's go into a tenant. When you create a tenant, the first thing to do is add a name. This is just a tag; it's just for me to remember this easily and search for it. When I come back tomorrow with 100 or 1,000 tenants, I need to know what this is without having to remember the exact domain. You'll notice that the multi-tenant distribution we just created is already selected in the dropdown. Later I could choose to move this tenant to a different multi-tenant distribution.
Do you remember that Sagar talked about this isolation strategy where you have tier-based tenants? You have a free tier, a premium tier, and a silver tier. When your customer graduates from the free tier to a premium tier, you can switch them from one distribution to the next. That way you can enable additional settings at each tier. A good example of this would be something like Origin Shield. You probably don't want to give all your free and hobby tier customers access to that, but when they graduate to being paying customers, you're giving them additional value and performance by enabling Origin Shield on that multi-tenant distribution.
One interesting piece, and a good part of how we designed this, is something we learned while talking to Dana and Ryan at Netlify: they see this happen often. There are people who graduate and then go back to the free tier.
They do this seasonally. When all of this is happening, you don't want to impact performance, so the cache is logically isolated at the tenant level. The cache key is defined at the tenant level, so when you move a tenant from one distribution to the next, there is no impact on performance. All the cached objects remain in the CloudFront cache and stay accessible regardless of which distribution they're associated with.
Next, I'm going to add a subdomain. This is a vended subdomain that a SaaS provider provides to all of their tenants, and what you're going to notice is that the wildcard certificate I had associated at the multi-tenant distribution level covers that subdomain.
But most of your tenants, especially the paying customers, are going to want their own custom domain. In this example, for simplicity, I'm choosing an existing certificate that covers this custom domain. But you can also automate this, and we've added HTTP-based domain control validation, which makes the process of onboarding custom domains for a customer much easier. The existing mechanisms of email-based or DNS-based domain validation still exist, but HTTP-based validation is the more operationally efficient way for SaaS providers because they don't control the domain. I used DNS-based validation in this example because I control the domain and it was easy to set up ahead of time for the demo. A little later in this session, Ryan is going to come up on stage and walk you through an automated example of how Netlify uses HTTP-based domain control validation in their onboarding workflow.
Next, I'm going to define values for the parameter we defined at the multi-tenant distribution level. This lets me select which path, which S3 bucket and folder, my content should come from for this particular tenant and its associated domains. You can have more than one parameter: the maximum is five parameters in total, with no more than two parameters in each field where you use them.
When you have tenants that require customizations, especially your enterprise customers, they're going to come in with their own security rules that they want added. They're going to have their CISO, their security team, tell them that their application has to have a certain set of rules, and in these cases you can choose to override that existing Web ACL that was inherited with a custom Web ACL so they can have their own WAF rules, they can have partner-enabled rules, AWS managed rules, all of that just for their tenant.
I can also add geo restrictions. In my example here today, what I'm going to do is select United States and United States Minor Outlying Islands as the only end users that can access these domains for this tenant. This is a common example where the customer comes and tells me, "Hey, I only do business in the United States," or for regulatory reasons I don't want to do business with anyone else. I don't want to handle the legalities, the taxation for other countries, so limit access to my domains just to the United States. That's what we're doing here.
Now, at this stage, we've created a multi-tenant distribution, we've created our first tenant, and all that's remaining is our DNS setup. Before we do the DNS setup, we'll go into connection groups. This is where you can control how your tenants' applications actually connect into CloudFront. If you notice, there are no connection groups in the left pane here.
So first I'm going to go into settings and enable this custom feature. Once I enable it, a shortcut is added to the SaaS section for connection groups. I'll create a new connection group and set it up with the settings I want. One of the most common use cases in the SaaS world is that your customers want to use their apex domain. They want example.com, not www.example.com, and that leaves you with a challenge: for example.com in DNS, you have to have IP addresses. You cannot have CNAMEs. So they come to you and say, "I need a single IP, or a small list of IPs, that I can put in an A record in my DNS." For this we're going to create a static IP list. We don't have one in this connection group yet, so I'm going to go ahead and create it.
This opens in a new window so you can create your anycast static IP list. There are a couple of different use cases. Sagar spoke about the allowlisting example; I'm talking about the apex domain example, so I select apex domain. Then I have the option of choosing IPv4 or dual-stack. I'll select dual-stack so I get both IPv4 and IPv6 addresses.
In this case, that gave me three IPv4 addresses and three IPv6 addresses. These addresses are static. They do not change, and they only serve requests for your workload. No other customers use them, so they're isolated to just your workload. Next, we'll go back to the connection group we created and associate the anycast static IP list with it.
Now, when you look up the CloudFront domain that we've assigned to the connection group with dig or nslookup, you're always going to get one of those six IP addresses we saw on the previous screen. We're ready to associate this connection group and this DNS with the tenants. Let's search for the tenant we just created. You'll see another dropdown there for connection groups.
In the connection groups, I'm going to select the connection group that I just created with the static IP list. And there you have it—you're ready to go and enter your DNS entries and point those domains to CloudFront. At this stage, you might wonder if you have to do this for every single tenant. You can, but you don't have to. You can use the same connection group across multiple tenants.
This is because we understand that you're going to have some tenants with specific needs, but these needs come in cohorts. There are customers who only have IPv4. There are customers who want dual-stack. There are customers who want static IPs. But beyond that, the rest of the settings remain common, so you can reuse them. If you're updating your documentation and telling customers how to onboard an apex domain to your platform, you can provide these static IPs, and the same IPs are provided to every customer. That way, all of them use the same IPs and your operations are much easier.
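The cohort idea can be sketched as a small lookup: tenants declare their networking needs, and the smallest predefined connection group that covers those needs is reused across all of them instead of creating one group per tenant. The group names and need labels below are invented for illustration.

```python
# Predefined connection-group cohorts, keyed by the capabilities they
# provide. Names are illustrative placeholders.
CONNECTION_GROUPS = {
    ("ipv4",): "cg-ipv4-only",
    ("ipv4", "ipv6"): "cg-dualstack",
    ("ipv4", "ipv6", "static-ips"): "cg-static-anycast",
}

def connection_group_for(needs: set) -> str:
    """Reuse the smallest cohort whose capabilities cover the tenant's
    needs, rather than provisioning a connection group per tenant."""
    for caps, group in sorted(CONNECTION_GROUPS.items(), key=lambda kv: len(kv[0])):
        if needs <= set(caps):
            return group
    raise ValueError("no cohort covers these needs")
```

Because every tenant in a cohort resolves to the same group, the IPs you publish in your onboarding documentation stay identical for all of them, which is exactly the operational simplification described above.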
Netlify's Automation Journey: Onboarding Thousands of Domains in Seconds
We could repeat this process in the console as many times as we want, but that's not what you do in production. So we're going to have Ryan come up and show you how to automate this, which is really the playbook for how SaaS operations are done. Ryan, take it away. Thank you. Let's get right to it. First, we have to establish what Netlify is.
We're a platform that helps developers take their ideas from their laptop into production. We serve over 9 million developers right now, providing key infrastructure for them to lower the bar to getting online. So much goes into developing a site, from what tools you're using locally to what you're using in production, how you monitor and observe your traffic, how you iterate safely and consistently, and so much more. We're trusted by sites from all around the world, from small personal ones to massive enterprise ones. It's been really fun to build out, but that ease brings a really unique cardinality problem to us.
Yesterday alone, we served 6 million websites with different domains all over the place. This means that on top of having high RPS and staggering bandwidth needs, we have to deal with a huge breadth of sites. The SaaS problem is very much our problem. When we initially set up the network, we found that the existing providers didn't really have the features we needed. They're built around singular domains or a small set of them, and the provisioning didn't work well. They could handle all of the throughput, plenty of capacity, but the operations were a problem, so we had to build it ourselves.
Here's a much simplified version of what we built. It's a global network that directly serves the sites. Integrated into that network is a huge host of features, from automatic AI support to multiple levels of edge compute, advanced caching directives, and complex routing. The system was designed to handle everything from small personal sites up to really massive global presences. As a business, we needed to be mindful of scale, performance, and availability; these are table stakes for any edge offering. When we started talking about SaaS Manager after we had built a lot of this, the conversation started to change. We saw a way forward where we could leverage its strengths to improve our product.
Really, it came down to a few motivations for this integration. Cost was a huge driver.
CloudFront bandwidth is dramatically cheaper than direct EC2. The details are nuanced around what region you're in, what your contract is, and where you're serving, but it is definitely a benefit. Looking at network operations, DDoS is, in my opinion, a resource war: the goal is to spend fewer resources on bad traffic than the attacker spends sending it, because you're really competing on who has more CPUs. At Netlify, we have many ways to carve traffic off from any type of attack, but at the end of the day we push down to our eBPF layer and reject it at the kernel level, and I can't practically go lower than that. Once we put CloudFront in front of us, though, I have more operational capabilities to work with.
Whenever I've done a large integration, the question that comes up is how I can leverage the newly integrated system's strengths to enhance my product. Let's take a quick look at the limits of the standard distribution. Any CDN needs to terminate TLS and handle caching for you, which is straightforward, but the standard distribution doesn't really work for us. In particular, it was fundamentally designed around a small set of tenants and a small set of domains. You can see that in many parts of its design, starting with the single TLS certificate per distribution, which fundamentally limits how many domains it can serve. If you want to handle millions of unique domains, you would need millions of distributions, and you can see how that becomes a problem quickly. You're also only allowed to run a tiny amount of code pre-cache, and we run a lot of stuff pre-cache; we want to be able to do traffic shaping and things like that.
So we started looking at CloudFront SaaS Manager. Once we began the integration and started talking with the team, we saw that it checks a lot of boxes for us: it handles the cardinality issues, it improves the resilience of our network, we get a much bigger footprint, and overall the gears started working together. I'm going to run through how we built that automation, getting into the code and API calls we actually ended up making. This is our starting point: a viewer hits our edge and we serve the content back, pretty standard serving. The first step was building a service that translates our customer context into the AWS language we needed. We set up the distribution manually with a single parameter to help us with routing, and one big callout is that we actually disabled all caching in CloudFront.
Fundamentally, the caching model is incompatible: CloudFront caching is built around path-based rules you know beforehand. I don't control our customers' URL spaces and don't know what they look like, so I can't define those rules preemptively. Maybe in the future we'll be able to leverage it, and we have some ideas there, but for now we just use CloudFront as a pass-through network. The customer gives us a small blob of data: who they are and which domains they want to handle. We hit our service with it and make this call. It's a lot of Go code on the screen, but let's run through it slowly. First come the table-stakes parts: which distribution this is and what we're going to name the tenant. Then we translate the customer's parameterization from what they told us into the calls that need to be made. Finally we end up with the domain itself: which domain we're actually going to use. We send that information through, make the API call, everything goes smoothly, and we get back a tenant in the distribution we made.
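The Go shown on stage isn't reproduced in this article, but a minimal JavaScript sketch of the same tenant-creation input might look like the following. The shape loosely follows the talk's description of the CloudFront CreateDistributionTenant call (distribution ID, tenant name, domains, parameters, managed certificate request); treat every field name here as an assumption to verify against the API reference.

```javascript
// Assemble a CreateDistributionTenant-style input from a customer record.
// Hypothetical sketch: field names mirror the talk's description, not a
// verified SDK contract.
function buildCreateTenantInput(customer) {
  return {
    DistributionId: customer.distributionId, // the shared multi-tenant distribution
    Name: customer.slug,
    Domains: customer.domains.map((d) => ({ Domain: d })),
    Parameters: [
      // the single routing parameter mentioned above: tells the origin who this is
      { Name: 'tenant-id', Value: customer.slug },
    ],
    ManagedCertificateRequest: {
      // we answer the HTTP validation from our own edge rather than CloudFront
      ValidationTokenHost: 'self-hosted',
    },
  };
}

const input = buildCreateTenantInput({
  distributionId: 'E2EXAMPLE',
  slug: 'acme',
  domains: ['www.acme.example'],
});
console.log(JSON.stringify(input, null, 2));
```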
But I glossed over one part, which is the TLS certificate. Here we're saying that we host this domain, it's not a CloudFront domain, so please give me a TLS certificate for it, meaning I don't have an existing ACM certificate. This is what it looks like in the console: domains are registered but can't be used yet. One of the great improvements, though, is the HTTP validation we got with SaaS Manager. It means we can rely on SaaS Manager and ACM working together to give me back what I need. Here you can see what that means: it tells you certain paths you need to respond on with certain content, which is just proving that you control the domain. The way we do that in the API is the GetManagedCertificateDetails call, and what we get back is essentially this block of JSON.
What you can see is that this is the same information we saw in the console. In particular, the redirects show us what we need to put into production: we get back the URL we have to redirect to, so when ACM makes a request, we 301 it to that URL. One thing that's really helpful at Netlify is that this is bread-and-butter, table-stakes functionality for us. You can set up static redirects in a bunch of ways, but here we just set up a static redirect on the site. That means when the request comes in from ACM asking us to prove ownership, our edge serves back a 301 right away.
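The conversion from that JSON into edge redirects can be sketched as below. The response shape (a list of validation tokens with redirect-from and redirect-to URLs) follows the talk's description, but the field names `ValidationTokenDetails`, `RedirectFrom`, and `RedirectTo`, the token path, and the scheme handling are our assumptions for illustration.

```javascript
// Turn managed-certificate validation details into static 301 rules: when
// ACM probes the "from" path on our edge, answer with a 301 pointing at the
// hosted validation token. Field names and paths are illustrative.
function validationRedirects(details) {
  return details.ValidationTokenDetails.map((t) => ({
    from: new URL('http://' + t.RedirectFrom).pathname, // path ACM will request
    status: 301,
    location: 'https://' + t.RedirectTo,                // where we send it
  }));
}

const rules = validationRedirects({
  ValidationTokenDetails: [{
    Domain: 'www.acme.example',
    RedirectFrom: 'www.acme.example/.well-known/pki-validation/token-abc.txt',
    RedirectTo: 'validation.example.aws/tokens/token-abc.txt',
  }],
});
console.log(rules);
```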
So that lets us prove to ACM that we do actually own this domain, and we get a certificate. With that, we start polling the same GetManagedCertificateDetails call until the status moves from pending validation to issued. At this point I have all the components I need to start serving the site: the distribution itself, a valid tenant, and a valid certificate. The only thing missing is linking them together. That's one more call into the API that says, please update that tenant with this certificate. The key part here is the customization saying, for this tenant, use this certificate. This lets us override the defaults, like we saw in the demos earlier. And now we're up, we're live.
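The poll-then-link loop can be sketched like this. The two callbacks stand in for the real GetManagedCertificateDetails and UpdateDistributionTenant calls so the control flow is testable on its own; the status strings and retry defaults are illustrative assumptions based on the talk's "pending validation to issued" description.

```javascript
// Poll certificate status until ISSUED, then link it to the tenant.
// fetchStatus() and linkCertificate() are injected stand-ins for the real
// AWS calls; retry count and delay are arbitrary illustrative defaults.
async function awaitCertificateAndLink(fetchStatus, linkCertificate, opts) {
  const { retries = 30, delayMs = 2000 } = opts || {};
  for (let i = 0; i < retries; i++) {
    const details = await fetchStatus();
    if (details.Status === 'ISSUED') {
      // final call: update the tenant's customization to use this certificate
      return linkCertificate(details.CertificateArn);
    }
    // still pending validation: wait and try again
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('certificate did not issue within the retry budget');
}
```

In production you would also cap total wall-clock time and surface a clear error to the onboarding flow if validation stalls.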
We can take traffic here. That's awesome. But we're a SaaS provider: we have to prove it works, not just tell customers it will. So I reach for my good old friend curl and make a request. The request runs through our stack, we give back a 200 and a Let's Encrypt certificate, and the site is live and working. What I'm going to do next, though, is force a request through SaaS Manager to verify that path too. The key here is the resolve option.
It overrides DNS for this one request, and we can see that with it, I did hit SaaS Manager: I get back the ACM certificate we talked about, plus some CloudFront-specific headers that I hope one day I'll be able to suppress. So all of that's up and working. We've got the distribution, we can serve traffic, and all of this is valid. Now I can work with the customer to figure out how to move their domain. This can be a hand-in-glove engagement, it can be documentation, it can work with the connection groups; there are a bunch of options depending on how they need to set up DNS.
If I control their DNS, which we can, I just move them. If not, I give them recommendations for how to do it. At the end of this, the automation lets us bring up new domains in under 10 seconds; usually much faster than that, actually, but that was on my slow side. It scales to thousands of tenants by default. There is a soft limit around 10,000 tenants, but you can go beyond that by working with your AWS team to get your quotas raised.
Advanced Capabilities: CloudFront Functions and Tenant-Specific Security
Turning the page, I want to look at what's coming next: the features we could build and the capabilities we now have. Let's set the stage. This is where we're at: a viewer hits CloudFront, passes through their network, hits my edge, and then we handle the request. With this new system we unlocked a few capabilities. In particular, we can now start working with CloudFront Functions, which gives us a lot of power before we hit cache.
A little aside on CloudFront Functions: they run in the CloudFront POP, essentially as close as you can get to the viewer. Start times are usually sub-millisecond, allowing you to scale to tremendously high volume. This comes with a couple of trade-offs. They run a limited JavaScript runtime, meaning library support can get really challenging, though that's ultimately surmountable. They let you touch almost all parts of the request: you can manipulate cookies, headers, URL paths, and so on; you just can't read or rewrite the body. And they're configured per distribution, which can be really challenging if you want multiple functions or per-tenant behavior.
My recommendation is to write your function code as generically as possible and use tenant configuration to customize the behavior. Here's an example: the viewer comes through, hits a CloudFront function, and passes back through. Something I'd like to offer is authenticated URLs on each site. For me it might be /private with a bunch of information under there; for you it might be /me. That's configuration the customer gives us, which we pass through the tenant and can then use. And the secrets have to be per tenant.
This is to control blast radius and leaks: if there's an issue and some secret gets out, we don't want it to compromise everyone. We want to put authorization as close to the user as we can. In my experience, the moment you put something behind an authentication layer, it becomes a ripe target; somebody is going to try to get past it. In the resource war, we want to spend less, so the earlier in the network we can reject, the better.
The way we manage it is a pretty straightforward little JavaScript function. We take information out of the request, such as its auth header, take information out of the tenant, and use the two to make the decision. If you're valid, the request just continues on; if not, we bounce you at the edge.
The unique part with SaaS Manager is that it lets you query the tenant parameters directly. During setup, when we configure parameters, you can put data in and then query it back. That lets your CloudFront function stay very generic while leveraging per-tenant configuration, so you maintain a single function, or a small set of them.
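A minimal sketch of that generic function, in CloudFront Functions style (a plain `handler(event)` returning either the request or a response). How a function reads tenant parameters is the SaaS Manager capability described above; here that lookup is stubbed as a hypothetical `getTenantConfig()` so the decision logic stands alone, and the prefix-plus-secret scheme is an illustrative example, not Netlify's actual check.

```javascript
// Hypothetical stand-in for the per-tenant parameter lookup; in a real
// function this data would come from the tenant's configured parameters.
function getTenantConfig(request) {
  return { protectedPrefix: '/private', secret: 'tenant-secret' };
}

// Generic auth check: the same code for every tenant, behavior driven by config.
function handler(event) {
  var request = event.request;
  var tenant = getTenantConfig(request);

  // public paths pass straight through to the origin
  if (request.uri.indexOf(tenant.protectedPrefix) !== 0) {
    return request;
  }

  // protected path: require the tenant's secret in the Authorization header
  var auth = request.headers.authorization;
  if (auth && auth.value === 'Bearer ' + tenant.secret) {
    return request; // valid: continue on
  }

  // invalid: bounce at the edge, before any origin or cache work happens
  return { statusCode: 401, statusDescription: 'Unauthorized' };
}
```

Because the protected prefix and secret live in configuration, the same deployed function serves every tenant; only the data differs.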
Another use case is leveraging the network in front of us. In my experience, attacks come in two big flavors. There are tons of variations, but either somebody hits a single domain, often a single path on that domain, really hammering that endpoint to overwhelm it; or you get fuzzing attacks, where people walk across different domains and paths, exploring what's available and probing your resource utilization.
In both cases, we can leverage AWS WAF capabilities. I can associate tenant-specific web ACLs and then bounce the attacker from that particular tenant. This lets me contain the blast radius of any IP bans. IPs are often shared for a variety of networking reasons, and we don't want to block one for the entire network because that would impact a lot of sites. Instead, I block it for that one site at that one time.
But then there are the fuzzing attacks, with people walking across the network trying to figure things out. When we can identify those, because we have a lot of data about paths and behaviors, we can update a tenant-level or distribution-level web ACL and remove them from the network completely. Together, these give us very different ways to respond to different threats. And because we can associate different web ACLs and firewall capabilities with different tenants, we can offer enterprise-grade features, integrating more of the firewall per tenant and working with each customer on their requirements.
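As a sketch, attaching a tenant-specific web ACL is just another tenant update. The `Customizations.WebAcl` shape below is our assumption of how such an override might be expressed, so verify it against the UpdateDistributionTenant reference before relying on it; the ARN is a made-up example.

```javascript
// Build a tenant update that overrides the distribution-level web ACL with a
// tenant-specific one. Field names are illustrative assumptions.
function buildTenantWafOverride(tenantId, webAclArn) {
  return {
    Id: tenantId,
    Customizations: {
      WebAcl: { Action: 'override', Arn: webAclArn }, // scoped to this tenant only
    },
  };
}

// Hammering attack on one tenant: ban the IP in that tenant's ACL only,
// leaving the rest of the network unaffected.
const update = buildTenantWafOverride(
  'tenant-acme',
  'arn:aws:wafv2:us-east-1:111122223333:global/webacl/acme-acl/abc123'
);
console.log(update);
```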
Call to Action: Building Your Multi-Tenant Edge Architecture
With that, thank you very much. Let's recap what we covered today. We spoke about why multi-tenancy at the edge is fundamentally harder: certificate handling, routing isolation, security boundaries, and per-tenant policy enforcement were traditionally not primitives that CloudFront enforced. Each tenant effectively required its own distribution, making scaling operationally heavy. That's where CloudFront SaaS Manager changes the model.
By combining a shared distribution with modular tenant configuration, you can scale up to thousands of tenants on a single edge control plane while providing isolation for each tenant. This brings us to our call to action. If you're managing multiple domains, custom tenant logic, or tiered user experiences, this is your signal to re-evaluate your multi-tenant patterns at the edge. There's really no reason to stick with standard CloudFront distributions anymore: everything you can do with a standard distribution, you can do at greater scale with multi-tenant distributions.
As you design your multi-tenant blueprint, think in terms of the core constructs that CloudFront SaaS Manager gives you: your shared distribution, your tenant parameters, your origin routing constructs, your configuration overlays. These building blocks let you separate what must be isolated from what can be shared. So you can support thousands of domains, customer tiers, unique behaviors, all from a unified edge architecture.
Use HTTP validation in CloudFront SaaS Manager to fully automate SSL provisioning for your tenants; Ryan showed us how Netlify does this today. Start small: pick a small subset of tenants with the most complex requirements, then measure the delta in operational burden, onboarding time, config sprawl, and error rate. If you have multiple tenants doing similar things today, you can reduce operational pain right out of the gate. This is how you build the internal business case for adopting CloudFront SaaS Manager in your organization.
This brings us to the end of our talk. We have four more CloudFront sessions lined up for you. There's a workshop on CloudFront SaaS Manager in a few hours today in the same hotel. Thank you very much for joining us today. We will be by the stage for any Q&A. If you found the session to be valuable, please take the survey. Thank you once again.
; This article is entirely auto-generated using Amazon Bedrock.