AWS re:Invent 2025 - Streaming at Scale: Advanced Media Delivery with Amazon CloudFront (NET307)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Streaming at Scale: Advanced Media Delivery with Amazon CloudFront (NET307)

In this video, AWS specialists Jamie Mullan and John Councilman, along with NESN's Jess Palmer, discuss streaming at scale using Amazon CloudFront. They cover critical topics including monitoring with CloudWatch dashboards and real-time logs via Kinesis, implementing CMCD (Common Media Client Data) for unified observability, security strategies using AWS WAF, CloudFront Functions, and tokenization with JWT and Common Access Tokens. The session explores embedded POPs for last-mile capacity, Origin Shield for reducing origin load, and seamless cross-region failover using AWS Elemental Media Services with MQAR (Media Quality Aware Resiliency). They address contribution reliability through AWS Direct Connect and MediaConnect with protocols like SRT, and monetization using MediaTailor for dynamic ad insertion. NESN shares their real-world implementation managing Red Sox and Bruins broadcasts from 4K capture through CloudFront delivery, demonstrating how a small team successfully operates direct-to-consumer streaming at scale.


; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: Streaming at Scale with Amazon CloudFront

Good morning everyone. Thanks for joining us today. For this session, we're going to be predominantly talking about streaming at scale and focusing really on Amazon CloudFront to do that delivery. My name's Jamie Mullan. I'm a specialist solutions architect here at AWS, and I'm joined by my colleague John Councilman, who is also a specialist solutions architect. We're also joined by Jess Palmer, who is from NESN, to talk about what scale really means to them. We're going to be thinking about those large scale events, those critical events that just must be delivered without any issues, and also those events with many viewers globally.

Thumbnail 50

Thumbnail 60

Thumbnail 70

Thumbnail 80

But it's easy, right? You have a contribution feed, you do some processing on it, and it just goes to a viewer. That's it. What else do you really need? Well, it's a little bit more complicated than that. We're going to look at some of those design decisions that help you influence that delivery. But scale can mean different things to different customers. For example, it could be millions of concurrent viewers, or it could be billions of minutes consumed. But no matter the scale, we frequently get asked by customers how do I monitor my distribution? How do I protect my content at the edge? How do I scale to that global audience with high uptime? And finally, can I monetize that live content?

Thumbnail 100

Thumbnail 120

You might be wondering what architectural decisions do I need to make to ensure an event is successful. I'm glad to say you're in the right session this morning. But what happens if we get our job wrong as video architects or solutions architects and video engineers? Well, viewers might get pixelation, degraded viewer experience, and arguably the most frustrating experience is that spinning buffer wheel. We're going to be focused predominantly on delivering content through Amazon CloudFront today. If you don't know, CloudFront is our content delivery network, or CDN. CloudFront is a large global network with over 750 edge locations. All the edge locations are connected over a redundant and private backbone back to AWS regions.

Thumbnail 180

CloudFront can be used to accelerate applications of all types, whether that's static assets like video files, image files, CSS, and JavaScript, which can be cached in edge locations and delivered closer to those end users. CloudFront can also be used for dynamic traffic requests. The talk is going to revolve around this steel thread architecture today, and we're going to look at different design decisions throughout the workflow. Advance warning to all the video folk in the audience, we're going to start downstream this time. So first off, we're going to look at delivery. You know, some OTT providers deliver these streams for a cost, whether that's subscription or pay per view. So you naturally care about security and protecting your content.

Thumbnail 200

Thumbnail 210

Thumbnail 220

Next we're going to look at encoder origination. For big events or big sporting events in particular, this is really critical and important. You want to design your workflow to ensure the event is delivered successfully. And then finally we're going to look at ingest. Simply put, if you have no feed, you have no event, right? We'll explore different ways to make ingest more reliable. But the first thing that we're going to kick off with is monitoring and ask that question: how do we know we're actually delivering that event successfully?

Thumbnail 250

Monitoring Live Events: Leveraging CloudWatch, Real-Time Logs, and CMCD

I'm a solutions architect with AWS. We want to look at how we can gauge whether our live event was successful. As you can see here, there's a lot of data available to us from our platform, things like exit errors, video start failures, rebuffering. A lot of this stuff either comes from the client itself, the video player, or it's coming from CloudFront. This might be presented as logs or beacons. It's coming from a lot of different directions. We have a lot of data, which is good, but it's also a challenge: how do we deal with it all? We're going to look into some features on CloudFront and some other AWS services to see how we can actually make sense of everything here.

Thumbnail 280

So first off, every event needs good data. We need a dashboard, we need real-time access to logs, and we need to be able to analyze the event after the fact. Starting with metrics, CloudFront uses CloudWatch to deliver its metrics, and I like to use CloudWatch dashboards.

I compare it to an actual dashboard like on my car. When we go camping, we actually pull a trailer, and I'm constantly looking at two gauges on my dashboard. I'm looking at the fuel level, because I'm getting like 5 MPG, and I'm also looking at the transmission temperature. Because I don't pull a camper every day, if there's an issue I know exactly what to look at: if it gets too hot, I don't need to put gas in the car, it's actually the transmission, and I need to pull over.

Thumbnail 340

The same is true for a streaming event. Most of these only last a few hours. In the case of a sports event, we need to know right away whether we have a problem with an origin, with CloudFront, or with something like DRM or advertising that's affecting our stream. Dashboards are the first line of defense so we know where to look and where to dive deeper.

Thumbnail 350

Thumbnail 380

Speaking of diving deeper, CloudFront supports real-time access logs, and as you can see here, it actually uses Amazon Kinesis to deliver this. Delivering real-time logs for millions of viewers is extremely difficult. Kinesis is designed exactly for this. So when you configure real-time logs in CloudFront, it's going to ask for a Kinesis endpoint. The nice thing about Kinesis is you can have multiple consumers pulling that data in parallel, so we can have real-time dashboards. We can have observability platforms, analytics platforms. We can also share this data in real time with partners whether they're looking at viewership in real time or if you're looking at advertising data and they need that information in real time.
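As a rough sketch of what that wiring can look like outside the console (not shown in the session; an AWS SDK for JavaScript v3 example with placeholder ARNs, a made-up config name, and an abbreviated field list):

```javascript
// Minimal sketch: create a real-time log config that delivers CloudFront logs
// to an existing Kinesis data stream. All ARNs and names are placeholders.
import {
  CloudFrontClient,
  CreateRealtimeLogConfigCommand,
} from "@aws-sdk/client-cloudfront";

const cloudfront = new CloudFrontClient({ region: "us-east-1" });

await cloudfront.send(
  new CreateRealtimeLogConfigCommand({
    Name: "live-event-logs",   // hypothetical config name
    SamplingRate: 100,         // log 100% of requests for the event
    EndPoints: [
      {
        StreamType: "Kinesis",
        KinesisStreamConfig: {
          // Role that allows CloudFront to put records on the stream
          RoleARN: "arn:aws:iam::111122223333:role/cf-realtime-logs",
          StreamARN:
            "arn:aws:kinesis:us-east-1:111122223333:stream/live-event-logs",
        },
      },
    ],
    // A subset of the available fields; pick the ones your dashboards need
    Fields: ["timestamp", "c-ip", "sc-status", "cs-uri-stem", "x-edge-result-type"],
  })
);
```

The real-time log config is then attached to one or more cache behaviors on the distribution, and every consumer of the Kinesis stream sees the same records.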

Thumbnail 410

Thumbnail 420

With this real-time data, we can see issues before they become a big problem and react to it. We can adjust our platform in real time as needed to handle these largest events. Then lastly, access logs are stored in batches on S3 as you can see here. The delay is usually in the minutes. The nice thing about S3 is we can tie Athena into it so we can query all these logs using standard SQL. We can set life cycles on the S3 bucket so we can have events from weeks or even years ago stored in archival. So we can compare this previous event to one that we did maybe a week or a month ago and see how we've improved and find out lessons learned.
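As an illustration of that last point, a post-event query could look roughly like the following. This assumes an Athena table (here called cloudfront_logs) has already been created over the S3 log bucket; the database, table, and result bucket names are placeholders.

```javascript
// Illustrative sketch: run an Athena query over archived CloudFront access logs.
import {
  AthenaClient,
  StartQueryExecutionCommand,
} from "@aws-sdk/client-athena";

const athena = new AthenaClient({ region: "us-east-1" });

// Count requests by HTTP status for one event day
const query = `
  SELECT status, count(*) AS requests
  FROM cloudfront_logs
  WHERE "date" = DATE '2025-12-01'
  GROUP BY status
  ORDER BY requests DESC
`;

await athena.send(
  new StartQueryExecutionCommand({
    QueryString: query,
    QueryExecutionContext: { Database: "media_logs" },          // assumed database
    ResultConfiguration: {
      OutputLocation: "s3://example-athena-results/cloudfront/", // assumed bucket
    },
  })
);
```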

Thumbnail 440

Thumbnail 450

Thumbnail 460

We can do some troubleshooting after the fact and we can pull reporting easily using Athena. Then lastly, console reports, this is in the CloudFront console and it's useful for figuring out trends, viewership, cache hit ratio between events and monthly reports, so that's always available as well.

I spoke a lot about CloudFront and its logs, but typically CloudFront only has access to things that are in the HTTP header or geo information, such as user agents, the client's IP address, maybe the geo region, the ASN, cache status. This has always been logged, whether it's in real-time logs or in the S3 logs. But I mentioned earlier about the client. The client has a lot of really valuable information, such as the throughput. The client being the player, like DASH.JS or HLS.JS, is making decisions about which video bitrate to pull based on what it knows your home internet bandwidth is. Having that kind of information along with session ID and content ID is extremely valuable for figuring out what's going on.

But in the past, this information was only available using SDKs from third parties and it was sending it using beacons to some other platform. So you had CloudFront or CDN logs in one bucket, then you had all the player information in another bucket and really no way to tie the two together. Trust me, it's extremely difficult to do that. It can be done but it's not easy. But what if there were a way to combine all this client information, such as the bitrate being used, the session ID, which rendition, whether it's HD or standard definition content, and combine that with all the information that CloudFront knows about, such as HTTP headers, user agent, IP address?

That's exactly what CMCD, Common Media Client Data, attempts to solve. It's actually a standard now that was created by CTA Wave, which has created a lot of other standards that Jamie is going to talk about later. Pretty much all CDNs adhere to it and the players know about it. So using CMCD, all this information is kept in the same log line. All the content IDs and session IDs are transmitted as query strings or headers to the CDN. Since it's standardized, we know what to do with it already. A lot of partners that you might be using for a monitoring platform already know how to deal with CMCD.

Thumbnail 600

CloudFront has supported CMCD for quite a while. If you look here on the left, this is a real-time log configuration. As you can see here, we have additional fields that are standard CMCD fields. They're all prefixed with CMCD.

Thumbnail 650

You can choose one, some of them, or all of them depending on what your client is sending. All this information will be added to the real-time logs that we're sending through to Kinesis. Many partners already support this on their logging platforms, and we have a reference sample that knows how to deal with this additional information in the log lines. It has a real-time dashboard that refreshes in real time based on all this data to show exactly what's happening with the client and with the CDN.

You can scan the QR code here or search for video observability with CMCD and CloudFront to try it out on your own distribution. Being a reference sample, it's free to use and it's open source, so you can edit it. CMCD is easy. You saw it on the real-time logs in CloudFront. It's just adding those additional fields and having a consumer dashboard that knows how to access it.

Here are some examples of players you might be using today. These are two popular open source players: HLS.JS and DASH.JS. Configuring CMCD is just a matter of adding a few more key-value pairs when you instantiate your player. In the case of HLS.JS, which I'm very familiar with, it's just a CMCD structure. Notice specifically on the left that we have a session ID now, which is a standard CMCD field and is extremely useful. If you're using CMCD and not using session ID today, I highly suggest you do it.
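A minimal sketch of that player-side configuration with hls.js; the session and content IDs here are placeholders that would normally come from your backend or CMS:

```javascript
// Enabling CMCD when instantiating the hls.js player.
import Hls from "hls.js";

const hls = new Hls({
  cmcd: {
    sessionId: "3f9a1c2e-4b7d-4e0a-9c8e-12d3a4b5c6d7", // unique per playback session
    contentId: "redsox-vs-yankees-2025-07-04",          // identifies the asset
    useHeaders: false, // send CMCD as query parameters rather than request headers
  },
});

hls.loadSource("https://dxxxxxxxxxxxx.cloudfront.net/hls/index.m3u8"); // placeholder URL
hls.attachMedia(document.querySelector("video"));
```

Query parameters are often preferred over headers here, since headers can trigger CORS preflight requests from browsers.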

The session ID is usually pulled from your content management system for that particular user when they start a session. When you include this session ID in a field, you can find all the logs tied to a specific user so you can see individual user journeys from starting to play back your stream to the end. You can see how many times they switched bit rates, rebuffered, and stalled. This allows you to figure out individual user journeys and then use the same information to find other users' journeys based on session ID. This wasn't possible before since the client information and the CDN information were stored in separate locations, so it makes it super easy.

Thumbnail 750

Security Requirements: Protecting Content with Geo Restrictions, AWS WAF, and CloudFront Functions

Now let's talk about security requirements. We've added visibility to our CloudFront distribution. We want to make sure all of our users, the valid users, are receiving the best possible experience. But we don't want invalid users who are trying to steal our streams, take down our content, or harm our service to receive our content and damage the experience that other users are experiencing.

Thumbnail 780

Thumbnail 800

What do these users look like? Ideally, in a perfect world, you'd have your origin server behind CloudFront. CloudFront would deliver a standard URL, in this case HLS index.m3u8, and all of your paid or subscribed users would pull the content. That's not the case as we all know. We see users trying to re-stream the content. They're a paid valid user coming from maybe a particular IP address pulling our content and then re-streaming it to their users who are probably paying them. Obviously, we don't want that.

Thumbnail 810

Thumbnail 820

Then there are invalid regions. Not only have we not really designed our infrastructure to operate in a different country for scalability purposes, but when we have viewers from invalid regions, they're messing up our stats. They might be having a bad experience and it's going to mess up our analysis of whether our event was successful or not. If we see viewers coming from outside of the country with high latency numbers, that's going to affect all of our reporting.

There might even be legal consequences of delivering content outside of your country where your contract states that you can only deliver in a certain region. You only have rights to deliver in maybe a specific state in the case of sporting events or maybe a specific country. I know in the past, dealing with some customers when the Olympics are involved, the contract is extremely specific when it comes to regions and the security required to deliver that content. So rights holders are very important as well.

Thumbnail 880

Thumbnail 900

Then lastly, there are unpaid users who have found their way around any CloudFront restrictions and are able to go directly to the origin. Obviously, we don't want that. But also, if you're able to go to the origin, it's going to affect the experience of everybody because the origin should be protected behind CloudFront where CloudFront serves 99 percent or more of the traffic from its cache. The origin server should never be delivering content directly.

So what tools do we have available to us? First of all, my son is seven, and he's actually been home from school recently.

He tends to find ways to stream YouTube when he's not supposed to, so I tried using firewall rules in my home internet. Eventually, I ended up locking the remotes in the office and securing the networks. Those are the tools I have available, but obviously with CloudFront, we don't have physical locking keys. However, we do have ways to restrict traffic.

Thumbnail 930

First, for anyone who's been in the CloudFront console, you've probably noticed the geo restrictions built in. This is built into the distribution in the console or through the API. This is a way to quickly allow or disallow specific regions. If I'm delivering content that's only allowed to be delivered in the United States, for instance, I can go into CloudFront and quickly add geo restrictions to only allow the US, or I could block specific countries within there.
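The same restriction can be applied programmatically. A hedged sketch with the AWS SDK for JavaScript, using a placeholder distribution ID; the pattern is read the config, modify it, and write it back with the returned ETag:

```javascript
// Sketch: allow delivery only from the United States on an existing distribution.
import {
  CloudFrontClient,
  GetDistributionConfigCommand,
  UpdateDistributionCommand,
} from "@aws-sdk/client-cloudfront";

const cloudfront = new CloudFrontClient({ region: "us-east-1" });
const distributionId = "EDFDVBD6EXAMPLE"; // placeholder distribution ID

const { DistributionConfig, ETag } = await cloudfront.send(
  new GetDistributionConfigCommand({ Id: distributionId })
);

DistributionConfig.Restrictions = {
  GeoRestriction: {
    RestrictionType: "whitelist", // or "blacklist" to block specific countries
    Quantity: 1,
    Items: ["US"],
  },
};

await cloudfront.send(
  new UpdateDistributionCommand({
    Id: distributionId,
    DistributionConfig,
    IfMatch: ETag, // the ETag returned by GetDistributionConfig
  })
);
```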

Thumbnail 950

Thumbnail 970

Thumbnail 1000

The next layer we can add on top of that is the next tool in our toolkit: AWS Web Application Firewall, or WAF. With WAF, we can inspect anything coming in, whether it's headers or IP addresses, against a list. We also have lists that you can subscribe to, so we have managed lists. We are able to allow specific regions with geo restrictions, but what about users who are using VPNs or proxies to access our stream? Using WAF, we can actually subscribe to lists of known VPN endpoints and known proxy endpoints. In the case of the re-streamer, we found that their IP address, for example, was 1.2.3.4. During the event, we can actually add that IP address to our own managed list to block that user who is re-streaming. WAF is really useful for before, during, and after events. If you notice something specific, you can add rules after the fact for your next event, or in the case of the re-streamer, you can actually modify it during the event.
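As an illustration of that mid-event workflow, updating your own IP set might look roughly like this; the IP set name and ID are placeholders, and the set is assumed to be referenced by a block rule in the web ACL attached to the distribution:

```javascript
// Illustrative sketch: add a re-streamer's IP to a WAF IP set during the event.
import {
  WAFV2Client,
  GetIPSetCommand,
  UpdateIPSetCommand,
} from "@aws-sdk/client-wafv2";

// CLOUDFRONT-scoped WAF resources are managed from us-east-1
const wafv2 = new WAFV2Client({ region: "us-east-1" });

const ipSetRef = {
  Name: "blocked-restreamers",                // placeholder IP set name
  Scope: "CLOUDFRONT",
  Id: "a1b2c3d4-5678-90ab-cdef-EXAMPLE11111", // placeholder IP set ID
};

// Read the current addresses and lock token, then write back with the new IP
const { IPSet, LockToken } = await wafv2.send(new GetIPSetCommand(ipSetRef));

await wafv2.send(
  new UpdateIPSetCommand({
    ...ipSetRef,
    Addresses: [...IPSet.Addresses, "1.2.3.4/32"], // the IP spotted re-streaming
    LockToken,
  })
);
```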

Thumbnail 1030

And then lastly, for anything above and beyond geo restrictions and WAF, we have CloudFront Functions. CloudFront Functions run JavaScript directly on the edge server. They're extremely low latency and high performance. You're able to analyze the requests and responses coming from CloudFront or to and from the origin. We've added key value stores as well, so you're able to add data to the key value store using the API and query that from the edge function. We're going to go into detail about what this can be used for when it comes to security in a moment.

Thumbnail 1060

So how do we secure our origin? As we all know, CloudFront is origin agnostic, so the origin could be anywhere really. First, if our origin is on-premises, it's most likely on a public subnet. We obviously don't want users to access that directly because it affects the experience for everybody. The first thing we can do is leverage HTTP headers at the origin level when you configure your origin in CloudFront. You can send a secret header to the origin, which the origin can validate to confirm that this is actually coming from CloudFront. We also publish a list of known IP addresses of CloudFront edge servers that you can add to your firewall on-premises or on the origin itself to allow only CloudFront to access it.

Second, if you're using something like an Application Load Balancer or Network Load Balancer, maybe with EC2 instances behind those for your origin, CloudFront now can leverage VPC origins. I don't know if everybody's aware of that. Recently, we've even introduced cross-account VPC origins. If you're using multiple accounts, CloudFront can access these origins on a VPC from other accounts, whether they're your accounts or even partner accounts, which is huge because if your origin is not even on a public subnet, it can't be accessed in the first place.

Thumbnail 1150

And then lastly, for some of our managed services delivering media, a lot of you use S3 for live and on-demand media delivery as well as AWS Elemental MediaPackage. These can use Origin Access Control, which ensures that only CloudFront can access their content by signing the requests it sends to the origin.

Thumbnail 1170

I mentioned CloudFront Functions to expand upon security above and beyond WAF and geo restrictions. We see a lot of customers use tokenization within functions to further secure content. A token typically uses JWT, but there are quite a few formats out there. Using functions, we're flexible. You can basically implement any sort of mechanism that you'd like, whether it's JWT or some custom token that you use across other CDNs. Inside the token, it basically describes who you are and what you're able to watch, like a content ID. It would have a time limit. In the case of JWT, it's very easy to set a lifetime. This token is only valid for one hour, two hours, a full day. It also provides instructions on where and what you can watch.

Thumbnail 1240

Ideally, a user with a valid token sends all their header information and IP address information to CloudFront. We're able to look up their geo information using geo headers that are configured in CloudFront, and the token contains all the information about what they're able to watch from a specific location, perhaps a US state. They're tied to a particular IP address or user agent, and the function validates all of this against the token and what CloudFront knows about the user. If everything matches up, the token wasn't modified in transit from the user, and the token hasn't expired, then we return a 200 OK from the function.

Thumbnail 1260

Thumbnail 1270

Then we have users with invalid tokens. Perhaps they tried to modify the token and create their own token, and it's invalid because it wasn't signed correctly. Or they're coming from an invalid location. Maybe they traveled with a valid token that we signed for their home use, but they're now on their cellular network and didn't receive a new token. We return or manufacture a 401 unauthorized response from the function itself and immediately deny that user before they even hit CloudFront or the origin server.

Thumbnail 1280

Thumbnail 1300

Lastly, we have a user with a shared token. All these examples are using JWT, which is really common and easily implemented in functions. This user received a valid token from the first user and is trying to reuse it. They might be living in the same neighborhood and using the same device, so it happens to be valid for some reason. We notice that this token is being used multiple times and is not valid. We can actually add this token value into the key value store.

Thumbnail 1360

This is a good use case for key value stores and functions: token revocation. You can insert this token into a key value store when you notice something wrong with it that's being used too many times, and then you can have the function return a 403 forbidden if that token's in that key value store. This is a common use case that we see for tokenization with JWT tokens. However, we recently released CBOR token support right before re:Invent, which is needed by Common Access Tokens, which was created by CTA Wave. Instead of having to figure out how to write a full token authentication workflow in functions, you can tie directly into CBOR, which is supported as a module in functions. The code is extremely minimal—just a few lines—so that's natively supported now. Common Access Tokens seem to be where the industry is headed because it has been standardized now.
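A minimal sketch of the revocation-check part of such a function (CloudFront Functions JavaScript runtime 2.0), assuming a key value store is associated with the function and the token arrives as a query string parameter; signature and expiry validation are omitted here:

```javascript
import cf from 'cloudfront';

const kvs = cf.kvs(); // the key value store associated with this function

async function handler(event) {
  const request = event.request;
  const tokenParam = request.querystring.token;

  // No token at all: deny before the request ever reaches the cache or origin
  if (!tokenParam || !tokenParam.value) {
    return { statusCode: 401, statusDescription: 'Unauthorized' };
  }

  // Revocation check: if this token has been inserted into the key value store
  // (by an operator or an automated workflow), refuse it. In practice you would
  // likely key on a short token ID or hash rather than the full token string.
  try {
    await kvs.get(tokenParam.value);
    // Key found, so the token has been revoked
    return { statusCode: 403, statusDescription: 'Forbidden' };
  } catch (err) {
    // Key not found: the token is not on the revocation list. A real function
    // would also verify the signature and expiry before letting it through.
  }

  return request;
}
```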

We do have guidance for secure media delivery at the edge on AWS. This takes all the work we've done on functions and does it for you. You can simply deploy this on your CloudFront distribution, and we deploy all the functions supporting JWT token support. We have a step function available that's also deployed that can automatically revoke tokens that are being used incorrectly, either multiple times or from invalid users. We've even added a full playback application sample with a video player so you can try it out. You can just deploy this using CloudFormation on your own distribution, and being open source, you can modify it at will. We even included SDKs for your content management system to create tokens and sign tokens when you create playback URLs. We're actually enabling support for Common Access Tokens in this really soon, so keep an eye on it for the updates.

Thumbnail 1450

Last Mile Capacity: Overcoming Network Constraints with Embedded POPs

I'm going to hand this back to Jamie for a talk on last mile capacity. Large scale events can generate significant internet traffic spikes for a really short period of time, straining the network links that connect viewers to content. However, there might actually only be a certain amount of capacity between an internet exchange itself and an ISP or a smaller city. ISPs continuously optimize their networks, but scaling capacity for those last minute big events is understandably challenging, especially those events where you only need that capacity for that short period of time.

Thumbnail 1490

Thumbnail 1500

When a network becomes saturated or congested, both packet loss and increased latencies can occur, and the result on the viewer experience is things like quality drops, rebuffering, or even playback failures. You can lean on that monitoring approach that was spoken about earlier to try and help identify some of these issues. But unless you own the network or the underlying network as an ISP provider, this is something that is really largely out of your control. Last year, however, we introduced embedded POPs. They overcome this challenge by getting you closer to those end viewers, caching content directly inside an ISP network and removing some of the dependency on the network links between the viewers and the internet exchange.

Thumbnail 1530

We've embedded POPs across over 1,140 locations spanning more than 100 ISPs in over 20 countries. By reducing the distance to end viewers, customers who have already adopted embedded POPs have seen performance improvements. Because we're able to remove some of the network dependency, they're really great for highly cacheable content. We're very focused on the media and entertainment space today, but you can also use them for content like game delivery and software downloads.

As part of the onboarding of embedded POPs, we validate the workflow that you want to use them for. If it is a good fit, the distribution is enabled by us in the backend. However, there is some additional work required on your side to make sure your client application can route to embedded POPs. The way that we recommend doing this is integrating the client routing SDK into your URL vending service.

Thumbnail 1580

Thumbnail 1600

So how does this work? First, you need to integrate that client routing SDK into your URL vending service. The QR code on the bottom right is the link to that SDK in GitHub if you're interested in looking at it. When the client application requests a URL from that vending service, the client IP is encoded into the URL and it's annotated as a CR label in this diagram.

Thumbnail 1610

Thumbnail 1630

The client then performs a DNS lookup, and Route 53 and CloudFront are able to decode that client IP from that CR label. CloudFront then returns the best POP based on that client IP, whether that's an embedded POP if appropriate, or a traditional POP. The client can then start fetching content from CloudFront. In summary, embedded POPs deliver cached content directly from ISP networks, improving performance and overcoming those potential last-mile capacity constraints that you might have.

Thumbnail 1650

Origin Resilience and Failover: Designing for High Availability with AWS Elemental Media Services

If you want to make use of embedded POPs, I really urge you to contact your account team to help you validate and evaluate your workload to see if it's suitable. So typically, the next bottleneck that we see is the origin. Our aim is to offload as much traffic as possible from the origin so that it has enough capacity to serve requests at scale. This is to prevent our origin from becoming overwhelmed and failing entirely.

Thumbnail 1670

Thumbnail 1690

Caching helps with this, of course, and CloudFront uses tiered caching. We have edge locations and regional edge caches, but you can also use something called Origin Shield to further reduce that origin load. Instead of regional edge caches hitting the origin directly, they go through Origin Shield, and this provides a better cache hit ratio, reducing demand on your origin. This is especially helpful for those events where we've got a lot of scale.
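For reference, Origin Shield is just a per-origin setting in the distribution configuration. A sketch of an origin definition with it enabled, with placeholder values and fields abbreviated:

```javascript
// Sketch of an origin entry in a DistributionConfig with Origin Shield enabled.
const origin = {
  Id: "mediapackage-origin",
  DomainName: "abc123.egress.mediapackage-v2.us-east-1.amazonaws.com", // placeholder
  CustomOriginConfig: {
    HTTPPort: 80,
    HTTPSPort: 443,
    OriginProtocolPolicy: "https-only",
  },
  OriginShield: {
    Enabled: true,
    // Pick the Origin Shield region closest to the origin itself,
    // for example the region where MediaPackage is running.
    OriginShieldRegion: "us-east-1",
  },
};
```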

Thumbnail 1700

We've covered capacity and reducing load to ensure viewers can get a great experience. Now it's time to think about how you design the rest of the platform to provide just that. To quote Werner Vogels, everything fails all of the time. Lots of customers approach us by thinking about how many nines they want to design for. But when you're thinking about a media world, this can be very different if you're designing a 24 by 7 channel, for example, versus event-based channels.

Thumbnail 1760

The other important part is balancing the cost and how much resilience is really enough for you. If your platform fails entirely, users will be unable to consume your content. This might impact you in other ways, such as increased customer contact or even financial impact. For live events especially, there's no real second chance or try again later. That moment is simply gone forever.

Like we've discussed, things can go wrong. Redundancy is really critical to keep your stream up and running so that you can deliver to your end users. In this example, we have two EC2 instances across availability zones to help combat this. We do see customers deploy redundant encoders and origins in AWS, but we do offer managed services for media workflows. They offload the challenges of managing your own customized infrastructure and give you capabilities and features that we'll talk about in a bit. These features help you provide the best possible viewer experience.

Thumbnail 1820

Thumbnail 1830

Thumbnail 1840

Thumbnail 1850

If you're not familiar with AWS Elemental Media Services, this is a typical workflow for an OTT stream. We have AWS Elemental MediaConnect, which is our secure and reliable transport service for live video. That's then fed into AWS Elemental MediaLive, which is our cloud-based broadcast-grade video processing service. Then we have AWS Elemental MediaPackage, and that is our just-in-time packaging and origination service where you can implement DVR-like functionality and things like DRM. And finally, we've got Amazon CloudFront, which is our CDN, which we're predominantly focusing on today. Viewers, depending on what they want to stream and what manifests they need, can pull different origin endpoints from MediaPackage.

Thumbnail 1880

Thumbnail 1890

With this, we've moved from EC2 and gone to AWS Elemental Media Services, so we no longer have to worry about those underlying instances and get all the benefits of AWS Elemental Media Services. For example, MediaLive supports cross-AZ synchronization and redundancy for failover scenarios. Customers running large events tend to implement more redundancy than just one region, though. We're going to look at what that multi-regional architecture might look like. To help speak through this, I'm going to look at a typical failover scenario that you might see, but looking at where manual intervention might be required.

Thumbnail 1920

We have the same architecture as what we had, but obviously we've got Region 1 and Region 2 with MediaConnect, MediaLive, and MediaPackage. We also have the addition of AWS Elemental Live, which is our on-premises encoding appliance, which in our case is doing that venue contribution encoding. Let's say something goes wrong with one of the contribution feeds going to Region 1, where the viewers are currently streaming from. Viewers connected to that might start seeing black frames instead of the actual video content, but hopefully you've got an eyes-on-glass operations team, and they spot this issue early on and decide that they want to do that failover to Region 2 manually. Remember there's a time delay between the issue occurring, somebody spotting it, and then proactively rectifying that situation.

Thumbnail 1970

The failover occurs, our viewers go to the URL vending service and say, I need a new URL, and it returns one for Region 2. However, there are a few challenges here to think about. Number one, predominantly, is that it requires somebody manually to do this, and there's also a time delay in detecting it and then remediating it. But not only that, when a viewer rejoins, are they going to have to go find where they were viewing in a stream, for example, and then the timing might be misaligned between the two regions anyway. So how do you keep things in sync? How do you do these things automatically?

Thumbnail 2000

Thumbnail 2020

We're going to explore this further, but we're going to start with timecode. Timecode is a time reference that's embedded in each video frame. Timecode can be really important in the media distribution world. We want to use it for synchronization in downstream components, such as the encoder and packager, to provide that frame-accurate alignment. Timecode is part of the answer, but how is it actually applied to a failover scenario? We have a feature that can help manage this, and it's called seamless cross-region failover.

Thumbnail 2040

We're going to start with the AWS Elemental Live encoder on-premises, and that's going to use the same timecode source, so both regions receive an aligned contribution feed. MediaLive is then configured with epoch locking and uses the embedded timecode as its timing source, the input clock. MediaLive then uses CMAF ingest on its output into MediaPackage, and when you combine CMAF ingest with epoch locking, you get a regular segmentation cadence based on the epoch time. MediaPackage can then use stateless synchronization to help predictably package the output content.

Thumbnail 2060

Thumbnail 2070

Thumbnail 2090

Thumbnail 2100

So what happens if something does go wrong, such as a slate input, an incomplete manifest, or a missing DRM key? Well, MediaPackage actually has the ability to 404 its endpoint. And CloudFront is configured with something called origin groups, where you specify a primary and a secondary, and also failover criteria based on HTTP status codes. When it picks up that MediaPackage has indeed 404'd one of its origin endpoints, it'll automatically send that traffic to the secondary. But what if it's not such a fundamental problem, for example, what if it's black frames or frozen frames in the source, and it's only impacting one of the contributions going into one of the regions? Well, building on the previous architecture, we have a capability called Media Quality Aware Resiliency, or MQAR, to solve this. MediaLive actually has the ability to also produce something called an MQCS score.

Thumbnail 2110

Thumbnail 2130

Thumbnail 2140

Thumbnail 2150

MQCS stands for Media Quality Confidence Score. This score is based on input parameters including black frame detection, frozen frame detection, and input source loss, and it ranges from 0 to 100. Each score is independently based on its own input. MediaPackage has the ability to use this in two locations. It can use it on the input to provide in-region failover. It also uses it on its output via CMSD, or Common Media Server Data. CMSD is also part of the CTA specification. Finally, you continue to use origin groups on CloudFront, but instead of using the default routing, we use the media quality score as the origin selection criteria. The way it works is that a GET request is sent to both regions, and it picks the segment that will ultimately provide the better viewing experience to viewers or clients. In summary, you can leverage AWS media services and Amazon CloudFront with MQAR, or Media Quality Aware Resiliency, to automate your monitoring and minimize the duration of disruption events.
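Putting those two pieces together, an origin group that fails over on status codes and uses the media quality score for origin selection might be shaped roughly like this; the origin IDs are placeholders, and the selection criteria field should be confirmed against the current CloudFront API reference:

```javascript
// Sketch of an origin group entry in a DistributionConfig with MQAR-style routing.
const originGroup = {
  Id: "mediapackage-region-pair",
  FailoverCriteria: {
    StatusCodes: {
      Quantity: 3,
      Items: [404, 502, 503],
    },
  },
  Members: {
    Quantity: 2,
    Items: [
      { OriginId: "mediapackage-us-east-1" }, // primary
      { OriginId: "mediapackage-us-west-2" }, // secondary
    ],
  },
  // "media-quality-based" routes to whichever origin reports the better MQCS
  // score; the default value only leaves the primary when it fails.
  SelectionCriteria: "media-quality-based",
};
```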

Thumbnail 2210

Thumbnail 2220

Thumbnail 2230

Reliable Contribution Streams: Securing Ingest with AWS Direct Connect and MediaConnect

All these measures we have taken to analyze our stream using logging and to ensure our origin and encoder are available are useless if our contribution stream into the cloud is not stable. We need to make sure our contribution stream is just as reliable as the rest of the chain: the encoder, the origin, and CloudFront. So how do we make sure our source is just as reliable as our delivery? Some of the things that we require in our source are consistency, a reliable source, coverage wherever our event happens to be, and security, because usually the source coming into the cloud is our highest quality mezzanine feed. We do not want anybody to be able to access that because it is the highest quality content and we do not want it to go into the wrong hands.

Thumbnail 2250

We do have a solution for that which many customers use, and that is AWS Direct Connect. Direct Connect is used heavily for stream contribution for consistency. It is a dedicated connection into a region offering up to 100 gigabits of connectivity. Many of you are probably contributing content now through higher quality lossless formats like JPEG XS and using ST 2022-7. If you are doing things like that, Direct Connect is extremely helpful. It is also reliable. You can have multiple paths into the region using Direct Connect. Coverage is in over 100 locations worldwide in 30 or more countries, so it is probably nearby or at the facilities where you are broadcasting from, such as stadiums. Again, it is private. It is a private network and we are able to add things like media encryption on top of it, which we will discuss in the next slide.

Thumbnail 2320

This is where MediaConnect comes in. MediaConnect is one of the AWS Elemental services for the entry point of live media into the cloud, whether you are using Direct Connect or Internet contribution. You can use MediaConnect for both use cases. If you are using Direct Connect, you are probably ingesting over a VPC. If you are using Internet connectivity, you could certainly use a VPN endpoint and also ingest through MediaConnect over a VPC as well. By utilizing VPC origins and VPC ingest, the whole media workflow can be private. CloudFront supports pulling from a VPC and we can ingest to a VPC as well. We mentioned encryption earlier. Many of the protocols that we support on MediaConnect offer encryption, such as SRT and Zixi and RIST. SRT is used quite a bit and enabling encryption on it is trivial. Pretty much every encoder out there that publishes SRT supports encryption. So whether you are using the Internet or Direct Connect, encryption is an option.

Thumbnail 2370

Especially over Internet contribution, packet loss is a concern. Dropped packets are a thing. Most of these protocols, like SRT, Zixi, RIST, and RTP FEC, support error correction and ARQ, or Automatic Repeat Request. This is all adjustable on MediaConnect and on the encoder. If you are doing contribution over the Internet, adding a little bit of latency in exchange for reliability is available on MediaConnect. You can add a little bit more error correction for known spotty connections. Both of the protocols I mentioned, Zixi and SRT, allow for failover of the stream. So you can have multiple ISPs contributing over MediaConnect, and if one of them has an issue during the live event, it can automatically switch over to the other one.
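A hedged sketch of creating such a flow with an SRT listener source using the AWS SDK for JavaScript; the flow name, CIDR range, port, and latency value are assumptions to adapt to your own contribution setup (encryption and a failover source would be added the same way):

```javascript
// Illustrative sketch: a MediaConnect flow with an SRT listener source.
import {
  MediaConnectClient,
  CreateFlowCommand,
} from "@aws-sdk/client-mediaconnect";

const mediaconnect = new MediaConnectClient({ region: "us-east-1" });

await mediaconnect.send(
  new CreateFlowCommand({
    Name: "venue-contribution-primary",   // placeholder flow name
    AvailabilityZone: "us-east-1a",
    Source: {
      Name: "truck-srt-feed",
      Protocol: "srt-listener",
      IngestPort: 5000,
      // Only allow the venue's egress range to connect to the listener
      WhitelistCidr: "203.0.113.0/24",
      // Extra receive latency buys time for retransmission over a lossy link
      MinLatency: 2000, // milliseconds
    },
  })
);
```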

Thumbnail 2420

I mentioned metrics earlier, and they're super important. The dashboard that we create before the event has CloudFront on there so we know if there's an issue with CloudFront. Most likely it contains metrics from the origin server to know how the origin is performing, and if we're using DRM or ad insertion, those metrics are also available. We can also add ingest metrics to the dashboard for ARQ recovery and dropped packets, so we have a quick view of whether the issue is with our ingest, with CloudFront, or with the origin.

NESN Case Study: Managing Regional Sports Network Streaming with AWS

For SRT, which is one of the most common protocols for ingest over the internet, these are the recommended metrics that we see being used for MediaConnect dashboards. These are the ones from the largest live events that customers are actually monitoring really closely for ingest quality. Next, Jess Palmer is going to come over. She's from New England Sports Network, home of the Red Sox, and she's going to speak a little bit about her platform.

Thumbnail 2500

Thumbnail 2510

NESN is not the largest RSN, but we are actually the first RSN to offer a direct-to-consumer live streaming app in 2022. As a regional sports network, NESN has a small digital team, mostly me and my boss serving a growing user base. Armed with basic streaming knowledge, we were able to easily learn and manage the stream flow through AWS cloud services. RSNs like NESN have unique streaming challenges. We must deliver high quality, low latency live and on-demand sports content, enforce regional broadcasting rights, and handle unpredictable traffic spikes, all while maintaining strong security and compliance.

Thumbnail 2540

Thumbnail 2560

Additionally, unlike many national distributors who receive SRT feeds and then retransmit those, at NESN we control every step of the broadcast. First, the game is captured by cameras which are connected to our mobile production unit, otherwise known as the truck. It's an actual truck with the production equipment inside it. We have 4K cameras and HD cameras at Fenway Park for the Red Sox and at TD Garden for the Bruins. For away games, we have our own HD cameras that travel with the teams. The feed is then relayed over the internet from the truck or from the opponents' stadiums to master control at our Boston office.

Thumbnail 2580

Thumbnail 2590

These feeds are shared directly with our linear distributors or fans still watching sports the old-fashioned way through their cable subscriptions. Additionally, the SRT feed is routed directly to AWS through AWS hardware encoders. About 50 percent of the Red Sox and Bruins season's games are home games, which we offer in HD and 4K. AWS makes it easy for us to manage the availability of our 4K stream using an AWS Elemental encoder via MediaLive for transcoding and streaming management. We installed MediaLive Anywhere on our AWS Elemental encoder with John's help so that we could administer the feed through MediaLive the same way we manage our HD and UHD links in the MediaLive dashboard.

Thumbnail 2630

Thumbnail 2680

From the 4K source we configured a MediaLive output group that contains 4K, 2K, 1080p, and 720p video outputs, as well as stereo and surround audio outputs and closed caption outputs. We kept the audio and video tracks separate because we found that muxing resulted in issues with some of the OTT players. We used a MediaPackage v2 channel group to create DASH and HLS origin endpoints, again for compatibility on different systems. Those origins are then used for CloudFront distributions. A major advantage to using CloudFront is that it automatically scales to accommodate traffic variations. So for example, if the Sox are playing the Yankees, we don't have to worry about edge capacity. CloudFront takes care of that for us.

Thumbnail 2700

We also occasionally get third-party feeds or one-off direct feeds like batting practice directly from Fenway Park or morning skate from Bruins practice arena that we offer as digital exclusives. We have an HD link that we use for those one-off events. We use the same basic flow for those events. The live or third-party feed is routed through the HD link, and the HD feed is transcoded.

Thumbnail 2720

Thumbnail 2730

Thumbnail 2750

into lower output tiers to accommodate CTV and mobile devices. Then it's the same MediaLive to MediaPackage to CloudFront flow. Additionally, CloudWatch logs across all services provide metrics and alarms that help detect performance issues early, maintain stream quality, and ensure smooth delivery. Having these logs easily accessible and queryable using Logs Insights is essential for a lean digital team like ours. In summary, CloudFront empowers NESN to deliver reliable, high quality streams for every fan, scale seamlessly during live sports peaks, protect media rights and maintain regulatory compliance, and gain real-time insights into audience behavior and performance.

Thumbnail 2780

Monetization Challenges: Scaling Dynamic Ad Insertion with AWS Elemental MediaTailor

Thanks, Jess. It's really cool. I always love everything you're doing over there. Jess spoke a little bit about AWS Elemental MediaTailor. It was on her diagram earlier when she gave a demonstration. AWS Elemental MediaTailor is a managed service under the Elemental umbrella that does ad replacement and ad insertion for live streams or on-demand streams. This workflow here again uses MediaTailor, but a lot of the other ad insertion platforms work in a similar manner. When you're trying to monetize your streams, it's a whole new set of challenges.

This is called dynamic ad insertion, and as the name implies, you're adding a dynamic element into the stream. Every single manifest for end users is personalized because it contains ads destined for that end user, whether you're using linear ads or some of the newer formats like squeeze back or pause ads. Before, we always talked about scaling and about CloudFront offloading the origin. When we talk about ad insertion, the origin becomes that much more important, because every single request from the end user, whether you have 1,000 or a million users, goes directly to the origin, and the origin is modifying the manifest in real time.

So it's really important that your origin can scale with your viewers appropriately. When you configure CloudFront for advertising, you're going to have a couple more behaviors being configured for your manifests. You're no longer caching them; you're sending them directly to the origin server, which is then modifying every single manifest going to the end user. Then MediaTailor in this case is going to make a call to an advertiser upstream, whether it's Springserve, FreeWheel, or any of those ad platforms, and those must scale as well because every single request is going to be a unique request to an ad server.
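In CloudFront terms, that usually means a manifest-specific cache behavior that bypasses caching so every manifest request reaches MediaTailor, roughly like the fragment below; the path pattern, origin ID, and the use of the managed CachingDisabled policy are illustrative assumptions:

```javascript
// Sketch of an extra cache behavior for dynamic ad insertion: manifests bypass
// the cache and go to the MediaTailor origin, while segments stay cacheable.
const manifestBehavior = {
  PathPattern: "*.m3u8",                // HLS manifests (add "*.mpd" for DASH)
  TargetOriginId: "mediatailor-origin", // placeholder origin ID
  ViewerProtocolPolicy: "redirect-to-https",
  // Managed "CachingDisabled" cache policy: every manifest request reaches the
  // origin so MediaTailor can personalize it per viewer and session.
  CachePolicyId: "4135ea2d-6df8-44a3-9df3-4b5a84be39ad",
};
```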

So it's really important that when we add dynamic ad insertion into a platform, we look at that as an origin, and at DRM platforms as well, which are typically not behind CloudFront. AWS Elemental MediaTailor actually recently added some features to help with the ad platform, because if everybody hits an ad break at the same time and we send a million viewers to an ad platform all at once, the ad platform, whoever it might be, can't handle that. So MediaTailor actually has the ability to prefetch ads before an ad break and stagger requests to an ad server.

These are some of the really important considerations when you're adding monetization to your live stream, again whether it's linear ad replacement, overlays, squeeze backs, or any of the newer formats. There are a lot more considerations to look over, specifically around the origin server, which in this case is the ad insertion provider. I want to hand it back over to Jamie for the summary.

Thumbnail 2940

Thumbnail 2960

Key Takeaways and Resources for Large Scale Event Delivery

To wrap things up, we aimed this talk at ways that can help you deliver those events successfully and at scale. We want you to remember these key things when you start planning for your next large scale event. So firstly, observability is critical, right? Whether that's dashboards or data collection through real-time logs and Kinesis, think about how you could use other data points such as CMCD and CMSD to also help you with your observability.

Thumbnail 2980

Thumbnail 3000

Architecture design can actually go beyond just the encode and the origination, right? We covered AWS Elemental Media Services, but also remember how to leverage other infrastructure to help you absorb some of that traffic, such as embedded POPs and Origin Shield. Security is also a really layered approach. There's not really one mechanism that fits all. So think about your end-to-end security strategy, covering tokenization, IP blocking, and geo blocking, and using mechanisms like CloudFront Functions and AWS WAF.

Thumbnail 3020

Thumbnail 3040

Thumbnail 3050

And then finally, monetization can be more than just a subscription payment, right? You can use server-side ad insertion to help you monetize your content using services such as MediaTailor. So if you want to learn about or continue reading on some of these topics that we've discussed today, or want to deploy the samples, here are the links to them. And finally, if you're interested in leveling up your skills on network and content delivery, I thoroughly recommend you check out AWS Skill Builder.

Thumbnail 3090

So that concludes our session today. Thanks for joining us. We hope it helps you think about the things you need to look out for in your next large scale events. Just a quick reminder, please go to the re:Invent app to submit your feedback on the session today so that we can help improve it. We'll be hovering around if you've got any questions about what we spoke about today, but enjoy the rest of your time here at re:Invent.


; This article is entirely auto-generated using Amazon Bedrock.
