🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.
Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!
Overview
📖 AWS re:Invent 2025 - Powering your Agentic AI experience with AWS Streaming and Messaging (ANT310)
In this video, Mindy Ferguson and Francisco Morillo from AWS, along with Chris Jackson from Olympics.com, explore how streaming and messaging services power agentic AI applications. They explain why real-time data is essential for AI agents, demonstrating design patterns using Amazon Kinesis Data Streams, Amazon MSK, and Amazon Managed Service for Apache Flink for anomaly detection. Chris Jackson presents a compelling case study showing how the Olympics processes 3.2 petabytes of user actions during events, using streaming architecture to detect interesting moments in real-time. The session highlights their evolution from Lambda-based implementations to Amazon Bedrock AgentCore, which provides better observability, memory management, and developer experience. The demo showcases their Milano Cortina 2026 Olympics implementation, processing anomalies and generating AI-driven content descriptions with sub-second latency using multiple specialized agents communicating via Amazon SQS.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction to ANT310: Streaming, Messaging, and Agentic AI at AWS
Good morning everyone, and welcome to ANT310. This is Powering Your Agentic AI Experience with AWS Streaming and Messaging. My name is Mindy Ferguson, and I'm a Vice President of Technology at AWS. I lead our data streaming, messaging, and collaboration teams, and if you want to know what that means, it means I get to lead great services like Amazon SQS, Amazon SNS, Amazon Kinesis Data Streams, Amazon MSK, AWS Clean Rooms, and so many more.
I'm excited today to be joined by Francisco Morillo, a Senior Streaming Solutions Architect at AWS, and we have an incredible customer speaker with us today. This is Chris Jackson, the Head of Digital Data and Analytics for Olympics.com. Now someone told me earlier this week, what do you mean the Olympics, like the Olympics Olympics? And that's right, the Olympics Olympics. We're eight weeks away from Milano Cortina, and so we're really excited to have Chris with us today. We're going to hear from both of them as we go through the day.
All right, so here's how today is going to go down. I'm going to start with some of the fundamental concepts of streaming, messaging, and agentic AI. And then, this is a 300-level session, so we're going to move very quickly to where Francisco Morillo is going to dive deep into some of our key design patterns. And as I mentioned, we're going to see a real-life use case that Chris Jackson will bring to life using data from the Olympics. And make sure this is not the session you decide to leave early today because there is a demonstration at the end of this presentation. And I can guarantee you, you will never look at the Olympics the same again. It's quite fascinating.
The Data Foundation Challenge: Four Key Obstacles and the Streaming Solution
These are really exciting times in the world of data. It should be no surprise that the data landscape is experiencing a fundamental shift as AI takes center stage. According to the Harvard Business Review, 89% of chief data officers say that their organizations are moving forward with AI initiatives. But more than half of those same chief data officers say that their organization's data foundation isn't quite ready for AI. Now at AWS, we work with a lot of customers, and we see that customers who are the most successful in their AI initiatives are ones who have gone back to the basics and have really worked to strengthen their data foundation. And as they do that, they normally find four key challenges.
The first is that team collaboration is often disconnected. You can have data scientists, analysts, and engineers working with different tools in different locations, and often they have to switch context, and that can hamper collaboration. The second is that data fragmentation is still a very real thing. Just like people, we have data in very different locations in different formats, and even within an organization, we can have data that's siloed, making it really challenging to create a cohesive data foundation that supports both AI and analytics.
The third challenge is that end-to-end data governance supporting AI and analytics requires using new tools and new approaches. And the fourth is that it's really challenging still to build scalable and performant infrastructure that can flex with the demanding AI workloads while still maintaining efficiency. We see customers choose streaming data technologies not only as a way to address those four key challenges, but they're also realizing significant additional benefits from using streaming.
The first is they can now capture and analyze data the moment that it occurs, and that means they can deliver instant insights and make reliable, predictable decisions. The second is they're able to use real-time data to filter out unneeded data before storage and processing, and that lowers their infrastructure costs. The third is that they're using real-time data to power AI with fresh, high-quality data, and that's driving automation.
And the fourth is that real-time data is allowing them to keep their systems completely in sync, and this is powering the most complex workflows.
Evolution from Rules-Based Systems to Agentic AI: Why Now Is the Right Time
See, real-time data is the foundation for modern AI, and I want to walk you through this evolution. How many of us here have actually built a rules-based decision system? Yeah.
I think I've done this a few times in my career actually, and not that long ago, and they were great for predictable scenarios. But they are also rigid, and they require manual rule creation. So then we advanced to machine learning, where we were no longer limited to rigid, hand-written rules; we could learn patterns from our data. And this changed how we were able to move forward.
The next leap came with generative AI, and with generative AI, we're not just classifying and predicting; we're actually able to create new content like text or images, and now even code, and this is fueling automation. But now we're in this world of agentic AI, and it's the most powerful evolution yet. See, agentic AI really brings it all together. Now we're able to drive real-time decisions. We're able to power machine learning for pattern recognition and create dynamic content with generative AI, but we're also able to power some of the most complex workflows with event-driven architectures.
At AWS, we define AI agents as autonomous software systems that can think, reason, and take action on behalf of humans or other software systems, and what makes an AI agent so incredibly powerful is its ability to think iteratively. It can evaluate results, adjust the plan, but it's always going to keep tracking towards the goal. And if you're thinking about an AI agent as something that just answers questions like a chatbot, that's just not where agentic AI is today. It actually solves problems through the process of exploration and refinement.
But why is now the right time? We've been talking about generative AI for a while. Why is now the right time for agentic AI? Well, there's really three reasons. The first is that almost every single week, and if you've been here at re:Invent, it's already happened this week, we've had so many new foundation models that come to fruition. We see it almost happening on a weekly basis, and it provides advanced reasoning capabilities, but it's also changing the economics of how we're able to use foundation models.
The second is that so many of you have been working over the past few years to get your data connected into AI systems and to make that data valuable, and this is a really key point because AI doesn't work without your data, and that's super key to keep in mind. The third is that we're finally seeing so many tools coming into the market that we're now able to democratize access to AI. And we're going to show you that a little bit later in this presentation with Amazon Bedrock AgentCore.
How Streaming and Messaging Power Agentic AI: Real-Time Intelligence and Asynchronous Communication
So far we've discussed the importance of data and how it empowers AI. I'm going to shift gears a little bit and talk about how streaming and messaging differentiate agentic AI, and I'm going to start on the streaming side because streaming has something that's really cool. We know that streaming provides real-time intelligence, but the important part about intelligence is that the faster you can get your insights, the better, because time really does matter.
But with agents we can form a closed loop process and this is really fueling the power behind real-time data. See, we can gather data continuously, we can process continuously, we can deliver insights, and we can act autonomously, and that closed loop process means that we're always able to learn from our past results, and that's what makes real-time data with streaming and agentic AI a powerful combination.
Now, the title of this talk mentions messaging services, and so many customers say to me that they didn't really understand how messaging services fit in. How many of you use Amazon SQS today in your architectures? It's been around for 19 years, and it's one of what we would call the OG services. Amazon SQS is an amazing service, and it powers your ability to decouple your microservice architectures.
But here's the cool thing. That exact same pattern allows you to communicate asynchronously between agents. And we see it in almost every single agentic AI architecture because here's the great thing that all of you know, Amazon SQS scales without you having to think about it.
And so it can scale elastically to handle all of your demanding AI workloads, and it's a really incredible tool as you start to think about agentic AI. All right, we're getting really close to Francisco coming out here, but I want to talk a little bit about some use cases that we see. Most of us already know that agentic AI is being used to power customer experiences. In fact, I would hazard a guess that 90% of us today have had an AI-powered customer experience already, and if you haven't, I know you won't make it through the day without one.
But agentic AI is also being used in manufacturing and operations for anomaly detection, predictive analytics, preventive maintenance, and alerting. In fact, Chris is going to tell you a little bit later about how the Olympics is using anomaly detection, which is quite interesting. We also know that AI has been providing us some human benefit for quite some time, being able to speed up some of our productivity. But we see customers also using agentic AI to reduce human bias.
We have a customer who is in the insurance claims business, and they're using agentic AI to receive text, images, and videos. They're using agents to help process claims with a human still in the loop, but the claims agent is able to help them understand and provide auditable tracking back to why they made the decisions that they made. All right, we're ready to dive deeper, so here we go with Francisco Morillo, and we're going to dive deep into some key design patterns.
Design Patterns for Real-Time Anomaly Detection with AWS Services
Thanks, Mindy. So now we're going to see some design patterns for building real-time data architectures with AI agents, so your AI agents can respond and react as events are actually happening. As Mindy mentioned, my name is Francisco Morillo. I'm a Senior Streaming Solutions Architect at AWS, and what that means is that I help customers build, design, and scale their real-time architectures using AWS services.
So for today, we're going to be evaluating a real use case, which is anomaly detection. This use case isn't bound to any specific industry; it applies to financial services, gaming, manufacturing, and even e-commerce. As Mindy mentioned, organizations are processing large amounts of data. However, that leads to specific challenges along the way.
First is data visibility. Sometimes we're not even able to detect when there's an anomaly, whether that's a spike, a surge, or a drop in our transactions, user events, or information coming from our data sources. By the time an anomaly has happened and we realize it, it has already impacted our systems. Second, this can also lead to operational inefficiencies. Because we're not able to detect and react, we then have to prepare a response plan after everything has already happened and affected our company or our applications.
And then lastly, this can also lead to limited context awareness. Sometimes even if we're measuring the right things, we're missing key information about which systems were impacted or how this actually relates to our business. Without context, reacting to an anomaly just becomes guesswork. So in the end, what we see is that this leads to increased operational costs, reduced customer satisfaction, and, lastly, potential revenue loss, whether through downtime or simply through how quickly we're able to respond and provide a service at scale.
So let's take a look at a high-level architecture. How does this actually then look behind the scenes? First, we'll have our data sources, which can be databases, it can be IoT devices, it can be transaction logs coming from a website, and we will ingest them in real time to a streaming storage. The reason for this is that we need to be able to process and store this data at scale as we then have many consumers being able to process, react, and in this case, detect our anomalies.
Once we have detected an anomaly, we use it with an agentic AI application to provide the necessary context to make the right decision. The simplest case could be that once an anomaly has been detected, we do some enrichment, do some research, and send an alert to the specific teams that need to make a decision, but this could also mean blocking a specific user or even making higher-level, more important decisions.
But how can we then build this with AWS? Here is where AWS streaming and messaging services come into play. With those, we provide simplicity, durability, and scalability so that you're able to react to your data. With AWS streaming and messaging services, customers can move at the speed of their data.
AWS Streaming Infrastructure: From Storage to Processing with Kinesis, MSK, and Apache Flink
Our services are organized around streaming storage, data movement, and stream processing. First, for streaming storage we have our cloud-native service, Amazon Kinesis Data Streams, and Amazon MSK, our managed service for open source Apache Kafka. With those, we're able to store and process all of our information as it's constantly being fed in real time. With Amazon Data Firehose and Amazon MSK Connect, we can easily deliver and write data into our data lakes and data warehouses. And then lastly, for stream processing, we have Amazon Managed Service for Apache Flink, which provides the easiest way to react to and process data in real time.
But now let's dive a little deeper into each of those and how they relate to our generative AI applications. For streaming storage, as we mentioned, we have Amazon Kinesis Data Streams and Amazon MSK. These services provide durable, scalable storage for our streaming data, allowing us to write in parallel from many producers as well as to consume and react with many consumers. By doing so, we can provide a continuous flow of information to our agents so they have all the data they need when they need to make a decision.
We can use it to store our raw data before we do some processing, such as generating embeddings or enriching from multiple data sources, which later gives our agents a deterministic basis for reasoning. So if we go back to our anomaly detection use case, we have our data source, and once we have received all of those events as they're happening, we can store them in Amazon Kinesis Data Streams or Amazon MSK.
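To make the ingestion step concrete, here is a minimal sketch of a producer writing user-action events into a Kinesis data stream with boto3. The stream name, region, and event fields are assumptions for illustration, not details from the session.

```python
# Minimal producer sketch (stream name and event schema are assumed, not from the session).
import json
from datetime import datetime, timezone

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish_user_action(user_id: str, action: str, country: str) -> None:
    """Write a single user-action event onto the stream, partitioned by user."""
    event = {
        "user_id": user_id,
        "action": action,
        "country": country,
        "event_time": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
    }
    kinesis.put_record(
        StreamName="user-actions",              # assumed stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=user_id,                   # same user always lands on the same shard
    )

publish_user_action("user-123", "video_view", "US")
```

Partitioning by user ID keeps each user's events ordered within a shard while still letting many producers write in parallel.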
So once we have that data available, what do we do with it? Well, in this case we process it, and now we need to be able to detect and react to anomalies, for which we can use Amazon Managed Service for Apache Flink to provide real-time decisions and real-time enrichment across all of our data sources. One of the key benefits of using Apache Flink for stream processing is its broad set of connectors. You're able not only to process data from our streaming storage, but also to read from your data lakes, databases, and much more. By doing so, and by using Flink's built-in state, we're able to build our own context and our own prompts before having to invoke our agents.
So going back to our architecture, we can leverage Amazon Managed Service for Apache Flink to detect and react to our anomalies. We can do this either, as Mindy mentioned, as a rule-based system, where we set a specific threshold, do some aggregations such as number of views or number of users, and emit an anomaly event. Another way to do this, as we see companies doing, is using machine learning: with models such as Random Cut Forest, or even by invoking models synchronously, we can detect those anomalies in real time with the lowest possible latency.
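As a hedged sketch of the rule-based variant described above, the PyFlink job below runs a tumbling-window aggregation and emits an anomaly event whenever per-country traffic crosses a fixed threshold. The stream names, schema, region, threshold, and Kinesis connector options are assumptions (connector options vary across Flink and connector versions), so treat this as an outline of the technique rather than the implementation shown in the session.

```python
# Sketch: rule-based anomaly detection with PyFlink SQL on a Kinesis source and sink.
# Stream names, schema, threshold, and connector options are assumptions.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: user-action events arriving from a Kinesis data stream.
t_env.execute_sql("""
    CREATE TABLE user_actions (
        event_type STRING,
        country    STRING,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kinesis',
        'stream' = 'user-actions',
        'aws.region' = 'us-east-1',
        'format' = 'json'
    )
""")

# Sink: anomaly events fanned out to downstream consumers (e.g. an SQS-fed agent).
t_env.execute_sql("""
    CREATE TABLE anomalies (
        country      STRING,
        window_end   TIMESTAMP(3),
        action_count BIGINT
    ) WITH (
        'connector' = 'kinesis',
        'stream' = 'anomaly-events',
        'aws.region' = 'us-east-1',
        'format' = 'json'
    )
""")

# Rule-based anomaly: emit a record when per-country traffic in a one-minute
# tumbling window exceeds a fixed threshold.
t_env.execute_sql("""
    INSERT INTO anomalies
    SELECT country, TUMBLE_END(event_time, INTERVAL '1' MINUTE), COUNT(*)
    FROM user_actions
    GROUP BY country, TUMBLE(event_time, INTERVAL '1' MINUTE)
    HAVING COUNT(*) > 100000
""")
```

The same skeleton extends to the machine learning variant by replacing the fixed threshold with a model-based anomaly score, such as one produced by Random Cut Forest.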
Building Production-Ready Agents at Scale with Amazon Bedrock AgentCore and Multi-Agent Architecture
So now, once we have detected our anomaly, what's next? Well, in this case, now it's time for us to build and deploy our AI agents. How can we do so? I'm going to ask a question just to see a raise of hands. Who here has heard of Amazon Bedrock? I'm not surprised. So Amazon Bedrock is the easiest way to build custom agents. It provides open source models as well as Amazon's own models, which you're able to invoke through a single API. And as Mindy mentioned, as we're going through re:Invent, even more foundation models are being added right now. Bedrock also includes a new set of capabilities called Amazon Bedrock AgentCore, which is the easiest and simplest way to deploy agents at scale.
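To illustrate what "a single API" looks like in practice, here is a minimal sketch using the Bedrock Runtime Converse API from boto3. The model ID and prompt are placeholders, and this is a bare model call rather than a full agent.

```python
# Sketch of a single Bedrock Runtime call (model ID and prompt are placeholders).
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # assumed model choice
    messages=[{
        "role": "user",
        "content": [{"text": "Traffic for women's hockey is 40% above baseline. "
                             "Summarize why this moment might be interesting."}],
    }],
    inferenceConfig={"maxTokens": 300, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```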
So you might be wondering what's the key difference or why do I actually need to care about deploying agents at scale? Well, because moving from POCs and demos into actual production-ready agents is quite a hassle. We need to maintain repositories, dependencies, provide endpoints, as well as worry about how our agent is doing, and it can become quite challenging to understand everything happening behind the scenes.
All of these gaps are addressed with Amazon Bedrock AgentCore. It provides its own dedicated runtime for session isolation. It provides short-term and long-term memory for context awareness, so we do not need to worry about our agent forgetting what we were just discussing. This is quite interesting because we can provide extraction prompts so that we can tell our agent what key information from our session or from our invocations is actually important, and we can do so without having to manage any kind of databases. I would say that the most important feature that we see from customer stories with Amazon Bedrock AgentCore is its observability. With Bedrock AgentCore, we're able to actually monitor and track all of our agents, our sessions, our traces, and actually see how our agent is behaving behind the scenes and how long it takes to reason and iterate as it's making a decision.
So with that, we can have our end-to-end architecture detect our events and invoke an agent that can do some research, verify that this really is an anomaly, and send an alert. But what we see happen is that agents start focused and small. We have all built that POC that looks amazing. It responds precisely to what we want, and in this case it has just a small set of tools, so it's quite specialized: we're able to analyze anomalies, validate them, and take some specific actions.
That's amazing, but then we see some success and we start adding additional tooling. We may add some knowledge bases in order to provide more specific responses to specific anomalies. And then we go from version 2 to version 3 with more knowledge bases and more tooling, and if we take a single agent too far, we won't be surprised that this leads to challenges. Maintaining that single agent can become a nightmare. We have to write very specific prompts to avoid hallucinations and to keep tools from being misused. And in doing so, our agent will take more time to respond, it will become more expensive, or we'll need frontier models just to get the responses we're hoping for.
So how can we fix this? Well, by using multiple agents. We can have specialized agents, each with its own set of tools and boundaries, focused just on the task at hand. But as we move forward, we might wonder how we can have all of these agents working together. There are specific protocols that allow you to do this, such as A2A. But then we start to wonder: what happens if an agent fails? What happens when we're operating at scale and need multiple agents working in parallel?
So how can we fix this? Well, Mindy actually gave us the answer a little earlier: AWS messaging services for multi-agent communication. For this, we can leverage Amazon SQS or Amazon MQ, which facilitate asynchronous agent-to-agent communication and allow us to decouple our agents so they can work in parallel. So effectively, what we're doing is the pattern we're all familiar with: moving from a monolithic agent, just like a monolithic application, to agents working as microservices, reacting asynchronously when an event arrives or when another agent has finished making a decision or sending a response.
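Here is a minimal sketch of that asynchronous handoff: a Lambda handler consumes anomaly messages from one SQS queue, applies its own specialized logic, and forwards qualifying results to the next agent's queue. The queue URL, message fields, and the scoring stub are assumptions for illustration.

```python
# Sketch of asynchronous agent-to-agent handoff over SQS (queue URL and fields are assumed).
import json
import os

import boto3

sqs = boto3.client("sqs")
NEXT_AGENT_QUEUE_URL = os.environ["NEXT_AGENT_QUEUE_URL"]  # e.g. the enrichment agent's queue

def score_interest(anomaly: dict) -> float:
    """Placeholder for this agent's specialized logic (could be a Bedrock call)."""
    return 0.9 if anomaly.get("deviation_pct", 0) > 30 else 0.2

def handler(event, context):
    forwarded = 0
    # Lambda's SQS event source delivers a batch of messages in event["Records"].
    for record in event["Records"]:
        anomaly = json.loads(record["body"])
        score = score_interest(anomaly)
        if score < 0.5:
            continue  # not interesting enough; drop it here
        # Hand off asynchronously to the next specialized agent.
        sqs.send_message(
            QueueUrl=NEXT_AGENT_QUEUE_URL,
            MessageBody=json.dumps({**anomaly, "interest_score": score}),
        )
        forwarded += 1
    return {"forwarded": forwarded}
```

Because each agent only reads from its own queue and writes to the next one, any agent can fail, retry, or scale out independently without the others noticing.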
Let's say we modernize our application. We can have our stream processing application write into a queue where we can then have Lambda functions invoke our endpoints or our specific agents, enabling them to make specific decisions and work collaboratively. But now, enough about AWS services. Now it's time for us to see how this actually looks for a real use case. So I want to introduce to the stage Chris Jackson.
Olympics.com at Scale: Managing Global Digital Operations and the Long Tail Challenge
There you go, Chris. Thank you. Thanks. So I'm going to talk to you a little bit about how we use this in the Olympics. To introduce myself briefly, I work on the digital services for the Olympics. Within that, I lead teams responsible for artificial intelligence, analytics, and a bunch of data management topics. I've got a mixed background in tech and business topics, particularly around sport. And accidentally, I was the first person to talk about big data on Twitter, if anyone remembers what that used to be.
Paris in some numbers, some big numbers. Perhaps the one on the top left is a little less impressive to attendees of re:Invent. I think the conference this year is almost four times that size, but still, 24,000 media personnel to bring an event to the public is a pretty large grouping of specialists from one industry. They're using over 1,200 cameras. If we talk about venues, there are, I think, six venues at re:Invent. We had 39. And the 32 sports across those venues are more than at any other sporting event; two is unusual, one is the norm. So we've got a huge complexity in terms of the scale of our operations.
The other thing that is really striking about the Olympics is its truly global nature. A lot of at-scale businesses can talk about a couple hundred countries and territories where there is serious activity. For me, the exciting stat is 91 countries that won a medal, so really engaged to the level where they can be in the top three in the world. Digitally, we're very proud of our numbers. In Paris, we hit 325 million users on Olympics.com and over 16 billion engagements on social media.
Behind the scenes, it's complicated. We're not one organization, and this definitely has an architectural impact in how we create things. Every host city is its own organization, every sport is its own organization, every national team is its own organization, and of course we have a range of other partners, our sponsors and our media rights holders like NBC who bring it to you at home. A lot of complexity there that we have to think very carefully about how to manage.
Let's look at numbers around data in particular. In terms of the number of users at home who are coming to us, during the Olympics we see around 40 times what we would see in a typical month. And as you would imagine, they're a lot more engaged, with almost four times as many actions. So that gives us a 150x peak in terms of writes into our system, which is interesting to manage. From a read perspective, this is also significant. Although it's only roughly a one order of magnitude increase, doing that alongside the writes has an interaction which we need to be really careful with.
And that's because we have around three times as many data users coming and querying stuff, and as you can imagine, they're an awful lot more engaged as well. And every time they query, they're querying for a lot more data because a lot more has happened in the last hour, day, week, month, whatever it is that they care about. So we have to work extremely hard to be able to manage these peaks.
We have a saying in our team: batches cause calls. And the reason for this is, you know, traditionally you would architect this stuff with, I don't know, a batch that runs perhaps every day, maybe every hour, maybe you get it down to 15 minutes or five minutes or whatever it is your system can run. Because of the size of the load and because of its variability, even if you have a super fast batch, that batch can be a hugely variable size, and the ripple effects create provisioning issues all the way down through the various parts of the system.
So honestly, the only way we know how to manage this, and the way we've managed it ever since we had a data team working at scale, is using streaming systems, making sure that as the at-scale data sources come in, they're going through a series of streaming systems. And we're then managing in terms of, is there a bottleneck, is there a latency issue that we can handle, rather than starting to worry about availability or fundamental issues that get you as an engineer onto a call rather than typing in Slack and dealing with stuff as usual. So we feel like the streaming approach is essential.
So we feel like the asynchronous way that we handle data allows us as data people, particularly engineers, to also work in an asynchronous manner, and we're religious about this. We'll only use a batch for small data, and only if we really can't avoid it. What that means, essentially, is that we scale up, of course, but we don't overprovision, because we know we'll be in real trouble if we don't get there. We scale to a reasonable level, and if there's a problem, we scale further to make sure the data keeps flowing through, and that keeps our costs quite reasonable as well.
So I want to talk to you about a project that we ran in the Paris Olympics, and are rolling forward to future Olympics, called Interesting Moments. The idea of this was to solve a very specific problem with AI. The problem, in a few words, is that we have an extremely long tail in the Olympics. This is counterintuitive if you watch the Olympics in any one country, because you will see a relatively small number of stories, really well told for the audience in your country. But in reality, only about 20% of engagement is with the top five athletes, and that's at the level of each sport as well as overall. So you've got almost a flip of the typical 80/20 Pareto rule: here, the long tail is where most of the engagement sits.
Interesting Moments in Paris 2024: AI-Driven Content Creation Using Streaming Data
So this means that we handle a lot of topics with a large team that creates content, creates marketing messages, and iterates through. But we may well be able to address four times as many countries, and within each of those countries around five times as many sports, if we use AI to understand what those messages are and to go a long way towards helping content workers with things like clip selection, social posts, writing articles, notifications, live blogs, and, crucially, how we assign people to go and cover all of these different sports. It's a tantalizing opportunity, and we made some good progress in Paris. Let me show you a little bit about that.
So, just to visualize this, we made in Paris an AI video-driven map where you could punch into a venue from the overall map and see what was going on. This video has jumped quite a long way forward. Excuse me a minute, and I'm just gonna manually jump back on this video. Hopefully that will play from the start. Okay, hands off. So, as I was saying, map of Paris, we have some nice AI videos where we can punch into a venue and see what's going on. This was a nice way for us to visualize it, and it clearly is insisting on jumping. So I'll tell you what the video says, essentially, which is that you see as you go into individual areas, a really clear indication that any one sport is being watched by a large number of countries. And any one country is watching a large number of sports, and we're able to start generating already fascinating AI messages around all the things that were going on in that around Paris.
And the last bit you did see there in the video was about contextualizing it. So it's one thing to be able to take a moment, put that together and say this is really interesting. Lots of people are looking at it and yeah, there was a goal or a record or whatever happened in that sport. It's another thing to then be able to say this was their childhood dream, or this was as a result of a really challenging lead up to those games. So essentially we're using AI to understand the key moments, understand why there was a goal, and then understand why that was important, what the human story was behind it.
So Mindy mentioned that bringing your own data is a differentiator here. That's something that we've done at scale in the Olympics, and I would 100% agree with that message. On the small side, we have over 130,000 stories, which we keep in an Amazon OpenSearch database. For play-by-play, it's a message every time something significant happens in any one of those sports going on across a couple of weeks, and we have 4.6 million of those events. This is interesting: it's in a legacy application, not super easy to deal with, but we were able to dump each of those messages into S3 and have a trigger that fired off a Lambda and sent it through the rest of the process. So even with an application that is a little bit old, we were able to move pretty easily into the streaming world and then handle it the way we handle pretty much anything else that comes from an application we've been able to build relatively recently.
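That S3-to-Lambda bridge is a standard pattern; the sketch below shows roughly what such a handler could look like, reading each dropped play-by-play object and pushing it onto a stream. The bucket layout, stream name, and field names are assumptions rather than details from the talk.

```python
# Sketch: an S3 event triggers a Lambda that forwards a play-by-play message into Kinesis.
# Bucket layout, stream name, and message fields are assumptions for illustration.
import json

import boto3

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")

def handler(event, context):
    for record in event["Records"]:                      # S3 put notifications
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        obj = s3.get_object(Bucket=bucket, Key=key)
        message = json.loads(obj["Body"].read())

        kinesis.put_record(
            StreamName="play-by-play",                   # assumed stream name
            Data=json.dumps(message).encode("utf-8"),
            PartitionKey=message.get("event_id", key),   # assumed field
        )
```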
Last but by no means least, 3.2 petabytes of user actions across those 325 million users during Paris. That goes straight into Kinesis and feeds into our streaming infrastructure.
So let's get a little bit more into the solution and essentially what we're doing with that huge amount of data in Kinesis. We're feeding it into a process to detect anomalies. When it gets over a threshold, we're then starting to look at whether there's an interesting peak, and every single one of those peaks is an anomaly. It's a message which feeds on down the pipeline so that we can turn it into a moment that we can use further. So here's a real demonstration of that. Here's our actual traffic, moment by moment, and this is in one country. We're able to jump onto one of those points and immediately get an explanation. The AI has written that text. The AI has worked out where the in and out points were.
This particular one was about Simone Biles helping the team to get to a point, and you see that moment there. Semi-finals do well for us, and here's the USA men's basketball team in a key moment where they were struggling to come from behind, and each of those points we're able to explain entirely with AI. That's great for our analysts to understand what the hell happened there that caused that peak. It's even more interesting for content teams to understand how we work with these things.
Okay, that was a lot of talking in a level 300 session without an architecture diagram, so I'll show you that now. What you see here is essentially what I've just described, put into components. So that 3.2 petabytes of user actions across the top, feeding through into the central branch. Similarly, the play-by-play into S3, into SQS, into a Lambda, and again feeding across. So we're triggering a pipeline either because something interesting happened or because there's a peak that we perhaps don't understand yet that suggests that our audience thinks something happened, probably.
That feeds through into a Lambda, and this is the kind of architecture diagram I'm seeing around a lot these days: hopefully a lot of streaming services, and a lot of those orange Lambda dots where people are implementing AI functionality. And that's where we were at for Paris. That Lambda was doing a lot of heavy lifting in the middle of the diagram. So let me give you a really honest evaluation of what that was like.
First, we were super proud to have something which we could call agentic just about in July 2024. As far as I can see, the term was only coined in May 2024, so we did well in terms of having something which was really doing more than a chatbot and creating real intelligence over a couple of months. It worked at near real time, it worked at scale. The evaluations, both quantitative and qualitative, were extremely positive.
The challenges are the ones Francisco hinted at, well, talked about quite a lot: this idea that we've really got to create a good developer experience. And put simply, my developers don't like me very much already because this was very challenging to build and maintain. And the reality is we need to do more. We need to add complexity in terms of extra data and in terms of supporting a lot more applications to really get the business value out of this. So it's very clear that keeping on building this through Lambdas, even if we split it up, is going to get us to some of those microservices problems that I'm sure a lot of you have experienced before. And we really, really need to avoid going to that place.
Milano Cortina 2026 Architecture: Upgrading to AgentCore with Multi-Agent Workflows and Live Demo
So we've got an Olympics coming up in Milano Cortina. It's a couple of days plus a couple of months away. The team back in Europe is working extremely hard on a rehearsal this week to get everything configured, to get everything working, to iron out the final problems. And what I want to do is talk to you through a bit what our approach is for Milano Cortina. This will be an approach we will then carry forward towards LA in 2028, not far from here in a big event in the summer. So this is a really key jumping off point for us, and we've worked very hard with Francisco's help and others' help to get to the point where we've got an updated architecture for this event.
So backing off a little bit from the architecture and talking about the conceptual, these are some of the main applications that we're going to have driven by AI in Milano Cortina. Let me zoom into two of those. We have a bunch of work to support people creating content with agentic tools that help them to do their job faster and better. And we also have this interesting moments application, which needs to understand the Olympics, essentially. There's a bunch of different functions, and you'll notice that a number of them are overlapping.
So the immediate advantage is that now we have a single agent pool that across applications is able to create a bunch of different agents that can be knitted together in different ways to do very different jobs across the enterprise that we run. That in and of itself is a massive advantage.
So here's our architecture. It's pretty much the same in the first parts. One important change: I actually showed that numbers chart to Mindy a year ago, and my comment at that point was that we've got a big peak to manage. Streaming certainly helps us, but fundamentally we were having to provision all of these different services largely upfront through traditional provisioning, and that was particularly the case in Kinesis. We don't provision quite to the maximum peak we imagine we can handle, but we do need to provision upfront. We didn't feel that the original on-demand option was really designed, especially in terms of pricing, for the scale that we've got. So one change we'll make is to use on-demand for that first part, which gives us the ability to scale despite the massive amount of volume that we're putting through.
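For reference, moving an existing provisioned stream to on-demand capacity is a single API call; here is a minimal boto3 sketch with a placeholder stream ARN.

```python
# Sketch: switch a Kinesis data stream from provisioned to on-demand capacity mode.
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

kinesis.update_stream_mode(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/user-actions",  # placeholder ARN
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)
```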
Other than that, most of this looks more or less the same until you get to that Lambda. It's still there, but it's now delegating most of its work up to the purple part at the top, to AgentCore. So let's zoom in; now we're just looking at what's going on within that purple part. We've got those interesting events coming in on the left-hand side, we've got SQS knitting together a range of different agents, and finally we go out to the endpoint.
So there's an interest scorer agent which is looking very carefully to understand whether this anomaly is truly something that we should take seriously. Sometimes it might just be a coincidence that this has happened and we can't find a strong enough explanation for that level of anomaly to be able to carry it on through the process. Enrichment and summary then takes a bunch of different sources and makes sense of that moment, adding in context and describing it in a way that makes sense to humans and adding a few other parameters that help the workflows further down.
The place where it gets really interesting for us is around our challenge of getting out to many more different parts of our organization so that they're all getting benefits from this. What's quite beautiful is that we're able to have what we call channel deciders, which, for the folks working in that channel, implement the logic they care about. For some of our channels, we'll say that if it's not a market of a certain scale, we unfortunately can't deal with it, so take out all of the markets smaller than that. Others will want to reprioritize in particular orders. Others may have particular times of the day when they can handle these things, and we're able to organize it like that.
Others may want the information presented in a particular way so that they can copy-paste that into an application or even feed it directly in. So that used to just be Slack, honestly, a lot of automatic Slack messages, which is fine, but tends to involve further work downstream. Increasingly then we're feeding in directly to visualizations, directly to other systems so that we can create all of those things increasingly automatically further down.
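As a purely hypothetical illustration of what one of those channel deciders might boil down to, the snippet below applies a channel's own rules, such as a minimum market size and allowed delivery hours, before a moment is passed on. The field names and thresholds are invented for the example.

```python
# Hypothetical channel decider: applies one channel's rules to an enriched moment.
# Field names and thresholds are invented for illustration.
from datetime import datetime, timezone

CHANNEL_RULES = {
    "min_market_size": 1_000_000,       # drop markets smaller than this
    "allowed_hours_utc": range(6, 22),  # only deliver during the channel's working hours
}

def decide(moment: dict) -> bool:
    """Return True if this channel should receive the moment."""
    if moment.get("market_size", 0) < CHANNEL_RULES["min_market_size"]:
        return False
    if datetime.now(timezone.utc).hour not in CHANNEL_RULES["allowed_hours_utc"]:
        return False
    return True

moment = {"sport": "ice hockey", "market": "US", "market_size": 5_000_000, "interest_score": 0.9}
print(decide(moment))  # True during the allowed hours
```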
So without any further ado, let's look at a demo. One word about this demo: every job I've worked in previously, if you're testing something out, you're probably testing it on something approximating production traffic before you put it into production so that you can see how that works. Our challenge is that our production traffic this month is nothing like the production traffic in February, both in terms of scale but also in terms of shape. So what we do with some modifications is replay the previous equivalent games, which in this case was Beijing just over four years ago. So you're going to see some information about Beijing, but essentially with the Milano Cortina implementation that I've just described.
So here we go, and I thought it would be interesting to look at a USA example. This is in the women's hockey versus Finland, and you see the traffic down at the bottom there. Let's dive in and look at the Flink application that is doing that anomaly detection first and foremost. If you use Flink, you'll be familiar with this interface already. Essentially what's happening is we're generating moments up at the top of that flow. In the bottom, we're doing a bunch of deduplication to make sure that even if it's a local peak, it's genuinely interesting. Flink gives you out of the box a bunch of instrumentation which helps you to understand how those various bits are working and make sure that you debug that pipeline.
So look, there's been a penalty. Fancy that. And if you look at the traffic down at the bottom, I think we probably will see a peak in it, because certainly something interesting has happened. And as if by magic, yeah, it's going up there. So here's the anomaly. We're about 40% up in traffic on that particular moment versus what our expectation was. And you see the traffic going back down there again, but it's a pretty clear signal that the Flink application agreed that something interesting is probably going on there. You do see another anomaly that's been created meanwhile, but we've created one moment. So this is the AgentCore flow now, taking that, deciding it's important, unlike the other anomaly, and summarizing it with a bunch of different fields that we can use.
Now there's been a goal, as you'd expect, and that's also an anomaly. So the system will process that, and while that's happening, let's look into the AgentCore part of it. You see the various agents there that we discussed in the architecture diagram, and you get a bunch of really interesting information so that you can understand how well it's all running. It's pleasing to see a 0% error rate at this point; that's not always the case, and some pretty good other statistics. As and when that ticks up to something different, we will notice it.
Memory's key. The way I look at this is it gives you essentially in-agent, out-of-the-box RAG. And it's particularly good when the agent is collecting a bunch of information with different tools, but then that comes into a local store that we can use immediately in our prompts so that it's contextually aware without us having to go implement a database connection and a bunch of RAG there. So very important in terms of how that works. For us, just to say that it's doing it between 100 and 200 milliseconds, so it's working pretty fast, and latency is super important for us. We want to get those messages out. You're pretty frustrated to see the anomaly without the description, and that's the same thing that our editors feel. Let's get that done as soon as possible.
So let's dive in a little bit deeper, and we see in some components that there is an error rate, something that we may well want to investigate. If we look at how fast this is running, we're going to probably start to see some issues in that as well. So let's go into one particular one, and yeah, that's running at just under 10 seconds. We'd like that to be a lot faster, and you're able to see this across each different session that's been run. So you've got the various different hockey games, so we can select the one that we want to look at. Maybe we had some feedback that one was running a little bit slower, and now we're diving into one particular trace.
So this is something you definitely don't get without a lot of coding in that custom-made Lambda. You can go in, you can look at the graph of how it's going, and I love this graph because I can go in even as a manager and understand why something might be running a little bit slowly. Over on the right, you see an Anthropic call that's taking almost 6 seconds, so that may well be a particular part of the reason here. I suspect we're ending up using a frontier model, as Francisco mentioned, when we don't really need something that fancy. So another anomaly came through there. A goal. In fact, there was a goal just before we went away, and a second one has come, and as you would imagine, we've got a really interesting description there too of exactly how it was generated.
So, in short, this thing that we did in Paris we've been able to upgrade significantly, because we've got AgentCore in place as a framework that allows us to trace it, understand it, and make sure that we're creating an experience for our developers which encourages code reuse and gives a bunch of tools out of the box that just speeds everything up no end. So that's Milan, but let's cast our minds forward to LA. I'm sure it will be of interest to a lot of people here, and while Milano Cortina is super important to us as the next event, the real challenge is going to be a few years after that.
I think our needs are first and foremost streaming that gives us great peace of mind, so we're able to manage these applications very smoothly. We do need to onboard and scale across a really complex range of organizations, and the regulatory part, especially with AI but also across rights, is a real complexity for us that we also need to find ways of managing. Alongside those three core challenges that we think about, I think agentic AI is a really interesting foil. It's giving us an approach that scales AI across many more applications without taking us to a place where it's slow, and it keeps working in real time. Thank you very much.
Key Takeaways: Streaming as a Must-Have for Agentic AI and Resources for Getting Started
Thank you so much to Chris, not only for your partnership, but Chris was great in telling us last year here at re:Invent what his needs were, and so many times when we talk with all of you as customers, we really do want to hear your needs. Kinesis On Demand Advantage is available to all of you, by the way, and it really does help with your scaling. I'm super excited to see it powering the next Olympics in just a few short weeks.
Here are some key takeaways that we want you to think about as you walk away from here. The first is that streaming data is not a nice-to-have in the world of agentic AI; it's actually a must. The second is that your data is what really makes the difference when it comes to any form of AI. The third is that the same messaging services that have powered your architecture for years allow you to communicate asynchronously between agents, and it's a wonderfully extensible pattern in your architecture. The fourth is that you can build and deploy agents, moving quickly from proofs of concept into production, with Amazon Bedrock AgentCore.
Now, if you want to dive a little bit deeper into agentic AI, here's a QR code to help you get started. We offer more than 1,000 free digital courses, labs, and immersive training experiences. I'm going to pause here. I still see some cameras up. If you'd like to learn more about agentic AI, streaming, and messaging, we still have some sessions here at re:Invent.
Now, before you leave today, I have a question and I have an ask for you. As part of your re:Invent app, you will get surveyed. I will tell you your survey feedback really does matter to us because we want to create content for next year's re:Invent that applies. So like your favorite rideshare driver, if we've knocked it out of the park today, please give us five stars, but give us some feedback. We really do appreciate it.
Chris, Francisco, and I will be outside of this auditorium. We're happy to take any questions. I really do appreciate all of you for coming today and for your tremendous engagement. Enjoy the rest of your re:Invent and thank you.
; This article is entirely auto-generated using Amazon Bedrock.