Kazuya


AWS re:Invent 2025 - Bridging from POC to production: An intro to Amazon Bedrock AgentCore (AIM2204)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Bridging from POC to production: An intro to Amazon Bedrock AgentCore (AIM2204)

In this video, AWS introduces Amazon Bedrock AgentCore, a platform addressing the challenge that while many organizations prototype AI agents, few reach production. The session covers four key production challenges: performance (tool integration, memory, evaluation), scalability (concurrent sessions, state management), security (identity and access controls), and governance (auditing, policy enforcement). AgentCore provides modular, serverless services including Runtime for hosting agents with any framework or model, Memory for short- and long-term context, Gateway for unified MCP tool access with semantic search, Policy for real-time deterministic controls, and Observability with Evaluations for monitoring and quality assessment. Mark Roy demonstrates multi-framework agent orchestration using the A2A protocol and automated deployment via Kiro. Ericsson's Sarbashish Das presents a production case study in which specialized agents reduced network engineers' research time from days to seven minutes by unifying fragmented knowledge across standards, code, and documentation, achieving a 99% time reduction while maintaining system-level comprehension.


; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

The Prototype-to-Production Gap: Why Most AI Agents Never Make It Live

Good afternoon, everyone. Thank you for making the trek to Mandalay Bay and welcome to our session, Bridging the Gap from Prototype to Production. Over the last year, we have started seeing AI agents transition into autonomous systems that can reason, plan, and adapt in pursuit of user-defined goals, completing tasks on behalf of humans or other systems. Things like compiling research, remediating infrastructure issues, or even generating full stack applications from a single prompt.

The advent of reasoning models, agent frameworks, and open source protocols like the Model Context Protocol have made it easier than ever to prototype agents. As a result, we are seeing an explosion of agent prototypes across both startups and enterprises. But when we look at which ones actually made it into production, the number is in low single digits. Quick show of hands, how many of you have built or experimented with an agent in the last six months? That's a lot. Now keep your hands up if that agent made it into production and is running reliably today. See that drop off? That's the story we're here to talk about.

We're still in the early innings of the agent era. There's enormous excitement, enormous potential, but the value only comes when those agents operate safely at scale. I'm Vivek Singh. I lead product management for Amazon Bedrock AgentCore. I'm joined today by Mark Roy, tech lead for Agentic AI at AWS, and Sarbashish Das, who's the GenAI tech lead at Ericsson. Today, we'll cover the key challenges that developers face in moving their agent prototypes to production, and how Bedrock AgentCore provides developers the building blocks to cross the prototype to production chasm.

Thumbnail 110

From Sales Assist Demo to Enterprise Reality: Understanding Agent Complexity

We also have a few demos for you to show AgentCore in action. And then Sarbashish will cover how Ericsson is leveraging AgentCore to power Agentic AI innovations. So let's get into it. AI agents are autonomous systems, so even a basic one requires multiple moving parts, such as orchestration, tool execution, state management, and error handling. That's a lot of wiring to get right.

Thumbnail 130

Here is where frameworks like LangGraph, Strands Agents, and CrewAI help. They provide ready-made abstractions for behaviors like multi-step planning, coordination between different sub-agents, and tool invocation. That means developers no longer have to start everything from scratch. They can focus on defining the agent's behavior, while the frameworks take care of the lower level plumbing. So using these frameworks, you can spin up a coding assistant, a customer support bot, or a sales agent in a few days.
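
To make "a few days" concrete, here is a minimal sketch of that kind of framework-level wiring using the Strands Agents SDK. The tool is a stub for illustration, the default model assumes Bedrock credentials, and exact import paths may vary by version.

```python
# A minimal agent sketch using the Strands Agents SDK (package: strands-agents).
# The tool is stubbed for illustration; the default model assumes Bedrock credentials.
from strands import Agent, tool

@tool
def lookup_account(account_id: str) -> str:
    """Return a short summary for a customer account (stubbed data)."""
    return f"Account {account_id}: enterprise tier, renewal due next quarter."

# The framework supplies the agent loop: reasoning, tool selection, and invocation.
agent = Agent(tools=[lookup_account])
result = agent("Summarize account ACME-42 and suggest a next step.")
print(result)
```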

But moving that same prototype to production with thousands of users, reliability requirements, and regulatory compliance requirements, that's a completely different ballgame. Let's make that concrete. Imagine you build an agent, let's call it Sales Assist, that helps your sales reps close deals faster. It answers product questions, it pulls customer histories from Salesforce, it drafts personalized emails, and it can even suggest next steps based on prior deals. You can get that prototype up and running for one sales rep locally in a few days.

Now imagine deploying that across ten thousand reps in five different countries, each with dozens of daily conversations. Each agent accessing Salesforce, Confluence, DocuSign, your internal pricing APIs, all while acting under a rep's individual permissions. That's where the real work begins. Let's unpack what makes that transition so challenging. When you try to scale something like Sales Assist, four categories of challenges emerge fast: performance, scalability, security, and governance.

Thumbnail 230

Four Critical Challenges: Performance, Scalability, Security, and Governance

Before you even think about scale, the agent has to be responsive, reliable, and accurate. It has to be able to reason well, reliably call multiple tools to take action, and remember pertinent facts from prior interactions. For example, Sales Assist might ace a demo with one or two tools, but in production, it's tens or hundreds of tools, each with its own API, schema, rate limits, and unpredictable failures. Turning all of those into uniform, reliable Model Context Protocol compatible tools becomes a massive engineering challenge.

Secondly, LLMs are stateless, but agents are not. So Sales Assist must remember the deal history. It must remember pertinent facts from prior customer interactions, such as the quote it gave to the customer last week. Without a proper memory layer, the agent will lose context, misremember things, and can even hallucinate the past state. And without continuous evaluations for things like correctness and helpfulness, the agent performance may even drift in production, leading to widespread customer impact.

Thumbnail 320

So performance is not about picking a bigger and better model. It's the system around the model, the continuous evaluation, the tool integration, the persistent memory that keep the agents reliable and consistent. After performance, the real challenge is scale. Sales Assist in production means thousands of concurrent agents, each making multiple tool calls, running long workflows, and handling sensitive customer data at the same time. So you need a secure, resilient, and elastic runtime that can scale up during peak workloads, scale down when idle, keep each agent session isolated, maintain state during long-running multi-step reasoning workflows, and quickly recover from failures like API timeouts or service disruptions when it's calling multiple different tools and services.

Thumbnail 360

Then comes security. Agents act on behalf of humans, so every action must respect fine-grained identity and access controls. If Sarah on the enterprise sales team is using Sales Assist, she should see a list of her Fortune 500 accounts. But when Tom on the startup side uses it, he shouldn't; the assistant shouldn't show him the enterprise accounts. That's per-user identity, not shared credentials. And doing that securely across thousands of agent sessions requires deliberate design.

Thumbnail 390

Finally, you need visibility and guardrails to know what each agent is doing and to control unsafe actions before they happen. If a deal goes sideways, can you audit what data Sales Assist accessed, what email it sent, what pricing it suggested? Additionally, you need a way to enforce your business rules in real time. For example, never share unannounced features, never discount over 20%. So governance isn't something that you bolt on later. It has to be built into your application from day one.

And doing all of these things essentially means building a distributed enterprise-scale system that can handle thousands of concurrent sessions, millions of tool calls, adhere to strict latency and reliability requirements, all while leaving very little room for error. And that's the difference between a prototype and a production system. Right now, developers have to do significant undifferentiated heavy lifting in building all of these pieces for a combination of a framework, model, and a set of tools. And as your business case evolves and you want to leverage newer models, newer frameworks, and newer tools, you start all over again. And that is what's slowing down organizations right now in realizing their agent tech vision.

Thumbnail 470

Thumbnail 490

Introducing Amazon Bedrock AgentCore: A Serverless Platform for Production Agents

To address these challenges, we launched Amazon Bedrock AgentCore. It's a generally available service. We launched it earlier this year. It's an agentic platform that provides a complete set of services, purpose-built to build, deploy, and operate highly performing agents securely and at scale. Let's look into some of the key characteristics of AgentCore. First, AgentCore helps you optimize time to value, because you don't have to manage any infrastructure. All AgentCore services are completely serverless and you pay for what you use.

For example, if you're using AgentCore Runtime, you only pay for the active compute and memory that your agent consumes. In fact, you don't even have to pay for the I/O wait time, which is the time your agent spends waiting to get responses back from LLMs and different tool calls. And in most agent sessions, I/O wait could make up 60 to 70 percent of the overall agent session time. Secondly, AgentCore was built with flexibility in mind. Each AgentCore service is completely modular, so you can use them together or you can use them independently. And each AgentCore service works with any framework or any model out there.

So if you built your agent using LangGraph or Strands Agent SDK or if your agent is using OpenAI models or Gemini models, you can still use any of the AgentCore primitives with your agent. And finally, AgentCore provides the controls, the access management, and the observability, which are crucial for enterprise deployments. So overall with AgentCore, you can accelerate prototypes into production with the scale, reliability, and security which are critical to real-world deployments, eliminating months of infrastructure work.

Thumbnail 580

AgentCore Services: Memory, Gateway, Tools, Runtime, Identity, Policy, Observability, and Evaluations

Now let's look into some of the key services that AgentCore offers. First, to build highly performing agents, AgentCore offers Memory, which helps developers build context for their agents. It automatically extracts and stores short-term memory across multi-turn user interactions, as well as long-term memory patterns such as semantic memory and user preferences across longer horizons and multiple sessions. AgentCore Gateway enables you to compose your tools and agents across your ecosystem into a single MCP server interface. With Gateway, you can combine different tool sources, from your REST APIs to your Lambda functions to your existing MCP servers, into one unified interface without managing multiple tool connections or implementing any integrations.

AgentCore also provides fully managed tools that are essential to most agentic workflows. Code Interpreter enables agents to securely generate and execute code in isolated environments, and Browser enables agents to interact with web applications at scale.

Thumbnail 650

Then to deploy and scale agents, AgentCore provides a secure and serverless runtime that is purpose-built to host agents as well as tools. Agents and tools can be built using any framework, any protocol, whether it's MCP or A2A, or any model. AgentCore Identity simplifies agent identity and access management, allowing agents to securely access AWS services and third-party services on behalf of users with pre-authorized consent.

We also launched Policy today, which was announced in Matt's keynote. This provides you real-time deterministic controls over how agents interact with your enterprise tools and data. You can define these policies in natural language. For example, block all reimbursement requests which are over $1000. These policies are evaluated and enforced in real time in milliseconds, so that every agent action is operating within the boundaries of policies that you have specified.

Thumbnail 720

Finally, to operate trustworthy agents, we have Observability that provides rich visualization into each step of the agent workflow, as well as operational metrics like token count and latency via unified dashboards. It also emits data like logs, metrics, and traces in OpenTelemetry-compatible format, so you can plug it into the monitoring tool of your choice, such as CloudWatch, Datadog, Langfuse, or LangSmith.
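
Because the telemetry is emitted in OpenTelemetry-compatible format, any OTLP backend can receive it. The snippet below is generic OpenTelemetry Python plumbing rather than AgentCore-specific wiring; the collector endpoint and service name are placeholders.

```python
# Generic OpenTelemetry setup: export agent traces to any OTLP-compatible backend.
# Illustrative plumbing only, not AgentCore-specific code; endpoint and service name are placeholders.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "sales-assist-agent"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="https://collector.example.com/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("sales-assist")
with tracer.start_as_current_span("tool.call", attributes={"tool.name": "get_customer_history"}):
    pass  # the actual tool invocation would run here
```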

Thumbnail 760

We also launched Evaluations today, which is a new AgentCore service that helps developers continuously inspect the quality of agent behavior so that you can catch issues before they cause widespread customer impact. Overall, with AgentCore, you get everything you need to take your agents from prototype to production, built on the AWS foundation that customers already trust.

Mark Roy on Runtime: Hosting Any Framework, Any Model with Secure Session Isolation

Now I'm going to hand it over to Mark to go a layer deeper into these services and show you a few demos of how they work together. Thank you, Vivek. Great introduction to AgentCore, and I'm super excited to be here for my 8th re:Invent. My name is Mark Roy. I'm with AWS and I'm a global tech lead for Agentic AI. Although you may think I'm only 29 years old, I've actually been building for about 30 years now.

Vivek talked a lot about the chasm of production readiness. I really love that term. I'm wondering who here after that description feels like they're in that chasm of production readiness already. Not too many hands, but let me tell you, it is real. For the last two years I've been working with hundreds of customers trying to build and deploy agents and get real business value, and I see three things showing up every time. One: great-looking prototypes with amazing capability. Two: compelling business value that everyone's excited about, with all the potential. And then three: months and usually quarters of heavy lifting and pain in between.

So for the next 15 minutes I'll drill down into what those real challenges are, go one level deeper into AgentCore, and explain how these services help you solve them. With that, we've got a lot to cover, so buckle up and let's jump in. Let me start with the most common and compelling challenge, and that is: where am I going to run my agent? I can't run my agent on my laptop and just share that out to users.

Thumbnail 890

So you're an agent builder and you're thinking this through. You've got pressure to deliver some real value here. You know that your agent is going to be used by 10,000 users. You need to ship it now. You've got a security officer breathing down your neck saying you've got to make sure this thing isn't going to do the wrong things. And for most of you, you probably have got a platform team.

Thumbnail 910

And they're thinking beyond your one agent. They're dealing with multiple different teams, and guess what? You're using LangGraph, there are other teams using CrewAI, other teams using OpenAI, and it's a bit of a mess. If you look across these business units, they've got 1,000 use cases lined up. So you've got a big dilemma here: how am I going to do this at scale and securely and not spend the next year getting ready?

Thumbnail 940

That's where AgentCore Runtime steps in and gives you the first major component of your platform. Runtime lets you use any framework, any model, host your agents securely and at scale. You can scale from zero to thousands of concurrent sessions. It comes built in with the ability to host MCP, to use A2A for agent interoperability, OpenTelemetry for observability, and OPA for security. All of these things help you get that time to value where you can focus on building agents and not building infrastructure.

Thumbnail 1010

Thumbnail 1020

Another key point here is that although you're probably working on an intelligent chat experience, which is usually the first step, there's a lot more out there. There's voice agents, there's long-running deep research, and there's large payloads to worry about. This complicates what you're going to do about hosting. It's not as simple as just putting it into a Lambda. That sales assistant that Vivek talked about, let's say it's a multi-agent system. You've got one agent running with the Claude SDK using Claude models, another one using LangGraph and GPT models, maybe a third one using CrewAI and Gemini, with a Strands agent on top doing orchestration using Bedrock models. All of this can be done out of the box with Runtime.

Thumbnail 1050

Literally, it's just four lines of code to take an existing agent and make it ready for AgentCore Runtime. Then you can deploy it to the cloud, and it's ready to scale and it's ready to be secure. A2A is built in, no problem. You've got full interoperability there. If you want to host MCP servers, you can pop your tools into AgentCore Runtime as well.
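
Those "four lines" refer to the hosting wrapper. Here is a sketch of that pattern, assuming the bedrock-agentcore Python SDK and a Strands agent; treat the module and decorator names as an approximation that may differ by SDK version.

```python
# Wrapping an existing agent for AgentCore Runtime -- a sketch assuming the
# bedrock-agentcore Python SDK; module and decorator names may differ by version.
from strands import Agent
from bedrock_agentcore.runtime import BedrockAgentCoreApp

agent = Agent()                      # your existing agent, built with any framework

app = BedrockAgentCoreApp()          # 1. create the hosting app

@app.entrypoint                      # 2. mark the handler that Runtime will invoke
def invoke(payload):
    result = agent(payload.get("prompt", ""))   # 3. delegate to your agent
    return str(result)

if __name__ == "__main__":
    app.run()                        # 4. serve locally; deploy with the AgentCore tooling
```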

Thumbnail 1060

Thumbnail 1090

I talked about security. I'm not a security expert, and I'm kind of glad about that on some days. When you're building agents, you're using agents because you want them to be a little bit more autonomous. You're not wanting to build a hard-coded traditional application. Now, what about when you're dealing with sensitive information? That's my Social Security number there—don't grab it, please. You might be giving out credit scores, you're doing very sensitive transactions, and while your agent's being used by one user, there might be another user doing something similar with also sensitive data.

Thumbnail 1100

Thumbnail 1130

So I've got a question for you: who here, raise your hands high, is ready to stand with their security officer and say they're 100 percent sure that those conversations are secure? We've got one guy in the front row. I want to talk to you. This is a huge challenge. For me as a builder, this is where I wake up with a cold sweat. How am I going to deal with this? AgentCore Runtime does this out of the box: secure session isolation. No coding involved—it's just there. Safely execute those conversations. There's no risk of leaving temporary files lying around, no risk of getting permission escalations. All of that is taken care of for you.

Thumbnail 1150

You get multiple concurrent sessions, no problem. When the sessions are done, they're gone. This is using our micro VM architecture, so it's not just container-level isolation. It's literally a micro VM that ensures this is going to work, and it does so at scale and with low latency. This is something that I wouldn't recommend trying to build on your own. Okay, problem number one—let's put a checkbox on that. I think hosting with AgentCore has a pretty good story there.

Thumbnail 1180

Building Agent Memory: From Short-Term Conversations to Long-Term Personalization

What's the second biggest problem? In my mind, what separates maybe a toy agent from something closer to a real-world agent with real business value is whether it's able to learn, whether it's able to remember, whether it's able to improve. How do you make that happen? First of all, your agent better remember what's been discussed over the last few minutes of a conversation, and then ideally it's remembering what happened in the last few months as well.

This capability is called memory for an agent, and it needs to work reliably. Even if the agent needs to be restarted, that memory should still be there, and it needs to be secure as well. You could build this on your own with do-it-yourself memory. Many people have probably tried it, but I wouldn't necessarily recommend it. With AgentCore, you get a memory component out of the box.

Thumbnail 1260

Thumbnail 1290

What does that mean? Pick your favorite framework and just plug in AgentCore Memory. When you have that, your events and conversations feed right into short-term memory. It has low latency, it has security, and it's there for you to have good conversations. More exciting than that is automatic memory extraction into long-term memory. What does that mean? All of these conversations get fed into a process that runs behind the scenes and extracts knowledge. It extracts facts, it summarizes conversations, and it even identifies episodes. As of yesterday, we have episodic memory now built in. It will actually look across episodes, reflect, and get insights. All of these get put into what's called long-term memory.

Thumbnail 1300

This long-term memory can be easily plugged into your agent. So now instead of your agent having a long list of conversations that it's trying to take advantage of, you can selectively pull out long-term memory, manage your context, and make your agent learn from the past. That's hyper-personalization, and that's making a real-world experience for your customers while doing so fully managed, serverless, secure, and plugging into any framework.
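
As a rough sketch of that flow (the client and method names below are assumptions based on the bedrock-agentcore SDK, not verified signatures), an agent would write each turn into short-term memory and selectively pull long-term records back before responding:

```python
# Sketch of the memory pattern -- client and method names are assumptions based on
# the bedrock-agentcore SDK and may differ; treat this as pseudocode for the flow.
from bedrock_agentcore.memory import MemoryClient

memory = MemoryClient(region_name="us-east-1")
MEMORY_ID = "sales-assist-memory"        # created ahead of time (hypothetical ID)

def handle_turn(user_id: str, session_id: str, user_text: str, agent_reply: str):
    # 1. Record the raw turn into short-term memory; long-term extraction runs behind the scenes.
    memory.create_event(
        memory_id=MEMORY_ID,
        actor_id=user_id,
        session_id=session_id,
        messages=[(user_text, "USER"), (agent_reply, "ASSISTANT")],
    )

def recall(user_id: str, query: str):
    # 2. Before answering, selectively pull relevant long-term records into the prompt context.
    return memory.retrieve_memories(
        memory_id=MEMORY_ID,
        namespace=f"/users/{user_id}/facts",   # hypothetical namespace layout
        query=query,
    )
```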

Thumbnail 1340

Gateway, Policy, and Built-In Tools: Making Agents Action-Ready and Compliant

We have covered two checkboxes: memory and runtime. What's the next big challenge? To me, an agent is pretty useless unless it can take action. You have heard Matt Garman and Andy Jassy say that data is your differentiator, and that's really true. You could just use an off-the-shelf LLM and get generic answers, but that's not going to take you too far. Your agent needs your data, your APIs, your services, and it needs your APIs to take action as well. That's the power.

Thumbnail 1400

Why is that so difficult? We have cool things like Model Context Protocol that showed up late last year and is now used everywhere. But how do you put those MCP servers together, and how do you integrate with all of your existing capabilities? That's a lot of heavy lifting, a lot of time, a lot of wrapping, and how do you make that secure as well? All of these things add up to a lot of work. So how does AgentCore help? We have something called Gateway that automatically lets you map those existing resources and surface them as agent-ready tools. It exposes those APIs as MCP, and those MCP tools plug into Strands and CrewAI, LangGraph, whatever you're using, securely.

Thumbnail 1430

Let's look at it a little more closely. With Gateway, you can create as many of these gateways as you would like. When you create a gateway, you can then add targets to that gateway. You can add a bunch of API endpoints in there, maybe throw in a few Lambda functions, get a couple of existing MCP servers that have been built or maybe third-party ones, and now you have a gateway. You can hand that gateway to an agent builder and say these are your tools. Or you could have multiple gateways, and they can discover different gateways. Basically, now you have agent-ready tools, securely at scale, to plug into your agents. This is a massive boost in time to value.
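
A hedged sketch of that setup is below, creating a gateway and attaching an OpenAPI target through the control plane; the boto3 operation and field names are assumptions, and the role ARN, schema location, and identity-provider details are placeholders.

```python
# Creating a gateway and attaching an OpenAPI target -- a hedged sketch.
# The operation and field names below are assumptions about the control-plane API
# and may not match exactly; consult the AgentCore documentation for the real shapes.
import boto3

control = boto3.client("bedrock-agentcore-control")

gateway = control.create_gateway(
    name="sales-assist-gateway",
    protocolType="MCP",
    roleArn="arn:aws:iam::123456789012:role/AgentCoreGatewayRole",  # placeholder
    authorizerType="CUSTOM_JWT",
    authorizerConfiguration={
        "customJWTAuthorizer": {
            "discoveryUrl": "https://example.okta.com/.well-known/openid-configuration",
            "allowedClients": ["sales-assist-app"],
        }
    },
)

control.create_gateway_target(
    gatewayIdentifier=gateway["gatewayId"],   # response field name assumed
    name="pricing-api",
    targetConfiguration={
        "mcp": {"openApiSchema": {"s3": {"uri": "s3://my-bucket/pricing-openapi.json"}}}
    },
)
```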

Thumbnail 1500

The last cool thing on Gateway is this: you started with maybe a lack of tools, but you can easily get into a situation where you have too many tools. Hard to believe, but pretty quickly you can see hundreds or maybe thousands of tools. Guess what happens if you give 1,000 tools' worth of MCP metadata to your agent? You don't want to know. It's not very pretty. We have a built-in semantic search, so instead of handing the agent all 300 tools in this case, it can find just the ones it needs. As you add tools, they're automatically indexed, and you can call this MCP search capability.

Now you can do dynamic tool selection. Instead of doing 300 tools, give a handful of tools that are built for that context. Now you've got faster, cheaper, and more accurate. It's pretty rare to get a triple play here, but faster, cheaper and more accurate, all from doing semantic search.

Thumbnail 1560

There's more. Today, in the keynote from Matt, as Vivek mentioned, we introduced Policy for AgentCore Gateway. What's the big deal here? You can give permission for who can access the gateway and who can access tools. You can see which tool calls are requested and control which ones are allowed. But what you really want is something a little more granular, because what's happening here is your agent has more autonomy, and you're balancing the need for that innovation with the need for control. You don't want to risk getting on the front page of the Wall Street Journal with an agent making a $100 million mistake.

Thumbnail 1590

Thumbnail 1600

So you don't want to simply trust your agents to do the right thing, and you don't want to rely on your developers to code up the right controls by hand. With Policy, you're able to intercept every single tool call and apply policies that your administrators put in place and attach to these gateways. It applies those policies on the fly with low latency. You can define these policies in natural language, and they're verifiably correct, verifiably enforced, and deterministic. Although your agent is non-deterministic, which is a good thing, enforcing policies needs to be deterministic, and that's what these policies do for you.

Thumbnail 1630

Of course we've got observability because you want to know which policies were enforced, why they were enforced, why they get denied, why they get allowed and be able to audit that. This is keeping your agents in their lane, keeping them inbounds while still allowing innovation.

Thumbnail 1650

Thumbnail 1670

We've got a couple of built-in tools as well. Beyond just using gateways to get access to your MCP tools, what about the power of LLMs to generate great code? Why not be able to plug that into any agent and make it an instant data analyst? So here I'm saying in that Sales Assist app, how are my accounts doing? We can go grab some data, but are LLMs good at analyzing large amounts of data? Not necessarily. They're a lot better at generating code, and now you've got a secure sandbox that you just plug in as a tool, and your agent says, okay, I know where to run this, and you've got great results. You can generate visualizations, or whatever else you need, and the agent generates it on its own.

Thumbnail 1700

Thumbnail 1720

Secondly, although I mentioned earlier a lot of nice APIs and data sources, who here has a few legacy applications? Maybe they were built five years ago, maybe twenty-five years ago. They're still running mission-critical processes. So we're not going to reinvent those in order to build an agent. Why not use a browser and automate access to those? LLMs are great now at interpreting screenshots, and then you can have your agent click a certain field, navigate to a different screen, scroll down, scroll up, copy the data into another field. All of this is possible. We give you a headless browser, you just plug it in as a tool, and you're off and running.

Thumbnail 1750

Cross-Cutting Concerns: Security, Observability, and Evaluations Across the Platform

Now let's quickly cover three cross-cutting concerns. The first one is security. It's all well and good to come up with a great agent, but there are table stakes here. It's got to be secure, and this is a challenging problem. A user talking to an app, the app talking to an agent, that agent talking to another agent, and agents talking to internal and third-party tools. You'd better get that right. You can't afford to let the wrong user get access to the wrong data or take the wrong actions. So this is pretty difficult.

Thumbnail 1790

AgentCore, built into Runtime and Gateway, takes care of inbound and outbound auth. On the inbound side, you use whatever identity provider you'd like. Maybe it's Microsoft Entra, maybe it's Ping or Okta or Auth0. Plug that in,

Thumbnail 1810

Thumbnail 1820

and we take care of who the user is and whether they are allowed to use this agent. On the outbound side, we can determine if this user is allowed to use this agent and get access to these tools. We plug in again on this side with your credential providers. If you need Salesforce data, want ServiceNow or Workday, Jira, or whatever it is, there is either an API key or OAuth credentials, or maybe IAM in some cases. We automatically let you configure those as outbound providers. Then we make sure there is secure access end to end, either on behalf of a user or on behalf of an agent autonomously. This is super important, and please do not try to do this yourself. It is scary and complicated under the hood. I have tried to make it look pretty easy right here.
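
As a hedged illustration of the outbound side, registering a credential provider might look roughly like this; the operation and parameter names are assumptions based on the AgentCore Identity control plane, and the values are placeholders.

```python
# Registering an outbound credential provider so agents can call external APIs on a
# user's behalf -- a sketch; operation and parameter names are assumptions based on
# the AgentCore Identity control plane and may differ from the actual API.
import boto3

control = boto3.client("bedrock-agentcore-control")

# API-key style provider for an internal service (placeholder values).
control.create_api_key_credential_provider(
    name="pricing-api-key",
    apiKey="example-api-key-value",   # store real secrets in a secrets manager, not in code
)

# OAuth-based providers (Salesforce, ServiceNow, etc.) would follow the same pattern,
# using an OAuth2 credential-provider call configured with a client ID and secret.
```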

Thumbnail 1870

Another table stakes piece of the puzzle is observability. Never trust your agents. Never put an agent into production without being able to know exactly what it is doing. You have got to be able to go back and find out exactly what it did. Here is a request. Here is the plan it put together. Here are the steps that it took, maybe it redirected and tried something else. These are the inputs that it passed to the tool, and this is what it got back. Maybe it got back an error and it retried. You need every bit of that. Auditors are going to come in, legal is going to come in, and they will ask why did this agent do this. You have got to be able to have full observability. We have got that across all of these services.

Thumbnail 1920

Thumbnail 1940

We also give you dashboards so you can easily see not only typical metrics like latency and error rates, but also the end-to-end visualization: a hierarchical timeline, the full trajectory, the inputs, the outputs. It is all there. And in case you were wondering, yes, we have OpenTelemetry. So if you want to use Dynatrace, Datadog, Langfuse, or anything else that you would like, go for it.

Thumbnail 1950

Last, a cross-cutting concern. Arguably the most important one so far, drumroll please. How do you know how good your agents are? You have got to be able to measure whether they are doing the right thing. This includes whether they are being safe, whether they are being responsible, whether they are being polite, whether they have the right tone, whether they are giving back the right answers, whether they are making the right tool calls, and whether they are passing the right parameters. If you do not know the answers to these questions, you are not doing a good enough job for mission-critical agents in production.

Thumbnail 2000

Thumbnail 2020

Thumbnail 2040

Thumbnail 2050

So in today's keynote, we introduced AgentCore Evaluations. What does that do? In under a minute, you can go to the console, pick your agent, pick a set of metrics that you want to turn on, say where your traces are coming from, turn on a sampling rate, and then say go. Any traffic that shows up in that agent is going to get automatically evaluated. AgentCore Evaluations will evaluate each of those metrics on those traces and give you scores. And then not only will it give you a number or a label, but it will say why. It will tell you why it decided that was the score, you will see the reasoning, and that is all logged. You will get dashboards, and you can even do on-demand evaluations. This is huge and critical, and it was launched today in public preview in AgentCore.

Thumbnail 2080

Thumbnail 2090

This is the entire AgentCore platform. It will probably get bigger and better over time, but this is where it stands today. As Vivek mentioned, you can pick and choose here. These are composable services. Decide which ones you are having challenges with, kick the tires on those, integrate what you need, use any framework, any model, A2A, OpenTelemetry, MCP. All of the right acronyms are there. Let us see a quick demo here, and then I will bring up Sarbashish from Ericsson to dive in even further.

So here is the scenario. I have said any framework, any model, and I have said A2A, but if you are not sure if you really believe that is true, let us see it in action.
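
For anyone unfamiliar with A2A, the agent cards the demo browses are just JSON documents served at a well-known path. The snippet below sketches discovering a sub-agent's card over plain HTTP; the URL is a placeholder, a real call to an AgentCore-hosted agent would carry an Authorization header, and the well-known filename has varied across A2A spec revisions.

```python
# Fetching an A2A agent card to discover a sub-agent's skills -- illustrative only.
# The base URL is a placeholder; production calls would include an Authorization header,
# and the well-known path has varied across A2A spec revisions (agent.json vs agent-card.json).
import requests

AGENT_BASE_URL = "https://example-strands-subagent-endpoint"   # placeholder

card = requests.get(f"{AGENT_BASE_URL}/.well-known/agent.json", timeout=10).json()
print(card["name"], "-", card.get("description", ""))
for skill in card.get("skills", []):
    print("  skill:", skill.get("id"), skill.get("name"))
```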

Thumbnail 2110

Thumbnail 2140

Thumbnail 2150

Thumbnail 2160

Thumbnail 2180

Thumbnail 2190

In this example, I have an orchestrator using Google ADK and Gemini. You can see a couple of sub-agents, one using Strands and Bedrock models, another one using OpenAI and GPT-4, and they're communicating using A2A. We're using A2A, we're using AgentCore Gateway to access tools. We've got memory and observability. Here's the UI. We've got a JavaScript UI on the left showing you access to A2A agent cards. I'm handing it a task, asking it a question. It's actually delegating that to one of the sub-agents. It's using a different framework, a different model, and uses A2A to get there. It comes back with an answer. We're using A2A to have a conversation using multiple of those servers. At the end here we're showing AgentCore Memory as well. So we're browsing, showing you the short-term memory where the detailed conversations were tracked, and showing you the long-term memory where it learned some facts and was able to surface those. With that, let me hand it off to Sarbashish Das from Ericsson to talk about AgentCore.

Ericsson's Success Story: Reducing Network Engineer Research Time from Days to Minutes

Thank you, Mark. Hello everyone. You just heard about AgentCore and how it bridges the gap between proof of concept and production. Now I'm going to show you what this looks like in practice. My name is Sarbashish Das. I am a principal data scientist and tech lead at Ericsson. Within Ericsson, we are building a number of agentic solutions across different areas. Today I'm going to show you one of our agentic solutions that reduces our network engineers' research time from days to minutes.

Thumbnail 2270

Thumbnail 2280

Let me give you some context about Ericsson. If you think about global connectivity, Ericsson is at the heart of it. Right now, as I speak to you, 50% of the 5G traffic throughout the world passes through the technology that we build. With such scale come complexities. The question is how we keep our network engineers efficient in such a complex environment. In reality, our network engineers were trapped in knowledge silos.

Thumbnail 2320

Let me explain. A network engineer working on a feature needs to look through thousands of different documents just to understand what the feature is all about. Then they need to find out exactly where it has been implemented in our millions of lines of code, distributed across hundreds of different subsystems. You can imagine this takes days. Not because the work is hard, but because the information is fragmented and the knowledge is isolated. I can imagine that many of you have also seen this in your organization.

Thumbnail 2360

Thumbnail 2370

Thumbnail 2400

Our goal here is to fuse the information across the different knowledge silos to build a unified understanding of the system. Here is our solution. We have created a three-layered architecture to eliminate the knowledge silos. As you can see, at the bottom layer we have industry standards, our code base, and product information. This information is fragmented. We need to process these documents so they are ready to be consumed by the next layer, the knowledge layer. That's why we have developed an advanced data processing pipeline, in which we fetch the data from the different knowledge silos and pass it through a GenAI processing pipeline.

Thumbnail 2420

Thumbnail 2430

The goal of the GenAI processing pipeline is to fetch the data, clean it, and extract information from different modalities. Once this is done, the data is stored in different AWS storage services depending on the data type, such as an Amazon Bedrock knowledge base or a graph database. Once the data is ready, it flows to the next layer, the knowledge layer. This layer consists of three components. First, we have specialized agents, which are experts in different areas; we are running a number of these specialized agents. Second, we have specialized models, trained on our in-house data and code base. The third component is the developer tools that our network engineers use on a daily basis.

Thumbnail 2480

Thumbnail 2510

The knowledge layer is connected at the top to an orchestration layer, where agentic orchestration takes place and communication with the knowledge layer happens over the MCP protocol. The key here is that we are not just connecting different tools in this layered architecture. Instead, we are enabling system comprehension—a system that understands the whole workflow of our network engineers. Let me show you how this system looks when you deploy it in production using AgentCore. This is our architecture deployed on AgentCore. Our network engineers interact with the system using natural language, just like asking a question to a colleague.

A very common question people ask is: what is a hard timer? This is a very telecom-specific query. The next step is Amazon Q Developer, the front end, which is an agentic system that interprets the query and tries to identify what specialized knowledge is needed to answer it. Based on that, it securely routes requests through the gateway to different specialized agents that are deployed on AgentCore Runtime. AgentCore Runtime gives us the flexibility to deploy any open source framework, and these agents are serverless.

In this example, you can see that we have 3GPP agents, which are experts in telecom standards. We have a code generation agent that helps our network engineers generate code. This agent is powered by an in-house trained LLM. Ericsson builds its own silicon, and its architecture is proprietary information that the coding models available today are not aware of. Another example is the RAN System Design agent, which is an expert in how the system is designed and configured within the Ericsson context. All these agents sit on a strong knowledge foundation, shown at the bottom in the knowledge layer, which has been created using the advanced data processing pipeline.

The key here is that all these agents running on AgentCore Runtime do not work in isolation. Instead, they collaborate with each other, they set the context, and they build their insights based on each other's findings. We are also leveraging other AgentCore services that make this architecture enterprise-ready. For example, AgentCore Observability gives us the flexibility to check at a granular level how our specialized agents behave and why. Identity also gives us a secure way to maintain access control.

Thumbnail 2680

Thumbnail 2690

Now the question is how our network engineers use this system. As I mentioned, they generally interact with it using natural language. This is an example question: Explain how the hard timer works and elaborate on the downlink design. Identify and list the main implementation components in the code. As you can see, this question has multiple parts, asking about hard timers and also about the implementation in the code.

Thumbnail 2720

Thumbnail 2730

As you can imagine, a typical RAG solution is not going to give the answer that our network engineer is asking for. In our case, with our agentic solution, first the RAN System Design Agent kicked in, and it is a deep research agent. It looked through hundreds of different documents, identified what this hard timer is, and generated a deep research output.

Thumbnail 2740

Thumbnail 2760

This deep research output is then fed to the next agent, which is our AI code search. AI code search is able to find out from natural language exactly where the feature has been implemented by searching our millions of lines of code. Once it is done, it passes the output on to the next agent, the Ericsson Silicon LLM, which is another in-house trained LLM capable of explaining the code.

As you can see, all these agents do not work in isolation; they collaborate and share context. At the end, the network engineer does not just get a chatbot response. Instead, they get a system-level understanding, from the theory to the architecture to the implementation in the code, all connected, showing how the feature works.

Thumbnail 2800

Thumbnail 2820

Let me quickly show you two examples of the output. This is an output generated by our deep research agent. As you can see, it is very comprehensive documentation, and it also generates images on the fly that help our network engineers understand the concept easily. This is another one where our Ericsson Silicon LLM explained the code, and you can see that it is not giving just a syntactic explanation. Instead, it connects the code with the concept.

Thumbnail 2850

Now imagine a network engineer started working on one of these features and had to do all these things manually: looking at the feature, doing deep research on it, finding the implementation in the code base, and connecting everything. This takes, depending on their experience, three to five days. Using our solution deployed on AgentCore, we can get it in less than seven minutes. This is a 99% reduction in the research time for our network engineers.

Thumbnail 2870

We have built the system. The next question is how we evaluate it. For evaluation, we are using a dual approach. First, using an automated evaluation framework, we can easily check consistency and correctness. But it is also important to have human experts evaluate accuracy and whether the output meets our network engineers' expectations. Combining the human feedback we have received with the value the system delivers gives us confidence in its reliability.

Thumbnail 2910

While building this agentic solution, there were a lot of learnings for us, and let me share the important ones. First is perfect timing with AgentCore. We had been building this type of specialized agent for some time, and we reached a point where scalability was an issue for us. When we saw AgentCore, it felt like exactly what we needed.

Next is involving the domain experts early. What I mean is that if you are developing an agentic solution, my recommendation is to involve your domain experts from day one. For us, the feedback we received from domain experts about how our network engineers work, what their behaviors are, and what their pain points are helped us design these specialized agents.

Third is the importance of unifying the knowledge. When you connect your data and make a unified knowledge base from industry standards, code base, and your product documentation, you get a system-level understanding. That overall improves the accuracy of the answers that you get and at the same time removes any ambiguities.

Last but not least, infrastructure matters even more than we think. Using AgentCore, our network engineers focus more on developing the agentic structure and how the agent will behave. All the heavy lifting is done by AgentCore. As a result, the development life cycle improves significantly.

Thumbnail 3010

Thumbnail 3030

With AgentCore, we have moved beyond just answering questions. We have developed a system that understands how our network engineers work and think, and we are excited about what comes next. With that, let me close with a perspective from our organization head, Doug Limbo. By unifying the data and the information, AgentCore lets us build specialized agents that scale to our tens of thousands of network engineers. Thank you. Over to you, Mark.

Thumbnail 3060

Thumbnail 3070

Thumbnail 3080

Thumbnail 3100

Thumbnail 3110

Thumbnail 3130

Live Demo and Key Takeaways: Crossing the Chasm with AgentCore

Thanks, Sarbashish. Since I was so great at demo one, I'll do a quick pass on demo two here. I'm just going to kick it off and not try to click any mouse. What this is trying to show is that I get a lot of questions about whether, if I'm using a coding assistant, I can easily use AgentCore. Yes. What this shows is a scenario where I have an existing REST API with a URL to it and an API key. I kicked off a session with Kiro to say: take that existing API, give me a gateway that will expose it via MCP, give me a Strands agent to use that, and then deploy that agent to the cloud, write a client, test that client, and even write a load test for me and run that. So I gave it some context. I gave it the URL and I gave it the API key. Now it's exploring the AgentCore command line to understand what's offered. It figures out that I can do this. I can create a gateway. I see how to do that. I know I have access to the API, so it probes that REST interface and extracts the OpenAPI spec. Earlier you saw the Swagger user interface there describing the API. It's able to download that, pops it into a JSON file, uploads that to S3, and now it's ready to create the gateway.

Thumbnail 3150

Thumbnail 3160

It creates the gateway and here on the left I'm showing you the actual API spec. It was able to retrieve that on the fly and again I still haven't written any lines of code. I haven't touched any command lines at all. I just said, hey, go do this for me please. The gateway is already created. Here we're just waiting for the DNS propagation to complete. It takes about a minute, so it's got it all set up and configured. In about another ten seconds here it'll be ready. Once it's ready, then it's going to add the API target to the gateway.

Thumbnail 3180

Thumbnail 3190

Thumbnail 3200

Thumbnail 3210

Thumbnail 3220

Thumbnail 3230

So what is that all about? It has to configure an OAuth authorizer. In this case I've told it we should use Cognito. That's the standard that we were using in this environment. It could be whatever you'd like. It looks like it's already done it now, and now it's on to testing. So I asked it to test. Given that it's standard MCP, just use HTTP and do a list tools, then do an invoke tool on any of the available tools. So now the API is AgentCore ready. Now it's on to testing. So let's see what happens here. It writes a little code to try to test it out. Of course it's going to be successful, no problem. You'll see in a moment here it's going to list the tools. First it's getting an access token from the authorizer. You have to do a secure invocation here using the MCP protocol. And voila, there are all of the tools automatically coming back as MCP even though we started with just a REST API. So it's done that mapping. It's got it live. You can plug that into any agent, and here we tested out invoking a tool.
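
Roughly, the generated test does something like the following with the MCP Python SDK over streamable HTTP; the gateway URL, bearer token, and tool name are placeholders, and the token would come from the Cognito authorizer.

```python
# Listing and invoking gateway tools with the MCP Python SDK over streamable HTTP.
# The gateway URL, bearer token, and tool name are placeholders.
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

GATEWAY_URL = "https://example-gateway.example.com/mcp"   # placeholder MCP endpoint
ACCESS_TOKEN = "example-access-token"                     # placeholder bearer token

async def main():
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    async with streamablehttp_client(GATEWAY_URL, headers=headers) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])          # the REST API surfaced as MCP tools
            result = await session.call_tool("list_customers", arguments={})  # hypothetical tool name
            print(result.content)

asyncio.run(main())
```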

Thumbnail 3270

Thumbnail 3280

Thumbnail 3290

Thumbnail 3300

So we did an MCP call tool on one of the tools to list customers, and it was able to do that. Here it's retrieving the details on a particular order. So we have all of the basics here of managing orders based on existing APIs, and it's available as MCP now. So then it's creating a simple Strands agent. Any of these agent frameworks makes it easy to just plug in an MCP server. That's what it's doing here, and in less than a minute, it's going to have a running agent. See, it's actually executing right now: listing customers, listing orders, getting order details. So now the Strands agent is already working.
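
Plugging those same gateway tools into a Strands agent looks roughly like this; the MCPClient import path and helper names are assumptions that may differ by Strands version, and the URL and token are placeholders.

```python
# Plugging the gateway's MCP tools into a Strands agent -- a sketch; the MCPClient
# import path and helper names are assumptions and may differ by Strands version.
from strands import Agent
from strands.tools.mcp import MCPClient
from mcp.client.streamable_http import streamablehttp_client

GATEWAY_URL = "https://example-gateway.example.com/mcp"   # placeholder
ACCESS_TOKEN = "example-access-token"                     # placeholder

gateway = MCPClient(
    lambda: streamablehttp_client(GATEWAY_URL, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
)

with gateway:
    agent = Agent(tools=gateway.list_tools_sync())
    print(agent("List my customers and summarize their most recent orders."))
```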

Thumbnail 3310

Thumbnail 3330

Thumbnail 3340

Thumbnail 3350

We're only a few minutes into this job, and I still haven't done any work. That's quite nice. I like this. Now, what I'd like to do is deploy it in the cloud. It's great on my laptop, but I want it in the cloud so I can hand it to app developers and app builders. Kiro says, "OK, let me see what it takes." It checks out the commands and figures out that all I need to do is configure and launch. The configure step sees that I have the agent and knows how to package the code, upload it, and then create a secure endpoint for that agent. So that's what it's doing right now—packaging the code, getting it uploaded, and setting up the endpoint in the cloud. In another 10 seconds or so, that will be running.

Thumbnail 3370

Thumbnail 3380

Thumbnail 3390

Thumbnail 3400

Thumbnail 3410

Once it's running in the cloud, Kiro says, "OK, let me test the agent via the command line." We'll do an AgentCore invoke to prove that it works. Here it goes right now, it's doing an invoke. Let's see if that works. It's able to list the customers. You can get the details of a particular order, no problem. It has session management with secure session isolation. It can handle concurrency at scale. Just to prove that out, I asked it to write a simple client script to show that you can use it remotely from Python via Boto3. Then I had it write a little load test to spin up a bunch of sessions concurrently and prove that works as well.
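
A minimal version of that client script might look like the following; the boto3 client name, operation, and response shape are assumptions based on the AgentCore data plane, and the runtime ARN is a placeholder.

```python
# Invoking the deployed agent from Python with boto3 -- the client name, operation, and
# response shape are assumptions; the runtime ARN is a placeholder.
import json
import uuid
import boto3

client = boto3.client("bedrock-agentcore", region_name="us-east-1")

response = client.invoke_agent_runtime(
    agentRuntimeArn="arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/orders-agent",  # placeholder
    runtimeSessionId=str(uuid.uuid4()),   # each session gets isolated state in Runtime
    payload=json.dumps({"prompt": "List my customers and their open orders."}),
)

print(response["response"].read().decode("utf-8"))   # assumes a streaming body under "response"
```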

Thumbnail 3420

Thumbnail 3430

Thumbnail 3440

So there's a nice little demo of not writing any code, taking an existing API, making it available via gateway, writing an agent, hosting an agent in the cloud, all with the magic and wonder of AgentCore. Let me give you four quick takeaways. First, business value only comes when you're in production. Maybe it's obvious, but there is that big chasm, and you need to address that in order to start getting real value out of these agents.

Thumbnail 3450

Thumbnail 3470

Second, know what your agents are up to. Never put an agent in production if you don't have a good mechanism for observability. You need all of the detail there, and you need an easy way to get access to that. Then find a way to iterate and improve your agents, as well as troubleshoot and debug when things go wrong. Third, security is not optional. Scale is not optional. You're not building toys; you're building production-ready agents to deliver real value. You've got to have security and scale figured out.

Thumbnail 3490

Thumbnail 3530

And then lastly, don't waste time crawling around in that chasm. Use AgentCore and get a smooth path right over that chasm from POCs into production-ready, real business value. So with that, here are some resources. You've got great documentation out there, quick starts, tutorials, and workshops that are self-service, or you can get our help with them. We've got a pretty robust repository with tutorials as well, and examples of integrations, A2A, and multiple frameworks. If you're looking to learn more about agentic AI overall, there's a Skill Builder capability. And with that, I want to thank you all for coming. Hopefully this gives you a good idea of how to cross that chasm from POC to real production-ready agents. Thank you.


; This article is entirely auto-generated using Amazon Bedrock.
