🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Slack securely powers internal AI dev tools with Bedrock and Strands (AIM3309)
In this video, Slack's developer experience team shares their AI journey using AWS services. They evolved from SageMaker to Amazon Bedrock, achieving 98% cost savings and FedRAMP compliance. The team built Buddy Bot for documentation assistance and later adopted Claude Code and Cursor for coding assistance, resulting in 99% developer adoption and a 25% increase in PR throughput. They handle over 5,000 escalation requests monthly. The presentation details their transition to agentic workflows using Strands Agents as an orchestrator with Claude Code sub-agents, integrated with Temporal for workflow management and MCP servers for tool access. The architecture maintains conversational state, enables parallel sub-agent execution, and provides model-agnostic flexibility for future scalability.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Slack's AI Journey with AWS at re:Invent
Hello everyone. Nice to see you all here. I hope everyone got to Vegas safely and is ready for a great re:Invent. My name is Prashanth Ganapathy. I'm a Senior Solutions Architect with AWS. I've been with AWS for about five years now. I just completed five years, and I've been in the solution architecting role for about twenty years. My claim to fame is that I was doing AI/ML before it was cool, about three years ago, helping customers at AWS with AI/ML technology since I joined in 2020.
I'm excited to be here today and share Slack's journey with their developer experience group and their internal developers, how they used generative AI and then eventually agents over the last couple of years, and what kind of success they saw. That's what we'll be sharing today. With me are Srivani and Mani from AWS and Slack, and I'll let them introduce themselves. Thank you, Prashanth. Hello everyone. I'm Srivani Bethi. I'm a Staff Software Engineer on the DCP AI team at Slack. I've been with Slack for about seven years, and I've been on the DCP AI team for about three years. Today I'm very excited to share some of our learnings and journey in developing AI tooling.
Awesome. Well, thanks, Srivani. Mani here. I lead the strategic ISV accounts for Gen AI. I've been working with AWS for five years, and throughout this entire journey I've worked with some of our largest ISV providers and our product team. Basically, I help customers put their ideas into actual work and then help our product teams develop their roadmap. Consider me as a bridge between the customers and our product teams. Today I'm here to share all the good work that we did with Slack. That's what this session is going to be all about.
Before we get started, quick show of hands: how many of you checked your Slack before coming in? Exactly. Slack is where the work truly happens. We get into conversations. This is where ideas take shape and become decisions, and decisions become actions. Every idea that turns into action comes from teams talking and building in Slack every single day. Because of that, Slack can never slow down. They have to be fast, reliable, and secure, all of this at scale as Slack continues to grow across enterprises around the world.
As Slack continues to grow, innovation becomes absolutely critical. That's where AWS comes in. AWS helped Slack develop and build things faster and unlocked new levels of innovation. In this session we'll talk about how Bedrock became the foundation that powers and governs this next chapter for Slack. I love the quote from Andy Jassy which says that together AWS and Slack are giving developer teams the ability to collaborate and innovate faster.
Here's a quick look at what we're going to cover today. First, I'll share a little bit about Slack's developer experience AI journey, that is, the Slack team using Bedrock. Then we'll talk about how Slack rolled out the code assistant tools, followed by the real impact, because I think that's important to the developer community. Then we'll explore how Slack is now moving into agents and Strands, and finally we'll close with the road ahead. By the end of this session you'll see not just what Slack built, but why and how they built it. I promise no pop quiz at the end, but hopefully you'll get some good ideas that you can take back and implement in your work life.
The Developer Experience Team and AWS Technology Stack
Now this slide is really about the people behind the Slack developer experience team, the team that built this integration. It exists to make the life of Slack engineers better. They bring AI closer to developers and remove the friction from everyday work. While we use Slack for our day-to-day work, how do we make developers more agile and more productive? This journey started with the Slack developer team building something called Buddy Bot to help with documentation and help developers answer questions more quickly. Today this team touches everything. They own the build and release tools, they run the testing infrastructure, and they develop tools at scale. The team is around seventy to eighty people, and they power all of Slack.
What makes this team particularly powerful is their approach: they start internally first by building something and putting it to use as a proof of concept or for internal usage. They then roll it out to a smaller engineering team, and once they prove it successful, they expand it to a larger audience. This team extends their impact further to the Salesforce organization and is a huge reason Slack can move fast and roll out changes very quickly without breaking things.
Before Srivani and Prashanth talk about how Slack implemented the solution, I wanted to quickly level set on the AWS stack that the team used to build this entire solution. At the foundation level, we have Amazon SageMaker and AI compute that developer teams can use to build, train, and deploy custom models while managing their own data and MLOps. Above this layer is Amazon Bedrock, a fully managed foundation model layer. This is where teams can move faster and experiment. Bedrock gives you a choice of leading model providers. It includes guardrails, has knowledge bases for retrieval augmented generation, and offers multiple flexible hosting options (pay as you go or pay upfront), all in a secure way that scales very quickly.
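As a rough illustration of what "fully managed" means here, a minimal sketch of calling a Bedrock-hosted model through the Converse API with boto3 might look like this (the model ID is an example; it depends on which models your account has enabled):

```python
import boto3

# Bedrock exposes hosted foundation models behind a single runtime API,
# so there is no endpoint or cluster to provision (unlike SageMaker).
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    messages=[
        {"role": "user", "content": [{"text": "Summarize this deploy runbook in 3 bullets."}]}
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```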
We also launched AgentCore, which handles a lot of the plumbing underneath, like runtime, identity, memory, and observability. All of this can be done through AgentCore while the team focuses on the agent technology work. On top of these are our SDKs for agents, which are more like frameworks that we will talk about today, including Strands Agents, and we also have our first-party model family, Amazon Nova. On top of all of this, finally, is the layer of applications like Kiro and Quick Suite, which you use if you want to plug AI directly into your applications. That's the complete stack, and we will be referring to some of these as we talk through the presentation.
Timeline of Slack's Evolution: From SageMaker to Strands Agents
In simple terms, SageMaker is where you build models, Bedrock is where you can scale safely, Strands is where you bring agents to life, and the top application layer is where you deliver. With this in mind, let's move on to see how Slack built this story. One of my favorite slides shows how Slack actually built this step by step. They started way back in Q2 of 2023 with SageMaker. At that time it was all about learning. Generative AI was picking up and everybody was excited, so they started experimenting and prototyping. One deciding factor with SageMaker was that Slack has a really strict FedRAMP compliance requirement, and SageMaker gave them that option.
When it came to Q3 2023, they ran an internal hackathon, and this is where things got real. The teams experimented, built prototypes, and even created things like huddle summaries, which you may have seen in Slack, as part of this hackathon. This phase was all about proving the art of the possible and what you could do with it. Then in Q1 2024, Slack moved to Amazon Bedrock because Bedrock was now FedRAMP compliant. All the latest Anthropic models were available on Bedrock, and infrastructure got easier. They really didn't have to take care of infrastructure with Bedrock. Everything was taken care of, and all the undifferentiated heavy lifting was done by AWS. They saved nearly 98 percent of their costs when they moved to Bedrock.
As they got familiar with Bedrock, in Q2 of 2024 they shipped their first bot, called Buddy Bot, which helps with documentation and helps developers find things more easily. The good part is that while they were building this, they started using knowledge bases, so nobody had to manage their own vector stores anymore. Developers just got better embeddings, better knowledge bases, and faster answers.
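For a sense of what that looks like in code, here is a hedged sketch of querying a Bedrock knowledge base with boto3; the knowledge base ID and query are hypothetical:

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

# Retrieve the most relevant chunks from a managed knowledge base;
# Bedrock handles the embeddings and vector store behind the scenes.
response = client.retrieve(
    knowledgeBaseId="KB1234567",  # hypothetical knowledge base ID
    retrievalQuery={"text": "How do I request access to the build system?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
for result in response["retrievalResults"]:
    print(result["content"]["text"])
```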
As Buddy Bot matured, by Q1 2025 the developers, being developers, started asking for coding assistance. They said, "Can we go further? Can we build coding assistance?" And that is where they started experimenting with Cursor and Claude Code. Since Anthropic models were available on Bedrock and were the foundation of everything they had built, it was very easy for them to adopt Claude Code and Cursor right on Bedrock.
Then in Q2 of 2025, Slack took a critical step toward building agents, because agentic technology was the new way to access data through MCP servers. They didn't rush into it; they wanted to spend time learning and take it deliberately, without falling into analysis paralysis. They built their first MCP server and laid the foundation for building agents. Then in Q3 2025, Slack introduced Strands and the Escalation Bot, which Srivani is going to talk about in more detail: how they moved from Buddy Bot to the Escalation Bot using Strands and agents.
Strands is an open source, model-agnostic, and flexible framework that we have. If you're on an agentic journey, we would love for you to use it, and we'll talk more about it. The biggest takeaway here is that Slack did not get stuck in analysis paralysis. They kept experimenting, they kept shipping, and they kept learning from it. Today they have scaled from processing a few hundred thousand tokens per minute to millions of tokens per minute. Their AI is basically sprinting now instead of jogging, thanks in part to Anthropic's one-million-token context window.
Why Slack Standardized on Amazon Bedrock
Now the question is: why did Slack ultimately standardize on Bedrock? The first thing they loved is that it's a unified platform across AWS, one place to build, scale, and govern everything. Second, and this is job zero for us, is the built-in security within Bedrock, the guardrails, and the governance we have in place. Guardrails, security, and compliance were all built in as part of Bedrock.
Third was the massive scalability. Slack isn't running one AI use case. It is running multiple AI use cases across many different organizations with hundreds of thousands of employees, and all of this happened without Slack needing to worry about the infrastructure. That's where the move from SageMaker to Bedrock really helped them. Bedrock let them focus truly on what matters: building amazing developer and user experiences around it.
Now let's bring this home. Before I hand it over to Srivani, in terms of impact, with AI-assisted coding using Cursor and Claude Code on Amazon Bedrock, Slack completely changed how fast ideas were turned into shipped features. Quick show of hands, how many of you think productivity would have gone up by 20%? There's one. Maybe 50%? Is that too big of a number? Here's what we know for sure: it accelerated developer productivity, it empowered the teams to innovate faster, and it massively reduced prototyping time. But instead of guessing the numbers, I'm going to hand it over to Srivani, who is going to show you the actual numbers and the real results behind this infrastructure.
Measuring Impact: 99% Adoption and 25% PR Throughput Increase
Let me start with a question about measuring impact: how many of you here have some kind of metrics to measure AI's impact on developer productivity? Quite a few. We're all talking about AI, but one of the hardest questions to answer is: is it actually helping? To answer that, we need to know what to measure and how to measure it. So we initially started with two foundational metrics.
The first is AI adoption, meaning engineers are adopting AI tooling. This is the first sign that it's relieving pain in one of their workflows. The second is impact on developer experience metrics like DORA and SPACE metrics. To measure these, we needed data from multiple sources, so we used OpenTelemetry and host metrics, which we plumbed into all of our AI tooling to get usage metrics and information about AI tool calls. We also measured metrics from GitHub, our source code repositories, including pull requests and commits that are co-authored by AI and carry some kind of AI signature.
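The talk doesn't show the instrumentation itself, but a minimal sketch of counting AI tool calls with the OpenTelemetry Python SDK might look like this (the meter and attribute names are hypothetical):

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Export metrics periodically; in production this would point at an
# OTLP collector rather than the console.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("ai.tooling")  # hypothetical meter name
tool_calls = meter.create_counter(
    "ai_tool_calls", description="Number of AI tool invocations"
)

# Record one data point per AI tool call, tagged with the tool used.
tool_calls.add(1, {"tool": "claude_code", "team": "dcp-ai"})
```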
These signals don't give us a perfect metric, but overall a good estimate of how developers are impacted when using AI. Let's talk about the overall developer impact here. With all of the tools we have rolled out, we have seen consistent week-over-week increases in adoption, and we've also seen consistent usage of these tools month over month for a few months now. Currently, 99% of our developers are using some kind of AI assistance, which is huge.
Once we got the adoption numbers we were looking for, we started looking into developer productivity metrics. We started with PR throughput and observed that in some of our major repositories we're seeing a consistent 25% month-over-month increase in PR throughput. There are also other metrics we are looking into. The other metric is the AI Assistance Bot, which was just discussed. It's a bot we rolled out to help our engineers with knowledge search and with escalations. Escalations at Slack happen in Slack: when users have questions, they come into escalation channels and escalate them to the appropriate teams. This was causing a lot of on-call fatigue for our engineers, so we rolled out AI assistance to ease the on-call pain. Currently, our AI Assistance Bot is handling over 5,000 escalation requests per month.
The final and most important metric is the qualitative one: direct developer feedback. The feedback we've been receiving confirms that these tools are actually helping developers. Of course, AI is not perfect. We are also seeing downsides, including an increase in peer review time. As AI helps engineers write more code, the surface area for review increases and creates more load for reviewers. We're actively working to reduce review time with AI assistance as well. We are hoping to apply AI to ease developer pain across the entire development cycle.
Overcoming Experimentation Fatigue: Bedrock's Transformative Benefits
We have seen the metrics: 99% adoption and a 25% PR throughput increase. However, the path to get here wasn't a straight line. Like many of you, we started our AI journey with experimentation about three years ago. We built our initial capabilities using SageMaker and other tools. This gave us maximum control, but it also came with a huge hidden cost of infrastructure maintenance. Our breakthrough came when we adopted Amazon Bedrock. This wasn't just a technology change; it was also a philosophy shift for us. Bedrock instantly simplified our infrastructure, handling all of the LLM scaling and infrastructure maintenance for us. This change immediately addressed our critical success factors for adoption, like security: Bedrock allowed us to keep everything secure within our own AWS accounts. It also enabled adoption by making LLMs available through a proxy API, which allowed us to give all of our developers LLM access to experiment with AI.
Another advantage we got from switching to Bedrock is observability. The built-in native observability of Bedrock with CloudWatch logs, metrics, and alerts helped us gain insights into our LLM usage.
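Because Bedrock publishes invocation metrics to CloudWatch automatically, alerting can be a single API call. A hedged sketch (the alarm name and threshold are illustrative):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Bedrock emits metrics such as Invocations and InvocationClientErrors
# under the AWS/Bedrock namespace with no extra instrumentation.
cloudwatch.put_metric_alarm(
    AlarmName="bedrock-client-errors",  # illustrative name
    Namespace="AWS/Bedrock",
    MetricName="InvocationClientErrors",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=10,  # illustrative threshold
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```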
Through our journey, one of the main challenges we faced was experimentation fatigue with different LLMs and tools. The AI landscape is changing so fast, and we are struggling to keep up. We realized that constantly rolling out new competing internal features was only causing confusion to developers and also creating maintenance overhead for us.
To combat that, we doubled down on a high-impact tech stack: Amazon Bedrock and Anthropic models and tooling. We drove adoption by integrating tools like Claude Code and Cursor with Bedrock. In fact, I think we were one of the first companies to roll out Claude Code early in Q2 of this year. The goal here was to create a seamless experience that maximizes throughput and reduces decision fatigue for our developers.
Moving Beyond Ad Hoc Workflows: The Case for Agentic Architecture
Now I'm going to hand off to my colleague Prashanth to talk about agentic frameworks. Those were some insightful learnings and impressive statistics from Srivani. As you all heard from Srivani and Mani, Slack started their journey in AI developer tools using Amazon Bedrock and saw some impressive early success. But I'm here to talk about the next stage: the agentic journey. This is the year of agents.
How many of you here are exploring agents in your organization? Quite a few. And how many of you have agents running in production? Some of you; we should talk later and learn from you. So the key question is: why agents? Beyond the obvious answer that everybody else is doing it, for Slack there were some specific reasons.
As they were using these coding assistant tools and calling the LLMs through APIs, a lot of their actions were ad hoc workflows. For example, when an issue was going on, they would take the logs, dump them into Claude Code, and ask what's happening, and Claude Code would answer. But that's an ad hoc workflow. Now imagine on-call engineers getting lots of these requests. They want automated runbooks to be running.
They want to take that to the next step, where agents process the request, choose the right tools, make decisions, and then do the analysis and remediation. So moving from ad hoc to automated workflows was one reason, and they felt they were already doing this and should just extend it. The second part is that plain LLM calls in ad hoc flows don't do any complicated reasoning. They do retrieval and maybe some post-processing, but they're not planning and not adapting.
That's another reason to go toward agents, especially in their environment, where they are fixing things on the fly and reacting to issues as they happen. Also, Slack has built a lot of tools and data sources on top of AWS services, which they use for their data pipelines, CI/CD builds, and log collection. To take advantage of all of this in a dynamic fashion, they would need to build some sort of standardized access to it.
So agents would work with something like MCP, the Model Context Protocol. If you haven't heard of it, you'll most probably hear about it this week. Using a standardized protocol to build out connections and being able to dynamically use these tools and data sources is another reason to build agents. They were already doing all of this in an ad hoc fashion and decided they should standardize and automate it. That was the key reason for building agents.
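To make the idea concrete, here is a minimal sketch of an MCP server using the official Python SDK's FastMCP helper; the log-fetching tool is a hypothetical stand-in for the kinds of internal tools described above:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-log-tools")  # hypothetical server name

@mcp.tool()
def fetch_recent_logs(service: str, minutes: int = 15) -> str:
    """Return recent log lines for a service (hypothetical stub)."""
    # In a real server this would query the log pipeline; here we
    # return a placeholder so the sketch stays self-contained.
    return f"[stub] last {minutes} minutes of logs for {service}"

if __name__ == "__main__":
    # Serves the tool over MCP so any agent speaking the protocol can call it.
    mcp.run()
```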
They were already heavily using Claude Code. One of the key things about Claude Code is that it has been gaining a lot of agentic capabilities, especially in the latter half of this year. Features like Claude Code sub-agents and planning capabilities have come out, and now skills have been released. They've also built an SDK, so Claude Code is effectively an agent with an SDK that you can use as sub-agents for various tasks. Instead of building specialized agents from scratch, which is very complex and hard to perfect in production, Slack started using Claude Code sub-agents for a lot of the specialized tasks they were running into. That was the first part.
The second part was accessing the variety of tools and data sources they have. They started building out their own MCP servers, and they also learned from us. We built an MCP server for EKS, and they learned how to use that for some of their use cases. The goal was to access all of these tools and data sources in a standard way without having to think about which API to use. They wanted to standardize it. Finally, they were looking at various agentic frameworks for integration, and that's where Amazon came in. We'll talk about Strands Agents as we go along, but that's how they started exploring agents.
So what are they doing right now? One approach is instead of taking a giant leap and building a super agent, they're taking existing workflows which they built with LLM integrations and enhancing them to add more capabilities using these agentic workflows. They're also exploring new use cases in DevOps environments as well as incident management. These three steps might seem small, but they're making progress and putting things into production, which is a very key step. A lot of times we see customers getting stuck in analysis paralysis, and they're moving forward with these small steps.
So the key question, when I was talking to Srivani some time back, was this: Claude Code and sub-agents are so powerful, and you can create automations with them pretty easily. It's SDK based. So why look beyond something that is so powerful and meets most of their needs today? That's a key question to ask. We should not adopt a new technology just because it's there. It should serve a purpose.
We discussed that, and there were key reasons. One is that while it's a great tool, it has its own system prompt, and even though you can add your own user instructions, it can get expensive and can be less predictable depending on what you ask it to do. Also, for Slack and for everybody, you should be model agnostic. Today it's Anthropic; tomorrow it could be something else. We are so early in this journey that we don't know what's going to come out, so don't get locked into one technology. Being model agnostic is another part of it.
Right now people are in exploratory phases, so cost is not much of a consideration. But as you roll this out into production and usage goes up, cost will become a big factor. You may want to say: for this specialized task, why am I spending so much money? I want to point it at a cheaper LLM. If you're locked into Claude Code, you may not be able to do that. The other part is an idea which Srivani will talk about: the orchestrator agent. Claude Code has the ability to act as its own orchestrator, with planning and thinking capabilities, and then direct its sub-agents. But then you're all in on Claude Code.
What if you abstract the orchestrator away from Claude Code and use what it's really good at, which is the sub-agents doing specific tasks? Once you do that, this orchestrator agent, which you have built using an open source technology, can point to Claude Code sub-agents today but point to something else tomorrow. That's the key: abstract it out and control what you're accessing, keeping that within your framework. That was another reason.
Introducing Strands Agents: Open Source Framework for Production-Ready AI
So finally, by doing all of this, you can create a model-agnostic agentic framework which will future-proof your production deployments. That was the key. We went through this discussion and then moved into the journey of the agentic world. That's where Strands Agents comes in. I'll talk about what Strands Agents is, but before we dive in, I wanted to discuss why Amazon built Strands Agents and open sourced it.
While the potential of agents is exciting and everybody wants to build agents, I've tried to build agents, and we've all been trying to build agents. At some point you realize it's more complex than you think. If building agents turned out to be more complex than you expected, raise your hands. Many of you have not raised your hands, so I guess it's simpler than we think it is.
A key challenge developers face is the steep learning curve in building reliable agents. Working with some of our customers, we saw that they could make simple use cases work. But as things got complex, with new features and technologies coming out every week, it's very hard to keep up and decide whether something is good enough or whether you should wait for the next thing to come out. That was happening a lot.
Enterprise and production readiness is another challenge. As you build things, most of them will work great in POCs and demos. But as soon as you take them to production, there's a whole other set of criteria you have to meet, and that takes a much longer cycle. We saw that happening frequently.
Complex orchestration logic is another issue. A single agent, or an orchestrator calling one or two agents with a couple of tools, works great. But as soon as you scale to thousands of agents, especially if building agents is not your company's core business, it becomes much more complex. We'll talk about some multi-agent patterns, and it does get complex at that stage. For production, it can be challenging.
There's also a lack of visibility, lack of controls, and not enough flexibility. These are common challenges in most distributed systems, and you see that with agents as well. Since this is a really early stage in this technology, we wanted to keep it in a framework which is open source. That's why we open sourced Strands Agents.
Strands Agents is open source. Initially we released a Python SDK, and you may hear some announcements at re:Invent. The Python SDK lets you build agents with just a few lines of code, and I'll show you some examples as we go along. It's simple to use and eliminates the need for complex agent orchestration. It's a code-first solution built with builders and developers in mind. You define a prompt, select a list of tools, select the LLM, and let it go. That's how easy it is.
By open sourcing, we aim to provide developers with powerful and flexible tools to build agents in the rapidly evolving agentic landscape. Now that I've introduced Strands Agents, what are its key features?
One is model and deployment choice. It's open source, and while the default LLM provider is Amazon Bedrock, you can choose third-party or custom providers, and we keep adding more that can be used as the agent's LLM. We're not restricting you from choosing the LLM of your choice, and you're also able to deploy it anywhere, in any production environment. We're not restricting that either.
It's highly flexible, with built-in guardrails. It connects to AWS guardrail features but also to external guardrail offerings. It has native observability and monitoring built in, with metrics that you can stream out. If you use AgentCore, it connects into AgentCore automatically with these metrics, so it's very easy to get visibility and traces of the complex agentic flows that happen.
Another feature is MCP integration. The Model Context Protocol has become something of an industry standard for connecting to data sources and tools, and we provide that as well. There are a lot of built-in tools in Strands Agents itself that you can use for many tasks, and you can add custom tools. We have integrations with Mem0, Ragas, Tavily, and Temporal. There's a Temporal and AWS open source session happening elsewhere at re:Invent that you should try to attend. We're integrating a lot of these third-party services to make it highly flexible.
These are some of the broad capabilities that Strands Agents has. Before we go deeper into how Slack used Strands Agents to improve their agentic workflows or add agentic workflows, I did want to talk about the multi-agent patterns that exist within Strands Agents.
You can see four of them. I'm going to start with the three on the right, beginning with swarm, and then come back to agents as tools. Swarm is, as you can imagine, collaboration between multiple agents, with communication patterns, shared memory systems, and coordination.
A graph, as the name suggests, has agents as nodes and the ways they communicate with other agents as edges, where you explicitly define how they communicate. You can build out a graph workflow pattern. The final pattern is the workflow, a structured way of defining how one agent performs a task and passes it to the next agent, and so on. Those are three of the patterns we have. The fourth one, on the left, agents as tools, is the most interesting to us right now because that's what Slack is using.
I mentioned the orchestrator agent; in the agents-as-tools pattern, the orchestrator handles user interaction and decides which specialized tool to call. These tools can themselves be agents, which is why it's called agents as tools. The specialized agents perform those tasks and hand the answer back to the orchestrator, which decides how to respond to the user. As Srivani will describe later, they use Strands as the orchestrator agent and Claude Code sub-agents as the specialized agents performing the specialized tasks.
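A minimal sketch of the agents-as-tools pattern with the Strands SDK, assuming its `@tool` decorator and `Agent` class; the specialist prompts and names here are hypothetical, not Slack's actual agents:

```python
from strands import Agent, tool

@tool
def kb_agent(question: str) -> str:
    """Answer questions from the internal knowledge base."""
    # A specialized agent wrapped as a tool; the orchestrator sees it
    # like any other tool and delegates knowledge-base questions to it.
    specialist = Agent(system_prompt="You answer questions using internal docs.")
    return str(specialist(question))

@tool
def triage_agent(issue: str) -> str:
    """Triage an escalation and suggest an owning team."""
    specialist = Agent(system_prompt="You triage escalations to the right team.")
    return str(specialist(issue))

# The orchestrator handles user interaction and decides which
# specialist "tool" (agent) to invoke for each request.
orchestrator = Agent(
    system_prompt="Route each request to the most appropriate specialist tool.",
    tools=[kb_agent, triage_agent],
)
orchestrator("Our deploy is failing with a 403 from the artifact store.")
```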
Before we get into the architecture discussion that Srivani will walk through, I want to explain what a Strands agent is. It's a very simple concept. At its core, Strands runs what it calls an agentic loop. The agent receives a prompt and context along with a description of the available tools. The model then reasons about the task and decides whether to respond directly. If it can respond directly, it doesn't need the tools. Otherwise, it plans a series of steps, reflects on previous actions, selects the tools it needs, and executes one of these steps.
Once it gets back the response from the tool, it decides whether the task is actually complete, or else it repeats the cycle until the task is done. That's the basic nature of Strands. Before I hand it back to Srivani, I wanted to leave you with a couple of simple examples of creating Strands agents: first with model choice, and then with tools, so you can see that with a few lines of code we're able to create an agent that uses the default provider, Bedrock in this case, with a Nova model, and ask it a question.
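A hedged reconstruction of that first example, assuming the Strands Python SDK's `BedrockModel` wrapper; the Nova model ID shown is illustrative:

```python
from strands import Agent
from strands.models import BedrockModel

# Bedrock is the default provider; here we pick a Nova model explicitly.
model = BedrockModel(model_id="us.amazon.nova-pro-v1:0")  # illustrative model ID

agent = Agent(model=model)
agent("What should I check first when a CI build times out?")
```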
Similarly, you can attach tools. There's a whole host of tool categories available, and you can attach these tools to the agent to create a simple agent; in this case it uses the HTTP request tool to answer a specific question, as sketched below. These are simple examples, but you can see that very few lines of code are needed to get started. I wanted to highlight how easy it is to get started with Strands. Hopefully you now have a good understanding of why Slack went toward agentic workflows, why they selected Strands as an exploratory direction, what it is solving, and a little about Strands itself. That was my part of the presentation, and now I'll hand it back to Srivani to take it home with the technical deep dive, starting with the Buddy Bot and how they enhanced it with Strands, and then we'll get to Q&A. Thanks.
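And a comparable sketch of that tools example, assuming the companion `strands_tools` package that provides built-in tools such as `http_request`:

```python
from strands import Agent
from strands_tools import http_request  # built-in tool from strands-agents-tools

# The agent decides on its own whether the HTTP tool is needed to
# answer the question, following the agentic loop described above.
agent = Agent(tools=[http_request])
agent("Fetch https://httpbin.org/get and report the response status.")
```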
Technical Deep Dive: Building the Escalation Bot with Strands and Temporal
I'll walk through a technical deep dive on the evolution of our Buddy Bot and how we use Strands. Our story starts with a fundamental pain point: engineers spending a lot of time on escalations. The initial Buddy Bot architecture was designed to handle basic escalations using knowledge spread across different data sources. The data sources here include Slack data, meaning Slack messages and files, as well as data scattered across our GitHub repositories, like technical design documentation.
The first thing we did was perform a hybrid search to gather the relevant information across all of these data sources. Then we reranked those results to surface the most relevant data across the knowledge sources. We then provided the top most relevant documents, along with the user query, to the LLM to produce a more accurate answer to the escalation.
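A schematic sketch of that retrieval pipeline; every helper here is a hypothetical stand-in for the real retrievers, reranker, and LLM call:

```python
def keyword_search(query: str, k: int) -> list[str]:
    """Hypothetical BM25-style search over Slack messages and GitHub docs."""
    return ["keyword-doc-1", "keyword-doc-2"]  # stub results

def vector_search(query: str, k: int) -> list[str]:
    """Hypothetical embedding search over the same data sources."""
    return ["vector-doc-1", "keyword-doc-1"]  # stub results

def rerank(query: str, docs: list[str], top_n: int) -> list[str]:
    """Hypothetical cross-encoder reranker returning the most relevant docs."""
    return docs[:top_n]

def call_llm(prompt: str) -> str:
    """Hypothetical Bedrock call that produces the final answer."""
    return "stub answer"

def answer_escalation(query: str) -> str:
    # 1. Hybrid retrieval: union of keyword and vector hits, deduplicated.
    candidates = list(dict.fromkeys(keyword_search(query, 20) + vector_search(query, 20)))
    # 2. Rerank and keep only the top documents.
    context = rerank(query, candidates, top_n=5)
    # 3. Send the top documents plus the user query to the LLM.
    prompt = "Context:\n" + "\n---\n".join(context) + f"\n\nQuestion: {query}"
    return call_llm(prompt)
```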
This was our first design, which was working great. However, we ran into challenges with the initial design regarding maintaining the conversational history and executing external actions. We then evolved into the newer version you see here, a powerful agent that we have built and are exploring. It begins when a user sends a message and our backend receives an event. We start a Temporal workflow orchestration around that, which provides durability and maintains the conversational state across the entire escalation until it is resolved. This relieves the pain of maintaining conversations for all of the escalations in our applications.
The Temporal workflow then calls the main Strands orchestrator. The Strands orchestrator agent, which we built using the Anthropic Claude model, decides which sub-agents to call. The sub-agents have access to MCP servers to interact with our internal services. All of the sub-agents you see here were built using the Claude Code SDK. The orchestrator agent is Strands and the sub-agents are Claude Code. We chose Strands as the orchestrator so we could explore different LLMs, not just Claude Code sub-agents, as we are actively evaluating other competing models.
Once all of the sub-agents finish running, the main orchestrator agent receives the context and responses back, synthesizes and processes them, and validates the response before sending it back to the Slack channel. Let me show you the flow here. We are going to see a live demo of how this happens in Slack. When a user sends a question or an escalation in a Slack channel, our backend gets an event. Our backend then spins off a Temporal workflow, which you see here. This workflow has the context of all the Slack conversations or escalations that happen in that Slack thread. It then kicks off our orchestrator agent, which is written using Strands. Once the orchestrator agent receives the request, it spawns sub-agents through tool calls. As you see here, it is calling the triage agent and the KB agent, which are sub-agents we built using Claude Code. The good thing with Temporal is that it also provides visibility into all of the calls, and traceability as well.
Once the main orchestrator agent processes everything, it sends a response back to the Slack channel. When a user asks a follow-up question, all of the context of the previous conversation is maintained by Temporal, which eases the burden on the application itself of maintaining conversation history and state. As you can see, Temporal resumes the workflow whenever a user continues a conversation in a Slack thread; it just resumes the same workflow. Once it finishes, the response is sent back to the user. This workflow architecture simplified our code quite a bit, as we did not have to maintain conversation history or durable retries ourselves. All of that is provided by Temporal. With Strands, we were able to experiment with different sub-agents and were not stuck with just Claude Code.
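A hedged sketch of what such a workflow can look like with the Temporal Python SDK; the activity that invokes the Strands orchestrator is a hypothetical stub:

```python
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def run_orchestrator(history: list[str]) -> str:
    # Hypothetical: invoke the Strands orchestrator agent with the
    # full conversation history and return its synthesized reply.
    return "stub reply"

@workflow.defn
class EscalationWorkflow:
    def __init__(self) -> None:
        self.history: list[str] = []
        self.pending: list[str] = []

    @workflow.run
    async def run(self, first_message: str) -> None:
        self.pending.append(first_message)
        while True:  # a real workflow would exit once the escalation resolves
            await workflow.wait_condition(lambda: bool(self.pending))
            message = self.pending.pop(0)
            self.history.append(f"user: {message}")
            # Temporal retries the activity automatically on failure and
            # persists state, so conversation history survives crashes.
            reply = await workflow.execute_activity(
                run_orchestrator,
                self.history,
                start_to_close_timeout=timedelta(minutes=5),
            )
            self.history.append(f"bot: {reply}")

    @workflow.signal
    async def user_message(self, message: str) -> None:
        # A follow-up message in the Slack thread resumes the same workflow.
        self.pending.append(message)
```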
As you saw in our overview of the technical architecture, we upgraded our Buddy Bot from a simple search bot into a powerful agent. Some of the things we considered while building it are reliability and efficiency. First, we built a stable foundation. We used Temporal for reliability, as the bot never forgets the conversation, even during failures. Even when the backend dies, Temporal maintains the state in a database, so it resumes where it left off. Temporal also supports automated retries, so we didn't have to retry tool call failures in our application, which simplified our code quite a bit.
Next, we solved a crucial security challenge. We created remote MCP servers with an OAuth service, which integrates with our Uber proxy networking system. This ensures the bot can safely access sensitive internal systems like GitHub with the right permissions. Finally, we focused on making the bot faster, so we run all of these sub-agents in parallel. We also optimized token usage: each sub-agent's response is summarized before being sent to the orchestrator LLM to synthesize and confirm the final response, reducing token usage on the most expensive LLM calls. Strands also gives us extensibility for the future by remaining LLM agnostic.
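A schematic sketch of that fan-out; both helpers are hypothetical stand-ins for the Claude Code sub-agent calls and the summarization step:

```python
import asyncio

async def run_subagent(name: str, task: str) -> str:
    """Hypothetical: invoke a Claude Code sub-agent and return its raw output."""
    return f"[{name}] raw findings for: {task}"

async def summarize(text: str) -> str:
    """Hypothetical: compress a sub-agent response with a cheaper model."""
    return text[:200]  # stand-in for a real summarization call

async def handle_escalation(task: str) -> list[str]:
    # Run all sub-agents in parallel rather than sequentially.
    raw = await asyncio.gather(
        run_subagent("triage", task),
        run_subagent("kb", task),
    )
    # Summarize each response before handing it to the expensive
    # orchestrator LLM, keeping token usage down.
    return list(await asyncio.gather(*(summarize(r) for r in raw)))

print(asyncio.run(handle_escalation("Deploy failing with 403 from artifact store")))
```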
So, the road ahead: we've stabilized the architecture, solved the security problems, and optimized performance, but this is just the foundation. Our vision is much bigger than a single escalation bot. We would like to experiment with Strands use cases beyond escalations and integrate more internal tools to make our bots and agents more powerful. We're also exploring AgentCore and would like native integration between Temporal and Strands Agents for smoother execution and more granular retries. Our long-term goal is simple but ambitious: fully automated agentic workflows across the entire development cycle.
Here are some links to help you get started building agents. Thank you again. I'll keep this slide up for a minute so you can grab the QR codes, and I'm sure you'll run into a lot of these links throughout re:Invent. Please don't forget to fill out the survey in your mobile app, and then we will move to Q&A. If any of these topics look interesting, we have multiple sessions going on across the next two or three days, so please feel free to look around the different sessions and the keynotes. Swami will be giving an AI keynote; definitely watch that. A lot of new announcements are coming across Amazon Bedrock, Strands, and other services. Thank you.
; This article is entirely auto-generated using Amazon Bedrock.