🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Accelerate Developer Productivity with Amazon's Generative AI Approach (AMZ309)
In this video, Alex Torres and Steve Tarcza from Amazon discuss how generative AI is transforming developer productivity at Amazon. They reveal that developers spend only 30% of their time coding, with the remaining 70% on documentation, meetings, and ticket management. Steve introduces StoreGen, Amazon's internal AI startup, which achieved 4x feature delivery by implementing spec-driven development through Spec Studio and AI Teammate—a proactive AI agent that joins development teams, maintains persistent team memory, handles routine tasks, and even generates code implementations autonomously. The presentation demonstrates how Amazon moved from AI-enhanced tools to AI-native development, utilizing AWS services like Bedrock, Agent Core, and Strands SDK to build agents that automate security reviews, documentation, and cross-team coordination, ultimately aiming for 75% adoption across Amazon Stores teams by 2026.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
The 70% Problem: Reclaiming Developer Time with AI
Over the last year, there is one statistic that has made me think differently about the challenge of developer productivity. Most developers spend only 30% of their time actually writing code. The rest of their time goes to documentation, ticket management, and meetings. When you think about how AI works with developers, two years ago AI would allow you to autocomplete a line of code, helping you go faster when building. Today, AI can help you build an entire feature from a requirement. That is not just a small improvement; it represents a fundamental shift in how building software will look in the next few years and how software developers can reclaim that 70% of their time that was spent elsewhere.
Welcome to re:Invent. Whether you came in yesterday or today, thank you so much for joining us. Today, we are happy to share how Amazon is using generative AI to enhance developer productivity. I want to note that what we are sharing today is not just a product pitch. We are going to talk about what we have built, what we have learned, and how we have been measuring what we have put in production. My name is Alex Torres. I am a Sr Solutions Architect, and I am joined here by Steve Tarcza. He leads the team that is driving AI-native development within Amazon Stores, and he will be talking more about what his team has built and what they are driving.
I am going to be talking about the journey of AI. I will be level-setting where we are and how far we have come since AI was released. I will discuss how AWS enables builders internally and externally, and then we will dive into how Amazon is taking what is available today and what we have had to build to complement what was available when we started building things. We will leave with some takeaways that you can take back and maybe run by your teams and implement.
The Evolution of AI in Development: From Experimentation to Proven Business Value
Let us look at two years ago, in 2023, when AI launched. Everybody was asking themselves what this was and what a prompt was. Many people wondered if they needed to become prompt engineers. A lot of people started working through building proofs of concept and understanding the technology. AWS at that point launched Amazon PartyRock and Amazon Bedrock to enable experimentation, allow people to get familiar with the technology, and make it available.
In 2024, as we were building internally and our customers were building with AI, we started seeing more production deployments. Proofs of concept started becoming reality. We released the initial version of Rufus, our AI shopping assistant. We released Q Business, CodeWhisperer, and Q Developer. We started launching production applications, and the questions shifted. People asked how to keep costs low, how to think about security, and which projects to prioritize. As teams started building and deploying to production, we arrived at 2025, which I have been calling the year of proven business value.
The questions shifted again. People have all these AI tools, and they are awesome and can really increase productivity. But how do you make sure that people use them? How do you ensure that if people are using them, they are using them the right way? How do you make sure it is secure and compliant? These are things that fall by the wayside while you are experimenting, but now that workloads are in production, you need to keep them secure. I want to talk through this journey, how we have seen customers actually starting to build with AI, and where that has taken us.
Four Stages of AI Maturity: From Process Enhancement to Autonomous Agents
Most customers start their journey when they are building applications with AI or using generative AI to enhance some of the rule-based processes that they have in place.
On the right, you can see that processes have evolved. Think about having a process with five rules that you need to execute. Now AI runs them for you instead of manually doing them. The problem we see at this stage, which requires more human involvement, is that when anything unexpected happens, the workflow fails and you have to restart. Things don't work smoothly.
As models got smarter and companies moved forward with their AI journey, they transitioned to an assistant application. Think about chatbots with knowledge bases that democratize AI access a bit more. These can help you summarize a document, provide context, help you find information in a wiki, and do similar tasks. It's better, but it still requires quite a bit of human oversight.
This year with agentic AI, we've seen a shift from being an assistant or enhancing what you do to becoming a collaborative part of your day-to-day workflows. Think about moving from "summarize my document" to "help me prepare for my customer meeting tomorrow" or "write me a spec to build a new feature" or "these are my customer requirements, what can I do with them?" The agent starts thinking and figuring it out on its own. You don't have to give more instructions. They become smarter. That's the collaborative stage where most advanced companies are today.
Then we have the pioneers, the people really ahead of the curve. This stage is actually rare. It's where AI is autonomously working on its own, spinning up, assigning tasks to itself, running them, and checking with the human. But they only require high-level governance oversight. Most of you, I'd assume, are somewhere between the assistant and collaborator stages, using or building agents or working in the developer space. That brings us back to what we're discussing today.
From Vibe Coding to Spec-Driven Development: Standardizing AI-Generated Code
We moved from Q Developer, which really helped you understand a codebase or a function, to Kiro, which is able to understand your codebase and help you build features. That's the collaborative stage of AI developer agents. When it comes to agentic AI in the development world, we have to talk about where most of the time goes for developers. The reality is that 30% of the time typically goes to coding and development. The rest of it, the 70% of the time, goes to planning, documenting, going through reviews, optimizing things, handling escalations, and so on.
I want to split this in two and talk about the journey we've taken for coding, then come back to that 70% and hand it over to Steve to talk about what we're building. The way AI is changing software development is significant. Think about 2023 with CodeWhisperer. If anybody used that, it helped you complete code faster, making that 30% of your time more effective. We moved from that to being able to understand a larger codebase or file and generate more complex functions, and now to fully completing a task end to end. For example: build my UI and figure it out. That's where we are today.
The problem is that if there is no standardization and every developer does whatever they feel works, you end up with inconsistent results that sometimes really don't allow you to deploy that code to production. Has anybody deployed AI-generated code without reviewing it and had production come down? I know of a couple of instances. That's where we started with the idea of spec-driven development. I'm pretty sure most of you recognize our friend Kiro. The idea of spec-driven development is helping teams standardize the way they think. A feature description, a feature request, a GitHub issue can be written by anybody, in any form. But how does a team that is building a complex application take that?
Validate the user requirements. From those user requirements, from those specifications, build what you need to have a consistent technical spec, one that specifies how those things are going to be built, consistently for every team and every product that you do. And once you have that information, how do you make sure that it's built in order, that it doesn't break, that there's no orphan code, and that you use test-driven development? That's where the implementation plans and that consolidation come in. That's how we started thinking about moving from vibe coding to a more structured way that allows you to push to production.
All of this has been possible because we're actually at a tipping point right now: today, this year, this quarter. Think about how far the models have gotten. This year, we actually got tool calling working really well. The agents are able to recognize what they're doing, pick the right tool, and access external systems. There are a lot of MCP servers and tooling out there that weren't available before, and they enhance how our models and our agents interact with the things we do: opening GitHub issues, creating pull requests, reviewing your codebase, and so on. And as people build, there are a lot of open-source frameworks that mean you don't have to start from scratch. This allows faster experimentation and failing fast. You don't have to come up with how you're going to build your agents; you can reuse some of the code that's already out there, some of the frameworks, the tools, and so on. And this is really great for coding.
Beyond Coding: Addressing the Overlooked 70% of Developer Activities
However, going back to that 70%: there are a lot of overlooked productivity drains in a developer's day, and we've done a lot of studying of how our developers spend their time. Beyond the 30% that developers spend writing code, think about ticket management and status updates: go to Jira, go to Slack, come back, send, get everything going. That can easily take 20 to 30% of your time. Think about meetings. How many meetings do you have a week that could have been an email? How many meetings really drive value for what you're actually building or pushing forward?
If, on top of providing status updates, managing your ticket queue, and going to meetings, you add the process complexity of architectural approvals, security reviews, security approvals, finance, and so on, you are already at that 70%. Let alone when the developer who knew how something works has left and the code is not documented, or you're using a new API with no documentation, and now you have to figure out how those things work. How much time goes into learning something you've never used before? Maybe you inherited a package written in a language you've never seen before. How much time are you going to spend learning how to do those things? This is where AI actually shines: reducing the friction of everything that is not writing code.
So, from that perspective, how can agentic AI help developers? Software developer agents, yes, we've talked about them, but you can specialize them. You can have an agent that just writes Java and is amazing at writing Java or at doing unit tests. Maybe you don't use AI for generating code at all, but imagine that every time you finish building a feature, an agent automatically snaps in and writes all your unit tests for you, and a sister agent writes all your documentation. Who here likes writing documentation? Thought so. Now imagine that when you submit your security review, an agent does the analysis, tells you all the vulnerabilities, and you're able to remediate them before you go through the security review. You start skipping steps and reducing the amount of time you spend in the queue. There are always steps you can't skip (please do not avoid AppSec), but if you can hand reviewers something they can quickly approve, that reduces the amount of back and forth, the unnecessary meetings, and so on.
Transformation agents work in the same way: think about migrating something, or refactoring a codebase from one language to another.
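To make the idea of a specialized agent concrete, here is a minimal sketch of what a test-writing agent wired into a pipeline could look like. This is not Amazon's internal tooling; it calls the Bedrock Converse API directly, and the model ID, prompt, and the file it reads are placeholder assumptions.

```python
# Sketch: a specialized "unit-test writer" agent that could run after a
# feature branch is pushed. Not Amazon's internal tooling; the model ID and
# the CI hook that calls generate_tests() are illustrative assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime")

SYSTEM_PROMPT = (
    "You are a test-writing specialist. Given a source file, "
    "produce pytest unit tests that cover its public functions."
)

def generate_tests(source_code: str) -> str:
    """Ask the model for a first-draft test file for one source file."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder
        system=[{"text": SYSTEM_PROMPT}],
        messages=[{"role": "user", "content": [{"text": source_code}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

if __name__ == "__main__":
    with open("feature.py") as f:          # hypothetical file that was just pushed
        print(generate_tests(f.read()))    # a human still reviews the output
```

A documentation or security-review agent would look much the same, just with a different system prompt and a different place in the pipeline.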
Building with Strands and Agent Core: AWS Tools for Scalable Agent Development
Building agents might seem simple on the surface, but if you haven't built one before or haven't really worked with agents, it can be intimidating. That's why I want to talk about Strands. I'm sure you've all heard about Strands, and if not, it's our SDK to build repeatable or reusable agents on AWS or anywhere. It's open-source and available on GitHub, which you can check out after the session. Strands lets you get started within minutes. You can try the agents locally, plug in any LLM you want, integrate with MCP servers, native tools, and custom tools. It allows you to quickly prototype and iterate. If something doesn't work, you start again. If it works, you can tune it up, make it work better, and ensure it works with your tools.
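As a quick illustration of how little scaffolding that takes, here is a minimal Strands sketch with one custom tool. The ticket-lookup tool is a hypothetical stand-in, and the exact API surface is whatever the open-source SDK documents at the time you try it.

```python
# Minimal Strands agent sketch: one custom tool, default model settings.
# The ticket lookup is a stubbed, hypothetical integration.
from strands import Agent, tool

@tool
def open_ticket_count(team: str) -> int:
    """Return the number of open tickets for a team (stubbed for this sketch)."""
    return {"checkout": 7, "search": 3}.get(team, 0)

agent = Agent(
    system_prompt="You help developers keep track of their day-to-day work.",
    tools=[open_ticket_count],
)

# The model decides on its own whether answering requires calling the tool.
agent("How many open tickets does the checkout team have right now?")
```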
Once you have your agents built, I want you to think about how you could leverage Agent Core. I know there's a lot of discussion about building agents to place them on Agent Core, but think from a developer perspective. If you start building agents into your pipelines, you could actually leverage Agent Core for these agents to run and help you automate many of the things you're doing, whether it's communication or code reviews. If you're not familiar with Agent Core, it allows you to run your agents in any framework. You can use LangGraph, you can use Strands, and you just run it. It allows you to connect to your MCPs through the Agent Core gateway, so you can have as many tools as you need. Agent Core manages semantic discovery for you, so you don't have to overload your context window.
We offer tools by default with Agent Core, including a browser and code interpreters. You can run your functions, and if you have functions that you want your agent to run without them being an MCP server, you can do that as well. If you're implementing memory, whether short-term or long-term, to improve your users' or customers' experience, it's fully managed and handled for you. If you need to authenticate, we have that layer with Agent Core Identity. Of course, we want to make sure you're able to trace and audit everything that's happening, so we've also built a layer of observability. With this approach, as you're building your agents, whether they're for developers or not, you know what's happening.
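To give a feel for what that looks like in practice, here is a rough sketch that wraps a Strands agent in an AgentCore runtime entrypoint, following the pattern in the public AgentCore samples. Treat the import path, payload shape, and deployment details as assumptions.

```python
# Sketch: exposing a Strands agent through an Agent Core runtime entrypoint
# so it runs as a managed service rather than on a developer laptop.
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()
agent = Agent(system_prompt="You review pull requests for common issues.")

@app.entrypoint
def invoke(payload):
    """Payload shape is an assumption; here we expect {'prompt': '...'}."""
    result = agent(payload.get("prompt", ""))
    return {"result": str(result)}

if __name__ == "__main__":
    app.run()  # serves locally; Agent Core hosts the same entrypoint once deployed
```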
So if you eventually reach that pioneer stage I mentioned earlier, you could have an agent that automatically runs code reviews, finds bugs, identifies security issues, and fixes them. It cuts the PR and somebody on your team reviews it. This is one use case for how Agent Core could help you automatically build that fully autonomous agent that handles these tasks. Over the last three years, the landscape has changed significantly. Every three weeks, there's a new update and tools change. We've been adapting, and we started building before everything we needed was ready. I'm going to hand it over to Steve to talk about how Amazon has built tools, how Amazon is looking at improving developer productivity, and how we've been measuring that.
StoreGen: Amazon's Internal AI Startup Targeting 3x Developer Productivity
As Alex mentioned, I'm Steve Tarcza. I've had the pleasure for the last six to eight months to come to work every day and think about generative AI and how we can accelerate developers at Amazon. When March came around this year, someone came to me and said, "Hey Steve, do you think we can make Amazon developers three times faster at getting features to customers?" I thought, wow, three times. I've seen percentage-wise increases, but I haven't seen these step-wise changes. At the beginning of the year, in March, we formed a team called StoreGen. This is the team that I lead inside Amazon Stores, and we were tasked with pioneering AI-native development inside Amazon Stores. You can think of StoreGen a little bit like an internal AI startup. We support thousands of developers spanning folks that build the website and mobile application, folks that support Amazon sellers, and those that help fulfill Amazon orders. So we're supporting all developers that work on Amazon Stores.
Before I get into the details, let me talk about Amazon's development culture because this played a major role in the solutions that we ended up building. Amazon has a two pizza team environment. These are teams of 8 to 10 people, and there are thousands of them that help make the store happen. Developers in these teams innovate on behalf of customers. They're thinking long term, thinking big, and thinking about bold solutions that solve complex problems for millions of customers.
These teams are also continuously learning and exploring new technologies, staying at the forefront of development. Most importantly, they have full end-to-end ownership over the software that they build. They're responsible for the quality, the stability, the maintainability, and the operational support for what they build. That meant that any AI solutions we built with StoreGen needed to take all this into account.
One of the biggest problems that we face is constraint on time and resources. We want to develop as much as we can for customers as fast as we can, but on one hand, that constraint will sometimes drive innovation. On the other hand, sometimes it can constrain that innovation. We see increased friction from keeping the systems running through operational support, increasing technical debt, and the system complexity grows over time as well.
We also see issues trying to keep the store consistent. This is increasingly difficult with thousands of teams, and it forces us at times to run campaigns across thousands of two pizza teams to try to achieve that consistency at a point in time. But even with that, we see inconsistencies. We see the same challenges that developers face. We want to get developers more time to write code and get them out of the business of doing non-development activities like gathering context, reading documentation, and searching for information.
Teams spend substantial effort on coordination activities, things like status updates, stakeholder management, and cross-team alignment. Of course, the more teams there are, the more this becomes a problem. Finally, engineers spend double digit percentages of their time keeping their system running, triaging issues, and managing tickets. Let me talk about how we set out to solve some of these problems and how we built the solutions that we did in StoreGen. We knew we had to change the way we work, not just build some AI solutions for folks to use. So we borrowed some practices from startups. We said we're going to experiment quickly, fail fast, and find the solutions that really resonate with developers.
Defining AI-Native Development: From Experimentation to Production at Scale
In our experiments phase, we started with just a small number of teams, one or two, to test our AI solutions. If something failed, we stopped there. For things that succeeded, we moved on to the scale phase. We gathered feedback from customers on the order of tens of teams validating that the solutions were applicable and helpful for what they face. Solutions that moved past that moved into the mature stage, and this is where we started to scale these out for broader usage. This is the phase that we're in right now.
Some of the solutions that I'm going to talk about here in a few minutes are in this mature stage, and we're gathering feedback on how they work at scale. As we move into 2026, we're moving into the impact stage where we drive adoption of the tools that we've built and demonstrate the scalable acceleration. As I mentioned, we started in March. We set an aggressive timeline, which doesn't really look that way on this chart because it says Q1 to Q4, but the reality was we said from March to November, so roughly six to seven months, to develop a set of tools that could help developers move three times as fast.
We picked a diverse set of teams to work with so that we could make sure that what we were building did not only work for a small set. We set this timeline up, we got started, and in Q1, so March basically, we started with some of our initial experiments. Some of them were successful and moved on, and in Q2 and Q3 we went into that scaling phase. As I mentioned, we're in Q4 right now in that maturity phase, looking to scale that impact in 2026.
Like any good plan though, it didn't really survive contact with reality. It got way messier than it looks on this graph. There was a ton of demand for the products that the team was building. One of them, Spec Studio, is something Dave Treadwell talked about this morning; I'll talk a little bit more about it in a few minutes. Some of the products we were building were quite successful, and so there was significant demand, which forced us to accelerate our timeline.
Throughout the day, you've heard people talk about AI native solutions, AI native development, and AI native approaches. We had questions very early on about what it means to be AI native. We created a definition for ourselves so that we could articulate what an AI native solution looks like. This is continuously evolving, but this is where we're at right now. AI native solutions showcase three particular attributes.
First, they're proactive, so they can independently drive activity. These are things that don't require human prompting. They can take action on their own. In this case, we reserve human intervention for places where there's critical judgment and nuanced decisions needed. The second aspect is the idea that they're intent driven. You can define a goal or an outcome that you'd like the AI to achieve, and it can then translate that into coordinated action across tools, agents, and services to accomplish that outcome. Finally, we think AI native solutions are deeply contextual. They can reason over persistent organizational knowledge. They understand what a team does, the historical decisions of the team, and the business objectives of the teams.
Unfortunately, I have not seen a solution yet that is fully AI native. I'll talk about one today that I think is pretty close. It's called AI Teammate, and I'll discuss that in a moment. The reality is AI solutions exist on a spectrum from AI enhanced to AI native. This calls back to something Alex was mentioning about AI assisted and AI augmented improvements. AI enhanced is when folks use AI to accomplish tasks they're already doing, just faster. With AI native, we seek to do things that we couldn't do previously with AI, remove steps from the SDLC, and potentially scale processes that were previously out of reach due to cost, time, or effort constraints.
For many AI solutions, the focus is on the development bubble. Alex mentioned this earlier. This is not where engineers spend most of their time. In fact, the studies that we did show an even smaller percentage than Alex quoted. Our StoreGen team focused on activities outside of that bubble. We focused on the things that folks are spending their time on that stop them from being able to code. In addition, we recognized that there are all these bubbles here, and there may be AI solutions for some of them, but it's also really tough to connect them together. We wanted to make it so developers don't have to spend time moving things from step one to step two to step three and on.
We knew we had to move quickly. Agent Core didn't exist when we started. We had to use things like Lambda, Bedrock, and DynamoDB to build some of the solutions that you're going to see today. We've been able to provide feedback to the Agent Core team, and now we're shifting many of the solutions we have to that solution to help streamline our approach. We also recognized very quickly that the fact that Amazon was built on AWS gave us an accelerant. The LLMs that we use, typically Claude Sonnet, understand AWS. That means that we don't have to teach it, and it lets us move faster with many of our solutions.
Spec Studio: Transforming Code into Specifications for AI-Powered Development
The first of the two solutions I'll talk about today is spec-driven development. Alex mentioned a little bit about this earlier today, and Dave Treadwell mentioned Spec Studio in one of the innovation talks. I'll dive a little deeper into it for you today. The idea with spec-driven development is that we shift from writing code, abstract one layer further, and start writing specifications. AI can then automatically generate portions of the code from that specification. I like to think about this as the eighty percent first draft version. It's not something that I've launched to production, but it's something that gets me going faster, and then I can use it to finish the piece of code that I need.
Specifications are also interesting because they provide context for AI systems as well, not just for coding, but for other tasks like document writing or status updates. Specifications actually provide a very interesting piece for that, and it does it without disrupting developers because specifications exist in the abstract. You don't need a developer to go look at the code to tell you how it works. One thing that specs do as well is give a unified source of truth. Stakeholders, be it the team itself, other teams, or other stakeholders like legal or accessibility, can all reference the same specification.
They can provide requirements once, and then they're incorporated into the specifications. Now that they're in the specifications, the folks that have contributed them have confidence that they can be enforced using AI in the development process. They can also be validated by AI to make sure that they were consistent.
There are two additional things that we think are very exciting about spec-driven development. The first is the ability to eliminate tech debt through specifications, which Matt talked a little bit about in his keynote. Because if you abstract all of the core pieces of the software into a specification, you can then re-implement it in another language without unused code and with new programming patterns.
We can also enable consistent cross-platform implementations. So if you have a specification for how a piece of software should work, in the StoreGen case, we could implement a web and mobile solution without having to do the work twice. So how does spec-driven development work? What does it look like? And how do we handle the fact that we have tens of thousands, if not hundreds of thousands of packages of existing code at Amazon already?
We knew we couldn't ask folks to go write specifications, as that was never going to work. So we developed what we call code-to-spec, where we actually take a piece of code and generate a specification from that code. Then humans can modify that specification, can augment it, and add things to it. That specification can then be translated into code using your favorite AI tool, such as Claude, for example.
The thing we also recognized at this point was that it's not just code that you can generate with specifications. You can generate the documentation, you can generate the tests, you can generate validations. So you can imagine a case where you generate a piece of code using the specification, but then you use the specification to validate that the code meets the specification. This creates a nice closed-loop cycle for spec-driven development.
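Spec Studio itself is internal, but the closed loop can be sketched in a few lines against the Bedrock Converse API. The model ID and prompts below are placeholders, and real spec generation would obviously be far more structured.

```python
# Sketch of the code-to-spec loop: derive a specification from code, then
# check the code against that specification. Illustration only.
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # placeholder

def _ask(prompt: str) -> str:
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

def code_to_spec(source: str) -> str:
    """Generate a specification (description, constraints, acceptance criteria) from code."""
    return _ask(
        "Write a specification for this code, with description, constraints, "
        "acceptance criteria, and business rules:\n\n" + source
    )

def validate_against_spec(source: str, spec: str) -> str:
    """Ask whether the code still satisfies the specification; list any gaps."""
    return _ask(
        "Does this code meet every requirement in the specification? List gaps.\n\n"
        "SPEC:\n" + spec + "\n\nCODE:\n" + source
    )
```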
This is a view of Spec Studio, as Alex mentioned. This is something we use internally, and I'm showing it here to walk through the solution. This is the product we use to generate specifications. When we put it out, we recognized that there are lots of artifacts that could be generated from a specification, such as system overviews, developer documentation, usage guides, and diagrams. The spec itself is valuable, but the other things turned out to be super valuable as well.
Throughout the rest of my talk today, I'm going to talk about a package called StoreGenAgentTimers. This is a really simple package for everybody listening. It's effectively like Cron, but as an agent, and it allows someone to basically say, "Hey, I'd like this activity to happen every 30 minutes," or something of the sort.
In this case, this is a specification that was created for the agent timer package. You'll notice here that with no human intervention, Spec Studio generated the system boundaries, what the software does, what it does not do, and it also calls out the AWS service dependencies.
Scrolling a little bit further down in the system overview, Spec Studio pulled out three capabilities from the code: natural language processing, timer or state management, and notification delivery. There's quite a bit of detail if you look at these closely and how these are handled, and this was all pulled out of code with no human intervention needed.
One of the things that's great is for developers that are onboarding to a team. They don't have to read every line of code to understand what the software does. They can read something like a system overview to get a quick understanding, and it's a lot easier to parse through. The last thing I'll point out on this slide is the little blue links. Those are citations.
I'll dig deeper into what a citation is from the specification now. Here's one around what we call clarifications. This is when someone requests a timer using natural language, but they didn't fully specify it. So this is what an actual specification looks like. It's structured, it has a description, it has constraints, acceptance criteria, business rules, dependencies—the things that you would expect to see in a specification written by a human, and something that then you can test against to verify that these things have been met.
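Spec Studio's actual schema isn't public, but based on the fields described above, one specification entry could be modeled roughly like this (field names are illustrative assumptions):

```python
# Illustrative model of a single specification entry with code citations.
# Field names are assumptions based on the attributes mentioned in the talk.
from dataclasses import dataclass, field

@dataclass
class CodeCitation:
    file_path: str      # a source file in the package
    start_line: int     # lines the requirement was derived from
    end_line: int

@dataclass
class SpecEntry:
    title: str                                            # e.g. "Clarifications"
    description: str                                       # what the behavior is
    constraints: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    business_rules: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)
    related_code: list[CodeCitation] = field(default_factory=list)
```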
But maybe one of the coolest parts is that it's linked back to the code. So here you'll see the little section that says related code.
If I click on that, it takes me to the lines of code that were used to generate this part of the specification. These are lines 6 through 46, and what you'll notice is there's a lot of detail that the spec was able to pull out of the code. On the previous slide, one of the pieces that was pulled out was a truncation length. If you look at line 26 here, it's not in the function description and it's not detailed in any comments. It's just in the code, and Spec Studio was able to pull that out and recognize that's a requirement for this particular piece of code.
For Spec Studio, detailed diagrams can be generated. You'll notice here callouts for services, which helps with things like security reviews. We can quickly get an understanding of how a particular piece of code works. One thing that does happen on occasion, though, is these auto-generated specifications can be wrong. The AI can sometimes get it wrong. Within the Spec Studio tool, you can actually add feedback and say that's not right. It's not just plain English; it's any language an LLM will understand. What's nice about this is that this feedback can then be incorporated into future spec generations or code generations. So you can dynamically modify and adjust these specifications in real time.
The last really cool things: if you have specifications for all of your packages, which is where we're headed, we can do things like semantically search our specifications. So if I wanted to find all of the code that shows Amazon search results, I can just say show me all of that code, and it'll return all the packages that have to do with showing Amazon search results. That's really powerful in the environment we work in, with thousands of teams. The last thing shown here on the screen is deep Q&A on a particular code package. You can dive deep into particular aspects and capabilities and have this sort of Q&A with the AI about the particular package.
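Semantic search over specifications is usually built by embedding each spec and ranking by similarity to the query. The sketch below shows that generic pattern with a publicly available Titan embedding model; it is not how Spec Studio is implemented internally.

```python
# Generic sketch of semantic search over package specifications.
import json
import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    """Embed text with Titan Text Embeddings v2 (one possible model choice)."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

def search_specs(query: str, specs: dict[str, str], top_k: int = 3):
    """Rank package specs by cosine similarity to a natural-language query."""
    q = embed(query)
    scored = []
    for package, spec_text in specs.items():
        v = embed(spec_text)
        scored.append((float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))), package))
    return sorted(scored, reverse=True)[:top_k]

# Example: search_specs("code that shows Amazon search results", all_specs)
```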
AI Teammate: A Proactive Team Member That Learns, Acts, and Collaborates
So we talked about specifications. I'm now going to talk about our second product. We hypothesized that folks typically use AI in a one-on-one scenario. You chat with a chat agent, you invoke an agent, you work in your IDE. We hypothesized that it may be something interesting and unique if we brought AI into a team. AI Teammate is different in that it doesn't work on any one person's behalf, but instead works as part of a team.
So what does AI Teammate mean? AI Teammate is a proactive team member. It joins your team and gets connections to all of the systems that a normal person on your team would. So things like Slack or Quip or ticketing systems that you may have. It continuously learns from those systems and starts to suggest and take action based on what it's seeing happen within a team. It does this throughout the development life cycle just like a developer would, and in the systems that developers are already using.
AI Teammate can handle routine tasks automatically. Things like answering questions for other teams, drafting documents, and executing tool-backed actions can all be done automatically. AI Teammate, and this is one of maybe the coolest parts of it, creates this persistent team memory. It maintains context across systems and communications, which allows us to accelerate future work. It brings that context and memory to everything it does. When it invokes another AI system, it has all of the history of the team. When it invokes another agent or responds to Q&A or does a task, it understands the context in which it's doing that, going back to the deeply contextual part that I mentioned earlier.
Finally, AI Teammate allows developers to focus on high-judgment activities: things like architecture, design decisions, strategy, and problem solving. I'm going to walk through some examples of AI Teammate in action. This is showing in Slack because it's the easiest thing to show, and you'll notice some names are blanked out; you can ignore that. Here's an example where a team member asked, "How does the StoreGen agent timers package work?" Within just a few seconds, AI Teammate could go and query Spec Studio, get the specification, understand it, and give an answer to the developer, something that may have taken a developer a couple of minutes to do on their own, or worse, required interrupting another developer to get the answer for them.
That's useful, but maybe the specification doesn't have all of the context. In the same Slack thread, you can dig deeper and ask what the team has discussed with respect to the agent timers. You can understand the cross-reference between the specification and the team's dialogue, getting the combination of knowledge from both the specification and the team's conversation. Here's an example that cites back to the specific Slack message being used as a reference.
So it can answer questions. It's a chatbot, right? No, I don't think so. Now I want to add a feature to the notifications. I want to add priority levels to it. AI Teammate can help here too. We can give it a document, which is what you see happening on the right-hand side here. It can read the document and then generate tasks based on that feature request. In this case, it says there are five tasks that I would like to generate in order to get these priority levels in place. At this point, developers can now discuss and adjust whatever AI Teammate came up with. Once they're satisfied with it, they can ask AI Teammate to actually create the tasks in their project management system.
Sometimes, and this is really cool when you get to see it work, AI Teammate will recognize that the conversation has reached a natural conclusion and will proactively ask, "Should I go create these tasks for you at this point?" In this case, we asked it to create them. It created them, and the tasks have now been created in our task system. This saves the time that developers or managers would have spent creating these tasks, but maybe more importantly, the quality of the tasks is much higher. They're very detailed and have the context of the team's conversations and the context of the rest of the software that they own and operate.
So we've got the tasks. Now what? Well, AI Teammate can also help us with implementation. We can point AI Teammate at a particular task and say, "Can you make a first draft implementation of this?" This is one of the cool things where AI Teammate, because of its proactive nature, can also do this while the team is asleep. It doesn't actually have to be invoked. You can think of these as first draft implementations, and as models improve, we'll be able to do more complex code generations and improve the quality of it. Now AI Teammate has completed the code review and publishes it back to the Slack channel. It also automatically sends an email to the team saying, "I've completed this code review. Can you go ahead and take a look at it?"
In this case, it's a very simple example, but you can see here it created the definition of priority in the code and then created four priority levels. When the developer went to code review this, they said, "Actually, I'd like to have a critical priority level as well." They provide the code review comments just like they normally would to any other developer on their team. AI Teammate sees this and says, "I should update my code," so it updates the code based on the comment. It does a revision and then shares back the revision with the team, all with no people needing to do anything in the process other than the code review.
Here's a copy of the change. You can see it added the critical priority level, and the code change is ready to move forward now. As I mentioned, I chose a simple example for today just to make it easy to understand, but you can imagine more complex examples. AI Teammate has shipped hundreds of code revisions to date. Great, Steve. We've heard all about AI Teammate. It sounds really great, but how does it work? Like, what's actually happening there?
At the core of it, it's really pretty simple. There's a lot of traditional scaffolding and engineering happening in the background. We hear about AI and it sounds magical, but the reality is it's magical because of what we build around it, in my view. On the left-hand side, you'll see event sources. These event sources are any system that a developer may work in and can generate information flowing into the team. We collect those up in our Lambda connectors, push those into a stream, and then AI Teammate batches these up. You may ask, why are we batching them? The batching allows us to process lots and lots of information coming in all at once in a rational way.
You can imagine someone saying, "I'm going to be out of the office tomorrow," and then one second later saying, "Oh, I meant today." This allows AI Teammate to compress those messages and not have to process them quite as often. They get into the queue, and the first thing we do is capture some memories. This is one of the really cool parts of how this works—it captures these running memories related to what the team is doing. This allows us to compress very disparate events into one context window, so when we're processing an event in the future, we can pull these together in one place.
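Purely to illustrate the batching step described above (this is not the actual AI Teammate code), a consumer could buffer a team's events for a short window so that follow-up corrections land in the same batch before any model call is made:

```python
# Illustration of event batching: hold a team's events for a short window so
# that "I'm out tomorrow" and "I meant today" are processed as one context.
import time
from collections import defaultdict

WINDOW_SECONDS = 60
_buffers: dict[str, list[dict]] = defaultdict(list)
_deadlines: dict[str, float] = {}

def enqueue(team_id: str, event: dict) -> None:
    """Called by the per-system connectors (Slack, tickets, docs, ...)."""
    _buffers[team_id].append(event)
    _deadlines.setdefault(team_id, time.time() + WINDOW_SECONDS)

def drain_ready_batches() -> dict[str, list[dict]]:
    """Return batches whose window has closed; these go on to memory capture."""
    now = time.time()
    ready = {t: _buffers.pop(t) for t, d in list(_deadlines.items()) if d <= now}
    for t in ready:
        _deadlines.pop(t, None)
    return ready
```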
We then move to our next step, which is effectively another Lambda underneath the Bedrock symbol. At this point, AI Teammate does its first pass and says, "This is the context of the team. This is what just happened. Do I think I can add any value here?" This is the proactive part. It decides if it thinks it should do something, and in many cases it says no—actually, I don't have anything interesting to do here—and it passes without doing anything. It just records the memories and moves on.
However, it can decide to invoke a tool, and the tool can be anything your imagination can come up with. We even model things like Slack messaging as a tool, so it's the AI deciding it should send a Slack message or deciding it should respond in a ticket, not us hard coding that. On the far right side of this diagram, you'll see MCP tools and agents. This is one of the ways we can scale things like AI Teammate, which is this orchestrator to handle thousands of teams at the company that are specialized in different domains.
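A stripped-down version of that "should I add value here?" pass might look like the sketch below. The prompt, the JSON decision format, and the tool registry are all assumptions used for illustration, not the internal implementation.

```python
# Sketch of the proactive pass: given team memory plus a new event batch,
# ask the model whether to act and which tool to use. Illustration only.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # placeholder

TOOLS = {
    "post_slack_message": lambda args: print("slack:", args.get("text")),
    "create_task": lambda args: print("task created:", args.get("title")),
}

def proactive_pass(team_memory: str, events: list[dict]) -> None:
    prompt = (
        "You are an AI teammate. Team memory:\n" + team_memory
        + "\nNew events:\n" + json.dumps(events)
        + '\nReply with JSON only: {"act": true/false, "tool": "...", "args": {...}}'
    )
    resp = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    decision = json.loads(resp["output"]["message"]["content"][0]["text"])
    if decision.get("act") and decision.get("tool") in TOOLS:
        TOOLS[decision["tool"]](decision.get("args", {}))
    # Otherwise: just record memories and move on without doing anything.
```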
I want to reiterate that the LLM is only here a couple of times. There's a lot of traditional software engineering going on to make this happen. We recognized with AI Teammate we couldn't build everything, so we had to build something that would allow us to scale. This is the agent-to-agent connection that I mentioned earlier. It allows us to scale the expertise by embedding organizational knowledge into reusable agent capabilities that can then be invoked by AI Teammate.
It allows AI Teammate to coordinate complex tasks autonomously, breaking down work and orchestrating across specialized tools. This reduces human handoffs and interruptions. Agents communicate directly with one another to complete multi-step processes. I alluded to this earlier when I said developers no longer have to carry around context from tool to tool. AI Teammate will do that for them. Last, it maintains consistent context across all activities from planning to execution to documentation. It understands how we got there, using the team memory and historical decisions to do that.
Results and Key Takeaways: Achieving 4x Feature Delivery Through AI-Native Practices
So I set out what we were tasked with in March, and we talked about a couple of solutions. Where did we end up? Well, we ended up seeing four times the number of features shipped to customers for a subset of our teams by November. We saw a reduction in operations and routine tasks. We saw an improvement in quality, and we saw implementations compressed from weeks to hours in many cases. We did all of that, but we learned along the way, that's for sure.
At the beginning, we focused on efficiency. We said, "How can we make developers more efficient?" As time went on, we said, "Wait a minute, why are we focusing on efficiency? Let's focus on reimagining how we do the process altogether. Let's figure out how we can do things we never could do before." This allows us to tackle previously impossible challenges. We started with what we thought were clear, measurable experiments, but we recognized we had to stay flexible in approach. It turns out our plan didn't go as planned. We had to stay flexible, and progress was more continuous than we had initially anticipated.
The last thing I'll say is that AI-native teams are a collaboration between AI and humans. They work together, they work in parallel, they unblock one another, and they help reduce operational overhead as a unit. Where are we going from here? What's next? Well, we've already started work to scale our AI-native solutions to thousands of teams inside Amazon. Dave mentioned this morning that our plan for 2026 is to have 75% of the teams in stores using these solutions and reaching those productivity gains that we talked about. We've also started to utilize Kiro for accelerating code development. With that, I'm going to hand it back to Alex to talk about some of the key takeaways.
Steve shared a lot of good insights, so I wanted to take a second to recap. Software development is evolving and changing fast, as we all know. It's important to understand how the technology can help you. We're moving from AI-enhanced tools to building entire features or packages without writing a single line of code manually. Adopting AI-native development techniques can help you build toward that goal, and as Steve was saying, you can move along that spectrum from AI-enhanced toward AI-native applications.
Evaluating the new capabilities as they come up will push the frontier forward. However, the most important thing across these areas is consistency. If your results are not consistent, if you don't have standardization across teams, and if you're not building memory that persists, you have a problem: developers come and go, but the package stays. So how will you make sure that knowledge stays, so you can deliver faster and dream bigger? Those are the three things I wanted to leave you with.
Thank you so much for spending the afternoon with us. I know it's happy hour time, so we'll be happy to let you go. We'll stay here for a minute if you have any questions or if you'd like to talk about any of these things. We're happy to answer your questions. Thank you so much.
; This article is entirely auto-generated using Amazon Bedrock.