🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Accelerate Developer Productivity with Amazon's Generative AI Approach (AMZ309)
In this video, Amazon shares how they're using generative AI to transform developer productivity, moving beyond the 30% of time developers spend coding to address the 70% spent on documentation, meetings, and operational tasks. Alex Torres and Steve Tarcza present Amazon's StoreGen team's journey building AI-native development solutions, including Spec Studio for spec-driven development and AI Teammate, a proactive AI agent that joins development teams. They demonstrate how AI Teammate autonomously handles routine tasks, maintains persistent team memory, generates specifications from code, creates implementation tasks, and even submits code reviews. The team achieved 4x feature delivery for pilot teams by reimagining development processes rather than just improving efficiency, with plans to scale these solutions to 75% of Amazon Stores teams by 2026.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
The Evolution of AI in Software Development: From Code Completion to Feature Generation
Over the last year, there's one statistic that actually made me think a bit differently about the challenge of developer productivity. Most developers spend just 30% of their time actually writing code. The rest goes to documentation, ticket management, meetings, and more meetings. If you think about how AI works with developers, about two years ago AI would let you autocomplete a line of code, kind of helping you go faster when you're building. Today AI can help you build an entire feature from a requirement spec. That's not a small improvement if you think about it. It's more of a shift in how building software is going to look over the next few years, and in how software developers reclaim that 70% of their time that was spent elsewhere.
Welcome to re:Invent, whether you came in yesterday, Sunday, or today. Thank you so much for joining us. Today we're happy to share how Amazon is using generative AI to enhance developer productivity. I just want to say a quick note. What we're going to be sharing today is not just a product pitch. We're going to be talking about what we have built, what we have learned, and how we've been measuring what we've been putting in production.
My name is Alex Torres. I'm a Solutions Architect, and I'm joined here by Steve. He leads the team that is driving AI native development within Amazon Stores, and he's going to be talking a little more about what his team has built and what they're driving. So if you're ready, let's get started.
I'm going to be talking a little bit about the journey of AI. I will level set on where we are, how far we have come since generative AI arrived, and how AWS enables builders internally and externally. Then we'll dive into how Amazon is taking what's available today, what we had to build to complement what was available when we started, and we'll leave you with some takeaways that you can take back, run by your teams, and maybe implement.
So let's take a look at two years ago. If you think about 2023, generative AI launched and everybody was kind of asking themselves, what is this? What is a prompt? Do I need to become a prompt engineer? A lot of people started working through building POCs and understanding the technology. AWS at this point launched Amazon PartyRock, I don't know if you all remember, and we launched Amazon Bedrock to enable experimentation, to allow people to get familiar, and to make the technology available.
In 2024, as we were building internally and a lot of our customers were building with AI, we started seeing more of the year of production. The POCs started becoming a reality. We started deploying to production. We released the initial version of Rufus, if you've heard about it this week or if you used it on the app. It's our AI shopping assistant. We released Q Business, we released CodeWhisperer and Q Developer. So we started launching production applications. And the questions kind of shifted. How do I keep my costs low? How do I start thinking about security a little bit? What is the project that I should prioritize? As teams started building and deploying to production, we got to this year, 2025.
In 2025, which is what I've been calling the year of proven business value, the question shifted a little bit again. Hey, I have all these AI tools, they're awesome, they can really increase productivity, but how do I make sure that people use them? How do I make sure that if people are using them, they're using them the right way? How do I make sure it is secure and compliant? These are things that fall to the back while you're experimenting, but now that workloads are in production, you need to keep them secure. So I want to walk through this journey a little bit, how we have seen customers actually start building with AI and where that has taken us.
The Journey from AI-Enhanced to Agentic AI: Understanding the Maturity Stages
Most customers start their journey, when they're building applications with generative AI or anything of that sort, by enhancing some of the rule-based processes that they have in place.
On the left, you can see that, right? Think about, hey, I have this process that has five rules and I need to do them, so now AI is going to run them for me instead of me. The problem that we see in this stage, where a lot of human oversight is still needed, is that when anything unexpected happens, the workflow fails, and then you have to restart and things kind of don't work.
As models got smarter and as companies moved forward with their AI journey, you get to that assistant kind of application. Think about the chatbots that people have backed by knowledge bases, how you democratize AI access a little bit more. And it helps you. It can help you maybe summarize a document. It can help you within the context that it has. It can help you find a wiki; it can help you do some of those things. It's better, but it still needs quite a bit of human oversight.
This year with agentic AI, we have seen that shift from maybe being an assistant, maybe enhancing some of the things that you do, to becoming a collaborative part of your day-to-day workflows. Think about moving from "summarize my document" to "help me prepare for my customer meeting tomorrow," or "write me a spec to build a new feature," or "these are my customer requirements, what can I do with them?" And the agent by itself starts thinking and figuring it out. You don't have to give more instructions. They become smarter, and that's what that collaborative step is. That's where the most advanced companies are today.
And then we have the pioneers, the people that are really ahead of the curve. This stage is actually rare. It's where the AI is autonomously working on its own. It spins up, it assigns tasks to itself, it runs them, and it checks in with the human, but humans just provide that high-level governance oversight. So as you think through that, I'm going to assume most of you are in between assist and collaborate. You're using or building agents in the developer space, and that brings it back to what we're talking about here today, right?
Beyond the 30%: Addressing the Hidden Productivity Drains in Development
We moved from Q Developer really helping you understand a code base or a function to Kiro, which is able to understand your code base and help you build features, and that's that collaborate stage of AI developer agents. So when it comes to agentic AI in the development world, we have to talk about where most of a developer's time actually goes. And the reality is what I mentioned earlier: 30% of the time typically goes to the right side, coding and development. The rest of it, the 70%, is planning, documenting, going through reviews, optimizing things, break fixes, escalations, et cetera.
So I want to split this in two. I want to talk a little bit about the journey we've taken for coding, then I'm going to come back to that 70%, and then I will hand it over to Steve to talk about what we're building. So, the way that AI is changing software, and this comes back to what I was saying: think about how, in 2023, CodeWhisperer, if anybody used it, helped you complete code faster. That 30% of your time just got more effective.
We moved from being able to understand a code base or document a larger file, to generating a more complex function or something like that, to fully completing a task end to end. Build my UI, figure it out, and that's where we are today. The problem is that if there is no standardization and every developer does whatever they feel works, you end up with inconsistent results that sometimes really don't allow you to deploy that code to production. Has anybody deployed AI-generated code without reviewing it and brought production down? I know of a couple of cases. Okay.
So that's where we started with that idea of spec-driven development. I'm pretty sure most of you recognize our friend Kiro, right? The idea of spec-driven development is just helping teams standardize the way that they think. A feature description, a feature request, a GitHub issue: it can be anything that anybody writes, in any form.
But how does a team that is building a complex application take that and validate the user requirements? From those user requirements, from those specifications, you build the structure that you need for a consistent technical spec, one that specifies how those things are going to be built consistently for every team and every product that you build. And once you have that information, how do you make sure that it's built in order, that it doesn't break, that there's no orphan code, that you use test-driven development? That's where those implementation plans and that consolidation happen. So that's how we started thinking about moving from vibe coding to a more structured way that allows you to push to production.
All of this has been possible because today we're actually at a tipping point. This year, this quarter, think about how far the models have gotten. This year, we actually got tool calling working really well. The agents are able to recognize what they're doing, what tool to pick, and they can access external systems. There are a lot of MCP servers out there, tooling that wasn't available before that enhances how our models and our agents actually interact with the things we do, like opening GitHub pull requests, reviewing your code base, and so on. And as people build, there are a lot of frameworks that are open source that allow you to not have to start from scratch.
So this allows faster experimentation; it allows failing fast. You don't have to come up with how you're going to build your agents. You can reuse some of the code that is already out there, some of the frameworks, the tools, and so on. And this is really great for coding. However, going back to that 70%, there are a lot of overlooked productivity drains in a developer's day, and we've been doing a lot of studying of how our developers work. Beyond that 30% that developers spend writing code, think about ticket management and status updates: going to Jira, going to Slack, coming back, sending messages, getting everything going. That can easily take 20 to 30% of your time.
Think about meetings. How many meetings do you have a week that could have been an email? How many meetings really drive value for what you're actually building or pushing forward? And if, on top of providing status updates, managing your ticket queue, and going to meetings, you add the process complexity of architectural approvals, security reviews, security approvals, finance, and so on, you are already at that 70%. Let alone when the developer who knew how something works has left and the code is not documented, or you're using a new API with no documentation and now you have to figure out how those things work. And how much time goes into actually learning something you've never used before?
Specialized Software Developer Agents: Automating Non-Coding Tasks
Maybe you inherited a package that is written in a language you've never seen before. How much time are you going to spend learning how to do those things? And this is where AI actually shines: reducing the friction of everything that is not writing code. So from that perspective, how can agentic AI help developers? Software developer agents, yes, we've talked about them, but you can specialize them. You can have an agent that just writes Java and is amazing at writing Java, or at writing unit tests.
Imagine automating this. Maybe you don't use AI for generating code, but every time you finish building a feature, you have an agent that automatically steps in and writes all your unit tests for you, and a sister agent writes all your documentation. Who here likes writing documentation? Thought so. In the same way, imagine that you submit your security review and you have an agent that actually does everything, tells you all the vulnerabilities, and you are able to remediate them before you go through the security review. You start skipping steps. You start reducing the amount of time that you spend in the queue.
There are always steps that you can't skip, and please do not avoid them, but if you can hand reviewers something they can quickly approve, that reduces the amount of back and forth, unnecessary meetings, and so on. In the same way, think about transformation agents: migrating something, or refactoring a code base from one language to another.
With that being said, building agents, as I was saying, is not simple if you haven't built one before, if you haven't really worked with agents. It can be intimidating. That's why I want to talk about Strands.
Building with Strands and AgentCore: AWS Tools for Scalable Agent Development
I'm pretty sure you all have heard about Strands, and if not, it's our SDK for building repeatable, reusable agents on AWS or anywhere. It's open source, it's on GitHub, you can check it out after the session, and it can get you started within minutes. You can try the agents locally, you can plug in any LLM you want. You can integrate with MCP servers, native tools, and custom tools, and it allows you to quickly try things out. If it doesn't work, start again. If it works, you iterate on it. You have these templates that you can use; if it works, tune it up, make it work better, make sure that it works with your tools.
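To make that concrete, here is a minimal sketch of what a Strands agent can look like. The tool, the prompt, and the team name are invented for illustration; by default the SDK calls a Bedrock-hosted model, so substitute whatever model and tools fit your environment.

```python
# Minimal Strands sketch: one custom tool plus a prompt. The tool body is a
# stub; a real agent would call your ticketing system here.
# pip install strands-agents
from strands import Agent, tool

@tool
def count_open_tickets(team: str) -> int:
    """Return the number of open tickets for the given team (stubbed)."""
    return 7  # placeholder value for the example

agent = Agent(
    system_prompt="You are a developer-productivity assistant for our team.",
    tools=[count_open_tickets],
)

# The agent decides on its own whether the tool is needed to answer.
agent("How many open tickets does team search-ux have right now?")
```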
And when it comes to the point where you have your agents built, you have all of these things, I want you to think about how you could leverage Amazon Bedrock AgentCore. If you haven't thought about it, I know there's a lot of "build your agents and deploy them on AgentCore," but think about it from that developer perspective. If you start building agents into your pipelines, you could actually leverage AgentCore for these agents to run and help you automate a lot of the things that you're doing, whether it is communication or code reviews.
If you're not familiar with AgentCore, it allows you to just run your agents in any framework. You can use LangGraph, you can use Strands, and you just run it. It allows you to connect to your MCP servers through AgentCore Gateway, so you can have as many tools as you want and it manages that semantic discovery for you, so you don't have to overload your context window. We offer built-in tools with AgentCore, the browser and the code interpreter. You can run your functions too: if you have functions that you want your agent to just run and they don't have to be an MCP server, you can also do that.
If you are implementing memory, short term or long term, to improve your users' or your customers' experience, you can; it's fully managed for you. If you need to authenticate, we have that layer with AgentCore Identity. And of course, we want to make sure that you're able to trace and audit everything that's happening, so we also built that layer of observability with the idea that as you're building your agents, whether they're developer agents or not, you know what's happening.
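As a rough sketch of the pattern, this is roughly what wrapping an agent for AgentCore Runtime looks like in Python. The import path, the entrypoint decorator, and the payload shape follow the AgentCore starter examples as I understand them; treat the details as assumptions and check the current SDK documentation before relying on them.

```python
# Hedged sketch: expose a Strands agent through an AgentCore Runtime entrypoint.
# Assumptions: the bedrock-agentcore SDK's BedrockAgentCoreApp / @entrypoint API
# and a {"prompt": ...} payload shape; verify against the current docs.
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()
agent = Agent(system_prompt="You review pull requests for our pipeline.")

@app.entrypoint
def invoke(payload):
    # AgentCore delivers the caller's JSON payload; we pass the prompt through.
    result = agent(payload.get("prompt", ""))
    return {"result": str(result)}

if __name__ == "__main__":
    app.run()  # local run; deploy with the AgentCore tooling of your choice
```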
So eventually, at that pioneer stage that I was talking about earlier, you could have an agent that automatically runs and does code reviews for you, or finds bugs or runs security scans and fixes them, and it cuts the PR and somebody on your team reviews it. That's one of the use cases, for example, where AgentCore could help you build that fully autonomous agent. Thank you.
StoreGen: Amazon's Internal AI Startup Pioneering Three-Times Faster Development
With that being said, I want to recap really quick. Over the last three years, the landscape has changed a lot. Every three weeks there is a new update. Tools changed, and we have been adapting. We started building things before everything was ready. So I'm going to hand it over to Steve to talk about how Amazon has built tools, how Amazon is looking at improving developer productivity, and how we've been measuring that. So thank you.
Thanks, Alex. As Alex mentioned, I'm Steve Tarcza. I've had the pleasure for the last six to eight months to come to work every day and think about generative AI and how we can accelerate developers at Amazon. When March came around this year, someone came to me and said, hey Steve, do you think we can make Amazon developers three times faster at getting features to customers? And I thought, wow, three times. Okay, I've seen percentage-wise increases, but I haven't seen these step-wise changes.
And so at the beginning of the year, as I said, in March, we formed this team called StoreGen. This is the team that I lead inside Amazon Stores, and we were tasked with pioneering AI native development inside Amazon Stores. You can think of StoreGen a little bit like an internal AI startup. We support thousands of developers, spanning folks that build the website and the mobile application, folks that support Amazon sellers, and those that help fulfill Amazon orders. So we're supporting all developers that work on Amazon Stores.
Before I get into it, let me talk a little bit about Amazon's development culture because this played a major role in the solutions that we ended up building.
First, maybe if you heard some of the talks earlier today, Amazon has a two pizza team environment. These are teams of 8 to 10. There are thousands of them that help make the store happen. Developers in these teams innovate on behalf of customers. They're thinking long term. They're thinking big. They're thinking about bold solutions, solving complex problems, and handling millions of customers. They're also continuously learning. They're exploring new technologies. They're staying at the forefront of development technology. And last, they have full end-to-end ownership over the software that they build. They're responsible for the quality, the stability, the maintainability, and the operational support for what they build. That meant that any AI solutions that we built with StoreGen needed to take all this into account.
One of the biggest problems that we face is constraint on time and resources. We want to develop as much as we can for customers as fast as we can, but on one hand, that constraint will sometimes drive innovation. On the other hand, sometimes it can constrain that innovation. We see increased friction from keeping the systems running through operational support, increasing technical debt, and then of course the system complexity grows over time as well. We also see issues trying to keep the store consistent. This is increasingly difficult with thousands of teams, and it forces us at times to run campaigns across thousands of two pizza teams to try to achieve that consistency at a point in time. But even with that, we see inconsistencies.
Alex mentioned earlier some of the challenges that developers face. We see the same thing in Amazon and in the store's organization. We want to get developers more time to write code. We want to get them out of the business of doing non-development activities like gathering context, reading documentation, searching for information. Teams spend substantial effort on coordination activities, things like status updates, stakeholder management, and cross-team alignment. Of course, the more teams there are, the more this becomes a problem. Finally, engineers spend double-digit percentages keeping their system running, triaging issues, and managing tickets.
Defining AI Native Development: From Experimentation to Production at Scale
Let me talk about how we set out to solve some of these problems and how we set out to build the solutions that we did in StoreGen. We knew we had to change the way we work, not just build some AI solutions to have folks use. So we borrowed some practices from startups. We said, hey, we're going to experiment quickly, we'll fail fast, and we'll find the solutions that really resonate with developers. So in our experiment phase, we started with just a small number of teams, one or two, to test our AI solutions. If something failed, we stopped there. For things that succeeded, we moved on to the scale phase. We gathered feedback from customers, on the order of tens of teams, validating that the solutions were applicable and helpful for what they face.
Solutions that moved past that moved into this mature stage, and this is where we started to scale these out for broader usage. This is sort of the phase that we're in right now. Some of the solutions that I'm going to talk about here in a few minutes, they're in this mature stage, and we're gathering feedback on how they work at scale. And finally, as we move into 2026, we're moving into the impact stage where we drive adoption of the tools that we've built and we demonstrate the scalable acceleration.
As I mentioned, we started in March. We set an aggressive timeline, which doesn't really look that way on this chart because it sort of says Q1 to Q4, but the reality was we had from March to November, roughly six to seven months, to develop a set of tools that could help developers move three times as fast. We picked a diverse set of teams to work with so that we could make sure that what we were building did not only work for a small set. We set this timeline up, we got started, and in Q1, so March basically, we started with some of our initial experiments. Some of them were successful and moved on, and in Q2 and Q3 we went into that scaling phase. As I mentioned, we're in Q4 right now in that maturity phase, looking to scale that impact in 2026.
Like any good plan, though, it didn't really survive contact with reality. It got way messier than it looks on this graph. There was a ton of demand for the products that the team was building. Dave Treadwell talked about one of them this morning, called Spec Studio. I'll talk a little bit more about that here in a few minutes. Some of the products we were building were quite successful, and so there was, again, significant demand, which forced us to accelerate our timeline.
So probably throughout the day, you've heard people talk about AI native, AI native solutions, AI native development, AI native this, and we got questions very early on. What does it mean to be AI native? And so we created a definition for ourselves so that we could say this is what an AI native solution looks like. This is continuously evolving, but this is sort of where we're at right now.
So AI native solutions showcase three particular attributes. First, they're proactive, so they can independently drive activity. These are things that don't require human prompting; they can just take action on their own. In this case, we reserve human intervention for places where critical judgment and nuanced decisions are needed. The second aspect is the idea that they're intent driven. You can define a goal or an outcome that you'd like the AI to achieve, and it can then translate that into coordinated action across tools, agents, and services to accomplish that outcome. Finally, we think AI native solutions are deeply contextual. They can reason over persistent organizational knowledge. They understand what a team does, they understand the historical decisions of the team, and they understand the business objectives of the team.
Unfortunately I have not seen a solution yet that is fully AI native. I'll talk about one today that I think is pretty close. It's called AI Teammate, and I'll talk about that in a second, but the reality is AI solutions exist on a spectrum from AI enhanced to AI native, and this calls back to something Alex was mentioning about AI assisted and AI augmented improvements. AI enhanced is when folks use AI to accomplish tasks they're already doing, just faster. With AI native, we seek to do things that we couldn't do previously: remove steps from the SDLC and potentially scale processes that were previously out of reach due to cost, time, or effort constraints.
Spec-Driven Development and Spec Studio: Generating Code from Specifications
For many AI solutions, the focus is on the development bubble. Alex mentioned this earlier; this is not where engineers spend most of their time. In fact, the studies that we did show an even smaller percentage than Alex quoted, and so we as the StoreGen team focused on activities outside of that bubble. We focused on the things that folks are spending their time on that stop them from being able to code. In addition, we said, hey, there are all these bubbles here. There may be AI solutions for some of them, but it's also really tough to connect them together. So how can we actually make it so developers don't have to spend time moving things from step one to step two to step three and on?
We knew we had to move quickly. AgentCore didn't exist when we started. I wish it had; it would have been great. So we had to use things like Lambda, Bedrock, and DynamoDB to build some of the solutions that you're going to see today. We've been able to provide feedback to the AgentCore team, and now we're shifting many of our solutions to that service to help streamline our approach. We also recognized very quickly that the fact that Amazon was built on AWS gave us an accelerant. The LLMs that we use, typically Claude Sonnet, understand AWS. That means we don't have to teach it, and it lets us move faster with many of our solutions.
So the first of the two solutions I'll talk about today is spec-driven development. Alex mentioned a little bit about this earlier today. Dave Treadwell mentioned Spec Studio in one of the innovation talks. I'll dive a little deeper into it for you today. So the idea with spec-driven development is that we shift from writing code, maybe abstract one layer further, and we start writing specifications. AI can then automatically generate portions of the code from that specification, and I like to think about this as the 80% first draft version. It's not something that I've launched to production, but it's something that gets me going faster and then I can use it to finish the piece of code that I need.
Specifications are also interesting because they provide context for AI systems as well. So not just for coding, but for other tasks, document writing or status updates. Specifications actually provide a very interesting piece for that, and it does it without disrupting developers because specifications exist in the abstract. You don't need a developer to go look at the code to tell you how it works. One thing that specs do as well is give a unified source of truth for stakeholders.
The team itself, other teams, and other stakeholders like legal or accessibility, for example, in the StoreGen case, can provide the requirements once. Then they're incorporated into the specifications. Now that they're in the specifications, folks that have contributed them have confidence that they can be enforced using AI in the development process. They can also be validated by AI to make sure that they were consistent.
There are two additional things that we think are very exciting about spec-driven development. Matt talked a little bit about some of this in his keynote. The first is the ability to eliminate tech debt through specifications, because if you abstract all of the core pieces of the software into a specification, you can then re-implement it in another language without unused code and with new programming patterns. The second is enabling consistent cross-platform implementations. If you have a specification for how a piece of software should work, in the StoreGen case, we could implement a web and a mobile solution without having to do the work twice.
So how does spec-driven development work? What does it look like? And how do we handle the fact that we have tens of thousands, if not hundreds of thousands of packages of existing code at Amazon already? We knew we couldn't ask folks to go write specifications. That was never going to work. So we developed what we call CodeToSpec, where we actually take a piece of code and generate a specification from that code, and then humans can modify that specification, can augment it, and add things to it. Then that specification can be translated into code using your favorite AI tool, Amazon Q Developer, for example.
The thing we also recognized at this point was it's not just code that you can generate with specifications. You can generate the documentation, you can generate the tests, and you can generate validations. So you can kind of imagine a case where you generate a piece of code using the specification, but then you use the specification to validate that the code meets the specification. And so you have this nice closed loop cycle for spec-driven development.
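As an illustration of that closed loop, here is a hedged sketch using the Bedrock Converse API directly. This is not the Spec Studio or CodeToSpec implementation; the prompts, the model ID, and the spec file name are all stand-ins.

```python
# Sketch of the spec -> code -> validate loop with the Bedrock Converse API.
# Model ID and prompts are illustrative; swap in a model your account can use.
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "us.anthropic.claude-sonnet-4-20250514-v1:0"  # assumption

def ask(prompt: str) -> str:
    resp = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]

def generate_code(spec: str) -> str:
    # The "80% first draft" step: code generated from the specification.
    return ask(f"Implement this specification in Python. Return only code.\n\n{spec}")

def validate_against_spec(spec: str, code: str) -> str:
    # Close the loop: ask which acceptance criteria the draft fails to meet.
    return ask(
        "List every acceptance criterion in the spec that this code does not satisfy.\n\n"
        f"SPEC:\n{spec}\n\nCODE:\n{code}"
    )

spec = open("agent_timers_spec.md").read()  # hypothetical spec file
draft = generate_code(spec)
print(validate_against_spec(spec, draft))
```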
This is a view of Spec Studio. As Alex mentioned, this is something we use internally, so I'm showing it here to walk through the solution. This is the product we use to generate specifications. When we put it out, as I mentioned, we recognized that there are lots of artifacts that could be generated from a specification, not just the spec itself: things like system overviews, developer documentation, usage guides, and diagrams. Those other artifacts turned out to be super valuable.
Throughout the rest of my talk today, I'm going to talk about a package called StoreGenAgentTimers. This is a really simple package, for everybody listening. It's basically like cron, but as an agent, and it allows someone to say, hey, I'd like this activity to happen every 30 minutes or something of the sort. So in this case, this is a specification that was created for the agent timer package. You'll notice here, with no human intervention, Spec Studio generated the system boundaries, what the software does and what it does not do, and, hidden a little bit on the right-hand side there, it also calls out the AWS service dependencies.
Scrolling a little bit further down in the system overview, Spec Studio pulled out three capabilities from the code: natural language processing, timer state management, and notification delivery. There's quite a bit of detail if you look at these closely and how these are handled, and this was all pulled out of code again with no human intervention needed. One of the things that's great is for developers that are onboarding to a team, they don't have to read every line of code to understand what the software does. They can read something like a system overview to get a quick understanding, and it's a lot easier to parse through. The last thing I'll point out on this slide is the little blue links. Those are citations, and I'll dig deeper into what a citation is from the specification now.
So here's one around what we call clarifications. This is when someone requests a timer using natural language, but they didn't fully specify it. So this is what an actual specification looks like. It's structured, it has a description, it has constraints, acceptance criteria, business rules, dependencies, the things that you would expect to see in a specification written by a human, and something that then you can test against. Have these things been met? But maybe one of the coolest parts is that it's linked back to the code. So here you'll see the little section that says related code.
If I click on that, it actually takes me to the lines of code that were used to generate this part of the specification. This is lines 6 through 46 if you can see them, and what you'll notice is there's a lot of detail that the spec was able to pull out of the code. On the previous slide, one of the pieces that was pulled out was a truncation length. If you look at line 26 here, it's not in the function description. It's not detailed in any comments. It's just in the code, and Spec Studio was able to pull that out and recognize that's a requirement for this particular piece of code.
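Since the agent timers package is internal, here is only a hypothetical sketch of the behavior that clarification requirement describes: a natural-language timer request either parses cleanly or triggers a clarifying question. The class and the crude regex are invented; the real package uses natural language processing for this step.

```python
# Hypothetical illustration of the "clarifications" requirement: an
# underspecified timer request should produce a clarifying question rather
# than a guess. Names and parsing logic are invented for this example.
import re
from dataclasses import dataclass

@dataclass
class TimerRequest:
    action: str            # what to do when the timer fires
    interval_minutes: int  # how often to fire

def parse_request(text: str) -> TimerRequest:
    match = re.search(r"every (\d+) minutes", text)
    if match is None:
        # The spec's acceptance criteria call for a clarification here.
        raise ValueError("How often should this timer fire?")
    return TimerRequest(
        action=text[: match.start()].strip().rstrip(","),
        interval_minutes=int(match.group(1)),
    )

print(parse_request("Post a build-health summary to the ops channel every 30 minutes"))
```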
Finally, for Spec Studio, detailed diagrams can be generated. You'll notice here again callouts for services. This helps with things like security reviews. We can quickly get an understanding of how a particular piece of code works. One thing that does happen on occasion though is these auto-generated specifications can be wrong. The AI can just kind of get it wrong sometimes, and so within the Spec Studio tool, you can actually add feedback and say that's actually not right. It's not just plain English, it's any language an LLM will understand. What's nice about this is this can then be incorporated into future spec generations or code generations. So you can dynamically modify and adjust these specifications in real time.
The last really cool thing, or I guess the last two really cool things: if you have specifications for all of your packages, which is where we're headed, we can do things like semantically search our specifications. So if I wanted to find all of the code for Amazon that shows Amazon search results, I can just say show me all of that code, and it'll return all the packages that have to do with showing Amazon search results. It's really powerful in the environment that we work in, with thousands of teams. The last thing shown here on the screen is deep Q&A on a particular code package. You can dive deep into particular aspects of its capabilities and have a Q&A with the AI about that package.
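Spec Studio's search internals aren't public, but the general pattern is well known: embed each specification once, embed the query, and rank by similarity. Here is a hedged sketch using a Titan embedding model on Bedrock; the model ID, the spec corpus, and the in-memory index are assumptions for illustration.

```python
# Sketch of semantic search over specifications: embed specs and a query with
# a Bedrock embedding model, then rank by cosine similarity. Illustrative only.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # assumption: any embedding model works
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

# Hypothetical corpus: package name -> generated specification text.
specs = {
    "SearchResultsPage": "Renders Amazon search results ...",
    "StoreGenAgentTimers": "Cron-like agent timers with notifications ...",
}
index = {name: embed(text) for name, text in specs.items()}

query = embed("code that shows Amazon search results")
ranked = sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)
print(ranked)
```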
AI Teammate: A Proactive Team Member That Works Across the Development Lifecycle
So we talked about specifications. I'm now going to talk about our second product. We hypothesized that folks typically use AI in this one-on-one scenario. You chat with a chat agent, you invoke an agent, you work in your IDE. We hypothesized that it may be something interesting and unique if we brought AI into a team. And so AI Teammate is different in that it doesn't work on any one person's behalf, but it instead works as part of a team.
So what does AI Teammate do? AI Teammate is a proactive team member. It joins your team and gets connections to all of the systems that a normal person on your team would. So things like Slack or Quip or ticketing systems that you may have. It continuously learns from those systems and it starts to suggest and take action based on what it's seeing happen within a team. It does this throughout the development life cycle just like a developer would, and in the systems that developers are already using.
AI Teammate can handle routine tasks automatically. Things like answering questions for other teams, drafting documents, executing tool-backed actions. These things can all be done automatically. AI Teammate, and this is one of maybe the coolest parts of it, creates this persistent team memory. It maintains context across systems and communications, which allows us to accelerate future work. It brings that context and memory to everything it does. So when it invokes another AI system, it has all of the history of the team. When it invokes another agent or responds to Q&A or does a task, it understands the context in which it's doing that, back to the deeply contextual part that I mentioned earlier.
And finally, AI Teammate allows developers to focus on high judgment activities: things like architecture, design decisions, strategy, and problem solving. I'm going to walk through some examples of AI Teammate in action. This is shown in Slack because it's the easiest thing to show, and you'll notice some names are blanked out; you can ignore that. But here's an example where a team member asked, "How does the StoreGen agent timers package work?" Within just a few seconds, AI Teammate could go and query Spec Studio, get the specification, understand it, and give an answer to the developer, something that may have taken a developer a couple of minutes to do on their own, or maybe worse, interrupted another developer to go get the answer for them.
That's cool, but maybe the specification doesn't have all of the context, and so in the same Slack thread you can dig deeper and ask, well, what has the team talked about with respect to the agent timers? You get the cross between the specification and the team's dialogue, so you get the combination of the team's knowledge and the specification. Here's an example, and it cites back to the specific Slack message that it used as a reference.
So, okay, great, it can answer questions. It's a chatbot, right? No, I don't think so. Now I want to add a feature to the notifications. I want to add priority levels to it. AI Teammate can help here too. We can give it a document, that's what you see happening on the right-hand side here. It can read the document and then generate tasks based on that feature request. In this case, it says, hey, there are five tasks that I'd like to generate in order to get these priority levels in place. At this point, developers can now discuss and adjust whatever AI Teammate came up with. Once they're satisfied with it, they can ask AI Teammate to actually create the tasks in their project management system.
Sometimes, and this is really cool when you get to see it work, AI Teammate will recognize that the conversation has reached a natural conclusion and will proactively ask, hey, should I go create these tasks for you at this point? In this case, we asked it to create them. It created them, and the tasks have now been created in our task system. This would have been time spent by developers or managers creating these tasks, but maybe more importantly, the quality of the tasks is much higher. They're very detailed. They have the context of the team's conversations and the context of the rest of the software the team owns and operates.
Okay, so we've got the tasks. Now, now what? Well, AI Teammate can also help us with implementation. So we can point AI Teammate at a particular task and say, hey, can you make a first draft implementation of this? And this is one of the cool things where AI Teammate, because of the proactive nature of it, can also do this while the team is asleep. It doesn't actually have to be invoked. And so you can, again, you can think of these as first draft implementations, and as models improve, we'll be able to do more complex code generations and we'll also be able to improve the quality of it.
Now AI Teammate has completed the code review. It publishes it back to the Slack channel and also automatically sends an email to the team saying, hey, I've completed this code review, can you go ahead and take a look at it? In this case, it's a very simple example, but you can see here it created the definition of priority in the code, and then it created four priority levels. When the developer went to code review this, they said, hey, actually I'd like to have a critical priority level as well. And so they provided code review comments just like they normally would to any other developer on their team.
AI Teammate sees this and says, oh, I should update my code, and so it updates the code based on the comment, does a revision, and then shares the revision back with the team, all with no people needing to do anything in the process other than the code review. Here's a copy of the change. You can see it added the critical priority level, and the code change is ready to move forward now. As I mentioned, I chose a simple example today just to make it easy to understand, but you can imagine more complex examples. AI Teammate has shipped hundreds of code revisions to date.
Architecture, Results, and the Future: Scaling AI Native Solutions to Thousands of Teams
Great, Steve. Okay, we've heard all about AI Teammate. It sounds really great, but how does it work? Like what's actually happening there? At the core of it, it's really pretty simple. It's a lot of traditional scaffolding and engineering happening in the background. We hear about AI and it sounds magical, but the reality is it's magical because of what we build around it, in my view. So in this case, you'll see on the left-hand side event sources. These event sources are any system that a developer may work in and can generate information flowing into the team.
We collect those up in our Lambda connectors and push them into a stream. And then AI Teammate batches these up. You may say, well, why are we batching them? The batching allows us to process lots and lots of information coming in all at once in a rational way. So you can imagine someone saying, hey, I'm going to be out of the office tomorrow, and then one second later saying, oh, I meant today. Batching allows AI Teammate to compress those and not have to process them quite as often. Anyway, they get into the queue here, and the first thing we do is say, let's capture some memories, and this is one of the really cool parts of the way this works. It captures these running memories related to what the team is doing. This allows us to compress very disparate events into one context window, so when we're processing an event in the future, we can go and pull these together in one place.
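The connectors and stream schema are internal, but the batching idea itself is simple. Here's a small, hypothetical sketch: group raw events by source and thread before handing them to the model, so a burst of near-duplicate messages becomes one unit of work. The field names are invented; the real system would also flush batches on a time window rather than waiting for all events.

```python
# Hypothetical batching step: collapse a burst of related events into one batch
# so "I'm out tomorrow" and "oh, I meant today" are processed together.
from collections import defaultdict

def batch_events(events):
    """Group raw events by (source, thread). A real pipeline would also flush
    on a time window instead of waiting for every event to arrive."""
    batches = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        batches[(event["source"], event.get("thread_id"))].append(event)
    return dict(batches)

events = [
    {"source": "slack", "thread_id": "T1", "timestamp": 0, "text": "I'm out of the office tomorrow"},
    {"source": "slack", "thread_id": "T1", "timestamp": 1, "text": "oh, I meant today"},
    {"source": "tickets", "thread_id": None, "timestamp": 2, "text": "SEV-3 assigned to the team"},
]
for key, batch in batch_events(events).items():
    print(key, [e["text"] for e in batch])
```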
We then go to our next step, which is effectively another Lambda underneath the Bedrock symbol there. At this point, AI Teammate does its first pass and says, okay, this is the context of the team. This is what's just happened. Do I think I can add any value here? This is a proactive part. It decides if it thinks it should do something, and in many cases it says no, actually I don't have anything interesting to do here, and it'll pass and not do anything. It just records the memories and moves on.
But it can decide to invoke a tool, and the tool can be anything your imagination can come up with. We even model things like Slack messaging as a tool, so it's the AI deciding it should send a Slack message or respond in a ticket, not us hard coding that. On the very far right-hand side of this diagram, you'll see MCP tools and agents. This is one of the ways we can scale something like AI Teammate, which is this orchestrator, to handle thousands of teams at the company, by calling out to tools and agents that are specialized in different domains. You'll notice in here too, and I just want to reiterate this, the LLM only appears a couple of times. There's a lot of just traditional software engineering going on to make this happen.
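To make that "do I add value here?" pass concrete, here is a hedged sketch of the first decision step. The decision schema, the tool names, and the model ID are all invented; AI Teammate is internal, and this only shows the shape of the pattern.

```python
# Sketch of the proactive first pass: given team memory plus a new event batch,
# the model decides whether to act and which tool to call. Everything here
# (schema, tools, model ID) is illustrative, not the AI Teammate implementation.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "us.anthropic.claude-sonnet-4-20250514-v1:0"  # assumption

TOOLS = {
    "post_slack_message": lambda args: print("Slack:", args["text"]),
    "create_task": lambda args: print("Task created:", args["title"]),
}

def decide(team_memory: str, event_batch: str) -> dict:
    prompt = (
        "You are an AI teammate. Based on the team memory and new events, reply "
        'with JSON only: {"act": true|false, "tool": "<name or null>", "args": {}}. '
        f"Available tools: {list(TOOLS)}.\n\nMEMORY:\n{team_memory}\n\nEVENTS:\n{event_batch}"
    )
    resp = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return json.loads(resp["output"]["message"]["content"][0]["text"])

decision = decide(
    "The team owns the agent timers package; specs live in Spec Studio.",
    "Slack: 'How do the agent timers handle notification priorities?'",
)
if decision["act"] and decision["tool"] in TOOLS:
    TOOLS[decision["tool"]](decision["args"])
# In many cases the model declines to act and we only record memories.
```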
We recognized that with AI Teammate, my team couldn't build everything, and so we had to build something that would allow us to scale. This is the agent-to-agent connection that I mentioned earlier. It allows us to scale expertise by embedding organizational knowledge into reusable agent capabilities that can then be invoked by AI Teammate. It allows AI Teammate to coordinate complex tasks autonomously, breaking down work and then orchestrating across specialized tools. This reduces human handoffs and interruptions. Agents communicate directly with one another to complete multi-step processes. I alluded to this earlier when I said developers no longer have to carry context from tool to tool; AI Teammate will do that for them.
Last, it maintains a consistent context across all activities, from planning to execution to documentation. It understands how we got there, using the team memory and historical decisions to do that. Okay, so I laid out what we were tasked with in March, and we talked about a couple of the solutions. Where did we end up? Well, we ended up seeing four times the number of features shipped to customers for a subset of our teams by, I think it was, November. We saw a reduction in operations and routine tasks. We saw an improvement in quality, and we saw implementations compressed from weeks to hours in many cases.
We did all of that, but we learned along the way, that's for sure. At the beginning, we focused on efficiency. We said, how can we make developers more efficient? And as time went on, we said, wait a minute, why are we focusing on efficiency? Let's focus on reimagining how we do the process altogether. Let's figure out how we can do things we never could do before, so we can tackle previously impossible challenges. We started with what we thought were clear, measurable experiments, but we recognized we had to stay flexible in our approach. It turns out, as I mentioned, our plan didn't go as planned. We had to stay flexible, and progress was more continuous than we had initially anticipated.
The last thing I'll say is that AI native teams are a collaboration between AI and humans. They work together, they work in parallel, they unblock one another, and they help reduce operational overhead as a unit. Where are we going from here? What's next? Well, we've already started work to scale our AI native solutions to thousands of teams inside Amazon. As Dave mentioned this morning, our plan for 2026 is to have 75% of the teams in Stores using these solutions and reaching the productivity gains that we talked about. We've also started to utilize Kiro to accelerate code development. With that, I'm going to hand it back to Alex to talk about some of the key takeaways.
Thank you. Steve shared a lot of good insights, so I just wanted to take a second to do a recap. Software development, as we all know, is evolving and changing fast, so it's important to understand how the technology can help you. We're going from AI-enhanced tools, as we were mentioning, to maybe not writing a single line of code for a feature or a package that you're building, and adopting AI development techniques can help you move further along that path toward AI native development.
Evaluating new capabilities as they come up will push the frontier forward. But the most important thing across all of this is consistency: if the results are not consistent, if you don't have standardization across teams, then building memory that persists becomes critical. Developers come and go, but the package stays, so how will you make sure that knowledge stays and that you're able to deliver faster and dream bigger? Those are the three things that I just wanted to leave you with.
Thank you so much for spending the afternoon with us. I know it's happy hour time, so we'll be happy to let you go. We'll stay here for a minute if you have any questions. If you'd like to talk about any of these things, we're happy to answer questions. Thank you so much.
This article is entirely auto-generated using Amazon Bedrock.