
Kazuya


AWS re:Invent 2025 - Introducing AI driven development lifecycle (AI-DLC) (DVT214)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Introducing AI driven development lifecycle (AI-DLC) (DVT214)

In this video, Anupam Mishra and Raja from AWS present AI-Driven Development Lifecycle (AI-DLC), a methodology for software development using AI. They identify two common anti-patterns: the AI-managed approach where developers expect AI to autonomously build complete systems, and the AI-assisted approach where AI handles only narrow tasks. Based on over 100 customer experiments, they introduce AI-DLC with practices like mob elaboration, semantic context building for brownfield projects, and adaptive workflows. Key insights include maintaining high semantics-per-token ratios, understanding model training limitations, and ensuring developers understand every line of AI-generated code. They demonstrate fixing a FastAPI issue using Amazon Q Developer with AI-DLC steering files, completing in hours what typically takes much longer. The methodology achieved 10-15x productivity gains with customers like Wipro and Dun, emphasizing that velocity must accompany quality and predictability.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: AI's Impact on Software Development and Common Industry Patterns

Hello, everyone. Thank you for coming to the talk on AI-driven software development. It's an exciting area with a lot of development happening, and we are going to share many learnings from working with several customers and with our own teams building software using AI. My name is Anupam Mishra. I'm a Director of Solutions Architecture at AWS focused on building an AI engineering practice at AWS. With me, I have Raja.

My name is Raja. I lead a team called Developer Transformation in AWS. My focus is on developers. I'm glad to be here. Thank you. We both have a lot of experience developing software. I have been with Amazon for 18 years now, building software for several teams.

Thumbnail 50

Some of what we are sharing comes from our own experiences, some from our engineering teams, and some from experiences we had with our customers. We're looking forward to having a great session together. We'll also have some time for Q&A at the end, which we'll do outside this room. With that, let's start with how AI is disrupting software development. There's a lot happening in this space, with several tools emerging every week, but let's start by understanding who we have in the room.

Thumbnail 70

How many of you are developers writing code every day? How about product managers? Great, there are a few product managers as well. How about people working on DevOps and infrastructure? We have awesome diversity here. Are there any roles we didn't call out? What is it? Engineering managers, of course, and CTOs and VPs of engineering—several leaders thinking about how to navigate the opportunities and challenges ahead of us.

How many of you are already using AI for software development? About half of you. Great. And how many of you are happy with what you're seeing with it? Some of you are, and some of you are not. I hope you'll have clarity from our learnings, and we'd love to discuss your learnings as well. Let me summarize some of our discussions with more than 100 companies across the globe over the last year.

Thumbnail 140

We've worked with early-stage startups, Fortune 100 companies, service industry companies, and product companies. We see three common patterns depending on who we're talking to. Many engineers say they hear that AI is disrupting everything, but what does it mean for us? Should we be doing different training? Should we be using different tools? Should we stop doing what we're doing? There's a lot of confusion. The second pattern is a proliferation of tools. There's no shortage of tools being built using AI, so people are confused about switching from tool to tool. Which tool is best? We keep moving from one tool to another, and by the time we start using the second tool, we learn that the first one was better, and we hope the second one will be better. There's a lot of confusion. The third pattern is leaders of large teams saying they want to make their teams AI-native, but what does it mean to be AI-native? What should they be doing? How do they take it to thousands of developers on their teams? We're going to share some observations on these questions.

Thumbnail 210

The Productivity Paradox: Research Findings on AI-Driven Development

Before we start, there's something important to understand. While there's a lot of talk about productivity gains and making software development easier with AI, ThoughtWorks shared research a few months back showing that velocity gains are actually 10 to 15% when software is built using AI. This is based on practical analysis. We also see evaluations from nonprofits. You have probably heard of METR, which evaluates LLMs. They did an experiment with about 16 open-source developers working on an open-source repository. They divided them into two teams of 8 each. One team was given tools to use AI, and another team was asked not to use AI.

Thumbnail 240

They gave similar sets of problems to both teams, about 250 issues. The developers who worked with AI were asked how much more productive they were, and their answer was about 23%. Even after reflecting on it further, they still estimated they were around 20% more productive. But when the actual analysis compared what the team without AI did against the team with AI, it showed that the team using AI was actually 20% less productive. This is not to say AI isn't working. This study was done in early 2025, so take it with the understanding that LLMs have evolved a lot since then. At the same time, there's a paradox in how we measure productivity: what is perceived productivity versus real productivity.

Thumbnail 310

Thumbnail 330

There's a lot to think about in terms of how we measure it. Now let's think about why it's not working. There are many anti-patterns, and in our one-year research, we grouped them into two major buckets of anti-patterns. Predominantly, developers are using two broad approaches today when it comes to using AI in software development.

Thumbnail 370

Two Anti-Patterns: AI-Managed and AI-Assisted Approaches

The first approach is what we call the AI-managed approach. In this approach, when there's a very complex problem, a large codebase, or a very ambiguous problem to solve, developers throw this problem to AI and expect that AI should work autonomously and build the software end to end. That's the expectation, and it's a very ambitious starting point. I believe this is going to be the future, but right now that is not the case. The starting points are always ambiguous, and AI makes a lot of assumptions. What happens is that this approach of throwing a problem to AI in a single shot and waiting for AI to solve it seldom works except for very small prototyping scenarios or simple scenarios.

Thumbnail 420

If it's a production-grade application where hundreds of design decisions are made and people have to collaborate with each other, this approach seldom works. The worst problem is that there's a lot of code thrown at the developers, and they have to put their name on the source code as the author. The confidence level is low because suddenly so much code is thrown at people, and therefore this does not go to production at the velocity we expect. That's one of the reasons why these studies reveal that while you can throw code much faster, the entire SDLC slows down anyway, so the productivity gain is only a small step increment. It's not the paradigm leap we are talking about.

Thumbnail 450

The second approach is the extreme opposite. Senior developers, having tried the AI-managed approach, decide to take it over. They say, "I'm going to do the task breakdown, I'm going to plan it, and then I'm going to insert AI in some narrow areas." So they narrow it down by saying, "Code this function for me" or "Do a security review on this small piece of code." It seems to work very well, but the problem is that the intellectual heavy lifting is done by humans, which is the same as before AI.

Thumbnail 490

The velocity gain once again is not great. Additionally, the processes they follow are pre-AI era processes. There's a lot of human interaction, people throwing documents at each other, and many meetings have to happen to resolve issues. Therefore, the time saved in the portions where we use AI is wasted in scrum meetings and various other meetings which are no longer relevant. These are the two broad approaches we discovered as the root cause. How many of you are seeing this?

Having seen this, what do we do about it? We started on a journey recognizing that these are the two broad patterns. Engineers who start generally see two outcomes. Either they say AI doesn't work and forget all the hype because it's not useful, or they switch to the second pattern, which is AI-assisted, and they start seeing some value. But we went on a journey to see how we could build the paradigm leap, how we could build twice the productivity, three times, five times, ten times productivity. What does it mean, and how do we get it?

Thumbnail 530

Experimental Journey: Building AI-Driven Development Lifecycle (AI-DLC)

We started creating several experiments. Some of the experiments were internal, where we would take different types of business problems and build software. We would get together as a team for three or four days, build the software end to end, and see how we were using AI. We tried different tools, all the tools you can imagine. We tried things like having clear business context. We tried greenfield problems where everything is new and a new startup is getting launched. We also tried brownfield problems where we take a large, complex open source repository and see how we add features to it and how we fix bugs in it using AI.

Thumbnail 570

Those experiments led to a lot of conclusions and clarity. Then we started going to customers. We said, "We have come up with some conclusions. Can we work with you to solve a problem you are already solving?" That was a great way for us to keep learning. We've done more than 100 of those experiments where we go to a customer, work with them on solving their own problem, and then see how much faster we can do the work they already considered important, in a much shorter cycle using AI.

Thumbnail 600

Let us share some of the learnings from those experiments. We call this set of learnings AI-DLC, which stands for AI-Driven Development Life Cycle.

Thumbnail 630

This is a set of rituals, tools, and roles working together to create great outcomes for customers while also working with systems at scale. We're not just creating simple examples, but creating systems that really work at scale in production-grade applications, working as if an engineer has written it rather than an AI has written it.

Core Principles: Plan-Verify-Generate Cycle and Mob Elaboration

Now let's think about what the core principles of this method are. The first one is understanding how AI works. We have seen that when you ask AI to do work, it generally tries to be very helpful. When it is very helpful, it does things that you don't want. For example, if you're asking it to create a system for a shopping website, it will create authentication for you, log management, and different applications that you don't need. Generally, it will try to create the entire application. But to control it, you need to understand what AI is going to do before it does it.

This is a cycle where we ask AI to create a plan, humans validate the plan, and that's where we do course correction. We identify the things we don't need, the things we really want to do more of, and the assumptions that shouldn't be made. This process brings AI's brain and human brain to the same level. The human is thinking this way, the AI is thinking that way, and now they move in the same direction. AI keeps refining the plan, AI executes the plan, and humans again verify the output. That way, when AI is doing the work, humans are completely aligned that this is exactly how they would have worked anyway. So AI is working as if a human has done the work.

Thumbnail 710

But before we go there, let me ask a question. Where does time go in the SDLC? How many of you think that most of your day, if you're coders, is spent writing code? Nobody. That's good. How many of you think that most of your time goes into meetings? Yes, about half of your time. That's the reality of life. Nobody talks about it, but software development is not about just writing code. Many people think that if we make coding faster, everything becomes faster. That's not true, and it manifests in many ways.

Thumbnail 750

In SDLC, generally not by design, we have landed in a world where everybody waits for everybody. The security team acts as a gatekeeper for the software development team, asking where your threat model is and what analysis you have done. The operations team is probably waiting for the software development team to release new code. The QA team is waiting for a new release. There are cyclic dependencies, and all of these lead to escalation meetings, alignment meetings, something planned, but then people saying what you have done is wrong, now go back and redo it. So much wastage happens, and we thought about how to reimagine this in a completely different way.

Humans work better when they're together, when they're coming together and sharing their experiences. The cost of interaction is low. Right now, for example, if I send an email to my colleague, they may respond in one day, five minutes, or ten days. If I'm blocked on that information, how do we short-circuit the dependency that we have in humans with different types of backgrounds? Some are business people, some are very good at coding and technology, and some are operations people. How do we create a very synchronous and very good way of communicating so that we don't waste time?

Thumbnail 790

That's the other model we thought about. For AI to work well, we need to solve the human communication problem as well, because a lot of time is being wasted in humans not interacting in real time. Code can be generated very fast, but everything else takes so much time. That's a very fundamental reimagination in itself. If you really think about it, this did not happen by itself: in the Agile of the last twenty years, sprints were two or four weeks long, and that led to a natural sequencing of work. A developer has to wait until the product manager releases the stories to that person, and then the rest of the people have to wait. That's the everybody-waiting-for-everybody cycle we described.

Thumbnail 840

But in the AI world, the sprints should not be two weeks to four weeks. They should be hours or less than a day. If that is the case, it's really possible for us to bring people together and use AI to make the decisions right then and there and then move forward. One of the rituals using that principle is what we call mob elaboration. If a product manager has an intention at a high level, that's enough. Once the intention is ready, we put product managers, developers, QA, and operations all in the same room. In a matter of four hours or half a day, you use AI to refine the intention, use AI to create the stories, and everybody is offering their insights and validation.

You're able to sharply end with everybody agreeing on what we're going to build, what the stories are, and more importantly, AI also gets great context. That's the alignment of human versus AI and human versus human all compressed in a short span of time, and that's one of the rituals that we keep practicing with customers. Every time we do that, this is a new way of working that creates a jaw-dropping moment where people realize we can really do this now. That's one ritual that changes everything.

The outcome of this exercise is a set of stories divided into units of work. Reaching this breakdown normally takes a long time, depending on the size of the company, sometimes a few months. We've heard from customers that the work they get done in these three or four hours would otherwise take them multiple quarters, depending on the complexity. So it's an amazing practice.

Thumbnail 970

AI-DLC Methodology: A Reimagined Development Framework

What happens next is that after the general requirements are well analyzed and broken down into multiple pieces that can be built, construction starts broadly. That's also a mob ritual. We don't let teams be separated and build on their own. If you look at teams practicing AI, you'll notice we don't need two-pizza-sized teams anymore. Team sizes are shrinking, and AI is playing a bigger role in that. Therefore, you have smaller, cross-functional teams with full-stack developers, one business person, and one specialist at one desk. It's more like a single-pizza team.

Such teams, grouped together and tackling the entire system, get into a room and build it very fast with the same methodology we're going to give some more details on. These teams interact with each other through APIs. They're co-located, and co-location doesn't have to be physical; it can be virtual as well. But the important point is people coming together, synchronizing their calendars, and doing it at the same time with AI, moving rapidly. This is the reason why you're able to go beyond two-times and five-times velocity, and we have proven this is possible over a hundred times.

Thumbnail 1040

Putting it all together now, we don't want to throw these best practices and rituals as individual recommendations. We put them together into a reimagined new methodology which we call AI-Driven Development Lifecycle. The differentiator here is that today, if you look at pure tools-based approaches, they are more geared towards individual developers hacking and building stuff. What enterprises need is a collaborative method where different teams come together, make a lot of decisions, and build things very fast. Therefore, you need a methodology, you need practices, and you need people, process, and tools all coming together to do this.

Like any typical methodology, this covers end-to-end inception, construction, and operation, and it's iterative in nature. The important point is it's reimagined. We don't retrofit AI into existing agile. It's a newly imagined method that gives this breakthrough.

Thumbnail 1100

There are three phases. You will receive some resources later so you can study this more deeply. Inception, construction, and operations are broadly the phases, and you have some stages within each phase. The nine stages represented there are typical. Each stage follows the cycle that Anupam pointed out: in each stage you plan, generate the output, and then verify and validate it. The stage ends and leads to the next stage. The context that each stage produces gets semantically richer and richer as you go down the stages, and the AI's output improves as you go down the stages.

One important point to note is that these stages are not one-size-fits-all workflow. It is adaptive. The reason is simple: a defect fix doesn't have to go through all nine stages, but if you're adding a new function, a business-rich function to an existing brownfield, that has to go through most of the stages. Greenfield will go through all stages. So this is very adaptive. It's a real AI-first methodology where we get AI to plan this out, and AI is going to recommend what stages are important for a given workflow or given intention. You will be able to apply your oversight and make some changes if needed, and you can set that into motion.

The way I think about it is that if an engineer sees everything, from somebody thinking about a problem, to that problem getting well defined, to it getting designed, to unit tests being written, to the code being generated, they feel like this is their thing. They don't feel like it's somebody else's.

That is a great outcome of this as well. People who work in this way can see that I'm using AI across the chain, but I understand everything. I can own it. I can maintain it. I can fix bugs in it.

Thumbnail 1230

Thumbnail 1260

Customer Success Stories: Wipro and Dun

Let's talk about some of the stories from customers we have worked with. This is an example from a system integration company called Wipro. They work in several domains, but we worked with them on one of their customers, where they created three distributed teams in three different countries and had a few months of work planned. All the teams brought their work. They worked four hours a day for five days, twenty hours in total, and in those twenty hours they finished all that work in an enterprise healthcare space. They shared that it is not just a faster way to build, it's a better way to build. Better here means better quality, better understanding, and the team having a much better feeling about the work, with a lot of excitement in the room. In fact, I've heard some customers say, "I wish we could keep working this way tomorrow. I don't want to go back to the old way of working," because you're seeing action happen right there rather than the slowness we see in organizations caused by people not being available and by dependencies; here everything just happens on the spot.

Thumbnail 1300

The other example is a fintech company called Dun. They are in a stock trading space and they built a completely new application using this. They had planned to launch it in two months. They built it in forty-eight hours, a matter of two or three days, and they were able to release it the next week itself. There are several stories like this. We have worked with customers in Japan, customers in Asia Pacific, Europe, and across many industries.

Thumbnail 1330

But let us also share that it's not just about method. It's also about how we use AI. Sometimes we think of AI or a tool as the best tool and we can throw everything at it and it's going to be the best thing, but working with AI is also a skill. We have learned several things over the last many years, and let us share some of those learnings with you.

Thumbnail 1350

Best Practices for Working with AI: Code Understanding to Model Training

The first learning is about code. Probably everybody has heard of vibe coding. The term has become more popular than it should be, and I think some people misinterpret it as well. They think they can ask AI to build something and keep asking it to fix it until it finally works. But if you don't know what the code is doing, if you don't understand what dependencies it has or how much duplication it has, how would you feel comfortable owning that? I think the world is coming to the conclusion that this is not great for building production applications, and people are redefining what it means.

Our takeaway from this is that if an engineer is working on an application, they should start with a goal that they need to understand each and every line of code. They should be able to debug this code. If you don't understand the code end to end, how can you be comfortable putting your name on it anyway? Because somebody will say this person did this check-in. You can't say AI did this check-in. That is a fundamental premise of how you work. You have to understand it, otherwise there's no value in moving fast.

Thumbnail 1410

This is again a best practice that the methodology inherently follows. Take Anupam's example of asking AI to build a complete e-commerce platform. If I give that task, what does completeness mean here? Should there be payment? Should there be shipping? If so, what are the shipping channels? What are the payment options? There are a lot of assumptions to be made. So the task for AI should not be that broad. Instead, if I say, "Here is a piece of code, check whether there is an SQL injection vulnerability in it," the task is very narrow. What to do and where to do it is very clear.

Therefore, what the methodology does, and what you should follow as well, is to ask AI to decompose the tasks. The decomposed tasks should be unambiguous and narrow. If AI executes a decomposed task, the output is always better. This is something the method follows, and it's a best practice.

The next learning we had is about context windows. How many of you feel that context windows should keep increasing? Some of you do, because a bigger context window means I can throw more things at it and maybe it will understand more. But our learning has been that more is sometimes not good. AI gets confused if you give it a very large code base and ask it to make a change, regardless of the tool. How many of you have seen it touch and change so many files, going on and on, and you feel like, oh, stop, stop? That has been our experience as well. You have to manage context windows carefully. More context doesn't mean better. So we trim context.

We think about what non-relevant information is still sitting in the context. You may have had an interaction one hour back, but is that interaction still relevant? Maybe not. So it's better to clear that context so AI doesn't get confused. Also, think about when you reset and how much context is already there. By the way, in Amazon Q Developer there's a /context command which shows you what is in the context. You can compress context, but it's important to think about every interaction you do. Whatever interactions you have done so far, all of them go in unless you clear the context. So it's very important to ask: am I confusing the AI or am I simplifying its job? The ethos we have is: simplify AI's job and it'll simplify your job. So it's super important.

Thumbnail 1560

Thumbnail 1610

Asking AI to mimic existing code is fundamentally effective because LLMs work through an attention mechanism, which is a pattern system. It looks at sequences of tokens, and patterns lead to generation. That's how it works. So it's rather accurate when I point to source code and say, "Here is a reference. Build another service using that as a reference." In that case, the authentication in that code, your logging mechanisms, your error handling method, everything gets followed exactly in the new service as well. It's better to do it that way rather than describing it by saying, "Build me a new service with X authentication, Y logging, and Z error handling." That's not the approach. This is inherently the case with brownfield projects. I'm sure you have played with AI on brownfield code and found it difficult to handle, but if you have the right context built in this way, pointing the AI at a reference and letting it work from there, the result is going to be very, very accurate.

Let's move to the next learning. Because we use AI, the release velocity also increases. If you're really successful in this, and that has been our experience, and you're able to do, let's say, one month of work in one week, and on average you used to have one bug every month, would you now have one bug every week? Because the same work is now done in one week. In theory, to keep the same number of bugs, your quality standards have to increase. So everything needs to change at the same time. The way we can fix it is by having a very comprehensive set of unit tests and integration tests. This helps in two ways. When AI is working, it can use those tests to decide if it's going down a path it shouldn't, and it can course-correct a lot of it itself. It also helps it know the definition of done and what it should be doing. So it's super helpful. It increases your release velocity and also allows you to produce more code, knowing that your tests are going to cover as much as possible.
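As an aside that isn't from the talk: here is a minimal pytest sketch of what tests-as-guardrails can look like. The function and test names are made up for illustration; the point is that an agent (or a human) can run the suite after every change, and a red run is an immediate signal to course-correct before more code piles up.

```python
# Illustrative guardrail tests (hypothetical names); run with `pytest`.
# calculate_total stands in for the code an agent is editing; the tests encode
# behavior that must not change, so a failing run tells the agent to back up.
import pytest


def calculate_total(items: list[tuple[str, int, float]]) -> float:
    """Stand-in for the code under test: sum of quantity * unit price."""
    total = 0.0
    for _name, qty, price in items:
        if qty < 0:
            raise ValueError("quantity must be non-negative")
        total += qty * price
    return total


def test_total_sums_line_items():
    assert calculate_total([("widget", 2, 9.99)]) == pytest.approx(19.98)


def test_negative_quantity_is_rejected():
    with pytest.raises(ValueError):
        calculate_total([("widget", -1, 9.99)])


def test_empty_order_is_zero():
    assert calculate_total([]) == 0.0
```

In practice the suite lives alongside the real code; what matters is that "run the tests and keep them green" becomes part of the agent's definition of done.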

Thumbnail 1690

Again, this practice of keeping a high semantics-per-token ratio is important. As an example, if I say "refactor using builder pattern," that's just four words, and these four words are pretty rich in semantics. Refactor is rich in semantics. Builder pattern is rich in semantics. So here the semantics-to-token ratio is pretty much 100%. I can say the same thing in a very long way: "Build me a utility to create a complex object by calling many setters, each returning that object, and finally build the complex object." That's many more tokens, and the semantic meaning per token is very low. Part of the reason why AI struggles with large code bases is that there are so many tokens with low semantic meaning, due to the boilerplate code we have. So that's another takeaway: every time we deal with AI, whether we are instructing it or building context, we need to maximize the semantics-per-token ratio of those inputs. There are techniques to do that, and we can chat about it later.
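To make the example concrete, this is roughly what those four words expand to in Python. The Order class here is purely illustrative; the point is how much boilerplate the phrase "builder pattern" compresses into a few high-semantics tokens.

```python
# Illustrative only: what "refactor using builder pattern" unpacks to.
class Order:
    def __init__(self, customer: str, items: list[str], gift_wrap: bool, note: str):
        self.customer = customer
        self.items = items
        self.gift_wrap = gift_wrap
        self.note = note


class OrderBuilder:
    """Each setter returns the builder so calls can be chained."""

    def __init__(self) -> None:
        self._customer = ""
        self._items: list[str] = []
        self._gift_wrap = False
        self._note = ""

    def customer(self, name: str) -> "OrderBuilder":
        self._customer = name
        return self

    def add_item(self, sku: str) -> "OrderBuilder":
        self._items.append(sku)
        return self

    def gift_wrapped(self) -> "OrderBuilder":
        self._gift_wrap = True
        return self

    def note(self, text: str) -> "OrderBuilder":
        self._note = text
        return self

    def build(self) -> Order:
        return Order(self._customer, self._items, self._gift_wrap, self._note)


order = OrderBuilder().customer("Ana").add_item("B-123").gift_wrapped().build()
```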

Thumbnail 1770

Instead of throwing raw code at the model, the meaning of the code can sometimes be much more valuable. We'll share some examples later. Let's move to the next learning. It's important to understand how the model was trained. This comes from some of our own experience. How many of you have programmed in Golang? Keep your hand raised. How many of you have used domain-driven design in Golang? Nobody. That matches our conclusion: for languages introduced in the last few years, patterns that were popular in older ecosystems barely show up. While you can find a lot of Java code using domain-driven design, for Golang it essentially doesn't exist. So asking AI to use domain-driven design in Golang is not going to work.

We tried it and we saw it tries to create some things which a developer would say is wrong. It's important to understand what the practical reality is of how these programming languages work and what type of design patterns apply to one versus the other, because force-fitting it is not right.

Thumbnail 1830

Organizational Enablers: Flow State, DevOps, and Measuring Effectiveness

I think we have seen this multiple times, consistently. In the past when we were developing, you might have gone to Stack Overflow or Google to do your research, then copied and pasted. That's very distracting. You are out of your zone, out of your IDE, doing all those things, and it was very hard to stay in the zone for long. AI has given us this great advantage that you can pretty much get into the zone anytime you want, because you are not going to multiple places. You are just focused there.

But then you need a contiguous amount of time to work with AI: give the right context, give a judgment, give a validation, verify the output. There's real jamming happening there. What often happens is that because you need to cut the session short and go to a meeting, as developers we lose the flow state, and AI also loses continuity of context. When we get back, it actually becomes an anti-pattern. So what we recommend, whether it's an individual developer or a team, is to block a contiguous amount of time without distraction. That's a great boost for productivity, which is what we have seen.

Thumbnail 1920

We have seen some of our Amazon engineering teams who use this start doing no meetings in the afternoon. Of course it needs your leadership's backing, but it works very well, especially because the pace at which you're working is so different from what it is without AI. How many of you have a fully working developer environment, a dev environment where integration and everything works? Raise your hand. About half of you. It's pretty common at Amazon too that not every team has a very good dev environment, but it's actually very important.

When you're moving fast with AI, you will be releasing to dev, integration, pre-prod, whatever you call it. You're going to test how the change interacts with dependencies and how it works when a service returns specific parameters. But if you don't have those environments working, AI will go very fast and you will not have a way to test it. You will not have a way to move fast, and you will hit unnecessary gatekeepers and blockers where the pace you have gained gets lost.

Thumbnail 1990

So there is some homework and organizational change needed where everybody builds a discipline around how we have end-to-end working dev environments, not just for my team but for dependencies as well. Over time, you have everybody feeling confident that I'm allowing my consumers to do integration testing and dev testing, and that becomes a platform. If that becomes a platform, your pace of release can increase a lot. I think this is a continuation of the same also.

The real reason to have well-oiled CI/CD pipelines is that you're going to get fast feedback from production, from staging, and so on. That's a very important DevOps concept; we've known it well for fifteen or twenty years, and yet what we see repeatedly is that CI/CD pipelines are broken or not mature enough. In this era, that is going to amplify the problem, because the initial stages all the way through coding and unit testing are going to happen very, very fast, and then your CI/CD pipeline is going to block things from going into production.

Thumbnail 2060

A backlog builds up on the development side. So when feedback really starts to come from staging about a defect, or from production about something not working, your dev has already moved multiple versions ahead, because that's happening fast. It's going to really crush everything moving forward. We advise customers: take the time you save in overall productivity and really invest it in QA and CI/CD environments to make them smooth, and you can use AI to do that as well. If not, once again, you'll only see a step increment in velocity, not the paradigm leap.

Absolutely, moving to the next one, this is the most interesting one which we see from a lot of customers. AI works for greenfields and new use cases, but I already have so much code sitting in my company. Can it really work there? Can it use the validator for email which I have already written rather than an open source one? Can it use my existing libraries? I would say it can. We have tried it many, many times, and it does.

There is a big problem though with just feeding a big codebase and pointing AI to that codebase. We have seen some challenges, and this is getting better with time. At the same time, we have seen how we can make it easier. I think I was asking this earlier: how many times have you had a large codebase, you're making a change, and AI goes in an infinite loop changing almost everything.

You end up thinking you only needed to make one change, but you've ended up modifying 20 files. What we've seen is that to solve this problem, you have to build some semantic meaning of the code. We have utilities which we will share with you where you can build context about the code. If you have a large code base that doesn't even fit in the context window, you can make it work by having semantic meaning of the code, which could be the call graphs, the classes you have, and what each function does. AI can use this to decide what should be loaded in context.
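As an illustration of the idea (not the specific utilities the speakers mention), here is a small sketch that walks a Python codebase and emits a compact index of classes, functions, signatures, and first docstring lines. An agent can read this index to decide which files to pull into context instead of grepping through the raw code.

```python
# Sketch: build a compact semantic index of a Python codebase so an agent can
# decide which files to load into context. Illustrative only.
import ast
from pathlib import Path


def summarize_file(path: Path) -> list[str]:
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"))
    except (SyntaxError, UnicodeDecodeError):
        return []
    entries = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc_lines = (ast.get_docstring(node) or "").strip().splitlines()
            doc = doc_lines[0] if doc_lines else ""
            args = ""
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = "(" + ", ".join(a.arg for a in node.args.args) + ")"
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            entries.append(f"{path}:{node.lineno} {kind} {node.name}{args}  {doc}")
    return entries


def build_index(root: str) -> str:
    lines: list[str] = []
    for path in sorted(Path(root).rglob("*.py")):
        lines.extend(summarize_file(path))
    return "\n".join(lines)  # usually small enough for the context window; raw code often is not


if __name__ == "__main__":
    print(build_index("."))
```

A fuller version would also capture call graphs and module-level relationships, but even this much gives the model something semantically dense to reason over when deciding which files matter for a change.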

Thumbnail 2170

For example, if you have a shopping app and somebody says the order flow should change, how do you know which files really need to be changed rather than making changes everywhere you see the word "order"? Decomposing the problem is super important because if you have a large code base and AI is working on everything, we've seen it get confused. If you keep the scope narrow, it becomes very easy to make it work. We have a workflow which we'll share that can make some of this simpler. We bake this as part of the AI-Driven Development Lifecycle methodology so that all the techniques of building that context from large brownfield projects are already available as part of the method, and the tools will work accordingly.

Thumbnail 2230

I'm sure we're all using a lot of MCPs, and our teams are all building a whole list of them. At Amazon, we have a flurry of MCP tools. Sometimes one MCP server will have 100 tools inside it. Unfortunately, with today's technology, this eats into the context: every interaction between the agent and the LLM includes the tool descriptions, which can take 60 to 70 percent of the context away, and that's not good. It's changing, but right now it's important that you manually disable the MCP servers that are not needed. I'm sure future systems will be intelligent enough to block the unnecessary ones, but right now we have to manage it. If not, your precious context is taken away by this, and you can't really do deep work.

Thumbnail 2290

A lot of people ask how we should measure whether AI is really working for us, and it's a much bigger topic. Traditional metrics don't work very well. You can count the number of lines produced or the code accepted, you can talk about mean time to repair and several other things, but the best way, or at least the way we have seen working, is to create a baseline metric. This could be how much time it takes from when you, as a business or tech leader, decide to build something to when it gets launched. You can do an A/B comparison of AI being used versus not, and it gives you a very good starting point, especially when you're deciding whether this is really a better way to work. There's still a lot of work happening on how we measure how effective AI is, and we've seen that measuring end-to-end productivity gain removes a lot of metrics that can be gamed one way or the other.

Thumbnail 2360

Strategic Considerations: Technical Debt, Experimentation, and Outcomes

Rewriting versus patching has to do with technical debt. All of us are dealing with systems whose technical debt is taking our sleep away; it's a 2.4 trillion dollar problem in the US alone. In the past, we had inertia about rewriting applications to escape technical debt, because rewriting is a multi-year effort and it's very serious. AI is giving us new hope that we can rewrite much faster. So when you're in a situation where you're thinking of just patching or just doing an upgrade, if you have a lot of technical debt, please reconsider, because with perhaps an additional two weeks you may be able to rewrite and escape a lot of that technical debt.

Methodologies like the AI-Driven Development Lifecycle will help you move forward confidently, keeping accountability for the existing behavior as well as the new systems and new functions. This is a time to reconsider the old inertia. We've seen some teams build something from scratch much faster than repairing it, and maybe you will find cases like that. How many of you feel that AI is like a senior engineer? It's not a senior engineer. Sometimes it says something very confidently and it's right, but at times it says something just as confidently and it's wrong. You shouldn't hesitate to question it.

When you question it, it will in a very polite way say, "Oh yeah, it looks like you're right, and I should change this." What we have learned is that sometimes AI will go in directions where you don't want it to go. We think of it as an intern who has a lot of ideas and thoughts, but a senior engineer needs to help and guide it. You are the owner, actually. As an engineer, you are the owner because whether you use AI or not, your name is on the check-in. So it's very important to remember that.

Thumbnail 2420

Question, and don't hesitate to question. AI works much better when you're not accepting everything it tells you without questioning it. We are now in a different world where things are changing very fast. A lot of customers ask us, "Has there been a company that has done this and experienced 20x velocity? Only then will I move." That's not the right approach. There is no gold standard yet. We all have to do it, learn, and evolve. We have the methodology, but it's not like we've had it for 10 years, so we encourage customers to safely design experiments. Safely design your metrics and then just jump in, use the methodology, and do it. If you don't get the results, it's okay to iterate and change your ways of working to make it work. There is no turning back, so let's not have the inertia to stand outside. It's about time to jump in.

Thumbnail 2460

Learning with AI requires real hands-on experience. You can keep reading books about how to code, but unless you start coding, you will not know it. You can keep reading about AI, but unless you do it yourself, you will not learn. So spend the time. You may leave this talk saying you learned five very valuable things; try them, and maybe you'll come to your own conclusions that three were very relevant while on the other two you have your own understanding. I would highly encourage you to practice, and only then will a lot of these things become internalized.

Thumbnail 2520

Working with AI sometimes feels the same as writing code, but my observation is that you will learn more about AI as you start working with it. You will learn where it works best. For example, one recommendation Raja shared was to give it small tasks rather than large tasks. What is small? Only by doing will you know what is small enough for it to do well, and you will have those learnings. The conclusion of everything we shared is that ultimately your name, an engineer's name, will be on the code. If your name is on the code, how do you become comfortable with whatever has been done, whether AI was used or not? How can I feel comfortable that I can defend this code? How can I feel comfortable that I will be the person getting paged in the night and knowing how to fix it?

So far, I think that is the best way to create something production grade: you understand it well enough to say this is exactly what I would have done. It gives you the confidence that whether it's done by AI or not, you know this is the way to do it. The customers that practice a methodology like this, where they start from the top and verify and validate every step of the way, have a better affinity with the final code that comes out. That affinity and confidence is the barrier breaker that lets this code go to production. If you just throw a problem at AI and let it generate everything, that affinity and confidence doesn't come.

Thumbnail 2590

Thumbnail 2600

A lot of questions come to us on how we measure the effectiveness and what the outcomes are. We analyzed it, and there are a lot of outcomes that come from AI. These are pretty similar to the outcomes we expected in the agile era. The top outcomes that engineering leaders like yourselves and developers expect are velocity and quality. Velocity doesn't matter if the quality is poor, so they have to go hand in hand. Predictability is another top outcome. Predictability is how much of what we commit in a sprint we actually deliver. Before AI, it was pretty much 20 percent: we commit 10 things and only deliver 2. In the AI era, the predictability rate is more than 80 percent. And this is just the beginning; it's going to keep moving toward 100 percent.

Thumbnail 2640

Those are some of the typical metrics, but these things will change also. As Anupam said, once everybody reaches the same velocity, then we will move the metrics to business value or something else. For now, those are the ones. How do we really put an experiment in our organization? We don't have to overengineer this. You take something that you have in the backlog and then use your original ways of estimating it, whether it's user stories or your own approach of estimating it. Have that estimate done and then do the same work with AI with this methodology, and you'll obviously be able to compare where you are. Is it 1.5x? Is it 10 percent or is it 10x? You'll be able to compare it. By doing it repeatedly and widely in an organization, you will average and you will know that this is giving you 10x productivity or 11x productivity, and you'll be able to conclude. It's very simple. There's no need to overengineer this.
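A hedged sketch of that comparison, with made-up numbers: the arithmetic is deliberately simple, because the value is in measuring the same backlog item end to end both ways, not in the formula.

```python
# Baseline-vs-AI-DLC comparison across a few experiments. All numbers are invented
# for illustration; substitute your own estimates and measured hours.
experiments = [
    {"item": "checkout refactor", "baseline_hours": 160, "ai_dlc_hours": 20},
    {"item": "reporting API",     "baseline_hours": 80,  "ai_dlc_hours": 12},
    {"item": "defect backlog",    "baseline_hours": 40,  "ai_dlc_hours": 6},
]

multipliers = [e["baseline_hours"] / e["ai_dlc_hours"] for e in experiments]
for exp, mult in zip(experiments, multipliers):
    print(f"{exp['item']}: {mult:.1f}x")

print(f"Average across experiments: {sum(multipliers) / len(multipliers):.1f}x")
```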

Thumbnail 2700

Live Demo: Fixing FastAPI with AI-DLC Methodology

So Raja, maybe let's move to a demo of fixing a problem. Let me do a very quick one. Usually these kinds of demos are much longer, but we want to set the context. We took a use case from FastAPI, an open source library available on GitHub.

We took one of the issues that was submitted asking whether it could support the HTTP QUERY method. This is what was submitted by one of the users. Raja is going to share how we use the AI-DLC to approach this issue.

The problem was that FastAPI didn't have this capability: an endpoint returns a lot of JSON, but what you want is a small portion of it, so you want to filter it. FastAPI didn't have this feature, and this was the request that came in. What we thought was, let's use our methodology coupled with Amazon Q Developer. Let's see if we can tackle that kind of pull request, make it very fast, and deploy it.
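For orientation, and not as the code the demo generated, here is a rough sketch of the use case. Starlette lets you register arbitrary HTTP methods through FastAPI's api_route, so a QUERY endpoint that filters a large JSON collection via the request body can be approximated today; what the feature request (and the demo's pull request) asks for is first-class support inside FastAPI itself. The endpoint and field names below are invented.

```python
# Rough sketch of the QUERY use case: filter a large JSON response server-side,
# with the filter carried in the request body (which GET does not allow).
# Illustrative only; not the code generated in the demo.
from fastapi import Body, FastAPI

app = FastAPI()

CATALOG = [{"sku": f"item-{i}", "price": float(i)} for i in range(1000)]


@app.api_route("/catalog", methods=["QUERY"])
async def query_catalog(max_price: float = Body(..., embed=True)) -> list[dict]:
    # Return only the slice of the catalog the caller asked for.
    return [item for item in CATALOG if item["price"] <= max_price]

# Exercised with httpx, which permits arbitrary HTTP methods:
#   httpx.request("QUERY", "http://127.0.0.1:8000/catalog", json={"max_price": 3})
```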

Thumbnail 2750

We have distilled the AI-DLC methodology into steering files. How many of you know what steering files are? They are rules. Basically, it's a way to take an agentic coding tool and customize it to follow our workflow. Without them, the agentic coding tool might fall back to the AI-managed approach, but you can customize it. These steering rules are open source; later you will see the resources with the AI-DLC steering rules. We used them, and as I said, AI-DLC is a very rapid workflow. Typically it goes through many stages, but most of the stages are not always needed. If it's a defect fix, you don't have to do all the stages. The workflow will adapt itself. Later I will show you how that happens.

Thumbnail 2800

This is how we start. This is Amazon Q Developer. It works the same for Codeium and Codeium CLI. You will notice the rules are downloaded. The Amazon Q folder is where the AI-DLC rules are. Later I will give you the resource where you can download it. Then all you have to do is set everything up. We have downloaded the FastAPI code. We have cloned it, and then we're going to start giving some instructions.

Thumbnail 2830

Thumbnail 2840

On the left-hand side, that's the intention. Our intention is to support this feature request, and the FastAPI issue URL is listed there. That's all that is needed. Now, with that as the starting point, because of the steering files and the custom rules we have, the system identifies that I want to use the AI-DLC methodology to solve this. So it shows a welcome message saying, I'm going to use the methodology, I'm going to follow a workflow like this, let's get started.

Thumbnail 2860

Thumbnail 2870

Thumbnail 2880

Thumbnail 2890

The first thing it does is identify that there is something existing in the workspace and that it has to reverse engineer it. That's the first step in the AI-DLC methodology, and that's the brownfield problem Anupam talked about: you have to build semantically rich context from it. So automatically it works out the structure of this application, what the core structure looks like, what the data flow looks like, and which APIs are already supported. What does the business context look like? Who are the actors involved? How does the whole thing work? Then it goes into a very important part of reverse engineering: it sees that FastAPI uses Starlette to implement the web server, so it understands everything in a very sharp way. And then there's the library of components: what the important components are, what functions they implement, and how these components work with each other. So that's a beautiful, semantically rich context.

Thumbnail 2900

So it has now looked at hundreds of thousands of lines of code and reduced them into a semantically rich context. From here on, AI will use this to understand where to make the changes. The biggest trouble today is that in a large code base, if you just point AI at the raw code, it is going to struggle with where to make the change. It uses grep search and all kinds of non-semantic tools, and it struggles. That's the biggest problem. We're able to solve this problem by building this semantically rich context. AI-DLC does it automatically: the moment it senses there is a workflow and there is a workspace, it does everything automatically.

Thumbnail 2910

Thumbnail 2920

Thumbnail 2930

With that context built, in the middle panel you can see that workspace detection and reverse engineering are done, and it's going to go into understanding the requirements a little more deeply. As Anupam said, AI has to give you a plan: this is my understanding of what I'm going to do. For that, it has to clarify the requirements with you, so it's asking a lot of questions. The HTTP QUERY method, what standard should I use to implement it? It has a list of very important questions to ask rather than making assumptions. I'm going to say, hey, use a particular standard, and I'm going to paste the URL of that standard.

Thumbnail 2960

Thumbnail 2980

Thumbnail 3010

Thumbnail 3020

Thumbnail 3030

Thumbnail 3040

That's the URL where you can find the standard. I answer all the rest of the questions, and it tells me, "This is how I understood all your requirements. Shall I continue with this understanding?" That's what is happening right now. It says this is the reference it's going to follow, the standard HTTP QUERY document, and we are now pointing the AI at the standard instead of describing it. This is what works very well with AI. It's going to locate where the QUERY method fits; it understands what needs to be built. So that's beautiful so far: it has understood the existing code base, and it has understood our requirements.

Thumbnail 3060

Thumbnail 3080

Now it's going to make a plan. Given this context and the complexity of the requirement, what should it do next? Should it do functional design? Should it do user stories? So it goes into something called workflow planning. On the left side you notice workflow planning is happening. At that point it's going to say, let me propose to you: should we do user stories or skip them? Should we do testing or skip it? It does a bit of analysis, and it's going to tell me which stages it should do and which stages it should skip.

Thumbnail 3090

All of these are maintained as context in the same repository, available anytime you want. There's a small technical glitch here, but you can see in the middle panel, for example, functional design: in the construction phase it's skipping it because this is a very technical requirement and you don't have to do user stories. Non-functional requirements it's skipping because they should stay the same: the QUERY method should meet the same non-functional requirements as everything else. So it goes through, skipping a few stages and doing a few. That's the adaptiveness of the workflow. Today, a lot of tools will give you a one-size-fits-all workflow that doesn't work at all. So that's beautiful.

Thumbnail 3130

Thumbnail 3160

Next, it skips the remaining stages and goes into code generation. It gives a plan, telling me exactly where it's going to make all the changes. That's most important, because as Anupam pointed out, if you don't create a plan like this, AI will make a thousand changes and you don't know what's happening. Here it's telling you: this is what I'm thinking I should do, can you verify and confirm? I confirmed this, and right after that, beautiful code comes up in a single iteration. It's working magically well. It's going to follow exactly the same steps.

Thumbnail 3170

Thumbnail 3180

Thumbnail 3190

Then the code comes. There's a little glitch in the recording, but it's okay. You can see it follows the existing framework: it uses the Starlette that is already in the source code, and a good amount of code comes in. You notice FastAPI with Starlette inheritance, and it all works the first time. In the end it presents to me: I did all of that, I completed the code, I ran the tests, we're ready for operations. And there we go. In a matter of a few hours we were able to raise a pull request with the fix and the tests. That's magical. We also have great confidence that the code works, because we have seen, validated, and verified every step of it. Now that's available for you, and you should be able to practice it. Later we will share some of the resources.

Thumbnail 3220

Yeah, that's an amazing demo, Raja, especially in how much can be achieved while understanding everything that is happening. This is just a simple use case, but you can scale it to any complexity. We have used a similar approach on very large code bases, and it works very well. We have converted PHP code to Golang for efficiency purposes. We have seen new applications built. We have seen new features built on existing applications. So we're looking forward to seeing how you use it and what you share with us.

Thumbnail 3250

Thumbnail 3270

Thumbnail 3300

Resources and Next Steps: Learning Materials and AI-Native Builders Community

I'll share some resources on what you can do next. We have a self-help learning resource, a set of videos you can watch to go a bit deeper than we could in the time we have today. Feel free to use it to learn further, and share it with colleagues if you think it's of value. The second thing I want to share is a workshop we have built where you can practice. What is special about this workshop? It gives you access to Kiro, it gives you instructions, and we are beginning to pack in industry use cases. So you can build multiple industry use cases by following the instructions using the AI-DLC methodology, and following the instructions gets you oriented to it. It's available, and you should be able to play with it. The third one is the AI-DLC workflow. This is an open source workflow, so whatever we're showing here is open source. You can go to it, play with it, and feel free to share recommendations on how it can be made better or what you've tried.

We want to make sure as a community we make this better. We want to make sure we all learn from each other and create a great way of working and using AI in a much more predictable way, one that delivers real outcomes.

Thumbnail 3330

This is an alternative to Kiro's spec-driven development approach. AI-DLC doesn't disturb spec-driven development; it takes a completely different path, the AI-DLC path. As we pointed out, if you are collaboratively building software and making hundreds of decisions in a large enterprise context with production-grade applications, the AI-DLC path will take you through that. You can play with Kiro straight away by integrating this into it.

I'll quickly share how we work with companies and what we've seen across these organizations. Generally, our journey starts with a leader saying they feel a lot can be done in this space. They've already tried several things but aren't seeing the outcomes. That executive backing and feeling that this should be tried is crucial. When they try it, they see signature wins where work that would have taken their team a lot of time can be done in much lower time.

This leads to wider trials where they try this across multiple teams. They see that this is something which really delivers. Then we see people customize it. Some teams have created new roles. Some folks in Japan have created an AI-DLC Master as a role, converting the Scrum Master into that. We have also seen some teams create separate teams to spread this, so there's a lot of customization happening.

Some of this customization is happening in terms of the workflow itself. Some teams have created MCP connectors to different tools they have. Looking forward to seeing how you use it and what type of customizations you will do. Ultimately, it leads to better customer experience and as a company you can grow faster because you can react to customer needs faster and respond to business needs faster.

Thumbnail 3430

How do we bring this to customers? We have an internal framework called Unicorn Gym where we go to customers. Unicorns are special employees of Amazon, and gym is where we practice our skills. We modified that framework so we go to our customers and say if you have a problem you're already working on, some of our engineers can work with your team and solve that same problem together to deliver the outcomes we talked about. It becomes much more tangible.

We are happy to work with some of the companies. If you're interested, please feel free to contact us and we'll help you there. Now we know that methodologies like AI-DLC are evolving very fast. There are still a lot of open questions about what roles are merging with what roles, how big should a development team be, and there's a lot of things that we need to still build perspectives on.

Thumbnail 3470

What AWS has done is create something called the AI-Native Builders Community, where thought leaders can come in, bring their lessons, cross-pollinate, and discuss what the AI-native development manifesto should be and what the roles of the future should be. We are going to collect all this knowledge, process it centrally, and release it. It's an open community. If you are a thought leader with a perspective or relevant experience, please write to us. You will see our emails later, we will process that input, and let's work as a group to define the future of development.

Thumbnail 3560

We feel very passionate about this, and hopefully that was visible in our expressions. We feel that the world is going to change and we can play a leading role in helping change the world in a way which is good for folks. We're also thinking about how we work together and what learnings we're having together so we don't work in isolated ways. We bring all these learnings together. Looking forward to great interactions, looking forward to some of you taking these learnings to your companies, and looking forward to some of you sharing your learnings with us. That's how we become better.

Thank you so much for taking this time. I know you had to wait in line, so apologies for that. Looking forward to some awesome work which you do. Thank you so much for coming today.


This article is entirely auto-generated using Amazon Bedrock.
