🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Spec-driven development: Shaping the next generation of AI software (DVT212)
In this video, Jay Raval and Al Harris from the Kiro team present spec-driven development, a structured approach to AI coding that generates three key artifacts: requirements (using EARS syntax), design documents, and implementation tasks. They demonstrate how this methodology addresses the "prompt-and-pray loop" by planning before coding, enabling reproducible results through property-based testing that traces back to specific requirements. The session includes live demos showing how Kiro implements features like authentication using MCP servers and Atlassian Jira integration, plus real examples from the Kiro team's own development including agent notifications and remote MCP support. The approach emphasizes human-in-the-loop collaboration, committing specs to version control, and achieving correctness through formal verification rather than relying on LLM decisions alone.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: The Problem with Prompt-and-Pray Development
Good afternoon, everyone. I hope you're having a great re:Invent so far. Welcome to today's breakout session on spec-driven development: shaping the next generation of AI software. I'm Jay Raval, a senior solutions architect with the Kiro team, and I'm joined by Al Harris, principal engineer and one of the founding engineers for Kiro. We're excited to talk to you about spec-driven development today.
Before we dive right in, let's get a sense of the room. How many of you have heard about spec-driven development, are familiar with it, or have used it in some way? I'd say maybe 25 to one-third of you. That's awesome to see. And the next follow-up question: how many of you have heard about Kiro or have had a chance to use it before today? I'd say about half. Awesome.
Whether you've raised your hand or not, don't worry. At the end of the next hour, we're hoping you can take away a lot of good insights on what spec-driven development is and how it can be used to improve your developer productivity end-to-end in the software development life cycle. You'll also get a mental model of how to work better with Kiro's agent to get predictable results in a more structured way for coding.
That's what we aim for. Let's take a look at the agenda for the next hour or so. It's jam-packed with a ton of things to talk about and showcase to you. We'll discuss what spec-driven development is, why specs make a difference to your software development life cycle, and how it works—the set of artifacts it generates end-to-end. With Al here, you'll get firsthand experience of how the Kiro engineering team works with Kiro to build and ship features. We'll do a few case studies of features that have been shipped, and finally, we'll do a live demo of how spec-driven development can be leveraged with MCP to ship a feature from scratch.
You've all been there: prompt, prompt, prompt, and voilà, you have a working application. It's fun and feels like magic. But it takes much more to get to production. There are several unanswered questions when you're working in the vibe coding paradigm. What assumptions does the model make while generating code? Were the requirements completely ironed out, or were they fuzzy? What design decisions were made, and where do those documents live? Are they easily accessible? Which implementation approach was ultimately decided on and implemented? How do you review those changes once they're done? And once you find a bug in the feature you've built, how do you go back and iterate on it again and again?
There are a lot of questions out there, and this is what we call the prompt-and-pray loop. You come up with a need—say you want to add authentication to your existing web app—and then you get a very enthusiastic response from the AI agent saying you're absolutely right. Eventually, you realize that the generated code is not easy to understand, a lot of edge cases have been missed, and the approach taken is wrong. What do you do at that point? You basically go back to the drawing board, say, "Let's undo this, let's take a new approach," and start building the feature all over again. This is the exact problem we've been describing, and in the upcoming slides we really want you to see how spec-driven development can help you avoid it.
What is Spec-Driven Development? Requirements as the Foundation
A quick introduction to what spec-driven development is: I'd say it consists of three main things. It's a set of artifacts, a structured workflow, and you get reproducible results out of it. Our mission and vision is to bring more structure to AI coding so that you can get from prototype to production in no time. The set of artifacts involves three main things, as you see right here.
As you can see on the slide, the first part is the requirements. In the software development life cycle, the entry point where you start building a feature is typically when a product manager reaches out with requirements. These requirements could be stored in an external data source such as Jira, Linear, Asana, and so on. That's where everything starts. Kiro comes in and helps you work through the entire requirements before you even write a single piece of code.
The second part is the design. Once you come up with the requirements needed to build the feature, you need to determine the technical implementation and what tech stack to use. Are you building a greenfield application from scratch, a brownfield application where you're iterating on your existing legacy codebase, or are you trying to refactor? The design phase consists of all the technical implementations required and the decisions that have been made based on the rationale that the agent shares with you.
Once you've reached a conclusion about the design you want to go ahead with and the decisions you've made to reach that conclusion, you then get the flow of spec-driven development with task implementation. We didn't want to take away the flow of writing code. We wanted spec-driven development to be an overlap of the flow of writing code but with the clarity and structure of specifications. You get that with tasks. Tasks are discrete, granular, fine-tuned prompts that the agent will then start to implement based on whatever design decisions or technical implementations you've made. That's basically what the set of artifacts is.
We'll quickly take a look at the demo here for the first step in spec-driven development. Let's say you come up here—and this is Kiro, if someone hasn't looked at it before. You'll come into the chat. There are two modes, basically: a vibe mode and a spec mode. In this case, you'll start off with a prompt in the chat in spec mode. It says: let's build a CLI which can be used to track meals and meal times. It will support basic CRUD operations as well as a report operation which gives a pretty weekly report of meals on the command line.
Once you enter this prompt, the agent determines that you're trying to build a feature, and the first step is building the requirements. Right now it's in the process of creating the requirements, and once they're created, we'll walk you through what they consist of. Mainly, there are different user stories describing what kind of application you're building and who your target audience is. Each user story is tied to acceptance criteria, which will be very helpful toward the last part of the process, the tasks. You really want the agent to ensure that the code it generates is relevant to the requirements. First, relevancy; and second, you'll be able to validate at the end that the code the agent generates actually satisfies your requirements.
Here you see a user story: "As a user, I want to add meal entries to track what I eat so that I can maintain a record of my eating habits." Then you'll see the acceptance criteria tied to it. We'll take a look at it in depth when we go through the demo. One thing I'd like to mention while we're at the requirements phase is that Kiro uses an industry standard for defining these requirements. The acronym is EARS, which stands for Easy Approach to Requirements Syntax. We'll talk a little more about why that's important later on. It might look like a random decision early on, but we cared a lot about bringing structured natural language into your requirements. We're just touching on that now, but I'm excited to talk about it.
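To make this concrete, here is a hypothetical sketch of how acceptance criteria for that user story might read in EARS-style structured natural language. The exact wording Kiro generates may differ; the point is the WHEN/IF/WHILE … SHALL patterns that EARS prescribes:

```markdown
**User Story:** As a user, I want to add meal entries to track what I eat,
so that I can maintain a record of my eating habits.

**Acceptance Criteria (EARS):**
1. WHEN the user runs the add command with a meal name and time,
   THE system SHALL persist a new meal entry.
2. IF the meal time is missing or malformed,
   THEN THE system SHALL reject the entry and display a usage message.
3. WHILE generating the weekly report, THE system SHALL include every
   entry recorded in the previous seven days.
```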
The Design Phase: Making Technical Decisions with Rationale
The next part in the spec process, once you figure out what your requirements are, is the design phase. Before I get into any details, I'll just show you what the process looks like in the IDE.
Once you start after the requirements are done, we want the agent to work with you as a collaborative tool. We don't really want the agent to build stuff straight away without having the human in the loop. We always want the agent to work with the developer or the end user to get feedback and iterate based on that throughout the spec process. Once you're building with the requirements, the agent will come back to you saying, "Hey, we've now built the requirements. Do they look good? Do they need to be refined further based on the input that you give?" And then once you're good to proceed, you'll move to the design phase.
So here, once you select that, the agent will now start to generate the second file in the spec process. Both of these files—the requirements and the design files—are in markdown format. So here you can see it's come up with the architecture required to build this command line tool. Once it does that, it also outlines all the commands that you need to run and what underlying architecture is required, so that the agent is aware of all the decisions it needs to make while implementing it in code. All of this is done before even a single line of code is written.
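As a rough illustration (not Kiro's exact output), a design document for the meal-tracker CLI might be structured along these lines, with the architecture, commands, and decision rationale all captured before any code exists:

```markdown
# Design: Meal Tracker CLI

## Architecture
CLI layer (argument parsing) → MealService (business logic) → Storage (local file).

## Commands
- `meals add <name> --time <HH:MM>`
- `meals list` / `meals update <id>` / `meals delete <id>`
- `meals report --weekly`

## Technical Decisions
- Local JSON storage over a database: no external dependencies for a
  single-user command line tool.

## Testing Strategy
- Unit tests per service method; property tests for report aggregation.
```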
One thing that's really important here and not really demonstrated in this video that we're showing is the fact that at any point in time the agent can ask you clarifying questions because a lot of what we're dealing with here is the resolution of ambiguity. We want to make sure we identify ambiguous requirements. The requirement that's initially given is fairly ambiguous. "I want a CLI app that has CRUD." That could mean a bunch of different things. The agent just made some decisions and went along. So it might ask you questions, but you can also at any point just chat with the agent and say, "Actually, I want to change requirement two to be like this." You can do that via text. You can just type in an update to the requirement and tell the agent "I updated requirement two," or you can say via chat, "I want to make these updates," and so on.
Throughout this process, we do want this to be flexible. We try to stay slightly opinionated with our starting templates, but if, for example, you have concerns in your design document that we do not capture today, I think we can sort of add things like what's the architecture, put a system diagram in, what are security considerations, performance considerations—there are a few of those. But if there's something that matters a lot to your business, this is just natural language, right? You can come in and say, "Oh, I actually want to add this to the document. Let's flesh out these decisions." So our goal really is that this is an interactive process where you're working with the agent throughout.
And to add to that one additional thing: both with the requirements and now the design phase that we saw, both of these are markdown files. Both of these are supposed to be shared between teams. These are artifacts that you'd commit to your Git repositories so that at any point in time, let's say you're reviewing a pull request and then come across something, you can ask, "OK, why was this decision made?" You could directly correlate it to the spec that's already part of your repository. So at that point you have all of the source of truth in a single place, and that way you can go back to the decisions that were made and evaluate as required.
So that's kind of part of what the design phase does. Once you do that, the same feedback loop continues where the agent comes back to you and says, "Hey, I've built this design. What do you think of it? Does it require any further refinements? Do we need to take a different approach?" One of my favorite features in the design phase is when it comes up with a section that says, "Here are some technical decisions I've made," but the best part is it comes with rationale. It comes with points to back that up, so there is data for you to say, "OK, this is why this approach is better than that one." So that's really important when you're building features from scratch using the spec process.
Task Implementation: From Specifications to Discrete Code Changes
Once you do that, you then move on to the final phase, where the agent creates these discrete implementation-based tasks for you. I'll quickly show you how that looks. Once the design is created, you move to the implementation phase, and the agent starts to create the third and final file for the spec process, which is the tasks.md file. Once the task file is created, you'll see that each of these tasks—in this case there are five primary root tasks—has detailed instructions for the agent to follow and a specific reason for the code it builds.
Let's focus on the last task: integrate components and finalize the CLI functionality. This task involves wiring together all the different components—the CLI, the services, and the storage layers—along with the requirements included in the specification. You'll see the task references requirements 1.1, 1.2, and 1.4, which means it's referencing whatever is in your requirements file and tying it to the task that is actually created.
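Illustratively, that last task might appear in the tasks.md file something like this (the exact checkbox and reference formatting is an assumption, but the key point is the explicit trace back to requirements 1.1, 1.2, and 1.4):

```markdown
- [ ] 5. Integrate components and finalize CLI functionality
  - Wire the CLI layer to the MealService and the storage layer
  - Verify end-to-end flows for the add, list, and report commands
  - _Requirements: 1.1, 1.2, 1.4_
```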
To get started, you can run the task from the file. You'll see text hovering over the root task saying "Start task" to begin the implementation of that task. That's one way to do it. Alternatively, you could interact with the agent via the chat experience and simply say "run this particular task" or "go ahead and run all tasks for me." This is really helpful for getting started.
Once you implement a specific task, the system shows you a couple of different options. One option is to view the execution, which allows you to see the trace of all the changes that the Kiro agent makes for that particular task. At any point in time, if you need to go back and revisit the specification, even locally, you can go back and say "show me the changes that were made for this specific task" or revert a task using checkpointing. This is super helpful.
The feedback loop continues even with tasks. The final thing I'll cover here is a feature that was shipped a couple of months ago, which we call spec MVP mode. If you're in a prototype phase or running a pilot and all you care about is getting an MVP up and running for your stakeholders or your business, this mode helps you get there faster. During spec creation, in the design phase, the system takes into consideration all the testing strategies that are required to build the feature in your project.
If you want to move fast and just build a demo or prototype, and then care about implementing the testing strategies later on, you have the option of going with the spec MVP mode. All it does is create the necessary tasks that are required even for the testing strategy, but it marks them as not required, so those tasks will be marked as optional. As a developer, you still have the option of going back and implementing those tasks once the MVP is up and running. You still have all the control that you require as part of that spec process.
Structured Workflow and the Promise of Reproducible Results
So the first part of the spec process is the set of artifacts it generates: the requirements file, the design file, and finally the tasks file. Why is this relevant now? At the beginning of the session, I mentioned that we want to bring structure to AI coding. One of the core reasons is to enable developers, teams, and organizations to plan first and not get into that prompt-and-pray loop. This is exactly related to that. Plan first, then ship and build the right thing the very first time, rather than having iterations upon iterations for the feature that you're building.
This is what the structured workflow looks like. You start by creating a spec for the feature that you're building. You then define requirements for your feature, and the feedback loop continues where the agent works with you to implement any required changes. Then it does the same process for the design phase, the task phase, and eventually once you're happy with everything, you then start writing the first line of code. At that point in time, you haven't actually interacted with the code at all. All you've been doing is working with the agent, helping the agent make the right decision based on your requirements of the project and the technical implementation that you've gone ahead with.
The next key aspect of spec-driven development is reproducible results. This is something I'm very excited to talk about, and it's some of the work we're really looking forward to in 2026 as well. The promise of spec-driven development is that the cost of vibe code is quite low from a time investment perspective.
It takes seconds to create a prompt, send it to an agent, and get that fired off. Spec-driven development asks for an increased investment up front: the time you spend getting your spec right, thinking about requirements, iterating on them, reviewing your acceptance criteria, and looking at a design. We'll talk a little more later in the session about how the Kiro team, for example, uses specs almost as a replacement for the design review process we used to go through. We really want to make sure that the more time you invest up front, the better the results you get at the other end. So our focus really is on correctness, reproducibility, and the results that you're going to get.
So what does this mean? It means a lot of things actually. It means that we can help you remove ambiguity from requirements by looking for requirements, for example, that are in conflict with each other or requirements that are overly ambiguous and cannot be resolved by the agent. We really don't want any coin tosses at run time. We want to make sure that what you think is going to happen is what is actually going to happen from the agent.
It also means to us that critical decisions that will impact the system, like the functional system you ship at the end of the day, key interfaces, external APIs, performance considerations, and correctness concerns are really documented up front and we're not leaving these things to chance down the road. Whether or not my database write is strongly or weakly consistent shouldn't be a decision that an LLM makes for me in three hours' time. Those are really important.
And then finally, breaking work into bite-sized chunks. This is just mostly a function of the quality of LLMs today, but we want to make sure that when the agent is working on implementation, it's able to work on an atomic piece of work that can be independently verified and independently reviewed by a teammate. These are just best practices for development at the end of the day.
Property-Based Testing: Ensuring Requirements Are Met
From this perspective—in this video I just say "do the tasks." We've now effectively produced this set of tasks in the task list here, and the agent is going to start churning through them one at a time. Depending on my steering files, it might create commits along the way and sort of checkpoint its work with git commits, which is something I like but not something you necessarily have to do.
Yeah, so this is where we were as of three weeks ago. Two weeks ago, Kiro 1.0 became generally available, which for the purposes of this room means property-based testing now exists. So we're leaning really hard into this reproducible results paradigm. There are a ton of standards that exist today, things like GraphQL, Swagger, and full-blown IDLs like Smithy. Again, if you're somewhere like re:Invent, you've probably heard of those. There are also very platform-specific or application-specific standards like CloudFormation and CDK. We don't want to propose yet another standard. We want to be able to use these, bind them together, and use existing industry best practices to bring you the reproducible results that are so important to you as a software developer.
And so this is where we're introducing properties now. Property-based testing is not new. It's in fact fairly old. This builds on a lot of the work from folks like John Hughes in Haskell and QuickCheck, and there are libraries in your favorite language to do property testing. But most teams don't do property testing. It can be kind of painful to set up, but the benefits for teams who use these techniques are profound.
So here I've taken a random requirement pulled from the Kiro blog. This was for a traffic control system, and one of the acceptance criteria of the requirements is that while the traffic control system is operational, the control module shall maintain that at most one direction is green. It's a fairly simple statement. It's saying that at any point in time I've got an N-way traffic light and I do not want more than one green light at a time. This is perhaps overly simplistic because maybe I want two left turners to be green at the same time, but for now we're saying I don't want to think about that. I want one direction to be green because then I know I have no crashes.
So what Kiro will do for you as of our V0.6.0 release, which was our GA version, is produce properties as part of the design. Kiro will first spit out the design document we showed a sample of earlier, but it will then also reflect on the design and the requirements and extract properties of the system you want. In this case, a property is one of several types of invariants; the property extracted in this example is a safety invariant. It takes the structured natural language we have from the requirements—we're able to parse that out with a combination of LLMs and standard neurosymbolic processing. The safety invariant says: at most one green signal, for any sequence of operations—state transitions across directions, emergency modes, and so on. Any of these things can happen, and I'm still guaranteed that at no point in time did I have more than one green signal.
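As an illustration (the wording and heading style here are assumptions, not Kiro's literal output), such an extracted property might be recorded in the design document like this:

```markdown
### Property P1: Mutual exclusion of green signals (safety invariant)
*Derived from Requirement 2.3*

For any sequence of operations — direction switches, emergency-mode
transitions — at every intermediate state:

    count(green signals) <= 1
```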
This is similar to what Jay was mentioning earlier—tasks are tied back to requirements. This property is tied back to a requirement. So if this property can be verified in the implementation, then we have high confidence that requirement 2.3 is met in your system. This is really critical to ensuring that you have reproducibility because at the end of the day, I don't care about the implementation if I know my requirement is met via some test mechanism.
How Property Tests Work: From Manual Cases to Comprehensive Coverage
Before you freak out, I'm going to flip to the next slide. There's a bunch of code on there, but don't worry about reading it. This is effectively just a sample property test. It's fairly lengthy, but this is the test for what we just talked about. If you look at the input at the very top, we have input on the test which takes in this list of timing config. Don't worry about that—we'll talk about that in a minute. And then the operations, which is a set of sequences like switch to east, switch to north, go into emergency mode, come out of emergency mode, whatever.
It then stimulates that control module we talked about, sending each of these operations in. It takes a look at the green status and then it asserts that at no point in time over the sequence of operations were there more than one green. It's fairly simple—it's how you'd write a unit test for this with the exception that we have a flexible input, a set of input parameters. So let's talk about how you would test this sort of classically, right? Let's say I'm a developer and I want to test my control module. I want to test that if I'm north, south, east, west, or northeast, southwest, that at all points in time I do not have more than one green.
Well, that's a straightforward test to write. I create a new control module, I create my list of operations—transition east, south, west, north—I pass it in and I do an assertion on the outputs in all those states. Great, I did it. My PM is happy, I'm happy, and we believe we're safe as long as that sequence of transitions happens. But I should probably make sure that if there are duplicate state transitions, because some external system is stimulating the control module, it still does the right thing. Well, that's easy. I'll add another test. I'll copy paste it and add another test. What do we do? South-south in this case, and still validate that I don't get more than one green when I do south-south, so north doesn't turn green or whatever. Great.
Oh well, what if the system goes into emergency, right? We've all done this, and you get long, long laundry lists of test cases to cover every corner case you've thought of so far, but not the ones you didn't think of. So enter property tests. With property testing, I as a developer am no longer the human in the loop. I use a property testing framework. I think the sample code above uses Hypothesis, which is a Python library. In Node you would use fast-check, Haskell's got QuickCheck—choose your favorite library. But effectively, the way these libraries work, and the way property tests in general work, is that we explore the full space of possible inputs, usually via fuzzing. You'll send dozens, hundreds, thousands, tens of thousands of sample inputs that can be generatively produced locally. These tests can sometimes be slow, but they give you confidence that if all these random permutations of inputs are exercised and you've produced no counterexamples showing that your system is not working, then you're in good shape.
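To show the shape of this technique without depending on a particular framework, here is a minimal self-contained Python sketch. The `ControlModule` is a toy stand-in (its names and behavior are assumptions, not the code from the demo), and plain `random` fuzzing stands in for a real property-testing library; a real setup would use Hypothesis's `@given` and get shrinking for free. The key idea is the same: generate many random operation sequences and assert the "at most one green" invariant after every step.

```python
import random


class ControlModule:
    """Toy N-way traffic controller: at most one direction may be green."""

    def __init__(self, directions):
        self.directions = list(directions)
        self.green = None        # currently green direction, or None
        self.emergency = False

    def switch_to(self, direction):
        if self.emergency:
            return               # all signals stay red in emergency mode
        if direction in self.directions:
            self.green = direction

    def set_emergency(self, on):
        self.emergency = on
        if on:
            self.green = None    # everything goes red

    def green_signals(self):
        return [] if self.green is None else [self.green]


def check_safety_invariant(trials=500, max_ops=50, seed=42):
    """Property: for ANY operation sequence, never more than one green.

    (This encodes the talk's requirement 2.3 as a fuzzed safety check.)
    """
    rng = random.Random(seed)
    directions = ["north", "south", "east", "west"]
    for _ in range(trials):
        module = ControlModule(directions)
        for _ in range(rng.randrange(1, max_ops)):
            if rng.random() < 0.2:
                module.set_emergency(rng.random() < 0.5)
            else:
                module.switch_to(rng.choice(directions))
            # the invariant must hold after every state transition
            assert len(module.green_signals()) <= 1, "safety invariant violated"
    return True
```

Note that the single check replaces the hand-written south-south, west-west, and emergency-mode cases described above: any sequence the fuzzer can generate is covered, and a failing sequence would surface as an assertion error.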
In this case—I'll actually just go back briefly. In this case, I don't care that I effectively wrote one test, or rather that Kiro wrote one test; my system is going to make sure I don't need to think about all the possible inputs. The property will always be checked, and I will be notified if it ever fails. Why is this important? Well, I can run comprehensive tests without having to think too hard, and I have traceability that goes from this particular property test all the way back to a concrete requirement I had originally in the system.
And third, if there is a failure, most of these testing libraries will do something called shrinking where maybe it found some crazy 3000 step sequence of events that produced a failing state. It will actually continue to explore the space and find the minimum reproducible result there. So maybe I do have a bug somewhere in my control module and it happens not on south-south but only on west-west. Ideally you could get a very simple test that says, "Hey, here's a counterexample that proves that your code doesn't work in this scenario. Go fix the bug." This is super powerful and this is something we've leaned into heavily as a team. In fact, property-based tests have already found, I think, three key bugs in the Kiro codebase since we started using them in earnest about two to three weeks ago.
So highly recommended—give them a shot and let us know what you think. This is step one though. We have a lot more work in the pipeline coming down over the next year basically for how we improve reproducibility of the system and help you resolve ambiguity—those tent poles I mentioned earlier. This is something we're super excited about, and I could talk at length about it. So let's talk a little bit about why specs, right? Why do we want to use specs? Why do we think specs are valuable? This is a bit of a rehash of what we just talked about earlier.
Best Practices from Successful Spec Users: Context and Refinement
But in the software development life cycle, we think vibe coding sort of covers the implementation phase. It's spreading out a little bit—you can use it a bit more, or it can do a few more things now that you have MCP and some additional tools at your disposal. But effectively, vibe coding is really this inner-loop thing. We want to expand that inner loop: I want to be able to move very quickly for planning and design, and I want to be able to move quickly for implementation, testing, and so on. With the introduction of PBTs, or property-based tests, we believe we have made meaningful progress toward tackling the testing and QA part of the SDLC.
There are a couple of things you can do today. We do spec generation, which we walked through. I'll brief through that, but: provide as much context up front as you can—this is really going to help the system. Take in what it produces, distill your requirements, and give it high-quality feedback, and the system is going to work much better. I'll dive into some of the tips from very successful spec users. Yeah, that's a good one. I think for people who now use these different agent tools for building code and software, there are two things these users really have control over. The first is the prompt. The second is context.
The more upfront context you can provide the agent, the better the decisions it comes up with, the less code hallucination you get, and the better it adheres to the standards, patterns, or paradigms you've already built in your existing codebase. Looking at the first habit of highly successful spec users: external context. This could mean using Kiro's own steering feature, which lets you set up persistent knowledge for the agent to base its decisions on for the codebase you're working with. The first time you come into a project, you can create three foundational files as part of the steering setup. They mainly cover the tech stack of your project and its structure, which helps the agent identify where each piece of functionality lives and find code relevant to the feature you're building or to the existing codebase.
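As a hypothetical example (file name and contents are assumptions; the talk only says steering covers tech stack and structure), a tech-stack steering file for the meal-tracker project might look something like this:

```markdown
# tech.md — steering file: tech stack

## Stack
- Python 3.12; Click for the CLI; pytest plus a property-testing library

## Conventions
- All persistence goes through the Storage interface; no direct file I/O
  anywhere else in the codebase.
- Every new feature gets a spec (requirements, design, tasks) committed
  to version control before implementation begins.
```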
Those are some aspects covered by steering, which also includes a product file: what kind of data flow your application adheres to, who your target end user is, and how the different components talk to each other. An additional part of this is MCP, so you can bring additional context from external data sources and knowledge bases right into the IDE. There are also built-in context providers, so you can use hashtag file references to specifically ask the agent to review a file while making a change or executing something you're trying to do. Finally, there's a good list of docs embedded into Kiro that you can leverage while making an implementation decision. You can ask Kiro, "Hey, reference these docs and then come up with a technical implementation for me to review."
Then there's evolving specs. You can work through this directly during the chat experience, or, the best part, you can use natural language at any phase of the spec. Be it requirements, design, or tasks, at any point in any of these markdown files you can write natural language, and there's an option to refine the task or refine the spec; it will automatically adhere to the standards we've mentioned, like the EARS syntax. Once you update the requirements file, it will not only update the requirements but also determine whether it needs to update any of the design decisions we've made or any of the tasks that need to change. That's a very powerful feature to have.
I would actually add something to that. When I think of evolving a spec over time, we have an example I'll come to a little later in the talk. A spec is not a closed book once you've executed the tasks. It lives in your codebase. You can iterate on it. You can change your requirements, and on the Kiro team, at least, we've found that to be super powerful.
Specifically, we can reopen a decision we made two months ago and say we actually want to change our mind. We don't like that requirement anymore. We don't think it's serving users. We're going to change the requirement, re-synthesize the design based on this changed requirement, and add new tasks. Now I have a git commit that actually documents that change. It's effectively an ADR (architecture decision record), for those of you who have used those.
This basically takes away the pain of creating a new spec altogether. That leads right into committing specs: putting your specs in the codebase and committing them with git or whatever your version control system of choice is. We put all of our specs in the codebase. We've actually started archiving some of the older ones just to save space and reduce context size, but anything in development or in progress for a feature we leave in the codebase in the .github/specs directory. We commit them on initial review, but then as tasks are executed, we're able to say: yes, this is task 1.2, and here's the code and test that go along with it.
You could use the chat experience by default to chat with any phase of the spec. We have a dedicated spec context provider as well, so if your project consists of multiple specs, you could pinpoint and use the exact spec that you'd like to iterate on. That's a functional and useful feature to have while building with specs.
I think that comes back down to refinement basically. You're not a passive user of Kiro. You're not a passive passenger on the spec experience. You're partnering with the agent to deliver your spec, to deliver your software solution, to do whatever you're trying to achieve. At any point in time you can ask Kiro to change what it's doing in natural language. The expectation on my end at least is that we've given some starting points and a rough workflow we want you to run through, but you can stop and say actually before I finish up the design, I want to go and do research on all these things and I want to produce research docs. Depending on the task you're trying to achieve, putting that work in up front can have a pretty profound impact on the quality of the resulting output.
At any point in time you can go through and refine this. You can change your requirements, change your design. Finally, for task execution, just click go. You can say run all tasks and I want these to all be run. I prefer to say run all tasks assuming I have the context length for it. I find that the quality is actually better when I have some carryover from task to task. These are tricks we ideally don't want you to have to think about. We want this to be as basically fire and forget as possible.
Real-World Examples: How the Kiro Team Ships Features with Specs
Let's dive now into some exciting insights about how the Kiro team ends up using the specs to build features on Kiro. I've chosen three arbitrary things. These are just three features we've shipped in the last couple of months that we use specs almost exclusively for. The first one is fun because we were approaching our public preview launch date. Everyone's heads down, everyone who's been laser focused on a launch probably knows those last few days and weeks you're really just trying to polish off the rough edges. You don't have time for new features. You don't have time for the nice-to-haves. It's really just the must-haves.
We had somebody sit down on the team, a developer who said, "Hey, I've gotten a lot of feedback from people. They want agent notifications. They want some sort of a pop-up on their desktop if Kiro is waiting for them. Let's say it wants approval to run a shell command or something like that." Kiro is a fork of Code OSS, which is a 15-year-old code base. It's very well organized, but you need to understand the organization. At this time nobody really understood how notifications work because we're building this agent thing over here.
He said, "OK, Kiro, just go figure it out. I want to put native notifications. I want to use the underlying Electron notifications API. I don't know the right way. I don't know the Code OSS way to plumb this through, but go figure it out for me." He produced two specs, one in our extension space and one in the platform space to do this. We reviewed it and said it looks good. Let's do it and just see how it works. We were able to ship this in I think 48 hours from deciding we should ship this thing to it being shipped.
This was an area where we had not a lot of experience. We didn't have a ton of time to become deeply knowledgeable about how all the message passing in the system worked, but we were able to ship this quickly and easily for customers.
We recently added some new native notification configuration, and this is another example where we were able to reopen that spec and say we're changing the requirement. We used to just have notifications for action required, agent execution succeeded, and agent failed, but we actually want to change the user space config and add configurations. For example, if you're running and coming up to your credit usage cap, we want to notify you in advance of that so if you want to change behavior or focus on critical work, you can do that. That's another one where we just changed the spec as we went.
Another one is remote MCP support. MCP support is basically essential in any agentic tool these days. We love MCP and build new MCP servers all the time on the team for little tools we need. Remote MCP was something we needed to ship very quickly. We wanted to ship it well and make sure we were aligned on the behavior. So in this case, we actually sat down and did a full design review, where the engineer working on the feature pointed to the MCP spec to understand how remote MCP works, how the OAuth protocol with DCR (Dynamic Client Registration) works, and all the details we needed to understand. We pulled the documentation up, reviewed it as a team, and had a conversation about it. We effectively had Kiro in the room, because somebody was sitting there on their laptop typing, so we were able to provide feedback to the agent very quickly on the design. We had a real-time review with Kiro on the design, and then we went through to synthesize and ship it out.
The last is dev server support. Because we're doing a lot of things quickly, a lot of what I do is look at tools I like and say: this is great, let's figure out how to maintain the ergonomics of this thing. We looked at an MCP server that provided long-running dev tool support and said: we love this, let's start with it, because we can ship that in about a week, something very quick. A developer took a look at it, pointed Kiro at it via the fetch MCP, and said: go look at how this thing works. We didn't copy the implementation, but we did take the API, because it was a very ergonomic API for the agent, and went to implement it. In about a week of testing and tinkering, we got it integrated with our own native terminal system. That's one where we really just sat down and did the requirements review as a team, and the design and implementation just raced ahead. This was one of the really nice features we finally got for long-running dev servers.
I think the main feedback we've received from developers and the community is that long-running processes were a pain point for a lot of them. For example, if you're running a dev server, when the agent hands off the command to the terminal, the output wasn't streamed back into the chat. So the developer wasn't aware of the response, whether it succeeded, whether it failed, or if there were any errors during runtime. Now with this feature, I can ask Kiro to run a dev server and then just forget about it. It runs in the background, and at any point, if it runs into errors, it lets me know, so that I'm not distracted or interrupted during development. This is really a good feature.
All of this is to say that the Kiro team is evangelizing this spec-native way of developing, where you review your specs, commit your specs, and talk about the specs as a team, and you'll have a great time. We have Jay with a live demo. This is super cool. I don't know how he found the time to build this, but he's going to share some really cool stuff he's been doing with specs and dev servers.
Live Demo: Building Authentication for a Bike Sharing App from Jira to Production
This is a bike sharing application which I built using Kiro. What we want to do here is showcase how you can go from an idea to a fully functional feature using specs. Just to give you a lay of the land, this is a Next.js application built with Tailwind. Right now it's completely front end. I do not have any back end components to it right now. One of the things that I want to add to this bike sharing app is that right now there are multiple cities that you can select when working with this application. I've intentionally tailored it for Vegas. The current location here is the MGM Grand where we are for the session.
Let's say you wanted to navigate to a different location and use a particular bike type to get there. You have a few options to get there. There's the standard bike type, electric, mountain, road, and hybrid, and then you could also select a filter for the price range that you'd like to use for the rental.
Once you're ready with the price range you'd like to use for the rental, you can select the option to find nearest. If you don't want to make any decisions and just want to get from point A to point B without worrying about what bike type it is, you could come in and select find nearest. This is the closest bike to me, and I'd like to rent this. You'll come here and select rent. Let's say you're going to use it just for 30 minutes. Once you do that, you'll select confirm rental and the rental is activated.
Once you have activated the rental, let's say you get to your destination. You come in here and return the bike to one of the locations that are included. You select the location—let's say I'm going to do Bellagio—and say return bike. That's how the rental ends when you select the return location.
To give you an idea of what the functionality of the app looks like right now, it uses Apple's MapKit framework to render the maps along with the markers for different bike locations. If I zoom out, you'll see there are 50 bikes in total in the city. Now what I want to do is add a feature to this application. Right now, this application is unauthenticated. I want guest users to get a view of this map to see what bike availability looks like. Then, once you're authenticated, you get the ability to rent a bike. Right now, if you look at the top right here, I don't have any sort of authentication involved. So what I'm going to do is go back into the IDE now and showcase that.
This is the IDE experience. I'm going to go back into the spec mode. For demo purposes, I have simulated an example where I'm using Atlassian's Jira product, which is my task tracker of choice. Here you'll see there are a lot of tasks which are in progress or to-dos, and then a lot of them which are done. Once I come in here as a developer with a PM persona, the PM comes into the Atlassian board and says, "Add authentication with Amplify."
This task involves some rough requirements: email and password sign-up and sign-in. The application should show the username in the header when logged in. Only logged-in users can rent bikes; that was one of the main requirements. Here's the acceptance criteria: a user can create an account with email, a user can sign in and sign out, the rental button is disabled for guests, and the user stays logged in across multiple sessions.
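Acceptance criteria like these can be encoded as tiny, testable predicates once the spec is implemented. Here is a hypothetical TypeScript sketch; the session model and function names are illustrative, not from the demo app's actual code:

```typescript
// Hypothetical session model; field names are illustrative.
interface Session {
  signedIn: boolean;
  email?: string;
}

// Acceptance criterion: "the rental button is disabled for guests".
function canRent(session: Session): boolean {
  return session.signedIn;
}

// Acceptance criterion: "show the username in the header when logged in".
function headerLabel(session: Session): string {
  return session.signedIn && session.email ? session.email : "Sign in";
}
```

Encoding criteria this way keeps each one traceable back to the Jira ticket and makes them trivially unit-testable.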
This is what the requirements look like, and right now they don't go all the way in depth. These requirements are a bit fuzzy and can be iterated on over time. There's some ambiguity in the requirements, and that's intentional. What I'll do is use the Atlassian MCP server, which I've already configured for the app. I've put in a prompt: pull the requirements from this Jira ticket for authentication and create a spec for implementing authentication, then use the docs server to understand how to build the Amplify Gen 2 authentication feature.
Once I do that, it says it's appending all the steering documents that are part of the project, which gives it even more refined details about the tech stack I'm working with: the directory structure of my application, what libraries I'm using, and how to test or validate builds. All of this information the agent gets from the steering files. Right now it's making tool calls via the MCP server to fetch the exact requirements for this task.
As you see, it's now looking into the documentation, but first it was able to fetch the issue at hand. It was able to fetch the details for that. The next thing it does is go through the documentation for AWS Amplify. Once it does that, now it's ready to build the specs. It says, "Let me create the requirements document with all the research that I've done." Right now it's creating the requirements document.
Right now it's in the phase where it's creating the file. There you go, it's already created. Let me expand this a little bit. It gives you an overview of what your project is, what you're trying to build, and what the different terms are for you to understand as part of this feature implementation. I'll dive right into the user stories. As a new user, I want to create an account with my email address so that I can access the bike rental features. It now expands upon the acceptance criteria, which were not very thoroughly defined in the original story that was defined by a PM in the Atlassian board. That helps the agent make even more granular decisions for specific ambiguities during feature implementation. Let's look at the next story. As a registered user, I want to sign in with my email so that I can access my account and rent bikes. As an authenticated user, I want to sign out of my account as well so that I can secure my session. I'll skip the fourth one and then move to the next. As a product owner, I want only authenticated users to rent bikes so that we can track rentals and ensure accountability. And finally, as a returning user, I want my session to persist when closing the browser. Lastly, as a developer, I want authentication state accessible throughout the application so that components can conditionally render based on user status.
Looking at this file versus the requirements defined in the original Jira task, there's a significant difference in the level of detail included in the spec. All of this now gives control back to the human to review and decide what the feature should look like. Do you want to add anything else before building the feature? It says: let me know if you'd like any changes or if we should proceed to the design phase. I'll say proceed to the design phase. Once you do that, it says the requirements are approved and it will proceed to create the design file, which should initiate shortly. Once that's created, I'll walk you through how the design looks. There's the technical implementation architecture and Mermaid diagrams as part of that, so you get a visual representation of the proposed architecture, followed by the design decisions.
Do you want to talk a little bit about formalizing requirements for PBT? This is effectively part of the new PBT features we added. This is not just generation of property-based tests; it's actually analyzing the requirements, and this is part of the reason EARS syntax is so important. We have parsers on the back end that take this structured natural-language set of requirements. There are seven EARS requirement rule shapes. We parse those out, and then we're able to build effectively a full AST over your requirements. Then we look for requirements that overlap or don't. It looks like good news: you had good requirements, because you didn't get asked any follow-up questions. But if you had requirements that, for example, interacted with each other in a negative way, were incompatible, or were overly ambiguous, the system would have asked you at this point whether this is what you meant. This is fairly new and hot off the presses, so we're still iterating quite aggressively on this part of the DX. It's something we want to make faster as well.
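For reference, the rule shapes mentioned above follow the published EARS (Easy Approach to Requirements Syntax) templates. The common patterns look like this; the concrete example requirement at the end is illustrative, not from the demo spec:

```text
Ubiquitous:        The <system> shall <response>.
Event-driven:      WHEN <trigger>, the <system> shall <response>.
State-driven:      WHILE <state>, the <system> shall <response>.
Unwanted behavior: IF <condition>, THEN the <system> shall <response>.
Optional feature:  WHERE <feature is included>, the <system> shall <response>.
Complex:           WHILE <state>, WHEN <trigger>, the <system> shall <response>.

Illustrative example:
WHEN a user submits valid sign-in credentials,
the system SHALL authenticate the user and display their email in the header.
```

Because every requirement fits one of these templates, a parser can recover the trigger, state, and response clauses mechanically, which is what makes the AST-over-requirements analysis described above possible.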
If we have time at the end, I also want to see one thing I love to do, especially if I'm using something like Jira or Asana: once you've bottomed out on the requirements here, have Kiro actually write the updated requirements back into the ticket. That's effectively synchronizing your data, right? Right. I'll let you carry on.

So now that the design file is created, you actually get an option to render it right here. I'm going to select that so you can read this better, and I'm going to disable chat for a moment. Basically, it gives you a description of what the feature does and what libraries it's using. In this case, with Amplify Gen 2, it uses the Amplify UI React library, and since we're working with Next.js, it fits right into the project. Here are the different components for the front end, the Amplify layer which is the back end, and the browser's local storage. Here's the authentication flow, and here's the entire data flow from the user or client all the way back to AWS services from Amplify and Cognito. Based on this, you can now visualize what the implementation of the architecture would look like.
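For context on the back end the design refers to: an Amplify Gen 2 auth resource is typically defined in a small TypeScript resource file. The following is a minimal sketch, not the demo's actual file; the path follows Amplify Gen 2 conventions and assumes the standard `@aws-amplify/backend` package.

```typescript
// amplify/auth/resource.ts — assumed path per Amplify Gen 2 conventions.
import { defineAuth } from '@aws-amplify/backend';

// Email/password sign-up and sign-in, matching the Jira requirements;
// Amplify provisions the backing Cognito user pool from this definition.
export const auth = defineAuth({
  loginWith: {
    email: true,
  },
});
```

This is the piece that ultimately produces the Cognito user pool appearing at the far end of the data flow diagram described above.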
You can review the components involved, the interfaces for them, the React hooks required to achieve this implementation, and finally the different data models: auth user, auth state. Then there are the correctness properties; the first ones we see here are authentication, sign-out, and password validation feedback. Let's talk about authentication: for any successful authentication operation, the application shall reflect the authenticated user status with a valid user object containing user ID and email. These are a few of the properties defined, and in the interest of time I'm going to quickly move on to the next phase.

We might need to do the cooking-show swap: while this builds, I'm going to go to the end state, which has all the associated tasks implemented, on a different git branch. There's the design file with the error handling and the testing strategy defined, followed by the task list. Here it involved eight tasks, and you'll see some tasks marked as optional and grayed out; these are tied to the PBTs, and I had enabled the spec MVP mode to get a pilot of the feature. Now that all of these tasks are implemented, you can select start task to execute each of them. View changes will show you all the changes associated with that specific discrete task. And finally, you can also view the execution, so you can go back into the chat experience and understand what the entire trace looks like.
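To make the authentication correctness property above concrete, here is a minimal property-based check in TypeScript. Everything in it (the auth-state model, the `signIn` stub, the hand-rolled generator) is a hypothetical sketch, not Kiro's generated test code; a real PBT library such as fast-check would replace the manual loop:

```typescript
// Hypothetical auth-state model; names are illustrative, not from the demo app.
interface AuthUser { userId: string; email: string; }
interface AuthState { isAuthenticated: boolean; user: AuthUser | null; }

// Stub of a successful sign-in; a real implementation would call Amplify/Cognito.
function signIn(email: string): AuthState {
  const userId = "u-" + Math.random().toString(36).slice(2, 10);
  return { isAuthenticated: true, user: { userId, email } };
}

// Tiny random-input generator standing in for a PBT library's "arbitrary".
function randomEmail(): string {
  return Math.random().toString(36).slice(2, 10) + "@example.com";
}

// Property: for ANY successful authentication operation, the application
// shall reflect authenticated status with a valid user object containing
// a user ID and email. This traces back to a specific spec requirement.
function checkAuthProperty(runs: number = 100): boolean {
  for (let i = 0; i < runs; i++) {
    const email = randomEmail();
    const state = signIn(email);
    if (!state.isAuthenticated) return false;
    if (!state.user || state.user.userId.length === 0) return false;
    if (state.user.email !== email) return false;
  }
  return true;
}
```

The point of phrasing the test as a universally quantified property, rather than a single example, is exactly the traceability described earlier: a failing run points at the requirement it violates, not just at one input.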
Now let me go back to the browser real quick with the new implementation. Now when I select a bike, I don't get an option to rent it; it says sign-in required to rent bikes. Let's say I come in here and quickly sign in. Once I do that, it says welcome to the bike share app, and voila, you can go directly in, select, and rent. It has now implemented the authentication part end to end and allows you, as an authenticated registered user, to select and rent bikes. This, in a nutshell, is a very concise demo of what spec-driven development looks like end to end, from scratch, for a feature. You can build on top of it and customize it a lot. In the interest of time, I'll just say: for those who haven't built an auth solution, I just glossed over weeks or months of pain in about 10 minutes, which is absolutely wild.
Closing: Join the Kiro Community and Visit the House of Kiro
That's what we had with respect to the demo. If you haven't had a chance to play with Kiro, we highly recommend going to our website, kiro.dev. It's available across all platforms: Mac, Linux, Windows. Come join our Discord community at discord.gg/kiro.dev; we have 13,000-plus members in there. We also facilitate office hours every other week, so you get a chance to talk to the Kiro service team directly, and partners from different teams within Amazon join the office hours to share their insights. Here's a bit more information about Kiro if you'd like to discover more and learn how you can play around with it. There's a bunch of things happening at re:Invent. Quick thing: who has not yet visited the House of Kiro? Check it out. That's my number one recommendation. It's just opposite the AWS expo in the Venetian. You'd be amazed at how good it is. It's super fun. I know we're just over time. Thank you very much for listening to us. We had so much fun. We'll be around for questions afterwards as well. Thank you.
; This article is entirely auto-generated using Amazon Bedrock.