
Gunnar Grosch for AWS

DEV Track Spotlight: How Amazon Teams Use AI Assistants to Accelerate Development (DEV403)

What if you could go from concept to production-ready code in just seven days, working in an unfamiliar codebase and a language you don't know? That's exactly what James Hood, Principal Software Engineer at AWS, accomplished using AI-assisted development techniques. In this code talk, James and Maish Saidel-Keesing, Senior Developer Advocate at AWS, demonstrated how Amazon teams are using AI coding assistants not for flashy demos, but for real production work that delivers measurable results.

"I am also a former AI skeptic," James admitted at the start. "My previous experience with AI before this year was trying out a bunch of flashy demos on real world code and watching it crash and burn." But that changed in February 2025 when he discovered the right combination of tools, reasoning models, and most importantly, the right techniques.

Watch the Full Session:

From Skeptic to Believer: The Power of Technique

The transformation from AI skeptic to advocate wasn't about better models alone. It was about discovering systematic approaches that work consistently. As Maish explained, Amazon has adopted generative AI tooling "with open arms" and uses it "all day every day, anywhere and everywhere." But the key to success lies in how teams use these tools.

Amazon codifies successful practices into what they call "mechanisms" - tools and processes that work well and can be shared across the company. This aligns with Amazon's leadership principle of "Invent and Simplify," creating a virtuous cycle where one team's innovation becomes everyone's productivity gain.

Prompt Engineering Fundamentals: Zero-Shot, One-Shot, and Few-Shot

Maish walked through the foundational prompt engineering techniques that Amazon teams use daily, demonstrating them live in Kiro CLI:

Zero-Shot Prompting: Simply asking the LLM a question without additional context. This works for simple tasks but often produces generic results that may not match your specific needs or coding style.

One-Shot Prompting: Providing a single example of what you want. When Maish asked for an IAM policy validation function and provided an example of an email validation function in JavaScript, the LLM understood to generate JavaScript code instead of defaulting to Python, and matched the style of the example.

Few-Shot Prompting: Providing multiple examples to give even more context. This technique helps the LLM understand patterns and produce more consistent, relevant results. In the demo, the LLM also dropped the unnecessary example usage code, recognizing from the examples that the user already knew how to call the function.

The best practice: Use zero-shot for simple tasks, but use few-shot prompting for complex tasks that need to follow specific patterns.
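
To make the contrast concrete, here's a minimal sketch of a one-shot prompt in the spirit of Maish's demo; the email-validation snippet and the IAM task wording are illustrative reconstructions, not his exact transcript:

```text
Write a function that validates whether an IAM policy document is
well-formed JSON with the required Version and Statement fields.

Here is an example of the style I want you to follow:

function isValidEmail(email) {
  // Basic shape check: something@something.tld
  const pattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return pattern.test(email);
}
```

Because the single example is JavaScript, the model infers both the target language and the terse, single-purpose style without either being stated explicitly.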

Context and Role Prompting: Guiding the LLM

Beyond shot-based prompting, Amazon teams use two additional techniques to improve results:

Context Prompting: Providing specific information about your environment and requirements. Instead of just asking for an EC2 script, Maish demonstrated adding context: "I manage multiple instances across multiple accounts and here's how we organize our projects." This additional context helps the LLM create scripts that fit your actual workflow.

Role Prompting: Telling the LLM to assume a specific role or expertise. "You are an AWS senior solutions architect who manages infrastructure" - this frames the LLM's responses to match the perspective and knowledge level you need.

The best practice: Combine these techniques for complex tasks. Start with context, define the role, then specify the task and desired output format. Be as specific as possible to get the most benefit.
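
A hedged sketch of what that combination might look like in practice (the account setup and tagging scheme here are invented for illustration):

```text
Role:    You are a senior AWS solutions architect who manages
         infrastructure across multiple accounts.
Context: We run EC2 instances in several accounts; every instance
         carries a "project" tag that maps to an internal project name.
Task:    Write a script that lists all running instances across the
         accounts, grouped by project tag.
Output:  A single script, plus a short note on the IAM permissions
         it needs.
```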

Advanced Reasoning: Self-Consistency and Tree of Thought

For complex architectural decisions and problem-solving, Amazon teams use more sophisticated prompting patterns:

Self-Consistency Prompting: Ask the LLM to generate multiple different independent solutions to the same problem, then select the best one. Maish demonstrated this by asking for three different approaches to designing a multi-region, highly available application, considering factors like cost, operational complexity, and latency. The LLM explored active-active, active-passive, and active with read replicas approaches, then provided a recommendation.

This technique encourages the LLM to explore multiple perspectives, leading to more robust solutions. Use it for complex architecture decisions, difficult troubleshooting, security analysis, and performance optimizations.
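
A self-consistency prompt along the lines Maish demonstrated might read like this (a reconstruction of the pattern, not his exact wording):

```text
Propose three independent designs for a multi-region, highly available
web application. For each design:
  1. Describe the approach (for example, active-active or
     active-passive).
  2. Assess relative cost, operational complexity, and latency.
After presenting all three, compare them and recommend one, explaining
the trade-offs behind your choice.
```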

Tree of Thought: Structure the problem-solving process with explicit branches. Maish demonstrated asking the LLM to consider three possible bottlenecks for a Lambda performance problem (cold starts, memory constraints, external API calls), generate three solutions for each bottleneck, evaluate pros and cons for each solution, and then provide a comprehensive recommendation.

This structured exploration helps the LLM work through complex problems systematically. Use it for problems with multiple variables and for decision-making with numerous interacting factors.
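
Tree of thought makes the branching explicit in the prompt itself. A sketch of the Lambda troubleshooting prompt, reconstructed from the description above:

```text
My Lambda function's latency has degraded. Explore this as a tree:
1. Consider three possible bottlenecks: cold starts, memory
   constraints, and external API calls.
2. For each bottleneck, propose three candidate solutions.
3. For each solution, evaluate the pros and cons.
Then collapse the tree into a single comprehensive recommendation.
```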

Combining Techniques: You can combine self-consistency and tree of thought for even more comprehensive analysis. Maish showed designing a disaster recovery solution using tree of thought to explore three different approaches, with self-consistency analysis for each approach.

The best practice: Be very explicit about how you want the LLM to follow the tree of thought. Provide clear guidance on how to organize the exploration, and the LLM will provide a full, clear picture of the options.

Expanding Tools with Context: Making AI Understand Your Style

One of the most powerful techniques is adding your organization's specific context to the LLM. Maish demonstrated loading Amazon's internal writing style guide (how they write narratives, one-pagers, six-pagers, PR FAQs) into Kiro's context. When he asked it to write a narrative, the LLM knew exactly how to structure the document and what questions to ask to help create a clear, concise, and comprehensive document following Amazon's standards.

This same principle applies to coding standards, architectural patterns, security requirements, or any other organizational knowledge. By loading this context, you make the AI assistant truly understand how your team works.
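
In Kiro, this kind of knowledge typically lives in a steering file. The snippet below is a hypothetical example of what a writing-style rule might contain; the path and every line of content are illustrative, not Amazon's actual guide:

```markdown
<!-- .kiro/steering/writing-style.md (hypothetical) -->
# Writing style

- Narratives MUST be written as prose, not bullet points, and MUST
  fit in six pages or fewer.
- Every document MUST open with the customer problem, not the solution.
- PR FAQs MUST lead with the press release, followed by internal FAQs.
- Before drafting, ask the author clarifying questions one at a time.
```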

Explore, Plan, Code, Commit: A Systematic Workflow

Maish demonstrated a powerful four-step workflow for implementing features:

Explore: Provide the LLM with your existing codebase. Maish showed three JavaScript files implementing an authentication mechanism. The LLM reads and understands the code structure, dependencies, and patterns.

Plan: Ask the LLM to create an implementation plan based on what it learned. Maish requested a password reset feature, and the LLM generated a detailed plan considering security, robustness, and consistency with the existing code style.

Code: Have the LLM generate the actual code following the plan. Because it already understands your codebase and style, the generated code fits naturally into your existing architecture.

Commit: Ask the LLM to create commit messages following your team's conventions, including links to documentation and best practices. You can even have it commit and submit the PR automatically.

This systematic approach ensures consistency and quality while dramatically accelerating development.
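
The four steps translate naturally into a short sequence of prompts. A condensed, illustrative version (the file names are invented):

```text
# Explore
Read src/auth/login.js, src/auth/session.js, and src/auth/tokens.js.
Summarize the authentication flow, its dependencies, and the code
conventions you see.

# Plan
Based on what you learned, write an implementation plan for a password
reset feature. Cover security, error handling, and consistency with
the existing style. Do not write any code yet.

# Code
Implement the plan one step at a time, matching the existing style.

# Commit
Write a commit message following our team conventions, link the
relevant documentation, then commit and open the PR.
```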

Agent SOPs: The Game-Changing Innovation

James Hood introduced the most significant innovation from Amazon's internal AI adoption: Agent SOPs (Standard Operating Procedures). These are structured markdown documents that guide AI agents through complex tasks with remarkable consistency.

"I am also a former AI skeptic," James explained. "My previous experience with AI before this year was trying out a bunch of flashy demos on real world code and watching it crash and burn." But Agent SOPs changed everything.

What Are Agent SOPs?

An Agent SOP is a deceptively simple concept: a markdown document with a specific structure that tells an AI agent exactly how to accomplish a task. The format includes:

  • Title and Overview: What the SOP does
  • Parameters: Required and optional inputs with defaults
  • Steps with Constraints: Detailed step-by-step instructions with RFC 2119-style keywords (MUST, SHOULD, MAY, MUST NOT)
  • Examples: Sample inputs and outputs
  • Troubleshooting: Common issues and solutions

The magic is in the constraints. Each step includes specific requirements like "You MUST validate that the code base path exists and is accessible" or "You MUST use mermaid diagrams for all visual representations." These constraints keep the agent on track and ensure consistent results.
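
Put together, a stripped-down SOP might look like the sketch below. The two MUST constraints are the ones quoted above; the rest of the structure and wording is illustrative:

```markdown
# Code Base Summary

## Overview
Analyzes a code base and generates summary documentation.

## Parameters
- code_base_path (required): Root directory of the code base.
- output_dir (optional, default: ./docs): Where to write the results.

## Steps

### 1. Validate inputs
- You MUST validate that the code base path exists and is accessible.
- You MUST NOT modify any source files during analysis.

### 2. Document the architecture
- You MUST use mermaid diagrams for all visual representations.
- You SHOULD document each major component in its own section.

## Troubleshooting
- Path not found: confirm code_base_path is absolute, or relative to
  the current working directory.
```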

Code Base Summary: Understanding Any Codebase

James demonstrated the code-base-summary SOP, which analyzes a codebase and generates comprehensive documentation. Running it on the Agent SOP repository itself, it automatically created:

  • Code Base Info: Overview, description, technology stack, project structure
  • Architecture: Mermaid diagrams showing architectural structure
  • Components: Detailed component documentation
  • Interfaces: API and interface documentation
  • Consolidated agents.md: A single file combining all information for the coding assistant

"I'm a principal engineer. We have these principal tenets at Amazon like have resounding impact. One of them is called technically fearless," James explained. "I can tell you that thanks to AI and these Agent SOPs, my level of technical fearlessness has grown orders of magnitude this year because I can go into any code base and I can run this and then my agent has it."

This capability enabled James to go from concept to production in seven days, working in an unfamiliar codebase in a language he didn't know. "I thought I was building a proof of concept. It went concept to PR in two days and then concept to production in seven days, which was wild."

Authoring Agent SOPs: Let AI Write Them

The brilliant part: you don't write Agent SOPs by hand. You use AI to write them.

James demonstrated using a steering file that defines the SOP format. With this rule loaded into Kiro, he simply said: "Create an Agent SOP that, given a person's name, outputs a short, fun poem incorporating their name."

The LLM generated a complete SOP following the standard format. When James wanted to modify it, he just chatted: "Update to include an optional hometown param." The LLM updated the SOP to incorporate the hometown into the poem, adding appropriate constraints.

Real-World Example: GitHub Issue Triage

Moving beyond toy examples, James created a practical Agent SOP: "Create an Agent SOP that triages GitHub issues for a given repo URL."

The LLM generated an SOP that retrieves open issues, analyzes each issue, recommends labels, and generates a triage report. James refined it further: "For triage, search for issues with no labels. I also want you to post a triage comment on each issue and apply relevant labels. Don't write the report to a file, just output it, and also add a dry run flag."

Running this SOP on the Agent SOP repository itself, it automatically analyzed unlabeled issues, recommended appropriate labels (bug, documentation, enhancement), and generated triage comments - all following the structured process defined in the SOP.

Prompt-Driven Development: From Idea to Implementation

James then demonstrated the complete development workflow using Agent SOPs, starting with a real GitHub issue requesting list-SOPs and use-SOP tools for the MCP server.

Step 1: PDD (Prompt-Driven Development): This SOP takes a rough idea and guides you through requirements clarification and research. James provided a link to the GitHub issue, and the SOP created a structured planning directory with research, design, and implementation folders.
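
Based on that description, the planning layout looks roughly like this (the comments are added for orientation):

```text
planning/
├── research/        # findings, references, links to sources
├── design/          # requirements and architecture overview
└── implementation/  # generated implementation tasks
```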

The SOP became an interactive requirements partner, asking clarifying questions one at a time:

  • What specific functionality should each tool provide?
  • How should agents decide when to use SOPs?
  • Should the tools support external SOPs?
  • What format should the tools return?
  • How should error handling work?

Step 2: Research: James asked it to research how tools are implemented in FastMCP (the library they use). The SOP performed web searches, read documentation, and wrote a comprehensive research document with references and links to all sources.

Step 3: Design: The SOP created detailed requirements and architecture overview. James emphasized this is where the real work happens: "This is where I really distinguish what's going on here from vibe coding. For me, vibe coding is I'm not even looking at the code. My experience with vibe coding is it's amazing for prototypes, not very good for production code."

Step 4: Code Task Generator: This SOP created a detailed implementation task - "the issue you wish everybody wrote" - with description, background, technical requirements, dependencies, implementation approach, and acceptance criteria.

Step 5: Code Assist: The final SOP implements the code using test-driven development best practices:

  • Red Phase: Write tests first and verify they fail
  • Green Phase: Implement functional logic until tests pass
  • Refactor Phase: Run other tests, ensure nothing broke, refactor code to match existing style
  • Commit: Create conventional commit messages
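
Expressed in the SOP constraint style introduced earlier, those phases might read like this; it's an illustrative reconstruction, not the shipped Code Assist SOP:

```markdown
### Red phase
- You MUST write tests for the new behavior before any implementation.
- You MUST run the tests and confirm that they fail.

### Green phase
- You MUST implement only enough code to make the failing tests pass.

### Refactor phase
- You MUST run the full test suite and confirm nothing else broke.
- You SHOULD refactor the new code to match the existing style.

### Commit
- You MUST write a conventional commit message describing the change.
```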

James ran this in auto mode, and it systematically implemented the feature, achieving 97% test coverage before committing.

Beyond Vibe Coding: Production-Quality Development

James made a critical distinction between his approach and "vibe coding" - the practice of letting AI generate code without careful review.

"Vibe coding is amazing for prototypes, amazing for little games I want to make for myself, not very good for production code," he explained. "In production code we want to make sure that there's high quality, we want to make sure that the design makes sense."

The key is active participation: reading the research, reviewing the design, understanding the implementation. James has spent over 20 years programming, which means that "while I couldn't write the language easily, I could still read it, I could still read roughly what it was doing."

He used an analogy: "You can think of a commercial plumber who spends three days walking around a big industrial building and then tightens one bolt and is like, 'Fixed your problem, and that'll be a thousand dollars.' And they're like, 'A thousand dollars? You tightened one bolt.' And they say, 'Well, okay, $1 for the bolt, $999 for knowing which bolt to tighten.'"

AI helps with the "walking around the building for three days part" - the exploration and understanding. But you still need to know which bolt to tighten.

The Scale of Adoption: Thousands of SOPs

The impact of Agent SOPs at Amazon is remarkable. "There are literally thousands of these SOPs internally in Amazon and we are using them all over the place," James revealed. Teams use them for:

  • Coding: Feature implementation, bug fixes, refactoring
  • Operations: Automated deployments, monitoring, incident response
  • Productivity: Meeting transcription, task tracking, documentation

James shared a personal example: "I have a productivity agent where I can give it a recording of a meeting and say, 'Hey, transcribe this, add a meeting notes document to my Obsidian Vault that has the task breakdown for everybody.' And then it creates separate documents for each person and maps the tasks over."

Open Source Release: Share the Innovation

Amazon open sourced Agent SOPs less than two weeks before re:Invent 2025, making these powerful techniques available to everyone. The release includes:

  • Strands Agent SOP MCP Server: Run SOPs as MCP prompts
  • Four Production SOPs: Code-base summary, PDD, Code Task Generator, Code Assist
  • CLI Tool: Run the MCP server, convert SOPs to Anthropic agent skills, output steering rules
  • Documentation: Complete guide to authoring and using Agent SOPs

The repository is available at https://github.com/strands-agents/agent-sop, and James encouraged everyone to try it and contribute back.

Key Takeaways

Prompt Engineering Matters: Master zero-shot, one-shot, and few-shot prompting. Use context and role prompting for complex tasks. Apply self-consistency and tree of thought for architectural decisions.

Context is King: Load your organization's standards, coding styles, and patterns into your AI assistant. The more context you provide, the better the results.

Systematic Workflows Work: Use structured approaches like Explore, Plan, Code, Commit rather than ad-hoc prompting.

Agent SOPs Are Game-Changing: These structured markdown documents provide consistent, repeatable results for complex tasks. Let AI write them for you using steering files.

Active Participation Required: Review research, validate designs, understand implementations. AI accelerates development but doesn't replace engineering judgment.

Test-Driven Development Still Matters: The Code Assist SOP enforces TDD practices, ensuring quality and maintainability.

Scale Through Sharing: Amazon's mechanism approach means successful innovations get codified and shared across thousands of developers.

From Skeptic to Advocate: Even former AI skeptics can achieve remarkable productivity gains with the right techniques and tools.

As James concluded: "This is the only way I code now. I use this Agent SOP, it's fantastic."


About This Series

This post is part of DEV Track Spotlight, a series highlighting the incredible sessions from the AWS re:Invent 2025 Developer Community (DEV) track.

The DEV track featured 60 unique sessions delivered by 93 speakers from the AWS Community - including AWS Heroes, AWS Community Builders, and AWS User Group Leaders - alongside speakers from AWS and Amazon. These sessions covered cutting-edge topics including:

  • πŸ€– GenAI & Agentic AI - Multi-agent systems, Strands Agents SDK, Amazon Bedrock
  • πŸ› οΈ Developer Tools - Kiro, Kiro CLI, Amazon Q Developer, AI-driven development
  • πŸ”’ Security - AI agent security, container security, automated remediation
  • πŸ—οΈ Infrastructure - Serverless, containers, edge computing, observability
  • ⚑ Modernization - Legacy app transformation, CI/CD, feature flags
  • πŸ“Š Data - Amazon Aurora DSQL, real-time processing, vector databases

Each post in this series dives deep into one session, sharing key insights, practical takeaways, and links to the full recordings. Whether you attended re:Invent or are catching up remotely, these sessions represent the best of our developer community sharing real code, real demos, and real learnings.

Follow along as we spotlight these amazing sessions and celebrate the speakers who made the DEV track what it was!
