DEV Community

Tyson Cung
Amazon Lost 6.3M Orders After AI Coding Tool Went Rogue - Now They're Hitting the Brakes


Amazon just did something unprecedented: they're forcing a 90-day safety reset across 335 critical systems after their AI coding tool caused catastrophic outages. The March 5th incident alone lost 6.3 million orders and triggered 21,716 peak Downdetector reports.

The irony is thick. The same SVP who mandated AI coding as the company standard is now adding human guardrails.

How It All Went Wrong

The timeline reads like a cautionary tale:

December 2025: Kiro AI coding tool decides to "delete and recreate" an entire AWS Cost Explorer environment. 13-hour outage in China region follows.

February 2026: Second outage involving Amazon Q Developer. Engineers let the AI resolve an issue without intervention. Spoiler: it didn't go well.

March 2: Amazon Q contributes to incorrect delivery times across marketplaces. 120,000 lost orders, 1.6 million website errors.

March 5: The big one. Amazon.com shopping outage lasts 6 hours. 99% drop in orders across North American marketplaces. 6.3 million orders, gone.

Four major incidents in three months. That's not a bug - that's a pattern.

The 90-Day Reset

SVP Dave Treadwell (head of eCommerce Foundation) convened a company-wide engineering meeting. The result? A temporary but sweeping policy change:

  • Two-person review for ALL code changes
  • Mandatory use of internal documentation and approval tool
  • Director/VP-level audit of all production code change activities
  • Automated coding systems must follow central reliability engineering rules

They're targeting 335 "Tier-1 systems" - the consumer-facing services that directly impact revenue.

The Bitter Irony

Here's what makes this story perfect: Dave Treadwell co-signed the November 2025 memo mandating Kiro as Amazon's standard AI coding tool. The goal was 80% weekly usage across the company.

Amazon deployed 21,000 AI agents across their Stores division, claiming $2B in cost savings and 4.5x developer velocity. Internal docs bragged about the transformation.

Meanwhile, ~1,500 engineers protested the Kiro mandate via internal forums. They preferred Claude Code. Management ignored them.

Speed vs Safety - The Classic Trade-off

Amazon's internal briefing note is telling: "novel GenAI usage" with "best practices and safeguards not yet established" and "high blast radius" as recurring characteristics.

Translation: We're moving fast and breaking things. Literally.

AI tools make code 4.5x faster to write, but existing review processes couldn't keep up with the volume. You're generating code faster than you can safely review it. The math doesn't work.
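A quick back-of-the-envelope model makes the mismatch concrete. All the numbers below are hypothetical placeholders (only the 4.5x multiplier comes from Amazon's claims); the point is the shape of the arithmetic, not the exact figures:

```python
# Back-of-the-envelope: AI-assisted output vs. human review capacity.
# Every number except the 4.5x multiplier is a made-up illustration.

prs_per_week_before = 40            # PRs a team shipped pre-AI
velocity_multiplier = 4.5           # Amazon's claimed developer speedup
prs_per_week_after = prs_per_week_before * velocity_multiplier   # 180.0

review_hours_per_pr = 0.5           # a careful review ~30 minutes
reviewer_hours_available = 60.0     # e.g. 6 reviewers x 10 hours/week

hours_needed = prs_per_week_after * review_hours_per_pr          # 90.0
shortfall = hours_needed - reviewer_hours_available              # 30.0

print(f"Need {hours_needed}h of review, have {reviewer_hours_available}h")
# When hours_needed > hours_available, either unreviewed code ships
# or the review queue grows without bound. Neither is safe.
```

Tune the numbers however you like; at a 4.5x output multiplier, review capacity has to grow roughly 4.5x too, or something gives.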

Why Existing Review Failed

I've seen this pattern before. AI code has different failure modes than human code:

  • Volume overwhelm: Hundreds of PRs per week vs dozens
  • Subtle logic errors: Passes tests, fails in edge cases
  • Context confusion: AI doesn't understand broader system implications
  • Over-optimization: AI finds clever solutions that are brittle

Human reviewers are trained to catch human mistakes. AI makes different mistakes.
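The "passes tests, fails in edge cases" mode is worth seeing concretely. Here's a toy example (entirely hypothetical, not Amazon's code): a delivery-estimate helper that sails through the happy-path test it was generated against, then blows up on an input those tests never exercised:

```python
def estimate_delivery_days(distance_km: float, speed_kmh: float) -> float:
    """Naive AI-style helper: hours of travel converted to days."""
    return (distance_km / speed_kmh) / 24

# The generated unit test passes, so the PR looks green:
assert estimate_delivery_days(2400, 100) == 1.0

# But an input the tests never covered fails in production:
# estimate_delivery_days(500, 0)  ->  ZeroDivisionError
```

A human reviewer skimming a green PR has no reason to catch this; only a reviewer asking "what inputs can this actually receive?" does, and that question doesn't scale to hundreds of PRs a week.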

The Future: Human-in-the-Loop

Amazon's long-term plan? Combining "agentic" AI tools with "deterministic" rules-based systems. Translation: AI writes code, rules check it, humans approve it.
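Sketched in Python, that pipeline looks something like the following. The gate names, thresholds, and rules here are my assumptions for illustration, not Amazon's actual system: the agent proposes a change, hard-coded rules veto anything out of bounds, and a human gets the final say.

```python
from dataclasses import dataclass

@dataclass
class Change:
    description: str
    files_touched: int
    deletes_infrastructure: bool  # e.g. "delete and recreate" an environment

def passes_reliability_rules(change: Change) -> bool:
    """Deterministic, rules-based checks: no judgment calls, just hard limits.
    (Illustrative rules only - not Amazon's real policy.)"""
    if change.deletes_infrastructure:
        return False              # never let an agent tear down an environment
    if change.files_touched > 20:
        return False              # large blast radius needs a human-written plan
    return True

def deploy(change: Change, human_approved: bool) -> str:
    if not passes_reliability_rules(change):
        return "blocked by rules"
    if not human_approved:
        return "awaiting human approval"
    return "deployed"

risky = Change("recreate Cost Explorer env", 3, deletes_infrastructure=True)
print(deploy(risky, human_approved=True))   # blocked by rules, approval irrelevant
```

Note the ordering: the deterministic gate runs before the human sees anything, so approval fatigue can't rubber-stamp a change the rules already forbid.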

Gartner predicts >40% of agentic AI projects will be canceled by end of 2027 due to escalating costs and inadequate risk controls. Amazon might be ahead of that curve.

Lessons for Everyone Else

This is the first major FAANG company to formally pump the brakes on AI coding after production failures. The lessons are clear:

  1. AI coding tools are incredible - when properly constrained
  2. Existing review processes don't scale to AI code volume
  3. Speed without safety is just faster failure
  4. Cultural resistance from engineers often signals real problems

If you're using AI to write production code, how are YOU reviewing it? The traditional "senior dev glances at the PR" approach doesn't work when the AI is generating more code than humans can meaningfully review.

Amazon learned this lesson the expensive way. 6.3 million orders expensive.

The Bigger Question

Are we building tools to help developers, or are we replacing developers with tools that need developers to babysit them?

Amazon's 90-day reset suggests even they're not sure anymore.

