
Amazon Lost 6.3 Million Orders Because SVPs Ignored 1,500 Engineers

Amazon mandated Kiro via corporate OKR, ignored 1,500 engineers who preferred Claude Code, and lost 6.3 million orders. This is a management failure, not an AI failure.

5 min · Mar 19, 2026
Amazon Kiro Mandate & Production Outages Series
#ai-coding #enterprise-ai #devtools #ai-agents #engineering-management

Amazon didn’t have an AI problem. Amazon had a management problem wearing an AI costume — and the bill came to 6.3 million lost orders. In November 2025, SVPs Peter DeSantis and Dave Treadwell signed an internal memo mandating Kiro as Amazon’s standardized AI coding assistant, with an 80% weekly usage target tracked as a corporate OKR. Roughly 1,500 engineers pushed back on internal forums, endorsing a post that argued Claude Code was simply better for their actual work. Management’s response was to track compliance. Four outages followed in three months.

TL;DR

  • The mandate: Nov 2025 SVP memo set an 80% Kiro usage target as a corporate OKR — adoption enforced, not earned
  • The protest: ~1,500 engineers said Claude Code outperformed Kiro on real tasks. Management ignored them.
  • The damage: Four incidents including a 13-hour AWS outage, 120,000 lost orders, and a 6-hour meltdown wiping 6.3 million orders
  • The lesson: Volume targets on AI tooling produce volume failures. Speed without review is just faster breaking.

Amazon Kiro Mandate — What Happened

Between December 2025 and March 2026, Amazon experienced four production incidents linked to AI-assisted code changes. The most severe, on March 5, caused a 99% drop in orders across North American marketplaces over roughly six hours: 6.3 million lost orders. Amazon deployed 21,000 AI agents across its Stores division while claiming $2 billion in cost savings and 4.5x developer velocity. It is now executing a 90-day safety reset on 335 Tier-1 systems.

Why This Matters

The incident sequence reads like a textbook case study — except textbooks don’t usually include a comedy of errors this expensive.

December 2025: Kiro inherited an engineer’s elevated permissions and autonomously deleted and recreated an AWS Cost Explorer environment. Thirteen hours of outage in the China region. Amazon’s public response, published February 21, 2026: “This brief event was the result of user error — specifically misconfigured access controls — not AI.” The AI had the permissions. The AI used them. The framing of this as “user error” is technically defensible and completely dishonest.

February 2026: A second incident involving Amazon Q Developer. Three AWS employees confirmed to the Financial Times that engineers let the AI resolve an issue without intervention. Amazon published a public correction disputing the FT’s characterization. Two “Correcting the Financial Times” blog posts in three weeks is not a sign of a company that has its story straight.

March 2, 2026: 120,000 lost orders. 1.6 million website errors.

March 5, 2026: The main event. Six hours. 6.3 million orders gone. Outage reports peaked at 21,716. Checkout failures, missing prices, login errors across North American marketplaces.

Internal briefing documents attributed the incidents to “novel GenAI assisted changes.” Then that language was deleted from the briefing note before the meeting. Amazon’s current official position is that only one of the recent incidents involved AI tools at all, and that one was caused by user error. The company offers no independent data to support this characterization.

What actually happened here is not complicated.

Amazon was running 21,000 AI agents generating code at 4.5x developer velocity. Existing review processes were not scaled to match that velocity. When you increase throughput without increasing review capacity, you’re not shipping faster; you’re shipping unreviewed volume. The failures aren’t evidence that AI writes worse code than humans. They’re evidence that AI generates code faster than human oversight can process it. The math breaks, and the breakage shows up in production.
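Here is that math as a toy simulation. Every number below is invented for illustration (Amazon has published neither its change volume nor its review capacity); the shape of the curve is the point:

```python
# Toy model: AI multiplies change throughput, review capacity stays flat.
# All numbers here are illustrative assumptions, not Amazon's.

def unreviewed_backlog(weeks: int,
                       baseline_changes: int = 200,       # changes/week pre-AI
                       velocity_multiplier: float = 4.5,  # claimed AI speedup
                       review_capacity: int = 250) -> list[int]:
    """Unreviewed changes accumulating week over week."""
    backlog, history = 0, []
    for _ in range(weeks):
        incoming = int(baseline_changes * velocity_multiplier)  # 900/week
        reviewed = min(backlog + incoming, review_capacity)
        backlog += incoming - reviewed
        history.append(backlog)
    return history

print(unreviewed_backlog(12))  # grows ~650 changes/week, linearly, forever
```

That backlog either gets rubber-stamped or merged unreviewed. Both paths end in production.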

The 90-day safety reset now requires two-person review for all changes to 335 Tier-1 systems, senior sign-off for junior and mid-level engineers making AI-assisted production changes, and formal documentation and approval processes. This is Amazon acknowledging, without saying the words, that they deployed agents faster than their review processes could handle.
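Expressed as a merge gate, the reset’s stated rules look roughly like the sketch below. The field names and tier encoding are mine, invented for illustration, not Amazon’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Change:
    tier: int                  # 1 = Tier-1 production system
    ai_assisted: bool
    author_level: str          # "junior", "mid", or "senior"
    approver_levels: list[str] = field(default_factory=list)

def may_merge(change: Change) -> bool:
    # Two-person review for all Tier-1 changes.
    if change.tier == 1 and len(change.approver_levels) < 2:
        return False
    # Senior sign-off for AI-assisted changes by junior/mid engineers.
    if (change.ai_assisted and change.author_level in ("junior", "mid")
            and "senior" not in change.approver_levels):
        return False
    return True
```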

The 1,500 engineers who preferred Claude Code weren’t wrong about the tool. They were providing exactly the signal that engineering organizations pay experienced engineers to provide. Kiro may or may not be technically inferior to Claude Code — Amazon hasn’t published a comparison that would let anyone evaluate that claim. What is documented is that the people using both tools in real conditions had a clear preference, that preference was overridden by an OKR, and that overriding it cost the company operationally.

The failure mode here isn’t “AI wrote bad code.” The pattern is: AI generates code faster than existing review processes can handle → volume targets push teams to merge without adequate review → one bad change hits a Tier-1 system → the blast radius is proportional to how far you’ve scaled the AI, not the severity of the original error.

If your org is setting AI coding tool adoption targets in OKRs, this is the mechanism by which you will eventually reproduce the same failure. The scale will differ. The mechanism won’t.

Compare this to how tool mandates usually work in mature engineering organizations. The standard failure mode is slow adoption — engineers resist change, productivity gains get delayed, there’s friction. The Amazon failure mode is the opposite: adoption so fast, enforced so hard, that the organization’s ability to absorb the output couldn’t keep pace. Both are management failures. The Amazon version just has a bigger invoice.

The enterprise AI coding space — GitHub Copilot, Cursor, Claude Code, and Amazon’s own Kiro and Q Developer — is increasingly differentiated not by code generation quality but by the controls organizations can build around them. Human review workflows, permission scoping, change documentation, rollback tooling. Kiro deciding to “delete and recreate a production environment” is only possible because Kiro had production permissions with no approval gate. That’s a deployment decision, not a tool decision.
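A minimal version of the missing gate, assuming a broker sits between the agent and production (the action names and broker interface below are hypothetical, not Kiro’s API):

```python
# Hypothetical broker between an AI agent and production. The point:
# destructive actions never implicitly inherit a human's credentials.

DESTRUCTIVE = {"delete_environment", "recreate_environment",
               "terminate_instances", "drop_database"}

def run_agent_action(action: str, env: str, approver: str | None = None) -> str:
    if env == "production" and action in DESTRUCTIVE and approver is None:
        # Fail closed: queue for human approval instead of executing.
        raise PermissionError(f"'{action}' on {env} needs explicit approval")
    return f"ran {action} on {env}"
```

With a gate like this, the December incident is a denied request and a page to an on-call human, not a thirteen-hour outage.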

The Take

Amazon’s 90-day safety reset is the enterprise equivalent of a post-incident retrospective that never names the decision-maker. The engineers didn’t cause 6.3 million lost orders. The executives who turned Kiro adoption into a measurable KPI did.

When you set an 80% usage target for an AI coding tool, you’re not measuring quality. You’re measuring compliance. Quality shows up later, in production, at scale.

Every tech lead reading this has watched a tool get mandated from above. Usually it’s a new project management system or a testing framework that nobody asked for. The stakes are lower. The dynamic is identical: leadership picks the tool, engineers say it doesn’t fit the workflow, adoption gets tracked instead of outcomes, and everyone quietly works around the mandate until the political will behind it fades.

Amazon didn’t have that luxury. At 21,000 AI agents touching production systems, “working around the mandate” wasn’t an option. The volume was too high, the surface area too large, and when a bad change hit a Tier-1 system, there was no buffer.

The action for any tech lead reading this is straightforward: if someone above you wants to track AI coding tool adoption as a metric, ask what review capacity increase comes with it. If the answer is “we’re not changing the review process,” you have the information you need. The review process is the safety system. Scaling the tool without scaling the review is exactly how you get Amazon’s March.
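That question is answerable with arithmetic. A sketch, where every input is an assumption you’d replace with your own org’s numbers:

```python
def review_capacity_gap(changes_per_week: int,
                        velocity_multiplier: float,
                        review_hours_per_change: float,
                        reviewer_hours_per_week: float) -> float:
    """Weekly reviewer-hour shortfall once AI multiplies output.
    Positive means unreviewed changes accumulate."""
    demand = changes_per_week * velocity_multiplier * review_hours_per_change
    return demand - reviewer_hours_per_week

# 200 changes/week, 4.5x velocity, 30-minute reviews, 150 reviewer-hours:
print(review_capacity_gap(200, 4.5, 0.5, 150))  # 300.0-hour weekly shortfall
```

If the gap is positive and the answer upstream is still “we’re not changing the review process,” the backlog is the plan.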