Originally published at twarx.com - read the full interactive version there.
Last Updated: June 20, 2026
Why Amazon hates 'human-in-the-loop' AI governance comes down to one unexamined assumption the entire AI safety establishment was built on — that humans make AI systems safer. Amazon's VP of Security just blew that assumption apart in public, and the governance industry has no answer.
On June 20, 2026, The Register published an interview with Eric Brandwine, distinguished engineer and VP at Amazon Security, in which he argues that human-in-the-loop is not the gold standard for governing AI agents. This matters now because enterprises deploying AI agents with LangGraph, AutoGen, or CrewAI are routinely told to bolt a human onto every decision.
By the end of this, you'll know exactly when human review helps, when it actively harms, and how to build governance that survives both machine speed and the EU AI Act.
Eric Brandwine, distinguished engineer and VP at Amazon Security, argues that human-in-the-loop is not the governance gold standard the industry assumes — the central claim behind The Human Bottleneck Fallacy. Source: The Register
Coined Framework
The Human Bottleneck Fallacy: the dangerous assumption that inserting a person into an AI decision loop always improves outcomes, when at scale it systematically introduces fatigue bias, inconsistency, and false confidence that no audit trail can fix.
It names the systemic error of treating human approval as a stable ground truth. At machine speed and volume, human reviewers don't add reliability — they add a slower, less consistent, harder-to-audit failure mode that organizations mistake for safety.
What Amazon Actually Announced: The Brandwine Statement Explained
Eric Brandwine's exact argument — what he said and what he didn't
Brandwine's core claim, made in a phone interview with The Register, is blunt: humans are 'a little bit precious about humans.' We believe we're good at our jobs. Consistent. Reliable. 'But when you actually get down to it, humans are not terribly consistent,' he said. His sharpest framing: 'We know how humans fail. We're comfortable with it. So human-in-the-loop isn't necessarily the gold standard.'
Critically, Brandwine did not say remove humans entirely. He said human-in-the-loop is 'something that you should use judiciously, where you absolutely need it. But it's not something that you can do at high velocity. You will not get the results that you want to get.' That distinction gets lost every time someone summarizes this piece.
'If you put a human inside of this tight loop, and ask them to make approval decisions for agentic tools repeatedly, they'll do a good job. And then they'll do an okay job. And pretty quickly they'll be doing a poor job.' — Eric Brandwine, VP, Amazon Security
The Register report: date, context, and official sourcing
The piece, headlined 'Why Amazon hates human-in-the-loop AI governance: VP Eric Brandwine explains people aren't all that great, actually', was published by reporter Jessica Lyons on Saturday, June 20, 2026 at 15:25 UTC. Brandwine anchors his argument in a concept he first presented at AWS re:Invent in 2017: the normalization of deviance. The idea is nine years old; what's new is pointing it at AI agent governance.
Why this statement is significant coming from Amazon's VP of Security
This isn't a growth exec pitching autonomy as a feature. It's a security VP making a risk argument — and that framing changes everything. Amazon Web Services operates at a scale where per-decision human review is arithmetically impossible without catastrophic latency, and Brandwine is publicly saying the industry's default control mechanism is theater. Coming from the person responsible for keeping that infrastructure standing, that's not a casual observation. For deeper context on how security shapes deployment, see our guide to AI security fundamentals.
Brandwine's 'normalization of deviance' example: emergency-room nurses jump at every alarm on day one. After enough false alarms, discipline slips, responses stop, and eventually a real alarm is missed. The pattern is documented in alarm-fatigue research on healthcare workers, and it also shows up among firefighters and Army pilots. The same fatigue curve applies to every human approving agent actions all day.
15:25 UTC: publication time of the Brandwine interview, June 20, 2026. [The Register, 2026](https://www.theregister.com/security/2026/06/20/why-amazon-hates-human-in-the-loop-ai-governance/5258639)
2017: year Brandwine gave his re:Invent 'normalization of deviance' talk. [The Register, 2026](https://www.theregister.com/security/2026/06/20/why-amazon-hates-human-in-the-loop-ai-governance/5258639)
<10 yrs: industry experience with modern LLMs, versus millennia with humans. [The Register, 2026](https://www.theregister.com/security/2026/06/20/why-amazon-hates-human-in-the-loop-ai-governance/5258639)
What Is Human-in-the-Loop AI Governance — A Full Technical Definition
The three models: human-in-the-loop, human-on-the-loop, and fully autonomous
Human-in-the-loop (HITL) means a human must approve, reject, or correct an AI output before it triggers a downstream action. Standard in regulated industries since around 2019. Human-on-the-loop (HOTL) means humans monitor decisions in aggregate and can intervene, but don't approve each individual output — this is the model Brandwine implicitly advocates. Fully autonomous removes the human from the runtime path entirely, constraining behavior through architecture instead. Most people conflate the second and third. They're not the same.
Brandwine's key technical observation: both humans and AI agents are non-deterministic. Neither is guaranteed to produce the same output given the same input twice. Both make mistakes and both make things up. The difference is familiarity, not reliability.
Humans and LLMs are both non-deterministic. The only reason we trust the human more is that we've had millennia to get comfortable with how they fail — not because they fail less.
How Amazon Augmented AI (A2I) worked — and what it revealed about human review at scale
Amazon Augmented AI (A2I) was Amazon's own managed HITL service: teams could route low-confidence ML predictions to human reviewers. Its very design — routing only low-confidence cases — telegraphs the internal skepticism Brandwine now voices openly. The smart move was never 'review everything.' It was 'review only what the model is unsure about.' Amazon has been quietly acting on this logic for years while the industry was still writing blog posts about HITL being non-negotiable.
The autonomy spectrum: where enterprise AI systems actually sit in 2026
Modern agentic frameworks (LangGraph, AutoGen, and CrewAI) all offer configurable interrupt points. But many production deployments now set those interrupts to exception-only, not default-review. The industry has quietly moved toward Brandwine's position even where it hasn't said so out loud. See our breakdown of multi-agent systems for how interrupts are wired in practice.
The autonomy spectrum: HITL approves every action, HOTL monitors in aggregate, and structural governance constrains behavior at the architecture layer. Brandwine advocates pushing routine decisions rightward.
The Human Bottleneck Fallacy: Why Human Oversight Can Make AI Worse
Cognitive fatigue and the 'approval rubber-stamp' problem
This is the heart of Brandwine's argument, and the heart of the framework this article names. When you ask a person to approve agent actions 'time after time,' their performance degrades in a predictable curve: good, then okay, then poor. The approval becomes a reflex. A rubber stamp that produces an audit trail with no actual judgment behind it — and the scariest part is that the log looks identical either way.
Coined Framework
The Human Bottleneck Fallacy in practice
A six-step agent pipeline with a human approving each step doesn't get safer with volume — it gets a fatigued reviewer whose accuracy decays hourly. The audit log says 'human approved,' but the human stopped genuinely reviewing hours ago. That's false confidence no compliance team can detect from the trail alone.
Inconsistency as a governance risk: when reviewers disagree more than the model does
Brandwine's deepest point: a non-deterministic human reviewing a non-deterministic model does not produce determinism — it stacks two sources of variance. In ambiguous cases, human inter-rater agreement frequently lands below the model's own self-consistency. You're inserting a noisier governor to police a quieter one. I've watched this happen on labeling projects. The humans argued more than the model varied.
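One way to test the 'two sources of variance' claim on your own data is to compare how often two reviewers agree with how often the model agrees with itself across repeated runs. The sketch below is a minimal illustration, assuming approve/reject labels and Cohen's kappa as the agreement statistic. Neither Brandwine nor The Register prescribes a metric, and the label lists are invented for the example.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Share of items where both raters gave the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, given each rater's label frequencies
    expected = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical data: two human reviewers, then two runs of the same model
reviewer_1 = ["approve", "approve", "reject", "approve", "reject", "approve", "reject", "approve"]
reviewer_2 = ["approve", "reject", "reject", "approve", "approve", "approve", "reject", "reject"]
model_run_1 = ["approve", "approve", "reject", "approve", "reject", "approve", "reject", "approve"]
model_run_2 = ["approve", "approve", "reject", "approve", "reject", "approve", "approve", "approve"]

human_kappa = cohens_kappa(reviewer_1, reviewer_2)
model_kappa = cohens_kappa(model_run_1, model_run_2)
print(f"human inter-rater kappa: {human_kappa:.2f}")
print(f"model self-consistency kappa: {model_kappa:.2f}")
```

If the human figure comes out lower than the model figure on your ambiguous cases, the reviewer is the noisier of the two governors. That is something to measure for your own data, not to assume.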
The false confidence trap: how HITL creates audit theatre
The most dangerous outcome isn't that humans miss things; it's that their signature on the log feels like accountability. RLHF pipelines, OpenAI's included, depend on human preference data that is known to vary between annotators, and Anthropic's Constitutional AI was designed partly to reduce that reliance by substituting AI feedback for human harmlessness labels. Both facts undermine the premise that human judgment is a stable ground truth, before you've even gotten to the fatigue problem.
What most people get wrong: they treat HITL as a binary safety switch. It isn't. Above roughly 10,000 decisions per day, a fatigued human reviewer introduces more variance than the model they're checking — converting your safety control into your largest unmonitored failure mode.
How the Human Bottleneck Fallacy Degrades a High-Volume Approval Loop
1. **Agent generates action (LangGraph node)**: Agent proposes a downstream action; system pauses at an interrupt_before node awaiting human approval.
2. **Hour 1: reviewer is sharp**: Human reads context carefully, catches edge cases. Genuine judgment applied. Throughput low, accuracy high.
3. **Hour 4: normalization of deviance sets in**: After hundreds of benign approvals, discipline slips. Reviewer pattern-matches and clicks approve. Accuracy drops sharply.
4. **The rubber-stamp state**: Audit log shows 'human approved' for every action, but the human is no longer reviewing. False confidence is now embedded in the record.
5. **Tragic outcome**: A genuinely harmful action passes through, indistinguishable in the log from the benign ones. The 'human brake' failed silently.
This is exactly Brandwine's emergency-room alarm parallel applied to agentic governance — the sequence matters because the failure is invisible until it's catastrophic.
Amazon's Governance Philosophy in Full: What Replaces Human Review
Automated policy enforcement via AWS guardrails and service control policies
Amazon Bedrock Guardrails lets organizations define automated content filters, topic restrictions, and grounding checks without human review queues. Combined with AWS Service Control Policies (SCPs) and IAM permission boundaries, governance becomes structural — constraining what an agent can do by architecture, not by approval. An agent can't perform a forbidden action because the tool that would execute it is never exposed. Full stop.
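The 'never exposed' idea can be sketched without any AWS service at all. The example below is a minimal illustration, not the Bedrock, IAM, or MCP API: `ToolRegistry`, `call_tool`, and the two tool names are hypothetical, and the point is only that the agent receives a filtered view of the tools rather than an approval prompt.

```python
from typing import Callable, Dict, Iterable

class ToolNotExposed(Exception):
    """Raised when an agent asks for a tool outside its allowlist."""

class ToolRegistry:
    """Holds every tool; each agent only ever receives a filtered view."""

    def __init__(self, tools: Dict[str, Callable[..., object]]):
        self._tools = dict(tools)

    def expose(self, allowed: Iterable[str]) -> Dict[str, Callable[..., object]]:
        """Return only the allowlisted tools; everything else stays invisible."""
        allowed = set(allowed)
        unknown = allowed.difference(self._tools)
        if unknown:
            raise KeyError(f"unknown tools in allowlist: {sorted(unknown)}")
        return {name: fn for name, fn in self._tools.items() if name in allowed}

def call_tool(exposed, name, **kwargs):
    """Dispatch a tool call; a tool that was not exposed cannot be reached."""
    if name not in exposed:
        raise ToolNotExposed(name)
    return exposed[name](**kwargs)

# Hypothetical tools for a support agent
registry = ToolRegistry({
    "read_invoice": lambda invoice_id: {"id": invoice_id, "status": "open"},
    "issue_refund": lambda invoice_id, amount: f"refunded {amount} on {invoice_id}",
})
support_agent_tools = registry.expose(["read_invoice"])

print(call_tool(support_agent_tools, "read_invoice", invoice_id="INV-1"))
try:
    call_tool(support_agent_tools, "issue_refund", invoice_id="INV-1", amount=50)
except ToolNotExposed as exc:
    print(f"blocked by construction: {exc}")
```

The design choice is that the denial happens when the tool set is built, not when the action is proposed, so there is no queue for a tired reviewer to wave through.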
Statistical process control instead of individual human approvals
Amazon's model relies on probabilistic monitoring: if outputs drift beyond a statistical threshold, an automated rollback triggers — no human in the critical path. This is the engineering translation of 'human-on-the-loop': humans watch the aggregate, machines enforce the runtime. It's the same logic that makes circuit breakers more reliable than a person watching a fuse panel. Our deep dive on AI observability covers how to instrument this drift detection in production.
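As a rough illustration of what 'drift beyond a statistical threshold' can mean, here is a Shewhart-style control check. It is a sketch under stated assumptions: the three-sigma band, the block-rate metric, and the `rollback` callback are my own choices for the example, and this is not how CloudWatch anomaly detection is implemented.

```python
from statistics import mean, stdev

def control_limits(baseline, sigmas=3.0):
    """Lower and upper control limits from a baseline sample of a metric."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two observations")
    centre, spread = mean(baseline), stdev(baseline)
    return centre - sigmas * spread, centre + sigmas * spread

def check_drift(baseline, recent, rollback, sigmas=3.0):
    """Call rollback(breaches) if any recent observation leaves the control band."""
    lower, upper = control_limits(baseline, sigmas)
    breaches = [x for x in recent if x < lower or x > upper]
    if breaches:
        rollback(breaches)  # automated response; no human in the critical path
        return True
    return False

# Hypothetical metric: hourly share of agent outputs blocked by guardrails
baseline_block_rate = [0.020, 0.022, 0.019, 0.021, 0.018, 0.020, 0.023, 0.021]
events = []  # stands in for a real rollback hook
drifted = check_drift(baseline_block_rate, [0.021, 0.048], rollback=events.append)
print("rollback triggered:", drifted, events)
```

Humans then look at the recorded breaches in aggregate, which is the human-on-the-loop role described above.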
Stop asking a human to approve every action. Constrain the agent so the harmful action is architecturally impossible — then put humans on the anomalies, not the assembly line.
RAG, vector databases, and retrieval-augmented audit trails
RAG (Retrieval-Augmented Generation) architectures paired with vector databases like Pinecone enforce factual grounding at inference time — replacing the human fact-checker with a verifiable retrieval layer. Every claim an agent makes can be traced to a retrieved source, producing an audit trail that's actually stronger than a tired human's signature. Not marginally stronger. Categorically stronger, because it doesn't degrade after lunch.
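A retrieval audit trail can be as simple as binding each answer to the exact passages it was grounded on. The sketch below assumes a toy keyword retriever in place of a vector database (it is not Pinecone's API), and the corpus, document ids, and record layout are hypothetical.

```python
import hashlib
import json

CORPUS = {  # hypothetical document store standing in for a vector database
    "policy-7": "Refunds over 500 USD require manager approval.",
    "policy-9": "Customer PII must be redacted from support transcripts.",
}

def retrieve(query, corpus=CORPUS):
    """Toy retriever: return passages sharing at least one word with the query."""
    terms = set(query.lower().split())
    return {doc_id: text for doc_id, text in corpus.items()
            if terms & set(text.lower().rstrip(".").split())}

def audit_record(query, answer, sources):
    """Bind an answer to the ids and content hashes of its source passages."""
    return {
        "query": query,
        "answer": answer,
        "sources": [
            {"id": doc_id, "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest()}
            for doc_id, text in sorted(sources.items())
        ],
        "grounded": bool(sources),  # False means nothing was retrieved to back the answer
    }

sources = retrieve("refunds approval")
record = audit_record("refunds approval", "Manager approval is needed above 500 USD.", sources)
print(json.dumps(record, indent=2))
```

The hash lets an auditor confirm later that the cited passage has not changed, which is evidence a bare 'human approved' entry cannot provide.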
Exception-only: recommended interrupt pattern in modern agent deployments. [LangGraph Docs, 2026](https://langchain-ai.github.io/langgraph/)
Structural: governance type, enforced via tool permissions rather than approval queues. [AWS Bedrock Guardrails, 2026](https://aws.amazon.com/bedrock/guardrails/)
Non-deterministic: what humans AND LLMs both are, per Brandwine. [The Register, 2026](https://www.theregister.com/security/2026/06/20/why-amazon-hates-human-in-the-loop-ai-governance/5258639)
How to Access and Implement Amazon's Governance Approach: Step-by-Step
Setting up Amazon Bedrock Guardrails: configuration and availability
Bedrock Guardrails is available across major AWS regions. You define denied topics, content filters by severity, PII redaction, and contextual grounding thresholds — all enforced automatically at inference time, no review queue required. The configuration is declarative. You're writing policy, not wiring approvals.
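To show what 'writing policy, not wiring approvals' looks like, here is a sketch that assembles a guardrail definition as plain data. The field names follow the boto3 `create_guardrail` request shape as I understand it, so check them against the current AWS documentation before relying on them. The guardrail name, denied topic, thresholds, messages, and region are all illustrative assumptions.

```python
def build_guardrail_request(name):
    """Assemble keyword arguments for the Bedrock create_guardrail call."""
    return {
        "name": name,
        "description": "Exception-only governance for a support agent (example)",
        "topicPolicyConfig": {
            "topicsConfig": [{
                "name": "LegalAdvice",  # hypothetical denied topic
                "definition": "Giving legal opinions or advice to customers.",
                "type": "DENY",
            }]
        },
        "contentPolicyConfig": {
            "filtersConfig": [
                {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
            ]
        },
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [{"type": "EMAIL", "action": "ANONYMIZE"}]
        },
        "contextualGroundingPolicyConfig": {
            "filtersConfig": [{"type": "GROUNDING", "threshold": 0.75}]
        },
        "blockedInputMessaging": "This request is outside policy.",
        "blockedOutputsMessaging": "The response was blocked by policy.",
    }

def create_guardrail(request, region="us-east-1"):
    """Submit the request. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch runs without the AWS SDK
    client = boto3.client("bedrock", region_name=region)
    return client.create_guardrail(**request)

request = build_guardrail_request("support-agent-guardrail")
print(sorted(request))
```

Because the policy is data, it can be reviewed in a pull request and versioned like any other configuration, which is a one-time human review instead of a per-decision one.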
Migrating from Amazon A2I to automated governance pipelines
The migration pattern: replace A2I human review tasks with Bedrock Guardrails for policy enforcement plus CloudWatch anomaly detection for drift. Keep humans only on the anomaly cluster, not the per-decision path. If your A2I volume was above roughly 10,000 tasks a day, you were probably already getting rubber-stamped anyway.
Integrating with LangGraph, AutoGen, and MCP-based agentic systems
In LangGraph, interrupt_before is set per node when the graph is compiled, so the recommended pattern is to route only low-confidence outputs to a dedicated review node and interrupt there, not on every node. MCP (Model Context Protocol) server integrations let you enforce governance as tool-use constraints rather than post-hoc human review. For pre-built patterns, explore our AI agent library, and for governed deployment templates see our production-ready governance agents.
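Python: LangGraph confidence-gated interrupt (exception-only governance)
```python
# Brandwine-style governance: interrupt ONLY on low confidence,
# not on every node. Humans review exceptions, guardrails handle the rest.
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END

CONFIDENCE_THRESHOLD = 0.65  # below this, escalate to a human

def governance_router(state):
    score = state['confidence']
    if score < CONFIDENCE_THRESHOLD:
        # Exception path -> human-on-the-loop review
        return 'human_review'
    # High-confidence path -> Bedrock Guardrails enforce policy automatically
    return 'auto_execute'

# run_agent, execute_with_guardrails and escalate_to_reviewer are your own
# node functions (not defined here); run_agent must set state['confidence'].
graph = StateGraph(dict)
graph.add_node('agent', run_agent)
graph.add_node('auto_execute', execute_with_guardrails)
graph.add_node('human_review', escalate_to_reviewer)
graph.add_edge(START, 'agent')
graph.add_conditional_edges('agent', governance_router)
graph.add_edge('auto_execute', END)
graph.add_edge('human_review', END)

# Pause only when a run reaches human_review (interrupts need a checkpointer)
app = graph.compile(checkpointer=MemorySaver(), interrupt_before=['human_review'])

# Result: a human sees only the low-confidence share of actions (~5% in this
# illustration), stays sharp, and the audit trail reflects genuine review.
```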
Bedrock Guardrails enforce topic, content, and grounding policy automatically — replacing the human review queue with structural governance. This is the mechanism behind Amazon's anti-HITL stance.
When to Use Human-in-the-Loop vs Amazon's Automated Governance Model
Use cases where HITL remains legally mandatory in 2026
EU AI Act Article 14 mandates meaningful human oversight for high-risk AI in employment, credit, and healthcare. Amazon's model doesn't exempt anyone from these requirements — Brandwine himself stressed HITL should be used 'where you absolutely need it.' If you're shipping anything that affects hiring, lending, or clinical decisions in Europe and you've removed humans entirely, you're not being bold. You're being non-compliant. Our AI governance playbook walks through the documentation you'll need.
The volume threshold: when automated governance outperforms human review
The practical breakpoint sits around 10,000 decisions per day. Above it, fatigue-driven variance overwhelms any benefit human review provides. Below it — and especially for individually consequential decisions — human review remains justifiable. That's not a crisp line, but it's a real one.
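The arithmetic behind that breakpoint is easy to check for your own team. This is a back-of-the-envelope sketch: the eight-hour shift and the team sizes are assumptions I picked for illustration, not figures from Brandwine or The Register.

```python
def seconds_per_decision(decisions_per_day, reviewers, shift_hours=8.0):
    """Average time each reviewer can spend on one decision."""
    if decisions_per_day <= 0 or reviewers <= 0 or shift_hours <= 0:
        raise ValueError("all inputs must be positive")
    reviewer_seconds = reviewers * shift_hours * 3600
    return reviewer_seconds / decisions_per_day

# Hypothetical team sizes against 10,000 decisions per day
for team in (1, 5, 20):
    budget = seconds_per_decision(10_000, reviewers=team)
    print(f"{team:>2} reviewer(s): {budget:.1f} s per decision")
```

Under these assumptions a single reviewer gets under three seconds per decision and a team of five gets about 14 seconds, with no breaks. That is not enough time to read the context an agent action depends on.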
Hybrid governance: the Tiered Autonomy Architecture
The synthesis is a tiered model: automated guardrails for high-volume routine decisions, human-on-the-loop for anomaly clusters, mandatory HITL for individually consequential actions above a defined impact threshold. OpenAI's Operator and Anthropic's Claude tool-use policies both recommend exception-based review rather than universal approval chains. The industry is converging here, just slowly and without admitting it.
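The tiered model can be written down as a small routing function. This is a sketch under stated assumptions: the `Decision` fields, the 0.7 impact threshold, and the tier names are hypothetical, and where the impact threshold sits is a legal and business call, not something the code decides.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    AUTO = "automated_guardrails"
    HOTL = "human_on_the_loop"
    HITL = "mandatory_human_approval"

@dataclass(frozen=True)
class Decision:
    impact_score: float     # 0..1, hypothetical estimate of consequence to a person
    anomaly_flagged: bool   # set by drift or anomaly monitoring
    regulated_domain: bool  # e.g. hiring, credit, or clinical use

def assign_tier(decision, impact_threshold=0.7):
    """Route a decision to the least human-intensive tier that is still defensible."""
    if decision.regulated_domain or decision.impact_score >= impact_threshold:
        return Tier.HITL  # individually consequential: a person must approve
    if decision.anomaly_flagged:
        return Tier.HOTL  # unusual but low impact: a person reviews the cluster
    return Tier.AUTO      # routine: guardrails enforce policy automatically

examples = [
    Decision(impact_score=0.1, anomaly_flagged=False, regulated_domain=False),
    Decision(impact_score=0.2, anomaly_flagged=True, regulated_domain=False),
    Decision(impact_score=0.3, anomaly_flagged=False, regulated_domain=True),
]
for d in examples:
    print(d, "->", assign_tier(d).value)
```

Checking the regulated-domain flag first means a high-risk decision can never fall through to the automated tier just because it looks routine.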
❌ Mistake: Default-review on every agent node. Setting interrupt_before on every LangGraph node creates the exact fatigue curve Brandwine describes: reviewers degrade to rubber-stampers within hours.
✅ Fix: Gate interrupts on a confidence threshold (e.g. <0.65). Humans see only a small share of actions (~5% in this article's example) and stay sharp.
❌ Mistake: Treating the approval log as accountability. A log full of 'human approved' entries feels safe but hides rubber-stamping, the false confidence trap at the core of The Human Bottleneck Fallacy.
✅ Fix: Pair RAG grounding traces with CloudWatch anomaly detection so the audit trail captures evidence, not just a signature.
❌ Mistake: Removing humans from high-risk legal decisions. Going fully autonomous on credit, hiring, or healthcare decisions violates EU AI Act Article 14 regardless of technical merit.
✅ Fix: Keep mandatory HITL above your defined impact threshold; automate only the high-volume routine tier.
Amazon vs the Field: How Competitors Approach AI Governance
Microsoft Azure AI: human oversight as default, not exception
Microsoft Azure AI Content Safety escalates high-severity outputs for human review — a hard-coded HITL posture that directly contradicts Brandwine's philosophy. Microsoft's bet is that enterprises will pay a latency and cost premium for a default that passes legal scrutiny. That's a real tradeoff, not a wrong answer.
Google DeepMind and Anthropic: constitutional and interpretability-first
At Google Cloud Next in April 2026, Google Cloud COO Francis deSouza told reporters: 'We have moved from a human-led defense strategy, to a human-in-the-loop defense strategy, to an AI-led defense strategy that's overseen by humans.' That's practically a paraphrase of Brandwine, just with better PR packaging. Anthropic's Constitutional AI has the model critique and revise its own outputs against a written set of principles during training. It is a middle path that replaces much of the human review with AI feedback, and probably the most intellectually honest attempt to solve the problem so far.
The EU AI Act compliance gap
Under the EU AI Act (high-risk obligations effective August 2026), fully automated governance without documented human oversight capability is non-compliant. Amazon's enterprise customers face a direct tension between Brandwine's philosophy and European law. That tension won't resolve itself — someone's legal team is going to have to make a call.
| Provider | Default posture | Mechanism | EU AI Act readiness |
| --- | --- | --- | --- |
| Amazon (Bedrock Guardrails + SCPs) | Autonomous, human-on-the-loop | Structural guardrails + statistical drift rollback | Needs documented oversight evidence |
| Microsoft Azure AI | Human-in-the-loop by default | Severity-triggered review escalation | Most regulation-ready by default |
| Anthropic (Constitutional AI) | AI-reviews-AI | Trained auditor model vs a constitution | Middle path; needs human sign-off layer |
| Google Vertex AI | AI-led, human-overseen | Drift detection + retraining triggers, human promotion sign-off | Close to Amazon, with promotion gate |
Industry Impact: What Amazon's Position Means for Enterprise AI Governance
The enterprise governance vendor market faces existential disruption
A large share of the enterprise AI governance software value proposition is built on human review workflow tooling — precisely the layer Brandwine's position renders architecturally obsolete. Vendors selling 'approval queue' software are now selling the bottleneck. Some of them know it. None of them are saying so. The NIST AI Risk Management Framework increasingly frames oversight as a documented capability rather than a per-decision gate, which accelerates this shift.
The monetization angle for builders: replacing a 5-person, 24/7 review rota (~$400K–$600K/year fully loaded) with Bedrock Guardrails plus CloudWatch anomaly detection can cost a fraction of that in API fees, and it takes fatigue-driven errors out of the per-decision path. That's the business case Brandwine is implicitly making.
Agentic frameworks already build post-HITL by default
CrewAI ships human_input=True as an optional parameter, not a default. n8n's fastest-growing agentic template category is fully automated agent chains with zero human interrupts. The industry already voted with its defaults. The governance blogs just haven't caught up yet. See how this plays out in workflow automation and orchestration.
Where governance is heading
Microsoft CEO Satya Nadella, in an X post the same week, argued for 'loop learning' instead of checking AI output at every step: 'Companies need to turn their workflows, domain knowledge, and accumulated judgment into AI systems that improve with each use.' IBM execs separately called for human accountability — not humans in the loop — at all stages. Two different framings of the same shift: from approval to oversight, from presence to responsibility.
Expert and Community Reactions to Amazon's Anti-HITL Stance
AI safety researchers push back
Safety researchers at the Center for AI Safety frame the risk bluntly: removing human-in-the-loop governance before interpretability tools mature is 'removing the only brake before we've built new ones.' That's a fair objection. Brandwine's argument works at AWS scale with AWS tooling. It doesn't automatically generalize to every org with a LangGraph deployment and three engineers.
Enterprise architects and ML engineers
Practitioners responding to The Register largely agreed with Brandwine's diagnosis but flagged that compliance and legal teams will block implementation regardless of technical merit. A recurring line: 'Brandwine is right about humans being unreliable at scale, but the answer is better review tooling and smaller review scope — not eliminating humans from consequential decisions.' That's a reasonable read. The diagnosis and the prescription are separable.
The liability counterpoint
The AI governance community keeps citing the 2024 Air Canada chatbot case — where fully automated AI customer service produced a legally binding incorrect refund promise — as evidence that removing oversight shifts liability to the enterprise in unpredictable ways. Autonomy is a security argument; liability is a legal one. They don't always agree, and in a courtroom, the legal argument wins.
What Comes Next: The Future of AI Governance After the Human-in-the-Loop Era
AI-reviews-AI: specialized governance models replace human auditors
The next generation of governance is model-based: a specialized auditor LLM, trained on policy documents, reviews outputs from production models. Anthropic's Constitutional AI is the current proof of concept; purpose-built governance models are in active development across hyperscalers. It's not a perfect solution. But a well-scoped auditor model that never gets tired is a more credible reviewer than a person clicking approve at 4pm on a Friday. We unpack the architecture in our piece on LLM evaluation and auditor models.
Structural governance through architecture
MCP (Model Context Protocol) — now an open standard adopted across LangGraph, AutoGen, and Claude tool-use — enables governance at the tool-permission layer. An agent literally cannot take a non-compliant action because the tool that would execute it isn't exposed. This is the direction Brandwine's argument points: don't review the action after the fact, make the bad action impossible in the first place. For builders, our governance agent templates wire this pattern in by default.
2026 H2: **EU AI Act high-risk enforcement begins (August 2026)**. Enterprises using autonomous governance in Europe must show that automated guardrails constitute 'meaningful human oversight' under Article 14, a legal test no court has yet defined.
2027: **Purpose-built governance models go GA**. Specialized auditor LLMs ship as managed services, extending the Constitutional AI pattern beyond Anthropic to mainstream enterprise stacks.
2028: **'Human-in-the-loop' redefined as audit capability**. By this point the term becomes a compliance term of art meaning intervention authority and audit access, not per-decision approval. Amazon will be credited with forcing the shift.
The post-HITL stack: an auditor LLM reviews production outputs while MCP enforces compliance at the tool-permission layer — the structural governance Brandwine's argument points toward.
Frequently Asked Questions
Why does Amazon hate human-in-the-loop AI governance?
Amazon's VP of Security Eric Brandwine argues that humans are non-deterministic and inconsistent, just like LLMs, and that placing a person inside a tight, high-velocity approval loop produces predictable fatigue: 'they'll do a good job. And then they'll do an okay job. And pretty quickly they'll be doing a poor job.' He ties this to the normalization of deviance. Amazon's position, per The Register (June 20, 2026), is that HITL should be used 'judiciously, where you absolutely need it', not treated as the default gold standard. At AWS scale, per-decision human review is also arithmetically impossible without catastrophic latency, so Amazon favors structural guardrails like Bedrock Guardrails and statistical drift monitoring instead.
What did Amazon VP Eric Brandwine say about human AI oversight?
Brandwine, a distinguished engineer and VP at Amazon Security, said humans are 'a little bit precious about humans' and that 'when you actually get down to it, humans are not terribly consistent.' His central line: 'We know how humans fail. We're comfortable with it. So human-in-the-loop isn't necessarily the gold standard.' He used the documented emergency-room alarm-fatigue phenomenon (also seen in firefighters and Army pilots) to illustrate the normalization of deviance, where repeated benign alerts erode discipline until a real one is missed. He concluded that HITL is 'not something that you can do at high velocity. You will not get the results that you want to get.'
What is the difference between human-in-the-loop and human-on-the-loop AI governance?
Human-in-the-loop (HITL) requires a person to approve, reject, or correct each AI output before any downstream action executes, so review happens on every decision. Human-on-the-loop (HOTL) means humans monitor decisions in aggregate and can intervene, but don't approve each individual output: machines enforce the runtime while humans watch anomalies. Brandwine implicitly advocates HOTL plus structural guardrails. Google Cloud COO Francis deSouza described the same trajectory: 'from a human-led defense strategy, to a human-in-the-loop defense strategy, to an AI-led defense strategy that's overseen by humans.' The practical implementation in LangGraph is gating interrupts on confidence thresholds rather than triggering them on every node.
Is Amazon's approach to AI governance compliant with the EU AI Act?
It depends on use case. EU AI Act Article 14 mandates meaningful human oversight for high-risk AI systems in employment, credit, and healthcare. Brandwine himself stresses HITL should be used 'where you absolutely need it,' so Amazon's philosophy doesn't exempt customers from these obligations. For high-risk systems, enterprises using autonomous governance must produce documented evidence that automated guardrails constitute meaningful oversight — a legal test no court has yet defined as high-risk enforcement begins August 2026. For low-risk, high-volume routine decisions, automated governance via Bedrock Guardrails and statistical monitoring is generally defensible. The safe pattern is a Tiered Autonomy Architecture that keeps mandatory HITL above a defined impact threshold.
What replaces human review in Amazon's AI governance model?
Four structural layers replace per-decision human approval. First, Amazon Bedrock Guardrails enforce content filters, denied topics, PII redaction, and grounding checks automatically at inference time. Second, AWS Service Control Policies and IAM permission boundaries constrain what agents can do architecturally — forbidden actions are simply not exposed. Third, statistical process control via CloudWatch anomaly detection triggers automated rollback when outputs drift beyond a threshold, with no human in the critical path. Fourth, RAG paired with vector databases enforces factual grounding and produces a verifiable retrieval audit trail. Humans shift from approving every action to monitoring anomaly clusters — human-on-the-loop, not human-in-the-loop.
How do agentic AI frameworks like LangGraph and CrewAI handle human oversight?
LangGraph offers configurable interrupt points (interrupt_before, interrupt_after); the recommended modern pattern is to trigger them only on low-confidence outputs — for example below a 0.65 confidence score — rather than on every node. CrewAI ships human_input=True as an optional parameter, not a default. AutoGen supports configurable human-in-the-loop modes but production deployments increasingly set them to exception-only. n8n's fastest-growing agentic template category is fully automated chains with zero interrupts. MCP lets all of these enforce governance at the tool-permission layer.
What are the risks of removing human-in-the-loop governance from enterprise AI systems?
Three main risks. First, regulatory: removing humans from high-risk decisions can violate EU AI Act Article 14, exposing the enterprise to penalties. Second, liability: the 2024 Air Canada chatbot case showed that fully automated systems can create legally binding errors that shift liability to the company unpredictably. Third, safety maturity: the Center for AI Safety warns that removing oversight before interpretability tools mature is 'removing the only brake before we've built new ones.' The mitigation is not full autonomy but a Tiered Autonomy Architecture — automated guardrails for routine high-volume decisions, human-on-the-loop for anomalies, and mandatory human review for individually consequential actions above a defined impact threshold.
About the Author
Rushil Shah
AI Systems Builder & Founder, Twarx
Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools. He writes from real implementation experience — covering what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.