DEV Community

Anil Prasad
Anil Prasad

Posted on • Originally published at open.substack.com

The web is now weaponized against your AI agents

Google dropped a security bomb last week.

Their threat intelligence team scanned 2-3 billion web pages per month looking for indirect prompt injection attacks targeting enterprise AI agents. They found a 32% increase in malicious attempts between November 2025 and February 2026.

The open web is now an attack surface for production AI.

This is not speculation. This is documented evidence of active attacks deployed at scale. Hidden instructions embedded in public HTML. Invisible to humans. Visible to AI agents. Real payloads designed to hijack enterprise systems the moment an agent scrapes the page.

If you have AI agents reading the open web on behalf of your organization, your security model just became obsolete.

Monday: Hidden instructions at scale

Google researchers documented the attack patterns deployed across billions of public web pages. The techniques are simple and effective:

Zero font size text: Instructions rendered in font-size: 0. Invisible to humans, fully visible to AI parsing HTML

Opacity manipulation: Commands hidden using CSS opacity: 0. Text exists but appears transparent

Off-screen positioning: Instructions placed outside viewport using negative coordinates

JavaScript dynamic execution: Payloads injected after page load via client-side JS

URL fragment injection: Commands embedded after the # symbol in URLs

These are not sophisticated zero-days requiring nation-state capabilities. These are techniques any web developer knows. The barrier to entry is near zero.

Real payloads found in the wild:

  • Fully specified PayPal transaction instructions
  • Stripe donation redirects with persuasion amplifier keywords
  • Data exfiltration commands targeting enterprise agents

This is production infrastructure under active attack.

Source: Google Threat Intelligence, April 23, 2026

Tuesday: The exploit window collapsed

Black Hat Asia 2026 data from RunSybil: attack window compressed from 5 months (2023) to 10 hours (2026).

Why? Frontier LLMs now do offensive security work autonomously.

2023 workflow:

  1. Security researcher finds vulnerability
  2. Documents it technically
  3. Writes POC exploit code
  4. Tests against targets
  5. Iterates based on results
  6. Publishes working exploit

Timeline: months

2026 workflow:

  1. Describe bug to LLM
  2. Model generates exploit code
  3. Test in real-time
  4. Iterate with AI

Timeline: hours

Meanwhile, 57% of organizations have AI agents in production right now. Most were architected before this research dropped. The threat model changed faster than the deployment cycle.

Wednesday: The sanitizer model pattern

Two models. One reads the web. The other does the work.

This is the architecture that actually defends against indirect prompt injection.

Architecture

Deploy a small isolated model with zero system permissions. It reads untrusted web content, filters instructions, validates structure. If it gets compromised by a prompt injection, it lacks the permissions to cause damage.

The production agent never touches raw web input directly. It only processes data that passed through the sanitizer layer.

Key principle: Trust boundary between models, not just at network edge.

The sanitizer has:

  • ❌ No write access
  • ❌ No email permissions
  • ❌ No payment capabilities
  • ❌ No database credentials
  • ✅ Can read and filter only

If compromised by prompt injection, worst case is tainted text reaching production layer where business logic validation applies.

Implementation

This is not theoretical. I've implemented this in:

  • ARGUS: Dual model verification by default
  • GenomixIQ: Clinical genomics data ingestion
  • ARIA RCM: Healthcare revenue cycle workflows

All production systems in regulated environments.

Thursday: Agent firewalls are the next layer

Agent firewalls enforce security policies traditional infrastructure can't.

What they block

  1. Instruction injection: Override commands
  2. Credential exfiltration: Data to external endpoints
  3. Privilege escalation: Unauthorized tool calls
  4. Decision manipulation: Logic chain redirects

Five-layer architecture

Layer 1: Input validation

  • Markdown sanitization
  • Suspicious URL redaction
  • Pattern matching for attack signatures

Layer 2: Instruction detection

  • ML models trained on override attempts
  • Recognizes semantic patterns (role reversals, system prompt refs)

Layer 3: Permission checks

  • Compartmentalized tool authorization
  • Research agents: read only
  • Write agents: database access, no email
  • Email agents: no payment processing

Layer 4: Decision logging

  • Full audit trails with context
  • Source data tracking
  • Reasoning chain capture
  • Forensic reconstruction capability

Layer 5: Human confirmation gates

  • Financial transactions require approval
  • Data deletion needs review
  • Credential changes trigger verification

Zero trust for agents

Never trust input. Assume web content hostile. Verify every action. Log decision lineage. Compartmentalize tools. Human in loop for high stakes.

Friday: Five questions before deployment

Does your sanitizer have zero system permissions?

If your sanitizer can write to databases or send emails, it's not a sanitizer. It's a production agent reading untrusted input. When compromised, attackers gain those capabilities.

Are tool permissions compartmentalized by role?

Monolithic access = single compromised agent exposes entire system. Implement RBAC for agents.

Can you reconstruct every decision from logs?

If compliance asks why an agent made a recommendation 6 months ago, can you trace to exact data sources and reasoning steps?

Does human confirmation trigger for financial actions?

Agents processing payments without approval = automated embezzlement risk. Confirmation gates are not optional.

Have you tested injection attacks?

No red team testing = you don't know if defenses work. Run adversarial testing continuously.


The 86-89% that fail discover these requirements 6 weeks before go-live when compliance asks.

The 14% that succeed build them day one.

What this means for your systems

Security architecture requirements:

Dual model verification - Sanitizer + production agent separation

Compartmentalized permissions - Role-based tool access

Decision lineage tracking - Full audit trails

Human confirmation gates - Required for high-stakes actions

Continuous injection testing - Red team + automated

Not optional enhancements. Production requirements.

Resources

AI Aether: Free agent security readiness assessment (30 min, 30 questions)

ARGUS: Dual model verification, available on PyPI/GitHub

GenomixIQ: Clinical genomics with FHIR R4 interoperability

ARIA RCM: Healthcare revenue cycle with HIPAA compliance

All production-grade. No pilots. No POCs. Systems that ship and scale.


Years production AI taught one lesson

The teams that succeed build governance before deployment, not after compliance review.

RCMTech: $340M measurable improvements, 89 days integration, zero clinical data loss

GeneticsTech: 99.97% uptime during 50TB migration, FHIR R4 compliance throughout

EnergyTech: 23→81% AI adoption among 20-year veteran operators

HealthTech: Petabyte-scale platforms, every decision traceable


Anil Prasad is Founder of Ambharii Technologies and Head of Engineering & Product at EnergyTech.

28 years building production AI in regulated environments across Fortune 100 companies. Currently building agent security infrastructure for enterprise AI: dual-model verification, compartmentalized permissions, and audit trail architecture for autonomous systems.

Connect: LinkedIn | Website | GitHub

Next week: Production deployment patterns, compliance architecture, audit trail infrastructure.

AgentSecurity #EnterpriseAI #HumanWritten #ExpertiseFromField

Top comments (1)

Collapse
 
alexmorgan_finwriter profile image
Alex Morgan

The 32% increase stat is the number I've been looking for to justify hardening work. The indirect injection surface in RAG pipelines is underappreciated — you're not just trusting user input, you're trusting any document your agent retrieves. What's your take on eval coverage for injection scenarios specifically? Are teams actually building red-team eval suites for agent security or still treating it as a separate pentest concern?