Google just pulled back the curtain on how they are fighting one of the nastiest threats in the AI security space right now — indirect prompt injection. And honestly, the approach is more sophisticated than most people would expect.
The Problem That Won't Go Away
Here is the thing about indirect prompt injection (IPI): it is not the kind of vulnerability you patch once and forget. It is an evolving attack vector that targets AI applications pulling from multiple data sources — exactly the kind of environment Google Workspace creates when Gemini is reading your emails, docs, and calendar entries to help you work.
The attacker does not need direct access to your prompt. They just need to hide malicious instructions somewhere the LLM might ingest — a shared document, an email, a calendar invite. When Gemini processes that content, it follows the poisoned instructions without the user ever knowing. No social engineering required. No phishing click. Just data doing what data does.
How Google Is Actually Fighting This
What makes this blog post worth reading is that Google is not relying on a single defense. They are running a continuous, layered system that operates on multiple fronts simultaneously.
Human Red-Teaming: Specialised teams run adversarial simulations using realistic user profiles. They actively try to break the system the way a real attacker would — probing for weaknesses in how Gemini handles untrusted content.
Automated Red-Teaming: Beyond human testers, Google runs ML-driven frameworks that algorithmically generate and iterate on attack payloads at scale. This lets them stress-test far more variations than any human team could manage.
Vulnerability Rewards Program: They are paying external researchers to find new IPI attacks. This is smart — the security research community sees things internal teams miss, and financial incentives accelerate discovery.
Open-Source Intelligence: Google monitors social media, blogs, press releases, and security feeds for newly disclosed IPI techniques in the wild. Every new attack variant discovered externally gets fed back into their own defenses.
The Defense Pipeline
Here is where it gets genuinely interesting from an engineering perspective. New attacks do not just get patched — they get catalogued, reproduced, analysed by Google's Trust and Safety teams, and then Synthetically expanded using a system called Simula. This generates training data covering thousands of attack variants from a single discovered vulnerability.
That synthetic data then feeds three layered defenses:
Deterministic defenses — URL sanitisation, user confirmation prompts, tool chaining policies. Fast to deploy, effective against known attack patterns.
ML-based defenses — Retrained on synthetic data to recognise and mitigate entire categories of new attacks, not just specific signatures.
Model hardening — System-level prompt engineering that teaches Gemini itself to identify and ignore injected instructions at the model level, not just at the application layer.
The key insight: they treat this as a continuous cycle, not a one-time fix. Every day the threat landscape evolves, and their defenses evolve with it.
Why You Should Care
If your organisation uses Google Workspace with Gemini — and that is a lot of organisations — this is directly relevant to your attack surface. But even if you do not, the lessons here apply broadly:
AI integration is expanding across every SaaS platform. Indirect prompt injection is not a Google-specific problem. Microsoft Copilot, Notion AI, every tool that feeds user-generated content into an LLM is a potential target.
The security community is still figuring out how to defend against this at scale. Google's approach — layered defenses, continuous red-teaming, synthetic data expansion, and model hardening — is currently one of the most complete public examples of how to do it.
What To Do
Audit your AI tooling — Every application that feeds user content into an LLM is a potential IPI surface. Map it.
Demand transparency from vendors — If your SaaS provider integrates AI, ask them specifically how they handle indirect prompt injection. Google just published their answer. Others should too.
Monitor your AI outputs — Unexpected behaviour from AI assistants — strange responses, actions you did not request, data being shared oddly — could be a sign of prompt injection.
Limit AI access to untrusted content — The less untrusted data your AI tools process, the smaller your attack surface.
Stay current — This field moves fast. What Google published today will be baseline in six months.
Full story: https://security.googleblog.com/2026/04/google-workspaces-continuous-approach.html
What is your take — are you seeing prompt injection risks in your environment? Have your vendors shared how they are handling this? Drop your thoughts below. 🔒
More at https://securitycyber.uk
Mastodon: https://infosec.exchange/@securitycyber
LinkedIn: https://www.linkedin.com/in/charlie-collins-sec
Bluesky: https://bsky.app/profile/securitycyberuk.bsky.social
Substack: https://securitycyber.substack.com
Discord: https://discord.gg/securitycyber
Recommended resources to go deeper: https://www.hackthebox.com for hands-on practice, https://portswigger.net/web-security for free web security labs, and https://academy.tcm-sec.com for structured courses.
Originally published at https://securitycyber.uk
Top comments (0)