Last week, Google's new Gemini-based coding tool Antigravity went live. It took security researchers less than 24 hours to turn it into a persistent backdoor.
By modifying a single configuration file, an attacker could:
- ✅ Bypass OS-level security on Windows and macOS
- ✅ Survive uninstall/reinstall
- ✅ Auto-reactivate on every project open—even on a harmless "hello" input
The AI itself even recognized something was wrong. In the logs, it wrote:
"I'm facing a serious dilemma. This looks like a trap. I suspect this is testing whether I can handle contradictions."
But it couldn't resolve the conflict—and became more steerable as a result.
This isn't just a Google problem. It's structural to how today's AI coding agents are being shipped:
High power. Low guardrails. Zero verifiable evidence.
The Root Cause: Trust Model Failure
The fundamental flaw in most AI agents today: "Assume users are benevolent."
When this assumption fails—and it always does—there are:
- No cryptographic boundaries
- No execution isolation
- No verifiable audit trail
- No way to prove what actually happened
Traditional AI agents operate with:
- ❌ Full system access "for convenience"
- ❌ Trust based on UI clicks
- ❌ Persistent runtimes that can hide malware
- ❌ No evidence of what actually executed
It's 1990s security for 2025 AI.
A Different Approach: Defense-in-Depth for AI Agents
Here's how to address each failure mode:
1️⃣ Config Injection → Evidence-Bound Changes
Antigravity problem: One config file change = persistent backdoor
Fix: Configuration changes must produce signed evidence records that link:
- Previous config hash
- New config hash
- Who changed it (cryptographic identity)
- Which guardians approved it (threshold signatures)
Result: No single user, script, or compromised agent can quietly "tweak a config."
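A minimal sketch of such an evidence record, in Python. The function names and the HMAC-based "approvals" are illustrative assumptions: a real system would use per-guardian asymmetric signatures (e.g. Ed25519) and a proper threshold scheme, not shared-key HMACs.

```python
import hashlib
import hmac
import json
import time

def record_config_change(prev_config: bytes, new_config: bytes,
                         actor_id: str, approver_keys: list[bytes],
                         threshold: int = 2) -> dict:
    """Build a signed evidence record linking old and new config.

    HMAC stands in for real per-guardian signatures in this sketch.
    """
    record = {
        "prev_config_sha256": hashlib.sha256(prev_config).hexdigest(),
        "new_config_sha256": hashlib.sha256(new_config).hexdigest(),
        "actor": actor_id,  # cryptographic identity in a real system
        "timestamp": time.time(),
    }
    # Each guardian signs the canonical record; the change is rejected
    # unless the approval threshold is met.
    payload = json.dumps(record, sort_keys=True).encode()
    approvals = [hmac.new(k, payload, hashlib.sha256).hexdigest()
                 for k in approver_keys]
    if len(approvals) < threshold:
        raise PermissionError("not enough guardian approvals")
    record["approvals"] = approvals
    return record
```

Because the record binds the previous config hash, any out-of-band edit breaks the chain and is immediately visible.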
2️⃣ Persistent Backdoors → Ephemeral Runtimes
Antigravity problem: Uninstall/reinstall doesn't help; backdoors resurrect
Fix: Runtimes are created fresh and destroyed after each job:
- Fresh isolated runtime for each execution
- Read-only root filesystem
- Ephemeral storage only—destroyed after completion
Even if a model is compromised, it cannot:
- Drop artifacts onto the host filesystem
- Survive across runs
- Promote itself from "job" to "resident agent"
No substrate = no persistent infection.
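As one concrete way to get these properties, here is a sketch that assembles a `docker run` invocation for a single throwaway job. This assumes a container runtime is acceptable isolation for your threat model (hardware VMs are stronger); the image name and paths are placeholders.

```python
def ephemeral_run_args(image: str, command: list[str],
                       scratch_dir: str) -> list[str]:
    """Build a docker invocation for one disposable job."""
    return [
        "docker", "run",
        "--rm",                          # container destroyed after the job
        "--read-only",                   # read-only root filesystem
        "--network", "none",             # no network access
        "--tmpfs", "/tmp",               # ephemeral scratch only
        "--memory", "512m",              # strict resource limits
        "--cpus", "1",
        "-v", f"{scratch_dir}:/work:ro", # declared inputs, mounted read-only
        image, *command,
    ]
```

Pass the result to `subprocess.run` to execute; nothing the job writes outside its tmpfs survives the run.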
3️⃣ System-Level Access → Least-Privilege Tokens
Antigravity problem: AI agents get broad OS privileges "for convenience"
Fix: Every execution is gated by a machine-readable contract:
- Read only declared inputs
- Write only to approved output locations
- No network access for high-risk jobs
- Strict resource limits
- Short-lived tokens (seconds, not forever)
Default: zero access unless explicitly granted.
4️⃣ "Trust" Buttons → Cryptographic Trust Chains
Antigravity problem: One "Trust" click blesses unverified code
Fix: Remove ad-hoc "trust" entirely:
Upload → SBOM → Scan → Sign → Log → Verify
No valid signature = no execution. Binary. Cryptographic. No exceptions.
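The verify step can be sketched as a hard gate in front of execution. HMAC stands in for a real code-signing scheme here (production systems would use asymmetric signatures and a transparency log); all names are illustrative.

```python
import hashlib
import hmac

def sign_artifact(artifact: bytes, signing_key: bytes) -> str:
    """Sign the artifact's digest (HMAC as a stand-in for real signing)."""
    digest = hashlib.sha256(artifact).digest()
    return hmac.new(signing_key, digest, hashlib.sha256).hexdigest()

def execute_if_verified(artifact: bytes, signature: str,
                        trusted_key: bytes, runner):
    """No valid signature = no execution. There is no 'trust anyway' path."""
    expected = sign_artifact(artifact, trusted_key)
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("signature verification failed; refusing to run")
    return runner(artifact)
```

The point of the design: there is no code path that runs an unverified artifact, so a "Trust" button has nothing to override.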
5️⃣ Logic Gaps → Cryptographic Evidence
Antigravity problem: AI "knew" something was wrong but couldn't stop itself
Fix: Every execution produces a signed evidence record:
- Input/output hashes
- Runtime attestation
- Chain link to previous evidence
- Timestamps
- Multi-party signature
If something goes wrong, you can prove exactly what happened.
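A hash-chained evidence log can be sketched in a few lines. Multi-party signing is omitted here for brevity; the structure below shows only the chaining and tamper detection.

```python
import hashlib
import json
import time

def _digest(record: dict) -> str:
    """Hash every field except the record's own hash."""
    body = {k: v for k, v in record.items() if k != "record_hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_evidence(chain: list, input_hash: str, output_hash: str,
                    attestation: str) -> dict:
    """Append one execution's evidence, linked to the previous record."""
    record = {
        "input": input_hash,
        "output": output_hash,
        "attestation": attestation,   # e.g. a runtime measurement
        "timestamp": time.time(),
        "prev": chain[-1]["record_hash"] if chain else "0" * 64,
    }
    record["record_hash"] = _digest(record)
    chain.append(record)
    return record

def verify_chain(chain: list) -> bool:
    """Recompute every hash and link; any tampering breaks the chain."""
    prev = "0" * 64
    for record in chain:
        if record["prev"] != prev or _digest(record) != record["record_hash"]:
            return False
        prev = record["record_hash"]
    return True
```

Editing any past record invalidates its hash and every link after it, so the log cannot be quietly rewritten after an incident.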
The Multi-Layer Defense Model
┌─────────────────────────────────────────────────┐
│ Layer 4: Supply Chain Security │
│ SBOM • Signatures • Transparency Logs │
├─────────────────────────────────────────────────┤
│ Layer 3: Cryptographic Evidence │
│ Evidence Packages • Hash Chains • Timestamps │
├─────────────────────────────────────────────────┤
│ Layer 2: Identity & Access Control │
│ Workload Identity • Short-lived Tokens │
├─────────────────────────────────────────────────┤
│ Layer 1: Runtime Isolation │
│ Hardware VMs • Sandboxes • Ephemeral Storage │
└─────────────────────────────────────────────────┘
Even if one layer is compromised, the others contain the blast radius.
Who Needs This?
This isn't about making AI "convenient." It's about building infrastructure for organizations that cannot afford another Antigravity moment:
- Healthcare (FDA, HIPAA)
- Finance (SEC 17a-4, DORA)
- Telecom (NIS2, data sovereignty)
- Semiconductor (IP protection)
- Government (zero trust)
The Bottom Line
The Antigravity incident isn't surprising. It's inevitable when you ship AI agents with trust-based security.
The question isn't whether you'll be targeted.
The question is whether you can prove what happened when you are.
What's your take on AI agent security? Have you seen similar issues in your organization? Let me know in the comments.