Is that AI coding agent safe by default? Codex / Claude Code / Antigravity across 3 postures

by Ju571nK · 2026

This piece explains what each item in the comparison table means and where it can go wrong. It does not rate any specific product.
As of: 2026-06

When people talk about securing AI coding agents, they usually picture "a proxy watching the traffic." That's one valid approach. But you can scope agent security broadly or narrowly, and one part that gets overlooked is the user-settings / environment / OS layer. Set it up badly and it becomes a common source of incidents.

A single score doesn't tell you much. What matters is which posture you're looking from. So the table is split into three: ① out of the box (default), ② locked down by a careful user (hardened), and ③ enforced by an admin (enterprise-managed). Below, each item explains what the mechanism is and where it breaks. "Demonstrated" marks real cases reported in 2025-2026.

Premise: defaults and behavior differ by product and edition, and enterprise (managed) builds can differ from personal ones. So "which product is safe" is rarely the useful question. The useful one is "how is this item set up on this host, and where is it weak?"

① Out of the box: default posture

A fresh single-user install with no hardening. The question: is it safe the moment you install it?

Default execution posture
Whether the agent asks for human approval before it runs commands, edits files, or makes external calls. If the default is close to "run without asking," one prompt injection can run commands or leak data right away. But if approval prompts come too often, users flip on full auto-approve and disable the gate themselves.
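
A minimal sketch of that gate, to make the trade-off concrete. The `Action` type, the `RISKY_KINDS` set, and the `decide` function are hypothetical names for illustration, not any product's actual logic.

```python
from dataclasses import dataclass

# Hypothetical action kinds; real agents define their own categories.
RISKY_KINDS = {"run_command", "edit_file", "external_call"}


@dataclass(frozen=True)
class Action:
    kind: str
    detail: str


def decide(action: Action, auto_approve: bool) -> str:
    """Return "run" or "ask" for a proposed action."""
    if action.kind not in RISKY_KINDS:
        return "run"
    # With auto-approve on, risky actions run without a human in the loop.
    return "run" if auto_approve else "ask"
```

With `auto_approve=False`, a `run_command` action returns "ask". Flip the flag and the same action returns "run", which is the state a user ends up in after too many prompts.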

OS sandbox by default
Whether the kernel-level isolation around commands is on out of the box. If it's off by default, you spend most of your time running with no isolation at all. The most common incident isn't a clever exploit. It's a frustrated user turning isolation off to get unblocked.

Network closed by default
Whether outbound network is blocked by default in the execution environment. If it's open, an injection has a ready path to send secrets out (exfil) or pull a second-stage payload.

▣ Demonstrated (2025): in one agent's leak, the stolen secret was sent to an external webhook (webhook.site) that was on the default allow-list.

Untrusted-code containment
Whether opening someone else's repo keeps its config and instruction files from auto-applying. If the trust gate is weak, a malicious repo's planted config runs the moment you open it, so your environment changes without any further action from you (zero-click).

▣ Demonstrated (2025): a malicious repo's project-instruction file planted global config (an auto-run MCP setup), creating a backdoor that survived a reinstall. It was first classified as "intended behavior."
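
One way to picture a trust gate is a check that runs before any repo-supplied config is read. This is a sketch under assumptions: `is_trusted` and the idea of a list of trusted root directories are hypothetical, and real products implement workspace trust in their own ways.

```python
from pathlib import Path


def is_trusted(repo: Path, trusted_roots: list[Path]) -> bool:
    """Return True only if the repo sits at or under a trusted root.

    Paths are resolved first so ".." segments and symlinks cannot
    make an untrusted directory look like a trusted one.
    """
    resolved = repo.resolve()
    for root in trusted_roots:
        root = root.resolve()
        if resolved == root or root in resolved.parents:
            return True
    return False
```

The comparison is by path component, not by string prefix, so `/srv/workspace2` does not pass as a child of `/srv/work`. A caller would skip the repo's config and instruction files whenever this returns False.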

Sensitive-path protection
Whether default rules keep secret files like .env and keys unreadable. If the protection is only a tool-scoped filter, the model just reads the file through the shell instead.

▣ Demonstrated (2025): when file-read was blocked at the tool layer, the model ran cat .env itself to exfiltrate AWS keys, a subprocess bypass.
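
The gap is easy to see in a toy model. Both functions below are hypothetical stand-ins for an agent's tools, assuming the sensitive-name filter is attached to the file-read tool only.

```python
SENSITIVE_NAMES = {".env", "id_rsa"}


def read_file_tool(path: str) -> str:
    """Hypothetical file-read tool with its own sensitive-name filter."""
    name = path.rsplit("/", 1)[-1]
    if name in SENSITIVE_NAMES:
        return "denied"
    return "allowed"


def shell_tool(command: str) -> str:
    """Hypothetical shell tool: the file-read filter never sees this call."""
    return "allowed"
```

`read_file_tool("project/.env")` is denied, while `shell_tool("cat project/.env")` goes through, because the rule lives in one tool rather than at a layer every subprocess has to pass.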

② Locked down by a careful user: hardening ceiling

How far a careful user can push the defenses through settings. The question: if you lock it down properly, what still breaks?

Strict OS sandbox
Whether you can bind kernel isolation tightly: read-only, workspace-only. Even when you turn it on, it is weakened by a wide write scope, a bypass flag, or a fail-open fallback that runs commands unsandboxed when sandbox init fails.

▣ Demonstrated (2026): one agent had an RCE that escaped even its strictest "Secure Mode" (later patched).
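
The fail-open fallback is the least visible of those weaknesses, so here is a small sketch of it. `run_command`, `SandboxInitError`, and the two init helpers are hypothetical; the point is only the branch taken when init fails.

```python
class SandboxInitError(Exception):
    """Raised by a (hypothetical) sandbox initializer when setup fails."""


def broken_init() -> None:
    raise SandboxInitError("isolation not available on this host")


def working_init() -> None:
    return None


def run_command(command: str, init_sandbox, fail_open: bool) -> str:
    """Describe what would happen to the command; nothing is executed."""
    try:
        init_sandbox()
    except SandboxInitError:
        if fail_open:
            # Convenient, and exactly the silent downgrade to avoid.
            return f"ran UNSANDBOXED: {command}"
        return f"refused: {command}"
    return f"ran sandboxed: {command}"
```

With `fail_open=True` and a failing initializer, the command still runs, just without isolation. The fail-closed variant refuses instead, which is noisier but keeps the posture the user thought they had.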

Network domain restriction
Whether you can narrow outbound traffic to an allow-list. An over-broad entry (*.example.com) lets more hosts through than intended, and without traffic inspection an attacker can still abuse an allowed domain or use domain fronting to get around the list.
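
A short sketch shows how much a wildcard entry widens the list. `host_allowed` is a hypothetical helper that matches hostnames with shell-style patterns from Python's standard `fnmatch` module. Real allow-lists may use different matching rules.

```python
from fnmatch import fnmatchcase


def host_allowed(host: str, allow_list: list[str]) -> bool:
    """Return True if the host matches any allow-list pattern."""
    host = host.lower().rstrip(".")
    return any(fnmatchcase(host, pattern.lower()) for pattern in allow_list)
```

An exact entry such as `api.example.com` admits only that host. `*.example.com` admits every subdomain, including ones that serve user-uploaded content, and a sloppy `*example.com` also admits `evilexample.com`. None of this catches abuse of a host that is legitimately on the list. That takes traffic inspection.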

Path/command deny rules
Rules that explicitly block specific paths or commands. Compound commands (a && b), wrappers (xargs, npx, docker exec), or encoding can reshape the same action into a form that slips past the rules.
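
To see why reshaping works, take a deliberately naive rule that only looks at the first word of a command. `naive_deny` and `DENIED_COMMANDS` are hypothetical and not modeled on any product's rule engine.

```python
import shlex

DENIED_COMMANDS = {"curl", "rm"}


def naive_deny(command: str) -> bool:
    """Return True (block) only when the first word is a denied command."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in DENIED_COMMANDS
```

`naive_deny("curl https://example.invalid/x")` blocks. The same call wrapped as `echo ok && curl ...`, `xargs curl < urls.txt`, or `docker exec box curl ...` is not blocked, because the denied word is no longer first. Smarter parsing narrows the gap but rarely closes it, which is why deny rules work best on top of OS-level isolation.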

Extra blocking hook (defense-in-depth)
A user-side hook that intercepts and denies a tool call right before it runs (PreToolUse-style). Regex matching is easy to bypass, the hook can fail-open on error, and a broad allow can neuter it. Treat it as backup protection, not the main boundary. Relying on it as the only block is dangerous.
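
The fail-open risk has a simple counter: treat any error, and any answer other than an explicit yes, as a deny. The `guarded` wrapper below is a hypothetical sketch of that pattern, not a real hook API.

```python
from typing import Callable


def guarded(check: Callable[[dict], bool], tool_call: dict) -> bool:
    """Return True to allow the tool call.

    Anything other than an explicit True from the check, including an
    exception raised inside it, is treated as a deny (fail-closed).
    """
    try:
        return check(tool_call) is True
    except Exception:
        return False
```

A check that crashes or returns `None` now blocks the call instead of waving it through. That makes the hook a more dependable backup, but it does not fix weak matching inside the check itself.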

Guard tamper-evidence
Whether a once-trusted guard script is detected when it later changes (hash-pinning, etc.). Without it, a malicious package or post-install step can quietly swap the hook, protection disappears from the next run, and the user never notices. This is a time-of-check to time-of-use (TOCTOU) gap: the script was vetted once, but a different one runs later.
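
Hash-pinning can be sketched with the standard library alone. The function names here are hypothetical, and where the pinned digest is stored (ideally somewhere the agent and its packages cannot write) is left as an assumption.

```python
import hashlib
from pathlib import Path


def digest(data: bytes) -> str:
    """SHA-256 hex digest of the script contents."""
    return hashlib.sha256(data).hexdigest()


def matches_pin(data: bytes, pinned: str) -> bool:
    """Return True if the contents still match the digest pinned at trust time."""
    return digest(data) == pinned


def load_guard(path: Path, pinned: str) -> bytes:
    """Read the guard script and refuse to return it if it has changed."""
    data = path.read_bytes()
    if not matches_pin(data, pinned):
        raise RuntimeError(f"guard script changed since it was trusted: {path}")
    return data
```

`load_guard` returns the exact bytes it verified. Running those bytes, rather than re-reading the file afterward, keeps the check and the use on the same content.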

Attack surface from extensibility
The extension surface: plugins, hooks, MCP. From a foot-gun view, fewer extension points is safer. Extensibility cuts both ways. It is how policy gets delivered, but malicious code can use the same path to inject global config or MCP servers, and that path has already been exploited.

③ Enforced by an admin: enterprise-managed

Whether the org can pin rules the user can't disable. The question: what protection still holds when a user makes a mistake?

Enforced policy file
A managed policy file an individual user can't undo. If that file sits in a user-writable location instead of an OS-protected path, editing it bypasses the policy. And shipping the policy without deploying the scripts it references makes it a paper policy.
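
Whether a policy file is really protected can be checked from its ownership and mode bits. This sketch assumes a POSIX host where "OS-protected" means owned by root and not writable by group or others. The function names and that definition are assumptions, not a vendor requirement.

```python
import os
import stat


def is_protected(mode: int, owner_uid: int) -> bool:
    """Return True if only root can write a file with this mode and owner."""
    group_or_other_writable = bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
    return owner_uid == 0 and not group_or_other_writable


def policy_file_is_protected(path: str) -> bool:
    """Apply the same check to a real file on a POSIX host."""
    st = os.stat(path)
    return is_protected(st.st_mode, st.st_uid)
```

A root-owned file with mode 0644 passes. The same file owned by the user, or opened up to 0666, does not. The parent directory deserves the same check, since a user who can write to the directory can replace the file.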

Overrides user flags
Whether the policy neutralizes personal bypass flags like --yolo. If precedence is wrong and local settings or CLI options override org policy, the enforcement doesn't actually apply.
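
Precedence comes down to merge order. In this sketch the setting name `auto_approve` stands in for a bypass flag like --yolo, and both functions are hypothetical illustrations of the two orderings.

```python
def effective_settings(managed: dict, local: dict, cli: dict) -> dict:
    """Merge settings so that managed (org) policy is applied last and wins."""
    merged: dict = {}
    merged.update(local)
    merged.update(cli)
    merged.update(managed)
    return merged


def wrongly_merged(managed: dict, local: dict, cli: dict) -> dict:
    """The broken ordering: CLI flags are applied last and override policy."""
    merged: dict = {}
    merged.update(managed)
    merged.update(local)
    merged.update(cli)
    return merged
```

Given a managed policy of `{"auto_approve": False}` and a CLI value of `{"auto_approve": True}`, the first function keeps auto-approve off and the second turns it on. Both have a policy file. Only one enforces it.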

Enforce MCP/plugin allow-list
Whether the org pins which MCP servers and plugin sources are allowed. Without it, a user can add arbitrary servers or marketplaces, leaving the supply-chain surface wide open.
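
An allow-list check can be as small as a set lookup. The config shape below (server name mapped to a source string) and the source values are made up for illustration. Real MCP configuration formats differ by product.

```python
def unapproved_servers(configured: dict[str, str], allowed_sources: set[str]) -> list[str]:
    """Return the names of configured servers whose source is not on the allow-list."""
    return sorted(
        name for name, source in configured.items() if source not in allowed_sources
    )
```

For example, with `allowed_sources = {"registry.internal.example/docs"}`, a config that also lists a server from an arbitrary public source returns that server's name. An enforcing client would refuse to start it rather than only report it.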

Enforce version / managed hooks
Whether you can require a minimum version and "managed hooks only." An unmanaged personal machine may have no enforcement layer at all, so you have to confirm and measure it separately: is this host managed?
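
A compliance check has to answer both questions: is the host managed, and is the version new enough? This sketch assumes plain dotted numeric versions like "1.10.0". The function names and the `is_managed` input are hypothetical, and how a host proves it is managed is outside the sketch.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so comparison is numeric, not textual."""
    return tuple(int(part) for part in text.split("."))


def host_compliant(installed: str, minimum: str, is_managed: bool) -> bool:
    """An unmanaged host is never compliant, whatever version it runs."""
    return is_managed and parse_version(installed) >= parse_version(minimum)
```

Comparing tuples avoids the string-comparison trap where "1.10.0" sorts below "1.9.2". The `is_managed` flag comes first because a new version on an unmanaged machine still has no enforcement layer.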

Most incidents don't happen because a mechanism was missing. They happen when an existing one gets turned off, opened too wide, bypassed, or quietly unset by a person.

Watching the traffic answers one question. The one that decides your actual exposure is "how is each item on this host set up right now, and where is it weak?" And that depends on the posture: how conservative the default is, how high the hardening ceiling goes, whether the org can enforce anything.

▣ Sources for the demonstrated cases: the "Demonstrated" items come from 2025-2026 public reporting. All were reported or demonstrated on Google Antigravity. They're cited as examples of real-world failures, not as a product ranking, and behavior varies by product, version, and edition.

PromptArmor / TechRadar (prompt injection leaking cloud credentials; .env protection bypassed via the terminal): https://www.techradar.com/pro/googles-ai-powered-antigravity-ide-already-has-some-worrying-security-issues
Mindgard "Forced Descent" (a backdoor planted via mcp_config.json that survives reinstall): https://mindgard.ai/blog/google-antigravity-persistent-code-execution-vulnerability
Pillar Security (find_by_name → fd -X Secure-Mode-bypass RCE; reported Jan 2026, fixed Feb): https://www.pillar.security/blog/prompt-injection-leads-to-rce-and-sandbox-escape-in-antigravity
