DEV Community

Delafosse Olivier

Posted on • Originally published at coreprose.com

Google vs AI-Driven Exploits: How Autonomy, Agents and LLMs Are Rewriting Offensive Security

Originally published on CoreProse KB-incidents

AI‑assisted exploitation has crossed a line. Autonomous AI agents built on high‑capability large language models can now reportedly discover, chain, and weaponize vulnerabilities end‑to‑end, at machine speed. [2] At Google scale, the response must shift from “block the IP” to “detect and disrupt the AI campaign itself.”

Anthropic’s Mythos Preview reportedly:

  • Surfaced thousands of zero‑day vulnerabilities across major OSes and browsers
  • Found a 27‑year‑old OpenBSD bug missed by humans [2]
  • Autonomously chained four bugs into a browser sandbox escape [2]

On defense, OpenAI’s Daybreak uses GPT‑5.5 and Codex Security to scan large codebases, propose patches, and validate fixes in minutes. In effect, it is a generative‑AI factory for finding and fixing vulnerabilities, run by defenders. [3][4]

Key idea: Offense and defense now share the same primitives (LLMs, agents, cloud orchestration). What differs is how they are governed and who gets to run at machine speed.

Google’s reported disruption of an AI‑driven exploitation campaign looks like an early pattern: AI‑run operations treating infrastructure as a continuous search–optimize–exploit loop. [8]

From “could this happen?” to “Google just stopped it”: why AI-driven exploitation is now real

Mythos Preview is the first widely described frontier LLM explicitly evaluated for autonomous vulnerability discovery at scale. In controlled tests, it:

  • Found thousands of zero‑days in major OSes and browsers
  • Uncovered a 27‑year‑old OpenBSD vulnerability [2]
  • Demonstrated that deep structural flaws in mature codebases are within model reach

Mythos also autonomously chained four distinct vulnerabilities into a working sandbox escape by: [2]

  • Understanding sandbox boundaries
  • Spotting memory‑safety defects
  • Selecting compatible primitives
  • Assembling a reliable exploit

Anthropic’s later report on a state‑backed espionage campaign shows the next step: [8]

  • AI agents performed 80–90% of reconnaissance, lateral movement, and exfiltration
  • Humans mainly provided high‑level guidance and approvals

⚠️ Escalation signal: A state actor trusting AI with 80–90% of campaign workload suggests autonomous systems can now match or outperform junior operators across much of the kill chain. [8]

Inside enterprises, agentic AI is spreading on the defender and developer sides. Netskope observes that LLM‑powered agents with direct access to software and infrastructure are already deployed, often with minimal supervision. [5] These agents become:

  • High‑value targets for compromise
  • Stepping‑stones for lateral movement
  • “Free infrastructure” for attackers

Check Point Research showed that web‑enabled conversational assistants can be hijacked as stealth C2 channels, blending into ordinary AI traffic and requiring no attacker‑hosted infra. [1]

Together, these data points make Google’s disrupted AI‑driven campaign look like a logical next step in an arms race where both attackers and defenders rely on frontier models and autonomous workflows. [2][5][8]

How LLMs discover and weaponize vulnerabilities faster than your patch cycle

Anthropic’s offensive research lead estimates attackers could access Mythos‑class tools within 6–12 months of the preview, shrinking defenders’ lead to roughly one release cycle. [2] Mythos has already identified thousands of zero‑day issues in widely deployed platforms, which suggests the backlog of exploitable bugs is growing faster than most organizations can patch. [2]

In early 2025, before offensive LLMs were industrialized, about one‑third of exploited CVEs were already being exploited on or before the day of public disclosure. [2] With automated triage, exploit synthesis, and agent‑driven fuzzing, the remaining window is likely to compress from days to hours.

📊 Timeline compression:

  • Pre‑AI: weeks–months from bug intro to discovery; days from disclosure to weaponization
  • Early AI: days to discovery; hours to weaponization [2]
  • Frontier LLM era: minutes–hours from code landing in main to discovery and PoC synthesis [2][3]

OpenAI’s Daybreak mirrors this for defense. GPT‑5.5 and Codex Security can: [3][7]

  • Analyze thousands of lines at once
  • Surface vulnerabilities and data‑flow risks
  • Generate compile‑clean patches plus unit tests
  • Validate fixes in isolated environments

Daybreak makes security a continuous SDLC concern — secure review, threat modeling, dependency analysis, and patch validation integrated into pipelines. [4][7]

OpenAI further splits GPT‑5.5 into: [4]

  • GPT‑5.5 (general)
  • GPT‑5.5 with Trusted Access for Cyber (vetted defensive workflows) [4]
  • GPT‑5.5‑Cyber (more permissive for red teaming and intrusion testing) [4][6]

Implication: Hardware and model capabilities are symmetric for attackers and defenders; governance and allowed tool use are what differ. [4][6]

For engineering teams, every vulnerability in repos — even on feature branches — is now in scope for AI‑accelerated discovery, whether via your Daybreak‑style stack or an adversary’s Mythos‑like tooling. [2][3][7] The Google incident should force CI/CD and vuln‑management pipelines to adapt to AI‑native velocities, not human change‑advisory cycles. [2][6]

Agentic AI as attacker: multi-agent workflows, planning and cloud-scale operations

Anthropic’s 2025 espionage report is the clearest public description of AI as primary operator. In that campaign: [8]

  • AI agents executed 80–90% of tasks from external recon to internal pivoting
  • Humans mainly approved goals and sensitive steps

To generalize, researchers built a multi‑agent penetration‑testing PoC against cloud infrastructure. The system: [8]

  • Did not invent new attack surfaces
  • Dramatically accelerated exploitation of known misconfigurations
  • Excelled at:
    • Enumerating cloud resources via APIs
    • Identifying misconfigured IAM roles/policies
    • Following documented attack paths
    • Scaling across many accounts in parallel

💼 Echo in practice: One SaaS security lead saw a benign agent chain an overly permissive GCP service account into full DB read access in under 10 minutes — a path never documented in manual reviews.

Netskope warns that because agentic systems directly operate software and infrastructure, they are prime cyber targets — yet most orgs lack: [5]

  • A complete inventory of agents
  • Policies for systems agents may control
  • Telemetry specific to agent behavior

On defense, Codex Security already acts as a sophisticated agent: it builds editable threat models from entire repos, identifies realistic attack paths, and validates patches in isolation. [7] These are the same reasoning skills an offensive agent uses to construct and traverse attack graphs.

GPT‑5.5‑Cyber formalizes this dual‑use nature: it is more permissive specifically for authorized offensive workflows like red teaming. [4][6] Without strong governance, “authorized” vs “unauthorized” can collapse to “whoever holds the API key.”

⚠️ Dual‑use warning: Check Point’s hijacking of web‑enabled assistants into stealth C2 shows that a single LLM instance can simultaneously act as planner, operator, and covert infrastructure. [1]

C2 through the front door: how LLM traffic and cloud services hide AI-driven attacks

Attackers have long abused legitimate cloud services (Slack, Dropbox, OneDrive) as C2 because traffic blends into baselines. [1] Defenders eventually instrumented these services and shipped SIEM/XDR rules. [1]

Web‑enabled LLM assistants disrupt that learning curve. Their traffic is: [1]

  • New, with immature telemetry and detection content
  • Hard to block once broadly adopted
  • Trusted as “business productivity” tooling

Check Point’s experiment abused assistants’ web‑fetch features. Malware: [1]

  • Never contacted attacker infra directly
  • Asked the assistant to fetch an attacker‑controlled URL that encoded commands
  • Received results via the assistant’s HTTP requests

This required no API keys and no authenticated accounts, and it produced traffic indistinguishable from normal AI usage.

In parallel, the multi‑agent cloud‑attack PoC showed that LLMs can orchestrate complex sequences of GCP API calls: [8]

  • Chaining misconfigurations into full compromise
  • Using only standard control‑plane traffic
  • Standing out mostly by speed, breadth, and sequencing

📊 New observability layer: In AI‑driven campaigns, key signals may include: [5][7]

  • Unusual LLM usage patterns (prompt types, call volumes, odd timing)
  • Orchestrated sequences of cloud API calls at machine speed
  • Correlation between agent actions and data‑plane anomalies
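
The second signal above, orchestrated sequences of cloud API calls at machine speed, can be approximated with a sliding‑window check over control‑plane audit events. This is a minimal Python sketch, not a production detector: the `ApiEvent` fields, the function name, and every threshold are illustrative assumptions, and real audit logs would first need to be mapped into this shape.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiEvent:
    # Hypothetical, simplified view of one control-plane audit record.
    principal: str    # service account or agent identity
    method: str       # API method name, e.g. "svc.list"
    timestamp: float  # seconds since epoch


def find_machine_speed_bursts(events, window_s=10.0, min_calls=20,
                              min_distinct_methods=5):
    """Return principals that issued many varied API calls in one short window.

    All thresholds are assumptions for illustration and need tuning.
    """
    recent = defaultdict(deque)  # principal -> events inside the window
    flagged = set()
    for ev in sorted(events, key=lambda e: e.timestamp):
        window = recent[ev.principal]
        window.append(ev)
        # Drop events that fell out of the sliding window.
        while ev.timestamp - window[0].timestamp > window_s:
            window.popleft()
        distinct = {e.method for e in window}
        if len(window) >= min_calls and len(distinct) >= min_distinct_methods:
            flagged.add(ev.principal)
    return flagged
```

Thresholds this simple will also flag legitimate automation such as CI jobs, so the output is better suited to a triage queue or an allowlist review than to automatic blocking.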

Netskope notes that most organizations have not modeled AI agents as first‑class security entities, leaving blind spots around what they access and how outputs are consumed. [5]

At Google scale, disrupting an AI campaign is less about identifying a new malware family and more about correlating: [1][8]

  • Anomalous model calls
  • Strange agent behavior
  • Cloud control‑plane sequences across tenants and data sources

For engineering teams, LLM access logs, model‑usage fingerprints, and agent execution traces must become core observability signals, alongside syscalls and VPC flow logs. [5][7]

Defensive AI stack: Daybreak, Mythos and AI-native vuln pipelines

Anthropic’s Mythos and Glasswing projects, used for industrial‑scale Firefox vuln hunting, showed that frontier models can be aimed at large, hardened codebases and still uncover subtle, long‑lived flaws. [2][4]

OpenAI’s response is Daybreak — a platform combining GPT‑5.5, GPT‑5.5‑Cyber, and Codex Security into a continuous software‑protection stack, explicitly framed against AI‑accelerated attacks. [3][6][7] Key patterns:

  • Security by design: checks on every merge, not post‑release audits [4][7]
  • Whole‑repo reasoning: Codex Security builds an editable threat model from the entire codebase [7]
  • Sandboxed patch validation: generated fixes are tested with verifiable evidence before landing [3][7]

💡 Pattern to emulate: Treat AI security as a continuous service that:

  1. Watches every change (code, infra, dependencies)
  2. Maintains an evolving threat model
  3. Automatically proposes and tests remediations

Codex Security’s ability to reason over attack paths and validate patches matters against AI‑driven exploit chains, which often depend on multi‑step preconditions. [7] If your defensive agents cannot reason over attack graphs, they will trail offensive agents that can.

OpenAI’s launch cadence — GPT‑5.5‑Cyber first, then Daybreak days later — highlights an industry race to build AI‑native cyber platforms that keep pace with offensive AI. [6] For organizations, the lesson is direct: AI‑based vuln discovery and remediation must be as core as CI/CD or observability. [2][3][6]

Without an AI‑native defensive pipeline spanning code, infrastructure, and production telemetry, reproducing a Google‑style disruption of an autonomous campaign will remain unrealistic, regardless of human IR quality. [3][7]

Engineering for the era of AI-driven hacking: architecture, guardrails and operational playbooks

Netskope argues that adapting security to the “agentic economy” is now urgent. [5] Treat AI agents as:

  • Discoverable assets (inventory and SBOM)
  • Subjects of policy (who they can impersonate, what they can access)
  • Continuous telemetry sources (what they actually do) [5]
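
The three roles above (asset, policy subject, telemetry source) can live in one small data structure. The sketch below is an assumption‑laden illustration in Python: `AgentRecord`, `AgentRegistry`, and the resource naming scheme are hypothetical and do not come from Netskope or any vendor product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRecord:
    # Hypothetical inventory entry for one agent.
    agent_id: str
    owner: str                    # accountable team or person
    model: str                    # which LLM backs the agent
    allowed_resources: frozenset  # systems the agent may touch


class AgentRegistry:
    """In-memory inventory that also records every access decision."""

    def __init__(self):
        self._agents = {}
        self.telemetry = []  # (agent_id, resource, allowed) tuples

    def register(self, record):
        self._agents[record.agent_id] = record

    def inventory(self):
        """List known agent ids in a stable order."""
        return sorted(self._agents)

    def authorize(self, agent_id, resource):
        record = self._agents.get(agent_id)
        # Unknown agents are denied by default: no inventory entry, no access.
        allowed = record is not None and resource in record.allowed_resources
        self.telemetry.append((agent_id, resource, allowed))
        return allowed
```

Denying unregistered agents by default is the design choice that matters here: it turns the inventory from documentation into an enforcement point, and every decision, allowed or not, leaves a telemetry record.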

Anthropic’s multi‑agent PoC suggests AI’s main offensive advantages are speed and scale, not fundamentally new exploit primitives. Defenders should emphasize: [8]

  • Rate‑limiting automated actions and model calls
  • Anomaly detection over automation patterns (bursts, wide sweeps)
  • Rapid containment (agent kill switches, scoped revocation)
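
Rate limiting and a kill switch can share one gate placed in front of every agent action. The following Python sketch uses a standard token bucket. The class name, parameters, and injectable clock are assumptions made for illustration, and a real deployment would need shared state across processes.

```python
import time


class AgentActionLimiter:
    """Token-bucket limiter with a kill switch for one agent (sketch)."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate = rate_per_s       # tokens refilled per second
        self.capacity = burst        # maximum stored tokens
        self.tokens = float(burst)
        self.clock = clock           # injectable so tests can fake time
        self.last = clock()
        self.killed = False

    def kill(self):
        """Containment: refuse all further actions until redeployed."""
        self.killed = True

    def allow(self):
        if self.killed:
            return False
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

An agent wrapper would call `allow()` before each tool or model call and skip the action on `False`. A burst of denials is itself a useful anomaly signal.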

⚠️ Policy gap: Check Point’s LLM‑as‑C2 work implies many enterprises still treat AI assistant traffic as generic HTTPS, with no SIEM rules, EDR thresholds, or egress controls tuned to AI endpoints. [1]
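
One way to start closing that gap is to stop treating AI endpoints as generic HTTPS in egress logs. The Python sketch below flags processes that reach an AI assistant domain without being an approved client. The domains, process names, and `EgressRecord` fields are placeholders and assumptions, not values from Check Point's research.

```python
from dataclasses import dataclass

# Placeholder values: substitute the AI endpoints and clients you actually use.
AI_ASSISTANT_DOMAINS = frozenset({"assistant.example.com", "chat.example.ai"})
APPROVED_CLIENTS = frozenset({"chrome.exe", "msedge.exe", "firefox.exe"})


@dataclass(frozen=True)
class EgressRecord:
    # Hypothetical, simplified proxy or EDR network record.
    host: str
    process: str
    dest_domain: str


def is_ai_endpoint(domain):
    """True if the domain is a listed AI endpoint or one of its subdomains."""
    domain = domain.lower().rstrip(".")
    return any(domain == d or domain.endswith("." + d)
               for d in AI_ASSISTANT_DOMAINS)


def suspicious_ai_egress(records):
    """Return records where a non-approved process talks to an AI endpoint."""
    return [
        r for r in records
        if is_ai_endpoint(r.dest_domain)
        and r.process.lower() not in APPROVED_CLIENTS
    ]
```

Process names are easy to spoof, so this check is only a first filter. It is most useful as input to correlation with other signals, not as a verdict on its own.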

GPT‑5.5 with Trusted Access for Cyber offers a governance blueprint: [4]

  • Confine use to vetted defensive workflows (secure review, malware triage, patch validation)
  • Enforce narrow auth scopes tied to specific repos/environments
  • Log prompts, tools, and outputs with strong retention
  • Require humans in the loop for destructive actions
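
Three of the controls above (narrow scopes, logging, and human approval for destructive actions) can be combined in a single gate around tool calls. This Python sketch is illustrative only: the tool names, the `run_tool` signature, and the log format are assumptions and do not describe OpenAI's Trusted Access implementation.

```python
import json
import time

# Hypothetical tool names that should never run without a human approver.
DESTRUCTIVE_TOOLS = frozenset({"delete_branch", "rotate_credentials"})


class ApprovalRequired(Exception):
    """Raised when a destructive action lacks a named human approver."""


def run_tool(agent_id, tool, args, scopes, audit_log, approved_by=None):
    """Gate one tool call and append a JSON audit entry for every outcome.

    `args` must be JSON-serializable; `scopes` is the set of tools this
    agent is allowed to call in this repo or environment.
    """
    entry = {"ts": time.time(), "agent": agent_id, "tool": tool,
             "args": args, "approved_by": approved_by}
    if tool not in scopes:
        entry["decision"] = "denied_out_of_scope"
    elif tool in DESTRUCTIVE_TOOLS and approved_by is None:
        entry["decision"] = "held_for_approval"
    else:
        entry["decision"] = "allowed"
    # Log before raising so denials are retained as well as successes.
    audit_log.append(json.dumps(entry))
    if entry["decision"] == "denied_out_of_scope":
        raise PermissionError(f"{agent_id} is not scoped for {tool}")
    if entry["decision"] == "held_for_approval":
        raise ApprovalRequired(f"{tool} needs a human approver")
    return entry["decision"]
```

The actual tool execution is deliberately left out. The point of the sketch is the ordering: decide, log, then act or refuse, so the audit trail never depends on the action succeeding.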

Daybreak’s workflow integration shows the value of running security agents as persistent, policy‑governed services — like CI jobs or SAST — rather than ad hoc chat tools. [3][7] This makes behavior auditable and impact predictable.

As Mythos and Daybreak compress the vuln lifecycle on both offense and defense, incident playbooks need explicit “AI‑discovered, AI‑exploited” branches. [2][3][8] Those should define:

  • Detection rules (agent anomalies, unusual model usage)
  • Forensic artifacts (LLM logs, agent traces, cloud‑API sequences)
  • Containment steps (agent shutdown, credential rotation, rollbacks)

💼 Operational takeaway: Your SOC should quickly answer: “Which agents touched this system? Which models did they call? What did they ask and do?” If that visibility is missing, it belongs at the top of your engineering backlog. [5][7]
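
Answering those three questions is a simple query once agent traces exist in a structured form. The Python sketch below assumes a hypothetical `TraceEvent` schema. The field names and the `who_touched` helper are illustrative and not taken from any specific tracing product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    # Hypothetical agent execution trace record.
    timestamp: float
    agent_id: str
    model: str           # model the agent called for this step
    target: str          # system the step acted on
    prompt_summary: str  # what the agent asked
    action: str          # what the agent did


def who_touched(traces, target, since=0.0):
    """Summarize, per agent, the models called and actions taken on a target."""
    summary = {}
    for ev in sorted(traces, key=lambda e: e.timestamp):
        if ev.target != target or ev.timestamp < since:
            continue
        entry = summary.setdefault(ev.agent_id, {"models": set(), "actions": []})
        entry["models"].add(ev.model)
        entry["actions"].append((ev.prompt_summary, ev.action))
    return summary
```

If producing the input for a function like this takes days of log archaeology, that delay is the visibility gap the takeaway describes.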

Conclusion: Google’s incident as your last early warning

Anthropic’s Mythos results, the state‑backed espionage campaign, and Check Point’s LLM‑as‑C2 experiments show that AI‑driven exploitation is becoming standard for well‑resourced actors. [2][8][1] In parallel, OpenAI’s Daybreak, GPT‑5.5‑Cyber, and Codex Security illustrate a defensive ecosystem racing to embed AI into code review, threat modeling, and automated patching from day zero. [3][4][6][7]

Netskope’s warnings about agentic AI and the absence of robust monitoring make clear that the main gap is governance and observability, not raw capability. [5] Google’s disruption of an AI‑driven campaign should be treated as a template: any organization with valuable assets should assume similarly autonomous chains will probe their surface.

Call to action: Treat this as your last early warning. Starting now:

  1. Inventory your AI agents — know where they run, what they touch, and who owns them. [5]
  2. Instrument their behavior — log model usage, tool calls, and access patterns as first‑class security telemetry. [1][5][7]
