DEV Community

Maksim Danilchenko

Posted on • Originally published at danilchenko.dev

Perplexity Bumblebee Review: The Supply Chain Scanner Your Dev Machine Needs

TL;DR

Bumblebee is a read-only supply chain scanner from Perplexity AI that checks your installed packages, editor extensions, MCP configs, and browser extensions against known-compromised versions. It never runs a package manager command. I ran it on three machines. It found 847 packages across 9 ecosystems in under 4 seconds. The output is NDJSON you can pipe into anything. It won't replace Snyk or Socket in your CI pipeline, but for the question "does anyone on the team have that compromised package installed right now?" it's the fastest answer that exists.

Why Developer Machines Are the Blind Spot

When a supply chain advisory drops (say, a compromised npm package like @redhat-cloud-services/frontend-components in June 2026), the security team's first question is simple: who has it installed?

Your CI pipeline knows what's in your lockfile. Your SBOM tool knows what shipped to production. Your EDR knows what processes ran. None of them know what's sitting in ~/.npm, what extensions your VS Code installed last Tuesday, or what MCP servers your Claude Code config points to.

Sonatype's 2026 State of the Software Supply Chain report counted over 454,600 new malicious open-source packages in 2025, pushing the cumulative total past 1.233 million. That's a 75% year-over-year increase. The attacks keep getting more creative, too: the Mini Shai-Hulud campaigns starting in late April 2026 hit npm, PyPI, RubyGems, and Composer across companies like SAP, with the TanStack compromise following in May. In June 2026, TeamPCP's Miasma worm injected itself into the SessionStart hooks of 13 AI coding tools, including Claude Code, GitHub Copilot, and Gemini CLI.

That last one is why MCP config scanning matters. If your claude_desktop_config.json or mcp.json references a compromised server, you've handed an attacker tool-level access to your coding environment. Traditional security scanners don't even look at these files.

Bumblebee fills that gap. It answers one question: what packages, extensions, and configs are on this machine right now, and do any of them match a known-bad advisory?

What Bumblebee Actually Is

Bumblebee is a single static binary written in Go (1.25+) with zero non-stdlib dependencies. Perplexity open-sourced it in May 2026 under Apache 2.0. The repository, currently at release v0.1.1, has crossed 4k GitHub stars.

The design is deliberately narrow. Bumblebee collects an inventory of what's installed on a developer machine and optionally matches that inventory against an exposure catalog of known-compromised packages. It does not:

  • Run npm install, pip install, or any other package manager command
  • Execute install scripts or lifecycle hooks
  • Read your source code
  • Make network calls during the scan
  • Modify anything on disk

That last set of constraints is the whole point. When your machine might already be compromised, the scan tool itself can't be the thing that triggers the compromise. npm's postinstall scripts are a well-known attack vector. Bumblebee never gives them a chance to fire.

What It Scans

The coverage spans ten ecosystems, which is wider than I expected from a v0.1 tool:

Ecosystem | What It Reads | Package Manager
npm | package-lock.json, yarn.lock, pnpm-lock.yaml, bun.lock | npm/yarn/pnpm/bun
PyPI | *.dist-info/METADATA, *.egg-info/PKG-INFO | pip/setuptools
Go modules | go.sum, go.mod | go
RubyGems | Gemfile.lock, *.gemspec | bundler
Composer | composer.lock, installed.json | composer
Homebrew | INSTALL_RECEIPT.json, .metadata | brew
MCP servers | claude_desktop_config.json, mcp.json, various IDE configs | Claude Code/Cursor/Gemini CLI
Agent skills | skills-lock.json | vercel-labs/skills
Editor extensions | VS Code, Cursor, Windsurf, VSCodium manifests | IDE ecosystem
Browser extensions | Chrome, Edge, Brave, Arc, Firefox manifests | Browser ecosystem

The MCP and agent skills scanning is what makes this feel purpose-built for 2026. Phoenix Security's 2026 supply chain report found that AI-agent skills carry a risk rate 2.3 times higher than IDE extensions, and more than 1 in 4 deep-scanned skills triggered a critical-risk finding. Bumblebee parses claude_desktop_config.json, mcp.json, and Gemini CLI config files to catalog which MCP servers are registered and their source packages.

Running It: Three Machines, Three Profiles

I installed Bumblebee via go install and ran it across three environments: my primary macOS development laptop, a Linux CI runner, and a colleague's machine that I suspected had stale npm packages from a prototype project six months ago.

Installation

go install github.com/perplexityai/bumblebee/cmd/bumblebee@v0.1.1

One binary. No config files. No daemon. The Go install pulls the source and compiles it locally, so the binary on your machine was built from source you can audit.

Baseline Scan

The baseline profile hits common global package roots, editor extensions, browser extensions, and MCP configs:

bumblebee scan --profile baseline > inventory.ndjson

On my macOS machine, the baseline scan completed in 1.8 seconds and found 312 packages across npm, PyPI, Go modules, and Homebrew, along with 23 VS Code extensions, 4 Cursor extensions, and 3 MCP server configurations.

Each record in the NDJSON output looks like this:

{
  "record_type": "package",
  "hostname": "maxbook.local",
  "ecosystem": "mcp",
  "package_name": "filesystem-mcp-server",
  "version": "0.6.2",
  "source_type": "mcp_config",
  "source_file": "/Users/max/.config/claude/claude_desktop_config.json",
  "confidence": "medium"
}

The confidence levels are useful. high means exact canonical metadata with verified version. medium means reliable identity but partial version info, which is common for MCP servers where the config only records a package reference. low means it's a config path or spec reference only.
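
To get a feel for what an inventory contains, you can tally records by ecosystem and confidence. The following is a minimal Python sketch, not part of Bumblebee: it assumes one JSON object per line and relies only on the field names visible in the sample record above. The helper name `summarize` and the sample lines are hypothetical.

```python
import json
from collections import Counter


def summarize(lines):
    """Count package records per (ecosystem, confidence) from NDJSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("record_type") != "package":
            continue  # skip anything that is not an inventory record
        counts[(record.get("ecosystem"), record.get("confidence"))] += 1
    return counts


# Hypothetical sample lines that reuse the field names from the record above.
sample = [
    '{"record_type": "package", "ecosystem": "mcp", "package_name": "filesystem-mcp-server", "version": "0.6.2", "confidence": "medium"}',
    '{"record_type": "package", "ecosystem": "npm", "package_name": "example-a", "version": "1.3.0", "confidence": "high"}',
    '{"record_type": "package", "ecosystem": "npm", "package_name": "example-b", "version": "4.17.21", "confidence": "high"}',
]

for (ecosystem, confidence), n in sorted(summarize(sample).items()):
    print(f"{ecosystem}\t{confidence}\t{n}")
```

For a real scan, pass an open file instead of the sample list, for example `summarize(open("inventory.ndjson"))`.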

Project Scan

bumblebee scan --profile project \
  --root "$HOME/code" \
  --root "$HOME/Developer" > project-inventory.ndjson

This took 3.4 seconds and found 535 packages. The extra coverage comes from crawling lockfiles in project directories. A Go project I'd forgotten about contributed 47 transitive dependencies from its go.sum.

Deep Scan

bumblebee scan --profile deep \
  --root "$HOME" \
  --max-duration 10m > deep-inventory.ndjson

The deep profile is for incident response. It sweeps everything under the given root. On my machine with a messy home directory, it found 847 packages in 3.9 seconds. On the colleague's machine (the one with stale npm prototypes), it found 1,423 packages. A lot of them were cached npm packages from six months ago that nobody realized were still sitting on disk.

Exposure Check

This is where Bumblebee earns its keep. Once you have an advisory (say, the June 2026 @redhat-cloud-services compromise), you write an exposure catalog:

{
  "schema_version": "0.1.0",
  "entries": [
    {
      "id": "advisory-2026-redhat-npm",
      "name": "@redhat-cloud-services/frontend-components 4.2.3",
      "ecosystem": "npm",
      "package": "@redhat-cloud-services/frontend-components",
      "versions": ["4.2.3", "4.2.4"],
      "severity": "critical"
    }
  ]
}

Then run:

bumblebee scan --profile deep \
  --root "$HOME" \
  --exposure-catalog ./catalog.json

If any match exists, Bumblebee emits a finding record alongside the normal inventory. The finding includes the matched package, version, catalog entry, and severity. Pipe the output through jq -c 'select(.record_type == "finding")' and you have a yes/no answer in seconds: any output means a match, empty output means none.
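
To make the matching semantics concrete, here is a minimal Python sketch of exact ecosystem + name + version matching, which is how v0.1 behaves according to the limitations discussed later in this review. This is an illustration, not Bumblebee's actual implementation: the function name `match_exact` and the sample inventory are hypothetical, and the field names are taken from the sample record and catalog shown in this article.

```python
def match_exact(inventory_records, catalog):
    """Yield (record, entry) pairs where ecosystem, name, and version all match exactly."""
    # Index every listed version so each inventory record needs one dict lookup.
    index = {}
    for entry in catalog["entries"]:
        for version in entry["versions"]:
            index[(entry["ecosystem"], entry["package"], version)] = entry
    for record in inventory_records:
        key = (record.get("ecosystem"), record.get("package_name"), record.get("version"))
        if key in index:
            yield record, index[key]


catalog = {
    "schema_version": "0.1.0",
    "entries": [
        {
            "id": "advisory-2026-redhat-npm",
            "name": "@redhat-cloud-services/frontend-components 4.2.3",
            "ecosystem": "npm",
            "package": "@redhat-cloud-services/frontend-components",
            "versions": ["4.2.3", "4.2.4"],
            "severity": "critical",
        }
    ],
}

# Hypothetical inventory records; only the first should match.
inventory = [
    {"ecosystem": "npm", "package_name": "@redhat-cloud-services/frontend-components", "version": "4.2.3"},
    {"ecosystem": "npm", "package_name": "@redhat-cloud-services/frontend-components", "version": "4.2.2"},
    {"ecosystem": "mcp", "package_name": "filesystem-mcp-server", "version": "0.6.2"},
]

hits = list(match_exact(inventory, catalog))
for record, entry in hits:
    print(entry["severity"], entry["id"], record["package_name"], record["version"])
```

Note that version 4.2.2 does not match even though it sits right next to a listed version. With exact matching, anything not written out in the catalog is invisible.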

Self-Test

Before trusting it with real incident data, verify the binary works:

bumblebee selftest
# selftest OK (2 findings in 1ms)

The self-test uses embedded fixtures with fake package names. It validates that the scanning and matching logic works correctly on your platform before you point it at real data.

Where Bumblebee Fits (and Where It Doesn't)

Don't treat Bumblebee as a replacement for your existing security toolchain. It's a complement. Here's how it maps against the tools you probably already use:

Tool | What It Does | When It Runs
Snyk | Scans dependencies for known CVEs, generates fix PRs | CI/CD, on PR merge
Socket | Behavioral analysis of packages pre-install (typosquatting, exfiltration) | Pre-install, CI/CD
OSV-Scanner | Matches lockfiles against the OSV database | CI/CD, local
npm audit | Checks installed packages against the npm advisory database | Post-install, CI/CD
Bumblebee | Reads on-disk metadata across 10 ecosystems, no execution | On-demand, scheduled on dev machines

Bumblebee fills the post-incident triage gap. Snyk tells you what's vulnerable in your repo. Socket tells you if a package is behaving suspiciously before you install it. Bumblebee tells you which developer machines are currently exposed to a specific known-compromised package. They complement each other.

The strongest use case: a zero-day advisory drops at 2 AM. Your security team pushes an exposure catalog. Every developer machine runs bumblebee scan --profile deep --exposure-catalog ./catalog.json as part of their morning boot script. By the time standup happens, you know exactly who's affected.

What I Liked

Speed. Sub-4-second scans across 800+ packages on a well-loaded macOS machine. It reads metadata files directly from the filesystem, with no process spawning, no network calls, and no dependency resolution. The Go implementation is tight.

Zero dependencies. The entire binary is Go standard library. For a security tool, that's a big deal: every third-party dependency is an attack surface. Bumblebee pulls in zero non-stdlib packages, so its own supply chain is the Go toolchain and nothing else. You can verify this yourself. The go.mod file lists only the module path and Go version.

MCP config scanning. No other supply chain tool does this. With MCP tool poisoning becoming a real attack vector in 2026, a tool that catalogs which MCP servers are configured on a machine is ahead of the curve.

NDJSON output. Every record is a self-contained JSON line. Pipe it through jq, feed it to your SIEM, aggregate it with xargs. No proprietary format, no dashboard required.

Read-only guarantee. I checked the source. There are no os.Create, no os.WriteFile, no exec.Command calls outside the self-test fixtures. The binary genuinely can't modify your system.

What It Doesn't Do

The biggest practical limitation is version range matching. Bumblebee v0.1 only matches exact name + version pairs. If an advisory says "all versions below 2.3.1 are affected," you need to write out every affected version in the catalog, or write a wrapper that expands ranges. Most advisories use ranges, so this will bite you early.
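
A range-expanding wrapper can be quite small. The sketch below is one possible approach in Python, under stated assumptions: you already have the list of published versions for the package (from the registry or your own mirror), and versions are plain dotted numbers. Pre-release tags such as `2.3.1-beta.1` would need a real version parser. The names `parse_version`, `expand_below`, and `example-package` are hypothetical; the catalog fields mirror the example catalog shown earlier.

```python
import json


def parse_version(text):
    """Parse a plain dotted numeric version such as '2.3.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def expand_below(known_versions, upper_bound):
    """Return the known versions strictly below upper_bound, in ascending order."""
    bound = parse_version(upper_bound)
    affected = [v for v in known_versions if parse_version(v) < bound]
    return sorted(affected, key=parse_version)


# Hypothetical list of published versions for the affected package.
known = ["2.10.0", "2.3.1", "2.3.0", "2.2.9", "1.9.0"]

entry = {
    "id": "example-advisory",
    "name": "example-package below 2.3.1",
    "ecosystem": "npm",
    "package": "example-package",
    "versions": expand_below(known, "2.3.1"),
    "severity": "critical",
}

print(json.dumps({"schema_version": "0.1.0", "entries": [entry]}, indent=2))
```

Comparing integer tuples rather than strings matters here: as strings, "2.10.0" sorts below "2.3.1" and would be wrongly flagged as affected.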

MCP config parsing is JSON-only. Continue uses YAML, Codex uses TOML, and both are skipped silently. Claude Code and Cursor both use JSON so you're covered there, but it's an incomplete picture.

There's no remediation. Bumblebee tells you what's there. It won't remove it, update it, or file a PR. That's by design (a read-only tool should stay read-only), but it means you need a separate process for acting on the findings.

There's also no continuous monitoring. It's a one-shot execution model. You run it, get results, done. If you want recurring scans, wrap it in a cron job or MDM policy. No agent, no daemon, no phone-home.

Finally, Linux and macOS only. No Windows support in v0.1.1. The scan profiles are built around Unix filesystem conventions. If your team has Windows developers, Bumblebee misses them entirely.

The MCP Angle

I want to call out the MCP scanning specifically because it's the feature that separates Bumblebee from every other supply chain tool I've used.

Your claude_desktop_config.json or project-level mcp.json defines which tools your AI coding agent can call. A compromised MCP server gets tool-level access inside your development session. It can read files, write files, execute commands, and interact with external services, all under the guise of a legitimate tool.

The Miasma worm that hit in June 2026 specifically targeted these config files, injecting itself into SessionStart hooks across 13 AI coding tools. Traditional security scanners didn't catch it because they don't read MCP configuration files. Bumblebee does.

When I ran the baseline scan, it found three MCP servers in my Claude Code config: filesystem-mcp-server, brave-search, and a custom internal tool. All three were clean, but the fact that they showed up in the inventory meant I could match them against future advisories automatically. That's the workflow: scan once, match many times.
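
If you want to eyeball what one of these configs registers, the following Python sketch lists the servers in a JSON config. It assumes the common layout where servers sit under a top-level `mcpServers` object with `command` and `args` keys. It is not how Bumblebee parses these files, and the helper name `list_mcp_servers`, the server entries, and the paths in the example are hypothetical.

```python
import json


def list_mcp_servers(config_text):
    """Return (name, command, args) for each entry under the top-level 'mcpServers' key."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})
    return [
        (name, spec.get("command"), spec.get("args", []))
        for name, spec in sorted(servers.items())
    ]


# Hypothetical config content in the assumed layout.
example_config = """
{
  "mcpServers": {
    "filesystem": {"command": "npx", "args": ["-y", "filesystem-mcp-server", "/Users/max/code"]},
    "brave-search": {"command": "npx", "args": ["-y", "brave-search"]}
  }
}
"""

for name, command, args in list_mcp_servers(example_config):
    print(name, command, " ".join(args))
```

Like the scanner itself, this only reads the file. It never launches the command a server entry points at.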

Running Bumblebee in a Team

For individual developers, the one-shot model works fine. For a team, you need a bit more structure:

bumblebee scan --profile baseline \
  --exposure-catalog /shared/catalogs/latest.json \
  | jq -c 'select(.record_type == "finding")' \
  | curl -s -X POST --data-binary @- https://internal-siem.company.com/ingest

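The pattern is: scan locally, filter for findings, push to a central collector. The NDJSON output makes this trivial to integrate with whatever log aggregation you already run. Use --data-binary rather than -d here, because -d strips newlines when reading from stdin and would fuse the NDJSON records into one unparseable line.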

For MDM-managed fleets, distribute the binary via your existing tooling (it's a single static file, no installer needed), set up a scheduled task, and point --exposure-catalog at a shared network path that your security team updates when advisories drop.
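
On the collector side, answering "who has it installed?" from merged inventories is a small grouping job. The Python sketch below assumes each machine's NDJSON has been concatenated into one stream and that records carry the `hostname`, `ecosystem`, `package_name`, and `version` fields shown in the sample record earlier. The function name `hosts_with` and the second hostname are hypothetical.

```python
import json
from collections import defaultdict


def hosts_with(lines, ecosystem, package_name, versions):
    """Map hostname -> sorted matching versions found in merged NDJSON inventory lines."""
    wanted = set(versions)
    found = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between concatenated files
        record = json.loads(line)
        if (
            record.get("ecosystem") == ecosystem
            and record.get("package_name") == package_name
            and record.get("version") in wanted
        ):
            found[record.get("hostname")].add(record["version"])
    return {host: sorted(vs) for host, vs in found.items()}


pkg = "@redhat-cloud-services/frontend-components"

# Hypothetical merged inventory from two machines.
merged = [
    json.dumps({"record_type": "package", "hostname": "maxbook.local", "ecosystem": "npm", "package_name": pkg, "version": "4.2.3"}),
    json.dumps({"record_type": "package", "hostname": "ci-runner-01", "ecosystem": "npm", "package_name": pkg, "version": "4.2.4"}),
    json.dumps({"record_type": "package", "hostname": "ci-runner-01", "ecosystem": "npm", "package_name": pkg, "version": "4.2.3"}),
    json.dumps({"record_type": "package", "hostname": "ci-runner-01", "ecosystem": "npm", "package_name": pkg, "version": "4.1.0"}),
]

exposed = hosts_with(merged, "npm", pkg, ["4.2.3", "4.2.4"])
for host, versions in sorted(exposed.items()):
    print(host, ", ".join(versions))
```

Storing full inventories centrally, rather than only findings, lets you run this kind of query against an advisory that did not exist when the scan ran.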

FAQ

Does Bumblebee replace Snyk or Socket?

No. Bumblebee scans what's already on your machine. Snyk and Socket operate in CI/CD to prevent bad packages from entering your projects. Use all three. They cover different phases of the supply chain.

Can Bumblebee trigger malicious code during a scan?

No. It reads metadata files (lockfiles, manifests, config JSON) directly from the filesystem. It never executes npm install, pip install, or any lifecycle hooks. The scan can't trigger a compromised package's install script.

Does it work with private registries?

Yes, for inventory purposes. Bumblebee reads lockfile metadata, which includes private package names and versions. It doesn't authenticate against registries. It reads from local disk, so there's nothing to authenticate against.

How do I get exposure catalogs?

You build them. Bumblebee doesn't ship with a built-in advisory database. When an advisory drops (from GitHub Security Advisories, Snyk's database, or your own threat intel), you convert it into the catalog JSON format and pass it to --exposure-catalog. Multiple catalogs merge when you point at a directory.
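
As one way to do that conversion, here is a minimal Python sketch that turns an OSV-format advisory into catalog entries. Treat it as a sketch under assumptions: it reads only the `id`, `affected[].package`, and `affected[].versions` fields of the OSV schema, takes severity as an argument rather than deriving it, and passes ecosystem names through unchanged unless you supply a mapping. Apart from `npm`, I have not verified which ecosystem identifiers Bumblebee expects. The function name `osv_to_entries` and the advisory `EXAMPLE-0001` are hypothetical.

```python
import json


def osv_to_entries(advisory, severity, ecosystem_map=None):
    """Build catalog entries from an OSV-style advisory that lists explicit versions."""
    ecosystem_map = ecosystem_map or {}
    entries = []
    for affected in advisory.get("affected", []):
        package = affected.get("package", {})
        versions = affected.get("versions", [])
        if not versions:
            continue  # range-only advisories must be expanded to explicit versions first
        ecosystem = package.get("ecosystem")
        entries.append({
            "id": advisory["id"],
            "name": f"{package.get('name')} ({advisory['id']})",
            "ecosystem": ecosystem_map.get(ecosystem, ecosystem),
            "package": package.get("name"),
            "versions": list(versions),
            "severity": severity,
        })
    return entries


# Hypothetical advisory in OSV layout.
advisory = {
    "id": "EXAMPLE-0001",
    "affected": [
        {"package": {"ecosystem": "npm", "name": "example-package"}, "versions": ["1.0.1", "1.0.2"]},
        {"package": {"ecosystem": "npm", "name": "range-only-package"}, "ranges": []},
    ],
}

catalog = {"schema_version": "0.1.0", "entries": osv_to_entries(advisory, "critical")}
print(json.dumps(catalog, indent=2))
```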

Is there a Windows version?

Not yet. Bumblebee v0.1.1 supports macOS and Linux only. Windows support would require rewriting the scan profile filesystem paths and adding support for Windows-specific package manager locations (AppData, ProgramFiles).

Bottom Line

Bumblebee does one thing and does it well: it inventories every package, extension, and MCP config on a developer machine without executing anything. That narrow scope is a feature. 454,600 malicious packages shipped in a single year. AI coding tools introduced a new attack surface nobody was scanning. A sub-4-second read-only answer to "are we exposed?" is worth adding to your toolkit.

The version range limitation is real and will bite you on the first advisory that specifies a range instead of exact versions. The missing Windows support matters if you have a mixed fleet. But for Unix-based development teams using Claude Code, Cursor, or VS Code, Bumblebee fills a gap that Snyk, Socket, and npm audit don't touch: what's actually sitting on the developer's disk right now, including the MCP configs that no other tool bothers to read.

Install it. Run the self-test. Point it at your home directory. The scan takes less time than reading this sentence, and you might not like what it finds.
