DEV Community: Grumpy Sage

Why Every CISO Needs an AIBOM in 2026 and What Vendors Get Wrong

Grumpy Sage — Mon, 01 Jun 2026 18:39:07 +0000

A friend of mine runs security at a mid-sized fintech. About 400 engineers, Series D, the kind of place where the AI strategy memo got written in a weekend and shipped to prod the following Tuesday. She called me in March, somewhere between annoyed and panicked, because her board had asked a simple question: "What models are we running, and where did they come from?"

She had answers. They were wrong.

The official answer was "GPT-4 and Claude, through the standard gateway." The actual answer, after two weeks of digging, was seventeen models. A Llama 3.1 8B fine-tune hosted on a self-managed vLLM box that someone in the ML team spun up for a customer-support prototype, then forgot to tear down. Three Hugging Face embedding models pulled at container build time with no pinned hashes. A Whisper variant running on a developer's GPU workstation that was, somehow, reachable from the staging VPC. Two LoRA adapters fine-tuned on customer support tickets, sitting in an S3 bucket with a permissive policy. And a llama.cpp build serving a 7B model to an internal Slack bot that nobody could remember authorizing.

The part that got me, when she walked me through it, was that none of this was shadow IT in the old sense. Every one of these systems had a Jira ticket. Every one had been "approved." The inventory just didn't exist. There was no single place she could point at and say: this is what we run, this is where the weights came from, this is what data trained them, this is what they're allowed to talk to.

That's the AIBOM problem. And in 2026, if you don't have one, you don't have an AI security program. You have vibes.

The thesis

Most AIBOM products being sold right now are spreadsheets with a UI. They list model names and versions, slap a CycloneDX export on top, and call it done. That's not an AIBOM. That's a parts list pretending to be a bill of materials. A real AIBOM is a living graph that connects models, weights, datasets, prompts, tools, agents, and the data flows between them, and it has to be generated continuously from what's actually running in your environment, not from what someone typed into a form six months ago.

What an AIBOM actually contains

The SBOM analogy is useful but it breaks down fast. A traditional SBOM tells you which libraries are in a binary. The relationships are mostly one-directional: this binary depends on this package at this version. You can run Syft or Trivy and get something reasonable.

An AIBOM has to capture relationships that aren't dependencies in the SBOM sense. A fine-tuned model depends on a base model, sure, but it also depends on the training dataset, the fine-tuning configuration, the RLHF rewards, and increasingly, the prompts that shape its runtime behavior. An agent depends on the model serving it, the tools it can call, the memory store it reads from, and the MCP servers it's connected to. None of this shows up in a CycloneDX schema unless you stretch the schema until it tears.

So at minimum, an AIBOM has to capture: the models themselves with their weights provenance and hash; the training and fine-tuning datasets with lineage back to source; the inference infrastructure (whether that's vLLM, Ollama, TGI, Triton, LocalAI, LM Studio, or llama.cpp); the prompts and prompt templates in use; the tools and MCP servers exposed to those models; the data flows in and out of each model boundary; and the humans or service accounts who can change any of the above.

If your vendor's AIBOM is missing three or more of those, you bought a model registry, not an AIBOM.

Where most vendors fail

I've looked at probably a dozen AIBOM products this year. The failure modes cluster into four patterns.

The first is the form-based AIBOM. Engineers fill out a questionnaire, and the platform stores it. This is theater. The model running in production at 3 a.m. on a Wednesday is not the model someone described in a form during onboarding. Within six weeks the inventory drifts and you're back to vibes.

The second is the API-gateway-only AIBOM. The product hooks into your OpenAI and Anthropic API calls, watches the traffic, and infers what models you're using. This catches your hosted model use. It catches nothing else. Your self-hosted Ollama instance, your vLLM cluster, your llama.cpp processes, the LM Studio sessions on developer laptops, the LocalAI deployments your data team set up - all invisible. And by my estimate, self-hosted inference accounts for somewhere between 30 and 60 percent of model usage at any serious engineering org right now. You're missing half your attack surface.

The third is the registry-as-AIBOM trap. The product is really a model registry with provenance fields. It's great if every model goes through the registry. But registries are opt-in. They don't see what they don't see. And the entire point of a BOM is to discover the things you forgot to register.

The fourth, and this one annoys me the most, is the static AIBOM. The product generates a CycloneDX file at build time, hands it to you, and considers the job done. AI systems aren't static. The same model with a different system prompt is a different system, for security purposes. The same agent with a new tool connected is a different blast radius. If your AIBOM only updates at deploy time, it's already stale by the time it lands in your SIEM.

What you actually need to generate one

If I were building an AIBOM program from scratch today, here's how I'd think about the inputs.

You need code scanning that knows about AI. Not just "is there a call to openai.ChatCompletion in this file," but the structure of agent definitions, RAG pipelines, prompt templates, tool registrations, and model loads from disk or HF Hub. This is why we built cyscan with 1,815 rules across 75-plus languages, with a heavy investment in AI-aware detection. A traditional SAST tool will tell you nothing useful about whether your agent has access to a tool it shouldn't. An AI-aware scanner can.

You need runtime discovery for self-hosted inference. Somebody has to walk your network and your container runtimes and find the Ollama, vLLM, TGI, LocalAI, Triton, LM Studio, and llama.cpp processes. They have to fingerprint them, identify the models being served, and check for the eight ways each of those servers can be misconfigured. That's what cyradar does. If you're scanning only for the hosted APIs, you're playing a different game than the one your attackers are playing.

You need web-layer awareness. Most AI apps end up behind an HTTP boundary eventually. A model is a backend. If your AIBOM doesn't connect the HTTP endpoints in your app to the models behind them, you can't reason about external attack surface. We use cyweb for this with 22 fuzz categories tuned for AI-shaped vulnerabilities, and we get to a 95 percent template conversion rate from the upstream community templates so we don't miss the boring web stuff while we're chasing the new stuff.

You need to model the agent and tool graph. Every MCP server is a capability boundary. Every tool a model can call is a potential confused deputy. Our MCP server exposes 10 tools and we treat each one as a documented edge in the AIBOM graph, with explicit allowlists for which agents can invoke which tools. If you can't answer "which models can call which tools" in your AIBOM, you can't answer the actually-important question, which is "what's the worst thing this model can do if it gets jailbroken."

You need provenance for weights. Hashes. Signatures where they exist. Download source. License. Known-bad lists for poisoned or trojaned models on Hugging Face. This is the part that looks most like a traditional SBOM, and it's also the part most vendors do least badly.

Finally, you need this to be continuous. Daily at minimum. Ideally on every deploy and on a scheduled drift check. Otherwise it's a snapshot, and snapshots lie.

A small example of what continuous looks like

Here's a sketch of what we run on customer environments. You don't need to use our tools to do this; the pattern matters more than the implementation.

# Discover self-hosted inference servers across the environment
cyradar discover \
  --targets ./targets.yaml \
  --servers ollama,vllm,tgi,localai,triton,lmstudio,llamacpp \
  --output aibom-runtime.json

# Scan repos for AI surface: models loaded, agents defined, tools registered
cyscan code \
  --repos ./repos.txt \
  --rules ai \
  --output aibom-code.json

# Walk the HTTP attack surface for AI endpoints
cyweb scan \
  --targets ./web-targets.yaml \
  --ai-categories all \
  --output aibom-web.json

# Merge into a single AIBOM graph and diff against last known good
cybrium aibom merge \
  --inputs aibom-runtime.json,aibom-code.json,aibom-web.json \
  --diff-against last-good.json \
  --output aibom-current.json

The diff is the part most people skip. The AIBOM itself is interesting. The change from yesterday's AIBOM to today's is what you should be alerting on. A new model appeared. A tool got connected to an agent that didn't have it before. A vLLM server is suddenly exposed on a new port. That's the security signal.

The fine-tuning and RAG blind spot

The other place most AIBOM products fall over is anything that touches your data.

Fine-tunes inherit the security posture of their training data. If you fine-tuned a Llama variant on a corpus that included customer PII, the model is now a potential data exfiltration vector even if your inference layer is perfectly locked down. The AIBOM has to know this. It has to record what data went into the fine-tune, who approved it, what residency constraints apply, and what the deletion story is when a customer invokes their rights.

RAG is similar but worse, because the data isn't baked in at training time, it's fetched at runtime. Your AIBOM needs to know which vector stores each agent can read from, what's in those stores, how they're populated, and what the access control model is at the chunk level. Most products treat the vector store as opaque infrastructure. It isn't. It's part of the AI system, and it belongs in the BOM.

I've seen one customer discover, through an honest AIBOM exercise, that their support agent was retrieving from a vector store that contained the contents of an internal HR wiki because the embedding pipeline was pointed at "all of Confluence" with no filtering. The AIBOM caught it. Their old "AI governance committee" had been signing off on the agent for nine months.

Why this has to be one platform

I get asked a lot why we bundled code scanning, runtime AI discovery, web fuzzing, and the MCP server into one platform instead of selling them as separate products. The reason is exactly the AIBOM problem.

An AIBOM is, by definition, a join across surfaces. Code surface plus runtime surface plus web surface plus agent-and-tool surface. If those four sources of truth live in four different vendors with four different schemas and four different update cadences, the join is hopeless. You end up with four partial inventories that disagree, and a quarterly meeting where everyone explains why their slice is the right one. Nobody ships anything.

When the surfaces share a graph, the diff is meaningful. You can ask: this new HTTP endpoint that cyweb found, what model does it route to, what tools does that model have, and is any of that change new compared to last week. You can ask that in one query because it's one graph. Not because we're smarter than other vendors. Because we made the architectural decision to keep the model coherent across surfaces from day one.

The recomposition

What's actually happening in the market right now is that the old vendor categories are being torn up and re-glued around the AI system as the unit of analysis. SAST, DAST, CSPM, model registry, data catalog, agent observability - those are five products and three buyer personas in 2024. In 2026 they're one workflow with one BOM at the center.

The CISOs who get this are quietly rebuilding their security architecture around the AIBOM. They're treating it as the central inventory, the way CMDB was supposed to be in the ITIL era and never quite was. They're plumbing it into change management, into incident response, into vendor risk, into audit. They're using it as the source of truth for which AI systems exist and what each one is allowed to touch.

The CISOs who don't get it are still buying point products and trying to staple them together with a Jira board. They're going to spend 2026 explaining to their boards why they keep getting surprised. The surprises are the AIBOM gaps. There's no other word for them.

If you remember one thing from this post, remember this: the question your board is going to ask in 2026 is not "are you doing AI security." It's "show me the inventory." If your answer involves opening five different consoles, you've already lost the meeting.

What to do this quarter

Start with the discovery problem. Run cyradar against your environment and find the inference servers you didn't know about. That alone will reframe how you think about scope. Then run cyscan on your repos with the AI rules enabled and find the agent definitions, tool registrations, and model loads in code. Then merge. Look at the diff against your current "official" inventory and prepare to be surprised.

You can dig into the specifics at cybrium.ai/cyradar, cybrium.ai/cyscan, and cybrium.ai/cyweb, or read the AIBOM architecture notes at cybrium.ai/aibom.

If you want to talk through yours, find me at anand@cybrium.ai.

Pitt Season 3 Already Premiered in 400 Hospital Server Rooms This Year

Grumpy Sage — Fri, 15 May 2026 03:41:55 +0000

A 180-bed regional hospital in the Midwest lost its EMR at 2:14 a.m. on a Tuesday. The night-shift charge nurse noticed first — the Cerner terminal froze, came back to a lock screen she didn't recognize, then went black. By 2:40, the entire fourth floor was on paper. By 3:15, the ED was running on whiteboards and memory. Nurses were walking lab results between floors because the pneumatic tube system's controller had also gone down — it ran on the same network segment.

The CIO told me later that the part that shook him wasn't the ransomware demand. It was watching his team revert to procedures none of them had practiced since nursing school. He said it felt like watching an episode of Pitt, except the cameras weren't rolling and nobody was going to yell cut.

He wasn't being dramatic. If you've watched Pitt — the Max series set in a Pittsburgh trauma center — you've seen what happens when a hospital's digital infrastructure disappears. Systems go dark. Staff improvise. Patient data lives on sticky notes and shouted vitals. The show treats it as a dramatic set piece. For 389 U.S. hospitals hit by ransomware in 2025 alone, it was a Tuesday.

I keep coming back to this gap between fiction and operations. The entertainment industry has figured out that hospital cyberattacks make great television because the stakes are life and death and the failure mode is visceral — humans scrambling without the tools they've been trained to rely on. What the entertainment industry hasn't figured out, and what most healthcare security vendors haven't either, is that the attack surface making this possible is not the EMR. It's the building.

The systems that fail first and recover last in a healthcare cyberattack are not the clinical applications. They're the operational technology: HVAC controllers, pneumatic tube systems, nurse call stations, infusion pump networks, building management systems, medical gas monitors. These are the systems that turn a ransomware event from a billing disruption into a patient safety crisis. And almost nobody is scanning them.

The building is the attack surface

I spent a week last year walking the basements of three hospitals — two mid-size, one rural critical-access — documenting every networked device that wasn't a workstation or a printer. The list was longer than anyone expected.

Building management systems running BACnet on flat networks with no segmentation. Pneumatic tube controllers on Windows Embedded that hadn't seen a patch since 2019. HVAC units with default credentials accessible from the same VLAN as the nursing stations. Nurse call systems whose firmware update process required a vendor visit that nobody had scheduled in four years. Medical gas monitoring panels with Modbus/TCP interfaces that responded to unauthenticated queries from any device on the subnet.

None of these showed up in any of the three hospitals' IT asset inventories. The IT teams knew about servers, workstations, network gear. The facilities teams knew about the physical plant. Nobody owned the gap between them, and that gap was running on protocols designed in the 1990s for environments where the threat model was "the serial cable might get unplugged."

This is what I mean when I say the building is the attack surface. A ransomware operator who lands on a hospital network and finds BACnet devices responding to unauthenticated read/write requests doesn't just have leverage for a ransom. They have the ability to alter HVAC setpoints in surgical suites, disable nurse call systems, disrupt pneumatic tube routing, and interfere with medical gas monitoring — all without touching a single clinical application. The EMR can stay up and running. The hospital still can't safely operate.

In the Pitt pilot, the chaos starts with monitors going dark. In the real incidents I've reviewed, the chaos starts with the infrastructure nobody thought of as "cyber."

AI made the attacker faster than your patch cycle

The asymmetry in healthcare cybersecurity has always been stark. Attackers move fast; hospitals move slow. But AI has made this gap qualitatively different, not just wider.

I've been tracking the evolution of healthcare-targeting ransomware groups for two years now. The shift started in late 2024 and accelerated through 2025. What changed wasn't the malware itself — it was the reconnaissance. Groups are using LLM-assisted tooling to automate the exact kind of OT discovery I described doing manually in those hospital basements. Automated BACnet enumeration. Automated Modbus device fingerprinting. Automated identification of which building-management protocols are exposed and what default credentials they ship with.

A human operator doing this manually against a single hospital takes days. An AI-assisted pipeline does it against dozens of targets concurrently, triaging which hospitals have the most exposed OT infrastructure and therefore the most operational leverage for a ransom demand. The attacker doesn't need to understand HVAC control systems. The LLM reads the protocol documentation and generates the probe sequences.

This should worry every healthcare CISO, regardless of institution size. The economics of targeting have changed. It used to be that a 50-bed rural hospital wasn't worth the effort compared to a 500-bed academic medical center. When reconnaissance is automated and nearly free, every hospital with an internet-facing IP range is a candidate. The attacker's AI doesn't care about your bed count. It cares about your exposed BACnet port.

The 2025 numbers bear this out. Of those 389 ransomware incidents in U.S. hospitals, 41% were at facilities with fewer than 100 beds. The long tail of small and mid-size hospitals — the ones least likely to have dedicated security staff — is now the primary target pool. The Pitt scenario isn't reserved for large urban trauma centers with dramatic storylines. It's playing out in critical-access hospitals where the nearest backup facility is forty miles away and the IT department is one person who also manages the phone system.

Compliance frameworks know this but compliance alone doesn't fix it

HIPAA's Security Rule has required technical safeguards since 2005. The problem was never the requirement — it was the gap between what the regulation demands and what hospitals actually have the capability to assess.

Ask a compliance officer at a 120-bed community hospital whether they've conducted a risk assessment of their OT infrastructure. Most will tell you their risk assessment covers the EMR, the billing system, the patient portal, and the network perimeter. Maybe the medical devices if they're forward-leaning. Almost never the building management systems, the pneumatic tubes, the nurse call stations, or the HVAC controllers. Not because they don't care, but because they don't have a scanner that speaks BACnet and Modbus, and their compliance framework templates don't have a line item for "building automation system running unauthenticated protocol on clinical network."

This is where I think the industry has failed healthcare. We've built compliance tools that check boxes against the EMR and call it done. We've built vulnerability scanners that speak HTTP and SSH and call the network "scanned." Neither finds the Siemens HVAC controller with default credentials on the same subnet as the cardiac monitors.

The frameworks themselves are getting better. NIST's Healthcare Cybersecurity Framework, HHS's updated HIPAA Security Rule proposed in 2025, and the FDA's premarket cybersecurity guidance all now explicitly call out connected medical devices and operational technology. But a framework that says "assess your OT" is useless without a tool that can actually do it safely — without disrupting the devices it's scanning, without flooding fragile BACnet networks with traffic they can't handle, and without requiring a protocol expert on-site to interpret the results.

What OT scanning in healthcare actually looks like

This is the problem we built cyprobe to solve. It's a purpose-built OT and SCADA discovery and posture engine — not a network scanner with OT support bolted on as a feature checkbox. A tool that speaks the native protocols — BACnet, Modbus/TCP, EtherNet/IP, DICOM, HL7 MLLP — and understands what the responses mean in a healthcare context.

Here's what a cyprobe scan against a hospital network actually produces. First, it discovers every OT device on reachable subnets using protocol-native probes. No aggressive port scanning. No banner grabbing. It speaks BACnet Who-Is and reads the device's object list. It sends Modbus function code 43 to read device identification. It does DICOM C-ECHO to find imaging endpoints. The output is a typed inventory: device class, manufacturer, firmware version, protocol, authentication status, network segment.

Second, it assesses posture against the device's actual capabilities. Can this BACnet device be written to without authentication? Does this Modbus device respond to function codes that should be restricted? Is this DICOM endpoint accepting associations from any calling AE title? Is this HL7 interface transmitting PHI over an unencrypted channel?

Third, it maps what it finds to the compliance frameworks that matter. Each finding carries references to HIPAA Security Rule sections, NIST CSF subcategories, and the specific controls from whatever framework your compliance team is working against.

The output is not "port 47808 is open." The output is "Siemens Desigo CC building controller at 10.4.12.15 accepts unauthenticated BACnet WriteProperty requests on HVAC setpoint objects for the surgical suite air handling unit. HIPAA 164.312(a)(1) — access control. Risk: an attacker on this network segment can alter surgical suite environmental conditions without credentials."

That's the difference between a network scan and an OT assessment. One tells you a port is open. The other tells you what it means for patient safety.

Posture, not just scanning

Scanning is the beginning, not the end. What I've learned building Cybrium is that healthcare organizations — especially smaller ones — need a platform that turns scan results into a compliance posture they can act on and maintain over time.

A 50-bed critical-access hospital doesn't have a team of five security engineers to triage findings. They might have one IT person who also manages the phone system and the printer fleet. What they need is not a 200-page vulnerability report. They need a prioritized action list that says: here are the three things that will reduce your risk this week, here's exactly what to do for each one, and here's how this maps to the HIPAA requirement your auditor is going to ask about in October.

This is what the Cybrium platform does around cyprobe's scan results. Every finding feeds into a compliance posture dashboard that maps your OT, network, and application security state against HIPAA, NIST CSF, HHS CPGs, and — if you're in scope — PCI DSS and SOC 2. The dashboard is not a checklist. It's a live score that changes as your environment changes, because we run continuous assessments, not annual point-in-time audits that are stale before the PDF is delivered.

For healthcare organizations that need to demonstrate compliance to auditors, insurers, or regulators, the platform generates evidence-backed reports. Not "we believe we are compliant." Instead: here is the scan data, here are the controls it maps to, here is the residual risk, here is what we're doing about it, and here is the timeline. Auditors can drill into the underlying findings. Insurers can see the trend over quarters. Regulators get the artifact they need without the hospital spending six weeks preparing for the review.

We price healthcare at $5 per bed per month. That's deliberate. A 50-bed critical-access hospital pays $250 a month for the same platform, the same scanners, and the same compliance mapping that a 500-bed system gets. The attack surface doesn't care about your revenue. The security tooling shouldn't either.

The recomposition happening in healthcare security

What's happening in healthcare cybersecurity right now is a recomposition of the same kind that's reshaping the rest of the security industry, but with higher stakes and tighter constraints.

For twenty years, hospitals have bought point solutions: one tool for the EMR, one for the network, one for endpoints, and nothing for OT. Each tool produces its own reports, speaks its own language, and gets reviewed by a different person — if it gets reviewed at all. The result is a pile of point-in-time assessments that nobody can synthesize into a coherent picture of institutional risk.

The Pitt scenario — everything goes dark at once — happens precisely because these systems are connected in ways that no single point solution can see. The ransomware that encrypts the EMR server also encrypts the BACnet controller that happens to be on the same network segment because nobody ever segmented it, because no scanning tool ever flagged it as a risk, because no compliance framework ever asked about it specifically enough to force the question.

The recomposition is toward platforms that see all of it. OT and IT. Clinical and facilities. Network and application. Compliance and posture. Not because platforms are inherently better than point solutions — sometimes they're worse — but because in healthcare, the attack chains cross every boundary that point solutions draw, and the risk can only be assessed by something that sees across those boundaries.

That is the bet we've made with Cybrium. One platform that scans your code, your cloud, your network, your web applications, your AI infrastructure, and your operational technology. That maps every finding to the compliance frameworks your auditor cares about. That gives a 50-bed clinic the same quality of security posture visibility that a 500-bed academic medical center gets from a dedicated security team.

If you're running a hospital, a clinic, or a healthcare IT organization of any size and you want to know what your OT attack surface actually looks like — not what you hope it looks like, but what an attacker with an LLM and a free afternoon would see — find me at anand@cybrium.ai. The Pitt scenario makes great television. It makes a terrible incident report.

Every CISO Needs an AIBOM in 2026 — Here's What Vendors Get Wrong

Grumpy Sage — Thu, 14 May 2026 15:19:36 +0000

A friend of mine runs security at a mid-sized fintech. Last month her board asked a question that should have been simple: "How many AI models are in production, and where did they come from?"

She had a vendor-provided AIBOM. It listed seventeen "AI components" — which turned out to be seventeen pip packages with names like transformers and langchain. That was the entire inventory. No mention of the three fine-tuned Llama variants her ML team had pushed to a Triton server two quarters earlier. No mention of the embedding model running inside their support chatbot. No mention of the GPT-4o calls their underwriting workflow had been making since January. No mention of the system prompts, which contained — she found out later, the hard way — a hardcoded admin override phrase a contractor had added during a hackathon.

She called me at nine on a Tuesday. "I paid six figures for this, Anand. It's an SBOM with a model column."

She wasn't wrong. And she wasn't alone. I've now seen six AIBOM products in the last twelve months that are functionally identical: they scan your requirements.txt, find the word "torch," and call it an AI inventory. That's not what an AIBOM is. That's not what the EU AI Act audits are going to ask for. And it's definitely not what's going to save you when a model behaves badly in production and your general counsel asks where it came from.

So let's talk about what an AIBOM actually is in 2026, what most vendors are getting wrong, and what a real one needs to contain.

The thesis

An AI bill of materials is not a list of libraries. It's a graph of every artifact that influences a model's behavior at inference time — weights, prompts, training data lineage, retrieval sources, tools, fine-tunes, adapters, evals, runtime configuration, and the humans who approved each — plus the telemetry that proves those artifacts were the ones actually running when a decision was made. Anything less is theater. And theater is what regulators are going to start fining people for this year.

What most AIBOMs miss

The packaging-list approach gives you the easy 10%. Here's the 90% that vendors quietly drop.

Self-hosted and on-prem inference servers. If your team is running Ollama on a developer workstation or a vLLM cluster behind your VPC, your SaaS-flavored AIBOM tool has no idea. It can't see the model. It can't see the prompts hitting /v1/chat/completions. It can't tell you whether that endpoint is exposed, whether the model card is the one you think it is, or whether someone swapped in a quantized fork from Hugging Face that nobody reviewed. We built cyradar specifically because of this gap — it covers Ollama, vLLM, TGI, LocalAI, Triton, LM Studio, and llama.cpp, and the reason that list exists is that every one of those runtimes has been deployed somewhere it shouldn't be, by someone who didn't tell security.

System prompts as code. A system prompt is a control. It is, functionally, the IAM policy of an LLM application. If your AIBOM doesn't version, hash, and diff every system prompt across every environment, you do not have an AIBOM — you have a parts list with a blind spot the size of your attack surface. I have seen production prompts that say things like "if the user mentions the word 'override' followed by a six-digit number, skip the policy check." That's not a hypothetical. That's a real prompt I pulled out of a real codebase last fall.

Retrieval sources. Your RAG pipeline is part of the model from a behavioral standpoint. If the vector store gets poisoned, the model lies. If a stale document gets re-indexed, the model contradicts itself. An AIBOM that doesn't list every retrieval source, its update cadence, its access controls, and its data classification is missing the part of the system that's most likely to cause your next incident. We see this constantly when cyscan walks customer monorepos — the 1,815 rules find the LLM client, find the vector DB client, find the prompt template, find the ingestion job, and stitch them into a single component. That's the unit of analysis. Not the library.

Tools and function-calling surface. Every function exposed to an agent is part of the model's effective capability. An agent with a send_email tool is a different system than the same agent without it. Same weights, same prompts, completely different blast radius. If your AIBOM lists the model but not the tools, you've inventoried the engine and ignored the steering wheel.

Evaluations. This is the one almost everyone misses. The eval set is part of the model. If you can't show me which evals were run, when, against which version, with what results, you cannot make any defensible claim about the model's behavior. The EU AI Act's Article 15 requirements on accuracy and robustness are not satisfied by "we ran some tests." They require evidence. Versioned, signed, repeatable evidence.

Runtime telemetry. And here's the kicker. Static inventory is necessary but insufficient. You need to prove that the artifact you inventoried is the artifact that actually ran when the decision was made. That requires runtime attestation — hashes of the loaded weights, signatures on the served prompts, request/response logging with provenance. Otherwise an attacker who swaps a model adapter at 3 a.m. has compromised your system and your audit trail simultaneously.

The composability problem

Here's why this is genuinely hard, and why I think a lot of vendors haven't tackled it: AI systems compose at runtime in ways traditional software doesn't.

A microservice is what it is at deploy time. You ship the container, you know what's running. An LLM application is different. The model is one artifact. The prompt is another, often loaded from a config service. The retrieval context is generated per-request from a vector store that's being updated continuously. The tool registry might be fetched from a service mesh. The user's session history influences behavior. The model's own previous outputs feed back in via memory.

So the "bill of materials" for a single inference call is computed on the fly. It's not a static document. It's a query.

This is the part that breaks the SBOM mental model. SBOMs are documents. AIBOMs need to be live indexes that can answer the question: "For the inference request that produced output X at timestamp T, what were all the artifacts in play, what were their versions, and who approved each?"

If your AIBOM can't answer that question, it's not going to survive a regulator's first follow-up email.

What a real AIBOM contains

Let me be concrete. Here's the minimum I'd accept on a vendor review in 2026.

A discovered inventory of every model artifact — base models, fine-tunes, adapters, LoRAs, quantizations — with cryptographic hashes, source provenance (where did it come from, who signed it, what license), and the environments where it's deployed. This includes the self-hosted runtimes I mentioned. If your tool can't enumerate a vLLM deployment, you have a hole.

A versioned registry of every prompt — system prompts, user prompt templates, tool descriptions, guardrail prompts — tied to the application and the model version it pairs with. With diff history. With approval metadata. With injection-test results.

A catalog of every retrieval source, with data classification, update cadence, access controls, and the embedding model used to index it. Plus the ingestion pipeline that populates it, because that pipeline is itself an attack surface.

A function and tool registry: every callable exposed to every agent, with input/output schemas, the permissions it executes under, and the human approval trail.

A linked eval history: which evals ran against which version, with results, signed, with reproducibility metadata.

Runtime telemetry that ties inference requests back to all of the above, with enough fidelity to reconstruct any given decision after the fact.

And — this is the part vendors hate hearing — it needs to be open and queryable, not a PDF export. If I can't run SQL against it or hit it from an MCP server, it's not operational. It's a compliance artifact. Those are different things.

Why the platform shape matters

This is the part where I tell you what we built and why, and you can take it as marketing or as honest reasoning, your call.

We didn't set out to build an AIBOM product. We set out to build the security tools an AI-native engineering org actually needs: cyscan for code and configuration analysis (1,815 rules across 75-plus languages, because AI code is polyglot in a way nothing has ever been), cyradar for AI runtime and inference exposure, cyweb for web application fuzzing with proper LLM-aware coverage (22 fuzz categories, 95% template conversion versus the upstream community feed).

The AIBOM emerged as the thing that ties them together. Because if cyscan finds a model invocation in your repo, and cyradar finds an inference endpoint on your network, and cyweb finds a prompt-injection-reachable form on your website — those are the same system. The value isn't three reports. The value is one graph.

That's why our MCP server exposes ten tools across the platform: so an LLM can do what auditors and incident responders do, which is pivot between code, runtime, web surface, and inventory. "Show me every model in production." "Now show me the prompts paired with that model." "Now show me which of those prompts were modified in the last thirty days." "Now show me whether the runtime endpoints serving them are externally reachable." That's an investigation. That's the unit of work. A vendor that only sees one of those layers cannot answer those questions.

I'm not arguing every customer needs all four tools. I'm arguing that the AIBOM is the wrong thing to buy as a standalone product, because the inventory is only as good as the discovery surface feeding it. And the discovery surface needs to cover code, runtime, web, and configuration to be useful in 2026.

The recomposition

What I've noticed in the last six months is that the categories from the last decade are dissolving. AppSec, CSPM, SBOM, ASPM, DSPM, AI governance — these were sold as separate products because the underlying problems used to be separable. They aren't anymore.

A modern AI application is a piece of code, that calls a model, that runs in a container, that pulls from a vector store, that exposes a web interface, that processes regulated data, that gets approved by a governance workflow. Securing it means seeing all of those layers as one thing. The product category that emerges from that is not "AIBOM." It's not "AI-SPM." It's something we don't have a great name for yet — but it's the security layer for AI-native engineering. The AIBOM is a view into that layer. An important view. But just a view.

The vendors who are selling AIBOMs as standalone products in 2026 are, I think, repeating the SBOM mistake from 2022. SBOMs as standalone products didn't work either. They became useful only when they were embedded into vulnerability management, supply chain attestation, and incident response workflows. AIBOMs will follow the same path, but faster, because the regulatory pressure is sharper and the architectural surface is bigger.

If you're a CISO evaluating AIBOM vendors right now, the question I'd ask is not "what fields does your AIBOM contain." The question is: "Show me the workflow that starts at a runtime alert and ends at the line of code, the prompt version, the approver, and the eval result that allowed it through." If they can demo that, they have an AIBOM. If they can only export a JSON file, they have a parts list.

My friend at the fintech has since replaced her vendor. She now has an inventory she can actually defend in front of her board, and — more importantly — in front of the regulator that's coming for her industry next year. The thing that haunted her about her old AIBOM was that the missing models had been in production for three quarters. The inventory said zero. Reality said three. That gap is the entire problem. Closing it is the entire job.

If you want to see what real AI inventory looks like — across code, runtime, and web surface — start with cybrium.ai/cyscan and cybrium.ai/cyradar. If you want to talk through your AIBOM strategy, or have me look at the one you just bought and tell you what it's missing, find me at anand@cybrium.ai.

Why Every CISO Needs an AIBOM in 2026 — And What Vendors Miss

Grumpy Sage — Mon, 11 May 2026 15:56:58 +0000

A friend of mine runs security at a mid-market fintech. Last month she got asked a question by her board that should have been trivial: "How many AI models are in production at our company, and where did they come from?"

She had a vendor-provided AIBOM. A real one. Generated by a well-known platform you've heard of. She pulled it up on the projector during the board meeting.

The AIBOM listed 14 models. She knew there were more.

After the meeting she spent two days with her platform team running their own inventory. The real number was 47. Some were embedded in SaaS tools her business teams had bought without telling her. Some were running locally on engineering workstations — llama.cpp instances developers had spun up to avoid the OpenAI rate limits. Two were fine-tuned variants of Llama 3 that a data science team had deployed inside a Kubernetes namespace nobody was scanning. One was a vLLM server somebody stood up on a GPU node six months ago and forgot about.

The vendor AIBOM had captured the API-based stuff. Anthropic. OpenAI. Bedrock. Easy targets. Everything that left a billing trail.

What it missed was the actual AI surface area. The part that sits inside her perimeter, runs on her hardware, processes her data, and has no rate limit or vendor SOC 2 to fall back on. The part that, if compromised, doesn't ring an alarm at a third party.

This is the AIBOM problem in 2026. The artifact exists. The compliance checkbox gets ticked. And the inventory is still wrong.

The thesis

An AIBOM is not an SBOM with a "model" row added. It's a fundamentally different artifact because AI systems have a fundamentally different supply chain — one that includes weights, prompts, embeddings, retrieval indexes, fine-tuning datasets, inference runtimes, and the agent scaffolding that ties them together. If your AIBOM doesn't capture all of those, what you have is a marketing document. And most of what's being shipped right now is exactly that.

What an AIBOM actually has to contain

Let me be specific, because the vendor space has gotten lazy about this.

A real AIBOM tracks the model itself — name, version, weights hash, license, provenance. That's the easy part. The part everyone gets right.

Then it has to track the inference runtime. This is where the wheels start coming off. Are you running Ollama? vLLM? TGI? LocalAI? Triton? LM Studio? llama.cpp? Each of those has its own CVEs, its own auth model, its own default configurations, and its own attack surface. A Llama 3 8B running on vLLM behind proper auth is a different risk than the same weights running on a default Ollama install with the API exposed on 0.0.0.0. The AIBOM has to know the difference.

Then the data lineage. What did the model get trained on? What does it get fine-tuned on? What sits in the retrieval index it's pulling from at inference time? An AIBOM that doesn't capture the RAG corpus is missing maybe 40% of the actual attack surface, because that's where prompt injection lives now. The model is fine. The PDFs your sales team uploaded last Tuesday are the threat.

Then the prompt layer. System prompts, tool definitions, agent loops, MCP server bindings. If your model has access to ten tools through an MCP server, those ten tools are part of the bill of materials. If one of them is a "send_email" tool with no human approval gate, that's a fact your AIBOM should be screaming about. Not buried in an appendix.

Finally, the runtime context. What network does this thing live on? What service account does it run under? What does it have IAM access to? You cannot reason about AI risk without that context, because the same model is a different risk profile depending on whether it can read your S3 buckets.

If you accept that list, you've already disqualified maybe 80% of the AIBOM tooling on the market. Most of it stops at "model name + version + license."

Where vendors go wrong, specifically

I want to name patterns, not vendors, because the patterns will outlive the vendors.

Pattern one: the SBOM-with-extra-columns approach. Some vendor took their existing software composition analysis tool and added a "model" detection rule. They find references to openai in your package.json and call that an AIBOM entry. This catches nothing self-hosted, nothing embedded in vendor SaaS, and nothing running outside the codebase you happen to be scanning. It's a checkbox.

Pattern two: the API-trail approach. Vendor watches your egress traffic or your cloud billing and infers AI usage. Better than nothing — catches shadow Anthropic accounts. But useless for anything inside the perimeter. A vLLM server on your internal GPU cluster generates zero egress traffic. It also generates zero AIBOM entries in this model.

Pattern three: the survey approach. Vendor sends a questionnaire to your dev teams. "List all AI systems in production." This is governance theater. The teams that fill it out conscientiously are not the teams you're worried about.

Pattern four: the model-registry approach. Vendor integrates with MLflow or SageMaker Model Registry and treats that as ground truth. Great if your entire organization uses one model registry. Nobody's entire organization uses one model registry. The shadow Ollama instance isn't in MLflow.

What all four of these share is that they're trying to generate an AIBOM from one perspective — the codebase, the network, the people, or the registry. AI systems live across all of those. You need detection that lives across all of those too.

The detection problem is a code problem first

Here's an opinionated take. The single highest-leverage place to build AI inventory is the codebase itself. Not because that's where everything lives, but because that's where most of the self-hosted, embedded, and shadow stuff originates. Somebody, somewhere, wrote an import statement.

This is what cyscan does in our platform. We've got 1,815 detection rules across 75+ languages, and a meaningful chunk of those are AI-specific patterns — runtime imports, model loading calls, agent framework usage, embedding library references, MCP client instantiations. If a developer imported vllm or instantiated an Ollama client or wired up a LangChain agent with a tool list, we want to know.

cyscan ai-inventory --repo ./monorepo --output aibom.json

The output isn't a list of models. It's a graph. Here's a service that loads Llama-3-8B-Instruct, runs it on vLLM, exposes it on port 8000, and is called by these three other services, one of which has an MCP server attached with these four tools. That's an AIBOM entry that you can actually reason about.

But code scanning alone isn't enough — that's the lesson I keep watching CISOs learn the expensive way. Code tells you what should exist. It doesn't tell you what's actually running on the GPU node nobody documented.

The runtime side: scanning what's actually live

This is where cyradar comes in, and where the architectural choice we made matters. cyradar specifically targets the self-hosted inference layer — Ollama, vLLM, TGI, LocalAI, Triton, LM Studio, llama.cpp. We picked those seven because they cover almost everything self-hosted in 2026. If you've got a GPU running an LLM, it's almost certainly one of those.

The point isn't just to find them. The point is to fingerprint them. What model is loaded? What version of the runtime? Is the auth configured? Is the API exposed on the management network or the data network? What's the context window, the max tokens, the system prompt baked in at startup?

cyradar discover --cidr 10.0.0.0/8 --runtimes all
cyradar fingerprint --target 10.4.12.88:11434

That second command tells you not just "there's an Ollama at this IP" but "there's an Ollama 0.5.7 with llama3:70b and nomic-embed-text loaded, the API is open, no auth, last queried 14 minutes ago." That's an AIBOM entry the code scanner can't produce because the code that spun this up may not exist in any repo you scan. Someone ran ollama pull on a server.

Combine the code-side inventory with the runtime-side inventory, reconcile them, and now you have something that looks like a real AIBOM. The reconciliation is the hard part. The code says service X should be talking to Ollama. The runtime says Ollama is running on host Y. Are those the same instance? You need topology.

The agent and tool layer

I said earlier that tools are part of the bill of materials. I want to push on that, because it's where I see the most magical thinking in current AIBOM standards.

In 2024 you had models. In 2025 you had models with tools. In 2026 you have agents with toolchains that span MCP servers, traditional APIs, and other agents. The "thing" you're inventorying isn't really a model anymore. It's a capability graph.

Our own MCP server exposes 10 tools. Each one represents a capability — scan a repo, fingerprint a runtime, pull a fuzz template, query the rule database. Any agent that connects to our MCP server inherits those 10 capabilities. If your AIBOM lists "Claude" as one entry, you've underspecified the system by an order of magnitude. The relevant entry is "Claude + these MCP servers + these tool permissions + this system prompt + this RAG corpus."

That's a mouthful. It's also reality. Any AIBOM standard that can't express that — and most of the current ones can't, cleanly — is going to be obsolete within a year.

Web-facing AI surface, which everyone forgets

The other gap I see constantly: AI in the web tier. Chatbots embedded in marketing sites. AI search bars. Internal admin tools with an LLM assistant bolted on. Customer support widgets backed by some RAG pipeline somebody set up in a hurry.

These rarely show up in model registries. They rarely show up in code scans of the main monorepo because they live in their own little frontend repo. They almost never show up in network discovery because they call out to a vendor, not in.

cyweb's 22 fuzz categories include LLM-specific ones — prompt injection across the wire, jailbreak attempts via input fields, system prompt extraction, tool invocation abuse. When we scan a web property, we're not just looking for SQLi anymore. We're testing whether the friendly chatbot in the bottom corner can be talked into revealing the system prompt or executing tool calls it shouldn't. If it can, that goes into the AIBOM as a finding, attached to the model and runtime entry for that chatbot.

Our 95% template conversion rate vs upstream community templates matters here because the upstream community is fast — new prompt injection payloads land daily, and the gap between "known technique" and "we can test for it" needs to be small. An AIBOM that catalogs your AI systems but can't test them is a museum exhibit.

Why one platform

I keep getting asked why we built all of this — cyscan, cyradar, cyweb, the MCP server — instead of just picking one and going deep. The answer is exactly the AIBOM problem we've been talking about.

You cannot generate a real AI bill of materials from one vantage point. Code-only misses runtime. Runtime-only misses provenance. Network-only misses the SaaS-embedded stuff. Survey-only misses everything anyone forgot. To get an inventory that's actually correct, you have to triangulate from at least three of those.

If those three tools are bought from three vendors with three data models, the reconciliation happens in a spreadsheet maintained by an exhausted security engineer. I've watched this fail in real organizations. The spreadsheet drifts. The board gets the wrong number.

When the inventory comes from one platform with one data model, reconciliation is a join, not a meeting. That's the architectural choice. It's not about wanting to sell more SKUs. It's that the AIBOM problem is fundamentally a correlation problem, and correlation across vendor boundaries doesn't work.

The recomposition

Here's what I think is actually happening, beyond the AIBOM specifically.

The security industry spent twenty years building tools for a world where software was deployed by humans, ran in known places, and changed on quarterly release cycles. Every tool category — SAST, DAST, SCA, EDR, CSPM — assumes that model.

AI broke the model. Software is now partly deployed by agents, runs in places nobody documented, and changes when a developer types ollama pull. The asset isn't a server or a service anymore. It's a capability graph that includes weights, prompts, tools, data, and runtime. The discovery problem isn't "what hosts do I have" but "what can my systems do, and who taught them to do it."

The AIBOM is the first artifact that tries to express this. The current versions of it are bad because the standards bodies are still thinking in SBOM terms. The good versions, the ones that will actually matter when regulators start asking for them — and they will, by end of 2026 in at least three jurisdictions I'm tracking — those are going to look like capability graphs, not parts lists.

The vendors who get this right are the ones rebuilding their data model around the AI supply chain rather than retrofitting their old one. Everyone else is going to spend 2027 explaining to their customers why the AIBOM they shipped missed half the surface area.

What to do Monday

If you're a CISO reading this and your current AIBOM came from a vendor demo, do one experiment. Run your own inventory — survey the engineering teams, scan the internal network for the seven self-hosted runtimes, grep the monorepo for AI imports. Compare your number to the vendor's number.

If they match, congratulations, you picked well. If they don't, you have a problem that no compliance report will surface until something goes wrong.

We can help with the inventory side. cyscan handles the code, cyradar handles the runtime, cyweb handles the web-facing surface, and the MCP server lets your own agents query the AIBOM directly — which is, in a meta way, how I think AIBOMs will mostly get consumed by 2027 anyway. By other agents.

If you want to talk through yours, find me at anand@cybrium.ai.

Why I Stopped Letting Claude Shell Out for Security Scans

Grumpy Sage — Mon, 11 May 2026 04:44:53 +0000

A founder I know spent last Tuesday night debugging what he thought was a Claude bug. He'd wired up Claude Code to his repo with the default shell tool, asked it to "scan this codebase for secrets and SQL injection," and watched it confidently produce a clean report. Zero findings. He shipped to staging. Twelve hours later his Datadog alert fired on a Postgres error trace that exposed a hardcoded service account key in a config file Claude had supposedly scanned.

He called me at 11pm. We screen-shared. The problem was almost funny once we saw it. Claude had run cyscan — correctly, with the right flags — against the wrong directory. It had cd'd into a subfolder earlier in the conversation to read a file, never cd'd back, and then run the scan from there. The scan completed in 400ms because there were six files in scope. Claude wrote up a confident summary of those six files, called it a codebase audit, and moved on.

That's not a Claude failure. That's a tool design failure. Shell is a terrible interface for a security scanner when the caller is a probabilistic agent with no model of working directory state, no schema for what "done" looks like, and no way to know if the tool it just invoked actually understood the request. The whole exchange was vibes. The agent produced confident output because shell tools produce stdout and stdout looks like an answer.

I've been building Cybrium for two years now, and the single most important architectural decision we made in the last six months was to stop telling people to invoke our scanners via shell. Today everything routes through an MCP server. Ten tools. Typed inputs. Structured outputs. No working directory drift. Let me explain why this matters and what we learned along the way.

The thesis

If your agent talks to security tooling over a shell, you've built a system where the agent's confidence is decoupled from the scanner's actual coverage. MCP fixes this by making the contract between agent and tool explicit, machine-checkable, and inspectable after the fact. This isn't a UX upgrade. It's the difference between a security pipeline you can audit and one you cannot.

What "default shell tool" actually gives you

When Claude Code, Cursor, or any agent runs cyscan --path . --format json through a bash tool, here's what's actually happening. The agent constructs a string. The string goes to a shell. The shell maintains its own state — working directory, environment variables, prior exit codes — that the agent only partially observes. The scanner runs, writes to stdout, maybe also stderr, exits with a code, and the agent reads it all back as a single blob of text that it then has to parse.

Every step there is a place where things break in ways the agent can't see.

The agent doesn't know if cyscan was the binary it thought it was, or some alias, or a different version on PATH. It doesn't know if the path it passed was a symlink, was expanded by the shell glob, or got truncated. It doesn't know if stderr contained warnings that materially change the meaning of stdout. It doesn't know if the exit code maps to "clean scan" or "scanner crashed after partial run." It just sees text.

And here's the part that haunts me as someone shipping a security product: the agent doesn't know how many rules ran. If cyscan ran 1,815 rules across 75+ languages on a 200-engineer monorepo, that's one outcome. If it ran 12 rules because it only found two file types in the subdirectory it was actually pointed at, that's a completely different outcome. Stdout looks similar in both cases — a JSON array of findings, possibly empty. The agent summarizes "no findings." The CISO sleeps poorly.

Shell tools optimize for human flexibility. Humans cross-reference, notice anomalies, get suspicious when a scan finishes too fast. Agents don't, at least not reliably, and certainly not under pressure when they're four turns deep in a conversation about something else.

What MCP changes structurally

Model Context Protocol is, at its core, a typed RPC layer between agents and tools. That sounds dry. The implications are not.

When Claude calls cyscan_repository through our MCP server, it isn't writing a shell string. It's calling a function with a typed schema. The schema declares that path is required, that it must be an absolute path, that language_filter is an optional enum, that rule_packs defaults to "all 1,815." The MCP server validates the call before our scanner ever runs. If the agent forgets a required arg, the call fails with a structured error the agent can actually reason about — not a bash error that says "missing argument" in some format the agent has to text-match against.

The response is structured too. Not stdout. A JSON object with fixed fields: scan_id, files_scanned, rules_executed, findings, coverage_metadata, duration_ms. The agent doesn't have to parse anything. It just reads files_scanned. If that number is 6 when the repo has 4,000 files, the agent has a fighting chance of noticing, because files_scanned is a first-class field that the agent's system prompt can be told to check.

This is what I mean by making the contract machine-checkable. With shell, "did the scan actually scan the thing" is a vibes question. With MCP, it's a field.

The ten tools and why ten

Our MCP server exposes exactly ten tools right now. I get asked sometimes why so few — surely a security platform has more surface area than that. The answer is that ten is the result of a lot of arguing about granularity.

Too few tools and each tool becomes a god-function with twenty parameters and the agent has to learn a sub-language to drive it. Too many tools and the agent's context window fills with tool descriptions before it's done a single useful thing. Ten was where we landed after watching agents actually use the server for three months.

The tools split roughly into three families. Code and repo scanning lives in cyscan-backed tools that handle static analysis across 75+ languages. AI-specific scanning lives in cyradar-backed tools that probe local inference endpoints — Ollama, vLLM, TGI, LocalAI, Triton, LM Studio, llama.cpp — for the kinds of misconfigurations that don't show up in any conventional vuln scanner. Web and API fuzzing lives in cyweb-backed tools that drive our 22 fuzz categories with 95% template-conversion fidelity against upstream community signatures.

Each tool does one thing. The agent composes them. That composition is where the real power lives, and it's also what shell tools fundamentally can't do, because shell composition happens through pipes and string parsing instead of through structured data the agent actually understands.

A concrete example

Here's the kind of workflow that's trivial with MCP and miserable with shell. Suppose I want my agent to do a full security pass on a new microservice: scan the source for vulns and secrets, then if any of the findings touch an AI inference path, probe the running inference endpoint for those specific issues, then if any of the findings touch an HTTP route, fuzz that route with relevant templates.

With shell, this is a small program. The agent has to invoke cyscan, parse the output, build a follow-up command, invoke that, parse, build another. Every parsing step is a place where the agent can hallucinate field names, miss findings, or get tripped up by formatting changes between versions. I've seen agents miss findings because they expected severity and got risk_level.

With MCP, here's roughly what it looks like from the agent's side:

1. call cyscan_repository(path=/repo/order-service)
   -> returns findings[] with structured types
2. for findings where category == "ai_inference":
     call cyradar_probe(endpoint=finding.endpoint, checks=["prompt_injection","model_extraction"])
3. for findings where category == "http_route":
     call cyweb_fuzz(target=finding.url, template_packs=relevant_packs)
4. call generate_report(scan_ids=[...])

The agent doesn't write parsing code. It doesn't construct strings. It calls functions on objects. When cyradar_probe finds a prompt injection vector against a llama.cpp endpoint, that finding is a typed object with a CVE-style identifier, a severity, a remediation hint, and a pointer back to the originating cyscan finding. The lineage is preserved. The audit trail is automatic.

You can build something similar with shell. People do. It involves jq, bash heredocs, and a lot of prayer. It is not robust to scanner version updates, scanner output changes, or agent context drift across turns. I have watched these pipelines work flawlessly for two weeks and then silently start dropping findings because someone added a field to the JSON output and the jq filter didn't match anymore. Nobody noticed for a month.

The state problem

This is the one I care about most, and it's the one that bit the founder I mentioned at the top. Shell sessions have state. Agents have an imperfect model of that state.

Working directory is the obvious one but it's not the only one. There's environment variables, which the agent often sets early in a conversation and then forgets about. There's PATH ordering, which can change which binary gets executed. There's shell history affecting tab completion if the agent uses it. There's locale settings affecting how filenames with non-ASCII characters get handled. There's umask affecting permissions on output files. Every one of these is a state surface the agent has to track or risk getting wrong.

MCP tools are stateless by default in the way the protocol is designed. Each call is a self-contained, fully-specified invocation. If you want state — say, a long-running scan whose results you want to retrieve later — that state is explicit and addressable. Our scan_id is a first-class thing. The agent passes it in, gets the same results back, can hand it to another tool. There's no "where am I in the filesystem" question because the filesystem isn't part of the protocol. Paths are arguments. Arguments are typed. The scanner resolves them against a known, fixed base.

This eliminates a whole class of failure mode that I genuinely believe is responsible for most agent-driven security incidents I've seen in the last year. Not zero-days. Not novel attacks. Agents scanning the wrong directory and confidently reporting clean.

Why one server and not ten CLI wrappers

I get the architectural question a lot: why does Cybrium ship one MCP server that exposes scanners, instead of three separate MCP servers wrapping cyscan, cyradar, and cyweb? Why couple them?

Because the findings need to talk to each other. A cyscan SAST finding about an unsafe deserialization in an LLM-output handler is interesting on its own. It becomes urgent when cyradar finds that the upstream inference endpoint accepts prompts from untrusted users. It becomes a P0 when cyweb confirms that the HTTP route exposing that handler is reachable without auth. None of those tools, in isolation, can tell you you have a critical incident chain. The MCP server holds the cross-tool context that makes correlation possible.

You could rebuild this on top of three separate MCP servers if you put a coordinator agent in front of them. People will try. I've tried. The coordinator agent has to know the semantics of findings from each scanner well enough to correlate them, which means baking scanner-specific knowledge into the agent's prompts, which means every scanner version bump becomes a prompt-engineering exercise. We did this. It was bad. Centralizing the correlation in the MCP server itself — where it can be versioned, tested, and updated alongside the scanners — is the better factoring.

The same logic, by the way, is why I don't believe in "bring your own scanner" MCP servers as a long-term architecture. Generic shells over arbitrary security tools sound great in a slide deck. In practice, the semantic gap between tools is where all the value lives, and a generic shell can't bridge it.

The recomposition

What's actually happening across the security tooling industry right now is a quiet recomposition. For fifteen years, the unit of integration was the CLI. You wrote a scanner that emitted SARIF or some custom JSON, and CI systems plumbed it together with bash. That worked when the orchestrator was a human writing YAML.

The orchestrator now is an agent. The agent doesn't write YAML. The agent makes decisions turn-by-turn based on what it just saw. The unit of integration for that world is not CLI output. It's a typed protocol that lets the agent reason about tools the same way a human reasons about a library. MCP is the first credible attempt at that protocol, and the products that win the next five years of security tooling will be the ones that ship native MCP surfaces, not the ones that bolt an MCP wrapper around their existing CLI as an afterthought.

The reason this is recomposition and not just integration is that once you have MCP-native tooling, the right unit of work changes. You stop thinking about "the scan" as a CI step and start thinking about "the security question" as an agent conversation. What did this PR change that touches PII? Did any of those changes introduce new attack surface that wasn't there yesterday? Are the inference endpoints we just deployed exposed to the same prompt injection that bit us last quarter? Those questions don't have YAML-shaped answers. They have agent-shaped answers, and they need tools the agent can actually drive.

What I'd do tomorrow if I were you

If you're using Claude Code or any agentic dev tool with shell access to security scanners right now, I'd do two things this week. Try our MCP server end-to-end on a real repo. The setup is one config block in your MCP client. Compare the findings count against whatever you're getting via shell. I would bet money you find a delta, and I'd bet that delta is in the direction of "MCP found things shell missed because shell was scanning the wrong scope."

The second thing: audit one of your agent conversations from last week. Pick a security-related one. Read the transcript. Count the number of places the agent made an assumption about shell state that it had no way to verify. Then ask yourself how many of those assumptions would still be assumptions if the tool had a typed schema.

You can pull the MCP server from cybrium.ai/mcp. The ten tools are documented there. Source for cyscan, cyradar, and cyweb lives in the same place. If you want to talk through your setup — especially if you're running local inference at scale and worrying about what your agents are actually seeing when they scan it — find me at anand@cybrium.ai.

Four Pillars, One Platform: How Cybrium Unifies Code, Cloud, AI, and GRC

Grumpy Sage — Mon, 11 May 2026 04:05:22 +0000

A friend of mine runs security at a 200-engineer SaaS company. Last winter she got paged at 2 a.m. for an exposed S3 bucket. Customer PII. The bucket had been flagged by their cloud scanner three weeks earlier. The ticket sat in a Jira board owned by the platform team, who had been waiting on an IAM change from the cloud team, who needed sign-off from compliance, who were busy preparing for their SOC 2 audit. By the time the breach was contained, the marketing email had already gone out announcing their new Series B.

She told me later that the part that haunted her was not the breach. It was that the finding had existed. The scanner had done its job. The system around the scanner had not.

I keep coming back to that story because it explains almost every modern breach I have seen. The signal exists. The fix is known. The owners are identifiable. But the four pieces of the puzzle — code, cloud, AI, and governance — live in four separate tools owned by four separate teams, each pretending the others do not exist. The breach is the gap between them.

This is the case I want to make: those four pieces should be one product. Not four products that talk to each other through APIs. One product, one asset graph, one workflow. I am going to use Cybrium as the worked example because it is what my team builds, but the architectural argument generalises.

What the four pillars actually are

I keep these labels short because everyone in security uses them but rarely defines them.

Code is everything that happens before a deploy. SAST, SCA, secrets in repos, infrastructure-as-code, container images, Kubernetes manifests. The unit of work is a pull request.

Cloud is everything that happens after the deploy. Posture in AWS / Azure / GCP, identity, drift, runtime config. The unit of work is a resource.

AI is the new pillar that nobody had three years ago. Who is running what model, where, with what data, calling which tools, exposed how. The unit of work is an asset that did not exist in the old asset taxonomy.

GRC is the layer that turns all of the above into auditable evidence. Frameworks, controls, risk register, trust center. The unit of work is a control.

Now look at the market. Snyk does code very well and reaches into cloud weakly. Wiz does cloud very well and barely touches code. The AI security startups each take one slice — runtime guardrails, prompt injection scanning, model inventory — and assume someone else is doing the other three pillars. Vanta and Drata collect evidence from everything and generate nothing.

This is a feature map, not a strategy. The customer pays for four tools and assumes glue code will make them coherent. It does not. It never does.

Code

I will start with code because it is the best-understood pillar and that makes the gap between best-in-class and standard practice the most visible.

Most SAST tools produce a number that I think of as the friendship-ending number. The CI pipeline says "we found 10,000 issues in your repo this morning," and the developer either ignores it forever or quits Slack. Neither is the response you want.

The fix is reachability. A CVE in a transitive dependency only matters if your code actually reaches it at runtime. Most don't. If you can rank findings by whether a real call path touches them, the friendship-ending 10,000 collapses to something like 12. Twelve is a number a human can act on.

In Cybrium the code engine is a Rust binary called cyscan. It runs:

SAST across 75-plus languages with 1,815 hand-curated rules
SCA with reachability — only CVEs your code can actually reach
Secrets detection (entropy + format + context)
IaC: Terraform, CloudFormation, Bicep, Pulumi, plus Kubernetes manifests
Span-based autofix, so the scanner does not just point at the problem; it produces a code edit you can apply or open as a PR

You can run it locally without ever signing up for anything:

brew install cybrium-ai/cli/cyscan
cyscan .
cyscan supply .                   # SCA with reachability
cyscan fix . --apply              # write the autofixes
cyscan . --format sarif --output cyscan.sarif

The SARIF output drops straight into GitHub Code Scanning or any CI that reads SARIF. For web apps where SAST is not enough, the companion binary is cyweb — same Rust core, but DAST: spider, headless-Chrome AJAX spider, fuzzer, template engine, OAST callbacks for blind SSRF and RCE detection. It replaced ZAP/Nikto/Nuclei in our pipeline and the conversion rate on upstream templates is around 95 percent.

Cloud

Cloud is where the market is most fragmented because every cloud has its own posture-management API surface and most vendors specialise in one.

We cover AWS, Azure, and GCP plus M365 and Active Directory under a single connector. The customer adds a cloud account once with a least-privilege read role, and the platform produces CSPM, ISPM (identity posture), ASPM (the wiring from repos to deployed services to cloud resources), container scanning via image-registry hooks, full Kubernetes scanning with the seven phases CIS calls out, and an M365 baseline that includes the DMARC/SPF/DKIM check from cymail.

What makes a cloud security tool useful versus useful-looking is the fix. Cybrium generates a Terraform pull request for every cloud finding. Behind a feature gate, there is a direct-apply mode for low-blast-radius changes. The developer sees the same shape of work whether the finding came from code or cloud — a PR, a diff, a CI pipeline running. They do not have to context-switch into a separate UI to fix a cloud problem versus a code problem.

AI

This is the pillar I think most vendors are getting wrong, and the one that explains why I think the next two years in this market will be a recomposition.

Almost every "AI security" company you can name right now sells a runtime gateway. A proxy between your developer and the model. That is one slice of one problem. It is the slice that demos well — you can stand in front of an audience and watch a prompt-injection attempt get blocked in real time. But it does not answer the question that actually keeps CISOs awake: "what AI is running in my company that I do not know about?"

You cannot govern what you cannot see. Cybrium's AI inventory has five channels:

The first is an active probe. A Rust binary called cyradar sweeps network ranges and identifies self-hosted inference servers: Ollama, vLLM, TGI, LocalAI, Triton, LM Studio, llama.cpp, OpenAI-compatible endpoints. It fingerprints each match against a YAML signature catalogue. We ship the catalogue versioned; new model servers are a config update, not a code release.

The second is cloud API. We ingest Bedrock usage from AWS billing, Azure OpenAI from the Azure activity log, Vertex AI from GCP audit logs. Whatever model invocations are going through the sanctioned cloud accounts, we see.

The third is endpoint. A host-posture agent called cydevice runs on machines outside MDM coverage and reports which AI CLIs are installed (ollama, the OpenAI CLI, claude), which IDE extensions are active (Copilot, Continue, Cline, Cursor's local model use), which desktop apps are running (LM Studio, Anything LLM), and which model files are on disk (GGUF, safetensors, ONNX). This is the channel that catches shadow AI on developer laptops.

The fourth is traffic inspection — passive observation of egress to flag cloud-API calls to AI providers that did not go through SSO.

The fifth is SCM/SAST. The cyscan engine recognises imports of langchain, llama-index, transformers, the anthropic SDK, the openai SDK, and surfaces them as AI usage. If you have an LLM call in your code, we know about it from the repo before it ever hits production.

All five channels write into the same AIAsset row in the platform. The AI governance team can run a single query — "show me every AI surface in the company" — and get the union across channels. Policy then layers on top: no inference servers in corp/ subnet without TLS, no Bedrock model invocation without a sanctioned tag, no production code path that takes LLM output and pipes it into a tool call without sanitisation.

The prompt-injection point is worth dwelling on for a second. We do not have a separate scanner for it. The same cyscan engine that does SAST recognises the patterns: unsanitised LLM output flowing into a tool-call argument, hidden-character-aware string handling, RAG ingestion that does not strip control characters from untrusted documents. The AI pillar is not a separate product. It is a set of new questions asked by engines we already had.

brew install cybrium-ai/cli/cyradar
cyradar discover --targets 10.0.0.0/24    # find AI servers on the LAN
cyradar local-scan                         # inventory local AI tooling

For AI coding agents that should reach into the platform directly, we ship an MCP server — @cybrium-ai/mcp-server on npm — with ten tools. Claude Desktop, Cursor, Windsurf, Cline can call any of them by name. I will come back to this in a minute.

GRC

Most security platforms wave their hands here. The GRC team gets handed a CSV export and told to "make the audit work."

A serious GRC implementation has three components that have to be wired into the other three pillars, not bolted on after.

The first is framework mapping. Every finding from code, cloud, and AI must map to a control in SOC 2, ISO 27001, HIPAA, PCI, EU AI Act, NIST AI RMF, and whatever industry-specific frameworks apply. Without this mapping, a finding is operational noise; with it, the same finding becomes audit evidence. We do the mapping at rule-authoring time — every cyscan rule and every cloud check carries the relevant control IDs.

The second is evidence collection. When an auditor asks "show me that control CC6.1 is enforced," the answer cannot be a screenshot. It has to be a query that runs against the live asset graph and returns a count, a list, and a timestamped attestation. The compliance engine in the platform does this nightly, automatically, against the same asset graph the other pillars write into.

The third is the Trust Center. Your customers' procurement teams are asking the same security-questionnaire questions of every vendor. A Trust Center that exposes your controls publicly — with continuous, auto-collected evidence — cuts months off the sales cycle. Ours is at https://trust.cybrium.ai and updates from the same store as everything else.

We also ship a vCISO module — engagements, risk register, policy library, treatment tracking — for teams that do not have a full-time CISO but need to look like they do for a Series B raise. The risk register is keyed on the same asset graph, so a risk row is always traceable to specific findings and specific controls. Not narrative text in a Word document.

Why one platform, not four

If the only argument for unification were "fewer dashboards," you could ignore it. The actual argument is structural, and it lives in three properties that one asset graph makes possible.

A finding in one pillar becomes an enforcement signal for another. A reachable CVE in code creates a deployment-gate policy in cloud. A new AI inference server discovered on the LAN auto-creates a risk row in the GRC register. An auditor's evidence query pulls from the live posture, not a copy of it from last Tuesday.

A fix in one pillar resolves the corresponding finding in the others. Close an IAM mis-scoping in cloud, the related SOC 2 finding in GRC closes automatically. The compliance team stops chasing the cloud team for evidence.

Coverage gaps become visible. "What is not covered" becomes a query. Three repos have full code coverage, twelve have partial. Two clouds are scanned, one is not. The AI inventory has four channels but the fifth is unconfigured. You can see the holes before someone else finds them.

These three properties cannot be retrofitted by integration. Every API integration between four point tools is a translation layer that loses data and a workflow boundary that delays the response. The only architecturally clean approach is to start with one asset graph and build outward from there.

The new buyer is an AI agent

There is one more reason this matters now that I want to end on, because I think most security vendors have not internalised it yet.

A year ago, when a developer needed a security tool, they searched Stack Overflow, asked a colleague, or read a blog post. Today, increasingly, the developer asks Claude or Cursor. The agent reads the project state, parses the question, and picks a tool. The agent does not see ads. It does not have a procurement team. It reads documentation.

This is going to recompose the market. The vendors who ship coherent, AI-agent-readable tooling — with intent-mapped documentation, clean MCP integrations, READMEs that describe when to use the tool versus when to use something else — will absorb workloads that used to be spread across a long tail of point tools. The vendors who write press releases about "AI-powered security" and hope the AI does not look too closely will lose their seat at the table.

We have made our bet on the first model. The CLIs are open source and Apache-2.0. The MCP server is published on npm. The VS Code extension is on the Marketplace (cybrium-ai.cybrium). Every public repo has an AGENTS.md that tells an AI coding agent when to invoke which tool. The website has an llms.txt at the root that explains the same thing to any agent fetching the domain for the first time. The OpenAPI schema is public. The Trust Center is public.

If you are building anything that touches code, cloud, AI, or compliance, you can start with the pieces you need:

Code: https://github.com/cybrium-ai/cyscan
Cloud: https://app.cybrium.ai (14-day trial)
AI inventory: https://github.com/cybrium-ai/cyradar
MCP for agents: npm install -g @cybrium-ai/mcp-server
Trust Center: https://trust.cybrium.ai
Docs: https://docs.cybrium.ai

The four pillars are not optional anymore. The breach my friend stayed up for came from a gap between them. The question for every security team this year is whether they want one platform that closes those gaps or four that hold them open.

We have made our choice. If you want to talk through yours, find me at hello@cybrium.ai.