Grumpy Sage

Posted on Jun 1 • Originally published at cybrium.ai

Why Every CISO Needs an AIBOM in 2026 and What Vendors Get Wrong

#security #ai #devsecops #governance

A friend of mine runs security at a mid-sized fintech. About 400 engineers, Series D, the kind of place where the AI strategy memo got written in a weekend and shipped to prod the following Tuesday. She called me in March, somewhere between annoyed and panicked, because her board had asked a simple question: "What models are we running, and where did they come from?"

She had answers. They were wrong.

The official answer was "GPT-4 and Claude, through the standard gateway." The actual answer, after two weeks of digging, was seventeen models. Among them: a Llama 3.1 8B fine-tune hosted on a self-managed vLLM box that someone in the ML team spun up for a customer-support prototype, then forgot to tear down. Three Hugging Face embedding models pulled at container build time with no pinned hashes. A Whisper variant running on a developer's GPU workstation that was, somehow, reachable from the staging VPC. Two LoRA adapters fine-tuned on customer support tickets, sitting in an S3 bucket with a permissive policy. And a llama.cpp build serving a 7B model to an internal Slack bot that nobody could remember authorizing.

The part that got me, when she walked me through it, was that none of this was shadow IT in the old sense. Every one of these systems had a Jira ticket. Every one had been "approved." The inventory just didn't exist. There was no single place she could point at and say: this is what we run, this is where the weights came from, this is what data trained them, this is what they're allowed to talk to.

That's the AIBOM problem. And in 2026, if you don't have one, you don't have an AI security program. You have vibes.

The thesis

Most AIBOM products being sold right now are spreadsheets with a UI. They list model names and versions, slap a CycloneDX export on top, and call it done. That's not an AIBOM. That's a parts list pretending to be a bill of materials. A real AIBOM is a living graph that connects models, weights, datasets, prompts, tools, agents, and the data flows between them, and it has to be generated continuously from what's actually running in your environment, not from what someone typed into a form six months ago.

What an AIBOM actually contains

The SBOM analogy is useful but it breaks down fast. A traditional SBOM tells you which libraries are in a binary. The relationships are mostly one-directional: this binary depends on this package at this version. You can run Syft or Trivy and get something reasonable.

An AIBOM has to capture relationships that aren't dependencies in the SBOM sense. A fine-tuned model depends on a base model, sure, but it also depends on the training dataset, the fine-tuning configuration, the RLHF rewards, and increasingly, the prompts that shape its runtime behavior. An agent depends on the model serving it, the tools it can call, the memory store it reads from, and the MCP servers it's connected to. None of this shows up in a CycloneDX schema unless you stretch the schema until it tears.
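To make "relationships that aren't dependencies" concrete, here is a minimal Python sketch of a typed graph. It is an illustration only: the `Node`, `Edge`, and `AibomGraph` names, the relation labels, and the example entities are all hypothetical, not a schema from CycloneDX or from any product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str   # e.g. "model", "dataset", "agent", "tool", "mcp_server"
    name: str


@dataclass(frozen=True)
class Edge:
    source: Node
    relation: str   # e.g. "fine_tuned_from", "trained_on", "served_by", "can_call"
    target: Node


@dataclass
class AibomGraph:
    edges: set[Edge] = field(default_factory=set)

    def link(self, source: Node, relation: str, target: Node) -> None:
        self.edges.add(Edge(source, relation, target))

    def depends_on(self, node: Node) -> set[Node]:
        """Everything reachable from `node`, following edges of any relation."""
        seen: set[Node] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for edge in self.edges:
                if edge.source == current and edge.target not in seen:
                    seen.add(edge.target)
                    stack.append(edge.target)
        return seen


# Hypothetical example: a support agent backed by a fine-tuned model.
graph = AibomGraph()
base = Node("model", "base-llm")
tuned = Node("model", "support-finetune")
tickets = Node("dataset", "support-tickets")
agent = Node("agent", "support-bot")
search = Node("tool", "ticket-search")

graph.link(tuned, "fine_tuned_from", base)
graph.link(tuned, "trained_on", tickets)
graph.link(agent, "served_by", tuned)
graph.link(agent, "can_call", search)
```

Calling `graph.depends_on(agent)` walks through the fine-tune to the base model and the ticket dataset, which is exactly the transitive reach a flat list of model names and versions cannot express.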

So at minimum, an AIBOM has to capture: the models themselves with their weights provenance and hash; the training and fine-tuning datasets with lineage back to source; the inference infrastructure (whether that's vLLM, Ollama, TGI, Triton, LocalAI, LM Studio, or llama.cpp); the prompts and prompt templates in use; the tools and MCP servers exposed to those models; the data flows in and out of each model boundary; and the humans or service accounts who can change any of the above.

If your vendor's AIBOM is missing three or more of those, you bought a model registry, not an AIBOM.
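That rule of thumb is easy to write down as a coverage check. This is a rough sketch: the `Category` enum and the `classify_inventory` function are my own hypothetical naming for the seven items listed above, and "AIBOM candidate" just means "passed this one test", nothing stronger.

```python
from enum import Enum


class Category(Enum):
    """The seven things the checklist above says an AIBOM must capture."""
    MODELS = "models with weights provenance and hash"
    DATASETS = "training and fine-tuning datasets with lineage"
    INFERENCE_INFRA = "inference infrastructure"
    PROMPTS = "prompts and prompt templates"
    TOOLS = "tools and MCP servers"
    DATA_FLOWS = "data flows across each model boundary"
    CHANGE_ACTORS = "humans and service accounts who can change things"


def missing_categories(covered: set[Category]) -> set[Category]:
    """Return the checklist categories an inventory does not cover."""
    return set(Category) - covered


def classify_inventory(covered: set[Category]) -> str:
    """Apply the rule of thumb: three or more gaps means it is a registry."""
    gaps = len(missing_categories(covered))
    return "model registry" if gaps >= 3 else "AIBOM candidate"
```

A product that covers only models, datasets, and inference infrastructure has four gaps, so `classify_inventory` labels it a model registry.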

Where most vendors fail

I've looked at probably a dozen AIBOM products this year. The failure modes cluster into four patterns.

The first is the form-based AIBOM. Engineers fill out a questionnaire, and the platform stores it. This is theater. The model running in production at 3 a.m. on a Wednesday is not the model someone described in a form during onboarding. Within six weeks the inventory drifts and you're back to vibes.

The second is the API-gateway-only AIBOM. The product hooks into your OpenAI and Anthropic API calls, watches the traffic, and infers what models you're using. This catches your hosted model use. It catches nothing else. Your self-hosted Ollama instance, your vLLM cluster, your llama.cpp processes, the LM Studio sessions on developer laptops, the LocalAI deployments your data team set up: all invisible. And by my estimate, self-hosted inference accounts for somewhere between 30 and 60 percent of model usage at any serious engineering org right now. On those numbers, you could be missing anywhere from a third to more than half of your attack surface.

The third is the registry-as-AIBOM trap. The product is really a model registry with provenance fields. It's great if every model goes through the registry. But registries are opt-in. They don't see what they don't see. And the entire point of a BOM is to discover the things you forgot to register.

The fourth, and this one annoys me the most, is the static AIBOM. The product generates a CycloneDX file at build time, hands it to you, and considers the job done. AI systems aren't static. The same model with a different system prompt is a different system, for security purposes. The same agent with a new tool connected is a different blast radius. If your AIBOM only updates at deploy time, it's already stale by the time it lands in your SIEM.

What you actually need to generate one

If I were building an AIBOM program from scratch today, here's how I'd think about the inputs.

You need code scanning that knows about AI. Not just "is there a call to openai.ChatCompletion in this file," but the structure of agent definitions, RAG pipelines, prompt templates, tool registrations, and model loads from disk or HF Hub. This is why we built cyscan with 1,815 rules across 75-plus languages, with a heavy investment in AI-aware detection. A traditional SAST tool will tell you nothing useful about whether your agent has access to a tool it shouldn't. An AI-aware scanner can.
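To give a feel for what one AI-aware rule looks like, here is a toy Python sketch that finds Hugging Face style `from_pretrained("...")` calls with no `revision=` argument. It is not how cyscan works, and the `find_unpinned_model_loads` name and the sample source are made up for illustration. It only handles literal model IDs; a real rule would also need data flow, and would check that the revision is a commit hash rather than a branch name.

```python
import ast


def find_unpinned_model_loads(source: str) -> list[tuple[int, str]]:
    """Return (line, model_id) for from_pretrained("...") calls without revision=."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and func.attr == "from_pretrained"):
            continue
        if not node.args:
            continue
        first = node.args[0]
        # Only literal string IDs; a variable would need data-flow analysis.
        if not (isinstance(first, ast.Constant) and isinstance(first.value, str)):
            continue
        pinned = any(kw.arg == "revision" for kw in node.keywords)
        if not pinned:
            findings.append((node.lineno, first.value))
    return sorted(findings)


# Hypothetical source file contents, parsed but never executed.
SAMPLE = '''
from transformers import AutoModel
a = AutoModel.from_pretrained("org/embedder")
b = AutoModel.from_pretrained("org/other", revision="abc123")
'''
```

Running `find_unpinned_model_loads(SAMPLE)` flags only the first load, since the second one passes a revision.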

You need runtime discovery for self-hosted inference. Somebody has to walk your network and your container runtimes and find the Ollama, vLLM, TGI, LocalAI, Triton, LM Studio, and llama.cpp processes. They have to fingerprint them, identify the models being served, and check for the eight ways each of those servers can be misconfigured. That's what cyradar does. If you're scanning only for the hosted APIs, you're playing a different game than the one your attackers are playing.
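The fingerprinting step can be sketched in a few lines of Python. This is an assumption-heavy illustration, not a description of cyradar: the endpoint paths and response shapes are as I understand them from each project's public docs (Ollama's `/api/tags`, Text Generation Inference's `/info`, and the OpenAI-compatible `/v1/models` that vLLM and several others expose), so verify them against the versions you actually run. The `fingerprint` and `http_fetch` names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional

# fetch(path) returns the parsed JSON body for GET <base_url><path>,
# or None if the request failed or did not return JSON.
Fetch = Callable[[str], Optional[dict]]


def http_fetch(base_url: str, timeout: float = 2.0) -> Fetch:
    """Build a fetch function for one host, e.g. http_fetch("http://10.0.0.5:11434")."""
    def fetch(path: str) -> Optional[dict]:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                return json.load(resp)
        except (OSError, ValueError):
            return None
    return fetch


def fingerprint(fetch: Fetch) -> Optional[tuple[str, list[str]]]:
    """Guess the server family and served models from well-known endpoints."""
    body = fetch("/api/tags")  # Ollama's model listing
    if isinstance(body, dict) and isinstance(body.get("models"), list):
        return "ollama", [m.get("name", "?") for m in body["models"]]

    body = fetch("/info")  # Text Generation Inference server info
    if isinstance(body, dict) and "model_id" in body:
        return "tgi", [body["model_id"]]

    body = fetch("/v1/models")  # OpenAI-compatible listing (vLLM and others)
    if isinstance(body, dict) and isinstance(body.get("data"), list):
        return "openai-compatible", [m.get("id", "?") for m in body["data"]]

    return None
```

Passing the fetch function in, rather than hardcoding HTTP calls, keeps the classification logic testable with canned responses. Note that `/v1/models` alone cannot tell the OpenAI-compatible servers apart, so a real scanner needs more signals than this.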

You need web-layer awareness. Most AI apps end up behind an HTTP boundary eventually. A model is a backend. If your AIBOM doesn't connect the HTTP endpoints in your app to the models behind them, you can't reason about external attack surface. We use cyweb for this, with 22 fuzz categories tuned for AI-shaped vulnerabilities, and we convert 95 percent of the upstream community templates so we don't miss the boring web stuff while we're chasing the new stuff.

You need to model the agent and tool graph. Every MCP server is a capability boundary. Every tool a model can call is a potential confused deputy. Our MCP server exposes 10 tools and we treat each one as a documented edge in the AIBOM graph, with explicit allowlists for which agents can invoke which tools. If you can't answer "which models can call which tools" in your AIBOM, you can't answer the actually-important question, which is "what's the worst thing this model can do if it gets jailbroken."
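Here is a small Python sketch of the "which agents can call which tools" question, treating each (agent, tool) pair as an edge and comparing what was observed against an explicit allowlist. The agent names, tool names, and both data sets are hypothetical; this is the shape of the check, not our implementation.

```python
# Hypothetical allowlist: agent name -> tools it is permitted to invoke.
ALLOWLIST: dict[str, set[str]] = {
    "support-bot": {"search_tickets", "draft_reply"},
    "ops-bot": {"search_tickets", "restart_service"},
}

# Hypothetical observed edges, e.g. from tool registrations found in code.
OBSERVED: set[tuple[str, str]] = {
    ("support-bot", "search_tickets"),
    ("support-bot", "restart_service"),  # not on the allowlist
    ("ops-bot", "restart_service"),
}


def violations(observed, allowlist):
    """Observed (agent, tool) edges that the allowlist does not permit."""
    return sorted(
        (agent, tool)
        for agent, tool in observed
        if tool not in allowlist.get(agent, set())
    )


def blast_radius(agent, observed):
    """Every tool the agent can actually reach, allowed or not."""
    return sorted(tool for a, tool in observed if a == agent)
```

`blast_radius` deliberately reads from the observed edges, not the allowlist: the worst case after a jailbreak is defined by what is wired up, not by what was approved.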

You need provenance for weights. Hashes. Signatures where they exist. Download source. License. Known-bad lists for poisoned or trojaned models on Hugging Face. This is the part that looks most like a traditional SBOM, and it's also the part most vendors do least badly.
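The hash part is the easiest piece to start on today. A minimal Python sketch, assuming you have recorded a SHA-256 for each weights file somewhere in your inventory (the `verify_weights` name and the idea of a "pinned" hash field are illustrative, not a standard):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte weights do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: Path, pinned_sha256: str) -> bool:
    """True only if the file on disk matches the hash recorded in the AIBOM."""
    return sha256_of(path) == pinned_sha256.lower()
```

A matching hash only tells you the file is the one you recorded. Whether that file was trustworthy in the first place is what the signatures, download source, and known-bad lists are for.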

Finally, you need this to be continuous. Daily at minimum. Ideally on every deploy and on a scheduled drift check. Otherwise it's a snapshot, and snapshots lie.

A small example of what continuous looks like

Here's a sketch of what we run on customer environments. You don't need to use our tools to do this; the pattern matters more than the implementation.

# Discover self-hosted inference servers across the environment
cyradar discover \
  --targets ./targets.yaml \
  --servers ollama,vllm,tgi,localai,triton,lmstudio,llamacpp \
  --output aibom-runtime.json

# Scan repos for AI surface: models loaded, agents defined, tools registered
cyscan code \
  --repos ./repos.txt \
  --rules ai \
  --output aibom-code.json

# Walk the HTTP attack surface for AI endpoints
cyweb scan \
  --targets ./web-targets.yaml \
  --ai-categories all \
  --output aibom-web.json

# Merge into a single AIBOM graph and diff against last known good
cybrium aibom merge \
  --inputs aibom-runtime.json,aibom-code.json,aibom-web.json \
  --diff-against last-good.json \
  --output aibom-current.json

The diff is the part most people skip. The AIBOM itself is interesting. The change from yesterday's AIBOM to today's is what you should be alerting on. A new model appeared. A tool got connected to an agent that didn't have it before. A vLLM server is suddenly exposed on a new port. That's the security signal.
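If you want to prototype the diff yourself, it is mostly set arithmetic. A minimal Python sketch, assuming each snapshot is a dict of section name to a list of strings; that layout and the example entries are hypothetical, not the format any of the commands above emit.

```python
def diff_aibom(previous: dict, current: dict) -> dict:
    """Report what appeared and disappeared between two snapshots.

    Assumes each snapshot maps a section name ("models", "tool_edges",
    "exposed_ports") to a list of strings. That layout is hypothetical,
    not the format any particular tool emits.
    """
    changes = {}
    for section in sorted(set(previous) | set(current)):
        before = set(previous.get(section, []))
        after = set(current.get(section, []))
        added, removed = sorted(after - before), sorted(before - after)
        if added or removed:
            changes[section] = {"added": added, "removed": removed}
    return changes


# Hypothetical snapshots mirroring the three signals described above.
yesterday = {
    "models": ["support-finetune"],
    "tool_edges": ["support-bot->search_tickets"],
    "exposed_ports": ["vllm-1:8000"],
}
today = {
    "models": ["support-finetune", "whisper-small"],
    "tool_edges": ["support-bot->search_tickets", "support-bot->send_email"],
    "exposed_ports": ["vllm-1:8000", "vllm-1:9000"],
}
```

`diff_aibom(yesterday, today)` reports one new model, one new tool edge, and one new exposed port, and an unchanged environment produces an empty dict, which is what makes it usable as an alert condition.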

The fine-tuning and RAG blind spot

The other place most AIBOM products fall over is anything that touches your data.

Fine-tunes inherit the security posture of their training data. If you fine-tuned a Llama variant on a corpus that included customer PII, the model is now a potential data exfiltration vector even if your inference layer is perfectly locked down. The AIBOM has to know this. It has to record what data went into the fine-tune, who approved it, what residency constraints apply, and what the deletion story is when a customer invokes their rights.

RAG is similar but worse, because the data isn't baked in at training time, it's fetched at runtime. Your AIBOM needs to know which vector stores each agent can read from, what's in those stores, how they're populated, and what the access control model is at the chunk level. Most products treat the vector store as opaque infrastructure. It isn't. It's part of the AI system, and it belongs in the BOM.

I've seen one customer discover, through an honest AIBOM exercise, that their support agent was retrieving from a vector store that contained the contents of an internal HR wiki because the embedding pipeline was pointed at "all of Confluence" with no filtering. The AIBOM caught it. Their old "AI governance committee" had been signing off on the agent for nine months.
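The checks that would have caught that are not sophisticated. Here is a rough Python sketch, assuming your AIBOM records, for each vector store, where it is ingested from, what filters apply, and whether access control is enforced per chunk. The `VectorStore` fields, the store names, and the filter string are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorStore:
    name: str
    ingest_sources: tuple[str, ...]   # where the embedding pipeline reads from
    source_filters: tuple[str, ...]   # empty means "everything in the source"
    chunk_level_acl: bool             # are permissions enforced per chunk?


def rag_findings(agent_reads: dict[str, list[VectorStore]]) -> list[str]:
    """Flag stores that are ingested unfiltered or lack chunk-level ACLs."""
    findings = []
    for agent, stores in sorted(agent_reads.items()):
        for store in stores:
            if not store.source_filters:
                findings.append(f"{agent}: {store.name} ingests unfiltered sources")
            if not store.chunk_level_acl:
                findings.append(f"{agent}: {store.name} has no chunk-level ACL")
    return findings


# Hypothetical stores: one pointed at a whole wiki, one scoped and access-controlled.
wiki = VectorStore("kb-index", ("confluence",), (), chunk_level_acl=False)
docs = VectorStore("docs-index", ("public-docs",), ("space=HELP",), chunk_level_acl=True)
```

The hard part is not the check. It is getting the ingestion scope and the agent-to-store mapping recorded at all, which is why the vector store has to be a first-class entry in the BOM.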

Why this has to be one platform

I get asked a lot why we bundled code scanning, runtime AI discovery, web fuzzing, and the MCP server into one platform instead of selling them as separate products. The reason is exactly the AIBOM problem.

An AIBOM is, by definition, a join across surfaces. Code surface plus runtime surface plus web surface plus agent-and-tool surface. If those four sources of truth live in four different vendors with four different schemas and four different update cadences, the join is hopeless. You end up with four partial inventories that disagree, and a quarterly meeting where everyone explains why their slice is the right one. Nobody ships anything.

When the surfaces share a graph, the diff is meaningful. You can ask: this new HTTP endpoint that cyweb found, what model does it route to, what tools does that model have, and is any of that change new compared to last week. You can ask that in one query because it's one graph. Not because we're smarter than other vendors. Because we made the architectural decision to keep the model coherent across surfaces from day one.
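To show what "one query" means in practice, here is a toy Python version of that question over a single shared snapshot. The endpoint paths, model names, tool names, and the three lookup tables are hypothetical stand-ins for edges in the graph; a real graph store would replace the dicts.

```python
# Hypothetical single-graph snapshot: each mapping is one edge type.
ROUTES = {"/api/chat": "support-finetune", "/api/summarize": "summarizer-7b"}
MODEL_TOOLS = {
    "support-finetune": {"search_tickets", "send_email"},
    "summarizer-7b": set(),
}
# Edges present in last week's snapshot, as (source, target) pairs.
LAST_WEEK = {
    ("/api/chat", "support-finetune"),
    ("support-finetune", "search_tickets"),
}


def explain_endpoint(endpoint: str) -> dict:
    """Answer: what model is behind this endpoint, with which tools, and what is new?"""
    model = ROUTES[endpoint]
    tools = sorted(MODEL_TOOLS.get(model, set()))
    edges = [(endpoint, model)] + [(model, tool) for tool in tools]
    return {
        "model": model,
        "tools": tools,
        "new_edges": [edge for edge in edges if edge not in LAST_WEEK],
    }
```

For `/api/chat` this returns the model, both tools, and a single new edge to `send_email`. The point is that the web, runtime, and tool data sit behind one lookup path, so the join needs no reconciliation step.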

The recomposition

What's actually happening in the market right now is that the old vendor categories are being torn up and re-glued around the AI system as the unit of analysis. SAST, DAST, CSPM, model registry, data catalog, agent observability: those were six products and three buyer personas in 2024. In 2026 they're one workflow with one BOM at the center.

The CISOs who get this are quietly rebuilding their security architecture around the AIBOM. They're treating it as the central inventory, the way the CMDB was supposed to be in the ITIL era and never quite was. They're plumbing it into change management, into incident response, into vendor risk, into audit. They're using it as the source of truth for which AI systems exist and what each one is allowed to touch.

The CISOs who don't get it are still buying point products and trying to staple them together with a Jira board. They're going to spend 2026 explaining to their boards why they keep getting surprised. The surprises are the AIBOM gaps. There's no other word for them.

If you remember one thing from this post, remember this: the question your board is going to ask in 2026 is not "are you doing AI security." It's "show me the inventory." If your answer involves opening five different consoles, you've already lost the meeting.

What to do this quarter

Start with the discovery problem. Run cyradar against your environment and find the inference servers you didn't know about. That alone will reframe how you think about scope. Then run cyscan on your repos with the AI rules enabled and find the agent definitions, tool registrations, and model loads in code. Then merge. Look at the diff against your current "official" inventory and prepare to be surprised.

You can dig into the specifics at cybrium.ai/cyradar, cybrium.ai/cyscan, and cybrium.ai/cyweb, or read the AIBOM architecture notes at cybrium.ai/aibom.

If you want to talk through yours, find me at anand@cybrium.ai.
