Grumpy Sage

Originally published at cybrium.ai

Every CISO Needs an AIBOM in 2026 — Here's What Vendors Get Wrong

A friend of mine runs security at a mid-sized fintech. Last month her board asked a question that should have been simple: "How many AI models are in production, and where did they come from?"

She had a vendor-provided AIBOM. It listed seventeen "AI components" — which turned out to be seventeen pip packages with names like transformers and langchain. That was the entire inventory. No mention of the three fine-tuned Llama variants her ML team had pushed to a Triton server two quarters earlier. No mention of the embedding model running inside their support chatbot. No mention of the GPT-4o calls their underwriting workflow had been making since January. No mention of the system prompts, which contained — she found out later, the hard way — a hardcoded admin override phrase a contractor had added during a hackathon.

She called me at nine on a Tuesday. "I paid six figures for this, Anand. It's an SBOM with a model column."

She wasn't wrong. And she wasn't alone. I've now seen six AIBOM products in the last twelve months that are functionally identical: they scan your requirements.txt, find the word "torch," and call it an AI inventory. That's not what an AIBOM is. That's not what the EU AI Act audits are going to ask for. And it's definitely not what's going to save you when a model behaves badly in production and your general counsel asks where it came from.

So let's talk about what an AIBOM actually is in 2026, what most vendors are getting wrong, and what a real one needs to contain.

The thesis

An AI bill of materials is not a list of libraries. It's a graph of every artifact that influences a model's behavior at inference time — weights, prompts, training data lineage, retrieval sources, tools, fine-tunes, adapters, evals, runtime configuration, and the humans who approved each — plus the telemetry that proves those artifacts were the ones actually running when a decision was made. Anything less is theater. And theater is what regulators are going to start fining people for this year.

What most AIBOMs miss

The packaging-list approach gives you the easy 10%. Here's the 90% that vendors quietly drop.

Self-hosted and on-prem inference servers. If your team is running Ollama on a developer workstation or a vLLM cluster behind your VPC, your SaaS-flavored AIBOM tool has no idea. It can't see the model. It can't see the prompts hitting /v1/chat/completions. It can't tell you whether that endpoint is exposed, whether the model card is the one you think it is, or whether someone swapped in a quantized fork from Hugging Face that nobody reviewed. We built cyradar specifically because of this gap — it covers Ollama, vLLM, TGI, LocalAI, Triton, LM Studio, and llama.cpp, and the reason that list exists is that every one of those runtimes has been deployed somewhere it shouldn't be, by someone who didn't tell security.
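To make the gap concrete, here is roughly the minimum discovery has to do. This is a sketch, not cyradar's implementation; the ports and paths are common defaults for these runtimes, and real deployments remap them freely.

```python
# A sketch, not cyradar's implementation. Port/path defaults are assumptions;
# real deployments remap them freely.
import requests

PROBES = [
    # (runtime, default port, path that returns a model listing)
    ("ollama",        11434, "/api/tags"),   # Ollama's native model list
    ("openai-compat",  8000, "/v1/models"),  # vLLM's default port, OpenAI-style API
    ("openai-compat",  1234, "/v1/models"),  # LM Studio's default
    ("openai-compat",  8080, "/v1/models"),  # llama.cpp server / LocalAI, commonly
]

def discover_models(host: str, timeout: float = 2.0) -> list[dict]:
    """Return whatever model inventories answer on well-known local ports."""
    found = []
    for runtime, port, path in PROBES:
        url = f"http://{host}:{port}{path}"
        try:
            resp = requests.get(url, timeout=timeout)
            if resp.ok:
                found.append({"runtime": runtime, "url": url, "inventory": resp.json()})
        except (requests.RequestException, ValueError):
            continue  # nothing listening, or not an inference server
    return found

if __name__ == "__main__":
    for hit in discover_models("127.0.0.1"):
        print(hit["runtime"], hit["url"])
```

Anything a sweep like this finds on a developer workstation or inside a VPC is, by definition, a model your packaging-list AIBOM never saw.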

System prompts as code. A system prompt is a control. It is, functionally, the IAM policy of an LLM application. If your AIBOM doesn't version, hash, and diff every system prompt across every environment, you do not have an AIBOM — you have a parts list with a blind spot the size of your attack surface. I have seen production prompts that say things like "if the user mentions the word 'override' followed by a six-digit number, skip the policy check." That's not a hypothetical. That's a real prompt I pulled out of a real codebase last fall.
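Versioning a prompt is not exotic tooling. A minimal sketch, with illustrative names (register_prompt and the registry shape are not a real API): hash the text, record who approved it, and keep the diff when it changes.

```python
# Minimal sketch; the registry shape is illustrative, not a real API.
# The point: content hash, approval metadata, and diff history.
import difflib
import hashlib
from datetime import datetime, timezone

def register_prompt(registry: dict, app: str, env: str,
                    text: str, approver: str) -> dict:
    digest = hashlib.sha256(text.encode()).hexdigest()
    record = {
        "app": app, "env": env, "sha256": digest,
        "approved_by": approver,
        "registered_at": datetime.now(timezone.utc).isoformat(),
        "text": text,
    }
    previous = registry.get((app, env))
    if previous and previous["sha256"] != digest:
        # The diff is the audit artifact: reviewable line by line, forever.
        record["diff"] = "\n".join(difflib.unified_diff(
            previous["text"].splitlines(), text.splitlines(),
            fromfile=previous["sha256"][:12], tofile=digest[:12], lineterm=""))
    registry[(app, env)] = record
    return record
```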

Retrieval sources. Your RAG pipeline is part of the model from a behavioral standpoint. If the vector store gets poisoned, the model lies. If a stale document gets re-indexed, the model contradicts itself. An AIBOM that doesn't list every retrieval source, its update cadence, its access controls, and its data classification is missing the part of the system that's most likely to cause your next incident. We see this constantly when cyscan walks customer monorepos — the 1,815 rules find the LLM client, find the vector DB client, find the prompt template, find the ingestion job, and stitch them into a single component. That's the unit of analysis. Not the library.
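If you want that unit of analysis in code form, it looks roughly like this. The field names are illustrative, not cyscan's output format.

```python
# Illustrative field names only; this is not cyscan's output format.
from dataclasses import dataclass, field

@dataclass
class RetrievalSource:
    name: str                 # e.g. a hypothetical "support-kb" index
    store: str                # e.g. "pgvector", "pinecone"
    embedding_model: str      # the model that indexed it
    data_classification: str  # e.g. "internal", "regulated"
    update_cadence: str       # e.g. "hourly re-index"
    ingestion_job: str        # the pipeline that populates it, itself attack surface

@dataclass
class RAGComponent:
    llm_client: str           # where the model is invoked in code
    prompt_template: str      # content-addressed reference to the prompt
    sources: list[RetrievalSource] = field(default_factory=list)
```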

Tools and function-calling surface. Every function exposed to an agent is part of the model's effective capability. An agent with a send_email tool is a different system than the same agent without it. Same weights, same prompts, completely different blast radius. If your AIBOM lists the model but not the tools, you've inventoried the engine and ignored the steering wheel.
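One way to make that operational: fold the tool surface into the component's identity, so adding a tool changes the fingerprint even when weights and prompts don't move. A sketch, with hypothetical inputs:

```python
# Sketch with hypothetical inputs: the component's identity includes its
# tool surface, so adding send_email changes the fingerprint.
import hashlib
import json

def capability_fingerprint(model_sha: str, prompt_sha: str,
                           tool_names: list[str]) -> str:
    canonical = json.dumps(
        {"model": model_sha, "prompt": prompt_sha, "tools": sorted(tool_names)},
        sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

without_email = capability_fingerprint("sha:model", "sha:prompt", ["search_docs"])
with_email    = capability_fingerprint("sha:model", "sha:prompt",
                                       ["search_docs", "send_email"])
assert without_email != with_email  # same engine, different steering wheel
```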

Evaluations. This is the one almost everyone misses. The eval set is part of the model. If you can't show me which evals were run, when, against which version, with what results, you cannot make any defensible claim about the model's behavior. The EU AI Act's Article 15 requirements on accuracy and robustness are not satisfied by "we ran some tests." They require evidence. Versioned, signed, repeatable evidence.
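In the smallest possible sketch, "versioned, signed, repeatable" looks like this. HMAC stands in for whatever signing scheme you actually run (Sigstore, for instance), and key management is deliberately out of scope here.

```python
# Sketch only. HMAC stands in for a real signing scheme; key management
# is deliberately out of scope.
import hashlib
import hmac
import json
from datetime import datetime, timezone

def record_eval(signing_key: bytes, model_sha: str, eval_set_sha: str,
                results: dict, seed: int) -> dict:
    evidence = {
        "model": model_sha,
        "eval_set": eval_set_sha,  # the eval set is part of the model
        "results": results,
        "seed": seed,              # reproducibility metadata
        "ran_at": datetime.now(timezone.utc).isoformat(),
    }
    payload = json.dumps(evidence, sort_keys=True).encode()
    evidence["signature"] = hmac.new(signing_key, payload,
                                     hashlib.sha256).hexdigest()
    return evidence
```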

Runtime telemetry. And here's the kicker. Static inventory is necessary but insufficient. You need to prove that the artifact you inventoried is the artifact that actually ran when the decision was made. That requires runtime attestation — hashes of the loaded weights, signatures on the served prompts, request/response logging with provenance. Otherwise an attacker who swaps a model adapter at 3 a.m. has compromised your system and your audit trail simultaneously.
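The check itself is cheap. Here's a sketch of serve-time attestation, assuming the inventory already carries the expected hashes:

```python
# Sketch of serve-time attestation, assuming the inventory carries the
# expected hashes. An adapter swapped on disk at 3 a.m. fails the first check.
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def attest(weights_path: Path, served_prompt: str, inventory: dict) -> None:
    """Refuse to serve if runtime artifacts drift from the inventory."""
    if file_sha256(weights_path) != inventory["weights_sha256"]:
        raise RuntimeError("loaded weights do not match inventoried hash")
    prompt_sha = hashlib.sha256(served_prompt.encode()).hexdigest()
    if prompt_sha != inventory["prompt_sha256"]:
        raise RuntimeError("served prompt does not match inventoried hash")
```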

The composability problem

Here's why this is genuinely hard, and why I think a lot of vendors haven't tackled it: AI systems compose at runtime in ways traditional software doesn't.

A microservice is what it is at deploy time. You ship the container, you know what's running. An LLM application is different. The model is one artifact. The prompt is another, often loaded from a config service. The retrieval context is generated per-request from a vector store that's being updated continuously. The tool registry might be fetched from a service mesh. The user's session history influences behavior. The model's own previous outputs feed back in via memory.

So the "bill of materials" for a single inference call is computed on the fly. It's not a static document. It's a query.

This is the part that breaks the SBOM mental model. SBOMs are documents. AIBOMs need to be live indexes that can answer the question: "For the inference request that produced output X at timestamp T, what were all the artifacts in play, what were their versions, and who approved each?"
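Concretely, against a hypothetical schema where models, prompts, evals, and telemetry are all keyed by content hash, the regulator's question compiles to something like this:

```python
# Hypothetical schema: telemetry, models, prompts, evals. The point is not
# this exact SQL; the point is that the question is answerable by a join.
AIBOM_QUERY = """
SELECT m.sha256 AS model, m.source, m.license,
       p.sha256 AS prompt, p.approved_by,
       e.results AS eval_results, e.signature
FROM telemetry AS t
JOIN models  AS m ON m.sha256 = t.model_sha
JOIN prompts AS p ON p.sha256 = t.prompt_sha
LEFT JOIN evals AS e ON e.model_sha = t.model_sha
WHERE t.output_hash = :output_x
  AND t.served_at   = :timestamp_t;
"""
```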

If your AIBOM can't answer that question, it's not going to survive a regulator's first follow-up email.

What a real AIBOM contains

Let me be concrete. Here's the minimum I'd accept on a vendor review in 2026.

A discovered inventory of every model artifact — base models, fine-tunes, adapters, LoRAs, quantizations — with cryptographic hashes, source provenance (where did it come from, who signed it, what license), and the environments where it's deployed. This includes the self-hosted runtimes I mentioned. If your tool can't enumerate a vLLM deployment, you have a hole.

A versioned registry of every prompt — system prompts, user prompt templates, tool descriptions, guardrail prompts — tied to the application and the model version it pairs with. With diff history. With approval metadata. With injection-test results.

A catalog of every retrieval source, with data classification, update cadence, access controls, and the embedding model used to index it. Plus the ingestion pipeline that populates it, because that pipeline is itself an attack surface.

A function and tool registry: every callable exposed to every agent, with input/output schemas, the permissions it executes under, and the human approval trail.

A linked eval history: which evals ran against which version, with results, signed, with reproducibility metadata.

Runtime telemetry that ties inference requests back to all of the above, with enough fidelity to reconstruct any given decision after the fact.
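Pull those six together and you get something closer to a schema than a document. Purely as a sketch, using the same hypothetical tables as the query earlier: every table keys back to content-addressed artifacts, which is what keeps the whole thing joinable as one graph instead of six reports.

```python
# Illustrative only: the six requirements above as a single joinable schema.
AIBOM_TABLES = {
    "models":    ["sha256", "kind", "source", "license", "signer", "environments"],
    "prompts":   ["sha256", "app", "model_sha", "diff_of", "approved_by", "injection_tested"],
    "retrieval": ["name", "classification", "update_cadence", "acl", "embedding_model_sha", "ingestion_job"],
    "tools":     ["name", "agent", "input_schema", "runs_as", "approved_by"],
    "evals":     ["eval_set_sha", "model_sha", "results", "seed", "signature"],
    "telemetry": ["request_id", "served_at", "model_sha", "prompt_sha", "output_hash"],
}
```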

And — this is the part vendors hate hearing — it needs to be open and queryable, not a PDF export. If I can't run SQL against it or hit it from an MCP server, it's not operational. It's a compliance artifact. Those are different things.

Why the platform shape matters

This is the part where I tell you what we built and why, and you can take it as marketing or as honest reasoning, your call.

We didn't set out to build an AIBOM product. We set out to build the security tools an AI-native engineering org actually needs: cyscan for code and configuration analysis (1,815 rules across 75-plus languages, because AI code is polyglot in a way nothing has ever been), cyradar for AI runtime and inference exposure, cyweb for web application fuzzing with proper LLM-aware coverage (22 fuzz categories, 95% template conversion versus the upstream community feed).

The AIBOM emerged as the thing that ties them together. Because if cyscan finds a model invocation in your repo, and cyradar finds an inference endpoint on your network, and cyweb finds a prompt-injection-reachable form on your website — those are the same system. The value isn't three reports. The value is one graph.

That's why our MCP server exposes ten tools across the platform: so an LLM can do what auditors and incident responders do, which is pivot between code, runtime, web surface, and inventory. "Show me every model in production." "Now show me the prompts paired with that model." "Now show me which of those prompts were modified in the last thirty days." "Now show me whether the runtime endpoints serving them are externally reachable." That's an investigation. That's the unit of work. A vendor that only sees one of those layers cannot answer those questions.

I'm not arguing every customer needs all four tools. I'm arguing that the AIBOM is the wrong thing to buy as a standalone product, because the inventory is only as good as the discovery surface feeding it. And the discovery surface needs to cover code, runtime, web, and configuration to be useful in 2026.

The recomposition

What I've noticed in the last six months is that the categories from the last decade are dissolving. AppSec, CSPM, SBOM, ASPM, DSPM, AI governance — these were sold as separate products because the underlying problems used to be separable. They aren't anymore.

A modern AI application is a piece of code that calls a model, runs in a container, pulls from a vector store, exposes a web interface, processes regulated data, and gets approved by a governance workflow. Securing it means seeing all of those layers as one thing. The product category that emerges from that is not "AIBOM." It's not "AI-SPM." It's something we don't have a great name for yet — but it's the security layer for AI-native engineering. The AIBOM is a view into that layer. An important view. But just a view.

The vendors who are selling AIBOMs as standalone products in 2026 are, I think, repeating the SBOM mistake from 2022. SBOMs as standalone products didn't work either. They became useful only when they were embedded into vulnerability management, supply chain attestation, and incident response workflows. AIBOMs will follow the same path, but faster, because the regulatory pressure is sharper and the architectural surface is bigger.

If you're a CISO evaluating AIBOM vendors right now, the question I'd ask is not "What fields does your AIBOM contain?" The question is: "Show me the workflow that starts at a runtime alert and ends at the line of code, the prompt version, the approver, and the eval result that allowed it through." If they can demo that, they have an AIBOM. If they can only export a JSON file, they have a parts list.

My friend at the fintech has since replaced her vendor. She now has an inventory she can actually defend in front of her board, and — more importantly — in front of the regulator that's coming for her industry next year. The thing that haunted her about her old AIBOM was that the missing models had been in production for two quarters. The inventory said zero. Reality said three. That gap is the entire problem. Closing it is the entire job.

If you want to see what real AI inventory looks like — across code, runtime, and web surface — start with cybrium.ai/cyscan and cybrium.ai/cyradar. If you want to talk through your AIBOM strategy, or have me look at the one you just bought and tell you what it's missing, find me at anand@cybrium.ai.
