
Grumpy Sage

Originally published at cybrium.ai

Why Every CISO Needs an AIBOM in 2026 — And What Vendors Get Wrong

A friend who runs security at a mid-sized fintech called me three weeks ago. Her board had asked a question her team couldn't answer in under a week: "How many models are running in production, and which ones touch customer PII?"

She sent me her first draft of the answer. It was a spreadsheet. Forty-seven rows. The columns were Model Name, Owner, Risk Tier, Data Classification. Most of the cells said "TBD" or "see Notion." Two rows were duplicates of the same model deployed in different namespaces. One row was a model that had been deprecated six months earlier but was still receiving traffic because a forgotten Lambda kept calling its endpoint. Another model wasn't on the spreadsheet at all: a fine-tuned Llama variant running inside a vendor's SaaS that her data science team had embedded via SDK. Nobody had told security.

The part that bothered her wasn't the spreadsheet. It was what the spreadsheet implied. Her org had stood up an AI governance committee, hired a head of responsible AI, and bought a "model risk management" platform from a name-brand vendor. And she still couldn't answer the board's question without two weeks of Slack archaeology. The vendor's tool had a beautiful dashboard. It just didn't know about half the models, none of the agents, zero of the MCP tools, and obviously nothing about the prompts and retrieval indexes hanging off them.

She asked me a simple question: "What does a real AIBOM look like in 2026?"

This is my answer.

The thesis

An AIBOM — AI Bill of Materials — is not a spreadsheet of models, and it's not an SBOM with three extra columns. It's a living graph of every artifact that participates in an AI decision: weights, base models, training and fine-tuning datasets, prompts, system instructions, retrieval indexes, agents, tools, MCP servers, evaluation suites, and the humans accountable for each. Most vendors are selling you the 2023 version of this — a static registry. In 2026, that's worse than useless, because it gives the board false confidence while the actual attack surface lives in the gaps.
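To make "graph, not spreadsheet" concrete, here is a minimal sketch of the data shape, assuming nothing beyond the Python standard library. The class names, artifact kinds, relation labels, and owners are all hypothetical illustrations, not any vendor's schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    id: str
    kind: str   # hypothetical kinds: "model", "prompt", "dataset", "tool", "agent"
    owner: str  # the accountable human, not a team alias


@dataclass
class AIBOMGraph:
    nodes: dict = field(default_factory=dict)  # artifact id -> Artifact
    edges: set = field(default_factory=set)    # (src_id, relation, dst_id)

    def add(self, artifact: Artifact) -> None:
        self.nodes[artifact.id] = artifact

    def link(self, src: str, relation: str, dst: str) -> None:
        # Refuse edges to artifacts the graph has never seen.
        for node_id in (src, dst):
            if node_id not in self.nodes:
                raise KeyError(f"unknown artifact: {node_id}")
        self.edges.add((src, relation, dst))

    def neighbors(self, src: str) -> list:
        # Outgoing edges of one artifact, sorted for stable output.
        return sorted((rel, dst) for s, rel, dst in self.edges if s == src)


# Illustrative data only.
graph = AIBOMGraph()
graph.add(Artifact("model:support-ft", "model", "alice"))
graph.add(Artifact("prompt:support-system", "prompt", "bob"))
graph.add(Artifact("dataset:tickets-2025", "dataset", "carol"))
graph.link("model:support-ft", "fine_tuned_on", "dataset:tickets-2025")
graph.link("model:support-ft", "uses_prompt", "prompt:support-system")
```

A spreadsheet row can hold the first three fields. It can't hold the edges, and the edges are where the questions get answered.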

What an AIBOM has to cover that vendors keep missing

Let me be specific about where the 2023-era tooling falls apart.

Self-hosted inference. When OpenAI was the only game in town, an AIBOM was easy: list your API keys, log your prompts, done. In 2026, every serious enterprise I talk to has at least three self-hosted inference stacks running concurrently. Ollama for the experimentation team. vLLM for the production serving. Maybe TGI or LocalAI for a regulated workload that has to stay in a specific VPC. Maybe Triton for the computer vision pipeline that predates the LLM craze. Maybe llama.cpp on an edge device. Maybe LM Studio on a developer's laptop that somehow ended up serving an internal Slack bot.

A real AIBOM has to discover and fingerprint each of these runtimes, identify which models they're serving, and continuously monitor them for drift. Most vendor tools assume "the model" is a SaaS endpoint. That assumption is now wrong more often than it's right.
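As a rough sketch of what fingerprinting can mean, the function below classifies a host from the JSON bodies returned by a couple of well-known listing endpoints. It assumes Ollama's model listing at /api/tags and the OpenAI-style /v1/models listing that several self-hosted servers expose. Treating an "owned_by" value of "vllm" as a vLLM marker is my assumption about its defaults, and real fingerprinting needs far more signals than this.

```python
def fingerprint(responses: dict) -> tuple:
    """Guess the runtime from probe results.

    `responses` maps a probed URL path to the parsed JSON body,
    with failed probes simply absent. Returns (runtime, model_names).
    """
    # Ollama lists local models under a top-level "models" key.
    tags = responses.get("/api/tags")
    if isinstance(tags, dict) and isinstance(tags.get("models"), list):
        return "ollama", [m.get("name") for m in tags["models"]]

    # OpenAI-compatible servers list models under "data".
    listing = responses.get("/v1/models")
    if isinstance(listing, dict) and isinstance(listing.get("data"), list):
        ids = [m.get("id") for m in listing["data"]]
        owners = {m.get("owned_by") for m in listing["data"]}
        # Assumption: vLLM reports itself as the owner by default.
        runtime = "vllm" if "vllm" in owners else "openai-compatible"
        return runtime, ids

    return "unknown", []
```

Keeping the classifier separate from the HTTP probing makes it easy to test against recorded responses before pointing anything at a live network.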

Agents and tools, not just models. A model is a function. An agent is a process. When your customer success team builds a workflow where a model calls a tool that calls another model that writes to your CRM, the security-relevant unit isn't the model — it's the agent-tool graph. An AIBOM that lists "GPT-5-turbo" and stops there is hiding the actual blast radius. The blast radius is the tools the agent can invoke, the data those tools can touch, and the authority gradient between them.
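One way to picture blast radius is as plain graph reachability. The sketch below assumes the agent-tool graph is available as an adjacency mapping; the node names are made up for illustration.

```python
from collections import deque


def blast_radius(can_reach: dict, start: str) -> set:
    """Everything transitively reachable from `start` (excluding itself)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in can_reach.get(node, ()):
            # Skip already-visited nodes so cycles terminate.
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical customer-success workflow.
CAN_REACH = {
    "agent:cs-workflow": ["model:summarizer", "tool:crm_write"],
    "model:summarizer": ["tool:kb_search"],
    "tool:kb_search": ["data:kb_articles"],
    "tool:crm_write": ["data:crm_customers"],
}

radius = blast_radius(CAN_REACH, "agent:cs-workflow")
```

Listing only "model:summarizer" in an inventory hides the fact that the agent wrapping it can write to "data:crm_customers".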

Prompts as code. Every system prompt in production is a security control. Every system prompt in production is also unversioned, untested, and edited by whoever has access to the LangChain repo. The 2023 AIBOM treated prompts as ephemera. The 2026 AIBOM treats them as first-class artifacts with their own signing, their own change log, and their own evaluation history. If a prompt changed yesterday and an incident happened today, you need to be able to draw that line in seconds.
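Drawing that line in seconds is mostly a matter of having a change log to query. Here is a minimal sketch that uses a content hash as the prompt's version identifier. It is hashing, not cryptographic signing, and the log format is a hypothetical one.

```python
import hashlib
from datetime import datetime, timedelta


def prompt_digest(text: str) -> str:
    """Content hash used as a version identifier for a prompt."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changes_before(change_log: list, incident_time: datetime, window: timedelta) -> list:
    """Prompt changes that landed within `window` before the incident."""
    start = incident_time - window
    return [c for c in change_log if start <= c["changed_at"] <= incident_time]


# Hypothetical change log entries.
change_log = [
    {"prompt_id": "support-system", "digest": prompt_digest("You are helpful."),
     "author": "bob", "changed_at": datetime(2026, 1, 10, 9, 0)},
    {"prompt_id": "support-system", "digest": prompt_digest("You are helpful. Be brief."),
     "author": "dana", "changed_at": datetime(2026, 1, 14, 16, 30)},
]

incident = datetime(2026, 1, 15, 11, 0)
suspects = changes_before(change_log, incident, timedelta(hours=24))
```

Because the digest is derived from the prompt text, any edit produces a new version, including the ones nobody announced.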

Training and fine-tuning provenance. Did your fine-tuned support model train on customer conversations? Which ones? Were the opt-outs honored? Was any of it scraped from a forum that has since issued a takedown? When the regulator asks — and in the EU they're now asking — "spreadsheet" is not an answer.

The vendor SaaS that embeds models you didn't approve. This one is brutal. Your CRM vendor added an AI assistant. Your dev tool vendor shipped Copilot-for-our-thing. Your HR platform now does "smart" summarization. Each of those is a model in your decision path that no internal procurement process saw. A real AIBOM has to ingest vendor disclosures and treat third-party AI as part of your surface area, not someone else's problem.

Why static registries are worse than no registry

I'm going to make an unpopular claim: a stale AIBOM is more dangerous than no AIBOM.

When you have nothing, your incident responders know they have nothing. They go look. They grep. They ask questions. They find things.

When you have a registry that says "we have 47 models," your responders trust the 47. They don't grep. They don't ask. And the 48th model — the one your data science intern stood up in a forgotten GCP project to test something on a Friday — is exactly where the breach lives.

This is the same dynamic that played out with CMDBs in the 2010s. Every enterprise had one. None of them were right. The ones that mattered were the ones generated continuously from observed reality, not declared by humans on a wiki. AIBOMs are CMDBs for AI, and we're going to repeat every mistake unless we start from "discover, don't declare."

Discovery means scanning the code that defines models and prompts. It means watching the network for inference traffic. It means probing inference endpoints to identify what they actually serve. It means parsing CI/CD pipelines for training jobs. It means reading IaC to find GPU instances. And it means doing all of that on a cadence measured in hours, not quarters.
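For the code-scanning piece, a toy version looks like the sketch below: a few regular expressions over source text. The pattern names are hypothetical, 11434 is Ollama's default port, and a real static analyzer parses code rather than grepping it. Treat this as an illustration of the idea, not of how any particular product works.

```python
import re

# Hypothetical rule names mapped to deliberately simple patterns.
PATTERNS = {
    "transformers-import": re.compile(r"^\s*(?:from|import)\s+transformers\b", re.M),
    "langchain-import": re.compile(r"^\s*(?:from|import)\s+langchain", re.M),
    "ollama-default-port": re.compile(r":11434\b"),
}


def scan_source(text: str) -> list:
    """Return the names of every rule that matches the source text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


sample = (
    "import requests\n"
    'requests.post("http://10.0.0.5:11434/api/generate", json={})\n'
)
hits = scan_source(sample)
```

Nothing in the sample is labeled as AI, which is the point: the raw HTTP call is the only evidence a model is involved.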

What we built, and why we built it this way

I'll be direct: this is the problem Cybrium is built around, and our architecture reflects a strong opinion about how to solve it.

cyscan is our static analysis engine. It runs against your repos and finds the AI artifacts the way a SAST tool finds SQL injection: by understanding code, not by waiting for someone to fill out a form. 1,815 rules across 75+ languages, looking for model loads, prompt definitions, agent constructions, tool registrations, MCP client wiring, and the patterns that indicate AI is happening even when nobody labeled it. If your repo has a "from transformers import" statement, a LangChain agent, or a raw HTTP call to an Ollama port, cyscan finds it and tags it.

cyradar is the runtime side. It fingerprints self-hosted inference: Ollama, vLLM, TGI, LocalAI, Triton, LM Studio, llama.cpp. The point isn't to brag about coverage. The point is that these are the runtimes actually deployed in 2026, and a tool that only knows about OpenAI endpoints is solving a 2023 problem. cyradar tells you what's running, what model it's serving, what version, what configuration, and whether it's exposed in ways it shouldn't be.

cyweb handles the offensive validation. 22 fuzz categories, 95% template conversion from the upstream community templates, plus our own. Because an inventory entry that says "this model is deployed" is half the story. The other half is "and here's what an attacker can do to it right now."

The MCP server exposes 10 tools that let an agent — yours or ours — query this graph the way you'd query a database. "What models touch PII?" becomes a tool call, not a two-week investigation.

The reason these are one platform rather than four products is the graph. A model finding from cyscan and a runtime fingerprint from cyradar and a fuzz result from cyweb are not three reports. They're three edges on the same node. The value isn't the scan results. The value is being able to ask, in one query, "show me every model that (a) touches PII according to cyscan, (b) is exposed on a runtime cyradar can reach from the internet, and (c) failed a cyweb prompt injection probe in the last 24 hours." That query is the AIBOM doing its job. You can't run that query across four siloed vendors. I've tried. I gave up.
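Once the three signals sit on the same node, that three-part question collapses into a single filter. The sketch below assumes the signals have already been merged into one record per model. The field names are hypothetical and are not the platform's actual schema or query language.

```python
from datetime import datetime, timedelta


def risky_models(nodes: list, now: datetime) -> list:
    """Models that touch PII, are internet-reachable, and failed a
    prompt injection probe in the last 24 hours."""
    cutoff = now - timedelta(hours=24)
    return sorted(
        n["id"]
        for n in nodes
        if n.get("touches_pii")
        and n.get("internet_reachable")
        and any(
            f["probe"] == "prompt_injection" and f["failed_at"] >= cutoff
            for f in n.get("fuzz_failures", [])
        )
    )


now = datetime(2026, 1, 15, 12, 0)
recent = {"probe": "prompt_injection", "failed_at": datetime(2026, 1, 15, 3, 0)}
old = {"probe": "prompt_injection", "failed_at": datetime(2026, 1, 10, 3, 0)}

# Illustrative merged records.
nodes = [
    {"id": "model:a", "touches_pii": True, "internet_reachable": True, "fuzz_failures": [recent]},
    {"id": "model:b", "touches_pii": True, "internet_reachable": False, "fuzz_failures": [recent]},
    {"id": "model:c", "touches_pii": True, "internet_reachable": True, "fuzz_failures": [old]},
    {"id": "model:d", "touches_pii": False, "internet_reachable": True, "fuzz_failures": [recent]},
]

result = risky_models(nodes, now)
```

With siloed tools, each of those three conditions lives in a different export keyed by a different identifier, and the join is the part that takes two weeks.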

The five questions your AIBOM has to answer in under a minute

I tell every CISO who asks me about AIBOM to forget the vendor pitches and write down the questions their board, their regulator, and their incident responders will actually ask. If your AIBOM can't answer these in under a minute, it's not an AIBOM. It's a spreadsheet.

  • Which models, in which environments, touch which data classifications, owned by whom?
  • For any given model, what's the full dependency chain — base model, fine-tuning data, prompts, retrieval indexes, tools, agents?
  • What changed in the last 24 hours, 7 days, 30 days, across any of the above?
  • Which of our deployed AI components currently have a known vulnerability or exploitable misconfiguration?
  • If we deprecate model X tomorrow, what breaks?

That last one is the one most teams underestimate. The deprecation question is where you discover the Lambda nobody remembered, the vendor SDK that embeds a forgotten endpoint, the eval suite that's secretly running against production traffic. If you can answer the deprecation question, you have a real AIBOM. If you can't, you have decoration.
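The deprecation question is a reverse dependency walk. A minimal sketch, assuming the graph is stored as "X depends on Y" edges and using invented node names:

```python
def dependents(depends_on: dict, target: str) -> set:
    """Everything that directly or transitively depends on `target`."""
    # Invert the edges so we can walk from a dependency to its users.
    used_by = {}
    for src, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(src)

    broken, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in used_by.get(node, ()):
            if parent not in broken:
                broken.add(parent)
                stack.append(parent)
    broken.discard(target)  # a cycle must not report the target itself
    return broken


# Hypothetical dependency edges.
DEPENDS_ON = {
    "lambda:nightly-scoring": ["endpoint:model-x"],
    "endpoint:model-x": ["model:x"],
    "agent:support": ["model:x", "tool:crm"],
    "eval:regression": ["model:y"],
}

would_break = dependents(DEPENDS_ON, "model:x")
```

The forgotten Lambda only shows up here if discovery recorded its edge to the endpoint, which is why this query doubles as a test of the inventory itself.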

What 2026 regulators are actually asking for

I sit on a couple of advisory boards where regulators show up. Let me tell you what they're asking, because it's not what the marketing decks say.

They're not asking for a list of models. They're asking for provenance and reproducibility. Given an output that affected a customer, can you reconstruct the chain that produced it — which model, which version, which prompt, which retrieval, which inputs — and would the same chain produce the same output today? If yes, you have governance. If no, you have a press release waiting to happen.
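A provenance record can be as small as the sketch below: one immutable record per decision, plus a comparison that shows which link in the chain has moved since. The field names are hypothetical. Note that an identical chain is necessary but not sufficient for an identical output if the model samples non-deterministically.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    model: str
    model_version: str
    prompt_digest: str
    retrieval_index: str
    input_digest: str

    def chain_id(self) -> str:
        # Stable hash over the whole chain; any changed field changes it.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def drifted_fields(recorded: DecisionRecord, current: DecisionRecord) -> list:
    """Which links in the chain differ between then and now."""
    then, now = asdict(recorded), asdict(current)
    return sorted(key for key in then if then[key] != now[key])


# Illustrative values only.
recorded = DecisionRecord("support-ft", "1.4.0", "aaa111", "kb-2026-01-10", "in-42")
today = DecisionRecord("support-ft", "1.4.0", "bbb222", "kb-2026-01-10", "in-42")

drift = drifted_fields(recorded, today)
```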

They're asking for human accountability per artifact. Not "the AI team owns AI." Which specific person is accountable for this prompt? For this fine-tuned model? For this agent's tool permissions? An AIBOM that doesn't carry accountability at the artifact level is a compliance theater prop.
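Artifact-level accountability is cheap to check once owners are a field on every node. A small sketch, where the list of non-answers is my own hypothetical one:

```python
# Hypothetical placeholders that do not name a specific person.
NON_OWNERS = frozenset({"", "tbd", "ai-team", "see notion"})


def unaccountable(artifacts: list) -> list:
    """Artifacts whose owner is missing or is a placeholder, not a person."""
    return sorted(
        a["id"]
        for a in artifacts
        if (a.get("owner") or "").strip().lower() in NON_OWNERS
    )


artifacts = [
    {"id": "prompt:support-system", "owner": "dana"},
    {"id": "model:support-ft", "owner": "AI-Team"},
    {"id": "agent:cs-workflow", "owner": None},
    {"id": "tool:crm_write"},
]

gaps = unaccountable(artifacts)
```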

They're asking for change velocity with safety. They've stopped pretending change freezes are realistic. What they want is evidence that every change to an AI artifact passed through evaluation, that the evaluation suite is itself versioned, and that production deployment is gated on those evals. The AIBOM is the evidence layer for that.
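The gating logic itself is short. This sketch assumes each change names the exact artifact digest it ships and the evaluation suite version it must pass. The record shapes are hypothetical.

```python
def may_deploy(change: dict, eval_runs: list) -> bool:
    """Allow deployment only if this exact artifact passed the required suite version."""
    return any(
        run["artifact_digest"] == change["artifact_digest"]
        and run["suite_version"] == change["required_suite_version"]
        and run["passed"]
        for run in eval_runs
    )


change = {"artifact_digest": "bbb222", "required_suite_version": "2026.01"}

# Illustrative evaluation history.
eval_runs = [
    # Passed, but for the previous prompt text.
    {"artifact_digest": "aaa111", "suite_version": "2026.01", "passed": True},
    # Right artifact, but an outdated suite.
    {"artifact_digest": "bbb222", "suite_version": "2025.11", "passed": True},
]

allowed = may_deploy(change, eval_runs)
```

Matching on the digest rather than the artifact name is what stops a pass on last week's prompt from covering this week's edit.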

If your vendor is showing you risk-tier dashboards and skipping the provenance graph, they're optimizing for the 2024 audit. The 2026 audit is going to eat that approach.

The recomposition

Here's the broader pattern I keep seeing. The previous generation of security tooling fragmented every problem into its own product: SAST, DAST, CSPM, CWPP, SIEM, SOAR, vuln management, asset inventory. Each was justifiable. Together, they produced the dashboard sprawl every CISO complains about.

AI security is being born into a different world. The artifacts cross every boundary at once. A prompt lives in code (SAST territory), runs in a container (CWPP territory), gets attacked over a network (DAST territory), produces logs (SIEM territory), and is an asset (inventory territory). You cannot solve AI security by buying five tools and hoping the integrations work. You have to start from the graph and let the capabilities compose around it.

That's the recomposition. The previous decade fragmented; this decade is recomposing. The AIBOM is not a product category. It's the substrate the next ten years of security will run on, the role the CMDB tried and failed to play in the last decade. Get it right and everything else (detection, response, governance, audit) gets easier. Get it wrong and you'll be the CISO sending the spreadsheet to the board at 11 p.m. the night before earnings.

Where to start this quarter

If I were standing up an AIBOM program tomorrow, I'd do three things this quarter and resist the urge to do more.

First, get discovery running across every code repo and every production environment. Not declared inventory. Discovered inventory. cyscan against the repos, cyradar against the runtimes, and reconcile the two graphs. The gap between what code says you have and what runtime says you have is, in my experience, where 80% of the risk lives.
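Reconciling the inventories is set arithmetic once both sides use the same identifiers, which in practice is the hard part. A sketch with made-up model ids:

```python
def reconcile(declared: set, from_code: set, from_runtime: set) -> dict:
    """Compare the declared inventory with what code and runtime scans observed."""
    discovered = from_code | from_runtime
    return {
        "undeclared": sorted(discovered - declared),       # running, never registered
        "stale": sorted(declared - discovered),            # registered, not observed
        "code_only": sorted(from_code - from_runtime),     # defined, not seen serving
        "runtime_only": sorted(from_runtime - from_code),  # serving, no code found
    }


# Illustrative inventories.
declared = {"model:a", "model:b", "model:deprecated"}
from_code = {"model:a", "model:b", "model:intern-test"}
from_runtime = {"model:a", "model:vendor-sdk"}

report = reconcile(declared, from_code, from_runtime)
```

The "undeclared" and "runtime_only" buckets are the ones to read first, since they hold what nobody told security about.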

Second, instrument prompts and agent definitions as versioned artifacts with named owners. This is the cheapest high-leverage thing on the list and almost nobody does it. A prompt change without a diff is malpractice in 2026.

Third, build the deprecation query. Pick one model. Ask the AIBOM what would break if you turned it off. If you can't answer in under a minute, your AIBOM isn't real, and now you know where to invest.

Skip everything else for the first 90 days. Risk tiers, executive dashboards, board reporting — they're all downstream of having a discovered graph with owners. Build the substrate. The reporting layer is easy once the substrate is right.


If you want to see what discovered-not-declared AIBOM looks like in practice, start with cyscan for repo-side discovery and cyradar for runtime fingerprinting at cybrium.ai. If you want to talk through your specific environment — the messy one, the one with the forgotten Lambda and the vendor SDK and the intern's GCP project — find me at anand@cybrium.ai.
