If you've ever stared at hundreds of SCA matches wondering which ones actually matter, this tool was built for you. I recently released Izumi, an SBOM generation tool, and here's the story behind it.
SBOM stands for Software Bill of Materials: a document that describes which OSS libraries and other components are included in a given software product. It is becoming an essential part of software license management and supply chain security. In Europe, regulations such as the Cyber Resilience Act (CRA) will make SBOM creation mandatory; the CRA's main obligations apply from December 2027.
The problem with existing tools
I work as an embedded software engineer, and our field is no exception when it comes to preparing for these requirements. When I had the opportunity to create an SBOM at work, I researched the available OSS tools and found that most of them assumed a package manager-based development environment. Tools that worked easily with C/C++ projects — especially in embedded software development contexts — were surprisingly hard to find.
I also tried several commercial tools. They come equipped with SCA (Software Composition Analysis) features for detecting software components within a project, and they may well be powerful in theory. In practice, however, I found that they flagged an enormous number of matches that I could only describe as false positives, and the workflow required a human to review each one individually. On top of that, their design didn't fit well with the strict development processes typical of embedded software, and I struggled to figure out how to properly store the tool's output as evidence in our process.
What if LLMs could help?
As I stared at all those matches that seemed safe to ignore, a thought struck me: what if LLMs could help cut through this noise? What I really want to know isn't that some common constant or boilerplate happens to match something in an OSS project — it's whether code I thought was original actually came from someone else's work. That kind of flexible, context-aware judgment felt like exactly the kind of task LLMs are good at. Given how rapidly LLMs have advanced recently, I suspected they could detect OSS components with accuracy comparable to traditional SCA techniques, but with much more flexibility.
That's when I decided to build Izumi. The timing was also perfect: I had just started using Claude Code as a hobby and was looking for something meaningful to build with it. My design goals were straightforward: cross-platform support, and the ability to simply point the tool at a source directory and get results. Given the risk of hallucinations, I positioned LLM functionality as a supplementary feature rather than the core engine.

Izumi's main view — files are classified as CONFIRMED, INFERRED, or UNKNOWN, with license information shown alongside the source code.
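To make the three labels concrete, here is a minimal Python sketch of how a file could be bucketed from static evidence alone. It is not Izumi's actual implementation. The rules are assumptions made purely for illustration: an `SPDX-License-Identifier` tag means CONFIRMED, a well-known license phrase or a copyright notice means INFERRED, and anything else is UNKNOWN. The names `classify_source` and `LICENSE_PHRASES` are hypothetical too.

```python
import re

# SPDX short-form tag, e.g. "SPDX-License-Identifier: MIT".
# Captures a single identifier only; compound expressions (AND/OR) are out of scope.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")

# Hypothetical phrase table: snippets of well-known license texts
# mapped to the SPDX identifier they suggest.
LICENSE_PHRASES = {
    "Permission is hereby granted, free of charge": "MIT",
    "Licensed under the Apache License, Version 2.0": "Apache-2.0",
}

# A copyright line hints at third-party origin but names no license.
COPYRIGHT = re.compile(r"Copyright\s+(?:\(c\)\s*)?\d{4}", re.IGNORECASE)


def classify_source(text, header_lines=40):
    """Return (status, license_or_None) from the first lines of a source file.

    The status meanings are assumptions for this sketch, not Izumi's rules.
    """
    header = "\n".join(text.splitlines()[:header_lines])

    tag = SPDX_TAG.search(header)
    if tag:
        return "CONFIRMED", tag.group(1)

    for phrase, license_id in LICENSE_PHRASES.items():
        if phrase in header:
            return "INFERRED", license_id

    if COPYRIGHT.search(header):
        return "INFERRED", None

    return "UNKNOWN", None
```

In a scheme like this, the UNKNOWN bucket is where an LLM-based check would be most useful, since those files carry no static evidence of their origin.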
Solving the confidentiality problem
Thinking this idea through, however, I immediately hit a significant problem: in a real company, you cannot send closed-source code containing confidential information to a public LLM.
To address this, I designed two options. The first is using a local LLM. I wasn't sure this would work well at first, but it performed better than I expected. A 7B-class code-specialized model running on my GeForce RTX 5070 12GB turned out to be capable enough to identify OSS components from code fragments. The second option is summarizing the code into natural language, stripping out confidential details, and sending that summary to an external LLM. Since information is lost in the summarization process, I expected the accuracy to be limited. That said, I thought there was still a chance that a capable external LLM might identify the software from a natural-language description of its characteristics, so I kept it as an option.
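The second option can be pictured as a two-stage pipeline. The Python sketch below shows one possible shape, not Izumi's actual code. `summarize_locally` and `ask_external_llm` are hypothetical callables standing in for a local model and a remote API, and the `redact` step with its term list is an extra safeguard assumed here for illustration.

```python
import re
from typing import Callable, Iterable


def redact(text: str, confidential_terms: Iterable[str]) -> str:
    """Replace each confidential term (case-insensitive) with a placeholder."""
    for term in confidential_terms:
        text = re.sub(re.escape(term), "[REDACTED]", text, flags=re.IGNORECASE)
    return text


def identify_via_summary(
    code: str,
    summarize_locally: Callable[[str], str],   # hypothetical: runs on-premises
    ask_external_llm: Callable[[str], str],    # hypothetical: calls a remote API
    confidential_terms: Iterable[str],
) -> str:
    """Ask an external LLM about a summary of the code, never the code itself."""
    # Stage 1: the raw source is only ever seen by the local summarizer.
    summary = summarize_locally(code)

    # Stage 2: scrub known confidential names from the summary as a second guard.
    safe_summary = redact(summary, confidential_terms)

    # Stage 3: only the scrubbed natural-language description leaves the machine.
    prompt = (
        "Which open-source project, if any, does this description match?\n\n"
        + safe_summary
    )
    return ask_external_llm(prompt)
```

Passing the two models in as plain callables keeps the sketch runnable with stub functions, and it makes the trust boundary explicit: everything before the `ask_external_llm` call stays local.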

The LLM analysis screen, showing the three options for handling confidential code — including local LLM and summarization-based approaches.
The result is Izumi: an SBOM tool that combines LLM-based OSS detection with static analysis for license identification directly from source code. If you're working on embedded software or C/C++ projects and struggling with SBOM compliance, I'd love to hear your thoughts. The project is open source: https://github.com/moonkick64/Izumi