Introduction: What is SPDX?
At the root of modern software supply chain security lies SPDX. The name originally stood for Software Package Data Exchange and, as of the 3.0 release, stands for System Package Data Exchange.
SPDX is a standardized format for describing what’s inside a piece of software.
Think of it as an ingredients label for software.
An SPDX document helps answer critical questions such as:
What packages are included?
What files exist?
What licenses apply?
Who created the software?
How do different components relate to each other?
Why SPDX Matters
In real-world scenarios, when you install something like:
a Docker image
an npm package
a Linux distribution
…you are pulling in hundreds (sometimes thousands) of dependencies.
SPDX provides a structured way to declare:
“Here’s everything inside this software—legally and technically.”
This is essential for:
SBOMs (Software Bill of Materials)
Supply chain security
License compliance
CI/CD automation pipelines
Major organizations like Google, Microsoft, and Red Hat rely on SPDX or compatible standards internally.
What’s Inside an SPDX Document?
An SPDX document typically consists of:
1. Packages
Includes metadata such as name, version, and supplier.
2. Files
Individual files along with their associated licenses.
3. Relationships
Defines how components interact, for example:
“A depends on B”
“A contains B”
4. Licenses
Standard identifiers from the SPDX License List, such as MIT, Apache-2.0, and GPL-3.0-only.
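To make those sections concrete, here is a minimal sketch that parses a tiny SPDX 2.3 JSON document with Go's standard library. The struct types cover only a small subset of the 2.3 JSON fields, and the sample content (names, versions, IDs) is invented for illustration. It is not how tools-golang models a document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Package, File, and Relationship model a small subset of SPDX 2.3 JSON fields.
type Package struct {
	SPDXID           string `json:"SPDXID"`
	Name             string `json:"name"`
	Version          string `json:"versionInfo"`
	Supplier         string `json:"supplier"`
	LicenseConcluded string `json:"licenseConcluded"`
}

type File struct {
	SPDXID           string `json:"SPDXID"`
	FileName         string `json:"fileName"`
	LicenseConcluded string `json:"licenseConcluded"`
}

type Relationship struct {
	Element string `json:"spdxElementId"`
	Type    string `json:"relationshipType"`
	Related string `json:"relatedSpdxElement"`
}

// Document holds the sections discussed above; unknown JSON fields are ignored.
type Document struct {
	SPDXVersion   string         `json:"spdxVersion"`
	Name          string         `json:"name"`
	Packages      []Package      `json:"packages"`
	Files         []File         `json:"files"`
	Relationships []Relationship `json:"relationships"`
}

// sample is an invented, heavily trimmed SPDX 2.3 document.
const sample = `{
  "spdxVersion": "SPDX-2.3",
  "dataLicense": "CC0-1.0",
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "example-app",
  "packages": [
    {"SPDXID": "SPDXRef-Package-app", "name": "example-app", "versionInfo": "1.0.0",
     "supplier": "Organization: Example Org", "licenseConcluded": "MIT"},
    {"SPDXID": "SPDXRef-Package-lib", "name": "example-lib", "versionInfo": "2.4.1",
     "supplier": "Organization: Example Org", "licenseConcluded": "Apache-2.0"}
  ],
  "files": [
    {"SPDXID": "SPDXRef-File-main", "fileName": "./main.go", "licenseConcluded": "MIT"}
  ],
  "relationships": [
    {"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
     "relatedSpdxElement": "SPDXRef-Package-app"},
    {"spdxElementId": "SPDXRef-Package-app", "relationshipType": "DEPENDS_ON",
     "relatedSpdxElement": "SPDXRef-Package-lib"},
    {"spdxElementId": "SPDXRef-Package-app", "relationshipType": "CONTAINS",
     "relatedSpdxElement": "SPDXRef-File-main"}
  ]
}`

// parseDocument decodes the JSON bytes into the simplified Document type.
func parseDocument(data []byte) (Document, error) {
	var doc Document
	err := json.Unmarshal(data, &doc)
	return doc, err
}

func main() {
	doc, err := parseDocument([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(doc.SPDXVersion, doc.Name)
	for _, p := range doc.Packages {
		fmt.Printf("package %s %s (%s)\n", p.Name, p.Version, p.LicenseConcluded)
	}
	for _, f := range doc.Files {
		fmt.Printf("file %s (%s)\n", f.FileName, f.LicenseConcluded)
	}
	for _, r := range doc.Relationships {
		fmt.Printf("%s %s %s\n", r.Element, r.Type, r.Related)
	}
}
```

Each section of the document maps to one slice, which is what makes the 2.3 layout easy to read and easy to process with simple tooling.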
SPDX Versions: Why This Project Exists
SPDX 2.3 (the conversion target)
Document-based structure
Organized into sections (packages, files, relationships)
Simpler and widely adopted
SPDX 3.0 (the conversion source)
Graph-based model
Modular design (profiles like software, security, AI, etc.)
Far more expressive and flexible
This shift from a document model to a graph model is powerful, but it introduces a major challenge:
Backward compatibility
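One small example of the structural gap, as a sketch under stated assumptions: in the 3.0 model a relationship is an element of its own that can point from one element to several targets, while a 2.3 relationship is always a single pair. The types, the `flatten` helper, and the three-entry type map below are hypothetical and simplified. A real converter would also have to rewrite element IDs into the 2.3 `SPDXRef-` form, which is left out here.

```go
package main

import "fmt"

// GraphRelationship is a hypothetical, simplified SPDX 3.0-style relationship:
// it is an element in its own right and may point at several targets.
type GraphRelationship struct {
	ID   string
	From string
	Type string // camelCase in the 3.0 model, e.g. "dependsOn"
	To   []string
}

// PairRelationship mirrors the pairwise shape used in SPDX 2.3 documents.
type PairRelationship struct {
	Element string
	Type    string // upper snake case in 2.3, e.g. "DEPENDS_ON"
	Related string
}

// typeMap is an illustrative subset, not the full mapping between versions.
var typeMap = map[string]string{
	"dependsOn": "DEPENDS_ON",
	"contains":  "CONTAINS",
	"describes": "DESCRIBES",
}

// flatten expands one graph relationship into one pairwise relationship per
// target. It returns false when the type has no entry in typeMap.
func flatten(rel GraphRelationship) ([]PairRelationship, bool) {
	mapped, ok := typeMap[rel.Type]
	if !ok {
		return nil, false
	}
	pairs := make([]PairRelationship, 0, len(rel.To))
	for _, target := range rel.To {
		pairs = append(pairs, PairRelationship{Element: rel.From, Type: mapped, Related: target})
	}
	return pairs, true
}

func main() {
	rel := GraphRelationship{
		ID:   "urn:example:rel-1",
		From: "urn:example:app",
		Type: "dependsOn",
		To:   []string{"urn:example:lib-a", "urn:example:lib-b"},
	}
	pairs, ok := flatten(rel)
	if !ok {
		fmt.Println("unmapped relationship type:", rel.Type)
		return
	}
	for _, p := range pairs {
		fmt.Printf("%s %s %s\n", p.Element, p.Type, p.Related)
	}
}
```

Returning an explicit `false` for unmapped types, rather than skipping them, leaves the caller to decide what to do with information that has no 2.3 equivalent.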
The Core Problem: Not Transformation, But Controlled Loss
I’ve been contributing to SPDX tooling this summer, focusing specifically on:
SPDX 3.0 → SPDX 2.3 backward conversion
At first glance, this might sound like a simple transformation, but it is not, because:
SPDX 3.0 is graph-based
SPDX 2.3 is document-based
Not all information in 3.0 can be represented in 2.3.
So the goal is not a perfect transformation.
Instead, the real objective is:
Controlled loss of information
This means:
Preserving what can be represented in 2.3
Gracefully handling what cannot
Ensuring no critical data is silently lost
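The three points above can be sketched as a converter that returns a loss report alongside its output. This is a minimal illustration, not the actual converter: the `Element`, `Loss`, and `Result` types are hypothetical, and the element type strings are illustrative stand-ins for 3.0 type names.

```go
package main

import "fmt"

// Element is a hypothetical, simplified stand-in for an SPDX 3.0 graph element.
type Element struct {
	ID   string
	Type string // illustrative type strings, e.g. "software_Package"
	Name string
}

// Loss records one piece of information that could not be carried into 2.3.
type Loss struct {
	ElementID string
	Reason    string
}

// Result holds the 2.3-style sections plus an explicit loss report.
type Result struct {
	Packages []string
	Files    []string
	Losses   []Loss
}

// convert keeps what this sketch knows how to represent and reports the rest
// instead of dropping it silently.
func convert(elements []Element) Result {
	var res Result
	for _, e := range elements {
		switch e.Type {
		case "software_Package":
			res.Packages = append(res.Packages, e.Name)
		case "software_File":
			res.Files = append(res.Files, e.Name)
		default:
			res.Losses = append(res.Losses, Loss{
				ElementID: e.ID,
				Reason:    fmt.Sprintf("no SPDX 2.3 section for element type %q", e.Type),
			})
		}
	}
	return res
}

func main() {
	res := convert([]Element{
		{ID: "urn:example:app", Type: "software_Package", Name: "example-app"},
		{ID: "urn:example:main", Type: "software_File", Name: "./main.go"},
		{ID: "urn:example:data", Type: "dataset_DatasetPackage", Name: "training-data"},
	})
	fmt.Println("packages:", res.Packages)
	fmt.Println("files:", res.Files)
	for _, l := range res.Losses {
		fmt.Printf("lost %s: %s\n", l.ElementID, l.Reason)
	}
}
```

The design choice that matters is the `Losses` slice: the caller can log it, fail a CI job when it is non-empty, or attach it to the output as a comment, so nothing disappears without a trace.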
Why This Matters for End Users
While SPDX 3.0 is the future, many existing systems still rely on SPDX 2.3.
A backward conversion enables:
Compatibility with legacy tooling
Gradual migration to SPDX 3.0
Continued support for existing compliance systems
In simple terms:
It allows ecosystems to adopt SPDX 3.0 without breaking what already works.
Where tools-golang Fits In
The tools-golang project provides Go-based utilities for working with SPDX documents.
It is commonly used to:
Parse SPDX files
Generate SPDX outputs
Validate document structure
However:
It primarily supports SPDX 2.x
It does not fully support SPDX 3.0 yet
This makes it a natural fit for:
Generating valid SPDX 2.3 output after conversion
Conclusion
The evolution from SPDX 2.3 to 3.0 represents a major leap in how we model software systems: from static documents to rich, interconnected graphs.
But with that progress comes a practical challenge: ensuring backward compatibility.
The work on SPDX 3.0 → 2.3 conversion sits right at this intersection.
It’s not about perfect translation—it’s about:
Making thoughtful trade-offs
Preserving essential information
Enabling real-world adoption
As the software supply chain ecosystem continues to evolve, solutions like this will play a key role in bridging the gap between where we are and where we’re going.