Model Weight Registry
Disclosure: AI tools were used for source collection and editorial review. The article was written by a human author, who checked the facts, code, and conclusions.
Crypto risk disclosure: This article is a technical explanation, not investment advice. It is not a recommendation to buy, sell or hold any cryptoasset.
Model Weight Registry should not treat a model name as a model identity. A name, repository, tag, branch, or user-facing label can point to useful software, but stable identity for AI weights starts with the exact bytes being loaded.
That boundary matters for AI x crypto systems because onchain claims are expensive to correct after users rely on them. If a contract, agent, or audit trail says "model X," the next question should be "which file, revision, size, digest, and receipt?"
Byte Identity
A digest identifies bytes under a chosen algorithm and input. NIST FIPS 180-4 defines secure hash algorithms, while RFC 6920 describes naming information with hashes.
That support is narrow and useful. A SHA-256 digest can say the file bytes match a recorded value; a digest cannot say the model is safe, aligned, licensed, useful, or trained on the right data.
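As a minimal sketch of that narrow check, the function below streams a file through SHA-256 and compares the result with a recorded digest and size. The names `sha256_file` and `matches_record` are illustrative, not part of any registry API.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> tuple[str, int]:
    """Return (hex digest, byte count) for the file, read in fixed-size chunks."""
    hasher = hashlib.sha256()
    size = 0
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
            size += len(chunk)
    return hasher.hexdigest(), size


def matches_record(path: Path, recorded_digest: str, recorded_size: int) -> bool:
    """True only if both the byte count and the digest equal the recorded values."""
    digest, size = sha256_file(path)
    return size == recorded_size and digest == recorded_digest.lower()
```

A `True` result here means "these bytes match the record" and nothing more; it carries no information about safety, licensing, or training data.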
Weight Receipt
The practical artifact is a receipt that refuses to overclaim. Instead of a full JSON object, the registry can show the receipt as an audit line with named fields:
| Receipt field | Example value | Why the field exists |
|---|---|---|
| Type | ai.weight_hash_receipt.v1 | Separates this statement from a model card or benchmark |
| Model label | org/model-name | Keeps the human pointer visible |
| Source revision | Full commit hash | Avoids a floating branch as identity |
| File path | model.safetensors | Names the exact artifact inside the source |
| Format and size | safetensors, byte count | Catches conversion and truncation mistakes |
| Hash | sha256:<64 hex chars> | Identifies the exact byte sequence |
| Optional content address | CID plus construction notes | Prevents CID/file-hash confusion |
| Issuer and signature | Registry key id, EIP-712 profile | Says who made the statement |
| Limits | Byte identity only | Blocks safety, license, and behavior overclaims |
This receipt is not a universal standard. It is a defensive shape for registries that want to separate exact artifact identity from marketing labels.
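One way to hold those fields in code is a small immutable record that rejects malformed hashes and prints itself as an audit line. This is a sketch only: the class name `WeightHashReceipt`, the attribute names, and the audit-line layout are assumptions for illustration, and the optional content address and signature are left out for brevity.

```python
import re
from dataclasses import asdict, dataclass

# Matches the "sha256:<64 hex chars>" shape used in the receipt table.
SHA256_FIELD = re.compile(r"sha256:[0-9a-f]{64}")


@dataclass(frozen=True)
class WeightHashReceipt:
    model_label: str        # human pointer, e.g. "org/model-name"
    source_revision: str    # full commit hash, not a branch name
    file_path: str          # artifact inside the source
    file_format: str        # e.g. "safetensors"
    size_bytes: int         # exact byte count of the file
    file_hash: str          # "sha256:<64 hex chars>"
    issuer_key_id: str      # who made the statement
    limits: str = "byte identity only"
    receipt_type: str = "ai.weight_hash_receipt.v1"

    def __post_init__(self) -> None:
        if not SHA256_FIELD.fullmatch(self.file_hash):
            raise ValueError("file_hash must look like sha256:<64 lowercase hex chars>")
        if self.size_bytes <= 0:
            raise ValueError("size_bytes must be a positive byte count")

    def audit_line(self) -> str:
        """Render the receipt as one line of name=value pairs in field order."""
        return " | ".join(f"{name}={value}" for name, value in asdict(self).items())
```

Keeping `limits` as a field with a conservative default means a receipt cannot be printed without also printing what it does not claim.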
Canonical Receipt
If the receipt itself is hashed or signed, its representation matters. RFC 8785 defines the JSON Canonicalization Scheme (JCS) because the same JSON data can be serialized to different byte sequences, and a hash or signature covers bytes, not meaning.
The registry should therefore decide what is hashed: the model file, the canonical receipt, or both. Mixing those claims is how a model registry starts saying "verified" without saying what was verified.
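The sketch below shows why the distinction matters: the same receipt fields in a different key order hash identically only after canonicalization. It uses sorted keys and compact separators from the Python standard library, which approximates RFC 8785 only for receipts whose keys are ASCII and whose values are strings and integers. It is not a full JCS implementation (JCS also fixes number formatting and key ordering rules that this shortcut does not cover). The function names are illustrative.

```python
import hashlib
import json


def canonical_bytes(receipt: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace (JCS-like, not full JCS)."""
    text = json.dumps(receipt, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def receipt_digest(receipt: dict) -> str:
    """Digest of the canonical receipt. This is NOT the digest of the model file."""
    return "sha256:" + hashlib.sha256(canonical_bytes(receipt)).hexdigest()


# The file digest lives inside the receipt; the receipt digest is computed over the receipt.
receipt = {"file_path": "model.safetensors", "file_sha256": "ab" * 32, "size_bytes": 1024}
reordered = {"size_bytes": 1024, "file_sha256": "ab" * 32, "file_path": "model.safetensors"}
same_statement = receipt_digest(receipt) == receipt_digest(reordered)
```

A registry that publishes both values would label them separately, for example "file digest" and "receipt digest", so that "verified" always names its object.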
Supply-Chain Pattern
Software supply-chain systems already use the subject-plus-digest pattern. SLSA Provenance and the in-toto Statement specification bind subjects to names and digests in provenance statements.
Model Weight Registry can borrow that habit without pretending the problem is solved. The digest gives artifact identity; the provenance statement gives issuer and process context; neither one proves the model's behavior.
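To make the subject-plus-digest habit concrete, here is a sketch of a weight receipt wrapped in the in-toto Statement v1 envelope. The `_type`, `subject`, `predicateType`, and `predicate` keys follow the in-toto Statement layout; the predicate type URI and the predicate contents are hypothetical placeholders, not a registered predicate.

```python
def weight_statement(name: str, sha256_hex: str, predicate: dict) -> dict:
    """Build an in-toto-style statement binding one subject name to one digest."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": name, "digest": {"sha256": sha256_hex}}],
        # Hypothetical predicate type, used here only for illustration.
        "predicateType": "https://example.org/weight-hash-receipt/v1",
        "predicate": predicate,
    }


def subject_matches(statement: dict, name: str, sha256_hex: str) -> bool:
    """True if any subject in the statement has this name and this SHA-256 digest."""
    return any(
        subject.get("name") == name
        and subject.get("digest", {}).get("sha256") == sha256_hex
        for subject in statement.get("subject", [])
    )


statement = weight_statement(
    "model.safetensors",
    "ab" * 32,
    {"issuer": "registry-key-1", "limits": "byte identity only"},
)
```

The subject carries artifact identity and the predicate carries issuer and process context, which mirrors the split described above. Nothing in either part speaks to behavior.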
Tag Boundary
Container registries make the pointer problem familiar. The OCI descriptor identifies content with a media type, digest, and size, while tags remain convenient names that can move.
AI weights need the same discipline. A label such as "latest," "main," or "production" is an operational pointer. A digest and size are the beginning of stable artifact identity.
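The difference between a moving tag and a pinned descriptor can be sketched in a few lines. The descriptor keys `mediaType`, `digest`, and `size` follow the OCI descriptor shape; the media type string, the in-memory `tags` table, and the byte strings standing in for weight files are assumptions for illustration.

```python
import hashlib

# Hypothetical media type; a real registry would choose and document its own.
MEDIA_TYPE = "application/vnd.example.weights.safetensors"


def describe(blob: bytes, media_type: str = MEDIA_TYPE) -> dict:
    """Build an OCI-style descriptor: media type, digest, and size."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def verify(blob: bytes, descriptor: dict) -> bool:
    """Check size first (cheap), then the digest."""
    if len(blob) != descriptor["size"]:
        return False
    return "sha256:" + hashlib.sha256(blob).hexdigest() == descriptor["digest"]


tags = {"production": describe(b"weights-v1")}
pinned = dict(tags["production"])                 # what a consumer recorded
tags["production"] = describe(b"weights-v2")      # the tag moves to a new artifact

still_v1 = verify(b"weights-v1", pinned)          # pinned descriptor still matches v1
tag_moved = tags["production"] != pinned          # the tag no longer points at v1
```

The consumer who recorded `pinned` can detect the move; the consumer who recorded only `"production"` cannot.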
Revision Boundary
Model hubs already expose a better path than floating names. Hugging Face Hub download docs describe downloading from a specific revision, which can be a branch, a tag, or a commit hash. Of those, only the commit hash is a fixed reference; branches and tags can be moved.
The registry should record the revision as a full commit hash, not just the repository name. Without the revision and file path, the phrase "we used org/model-name" is a clue, not an identity.
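A registry can enforce that rule before it accepts a source reference. The sketch below assumes commit identifiers are 40-character lowercase hexadecimal Git hashes; the function names and the `repo@revision:path` rendering are illustrative choices, not a hub convention.

```python
import re

# Assumption: a full commit id is a 40-character lowercase hex string.
FULL_COMMIT = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(revision: str) -> bool:
    """True for a full commit hash; False for branch names, tags, or short hashes."""
    return FULL_COMMIT.fullmatch(revision) is not None


def source_reference(repo_id: str, revision: str, file_path: str) -> str:
    """Return a source reference string, refusing floating revisions."""
    if not is_pinned_revision(revision):
        raise ValueError(f"revision {revision!r} is not a full commit hash")
    if not file_path:
        raise ValueError("file_path is required; a repository is not a file")
    return f"{repo_id}@{revision}:{file_path}"
```

With the `huggingface_hub` library, the validated commit hash would typically be passed as the `revision` argument of `hf_hub_download`, alongside `repo_id` and `filename`, so the download and the receipt refer to the same source state.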
Content Address
Content addressing can help, but Model Weight Registry should not flatten every content address into "the file hash." IPFS content-addressing docs explain content-derived identifiers, while CID construction depends on representation details.
That caveat belongs in the receipt. A CID is useful when the registry records the CID version, codec, chunking or import method, and the relationship between the CID and the file digest.
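The relationship between a CID and a file digest is easiest to see in the one case where it is simple. The sketch below builds a CIDv1 for content treated as a single raw block: version byte `0x01`, raw codec `0x55`, then a SHA-256 multihash (`0x12`, length `0x20`, digest), encoded as lowercase base32 with the `b` multibase prefix. This is an assumption-laden illustration: it does not reproduce what an IPFS import produces for files that are chunked or wrapped in UnixFS, where the CID is derived from a tree of blocks rather than from the file's SHA-256.

```python
import base64
import hashlib


def raw_cidv1(block: bytes) -> str:
    """CIDv1 (raw codec, sha2-256, base32) for content treated as ONE raw block."""
    digest = hashlib.sha256(block).digest()
    # version=0x01, codec=raw (0x55), multihash: sha2-256 (0x12), length 32 (0x20)
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # "b" is the multibase prefix for lowercase base32


block = b"example weight bytes"
cid = raw_cidv1(block)
file_hash = "sha256:" + hashlib.sha256(block).hexdigest()
```

Even in this simplest case the CID string and the hex digest look nothing alike, although they commit to the same bytes. That is why the receipt records construction notes next to the CID instead of calling it "the file hash."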
Signed Statement
A signed receipt can authenticate who made the claim. EIP-712 supports typed structured-data signing with domain separation, which fits a registry receipt better than an opaque string.
The signature still has a hard limit. A signed false receipt is still false; a signed byte-identity receipt still says nothing about safety, license rights, or training data.
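For orientation, the sketch below lays out a receipt in the typed-data shape that EIP-712 defines (`types`, `primaryType`, `domain`, `message`) and checks that the message carries exactly the declared fields. It does not hash or sign anything; that step needs a Keccak-256 implementation and a key, which a wallet or Ethereum library would supply. The struct name, field names, and domain values are hypothetical.

```python
def typed_receipt(message: dict, chain_id: int) -> dict:
    """Wrap a receipt message in an EIP-712-style typed-data structure."""
    return {
        "types": {
            "EIP712Domain": [
                {"name": "name", "type": "string"},
                {"name": "version", "type": "string"},
                {"name": "chainId", "type": "uint256"},
            ],
            # Hypothetical struct for this article's receipt fields.
            "WeightHashReceipt": [
                {"name": "modelLabel", "type": "string"},
                {"name": "sourceRevision", "type": "string"},
                {"name": "filePath", "type": "string"},
                {"name": "sizeBytes", "type": "uint256"},
                {"name": "fileSha256", "type": "bytes32"},
                {"name": "limits", "type": "string"},
            ],
        },
        "primaryType": "WeightHashReceipt",
        "domain": {"name": "ModelWeightRegistry", "version": "1", "chainId": chain_id},
        "message": message,
    }


def field_mismatches(typed: dict) -> set[str]:
    """Names that are declared but missing from the message, or present but undeclared."""
    declared = {field["name"] for field in typed["types"][typed["primaryType"]]}
    return declared ^ set(typed["message"])
```

Declaring `limits` as a signed field means the issuer's signature covers the statement of what the receipt does not claim, not only the digest.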
Format Boundary
File format is part of the receipt because model weights are not just names. SafeTensors gives a concrete format context for tensor files and metadata.
The format field prevents a common mistake: treating a converted artifact as the same object without recording the conversion. Byte identity changes when serialization changes, even if the model is intended to behave similarly.
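A small experiment shows how serialization alone changes byte identity. The sketch below hand-builds two blobs in the safetensors layout (an 8-byte little-endian header length, a JSON header, then the tensor data) that contain identical tensor bytes but differ in header metadata. The blobs are constructed with the standard library for illustration, not produced by the `safetensors` package, and the helper names are assumptions.

```python
import hashlib
import json
import struct


def pack_safetensors_like(header: dict, data: bytes) -> bytes:
    """8-byte little-endian header length + JSON header + raw tensor bytes."""
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data


def read_header(blob: bytes) -> dict:
    """Parse the JSON header back out of a blob in that layout."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len])


tensor_bytes = struct.pack("<2f", 1.0, 2.0)  # two float32 values, 8 bytes
entry = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}

plain = pack_safetensors_like(entry, tensor_bytes)
annotated = pack_safetensors_like(
    {"__metadata__": {"converted_by": "example"}, **entry}, tensor_bytes
)

same_tensor_bytes = plain[-8:] == annotated[-8:]
same_file_digest = hashlib.sha256(plain).digest() == hashlib.sha256(annotated).digest()
```

Here `same_tensor_bytes` is true and `same_file_digest` is false: the tensors are identical, yet the files are different artifacts and need different receipts.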
Boundary Table
The registry should keep every claim in its lane.
| Field | What it can say | What it cannot say |
|---|---|---|
| Model label | Human-readable pointer | Stable identity |
| Revision | Source state or commit context | File bytes without path and digest |
| Digest | Exact bytes under an algorithm | Quality, safety, or license validity |
| CID | Content-addressed object reference | Raw file hash unless construction matches |
| Signature | Issuer made the statement | Statement is true |
| Model card | Intended use and evaluation context | Exact loaded weights |
This table is the product. Model Weight Registry becomes useful when a consumer can tell whether a claim is about a name, a file, a receipt, or the model's behavior.
Final Receipt
The safest registry sentence is short: "This receipt identifies these bytes and these limits." Everything else should be linked as separate evidence.
That makes onchain AI claims less brittle. A model name is a pointer; a weight hash receipt is a checkable boundary around the artifact a system actually loaded.