Solidus Network

The chain shouldn't store your data: anchoring proofs, not data

In 2015, Tim Berners-Lee — the inventor of the World Wide Web — was asked about blockchain-based identity and gave a direct answer:

The chain is too slow. The chain is too expensive. The chain is too public.

He was right.

A decade later, the identity-on-blockchain conversation has matured. Most credible identity projects no longer propose storing personal data on chain. The cryptography has gotten more refined; the architectural discipline has improved; the lessons of "let's just put everything on chain" have been absorbed.

But the architecture that emerged is not uniformly applied. There are still identity systems out there with too much information on-chain — credential subjects, attribute values, even biometric hashes that practically enable correlation. Each of these is a legacy of the early "store everything on chain" era that the field has been slowly leaving behind.

Solidus's architecture, by contrast, anchors proofs, not data. Here is what that means in practice, and how it answers the Berners-Lee critique.

What goes on chain

Three things, and only three things:

Issuer public keys — the cryptographic identities of organizations that can issue credentials. A bank, a government, a KYC vendor. These keys are public by design. Without them, the credentials they issue cannot be verified.

Revocation references — pointers (typically status list URLs or revocation tree roots) that let a relying party check whether a previously issued credential has been revoked. The references are non-identifying; they do not reveal who holds the credential, only that the revocation state has changed.

DID Document roots — the cryptographic commitments to the DID Document of each did:solidus identifier. The DID Document itself contains keys and service endpoints; it does not contain personal data.

That is the entire on-chain footprint per user-identity event. No name. No date of birth. No address. No biometric. No document number. No image. No private key. Just the cryptographic anchors that the rest of the system needs to verify against.
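
To make the footprint concrete, here is a minimal sketch of what the three anchor types could look like as records. The class and field names, the "SolidPod" service type, and the choice of SHA-256 over sorted-key JSON for the DID Document root are all illustrative assumptions, not the actual Solidus schema or canonicalization scheme.

```python
import hashlib
import json
from dataclasses import dataclass


# Hypothetical record shapes: names and fields are illustrative only.
@dataclass(frozen=True)
class IssuerKeyAnchor:
    issuer_did: str
    public_key_hex: str


@dataclass(frozen=True)
class RevocationAnchor:
    issuer_did: str
    status_reference: str  # a status list URL or a revocation tree root


@dataclass(frozen=True)
class DidDocumentAnchor:
    did: str
    document_root_hex: str  # commitment to the DID Document, not the document itself


def did_document_root(document: dict) -> str:
    """Commit to a DID Document by hashing a deterministic JSON encoding.

    Assumption: SHA-256 over sorted-key, compact JSON. A production system
    would pin a formally specified canonicalization scheme instead.
    """
    encoded = json.dumps(document, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# A DID Document holds keys and service endpoints, no personal data.
example_document = {
    "id": "did:solidus:example123",
    "verificationMethod": [
        {"id": "did:solidus:example123#key-1", "type": "Ed25519VerificationKey2020"}
    ],
    "service": [
        {
            "id": "did:solidus:example123#pod",
            "type": "SolidPod",  # hypothetical service type
            "serviceEndpoint": "https://pod.example/u/123/",
        }
    ],
}

anchor = DidDocumentAnchor(
    did=example_document["id"],
    document_root_hex=did_document_root(example_document),
)
print(anchor)
```

None of the three record types has a field that could hold a name, a birthdate, or a document number. The only thing derived from user-controlled content is a fixed-size digest.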

Where the personal data lives

In a personal data pod that the user controls.

The pod concept comes from Berners-Lee's Solid project at MIT, started in 2015. The idea is that every person has a "pod" — a personal data store, possibly self-hosted or possibly hosted by a chosen provider — that contains their personal data. Applications request access to specific data; the user grants or denies on a per-attribute basis. The data never centralizes.

Solidus is Solid-compatible, in the sense that our wallet can read and write to Solid pods, our credential format is interoperable with Solid resource shapes, and our access control respects Solid permissions. We are honest about the current state: full Solid Protocol conformance is on the roadmap; "Solid-compatible, migration-ready" is the accurate description today.

For users who do not want to run their own pod, we provide a hosted pod option as a convenience. The hosted option is end-to-end encrypted; we cannot read the contents. The threat model is that we (the hosting party) could be subpoenaed for the encrypted blobs but could not hand over the plaintext, which is materially different from the centralized "we hold everything, including the keys" model.

For users who do want to run their own pod — self-hosted, or chosen-provider-hosted — the wallet handles the data lifecycle the same way. The chain does not care where the pod is.

How verification works when data is off-chain

This is the part that worries people who are new to decentralized identity. If the credential lives in a pod, and only proofs go on chain, how can a relying party verify anything?

The answer is in the standards. A Verifiable Credential is a JSON-LD document signed by the issuer's private key. Anyone with the issuer's public key (which is on chain) can verify the signature. The credential itself does not need to be on chain; only the public key of the entity that signed it needs to be reachable.

When a user presents a credential to a relying party, the wallet provides the credential plus a presentation proof. The relying party fetches the issuer's public key from the chain, verifies the credential signature, verifies the presentation proof, and checks the revocation status against the on-chain revocation reference. The relying party never sees the personal data unless the user chose to disclose it.
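
The order of checks described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Solidus verifier: the `Presentation` fields and function names are hypothetical, the signature and presentation-proof checks are injected as callables so that a real scheme from a vetted cryptography library can be plugged in, and the revocation list is assumed to be an already-decoded bitstring in which bit 0 is the left-most bit of the first byte.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Tuple


@dataclass(frozen=True)
class Presentation:
    """Hypothetical shape of what the wallet hands to the relying party."""
    issuer_did: str
    credential_payload: bytes
    credential_signature: bytes
    status_index: int          # position of this credential in the status list
    presentation_proof: bytes  # holder's proof bound to this presentation


# (public_key, message, signature) -> True if the signature is valid
SignatureVerifier = Callable[[bytes, bytes, bytes], bool]


def is_revoked(status_bits: bytes, index: int) -> bool:
    """Read one bit from a decoded status list (assumed left-most-bit-first)."""
    byte_pos, bit_pos = divmod(index, 8)
    return bool((status_bits[byte_pos] >> (7 - bit_pos)) & 1)


def verify_presentation(
    presentation: Presentation,
    issuer_keys: Mapping[str, bytes],  # stands in for the on-chain key registry
    status_bits: bytes,                # fetched via the on-chain revocation reference
    verify_signature: SignatureVerifier,
    verify_holder_proof: Callable[[Presentation], bool],
) -> Tuple[bool, str]:
    # 1. Fetch the issuer's public key from the chain.
    public_key = issuer_keys.get(presentation.issuer_did)
    if public_key is None:
        return False, "unknown issuer"

    # 2. Verify the issuer's signature over the credential.
    if not verify_signature(
        public_key,
        presentation.credential_payload,
        presentation.credential_signature,
    ):
        return False, "bad credential signature"

    # 3. Verify the holder's presentation proof.
    if not verify_holder_proof(presentation):
        return False, "bad presentation proof"

    # 4. Check revocation status.
    if is_revoked(status_bits, presentation.status_index):
        return False, "credential revoked"

    return True, "ok"
```

Every input except the presentation itself is either read from the chain or reached through a reference stored there, which is why the credential never needs to be on chain for the check to work.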

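For selective disclosure, where the user shares only "over 18" but not the actual birthdate, BBS+ signatures let the wallet derive a proof that reveals only the attributes the user chooses while still verifying against the issuer's signature. BBS+ on its own reveals or hides whole signed attributes, so an "over 18" answer comes either from an issuer-signed over-18 attribute or from an additional zero-knowledge range proof over the hidden birthdate. The chain stores the issuer's public key; the proof is computed in the wallet; the relying party verifies the proof. The birthdate never leaves the user's pod.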

This is a clean separation of concerns. Personal data lives where the user controls it. Cryptographic anchors live where they can be verified. The chain provides the trust root; the wallet provides the privacy.

Why this matters

It matters because the alternative — storing personal data on chain in any form — does not survive contact with regulation, with breach disclosure, with right-to-be-forgotten requests, or with the basic principle that the user should be able to leave a system without their data being held hostage.

It also matters because Tim Berners-Lee was right.

The chain is too slow for personal data — a single credential might be 5 kilobytes; multiplying that by millions of users is unworkable. The chain is too expensive for personal data — anchoring 5 KB costs orders of magnitude more than anchoring a 32-byte hash. The chain is too public for personal data — even encrypted, the metadata leaks.
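
The size gap is easy to check with back-of-the-envelope arithmetic. The sketch below assumes a 5 KB credential means 5 × 1024 bytes, uses SHA-256 as the 32-byte hash, and takes one million users as a stand-in for "millions". How cost scales with bytes depends on the chain, so only the size ratio is computed.

```python
import hashlib

CREDENTIAL_BYTES = 5 * 1024  # assumed size of one credential
USERS = 1_000_000            # assumed user count

# Placeholder bytes standing in for a serialized credential.
credential = bytes(CREDENTIAL_BYTES)
digest = hashlib.sha256(credential).digest()

ratio = CREDENTIAL_BYTES / len(digest)
full_bytes = CREDENTIAL_BYTES * USERS
anchored_bytes = len(digest) * USERS

print(f"digest size: {len(digest)} bytes")
print(f"credential is {ratio:.0f}x larger than its hash")
print(f"storing credentials: {full_bytes / 1e9:.2f} GB")
print(f"storing hashes:      {anchored_bytes / 1e6:.0f} MB")
```

Under these assumptions the credential is 160 times larger than its hash, and a million credentials come to about 5.12 GB against 32 MB of hashes.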

By anchoring proofs and not data, we sidestep all three problems. The chain stays lean, anchoring the trust roots that the rest of the system needs. The data stays personal, controlled by the user, in a pod they own.

This is the architecture Berners-Lee was pointing toward in his 2015 critique. It just took the rest of the field a decade to catch up.

solidus.network
