I thought some folks may be interested to learn how the tools we built to ensure the security and privacy of data stored on our distributed storage platform can also be used by the developers and end users of applications built on that platform to increase the security and privacy of those applications.
A key takeaway is that the work required to make a decentralized network a secure place to store your data has also produced a unique set of tools developers can use, in turn, to secure and manage access to their own data.
In order to get to a level of maturity that allows us to balance all the different needs of security and privacy at once, it was important to recognize and respect the constraints involved. Because we had to build with the assumption that any Node could be run by an untrusted person, we had to implement a zero-knowledge security architecture. That was actually a blessing in disguise. It turns out to not only make our system far more resistant to attacks than traditional architectures, but also to bring significant benefits to developers building apps on the platform.
Ultimately, we've found that the security and privacy capabilities of the platform are some of its most differentiating features, and they give our partners and customers some exciting new tools.
It's a non-trivial endeavor to build a system that stores and secures people's data in an environment where the majority of the infrastructure, and all of the storage hardware, is run by other people -- potentially hostile people.
This network is a sophisticated and complicated machine. There are a lot of moving parts, but because we're running on a distributed network of other people's hardware, the parts that protect our users' data come down to four things:
- Encryption - enforcing privacy
- Erasure codes - primarily for durability and performance, but also for handling node churn, failure, and bad actors
- Macaroon-based API keys - access management needs to be separated from encryption, and our privacy focus requires that we push access management to the client too
- A dev-friendly construct called the Access that ties all of these together - because access management doesn't do much without encryption
Each of these key concepts brings its own affordances and benefits to the table.
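To make the Access construct concrete, here is a purely conceptual sketch of what such a bundle might look like. The field names and serialization format are illustrative assumptions, not the platform's actual API -- the point is simply that one shareable string carries everything a client needs: where the metadata lives, the authorization credential, and the encryption key material.

```python
from dataclasses import dataclass
import base64
import json

# Conceptual sketch only: the real Access grant is an opaque serialized
# value, and these field names are hypothetical, not the platform's API.
@dataclass
class Access:
    satellite_address: str   # which satellite holds the object metadata
    api_key: str             # macaroon-based API key (authorization)
    encryption_key: bytes    # root key for client-side encryption

    def serialize(self) -> str:
        # Bundle everything a client needs into one shareable string.
        payload = {
            "satellite": self.satellite_address,
            "api_key": self.api_key,
            "enc_key": self.encryption_key.hex(),
        }
        return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

    @staticmethod
    def deserialize(s: str) -> "Access":
        payload = json.loads(base64.urlsafe_b64decode(s))
        return Access(payload["satellite"], payload["api_key"],
                      bytes.fromhex(payload["enc_key"]))

access = Access("us1.example.net:7777", "13Yqe...", b"\x01" * 32)
restored = Access.deserialize(access.serialize())
assert restored == access
```

Because all three pieces travel together, handing someone an Access is a complete grant: they can find the data, prove they're authorized, and decrypt it, without any of those steps requiring trust in the server.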
So, we’re starting in a completely different place from most traditional approaches to data storage.
That potentially hostile environment has forced us to build a different type of security model that addresses an entirely different set of threats. It’s as if we’re training for a marathon at high altitude.
The developer kit encrypts automatically. The current best practice for encryption is authenticated encryption, and our default configuration uses AES256-GCM authenticated encryption.
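To show what "authenticated" buys you, here is a stdlib-only encrypt-then-MAC toy. This is emphatically not AES-GCM (which the Python standard library does not provide, and which is what the platform actually uses); it is a minimal sketch of the same property: if anyone modifies the ciphertext, decryption fails instead of silently returning garbage.

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode. NOT AES-GCM -- just a
    # stdlib stand-in to illustrate the authenticated-encryption shape.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()  # authenticate
    return nonce + ct + tag

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: data was modified")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))

key = os.urandom(32)
blob = encrypt(key, b"hello segment")
assert decrypt(key, blob) == b"hello segment"

# Flip one ciphertext bit: decryption now refuses, rather than lying.
tampered = blob[:16] + bytes([blob[16] ^ 1]) + blob[17:]
try:
    decrypt(key, tampered)
    assert False, "tampering should have been detected"
except ValueError:
    pass
```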
In the architecture of our system, every time you upload an object, we choose a random encryption key for that object on the client side -- under your control -- and store the data encrypted with that key on the network. The metadata then does two jobs: it keeps track of where we put your file (where we sprinkled the encrypted grains of sand, to use a metaphor) and it keeps track of that random key.
Files are broken up into segments; each segment is encrypted with a randomized, salted, derived encryption key, which is in turn encrypted and stored with the metadata.
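The pattern described above -- keys derived down a path hierarchy, with random per-segment keys wrapped under them -- can be sketched with stdlib HMAC. The derivation and wrapping schemes here are illustrative assumptions, not the client's actual KDF:

```python
import hashlib
import hmac
import os

# Illustrative sketch of hierarchical key derivation and key wrapping;
# the real client uses its own KDF and AES-GCM, not this toy scheme.
root_key = hashlib.sha256(b"user passphrase").digest()

def derive_key(parent: bytes, component: bytes) -> bytes:
    # Each path component deterministically derives a child key, so
    # sharing a derived key grants access only to that subtree of paths.
    return hmac.new(parent, component, hashlib.sha256).digest()

bucket_key = derive_key(root_key, b"photos")
path_key = derive_key(bucket_key, b"2019/trip.jpg")

# Each segment gets a fresh random key; it is stored wrapped
# (encrypted) under the derived path key, inside the metadata.
segment_key = os.urandom(32)
mask = hmac.new(path_key, b"wrap", hashlib.sha256).digest()
wrapped = bytes(a ^ b for a, b in zip(segment_key, mask))

# Anyone holding the derived path key can unwrap the segment key;
# without it, the wrapped value reveals nothing.
unwrapped = bytes(a ^ b for a, b in zip(wrapped, mask))
assert unwrapped == segment_key
```

The important property is that the random segment key never leaves the client unencrypted, and the server only ever sees the wrapped form.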
It’s important that we do encryption first. Then, with the encrypted data, we apply the Reed-Solomon erasure code. Reed-Solomon is actually an old algorithm -- it’s used in CDs, and it’s the reason a scratched CD keeps working.
That ensures we’re able to reliably break that data up, store it on lots of different nodes, and reconstitute it once we have some of the original pieces that we stored back.
Right now in production we are using 29/80. What that means is that for any 80 pieces we store, we only need 29 of them to get the data back. And because the data we fed into Reed-Solomon was protected with authenticated encryption, once we reconstitute it, the very act of decrypting confirms that the data we got back is exactly the data that was stored.
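The k-of-n idea is easiest to see at a tiny scale. Below is a toy 2-of-3 erasure code using XOR parity -- far simpler than Reed-Solomon, but with the same shape: any 2 of the 3 pieces recover the data, just as any 29 of 80 suffice in production (an expansion factor of 80/29, roughly 2.76x):

```python
# Toy 2-of-3 erasure code via XOR parity. This is NOT Reed-Solomon;
# it only illustrates the "any k of n pieces" recovery property.
def encode(data: bytes) -> list:
    half = (len(data) + 1) // 2
    a, b = data[:half], data[half:].ljust(half, b"\x00")
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]  # three pieces, any two recover the data

def decode(a, b, parity, length: int) -> bytes:
    # Pass None for whichever single piece was lost.
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    if b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return (a + b)[:length]

pieces = encode(b"hello world!")
# Lose any single piece and the data still comes back intact:
assert decode(None, pieces[1], pieces[2], 12) == b"hello world!"
assert decode(pieces[0], None, pieces[2], 12) == b"hello world!"
assert decode(pieces[0], pieces[1], None, 12) == b"hello world!"
```

Reed-Solomon generalizes this: instead of one parity piece over two data halves, it produces n pieces such that any k reconstruct the original, which is what lets the network shrug off dozens of missing or malicious nodes per object.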
Incidentally: you could forgivably imagine that this kind of end-to-end integrity check is the norm, but I was surprised to learn that some solutions simply assume that whatever is "up there" is what you put there, without ever checking.
We use macaroons for API keys, and internally we even use the terms interchangeably sometimes. However, since the UI of our platform has "API Key" as the label, I'm trying to train myself to stick with that term. Whichever term one prefers, macaroons/API keys let our network do flexible file sharing with caveats.
Macaroons are cryptographic authorization credentials. Think of them as a cookie with superpowers. We implemented macaroons in 2019, which means clients are able to share encrypted files stored on the network with others.
Macaroons are bearer credentials, presented by the client along with the request. In our implementation, they're used as an authorization credential that enables decentralized encryption key management.
The benefit they offer in a trustless environment like the Tardigrade network is that developers building on top of the platform can have the power to manage file access, but don't need to actually trust the Tardigrade satellite's ability to properly manage encryption keys.
(Sidenote: SATELLITES are the part of our network that knows where all the pieces are, performs audits, and handles repairs. But they can't see the data, and the data doesn't pass through them. So it's very important that the encryption keys are handled correctly outside of the satellites.)
Macaroons can also be chained, which becomes important once you move beyond the simplest use cases. Macaroons give us delegated access control, so a satellite doesn't need to be in the loop for that step. When you append a caveat to a macaroon, you get a totally new macaroon -- and you can't recover the original signature from it.
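The chaining works through nested HMACs, as in the original macaroons construction. Here is a minimal sketch of that principle (the platform's actual macaroon encoding and caveat language differ): each caveat re-HMACs the previous signature, so a holder can only ever narrow their access, and only the satellite, which holds the root key, can verify the chain.

```python
import hashlib
import hmac

# Minimal macaroon-style HMAC chaining; illustrative, not the
# platform's actual wire format or caveat syntax.
def mint(root_key: bytes, identifier: bytes) -> dict:
    return {"id": identifier, "caveats": [],
            "sig": hmac.new(root_key, identifier, hashlib.sha256).digest()}

def attenuate(m: dict, caveat: bytes) -> dict:
    # Adding a caveat yields a brand-new macaroon whose signature is
    # chained from the old one; the original signature is unrecoverable,
    # so restrictions can never be stripped off.
    return {"id": m["id"], "caveats": m["caveats"] + [caveat],
            "sig": hmac.new(m["sig"], caveat, hashlib.sha256).digest()}

def verify(root_key: bytes, m: dict) -> bool:
    # Only the holder of root_key (the satellite) can replay the chain.
    sig = hmac.new(root_key, m["id"], hashlib.sha256).digest()
    for caveat in m["caveats"]:
        sig = hmac.new(sig, caveat, hashlib.sha256).digest()
    return hmac.compare_digest(sig, m["sig"])

root = b"satellite-only-secret"
api_key = mint(root, b"project-42")
read_only = attenuate(api_key, b"allow=read")       # delegate, narrowed
photos_only = attenuate(read_only, b"path=photos/")  # narrowed further
assert verify(root, photos_only)

# Swapping in different caveats breaks the signature chain:
forged = dict(photos_only, caveats=[b"allow=write"])
assert not verify(root, forged)
```

Note that `attenuate` runs entirely on the client -- no satellite round trip -- which is exactly the delegated access control described above.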
Unlike HD encryption keys, where the hierarchy is derived from the path-prefix structure of the object storage hierarchy, the hierarchy of API keys is derived from the structure and relationship of access restrictions. An HD API key embeds the logic for the access it allows, and can be restricted simply by embedding path restrictions, and any additional restrictions, within the string that represents the macaroon. Unlike a typical API key, a macaroon is not a random string of bytes; it's an envelope with access logic encoded in it.