Manmit Singh

Posted on Feb 7

Accountable Privacy in Web3 (2/4)

#blockchain #bitcoin #beginners #algorithms

So you're interested in learning more about privacy. You might look up some articles online, or read a few research papers to get a better grasp of the concept. But wait, what's with all the math in these articles? And the strange acronyms 🤔 What even is a “zero-knowledge” and how does it add any value?

Here, you will find brief explanations for the core mathematical concepts underlying privacy features on blockchains. Each of the techniques described below fulfills a specific purpose, from obscuring a numeric value to enabling secure computation, and distributing trust in a network.

Please note that the following explanations are not meant to be thorough or technically rigorous, but simply provide developers with a working understanding of how and when some common cryptographic protocols can be useful.

1 - Commitment Schemes

Intro: https://www.cs.cmu.edu/~mblum/research/pdf/coin/
Usage: Storing private data on a public database, in an immutable way
Examples: Pedersen Commitments, Hash-based Commitments

A commitment scheme allows one party (the committer) to commit to a value on a public database while keeping it hidden. They can later reveal it in a way that ensures it hasn’t changed.

A secure commitment scheme must satisfy the following properties:

Hiding: A commitment does not reveal any information about the underlying data.
Binding: The commitment cannot be altered to a different message, or vice versa.

Imagine a player Alice wants to pick a specific number for an online lottery, without revealing what it is. She may want to do so for a variety of reasons, from protecting her personal information from other participants to ensuring fairness in the lottery.

Alice can do so by following the 3 stages of a commitment scheme:

Setup: The lottery organizer will run an algorithm to generate some public parameters for this system. $Setup(k) \rightarrow pp$
Commit: Alice takes her secret number, and chooses another random number (called a blinding factor) and uses them to run an algorithm herself. $Commit(pp, m, r) \rightarrow C$
Reveal: At some point, perhaps when the results are announced, Alice can reveal her secret number to the network using the last algorithm. $Open(pp, C) \rightarrow (m, r)$
Verify: Finally, anyone on the network can easily verify that the number does actually match Alice’s initial commitment. $Verify(C, m, r, pp) \rightarrow 1/0$

Additionally, some commitment schemes support properties such as homomorphic addition, where the sum of 2 commitments corresponds to the sum of their underlying messages as well. This allows for arithmetic manipulations on committed values.

Commitments also play an important role in more complicated protocols like zero-knowledge proof systems and verifiable secret sharing systems.

2 - Multiparty Computation

Intro: https://dl.acm.org/doi/pdf/10.1145/28395.28420
Usage: Replacing a trusted third party by distributing trust over network
Examples: Shamir’s Secret Sharing, Yao’s Garbled Circuit, SPDZ

A multiparty computation protocol allows a set of mutually distrusted parties to jointly compute a function over their private inputs without revealing anything about the inputs beyond the final output.

A secure MPC protocol must satisfy the following properties:

Privacy: No party’s secret inputs are leaked while creating the output.
Fairness: Either all, or none of the involved parties are able to learn the output.
Correctness: The output generated would be the same as by a single third-party.

Say 3 companies—A, B, and C—want to find the average salary across their organizations without revealing individual salary data. They could use MPC as follows:

Input Sharing:
- Each company splits its salary data into multiple encrypted shares.
- These shares are distributed among all participating companies.
- No company can reconstruct the original data from the shares it possesses.
Secure Computation:
- Each company runs computations locally on the received shares.
- Participants exchange intermediate results securely along the way.
- These computations are designed to aggregate the salary data privately.
Reconstruction:
- The companies combine their computations to obtain the final result.

Multiparty computation protocols are used in various ways in privacy-preserving settings. They may be used to distribute trust in the initial setup phase of a zero-knowledge proof system. They may also be used to run off-chain private computations over inputs from multiple users.

3 - Zero-Knowledge Proofs

Intro: https://dl.acm.org/doi/pdf/10.1145/22145.22178
Usage: Outsourcing mission-critical programs to an untrusted external device
Examples: Groth16, Bulletproof, zkSTARK

A zero-knowledge proof enables a prover P who holds some private witness w for a public instance x and an NP-relation R, to convince a verifier V that some property of w is true, i.e.

R(x, w) = 1

A secure zero-knowledge proof must satisfy the following properties:

Completeness: If P is honest and follows the protocol, then V will be convinced of the statement’s validity.
Soundness: A dishonest P cannot convince V of a false statement, except with negligible probability.
Zero-Knowledge: The proof does not reveal anything beyond the fact that the statement is true.

Say a company wants to run a machine learning model on the cloud such that its users can verify the output, but not learn the set up of the model. Here’s how they might use ZK:

Trusted Setup:
- A trusted authority generates secret parameters.
- A publicly known “common reference string” (CRS) is created.
Prover Computation:
- The cloud provider computes the machine learning model on private inputs and generates a zero-knowledge proof that the output is correct.
- The prover P encodes their witness w and public statement x into a structured proof.
Proof Submission:
- The cloud provider submits the result along with a compact proof.
- The proof guarantees correctness without revealing the proprietary model details.
Verification:
- The client checks the proof against the public parameters.
- The CRS does not need to change, and can be reused for several proofs.

Here, it is important to note that a trusted setup introduces security risks if the setup process is compromised. However, as mentioned above, an MPC procedure can be used to distribute trust during setup. Still, proof schemes like STARKs eliminate the need for a trusted setup altogether.

ZK proofs are valuable for privacy since they allow a client to prove to the network that they ran a certain operation (e.g. a transaction) on their local machine correctly. This saves them the operational overhead of conducting consensus or exposing their personal inputs to the network.

4 - Homomorphic Encryption

Intro: https://crypto.stanford.edu/craig/craig-thesis.pdf
Usage: Outsourcing mission-critical programs to a public network
Examples: RSA, Paillier Cryptsystem

Homomorphic encryption allows a party to perform computations on encrypted data without needing to decrypt it. The decrypted result matches the outcome of the same computation on the original plaintext. Homomorphism is generally defined over addition and/or multiplication.

E(M_1) \odot E(M_2) \leftrightarrow E(M_1 \ast M_2)

Where

The operation is represented in ciphertext space by $\odot$ .
The operation is represented in plaintext space by $\ast$ .

A secure homomorphic encryption scheme must satisfy the following properties:

Correctness: The decrypted result of a computation on encrypted data must be identical to the computation performed on the original data.
Privacy: The encrypted data must not reveal any information about the underlying plaintext during computation.

Say a medical researcher wants to input some sensitive medical data to an AI model on the cloud, without compromising the privacy of the data. Here is how they might use HE:

Key Generation:
- The data owner generates a public-private key pair.
- The public key is shared with the cloud service provider, while the private key remains secret.
Encryption & Upload:
- Patient records are encrypted using the public key before being stored in the cloud.
- The encrypted data remains unreadable to the cloud provider.
Computation on Encrypted Data:
- The researcher requests statistical analysis (e.g., average age of patients).
- The cloud provider performs this computation directly on the encrypted data without decrypting it.
Decryption of Results:
- The encrypted result is sent back to the research organization.
  - The organization decrypts it using their private key, obtaining the correct final result without exposing any raw patient data.

Homomorphic encryption plays an important role in enabling fully on-chain privacy in the Web3 industry. While this provides strong privacy guarantees, FHE schemes are computationally expensive due to their reliance on complex mathematical structures. This is an area of active research and development.

5 - Accumulators and Merkle Trees

Intro: https://doi.org/10.1007/3-540-69053-0_33
Usage: Outsourcing mission-critical programs to a public network
Examples: Merkle Trees, Dynamic Accumulators (RSA)

Accumulators are compact data structures that allow large datasets to be represented by a short value in a concise and binding manner. They also support mechanisms to mathematically prove that a value is part of an accumulator, through a technique called membership proofs.

A cryptographic accumulator must satisfy the following properties:

Correctness: If an element is part of the accumulated set, the accumulator will always produce a valid membership proof for that element.
Soundness: It should be computationally infeasible for an adversary to generate a valid membership proof for an element not present in the accumulated set.
Efficiency: Both the accumulator size and the membership proofs should remain compact (ideally independent of the number of accumulated elements).

For example, say a company wants its users to verify whether or not their file is stored on a database, without exposing the entire data to each user. They might use a Merkle tree as:

Accumulator Generation:
- The cloud provider initializes an empty binary Tree T.
- A cryptographic hash function (e.g., SHA-256) generates leaves.
File Insertion:
- When a user uploads a file f, it is hashed to $h_{f}$ .
- This hash is added as a new leaf node to the tree.
- The tree is recursively updated, where each parent node is computed as: $Leaf_{1,2} \leftarrow SHA(SHA(Leaf_1||Leaf_2)$
- The Merkle root $R_{T}$ is stored as the accumulator value.
Membership Proof Generation:
- The company computes a Merkle proof for the user’s file f.
- The company selects the leaves required to reconstruct the Merkle root from the hash of f. $\pi_{f} \leftarrow MemProofCreate(T, h_f)$
Proof Verification:
- The proof consists of a set of relevant sibling hashes.
- The user reconstructs the root based on the available information. $Verify(h_f,\pi_{f},R_T) \rightarrow 1/0$

Cryptographic accumulators like Merkle trees play an important role in efficiently managing data on distributed networks like blockchains. They can also be extended with mechanisms such as zero-knowledge proofs in order to further increase their efficiency, and potentially add privacy related features.

Conclusion

For the sake of clarity, here is a quick recap of the all schemes we discussed in this article:

Technique	Privacy Guarantees	Computation Cost	Trust Assumptions	Use Cases
Commitment Schemes	Hiding, Binding	Low	Public setup	Storing private data on a public database
Multiparty Computation (MPC)	Private computation	High (interactive)	Distributed trust	Replacing a trusted third party
Zero-Knowledge Proofs (ZKPs)	Zero-Knowledge, Soundness	Moderate	Some SNARKs require a trusted setup	Outsourcing computation to untrusted devices
Homomorphic Encryption (HE)	Computation on encrypted data	Very high (FHE)	Public key cryptosystem	Secure computation on public networks
Accumulators	Compact membership proofs	Low	Depends on scheme	Verifying data integrity in public networks

Hopefully, now you feel a bit more comfortable with the dense technical jargon often found in blockchain and privacy-related literature. From securing sensitive data to enabling encrypted computation, these protocols provide a solid foundation for the privacy-preserving applications of the future!

DEV Community