The rise of privacy-preserving computation is driven by two major trends:
- Stricter regulations: Laws like the EU’s _General Data Protection Regulation (GDPR)_ and China’s _Data Security Law_ and _Personal Information Protection Law_ place strong requirements on protecting personal data.
- Urgent business needs: Data silos across industries are stifling innovation. Privacy-preserving computation becomes the bridge between safeguarding privacy and unlocking the value of data.
By 2025, with the widespread adoption of *Generative AI*, new questions have become central to the field of data privacy: *Who owns the content generated by AI trained on user data?* How can data providers ensure their privacy is protected when their data powers these models?
To address these challenges, we introduce *SecretFlow*, a modern, user-friendly framework for privacy-preserving computation that enables secure computation over encrypted data. It embodies the principle of “usable but invisible” when it comes to data privacy.
📍 GitHub: https://github.com/secretflow/secretflow
The most surprising part? If you know *Python*, you can start using SecretFlow today to explore the future of privacy computing.
This hands-on tutorial will guide you step-by-step into this powerful technology, helping you find the perfect balance between *data security and data utility*.
Let’s begin our journey into the world of SecretFlow!
What is SecretFlow?
SecretFlow is a *trusted privacy-preserving computation framework*, acting like a “security guard” for your data. It allows multiple parties to collaboratively compute and analyze data *without ever revealing their private data* to each other.
With SecretFlow, different organizations can “play a game blindfolded” — working together efficiently, while never exposing their own sensitive data.
✅ Key Advantages
- Unified and Integrated: Built-in support for mainstream privacy-preserving technologies like Secure Multi-Party Computation (MPC), Federated Learning (FL), and Differential Privacy (DP), so there is no need to learn multiple frameworks.
- Zero Learning Curve: Native support for SQL, Python, and AI-friendly interfaces lets developers onboard quickly with minimal effort.
- Modular & Flexible: LEGO-like architecture allows you to plug and play based on business needs, speeding up iteration.
- Proven High Performance: Successfully deployed in financial and healthcare industries, with tested scalability on *billion-level data volumes*.
Real-World Use Cases
- Medical Research: Hospitals can conduct joint studies *without sharing raw patient data*, ensuring privacy during data analysis.
- Financial Fraud Detection: Banks can collaborate on fraud detection models *while protecting customer data*.
- Cross-Company Collaboration: Enterprises can co-analyze datasets without exposing internal data, improving the *efficiency and safety of cooperation*.
- AI Model Training: AI companies can train models on user data securely, with SecretFlow ensuring data privacy throughout the pipeline.
Quick Start: Run SecretFlow in Minutes
🚀 Official Docker Images Available
You can get started with one command using the official Docker images.
# Full version
docker run -it secretflow/secretflow-anolis8:latest
# Lite version (no deep learning, smaller size)
docker run -it secretflow/secretflow-lite-anolis8:latest
Secure Multi-Party Computation (MPC) Example
Let’s walk through a simple MPC example using SecretFlow: computing the *average income of three people* without revealing individual incomes.
What is MPC?
Secure Multi-Party Computation (MPC) is a cryptographic technique that allows multiple parties to jointly compute a function *without revealing their private inputs*.
Example:
Alice, Bob, and Carol want to calculate their *average salary*, but none of them wants to disclose their exact income. MPC allows them to get the result *without exposing any personal data*.
Key Concepts
- Party: An entity holding data and participating in computation.
- Protocol: Rules ensuring secure computation.
- Secret Sharing: Splitting sensitive data into fragments. Only with enough fragments can the original data be reconstructed.
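To make the secret-sharing idea concrete, here is a minimal pure-Python sketch of additive secret sharing. It is an illustration only and not SecretFlow code: the names `MODULUS`, `split`, and `reconstruct` are hypothetical, and the 64-bit ring is an assumed choice. In this particular scheme, "enough fragments" means all of them.

```python
import secrets
from typing import List

# Ring size for the shares. An assumption for illustration only.
MODULUS = 2**64


def split(value: int, n_parties: int = 3) -> List[int]:
    """Split `value` into n additive shares that sum to `value` modulo MODULUS."""
    # The first n-1 shares are uniformly random.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the secret.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: List[int]) -> int:
    """Recombine all shares to recover the original value."""
    return sum(shares) % MODULUS


shares = split(5000)
print(shares)               # three random-looking numbers
print(reconstruct(shares))  # 5000
```

Any two of the three shares on their own look like random numbers, so a party holding only part of the set learns nothing about the original value.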
Step-by-Step Demo: Secure Average Calculation
1. Initialize the Privacy Environment
import secretflow as sf
# Start a local simulation with three parties; pass the party names as a list.
sf.init(
    parties=['Alice', 'Bob', 'Carol'],
    address='local'
)
2. Assign Devices to Each Party
Each participant will use a dedicated computing device to store their own data. We create devices for Alice, Bob, and Carol:
alice = sf.PYU('Alice')
bob = sf.PYU('Bob')
carol = sf.PYU('Carol')
3. Each Party Inputs Their Income Securely
To protect privacy, each participant enters their own income data through their own device:
# Assume that Alice, Bob, and Carol have incomes of 5,000, 6,000, and 7,000 respectively.
alice_income = alice(lambda: 5000)()
bob_income = bob(lambda: 6000)()
carol_income = carol(lambda: 7000)()
Note: Each value only exists *within the assigned device*, invisible to others.
4. Use Secure Computation (SPU) to Compute the Average
spu = sf.SPU(sf.utils.testing.cluster_def(['Alice', 'Bob', 'Carol']))
average_income = spu(lambda x, y, z: (x + y + z) / 3)(alice_income, bob_income, carol_income)
During this process:
Alice, Bob, and Carol’s income data is fed to the SPU device in an encrypted or secret-shared form.
The SPU device securely completes the computation, preventing the participants from accessing each other’s original data.
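The idea behind this step can be sketched in plain Python, without SecretFlow. The simulation below is a simplified illustration and not how SPU is actually implemented: `additive_shares` and `secure_average` are hypothetical helpers, the 64-bit ring is an assumption, and the final division is done on the revealed total rather than inside the protocol.

```python
import secrets
from typing import Dict, List

MODULUS = 2**64  # assumed ring size, for illustration only


def additive_shares(value: int, n: int) -> List[int]:
    """Split a non-negative integer into n shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_average(inputs: Dict[str, int]) -> float:
    """Simulate a shared-sum protocol in which no party sees another's raw input."""
    parties = list(inputs)
    # Step 1: each owner splits its input and hands one share to every party.
    received: Dict[str, List[int]] = {p: [] for p in parties}
    for value in inputs.values():
        for receiver, share in zip(parties, additive_shares(value, len(parties))):
            received[receiver].append(share)
    # Step 2: each party adds up the shares it holds, which look random to it.
    partial_sums = [sum(received[p]) % MODULUS for p in parties]
    # Step 3: only the partial sums are combined, revealing the total and nothing else.
    total = sum(partial_sums) % MODULUS
    return total / len(parties)


print(secure_average({"Alice": 5000, "Bob": 6000, "Carol": 7000}))  # 6000.0
```

Each party only ever handles one random-looking share of every income, yet the combined partial sums still add up to the true total.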
5. Reveal the Encrypted Result
Printing the result directly shows only an opaque handle to the protected value, not the number itself:
print(average_income)
# Output: a reference to an SPUObject rather than 6000.0
At this point the data is still protected inside the SPU, so we need to use the sf.reveal method to securely reconstruct and view the result:
# Securely reconstruct and view the result
print("The average income of the three people is:", sf.reveal(average_income))
# Output: The average income of the three people is: 6000.0
# That is, (5000 + 6000 + 7000) / 3 = 6000
# (SPU uses fixed-point arithmetic, so a tiny rounding difference is possible.)
This demonstrates *data usability without visibility*, the very core of privacy-preserving computation.
SecretFlow System Architecture
SecretFlow’s layered architecture ensures modular, scalable privacy computing:
- Abstract Device Layer: Includes both plaintext devices (e.g., PYU) and secure computation devices (e.g., SPU, HEU).
- Device Flow Layer: Models algorithms as *device object flows* and DAGs.
- Algorithm Layer: Supports horizontally and vertically partitioned data for analytics and ML.
- Workflow Layer: Integrates data processing, model training, and hyperparameter tuning.
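The device and device-flow layers can be pictured with a small toy model. The sketch below is not SecretFlow's implementation: `ToyDevice`, `DeviceObject`, and `lineage` are hypothetical names, and values are held in plaintext. It only shows how objects pinned to devices, plus the functions applied to them, form a DAG.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class DeviceObject:
    """A value that lives on one device, plus the step and inputs that produced it."""
    device: str
    value: Any
    op: str = "input"
    parents: Tuple["DeviceObject", ...] = ()


class ToyDevice:
    """Hypothetical stand-in for a device such as a PYU or an SPU."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __call__(self, fn: Callable[..., Any]) -> Callable[..., DeviceObject]:
        def run(*args: DeviceObject) -> DeviceObject:
            # Run fn on this device and record which objects it consumed.
            result = fn(*(a.value for a in args))
            return DeviceObject(self.name, result, fn.__name__, tuple(args))
        return run


def lineage(obj: DeviceObject, depth: int = 0) -> List[str]:
    """Walk the DAG from a result back to its inputs, one indented line per node."""
    lines = ["  " * depth + f"{obj.op} @ {obj.device}"]
    for parent in obj.parents:
        lines.extend(lineage(parent, depth + 1))
    return lines


def load_a() -> int:
    return 5000


def load_b() -> int:
    return 6000


def total(x: int, y: int) -> int:
    return x + y


alice, bob, spu = ToyDevice("alice"), ToyDevice("bob"), ToyDevice("spu")
result = spu(total)(alice(load_a)(), bob(load_b)())
print("\n".join(lineage(result)))
```

Printing the lineage shows `total @ spu` at the root with `load_a @ alice` and `load_b @ bob` beneath it. That is the same call shape as `alice(...)()` and `spu(...)(...)` in the demo above, viewed as a graph.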
🔗 Related Ecosystem Projects
- Kuscia: Lightweight task orchestration framework for privacy-preserving computation
- SCQL: SQL-style engine for multi-party secure analysis
- SPU: Secure computing backend for MPC
- HEU: High-performance homomorphic encryption library
- YACL: Core C++ library for crypto, networking, and IO
Think of SecretFlow as a *“smart privacy factory”*:
- All raw materials (data) stay encrypted
- Different workers (parties) collaborate safely
- Modular and adaptable like LEGO
- Fast and stable production line (high-performance computing)
Final Words
This article only scratches the surface of what *SecretFlow* can do.
Whether you’re a *developer*, *data scientist*, or *researcher*, SecretFlow makes it easy to embrace *privacy-preserving technologies* and unlock *data value without compromising privacy*.
If you love open source, give SecretFlow a ⭐️ on GitHub! Every star counts ❤️
👉 GitHub: https://github.com/secretflow/secretflow