# Encryption Guide for Unity Catalog

## Overview

Encryption provides data protection at two levels:

- **At rest:** data stored on disk is encrypted
- **In transit:** data moving between services is encrypted
## 1. Encryption at Rest

### Default Encryption (Platform-Managed Keys)
Databricks encrypts all data at rest by default using platform-managed keys:
- Delta tables on ADLS Gen2: AES-256 encryption
- Databricks DBFS: AES-256 encryption
- Notebook content: Encrypted in control plane
- Cluster EBS volumes: Encrypted
### Customer-Managed Keys (CMK)

For regulatory requirements, you can bring your own encryption keys.

#### Azure Key Vault Setup
```shell
# Create the Key Vault (soft delete is enabled by default on new vaults
# and can no longer be disabled, so no flag is needed for it)
az keyvault create \
  --name your-governance-kv \
  --resource-group your-rg \
  --location eastus \
  --sku premium \
  --enable-purge-protection true

# Create the encryption key
az keyvault key create \
  --vault-name your-governance-kv \
  --name databricks-cmk \
  --kty RSA \
  --size 2048

# Grant Databricks access to the key
az keyvault set-policy \
  --name your-governance-kv \
  --object-id <databricks-enterprise-app-object-id> \
  --key-permissions get wrapKey unwrapKey
```
#### Terraform Configuration
```hcl
resource "azurerm_key_vault_key" "databricks_cmk" {
  name         = "databricks-cmk"
  key_vault_id = azurerm_key_vault.governance.id
  key_type     = "RSA"
  key_size     = 2048
  key_opts     = ["wrapKey", "unwrapKey"]
}

resource "azurerm_databricks_workspace" "this" {
  name                = "your-workspace"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  sku                 = "premium"

  # CMK for managed services (notebooks, secrets, queries)
  customer_managed_key_enabled          = true
  managed_services_cmk_key_vault_key_id = azurerm_key_vault_key.databricks_cmk.id
}
```
### ADLS Gen2 Encryption
```shell
# Configure customer-managed keys on the storage account.
# Note: infrastructure (double) encryption can only be enabled at account
# creation, via `az storage account create --require-infrastructure-encryption`.
az storage account update \
  --name yourstorageaccount \
  --resource-group your-rg \
  --encryption-key-source Microsoft.Keyvault \
  --encryption-key-vault https://your-governance-kv.vault.azure.net \
  --encryption-key-name storage-cmk
```
## 2. Encryption in Transit

### TLS Configuration
All Databricks communication uses TLS 1.2+ by default:
- Workspace UI to control plane: TLS 1.2
- Cluster to metastore: TLS 1.2
- Cluster to ADLS: TLS 1.2
- Inter-cluster communication: TLS 1.2
- JDBC/ODBC connections: TLS 1.2
### Enforce Minimum TLS Version
```shell
# ADLS Gen2: enforce TLS 1.2 as the minimum accepted version
az storage account update \
  --name yourstorageaccount \
  --resource-group your-rg \
  --min-tls-version TLS1_2
```
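When auditing many storage accounts, the same checks can be applied in bulk. A minimal Python sketch, assuming account records shaped like the JSON that `az storage account show` emits (the sample accounts and the `non_compliant` helper are illustrative, not part of any SDK):

```python
# Flag storage accounts that miss the encryption baseline.
# Field names follow `az storage account show` JSON output;
# the sample accounts are made up.
def non_compliant(accounts):
    issues = []
    for acct in accounts:
        if acct.get("minimumTlsVersion") not in ("TLS1_2", "TLS1_3"):
            issues.append((acct["name"], "minimum TLS version below 1.2"))
        if acct.get("encryption", {}).get("keySource") != "Microsoft.Keyvault":
            issues.append((acct["name"], "not using customer-managed keys"))
    return issues

accounts = [
    {"name": "goodaccount", "minimumTlsVersion": "TLS1_2",
     "encryption": {"keySource": "Microsoft.Keyvault"}},
    {"name": "legacyaccount", "minimumTlsVersion": "TLS1_0",
     "encryption": {"keySource": "Microsoft.Storage"}},
]
for name, issue in non_compliant(accounts):
    print(f"{name}: {issue}")
```

Feed it the output of `az storage account list` (parsed with `json.loads`) to sweep a whole subscription.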
### Cluster Encryption

Enable encryption for cluster inter-node communication:
```json
{
  "spark_conf": {
    "spark.databricks.cluster.encryption.enabled": "true"
  }
}
```
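If cluster specs are built programmatically before being sent to the Databricks Clusters API, the setting can be merged in without clobbering other Spark conf. A sketch, assuming the spec is a plain dict (the cluster name and runtime version are placeholders, and the `with_internode_encryption` helper is my own):

```python
# Merge the inter-node encryption setting into a cluster spec dict
# without mutating the caller's copy or dropping existing spark_conf.
def with_internode_encryption(cluster_spec: dict) -> dict:
    spec = dict(cluster_spec)
    conf = dict(spec.get("spark_conf", {}))
    conf["spark.databricks.cluster.encryption.enabled"] = "true"
    spec["spark_conf"] = conf
    return spec

base_spec = {
    "cluster_name": "governed-cluster",   # placeholder name
    "spark_version": "14.3.x-scala2.12",  # placeholder runtime
    "num_workers": 2,
}
spec = with_internode_encryption(base_spec)
print(spec["spark_conf"])
```

Applying the setting centrally like this keeps ad-hoc cluster definitions from silently omitting it.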
## 3. Delta Lake Encryption Considerations

### Column-Level Encryption

For columns requiring additional encryption beyond storage-level:
```sql
-- Encrypt sensitive columns before writing
CREATE OR REPLACE FUNCTION your_catalog.governance.encrypt_value(
  value STRING,
  key STRING
)
RETURNS STRING
COMMENT 'AES encryption for column-level protection'
RETURN BASE64(AES_ENCRYPT(value, key));

-- Decrypt when reading (requires key access)
CREATE OR REPLACE FUNCTION your_catalog.governance.decrypt_value(
  encrypted_value STRING,
  key STRING
)
RETURNS STRING
COMMENT 'AES decryption for column-level protection'
RETURN CAST(AES_DECRYPT(UNBASE64(encrypted_value), key) AS STRING);
```
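`aes_encrypt` accepts only 16-, 24-, or 32-byte keys, so a mistyped key surfaces as a query failure. A small Python guard run before the key reaches the SQL above fails faster (the `validate_aes_key` helper is my own, not a Databricks API):

```python
def validate_aes_key(key: str) -> str:
    """Reject keys that aes_encrypt would refuse: the UTF-8 encoded
    length must be 16, 24, or 32 bytes (AES-128/192/256)."""
    n = len(key.encode("utf-8"))
    if n not in (16, 24, 32):
        raise ValueError(f"AES key must be 16, 24, or 32 bytes, got {n}")
    return key

validate_aes_key("0123456789abcdef")  # 16 bytes: AES-128, accepted
```

Note that multi-byte characters make the byte length differ from `len(key)`, which is exactly the mistake this catches.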
### Key Rotation

Implement regular key rotation:

1. Generate a new key version in Key Vault.
2. Re-encrypt data with the new key.
3. Update the Databricks workspace configuration.
4. Verify data accessibility.
5. Mark the old key version for deletion (after the retention period).
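The schedule above can be enforced with a simple due-date check in a scheduled job. A sketch assuming a 90-day rotation period — a policy choice, not a platform requirement:

```python
from datetime import date, timedelta

ROTATION_PERIOD = timedelta(days=90)  # policy choice; align with your compliance rules

def rotation_due(key_created: date, today: date,
                 period: timedelta = ROTATION_PERIOD) -> bool:
    """True once the key version is old enough to need rotating."""
    return today - key_created >= period

print(rotation_due(date(2024, 1, 1), date(2024, 4, 15)))  # → True (105 days old)
```

In practice `key_created` comes from the key version's Key Vault attributes, and a `True` result would trigger step 1 of the runbook above.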
## 4. Secrets Management

### Azure Key Vault Integration
```shell
# Create a secret scope backed by Key Vault (via Databricks CLI)
databricks secrets create-scope \
  --scope governance-secrets \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id /subscriptions/.../resourceGroups/.../providers/Microsoft.KeyVault/vaults/your-kv \
  --dns-name https://your-governance-kv.vault.azure.net/
```

```python
# Access secrets in notebooks
encryption_key = dbutils.secrets.get(scope="governance-secrets", key="encryption-key")
```
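To keep secrets out of code when the same logic also runs outside a workspace, the lookup can fall back to an environment variable wherever `dbutils` does not exist. A sketch; the scope, key, and variable names are placeholders:

```python
import os

def get_encryption_key(scope: str = "governance-secrets",
                       key: str = "encryption-key") -> str:
    """Prefer the Databricks secret scope; fall back to an environment
    variable for local development. `dbutils` is only defined inside a
    Databricks workspace, so referencing it locally raises NameError."""
    try:
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821
    except NameError:
        return os.environ["GOVERNANCE_ENCRYPTION_KEY"]
```

A missing environment variable raises `KeyError` rather than returning a default, which is the right failure mode for a secret.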
### Best Practices
- Never hardcode secrets in notebooks or scripts
- Use managed identity when possible
- Rotate secrets on a regular schedule (90 days recommended)
- Audit secret access via Key Vault diagnostic logs
- Use separate Key Vaults for different environments
## 5. Encryption Checklist
- [ ] Platform encryption at rest is enabled (default)
- [ ] Customer-managed keys configured (if required by regulation)
- [ ] ADLS Gen2 encryption enabled
- [ ] Minimum TLS 1.2 enforced on all storage accounts
- [ ] Cluster inter-node encryption enabled
- [ ] Key rotation schedule established
- [ ] Secrets stored in Key Vault (not in code)
- [ ] Secret scopes configured in Databricks
- [ ] Key Vault diagnostic logging enabled
- [ ] Encryption key access audited
This is 1 of 6 resources in the DataStack Pro toolkit. Get the complete [Unity Catalog Governance Pack] with all files, templates, and documentation for $39.
Or grab the entire DataStack Pro bundle (6 products) for $164 — save 30%.