DEV Community

Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Unity Catalog Governance Pack: Encryption Guide for Unity Catalog


Overview

Encryption provides data protection at two levels:

  • At rest: Data stored on disk is encrypted
  • In transit: Data moving between services is encrypted

1. Encryption at Rest

Default Encryption (Platform-Managed Keys)

Databricks encrypts all data at rest by default using platform-managed keys:

  • Delta tables on ADLS Gen2: AES-256 encryption
  • Databricks DBFS root: AES-256 encryption
  • Notebook content: encrypted in the control plane
  • Cluster local disks (Azure managed disks; EBS on AWS): encrypted

Customer-Managed Keys (CMK)

For regulatory requirements, you can bring your own encryption keys:

Azure Key Vault Setup

# Create Key Vault (soft delete is enabled by default and can no longer be disabled)
az keyvault create \
  --name your-governance-kv \
  --resource-group your-rg \
  --location eastus \
  --sku premium \
  --enable-purge-protection true

# Create encryption key
az keyvault key create \
  --vault-name your-governance-kv \
  --name databricks-cmk \
  --kty RSA \
  --size 2048

# Grant Databricks access to the key
az keyvault set-policy \
  --name your-governance-kv \
  --object-id <databricks-enterprise-app-object-id> \
  --key-permissions get wrapKey unwrapKey

Terraform Configuration

resource "azurerm_key_vault_key" "databricks_cmk" {
  name         = "databricks-cmk"
  key_vault_id = azurerm_key_vault.governance.id
  key_type     = "RSA"
  key_size     = 2048

  key_opts = ["wrapKey", "unwrapKey"]
}

resource "azurerm_databricks_workspace" "this" {
  name                = "your-workspace"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  sku                 = "premium"

  # Required before a CMK can be attached to the DBFS root
  # (the root key itself is assigned via the
  # azurerm_databricks_workspace_customer_managed_key resource)
  customer_managed_key_enabled = true

  # CMK for managed services (notebook content, queries) in the control plane
  managed_services_cmk_key_vault_key_id = azurerm_key_vault_key.databricks_cmk.id
}

ADLS Gen2 Encryption

# Point the storage account at a customer-managed key in Key Vault.
# (Infrastructure "double" encryption is a separate setting that can only be
# chosen at account creation, via --require-infrastructure-encryption.)
az storage account update \
  --name yourstorageaccount \
  --resource-group your-rg \
  --encryption-key-source Microsoft.Keyvault \
  --encryption-key-vault https://your-governance-kv.vault.azure.net \
  --encryption-key-name storage-cmk

2. Encryption in Transit

TLS Configuration

All Databricks communication uses TLS 1.2+ by default:

  • Workspace UI to control plane: TLS 1.2
  • Cluster to metastore: TLS 1.2
  • Cluster to ADLS: TLS 1.2
  • Inter-cluster communication: TLS 1.2
  • JDBC/ODBC connections: TLS 1.2

Enforce Minimum TLS Version

# ADLS Gen2: Enforce TLS 1.2
az storage account update \
  --name yourstorageaccount \
  --resource-group your-rg \
  --min-tls-version TLS1_2
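The same floor can also be asserted from the client side. A minimal Python sketch, runnable from a notebook or job (the host is a placeholder and outbound network access is assumed):

```python
import socket
import ssl

def tls12_context() -> ssl.SSLContext:
    """Build a client context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

def negotiated_version(host: str, port: int = 443) -> str:
    """Connect and report the TLS version actually negotiated."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with tls12_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()  # e.g. "TLSv1.2" or "TLSv1.3"
```

Calling `negotiated_version("yourstorageaccount.dfs.core.windows.net")` will raise an `ssl.SSLError` if the endpoint can only speak something older than TLS 1.2.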

Cluster Encryption

Enable encryption for cluster inter-node communication:

{
  "spark_conf": {
    "spark.databricks.cluster.encryption.enabled": "true"
  }
}
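The snippet above is only the `spark_conf` fragment of a cluster spec. A sketch of embedding it in a full Clusters API create payload (the runtime version, VM size, and worker count are illustrative placeholders, not recommendations):

```python
import json

def cluster_payload(name: str) -> dict:
    """Assemble a Clusters API create payload with inter-node
    encryption enabled via Spark conf (other fields illustrative)."""
    return {
        "cluster_name": name,
        "spark_version": "14.3.x-scala2.12",  # placeholder runtime
        "node_type_id": "Standard_DS3_v2",    # placeholder VM size
        "num_workers": 2,
        "spark_conf": {
            "spark.databricks.cluster.encryption.enabled": "true",
        },
    }

# Serialized request body, ready to POST to the Clusters API
body = json.dumps(cluster_payload("governed-cluster"), indent=2)
```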

3. Delta Lake Encryption Considerations

Column-Level Encryption

For columns that need application-level encryption on top of storage-level encryption (note that `aes_encrypt` expects a 16-, 24-, or 32-byte key):

-- Encrypt sensitive columns before writing
CREATE OR REPLACE FUNCTION your_catalog.governance.encrypt_value(
  value STRING,
  key STRING
)
RETURNS STRING
COMMENT 'AES encryption for column-level protection'
RETURN BASE64(AES_ENCRYPT(value, key));

-- Decrypt when reading (requires key access)
CREATE OR REPLACE FUNCTION your_catalog.governance.decrypt_value(
  encrypted_value STRING,
  key STRING
)
RETURNS STRING
COMMENT 'AES decryption for column-level protection'
RETURN CAST(AES_DECRYPT(UNBASE64(encrypted_value), key) AS STRING);
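The shape of those UDFs — encrypt, then base64-wrap for storage as a string; unwrap, then decrypt on read — can be illustrated in pure Python. Since the standard library has no AES, this sketch substitutes a keyed XOR keystream as a toy stand-in for the cipher; use the SQL functions above or a vetted crypto library for real data:

```python
import base64
import hashlib

def _keystream(key: str, n: int) -> bytes:
    """Derive n bytes from the key (SHA-256 in counter mode).
    Toy stand-in for a real cipher -- NOT AES, illustration only."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return out[:n]

def encrypt_value(value: str, key: str) -> str:
    """Mirrors BASE64(AES_ENCRYPT(value, key)) structurally."""
    data = value.encode()
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    return base64.b64encode(cipher).decode()

def decrypt_value(encrypted: str, key: str) -> str:
    """Mirrors CAST(AES_DECRYPT(UNBASE64(...)) AS STRING) structurally."""
    cipher = base64.b64decode(encrypted)
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, len(cipher))))
    return plain.decode()
```

The round trip `decrypt_value(encrypt_value(v, k), k) == v` holds only when the same key is supplied on both sides — the same property the UDF pair depends on.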

Key Rotation

Implement regular key rotation:

# Key rotation schedule
# 1. Generate new key version in Key Vault
# 2. Re-encrypt data with new key
# 3. Update Databricks workspace configuration
# 4. Verify data accessibility
# 5. Mark old key version for deletion (after retention period)
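The schedule above can be driven by a small orchestration skeleton. Every step here is a stub — in practice `run_step` would call the Azure CLI or an SDK; the function and step names are assumptions for illustration:

```python
def rotate_cmk(run_step) -> list:
    """Execute the rotation steps in order. run_step is a callable
    standing in for the real Key Vault / Databricks operations."""
    steps = [
        "create new key version in Key Vault",
        "re-encrypt data with the new key version",
        "update the workspace encryption configuration",
        "verify data accessibility",
        "schedule deletion of the old version after the retention period",
    ]
    for step in steps:
        run_step(step)  # e.g. log the step, shell out to az, or call an SDK
    return steps
```

Keeping the steps as an explicit ordered list makes the dependency visible: data must be re-encrypted and verified before the old key version is ever scheduled for deletion.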

4. Secrets Management

Azure Key Vault Integration

# Create a secret scope backed by Key Vault (legacy Databricks CLI syntax)
databricks secrets create-scope \
  --scope governance-secrets \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id /subscriptions/.../resourceGroups/.../providers/Microsoft.KeyVault/vaults/your-kv \
  --dns-name https://your-governance-kv.vault.azure.net/

# Access secrets in a notebook (Python)
# encryption_key = dbutils.secrets.get(scope="governance-secrets", key="encryption-key")
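Inside a notebook the `dbutils.secrets.get` call works directly. For code that also has to run outside Databricks (local development, unit tests), a common pattern is a small resolver that falls back to environment variables — the fallback naming convention here is an assumption, not a Databricks feature:

```python
import os

def get_secret(scope: str, key: str, dbutils=None) -> str:
    """Resolve a secret from a Databricks secret scope when running in a
    notebook (pass the ambient dbutils object), falling back to a
    SCOPE_KEY-style environment variable elsewhere."""
    if dbutils is not None:
        return dbutils.secrets.get(scope=scope, key=key)
    env_name = f"{scope}_{key}".upper().replace("-", "_")
    value = os.environ.get(env_name)
    if value is None:
        raise KeyError(f"secret {scope}/{key} not found (checked ${env_name})")
    return value
```

With this shape, notebook code and local tests share one call site, and no secret value ever appears in the source.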

Best Practices

  • Never hardcode secrets in notebooks or scripts
  • Use managed identity when possible
  • Rotate secrets on a regular schedule (90 days recommended)
  • Audit secret access via Key Vault diagnostic logs
  • Use separate Key Vaults for different environments
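The 90-day recommendation can be checked mechanically. A sketch that flags secrets past their rotation window, assuming creation dates have already been pulled from Key Vault metadata into a simple name-to-date mapping (the input shape is an assumption):

```python
from datetime import date, timedelta

def overdue_secrets(created_on: dict, today: date,
                    max_age_days: int = 90) -> list:
    """Return names of secrets whose creation date is older than
    the rotation window, sorted for stable reporting."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, created in created_on.items()
                  if created < cutoff)
```

A scheduled job can run this against Key Vault metadata and alert on a non-empty result.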

5. Encryption Checklist

  • [ ] Platform encryption at rest is enabled (default)
  • [ ] Customer-managed keys configured (if required by regulation)
  • [ ] ADLS Gen2 encryption enabled
  • [ ] Minimum TLS 1.2 enforced on all storage accounts
  • [ ] Cluster inter-node encryption enabled
  • [ ] Key rotation schedule established
  • [ ] Secrets stored in Key Vault (not in code)
  • [ ] Secret scopes configured in Databricks
  • [ ] Key Vault diagnostic logging enabled
  • [ ] Encryption key access audited

This is 1 of 6 resources in the DataStack Pro toolkit. Get the complete [Unity Catalog Governance Pack] with all files, templates, and documentation for $39.

Get the Full Kit →

Or grab the entire DataStack Pro bundle (6 products) for $164 — save 30%.

Get the Complete Bundle →

