In Part 1, we covered the SQL patterns and repository structure for building idempotent data warehouses. Now let's automate everything with GitHub Actions and Python orchestration.
This article (Part 2) covers:
- GitHub Actions workflow configuration
- Python SQL runner with parameterization
- Environment variable management
- Cross-platform portability strategies
- Advanced sandbox naming conventions
GitHub Actions Workflow
The magic happens in `.github/workflows/deploy-sandbox.yml`. This workflow runs on pushes, pull requests, and manual dispatch, creating an isolated environment for each.
```yaml
name: Deploy Snowflake Sandbox

on:
  push:
    branches: [ main, develop, feature/* ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

env:
  SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
  SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
  SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
  SNOWFLAKE_ROLE: ${{ secrets.SNOWFLAKE_ROLE }}
  SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}

jobs:
  deploy-sandbox:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Generate sandbox name
        id: sandbox
        run: |
          if [ "${{ github.event_name }}" == "pull_request" ]; then
            SANDBOX_SUFFIX="pr_${{ github.event.number }}"
          else
            SANDBOX_SUFFIX="${{ github.ref_name }}_${{ github.run_number }}"
          fi
          # Replace anything that is not a letter, digit, or underscore, then
          # upper-case so the name matches Snowflake's identifier casing.
          CLEAN_SUFFIX=$(echo "$SANDBOX_SUFFIX" | sed 's/[^a-zA-Z0-9_]/_/g' | tr '[:lower:]' '[:upper:]')
          echo "database_name=SANDBOX_${CLEAN_SUFFIX}" >> "$GITHUB_OUTPUT"

      - name: Deploy sandbox
        env:
          DATABASE_NAME: ${{ steps.sandbox.outputs.database_name }}
        run: python scripts/run_all_sql.py
```
Python SQL Runner
The `scripts/run_all_sql.py` script orchestrates SQL execution with proper variable substitution:
```python
import os
from pathlib import Path

import snowflake.connector


class SnowflakeRunner:
    def __init__(self):
        # Credentials are read from environment variables (GitHub Secrets in CI).
        self.connection = snowflake.connector.connect(
            account=os.getenv('SNOWFLAKE_ACCOUNT'),
            user=os.getenv('SNOWFLAKE_USER'),
            password=os.getenv('SNOWFLAKE_PASSWORD'),
            role=os.getenv('SNOWFLAKE_ROLE'),
            warehouse=os.getenv('SNOWFLAKE_WAREHOUSE')
        )

    def substitute_variables(self, sql_content: str) -> str:
        # Replace {{VAR}} placeholders with values from the environment.
        variables = {
            'DATABASE_NAME': os.getenv('DATABASE_NAME'),
            'SCHEMA_NAME': os.getenv('SCHEMA_NAME'),
            'WAREHOUSE_NAME': os.getenv('SNOWFLAKE_WAREHOUSE')
        }
        for var_name, var_value in variables.items():
            if var_value is None:
                continue  # leave the placeholder untouched if the variable is unset
            pattern = f'{{{{{var_name}}}}}'
            sql_content = sql_content.replace(pattern, var_value)
        return sql_content
```
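The snippet above only shows connection setup and substitution. Here is a minimal sketch of how the rest of the runner might execute the repository's SQL files; the `sql/` directory layout, the lexical-order convention, and the `run_file` method name are illustrative assumptions rather than the exact implementation.

```python
    def run_file(self, sql_file: Path) -> None:
        # Read the script, resolve {{VAR}} placeholders, then execute it.
        sql_content = self.substitute_variables(sql_file.read_text())
        # execute_string runs every statement in the script sequentially.
        self.connection.execute_string(sql_content)


if __name__ == '__main__':
    runner = SnowflakeRunner()
    # Assumed layout: numbered scripts under sql/ executed in lexical order,
    # e.g. sql/01_database.sql, sql/02_schemas.sql, sql/03_tables.sql.
    for sql_file in sorted(Path('sql').rglob('*.sql')):
        print(f'Running {sql_file} ...')
        runner.run_file(sql_file)
```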
Smart Naming Conventions
Branch-based naming ensures isolation:
- `feature/user-analytics` → `SANDBOX_FEATURE_USER_ANALYTICS_123`
- PR #42 → `SANDBOX_PR_42`
- `main` → `SANDBOX_MAIN_456`
This prevents conflicts and makes environments easily identifiable.
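To see the naming rule in isolation, here is a small Python equivalent of the shell step in the workflow above; the function name and sample inputs are illustrative:

```python
import re

def sandbox_name(suffix: str) -> str:
    # Mirror the workflow step: anything outside [a-zA-Z0-9_] becomes '_',
    # upper-cased to match Snowflake's unquoted identifier casing.
    return 'SANDBOX_' + re.sub(r'[^a-zA-Z0-9_]', '_', suffix).upper()

print(sandbox_name('feature/user-analytics_123'))  # SANDBOX_FEATURE_USER_ANALYTICS_123
print(sandbox_name('pr_42'))                       # SANDBOX_PR_42
```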
Environment Management
GitHub Secrets store sensitive credentials:
- `SNOWFLAKE_ACCOUNT`
- `SNOWFLAKE_USER`
- `SNOWFLAKE_PASSWORD`
- `SNOWFLAKE_ROLE`
- `SNOWFLAKE_WAREHOUSE`
Dynamic variables are generated at runtime:
- `DATABASE_NAME`
- `SCHEMA_NAME`
- `SANDBOX_SUFFIX`
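A small guard in the runner can fail fast with a readable error when one of these variables is missing, instead of surfacing a cryptic connection failure mid-deployment. This is a sketch: the `validate_env` helper and the `RAW` default for `SCHEMA_NAME` are assumptions, not part of the original repository.

```python
import os

REQUIRED_VARS = [
    'SNOWFLAKE_ACCOUNT', 'SNOWFLAKE_USER', 'SNOWFLAKE_PASSWORD',
    'SNOWFLAKE_ROLE', 'SNOWFLAKE_WAREHOUSE', 'DATABASE_NAME',
]

def validate_env() -> None:
    # Abort with a clear message if any secret or dynamic variable is unset.
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    # Dynamic variables can fall back to a sensible default.
    os.environ.setdefault('SCHEMA_NAME', 'RAW')
```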
Cross-Platform Flexibility
The sandbox architecture is designed for platform independence. Because it relies on standard SQL patterns and an abstracted connection layer, migrating to Databricks requires minimal effort:
Databricks Migration Strategy
Connection Layer Swap:
```python
# Snowflake (current)
import snowflake.connector

# Databricks (migration target)
from databricks import sql
```
SQL Compatibility Matrix:
| Feature | Snowflake | Databricks | Migration Effort |
|---|---|---|---|
| `CREATE TABLE IF NOT EXISTS` | ✅ Native | ✅ Native | Zero |
| `MERGE` statements | ✅ Native | ✅ Native | Zero |
| Variable substitution | ✅ `{{VAR}}` | ✅ `${VAR}` | Low |
| Database/schema creation | ✅ Native | ✅ Native | Zero |
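The one non-zero row, variable substitution, is easy to absorb in the runner: keep `{{VAR}}` as the canonical placeholder in the repository and rewrite it for the target platform at render time. A sketch, where the function name and `target` flag are illustrative assumptions:

```python
import re

def render_placeholders(sql_content: str, values: dict, target: str = 'snowflake') -> str:
    # Resolve known {{VAR}} placeholders first.
    for name, value in values.items():
        sql_content = sql_content.replace(f'{{{{{name}}}}}', value)
    # For Databricks, rewrite any remaining {{VAR}} placeholders to ${VAR}.
    if target == 'databricks':
        sql_content = re.sub(r'\{\{(\w+)\}\}', r'${\1}', sql_content)
    return sql_content
```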
Environment Variables Mapping:
```bash
# Snowflake
SNOWFLAKE_ACCOUNT=account.region.cloud
SNOWFLAKE_WAREHOUSE=COMPUTE_WH

# Databricks (same sandbox, different platform)
DATABRICKS_HOST=workspace.databricks.com
DATABRICKS_CLUSTER_ID=cluster-123
```
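Because everything flows through environment variables, the runner can select its backend at startup and leave the rest of the code untouched. A minimal sketch, assuming the `databricks-sql-connector` package and the variable names `DATABRICKS_HOST`, `DATABRICKS_HTTP_PATH`, and `DATABRICKS_TOKEN` (the HTTP path and token are not part of the mapping above; they are assumptions):

```python
import os

def get_connection():
    # Pick the backend from whichever credentials are present, so file
    # discovery and {{VAR}} substitution stay platform-agnostic.
    if os.getenv('DATABRICKS_HOST'):
        from databricks import sql  # databricks-sql-connector
        return sql.connect(
            server_hostname=os.getenv('DATABRICKS_HOST'),
            http_path=os.getenv('DATABRICKS_HTTP_PATH'),
            access_token=os.getenv('DATABRICKS_TOKEN'),
        )
    import snowflake.connector
    return snowflake.connector.connect(
        account=os.getenv('SNOWFLAKE_ACCOUNT'),
        user=os.getenv('SNOWFLAKE_USER'),
        password=os.getenv('SNOWFLAKE_PASSWORD'),
        role=os.getenv('SNOWFLAKE_ROLE'),
        warehouse=os.getenv('SNOWFLAKE_WAREHOUSE'),
    )
```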
Platform-Agnostic SQL Patterns
Our sandbox SQL scripts work across platforms with minimal changes:
```sql
-- Works on both Snowflake and Databricks
CREATE DATABASE IF NOT EXISTS {{DATABASE_NAME}};
USE {{DATABASE_NAME}};

CREATE SCHEMA IF NOT EXISTS {{SCHEMA_NAME}};

CREATE TABLE IF NOT EXISTS raw.events (
    event_id STRING,
    user_id STRING,
    event_type STRING,
    event_time TIMESTAMP,
    payload STRING
);
```
Sandbox Flexibility Benefits
Multi-Cloud Strategy: Test the same data models across Snowflake (AWS/Azure/GCP) and Databricks without vendor lock-in.
Cost Optimization: Compare performance and pricing between platforms using identical workloads in isolated sandbox environments.
Team Flexibility: Data engineers can experiment with different platforms while maintaining consistent development workflows.
Future-Proofing: When new platforms emerge, the same sandbox methodology applies with minimal adaptation.
Conclusion
This GitHub Actions + Python orchestration creates a platform-flexible sandbox automation system. The combination of idempotent SQL, smart naming, and abstracted connections ensures reliable, isolated environments that can evolve with your technology stack.
- Platform independence: Same sandbox concepts work across Snowflake, Databricks, and beyond
- Minimal migration effort: 90% of SQL scripts transfer without modification
- Consistent workflows: Developers use identical processes regardless of the underlying platform
- Cost flexibility: Easy A/B testing between platforms for workload optimization
- Future-ready architecture: Foundation scales to new technologies and requirements
- Vendor lock-in avoidance: Strategic independence through standardized patterns
Migration Effort Comparison:
Code Complexity Impact: How much the migration affects code maintainability, readability, and architectural patterns.
| Component | Lines Changed | Code Complexity Impact |
|---|---|---|
| SQL scripts | <10% | Minimal |
| Python runner | ~30% | Low |
| GitHub Actions | ~20% | Low |
| Total migration | ~15% | Minimal |
The sandbox methodology proves that good architecture transcends individual platforms. Whether you're on Snowflake today or considering Databricks tomorrow, the foundation remains solid.
What's your experience with multi-platform data strategies? Have you considered the migration costs between different data warehouses?