In Part 1, we covered the SQL patterns and repository structure for building idempotent data warehouses. Now let's automate everything with GitHub Actions and Python orchestration.
This article (Part 2) covers:
- GitHub Actions workflow configuration
- Python SQL runner with parameterization
- Environment variable management
- Cross-platform portability strategies
- Advanced sandbox naming conventions
GitHub Actions Workflow
The magic happens in `.github/workflows/deploy-sandbox.yml`. This workflow runs on pushes, pull requests, and manual dispatch, creating an isolated environment for each.
```yaml
name: Deploy Snowflake Sandbox

on:
  push:
    branches: [ main, develop, feature/* ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

env:
  SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
  SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
  SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
  SNOWFLAKE_ROLE: ${{ secrets.SNOWFLAKE_ROLE }}
  SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}

jobs:
  deploy-sandbox:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Generate sandbox name
        id: sandbox
        run: |
          if [ "${{ github.event_name }}" == "pull_request" ]; then
            SANDBOX_SUFFIX="pr_${{ github.event.number }}"
          else
            SANDBOX_SUFFIX="${{ github.ref_name }}_${{ github.run_number }}"
          fi
          # Replace anything that is not a letter, digit, or underscore, then
          # upper-case so the name matches Snowflake's identifier casing.
          CLEAN_SUFFIX=$(echo "$SANDBOX_SUFFIX" | sed 's/[^a-zA-Z0-9_]/_/g' | tr '[:lower:]' '[:upper:]')
          echo "database_name=SANDBOX_${CLEAN_SUFFIX}" >> "$GITHUB_OUTPUT"

      - name: Deploy sandbox
        env:
          DATABASE_NAME: ${{ steps.sandbox.outputs.database_name }}
        run: python scripts/run_all_sql.py
```
Python SQL Runner
The `scripts/run_all_sql.py` script orchestrates SQL execution with proper variable substitution:
```python
import os
from pathlib import Path

import snowflake.connector


class SnowflakeRunner:
    def __init__(self):
        # Credentials are read from environment variables (GitHub Secrets in CI).
        self.connection = snowflake.connector.connect(
            account=os.getenv('SNOWFLAKE_ACCOUNT'),
            user=os.getenv('SNOWFLAKE_USER'),
            password=os.getenv('SNOWFLAKE_PASSWORD'),
            role=os.getenv('SNOWFLAKE_ROLE'),
            warehouse=os.getenv('SNOWFLAKE_WAREHOUSE')
        )

    def substitute_variables(self, sql_content: str) -> str:
        # Replace {{VAR}} placeholders with values from the environment.
        variables = {
            'DATABASE_NAME': os.getenv('DATABASE_NAME'),
            'SCHEMA_NAME': os.getenv('SCHEMA_NAME'),
            'WAREHOUSE_NAME': os.getenv('SNOWFLAKE_WAREHOUSE')
        }
        for var_name, var_value in variables.items():
            if var_value is None:
                continue  # leave the placeholder untouched if the variable is unset
            pattern = f'{{{{{var_name}}}}}'
            sql_content = sql_content.replace(pattern, var_value)
        return sql_content
```
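The snippet above only shows connection setup and substitution. Here is a minimal sketch of how the rest of the runner might execute the repository's SQL files; the `sql/` directory layout, the lexical-order convention, and the `run_file` method name are illustrative assumptions rather than the exact implementation.

```python
    def run_file(self, sql_file: Path) -> None:
        # Read the script, resolve {{VAR}} placeholders, then execute it.
        sql_content = self.substitute_variables(sql_file.read_text())
        # execute_string runs every statement in the script sequentially.
        self.connection.execute_string(sql_content)


if __name__ == '__main__':
    runner = SnowflakeRunner()
    # Assumed layout: numbered scripts under sql/ executed in lexical order,
    # e.g. sql/01_database.sql, sql/02_schemas.sql, sql/03_tables.sql.
    for sql_file in sorted(Path('sql').rglob('*.sql')):
        print(f'Running {sql_file} ...')
        runner.run_file(sql_file)
```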
Smart Naming Conventions
Branch-based naming ensures isolation:
- `feature/user-analytics` → `SANDBOX_FEATURE_USER_ANALYTICS_123`
- PR #42 → `SANDBOX_PR_42`
- `main` → `SANDBOX_MAIN_456`
This prevents conflicts and makes environments easily identifiable.
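To see the naming rule in isolation, here is a small Python equivalent of the shell step in the workflow above; the function name and sample inputs are illustrative:

```python
import re

def sandbox_name(suffix: str) -> str:
    # Mirror the workflow step: anything outside [a-zA-Z0-9_] becomes '_',
    # upper-cased to match Snowflake's unquoted identifier casing.
    return 'SANDBOX_' + re.sub(r'[^a-zA-Z0-9_]', '_', suffix).upper()

print(sandbox_name('feature/user-analytics_123'))  # SANDBOX_FEATURE_USER_ANALYTICS_123
print(sandbox_name('pr_42'))                       # SANDBOX_PR_42
```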
Environment Management
GitHub Secrets store sensitive credentials:
- `SNOWFLAKE_ACCOUNT`
- `SNOWFLAKE_USER`
- `SNOWFLAKE_PASSWORD`
- `SNOWFLAKE_ROLE`
- `SNOWFLAKE_WAREHOUSE`
Dynamic variables are generated at runtime:
- `DATABASE_NAME`
- `SCHEMA_NAME`
- `SANDBOX_SUFFIX`
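A small guard in the runner can fail fast with a readable error when one of these variables is missing, instead of surfacing a cryptic connection failure mid-deployment. This is a sketch: the `validate_env` helper and the `RAW` default for `SCHEMA_NAME` are assumptions, not part of the original repository.

```python
import os

REQUIRED_VARS = [
    'SNOWFLAKE_ACCOUNT', 'SNOWFLAKE_USER', 'SNOWFLAKE_PASSWORD',
    'SNOWFLAKE_ROLE', 'SNOWFLAKE_WAREHOUSE', 'DATABASE_NAME',
]

def validate_env() -> None:
    # Abort with a clear message if any secret or dynamic variable is unset.
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    # Dynamic variables can fall back to a sensible default.
    os.environ.setdefault('SCHEMA_NAME', 'RAW')
```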
Cross-Platform Flexibility
The sandbox architecture is designed for platform independence. Because it relies on standard SQL patterns and an abstracted connection layer, migrating to Databricks requires minimal effort:
Databricks Migration Strategy
Connection Layer Swap:
```python
# Snowflake (current)
import snowflake.connector

# Databricks (migration target)
from databricks import sql
```
SQL Compatibility Matrix:
| Feature | Snowflake | Databricks | Migration Effort |
|---|---|---|---|
| `CREATE TABLE IF NOT EXISTS` | ✅ Native | ✅ Native | Zero |
| `MERGE` statements | ✅ Native | ✅ Native | Zero |
| Variable substitution | ✅ `{{VAR}}` | ✅ `${VAR}` | Low |
| Database/schema creation | ✅ Native | ✅ Native | Zero |
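The one non-zero row, variable substitution, is easy to absorb in the runner: keep `{{VAR}}` as the canonical placeholder in the repository and rewrite it for the target platform at render time. A sketch, where the function name and `target` flag are illustrative assumptions:

```python
import re

def render_placeholders(sql_content: str, values: dict, target: str = 'snowflake') -> str:
    # Resolve known {{VAR}} placeholders first.
    for name, value in values.items():
        sql_content = sql_content.replace(f'{{{{{name}}}}}', value)
    # For Databricks, rewrite any remaining {{VAR}} placeholders to ${VAR}.
    if target == 'databricks':
        sql_content = re.sub(r'\{\{(\w+)\}\}', r'${\1}', sql_content)
    return sql_content
```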
Environment Variables Mapping:
```bash
# Snowflake
SNOWFLAKE_ACCOUNT=account.region.cloud
SNOWFLAKE_WAREHOUSE=COMPUTE_WH

# Databricks (same sandbox, different platform)
DATABRICKS_HOST=workspace.databricks.com
DATABRICKS_CLUSTER_ID=cluster-123
```
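Because everything flows through environment variables, the runner can select its backend at startup and leave the rest of the code untouched. A minimal sketch, assuming the `databricks-sql-connector` package and the variable names `DATABRICKS_HOST`, `DATABRICKS_HTTP_PATH`, and `DATABRICKS_TOKEN` (the HTTP path and token are not part of the mapping above; they are assumptions):

```python
import os

def get_connection():
    # Pick the backend from whichever credentials are present, so file
    # discovery and {{VAR}} substitution stay platform-agnostic.
    if os.getenv('DATABRICKS_HOST'):
        from databricks import sql  # databricks-sql-connector
        return sql.connect(
            server_hostname=os.getenv('DATABRICKS_HOST'),
            http_path=os.getenv('DATABRICKS_HTTP_PATH'),
            access_token=os.getenv('DATABRICKS_TOKEN'),
        )
    import snowflake.connector
    return snowflake.connector.connect(
        account=os.getenv('SNOWFLAKE_ACCOUNT'),
        user=os.getenv('SNOWFLAKE_USER'),
        password=os.getenv('SNOWFLAKE_PASSWORD'),
        role=os.getenv('SNOWFLAKE_ROLE'),
        warehouse=os.getenv('SNOWFLAKE_WAREHOUSE'),
    )
```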
Platform-Agnostic SQL Patterns
Our sandbox SQL scripts work across platforms with minimal changes:
```sql
-- Works on both Snowflake and Databricks
CREATE DATABASE IF NOT EXISTS {{DATABASE_NAME}};
USE {{DATABASE_NAME}};

CREATE SCHEMA IF NOT EXISTS {{SCHEMA_NAME}};

CREATE TABLE IF NOT EXISTS raw.events (
    event_id STRING,
    user_id STRING,
    event_type STRING,
    event_time TIMESTAMP,
    payload STRING
);
```
Sandbox Flexibility Benefits
Multi-Cloud Strategy: Test the same data models across Snowflake (AWS/Azure/GCP) and Databricks without vendor lock-in.
Cost Optimization: Compare performance and pricing between platforms using identical workloads in isolated sandbox environments.
Team Flexibility: Data engineers can experiment with different platforms while maintaining consistent development workflows.
Future-Proofing: When new platforms emerge, the same sandbox methodology applies with minimal adaptation.
Conclusion
This GitHub Actions + Python orchestration creates a platform-flexible sandbox automation system. The combination of idempotent SQL, smart naming, and abstracted connections ensures reliable, isolated environments that can evolve with your technology stack.
- Platform independence: Same sandbox concepts work across Snowflake, Databricks, and beyond
- Minimal migration effort: 90% of SQL scripts transfer without modification
- Consistent workflows: Developers use identical processes regardless of the underlying platform
- Cost flexibility: Easy A/B testing between platforms for workload optimization
- Future-ready architecture: Foundation scales to new technologies and requirements
- Vendor lock-in avoidance: Strategic independence through standardized patterns
Migration Effort Comparison:
Code Complexity Impact: How much the migration affects code maintainability, readability, and architectural patterns.
| Component | Lines Changed | Code Complexity Impact |
|---|---|---|
| SQL scripts | <10% | Minimal |
| Python runner | ~30% | Low |
| GitHub Actions | ~20% | Low |
| Total migration | ~15% | Minimal |
The sandbox methodology proves that good architecture transcends individual platforms. Whether you're on Snowflake today or considering Databricks tomorrow, the foundation remains solid.
What's your experience with multi-platform data strategies? Have you considered the migration costs between different data warehouses?