Vinicius Fagundes

πŸš€ Building a Sandbox Data Warehouse with Snowflake + GitHub Actions (Part 2)

In Part 1, we covered the SQL patterns and repository structure for building idempotent data warehouses. Now let's automate everything with GitHub Actions and Python orchestration.

This article (Part 2) covers:

  • GitHub Actions workflow configuration
  • Python SQL runner with parameterization
  • Environment variable management
  • Cross-platform portability strategies
  • Advanced sandbox naming conventions

πŸ”„ GitHub Actions Workflow

The magic happens in .github/workflows/deploy-sandbox.yml. This workflow triggers on every push and PR, creating isolated environments automatically.

name: Deploy Snowflake Sandbox

on:
  push:
    branches: [ main, develop, feature/* ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

env:
  SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
  SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
  SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
  SNOWFLAKE_ROLE: ${{ secrets.SNOWFLAKE_ROLE }}
  SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}

jobs:
  deploy-sandbox:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Generate sandbox name
      id: sandbox
      run: |
        if [ "${{ github.event_name }}" == "pull_request" ]; then
          SANDBOX_SUFFIX="pr_${{ github.event.number }}"
        else
          SANDBOX_SUFFIX="${{ github.ref_name }}_${{ github.run_number }}"
        fi
        CLEAN_SUFFIX=$(echo "$SANDBOX_SUFFIX" | sed 's/[^a-zA-Z0-9_]/_/g' | tr '[:lower:]' '[:upper:]')
        echo "database_name=SANDBOX_${CLEAN_SUFFIX}" >> "$GITHUB_OUTPUT"

    - name: Deploy sandbox
      env:
        DATABASE_NAME: ${{ steps.sandbox.outputs.database_name }}
      run: python scripts/run_all_sql.py
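You can sanity-check the sanitization step locally before pushing. A quick shell demo of the same pipeline (the uppercase step ensures the result matches the SANDBOX_* naming examples later in this post):

```shell
# Mirror the workflow's sanitization: non-alphanumerics become underscores,
# then uppercase to match the SANDBOX_* naming convention.
SANDBOX_SUFFIX="feature/user-analytics_123"
CLEAN_SUFFIX=$(echo "$SANDBOX_SUFFIX" | sed 's/[^a-zA-Z0-9_]/_/g' | tr '[:lower:]' '[:upper:]')
echo "SANDBOX_${CLEAN_SUFFIX}"   # SANDBOX_FEATURE_USER_ANALYTICS_123
```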

🐍 Python SQL Runner

The scripts/run_all_sql.py script orchestrates SQL execution with proper variable substitution:

import os
import snowflake.connector
from pathlib import Path

class SnowflakeRunner:
    def __init__(self):
        self.connection = snowflake.connector.connect(
            account=os.getenv('SNOWFLAKE_ACCOUNT'),
            user=os.getenv('SNOWFLAKE_USER'),
            password=os.getenv('SNOWFLAKE_PASSWORD'),
            role=os.getenv('SNOWFLAKE_ROLE'),
            warehouse=os.getenv('SNOWFLAKE_WAREHOUSE')
        )

    def substitute_variables(self, sql_content: str) -> str:
        """Replace {{VAR}} placeholders with values from the environment."""
        variables = {
            'DATABASE_NAME': os.getenv('DATABASE_NAME'),
            'SCHEMA_NAME': os.getenv('SCHEMA_NAME'),
            'WAREHOUSE_NAME': os.getenv('SNOWFLAKE_WAREHOUSE')
        }

        for var_name, var_value in variables.items():
            if var_value is None:
                # Fail fast: str.replace(None) would raise a confusing TypeError,
                # and a missing name would otherwise produce broken SQL.
                raise ValueError(f'Environment variable for {var_name} is not set')
            pattern = f'{{{{{var_name}}}}}'
            sql_content = sql_content.replace(pattern, var_value)

        return sql_content
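The full runner isn't reproduced in the post, so here is one plausible sketch of the orchestration step. The file-ordering convention and the `run_all` method are my assumptions, not the repo's actual code:

```python
from pathlib import Path

def ordered_sql_files(root: str) -> list:
    """Collect .sql files in deterministic lexicographic order, so scripts
    can be sequenced with numeric prefixes like 01_database.sql, 02_schema.sql."""
    return sorted(Path(root).rglob('*.sql'))

class RunAllMixin:
    """Hypothetical orchestration for the SnowflakeRunner shown above."""

    def run_all(self, root: str = 'sql') -> None:
        cursor = self.connection.cursor()
        try:
            for path in ordered_sql_files(root):
                sql = self.substitute_variables(path.read_text())
                # Snowflake's cursor executes one statement at a time,
                # so split simple multi-statement files on ';'.
                for statement in filter(None, (s.strip() for s in sql.split(';'))):
                    cursor.execute(statement)
        finally:
            cursor.close()
```

Deterministic ordering matters here: idempotent scripts can rerun safely, but the database must still exist before its schemas do.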

🏷️ Smart Naming Conventions

Branch-based naming ensures isolation:

  • feature/user-analytics β†’ SANDBOX_FEATURE_USER_ANALYTICS_123
  • PR #42 β†’ SANDBOX_PR_42
  • main β†’ SANDBOX_MAIN_456

This prevents conflicts and makes environments easily identifiable.
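The same mapping is a one-liner in Python if you ever need it outside the workflow (a sketch of my own, not part of the repo):

```python
import re

def sandbox_name(suffix: str) -> str:
    """Replicate the workflow's naming rule: characters outside [A-Za-z0-9_]
    become underscores, and the result is uppercased under a SANDBOX_ prefix."""
    return 'SANDBOX_' + re.sub(r'[^A-Za-z0-9_]', '_', suffix).upper()

print(sandbox_name('feature/user-analytics_123'))  # SANDBOX_FEATURE_USER_ANALYTICS_123
print(sandbox_name('pr_42'))                       # SANDBOX_PR_42
```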

πŸ”§ Environment Management

GitHub Secrets store sensitive credentials:

  • SNOWFLAKE_ACCOUNT
  • SNOWFLAKE_USER
  • SNOWFLAKE_PASSWORD
  • SNOWFLAKE_ROLE
  • SNOWFLAKE_WAREHOUSE

Dynamic variables are generated at runtime:

  • DATABASE_NAME
  • SCHEMA_NAME
  • SANDBOX_SUFFIX

πŸ”„ Cross-Platform Flexibility

The sandbox architecture is designed for platform independence. By using standard SQL patterns and abstracted connections, migrating to Databricks requires minimal effort:

Databricks Migration Strategy

Connection Layer Swap:

# Snowflake (current)
import snowflake.connector

# Databricks (migration target)
from databricks import sql
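A thin factory keeps that swap in one place. This is a sketch assuming the standard `snowflake-connector-python` and `databricks-sql-connector` entry points; the `DATA_PLATFORM` variable, the `DATABRICKS_HTTP_PATH`/`DATABRICKS_TOKEN` names, and the `resolve_connection_kwargs` helper are my own illustrations:

```python
import os

def resolve_connection_kwargs(platform: str) -> dict:
    """Map environment variables to each platform's connect() arguments."""
    if platform == 'snowflake':
        return {
            'account': os.getenv('SNOWFLAKE_ACCOUNT'),
            'user': os.getenv('SNOWFLAKE_USER'),
            'password': os.getenv('SNOWFLAKE_PASSWORD'),
            'role': os.getenv('SNOWFLAKE_ROLE'),
            'warehouse': os.getenv('SNOWFLAKE_WAREHOUSE'),
        }
    if platform == 'databricks':
        return {
            'server_hostname': os.getenv('DATABRICKS_HOST'),
            'http_path': os.getenv('DATABRICKS_HTTP_PATH'),
            'access_token': os.getenv('DATABRICKS_TOKEN'),
        }
    raise ValueError(f'Unknown platform: {platform}')

def connect(platform: str = None):
    """Open a connection for whichever platform is configured."""
    platform = platform or os.getenv('DATA_PLATFORM', 'snowflake')
    kwargs = resolve_connection_kwargs(platform)
    if platform == 'snowflake':
        import snowflake.connector  # imported lazily so only the active driver is needed
        return snowflake.connector.connect(**kwargs)
    from databricks import sql
    return sql.connect(**kwargs)
```

Because both drivers expose DB-API-style cursors, the rest of the runner can stay unchanged.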

SQL Compatibility Matrix:

| Feature | Snowflake | Databricks | Migration Effort |
| --- | --- | --- | --- |
| CREATE TABLE IF NOT EXISTS | βœ… Native | βœ… Native | Zero |
| MERGE statements | βœ… Native | βœ… Native | Zero |
| Variable substitution | βœ… {{VAR}} | βœ… ${VAR} | Low |
| Database/Schema creation | βœ… Native | βœ… Native | Zero |
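The only non-zero row is variable substitution. A small sketch of mechanically converting `{{VAR}}`-style placeholders to the `${VAR}` style during a migration (my own helper, not from the repo):

```python
import re

def to_dollar_style(sql: str) -> str:
    """Rewrite {{NAME}} placeholders as ${NAME} placeholders."""
    return re.sub(r'\{\{(\w+)\}\}', r'${\1}', sql)

print(to_dollar_style('USE {{DATABASE_NAME}};'))  # USE ${DATABASE_NAME};
```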

Environment Variables Mapping:

# Snowflake
SNOWFLAKE_ACCOUNT=account.region.cloud
SNOWFLAKE_WAREHOUSE=COMPUTE_WH

# Databricks (same sandbox, different platform)
DATABRICKS_HOST=workspace.databricks.com
DATABRICKS_CLUSTER_ID=cluster-123

Platform-Agnostic SQL Patterns

Our sandbox SQL scripts work across platforms with minimal changes:

-- Works on both Snowflake and Databricks
CREATE DATABASE IF NOT EXISTS {{DATABASE_NAME}};
USE {{DATABASE_NAME}};

CREATE SCHEMA IF NOT EXISTS {{SCHEMA_NAME}};

CREATE TABLE IF NOT EXISTS raw.events (
  event_id STRING,
  user_id STRING,
  event_type STRING,
  event_time TIMESTAMP,
  payload STRING
);

πŸš€ Sandbox Flexibility Benefits

Multi-Cloud Strategy: Test the same data models across Snowflake (AWS/Azure/GCP) and Databricks without vendor lock-in.

Cost Optimization: Compare performance and pricing between platforms using identical workloads in isolated sandbox environments.

Team Flexibility: Data engineers can experiment with different platforms while maintaining consistent development workflows.

Future-Proofing: When new platforms emerge, the same sandbox methodology applies with minimal adaptation.

βœ… Conclusion

This GitHub Actions + Python orchestration creates a platform-flexible sandbox automation system. The combination of idempotent SQL, smart naming, and abstracted connections ensures reliable, isolated environments that can evolve with your technology stack.

  • Platform independence β€” Same sandbox concepts work across Snowflake, Databricks, and beyond
  • Minimal migration effort β€” 90% of SQL scripts transfer without modification
  • Consistent workflows β€” Developers use identical processes regardless of the underlying platform
  • Cost flexibility β€” Easy A/B testing between platforms for workload optimization
  • Future-ready architecture β€” Foundation scales to new technologies and requirements
  • Vendor lock-in avoidance β€” Strategic independence through standardized patterns

Migration Effort Comparison:

Code Complexity Impact: How much the migration affects code maintainability, readability, and architectural patterns.

| Component | Lines Changed | Code Complexity Impact |
| --- | --- | --- |
| SQL scripts | <10% | Minimal |
| Python runner | ~30% | Low |
| GitHub Actions | ~20% | Low |
| Total migration | ~15% | Minimal |

The sandbox methodology proves that good architecture transcends individual platforms. Whether you're on Snowflake today or considering Databricks tomorrow, the foundation remains solid.

What's your experience with multi-platform data strategies? Have you considered the migration costs between different data warehouses?
