TL;DR: snowflake-emulator is a local Snowflake-compatible SQL interface powered by DuckDB, designed for dev/test. It works with the gosnowflake driver (Go) and REST API v2 (Python, Node.js, any language). Free.
Why I Built This
I was building a data pipeline that talked to Snowflake. Running integration tests meant spinning up a warehouse, waiting for queries, and using credits.
Mocking didn't help much either:
// This passes with mocks...
mock.ExpectQuery("SELECT IFF(x > 0, 'yes', 'no')").WillReturnRows(...)
// ...but IFF doesn't exist in PostgreSQL.
Mocks don't catch SQL compatibility issues. I wanted something that actually executes Snowflake SQL locally.
Architecture: Executor as Orchestrator
The emulator uses an orchestrator pattern where the Executor coordinates SQL processing:
                 ┌─────────────────────────────────────┐
                 │              Executor               │
                 │  ┌────────────┐   ┌────────────┐    │
SQL Request ───> │  │ Classifier │   │ Translator │    │ ───> DuckDB
                 │  └────────────┘   └────────────┘    │
                 │        │                │           │
                 │    Detect          Transform        │
                 │    statement       Snowflake SQL    │
                 │    type            to DuckDB        │
                 └─────────────────────────────────────┘
- Executor receives SQL and orchestrates the processing pipeline
- Classifier detects statement type (SELECT, DDL, COPY INTO, MERGE)
- Translator transforms Snowflake-specific syntax to DuckDB
- Executor runs the translated SQL on DuckDB
Key Design: AST-Based SQL Translation
The most interesting part is the SQL translation. I tried regex first:
// Bad: Regex breaks on edge cases
sql = regexp.MustCompile(`\bIFF\s*\(`).ReplaceAllString(sql, "IF(")
// What about: SELECT 'IFF(' AS label?
// → SELECT 'IF(' AS label ← Broken!
Regex can't distinguish function calls from string literals. So I switched to AST-based parsing using vitess-sqlparser.
// Good: AST transformation is precise
sqlparser.Walk(func(node sqlparser.SQLNode) (bool, error) {
if fn, ok := node.(*sqlparser.FuncExpr); ok {
if strings.EqualFold(fn.Name.String(), "IFF") {
fn.Name = sqlparser.NewColIdent("IF")
}
}
return true, nil
}, stmt)
AST parsing correctly identifies function nodes vs string literals, so SELECT 'IFF(' AS label stays unchanged.
In the actual implementation, simple renames (e.g., IFF → IF) are handled via a function map, and more complex cases use small handlers plus a post-processing pass.
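A sketch of what such a rename table might look like — the entries and names here are illustrative, not the emulator's actual map:

```go
package main

import (
	"fmt"
	"strings"
)

// functionRenames maps Snowflake function names to their DuckDB
// equivalents for cases where the translation is a pure rename.
// Illustrative entries only; the real table covers many more functions.
var functionRenames = map[string]string{
	"IFF": "IF",
	"LEN": "LENGTH",
}

// renameFunc returns the DuckDB name for a Snowflake function if it is a
// simple rename. Lookups are case-insensitive, matching the EqualFold
// check in the AST walk above.
func renameFunc(name string) (string, bool) {
	target, ok := functionRenames[strings.ToUpper(name)]
	return target, ok
}

func main() {
	for _, n := range []string{"iff", "LEN", "SUM"} {
		if target, ok := renameFunc(n); ok {
			fmt.Printf("%s -> %s\n", n, target)
		} else {
			fmt.Printf("%s unchanged\n", n)
		}
	}
}
```

Keeping simple renames in a data table means adding one is a one-line change, while anything needing argument surgery falls through to a handler.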
The Marker Function Pattern
Some transformations can't be done in-place. For example, DATEADD(day, 7, date) needs to become date + INTERVAL 7 day — the arguments need to be reordered.
I use a two-stage approach:
Stage 1: Mark during AST walk
// DATEADD(day, 7, created_at) → __DATEADD__('day', 7, created_at)
if strings.EqualFold(fn.Name.String(), "DATEADD") {
fn.Name = sqlparser.NewColIdent("__DATEADD__")
}
Stage 2: Post-process with a small parser (no regex)
// __DATEADD__(part, n, date) → (CAST(date AS DATE) + interval n part)
sql = transformMarkedFunction(sql, "__DATEADD__", func(args string) string {
parts := splitFunctionArgs(args, 3) // respects nested parentheses
part := strings.TrimSpace(parts[0])
n := strings.TrimSpace(parts[1])
date := strings.TrimSpace(parts[2])
return fmt.Sprintf("(CAST(%s AS DATE) + interval %s %s)", date, n, part)
})
The marker function (__DATEADD__) is a placeholder that's safe to transform after AST serialization (it only exists because we inserted it during the AST walk). Using a tiny parser keeps it robust when arguments contain nested expressions (e.g., function calls), without relying on brittle regexes.
DuckDB Connection Handling
DuckDB only supports a single writer at a time. Multiple readers are fine, but writes must be serialized.
I solved this with a simple mutex pattern:
type Manager struct {
db *sql.DB
writeMu sync.Mutex // Serializes write operations
}
// Reads can be concurrent (no lock)
func (m *Manager) Query(ctx context.Context, query string, args ...any) (*sql.Rows, error) {
return m.db.QueryContext(ctx, query, args...)
}
// Writes are serialized
func (m *Manager) Exec(ctx context.Context, query string, args ...any) (sql.Result, error) {
m.writeMu.Lock()
defer m.writeMu.Unlock()
return m.db.ExecContext(ctx, query, args...)
}
This lives in pkg/connection/manager.go. When an operation needs to touch both DuckDB and the emulator's metadata tables (e.g., CREATE DATABASE / CREATE SCHEMA), it uses ExecTx to keep those changes atomic. (Table-level metadata tracking for CREATE TABLE / DROP TABLE is intentionally kept minimal for now.)
Quick Start
docker run -p 8080:8080 ghcr.io/nnnkkk7/snowflake-emulator:latest
Optional: run with persistent storage:
docker run -p 8080:8080 -v snowflake-data:/data \
-e DB_PATH=/data/snowflake.db \
ghcr.io/nnnkkk7/snowflake-emulator:latest
Go (gosnowflake driver):
// Requires: import _ "github.com/snowflakedb/gosnowflake" (registers the "snowflake" driver)
dsn := "user:pass@localhost:8080/TEST_DB/PUBLIC?account=test&protocol=http"
db, err := sql.Open("snowflake", dsn)
if err != nil {
	log.Fatal(err)
}
rows, err := db.Query(`SELECT IFF(score >= 90, 'A', 'B'), DATEADD(day, 7, created_at) FROM users`)
Python, Node.js, etc. (REST API v2):
curl -X POST http://localhost:8080/api/v2/statements \
-H "Content-Type: application/json" \
-d '{"statement": "SELECT IFF(1 > 0, '\''yes'\'', '\''no'\'')", "database": "TEST_DB"}'
The response includes a statementHandle. Fetch the result via GET /api/v2/statements/{handle}.
Limitations
This is a dev/test tool, not a Snowflake replacement:
- No authentication (dev mode only)
- No Time Travel / Zero-Copy Cloning
- No external stages (S3, Azure, GCS)
- No stored procedures / UDFs
Conclusion
Building a SQL translator taught me that regex is tempting but fragile for SQL transformations. AST-based parsing handles edge cases correctly, and the marker function pattern bridges the gap when in-place AST transformations aren't possible.
Repository: github.com/nnnkkk7/snowflake-emulator
If you try it, let me know what SQL functions are missing.
The Go gopher was designed by Renee French.