DEV Community

Cover image for Stop Burning Snowflake Credits - Build a Local Emulator with Go and DuckDB
kuro
kuro

Posted on

Stop Burning Snowflake Credits - Build a Local Emulator with Go and DuckDB

TL;DR: snowflake-emulator is a local Snowflake-compatible SQL interface powered by DuckDB, designed for dev/test. It works with the gosnowflake driver (Go) and REST API v2 (Python, Node.js, any language). Free.


Why I Built This

I was building a data pipeline that talked to Snowflake. Running integration tests meant spinning up a warehouse, waiting for queries, and using credits.

Mocking didn't help much either:

// This passes with mocks...
mock.ExpectQuery("SELECT IFF(x > 0, 'yes', 'no')").WillReturnRows(...)

// ...but IFF doesn't exist in PostgreSQL.
Enter fullscreen mode Exit fullscreen mode

Mocks don't catch SQL compatibility issues. I wanted something that actually executes Snowflake SQL locally.


Architecture: Executor as Orchestrator

The emulator uses an orchestrator pattern where the Executor coordinates SQL processing:

                    ┌─────────────────────────────────────┐
                    │             Executor                │
                    │  ┌───────────┐   ┌────────────┐     │
   SQL Request ───> │  │ Classifier│   │ Translator │     │ ───> DuckDB
                    │  └───────────┘   └────────────┘     │
                    │       │                │            │
                    │    Detect          Transform        │
                    │    statement       Snowflake SQL    │
                    │    type            to DuckDB        │
                    └─────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode
  1. Executor receives SQL and orchestrates the processing pipeline
  2. Classifier detects statement type (SELECT, DDL, COPY INTO, MERGE)
  3. Translator transforms Snowflake-specific syntax to DuckDB
  4. Executor runs the translated SQL on DuckDB

Key Design: AST-Based SQL Translation

The most interesting part is the SQL translation. I tried regex first:

// Bad: Regex breaks on edge cases
sql = regexp.MustCompile(`\bIFF\s*\(`).ReplaceAllString(sql, "IF(")

// What about: SELECT 'IFF(' AS label?
// → SELECT 'IF(' AS label  ← Broken!
Enter fullscreen mode Exit fullscreen mode

Regex can't distinguish function calls from string literals. So I switched to AST-based parsing using vitess-sqlparser.

// Good: AST transformation is precise
sqlparser.Walk(func(node sqlparser.SQLNode) (bool, error) {
    if fn, ok := node.(*sqlparser.FuncExpr); ok {
        if strings.EqualFold(fn.Name.String(), "IFF") {
            fn.Name = sqlparser.NewColIdent("IF")
        }
    }
    return true, nil
}, stmt)
Enter fullscreen mode Exit fullscreen mode

AST parsing correctly identifies function nodes vs string literals, so SELECT 'IFF(' AS label stays unchanged.

In the actual implementation, simple renames (e.g., IFFIF) are handled via a function map, and more complex cases use small handlers plus a post-processing pass.


The Marker Function Pattern

Some transformations can't be done in-place. For example, DATEADD(day, 7, date) needs to become date + INTERVAL 7 day — the arguments need to be reordered.

I use a two-stage approach:

Stage 1: Mark during AST walk

// DATEADD(day, 7, created_at) → __DATEADD__('day', 7, created_at)
if strings.EqualFold(fn.Name.String(), "DATEADD") {
    fn.Name = sqlparser.NewColIdent("__DATEADD__")
}
Enter fullscreen mode Exit fullscreen mode

Stage 2: Post-process with a small parser (no regex)

// __DATEADD__(part, n, date) → (CAST(date AS DATE) + interval n part)
sql = transformMarkedFunction(sql, "__DATEADD__", func(args string) string {
    parts := splitFunctionArgs(args, 3) // respects nested parentheses
    part := strings.TrimSpace(parts[0])
    n := strings.TrimSpace(parts[1])
    date := strings.TrimSpace(parts[2])
    return fmt.Sprintf("(CAST(%s AS DATE) + interval %s %s)", date, n, part)
})
Enter fullscreen mode Exit fullscreen mode

The marker function (__DATEADD__) is a placeholder that's safe to transform after AST serialization (it only exists because we inserted it during the AST walk). Using a tiny parser keeps it robust when arguments contain nested expressions (e.g., function calls), without relying on brittle regexes.


DuckDB Connection Handling

DuckDB only supports a single writer at a time. Multiple readers are fine, but writes must be serialized.

I solved this with a simple mutex pattern:

type Manager struct {
    db      *sql.DB
    writeMu sync.Mutex  // Serializes write operations
}

// Reads can be concurrent (no lock)
func (m *Manager) Query(ctx context.Context, query string, args ...any) (*sql.Rows, error) {
    return m.db.QueryContext(ctx, query, args...)
}

// Writes are serialized
func (m *Manager) Exec(ctx context.Context, query string, args ...any) (sql.Result, error) {
    m.writeMu.Lock()
    defer m.writeMu.Unlock()
    return m.db.ExecContext(ctx, query, args...)
}
Enter fullscreen mode Exit fullscreen mode

This lives in pkg/connection/manager.go. When an operation needs to touch both DuckDB and the emulator's metadata tables (e.g., CREATE DATABASE / CREATE SCHEMA), it uses ExecTx to keep those changes atomic. (Table-level metadata tracking for CREATE TABLE / DROP TABLE is intentionally kept minimal for now.)


Quick Start

docker run -p 8080:8080 ghcr.io/nnnkkk7/snowflake-emulator:latest
Enter fullscreen mode Exit fullscreen mode

Optional: run with persistent storage:

docker run -p 8080:8080 -v snowflake-data:/data \
  -e DB_PATH=/data/snowflake.db \
  ghcr.io/nnnkkk7/snowflake-emulator:latest
Enter fullscreen mode Exit fullscreen mode

Go (gosnowflake driver):

dsn := "user:pass@localhost:8080/TEST_DB/PUBLIC?account=test&protocol=http"
db, _ := sql.Open("snowflake", dsn)
db.Query(`SELECT IFF(score >= 90, 'A', 'B'), DATEADD(day, 7, created_at) FROM users`)
Enter fullscreen mode Exit fullscreen mode

Python, Node.js, etc. (REST API v2):

curl -X POST http://localhost:8080/api/v2/statements \
  -H "Content-Type: application/json" \
  -d '{"statement": "SELECT IFF(1 > 0, '\''yes'\'', '\''no'\'')", "database": "TEST_DB"}'
Enter fullscreen mode Exit fullscreen mode

The response includes a statementHandle. Fetch the result via GET /api/v2/statements/{handle}.


Limitations

This is a dev/test tool, not a Snowflake replacement:

  • No authentication (dev mode only)
  • No Time Travel / Zero-Copy Cloning
  • No external stages (S3, Azure, GCS)
  • No stored procedures / UDFs

Conclusion

Building a SQL translator taught me that regex is tempting but fragile for SQL transformations. AST-based parsing handles edge cases correctly, and the marker function pattern bridges the gap when in-place AST transformations aren't possible.

Repository: github.com/nnnkkk7/snowflake-emulator

If you try it, let me know what SQL functions are missing.

The Go gopher was designed by Renee French.

Top comments (0)