Snowflake Cost Optimization Starts With Workload Design, Not Auto-Suspend
The biggest savings usually come from attribution, workload isolation, and choosing the right Snowflake-native optimization lever for the problem you actually have.
Most Snowflake cost advice starts with the same recommendation: lower your auto-suspend timeout.
That advice is not wrong. It is just too small.
In smaller environments, aggressive auto-suspend can capture obvious idle time. In larger Snowflake estates, the biggest savings usually come from somewhere else:

- mixed workloads sharing the same warehouse
- concurrency problems misread as sizing problems
- expensive scans treated as unavoidable
- teams using billing data as a proxy for warehouse behavior
That is why the best Snowflake cost reviews do not start with asking whether this warehouse should be smaller.
They start with a better question: Does this workload belong on this warehouse at all?
That framing changes everything. It pushes you to identify what is running, where it is running, and whether the warehouse design matches the workload. It also helps you choose the right Snowflake-native lever, whether that is workload isolation, a multi-cluster strategy, Snowpark Optimized Warehouses, Dynamic Tables, or simply better attribution.
Here is the review sequence I trust most.
1. Start with attribution, not assumptions
Before you tune a single warehouse, answer the most basic cost question in the account: Who is consuming credits, and for what?
In many Snowflake environments, one warehouse is serving all of this at once, including dbt model runs, BI dashboards, analyst ad hoc SQL, notebook exploration, and health checks.
If those are not clearly identifiable, cost optimization becomes guesswork. The fastest improvement is usually operational: standardize query tags, use warehouse names that encode workload and environment, and separate service users from human users.
Stop setting query tags for dbt by hand. Instead of running session-level ALTER commands, use the native dbt-snowflake adapter configuration: set the query tag in your project configuration file so the model name, environment, and invocation ID flow into Snowflake query history automatically.
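As one sketch of that configuration, a tag can be set per model folder in dbt_project.yml. The project name and tag format below are placeholders; richer dynamic tags (such as per-model names) typically come from overriding the adapter's set_query_tag macro rather than from this static config.

```yaml
# dbt_project.yml -- a sketch; "my_project" and the tag format are placeholders
models:
  my_project:
    +query_tag: "dbt_{{ target.name }}"  # encodes tool and environment
```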
Then inspect query history:
SELECT
QUERY_TAG,
USER_NAME,
WAREHOUSE_NAME,
COUNT(*) AS query_count,
SUM(BYTES_SCANNED) / POWER(1024, 3) AS total_gb_scanned,
AVG(TOTAL_ELAPSED_TIME) / 1000 AS avg_elapsed_sec
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP())
AND EXECUTION_STATUS = 'SUCCESS'
GROUP BY 1, 2, 3
ORDER BY query_count DESC;
2. Stop treating billing data like utilization data
This is one of the most common Snowflake mistakes. The metering history tells you what was billed, but it does not tell you whether a warehouse was saturated, queueing, or mostly idle between bursts. For that, you need the load history.
Start with metering history to see what was billed, then compare it against load history to see how the warehouse actually behaved:
SELECT
WAREHOUSE_NAME,
DATE_TRUNC('hour', START_TIME) AS hour_start,
AVG(AVG_RUNNING) AS avg_running,
AVG(AVG_QUEUED_LOAD) AS avg_queued_load,
AVG(AVG_QUEUED_PROVISIONING) AS avg_queued_provisioning
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_LOAD_HISTORY
WHERE START_TIME >= DATEADD('day', -14, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY 1, 2;
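The metering side can be pulled at the same hourly grain, so the two result sets can be compared by warehouse and hour. A sketch against ACCOUNT_USAGE; note that CREDITS_USED covers both compute and cloud services credits:

```sql
SELECT
    WAREHOUSE_NAME,
    DATE_TRUNC('hour', START_TIME) AS hour_start,
    SUM(CREDITS_USED) AS credits_used
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE START_TIME >= DATEADD('day', -14, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY 1, 2;
```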
That distinction matters because the fixes are different. High credits with low queueing often indicate idle time, over-provisioning, or poor workload placement. High queueing often points to concurrency pressure. Long-running heavy queries with spill often reveal memory pressure, query shape issues, or poor pruning.
If you use metering as a stand-in for everything, you will solve the wrong problem.
3. Find mixed-workload warehouses before you resize anything
One of the most expensive Snowflake patterns is a warehouse serving workloads with completely different economics. Typical examples include heavy ELT transformations sharing compute with BI dashboards, or tiny monitoring queries running on a warehouse sized for large joins.
Averages hide this, but distribution exposes it.
SELECT
WAREHOUSE_NAME,
WAREHOUSE_SIZE,
COUNT(*) AS query_count,
PERCENTILE_CONT(0.10) WITHIN GROUP (ORDER BY BYTES_SCANNED) AS p10_bytes_scanned,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY BYTES_SCANNED) AS p50_bytes_scanned,
PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY BYTES_SCANNED) AS p90_bytes_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP())
AND EXECUTION_STATUS = 'SUCCESS'
GROUP BY 1, 2
ORDER BY 1;
If your 10th percentile bytes scanned is tiny while your 90th percentile is massive, you have several workloads sharing one compute pool. That is your segmentation signal. Split them into a transform warehouse, an interactive warehouse, and a utility warehouse.
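A sketch of that split; the names, sizes, and timeouts below are placeholder assumptions to adapt to your own workloads:

```sql
-- Heavy ELT: sized for large joins, suspends quickly between runs
CREATE WAREHOUSE IF NOT EXISTS TRANSFORM_WH
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

-- BI and ad hoc: smaller, kept warm a little longer for bursty use
CREATE WAREHOUSE IF NOT EXISTS INTERACTIVE_WH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;

-- Monitoring and health checks: the cheapest size that works
CREATE WAREHOUSE IF NOT EXISTS UTILITY_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
```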
4. Treat sizing and concurrency as different decisions
Snowflake tuning is often reduced to going up one warehouse size. Sometimes that is right, but often it is incomplete.
Larger warehouses generally improve single-query performance. But if the real issue is many users or dashboards hitting the warehouse at once, the better answer is a multi-cluster warehouse, not just a bigger size.
That is why load history matters so much:
SELECT
WAREHOUSE_NAME,
AVG(AVG_RUNNING) AS avg_running,
AVG(AVG_QUEUED_LOAD) AS avg_queued_load,
AVG(AVG_QUEUED_PROVISIONING) AS avg_queued_provisioning
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_LOAD_HISTORY
WHERE START_TIME >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY avg_queued_load DESC;
A good rule of thumb:

- One heavy query is slow and spilling: think about sizing or query shape.
- Many small queries are waiting: think about your concurrency strategy.
- Both are happening on one warehouse: split the workload before deciding anything else.
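For the concurrency case, the Snowflake-native lever is multi-cluster settings rather than a bigger size. A sketch; the warehouse name and cluster counts are placeholders to tune against load history:

```sql
-- Multi-cluster warehouses require Enterprise edition or higher
ALTER WAREHOUSE INTERACTIVE_WH SET
  MIN_CLUSTER_COUNT = 1         -- run one cluster when load is light
  MAX_CLUSTER_COUNT = 4         -- add clusters only when queries queue
  SCALING_POLICY = 'STANDARD';  -- favors starting clusters over queueing
```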
5. Use auto-suspend as a workload setting, not a religion
Auto-suspend still matters; it is just not a universal answer. For intermittent interactive work, short auto-suspend settings often make sense, but assuming 60 seconds is always right is too absolute.
Inspect current settings with:
SHOW WAREHOUSES;
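SHOW output cannot be filtered with a WHERE clause directly, but it can be queried through RESULT_SCAN. A sketch; the lowercase quoted column names come from the SHOW output:

```sql
-- Run immediately after SHOW WAREHOUSES; LAST_QUERY_ID() picks up its result
SELECT "name", "size", "auto_suspend"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
ORDER BY "auto_suspend" DESC NULLS FIRST;  -- NULL = never suspends
```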
A better default is to tailor the timeout to the use case. For interactive analyst warehouses, start around 60 to 300 seconds, then validate against actual usage. For scheduled batch warehouses, a tight 60-second auto-suspend lets the warehouse spin down soon after the orchestration tool stops sending queries; do not depend on the orchestrator issuing explicit suspend commands, because a crashed tool can leave the warehouse running and burning credits. For heavy, frequent batch pipelines, also consider letting Snowflake manage the compute: Serverless Tasks size compute to the job and bill only while it runs, and Dynamic Tables can replace hand-rolled incremental pipelines with declarative refreshes.
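Applying the timeout per workload is a one-line change each. The warehouse names here are placeholders, and the timeouts are starting points to validate against actual usage:

```sql
ALTER WAREHOUSE ANALYST_WH SET AUTO_SUSPEND = 180;  -- interactive: start in the 60-300s range
ALTER WAREHOUSE BATCH_WH   SET AUTO_SUSPEND = 60;   -- scheduled batch: suspend quickly between runs
```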
6. Treat spill and scan inefficiency as diagnostic signals
Spill is one of the most useful signals in Snowflake, especially for transformation-heavy workloads.
SELECT
WAREHOUSE_NAME,
QUERY_TYPE,
COUNT(*) AS query_count,
SUM(BYTES_SPILLED_TO_LOCAL_STORAGE) / POWER(1024, 3) AS local_spill_gb,
SUM(BYTES_SPILLED_TO_REMOTE_STORAGE) / POWER(1024, 3) AS remote_spill_gb,
AVG(TOTAL_ELAPSED_TIME) / 1000 AS avg_elapsed_sec
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP())
AND EXECUTION_STATUS = 'SUCCESS'
AND (
BYTES_SPILLED_TO_LOCAL_STORAGE > 0
OR BYTES_SPILLED_TO_REMOTE_STORAGE > 0
)
GROUP BY 1, 2
ORDER BY remote_spill_gb DESC;
Consistent remote spill is a strong signal, but it is not an automatic order to resize the standard warehouse.
If a query is memory-bound and spilling heavily, sizing up a standard warehouse wastes money on CPU cores you do not need. The correct Snowflake-native lever is to switch to a Snowpark Optimized Warehouse of the same size, which gives you 16x the memory per node to eliminate the spill without paying for unneeded compute.
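A sketch of that lever; the name and size are placeholder assumptions:

```sql
CREATE WAREHOUSE IF NOT EXISTS MEMORY_HEAVY_WH
  WAREHOUSE_SIZE = 'MEDIUM'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'  -- more memory per node than a standard warehouse
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
```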
The same principle applies to expensive scans. Large scans are not inherently bad, but if selective queries are scanning far more than they should, evaluate the right lever, which might include query rewrites, natural clustering versus explicit cluster keys, the Search Optimization Service, or Dynamic Tables for incremental processing.
Do not jump straight from an expensive query to adding a cluster key.
The Takeaway
Most Snowflake waste does not come from one catastrophic mistake. It comes from reasonable decisions that were never revisited as the platform grew.
One warehouse becomes the default for everything. The system still works, so nobody redesigns it. Spend drifts upward, nothing looks broken, but plenty is inefficient.
If there is one Snowflake-specific lesson worth keeping, it is this: Do not start by asking whether the warehouse should be smaller. Start by asking whether the workload belongs there at all.