Our team hit a familiar Snowflake paradox: slower ETL runs arrived at the same time as FinOps alerts about rising credits.
The warehouse in question was WH_ETL_BRONZE_01, a multi-cluster warehouse dedicated to Bronze layer ingestion and merge workloads. What looked like a simple cost problem turned out to be a workload-isolation and concurrency problem.
The Problem
We started with this setup:
ALTER WAREHOUSE WH_ETL_BRONZE_01 SET
WAREHOUSE_SIZE = 'XSMALL',
MAX_CLUSTER_COUNT = 10,
MIN_CLUSTER_COUNT = 1,
SCALING_POLICY = 'STANDARD',
AUTO_SUSPEND = 60,
MAX_CONCURRENCY_LEVEL = 8; -- default behavior
And we saw both of these at once:
- Higher queue times (`queued_overload_time`)
- Higher daily credits
At that point, we were loading 1,500+ tables through this single warehouse. That meant heavyweight Bronze MERGE workloads and smaller ingestion/utility queries were all competing in the same execution pool.
The critical issue was workload mix. Long-running MERGE statements (30 to 60 minutes) were running alongside tiny queries. When too many heavy MERGE statements landed on the same node/cluster, they competed for resources and all slowed down.
In other words, even with multi-cluster enabled, assignment patterns could create unstable performance if too many heavyweight queries were packed together.
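The packing effect can be sketched with simple arithmetic. This is an illustration of the idea, not Snowflake's actual scheduler; the burst sizes are hypothetical:

```python
import math

def clusters_spun_up(concurrent_queries: int,
                     max_concurrency_level: int,
                     max_cluster_count: int) -> int:
    """Rough model: a multi-cluster warehouse adds clusters once every
    running cluster already holds max_concurrency_level queries."""
    needed = math.ceil(concurrent_queries / max_concurrency_level)
    return min(needed, max_cluster_count)

# 10 heavy MERGE statements arriving together (hypothetical burst):
print(clusters_spun_up(10, 8, 10))  # 2 clusters -> up to 8 merges packed on one
print(clusters_spun_up(10, 2, 10))  # 5 clusters -> at most 2 merges per cluster
```

With the default concurrency level of 8, a burst of heavy merges can pile onto a single cluster before scale-out kicks in; with a lower level, the same burst is spread across more clusters.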
We also learned we needed workload separation: large-table loads should run in a dedicated warehouse so they do not contend with smaller table loads and operational queries.
What We Changed
This was not a single before/after discovery from history. We ran controlled experiments in sequence, while splitting the largest table loads into a separate warehouse path.
Phase 1: Baseline
ALTER WAREHOUSE WH_ETL_BRONZE_01 SET
WAREHOUSE_SIZE = 'XSMALL',
MAX_CLUSTER_COUNT = 10,
MIN_CLUSTER_COUNT = 1,
SCALING_POLICY = 'STANDARD',
AUTO_SUSPEND = 60,
MAX_CONCURRENCY_LEVEL = 8;
Phase 2: First Experiment
ALTER WAREHOUSE WH_ETL_BRONZE_01 SET
WAREHOUSE_SIZE = 'XSMALL',
MAX_CLUSTER_COUNT = 10,
MIN_CLUSTER_COUNT = 1,
SCALING_POLICY = 'STANDARD',
AUTO_SUSPEND = 60,
MAX_CONCURRENCY_LEVEL = 2;
Goal: force scale-out behavior earlier and reduce the chance that many heavy MERGE jobs share the same node.
At the same time, we moved larger-table processing to a separate warehouse so those jobs would not compete with the remaining 1,500+ table ingestion flow.
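The routing split itself lives in the orchestration layer, which picks a warehouse per table before issuing the load. A minimal sketch of that decision, where the threshold, the helper, and the dedicated warehouse name are all hypothetical:

```python
LARGE_TABLE_THRESHOLD_GB = 50  # hypothetical cutoff for the heavy MERGE path

def pick_warehouse(table_size_gb: float) -> str:
    """Route large-table loads to a dedicated warehouse so heavy MERGEs
    do not contend with the 1,500+ smaller ingestion flows."""
    if table_size_gb >= LARGE_TABLE_THRESHOLD_GB:
        return "WH_ETL_BRONZE_LARGE"  # hypothetical dedicated warehouse
    return "WH_ETL_BRONZE_01"

print(pick_warehouse(120.0))  # large table -> isolated warehouse
print(pick_warehouse(2.5))    # small table -> shared Bronze warehouse
```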
Phase 3: Final Optimization
ALTER WAREHOUSE WH_ETL_BRONZE_01 SET
WAREHOUSE_SIZE = 'SMALL',
MAX_CLUSTER_COUNT = 10,
MIN_CLUSTER_COUNT = 1,
SCALING_POLICY = 'STANDARD',
AUTO_SUSPEND = 60,
MAX_CONCURRENCY_LEVEL = 3;
This became the best balance for our workload profile.
Final state for this ETL warehouse path:
- `SMALL` size
- `MAX_CONCURRENCY_LEVEL = 3`
- Large-table workloads isolated to a separate warehouse
Why This Helped
From deeper analysis of the workload behavior:
- The number of queries and output volume could look similar across runs, but bytes scanned at table level could still rise.
- Heavy MERGE overlap was a key instability driver.
- On slow runs, many heavy MERGE statements could get assigned to the same node, causing resource contention.
- On fast runs, fewer heavy MERGE statements shared a node, leaving room for short-running queries.
- Isolating large-table workloads reduced cross-workload contention in the primary Bronze ETL warehouse.
Lowering concurrency reduced the tendency to over-pack heavyweight merges. Then moving to SMALL provided enough per-query resources to cut elapsed time and queueing further. Separating large-table loads into a dedicated warehouse stabilized performance for the rest of the pipeline.
Configuration Summary
| Phase | Size | Max Clusters | Max Concurrency | Intent |
|---|---|---|---|---|
| Baseline | XSMALL | 10 | 8 | Initial setup |
| Experiment | XSMALL | 10 | 2 | Force earlier scale-out / reduce heavy-query packing |
| Final | SMALL | 10 | 3 | Balance throughput, queueing, and cost |
The two final intentional changes vs baseline were:
- `WAREHOUSE_SIZE`: `XSMALL` -> `SMALL`
- `MAX_CONCURRENCY_LEVEL`: `8` -> `3`
How We Measured
We used Snowflake ACCOUNT_USAGE views for both performance and cost, comparing baseline and final optimization windows.
Query Performance (Before/After)
SELECT
CASE
WHEN start_time < '2026-05-05' THEN 'XSMALL Period'
ELSE 'SMALL Period'
END AS period,
COUNT(*) AS total_queries,
ROUND(AVG(total_elapsed_time)/1000, 2) AS avg_elapsed_sec,
ROUND(AVG(queued_overload_time)/1000, 2) AS avg_queued_sec,
ROUND(MAX(total_elapsed_time)/1000, 2) AS max_elapsed_sec
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE warehouse_name = 'WH_ETL_BRONZE_01'
AND start_time >= '2026-04-30'
GROUP BY period
ORDER BY period;
Credit Consumption (Daily)
SELECT
CASE
WHEN start_time::DATE < '2026-05-05' THEN 'XSMALL Period'
ELSE 'SMALL Period'
END AS period,
ROUND(SUM(credits_used_compute) / COUNT(DISTINCT start_time::DATE), 2) AS daily_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE warehouse_name = 'WH_ETL_BRONZE_01'
AND start_time::DATE >= '2026-04-30'
GROUP BY period
ORDER BY period;
Results
Cost
| Period | Size | Daily Credits |
|---|---|---|
| Baseline (5 days) | XSMALL | 15.12/day |
| Optimized (3 days) | SMALL | 13.50/day |
Daily credits dropped by about 11% after upsizing.
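The arithmetic behind that drop is worth spelling out. At Snowflake's standard compute rates (1 credit per cluster-hour for XSMALL, 2 for SMALL), the daily totals imply far fewer busy cluster-hours after the change:

```python
XSMALL_RATE, SMALL_RATE = 1.0, 2.0  # credits per cluster-hour (standard rates)

xsmall_daily_credits = 15.12
small_daily_credits = 13.50

# Implied busy cluster-hours per day at each size
xsmall_cluster_hours = xsmall_daily_credits / XSMALL_RATE  # ~15.1 on XSMALL
small_cluster_hours = small_daily_credits / SMALL_RATE     # ~6.75 on SMALL

savings = (xsmall_daily_credits - small_daily_credits) / xsmall_daily_credits
print(f"{xsmall_cluster_hours:.2f} -> {small_cluster_hours:.2f} cluster-hours/day")
print(f"daily credit savings: {savings:.0%}")  # about 11%
```

SMALL burns credits twice as fast, but it ran for well under half the cluster-hours, so the net daily spend went down.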
Performance
| Metric | XSMALL | SMALL | Improvement |
|---|---|---|---|
| Avg query time | 26.15s | 24.13s | 8% faster |
| Max query time | 3,475s | 2,850s | 18% faster |
| Total queue time | 2.20 min | 0.28 min | 87% less queuing |
Note: The intermediate XSMALL plus concurrency 2 phase was used to validate behavior and direction. The published KPI table above compares the stable baseline period against the final tuned period.
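The improvement percentages in the table follow directly from the raw numbers:

```python
def pct_improvement(before: float, after: float) -> float:
    """Relative improvement, in percent, from 'before' to 'after'."""
    return (before - after) / before * 100

print(round(pct_improvement(26.15, 24.13)))  # ~8% faster average query
print(round(pct_improvement(3475, 2850)))    # ~18% faster worst case
print(round(pct_improvement(2.20, 0.28)))    # ~87% less queue time
```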
The Counterintuitive Lesson
In some workloads, a bigger warehouse can cost less.
1. Faster execution means earlier suspend
Queries completed faster on SMALL, so the warehouse reached AUTO_SUSPEND = 60 sooner. Less runtime plus less idle overhead translated to fewer daily credits.
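How much an earlier suspend is worth can be estimated from Snowflake's per-second billing (with a 60-second minimum per resume). The 15-minute figure below is hypothetical, just to show the shape of the calculation:

```python
SMALL_RATE = 2.0  # credits per cluster-hour for a SMALL warehouse

def credits_for_seconds(active_seconds: float, rate: float) -> float:
    """Per-second billing with Snowflake's 60-second minimum per resume."""
    return rate * max(active_seconds, 60) / 3600

# Hypothetical: the batch window finishes 15 minutes earlier each day
saved_per_day = credits_for_seconds(15 * 60, SMALL_RATE)
print(round(saved_per_day, 2))  # 0.5 credits/day from earlier suspend alone
```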
2. Concurrency must match workload shape
MAX_CONCURRENCY_LEVEL trades per-cluster packing against cluster sprawl. Lowering it from the default 8 to 2 first improved isolation for heavy MERGE jobs; settling at 3 with SMALL gave the right balance of throughput and stability.
3. Less queueing improves throughput and utilization
With 87% less queue time, work finished in tighter windows. The warehouse did useful work and went to sleep sooner.
Key Takeaways
- Do not assume smaller is always cheaper.
- Monitor `queued_overload_time` closely. Persistent queueing often means the warehouse is under-sized or its concurrency settings are too tight.
- Tune `MAX_CONCURRENCY_LEVEL` together with warehouse size and workload type; they should be optimized as a pair.
- Isolate heavyweight large-table workflows into a dedicated warehouse to reduce overlap and contention.
- Use both `QUERY_HISTORY` (performance) and `WAREHOUSE_METERING_HISTORY` (cost) for before/after decisions.
- Keep aggressive auto-suspend for bursty workloads. Faster queries plus short suspend windows compound savings.
Closing Thought
Warehouse right-sizing is a performance and FinOps exercise, not just a cost-control exercise. In many real workloads, a slightly larger warehouse with slightly higher concurrency can win on both speed and spend.
Disclosure and Scope
- Results reflect our specific environment and workload profile; outcomes may vary.
- This analysis used Snowflake's built-in ACCOUNT_USAGE views only; no third-party monitoring stack was required.