<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Carlos Mario Mora Restrepo</title>
    <description>The latest articles on DEV Community by Carlos Mario Mora Restrepo (@carlosmoradev).</description>
    <link>https://dev.to/carlosmoradev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3868105%2F4b24f81c-7f01-4759-9dd6-1df452c62c85.png</url>
      <title>DEV Community: Carlos Mario Mora Restrepo</title>
      <link>https://dev.to/carlosmoradev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/carlosmoradev"/>
    <language>en</language>
    <item>
      <title>Automating with AI is not adopting AI</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/automating-with-ai-is-not-adopting-ai-4pbj</link>
      <guid>https://dev.to/carlosmoradev/automating-with-ai-is-not-adopting-ai-4pbj</guid>
      <description>&lt;p&gt;There is a pattern that repeats itself every time a team seriously starts working with AI tools.&lt;/p&gt;

&lt;p&gt;Most of the energy goes into one place: taking what already exists and making it faster. Repetitive tasks, manual processes, things that someone has been doing the same way for years — all of it gets fed into a pipeline or handed off to an agent. The process doesn’t change. The speed does.&lt;/p&gt;

&lt;p&gt;That is the first level. And the market rewards it loudly.&lt;/p&gt;

&lt;p&gt;Teams publish it. Leaders celebrate it. Vendors use it as proof. And because the feedback loop is fast and visible, most teams stop there — not because they don’t want to go further, but because the noise convinced them they already did.&lt;/p&gt;




&lt;h2 id="the-surface-problem"&gt;The surface problem&lt;/h2&gt;

&lt;p&gt;The hype around AI has done something subtle and damaging: it has lowered the bar for what counts as transformation.&lt;/p&gt;

&lt;p&gt;When every conference talk, every LinkedIn post, and every vendor pitch celebrates automation as the destination, teams internalize that framing. Automating three workflows becomes a success story. Replacing a manual report with a scheduled script becomes “AI adoption.” And nobody questions it — because everyone around them is doing the same thing and calling it progress.&lt;/p&gt;

&lt;p&gt;The problem is not that automation has no value. It does. It reduces friction, frees up time, and creates the operational breathing room that makes everything else possible.&lt;/p&gt;

&lt;p&gt;The problem is that it has a ceiling. And most teams never look up.&lt;/p&gt;




&lt;h2 id="the-conversation-about-redesign"&gt;The conversation about redesign&lt;/h2&gt;

&lt;p&gt;There is a second level that requires something automation never demands: questioning what already works.&lt;/p&gt;

&lt;p&gt;The question shifts from &lt;em&gt;how do I automate this&lt;/em&gt; to &lt;em&gt;how should this work if I built it today&lt;/em&gt;. No inherited assumptions. No legacy logic carried forward because nobody stopped to challenge it. No process designed for a world where these tools didn’t exist, now just running faster.&lt;/p&gt;

&lt;p&gt;This is harder than it sounds. Redesigning a process means accepting that the current version — the one your team built, refined, and depends on — might not be the right starting point. That is an uncomfortable idea, especially under delivery pressure.&lt;/p&gt;

&lt;p&gt;But the teams that get there start producing something different. Not faster outputs. Different outputs. Workflows that couldn’t have been designed before, because the reasoning layer that makes them possible simply wasn’t available.&lt;/p&gt;




&lt;h2 id="the-conversation-nobody-is-having"&gt;The conversation nobody is having&lt;/h2&gt;

&lt;p&gt;The third level is the least visible and the most consequential.&lt;/p&gt;

&lt;p&gt;It is not about processes at all. It is about commitments.&lt;/p&gt;

&lt;p&gt;What can your team take on today that six months ago would have been rejected — not for lack of ambition, but because there was genuinely no viable path to execute it with the resources available?&lt;/p&gt;

&lt;p&gt;When a team starts answering that question with concrete examples, something important has happened. They are no longer optimizing existing capacity. They are operating with capacity that didn’t exist before.&lt;/p&gt;

&lt;p&gt;That distinction matters more than it might seem. Efficiency scales linearly — with enough automation, a team can do more of the same with less friction. But new capability opens categories. A team that can commit to things that were previously out of reach doesn’t just perform better. It operates in a different space entirely.&lt;/p&gt;

&lt;p&gt;The reason this level is so rarely discussed is that it doesn’t announce itself. You don’t recognize it in the planning meeting. You recognize it in retrospect, when you realize that what you just shipped wouldn’t have made it into the backlog six months ago — not because nobody wanted it, but because nobody could see a real path to doing it.&lt;/p&gt;




&lt;h2 id="why-most-teams-dont-get-there"&gt;Why most teams don’t get there&lt;/h2&gt;

&lt;p&gt;It is not a technology problem. The tools are available. Most teams already have access to everything they need to move beyond the first level.&lt;/p&gt;

&lt;p&gt;It is a question problem.&lt;/p&gt;

&lt;p&gt;Automation asks: &lt;em&gt;what can I offload?&lt;/em&gt;&lt;br&gt;
Redesign asks: &lt;em&gt;what should this actually look like?&lt;/em&gt;&lt;br&gt;
New capability asks: &lt;em&gt;what becomes possible now that wasn’t before?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Each question lives in a different territory. And you cannot reach the third without having moved through the first two — but moving through them doesn’t guarantee you arrive. The last step requires intentionality. It requires creating space for a question that doesn’t have an obvious answer and doesn’t generate immediate return.&lt;/p&gt;

&lt;p&gt;That space is exactly what delivery pressure eliminates first.&lt;/p&gt;




&lt;h2 id="what-this-means-in-practice"&gt;What this means in practice&lt;/h2&gt;

&lt;p&gt;The hype is not going away. If anything, it will intensify — more tools, more benchmarks, more case studies, more pressure to demonstrate that your team is “already using AI.”&lt;/p&gt;

&lt;p&gt;Automation is enough to satisfy that pressure. It is visible, measurable, and easy to communicate.&lt;/p&gt;

&lt;p&gt;But the teams that will matter in two or three years are not the ones that automated the most. They are the ones that asked the harder question early enough — and built the discipline to keep asking it.&lt;/p&gt;

&lt;p&gt;Where is your team right now? And more importantly: when did you last ask what has become possible that wasn’t before?&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>platformengineering</category>
      <category>leadership</category>
      <category>strategy</category>
    </item>
    <item>
      <title>Your AI workload is not your infrastructure’s problem. Until it is.</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Sat, 11 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/your-ai-workload-is-not-your-infrastructures-problem-until-it-is-3h0e</link>
      <guid>https://dev.to/carlosmoradev/your-ai-workload-is-not-your-infrastructures-problem-until-it-is-3h0e</guid>
      <description>&lt;p&gt;There’s a conversation happening in the software architecture community about how bad code design inflates LLM token consumption. It’s a valid point. But it misses an entire layer of the problem — the one Platform Engineers and SREs actually own.&lt;/p&gt;

&lt;p&gt;Most infrastructure running AI workloads today was not designed for them. It was designed to make software artifacts run. That’s a different problem, and it has a different cost.&lt;/p&gt;

&lt;h2 id="the-infrastructure-assumption-that-breaks-under-ai"&gt;The infrastructure assumption that breaks under AI&lt;/h2&gt;

&lt;p&gt;Traditional infrastructure design answers one question: can this artifact deploy and run?&lt;/p&gt;

&lt;p&gt;Compute? Sized for average load. Network? Enough bandwidth for expected traffic. Storage? Enough for the data the app needs. Security? Perimeter defined, access controlled.&lt;/p&gt;

&lt;p&gt;That model works for deterministic workloads. You know what the artifact needs. You provision for it. You monitor it.&lt;/p&gt;

&lt;p&gt;AI workloads break the assumption at the foundation. The resource profile isn’t fixed — it shifts with every inference call, every context window, every agent loop iteration. The same infrastructure that handles your morning traffic can behave completely differently at 3pm when a poorly scoped agent starts chaining tool calls.&lt;/p&gt;

&lt;p&gt;Nobody sized for that. Because nobody asked the infrastructure question before deploying.&lt;/p&gt;

&lt;h2 id="what-infrastructure-readiness-for-ai-actually-means"&gt;What “infrastructure readiness for AI” actually means&lt;/h2&gt;

&lt;p&gt;It’s not a checklist. It’s a mindset shift: infrastructure is not a deployment target for AI workloads — it’s an active variable in their cost, latency, and reliability.&lt;/p&gt;

&lt;p&gt;That shift surfaces four concrete areas worth reviewing before — or while — running AI in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Context passing architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every token sent to a model costs money. Where does that context come from, and how is it assembled? In many infrastructures, context is rebuilt from scratch on every request: full conversation history pulled from a database, system instructions fetched from a config store, user data loaded from multiple services — all assembled in the application layer on each call.&lt;/p&gt;

&lt;p&gt;The infrastructure question is: where can this be cached, pre-assembled, or compressed without losing fidelity? A well-designed caching layer between your application and your model endpoint can reduce token consumption significantly without touching a single line of application code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Model routing and gateway configuration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most teams deploy AI workloads with a direct application-to-model-endpoint pattern. One app, one model, one endpoint. That works in a pilot. It doesn’t scale, and it doesn’t optimize.&lt;/p&gt;

&lt;p&gt;An AI gateway layer — whether that’s a managed service or a self-hosted proxy — enables model routing based on request complexity, cost thresholds, or latency requirements. Simple requests go to cheaper, faster models. Complex reasoning tasks go to the capable but expensive ones. That routing logic lives in infrastructure, not in application code.&lt;/p&gt;

&lt;p&gt;If your current infrastructure has no routing layer between your application and your model endpoints, every request is treated the same regardless of what it actually needs.&lt;/p&gt;
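&lt;p&gt;A minimal sketch of that routing decision, assuming illustrative model names and thresholds (a real gateway would also weigh cost budgets and latency SLOs):&lt;/p&gt;

```python
# Illustrative routing sketch; the model names and the token threshold
# are assumptions, not a specific gateway's configuration.

CHEAP_MODEL = "small-fast-model"
CAPABLE_MODEL = "large-reasoning-model"

def estimate_tokens(prompt):
    # Rough heuristic: roughly 4 characters per token for English text.
    return max(1, len(prompt) // 4)

def route(prompt, needs_reasoning=False):
    """Send large or explicitly complex requests to the capable model."""
    if needs_reasoning or estimate_tokens(prompt) > 2000:
        return CAPABLE_MODEL
    return CHEAP_MODEL
```

&lt;p&gt;The design choice worth noting: because this lives in the gateway, applications never hard-code a model name, and the routing policy can change without a redeploy.&lt;/p&gt;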

&lt;p&gt;&lt;strong&gt;3. Retry and timeout configuration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LLM calls fail. They time out. They return partial responses. The default retry behavior inherited from your existing infrastructure — designed for fast, deterministic API calls — is almost certainly wrong for inference workloads.&lt;/p&gt;

&lt;p&gt;Aggressive retries on a timed-out LLM call don’t recover the request. They generate duplicate token consumption and compound the latency problem. Infrastructure that wasn’t configured with AI call patterns in mind will retry its way into a cost spike before anyone notices.&lt;/p&gt;

&lt;p&gt;Reviewing timeout thresholds, retry policies, and circuit breaker configurations for AI-specific endpoints is unglamorous work. It’s also directly impactful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Observability gaps inherited from pre-AI infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one connects to a broader problem. Infrastructure deployed before AI workloads were introduced was instrumented for traditional signals: error rates, latency, throughput. Those signals don’t tell you what’s happening inside an inference call.&lt;/p&gt;

&lt;p&gt;Token consumption, context size per request, model latency versus total request latency, MCP call chains — none of these appear in dashboards built for microservices. If your observability layer wasn’t updated when AI workloads were introduced, you’re monitoring the infrastructure around the problem, not the problem itself.&lt;/p&gt;
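&lt;p&gt;A sketch of recording those missing signals per request. Here &lt;code&gt;metrics&lt;/code&gt; is a plain list standing in for a real backend (a Prometheus client, an OTel meter, and so on), and &lt;code&gt;call_model&lt;/code&gt; is a stand-in:&lt;/p&gt;

```python
import time

# Illustrative instrumentation sketch: capture the AI-specific signals
# (context size, model latency vs. total latency, output size) that
# pre-AI dashboards never see. The metrics sink is a stand-in.

metrics = []

def observed_inference(call_model, prompt, context):
    total_start = time.monotonic()
    full_prompt = context + "\n" + prompt
    model_start = time.monotonic()
    response = call_model(full_prompt)
    model_end = time.monotonic()
    metrics.append({
        "context_chars": len(context),
        "prompt_chars": len(full_prompt),
        "output_chars": len(response),
        "model_latency_s": model_end - model_start,
        "total_latency_s": model_end - total_start,
    })
    return response
```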

&lt;h2 id="the-optimization-conversation-nobody-is-having"&gt;The optimization conversation nobody is having&lt;/h2&gt;

&lt;p&gt;The original framing — “fix your software architecture to reduce token consumption” — puts the responsibility on the application layer. That’s fair. But it leaves Platform Engineers in a passive role: waiting for developers to write better code while watching the inference bill grow.&lt;/p&gt;

&lt;p&gt;The infrastructure layer has more leverage than it’s given credit for. Caching, routing, retry configuration, and observability are all infrastructure concerns. Optimizing them doesn’t require touching application code. It requires treating infrastructure as an active participant in AI workload performance — not just a surface to deploy on.&lt;/p&gt;

&lt;p&gt;Most teams haven’t had that conversation yet. The ones that do it early will spend significantly less time explaining unexpected cost spikes later.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is part of an ongoing series on operating AI systems in production infrastructure. If you found it useful, the post on &lt;a href="/2026/04/03/ai-observability-the-gap-nobody-is-solving/"&gt;AI observability gaps in 2026&lt;/a&gt; covers the monitoring side of the same problem.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>platformengineering</category>
      <category>sre</category>
      <category>aiagents</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>AWS Cost Explorer Just Got Conversational — And That Changes the Workflow</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/aws-cost-explorer-just-got-conversational-and-that-changes-the-workflow-59f6</link>
      <guid>https://dev.to/carlosmoradev/aws-cost-explorer-just-got-conversational-and-that-changes-the-workflow-59f6</guid>
      <description>&lt;p&gt;AWS just closed the last friction gap in cost analysis.&lt;/p&gt;

&lt;p&gt;Natural language queries in Cost Explorer — powered by Amazon Q — launched this week. You ask; Cost Explorer updates its charts in real time. No filters. No manual groupings. No switching to a separate Q Developer chat.&lt;/p&gt;

&lt;p&gt;“How much did we spend on RDS last month compared to the previous one?” → instant answer + automatic visualization update.&lt;/p&gt;

&lt;h2&gt;The problem with cost tooling has always been friction&lt;/h2&gt;

&lt;p&gt;As an SRE managing multi-cloud infrastructure, I’ve spent years building cost alert layers manually: tagging strategies, Budget alarms, custom Lambda parsers for anomaly detection. Each layer added complexity. Each handoff between tools added friction.&lt;/p&gt;

&lt;p&gt;The tooling was always capable. The problem was the interface — engineers had to translate between what they wanted to know and what the tool could show them. That translation cost was real, and it was killing adoption.&lt;/p&gt;

&lt;h2&gt;What’s actually new here&lt;/h2&gt;

&lt;p&gt;Amazon Q has had Cost Explorer integration since late 2024. What changed isn’t the underlying capability — it’s the interface.&lt;/p&gt;

&lt;p&gt;The answer and the visualization now share the same surface, updating together and carrying full conversation context forward. You can ask a follow-up without resetting the query. The conversation persists.&lt;/p&gt;

&lt;p&gt;That sounds small. It isn’t. That’s the friction that was killing adoption.&lt;/p&gt;

&lt;h2&gt;What this means for cost governance&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://carlosmora.dev/2026/01/10/multi-layer-cost-controls.html" rel="noopener noreferrer"&gt;first blog post on this site&lt;/a&gt; was about building a 4-layer cost defense strategy for cloud data platforms. At the time, building the alert pipeline was a manual exercise in connecting layers: resource monitors, warehouse sizing, connection pooling, user education.&lt;/p&gt;

&lt;p&gt;Today AWS gives you natural language on top of those same layers. The layers still matter — you still need tagging discipline, budget boundaries, and anomaly detection. But the interface for analyzing and interrogating those layers just became dramatically lower-friction.&lt;/p&gt;

&lt;h2&gt;The next unlock&lt;/h2&gt;

&lt;p&gt;The question I keep coming back to: if cost analysis is now conversational, what’s next?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proactive anomaly surfacing before the spike hits?&lt;/li&gt;
&lt;li&gt;Rightsizing recommendations that execute autonomously?&lt;/li&gt;
&lt;li&gt;Cost SLOs with automated enforcement?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The distance between “cost alert” and “autonomously governed cost” is closing fast. And for SREs who’ve been hand-building that infrastructure for years — that’s worth paying attention to.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you tried the natural language queries in Cost Explorer yet? Curious how teams are integrating this into their FinOps workflows.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>finops</category>
      <category>costoptimization</category>
      <category>sre</category>
    </item>
    <item>
      <title>From ticket to PR with agents: how to use Claude to automate platform changes without breaking SLOs</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Wed, 08 Apr 2026 15:42:10 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/from-ticket-to-pr-with-agents-how-to-use-claude-to-automate-platform-changes-without-breaking-slos-48lg</link>
      <guid>https://dev.to/carlosmoradev/from-ticket-to-pr-with-agents-how-to-use-claude-to-automate-platform-changes-without-breaking-slos-48lg</guid>
      <description>&lt;p&gt;In Platform Engineering and SRE, the hardest part of change is rarely writing the change itself. The hard part is everything around it: understanding the intent behind a ticket or incident, locating the right context, identifying the systems involved, deciding what should change, validating the blast radius, documenting rollback, and making the result legible enough for someone else to review with confidence.&lt;/p&gt;

&lt;p&gt;That is why I think the real promise of Claude is not code generation. It is the ability to help close the loop between operational intent and reviewable execution.&lt;/p&gt;

&lt;h2&gt;The translation problem&lt;/h2&gt;

&lt;p&gt;A ticket, incident, or operational task expresses intent. But between that intent and a merged change, there is usually a long chain of manual translation. Engineers need to gather context from runbooks, infrastructure repositories, dashboards, previous incidents, documentation, and platform conventions. They need to decide whether the task requires a configuration tweak, an IaC change, a runbook update, or some combination of all three. They need to make the work explicit enough to review and safe enough to deploy.&lt;/p&gt;

&lt;p&gt;That translation layer is where Claude becomes interesting.&lt;/p&gt;

&lt;p&gt;Anthropic describes effective agents as systems that use tools dynamically, adapt based on feedback from the environment, and operate with clear stopping conditions and human oversight. That is a much more useful framing than treating Claude as a smarter autocomplete layer.&lt;/p&gt;

&lt;h2&gt;The pattern&lt;/h2&gt;

&lt;p&gt;Applied to Platform Engineering, the workflow looks something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A ticket, incident, or task becomes the initial statement of intent.&lt;/li&gt;
&lt;li&gt;Claude gathers context from the relevant repos, documentation, and operational systems.&lt;/li&gt;
&lt;li&gt;It uses tools to inspect files, compare configurations, reason about likely changes, and validate assumptions.&lt;/li&gt;
&lt;li&gt;It produces a proposed change in a form that the team can actually govern — ideally as a pull request.&lt;/li&gt;
&lt;li&gt;Humans review the result, enforce policy, and decide whether it should ship.&lt;/li&gt;
&lt;/ol&gt;
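&lt;p&gt;The five steps above can be sketched as a skeleton. Every callable here is an injected stand-in (there is no real Claude or Git API in this sketch); the point is the shape: context in, reviewable PR out, human decision last.&lt;/p&gt;

```python
# Illustrative skeleton of the ticket-to-PR flow. All parameters are
# injected stand-ins, not a real agent framework's API.

def ticket_to_pr(ticket, gather_context, propose_change, open_pull_request):
    context = gather_context(ticket)            # repos, runbooks, incidents
    proposal = propose_change(ticket, context)  # the agent's reasoning step
    pr = open_pull_request(
        title=f"[agent] {ticket['summary']}",
        body="\n".join([
            "Intent: " + ticket["summary"],
            "Rationale: " + proposal["rationale"],
            "Validation: " + proposal["validation"],
            "Rollback: " + proposal["rollback"],
        ]),
        diff=proposal["diff"],
    )
    # The agent stops here. Merging is a human decision, enforced by policy.
    return pr
```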

&lt;h2&gt;The pull request as the unit of governance&lt;/h2&gt;

&lt;p&gt;The pull request is the key unit here.&lt;/p&gt;

&lt;p&gt;The real output of an agent in this workflow should not be a blob of generated code. It should be a reviewable change set with rationale, scope, validation steps, and rollback guidance. Once the output becomes a PR rather than a prompt response, the conversation shifts from "Can the model write this?" to "Can the organization safely absorb and govern this change?"&lt;/p&gt;

&lt;p&gt;That distinction matters because SRE is not optimized for novelty. It is optimized for reliability. A change that is fast but opaque is often worse than a change that is slower but auditable. If Claude is going to be useful in platform workflows, it has to increase clarity, not just speed.&lt;/p&gt;

&lt;h2&gt;Why SLOs matter in the title&lt;/h2&gt;

&lt;p&gt;This is also why the phrase "without breaking SLOs" matters so much. It prevents the conversation from drifting into generic AI optimism. In a platform context, any serious use of agents has to be evaluated against reliability outcomes. Faster workflows are not automatically better workflows if they increase incident risk, reduce operator understanding, or blur accountability.&lt;/p&gt;

&lt;h2&gt;Guardrails are not obstacles — they are the design&lt;/h2&gt;

&lt;p&gt;A credible workflow therefore needs guardrails. At minimum, that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear tool boundaries and scoped permissions&lt;/li&gt;
&lt;li&gt;Strong context about the system being changed&lt;/li&gt;
&lt;li&gt;Validation before merge&lt;/li&gt;
&lt;li&gt;Human review for sensitive or high-impact changes&lt;/li&gt;
&lt;li&gt;Explicit rollback paths&lt;/li&gt;
&lt;li&gt;Traceability from original intent to final diff&lt;/li&gt;
&lt;/ul&gt;
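&lt;p&gt;The first and fourth items can be sketched as a simple policy check. The tool names and policy shape here are illustrative assumptions, not a real framework's configuration:&lt;/p&gt;

```python
# Illustrative guardrail sketch: a per-tool allowlist with an explicit
# human-approval flag for high-impact actions. Tool names are assumptions.

POLICY = {
    "read_file":          {"allowed": True,  "needs_human": False},
    "open_pull_request":  {"allowed": True,  "needs_human": False},
    "apply_terraform":    {"allowed": True,  "needs_human": True},
    "merge_pull_request": {"allowed": False, "needs_human": True},
}

def authorize(tool_name, human_approved=False):
    """Allow a tool call only if policy permits it and, when required,
    a human has explicitly approved this specific invocation."""
    rule = POLICY.get(tool_name)
    if rule is None or not rule["allowed"]:
        return False
    if rule["needs_human"] and not human_approved:
        return False
    return True
```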

&lt;p&gt;This guardrail-heavy framing is not anti-agent. It is what makes agents useful in production environments. Anthropic's own materials emphasize that agents work best when they can interact with the environment, test their assumptions, and operate inside structured limits rather than open-ended autonomy.&lt;/p&gt;

&lt;h2&gt;The real opportunity&lt;/h2&gt;

&lt;p&gt;That is why I think the most interesting future for Claude in Platform Engineering is not "AI writes infrastructure code." It is "AI helps translate operational work into changes that humans can evaluate, approve, and ship with confidence."&lt;/p&gt;

&lt;p&gt;Seen this way, Claude is not just a writing assistant or coding assistant. It starts to look more like an operational interface — a system that sits between intent and execution, helping teams move from ticket to PR with more context, better traceability, and less manual translation overhead.&lt;/p&gt;

&lt;p&gt;Not replacing engineers.&lt;/p&gt;

&lt;p&gt;Not removing judgment.&lt;/p&gt;

&lt;p&gt;But reducing the distance between work that needs to happen and changes that are safe enough to review, govern, and deploy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How are you thinking about AI agents in your platform workflows? Are you already using them for operational tasks, or still evaluating the risk?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>platformengineering</category>
      <category>sre</category>
      <category>aiagents</category>
      <category>automation</category>
    </item>
    <item>
      <title>AI Observability: the problem nobody is solving well in 2026</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Fri, 03 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/ai-observability-the-problem-nobody-is-solving-well-in-2026-5959</link>
      <guid>https://dev.to/carlosmoradev/ai-observability-the-problem-nobody-is-solving-well-in-2026-5959</guid>
      <description>&lt;p&gt;We’ve spent years building AIOps — using AI to observe infrastructure. But there’s a more urgent problem taking shape: who observes the AI itself?&lt;/p&gt;

&lt;p&gt;Monitoring hallucinations, prompt drift, MCP call latency, and inference costs in production is the new frontier of modern SRE. And almost nobody has a complete stack for it.&lt;/p&gt;

&lt;h2 id="the-monitoring-gap-is-structural-not-tactical"&gt;The monitoring gap is structural, not tactical&lt;/h2&gt;

&lt;p&gt;Your current observability stack was built for deterministic systems. A service either returns 200 or it doesn’t. Latency is measurable. Error rates are countable. SLOs make sense because “correct behavior” is definable.&lt;/p&gt;

&lt;p&gt;AI systems break all of these assumptions.&lt;/p&gt;

&lt;p&gt;The failure mode isn’t a 500 error — it’s a confident hallucination delivered with perfect latency and a 200 status code. Your dashboards are green. Your AI is producing garbage. A Fortune 100 bank misrouted 18% of critical cases without triggering a single alert.&lt;/p&gt;

&lt;p&gt;This isn’t a tooling gap you can close by adding a plugin to your existing stack. It’s a paradigm problem.&lt;/p&gt;

&lt;h2 id="the-current-landscape-15-tools-zero-consensus"&gt;The current landscape: 15+ tools, zero consensus&lt;/h2&gt;

&lt;p&gt;The AI observability market hit $510M in 2024, growing at 32% annually. That sounds like a mature space. It isn’t.&lt;/p&gt;

&lt;p&gt;The landscape splits into two camps that don’t talk to each other:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-native platforms&lt;/strong&gt; (Langfuse, LangSmith, Arize Phoenix, Helicone, Braintrust) understand prompts, tokens, and semantic evaluation — but have no context about your infrastructure, your SLOs, or your cost centers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional APM vendors&lt;/strong&gt; (Datadog, New Relic, Dynatrace, Grafana) understand infrastructure deeply — but treat AI as just another microservice, missing everything that makes AI systems different.&lt;/p&gt;

&lt;p&gt;OpenTelemetry’s GenAI Semantic Conventions are the closest thing to a unifying standard — still experimental as of Q1 2026, not GA. Every major vendor has adopted them as a wire format while building proprietary analytics on top. The instrumentation layer is converging. Everything above it is fragmented.&lt;/p&gt;

&lt;h2 id="four-gaps-practitioners-cant-close"&gt;Four gaps practitioners can’t close&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Inference cost is invisible at the decision layer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI inference cost is generated where routing decisions happen — model selection, retry logic, token budgets, context window management. Your observability monitors the infrastructure layer. These are different layers, and the gap between them is expensive.&lt;/p&gt;

&lt;p&gt;A typical pattern: a poorly optimized prompt costs more per day than the entire Kubernetes cluster running the application. One team discovered they were paying an LLM to be reminded of its job — sending the same system instructions hundreds of times daily. Reasoning models like o3 add internal “thinking tokens” that inflate consumption silently. Output tokens cost 3–10x more than input tokens.&lt;/p&gt;

&lt;p&gt;What looks like $500/month in a pilot becomes $15,000 at production scale. Before accounting for growth.&lt;/p&gt;
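&lt;p&gt;The arithmetic is easy to sketch. The prices below are illustrative assumptions, not any vendor’s rate card:&lt;/p&gt;

```python
# Back-of-the-envelope sketch of the repeated-system-prompt waste
# described above. Prices are illustrative assumptions only.

PRICE_PER_1K_INPUT = 0.003    # USD, assumed
PRICE_PER_1K_OUTPUT = 0.015   # USD, assumed 5x input (within the 3-10x range)

def monthly_cost(requests_per_day, input_tokens, output_tokens):
    per_request = (input_tokens / 1000 * PRICE_PER_1K_INPUT
                   + output_tokens / 1000 * PRICE_PER_1K_OUTPUT)
    return requests_per_day * 30 * per_request

# Resending a 2,000-token system prompt on 50,000 daily calls costs about
# 9,000 USD a month before a single user token is counted:
prompt_overhead = monthly_cost(50_000, 2_000, 0)
```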

&lt;p&gt;&lt;strong&gt;2. MCP traces break at the boundary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;97 million monthly SDK downloads. 5,800+ MCP servers in the ecosystem. And a fundamental tracing problem: when a user request flows from Agent → LLM Provider → MCP Server → External Tool, the trace breaks at the MCP boundary. Two disconnected traces. No correlation. No end-to-end visibility.&lt;/p&gt;

&lt;p&gt;Sentry shipped the first dedicated MCP monitoring tool in mid-2025 — after running their own MCP server at 50 million requests per month and discovering random user timeouts with no results and no errors. No way to even know how many users were affected.&lt;/p&gt;

&lt;p&gt;OpenTelemetry’s MCP semantic conventions remain in draft.&lt;/p&gt;
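&lt;p&gt;Until those conventions land, one pragmatic stopgap is to carry your own correlation id across the boundary so the two disconnected traces can at least be joined afterward. A sketch (the metadata shape is an assumption of this example, not a claim about the MCP specification):&lt;/p&gt;

```python
import uuid

# Illustrative stopgap: attach a correlation id to outgoing MCP request
# params and recover it on the server side, so agent-side and server-side
# traces can be joined in post-processing. The "_meta" shape used here is
# an assumption of this sketch.

def with_correlation(method, params, correlation_id=None):
    correlation_id = correlation_id or uuid.uuid4().hex
    enriched = dict(params)
    enriched["_meta"] = {"correlation_id": correlation_id}
    return {"method": method, "params": enriched}, correlation_id

def correlation_of(request):
    return request.get("params", {}).get("_meta", {}).get("correlation_id")
```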

&lt;p&gt;&lt;strong&gt;3. Silent semantic failures don’t trigger alerts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A single user request can trigger 15+ LLM calls across embedding generation, vector retrieval, context assembly, reasoning steps, and response synthesis. Every traditional metric can look healthy while the output is meaningless.&lt;/p&gt;

&lt;p&gt;44% of organizations still rely on manual methods to monitor AI agent interactions. The current state-of-the-art for detecting semantic failures in production is largely “a human reads logs and guesses.” Most teams discover problems through downstream business metrics — weeks after the damage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. SLOs don’t exist for non-deterministic systems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the open question practitioners keep returning to. Traditional SRE practice assumes you can define expected behavior, measure deviation, and set error budgets. When the same input can legitimately produce different outputs, when “correct” requires semantic judgment, and when model providers silently update weights underneath you — the entire SLI/SLO framework needs rethinking.&lt;/p&gt;

&lt;p&gt;Nobody has solved this. The conversation is still at the “how do we even frame the problem” stage.&lt;/p&gt;

&lt;h2 id="the-cost-paradox"&gt;The cost paradox&lt;/h2&gt;

&lt;p&gt;Adding AI monitoring to Datadog increases observability bills by 40–200%. A typical RAG pipeline generates 10–50x more telemetry than an equivalent API call. LangSmith customers routinely sample down to 0.1% of production traffic to control costs.&lt;/p&gt;

&lt;p&gt;You end up paying significantly more to observe significantly less.&lt;/p&gt;

&lt;p&gt;Gartner predicts that more than 40% of agentic AI projects will be canceled by 2027. The Dynatrace 2026 Pulse of Agentic AI survey found that 51% of engineering leaders cite limited visibility into agent behavior as their top technical blocker.&lt;/p&gt;

&lt;h2 id="whats-actually-converging"&gt;What’s actually converging&lt;/h2&gt;

&lt;p&gt;OpenTelemetry is winning the instrumentation war. The GenAI SIG has defined semantic conventions for LLM spans, agent spans, tool execution, token metrics, and evaluation events. Every major vendor accepts OTel GenAI spans.&lt;/p&gt;

&lt;p&gt;That’s the one genuine convergence story. Everything above the wire format remains fragmented — comparable to cloud monitoring circa 2010–2012, though a shared wire format this early may make consolidation happen faster this time around.&lt;/p&gt;

&lt;h2 id="the-practitioner-reality"&gt;The practitioner reality&lt;/h2&gt;

&lt;p&gt;This is the infrastructure monitoring crisis of 2010 all over again. The stakes are higher. The systems are non-deterministic. The failure modes are semantic rather than structural.&lt;/p&gt;

&lt;p&gt;If you’re an SRE or Platform Engineer who’s been handed responsibility for AI systems without the tools to properly operate them — that’s the actual state of the industry, not a gap in your skills or your team’s preparation.&lt;/p&gt;

&lt;p&gt;The tooling will converge. OpenTelemetry will help. The ecosystem is moving.&lt;/p&gt;

&lt;p&gt;But right now, in early 2026, most teams are flying partially blind — and the first step is naming the problem clearly enough to start solving it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Data points: Dynatrace 2026 Pulse of Agentic AI (919 leaders), KubeCon Atlanta 2025, OneUptime AI Observability Cost Analysis, Sentry MCP Server Monitoring launch, Gartner 2025–2027 predictions, Pydantic AI observability pricing analysis.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>sre</category>
      <category>observability</category>
      <category>aiagents</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>Multi-Layer Cost Controls for Cloud Data Platforms</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/multi-layer-cost-controls-for-cloud-data-platforms-24ac</link>
      <guid>https://dev.to/carlosmoradev/multi-layer-cost-controls-for-cloud-data-platforms-24ac</guid>
      <description>&lt;p&gt;Managing costs in cloud data platforms is challenging, especially in sandbox environments where analysts experiment freely. A single misconfigured query can run for hours, consuming resources and exploding budgets.&lt;/p&gt;

&lt;p&gt;After experiencing several unexpected cost spikes in a Snowflake sandbox environment, I implemented a &lt;strong&gt;4-layer defense strategy&lt;/strong&gt; that reduced overages by 60% while maintaining analyst productivity.&lt;/p&gt;

&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Sandbox environments are critical for data teams. Analysts need freedom to experiment, test queries, and explore data without the constraints of production. However, this freedom comes with risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Forgotten queries:&lt;/strong&gt; Analysts start a query, switch tasks, and forget to terminate it&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inefficient SQL:&lt;/strong&gt; Experimentation means suboptimal queries that scan entire tables&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Large compute:&lt;/strong&gt; “Let me just use XLARGE for this one query” becomes the default&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No accountability:&lt;/strong&gt; Costs are aggregated, so individual users don’t see their impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional approaches fail:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Budget alerts only:&lt;/strong&gt; By the time you get the alert, you’ve already spent the money&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query timeouts:&lt;/strong&gt; Legitimate long-running analytics get killed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Restrictive permissions:&lt;/strong&gt; Kills innovation and analyst productivity&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The 4-Layer Defense Strategy&lt;/h2&gt;

&lt;p&gt;Instead of relying on a single mechanism, I implemented &lt;strong&gt;redundant layers&lt;/strong&gt; so that if one fails, others catch the issue.&lt;/p&gt;

&lt;h3&gt;Layer 1: Warehouse Configuration (Prevention)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Aggressive auto-suspend and right-sizing:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
&lt;span class="c1"&gt;-- Create warehouse with 60-second auto-suspend&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;WAREHOUSE&lt;/span&gt; &lt;span class="n"&gt;SANDBOX_WAREHOUSE&lt;/span&gt;
  &lt;span class="n"&gt;WAREHOUSE_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'SMALL'&lt;/span&gt;
  &lt;span class="n"&gt;AUTO_SUSPEND&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
  &lt;span class="n"&gt;AUTO_RESUME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;
  &lt;span class="n"&gt;INITIALLY_SUSPENDED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why 60 seconds?&lt;/strong&gt; Most analysts iterate on queries with 1-2 minute gaps. 60 seconds catches forgotten warehouses while allowing workflow continuity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Default to SMALL:&lt;/strong&gt; Unless there’s a documented need, sandbox warehouses start at SMALL. Analysts can scale up temporarily, but the default prevents “XLARGE for everything.”&lt;/p&gt;
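&lt;p&gt;One way to make “scale up temporarily” safe is a context manager that always scales back down, even when the query fails. This is a sketch: it accepts any cursor-like object with an &lt;code&gt;execute&lt;/code&gt; method, and the warehouse and size names are placeholders to adapt:&lt;/p&gt;

```python
from contextlib import contextmanager

@contextmanager
def temporary_warehouse_size(cursor, warehouse, size):
    """Scale a warehouse up for one block of work, then restore SMALL.

    `cursor` is anything with an execute(sql) method (e.g. a
    snowflake.connector cursor). Sketch only -- add identifier
    validation before using with untrusted input.
    """
    cursor.execute(f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}'")
    try:
        yield
    finally:
        # The finally block guarantees scale-down even if the query fails.
        cursor.execute(f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = 'SMALL'")

# Usage with a stand-in cursor that just records the SQL it sees:
class RecordingCursor:
    def __init__(self):
        self.statements = []
    def execute(self, sql):
        self.statements.append(sql)

cur = RecordingCursor()
with temporary_warehouse_size(cur, "SANDBOX_WAREHOUSE", "XLARGE"):
    cur.execute("SELECT 1")  # the expensive one-off query goes here
print(cur.statements[-1])  # the scale-down always runs last
```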

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average warehouse utilization: 15 minutes/day per analyst (down from 2+ hours)&lt;/li&gt;
&lt;li&gt;70% reduction in idle warehouse costs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Layer 2: Resource Monitors (Guardrails)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Per-user budget enforcement:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
&lt;span class="c1"&gt;-- Create resource monitor for sandbox user&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;RESOURCE&lt;/span&gt; &lt;span class="n"&gt;MONITOR&lt;/span&gt; &lt;span class="n"&gt;USER_MONITOR&lt;/span&gt;
  &lt;span class="n"&gt;CREDIT_QUOTA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;MONTHLY_BUDGET&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="n"&gt;FREQUENCY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MONTHLY&lt;/span&gt;
  &lt;span class="n"&gt;START_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;IMMEDIATELY&lt;/span&gt;
  &lt;span class="n"&gt;TRIGGERS&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt; &lt;span class="n"&gt;PERCENT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTIFY&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="n"&gt;PERCENT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTIFY&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt; &lt;span class="n"&gt;PERCENT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTIFY&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="n"&gt;PERCENT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="n"&gt;SUSPEND_IMMEDIATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Assign to user's warehouse&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="n"&gt;WAREHOUSE&lt;/span&gt; &lt;span class="n"&gt;SANDBOX_WAREHOUSE&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;RESOURCE_MONITOR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;USER_MONITOR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why per-user monitors?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Individual accountability (users see their own consumption)&lt;/li&gt;
&lt;li&gt;Graceful degradation (one user hitting limit doesn’t affect others)&lt;/li&gt;
&lt;li&gt;Data for user education (who needs query optimization training?)&lt;/li&gt;
&lt;/ul&gt;
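&lt;p&gt;Provisioning one monitor per user is easy to script. The sketch below renders the DDL from the template above; the monitor and warehouse naming convention (&lt;code&gt;RM_SANDBOX_*&lt;/code&gt;, &lt;code&gt;SANDBOX_WH_*&lt;/code&gt;) is my assumption, adapt it to yours:&lt;/p&gt;

```python
def user_monitor_ddl(username, monthly_credits):
    """Render per-user resource-monitor DDL mirroring the template above.

    The one-warehouse-per-analyst naming scheme (SANDBOX_WH_{USER},
    RM_SANDBOX_{USER}) is an assumption for illustration.
    """
    monitor = f"RM_SANDBOX_{username.upper()}"
    warehouse = f"SANDBOX_WH_{username.upper()}"
    return [
        f"CREATE RESOURCE MONITOR {monitor} "
        f"CREDIT_QUOTA = {monthly_credits} FREQUENCY = MONTHLY "
        f"START_TIMESTAMP = IMMEDIATELY "
        f"TRIGGERS ON 75 PERCENT DO NOTIFY "
        f"ON 90 PERCENT DO NOTIFY "
        f"ON 95 PERCENT DO NOTIFY "
        f"ON 100 PERCENT DO SUSPEND_IMMEDIATE;",
        f"ALTER WAREHOUSE {warehouse} SET RESOURCE_MONITOR = {monitor};",
    ]

# Generate the two statements for one analyst:
for stmt in user_monitor_ddl("alice", monthly_credits=20):
    print(stmt)
```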

&lt;p&gt;&lt;strong&gt;Progressive notifications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;75%: “You’re on track, no action needed”&lt;/li&gt;
&lt;li&gt;90%: “Slow down, review your queries”&lt;/li&gt;
&lt;li&gt;95%: “Critical - optimize or your warehouse suspends at 100%”&lt;/li&gt;
&lt;li&gt;100%: Immediate suspension (prevents overage)&lt;/li&gt;
&lt;/ul&gt;
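&lt;p&gt;The notification logic itself is a simple threshold ladder. A minimal sketch (message delivery via email or Slack is left out):&lt;/p&gt;

```python
def budget_notification(percent_used):
    """Map consumed-budget percentage to the message tiers above.

    Thresholds mirror the resource-monitor triggers; delivery
    (email, Slack) is out of scope for this sketch.
    """
    if percent_used >= 100:
        return "Warehouse suspended (budget exhausted)"
    if percent_used >= 95:
        return "Critical - optimize or your warehouse suspends at 100%"
    if percent_used >= 90:
        return "Slow down, review your queries"
    if percent_used >= 75:
        return "You're on track, no action needed"
    return None  # below 75%: stay quiet

print(budget_notification(84))  # You're on track, no action needed
```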

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero budget overages since implementation&lt;/li&gt;
&lt;li&gt;Users self-optimize before hitting 90% threshold&lt;/li&gt;
&lt;li&gt;Average spending maintained well within budget limits&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Layer 3: Connection Pooling (Efficiency)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Reuse connections to reduce cold-start costs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Python implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;snowflake.connector&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;contextmanager&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SnowflakeConnectionPool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="nd"&gt;@contextmanager&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Reuse connection if available, create new if needed&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_closed&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;snowflake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Connection failed, reset for next attempt
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt;

&lt;span class="c1"&gt;# Usage
&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SnowflakeConnectionPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each new connection incurs warehouse resume cost (if auto-suspended)&lt;/li&gt;
&lt;li&gt;Connection pooling achieves 80% reuse rate&lt;/li&gt;
&lt;li&gt;Warehouse stays “warm” during analyst work sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;80% connection reuse rate&lt;/li&gt;
&lt;li&gt;30% reduction in warehouse resume events&lt;/li&gt;
&lt;li&gt;Faster query execution (no cold-start delay)&lt;/li&gt;
&lt;/ul&gt;
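&lt;p&gt;To know your reuse rate, you have to measure it. The sketch below extends the single-connection wrapper with checkout/creation counters; it takes a generic connection factory (a stand-in class here) so the metric logic runs without a live warehouse:&lt;/p&gt;

```python
from contextlib import contextmanager

class MeteredConnectionPool:
    """Single-connection reuse wrapper that also tracks its reuse rate.

    Takes any zero-argument factory (e.g. a lambda wrapping
    snowflake.connector.connect) so the metric logic is testable
    without a live warehouse. Sketch, not production pooling.
    """
    def __init__(self, factory):
        self._factory = factory
        self._connection = None
        self.checkouts = 0
        self.creations = 0

    @contextmanager
    def get_connection(self):
        self.checkouts += 1
        if self._connection is None or self._connection.is_closed():
            self._connection = self._factory()
            self.creations += 1
        try:
            yield self._connection
        except Exception:
            self._connection = None  # drop the bad connection
            raise

    @property
    def reuse_rate(self):
        if self.checkouts == 0:
            return 0.0
        return 1 - self.creations / self.checkouts

# Stand-in connection for demonstration:
class FakeConnection:
    def is_closed(self):
        return False

pool = MeteredConnectionPool(FakeConnection)
for _ in range(5):
    with pool.get_connection():
        pass
print(pool.reuse_rate)  # 0.8 -- one creation, four reuses
```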

&lt;h3&gt;Layer 4: User Education (Culture)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Monthly cost transparency:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Send each analyst their personal cost report:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Your Snowflake Usage - December 2025

Credits used: XX of YY (84%)
Queries executed: X,XXX
Most expensive query: X.X credits (view details)

Cost breakdown:
- Compute: XX credits
- Storage: X credits
- Data transfer: X credit

Top 3 expensive queries:
1. Full table scan on LARGE_TABLE (X.X credits)
   → Optimization: Add WHERE clause to filter data
2. Cartesian join (X.X credits)
   → Optimization: Add JOIN condition
3. Repeated aggregation (X.X credits)
   → Optimization: Materialize intermediate results

Tips for next month:
- Use LIMIT when exploring data
- Add filters before aggregations
- Check query profile before large runs

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
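&lt;p&gt;A report like the one above can be rendered from a handful of per-user numbers. The field names in this sketch are assumptions; in practice the inputs come from aggregating Snowflake’s &lt;code&gt;ACCOUNT_USAGE&lt;/code&gt; views per user:&lt;/p&gt;

```python
def render_usage_report(user, month, credits_used, quota, top_queries):
    """Render a plain-text report like the one above.

    Field names are assumptions for illustration; real inputs would be
    aggregated per user from Snowflake's ACCOUNT_USAGE views (e.g.
    QUERY_HISTORY, WAREHOUSE_METERING_HISTORY).
    """
    pct = round(100 * credits_used / quota)
    lines = [
        f"Your Snowflake Usage - {month}",
        "",
        f"Credits used: {credits_used} of {quota} ({pct}%)",
        "",
        "Top expensive queries:",
    ]
    for rank, (description, credits, tip) in enumerate(top_queries, start=1):
        lines.append(f"{rank}. {description} ({credits} credits)")
        lines.append(f"   -) Optimization: {tip}")
    return "\n".join(lines)

report = render_usage_report(
    "alice", "December 2025", credits_used=42, quota=50,
    top_queries=[("Full table scan on LARGE_TABLE", 6.3,
                  "Add WHERE clause to filter data")],
)
print(report)
```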



&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;40% reduction in inefficient query patterns&lt;/li&gt;
&lt;li&gt;Users proactively optimize before hitting budget limits&lt;/li&gt;
&lt;li&gt;Cultural shift: “cost-aware” becomes default mindset&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Combined Impact&lt;/h2&gt;

&lt;p&gt;The 4-layer strategy delivered:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;60% reduction&lt;/strong&gt; in unexpected sandbox overages&lt;/li&gt;
&lt;li&gt;Average spending per user maintained within budget&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero budget overruns&lt;/strong&gt; since implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;80% connection pooling&lt;/strong&gt; efficiency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;70% reduction&lt;/strong&gt; in idle warehouse costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;40% improvement&lt;/strong&gt; in query efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cultural metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users self-optimize at 90% threshold (before suspension)&lt;/li&gt;
&lt;li&gt;Proactive query profiling becomes standard practice&lt;/li&gt;
&lt;li&gt;Cost awareness embedded in daily workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why Redundancy Matters&lt;/h2&gt;

&lt;p&gt;Each layer catches different failure modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Scenario&lt;/th&gt;
&lt;th&gt;Layer That Catches It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Analyst forgets to terminate warehouse&lt;/td&gt;
&lt;td&gt;Layer 1: Auto-suspend after 60 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inefficient query runs for hours&lt;/td&gt;
&lt;td&gt;Layer 2: Resource monitor suspends at 100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Many short queries throughout the day&lt;/td&gt;
&lt;td&gt;Layer 3: Connection pooling reduces resume costs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User habitually writes expensive queries&lt;/td&gt;
&lt;td&gt;Layer 4: Monthly report triggers education&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Healthcare compliance bonus:&lt;/strong&gt; In regulated environments, this approach provides audit trails showing cost governance without restricting legitimate data access.&lt;/p&gt;

&lt;h2&gt;Implementation Checklist&lt;/h2&gt;

&lt;p&gt;Want to implement this in your organization? Here’s the checklist:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 1: Infrastructure (Layers 1-2)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configure warehouse auto-suspend (aggressive timing)&lt;/li&gt;
&lt;li&gt;Set default warehouse size to SMALL&lt;/li&gt;
&lt;li&gt;Create per-user resource monitors with monthly quotas&lt;/li&gt;
&lt;li&gt;Set up progressive notification thresholds (75%, 90%, 95%, 100%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 2: Optimization (Layer 3)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement connection pooling in application code&lt;/li&gt;
&lt;li&gt;Measure connection reuse rate&lt;/li&gt;
&lt;li&gt;Monitor warehouse resume events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 3: Culture (Layer 4)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build monthly cost report automation&lt;/li&gt;
&lt;li&gt;Include query optimization recommendations&lt;/li&gt;
&lt;li&gt;Send first round of reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 4: Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dashboard for cost trends&lt;/li&gt;
&lt;li&gt;Alert on anomalies (user exceeding historical average)&lt;/li&gt;
&lt;li&gt;Quarterly review and adjustment&lt;/li&gt;
&lt;/ul&gt;
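&lt;p&gt;The anomaly alert can start as a simple rule: flag a user whose daily spend sits well outside their own history. The thresholds in this sketch (2 sigma and 1.5x the mean) are illustrative defaults to tune against your own spend distribution:&lt;/p&gt;

```python
from statistics import mean, pstdev

def is_spend_anomaly(history, today, min_sigma=2.0, min_ratio=1.5):
    """Flag daily credit spend that breaks from a user's own history.

    Requires both conditions (2 sigma above the mean AND 1.5x the mean)
    so that noisy low-spend users don't trigger constant alerts.
    Thresholds are illustrative defaults, not recommendations.
    """
    baseline = mean(history)
    spread = pstdev(history)
    above_ratio = today > baseline * min_ratio
    above_sigma = today > baseline + min_sigma * spread
    return above_ratio and above_sigma

# Hypothetical analyst spending ~2.2 credits/day:
history = [2.0, 2.5, 1.8, 2.2, 2.4, 2.1]
print(is_spend_anomaly(history, today=2.6))  # False: normal variation
print(is_spend_anomaly(history, today=9.0))  # True: investigate
```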

&lt;h2&gt;Lessons Learned&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;No single mechanism is enough:&lt;/strong&gt; Redundant layers provide resilience&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Make costs visible:&lt;/strong&gt; Users can’t optimize what they can’t see&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Default to small:&lt;/strong&gt; Scaling up is easier than justifying scale-down&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Progressive alerts work:&lt;/strong&gt; Users self-correct before hitting hard limits&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Culture beats controls:&lt;/strong&gt; Education changes behavior permanently&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Technologies Used&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Snowflake resource monitors&lt;/li&gt;
&lt;li&gt;Python connection pooling&lt;/li&gt;
&lt;li&gt;Automated reporting (SQL + pandas)&lt;/li&gt;
&lt;li&gt;TOML configuration management&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Want to discuss cost optimization strategies?&lt;/strong&gt; Connect with me on &lt;a href="https://linkedin.com/in/carlosmoradev" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/projects/snowflake-governance"&gt;Multi-Account Data Warehouse Governance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>finops</category>
      <category>costoptimization</category>
      <category>snowflake</category>
      <category>dataplatforms</category>
    </item>
  </channel>
</rss>
