<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Carlos Mario Mora Restrepo</title>
    <description>The latest articles on DEV Community by Carlos Mario Mora Restrepo (@carlosmoradev).</description>
    <link>https://dev.to/carlosmoradev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3868105%2F4b24f81c-7f01-4759-9dd6-1df452c62c85.png</url>
      <title>DEV Community: Carlos Mario Mora Restrepo</title>
      <link>https://dev.to/carlosmoradev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/carlosmoradev"/>
    <language>en</language>
    <item>
      <title>Automating with AI is not adopting AI</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/automating-with-ai-is-not-adopting-ai-4pbj</link>
      <guid>https://dev.to/carlosmoradev/automating-with-ai-is-not-adopting-ai-4pbj</guid>
      <description>&lt;p&gt;There is a pattern that repeats itself every time a team seriously starts working with AI tools.&lt;/p&gt;

&lt;p&gt;Most of the energy goes into one place: taking what already exists and making it faster. Repetitive tasks, manual processes, things that someone has been doing the same way for years — all of it gets fed into a pipeline or handed off to an agent. The process doesn’t change. The speed does.&lt;/p&gt;

&lt;p&gt;That is the first level. And the market rewards it loudly.&lt;/p&gt;

&lt;p&gt;Teams publish it. Leaders celebrate it. Vendors use it as proof. And because the feedback loop is fast and visible, most teams stop there — not because they don’t want to go further, but because the noise convinced them they already did.&lt;/p&gt;




&lt;h2 id="the-surface-problem"&gt;The surface problem&lt;/h2&gt;

&lt;p&gt;The hype around AI has done something subtle and damaging: it has lowered the bar for what counts as transformation.&lt;/p&gt;

&lt;p&gt;When every conference talk, every LinkedIn post, and every vendor pitch celebrates automation as the destination, teams internalize that framing. Automating three workflows becomes a success story. Replacing a manual report with a scheduled script becomes “AI adoption.” And nobody questions it — because everyone around them is doing the same thing and calling it progress.&lt;/p&gt;

&lt;p&gt;The problem is not that automation has no value. It does. It reduces friction, frees up time, and creates the operational breathing room that makes everything else possible.&lt;/p&gt;

&lt;p&gt;The problem is that it has a ceiling. And most teams never look up.&lt;/p&gt;




&lt;h2 id="the-conversation-about-redesign"&gt;The conversation about redesign&lt;/h2&gt;

&lt;p&gt;There is a second level that requires something automation never demands: questioning what already works.&lt;/p&gt;

&lt;p&gt;The question shifts from &lt;em&gt;how do I automate this&lt;/em&gt; to &lt;em&gt;how should this work if I built it today&lt;/em&gt;. No inherited assumptions. No legacy logic carried forward because nobody stopped to challenge it. No process designed for a world where these tools didn’t exist, now just running faster.&lt;/p&gt;

&lt;p&gt;This is harder than it sounds. Redesigning a process means accepting that the current version — the one your team built, refined, and depends on — might not be the right starting point. That is an uncomfortable idea, especially under delivery pressure.&lt;/p&gt;

&lt;p&gt;But the teams that get there start producing something different. Not faster outputs. Different outputs. Workflows that couldn’t have been designed before, because the reasoning layer that makes them possible simply wasn’t available.&lt;/p&gt;




&lt;h2 id="the-conversation-nobody-is-having"&gt;The conversation nobody is having&lt;/h2&gt;

&lt;p&gt;The third level is the least visible and the most consequential.&lt;/p&gt;

&lt;p&gt;It is not about processes at all. It is about commitments.&lt;/p&gt;

&lt;p&gt;What can your team take on today that six months ago would have been rejected — not for lack of ambition, but because there was genuinely no viable path to execute it with the resources available?&lt;/p&gt;

&lt;p&gt;When a team starts answering that question with concrete examples, something important has happened. They are no longer optimizing existing capacity. They are operating with capacity that didn’t exist before.&lt;/p&gt;

&lt;p&gt;That distinction matters more than it might seem. Efficiency scales linearly — with enough automation, a team can do more of the same with less friction. But new capability opens categories. A team that can commit to things that were previously out of reach doesn’t just perform better. It operates in a different space entirely.&lt;/p&gt;

&lt;p&gt;The reason this level is so rarely discussed is that it doesn’t announce itself. You don’t recognize it in the planning meeting. You recognize it in retrospect, when you realize that what you just shipped wouldn’t have made it into the backlog six months ago — not because nobody wanted it, but because nobody could see a real path to doing it.&lt;/p&gt;




&lt;h2 id="why-most-teams-dont-get-there"&gt;Why most teams don’t get there&lt;/h2&gt;

&lt;p&gt;It is not a technology problem. The tools are available. Most teams already have access to everything they need to move beyond the first level.&lt;/p&gt;

&lt;p&gt;It is a question problem.&lt;/p&gt;

&lt;p&gt;Automation asks: &lt;em&gt;what can I offload?&lt;/em&gt;&lt;br&gt;
Redesign asks: &lt;em&gt;what should this actually look like?&lt;/em&gt;&lt;br&gt;
New capability asks: &lt;em&gt;what becomes possible now that wasn’t before?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Each question lives in a different territory. And you cannot reach the third without having moved through the first two — but moving through them doesn’t guarantee you arrive. The last step requires intentionality. It requires creating space for a question that doesn’t have an obvious answer and doesn’t generate immediate return.&lt;/p&gt;

&lt;p&gt;That space is exactly what delivery pressure eliminates first.&lt;/p&gt;




&lt;h2 id="what-this-means-in-practice"&gt;What this means in practice&lt;/h2&gt;

&lt;p&gt;The hype is not going away. If anything, it will intensify — more tools, more benchmarks, more case studies, more pressure to demonstrate that your team is “already using AI.”&lt;/p&gt;

&lt;p&gt;Automation is enough to satisfy that pressure. It is visible, measurable, and easy to communicate.&lt;/p&gt;

&lt;p&gt;But the teams that will matter in two or three years are not the ones that automated the most. They are the ones that asked the harder question early enough — and built the discipline to keep asking it.&lt;/p&gt;

&lt;p&gt;Where is your team right now? And more importantly: when did you last ask what has become possible that wasn’t before?&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>platformengineering</category>
      <category>leadership</category>
      <category>strategy</category>
    </item>
    <item>
      <title>Your AI workload is not your infrastructure’s problem. Until it is.</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Sat, 11 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/your-ai-workload-is-not-your-infrastructures-problem-until-it-is-3h0e</link>
      <guid>https://dev.to/carlosmoradev/your-ai-workload-is-not-your-infrastructures-problem-until-it-is-3h0e</guid>
      <description>&lt;p&gt;There’s a conversation happening in the software architecture community about how bad code design inflates LLM token consumption. It’s a valid point. But it misses an entire layer of the problem — the one Platform Engineers and SREs actually own.&lt;/p&gt;

&lt;p&gt;Most infrastructure running AI workloads today was not designed for them. It was designed to make software artifacts run. That’s a different problem, and it has a different cost.&lt;/p&gt;

&lt;h2 id="the-infrastructure-assumption-that-breaks-under-ai"&gt;The infrastructure assumption that breaks under AI&lt;/h2&gt;

&lt;p&gt;Traditional infrastructure design answers one question: can this artifact deploy and run?&lt;/p&gt;

&lt;p&gt;Compute? Sized for average load. Network? Enough bandwidth for expected traffic. Storage? Enough for the data the app needs. Security? Perimeter defined, access controlled.&lt;/p&gt;

&lt;p&gt;That model works for deterministic workloads. You know what the artifact needs. You provision for it. You monitor it.&lt;/p&gt;

&lt;p&gt;AI workloads break the assumption at the foundation. The resource profile isn’t fixed — it shifts with every inference call, every context window, every agent loop iteration. The same infrastructure that handles your morning traffic can behave completely differently at 3pm when a poorly scoped agent starts chaining tool calls.&lt;/p&gt;

&lt;p&gt;Nobody sized for that. Because nobody asked the infrastructure question before deploying.&lt;/p&gt;

&lt;h2 id="what-infrastructure-readiness-for-ai-actually-means"&gt;What “infrastructure readiness for AI” actually means&lt;/h2&gt;

&lt;p&gt;It’s not a checklist. It’s a mindset shift: infrastructure is not a deployment target for AI workloads — it’s an active variable in their cost, latency, and reliability.&lt;/p&gt;

&lt;p&gt;That shift surfaces four concrete areas worth reviewing before — or while — running AI in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Context passing architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every token sent to a model costs money. Where does that context come from, and how is it assembled? In many infrastructures, context is rebuilt from scratch on every request: full conversation history pulled from a database, system instructions fetched from a config store, user data loaded from multiple services — all assembled in the application layer on each call.&lt;/p&gt;

&lt;p&gt;The infrastructure question is: where can this be cached, pre-assembled, or compressed without losing fidelity? A well-designed caching layer between your application and your model endpoint can reduce token consumption significantly without touching a single line of application code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Model routing and gateway configuration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most teams deploy AI workloads with a direct application-to-model-endpoint pattern. One app, one model, one endpoint. That works in a pilot. It doesn’t scale, and it doesn’t optimize.&lt;/p&gt;

&lt;p&gt;An AI gateway layer — whether that’s a managed service or a self-hosted proxy — enables model routing based on request complexity, cost thresholds, or latency requirements. Simple requests go to cheaper, faster models. Complex reasoning tasks go to the capable but expensive ones. That routing logic lives in infrastructure, not in application code.&lt;/p&gt;

&lt;p&gt;If your current infrastructure has no routing layer between your application and your model endpoints, every request is treated the same regardless of what it actually needs.&lt;/p&gt;
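&lt;p&gt;A minimal sketch of that routing decision, assuming illustrative model names and thresholds (a real gateway would also weigh cost budgets and latency SLOs):&lt;/p&gt;

```python
# Illustrative routing sketch; the model names and the token threshold
# are assumptions, not a specific gateway's configuration.

CHEAP_MODEL = "small-fast-model"
CAPABLE_MODEL = "large-reasoning-model"

def estimate_tokens(prompt):
    # Rough heuristic: roughly 4 characters per token for English text.
    return max(1, len(prompt) // 4)

def route(prompt, needs_reasoning=False):
    """Send large or explicitly complex requests to the capable model."""
    if needs_reasoning or estimate_tokens(prompt) > 2000:
        return CAPABLE_MODEL
    return CHEAP_MODEL
```

&lt;p&gt;The design choice worth noting: because this lives in the gateway, applications never hard-code a model name, and the routing policy can change without a redeploy.&lt;/p&gt;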

&lt;p&gt;&lt;strong&gt;3. Retry and timeout configuration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LLM calls fail. They time out. They return partial responses. The default retry behavior inherited from your existing infrastructure — designed for fast, deterministic API calls — is almost certainly wrong for inference workloads.&lt;/p&gt;

&lt;p&gt;Aggressive retries on a timed-out LLM call don’t recover the request. They generate duplicate token consumption and compound the latency problem. Infrastructure that wasn’t configured with AI call patterns in mind will retry its way into a cost spike before anyone notices.&lt;/p&gt;

&lt;p&gt;Reviewing timeout thresholds, retry policies, and circuit breaker configurations for AI-specific endpoints is unglamorous work. It’s also directly impactful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Observability gaps inherited from pre-AI infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one connects to a broader problem. Infrastructure deployed before AI workloads were introduced was instrumented for traditional signals: error rates, latency, throughput. Those signals don’t tell you what’s happening inside an inference call.&lt;/p&gt;

&lt;p&gt;Token consumption, context size per request, model latency versus total request latency, MCP call chains — none of these appear in dashboards built for microservices. If your observability layer wasn’t updated when AI workloads were introduced, you’re monitoring the infrastructure around the problem, not the problem itself.&lt;/p&gt;
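&lt;p&gt;A sketch of recording those missing signals per request. Here &lt;code&gt;metrics&lt;/code&gt; is a plain list standing in for a real backend (a Prometheus client, an OTel meter, and so on), and &lt;code&gt;call_model&lt;/code&gt; is a stand-in:&lt;/p&gt;

```python
import time

# Illustrative instrumentation sketch: capture the AI-specific signals
# (context size, model latency vs. total latency, output size) that
# pre-AI dashboards never see. The metrics sink is a stand-in.

metrics = []

def observed_inference(call_model, prompt, context):
    total_start = time.monotonic()
    full_prompt = context + "\n" + prompt
    model_start = time.monotonic()
    response = call_model(full_prompt)
    model_end = time.monotonic()
    metrics.append({
        "context_chars": len(context),
        "prompt_chars": len(full_prompt),
        "output_chars": len(response),
        "model_latency_s": model_end - model_start,
        "total_latency_s": model_end - total_start,
    })
    return response
```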

&lt;h2 id="the-optimization-conversation-nobody-is-having"&gt;The optimization conversation nobody is having&lt;/h2&gt;

&lt;p&gt;The original framing — “fix your software architecture to reduce token consumption” — puts the responsibility on the application layer. That’s fair. But it leaves Platform Engineers in a passive role: waiting for developers to write better code while watching the inference bill grow.&lt;/p&gt;

&lt;p&gt;The infrastructure layer has more leverage than it’s given credit for. Caching, routing, retry configuration, and observability are all infrastructure concerns. Optimizing them doesn’t require touching application code. It requires treating infrastructure as an active participant in AI workload performance — not just a surface to deploy on.&lt;/p&gt;

&lt;p&gt;Most teams haven’t had that conversation yet. The ones that do it early will spend significantly less time explaining unexpected cost spikes later.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is part of an ongoing series on operating AI systems in production infrastructure. If you found it useful, the post on &lt;a href="/2026/04/03/ai-observability-the-gap-nobody-is-solving/"&gt;AI observability gaps in 2026&lt;/a&gt; covers the monitoring side of the same problem.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>platformengineering</category>
      <category>sre</category>
      <category>aiagents</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>AWS Cost Explorer Just Got Conversational — And That Changes the Workflow</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/aws-cost-explorer-just-got-conversational-and-that-changes-the-workflow-59f6</link>
      <guid>https://dev.to/carlosmoradev/aws-cost-explorer-just-got-conversational-and-that-changes-the-workflow-59f6</guid>
      <description>&lt;p&gt;AWS just closed the last friction gap in cost analysis.&lt;/p&gt;

&lt;p&gt;Natural language queries in Cost Explorer — powered by Amazon Q — launched this week. You ask; Cost Explorer updates its charts in real time. No filters. No manual groupings. No switching to a separate Q Developer chat.&lt;/p&gt;

&lt;p&gt;“How much did we spend on RDS last month compared to the previous one?” → instant answer + automatic visualization update.&lt;/p&gt;

&lt;h2&gt;The problem with cost tooling has always been friction&lt;/h2&gt;

&lt;p&gt;As an SRE managing multi-cloud infrastructure, I’ve spent years building cost alert layers manually: tagging strategies, Budget alarms, custom Lambda parsers for anomaly detection. Each layer added complexity. Each handoff between tools added friction.&lt;/p&gt;

&lt;p&gt;The tooling was always capable. The problem was the interface — engineers had to translate between what they wanted to know and what the tool could show them. That translation cost was real, and it was killing adoption.&lt;/p&gt;

&lt;h2&gt;What’s actually new here&lt;/h2&gt;

&lt;p&gt;Amazon Q has had Cost Explorer integration since late 2024. What changed isn’t the underlying capability — it’s the interface.&lt;/p&gt;

&lt;p&gt;The answer and the visualization now share the same surface, updating together and carrying full conversation context forward. You can ask a follow-up without resetting the query. The conversation persists.&lt;/p&gt;

&lt;p&gt;That sounds small. It isn’t. That’s the friction that was killing adoption.&lt;/p&gt;

&lt;h2&gt;What this means for cost governance&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://carlosmora.dev/2026/01/10/multi-layer-cost-controls.html" rel="noopener noreferrer"&gt;first blog post on this site&lt;/a&gt; was about building a 4-layer cost defense strategy for cloud data platforms. At the time, building the alert pipeline was a manual exercise in connecting layers: resource monitors, warehouse sizing, connection pooling, user education.&lt;/p&gt;

&lt;p&gt;Today AWS gives you natural language on top of those same layers. The layers still matter — you still need tagging discipline, budget boundaries, and anomaly detection. But the interface for analyzing and interrogating those layers just became dramatically lower-friction.&lt;/p&gt;

&lt;h2&gt;The next unlock&lt;/h2&gt;

&lt;p&gt;The question I keep coming back to: if cost analysis is now conversational, what’s next?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proactive anomaly surfacing before the spike hits?&lt;/li&gt;
&lt;li&gt;Rightsizing recommendations that execute autonomously?&lt;/li&gt;
&lt;li&gt;Cost SLOs with automated enforcement?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The distance between “cost alert” and “autonomously governed cost” is closing fast. And for SREs who’ve been hand-building that infrastructure for years — that’s worth paying attention to.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you tried the natural language queries in Cost Explorer yet? Curious how teams are integrating this into their FinOps workflows.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>finops</category>
      <category>costoptimization</category>
      <category>sre</category>
    </item>
    <item>
      <title>From ticket to PR with agents: how to use Claude to automate platform changes without breaking SLOs</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Wed, 08 Apr 2026 15:42:10 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/from-ticket-to-pr-with-agents-how-to-use-claude-to-automate-platform-changes-without-breaking-slos-48lg</link>
      <guid>https://dev.to/carlosmoradev/from-ticket-to-pr-with-agents-how-to-use-claude-to-automate-platform-changes-without-breaking-slos-48lg</guid>
      <description>&lt;p&gt;In Platform Engineering and SRE, the hardest part of change is rarely writing the change itself. The hard part is everything around it: understanding the intent behind a ticket or incident, locating the right context, identifying the systems involved, deciding what should change, validating the blast radius, documenting rollback, and making the result legible enough for someone else to review with confidence.&lt;/p&gt;

&lt;p&gt;That is why I think the real promise of Claude is not code generation. It is the ability to help close the loop between operational intent and reviewable execution.&lt;/p&gt;

&lt;h2&gt;The translation problem&lt;/h2&gt;

&lt;p&gt;A ticket, incident, or operational task expresses intent. But between that intent and a merged change, there is usually a long chain of manual translation. Engineers need to gather context from runbooks, infrastructure repositories, dashboards, previous incidents, documentation, and platform conventions. They need to decide whether the task requires a configuration tweak, an IaC change, a runbook update, or some combination of all three. They need to make the work explicit enough to review and safe enough to deploy.&lt;/p&gt;

&lt;p&gt;That translation layer is where Claude becomes interesting.&lt;/p&gt;

&lt;p&gt;Anthropic describes effective agents as systems that use tools dynamically, adapt based on feedback from the environment, and operate with clear stopping conditions and human oversight. That is a much more useful framing than treating Claude as a smarter autocomplete layer.&lt;/p&gt;

&lt;h2&gt;The pattern&lt;/h2&gt;

&lt;p&gt;Applied to Platform Engineering, the workflow looks something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A ticket, incident, or task becomes the initial statement of intent.&lt;/li&gt;
&lt;li&gt;Claude gathers context from the relevant repos, documentation, and operational systems.&lt;/li&gt;
&lt;li&gt;It uses tools to inspect files, compare configurations, reason about likely changes, and validate assumptions.&lt;/li&gt;
&lt;li&gt;It produces a proposed change in a form that the team can actually govern — ideally as a pull request.&lt;/li&gt;
&lt;li&gt;Humans review the result, enforce policy, and decide whether it should ship.&lt;/li&gt;
&lt;/ol&gt;
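&lt;p&gt;The five steps above can be sketched as a skeleton. Every callable here is an injected stand-in (there is no real Claude or Git API in this sketch); the point is the shape: context in, reviewable PR out, human decision last.&lt;/p&gt;

```python
# Illustrative skeleton of the ticket-to-PR flow. All parameters are
# injected stand-ins, not a real agent framework's API.

def ticket_to_pr(ticket, gather_context, propose_change, open_pull_request):
    context = gather_context(ticket)            # repos, runbooks, incidents
    proposal = propose_change(ticket, context)  # the agent's reasoning step
    pr = open_pull_request(
        title=f"[agent] {ticket['summary']}",
        body="\n".join([
            "Intent: " + ticket["summary"],
            "Rationale: " + proposal["rationale"],
            "Validation: " + proposal["validation"],
            "Rollback: " + proposal["rollback"],
        ]),
        diff=proposal["diff"],
    )
    # The agent stops here. Merging is a human decision, enforced by policy.
    return pr
```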

&lt;h2&gt;The pull request as the unit of governance&lt;/h2&gt;

&lt;p&gt;The pull request is the key unit here.&lt;/p&gt;

&lt;p&gt;The real output of an agent in this workflow should not be a blob of generated code. It should be a reviewable change set with rationale, scope, validation steps, and rollback guidance. Once the output becomes a PR rather than a prompt response, the conversation shifts from "Can the model write this?" to "Can the organization safely absorb and govern this change?"&lt;/p&gt;

&lt;p&gt;That distinction matters because SRE is not optimized for novelty. It is optimized for reliability. A change that is fast but opaque is often worse than a change that is slower but auditable. If Claude is going to be useful in platform workflows, it has to increase clarity, not just speed.&lt;/p&gt;

&lt;h2&gt;Why SLOs matter in the title&lt;/h2&gt;

&lt;p&gt;This is also why the phrase "without breaking SLOs" matters so much. It prevents the conversation from drifting into generic AI optimism. In a platform context, any serious use of agents has to be evaluated against reliability outcomes. Faster workflows are not automatically better workflows if they increase incident risk, reduce operator understanding, or blur accountability.&lt;/p&gt;

&lt;h2&gt;Guardrails are not obstacles — they are the design&lt;/h2&gt;

&lt;p&gt;A credible workflow therefore needs guardrails. At minimum, that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear tool boundaries and scoped permissions&lt;/li&gt;
&lt;li&gt;Strong context about the system being changed&lt;/li&gt;
&lt;li&gt;Validation before merge&lt;/li&gt;
&lt;li&gt;Human review for sensitive or high-impact changes&lt;/li&gt;
&lt;li&gt;Explicit rollback paths&lt;/li&gt;
&lt;li&gt;Traceability from original intent to final diff&lt;/li&gt;
&lt;/ul&gt;
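&lt;p&gt;The first and fourth items can be sketched as a simple policy check. The tool names and policy shape here are illustrative assumptions, not a real framework's configuration:&lt;/p&gt;

```python
# Illustrative guardrail sketch: a per-tool allowlist with an explicit
# human-approval flag for high-impact actions. Tool names are assumptions.

POLICY = {
    "read_file":          {"allowed": True,  "needs_human": False},
    "open_pull_request":  {"allowed": True,  "needs_human": False},
    "apply_terraform":    {"allowed": True,  "needs_human": True},
    "merge_pull_request": {"allowed": False, "needs_human": True},
}

def authorize(tool_name, human_approved=False):
    """Allow a tool call only if policy permits it and, when required,
    a human has explicitly approved this specific invocation."""
    rule = POLICY.get(tool_name)
    if rule is None or not rule["allowed"]:
        return False
    if rule["needs_human"] and not human_approved:
        return False
    return True
```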

&lt;p&gt;This guardrail-heavy framing is not anti-agent. It is what makes agents useful in production environments. Anthropic's own materials emphasize that agents work best when they can interact with the environment, test their assumptions, and operate inside structured limits rather than open-ended autonomy.&lt;/p&gt;

&lt;h2&gt;The real opportunity&lt;/h2&gt;

&lt;p&gt;That is why I think the most interesting future for Claude in Platform Engineering is not "AI writes infrastructure code." It is "AI helps translate operational work into changes that humans can evaluate, approve, and ship with confidence."&lt;/p&gt;

&lt;p&gt;Seen this way, Claude is not just a writing assistant or coding assistant. It starts to look more like an operational interface — a system that sits between intent and execution, helping teams move from ticket to PR with more context, better traceability, and less manual translation overhead.&lt;/p&gt;

&lt;p&gt;Not replacing engineers.&lt;/p&gt;

&lt;p&gt;Not removing judgment.&lt;/p&gt;

&lt;p&gt;But reducing the distance between work that needs to happen and changes that are safe enough to review, govern, and deploy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How are you thinking about AI agents in your platform workflows? Are you already using them for operational tasks, or still evaluating the risk?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>platformengineering</category>
      <category>sre</category>
      <category>aiagents</category>
      <category>automation</category>
    </item>
    <item>
      <title>AI Observability: the problem nobody is solving well in 2026</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Fri, 03 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/ai-observability-the-problem-nobody-is-solving-well-in-2026-5959</link>
      <guid>https://dev.to/carlosmoradev/ai-observability-the-problem-nobody-is-solving-well-in-2026-5959</guid>
      <description>&lt;p&gt;We’ve spent years building AIOps — using AI to observe infrastructure. But there’s a more urgent problem taking shape: who observes the AI itself?&lt;/p&gt;

&lt;p&gt;Monitoring hallucinations, prompt drift, MCP call latency, and inference costs in production is the new frontier of modern SRE. And almost nobody has a complete stack for it.&lt;/p&gt;

&lt;h2 id="the-monitoring-gap-is-structural-not-tactical"&gt;The monitoring gap is structural, not tactical&lt;/h2&gt;

&lt;p&gt;Your current observability stack was built for deterministic systems. A service either returns 200 or it doesn’t. Latency is measurable. Error rates are countable. SLOs make sense because “correct behavior” is definable.&lt;/p&gt;

&lt;p&gt;AI systems break all of these assumptions.&lt;/p&gt;

&lt;p&gt;The failure mode isn’t a 500 error — it’s a confident hallucination delivered with perfect latency and a 200 status code. Your dashboards are green. Your AI is producing garbage. A Fortune 100 bank misrouted 18% of critical cases without triggering a single alert.&lt;/p&gt;

&lt;p&gt;This isn’t a tooling gap you can close by adding a plugin to your existing stack. It’s a paradigm problem.&lt;/p&gt;

&lt;h2 id="the-current-landscape-15-tools-zero-consensus"&gt;The current landscape: 15+ tools, zero consensus&lt;/h2&gt;

&lt;p&gt;The AI observability market hit $510M in 2024, growing at 32% annually. That sounds like a mature space. It isn’t.&lt;/p&gt;

&lt;p&gt;The landscape splits into two camps that don’t talk to each other:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-native platforms&lt;/strong&gt; (Langfuse, LangSmith, Arize Phoenix, Helicone, Braintrust) understand prompts, tokens, and semantic evaluation — but have no context about your infrastructure, your SLOs, or your cost centers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional APM vendors&lt;/strong&gt; (Datadog, New Relic, Dynatrace, Grafana) understand infrastructure deeply — but treat AI as just another microservice, missing everything that makes AI systems different.&lt;/p&gt;

&lt;p&gt;OpenTelemetry’s GenAI Semantic Conventions are the closest thing to a unifying standard — still experimental as of Q1 2026, not GA. Every major vendor has adopted them as a wire format while building proprietary analytics on top. The instrumentation layer is converging. Everything above it is fragmented.&lt;/p&gt;

&lt;h2 id="four-gaps-practitioners-cant-close"&gt;Four gaps practitioners can’t close&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Inference cost is invisible at the decision layer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI inference cost is generated where routing decisions happen — model selection, retry logic, token budgets, context window management. Your observability monitors the infrastructure layer. These are different layers, and the gap between them is expensive.&lt;/p&gt;

&lt;p&gt;A typical pattern: a poorly optimized prompt costs more per day than the entire Kubernetes cluster running the application. One team discovered they were paying an LLM to be reminded of its job — sending the same system instructions hundreds of times daily. Reasoning models like o3 add internal “thinking tokens” that inflate consumption silently. Output tokens cost 3–10x more than input tokens.&lt;/p&gt;

&lt;p&gt;What looks like $500/month in a pilot becomes $15,000 at production scale. Before accounting for growth.&lt;/p&gt;
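&lt;p&gt;The arithmetic is easy to sketch. The prices below are illustrative assumptions, not any vendor’s rate card:&lt;/p&gt;

```python
# Back-of-the-envelope sketch of the repeated-system-prompt waste
# described above. Prices are illustrative assumptions only.

PRICE_PER_1K_INPUT = 0.003    # USD, assumed
PRICE_PER_1K_OUTPUT = 0.015   # USD, assumed 5x input (within the 3-10x range)

def monthly_cost(requests_per_day, input_tokens, output_tokens):
    per_request = (input_tokens / 1000 * PRICE_PER_1K_INPUT
                   + output_tokens / 1000 * PRICE_PER_1K_OUTPUT)
    return requests_per_day * 30 * per_request

# Resending a 2,000-token system prompt on 50,000 daily calls costs about
# 9,000 USD a month before a single user token is counted:
prompt_overhead = monthly_cost(50_000, 2_000, 0)
```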

&lt;p&gt;&lt;strong&gt;2. MCP traces break at the boundary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;97 million monthly SDK downloads. 5,800+ MCP servers in the ecosystem. And a fundamental tracing problem: when a user request flows from Agent → LLM Provider → MCP Server → External Tool, the trace breaks at the MCP boundary. Two disconnected traces. No correlation. No end-to-end visibility.&lt;/p&gt;

&lt;p&gt;Sentry shipped the first dedicated MCP monitoring tool in mid-2025 — after running their own MCP server at 50 million requests per month and discovering random user timeouts with no results and no errors. No way to even know how many users were affected.&lt;/p&gt;

&lt;p&gt;OpenTelemetry’s MCP semantic conventions remain in draft.&lt;/p&gt;
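&lt;p&gt;Until those conventions land, one pragmatic stopgap is to carry your own correlation id across the boundary so the two disconnected traces can at least be joined afterward. A sketch (the metadata shape is an assumption of this example, not a claim about the MCP specification):&lt;/p&gt;

```python
import uuid

# Illustrative stopgap: attach a correlation id to outgoing MCP request
# params and recover it on the server side, so agent-side and server-side
# traces can be joined in post-processing. The "_meta" shape used here is
# an assumption of this sketch.

def with_correlation(method, params, correlation_id=None):
    correlation_id = correlation_id or uuid.uuid4().hex
    enriched = dict(params)
    enriched["_meta"] = {"correlation_id": correlation_id}
    return {"method": method, "params": enriched}, correlation_id

def correlation_of(request):
    return request.get("params", {}).get("_meta", {}).get("correlation_id")
```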

&lt;p&gt;&lt;strong&gt;3. Silent semantic failures don’t trigger alerts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A single user request can trigger 15+ LLM calls across embedding generation, vector retrieval, context assembly, reasoning steps, and response synthesis. Every traditional metric can look healthy while the output is meaningless.&lt;/p&gt;

&lt;p&gt;44% of organizations still rely on manual methods to monitor AI agent interactions. The current state-of-the-art for detecting semantic failures in production is largely “a human reads logs and guesses.” Most teams discover problems through downstream business metrics — weeks after the damage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. SLOs don’t exist for non-deterministic systems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the open question practitioners keep returning to. Traditional SRE practice assumes you can define expected behavior, measure deviation, and set error budgets. When the same input can legitimately produce different outputs, when “correct” requires semantic judgment, and when model providers silently update weights underneath you — the entire SLI/SLO framework needs rethinking.&lt;/p&gt;

&lt;p&gt;Nobody has solved this. The conversation is still at the “how do we even frame the problem” stage.&lt;/p&gt;

&lt;h2 id="the-cost-paradox"&gt;The cost paradox&lt;/h2&gt;

&lt;p&gt;Adding AI monitoring to Datadog increases observability bills by 40–200%. A typical RAG pipeline generates 10–50x more telemetry than an equivalent API call. LangSmith customers routinely sample down to 0.1% of production traffic to control costs.&lt;/p&gt;

&lt;p&gt;You end up paying significantly more to observe significantly less.&lt;/p&gt;

&lt;p&gt;Gartner predicts that more than 40% of agentic AI projects will be canceled by 2027. The Dynatrace 2026 Pulse of Agentic AI survey found that 51% of engineering leaders cite limited visibility into agent behavior as their top technical blocker.&lt;/p&gt;

&lt;h2 id="whats-actually-converging"&gt;What’s actually converging&lt;/h2&gt;

&lt;p&gt;OpenTelemetry is winning the instrumentation war. The GenAI SIG has defined semantic conventions for LLM spans, agent spans, tool execution, token metrics, and evaluation events. Every major vendor accepts OTel GenAI spans.&lt;/p&gt;

&lt;p&gt;That’s the one genuine convergence story. Everything above the wire format remains fragmented — comparable to cloud monitoring circa 2010–2012, though a shared wire format this early may make consolidation happen faster this time around.&lt;/p&gt;

&lt;h2 id="the-practitioner-reality"&gt;The practitioner reality&lt;/h2&gt;

&lt;p&gt;This is the infrastructure monitoring crisis of 2010 all over again. The stakes are higher. The systems are non-deterministic. The failure modes are semantic rather than structural.&lt;/p&gt;

&lt;p&gt;If you’re an SRE or Platform Engineer who’s been handed responsibility for AI systems without the tools to properly operate them — that’s the actual state of the industry, not a gap in your skills or your team’s preparation.&lt;/p&gt;

&lt;p&gt;The tooling will converge. OpenTelemetry will help. The ecosystem is moving.&lt;/p&gt;

&lt;p&gt;But right now, in early 2026, most teams are flying partially blind — and the first step is naming the problem clearly enough to start solving it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Data points: Dynatrace 2026 Pulse of Agentic AI (919 leaders), KubeCon Atlanta 2025, OneUptime AI Observability Cost Analysis, Sentry MCP Server Monitoring launch, Gartner 2025–2027 predictions, Pydantic AI observability pricing analysis.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>sre</category>
      <category>observability</category>
      <category>aiagents</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>Multi-Layer Cost Controls for Cloud Data Platforms</title>
      <dc:creator>Carlos Mario Mora Restrepo</dc:creator>
      <pubDate>Sat, 10 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/carlosmoradev/multi-layer-cost-controls-for-cloud-data-platforms-24ac</link>
      <guid>https://dev.to/carlosmoradev/multi-layer-cost-controls-for-cloud-data-platforms-24ac</guid>
      <description>&lt;p&gt;Managing costs in cloud data platforms is challenging, especially in sandbox environments where analysts experiment freely. A single misconfigured query can run for hours, consuming resources and exploding budgets.&lt;/p&gt;

&lt;p&gt;After experiencing several unexpected cost spikes in a Snowflake sandbox environment, I implemented a &lt;strong&gt;4-layer defense strategy&lt;/strong&gt; that reduced overages by 60% while maintaining analyst productivity.&lt;/p&gt;

&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Sandbox environments are critical for data teams. Analysts need freedom to experiment, test queries, and explore data without the constraints of production. However, this freedom comes with risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Forgotten queries:&lt;/strong&gt; Analysts start a query, switch tasks, and forget to terminate it&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inefficient SQL:&lt;/strong&gt; Experimentation means suboptimal queries that scan entire tables&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Large compute:&lt;/strong&gt; “Let me just use XLARGE for this one query” becomes the default&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No accountability:&lt;/strong&gt; Costs are aggregated, so individual users don’t see their impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional approaches fail:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Budget alerts only:&lt;/strong&gt; By the time you get the alert, you’ve already spent the money&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query timeouts:&lt;/strong&gt; Legitimate long-running analytics get killed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Restrictive permissions:&lt;/strong&gt; Kills innovation and analyst productivity&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The 4-Layer Defense Strategy&lt;/h2&gt;

&lt;p&gt;Instead of relying on a single mechanism, I implemented &lt;strong&gt;redundant layers&lt;/strong&gt; so that if one fails, others catch the issue.&lt;/p&gt;

&lt;h3&gt;Layer 1: Warehouse Configuration (Prevention)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Aggressive auto-suspend and right-sizing:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
&lt;span class="c1"&gt;-- Create warehouse with 60-second auto-suspend&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;WAREHOUSE&lt;/span&gt; &lt;span class="n"&gt;SANDBOX_WAREHOUSE&lt;/span&gt;
  &lt;span class="n"&gt;WAREHOUSE_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'SMALL'&lt;/span&gt;
  &lt;span class="n"&gt;AUTO_SUSPEND&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
  &lt;span class="n"&gt;AUTO_RESUME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;
  &lt;span class="n"&gt;INITIALLY_SUSPENDED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why 60 seconds?&lt;/strong&gt; Most analysts iterate on queries with 1-2 minute gaps. 60 seconds catches forgotten warehouses while allowing workflow continuity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Default to SMALL:&lt;/strong&gt; Unless there’s a documented need, sandbox warehouses start at SMALL. Analysts can scale up temporarily, but the default prevents “XLARGE for everything.”&lt;/p&gt;
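&lt;p&gt;One way to make “scale up temporarily” safe is a context manager that always scales back down, even when the query fails. This is a sketch: it accepts any cursor-like object with an &lt;code&gt;execute&lt;/code&gt; method, and the warehouse and size names are placeholders to adapt:&lt;/p&gt;

```python
from contextlib import contextmanager

@contextmanager
def temporary_warehouse_size(cursor, warehouse, size):
    """Scale a warehouse up for one block of work, then restore SMALL.

    `cursor` is anything with an execute(sql) method (e.g. a
    snowflake.connector cursor). Sketch only -- add identifier
    validation before using with untrusted input.
    """
    cursor.execute(f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}'")
    try:
        yield
    finally:
        # The finally block guarantees scale-down even if the query fails.
        cursor.execute(f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = 'SMALL'")

# Usage with a stand-in cursor that just records the SQL it sees:
class RecordingCursor:
    def __init__(self):
        self.statements = []
    def execute(self, sql):
        self.statements.append(sql)

cur = RecordingCursor()
with temporary_warehouse_size(cur, "SANDBOX_WAREHOUSE", "XLARGE"):
    cur.execute("SELECT 1")  # the expensive one-off query goes here
print(cur.statements[-1])  # the scale-down always runs last
```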

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average warehouse utilization: 15 minutes/day per analyst (down from 2+ hours)&lt;/li&gt;
&lt;li&gt;70% reduction in idle warehouse costs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Layer 2: Resource Monitors (Guardrails)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Per-user budget enforcement:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
&lt;span class="c1"&gt;-- Create resource monitor for sandbox user&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;RESOURCE&lt;/span&gt; &lt;span class="n"&gt;MONITOR&lt;/span&gt; &lt;span class="n"&gt;USER_MONITOR&lt;/span&gt;
  &lt;span class="n"&gt;CREDIT_QUOTA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;MONTHLY_BUDGET&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="n"&gt;FREQUENCY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MONTHLY&lt;/span&gt;
  &lt;span class="n"&gt;START_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;IMMEDIATELY&lt;/span&gt;
  &lt;span class="n"&gt;TRIGGERS&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt; &lt;span class="n"&gt;PERCENT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTIFY&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="n"&gt;PERCENT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTIFY&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt; &lt;span class="n"&gt;PERCENT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTIFY&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="n"&gt;PERCENT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="n"&gt;SUSPEND_IMMEDIATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Assign to user's warehouse&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="n"&gt;WAREHOUSE&lt;/span&gt; &lt;span class="n"&gt;SANDBOX_WAREHOUSE&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;RESOURCE_MONITOR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;USER_MONITOR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why per-user monitors?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Individual accountability (users see their own consumption)&lt;/li&gt;
&lt;li&gt;Graceful degradation (one user hitting limit doesn’t affect others)&lt;/li&gt;
&lt;li&gt;Data for user education (who needs query optimization training?)&lt;/li&gt;
&lt;/ul&gt;
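&lt;p&gt;Provisioning one monitor per user is easy to script. The sketch below renders the DDL from the template above; the monitor and warehouse naming convention (&lt;code&gt;RM_SANDBOX_*&lt;/code&gt;, &lt;code&gt;SANDBOX_WH_*&lt;/code&gt;) is my assumption, adapt it to yours:&lt;/p&gt;

```python
def user_monitor_ddl(username, monthly_credits):
    """Render per-user resource-monitor DDL mirroring the template above.

    The one-warehouse-per-analyst naming scheme (SANDBOX_WH_{USER},
    RM_SANDBOX_{USER}) is an assumption for illustration.
    """
    monitor = f"RM_SANDBOX_{username.upper()}"
    warehouse = f"SANDBOX_WH_{username.upper()}"
    return [
        f"CREATE RESOURCE MONITOR {monitor} "
        f"CREDIT_QUOTA = {monthly_credits} FREQUENCY = MONTHLY "
        f"START_TIMESTAMP = IMMEDIATELY "
        f"TRIGGERS ON 75 PERCENT DO NOTIFY "
        f"ON 90 PERCENT DO NOTIFY "
        f"ON 95 PERCENT DO NOTIFY "
        f"ON 100 PERCENT DO SUSPEND_IMMEDIATE;",
        f"ALTER WAREHOUSE {warehouse} SET RESOURCE_MONITOR = {monitor};",
    ]

# Generate the two statements for one analyst:
for stmt in user_monitor_ddl("alice", monthly_credits=20):
    print(stmt)
```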

&lt;p&gt;&lt;strong&gt;Progressive notifications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;75%: “You’re on track, no action needed”&lt;/li&gt;
&lt;li&gt;90%: “Slow down, review your queries”&lt;/li&gt;
&lt;li&gt;95%: “Critical - optimize or your warehouse suspends at 100%”&lt;/li&gt;
&lt;li&gt;100%: Immediate suspension (prevents overage)&lt;/li&gt;
&lt;/ul&gt;
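&lt;p&gt;The notification logic itself is a simple threshold ladder. A minimal sketch (message delivery via email or Slack is left out):&lt;/p&gt;

```python
def budget_notification(percent_used):
    """Map consumed-budget percentage to the message tiers above.

    Thresholds mirror the resource-monitor triggers; delivery
    (email, Slack) is out of scope for this sketch.
    """
    if percent_used >= 100:
        return "Warehouse suspended (budget exhausted)"
    if percent_used >= 95:
        return "Critical - optimize or your warehouse suspends at 100%"
    if percent_used >= 90:
        return "Slow down, review your queries"
    if percent_used >= 75:
        return "You're on track, no action needed"
    return None  # below 75%: stay quiet

print(budget_notification(84))  # You're on track, no action needed
```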

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero budget overages since implementation&lt;/li&gt;
&lt;li&gt;Users self-optimize before hitting 90% threshold&lt;/li&gt;
&lt;li&gt;Average spending maintained well within budget limits&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Layer 3: Connection Pooling (Efficiency)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Reuse connections to reduce cold-start costs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Python implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;snowflake.connector&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;contextmanager&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SnowflakeConnectionPool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="nd"&gt;@contextmanager&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Reuse connection if available, create new if needed&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_closed&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;snowflake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Connection failed, reset for next attempt
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt;

&lt;span class="c1"&gt;# Usage
&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SnowflakeConnectionPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each new connection incurs warehouse resume cost (if auto-suspended)&lt;/li&gt;
&lt;li&gt;Connection pooling achieves 80% reuse rate&lt;/li&gt;
&lt;li&gt;Warehouse stays “warm” during analyst work sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;80% connection reuse rate&lt;/li&gt;
&lt;li&gt;30% reduction in warehouse resume events&lt;/li&gt;
&lt;li&gt;Faster query execution (no cold-start delay)&lt;/li&gt;
&lt;/ul&gt;
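&lt;p&gt;To know your reuse rate, you have to measure it. The sketch below extends the single-connection wrapper with checkout/creation counters; it takes a generic connection factory (a stand-in class here) so the metric logic runs without a live warehouse:&lt;/p&gt;

```python
from contextlib import contextmanager

class MeteredConnectionPool:
    """Single-connection reuse wrapper that also tracks its reuse rate.

    Takes any zero-argument factory (e.g. a lambda wrapping
    snowflake.connector.connect) so the metric logic is testable
    without a live warehouse. Sketch, not production pooling.
    """
    def __init__(self, factory):
        self._factory = factory
        self._connection = None
        self.checkouts = 0
        self.creations = 0

    @contextmanager
    def get_connection(self):
        self.checkouts += 1
        if self._connection is None or self._connection.is_closed():
            self._connection = self._factory()
            self.creations += 1
        try:
            yield self._connection
        except Exception:
            self._connection = None  # drop the bad connection
            raise

    @property
    def reuse_rate(self):
        if self.checkouts == 0:
            return 0.0
        return 1 - self.creations / self.checkouts

# Stand-in connection for demonstration:
class FakeConnection:
    def is_closed(self):
        return False

pool = MeteredConnectionPool(FakeConnection)
for _ in range(5):
    with pool.get_connection():
        pass
print(pool.reuse_rate)  # 0.8 -- one creation, four reuses
```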

&lt;h3&gt;Layer 4: User Education (Culture)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Monthly cost transparency:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Send each analyst their personal cost report:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Your Snowflake Usage - December 2025

Credits used: XX of YY (84%)
Queries executed: X,XXX
Most expensive query: X.X credits (view details)

Cost breakdown:
- Compute: XX credits
- Storage: X credits
- Data transfer: X credit

Top 3 expensive queries:
1. Full table scan on LARGE_TABLE (X.X credits)
   → Optimization: Add WHERE clause to filter data
2. Cartesian join (X.X credits)
   → Optimization: Add JOIN condition
3. Repeated aggregation (X.X credits)
   → Optimization: Materialize intermediate results

Tips for next month:
- Use LIMIT when exploring data
- Add filters before aggregations
- Check query profile before large runs

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
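&lt;p&gt;A report like the one above can be rendered from a handful of per-user numbers. The field names in this sketch are assumptions; in practice the inputs come from aggregating Snowflake’s &lt;code&gt;ACCOUNT_USAGE&lt;/code&gt; views per user:&lt;/p&gt;

```python
def render_usage_report(user, month, credits_used, quota, top_queries):
    """Render a plain-text report like the one above.

    Field names are assumptions for illustration; real inputs would be
    aggregated per user from Snowflake's ACCOUNT_USAGE views (e.g.
    QUERY_HISTORY, WAREHOUSE_METERING_HISTORY).
    """
    pct = round(100 * credits_used / quota)
    lines = [
        f"Your Snowflake Usage - {month}",
        "",
        f"Credits used: {credits_used} of {quota} ({pct}%)",
        "",
        "Top expensive queries:",
    ]
    for rank, (description, credits, tip) in enumerate(top_queries, start=1):
        lines.append(f"{rank}. {description} ({credits} credits)")
        lines.append(f"   -) Optimization: {tip}")
    return "\n".join(lines)

report = render_usage_report(
    "alice", "December 2025", credits_used=42, quota=50,
    top_queries=[("Full table scan on LARGE_TABLE", 6.3,
                  "Add WHERE clause to filter data")],
)
print(report)
```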



&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;40% reduction in inefficient query patterns&lt;/li&gt;
&lt;li&gt;Users proactively optimize before hitting budget limits&lt;/li&gt;
&lt;li&gt;Cultural shift: “cost-aware” becomes default mindset&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Combined Impact&lt;/h2&gt;

&lt;p&gt;The 4-layer strategy delivered:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;60% reduction&lt;/strong&gt; in unexpected sandbox overages&lt;/li&gt;
&lt;li&gt;Average spending per user maintained within budget&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero budget overruns&lt;/strong&gt; since implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;80% connection pooling&lt;/strong&gt; efficiency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;70% reduction&lt;/strong&gt; in idle warehouse costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;40% improvement&lt;/strong&gt; in query efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cultural metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users self-optimize at 90% threshold (before suspension)&lt;/li&gt;
&lt;li&gt;Proactive query profiling becomes standard practice&lt;/li&gt;
&lt;li&gt;Cost awareness embedded in daily workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why Redundancy Matters&lt;/h2&gt;

&lt;p&gt;Each layer catches different failure modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Scenario&lt;/th&gt;
&lt;th&gt;Layer That Catches It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Analyst forgets to terminate warehouse&lt;/td&gt;
&lt;td&gt;Layer 1: Auto-suspend after 60 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inefficient query runs for hours&lt;/td&gt;
&lt;td&gt;Layer 2: Resource monitor suspends at 100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Many short queries throughout the day&lt;/td&gt;
&lt;td&gt;Layer 3: Connection pooling reduces resume costs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User habitually writes expensive queries&lt;/td&gt;
&lt;td&gt;Layer 4: Monthly report triggers education&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Healthcare compliance bonus:&lt;/strong&gt; In regulated environments, this approach provides audit trails showing cost governance without restricting legitimate data access.&lt;/p&gt;

&lt;h2&gt;Implementation Checklist&lt;/h2&gt;

&lt;p&gt;Want to implement this in your organization? Here’s the checklist:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 1: Infrastructure (Layers 1-2)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configure warehouse auto-suspend (aggressive timing)&lt;/li&gt;
&lt;li&gt;Set default warehouse size to SMALL&lt;/li&gt;
&lt;li&gt;Create per-user resource monitors with monthly quotas&lt;/li&gt;
&lt;li&gt;Set up progressive notification thresholds (75%, 90%, 95%, 100%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 2: Optimization (Layer 3)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement connection pooling in application code&lt;/li&gt;
&lt;li&gt;Measure connection reuse rate&lt;/li&gt;
&lt;li&gt;Monitor warehouse resume events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 3: Culture (Layer 4)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build monthly cost report automation&lt;/li&gt;
&lt;li&gt;Include query optimization recommendations&lt;/li&gt;
&lt;li&gt;Send first round of reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Week 4: Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dashboard for cost trends&lt;/li&gt;
&lt;li&gt;Alert on anomalies (user exceeding historical average)&lt;/li&gt;
&lt;li&gt;Quarterly review and adjustment&lt;/li&gt;
&lt;/ul&gt;
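&lt;p&gt;The anomaly alert can start as a simple rule: flag a user whose daily spend sits well outside their own history. The thresholds in this sketch (2 sigma and 1.5x the mean) are illustrative defaults to tune against your own spend distribution:&lt;/p&gt;

```python
from statistics import mean, pstdev

def is_spend_anomaly(history, today, min_sigma=2.0, min_ratio=1.5):
    """Flag daily credit spend that breaks from a user's own history.

    Requires both conditions (2 sigma above the mean AND 1.5x the mean)
    so that noisy low-spend users don't trigger constant alerts.
    Thresholds are illustrative defaults, not recommendations.
    """
    baseline = mean(history)
    spread = pstdev(history)
    above_ratio = today > baseline * min_ratio
    above_sigma = today > baseline + min_sigma * spread
    return above_ratio and above_sigma

# Hypothetical analyst spending ~2.2 credits/day:
history = [2.0, 2.5, 1.8, 2.2, 2.4, 2.1]
print(is_spend_anomaly(history, today=2.6))  # False: normal variation
print(is_spend_anomaly(history, today=9.0))  # True: investigate
```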

&lt;h2&gt;Lessons Learned&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;No single mechanism is enough:&lt;/strong&gt; Redundant layers provide resilience&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Make costs visible:&lt;/strong&gt; Users can’t optimize what they can’t see&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Default to small:&lt;/strong&gt; Scaling up is easier than justifying scale-down&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Progressive alerts work:&lt;/strong&gt; Users self-correct before hitting hard limits&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Culture beats controls:&lt;/strong&gt; Education changes behavior permanently&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Technologies Used&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Snowflake resource monitors&lt;/li&gt;
&lt;li&gt;Python connection pooling&lt;/li&gt;
&lt;li&gt;Automated reporting (SQL + pandas)&lt;/li&gt;
&lt;li&gt;TOML configuration management&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Want to discuss cost optimization strategies?&lt;/strong&gt; Connect with me on &lt;a href="https://linkedin.com/in/carlosmoradev" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/projects/snowflake-governance"&gt;Multi-Account Data Warehouse Governance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>finops</category>
      <category>costoptimization</category>
      <category>snowflake</category>
      <category>dataplatforms</category>
    </item>
  </channel>
</rss>
