<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Zenith AI Labs</title>
    <description>The latest articles on DEV Community by Zenith AI Labs (@zenithai).</description>
    <link>https://dev.to/zenithai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3602891%2F228e12eb-29eb-427e-ad35-7ad30298678f.png</url>
      <title>DEV Community: Zenith AI Labs</title>
      <link>https://dev.to/zenithai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zenithai"/>
    <language>en</language>
    <item>
      <title>[Boost]</title>
      <dc:creator>Zenith AI Labs</dc:creator>
      <pubDate>Thu, 02 Apr 2026 21:43:36 +0000</pubDate>
      <link>https://dev.to/zenithai/-24je</link>
      <guid>https://dev.to/zenithai/-24je</guid>
      <description>&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://dev.to/arcade/how-to-build-a-secure-whatsapp-ai-assistant-with-arcade-and-claude-code-openclaw-alternative-3f4f" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36shmpt52ew510xoab6e.png" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://dev.to/arcade/how-to-build-a-secure-whatsapp-ai-assistant-with-arcade-and-claude-code-openclaw-alternative-3f4f" rel="noopener noreferrer" class="c-link"&gt;
            How to build a secure WhatsApp AI assistant with Arcade and Claude Code (OpenClaw alternative) - DEV Community
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            I texted "prep me for my 2pm" on WhatsApp. Thirty seconds later, my phone buzzed back with a...
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8j7kvp660rqzt99zui8e.png"&gt;
          dev.to
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Zenith AI Labs</dc:creator>
      <pubDate>Thu, 01 Jan 2026 19:54:19 +0000</pubDate>
      <link>https://dev.to/zenithai/-1gp9</link>
      <guid>https://dev.to/zenithai/-1gp9</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/manveer_chawla_64a7283d5a" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3271159%2F5d4c3ad5-7832-4565-bf5c-b790ca7ea6ff.jpg" alt="manveer_chawla_64a7283d5a"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/manveer_chawla_64a7283d5a/your-ai-sre-needs-better-observability-not-bigger-models-23e4" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Your AI SRE needs better observability, not bigger models.&lt;/h2&gt;
      &lt;h3&gt;Manveer Chawla ・ Jan 1&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#sre&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#devops&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#observability&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>sre</category>
      <category>ai</category>
      <category>devops</category>
      <category>observability</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Zenith AI Labs</dc:creator>
      <pubDate>Sat, 20 Dec 2025 23:54:46 +0000</pubDate>
      <link>https://dev.to/zenithai/-38j6</link>
      <guid>https://dev.to/zenithai/-38j6</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/composiodev" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9157%2Fdf89ab52-2d48-474b-a971-087232b09f19.png" alt="Composio" width="638" height="888"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3271159%2F5d4c3ad5-7832-4565-bf5c-b790ca7ea6ff.jpg" alt="" width="96" height="96"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/composiodev/outgrowing-zapier-make-and-n8n-for-ai-agents-the-production-migration-blueprint-5g4j" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Outgrowing Zapier, Make, and n8n for AI Agents: The Production Migration Blueprint&lt;/h2&gt;
      &lt;h3&gt;Manveer Chawla for Composio ・ Dec 20&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#productivity&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#agents&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#tooling&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>productivity</category>
      <category>agents</category>
      <category>tooling</category>
      <category>ai</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Zenith AI Labs</dc:creator>
      <pubDate>Sat, 20 Dec 2025 21:03:08 +0000</pubDate>
      <link>https://dev.to/zenithai/-3k18</link>
      <guid>https://dev.to/zenithai/-3k18</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/composiodev" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9157%2Fdf89ab52-2d48-474b-a971-087232b09f19.png" alt="Composio" width="638" height="888"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3271159%2F5d4c3ad5-7832-4565-bf5c-b790ca7ea6ff.jpg" alt="" width="96" height="96"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/composiodev/enterprise-ai-agent-management-governance-security-control-guide-2026-3f60" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Enterprise AI Agent Management: Governance, Security &amp;amp; Control Guide (2026)&lt;/h2&gt;
      &lt;h3&gt;Manveer Chawla for Composio ・ Dec 20&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#governance&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#security&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#agents&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>governance</category>
      <category>security</category>
      <category>agents</category>
    </item>
    <item>
      <title>A practical guide to observability TCO and cost reduction</title>
      <dc:creator>Zenith AI Labs</dc:creator>
      <pubDate>Wed, 03 Dec 2025 18:04:03 +0000</pubDate>
      <link>https://dev.to/zenithai/a-practical-guide-to-observability-tco-and-cost-reduction-1hhj</link>
      <guid>https://dev.to/zenithai/a-practical-guide-to-observability-tco-and-cost-reduction-1hhj</guid>
      <description>&lt;p&gt;For many engineering leaders, the observability bill has become one of the largest infrastructure expenses. OpenAI reportedly spends &lt;a href="https://finance.yahoo.com/news/guggenheim-downgrades-datadog-fears-openai-135303763.html" rel="noopener noreferrer"&gt;$170 million annually on Datadog alone&lt;/a&gt;. While most companies aren't operating at OpenAI's scale, teams consistently report that observability tools &lt;a href="https://www.reddit.com/r/sre/comments/186nnh6/benchmarks_for_observability_spend/" rel="noopener noreferrer"&gt;consume a significant portion of their total cloud spend&lt;/a&gt;, and the trend only goes in one direction: up.&lt;/p&gt;

&lt;p&gt;The root cause? SaaS platforms charge per gigabyte ingested, per host monitored, or per high-cardinality metric tracked. The more visibility you need, the more you pay, so you're stuck choosing between understanding your systems and staying within budget. Under this model, “sending everything” is simply unaffordable.&lt;/p&gt;

&lt;p&gt;You can break this cycle by changing how you pay for observability. Instead of variable costs tied to data volume, you can move to predictable infrastructure costs. This guide shows you exactly how to calculate and reduce your &lt;a href="https://clickhouse.com/resources/engineering/observability" rel="noopener noreferrer"&gt;observability&lt;/a&gt; Total Cost of Ownership (TCO) using a unified architecture powered by ClickHouse and the open-source ClickStack.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Key takeaways&lt;/strong&gt;&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Observability costs are driven by misaligned models: The primary problem is punitive SaaS pricing based on data ingestion or per-host metrics, forcing a choice between visibility and budget.
&lt;/li&gt;
&lt;li&gt;Incumbent architectures are inefficient: Traditional tools built on search indexes are ill-suited for observability workloads. They suffer from massive storage overhead and fail at high-cardinality analytics, causing costs to explode.
&lt;/li&gt;
&lt;li&gt;Columnar architecture is the solution: Shifting to a columnar database like ClickHouse is the single biggest cost-reduction lever. It provides superior compression (15-50x) and excels at high-cardinality queries that cripple other systems.
&lt;/li&gt;
&lt;li&gt;A true TCO must include "people costs": A self-hosted stack is not free. The "People TCO" for engineering maintenance and on-call duties can add $1,600-$4,800 per month, often making a managed service like ClickHouse Cloud more cost-effective, especially for bursty workloads.
&lt;/li&gt;
&lt;li&gt;A unified stack (ClickStack) eliminates silos: Adopting a unified architecture like the open-source ClickStack consolidates logs, metrics, and traces, eliminating data duplication and the high TCO of managing multiple, federated systems.
&lt;/li&gt;
&lt;li&gt;Significant savings are achievable: Industry leaders like Anthropic, Didi (30% cost cut, 4x faster), and Tesla (1 quadrillion rows ingested) have used this approach to achieve substantial savings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;&lt;strong&gt;Why your observability bill is exploding (and it's not your fault)&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;The explosion in observability costs comes down to &lt;a href="https://clickhouse.com/resources/engineering/best-open-source-observability-solutions#what-makes-the-database-the-heart-of-a-modern-observability-stack" rel="noopener noreferrer"&gt;architectural failure&lt;/a&gt;, not budget failure. Two core problems drive these costs: inefficient technology and misaligned pricing models.&lt;/p&gt;

&lt;p&gt;Many traditional observability platforms rely on search indexes like Lucene. While these work well for text search, they're fundamentally mismatched for the &lt;a href="https://clickhouse.com/docs/use-cases/time-series/analysis-functions" rel="noopener noreferrer"&gt;aggregation-heavy analytical workloads&lt;/a&gt; that modern observability demands.&lt;/p&gt;

&lt;p&gt;This mismatch creates two major cost drivers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Massive storage and operational overhead: The Lucene inverted index creates huge storage overhead, often multiplying data size, and compresses data poorly. A team ingesting 100TB daily could face storage costs exceeding $100,000 per month. Worse, this architecture is fragile at scale. A single node failure can trigger a massive data rebalancing process that throttles the cluster and can take days to recover, severely impacting stability.
&lt;/li&gt;
&lt;li&gt;The high-cardinality crisis: Modern distributed systems generate telemetry rich with unique dimensions (user_id, session_id, pod_name). Systems like Prometheus struggle because every unique combination of labels creates a new time series, leading to an explosion in memory usage and slow queries. Index-based systems crumble under this load. Query times balloon, memory errors appear, and clusters become unstable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Beyond architecture, misaligned pricing models penalize scale. SaaS vendors often charge a "tax" on visibility: you pay for ingestion, but to keep that data indexed and searchable requires a separate, expensive retention SKU. Furthermore, pricing models based on "per-host" or "per-container" counts punish modern microservices architectures, where infrastructure is ephemeral and highly distributed.&lt;/p&gt;

&lt;p&gt;The fix requires two changes: switch to a &lt;a href="https://clickhouse.com/resources/engineering/what-is-columnar-database" rel="noopener noreferrer"&gt;columnar database&lt;/a&gt; like ClickHouse that compresses data properly and handles analytics efficiently, then separate storage from compute using cheap object storage like S3 as your primary data tier. This approach tackles both cost problems head-on.&lt;/p&gt;

&lt;p&gt;Columnar storage groups similar data types together, enabling specialized compression codecs that achieve &lt;a href="https://clickhouse.com/blog/log-compression-170x" rel="noopener noreferrer"&gt;remarkable compression ratios&lt;/a&gt;. ClickHouse's internal observability platform compresses 100 PB of raw data down to just 5.6 PB. This level of efficiency contributes to significant cost savings, with our internal use case proving to be up to 200x cheaper than a leading SaaS vendor.&lt;/p&gt;
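
&lt;p&gt;To make that concrete, here is a minimal sketch of a ClickHouse log table (table, column, and storage-policy names are hypothetical) that combines per-column compression codecs with a TTL rule moving older parts to an object-storage-backed volume:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Illustrative only: columnar layout plus per-column codecs.
CREATE TABLE otel_logs
(
    Timestamp    DateTime64(9) CODEC(Delta, ZSTD(1)),
    ServiceName  LowCardinality(String) CODEC(ZSTD(1)),
    SeverityText LowCardinality(String) CODEC(ZSTD(1)),
    TraceId      String CODEC(ZSTD(1)),
    Body         String CODEC(ZSTD(3))
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, Timestamp)
-- Assumes a 'tiered' storage policy with an S3-backed 'cold' volume.
TTL toDateTime(Timestamp) + INTERVAL 30 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'tiered';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;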

&lt;p&gt;ClickHouse was built specifically for fast analytical queries scanning select columns across billions of rows. It handles high-cardinality aggregations that would bring other systems to their knees. Tesla's platform demonstrates this power, &lt;a href="https://clickhouse.com/blog/how-tesla-built-quadrillion-scale-observability-platform-on-clickhouse" rel="noopener noreferrer"&gt;ingesting over one quadrillion rows with flat CPU consumption&lt;/a&gt;, solving the high-cardinality problem that cripples other metrics systems.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;How to calculate your observability TCO: a practical framework&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;To make good financial decisions, you need a comprehensive Total Cost of Ownership (TCO) model. A proper TCO analysis &lt;a href="https://en.wikipedia.org/wiki/Total_cost_of_ownership" rel="noopener noreferrer"&gt;includes all direct and indirect costs&lt;/a&gt;, especially engineering time that often gets overlooked.&lt;/p&gt;

&lt;p&gt;This framework compares three primary architectural models:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;SaaS platforms (e.g., Datadog, Splunk): All-in-one vendor solutions priced on data ingestion, hosts, or users.
&lt;/li&gt;
&lt;li&gt;Federated OSS (e.g., "LGTM" Stack): Self-managed stacks using separate open-source tools (Loki for logs, Mimir for metrics, Tempo for traces).
&lt;/li&gt;
&lt;li&gt;Unified OSS database (e.g., ClickHouse): Self-managed or cloud-hosted stacks built on a single, high-performance database for all telemetry.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Use this table as your TCO calculation template:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cost category&lt;/th&gt;
&lt;th&gt;Variable / calculation method&lt;/th&gt;
&lt;th&gt;Key considerations by model (SaaS, Federated, Unified)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Licensing and service fees&lt;/td&gt;
&lt;td&gt;($/GB Ingested) + ($/Host) + ($/User) + (Add-on Features)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;SaaS:&lt;/strong&gt; This is the primary cost. It is highly variable and scales directly with data volume and system complexity.   &lt;strong&gt;Federated/Unified (OSS):&lt;/strong&gt; $0 for open-source licenses.   &lt;strong&gt;Unified (Cloud):&lt;/strong&gt; A predictable service fee that bundles compute, storage, and support.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure - compute&lt;/td&gt;
&lt;td&gt;Instance Cost/hr × Hours/mo × # Nodes&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;SaaS:&lt;/strong&gt; Bundled into the service fee.  &lt;strong&gt;Federated OSS: very high.&lt;/strong&gt; Requires provisioning and managing &lt;em&gt;separate&lt;/em&gt; compute clusters for logs, metrics, and traces.  &lt;strong&gt;Unified database: medium.&lt;/strong&gt; A single cluster handles all data types. Cloud models can scale compute to zero.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure - storage&lt;/td&gt;
&lt;td&gt;(Price/GB-mo × Hot Data) + (Price/GB-mo × Cold Data)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;SaaS:&lt;/strong&gt; Bundled, but often with high markups and expensive "rehydration" fees to query older data.  &lt;strong&gt;Federated OSS: medium.&lt;/strong&gt; Data and metadata (e.g., labels) are often duplicated across three different systems.  &lt;strong&gt;Unified database: low.&lt;/strong&gt; A single store with high-compression (15-20x) and native tiering to object storage minimizes this cost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational - personnel&lt;/td&gt;
&lt;td&gt;SRE Hourly Rate × Hours/mo (for Maintenance, Upgrades, On-call)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;SaaS: minimal.&lt;/strong&gt; Covered by the vendor's service fee.  &lt;strong&gt;Federated OSS: very high.&lt;/strong&gt; Requires 24/7 on-call expertise for 3+ complex distributed systems.  &lt;strong&gt;Unified database:&lt;/strong&gt; high (for self-hosted) or minimal (for a managed cloud service).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Migration and training&lt;/td&gt;
&lt;td&gt;(Engineer Hours × Rate) to rebuild assets &amp;amp; train staff&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;SaaS: High lock-in.&lt;/strong&gt; Migrating &lt;em&gt;off&lt;/em&gt; requires rebuilding all assets.  &lt;strong&gt;Federated OSS: High.&lt;/strong&gt; Team must learn and use 3+ different query languages and datastores (LogQL, PromQL, TraceQL), each with their own scaling properties and considerations.   &lt;strong&gt;Unified database: Medium.&lt;/strong&gt; Team learns one powerful, standard language (SQL).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
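
&lt;p&gt;As a purely hypothetical, back-of-envelope illustration of the template: a team ingesting 1 TB/day at ~17x compression adds roughly 1.8 TB of compressed data per month, only a few tens of dollars at typical object-storage list prices, while the "People TCO" of $1,600-$4,800 per month cited above is often the largest single line item.&lt;/p&gt;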

&lt;h3&gt;&lt;strong&gt;The 'People TCO': a deeper look at personnel and training costs&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;The "Operational - personnel" and "Migration and training" line items deserve special attention. While they're easy to list, these "People TCO" categories often become the most significant and unpredictable factors in your entire cost model.&lt;/p&gt;

&lt;p&gt;The operational personnel cost varies dramatically by architecture. SaaS platforms require minimal personnel investment. A unified database demands dedicated database and systems engineering expertise. But federated OSS stacks often cost the most, requiring 24/7 on-call coverage for three or more separate distributed systems.&lt;/p&gt;

&lt;p&gt;Migration and training present two distinct challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Asset migration: For established organizations, this is a large engineering project. You'll need to recreate hundreds or thousands of dashboards, alerts, and service integrations.
&lt;/li&gt;
&lt;li&gt;Cultural and educational shift: The ongoing training burden varies significantly. Federated OSS stacks force teams to learn multiple domain-specific query languages (LogQL, PromQL, TraceQL). A unified SQL approach consolidates training onto a single standard, though teams still need to transition from proprietary tools like Splunk's SPL.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Budget for this transition using:&lt;/p&gt;

&lt;p&gt;Training Cost = (Number of Engineers × Avg. Training Hours × Loaded Engineer Hourly Rate) + an allowance for the temporary productivity dip&lt;/p&gt;
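
&lt;p&gt;For illustration with purely hypothetical numbers: a 20-engineer team averaging 16 training hours at a $120/hour loaded rate comes to 20 × 16 × $120 = $38,400 in direct training cost, before accounting for the temporary productivity dip.&lt;/p&gt;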

&lt;p&gt;While this represents a real short-term cost, the long-term benefits are substantial. Engineers can grow from dashboard operators into data analysts. They gain the ability to run deep, ad-hoc SQL analyses and correlate observability telemetry directly with production business data. This level of insight remains impractical, if not outright impossible, with proprietary, siloed tools.&lt;/p&gt;

&lt;p&gt;But SQL empowerment brings its own challenge: incident response. When production is on fire, SREs don't have time to craft complex SQL joins. They need answers fast. That's why a database alone won't cut it. ClickStack includes &lt;a href="https://clickhouse.com/docs/cloud/manage/hyperdx" rel="noopener noreferrer"&gt;HyperDX&lt;/a&gt;, which puts a familiar Datadog-style interface on top of the SQL engine. You get Lucene-style querying for quick debugging during incidents, plus full SQL access when you need to dig deeper for root cause analysis. Your on-call team stays productive while your senior engineers retain full analytical power.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Architectural trade-off: unified SQL vs. federated OSS stacks&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Once you select the OSS deployment model for observability, you face a more fundamental decision: why choose a unified SQL database over popular alternatives like the federated "LGTM" stack?&lt;/p&gt;

&lt;p&gt;The federated model follows a "divide and conquer" principle. Loki optimizes for index-free log ingestion and label-based querying. Mimir provides horizontally scalable Prometheus-compatible metric storage. Tempo handles high-volume trace ingestion and lookup by ID. Each component scales independently and excels at its specific task.&lt;/p&gt;

&lt;p&gt;But this specialization creates significant long-term TCO and usability challenges. You're deploying and securing three or more complex distributed systems because the individual tools have no native support for other signals. For example, Prometheus cannot store logs or traces, and Loki cannot process metrics. Data remains siloed. This forces a choice between label-driven search (Loki) and fast aggregations (Mimir); you cannot get both in one system. While Grafana provides correlation, it forces engineers into a rigid, opinionated workflow (e.g., metrics-to-traces-to-logs) and falls short when exploratory analysis is needed. This model has two critical flaws:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It fails on high-cardinality data and encourages pre-aggregation. Prometheus's data model struggles with high-cardinality dimensions, leading to performance issues and cost explosion. The common workaround is pre-aggregation, which destroys data fidelity and prevents true root cause analysis because the raw, detailed data is lost before it's even stored.
&lt;/li&gt;
&lt;li&gt;Loki's design blocks exploratory analysis. It's fast for lookups by indexed labels (like a trace ID) but cannot be used for the 'search-style' discovery that SREs rely on in a crisis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://clickhouse.com/blog/evolution-of-sql-based-observability-with-clickhouse" rel="noopener noreferrer"&gt;unified SQL model&lt;/a&gt; treats observability as a single analytical data problem. All telemetry flows into one database, one system. This brings several key advantages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;True correlation: Engineers use standard SQL to JOIN across all three signals and correlate with business data to find root causes.
&lt;/li&gt;
&lt;li&gt;Operational simplicity: Manage a single, scalable data store instead of three.
&lt;/li&gt;
&lt;li&gt;No data duplication: Labels and metadata get stored once, not triplicated across separate databases.
&lt;/li&gt;
&lt;li&gt;A single query language: SQL provides a powerful, universal standard most engineers already know.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Furthermore, a unified database is the only architecture that enables true "observability science." This is the ability to JOIN observability data with business data (e.g., user signups, revenue tables) for deeper, more impactful root cause analysis. This strategic differentiator, highlighted by customers like Sierra, connects system performance with business outcomes in a way that siloed tools cannot.&lt;/p&gt;
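
&lt;p&gt;A sketch of what such a cross-signal query can look like (every table and column name here is hypothetical, not a fixed schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Correlate slow traces with their error logs and the revenue tier
-- of the affected accounts (illustrative schema).
SELECT
    t.ServiceName,
    a.revenue_tier,
    count() AS slow_requests,
    any(l.Body) AS sample_error
FROM otel_traces AS t
INNER JOIN otel_logs AS l ON l.TraceId = t.TraceId
INNER JOIN accounts AS a ON a.account_id = t.AccountId
WHERE t.Duration &amp;gt; 1e9 -- spans slower than 1 second (nanoseconds)
  AND l.SeverityText = 'ERROR'
GROUP BY t.ServiceName, a.revenue_tier
ORDER BY slow_requests DESC
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;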

&lt;p&gt;While federated stacks offer specialized tools, the unified SQL approach delivers more power, better economics, and simpler operations by solving signal correlation at the database level.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Designing a cost-effective observability architecture&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;ClickHouse enables you to abandon the siloed "three pillars" model for a unified architecture where logs, metrics, and traces live in a single database. This eliminates data duplication and unlocks powerful cross-signal analysis through standard SQL.&lt;/p&gt;

&lt;p&gt;This unified model also handles high-performance text search for logs, using features like inverted indices (currently in beta) and bloom filters, allowing it to replace both analytical (Prometheus) and search (ELK) backends in a single system.&lt;/p&gt;
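
&lt;p&gt;As a small sketch of the search side (index parameters and names are illustrative), a token-based bloom filter lets keyword searches skip data blocks that cannot match:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Illustrative: token bloom filter over log bodies.
ALTER TABLE otel_logs
    ADD INDEX body_tokens Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4;

-- Typical incident query: recent error lines mentioning a keyword.
SELECT Timestamp, ServiceName, Body
FROM otel_logs
WHERE hasToken(Body, 'timeout')
  AND Timestamp &amp;gt; now() - INTERVAL 1 HOUR
ORDER BY Timestamp DESC
LIMIT 100;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;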

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8vswj8buue5mefj9lap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8vswj8buue5mefj9lap.png" alt="Before vs after comparison showing shift from siloed SaaS observability to unified ClickHouse observability with lower cost, unified pipelines, and SQL-based correlation." width="800" height="904"&gt;&lt;/a&gt;&lt;br&gt;
You can deploy this architecture three ways, each with distinct TCO implications:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://clickhouse.com/docs/operations/tips" rel="noopener noreferrer"&gt;Self-hosted ClickHouse&lt;/a&gt;: Deploy and manage open-source ClickHouse on your infrastructure. Maximum control comes with the highest operational burden. Collect data with agents like &lt;a href="https://clickhouse.com/docs/use-cases/observability/clickstack/ingesting-data/otel-collector" rel="noopener noreferrer"&gt;OpenTelemetry Collector&lt;/a&gt; and visualize with tools like Grafana.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://clickhouse.com/use-cases/observability" rel="noopener noreferrer"&gt;ClickStack (Open-source bundle)&lt;/a&gt;: An end-to-end open-source stack bundling OpenTelemetry Collector, ClickHouse, and HyperDX UI for a cohesive, &lt;a href="https://clickhouse.com/docs/use-cases/observability" rel="noopener noreferrer"&gt;ready-to-deploy experience&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://clickhouse.com/cloud" rel="noopener noreferrer"&gt;ClickHouse Cloud&lt;/a&gt; with ClickStack: The official managed service provides production-ready ClickHouse with zero infrastructure management. Near-zero operational overhead and elastic scaling let engineering teams focus on their core product. This option includes a bundled ClickStack experience, with HyperDX integrated into the Cloud console.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Choose based on your team's expertise, budget, and strategic priorities:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Self-hosted ClickHouse&lt;/th&gt;
&lt;th&gt;ClickStack (Self-hosted)&lt;/th&gt;
&lt;th&gt;ClickHouse Cloud with ClickStack&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Typical cost model&lt;/td&gt;
&lt;td&gt;Fixed infrastructure cost (CapEx/OpEx)&lt;/td&gt;
&lt;td&gt;Fixed infrastructure cost (CapEx/OpEx)&lt;/td&gt;
&lt;td&gt;Usage-based (compute, storage, egress)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational hours/week&lt;/td&gt;
&lt;td&gt;High (5-15+ hours)&lt;/td&gt;
&lt;td&gt;Medium (2-5 hours)&lt;/td&gt;
&lt;td&gt;Very Low (&amp;lt;1 hour)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High availability (HA)&lt;/td&gt;
&lt;td&gt;Manual setup (replication, keepers)&lt;/td&gt;
&lt;td&gt;Manual setup (replication, keepers)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/cloud/manage/cloud-tiers" rel="noopener noreferrer"&gt;Built-in (2+ availability zones)&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security and compliance&lt;/td&gt;
&lt;td&gt;User-managed&lt;/td&gt;
&lt;td&gt;User-managed&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/jp/cloud/security" rel="noopener noreferrer"&gt;Managed (SAML, HIPAA/PCI options)&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Large teams with deep infra expertise and strict data residency needs.&lt;/td&gt;
&lt;td&gt;Teams wanting a unified OSS experience without building from scratch.&lt;/td&gt;
&lt;td&gt;Teams that want to focus on their product instead of infrastructure, especially those with variable workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;&lt;strong&gt;Self-managed vs. ClickHouse Cloud: a break-even analysis&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;The self-hosting versus managed service decision often reduces to simple math: when does paying engineers to manage the database exceed the managed service cost?&lt;/p&gt;

&lt;p&gt;For small to medium workloads, managed services are often the clear winner. Just 5 hours of engineering time per month, at $150/hour, adds $750/month to your TCO. This "soft cost" often exceeds the entire ClickHouse Cloud fee, which handles all maintenance, upgrades, and on-call duties.&lt;/p&gt;

&lt;p&gt;Bursty or unpredictable workloads make the case even clearer. Self-hosted clusters must be provisioned for peak load, leaving you paying for idle resources during quiet periods. This wastes &lt;a href="https://www.flexential.com/resources/blog/cloud-cost-optimization" rel="noopener noreferrer"&gt;40-60% of your compute budget&lt;/a&gt;. ClickHouse Cloud's architecture, which &lt;a href="https://clickhouse.com/docs/guides/separation-storage-compute" rel="noopener noreferrer"&gt;separates compute from storage&lt;/a&gt; (using object storage), enables automatic scaling (including scale-to-zero) and converts that waste into savings. Users can also take advantage of compute-compute separation to isolate read and write workloads: a lightweight, always-on compute layer handles ingestion while read capacity scales dynamically with demand, consuming only the resources required at any given time. Combined with high compression and object storage, long-term data retention becomes exceptionally cost-efficient, approaching cloud storage provider pricing per terabyte.&lt;/p&gt;

&lt;p&gt;Self-hosting becomes cost-effective only at very large scale with predictable workloads, and only if you already have mature SRE teams with deep distributed database expertise.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Frequently asked questions (FAQ)&lt;/strong&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;strong&gt;How do I calculate the TCO of my observability stack?&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;To calculate your Total Cost of Ownership (TCO), you must account for all direct and indirect costs. This includes infrastructure (compute, storage, network egress), operational costs (SRE/personnel time for setup, maintenance, and on-call duties), and any software licensing or support fees. The article provides a detailed TCO framework to compare a self-hosted solution against a managed one like ClickHouse Cloud.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;What are the main factors that contribute to high and unpredictable observability costs?&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;High costs are driven by two main factors: inefficient technology and pricing models that penalize high data volume. Traditional platforms built on search indexes have massive storage overhead and struggle with high-cardinality data (user_id, pod_name). This leads to slow queries and instability. Unpredictable costs stem from ingest-based pricing (per-GB) and per-host fees, which cause your bill to spike with any increase in data volume or system scaling.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;How does ClickHouse help reduce observability costs?&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;ClickHouse reduces costs at an architectural level. Its columnar storage enables highly efficient data compression (often 15-20x), drastically cutting storage needs. It is purpose-built for fast analytical queries on high-cardinality data, which eliminates performance bottlenecks common in other systems. This allows you to store more data for longer at a fraction of the cost and pay for predictable infrastructure rather than volatile data volume.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;What is the difference between self-hosting ClickHouse and using ClickHouse Cloud?&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Self-hosting ClickHouse gives you maximum control over your infrastructure but requires significant engineering time for setup, maintenance, scaling, and 24/7 on-call support. ClickHouse Cloud is a managed service that eliminates this operational overhead, provides elastic scaling, and includes enterprise-grade features like advanced security and support SLAs. ClickHouse Cloud is often more cost-effective for small-to-medium or bursty workloads once engineering time is factored into the TCO.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;At what scale does a ClickHouse-based solution become cheaper than Datadog?&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;The break-even point typically occurs when a company's Datadog bill for data ingestion and indexing exceeds the cost of hiring 1-2 full-time engineers to manage an observability stack. For organizations ingesting more than a few terabytes of data per day, the savings from switching to a ClickHouse architecture (either self-hosted or cloud) become substantial, often exceeding 90%. For instance, ClickHouse's internal observability use case proved to be 200x cheaper than Datadog, demonstrating what is possible at a very large scale.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;Can I use OpenTelemetry with ClickHouse for a cost-effective solution?&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Yes. Pairing the open-source OpenTelemetry (OTel) Collector with ClickHouse is a highly effective and recommended architecture. OTel standardizes data collection for logs, metrics, and traces, preventing vendor lock-in. You can use the OTel Collector to intelligently filter, sample, and transform data at the edge before sending it to ClickHouse, further reducing storage and query costs.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;What are the "hidden costs" of a self-hosted ClickHouse observability stack?&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;The primary hidden cost of self-hosting is engineering time. This includes the initial setup (2-4 weeks), ongoing maintenance, software upgrades, performance tuning, and 24/7 on-call responsibility. These operational duties can consume 10-20% of a full-time engineer's time, adding thousands of dollars per month to your TCO.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;Is ClickHouse a direct replacement for Datadog?&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;ClickHouse is a high-performance database, not a full SaaS platform. It replaces the expensive backend of tools like Datadog. Create a complete solution by pairing ClickHouse with open-source tools for collection (OpenTelemetry) and visualization (&lt;a href="https://community.grafana.com/t/how-to-visualize-otlp-metrics-data-from-clickhouse-on-grafana/134258" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt; or HyperDX). The open-source &lt;a href="https://clickhouse.com/docs/use-cases/observability" rel="noopener noreferrer"&gt;ClickStack&lt;/a&gt; bundles these components for an experience closer to all-in-one platforms.&lt;/p&gt;

</description>
      <category>observability</category>
      <category>sre</category>
      <category>clickhouse</category>
      <category>devops</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Zenith AI Labs</dc:creator>
      <pubDate>Mon, 01 Dec 2025 19:51:18 +0000</pubDate>
      <link>https://dev.to/zenithai/-4k0p</link>
      <guid>https://dev.to/zenithai/-4k0p</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/composiodev" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9157%2Fdf89ab52-2d48-474b-a971-087232b09f19.png" alt="Composio" width="638" height="888"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3271159%2F5d4c3ad5-7832-4565-bf5c-b790ca7ea6ff.jpg" alt="" width="96" height="96"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/composiodev/the-2025-ai-agent-report-why-ai-agents-fail-in-production-and-the-2026-integration-roadmap-3d6n" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;The 2025 AI Agent Report: Why AI Agents Fail in Production and the 2026 Integration Roadmap&lt;/h2&gt;
      &lt;h3&gt;Manveer Chawla for Composio ・ Dec 1&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#agents&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#rag&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#mcp&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>agents</category>
      <category>rag</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Why Your Snowflake Bill is High and How to Fix It with a Hybrid Approach</title>
      <dc:creator>Zenith AI Labs</dc:creator>
      <pubDate>Sat, 15 Nov 2025 19:45:00 +0000</pubDate>
      <link>https://dev.to/zenithai/why-your-snowflake-bill-is-high-and-how-to-fix-it-with-a-hybrid-approach-1o3h</link>
      <guid>https://dev.to/zenithai/why-your-snowflake-bill-is-high-and-how-to-fix-it-with-a-hybrid-approach-1o3h</guid>
      <description>&lt;p&gt;Your Snowflake bill is high primarily because of its compute billing model, which enforces a &lt;a href="https://docs.snowflake.com/en/user-guide/cost-understanding-compute" rel="noopener noreferrer"&gt;60-second minimum charge&lt;/a&gt; each time a warehouse resumes. This creates a significant "idle tax" on the frequent, short-running queries common in BI dashboards and ad-hoc analysis. You're often paying for compute you don't actually use.&lt;/p&gt;

&lt;p&gt;A surprisingly high bill for a modest amount of data is frustrating. We see it all the time. The immediate question is, "Why is my bill so high when my data isn't that big?" The cost isn't driven by data at rest; it's driven by data in motion, specifically by compute patterns. For many modern analytical workflows, the bill inflates from thousands of frequent queries accumulating disproportionately high compute charges.&lt;/p&gt;

&lt;p&gt;If you don't address this, you'll face budget overruns, throttled innovation, or pressure to undertake a costly and risky platform migration. The solution isn't always abandoning a powerful platform like Snowflake. You can augment it intelligently instead.&lt;/p&gt;

&lt;p&gt;This guide provides a practical playbook for understanding the root causes of high Snowflake costs and a strategy for reducing them using internal optimizations and a &lt;a href="https://motherduck.com/docs/concepts/architecture-and-capabilities/" rel="noopener noreferrer"&gt;modern hybrid architecture&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;The Real Reason Your Snowflake Bill is So High&lt;/h2&gt;

&lt;p&gt;To control costs effectively, you need to diagnose the problem first. The primary driver of inflated Snowflake bills for bursty, interactive workloads is the platform's billing model for compute. It creates a significant hidden idle tax.&lt;/p&gt;

&lt;p&gt;Snowflake bills for compute per-second, but only after a 60-second minimum is met each time a virtual warehouse resumes from a suspended state. A query that takes only five seconds to execute gets billed for a full minute of compute time. In this common scenario, you're paying for 55 seconds (over 91%) of compute resources that sit idle.&lt;/p&gt;

&lt;p&gt;Here's what this looks like on a timeline. For a 5-second query, the billed duration on Snowflake versus a usage-based platform like MotherDuck is stark.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Snowflake (X-Small Warehouse):&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flynqtivub9k05qinow4d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flynqtivub9k05qinow4d.png" alt="snowflake_billing.png" width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MotherDuck (Pulse Compute):&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuk81zbjwquubzl9gvj92.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuk81zbjwquubzl9gvj92.png" alt="motherduck_billing.png" width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a BI dashboard executes 20 quick queries upon loading, each taking three seconds, this single page view could trigger 1,200 seconds (20 minutes) of billed compute time. The actual work took only one minute.&lt;/p&gt;

&lt;p&gt;This problem gets worse with warehouse sizing. Each incremental size increase in a Snowflake warehouse &lt;a href="https://docs.snowflake.com/en/user-guide/cost-understanding-compute" rel="noopener noreferrer"&gt;doubles its credit consumption rate&lt;/a&gt;. We often see teams defaulting to 'Medium' or 'Large' warehouses for all tasks. That creates a 4x to 8x cost premium for workloads that could easily run on an 'X-Small' warehouse.&lt;/p&gt;

&lt;p&gt;This combination of minimum billing increments and oversized compute creates a compounding cost leak. Serverless features like &lt;a href="https://docs.snowflake.com/en/user-guide/cost-understanding-overall" rel="noopener noreferrer"&gt;Automatic Clustering&lt;/a&gt; and &lt;a href="https://docs.snowflake.com/en/user-guide/cost-understanding-overall" rel="noopener noreferrer"&gt;Materialized Views&lt;/a&gt; consume credits in the background too, contributing to credit creep that's difficult to trace without diligent monitoring.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Warehouse Size&lt;/th&gt;
&lt;th&gt;Credits per Hour&lt;/th&gt;
&lt;th&gt;Relative Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;X-Small&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;8x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;X-Large&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;16x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;First Aid: A Playbook to Immediately Optimize Snowflake&lt;/h2&gt;

&lt;p&gt;Before considering architectural changes, you can achieve significant savings by optimizing your existing Snowflake environment. These internal fixes are your first line of defense against cost overruns. They can often reduce spend by 20-40%.&lt;/p&gt;
&lt;h3&gt;1. Master Warehouse Management (Set AUTO_SUSPEND to 60s)&lt;/h3&gt;

&lt;p&gt;Set aggressive yet intelligent warehouse timeouts. For most workloads, set the &lt;a href="https://docs.snowflake.com/en/user-guide/cost-controlling-controls#use-auto-suspension" rel="noopener noreferrer"&gt;&lt;code&gt;AUTO_SUSPEND&lt;/code&gt; parameter&lt;/a&gt; to exactly 60 seconds. This ensures the warehouse suspends after one minute of inactivity, stopping credit consumption. Setting it lower than 60 seconds is counterproductive. A new query arriving within that first minute could trigger a second 60-second minimum charge.&lt;/p&gt;
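
&lt;p&gt;A minimal sketch (the warehouse name is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Suspend after 60 seconds of inactivity; resume on the next query.
ALTER WAREHOUSE bi_dashboard_wh SET
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;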

&lt;p&gt;Right-size warehouses by defaulting to smaller configurations. Use 'X-Small' warehouses by default and only scale up when a specific workload fails to meet its performance SLA. Consolidate workloads onto fewer, appropriately sized warehouses to prevent warehouse sprawl. Multiple underutilized compute clusters add up on your bill.&lt;/p&gt;

&lt;p&gt;We helped one analytics team save approximately $38,000 annually by moving its BI queries from a Medium to a Small warehouse. They accepted a marginal 4-second increase in query time.&lt;/p&gt;
&lt;h3&gt;2. Leverage Snowflake's Caching Layers (Result &amp;amp; Warehouse)&lt;/h3&gt;

&lt;p&gt;Snowflake's multi-layered cache is one of its most powerful cost-saving features. Not using it leaves money on the table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result Cache:&lt;/strong&gt; If you run a query identical to one executed previously (by anyone in the account) and the underlying data hasn't changed, Snowflake returns the results instantly from a global result cache. No warehouse starts. That's free compute. It's especially effective for BI dashboards where multiple users view the same default state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warehouse Cache (Local Disk Cache):&lt;/strong&gt; When a query runs, the required data from storage gets cached on the SSDs of the active virtual warehouse. Subsequent queries that need the same data read it from this much faster local cache instead of remote storage. This dramatically speeds up queries and reduces I/O. Keeping a warehouse warm for related analytical queries can be beneficial.&lt;/p&gt;

&lt;p&gt;Design workloads to maximize cache hits through consistent query patterns.&lt;/p&gt;
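
&lt;p&gt;You can check how much your warehouses actually benefit from the cache with a query along these lines (a sketch against the ACCOUNT_USAGE schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Average share of bytes scanned from the warehouse cache, per
-- warehouse, over the last 7 days. Low values hint at cache-unfriendly
-- query patterns or overly aggressive suspension.
SELECT
    warehouse_name,
    AVG(percentage_scanned_from_cache) AS avg_pct_from_cache,
    COUNT(*) AS queries
FROM snowflake.account_usage.query_history
WHERE start_time &amp;gt;= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name IS NOT NULL
GROUP BY warehouse_name
ORDER BY avg_pct_from_cache;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;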
&lt;h3&gt;3. Optimize Inefficient Queries (Prune Partitions &amp;amp; Avoid SELECT *)&lt;/h3&gt;

&lt;p&gt;Poorly written queries burn credits unnecessarily. While comprehensive query tuning is a deep topic, these practices provide immediate savings:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Avoid &lt;code&gt;SELECT *&lt;/code&gt;:&lt;/strong&gt; Select only the columns you need. This reduces the amount of data processed and moved, improving caching and query performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filter Early and Prune Partitions:&lt;/strong&gt; Apply &lt;code&gt;WHERE&lt;/code&gt; clauses that filter on a table's clustering key as early as possible. This lets Snowflake prune entire micro-partitions, skipping massive amounts of data that would otherwise be scanned. It's the single most effective way to speed up queries on large tables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use &lt;code&gt;QUALIFY&lt;/code&gt; for Complex Window Function Filtering:&lt;/strong&gt; Instead of using a subquery or CTE to filter window function results, use the &lt;a href="https://docs.snowflake.com/en/sql-reference/constructs/qualify" rel="noopener noreferrer"&gt;&lt;code&gt;QUALIFY&lt;/code&gt; clause&lt;/a&gt;. It's more readable and often more performant.&lt;/p&gt;
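
&lt;p&gt;For example (hypothetical table and columns), keeping only each user's most recent event in a single pass:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- One pass, no subquery: the latest event per user.
SELECT user_id, event_type, event_ts
FROM events
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC) = 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;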
&lt;h3&gt;4. Implement Cost Guardrails with Resource Monitors&lt;/h3&gt;

&lt;p&gt;Implement &lt;a href="https://docs.snowflake.com/en/user-guide/resource-monitors" rel="noopener noreferrer"&gt;resource monitors&lt;/a&gt; as a critical safety net. Resource monitors track credit consumption and trigger actions like sending notifications or automatically suspending compute when usage hits predefined thresholds. They're the most effective tool for preventing budget overruns from runaway queries or misconfigured pipelines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create a monitor that notifies at 75% and suspends at 100%&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="n"&gt;RESOURCE&lt;/span&gt; &lt;span class="n"&gt;MONITOR&lt;/span&gt; &lt;span class="n"&gt;monthly_etl_monitor&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;CREDIT_QUOTA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;
&lt;span class="n"&gt;TRIGGERS&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt; &lt;span class="n"&gt;PERCENT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTIFY&lt;/span&gt;
        &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="n"&gt;PERCENT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="n"&gt;SUSPEND&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Assign the monitor to a warehouse&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="n"&gt;WAREHOUSE&lt;/span&gt; &lt;span class="n"&gt;etl_heavy_wh&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;RESOURCE_MONITOR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;monthly_etl_monitor&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Actively monitor serverless feature costs too. Query the &lt;a href="https://docs.snowflake.com/en/sql-reference/functions/serverless_task_history" rel="noopener noreferrer"&gt;&lt;code&gt;serverless_task_history&lt;/code&gt;&lt;/a&gt; view to track credits consumed by serverless tasks, and the companion &lt;code&gt;automatic_clustering_history&lt;/code&gt; and &lt;code&gt;search_optimization_history&lt;/code&gt; views for Automatic Clustering and Search Optimization. This helps you surface hidden costs and tune these features appropriately.&lt;/p&gt;
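
&lt;p&gt;A sketch of that kind of check against the ACCOUNT_USAGE view:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Credits consumed by serverless tasks over the last 30 days.
SELECT task_name, SUM(credits_used) AS credits
FROM snowflake.account_usage.serverless_task_history
WHERE start_time &amp;gt;= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY task_name
ORDER BY credits DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;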

&lt;p&gt;These internal fixes can significantly lower your Snowflake bill. To eliminate entire categories of spend, particularly from non-production workloads, you need a different approach to compute location.&lt;/p&gt;

&lt;h2&gt;Go Local: Slashing Dev &amp;amp; Test Costs with DuckDB&lt;/h2&gt;

&lt;p&gt;A substantial portion of cloud data warehouse spend gets consumed by non-production workloads. Every &lt;a href="https://github.com/duckdb/dbt-duckdb" rel="noopener noreferrer"&gt;&lt;code&gt;dbt run&lt;/code&gt;&lt;/a&gt;, data validation script, and ad-hoc analysis performed by engineers during development consumes expensive cloud compute credits. By adopting a local-first development workflow, you can shift this entire category of work off the cloud and reduce these costs to zero.&lt;/p&gt;

&lt;p&gt;DuckDB makes this shift possible. It's a fast, in-process analytical database designed to run complex SQL queries directly on your laptop or within a CI/CD runner. DuckDB queries data files like &lt;a href="https://www.motherduck.com/learn-more/why-choose-parquet-table-file-format/" rel="noopener noreferrer"&gt;Parquet&lt;/a&gt; and &lt;a href="https://duckdb.org/docs/data/csv/" rel="noopener noreferrer"&gt;CSV&lt;/a&gt; directly. You don't need to load data into a separate database for local development. Engineers can build, test, and iterate on data models and pipelines locally before incurring any cloud costs.&lt;/p&gt;

&lt;p&gt;This workflow saves money and dramatically improves developer velocity. You shorten the feedback loop from minutes (waiting for a cloud warehouse to provision and run) to seconds.&lt;/p&gt;

&lt;p&gt;A typical local development pattern in Python is straightforward. You can prototype rapidly without any cloud interaction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;duckdb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# Analyze a local Parquet file instantly
# No cloud warehouse, no compute credits consumed
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;duckdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    SELECT
        product_category,
        COUNT(DISTINCT order_id) as total_orders,
        AVG(order_value) as average_value
    FROM &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;local_ecommerce_data.parquet&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
    WHERE order_date &amp;gt;= &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2024-01-01&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
    GROUP BY ALL
    ORDER BY total_orders DESC;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;df&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running analytics locally is powerful for development. For sharing insights and powering production dashboards, this local-first approach extends into a hybrid architecture.&lt;/p&gt;

&lt;h2&gt;The Hybrid Solution: MotherDuck for Cost-Effective Interactive Analytics&lt;/h2&gt;

&lt;p&gt;MotherDuck is a serverless data warehouse built on DuckDB. It provides a simpler, more cost-effective solution for workloads that are inefficient on traditional cloud data warehouses. It directly solves the idle tax problem by replacing the provisioned warehouse model with per-query, usage-based compute that bills in &lt;a href="https://motherduck.com/product/pricing/" rel="noopener noreferrer"&gt;one-second increments&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This billing model profoundly impacts the cost of interactive analytics. Let's quantify the savings with a realistic scenario.&lt;/p&gt;

&lt;h3&gt;Breaking Down the Costs: A Tale of Two Queries&lt;/h3&gt;

&lt;p&gt;Consider a common BI dashboard used by an operations team. It refreshes every 5 minutes during an 8-hour workday to provide timely updates. Each refresh executes 10 small queries to populate various charts. Each query takes 4 seconds to run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workload Parameters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Queries per refresh:&lt;/strong&gt; 10&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution time per query:&lt;/strong&gt; 4 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refresh frequency:&lt;/strong&gt; Every 5 minutes (12 refreshes per hour)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational hours:&lt;/strong&gt; 8 hours/day, 22 days/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Snowflake Cost Calculation (X-Small Warehouse):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because of the high frequency, the team can't let the warehouse suspend between refreshes without incurring repeated 60-second minimums. Their most cost-effective option is running an X-Small warehouse continuously during the workday.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total active hours per month:&lt;/strong&gt; 8 hours/day * 22 days/month = 176 hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credits consumed per hour (X-Small):&lt;/strong&gt; 1&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total credits per month:&lt;/strong&gt; 176 hours * 1 credit/hour = 176 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Estimated Monthly Cost (@ $3.00/credit):&lt;/strong&gt; 176 credits * $3.00/credit = &lt;strong&gt;$528&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This assumes perfect management. A more common scenario where the warehouse runs 24/7 would cost &lt;strong&gt;$2,160&lt;/strong&gt; (720 hours * 1 credit/hr * $3.00/credit).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MotherDuck Cost Calculation (&lt;a href="https://www.motherduck.com/product/pricing/" rel="noopener noreferrer"&gt;Pulse Compute&lt;/a&gt;):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MotherDuck bills only for the actual compute time used by queries.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total queries per month:&lt;/strong&gt; 10 queries/refresh * 12 refreshes/hr * 8 hrs/day * 22 days/month = 21,120 queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total execution time per month:&lt;/strong&gt; 21,120 queries * 4 seconds/query = 84,480 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total execution hours:&lt;/strong&gt; 84,480 seconds / 3,600 s/hr = 23.47 hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Estimated Monthly Cost (@ $0.25/CU-hour, assuming 1 CU):&lt;/strong&gt; 23.47 CU-hours * $0.25/CU-hour = &lt;strong&gt;$5.87&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even assuming a more complex query consuming 4 Compute Units, the cost would only be &lt;strong&gt;$23.47&lt;/strong&gt;. This example shows how a usage-based model eliminates waste for bursty workloads, reducing costs by over 95% in this scenario.&lt;/p&gt;

&lt;p&gt;This calculation focuses on compute cost, the primary driver. While negligible for this interactive pattern, a full TCO analysis would include data storage and egress, where MotherDuck also offers competitive pricing.&lt;/p&gt;
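
&lt;p&gt;To sanity-check these numbers, the arithmetic fits in a few lines of Python. This is a minimal sketch using only the workload parameters and list prices assumed above ($3.00 per Snowflake credit, $0.25 per MotherDuck CU-hour).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Workload parameters from the scenario above
QUERIES_PER_REFRESH = 10
QUERY_SECONDS = 4
REFRESHES_PER_HOUR = 12
HOURS_PER_DAY = 8
DAYS_PER_MONTH = 22

# Snowflake: an X-Small warehouse (1 credit/hour) running all workday
active_hours = HOURS_PER_DAY * DAYS_PER_MONTH              # 176 hours
snowflake_cost = active_hours * 1 * 3.00                   # $528.00

# MotherDuck: billed only for actual query execution time
total_queries = (QUERIES_PER_REFRESH * REFRESHES_PER_HOUR
                 * HOURS_PER_DAY * DAYS_PER_MONTH)         # 21,120 queries
execution_hours = total_queries * QUERY_SECONDS / 3600     # ~23.47 hours
motherduck_cost = execution_hours * 1 * 0.25               # ~$5.87

print(f"Snowflake:  ${snowflake_cost:,.2f}/month")
print(f"MotherDuck: ${motherduck_cost:,.2f}/month")
print(f"Reduction:  {1 - motherduck_cost / snowflake_cost:.1%}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;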

&lt;p&gt;MotherDuck's architecture introduces &lt;a href="https://motherduck.com/docs/concepts/architecture-and-capabilities/" rel="noopener noreferrer"&gt;&lt;strong&gt;"dual execution."&lt;/strong&gt;&lt;/a&gt; Its query planner intelligently splits work between the local DuckDB client and the MotherDuck cloud service. This minimizes data transfer and latency by performing filters and aggregations locally before sending smaller, pre-processed datasets to the cloud. This hybrid model is ideal for interactive analytics, BI dashboards, and ad-hoc queries on sub-terabyte hot data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Connect to MotherDuck from any DuckDB-compatible client&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ATTACH&lt;/span&gt; &lt;span class="s1"&gt;'md:'&lt;/span&gt;&lt;span class="p"&gt;;](&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;getting&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;-- This query joins a large cloud table with a small local file.&lt;/span&gt;
&lt;span class="c1"&gt;-- The filter on the local file is pushed down, so only matching&lt;/span&gt;
&lt;span class="c1"&gt;-- user_ids are ever requested from the cloud, minimizing data transfer.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;cloud_events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;cloud_events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;local_users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_department&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;my_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cloud_events&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;read_csv_auto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'local_user_enrichment.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;local_users&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;cloud_events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;local_users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;local_users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_priority_user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Proven in Production: Real-World Case Studies of Significant Cost Savings
&lt;/h3&gt;

&lt;p&gt;The savings from this new architecture aren't just theoretical. Companies are already using this model to achieve significant results.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Case Study: &lt;a href="https://motherduck.com/learn-more/reduce-cloud-data-warehouse-costs-duckdb-motherduck/" rel="noopener noreferrer"&gt;Definite Slashes Costs by 70%&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
The SaaS company Definite migrated its entire data warehouse from Snowflake to a DuckDB-based solution. The results were quick and significant, achieving an &lt;strong&gt;over 70% reduction&lt;/strong&gt; in their data warehousing expenses. In their detailed write-up, the engineering team noted that even after accounting for the migration effort, the savings freed up a significant portion of their budget for core product development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Case Study: &lt;a href="https://motherduck.com/learn-more/reduce-cloud-data-warehouse-costs-duckdb-motherduck/" rel="noopener noreferrer"&gt;Okta Eliminates a $60,000 Monthly Snowflake Bill&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
Okta's security engineering team needed to process trillions of log records for threat detection, with data volumes spiking daily. Their Snowflake solution was costing approximately &lt;strong&gt;$2,000 per day ($60,000 monthly)&lt;/strong&gt;. By building a clever system that used thousands of small DuckDB instances running in parallel on serverless functions, they significantly reduced their processing costs. This case shows that even at a large scale, the DuckDB ecosystem can be much cheaper than traditional cloud warehouses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Case Study: &lt;a href="https://www.reddit.com/r/dataengineering/comments/1mk85dn/how_we_used_duckdb_to_save_79_on_snowflake_bi/" rel="noopener noreferrer"&gt;A 79% BI Spend Reduction with a Simple Caching Layer&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
A data engineering team shared their story of implementing a smart caching layer for their BI tool. Instead of having every dashboard query hit Snowflake directly, they routed smaller, frequent queries to a DuckDB instance that served cached results. Large, complex queries were still sent to Snowflake. The impact was a &lt;strong&gt;79% immediate reduction&lt;/strong&gt; in their Snowflake BI spend, and average query times sped up by 7x. This highlights the power of a hybrid "best tool for the job" approach.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  A Framework for Workload Triage
&lt;/h2&gt;

&lt;p&gt;Understanding the tool landscape is one thing. Systematically deciding which of your workloads belong where requires a data-driven approach. By analyzing query history, you can classify every workload and route it to the most efficient engine.&lt;/p&gt;

&lt;p&gt;The two most important axes for classification are &lt;strong&gt;Execution Time&lt;/strong&gt; and &lt;strong&gt;Query Frequency&lt;/strong&gt;. Consider a third axis too: &lt;strong&gt;data freshness requirements&lt;/strong&gt;. A dashboard needing near real-time data has different constraints than one running on a nightly batch refresh.&lt;/p&gt;

&lt;p&gt;A simple 2x2 matrix provides a clear framework for triage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low Execution Time, High Frequency:&lt;/strong&gt; Short, bursty queries that run often.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low Execution Time, Low Frequency:&lt;/strong&gt; Quick, sporadic, ad-hoc queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Execution Time, Low Frequency:&lt;/strong&gt; Long-running, scheduled batch jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Execution Time, High Frequency:&lt;/strong&gt; Often an anti-pattern indicating a need for data modeling or architectural redesign. It can occur in complex, near-real-time operational analytics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can analyze Snowflake's &lt;a href="https://docs.snowflake.com/en/sql-reference/account-usage/query_history" rel="noopener noreferrer"&gt;&lt;code&gt;query_history&lt;/code&gt;&lt;/a&gt; using SQL to categorize your workloads. This query provides a starting point. We use &lt;code&gt;MEDIAN&lt;/code&gt; instead of &lt;code&gt;AVG&lt;/code&gt; for execution time because it's more robust to outliers and gives a better sense of typical query duration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Analyze query patterns over the last 30 days&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;query_stats&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;warehouse_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;query_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;execution_time&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;execution_seconds&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt;
        &lt;span class="n"&gt;snowflake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account_usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query_history&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt;
        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;DATEADD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;warehouse_name&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;execution_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'SUCCESS'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;warehouse_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;query_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MEDIAN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execution_seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;median_execution_seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- More robust than AVG&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt;
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;query_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;median_execution_seconds&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Interactive BI / High Frequency'&lt;/span&gt;
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;query_count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;median_execution_seconds&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Ad-Hoc Exploration'&lt;/span&gt;
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;median_execution_seconds&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Batch ETL / Heavy Analytics'&lt;/span&gt;
        &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'General Purpose'&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;workload_category&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;query_stats&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;warehouse_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_name&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;query_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once categorized, map these workloads to the optimal tool:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive BI / High Frequency (Short &amp;amp; Bursty):&lt;/strong&gt; Prime candidates for migration to &lt;strong&gt;MotherDuck&lt;/strong&gt;. The per-second, usage-based billing model eliminates the idle tax, offering dramatic cost savings for dashboards and embedded analytics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ad-Hoc Exploration (Short &amp;amp; Sporadic):&lt;/strong&gt; This category fits well with &lt;strong&gt;MotherDuck&lt;/strong&gt; or local &lt;strong&gt;DuckDB&lt;/strong&gt;. For queries on smaller datasets or local files, DuckDB provides instant, free execution. For shared datasets, MotherDuck offers a cost-effective cloud backend.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Batch ETL / Heavy Analytics (Long &amp;amp; Scheduled):&lt;/strong&gt; These large, resource-intensive jobs often work best on &lt;strong&gt;Snowflake&lt;/strong&gt;. Its provisioned warehouses provide predictable performance for multi-terabyte transformations. Its mature ecosystem simplifies complex data pipelines.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Development &amp;amp; CI/CD:&lt;/strong&gt; Move all non-production workloads to local &lt;strong&gt;DuckDB&lt;/strong&gt;, regardless of their characteristics. This completely eliminates cloud compute costs during development and testing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  When the Hybrid Approach Isn't the Right Fit: Sticking with Snowflake
&lt;/h2&gt;

&lt;p&gt;To build an effective architecture, you need to know a tool's limitations. The hybrid approach isn't a universal solution. Certain workloads are best suited for a mature, large-scale data warehouse like Snowflake. Acknowledging this builds trust and leads to better technical decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Massive Batch ETL/ELT:&lt;/strong&gt; For scheduled jobs processing many terabytes of data, Snowflake's provisioned compute model provides predictable power and performance. The 60-second minimum doesn't matter for jobs that run for hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.snowflake.com/trust-center/" rel="noopener noreferrer"&gt;Enterprise-Grade Governance and Security&lt;/a&gt;:&lt;/strong&gt; Organizations with complex data masking requirements, deep Active Directory integrations, or strict regional data residency rules often rely on Snowflake's mature and comprehensive features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Highly Optimized, Long-Running Workloads:&lt;/strong&gt; If you have a workload that already runs consistently on a warehouse and maximizes its uptime (like a data science cluster running for 8 hours straight), the idle tax isn't a problem. There's little cost benefit to moving it.&lt;/p&gt;

&lt;p&gt;The goal of a hybrid architecture is using the right tool for the right job, not replacing a tool that's already performing efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Modern Alternatives Landscape: Where Does MotherDuck Fit?
&lt;/h2&gt;

&lt;p&gt;While the Snowflake-and-MotherDuck hybrid model effectively addresses many common workloads, the broader data platform market offers other specialized solutions. Understanding where they fit provides a complete picture for architectural decisions.&lt;/p&gt;

&lt;p&gt;Data lake query engines like &lt;a href="https://www.starburst.io/" rel="nofollow noopener noreferrer"&gt;Starburst&lt;/a&gt; and &lt;a href="https://www.dremio.com/" rel="nofollow noopener noreferrer"&gt;Dremio&lt;/a&gt; are powerful for organizations wanting to query data directly in object storage like S3. They offer flexibility but often come with significant operational overhead.&lt;/p&gt;

&lt;p&gt;For use cases demanding sub-second latency at very high concurrency (like real-time observability), specialized engines like &lt;a href="https://clickhouse.com" rel="nofollow noopener noreferrer"&gt;ClickHouse&lt;/a&gt; often provide superior price-performance.&lt;/p&gt;

&lt;p&gt;Within classic cloud data warehouses, &lt;a href="https://cloud.google.com/bigquery/" rel="nofollow noopener noreferrer"&gt;Google BigQuery&lt;/a&gt; presents a different pricing model. Its on-demand, per-terabyte-scanned pricing can be cost-effective for sporadic forensic queries. But it carries the risk of a runaway query where a single mistake leads to a massive bill.&lt;/p&gt;
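
&lt;p&gt;If you do run on-demand BigQuery, that risk is easy to fence in: the Python client lets you cap the bytes billed per query, so a runaway scan fails fast instead of generating a bill. Below is a minimal sketch using the google-cloud-bigquery package; the query text and dataset name are placeholders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from google.cloud import bigquery

client = bigquery.Client()

# Reject any query that would scan more than 1 TiB instead of billing for it
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=2**40)

query = "SELECT event_name, COUNT(*) FROM my_dataset.events GROUP BY 1"  # placeholder
results = client.query(query, job_config=job_config).result()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;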

&lt;p&gt;MotherDuck carves a unique niche. It combines the serverless simplicity of BigQuery with the efficiency of a local-first workflow powered by DuckDB. This makes it highly cost-effective and productive for teams focused on speed, iteration, and interactive analytics. You don't get the cost penalty of a traditional warehouse or the operational complexity of a data lake.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload Type&lt;/th&gt;
&lt;th&gt;Recommended Primary Tool&lt;/th&gt;
&lt;th&gt;Rationale&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Local Dev/Testing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DuckDB&lt;/td&gt;
&lt;td&gt;Eliminates cloud compute cost for non-production work.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interactive Dashboards (&amp;lt;5TB)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MotherDuck&lt;/td&gt;
&lt;td&gt;Per-second billing avoids idle tax on bursty query patterns.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Large Batch ETL (&amp;gt;10TB)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Snowflake&lt;/td&gt;
&lt;td&gt;Predictable performance and mature features for heavy jobs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-Time Observability (High QPS)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ClickHouse&lt;/td&gt;
&lt;td&gt;Optimized architecture for sub-second latency at high concurrency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sporadic Forensic Queries&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;BigQuery (On-Demand) / MotherDuck&lt;/td&gt;
&lt;td&gt;Pay-per-use model is efficient for unpredictable, infrequent queries.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion and Path Forward
&lt;/h2&gt;

&lt;p&gt;The path to a more efficient and cost-effective analytics stack doesn't require abandoning existing investments. You augment them intelligently. By adopting a three-tiered strategy, organizations gain control over their cloud data warehouse spending while empowering teams with better tools.&lt;/p&gt;

&lt;p&gt;The strategy is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tune:&lt;/strong&gt; Implement Snowflake-native optimizations like 60-second auto-suspend timers, right-sized warehouses, and resource monitors to immediately reduce waste (a sketch of these statements follows this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Go Local:&lt;/strong&gt; Shift all development and testing workloads to a local-first workflow with DuckDB. This eliminates an entire category of cloud compute spend.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Go Hybrid:&lt;/strong&gt; Use the workload triage framework to identify bursty, interactive workloads. Offload them to MotherDuck, replacing the idle tax with fair, usage-based billing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
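
&lt;p&gt;For the first step, the tuning statements can be issued from any SQL client or from Python with the snowflake-connector-python package. This is a minimal sketch; the warehouse name, monitor name, and 100-credit quota are placeholders for your own values, and resource monitors require ACCOUNTADMIN privileges.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",  # placeholder credentials
)
cur = conn.cursor()

# 1. Suspend after 60 seconds of inactivity (the default is 10 minutes)
cur.execute("ALTER WAREHOUSE analytics_wh SET AUTO_SUSPEND = 60")

# 2. Right-size an over-provisioned warehouse
cur.execute("ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XSMALL'")

# 3. Cap monthly spend with a resource monitor
cur.execute("""
    CREATE RESOURCE MONITOR IF NOT EXISTS monthly_quota
    WITH CREDIT_QUOTA = 100
    TRIGGERS ON 90 PERCENT DO NOTIFY
             ON 100 PERCENT DO SUSPEND
""")
cur.execute("ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_quota")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;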

&lt;p&gt;This hybrid architecture uses each platform's strengths. Snowflake handles massive, scheduled batch processing and enterprise governance. The DuckDB/MotherDuck ecosystem handles cost-effective development, ad-hoc exploration, and interactive analytics.&lt;/p&gt;

&lt;p&gt;Start with your own data. Analyze your Snowflake &lt;code&gt;query_history&lt;/code&gt; using the provided script. If you see a high volume of queries with median execution times under 30 seconds, that workload is a prime candidate for migration.&lt;/p&gt;

&lt;p&gt;From there:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Audit:&lt;/strong&gt; Use the provided SQL scripts to identify your most expensive and inefficient warehouses.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Experiment:&lt;/strong&gt; &lt;a href="https://duckdb.org/docs/installation/" rel="noopener noreferrer"&gt;Download DuckDB&lt;/a&gt; and run your next data model test locally.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Prototype:&lt;/strong&gt; &lt;a href="https://app.motherduck.com/signup" rel="noopener noreferrer"&gt;Sign up for MotherDuck's free tier&lt;/a&gt;, upload a dataset, and connect a BI tool to experience the performance and simplicity firsthand.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By taking these steps, teams transform their analytics budget from a source of stress into a driver of innovation.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>warehouse</category>
      <category>snowflake</category>
      <category>duckdb</category>
    </item>
    <item>
      <title>Fix Slow Query: A Developer's Guide to Data Warehouse Performance</title>
      <dc:creator>Zenith AI Labs</dc:creator>
      <pubDate>Sun, 09 Nov 2025 16:00:00 +0000</pubDate>
      <link>https://dev.to/zenithai/fix-slow-query-a-developers-guide-to-data-warehouse-performance-38oa</link>
      <guid>https://dev.to/zenithai/fix-slow-query-a-developers-guide-to-data-warehouse-performance-38oa</guid>
      <description>&lt;p&gt;A developer pushes a new feature powered by a data warehouse query. In staging, it is snappy. In production, the user-facing dashboard takes five seconds to load, generating user complaints and performance alerts. This scenario is painfully common. The modern data stack promised speed and scale, yet developers constantly find themselves fighting inscrutable latency. Slow queries are not a vendor problem. They are &lt;a href="https://en.wikipedia.org/wiki/Amdahl%27s_law" rel="noopener noreferrer"&gt;a physics problem&lt;/a&gt;. Performance is governed by a predictable hierarchy of bottlenecks: reading data from storage (I/O), moving it across a network for operations like joins (Shuffle), and finally, processing it (CPU).&lt;/p&gt;

&lt;p&gt;Without understanding this hierarchy, developers waste time optimizing the wrong things, such as rewriting SQL when the data layout is the issue. They burn money on oversized compute clusters and deliver poor user experiences. This article provides a developer-centric mental model to diagnose and fix latency at its source. By understanding the physical constraints of storage, network, and compute, you can build data systems that are not just fast, but predictably and efficiently so.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Query performance is a physics problem, with bottlenecks occurring in a specific order: I/O (storage), then Network (shuffle), then CPU (compute). Fixing them in this order is the most effective approach.&lt;/li&gt;
&lt;li&gt;  Your data layout strategy is your performance strategy. Columnar formats, optimal file sizes, partitioning, and sorting can cut the amount of data scanned by over 90%, directly targeting the largest bottleneck.&lt;/li&gt;
&lt;li&gt;  Distributed systems impose a "shuffle tax." The most expensive operations are large joins and aggregations that move terabytes of data between nodes. Avoiding the shuffle is the key to fast distributed queries.&lt;/li&gt;
&lt;li&gt;  There is no one-size-fits-all warehouse. A "Workload-Fit Architecture" matches the engine to the job's specific concurrency and latency needs, often leading to simpler, faster, and cheaper solutions for interactive workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Three-Layer Bottleneck Model: Why Queries Crawl
&lt;/h2&gt;

&lt;p&gt;Queries are almost always I/O-bound first, then network-bound, then CPU-bound. A slow query is the result of a traffic jam in the data processing pipeline, and this congestion nearly always occurs in a predictable sequence across three fundamental layers. Developers often jump to optimizing SQL logic or scaling up compute clusters, which are CPU-level concerns. This is ineffective because the real bottleneck lies much earlier in the process: in the physical access of data from disk (I/O).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lvhpmjkv0ziwljvd3mf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lvhpmjkv0ziwljvd3mf.png" alt="Data bottleneck hierarchy diagram with three nested layers: I/O (largest—bytes scanned), Network (medium—bytes shuffled), and CPU (smallest—CPU time), labeled largest to smallest bottleneck" width="624" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The hierarchy of pain begins with I/O. Reading data from cloud object storage like &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-guidelines.html" rel="noopener noreferrer"&gt;Amazon S3 is the slowest part of any query&lt;/a&gt;. An unoptimized storage layer can force an engine to read 100 times more data than necessary, a problem known as read amplification. Fixing data layout can yield greater performance gains than doubling compute resources.&lt;/p&gt;

&lt;p&gt;Next comes the Network. In distributed systems, operations like joins and aggregations often require moving massive amounts of data between compute nodes in a process called the shuffle. This involves serialization, network transit, and potential spills to disk, making it orders of magnitude slower than memory access. The shuffle is a tax on distributed computing that must be minimized.&lt;/p&gt;

&lt;p&gt;Finally, once the necessary data is located and moved into memory, the bottleneck becomes the CPU. At this stage, efficiency is determined by the engine's architecture. Modern analytical engines use &lt;a href="https://www.cidrdb.org/cidr2005/papers/P19.pdf" rel="noopener noreferrer"&gt;vectorized execution&lt;/a&gt;, processing data in batches of thousands of values at a time instead of row-by-row, which dramatically improves computational throughput. Optimizing SQL is only impactful once the I/O and network bottlenecks have been resolved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario 1: Optimizing I/O for Slow Dashboards with Partitioning and Clustering
&lt;/h2&gt;

&lt;p&gt;When a user-facing dashboard needs to fetch a small amount of data, such as sales for a single user, the query should be nearly instant. If it takes several seconds, the cause is almost always an I/O problem. The engine is being forced to perform a massive, brute-force scan to find a few relevant rows, a classic "needle in a haystack" problem. This occurs when the physical layout of the data on disk does not align with the query's access pattern.&lt;/p&gt;

&lt;p&gt;The main culprits are partition and clustering misses. For example, a query filtering by &lt;code&gt;user_id&lt;/code&gt; on a table partitioned by &lt;code&gt;date&lt;/code&gt; forces the engine to scan every single date partition. Similarly, if data for a single user is scattered across hundreds of files, the engine must perform hundreds of separate read operations. The first time this data is read, it is a "cold cache" read from slow object storage, which carries the highest latency penalty.&lt;/p&gt;

&lt;p&gt;The fix is to enable data skipping, where the engine uses metadata to avoid reading irrelevant data. Partitioning allows the engine to skip entire folders of data, while clustering (sorting) ensures that data for the same entity (like a &lt;code&gt;user_id&lt;/code&gt;) is co-located in the same files. This allows the min/max statistics within file headers to be highly effective, letting the engine prune most files from the scan. This is addressed with features like &lt;strong&gt;&lt;a href="https://docs.snowflake.com/en/user-guide/tables-clustering-keys" rel="noopener noreferrer"&gt;Snowflake's Clustering Keys&lt;/a&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;a href="https://cloud.google.com/bigquery/docs/clustered-tables" rel="noopener noreferrer"&gt;BigQuery's Clustered Tables&lt;/a&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;a href="https://docs.databricks.com/aws/en/delta/data-skipping" rel="noopener noreferrer"&gt;Databricks' Z-Ordering&lt;/a&gt;&lt;/strong&gt;, or &lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html" rel="noopener noreferrer"&gt;Redshift's Sort Keys&lt;/a&gt;&lt;/strong&gt;. Warehouses may also offer managed features to aid this, such as Snowflake's Search Optimization Service, which creates index-like structures to accelerate these lookups at a cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  From Theory to Practice: Implementing Data Layout
&lt;/h3&gt;

&lt;p&gt;Understanding the need for a good data layout is the first step. Implementing it is the next. The most direct way to enforce clustering is to sort the data on write. Using SQL, you can create a new, optimized table by ordering the data by the columns you frequently filter on.&lt;/p&gt;

&lt;p&gt;For example, to create a clustered version of a &lt;code&gt;page_views&lt;/code&gt; table for fast user lookups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;page_views_clustered&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;page_views&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_timestamp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This physical ordering ensures that all data for a given &lt;code&gt;user_id&lt;/code&gt; is stored contiguously, dramatically reducing the number of files the engine needs to read for a query like &lt;code&gt;WHERE user_id = 'abc-123'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For teams using dbt, this can be managed directly within a &lt;a href="https://docs.getdbt.com/reference/resource-configs/resource-configs" rel="noopener noreferrer"&gt;model's configuration block&lt;/a&gt;. This approach automates the process and keeps the data layout logic version-controlled alongside the rest of the data transformations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;-- in models/marts/core/page_views.sql&lt;/span&gt;

&lt;span class="pi"&gt;{{&lt;/span&gt;
  &lt;span class="nv"&gt;config(&lt;/span&gt;
    &lt;span class="nv"&gt;materialized='table'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;partition_by=&lt;/span&gt;&lt;span class="pi"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;field"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_date"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_type"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;granularity"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;day"&lt;/span&gt;
    &lt;span class="pi"&gt;},&lt;/span&gt;
    &lt;span class="nv"&gt;cluster_by =&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="nv"&gt;)&lt;/span&gt;
&lt;span class="pi"&gt;}}&lt;/span&gt;

&lt;span class="s"&gt;SELECT&lt;/span&gt;
  &lt;span class="s"&gt;...&lt;/span&gt;
&lt;span class="s"&gt;FROM&lt;/span&gt;
  &lt;span class="s"&gt;{{ ref('stg_page_views') }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configuration tells the warehouse to partition the final table by day and then cluster (sort) the data within each partition by &lt;code&gt;user_id&lt;/code&gt;, providing a highly efficient layout for user-facing dashboards.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario 2: Fixing Slow Joins by Minimizing Network Shuffle
&lt;/h2&gt;

&lt;p&gt;Large joins in distributed systems are slow because of the massive data movement required. This network bottleneck, known as the shuffle, is the tax paid for distributed processing. When joining two large tables, the engine must redistribute the data across the cluster so that rows with the same join key end up on the same machine. This involves expensive serialization, network transfer, and potential spills to disk if the data exceeds memory.&lt;/p&gt;

&lt;p&gt;Distributed engines primarily use two join strategies. A &lt;strong&gt;Broadcast Join&lt;/strong&gt; is used when one table is small (e.g., under &lt;a href="https://spark.apache.org/docs/latest/sql-performance-tuning.html" rel="noopener noreferrer"&gt;Spark's 10 MB default threshold&lt;/a&gt;). The engine copies the small table to every node, allowing the join to occur locally without shuffling the large table. This is highly efficient, and while the name comes from Spark, the concept of distributing a small dimension table to all compute nodes is a fundamental optimization in all MPP systems, including &lt;a href="https://docs.snowflake.com/en/user-guide/querying-joins" rel="noopener noreferrer"&gt;Snowflake&lt;/a&gt; and Redshift. A &lt;strong&gt;Shuffle Join&lt;/strong&gt; is used when both tables are large: both are repartitioned across the network based on the join key. This is brutally expensive and is often the cause of a slow query.&lt;/p&gt;

&lt;p&gt;The performance of a shuffle join is further degraded by two killers: data skew and disk spills. Data skew occurs when one join key contains a disproportionate amount of data, creating a "straggler" task that bottlenecks the entire job. Disk spills happen when a node runs out of memory and is forced to write intermediate data to slow storage, turning a memory-bound operation into a disk-bound one.&lt;/p&gt;

&lt;h3&gt;
  
  
  From Theory to Practice: Reading an Execution Plan
&lt;/h3&gt;

&lt;p&gt;Diagnosing a slow join requires inspecting the query's execution plan, which is the primary diagnostic tool. You can find this in &lt;strong&gt;&lt;a href="https://docs.snowflake.com/en/user-guide/ui-query-profile" rel="noopener noreferrer"&gt;Snowflake's Query Profile&lt;/a&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;a href="https://cloud.google.com/bigquery/docs/query-plan-explanation" rel="noopener noreferrer"&gt;BigQuery's Query execution details&lt;/a&gt;&lt;/strong&gt;, or by running an &lt;code&gt;EXPLAIN&lt;/code&gt; command in &lt;strong&gt;Databricks&lt;/strong&gt;. While graphical plans are helpful, understanding the textual output is a critical skill. Consider a simplified plan for a shuffle join:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;== Physical Plan ==
SortMergeJoin [left_key], [right_key], Inner
:- *(2) Sort [left_key ASC], false, 0
:  +- Exchange hashpartitioning(left_key, 200)
:     +- *(1) FileScan parquet table_A[left_key] Batched: true, DataFilters: [], Format: Parquet
+- *(4) Sort [right_key ASC], false, 0
   +- Exchange hashpartitioning(right_key, 200)
      +- *(3) FileScan parquet table_B[right_key] Batched: true, DataFilters: [], Format: Parquet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is how to interpret it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Spot the Shuffle&lt;/strong&gt;: The &lt;code&gt;Exchange&lt;/code&gt; operator is the shuffle. It indicates that data is being repartitioned and sent across the network. If you see an &lt;code&gt;Exchange&lt;/code&gt; on both sides of a join, it is a shuffle join. The absence of an &lt;code&gt;Exchange&lt;/code&gt; on one side would suggest a more efficient broadcast join.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Identify the Scan&lt;/strong&gt;: The &lt;code&gt;FileScan&lt;/code&gt; operator shows where data is being read from storage. A well-optimized query will show partition filters here (e.g., &lt;code&gt;PartitionFilters: [isnotnull(date), (date = 2024-10-26)]&lt;/code&gt;), confirming that &lt;a href="https://cloud.google.com/bigquery/docs/partitioned-tables" rel="noopener noreferrer"&gt;partition pruning&lt;/a&gt; is working.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Find the Join Algorithm&lt;/strong&gt;: The &lt;code&gt;SortMergeJoin&lt;/code&gt; indicates the specific type of shuffle join. Another common type is &lt;code&gt;ShuffleHashJoin&lt;/code&gt;. The choice of algorithm can have performance implications, but the presence of the &lt;code&gt;Exchange&lt;/code&gt; is the bigger red flag.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When a query is slow, look for large &lt;code&gt;bytes shuffled&lt;/code&gt; or &lt;code&gt;time spent in shuffle&lt;/code&gt; metrics associated with the &lt;code&gt;Exchange&lt;/code&gt; operator. If one task within the &lt;code&gt;Exchange&lt;/code&gt; stage takes much longer than others, it is a clear sign of data skew.&lt;/p&gt;
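
&lt;p&gt;One way to confirm skew before restructuring the query is to profile the join key directly. Here is a minimal PySpark sketch, with placeholder table and column names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Count rows per join key. If a handful of keys hold most of the rows,
# the shuffle for this join will be skewed.
events = spark.read.parquet("s3://my-bucket/events")  # placeholder path
(events
 .groupBy("user_id")
 .count()
 .orderBy(F.desc("count"))
 .show(10))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On recent Spark versions, adaptive query execution can also split skewed shuffle partitions automatically via the &lt;code&gt;spark.sql.adaptive.skewJoin.enabled&lt;/code&gt; setting.&lt;/p&gt;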

&lt;p&gt;For cases where you know a table is small enough to be broadcast but the optimizer fails to choose that strategy, you can often provide a hint in the SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="cm"&gt;/*+ BROADCAST(country_lookup) */&lt;/span&gt;
  &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;
  &lt;span class="n"&gt;country_lookup&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This hint forces the engine to broadcast the &lt;code&gt;country_lookup&lt;/code&gt; table, avoiding a costly shuffle of the massive &lt;code&gt;events&lt;/code&gt; table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario 3: Solving Read Amplification with Columnar Formats like Parquet
&lt;/h2&gt;

&lt;p&gt;Reading an entire file to answer a query that needs only one column is the most wasteful I/O operation and a clear sign of a poorly chosen file format. This happens with row-oriented formats like CSV or JSON, which store data in rows. To get the value from a single column, the engine must read and discard all other columns in that row. This is a primary cause of read amplification.&lt;/p&gt;

&lt;p&gt;The solution is to standardize on columnar formats like &lt;a href="https://parquet.apache.org/" rel="noopener noreferrer"&gt;Apache Parquet&lt;/a&gt;. Parquet stores data in columns, not rows, which immediately enables &lt;strong&gt;column pruning&lt;/strong&gt;. If a query is &lt;code&gt;SELECT avg(price) FROM sales&lt;/code&gt;, the engine reads only the &lt;code&gt;price&lt;/code&gt; column and ignores all others. This can reduce storage footprints by up to 75% compared to raw formats and is a cornerstone of modern analytics performance.&lt;/p&gt;

&lt;p&gt;Parquet's efficiency goes deeper, with a metadata hierarchy that enables further data skipping. Files are divided into &lt;strong&gt;row groups&lt;/strong&gt; (e.g., 128 MB chunks), and the file footer contains min/max statistics for every column in each row group. When a query contains a filter like &lt;code&gt;WHERE product_category = 'electronics'&lt;/code&gt;, the engine first reads the footer. If the min/max statistics for a row group show it only contains 'books' and 'clothing', the engine can skip reading that entire 128 MB chunk of data. For this to be effective, data should be sorted by frequently filtered columns before being written, which makes the min/max ranges tighter and more precise.&lt;/p&gt;
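
&lt;p&gt;You can inspect these footer statistics yourself with pyarrow. Below is a small sketch, assuming a local &lt;code&gt;sales.parquet&lt;/code&gt; file with a &lt;code&gt;product_category&lt;/code&gt; column; note that statistics may be absent if the writer did not produce them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pyarrow.parquet as pq

# Read only the footer metadata that the engine uses for row-group skipping
pf = pq.ParquetFile("sales.parquet")
print(f"row groups: {pf.metadata.num_row_groups}")

col_idx = pf.schema_arrow.get_field_index("product_category")
for i in range(pf.metadata.num_row_groups):
    stats = pf.metadata.row_group(i).column(col_idx).statistics
    if stats is not None:
        print(f"row group {i}: min={stats.min}, max={stats.max}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;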

&lt;h3&gt;
  
  
  From Theory to Practice: Writing Optimized Parquet
&lt;/h3&gt;

&lt;p&gt;Creating an optimized data layout is a data engineering task performed during ETL/ELT. For teams using frameworks like Apache Spark, the write logic is the control point for implementing partitioning, sorting, and file compaction. A common pattern is to repartition the data by a low-cardinality key (like date) and then sort within those partitions by a higher-cardinality key (like user ID).&lt;/p&gt;

&lt;p&gt;Here is a PySpark example demonstrating this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Assuming 'df' is a Spark DataFrame with page view events
&lt;/span&gt;
&lt;span class="c1"&gt;# Define output path and keys
&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://my-bucket/page_views_optimized&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;partition_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;cluster_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;
 &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;repartition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;partition_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sortWithinPartitions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;
 &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;overwrite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;partitionBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;partition_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code snippet does three critical things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;repartition(partition_key)&lt;/code&gt;: Groups data by the partition key, ensuring all data for a given date ends up on the same worker node before writing.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;sortWithinPartitions(cluster_key)&lt;/code&gt;: Sorts the data on each worker by &lt;code&gt;user_id&lt;/code&gt;, making the min/max statistics in the resulting Parquet files highly effective for pruning.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;partitionBy(partition_key)&lt;/code&gt;: Writes the data out to a directory structure like &lt;code&gt;/event_date=2024-10-26/&lt;/code&gt;, which enables partition pruning at the folder level.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This approach produces well-structured, skippable Parquet files that form the foundation of a high-performance data lakehouse.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Economics of Speed: Cost vs. Performance Trade-offs
&lt;/h2&gt;

&lt;p&gt;In the real world, performance is not an absolute goal. It is an economic decision. Engineers constantly balance query speed, compute cost, storage cost, and their own time. Without this context, performance advice remains academic and is insufficient for making business-justified architectural choices. Every optimization is a trade-off between paying now or paying later.&lt;/p&gt;

&lt;p&gt;The most fundamental trade-off is between compute and storage. Optimizing data layout by sorting and compacting files is not free. It requires an upfront compute cost to perform the ETL/ELT job. This, in turn, may slightly increase storage costs if less efficient compression is used to favor faster reads. However, this one-time investment pays dividends over time by dramatically reducing the compute costs for every subsequent query that reads the data. A well-clustered table might cost $50 in compute to write but save thousands of dollars in query compute over its lifetime.&lt;/p&gt;

&lt;p&gt;This economic model extends to managed features. When you enable a feature like &lt;a href="https://docs.snowflake.com/en/user-guide/search-optimization-service" rel="noopener noreferrer"&gt;Snowflake's Search Optimization Service&lt;/a&gt; or &lt;a href="https://cloud.google.com/bigquery/docs/clustered-tables" rel="noopener noreferrer"&gt;BigQuery's Clustering&lt;/a&gt;, you are making a conscious financial decision. You are agreeing to pay for the managed compute required to maintain an index-like structure and for the additional storage that structure consumes. In return, you avoid paying for massive, recurring compute costs from brute-force table scans. This is a sensible trade-off for high-value, frequently executed queries, but a poor one for ad-hoc analysis on cold data.&lt;/p&gt;

&lt;p&gt;Finally, the human cost must be considered. An engineer's time is often the most expensive resource. Spending two weeks manually optimizing a data pipeline to shave 10% off a query's runtime might not be worth it if simply scaling up the virtual warehouse for ten minutes a day would achieve the same result for a fraction of the cost. The goal is to find the right balance, investing engineering effort in foundational layout optimizations that provide compounding returns and using compute resources flexibly to handle spiky or unpredictable workloads.&lt;/p&gt;

&lt;p&gt;This economic reality leads to a crucial insight: if the primary performance killers for interactive queries are I/O latency from object storage and network shuffle, what if we could architect a system that bypasses them entirely for certain workloads? This is the central idea behind a modern, Workload-Fit Architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Adopting a Workload-Fit Architecture
&lt;/h2&gt;

&lt;p&gt;Fixing common performance scenarios reveals a pattern: most problems are symptoms of an architectural mismatch. The era of using one massive, monolithic MPP warehouse for every job is over. It is often too complex and expensive for the task at hand. This leads to a more modern approach: &lt;strong&gt;Workload-Fit Architecture&lt;/strong&gt;, which is the principle of matching the tool to the job's specific concurrency, latency, and cost requirements.&lt;/p&gt;

&lt;p&gt;This approach explicitly re-evaluates the I/O, Network, and CPU trade-offs for a given workload.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;I/O&lt;/strong&gt;: An in-process engine like DuckDB, running on a developer's laptop or a cloud VM, can use the local operating system's page cache and achieve extremely low-latency I/O from local SSDs. For "hot" data that fits on a single machine, this is orders of magnitude faster than fetching data from remote object storage.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Network&lt;/strong&gt;: The single biggest advantage of an in-process or single-node architecture is the complete elimination of the network shuffle tax. Joins and aggregations happen entirely in-memory or with spills to local disk, avoiding the expensive serialization and network transit inherent in distributed systems.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CPU&lt;/strong&gt;: Without the overhead of network serialization and deserialization, more CPU cycles are spent on productive, vectorized computation. This allows in-process engines to achieve incredible single-threaded performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MotherDuck is a prime example of this workload-fit philosophy. It combines the speed of DuckDB's local-first, in-process vectorized engine with the persistence and scalability of a serverless cloud backend. It is not designed for petabyte-scale ETL. Instead, it excels at the vast majority of workloads: powering interactive dashboards, enabling ad-hoc analysis, and serving data apps on datasets from gigabytes to a few terabytes, where low latency is critical and the overhead of a distributed MPP system is unnecessary. Read more in our documentation about &lt;a href="https://motherduck.com/docs/concepts/architecture-and-capabilities/" rel="noopener noreferrer"&gt;MotherDuck's Architecture&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision Matrix: Matching Your Workload to the Right Engine
&lt;/h2&gt;

&lt;p&gt;Choosing the right architecture requires evaluating your workload along two critical axes: the number of simultaneous users or queries (&lt;strong&gt;Concurrency&lt;/strong&gt;) and the required response time (&lt;strong&gt;Latency SLA&lt;/strong&gt;). This matrix provides a framework for selecting the appropriate engine type.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Sub-Second (&amp;lt;1s)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Interactive (1-10s)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Reporting (&amp;gt;10s)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Very High (1000+ users)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;&lt;a href="https://duckdb.org/pdf/SIGMOD2019-demo-duckdb.pdf" rel="noopener noreferrer"&gt;Real-time OLAP&lt;/a&gt; (ClickHouse, Druid)&lt;/strong&gt;&lt;br&gt;Specialized engines for user-facing analytics with high concurrency.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;MPP Warehouse (Snowflake, BigQuery)&lt;/strong&gt;&lt;br&gt;Designed for enterprise BI with elastic scaling for thousands of users.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;MPP Warehouse (Snowflake, BigQuery)&lt;/strong&gt;&lt;br&gt;Can scale out compute to handle massive batch reporting workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Medium (10-100 users)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;MotherDuck, ClickHouse&lt;/strong&gt;&lt;br&gt;Excellent for internal dashboards and data apps where latency is key for a team.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;MotherDuck, DuckDB (large server)&lt;/strong&gt;&lt;br&gt;Ideal for interactive analysis by a team, providing speed without MPP overhead.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;All Engines&lt;/strong&gt;&lt;br&gt;Most modern warehouses can handle this. Choice depends on cost and specific features.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Low (1-10 users)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;DuckDB (local), MotherDuck&lt;/strong&gt;&lt;br&gt;Unparalleled speed for local analysis or embedded apps, with cloud persistence.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;DuckDB, MotherDuck&lt;/strong&gt;&lt;br&gt;Perfect for individual data scientists or small teams exploring data. Fast and simple.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;DuckDB, All Cloud Warehouses&lt;/strong&gt;&lt;br&gt;For a few users running long queries, any engine will work. DuckDB offers simplicity.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion: Performance is a Data Engineering Choice
&lt;/h2&gt;

&lt;p&gt;Slow queries are not a mystery but a result of understandable physical principles. The path to performance is through disciplined data engineering: fixing I/O first by optimizing data layout, then minimizing network shuffles, and finally, choosing an architecture that fits the workload's economic and technical requirements. Performance is not a feature you buy from a vendor. It is a characteristic you design into your system. By addressing bottlenecks in the right order (I/O, then Network, then CPU), you can systematically build data applications that are fast, efficient, and cost-effective.&lt;/p&gt;

&lt;h3&gt;
  
  
  Path Forward
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Analyze Your Own Query&lt;/strong&gt;: Pick one of your slow queries and inspect its execution plan. Can you identify the bottleneck using the I/O-Network-CPU model? Look for signs of full table scans, large data shuffles, or disk spills (see the sketch after this list).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Audit Your Data Layout&lt;/strong&gt;: Examine the physical layout of your most frequently queried table. Is it stored in Parquet? Are file sizes optimized between 128MB and 1GB? Is the data sorted by columns commonly used in filters?&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Consider Your Architecture&lt;/strong&gt;: For your next interactive dashboard or data application project, evaluate if a Workload-Fit architecture could provide better performance and lower complexity than a traditional MPP warehouse. For many medium-data workloads, the answer is yes.&lt;/li&gt;
&lt;/ol&gt;
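
&lt;p&gt;As a minimal sketch of step 1, here is what inspecting a plan can look like in DuckDB. The bucket path and column names are hypothetical placeholders, not from this article:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Minimal sketch: profile a slow aggregation with DuckDB's EXPLAIN ANALYZE.
-- The bucket path and column names are hypothetical; substitute your own.
EXPLAIN ANALYZE
SELECT user_id, count(*) AS hits
FROM read_parquet('s3://my-bucket/events/*.parquet')
WHERE event_date = DATE '2025-01-15'
GROUP BY user_id
ORDER BY hits DESC
LIMIT 10;
-- The annotated plan reports per-operator timings and row counts,
-- which makes full scans, large shuffles, and disk spills visible.
&lt;/code&gt;&lt;/pre&gt;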

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why does it take so long to show sales or page hits for a user?
&lt;/h3&gt;

&lt;p&gt;This "needle in a haystack" problem is typically an I/O bottleneck, forcing the query engine to scan massive amounts of data just to find a few relevant rows for a single user. Optimizing your data layout with clustering and partitioning is the first step to enable data skipping and speed up these lookups. For workloads that demand consistently fast, interactive analytics, a modern data warehouse like MotherDuck leverages the power of DuckDB to deliver near-instant results for such queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  How can we improve the speed of our data warehouse reports?
&lt;/h3&gt;

&lt;p&gt;The most effective way to improve report speed is to tackle bottlenecks in order, starting with I/O by optimizing your data layout through partitioning and sorting. This dramatically reduces the amount of data scanned, which is the most common cause of slowness. Adopting a workload-fit architecture with a platform like MotherDuck can also provide a simpler, faster, and more cost-effective solution specifically for interactive reporting and analytics.&lt;/p&gt;

&lt;h3&gt;
  
  
  How can I optimize performance when using direct queries in a data warehouse environment?
&lt;/h3&gt;

&lt;p&gt;For direct queries in data apps, performance hinges on minimizing I/O latency by aligning your physical data layout with common query patterns. Using techniques like partitioning and clustering allows the engine to skip most of the data and return results in milliseconds. This is where a serverless data warehouse like MotherDuck excels, providing the low-latency query engine needed to power snappy, user-facing applications without complex infrastructure management.&lt;/p&gt;
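
&lt;p&gt;Building on the layout sketch above, a direct per-user lookup then reads only the matching partition and the row groups whose statistics overlap the filter (names remain illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Sketch: a point lookup that benefits from partition and row-group pruning.
SELECT event_ts, page_url, revenue
FROM read_parquet('s3://my-bucket/events/*/*.parquet', hive_partitioning = true)
WHERE event_date = '2025-01-15'   -- prunes to a single partition
  AND user_id = 'u_12345';        -- prunes row groups via min/max stats
&lt;/code&gt;&lt;/pre&gt;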

</description>
      <category>dataengineering</category>
      <category>duckdb</category>
      <category>performance</category>
      <category>data</category>
    </item>
    <item>
      <title>The ultimate guide to Open Source Observability in 2025: From silos to stacks</title>
      <dc:creator>Zenith AI Labs</dc:creator>
      <pubDate>Sun, 09 Nov 2025 04:00:00 +0000</pubDate>
      <link>https://dev.to/zenithai/the-ultimate-guide-to-open-source-observability-in-2025-from-silos-to-stacks-4cjg</link>
      <guid>https://dev.to/zenithai/the-ultimate-guide-to-open-source-observability-in-2025-from-silos-to-stacks-4cjg</guid>
      <description>&lt;p&gt;You've been told the three pillars of observability, logs, metrics, and traces, are the answer. But stitching together separate, best-of-breed tools has likely left you with data silos, slow queries, and a constant battle against rising infrastructure costs. During an incident, you're not debugging. Instead, you're manually correlating timestamps across three different UIs. This isn't a sustainable strategy.&lt;/p&gt;

&lt;p&gt;The most effective and cost-efficient observability solution for 2025 isn't a collection of disparate tools. It's a unified, open-source &lt;strong&gt;stack&lt;/strong&gt; built on a powerful data engine. This guide provides the architectural blueprints to help you understand why this shift is happening and how to build your stack the right way.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The "three pillars" (logs, metrics, traces) are just data types, not a solution. This model led to separate, siloed tools (like Elasticsearch for logs, Prometheus for metrics) that are difficult to correlate and expensive.
&lt;/li&gt;
&lt;li&gt;We compare the evolution of observability architectures: from "search-fortress" and "best-of-breed" silos to the modern, cost-efficient "unified database" approach.
&lt;/li&gt;
&lt;li&gt;The main challenge at scale is handling high-cardinality, unsampled data cost-efficiently, which is the critical test for any modern stack.
&lt;/li&gt;
&lt;li&gt;A database's ability to provide fast aggregations and high compression is the most important factor in determining the performance and cost of your entire stack.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://clickhouse.com/use-cases/observability" rel="noopener noreferrer"&gt;&lt;strong&gt;ClickStack&lt;/strong&gt;&lt;/a&gt; is an opinionated, open-source, unified observability stack (OTel Collector, ClickHouse, HyperDX UI) engineered to solve the core problems of correlation, cost, and scale.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why does the "three pillars" model lead to three silos?
&lt;/h2&gt;

&lt;p&gt;The concept of &lt;a href="https://softwareengineeringdaily.com/2021/02/04/debunking-the-three-pillars-of-observability-myth/" rel="noopener noreferrer"&gt;"three pillars" (logs, metrics, and traces)&lt;/a&gt; became popular in the mid-2010s as a way to categorize the essential data types for understanding a system's state. The model took hold as powerful, specialized open-source tools for each data type matured: &lt;strong&gt;Prometheus&lt;/strong&gt; for metrics, the &lt;strong&gt;ELK stack&lt;/strong&gt; for logs, and &lt;strong&gt;Jaeger&lt;/strong&gt; for traces. This naturally led organizations to adopt a separate "best-of-breed" tool for each pillar, creating distinct data silos by default.&lt;/p&gt;

&lt;p&gt;However, this approach has a fundamental flaw. The pillars represent raw data inputs, not a complete observability solution. The model leaves the complex and critical task of data analysis and correlation to you, the end-user. This is a task made nearly impossible by the siloed architecture it encourages.&lt;/p&gt;

&lt;p&gt;This fragmentation creates tangible pain. During a critical incident, an engineer's workflow becomes a slow, manual, and error-prone process of "swivel-chair analysis." An SRE gets a metric-based alert in Grafana, pivots to Kibana to hunt for related error logs, and then pivots again to Jaeger, hoping to find a trace ID that connects the dots. This constant context-switching between different UIs and query languages increases Mean Time to Resolution (MTTR) and raises the risk of missing crucial connections between signals. While commercial observability platforms abstract this UI fragmentation behind a single interface, they typically introduce new challenges, namely expensive consumption-based pricing and vendor lock-in.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does OpenTelemetry standardize data collection?
&lt;/h2&gt;

&lt;p&gt;As of 2025, &lt;a href="https://opentelemetry.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenTelemetry&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(OTel)&lt;/strong&gt; has matured into the undisputed, vendor-neutral industry standard for instrumenting and transporting telemetry data. It is the second most active project in the Cloud Native Computing Foundation (CNCF), and its massive adoption is based on a core principle: the clear separation of data ingestion from backend storage and analysis.&lt;/p&gt;

&lt;p&gt;The heart of OTel is the &lt;strong&gt;OpenTelemetry Collector&lt;/strong&gt;, a versatile proxy that acts as a pipeline for your data. It uses receivers to ingest data in various formats (like OTLP, Jaeger, or Prometheus), processors to batch or enrich that data, and exporters to send the processed data to one or more backends of your choice.&lt;/p&gt;

&lt;p&gt;This modular design is a strategic advantage. It &lt;a href="https://opentelemetry.io/docs/" rel="noopener noreferrer"&gt;standardizes instrumentation&lt;/a&gt;, preventing vendor lock-in and giving you more flexibility. You can instrument your applications once with OTel and then route telemetry to any compatible backend simply by changing a configuration file. OTel perfectly solves the &lt;em&gt;ingestion&lt;/em&gt; problem. With instrumentation standardized, the new bottleneck is the backend's ability to handle this massive flow of OpenTelemetry data. This leaves the most important question unanswered: &lt;strong&gt;Where should you send your data, and how can you query it at scale without breaking the bank?&lt;/strong&gt; The answer lies in the architecture of your backend.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the three architectural blueprints for an Open-Source Observability stack?
&lt;/h2&gt;

&lt;p&gt;The open-source observability landscape isn't a random collection of tools. It's an evolutionary journey. Each architectural pattern emerged to solve the problems of the last. Here are the three dominant blueprints, each with a litmus test to see where it breaks at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Blueprint #1: The "search-fortress" (ELK/OpenSearch)
&lt;/h3&gt;

&lt;p&gt;This blueprint is built on the ELK (Elasticsearch, Logstash, Kibana) or OpenSearch stack, which uses the Apache Lucene search library at its core.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt; It is a fortress for unstructured, "Google-like" text search. Its &lt;strong&gt;inverted index&lt;/strong&gt; makes it very effective for &lt;a href="https://opensearch.org/blog/opensearch-as-a-siem-solution/" rel="noopener noreferrer"&gt;Security Information and Event Management (SIEM)&lt;/a&gt; and compliance use cases where analysts need to find a needle in a haystack of text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Breaking Point:&lt;/strong&gt; For modern observability analytics, this architecture hits a wall.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extremely high TCO:&lt;/strong&gt; The Lucene inverted index is notoriously inefficient, creating massive storage overhead. It's common for the index to be multiple times the size of the original data. Combined with poor compression, this leads to budget-breaking infrastructure costs. A 100TB/day workload can cost &lt;a href="https://www.parseable.com/blog/the-economics-and-physics-of-100-tb-telemetry-data-per-day" rel="noopener noreferrer"&gt;&lt;strong&gt;$100,000+ per month&lt;/strong&gt; on Elasticsearch&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fails on high-cardinality analytics:&lt;/strong&gt; Aggregations over high-cardinality fields are the central task of modern observability. Engineers need to answer questions like, "What is the p99 latency trend for &lt;code&gt;service_A&lt;/code&gt; across all 1,000 containers for the past 24 hours?" or "Group all errors by &lt;code&gt;customer_id&lt;/code&gt; for the last 7 days." These are not text searches; they are analytical aggregations (see the SQL sketch after this list). The Lucene-based stack performs poorly on these queries, especially over wide time ranges or on high-cardinality fields (e.g. &lt;code&gt;user_id&lt;/code&gt; or &lt;code&gt;container_id&lt;/code&gt;), causing high JVM memory pressure that leads to slow performance, query timeouts, and even &lt;code&gt;OutOfMemory&lt;/code&gt; errors that crash nodes.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.chaossearch.io/blog/elasticsearch-opensearch-challenges" rel="noopener noreferrer"&gt;&lt;strong&gt;High operational complexity&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; Managing an ELK cluster at scale is complex, often requiring a dedicated team of experts to handle shard management, capacity planning, and JVM tuning.&lt;/li&gt;
&lt;/ul&gt;
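
&lt;p&gt;For a sense of what these analytical aggregations look like, here is a sketch of the p99 question as a single ClickHouse SQL query. It assumes an OTel-style traces table; the table and column names are illustrative, not a guaranteed schema:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Sketch: p99 latency trend for one service over the past 24 hours.
-- Assumes an OTel-style table; Duration is taken to be in nanoseconds.
SELECT
    toStartOfHour(Timestamp) AS hour,
    quantile(0.99)(Duration) / 1e6 AS p99_ms
FROM otel_traces
WHERE ServiceName = 'service_A'
  AND Timestamp &amp;gt;= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;
&lt;/code&gt;&lt;/pre&gt;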

&lt;h3&gt;
  
  
  Blueprint #2: The "best-of-breed" siloed stack (LGTM)
&lt;/h3&gt;

&lt;p&gt;This blueprint, often called the LGTM stack, uses specialized open-source tools for each signal: &lt;strong&gt;Loki&lt;/strong&gt; for logs, &lt;strong&gt;Grafana&lt;/strong&gt; for visualization, &lt;strong&gt;Tempo&lt;/strong&gt; for traces, and &lt;strong&gt;Mimir/Prometheus&lt;/strong&gt; for metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt; It uses top-tier projects, each highly optimized for its specific data type. Loki, in particular, dramatically lowers the cost of log storage compared to ELK by only indexing metadata labels, not the full log content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The breaking point:&lt;/strong&gt; While an improvement, this model introduces its own set of critical challenges.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High operational overhead:&lt;/strong&gt; You are now operating &lt;strong&gt;three or more separate, stateful database systems&lt;/strong&gt;. This carries a significant hidden operational tax, requiring expertise in multiple distinct technologies and increasing engineering toil.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cardinality and analytical gaps persist:&lt;/strong&gt; The problem just moves; it isn’t solved. Prometheus is known to suffer from "&lt;a href="https://stackoverflow.com/questions/46373442/how-dangerous-are-high-cardinality-labels-in-prometheus" rel="noopener noreferrer"&gt;cardinality explosion&lt;/a&gt;," which forces teams to rely heavily on pre-aggregation. This approach discards the raw data fidelity required for root-cause analysis, as you must anticipate your failure modes in advance. The problem then shifts to logs: Loki's cost-efficiency comes from indexing only metadata labels, which makes query performance on non-indexed log &lt;em&gt;content&lt;/em&gt; slow by design. This restricts engineers to a very specific workflow (like finding logs for a known &lt;code&gt;trace_id&lt;/code&gt;) and prevents the broad, exploratory analysis that is critical for finding unknown-unknowns.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fails on deep cross-signal correlation:&lt;/strong&gt; This is the architecture's fatal flaw. While visualization tools like Grafana provide opinionated workflows to link signals (for example, clicking a trace to see its corresponding logs), this is a superficial, UI-level correlation. It is not native. Because the data lives in three or more separate databases, there is no way to perform deep, analytical queries &lt;em&gt;across&lt;/em&gt; the signals. An engineer cannot, for instance, write a single query to join metric spikes with specific log attributes and trace durations to find a common cause. This forces engineers into a rigid, pre-defined debugging path, making it difficult to investigate complex issues that do not fit that specific pattern.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Blueprint #3: The "unified database" stack (ClickHouse)
&lt;/h3&gt;

&lt;p&gt;This modern architecture consolidates all telemetry, including logs, metrics, and traces, into a single, high-performance analytical database like ClickHouse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt; This is the most scalable, cost-effective, and flexible model. A single store for all telemetry eliminates data duplication, solves the high-cardinality problem at its root, and enables powerful, native correlation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It's a Superior Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extremely low TCO:&lt;/strong&gt; ClickHouse's columnar storage and advanced compression achieve remarkable efficiency, using &lt;a href="https://clickhouse.com/use-cases/observability" rel="noopener noreferrer"&gt;&lt;strong&gt;10 times less storage space&lt;/strong&gt; than Elasticsearch&lt;/a&gt;. It also integrates natively with low-cost object storage (like Amazon S3) for long-term retention, drastically reducing costs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Passes high-cardinality analytics:&lt;/strong&gt; The columnar architecture is purpose-built for this problem. Aggregating and filtering on high-cardinality data is a simple, sub-second &lt;code&gt;GROUP BY&lt;/code&gt; query, not a cluster-threatening event. This is a primary driver for users optimizing high-volume logs or traces.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Passes cross-signal correlation:&lt;/strong&gt; With all data in one place, correlation is native to the data engine itself, not a feature stitched together at the application or UI layer. This allows for highly efficient, deep analysis. You can &lt;a href="https://clickhouse.com/blog/evolution-of-sql-based-observability-with-clickhouse" rel="noopener noreferrer"&gt;join logs, metrics, and traces with standard SQL&lt;/a&gt; in a single query, allowing you to go from alert to root cause in seconds (see the sketch after this list).

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A critical nuance&lt;/strong&gt;: While ClickHouse is exceptionally strong for logs and traces, it's important to be transparent about the current state of metrics support. It is excellent for general-purpose metric storage, but users deeply tied to the full PromQL ecosystem should be aware of &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/57545" rel="noopener noreferrer"&gt;current limitations in native PromQL compatibility&lt;/a&gt;. However, this area is evolving fast, with new enhancements being added in recent releases. For many, the immediate, high-value win comes from the powerful SQL-based analysis of logs and traces.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
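
&lt;p&gt;As a sketch of what a native cross-signal query can look like, assuming OTel-style &lt;code&gt;otel_traces&lt;/code&gt; and &lt;code&gt;otel_logs&lt;/code&gt; tables (names and columns are illustrative, not a guaranteed schema):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Sketch: join the slowest spans to the error logs emitted during them.
SELECT
    t.TraceId,
    t.SpanName,
    t.Duration / 1e6 AS duration_ms,
    l.Body           AS error_log
FROM otel_traces AS t
INNER JOIN otel_logs AS l ON l.TraceId = t.TraceId
WHERE t.Duration &amp;gt; 1e9          -- spans longer than one second (ns)
  AND l.SeverityText = 'ERROR'
ORDER BY t.Duration DESC
LIMIT 20;
&lt;/code&gt;&lt;/pre&gt;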

&lt;h2&gt;
  
  
  What makes the database the heart of a modern observability stack?
&lt;/h2&gt;

&lt;p&gt;The success or failure of your observability stack depends on its data engine. The core challenge of modern observability is handling &lt;strong&gt;high-cardinality data&lt;/strong&gt;, which explodes multiplicatively with every new service, server, or dimension you add (&lt;code&gt;application × server × code_path × user_id&lt;/code&gt;). Search indexes and traditional time-series databases were not designed for this reality.&lt;/p&gt;

&lt;p&gt;Here’s how the underlying database technologies compare for the demands of observability analytics.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Elasticsearch (Search Index)&lt;/th&gt;
&lt;th&gt;ClickHouse (Columnar Database)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Inverted index&lt;/strong&gt; optimized for full-text search. Stores data in row-oriented JSON documents.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Columnar storage&lt;/strong&gt; optimized for analytical aggregations. Stores data for each column together. Also supports inverted indices and bloom filters at a columnar level to accelerate textual searches.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data compression&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Poor. High storage overhead from the index and &lt;code&gt;doc_values&lt;/code&gt; leads to significant data amplification.&lt;/td&gt;
&lt;td&gt;Excellent. Uses at least &lt;a href="https://clickhouse.com/blog/log-compression-170x" rel="noopener noreferrer"&gt;10 times less storage&lt;/a&gt; through superior compression codecs and a columnar format.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;High-cardinality aggregations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Slow and memory-intensive. Prone to &lt;code&gt;OutOfMemory&lt;/code&gt; errors and query timeouts.&lt;/td&gt;
&lt;td&gt;Extremely fast. Purpose-built for sub-second &lt;code&gt;GROUP BY&lt;/code&gt; queries on trillions of rows.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary query language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;KQL / Lucene. Powerful for text search, but less suited for complex analytical joins.&lt;/td&gt;
&lt;td&gt;Standard SQL. A universal, powerful language for deep, cross-signal analysis and joins. Also supports Lucene-style search syntax (transpiled to SQL) to ease migration from Elasticsearch and OpenSearch and to provide a natural exploration language for logs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost-efficiency (TCO)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very High. Driven by massive storage, compute, and operational complexity.&lt;/td&gt;
&lt;td&gt;Very Low. Driven by extreme compression, efficient queries, and architectural simplicity.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Elasticsearch is an excellent tool for searching text. But observability analytics (calculating p99 latencies, grouping errors by customer ID, finding outliers) are aggregation-heavy workloads. ClickHouse was built from the ground up for this exact task, making it a better architectural choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is ClickStack, the pre-built unified stack?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ClickStack&lt;/strong&gt; is the pre-built, open-source implementation of the "Unified Database" architecture. It provides an opinionated, end-to-end stack tuned for performance and cost-efficiency, consisting of a pre-configured &lt;strong&gt;OpenTelemetry Collector&lt;/strong&gt; for ingestion, &lt;strong&gt;ClickHouse&lt;/strong&gt; as the unified database, and &lt;strong&gt;HyperDX&lt;/strong&gt; as the integrated UI.&lt;/p&gt;

&lt;p&gt;This approach provides tangible, immediate benefits over a fragmented DIY approach.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;DIY OSS Stack (Prometheus + ELK + Jaeger)&lt;/th&gt;
&lt;th&gt;Unified OSS Stack (ClickStack)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data correlation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;UI-Level &amp;amp; Rigid.&lt;/strong&gt; Correlation is limited to UI pivots (e.g., Grafana linking trace IDs to logs). Lacks native database-level joins across signals.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Native &amp;amp; Deep.&lt;/strong&gt; All data is in one database. Correlation is done efficiently at the database layer, enabling deep, cross-signal analysis.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data exploration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Siloed &amp;amp; Slow.&lt;/strong&gt; Exploratory analysis is difficult. Traditional search stacks (ELK) are slow for analytics, and specialized log tools (Loki) are slow for searching non-indexed log content.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Fast &amp;amp; Flexible.&lt;/strong&gt; Optimized for both broad trend analysis (fast &lt;code&gt;GROUP BY&lt;/code&gt;s) and fast discovery (text search via inverted indices and bloom filters).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost at scale&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High. Driven by the significant storage and compute footprint of multiple data stores.&lt;/td&gt;
&lt;td&gt;Low. Up to 90% lower storage costs due to ClickHouse's high compression rates and efficient architecture.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Inconsistent. Slow for large-scale aggregations or high-cardinality metrics.&lt;/td&gt;
&lt;td&gt;Consistently Fast. Sub-second query performance for complex analytics across massive datasets, plus fast text search (inverted indices, bloom filters).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance overhead&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extremely High. Requires expertise to manage, scale, and secure at least three complex systems.&lt;/td&gt;
&lt;td&gt;Dramatically Lower. A single, cohesive platform to manage, reducing operational complexity.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The power of this unified approach is proven by its use in some of the most demanding engineering organizations in the world.&lt;/p&gt;

&lt;p&gt;"A lot of our peer companies are using ClickHouse for this exact use case. It’s battle-tested and just the right tool for the job." — &lt;a href="https://clickhouse.com/blog/how-tesla-built-quadrillion-scale-observability-plataform-on-clickhouse" rel="noopener noreferrer"&gt;&lt;strong&gt;Tesla&lt;/strong&gt;&lt;/a&gt;, on building their quadrillion-row scale observability platform on ClickHouse.&lt;/p&gt;

&lt;p&gt;"Previously, querying the last 10 minutes would take 1–2 minutes. With ClickStack, it was just a case of how fast I could blink. The performance is real." — &lt;a href="https://clickhouse.com/blog/scaling-observabilty-for-thousands-of-gpus-at-character-ai" rel="noopener noreferrer"&gt;&lt;strong&gt;Character.ai&lt;/strong&gt;&lt;/a&gt;, after reducing log search times from minutes to milliseconds and cutting costs by 50% despite a 10x increase in log volume.&lt;/p&gt;

&lt;p&gt;"With ClickHouse, the database is green, queries are lightning-fast, and money is not on fire." — &lt;a href="https://clickhouse.com/blog/how-anthropic-is-using-clickhouse-to-scale-observability-for-ai-era" rel="noopener noreferrer"&gt;&lt;strong&gt;Anthropic&lt;/strong&gt;&lt;/a&gt;, on using ClickHouse to handle the "deluge of telemetry" from developing AI models like Claude 4.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a unified UI? HyperDX vs. Grafana
&lt;/h3&gt;

&lt;p&gt;The first question many engineers ask is, "Why not just use Grafana?" It's a fair question. Grafana is the industry standard for dashboarding and includes an excellent ClickHouse plugin. Many organizations successfully use Grafana on top of ClickHouse for metrics visualization, and it remains a powerful option for building dashboards to monitor known KPIs.&lt;/p&gt;

&lt;p&gt;However, monitoring pre-defined dashboards is a different workflow from debugging an active incident. This distinction highlights the different design philosophies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Grafana is for &lt;em&gt;dashboarding knowns&lt;/em&gt;:&lt;/strong&gt; It excels at creating curated dashboards to monitor pre-defined metrics and Service Level Objectives (SLOs). Its strength lies in visualizing time-series data from one or many data sources. This design also encourages a rigid, metrics-first workflow (from an alert, to a trace, to logs) and is not built for the kind of exploratory, search-based analysis required to find unknown problems.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HyperDX is for &lt;em&gt;debugging unknowns&lt;/em&gt;:&lt;/strong&gt; It is purpose-built for the investigative workflow required during an incident. The user experience is designed to move seamlessly between signals to find the root cause of novel problems, not just visualize known metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While an engineer &lt;em&gt;can&lt;/em&gt; use Grafana with ClickHouse for monitoring, the ClickStack observability platform includes HyperDX because it provides a cohesive, out-of-the-box debugging experience. In Grafana, ClickHouse is a data source &lt;em&gt;plugin&lt;/em&gt;, not a native backend. This limits its integration into Grafana's core, opinionated workflows. Furthermore, any deep analysis in Grafana requires the engineer to write and optimize raw SQL, a task that is complex and unfamiliar to many SREs.&lt;/p&gt;

&lt;p&gt;HyperDX, by contrast, is built for the unified database model. It offers native cross-signal correlation and abstracts this complexity, providing an intuitive Lucene-like syntax for search. An engineer can one-click from a specific log line to the exact distributed trace that generated it, or from a slow trace span to all the logs emitted during that operation. This is a native workflow, not a stitched-together experience.&lt;/p&gt;

&lt;p&gt;Beyond the search syntax and correlation workflow, HyperDX also integrates other essential debugging tools, such as full Application Performance Monitoring (APM) trace waterfall views and Real User Monitoring (RUM) features like session replay. These are core components of the UI, not just additional panels on a metrics dashboard. This approach provides a single, cohesive interface that replaces the need for three separate UIs for logs, traces, and metrics.&lt;/p&gt;

&lt;h2&gt;
  
  
  When should you stick with ELK or Prometheus?
&lt;/h2&gt;

&lt;p&gt;No single tool is perfect for every job. Building trust means being honest about limitations. While the unified stack represents the future for large-scale observability, there are specific scenarios where older tools still excel.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retain ELK/OpenSearch for relevance-based search and SIEM:&lt;/strong&gt; For use cases where the primary requirement is not just finding text, but &lt;em&gt;ranking&lt;/em&gt; it by relevance (like legal discovery or advanced SIEM threat hunting), Lucene's text-scoring engine remains the better choice. Modern observability platforms, including those using ClickHouse, also leverage inverted indices for fast, unstructured text search, but they are optimized for analytics and filtering, not relevance ranking.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retain Prometheus for small to medium-scale metrics:&lt;/strong&gt; For environments where cardinality is well-controlled and the scale is manageable for a single server, Prometheus's simplicity, pull-based model, and powerful PromQL offer a straightforward and effective monitoring solution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use specialized tools for non-OTel-native use cases:&lt;/strong&gt; The ClickStack observability platform is focused on unifying the core signals (logs, metrics, traces) driven by OpenTelemetry at scale. For sub-use cases that fall outside this scope, such as universal profiling, deep database monitoring, or network monitoring, dedicated tools that provide the necessary out-of-the-box UI and collection agents are a better fit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The strategic approach is not always to rip-and-replace, but to move new, high-volume, high-cardinality observability workloads to a unified stack while retaining specialized tools for the niche tasks they were designed for.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you get started: Stop juggling tools, start building a stack
&lt;/h2&gt;

&lt;p&gt;The future of open-source observability isn't about which logging tool to choose; it's about building a unified stack on a database that can handle the scale and complexity of modern systems without compromising on cost or performance. The architectural shift from fragmented silos to a unified database is a direct response to the economic and technical limitations of previous generations of tools.&lt;/p&gt;

&lt;p&gt;By consolidating all your telemetry into a single, powerful engine, you eliminate data silos, reduce your TCO, and help your teams solve problems faster.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deploy the stack:&lt;/strong&gt; Get started in minutes. Deploy the open-source &lt;a href="https://clickhouse.com/docs/use-cases/observability/clickstack/overview" rel="noopener noreferrer"&gt;ClickStack on your infrastructure&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try the managed platform:&lt;/strong&gt; See the power without the setup. Try &lt;a href="https://clickhouse.com/cloud" rel="noopener noreferrer"&gt;ClickStack on ClickHouse Cloud&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Join the community:&lt;/strong&gt; Have questions? Join a community of engineers building the future of observability.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are some open source observability stacks I can self-host?
&lt;/h3&gt;

&lt;p&gt;When self-hosting, you generally have two architectural choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The "best-of-breed" siloed stack:&lt;/strong&gt; This is the popular &lt;strong&gt;LGTM stack&lt;/strong&gt;, which stands for &lt;strong&gt;L&lt;/strong&gt;oki (logs), &lt;strong&gt;G&lt;/strong&gt;rafana (visualization), &lt;strong&gt;T&lt;/strong&gt;empo (traces), and &lt;strong&gt;M&lt;/strong&gt;imir/Prometheus (metrics). While each component is powerful, this approach carries a high operational tax. You become responsible for managing, scaling, and updating three or more separate, stateful database systems. Its most significant weakness is the lack of native cross-signal correlation, forcing your engineers back into "swivel-chair analysis" to debug incidents.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "unified database" stack:&lt;/strong&gt; This is the modern, more efficient architecture. It consolidates all three signals into a single high-performance database. The leading open-source example is &lt;strong&gt;ClickStack&lt;/strong&gt;, which combines the &lt;strong&gt;OpenTelemetry Collector&lt;/strong&gt;, &lt;strong&gt;ClickHouse&lt;/strong&gt; as the unified database, and &lt;strong&gt;HyperDX&lt;/strong&gt; as the integrated UI. This model solves the correlation problem natively (you can join logs and traces with SQL) and dramatically lowers TCO and operational complexity by centralizing all telemetry data in one place.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What are some high-performance, scalable open-source alternatives to using Elasticsearch for an OpenTelemetry backend?
&lt;/h3&gt;

&lt;p&gt;This is one of the most common challenges teams face at scale. Elasticsearch (and OpenSearch) is a "search-fortress" built on Lucene, which is excellent for full-text search but struggles with the demands of modern observability analytics, which require aggregations to examine trends over time. Its inverted index leads to massive storage costs (often 12-19x more than alternatives), and it fails on high-cardinality aggregations, leading to slow queries and memory errors.&lt;/p&gt;

&lt;p&gt;The best high-performance alternative is to move from a search index to a &lt;strong&gt;columnar analytical database&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The leading open-source choice in this category is &lt;strong&gt;ClickHouse&lt;/strong&gt;. It was purpose-built for the exact type of high-cardinality, high-volume analytical queries that observability requires. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extreme compression:&lt;/strong&gt; Drastically reduces storage TCO.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub-second analytics:&lt;/strong&gt; Handles high-cardinality &lt;code&gt;GROUP BY&lt;/code&gt; queries (e.g., "group errors by &lt;code&gt;user_id&lt;/code&gt;") with ease.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL interface:&lt;/strong&gt; Uses a familiar, powerful query language, while also supporting Lucene-style search syntax for more exploratory log-based workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why ClickHouse is the data engine at the heart of ClickStack and is used by companies like Tesla, Character.ai, and Anthropic to power their observability platforms.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the most popular backends that offer native support for the full OpenTelemetry specification?
&lt;/h3&gt;

&lt;p&gt;OpenTelemetry (OTel) is the industry standard for &lt;em&gt;collecting&lt;/em&gt; and &lt;em&gt;transporting&lt;/em&gt; data, but it doesn't store it. The OTel Collector can send data to many backends. The most popular choices fall into three categories:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Commercial SaaS platforms:&lt;/strong&gt; This includes major platforms like Datadog, Dynatrace, New Relic, and Splunk. They all support OTel ingestion to varying degrees, offering a managed, out-of-the-box experience. However, they are often the most expensive options, operate as "black boxes," and can lock you into their specific query languages and correlation UIs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Siloed backends (The LGTM Stack):&lt;/strong&gt; This involves using &lt;strong&gt;Loki&lt;/strong&gt; for logs, &lt;strong&gt;Mimir/Prometheus&lt;/strong&gt; for metrics, and &lt;strong&gt;Tempo&lt;/strong&gt; for traces. While all are OTel-compatible, they are separate systems. This architecture perpetuates the "three silos" problem, making it difficult to analyze relationships &lt;em&gt;between&lt;/em&gt; your signals.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified backends (The ClickHouse Stack):&lt;/strong&gt; This architecture uses a single database, like &lt;strong&gt;ClickHouse&lt;/strong&gt;, to store all three signals. ClickStack is the pre-built implementation of this. This is the only approach that natively supports full-stack correlation. You can ingest all your OTel data into one table and use SQL to join logs, metrics, and traces, which is impossible in the siloed model.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What managed observability platforms offer scalable storage and a simplified query experience while being compatible with open-source standards?
&lt;/h3&gt;

&lt;p&gt;Most managed platforms are OTel-compatible, but the best ones are built on open-source foundations. This prevents vendor lock-in and ensures you're using a battle-tested engine.&lt;/p&gt;

&lt;p&gt;The key is to look at the &lt;em&gt;architecture&lt;/em&gt; the platform is built on. A modern managed platform should be built on a unified database to solve the core problems of scale, cost, and correlation.&lt;/p&gt;

&lt;p&gt;This is the philosophy behind &lt;strong&gt;ClickStack on ClickHouse Cloud&lt;/strong&gt;. It provides a fully managed platform that runs the open-source ClickStack (OTel, ClickHouse, HyperDX) for you. It directly delivers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalable storage:&lt;/strong&gt; Uses ClickHouse's superior compression and ability to use low-cost object storage (like S3) for massive scale at a low cost.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified query experience:&lt;/strong&gt; Provides a unified UI (HyperDX) for debugging and a powerful, standard &lt;strong&gt;SQL&lt;/strong&gt; interface for deep analysis, eliminating the need to learn multiple, proprietary query languages.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Is a unified database good for all three signals, including logs, metrics, and traces?
&lt;/h3&gt;

&lt;p&gt;Yes. The "wide event" model treats all telemetry as attributes of a single, context-rich event. A high-performance analytical database like ClickHouse is very good at storing and querying this wide, structured data. It can handle the high-volume, time-series nature of metrics, the rich metadata of traces, and the searchable content of logs within a single, efficient system.&lt;/p&gt;

</description>
      <category>observability</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
