<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: RAKESH THERANI</title>
    <description>The latest articles on DEV Community by RAKESH THERANI (@rakeshtherani).</description>
    <link>https://dev.to/rakeshtherani</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3915069%2F2cc58337-0e5c-494d-a95f-b451731b28b0.png</url>
      <title>DEV Community: RAKESH THERANI</title>
      <link>https://dev.to/rakeshtherani</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rakeshtherani"/>
    <language>en</language>
    <item>
      <title>Replacing Elasticsearch with ClickHouse : A 90% Cost-Reduction Migration</title>
      <dc:creator>RAKESH THERANI</dc:creator>
      <pubDate>Wed, 06 May 2026 04:20:01 +0000</pubDate>
      <link>https://dev.to/rakeshtherani/replacing-elasticsearch-with-clickhouse-for-otel-logs-traces-metrics-a-90-cost-reduction-28c</link>
      <guid>https://dev.to/rakeshtherani/replacing-elasticsearch-with-clickhouse-for-otel-logs-traces-metrics-a-90-cost-reduction-28c</guid>
      <description>&lt;p&gt;&lt;em&gt;A practical guide based on shipping this for a crypto-derivatives platform — annual observability bill went from high six figures to ~$50K, with faster queries and AI-powered log search as a bonus.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I'm writing this
&lt;/h2&gt;

&lt;p&gt;If you're paying mid-six figures for Elasticsearch and ~90% of your queries are aggregations (error counts, latency percentiles, service health), you're paying full price for a feature you barely use — full-text search.&lt;/p&gt;

&lt;p&gt;This post walks through how a crypto exchange replaced Elasticsearch with ClickHouse for OpenTelemetry logs, traces, and metrics. Same OTEL instrumentation, just a different backend. Result: 5× smaller storage footprint, 2-6× faster queries on benchmarks, and natural-language log queries via an AI agent — at ~10% of the cost.&lt;/p&gt;

&lt;p&gt;If you have an existing Elasticsearch + Kibana observability stack and you've been wondering whether ClickHouse is a serious alternative, this is the deep-dive. Includes the schema, the migration plan, the OTEL Collector configuration, the cost numbers, and the gotchas.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on numbers&lt;/strong&gt;: The cost figures below ($400K Elasticsearch → ~$50K ClickHouse) are real annual numbers from this deployment, on log volumes typical of a high-traffic trading platform (low-tens of TB ingested per year, 90-day hot retention, multi-region). Your mileage will vary substantially with log volume, retention, and cluster size. The architectural patterns are universal.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Executive Summary&lt;/li&gt;
&lt;li&gt;The Problem — Elasticsearch at $400K/Year&lt;/li&gt;
&lt;li&gt;Why ClickHouse for Observability&lt;/li&gt;
&lt;li&gt;Platform Architecture&lt;/li&gt;
&lt;li&gt;
OpenTelemetry — The Standard We Keep
5b. Collection Agent Options — OTEL vs Fluent Bit vs Vector
&lt;/li&gt;
&lt;li&gt;ClickHouse Schema — Logs, Traces, Metrics&lt;/li&gt;
&lt;li&gt;End-to-End Distributed Tracing — HTTP to ClickHouse&lt;/li&gt;
&lt;li&gt;Full-Text Log Search — Replacing Kibana Discover&lt;/li&gt;
&lt;li&gt;Visualization — Grafana Replaces Kibana&lt;/li&gt;
&lt;li&gt;Data Retention &amp;amp; Tiered Storage&lt;/li&gt;
&lt;li&gt;AI Layer — Natural Language Over Logs and Traces&lt;/li&gt;
&lt;li&gt;Migration Plan — Zero Downtime Cutover (Standard OTEL or BindPlane)&lt;/li&gt;
&lt;li&gt;Cost Analysis&lt;/li&gt;
&lt;li&gt;Risk Assessment&lt;/li&gt;
&lt;li&gt;Success Metrics&lt;/li&gt;
&lt;li&gt;Reference Links&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Executive Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Objective
&lt;/h3&gt;

&lt;p&gt;Replace the current Elasticsearch-based observability stack (application logs + OTEL traces + metrics) with ClickHouse — reducing annual infrastructure cost from $400K to ~$35-60K while gaining faster aggregations, better compression, unified storage with business data, and AI-powered log querying.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem Today
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Paying $400K/year for Elasticsearch managed service&lt;/li&gt;
&lt;li&gt;Elasticsearch is optimized for full-text search — most log queries are aggregations (error counts, latency percentiles, service health) where ClickHouse is &lt;strong&gt;2-6x faster on cold queries, 1.7-2.6x on hot queries&lt;/strong&gt; (ClickHouse/TextBench benchmark, OTEL logs at 1B–50B rows)&lt;/li&gt;
&lt;li&gt;Logs, traces, metrics, and business data live in separate systems — no cross-correlation&lt;/li&gt;
&lt;li&gt;Storage costs are high: same OTEL dataset takes &lt;strong&gt;5x more space in Elasticsearch&lt;/strong&gt; (49 GB vs 245 GB at 1B rows; 2.4 TB vs 12 TB at 50B rows)&lt;/li&gt;
&lt;li&gt;Kibana is the only query interface — no programmatic access, no AI layer&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Keep OpenTelemetry (OTEL) as the instrumentation standard — zero application changes.
Change only the destination: swap Elasticsearch exporter → ClickHouse exporter in OTEL Collector.

All logs, traces, and metrics land in ClickHouse.
Grafana reads ClickHouse for dashboards and alerts.
AI agent queries logs/traces in plain English via the same LibreChat + MCP platform.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Expected Outcomes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Annual cost reduction&lt;/td&gt;
&lt;td&gt;$340-365K saved (~85-90% reduction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage reduction&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;5x smaller&lt;/strong&gt; total footprint (16x on column files) — real benchmark at 1B–50B OTEL rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query speed improvement&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2-6x faster&lt;/strong&gt; cold queries, &lt;strong&gt;1.7-2.6x&lt;/strong&gt; hot queries (ClickHouse/TextBench)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retention period&lt;/td&gt;
&lt;td&gt;Same or longer — at lower cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unification&lt;/td&gt;
&lt;td&gt;Logs + traces + metrics + business data in one DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI queries over logs&lt;/td&gt;
&lt;td&gt;Plain English → SQL → instant answer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  2. The Problem — Elasticsearch at $400K/Year
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Where the Cost Comes From
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cost Driver&lt;/th&gt;
&lt;th&gt;Elasticsearch Behaviour&lt;/th&gt;
&lt;th&gt;Annual Cost Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Row-oriented index, 2-3x compression, needs SSD&lt;/td&gt;
&lt;td&gt;~$100K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compute&lt;/td&gt;
&lt;td&gt;CPU-heavy indexing on every write, inverted index maintenance&lt;/td&gt;
&lt;td&gt;~$150K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Licensing&lt;/td&gt;
&lt;td&gt;Elastic managed service / Elastic Cloud premium&lt;/td&gt;
&lt;td&gt;~$100K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operations&lt;/td&gt;
&lt;td&gt;Shard management, index lifecycle management (ILM), tuning&lt;/td&gt;
&lt;td&gt;~$50K (eng time)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Technical Mismatch
&lt;/h3&gt;

&lt;p&gt;Elasticsearch was built for &lt;strong&gt;full-text search&lt;/strong&gt; on documents (web pages, articles). Application logs and OTEL telemetry are &lt;strong&gt;structured time-series data&lt;/strong&gt; — they need aggregations, not document search.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What teams actually query&lt;/th&gt;
&lt;th&gt;Elasticsearch efficiency&lt;/th&gt;
&lt;th&gt;ClickHouse efficiency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Error count per service last hour"&lt;/td&gt;
&lt;td&gt;Slow (aggregation on inverted index)&lt;/td&gt;
&lt;td&gt;Fast (columnar scan)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"P99 latency for /api/trade endpoint"&lt;/td&gt;
&lt;td&gt;Slow (percentile aggregation)&lt;/td&gt;
&lt;td&gt;Fast (built-in quantile functions)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show logs for TraceId = abc123"&lt;/td&gt;
&lt;td&gt;Fast (indexed term lookup)&lt;/td&gt;
&lt;td&gt;Fast (bloom filter)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Which services degraded after deploy at 14:00?"&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Free-text: find logs containing OutOfMemoryError"&lt;/td&gt;
&lt;td&gt;Fast (native)&lt;/td&gt;
&lt;td&gt;Good (tokenbf bloom filter)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;~90% of real observability queries are aggregations. Elasticsearch is paying full price for a capability (full-text search) that covers only ~10% of use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where ClickHouse wins most — benchmark by query type (cold, 50B rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query Category&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;th&gt;ClickHouse speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Log retrieval (text match + fetch rows)&lt;/td&gt;
&lt;td&gt;"Find logs containing OutOfMemoryError"&lt;/td&gt;
&lt;td&gt;Narrowest gap — ES competitive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error/match counts&lt;/td&gt;
&lt;td&gt;"Count 500 errors in last hour"&lt;/td&gt;
&lt;td&gt;Moderate advantage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service-level breakdowns&lt;/td&gt;
&lt;td&gt;"Group errors by service"&lt;/td&gt;
&lt;td&gt;Large advantage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time-series trend analysis&lt;/td&gt;
&lt;td&gt;"Error rate per minute over last 24h"&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Widest gap — 6x+ faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The speedup grows with analytical complexity. Retrieval-only queries (find and show matching rows) is ES's home turf. The moment you add grouping, aggregation, or time-bucketing on top of a text match — the dominant pattern in observability — ClickHouse's vectorized engine pulls away decisively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Current Stack Pain Points
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application
    │
    ▼ OTEL SDK (instrumented)
OTEL Collector
    │
    ▼ Elasticsearch Exporter
Elasticsearch Cluster
    │
    ▼
Kibana                   ← only UI, no programmatic access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;No way to join log data with business data (e.g., "which users were affected by this error?")&lt;/li&gt;
&lt;li&gt;Kibana dashboards require manual setup — no AI layer&lt;/li&gt;
&lt;li&gt;Retention limited by cost — older logs are deleted or archived to cold storage with no query access&lt;/li&gt;
&lt;li&gt;Every new service that emits logs increases Elasticsearch cost linearly&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. Why ClickHouse for Observability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Companies Already Doing This
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cloudflare&lt;/strong&gt; — Replaced Elasticsearch with ClickHouse for HTTP logs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;36 petabytes of data&lt;/li&gt;
&lt;li&gt;Queries that took 30+ seconds in Elasticsearch run in &amp;lt; 1 second in ClickHouse&lt;/li&gt;
&lt;li&gt;Cost reduced by 80%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Uber&lt;/strong&gt; — Migrated logging to ClickHouse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100+ billion log rows per day&lt;/li&gt;
&lt;li&gt;Storage reduced from petabytes (Elasticsearch) to terabytes (ClickHouse)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Contentsquare&lt;/strong&gt; — OTEL traces and logs in ClickHouse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50 billion events/day&lt;/li&gt;
&lt;li&gt;Replaced both Elasticsearch and a custom Cassandra setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Langfuse&lt;/strong&gt; (now part of ClickHouse) — LLM observability platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores all LLM traces in ClickHouse&lt;/li&gt;
&lt;li&gt;Handles millions of traces per day&lt;/li&gt;
&lt;li&gt;This is the same Langfuse we use in the Agentic AI Platform&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ClickHouse vs Elasticsearch for Observability
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage model&lt;/td&gt;
&lt;td&gt;Inverted index (document-oriented)&lt;/td&gt;
&lt;td&gt;Columnar (OLAP)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression ratio&lt;/td&gt;
&lt;td&gt;2-3x&lt;/td&gt;
&lt;td&gt;10-30x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggregation speed&lt;/td&gt;
&lt;td&gt;Seconds (post-indexing)&lt;/td&gt;
&lt;td&gt;Milliseconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write throughput&lt;/td&gt;
&lt;td&gt;Medium (index maintenance overhead)&lt;/td&gt;
&lt;td&gt;Very high (append-only MergeTree)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full-text search&lt;/td&gt;
&lt;td&gt;Excellent (native inverted index)&lt;/td&gt;
&lt;td&gt;Good (bloom filter indexes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL native support&lt;/td&gt;
&lt;td&gt;Via Logstash/Beats (extra hop)&lt;/td&gt;
&lt;td&gt;Native OTEL exporter (direct)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-data joins&lt;/td&gt;
&lt;td&gt;Not possible&lt;/td&gt;
&lt;td&gt;Join logs with business tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI/LLM integration&lt;/td&gt;
&lt;td&gt;Kibana AI (limited)&lt;/td&gt;
&lt;td&gt;Full MCP + LLM stack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tiered storage&lt;/td&gt;
&lt;td&gt;ILM (complex config)&lt;/td&gt;
&lt;td&gt;TTL + S3 (2 lines of SQL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;Elastic License (paid tiers)&lt;/td&gt;
&lt;td&gt;Apache 2.0 (fully open source)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational complexity&lt;/td&gt;
&lt;td&gt;High (shards, replicas, ILM)&lt;/td&gt;
&lt;td&gt;Low (MergeTree auto-manages)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Architectural Advantage
&lt;/h3&gt;

&lt;p&gt;ClickHouse already stores business data (trades, wallets, users, risk) via the Agentic AI Platform. Adding observability data to the &lt;strong&gt;same cluster&lt;/strong&gt; means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- This query is IMPOSSIBLE in Elasticsearch:&lt;/span&gt;
&lt;span class="c1"&gt;-- "Which users were affected by the 500 errors on trading-service between 14:00-14:30?"&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;mart_users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'trading-service'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2026-04-17 14:00:00'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2026-04-17 14:30:00'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In ClickHouse: one query, instant answer.&lt;br&gt;
In Elasticsearch + separate business DB: impossible without custom ETL.&lt;/p&gt;
&lt;h3&gt;
  
  
  Horizontal Scaling — Parallel Replicas
&lt;/h3&gt;

&lt;p&gt;For high-volume workloads, ClickHouse scales a single query across multiple nodes using &lt;strong&gt;parallel replicas&lt;/strong&gt; — the query is split across the replica fleet and results are merged. At 50B rows (same TextBench dataset):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Nodes&lt;/th&gt;
&lt;th&gt;Total query runtime&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 node&lt;/td&gt;
&lt;td&gt;19.1s&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3 nodes&lt;/td&gt;
&lt;td&gt;12.5s&lt;/td&gt;
&lt;td&gt;1.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9 nodes&lt;/td&gt;
&lt;td&gt;6.45s&lt;/td&gt;
&lt;td&gt;3x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20 nodes&lt;/td&gt;
&lt;td&gt;3.27s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.8x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For text-search queries specifically, &lt;strong&gt;index sharding&lt;/strong&gt; distributes the inverted index analysis across the replica fleet — delivering a 5.8x speedup on full-text queries at scale. This is relevant for any high-volume environment as log volume grows: add nodes, queries get proportionally faster, no schema or application changes needed.&lt;/p&gt;


&lt;h2&gt;
  
  
  4. Platform Architecture
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Target Architecture
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    +------------------------------------------+
                    |         CRYPTO EXCHANGE APPS                |
                    |                                          |
                    | Trading Service  Wallet Service          |
                    | Risk Service     User Service            |
                    | API Gateway      Background Workers      |
                    +-------------------+----------------------+
                                        |
                           OTEL SDK (zero app code change)
                           Auto-instruments: HTTP, DB, Kafka, Redis
                                        |
                    +-------------------v----------------------+
                    |           OTEL COLLECTOR                 |
                    |                                          |
                    |  Receivers: OTLP gRPC/HTTP               |
                    |  Processors: batch, resource, filter     |
                    |  Exporters: clickhouseexporter           |
                    +-------------------+----------------------+
                                        |
                    +-------------------v----------------------+
                    |           CLICKHOUSE CLUSTER             |
                    |                                          |
                    |  database: otel                          |
                    |  ┌─────────────┬──────────┬──────────┐  |
                    |  │ otel_logs   │otel_traces│otel_metrics│ |
                    |  │ (Logs)      │(Traces)  │(Metrics) │  |
                    |  └─────────────┴──────────┴──────────┘  |
                    |                                          |
                    |  database: marts (business data)         |
                    |  ┌────────────────────────────────────┐  |
                    |  │ mart_trades  mart_users  mart_wallets│ |
                    |  └────────────────────────────────────┘  |
                    +---+------------------+-------------------+
                        |                  |
           +------------v---+     +--------v-----------+
           |    GRAFANA     |     |   LIBRECHAT + LLM  |
           |                |     |   (AI queries over |
           | Dashboards     |     |   logs in plain    |
           | Trace Viewer   |     |   English)         |
           | Log Explorer   |     |                    |
           | Alerts         |     |                    |
           +----------------+     +--------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  What Changes vs Current Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OTEL SDK (app)&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;td&gt;Same — zero changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL Collector&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;td&gt;Same — only exporter config changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;ClickHouse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log/Trace UI&lt;/td&gt;
&lt;td&gt;Kibana&lt;/td&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alerting&lt;/td&gt;
&lt;td&gt;Kibana Alerts&lt;/td&gt;
&lt;td&gt;Grafana Alerting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI queries&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;LibreChat + Qwen + MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-data correlation&lt;/td&gt;
&lt;td&gt;Not possible&lt;/td&gt;
&lt;td&gt;Native SQL joins&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h3&gt;
  
  
  OTEL Collector Deployment — Two Architectures
&lt;/h3&gt;

&lt;p&gt;There are two ways to deploy the OTEL Collector. Choosing the wrong one causes data loss during ClickHouse restarts or network blips.&lt;/p&gt;
&lt;h4&gt;
  
  
  Agent-Only (not recommended for production)
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App → OTEL Collector (on same host) → ClickHouse directly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each service runs its own collector. Simple to set up but fragile — if ClickHouse is briefly unreachable, the agent has no buffer and &lt;strong&gt;drops data&lt;/strong&gt;. Works for development or low-stakes services.&lt;/p&gt;
&lt;h4&gt;
  
  
  Aggregator (recommended for production)
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App → OTEL Agent (lightweight, on each host)
          │
          ▼
    OTEL Aggregator (central, 1–2 instances)
    - batches writes
    - retries on ClickHouse failure
    - filters/transforms before storage
          │
          ▼
    ClickHouse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The agent on each host is lightweight — just forwards to the aggregator. The aggregator does the heavy work: batching, retry on failure, PII filtering, routing. If ClickHouse goes down for 5 minutes, the aggregator queues data and flushes when it comes back. &lt;strong&gt;No data loss.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# OTEL Agent config (on each service host — minimal, just forwards)&lt;/span&gt;
&lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-aggregator.internal:4317&lt;/span&gt;   &lt;span class="c1"&gt;# sends to aggregator, not ClickHouse&lt;/span&gt;

&lt;span class="c1"&gt;# OTEL Aggregator config (central — does batching, retry, filtering)&lt;/span&gt;
&lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
    &lt;span class="na"&gt;send_batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50000&lt;/span&gt;
  &lt;span class="na"&gt;retry_on_failure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;initial_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
    &lt;span class="na"&gt;max_elapsed_time&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;300s&lt;/span&gt;

&lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp://clickhouse.internal:9000&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel&lt;/span&gt;
    &lt;span class="na"&gt;compress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lz4&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
    &lt;span class="na"&gt;retry_on_failure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;initial_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;max_elapsed_time&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;300s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For a crypto exchange:&lt;/strong&gt; Use the Aggregator pattern. Trading and wallet services cannot drop observability data — if an incident occurs during a ClickHouse maintenance window, you need the logs.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. OpenTelemetry — The Standard We Keep
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why OTEL is the Right Foundation
&lt;/h3&gt;

&lt;p&gt;OpenTelemetry is a CNCF standard for instrumentation — vendor-neutral, supported by every major cloud provider and observability tool. We keep OTEL as our instrumentation layer. Only the &lt;strong&gt;backend destination&lt;/strong&gt; changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application Code → OTEL SDK → OTEL Collector → [ANY BACKEND]
                                                   ↑
                              Swap this: Elasticsearch → ClickHouse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  OTEL Data Types We Capture
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;What It Is&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structured log events with severity, body, attributes&lt;/td&gt;
&lt;td&gt;"ERROR: failed to execute trade, user=john_doe, error=InsufficientBalance"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Traces&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributed request flow across services with timing&lt;/td&gt;
&lt;td&gt;Full journey of a trade order: API → trading-service → PostgreSQL → Kafka&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Numeric measurements over time&lt;/td&gt;
&lt;td&gt;HTTP request rate, error rate, latency histogram, DB connection pool size&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  OTEL Collector — Only the Exporter Changes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# otel-collector-config.yaml&lt;/span&gt;

&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;protocols&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;grpc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0:4317&lt;/span&gt;
      &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0:4318&lt;/span&gt;

&lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
    &lt;span class="na"&gt;send_batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10000&lt;/span&gt;
  &lt;span class="na"&gt;memory_limiter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;limit_mib&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;512&lt;/span&gt;
  &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;attributes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;environment&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
        &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;upsert&lt;/span&gt;

&lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# ─── BEFORE (Elasticsearch) ───────────────────────────────&lt;/span&gt;
  &lt;span class="c1"&gt;# elasticsearch:&lt;/span&gt;
  &lt;span class="c1"&gt;#   endpoints: [https://elastic-cluster:9200]&lt;/span&gt;
  &lt;span class="c1"&gt;#   logs_index: logs-%{+yyyy.MM.dd}&lt;/span&gt;
  &lt;span class="c1"&gt;#   traces_index: traces-%{+yyyy.MM.dd}&lt;/span&gt;

  &lt;span class="c1"&gt;# ─── AFTER (ClickHouse) ────────────────────────────────────&lt;/span&gt;
  &lt;span class="na"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp://clickhouse:9000&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel&lt;/span&gt;
    &lt;span class="na"&gt;logs_table_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;otel_logs&lt;/span&gt;
    &lt;span class="na"&gt;traces_table_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;otel_traces&lt;/span&gt;
    &lt;span class="na"&gt;metrics_table_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel_metrics&lt;/span&gt;
    &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="m"&gt;90&lt;/span&gt;               &lt;span class="c1"&gt;# days — auto-delete old data&lt;/span&gt;
    &lt;span class="na"&gt;compress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lz4&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;5s&lt;/span&gt;
    &lt;span class="na"&gt;retry_on_failure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;           &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;initial_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;max_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;      &lt;span class="s"&gt;30s&lt;/span&gt;
      &lt;span class="na"&gt;max_elapsed_time&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;300s&lt;/span&gt;

&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pipelines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;logs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;memory_limiter&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;traces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;memory_limiter&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;memory_limiter&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;That is the only config change needed.&lt;/strong&gt; Applications keep emitting OTEL. The collector keeps receiving. Only the destination changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application Instrumentation — Zero Code Changes
&lt;/h3&gt;

&lt;p&gt;For most languages, attach the OTEL agent at startup:&lt;/p&gt;

&lt;h4&gt;
  
  
  Java (Spring Boot / any JVM)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add to JVM startup — auto-instruments HTTP, JDBC, Kafka, Redis&lt;/span&gt;
java &lt;span class="nt"&gt;-javaagent&lt;/span&gt;:opentelemetry-javaagent.jar &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.service.name&lt;span class="o"&gt;=&lt;/span&gt;trading-service &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.exporter.otlp.endpoint&lt;span class="o"&gt;=&lt;/span&gt;http://otel-collector:4317 &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.traces.exporter&lt;span class="o"&gt;=&lt;/span&gt;otlp &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.metrics.exporter&lt;span class="o"&gt;=&lt;/span&gt;otlp &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.logs.exporter&lt;span class="o"&gt;=&lt;/span&gt;otlp &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-jar&lt;/span&gt; trading-service.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Python (FastAPI / Django)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Add 3 lines at startup — auto-instruments HTTP, PostgreSQL, Redis
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.instrumentation.fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPIInstrumentor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.instrumentation.psycopg2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Psycopg2Instrumentor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.instrumentation.requests&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RequestsInstrumentor&lt;/span&gt;

&lt;span class="n"&gt;FastAPIInstrumentor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;instrument_app&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nc"&gt;Psycopg2Instrumentor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;instrument&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nc"&gt;RequestsInstrumentor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;instrument&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Node.js
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// tracing.js — load before anything else via --require&lt;/span&gt;
&lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/auto-instrumentations-node/register&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Auto-instruments: express, pg, http, redis, mongoose, kafka&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5b. Collection Agent Options — OTEL vs Fluent Bit vs Vector
&lt;/h2&gt;

&lt;p&gt;OTEL Collector is not the only way to ship data to ClickHouse. Three agents are production-proven. The compression difference between them is significant enough to affect your storage cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression Comparison (real benchmarks — identical log dataset)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Compression Ratio&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fluent Bit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;33×&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mature&lt;/td&gt;
&lt;td&gt;HTTP JSONEachRow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;21×&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Beta / widely used&lt;/td&gt;
&lt;td&gt;HTTP JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OTEL Collector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14×&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Alpha for ClickHouse exporter&lt;/td&gt;
&lt;td&gt;Native ClickHouse TCP&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fluent Bit achieves 33× compression because it gives you full control over the schema — you define exactly which fields land in which typed columns. OTEL Collector uses a fixed schema (Map types for attributes) which is more flexible but less compressible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For a crypto exchange:&lt;/strong&gt; OTEL Collector is the right default because we already use OTEL instrumentation end-to-end and the fixed schema covers all signals. If storage cost becomes a concern at high volume, Fluent Bit is the migration path — it requires a custom schema but delivers the best compression.&lt;/p&gt;




&lt;h3&gt;
  
  
  When to Use Each
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Use When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OTEL Collector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You already emit OTEL signals (our case). Single config change, zero app changes. Traces + logs + metrics in one pipeline.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fluent Bit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;K8s environment, log-heavy workload, storage cost is a priority. Best compression. Mature and battle-tested. Does not handle traces.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You want a fully custom schema with maximum flexibility. Good middle ground — better compression than OTEL, handles more data types than Fluent Bit.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Fluent Bit → ClickHouse Config (for reference)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# fluent-bit.yaml — ships K8s pod logs directly to ClickHouse&lt;/span&gt;
&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;OUTPUT&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="s"&gt;Name          clickhouse&lt;/span&gt;
    &lt;span class="s"&gt;Match         *&lt;/span&gt;
    &lt;span class="s"&gt;Host          clickhouse.internal&lt;/span&gt;
    &lt;span class="s"&gt;Port          &lt;/span&gt;&lt;span class="m"&gt;8123&lt;/span&gt;
    &lt;span class="s"&gt;Database      otel&lt;/span&gt;
    &lt;span class="s"&gt;Table         fluent_logs&lt;/span&gt;
    &lt;span class="s"&gt;# Critical&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;async inserts prevent too-small-batch errors&lt;/span&gt;
    &lt;span class="s"&gt;async_insert  &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="s"&gt;flush         10s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Custom schema for Fluent Bit (33x compression)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fluent_logs&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;timestamp&lt;/span&gt;               &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;pod_name&lt;/span&gt;                &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;namespace&lt;/span&gt;               &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;container_name&lt;/span&gt;          &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;severity&lt;/span&gt;                &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;                 &lt;span class="n"&gt;String&lt;/span&gt;                    &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;kubernetes_labels&lt;/span&gt;       &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pod_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Fluent Bit requires &lt;code&gt;async_insert=1&lt;/code&gt; and flush intervals ≥10 seconds. Without this, each log line triggers a separate HTTP insert and ClickHouse performance degrades significantly from too many small writes.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  6. ClickHouse Schema — Logs, Traces, Metrics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Logs Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;Timestamp&lt;/span&gt;           &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;TraceId&lt;/span&gt;             &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;SpanId&lt;/span&gt;              &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;TraceFlags&lt;/span&gt;          &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SeverityText&lt;/span&gt;        &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;-- ERROR, WARN, INFO, DEBUG&lt;/span&gt;
    &lt;span class="n"&gt;SeverityNumber&lt;/span&gt;      &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;         &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;Body&lt;/span&gt;                &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ResourceSchemaUrl&lt;/span&gt;   &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ResourceAttributes&lt;/span&gt;  &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeSchemaUrl&lt;/span&gt;      &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeName&lt;/span&gt;           &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeVersion&lt;/span&gt;        &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeAttributes&lt;/span&gt;     &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;LogAttributes&lt;/span&gt;       &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;

    &lt;span class="c1"&gt;-- Skip indexes for fast filtering (replaces Elasticsearch inverted index for common patterns)&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_trace_id&lt;/span&gt;   &lt;span class="n"&gt;TraceId&lt;/span&gt;     &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="n"&gt;bloom_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_body&lt;/span&gt;       &lt;span class="n"&gt;Body&lt;/span&gt;        &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="n"&gt;tokenbf_v1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_service&lt;/span&gt;    &lt;span class="n"&gt;ServiceName&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_severity&lt;/span&gt;   &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toUnixTimestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;index_granularity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Traces Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_traces&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;Timestamp&lt;/span&gt;           &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;TraceId&lt;/span&gt;             &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;SpanId&lt;/span&gt;              &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ParentSpanId&lt;/span&gt;        &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;TraceState&lt;/span&gt;          &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;SpanName&lt;/span&gt;            &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;SpanKind&lt;/span&gt;            &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;-- SERVER, CLIENT, PRODUCER, CONSUMER&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;         &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;ResourceAttributes&lt;/span&gt;  &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;      &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;Duration&lt;/span&gt;            &lt;span class="n"&gt;Int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                     &lt;span class="c1"&gt;-- nanoseconds&lt;/span&gt;
    &lt;span class="n"&gt;StatusCode&lt;/span&gt;          &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;-- STATUS_CODE_OK, STATUS_CODE_ERROR&lt;/span&gt;
    &lt;span class="n"&gt;StatusMessage&lt;/span&gt;       &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Events&lt;/span&gt;              &lt;span class="n"&gt;Nested&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;Timestamp&lt;/span&gt;       &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;Name&lt;/span&gt;            &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;Attributes&lt;/span&gt;      &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;Links&lt;/span&gt;               &lt;span class="n"&gt;Nested&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;TraceId&lt;/span&gt;         &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;SpanId&lt;/span&gt;          &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;TraceState&lt;/span&gt;      &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Attributes&lt;/span&gt;      &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;

    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_trace_id&lt;/span&gt;   &lt;span class="n"&gt;TraceId&lt;/span&gt;     &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="n"&gt;bloom_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_span_name&lt;/span&gt;  &lt;span class="n"&gt;SpanName&lt;/span&gt;    &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_service&lt;/span&gt;    &lt;span class="n"&gt;ServiceName&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SpanName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toUnixTimestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Metrics Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_metrics&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ResourceAttributes&lt;/span&gt;  &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ResourceSchemaUrl&lt;/span&gt;   &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ScopeName&lt;/span&gt;           &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ScopeVersion&lt;/span&gt;        &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ScopeAttributes&lt;/span&gt;     &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeDroppedAttrCount&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ScopeSchemaUrl&lt;/span&gt;      &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MetricName&lt;/span&gt;          &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;MetricDescription&lt;/span&gt;   &lt;span class="n"&gt;String&lt;/span&gt;                 &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;MetricUnit&lt;/span&gt;          &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Attributes&lt;/span&gt;          &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;StartTimeUnix&lt;/span&gt;       &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;TimeUnix&lt;/span&gt;            &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;Value&lt;/span&gt;               &lt;span class="n"&gt;Float64&lt;/span&gt;                &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;Flags&lt;/span&gt;               &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Exemplars&lt;/span&gt;           &lt;span class="n"&gt;Nested&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;FilteredAttributes&lt;/span&gt; &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;TimeUnix&lt;/span&gt;           &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;Value&lt;/span&gt;              &lt;span class="n"&gt;Float64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;SpanId&lt;/span&gt;             &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;TraceId&lt;/span&gt;            &lt;span class="n"&gt;String&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;AggTemp&lt;/span&gt;             &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IsMonotonic&lt;/span&gt;         &lt;span class="n"&gt;UInt8&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TimeUnix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MetricName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toUnixTimestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TimeUnix&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TimeUnix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;index_granularity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why &lt;code&gt;LowCardinality&lt;/code&gt; and &lt;code&gt;CODEC&lt;/code&gt; Matter
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LowCardinality(String)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dictionary-encodes repeated values (ServiceName, SeverityText)&lt;/td&gt;
&lt;td&gt;3-5x compression + faster filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;CODEC(Delta, ZSTD(1))&lt;/code&gt; on Timestamp&lt;/td&gt;
&lt;td&gt;Delta encodes sequential timestamps, then ZSTD compresses&lt;/td&gt;
&lt;td&gt;5-10x compression on time columns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;CODEC(ZSTD(1))&lt;/code&gt; on String cols&lt;/td&gt;
&lt;td&gt;General purpose compression&lt;/td&gt;
&lt;td&gt;3-7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;bloom_filter&lt;/code&gt; index on TraceId&lt;/td&gt;
&lt;td&gt;Skips data blocks that can't contain the TraceId&lt;/td&gt;
&lt;td&gt;Near O(1) lookup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;tokenbf_v1&lt;/code&gt; index on Body&lt;/td&gt;
&lt;td&gt;Token-based bloom filter for keyword search&lt;/td&gt;
&lt;td&gt;Skips irrelevant blocks without full scan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PARTITION BY toDate(Timestamp)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;One partition per day — old days auto-deleted by TTL&lt;/td&gt;
&lt;td&gt;Instant TTL, no maintenance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ORDER BY (ServiceName, SeverityText, Timestamp)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Most queries filter by service + severity + time&lt;/td&gt;
&lt;td&gt;Queries read minimal data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  7. End-to-End Distributed Tracing — HTTP to ClickHouse
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Full Trace Flow
&lt;/h3&gt;

&lt;p&gt;Every incoming HTTP request gets a &lt;code&gt;TraceId&lt;/code&gt;. That same ID propagates through every service and DB call. ClickHouse stores all spans. Grafana shows the waterfall.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser → [traceparent header injected]
              │
              ▼
[1] Nginx               → Span: http.server (method, url, status, duration)
              │           propagates traceparent to upstream
              ▼
[2] API Gateway         → Span: route handling
              │
              ▼
[3] Trading Service     → Span: business logic
              │
        ┌─────┴──────┐
        ▼             ▼
[4] PostgreSQL      [5] ClickHouse    → Span: db.query (SQL text, duration, rows)
(OTEL JDBC auto)    (native OTEL)
        │
        ▼
[6] Kafka Producer      → Span: messaging.publish (topic, partition)

All spans → OTEL Collector → ClickHouse otel_traces table
                                         │
                                         ▼
                                   Grafana Trace UI
                                   (full waterfall)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 1 — Nginx (Trace Entry Point)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="c1"&gt;# nginx.conf&lt;/span&gt;
&lt;span class="k"&gt;load_module&lt;/span&gt; &lt;span class="nc"&gt;modules/ngx&lt;/span&gt;&lt;span class="s"&gt;_otel_module.so&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;http&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;otel_exporter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;endpoint&lt;/span&gt; &lt;span class="nf"&gt;otel-collector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4317&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="kn"&gt;otel_service_name&lt;/span&gt;  &lt;span class="s"&gt;"api-gateway"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;otel_trace&lt;/span&gt;         &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;otel_trace_context&lt;/span&gt; &lt;span class="s"&gt;propagate&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;# injects traceparent into upstream request&lt;/span&gt;

    &lt;span class="kn"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/api/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kn"&gt;otel_trace&lt;/span&gt;      &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;otel_span_name&lt;/span&gt;  &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$request_method&lt;/span&gt; &lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;otel_span_attr&lt;/span&gt;  &lt;span class="s"&gt;http.method&lt;/span&gt;  &lt;span class="nv"&gt;$request_method&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;otel_span_attr&lt;/span&gt;  &lt;span class="s"&gt;http.url&lt;/span&gt;     &lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;otel_span_attr&lt;/span&gt;  &lt;span class="s"&gt;http.status&lt;/span&gt;  &lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;proxy_pass&lt;/span&gt;      &lt;span class="s"&gt;http://backend&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 2 — Application Services (Auto-Instrumented)
&lt;/h3&gt;

&lt;p&gt;OTEL Java agent auto-instruments every JDBC query. For ClickHouse queries, pass the context explicitly:&lt;/p&gt;

&lt;h4&gt;
  
  
  Python — Tracing ClickHouse Queries
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.propagate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;inject&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;clickhouse_connect&lt;/span&gt;

&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trading-service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user_risk_profile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clickhouse.mart_user_risk_profile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clickhouse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;marts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.operation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.statement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM mart_user_risk_profile WHERE username = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Inject current trace context into ClickHouse HTTP headers
&lt;/span&gt;        &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="nf"&gt;inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clickhouse_connect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clickhouse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;http_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;    &lt;span class="c1"&gt;# ClickHouse reads traceparent from here
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM mart_user_risk_profile WHERE username = {username:String}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.rows_returned&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result_rows&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result_rows&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Go — Tracing ClickHouse Queries
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/ClickHouse/clickhouse-go/v2"&lt;/span&gt;
    &lt;span class="s"&gt;"go.opentelemetry.io/otel"&lt;/span&gt;
    &lt;span class="s"&gt;"go.opentelemetry.io/otel/attribute"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"trading-service"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;getTopTraders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;Trader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"clickhouse.mart_trades_futures"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;End&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetAttributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"db.system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="s"&gt;"clickhouse"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"db.operation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"SELECT"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"db.statement"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"SELECT username, SUM(quantity*price) FROM mart_trades_futures ..."&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// clickhouse-go/v2 propagates ctx trace headers automatically&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;clickhouse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;clickhouse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Addr&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"clickhouse:9000"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;`
        SELECT username, SUM(quantity * price) as volume
        FROM mart_trades_futures
        WHERE transaction_time &amp;gt;= today()
        GROUP BY username
        ORDER BY volume DESC
        LIMIT 10
    `&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;parseRows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 3 — ClickHouse Native Tracing
&lt;/h3&gt;

&lt;p&gt;ClickHouse reads the &lt;code&gt;traceparent&lt;/code&gt; header from every query and emits its own internal spans to &lt;code&gt;system.opentelemetry_span_log&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ClickHouse internal spans for a specific trace&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;parent_span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;operation_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finish_time_us&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time_us&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'clickhouse.query'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sql_query&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opentelemetry_span_log&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;trace_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'4bf92f3577b34da6a3ce929d0e0e4736'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;start_time_us&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Export these to &lt;code&gt;otel_traces&lt;/code&gt; via a materialized view so they appear in Grafana alongside app spans:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Auto-export ClickHouse internal spans to otel_traces&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clickhouse_spans_mv&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_traces&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;fromUnixTimestamp64Micro&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_time_us&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;TraceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;SpanId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_span_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ParentSpanId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;''&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;TraceState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;operation_name&lt;/span&gt;                           &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;SpanName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'SPAN_KIND_SERVER'&lt;/span&gt;                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;SpanKind&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'clickhouse'&lt;/span&gt;                             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                                    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ResourceAttributes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;attribute&lt;/span&gt;                                &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finish_time_us&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time_us&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;-- convert to nanoseconds&lt;/span&gt;
    &lt;span class="s1"&gt;'STATUS_CODE_OK'&lt;/span&gt;                         &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;''&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;StatusMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Events.Timestamp`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Events.Name`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Events.Attributes`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Links.TraceId`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Links.SpanId`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Links.TraceState`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Links.Attributes`&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opentelemetry_span_log&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What a Full Trace Looks Like in ClickHouse
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Full waterfall for a single trade request&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;SpanName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Duration&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;                          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'http.method'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;           &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;http_method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'http.route'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;            &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.statement'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.rows_affected'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;Timestamp&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_traces&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;TraceId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'4bf92f3577b34da6a3ce929d0e0e4736'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result — full waterfall in one query:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SpanName                          Service              duration_ms
──────────────────────────────    ──────────────────── ───────────
http.server POST /api/trade       api-gateway   245.3 ms
  └─ trade.execute                trading-service      241.1 ms
       ├─ db.query (risk check)   trading-service       18.4 ms  ← SELECT mart_user_risk_profile
       ├─ db.query (insert trade) trading-service        4.2 ms  ← INSERT INTO mart_trades_futures
       ├─ SELECT (internal CH)    clickhouse             3.1 ms  ← ClickHouse internal span
       └─ kafka.produce           trading-service        2.1 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  8. Full-Text Log Search — Replacing Kibana Discover
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Updated Position — ClickHouse Now Competitive on Full-Text Search
&lt;/h3&gt;

&lt;p&gt;Full-text search used to be the primary reason to keep Elasticsearch. ClickHouse has significantly closed this gap with a &lt;strong&gt;new &lt;code&gt;text&lt;/code&gt; index type&lt;/strong&gt; that works natively on object storage (S3) with the same performance as local disk — removing the last major technical advantage Elasticsearch held.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reference: &lt;a href="https://clickhouse.com/blog/clickhouse-full-text-search-object-storage" rel="noopener noreferrer"&gt;ClickHouse Full-Text Search on Object Storage&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  New &lt;code&gt;text&lt;/code&gt; Index — How to Add It to otel_logs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Add the new text index to the Body column&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
    &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;body_text_idx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'splitByNonAlpha'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;preprocessor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Materialize the index on existing data&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;body_text_idx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What the &lt;code&gt;text&lt;/code&gt; index supports:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hasToken&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hasToken(Body, 'OutOfMemoryError')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exact token match — fastest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hasAllTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hasAllTokens(Body, ['trade', 'failed'])&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;All tokens must appear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hasAnyTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hasAnyTokens(Body, ['ERROR', 'FATAL'])&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Any token match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LIKE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Body LIKE '%OutOfMemoryError%'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wildcard match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;startsWith&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;startsWith(Body, 'WARN')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prefix match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;match&lt;/code&gt; (regex)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;match(Body, 'user_[0-9]+')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Regex search&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Performance:&lt;/strong&gt; 7.4x speedup vs full table scan on text search (ClickHouse benchmark, 10M rows with array tags).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works on S3:&lt;/strong&gt; The index uses sequential dictionary reads with front-coding compression — no random I/O, which is the key constraint on object storage. 94.5% of tokens appear in ≤6 rows, so embedded posting lists handle the vast majority of lookups without reading large posting lists.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Real-world proof point — gitTrends:&lt;/strong&gt; ClickHouse's own reference demo (&lt;a href="https://github.com/ClickHouse/gitTrends" rel="noopener noreferrer"&gt;github.com/ClickHouse/gitTrends&lt;/a&gt;) searches &lt;strong&gt;10 billion+ GitHub events&lt;/strong&gt; using &lt;code&gt;hasToken()&lt;/code&gt; on a &lt;code&gt;body&lt;/code&gt; text index — the exact same pattern used for log search here. The app lets users compare FTS index vs bloom-filter skip index vs full table scan in real time, with live row-scan counters streamed from ClickHouse. Sub-second queries at 10B rows validate the production viability of &lt;code&gt;hasToken()&lt;/code&gt; for high-volume text search workloads.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  How ClickHouse Handles Log Search
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Keyword Search — New &lt;code&gt;text&lt;/code&gt; index (preferred)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Uses new text index — 7.4x faster than full scan&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;hasToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'OutOfMemoryError'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- LIKE also uses the text index automatically&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%OutOfMemoryError%'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Exact TraceId Lookup (bloom filter)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Near O(1) — bloom_filter index makes this very fast&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;TraceId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'abc123def456'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Structured Attribute Search
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Filter on OTEL log attributes — common in structured logging&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'http.status_code'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'500'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Multi-Condition Log Search (most common Kibana query)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- "All ERROR logs from trading-service in last 2 hours that mention user_id"&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'error_code'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;error_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TraceId&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'trading-service'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;    &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;hasToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;-- uses text index&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Case-Insensitive Search (preprocessor)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- The text index preprocessor lowercases at index time — search is case-insensitive&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;hasToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'outofmemoryerror'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;-- matches OOM, oom, Oom, etc.&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Full-Text Search Comparison — Updated
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query Type&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Exact token match (&lt;code&gt;hasToken&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;~10ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~15-30ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Near-comparable with text index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Keyword search (&lt;code&gt;LIKE&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;~10ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~20-50ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;text index — 7.4x faster than scan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured attribute filter&lt;/td&gt;
&lt;td&gt;~50ms&lt;/td&gt;
&lt;td&gt;~5-20ms&lt;/td&gt;
&lt;td&gt;CH wins (columnar)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggregation (error count by service)&lt;/td&gt;
&lt;td&gt;~200ms-2s&lt;/td&gt;
&lt;td&gt;~10-50ms&lt;/td&gt;
&lt;td&gt;CH wins significantly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time-range + service filter&lt;/td&gt;
&lt;td&gt;~100ms&lt;/td&gt;
&lt;td&gt;~10-30ms&lt;/td&gt;
&lt;td&gt;CH wins (partition pruning)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TraceId lookup&lt;/td&gt;
&lt;td&gt;~10ms&lt;/td&gt;
&lt;td&gt;~20-50ms&lt;/td&gt;
&lt;td&gt;Comparable (bloom filter)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free-text fuzzy search&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Good (regex via &lt;code&gt;match()&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;ES still leads for fuzzy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search on S3/object storage&lt;/td&gt;
&lt;td&gt;Degraded (SSD required)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Full speed on S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CH advantage — no SSD needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; ~95% of observability queries are structured (service + time + severity + attribute) — ClickHouse wins on all of those. For the remaining ~5% requiring keyword search in log bodies, the new &lt;code&gt;text&lt;/code&gt; index brings ClickHouse to near-Elasticsearch performance. The only remaining ES advantage is fuzzy/phrase search, which is rarely needed for structured application logs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coming soon in ClickHouse:&lt;/strong&gt; phrase search (position-aware token matching) and JSON column indexing — which will close the remaining gap further.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Visualization — Grafana Replaces Kibana
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install ClickHouse datasource plugin&lt;/span&gt;
grafana-cli plugins &lt;span class="nb"&gt;install &lt;/span&gt;grafana-clickhouse-datasource

&lt;span class="c"&gt;# Or in docker-compose:&lt;/span&gt;
environment:
  - &lt;span class="nv"&gt;GF_INSTALL_PLUGINS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;grafana-clickhouse-datasource
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Datasource Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# grafana/provisioning/datasources/clickhouse.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;datasources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClickHouse-OTEL&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana-clickhouse-datasource&lt;/span&gt;
    &lt;span class="na"&gt;uid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse-otel&lt;/span&gt;
    &lt;span class="na"&gt;jsonData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9000&lt;/span&gt;
      &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel&lt;/span&gt;
      &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana_readonly&lt;/span&gt;
    &lt;span class="na"&gt;secureJsonData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${GRAFANA_CH_PASSWORD}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Dashboards to Build
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Service Health Overview
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Error rate per service — last 1 hour, 1-minute buckets&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;toStartOfMinute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;countIf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;error_rate_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. Latency Distribution (P50 / P95 / P99)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- HTTP endpoint latency percentiles — last 30 minutes&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'http.route'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p50_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p95_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p99_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;request_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_traces&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SpanKind&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'SPAN_KIND_SERVER'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;  &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;MINUTE&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;p99_ms&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Slowest Database Queries
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Slowest ClickHouse queries in last 1 hour&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.statement'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;                 &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;                 &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p99_ms&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_traces&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.system'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'clickhouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'postgresql'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;p99_ms&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  4. Log Explorer (Kibana Discover equivalent)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Live log tail with filtering — wire to Grafana Logs panel&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TraceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LogAttributes&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;__timeFrom&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;    &lt;span class="c1"&gt;-- Grafana time variable&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;__timeTo&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'$service'&lt;/span&gt;      &lt;span class="c1"&gt;-- Grafana template variable&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  5. Distributed Trace Waterfall
&lt;/h4&gt;

&lt;p&gt;Configure Grafana Explore → Traces with ClickHouse datasource:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config Field&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Table&lt;/td&gt;
&lt;td&gt;otel.otel_traces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TraceID column&lt;/td&gt;
&lt;td&gt;TraceId&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SpanID column&lt;/td&gt;
&lt;td&gt;SpanId&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parent SpanID&lt;/td&gt;
&lt;td&gt;ParentSpanId&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Start time&lt;/td&gt;
&lt;td&gt;Timestamp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duration&lt;/td&gt;
&lt;td&gt;Duration (nanoseconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service name&lt;/td&gt;
&lt;td&gt;ServiceName&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operation name&lt;/td&gt;
&lt;td&gt;SpanName&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Click any log line with a TraceId → jump directly to full trace waterfall.&lt;/p&gt;

&lt;h3&gt;
  
  
  Correlating Logs + Traces + Business Data in Grafana
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Grafana panel: "Affected users for errors in selected time range"&lt;/span&gt;
&lt;span class="c1"&gt;-- This is impossible in Kibana — requires joining log data with business data&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_error&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'$service'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;    &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;__timeFrom&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;__timeTo&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Grafana Plugin — Upcoming Features
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;These are confirmed items being prototyped by the ClickHouse Grafana plugin team — not yet shipped. Reference: &lt;a href="https://clickhouse.com/blog/grafana-plugin-vision" rel="noopener noreferrer"&gt;Our Vision for the ClickHouse Grafana Plugin&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;1. Deployment &amp;amp; K8s Annotation Presets&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Grafana annotations are vertical markers on time-series panels that flag notable events. The plugin will generate these automatically from OTel data — no manual query writing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Preset&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it surfaces&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deployment detection&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ResourceAttributes['service.version']&lt;/code&gt; change&lt;/td&gt;
&lt;td&gt;Service version change / rollback markers on dashboards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;K8s lifecycle events&lt;/td&gt;
&lt;td&gt;OTel resource attributes&lt;/td&gt;
&lt;td&gt;Pod restarts, OOM kills, autoscaling events&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a crypto exchange: deploy a new trading-service version → the timestamp appears as a marker on all latency/error dashboards automatically. Immediately answers "did this error spike start at deployment?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. JWT Per-User Query Identity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Currently, all Grafana users share a single ClickHouse datasource credential. The roadmap item forwards each Grafana user's JWT identity to ClickHouse, enabling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Row-level access control&lt;/strong&gt; based on actual user identity — compliance team sees only compliance-relevant tables, risk team sees risk tables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-user audit trail&lt;/strong&gt; in ClickHouse query log — every dashboard query is attributed to a named person, not a shared service account&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-user cost tracking&lt;/strong&gt; — token/compute cost per team member&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This closes the gap between Kibana's per-user access model and the current Grafana/ClickHouse setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Visual Metrics Builder (OTel Map Column Support)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OTel metrics (CPU, memory, network I/O) currently require writing SQL aggregation queries by hand. The upcoming metrics builder provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select metric name from a dropdown (populated from the &lt;code&gt;otel_metrics&lt;/code&gt; table)&lt;/li&gt;
&lt;li&gt;Choose aggregation (sum, avg, max, p99)&lt;/li&gt;
&lt;li&gt;Add group-by dimensions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key improvement: OTel uses Map-type columns (&lt;code&gt;ResourceAttributes&lt;/code&gt;, &lt;code&gt;LogAttributes&lt;/code&gt;) containing key-value pairs. The builder will expose a &lt;strong&gt;key picker&lt;/strong&gt; so users can filter on &lt;code&gt;ResourceAttributes['k8s.namespace.name']&lt;/code&gt; or &lt;code&gt;ResourceAttributes['host.name']&lt;/code&gt; without writing bracket notation. Makes infrastructure metrics explorable without SQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Out-of-the-Box OTel + K8s Dashboards&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The plugin will ship importable JSON dashboards covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log volume by severity with per-service breakdown&lt;/li&gt;
&lt;li&gt;Trace duration distribution and service dependency map&lt;/li&gt;
&lt;li&gt;Per-service RED metrics (request rate, error rate, duration)&lt;/li&gt;
&lt;li&gt;Top spans visibility&lt;/li&gt;
&lt;li&gt;Kubernetes observability (namespaces, pods, nodes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Goal: &lt;strong&gt;from data ingestion to usable dashboards in minutes&lt;/strong&gt;, not hours of manual panel building.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Compact Search-First Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A new query mode resembling the ClickStack UI — a search bar with filter pills, no SQL required for common tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click &lt;code&gt;+&lt;/code&gt; / &lt;code&gt;-&lt;/code&gt; on any field value in a log detail panel to instantly add include/exclude filters&lt;/li&gt;
&lt;li&gt;Select text within a log body to add a "line contains" full-text filter (backed by &lt;code&gt;hasToken()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Facet autocomplete for column names, operators, and values&lt;/li&gt;
&lt;li&gt;SQL preview pane shows the generated query live — "Edit as SQL" button opens the editor pre-populated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aimed at operators and on-call engineers who need to investigate quickly without knowing ClickHouse SQL.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. Data Retention &amp;amp; Tiered Storage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Cost Driver — Retention × Volume
&lt;/h3&gt;

&lt;p&gt;Elasticsearch cost is roughly: &lt;code&gt;daily_volume_GB × retention_days × cost_per_GB_per_day&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;ClickHouse changes the equation at two levels:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Compression:&lt;/strong&gt; 1TB of raw logs → ~200GB in ClickHouse (&lt;strong&gt;5x smaller&lt;/strong&gt; end-to-end; 16x on column files — ClickHouse/TextBench)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tiered storage:&lt;/strong&gt; Hot recent data on SSD, cold older data on cheap S3&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  TTL — Simple One-Line Retention
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Delete logs older than 90 days — runs automatically in background&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
    &lt;span class="k"&gt;MODIFY&lt;/span&gt; &lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Different retention per severity&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
    &lt;span class="k"&gt;MODIFY&lt;/span&gt; &lt;span class="n"&gt;TTL&lt;/span&gt;
        &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;   &lt;span class="c1"&gt;-- default: 7 days&lt;/span&gt;
        &lt;span class="n"&gt;OVERRIDE&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- errors: 90 days&lt;/span&gt;
        &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;365&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'CRITICAL'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tiered Storage — Hot/Cold/Archive
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Storage policy: SSD (0-7 days) → S3 (7-90 days) → delete (90+ days)&lt;/span&gt;
&lt;span class="c1"&gt;-- Define in storage_configuration in config.xml:&lt;/span&gt;

&lt;span class="c1"&gt;-- hot:  /var/lib/clickhouse/data  (local SSD, fast)&lt;/span&gt;
&lt;span class="c1"&gt;-- cold: s3://clickhouse-cold-tier/ (S3, cheap)&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
    &lt;span class="k"&gt;MODIFY&lt;/span&gt; &lt;span class="n"&gt;TTL&lt;/span&gt;
        &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;VOLUME&lt;/span&gt; &lt;span class="s1"&gt;'cold'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- move to S3&lt;/span&gt;
        &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;-- delete&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Storage Comparison
&lt;/h3&gt;

&lt;p&gt;Real benchmark numbers from ClickHouse/TextBench (identical hardware: AWS m6i.8xlarge, OTEL log data):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1B rows&lt;/td&gt;
&lt;td&gt;245 GB&lt;/td&gt;
&lt;td&gt;49 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5x smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10B rows&lt;/td&gt;
&lt;td&gt;~1.2 TB&lt;/td&gt;
&lt;td&gt;~245 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5x smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50B rows&lt;/td&gt;
&lt;td&gt;12 TB&lt;/td&gt;
&lt;td&gt;2.4 TB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5x smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Column file compression&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Elasticsearch Uses 5x More Storage — Component Breakdown
&lt;/h3&gt;

&lt;p&gt;The 5x gap is not just compression. It comes from four on-disk structures that Elasticsearch maintains but ClickHouse either handles more efficiently or doesn't need at all (at 50B rows):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage Component&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;th&gt;Why the difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Columnar storage (doc_values / column files)&lt;/td&gt;
&lt;td&gt;5.02 TiB&lt;/td&gt;
&lt;td&gt;1.92 TiB&lt;/td&gt;
&lt;td&gt;CH chains codecs per column (Delta+ZSTD, GCD) vs ES generic compression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inverted index&lt;/td&gt;
&lt;td&gt;3.37 TiB&lt;/td&gt;
&lt;td&gt;515 GiB&lt;/td&gt;
&lt;td&gt;CH inverted index is designed for analytics granules, not Lucene doc IDs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stored fields (&lt;code&gt;_source&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.00 TiB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ES stores original JSON to reconstruct documents; CH reconstructs from columns directly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Points + norms (BKD trees, relevance scoring)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;617 GiB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ES maintains numeric range indexes and per-doc relevance weights; CH uses sparse primary index (320 MiB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12.01 TiB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.43 TiB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5x smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The two biggest drivers of Elasticsearch's overhead:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;_source&lt;/code&gt; (3 TiB) — a near-complete second copy of all log data stored as compressed JSON so Elasticsearch can reconstruct the original document on retrieval. ClickHouse has no equivalent because it reconstructs rows directly from individual column files.&lt;/li&gt;
&lt;li&gt;Points + norms (617 GiB) — BKD-tree numeric range indexes and per-document relevance scoring metadata. Useful for web search ranking; irrelevant for observability queries. ClickHouse's entire sparse primary index for 50B rows is 320 MiB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ingestion speed (50B rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Under 4 hours&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~5 days&lt;/strong&gt; (after pipeline tuning)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Estimated storage at this scale:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;90-day retention&lt;/td&gt;
&lt;td&gt;~9TB on SSD&lt;/td&gt;
&lt;td&gt;~1.8TB on SSD (or much less on S3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year retention&lt;/td&gt;
&lt;td&gt;~36TB&lt;/td&gt;
&lt;td&gt;~7TB on S3 (cheap)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage cost/year&lt;/td&gt;
&lt;td&gt;~$100K+&lt;/td&gt;
&lt;td&gt;~$2-5K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  11. AI Layer — Natural Language Over Logs and Traces
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Unique Advantage
&lt;/h3&gt;

&lt;p&gt;Since ClickHouse already hosts the Agentic AI Platform (LibreChat + Qwen + MCP), observability data in the same cluster is &lt;strong&gt;automatically queryable via plain English&lt;/strong&gt;. No additional setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example AI Queries Over Logs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Which services had the most errors in the last hour?"&lt;/span&gt;
&lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'ERROR'&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;

&lt;span class="k"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Show me all traces where ClickHouse queries took more than 500ms today"&lt;/span&gt;
&lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;TraceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.statement'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel_traces&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.system'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'clickhouse'&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500000000&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;

&lt;span class="k"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Which users were affected by the trading-service errors between 2pm and 3pm today?"&lt;/span&gt;
&lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;mart_users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'trading-service'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'ERROR'&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;toIntervalHour&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;toIntervalHour&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last query is &lt;strong&gt;impossible in Kibana&lt;/strong&gt; — it crosses observability data (logs) with business data (users). In ClickHouse, it's one SQL query the AI generates automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Business Glossary Additions for Observability
&lt;/h3&gt;

&lt;p&gt;Add to &lt;code&gt;business_glossary.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Observability terms&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;logs"&lt;/span&gt; &lt;span class="s"&gt;= otel_logs WHERE SeverityText = 'ERROR'&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slow&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query"&lt;/span&gt; &lt;span class="s"&gt;= otel_traces WHERE SpanAttributes['db.system'] IN ('clickhouse','postgresql') AND Duration &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;500000000&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTTP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5xx"&lt;/span&gt; &lt;span class="s"&gt;= otel_logs WHERE LogAttributes['http.status_code'] LIKE '5%'&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trade&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;service"&lt;/span&gt; &lt;span class="s"&gt;= ServiceName = 'trading-service'&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace"&lt;/span&gt; &lt;span class="s"&gt;= otel_traces WHERE TraceId = '&amp;lt;id&amp;gt;'&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency"&lt;/span&gt; &lt;span class="s"&gt;= Duration / 1e6 (milliseconds)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;p99"&lt;/span&gt; &lt;span class="s"&gt;= quantile(0.99)(Duration) / 1e6&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rate"&lt;/span&gt; &lt;span class="s"&gt;= countIf(SeverityText='ERROR') / count() * &lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  12. Migration Plan — Zero Downtime Cutover
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option A — Standard OTEL Collector (Recommended for Simple Setups)
&lt;/h3&gt;

&lt;p&gt;If you run a small number of centralized OTEL Collectors (one per environment), the standard approach is sufficient — edit the collector config YAML to add the ClickHouse exporter alongside the existing Elasticsearch exporter for dual-write. No extra tooling needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B — BindPlane (Recommended for Large Collector Fleets)
&lt;/h3&gt;

&lt;p&gt;If you run OTEL Collectors on many individual servers/services, &lt;strong&gt;BindPlane&lt;/strong&gt; is worth considering. It is a centralized management platform for OTEL Collector fleets — instead of editing YAML configs on each server manually, you manage all collector configurations from one dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Without BindPlane&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s"&gt;Edit otel-collector.yaml on server &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="s"&gt;Edit otel-collector.yaml on server &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="s"&gt;Edit otel-collector.yaml on server N&lt;/span&gt;
  &lt;span class="s"&gt;Restart each collector&lt;/span&gt;
  &lt;span class="s"&gt;...&lt;/span&gt;

&lt;span class="na"&gt;With BindPlane&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s"&gt;Add ClickHouse destination once in BindPlane UI&lt;/span&gt;
  &lt;span class="s"&gt;Roll out to entire fleet in one click&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What BindPlane adds for this migration:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Central config management&lt;/td&gt;
&lt;td&gt;One change pushes to all collectors instantly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dual-write in one click&lt;/td&gt;
&lt;td&gt;Route to Elasticsearch AND ClickHouse simultaneously without touching individual collectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service-by-service cutover&lt;/td&gt;
&lt;td&gt;Route &lt;code&gt;trading-service&lt;/code&gt; logs to ClickHouse first, validate, then add more services gradually&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Severity-based routing&lt;/td&gt;
&lt;td&gt;Route ERROR logs to Elasticsearch (keep during validation), INFO/DEBUG to ClickHouse only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safe fleet rollout&lt;/td&gt;
&lt;td&gt;Progressive rollout with automatic rollback on failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;130+ sources and destinations&lt;/td&gt;
&lt;td&gt;Supports standard OTEL ClickHouse exporter as destination&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;BindPlane for self-managed ClickHouse:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;BindPlane's native "ClickStack destination" connects to ClickHouse Cloud managed product. For self-managed ClickHouse, use the standard OTEL &lt;code&gt;clickhouseexporter&lt;/code&gt; as a generic OTLP destination in BindPlane — same outcome, slightly more manual config. Example BindPlane destination config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# BindPlane destination config for self-managed ClickHouse&lt;/span&gt;
&lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otlp&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse-self-managed&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp://clickhouse.internal:9000&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;x-clickhouse-database&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel&lt;/span&gt;
    &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;insecure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reference:&lt;/strong&gt; &lt;a href="https://clickhouse.com/blog/bindplane-faster-otel-migrations-to-clickstack" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/bindplane-faster-otel-migrations-to-clickstack&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 1 — Deploy &amp;amp; Validate (Week 1-2)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deploy ClickHouse schema&lt;/td&gt;
&lt;td&gt;Create otel_logs, otel_traces, otel_metrics tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add ClickHouse exporter to OTEL Collector&lt;/td&gt;
&lt;td&gt;Dual-write: send to both Elasticsearch AND ClickHouse (via YAML edit or BindPlane)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set up Grafana&lt;/td&gt;
&lt;td&gt;Install ClickHouse datasource, build core dashboards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validate data parity&lt;/td&gt;
&lt;td&gt;Compare row counts, spot-check log content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test trace waterfall&lt;/td&gt;
&lt;td&gt;Pick 5-10 real TraceIds, verify waterfall in Grafana matches Kibana&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 2 — Parallel Run (Week 3-4)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Run both stacks simultaneously&lt;/td&gt;
&lt;td&gt;Elasticsearch + ClickHouse receive same data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Migrate dashboards&lt;/td&gt;
&lt;td&gt;Rebuild all Kibana dashboards in Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Migrate alerts&lt;/td&gt;
&lt;td&gt;Recreate all Kibana alerts in Grafana Alerting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Train teams&lt;/td&gt;
&lt;td&gt;Grafana walkthrough for each team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build AI log queries&lt;/td&gt;
&lt;td&gt;Add observability terms to business glossary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradual service cutover (optional)&lt;/td&gt;
&lt;td&gt;Use BindPlane to route one service at a time to ClickHouse-only, validate, expand&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 3 — Cutover (Week 5)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Confirm all dashboards working in Grafana&lt;/td&gt;
&lt;td&gt;Sign-off from each team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove Elasticsearch exporter from OTEL Collector&lt;/td&gt;
&lt;td&gt;Single line config change (or one click in BindPlane)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verify ClickHouse-only flow&lt;/td&gt;
&lt;td&gt;24-hour monitoring window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cancel Elasticsearch subscription&lt;/td&gt;
&lt;td&gt;After 48-hour clean run&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 4 — Optimise (Week 6+)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tune TTL and tiered storage&lt;/td&gt;
&lt;td&gt;Configure S3 cold tier based on actual usage patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enable AI log queries in LibreChat&lt;/td&gt;
&lt;td&gt;Add observability glossary, test with teams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set up cross-data dashboards&lt;/td&gt;
&lt;td&gt;Logs + business data correlation panels in Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance tuning&lt;/td&gt;
&lt;td&gt;Review slow queries via system.query_log, tune ORDER BY keys if needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Risk Mitigation During Migration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dual-write period:&lt;/strong&gt; Both systems receive data simultaneously for 4 weeks — no data loss risk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rollback:&lt;/strong&gt; Removing ClickHouse exporter (or reverting BindPlane config) restores Elasticsearch-only in seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No application changes:&lt;/strong&gt; OTEL SDK configuration is unchanged throughout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradual cutover:&lt;/strong&gt; Cut over one service at a time using BindPlane routing rules if preferred&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  13. Cost Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Current Elasticsearch Cost Breakdown
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Annual Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch managed service (compute)&lt;/td&gt;
&lt;td&gt;~$200K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage (SSD, replicated)&lt;/td&gt;
&lt;td&gt;~$100K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Licensing (Elastic managed / premium)&lt;/td&gt;
&lt;td&gt;~$50K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering time (ops, tuning, ILM)&lt;/td&gt;
&lt;td&gt;~$50K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$400K&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ClickHouse Cost (Self-Hosted)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Annual Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse nodes (2x m7g.4xlarge, Graviton3)&lt;/td&gt;
&lt;td&gt;~$18K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage SSD (hot, last 7 days)&lt;/td&gt;
&lt;td&gt;~$3K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 (cold, 7-90 days)&lt;/td&gt;
&lt;td&gt;~$2K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Keeper (3x t3.small for consensus)&lt;/td&gt;
&lt;td&gt;~$2K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering time (minimal ops)&lt;/td&gt;
&lt;td&gt;~$10K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$35K&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ClickHouse Cost (ClickHouse Cloud — Recommended Start)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Annual Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Cloud (auto-scaling)&lt;/td&gt;
&lt;td&gt;~$36-60K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 tiered storage&lt;/td&gt;
&lt;td&gt;Included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering time&lt;/td&gt;
&lt;td&gt;~$5K (fully managed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$41-65K&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Savings Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Annual Cost&lt;/th&gt;
&lt;th&gt;Saving vs Elasticsearch&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Current (Elasticsearch)&lt;/td&gt;
&lt;td&gt;$400K&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Cloud&lt;/td&gt;
&lt;td&gt;$41-65K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$335-359K saved (84-90%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Self-Hosted&lt;/td&gt;
&lt;td&gt;$35K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$365K saved (91%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Additional Value (Not Counted in Savings)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;AI queries over logs — eliminates ad-hoc log digging by engineering (~20 hrs/month)&lt;/li&gt;
&lt;li&gt;Cross-data correlation — compliance/risk can correlate errors with affected users instantly&lt;/li&gt;
&lt;li&gt;Longer retention — at ClickHouse costs, retain 1 year vs 90 days for same budget&lt;/li&gt;
&lt;li&gt;Unified cluster — observability + business data in one system, one operations team&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  14. Risk Assessment
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;Likelihood&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;Mitigation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Full-text search gaps&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;95% of queries are structured; bloom filters cover keyword search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data loss during migration&lt;/td&gt;
&lt;td&gt;Very Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;4-week dual-write window eliminates risk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grafana learning curve&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Grafana is widely used; team familiarity is high&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse cluster instability&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;2-replica HA, Keeper for consensus, daily S3 backups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL Collector overload&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Batch processor + memory limiter configured; scale collector horizontally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema changes in new service&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;MergeTree handles new columns gracefully; OTEL schema is stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold data access latency (S3)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;S3 queries are slower but acceptable for historical lookups&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  15. Success Metrics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  30-Day Targets (Post Cutover)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All dashboards migrated to Grafana&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch subscription cancelled&lt;/td&gt;
&lt;td&gt;Done&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log query latency (aggregation)&lt;/td&gt;
&lt;td&gt;&amp;lt; 500ms for last 24h queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trace lookup latency&lt;/td&gt;
&lt;td&gt;&amp;lt; 2 seconds for TraceId lookup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data compression vs Elasticsearch&lt;/td&gt;
&lt;td&gt;&amp;gt; 5x smaller footprint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alert parity&lt;/td&gt;
&lt;td&gt;All Kibana alerts recreated in Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  90-Day Targets
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Annual cost reduction&lt;/td&gt;
&lt;td&gt;&amp;gt; $300K vs Elasticsearch baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI log queries active&lt;/td&gt;
&lt;td&gt;Teams using LibreChat for log analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-data queries&lt;/td&gt;
&lt;td&gt;At least 5 dashboards joining logs + business data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retention extended&lt;/td&gt;
&lt;td&gt;From 90 days to 180+ days (same cost)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering time saved&lt;/td&gt;
&lt;td&gt;20+ hrs/month (no more ad-hoc log queries)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  16. Reference Links
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ClickHouse Observability
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Observability docs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/use-cases/observability" rel="noopener noreferrer"&gt;https://clickhouse.com/docs/use-cases/observability&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability solution guide&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/use-cases/observability/overview" rel="noopener noreferrer"&gt;https://clickhouse.com/docs/use-cases/observability/overview&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse as Elasticsearch alternative&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/elasticsearch-to-clickhouse-for-logs" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/elasticsearch-to-clickhouse-for-logs&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Building an Observability solution with ClickHouse&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/storing-log-data-in-clickhouse-fluent-bit-vector-open-telemetry" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/storing-log-data-in-clickhouse-fluent-bit-vector-open-telemetry&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse for logs blog&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/using-clickhouse-for-log-analytics" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/using-clickhouse-for-log-analytics&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  OpenTelemetry + ClickHouse
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OTEL Collector ClickHouse exporter&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter" rel="noopener noreferrer"&gt;https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL Collector contrib repo&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/open-telemetry/opentelemetry-collector-contrib" rel="noopener noreferrer"&gt;https://github.com/open-telemetry/opentelemetry-collector-contrib&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL ClickHouse schema reference&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/use-cases/observability/schema-design" rel="noopener noreferrer"&gt;https://clickhouse.com/docs/use-cases/observability/schema-design&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Grafana + ClickHouse
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Grafana ClickHouse datasource&lt;/td&gt;
&lt;td&gt;&lt;a href="https://grafana.com/grafana/plugins/grafana-clickhouse-datasource" rel="noopener noreferrer"&gt;https://grafana.com/grafana/plugins/grafana-clickhouse-datasource&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grafana ClickHouse plugin docs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://grafana.com/docs/grafana/latest/datasources/clickhouse" rel="noopener noreferrer"&gt;https://grafana.com/docs/grafana/latest/datasources/clickhouse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Benchmarks &amp;amp; Case Studies
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare: ClickHouse for HTTP logs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://blog.cloudflare.com/log-analytics-using-clickhouse" rel="noopener noreferrer"&gt;https://blog.cloudflare.com/log-analytics-using-clickhouse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse vs Elasticsearch log analytics benchmark&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/elasticsearch-log-analytics-clickhouse" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/elasticsearch-log-analytics-clickhouse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Benchmark source code (reproducible)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/TextBench" rel="noopener noreferrer"&gt;https://github.com/ClickHouse/TextBench&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BindPlane — OTEL fleet management for migrations&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/bindplane-faster-otel-migrations-to-clickstack" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/bindplane-faster-otel-migrations-to-clickstack&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse + ClickHouse (LLM observability)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/langfuse-llm-analytics" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/langfuse-llm-analytics&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Appendix A — Technology Stack Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instrumentation&lt;/td&gt;
&lt;td&gt;OTEL SDK (unchanged)&lt;/td&gt;
&lt;td&gt;Auto-instrument apps — zero code changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Collection&lt;/td&gt;
&lt;td&gt;OTEL Collector + clickhouseexporter&lt;/td&gt;
&lt;td&gt;Receives and ships logs/traces/metrics to ClickHouse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;ClickHouse otel database&lt;/td&gt;
&lt;td&gt;Logs, traces, metrics — compressed columnar storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visualization&lt;/td&gt;
&lt;td&gt;Grafana + ClickHouse datasource&lt;/td&gt;
&lt;td&gt;Dashboards, trace waterfall, log explorer, alerting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI queries&lt;/td&gt;
&lt;td&gt;LibreChat + Qwen + MCP (existing)&lt;/td&gt;
&lt;td&gt;Plain English queries over logs and traces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold storage&lt;/td&gt;
&lt;td&gt;S3 (tiered via ClickHouse TTL)&lt;/td&gt;
&lt;td&gt;Cheap long-term retention for historical data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HA&lt;/td&gt;
&lt;td&gt;ClickHouse 2-replica cluster&lt;/td&gt;
&lt;td&gt;Same HA setup as the existing ClickHouse cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Appendix B — Quick Reference: Kibana → Grafana Equivalents
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Kibana Feature&lt;/th&gt;
&lt;th&gt;Grafana Equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discover (log search)&lt;/td&gt;
&lt;td&gt;Explore → Logs panel with ClickHouse query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dashboard&lt;/td&gt;
&lt;td&gt;Dashboard (same concept)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visualize&lt;/td&gt;
&lt;td&gt;Panel with ClickHouse SQL query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;APM (traces)&lt;/td&gt;
&lt;td&gt;Explore → Traces panel with ClickHouse datasource&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alerts&lt;/td&gt;
&lt;td&gt;Grafana Alerting (same or better)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index Lifecycle Management&lt;/td&gt;
&lt;td&gt;ClickHouse TTL (simpler — one SQL line)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KQL (Kibana Query Language)&lt;/td&gt;
&lt;td&gt;SQL (standard, more powerful)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lens (drag-drop charts)&lt;/td&gt;
&lt;td&gt;Grafana panel builder&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;If you're considering this migration, the decisions that matter most:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Don't skip the OTEL Aggregator pattern.&lt;/strong&gt; Agent-only loses data on ClickHouse blips. Run a couple of central aggregators with retry-on-failure — that's the production-grade choice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use BindPlane if you have a large collector fleet.&lt;/strong&gt; Worth it for fleet-wide config rollout. For a handful of central collectors, standard YAML is fine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get the schema right the first time.&lt;/strong&gt; &lt;code&gt;ORDER BY (ServiceName, SeverityText, Timestamp)&lt;/code&gt; and the right CODECs are the difference between a 3× and 30× compression ratio. The schema in Section 6 has been validated at production scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run dual-write for 4 weeks&lt;/strong&gt;, not 1. The gradual cutover is cheap insurance and lets you validate every dashboard/alert before cutting Elasticsearch off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The AI layer pays for itself.&lt;/strong&gt; Plain-English log queries via LibreChat + an LLM means no more pinging engineering when the compliance team needs a one-off analysis. Once ClickHouse has the data, the AI integration is one config change.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full schema, OTEL Collector configs, Grafana queries, migration plan, and 16 production-tested recovery runbooks live in the &lt;a href="https://github.com/rakeshtherani/clickhouse-ai-dba" rel="noopener noreferrer"&gt;companion repo on GitHub&lt;/a&gt;. The repo also includes the AI DBA MCP server (152 tools for ClickHouse operations) — if you're operating at scale, that's worth a look.&lt;/p&gt;

&lt;p&gt;If you've migrated off Elasticsearch (or are mid-migration), I'd love to compare notes. Reach out via &lt;a href="https://www.linkedin.com/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or comment below.&lt;/p&gt;

&lt;p&gt;If this is useful to your team, the deeper architectural piece — &lt;em&gt;Building an Agentic AI Data Platform on ClickHouse&lt;/em&gt; — is coming next.&lt;/p&gt;

</description>
      <category>database</category>
      <category>infrastructure</category>
      <category>monitoring</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
