gauravdagde

Posted on • Originally published at preto.ai

How We Log LLM Requests at Sub-50ms Latency Using ClickHouse

We were logging 50,000 LLM requests per day to PostgreSQL. Query latency was fine. At 400,000 requests, cost aggregation queries started taking 3 seconds. At 2 million, the database was the slowest thing in the stack.

We switched to ClickHouse. Here's exactly what changed and why.

TL;DR:

  • LLM request logs are append-only, high-cardinality, analytics-heavy — that's a ClickHouse workload, not Postgres
  • Switching dropped our cost dashboard p95 from 3.2s to 12ms (with materialized views)
  • Our async Go write path keeps logging overhead under 2ms p95

The Schema That Broke PostgreSQL

Our initial PostgreSQL schema was straightforward — UUID primary key, indexes on (tenant_id, created_at), model, and feature. It worked fine at low volume.

The problem was our query pattern:

SELECT model, SUM(cost_usd), COUNT(*)
FROM llm_requests
WHERE tenant_id = $1
  AND created_at > NOW() - INTERVAL '30 days'
GROUP BY model;

At 10 million rows, these GROUP BY queries were scanning hundreds of thousands of rows per tenant. PostgreSQL is row-oriented — to compute SUM(cost_usd), it reads entire rows to extract one column. At scale, that's a lot of I/O for a field you could pre-aggregate.
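A covering index can postpone the problem by letting the planner serve the aggregation from an index-only scan instead of heap fetches. This is a sketch, not what ultimately saved us — the index name and INCLUDE columns are illustrative, and it requires Postgres 11+:

```sql
-- Hypothetical covering index: the planner can answer the cost aggregation
-- from the index alone, skipping heap reads for model and cost_usd
CREATE INDEX idx_llm_requests_tenant_time_cover
  ON llm_requests (tenant_id, created_at)
  INCLUDE (model, cost_usd);
```

It buys time, but every query still scans one index entry per matching row — the row-store aggregation cost doesn't go away.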


Why LLM Logs Are a ClickHouse Workload

Three properties make LLM request logs a natural fit for ClickHouse:

1. Append-only writes. LLM logs are never updated or deleted (outside retention policies). ClickHouse is optimized for high-throughput inserts — it doesn't pay the MVCC overhead PostgreSQL does on write-heavy workloads.

2. Columnar storage. When you query SUM(cost_usd) GROUP BY model, ClickHouse reads only the cost_usd and model columns from disk. PostgreSQL reads entire rows. For a 20-column table where you're aggregating 2 columns, ClickHouse does 10% of the I/O.

3. Materialized views. ClickHouse pre-aggregates data at insert time. When a new log row lands, a materialized view fires and updates a running total in a summary table. Your dashboard hits the summary — not 10 million raw rows.

For reference: Langfuse uses PostgreSQL for its logging backend. It works well at lower volumes and for teams that need ACID transactions. If you're running more than 1M requests/month and querying with analytics patterns, you'll eventually feel the difference.


Our ClickHouse Schema

CREATE TABLE llm_requests (
  request_id    String,
  tenant_id     String,
  model         LowCardinality(String),
  provider      LowCardinality(String),
  feature       LowCardinality(String),
  user_id       String,
  input_tokens  UInt32,
  output_tokens UInt32,
  cost_usd      Float64,
  latency_ms    UInt32,
  cached        UInt8,
  created_at    DateTime
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (tenant_id, created_at)
SETTINGS index_granularity = 8192;

Two things worth noting:

  • LowCardinality(String) for model, provider, feature — bounded value sets. ClickHouse dictionary-encodes them, cutting storage ~4x and improving scan speed.
  • Monthly partitions let us drop old data with ALTER TABLE DROP PARTITION instead of slow DELETE queries.
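With monthly partitioning, retention becomes a metadata operation. A sketch — the partition ID follows the toYYYYMM format, and the month shown is illustrative:

```sql
-- Drops the entire January 2025 partition near-instantly,
-- versus a DELETE that has to rewrite data parts
ALTER TABLE llm_requests DROP PARTITION '202501';
```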

Materialized Views That Pre-Aggregate at Insert Time

CREATE TABLE cost_by_model_daily (
  tenant_id   String,
  model       LowCardinality(String),
  day         Date,
  total_cost  AggregateFunction(sum, Float64),
  total_reqs  AggregateFunction(count),
  avg_latency AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, model, day);

CREATE MATERIALIZED VIEW cost_by_model_mv
TO cost_by_model_daily AS
SELECT
  tenant_id,
  model,
  toDate(created_at)    AS day,
  sumState(cost_usd)    AS total_cost,
  countState()          AS total_reqs,
  avgState(latency_ms)  AS avg_latency
FROM llm_requests
GROUP BY tenant_id, model, day;

Dashboard queries now hit cost_by_model_daily — a pre-aggregated summary with orders of magnitude fewer rows than the raw logs.
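One detail worth calling out: AggregateFunction columns store intermediate aggregation states, not final values, so reads must go through the -Merge combinators. A dashboard query against the summary table looks like this (the tenant value is illustrative):

```sql
SELECT
  model,
  sumMerge(total_cost)   AS cost_usd,
  countMerge(total_reqs) AS requests,
  avgMerge(avg_latency)  AS avg_latency_ms
FROM cost_by_model_daily
WHERE tenant_id = 'acme'
  AND day >= today() - 30
GROUP BY model;
```

Selecting the state columns directly returns opaque binary blobs — the -Merge call is what folds partial states into a final number.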

Query latency comparison (10M rows):

Query                    PostgreSQL p95   ClickHouse raw   ClickHouse mat. view
Cost by model, 30d       3.2s             180ms            12ms
Cost by feature, 7d      2.8s             160ms            9ms
Hourly spend, last 24h   4.1s             210ms            15ms

The Async Write Path: Keeping Logging Under 2ms

Getting queries fast was the first problem. Making sure logging never blocks a request was the second.

Our Go implementation:

// Buffered channel — fire and forget from request handler
var logCh = make(chan LogEntry, 10_000)

func (p *Proxy) logRequest(entry LogEntry) {
  select {
  case logCh <- entry:
    // buffered, will be flushed async
  default:
    // channel full — drop rather than block
    metrics.Increment("log_dropped")
  }
}

// Background goroutine: batch flush every 500ms.
// db is a clickhouse-go v2 driver.Conn; flushBatch (not shown) wraps a batched INSERT.
func runLogFlusher(ch <-chan LogEntry, db driver.Conn) {
  ticker := time.NewTicker(500 * time.Millisecond)
  batch := make([]LogEntry, 0, 500)

  for {
    select {
    case entry := <-ch:
      batch = append(batch, entry)
      if len(batch) >= 500 {
        flushBatch(db, batch)
        batch = batch[:0]
      }
    case <-ticker.C:
      if len(batch) > 0 {
        flushBatch(db, batch)
        batch = batch[:0]
      }
    }
  }
}

The client gets its LLM response immediately. The log entry goes into a buffered channel. A background goroutine flushes to ClickHouse in batches of 500 every 500ms. If the channel fills under load, we drop the entry — we'd rather lose a log than slow down a request.

This gives us p95 logging overhead under 2ms. The ClickHouse batch insert takes 10–30ms for 500 rows — invisible to any individual request.
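The drop-rather-than-block behavior is easy to verify in isolation. Here's a minimal, runnable sketch of the same select/default pattern — the names are illustrative, not our production code:

```go
package main

import "fmt"

// LogEntry is a stand-in for the proxy's real log record.
type LogEntry struct {
	RequestID string
}

// tryLog attempts a non-blocking send: it returns false (drop) instead of
// blocking the request path when the buffer is full.
func tryLog(ch chan LogEntry, e LogEntry) bool {
	select {
	case ch <- e:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan LogEntry, 2) // tiny buffer to force the drop path

	fmt.Println(tryLog(ch, LogEntry{RequestID: "a"})) // buffered
	fmt.Println(tryLog(ch, LogEntry{RequestID: "b"})) // buffered
	fmt.Println(tryLog(ch, LogEntry{RequestID: "c"})) // dropped: buffer full
}
```

The same pattern generalizes to any hot path where losing a record is cheaper than adding latency.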


When to Switch (and When Not To)

                   PostgreSQL               ClickHouse
Write pattern      Mixed read/write, ACID   Append-only inserts
Query pattern      Point lookups, joins     Aggregations, scans
Scale              ~5M rows comfortably     Billions of rows
Operational cost   Low                      Medium
Right when         <500K req/month          >1M req/month

Don't over-engineer. If you're under 500K LLM requests per month, Postgres with a (tenant_id, created_at) index is fine. Migrate when your analytics queries start timing out — not before.


We're building Preto.ai — LLM cost intelligence that runs on this ClickHouse stack. One URL change to see your spend broken down by model and feature. Free up to 10K requests, no credit card required.
