gauravdagde

Posted on • Originally published at preto.ai

How We Log LLM Requests at Sub-50ms Latency Using ClickHouse

We were logging 50,000 LLM requests per day to PostgreSQL. Query latency was fine. At 400,000 requests, cost aggregation queries started taking 3 seconds. At 2 million, the database was the slowest thing in the stack.

We switched to ClickHouse. Here's exactly what changed and why.

TL;DR:

  • LLM request logs are append-only, high-cardinality, analytics-heavy — that's a ClickHouse workload, not Postgres
  • Switching dropped our cost dashboard p95 from 3.2s to 12ms (with materialized views)
  • Our async Go write path keeps logging overhead under 2ms p95

The Schema That Broke PostgreSQL

Our initial PostgreSQL schema was straightforward — UUID primary key, indexes on (tenant_id, created_at), model, and feature. It worked fine at low volume.

The problem was our query pattern:

SELECT model, SUM(cost_usd), COUNT(*)
FROM llm_requests
WHERE tenant_id = $1
  AND created_at > NOW() - INTERVAL '30 days'
GROUP BY model;

At 10 million rows, these GROUP BY queries were scanning hundreds of thousands of rows per tenant. PostgreSQL is row-oriented — to compute SUM(cost_usd), it reads entire rows to extract one column. At scale, that's a lot of I/O for a field you could pre-aggregate.
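A covering index can postpone the problem by letting the planner serve the aggregation from an index-only scan instead of heap fetches. This is a sketch, not what ultimately saved us — the index name and INCLUDE columns are illustrative, and it requires Postgres 11+:

```sql
-- Hypothetical covering index: the planner can answer the cost aggregation
-- from the index alone, skipping heap reads for model and cost_usd
CREATE INDEX idx_llm_requests_tenant_time_cover
  ON llm_requests (tenant_id, created_at)
  INCLUDE (model, cost_usd);
```

It buys time, but every query still scans one index entry per matching row — the row-store aggregation cost doesn't go away.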


Why LLM Logs Are a ClickHouse Workload

Three properties make LLM request logs a natural fit for ClickHouse:

1. Append-only writes. LLM logs are never updated or deleted (outside retention policies). ClickHouse is optimized for high-throughput inserts — it doesn't pay the MVCC overhead PostgreSQL does on write-heavy workloads.

2. Columnar storage. When you query SUM(cost_usd) GROUP BY model, ClickHouse reads only the cost_usd and model columns from disk. PostgreSQL reads entire rows. For a 20-column table where you're aggregating 2 columns, ClickHouse does 10% of the I/O.

3. Materialized views. ClickHouse pre-aggregates data at insert time. When a new log row lands, a materialized view fires and updates a running total in a summary table. Your dashboard hits the summary — not 10 million raw rows.

For reference: Langfuse uses PostgreSQL for its logging backend. It works well at lower volumes and for teams that need ACID transactions. If you're running more than 1M requests/month and querying with analytics patterns, you'll eventually feel the difference.


Our ClickHouse Schema

CREATE TABLE llm_requests (
  request_id    String,
  tenant_id     String,
  model         LowCardinality(String),
  provider      LowCardinality(String),
  feature       LowCardinality(String),
  user_id       String,
  input_tokens  UInt32,
  output_tokens UInt32,
  cost_usd      Float64,
  latency_ms    UInt32,
  cached        UInt8,
  created_at    DateTime
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (tenant_id, created_at)
SETTINGS index_granularity = 8192;

Two things worth noting:

  • LowCardinality(String) for model, provider, feature — bounded value sets. ClickHouse dictionary-encodes them, cutting storage ~4x and improving scan speed.
  • Monthly partitions let us drop old data with ALTER TABLE DROP PARTITION instead of slow DELETE queries.
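With monthly partitioning, retention becomes a metadata operation. A sketch — the partition ID follows the toYYYYMM format, and the month shown is illustrative:

```sql
-- Drops the entire January 2025 partition near-instantly,
-- versus a DELETE that has to rewrite data parts
ALTER TABLE llm_requests DROP PARTITION '202501';
```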

Materialized Views That Pre-Aggregate at Insert Time

CREATE TABLE cost_by_model_daily (
  tenant_id   String,
  model       LowCardinality(String),
  day         Date,
  total_cost  AggregateFunction(sum, Float64),
  total_reqs  AggregateFunction(count),
  avg_latency AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, model, day);

CREATE MATERIALIZED VIEW cost_by_model_mv
TO cost_by_model_daily AS
SELECT
  tenant_id,
  model,
  toDate(created_at)    AS day,
  sumState(cost_usd)    AS total_cost,
  countState()          AS total_reqs,
  avgState(latency_ms)  AS avg_latency
FROM llm_requests
GROUP BY tenant_id, model, day;

Dashboard queries now hit cost_by_model_daily — a pre-aggregated summary with orders of magnitude fewer rows than the raw logs.
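One detail worth calling out: AggregateFunction columns store intermediate aggregation states, not final values, so reads must go through the -Merge combinators. A dashboard query against the summary table looks like this (the tenant value is illustrative):

```sql
SELECT
  model,
  sumMerge(total_cost)   AS cost_usd,
  countMerge(total_reqs) AS requests,
  avgMerge(avg_latency)  AS avg_latency_ms
FROM cost_by_model_daily
WHERE tenant_id = 'acme'
  AND day >= today() - 30
GROUP BY model;
```

Selecting the state columns directly returns opaque binary blobs — the -Merge call is what folds partial states into a final number.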

Query latency comparison (10M rows):

Query                    PostgreSQL p95   ClickHouse raw   ClickHouse mat. view
Cost by model, 30d       3.2s             180ms            12ms
Cost by feature, 7d      2.8s             160ms            9ms
Hourly spend, last 24h   4.1s             210ms            15ms

The Async Write Path: Keeping Logging Under 2ms

Getting queries fast was the first problem. Making sure logging never blocks a request was the second.

Our Go implementation:

// Buffered channel — fire and forget from request handler
var logCh = make(chan LogEntry, 10_000)

func (p *Proxy) logRequest(entry LogEntry) {
  select {
  case logCh <- entry:
    // buffered, will be flushed async
  default:
    // channel full — drop rather than block
    metrics.Increment("log_dropped")
  }
}

// Background goroutine: batch flush every 500ms.
// db is a clickhouse-go v2 driver.Conn; flushBatch (not shown) wraps a batched INSERT.
func runLogFlusher(ch <-chan LogEntry, db driver.Conn) {
  ticker := time.NewTicker(500 * time.Millisecond)
  batch := make([]LogEntry, 0, 500)

  for {
    select {
    case entry := <-ch:
      batch = append(batch, entry)
      if len(batch) >= 500 {
        flushBatch(db, batch)
        batch = batch[:0]
      }
    case <-ticker.C:
      if len(batch) > 0 {
        flushBatch(db, batch)
        batch = batch[:0]
      }
    }
  }
}

The client gets its LLM response immediately. The log entry goes into a buffered channel. A background goroutine flushes to ClickHouse in batches of 500 every 500ms. If the channel fills under load, we drop the entry — we'd rather lose a log than slow down a request.

This gives us p95 logging overhead under 2ms. The ClickHouse batch insert takes 10–30ms for 500 rows — invisible to any individual request.
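The drop-rather-than-block behavior is easy to verify in isolation. Here's a minimal, runnable sketch of the same select/default pattern — the names are illustrative, not our production code:

```go
package main

import "fmt"

// LogEntry is a stand-in for the proxy's real log record.
type LogEntry struct {
	RequestID string
}

// tryLog attempts a non-blocking send: it returns false (drop) instead of
// blocking the request path when the buffer is full.
func tryLog(ch chan LogEntry, e LogEntry) bool {
	select {
	case ch <- e:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan LogEntry, 2) // tiny buffer to force the drop path

	fmt.Println(tryLog(ch, LogEntry{RequestID: "a"})) // buffered
	fmt.Println(tryLog(ch, LogEntry{RequestID: "b"})) // buffered
	fmt.Println(tryLog(ch, LogEntry{RequestID: "c"})) // dropped: buffer full
}
```

The same pattern generalizes to any hot path where losing a record is cheaper than adding latency.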


When to Switch (and When Not To)

                   PostgreSQL               ClickHouse
Write pattern      Mixed read/write, ACID   Append-only inserts
Query pattern      Point lookups, joins     Aggregations, scans
Scale              ~5M rows comfortably     Billions of rows
Operational cost   Low                      Medium
Right when         <500K req/month          >1M req/month

Don't over-engineer. If you're under 500K LLM requests per month, Postgres with a (tenant_id, created_at) index is fine. Migrate when your analytics queries start timing out — not before.


We're building Preto.ai — LLM cost intelligence that runs on this ClickHouse stack. One URL change to see your spend broken down by model and feature. Free up to 10K requests, no credit card required.
