Alexandr Bandurchin for Uptrace

Uptrace v2.0: How ClickHouse JSON Type Accelerates Trace Queries by 10x

Uptrace v2.0 introduces native ClickHouse JSON type support for storing observability data, resulting in 10x query performance improvements. This guide covers deployment, configuration, and optimization strategies for production environments.

Key improvements in v2.0:

  • Native JSON column storage with dot notation queries
  • UI-based data transformations
  • Flexible retention policies per data type
  • Built-in Let's Encrypt SSL automation
  • Enhanced query builder with autocomplete

The Problem with Traditional Approaches

Observability systems handle nested JSON structures with unpredictable attribute sets. Each span or log entry contains variable fields:

{
  "trace_id": "abc123",
  "service_name": "checkout",
  "http.method": "POST",
  "http.target": "/api/v1/orders/12345",
  "user_id": "user_987",
  "error": true,
  "db.statement": "SELECT * FROM orders WHERE id = ?"
}

Traditional storage approaches have significant limitations:

Approach 1: Flattened Schema

CREATE TABLE spans (
  trace_id String,
  service_name String,
  http_method String,
  http_target String,
  user_id String
  -- ...hundreds of columns for all possible attributes
)

Drawbacks:

  • Schema bloat
  • Loss of flexibility
  • ETL pipeline required for each new attribute

Approach 2: String Storage + JSONExtractString

CREATE TABLE spans (
  trace_id String,
  attributes String  -- Raw JSON string
)

-- Query performance issue:
SELECT count()
FROM spans
WHERE JSONExtractString(attributes, 'service_name') = 'checkout';

Drawbacks:

  • JSON parsing on every query
  • No indexing on nested fields
  • 2.7 seconds on 50M records

What Changed in Uptrace v2.0

ClickHouse introduced a native JSON data type, and Uptrace v2.0 fully leverages this capability.

Before (v1.x):

SELECT count()
FROM spans_old
WHERE JSONExtractString(attributes, 'service_name') = 'checkout';

-- Result: 2.754 seconds on 50M records
-- Processed: 50.00 million rows, 8.43 GB

After (v2.0):

SELECT count()
FROM spans_new
WHERE attributes.service_name = 'checkout';

-- Result: 0.287 seconds on 50M records ⚡
-- Processed: 50.00 million rows, 3.12 GB

10x performance improvement achieved through:

  1. JSON parsing during insertion (once)
  2. Columnar storage of JSON data
  3. Native indexing on nested fields
  4. Direct dot notation access
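
The article doesn't show Uptrace's actual table schema, so here is a minimal hedged sketch of what a spans table with ClickHouse's native JSON column can look like (table and column names are illustrative; older ClickHouse releases gate the type behind the `allow_experimental_json_type` setting):

```sql
-- Sketch only: illustrative schema, not Uptrace's real one.
-- On older ClickHouse releases you may first need:
--   SET allow_experimental_json_type = 1;
CREATE TABLE spans_new (
  trace_id String,
  service_name LowCardinality(String),
  timestamp DateTime64(9),
  duration UInt64,
  attributes JSON  -- parsed once at insert, stored columnar
)
ENGINE = MergeTree
ORDER BY (service_name, timestamp);

-- Nested fields are addressed with dot notation, no parsing at query time:
SELECT count() FROM spans_new WHERE attributes.user_id = 'user_987';
```

Because each JSON path is materialized as its own subcolumn, a filter on `attributes.user_id` reads only that subcolumn instead of re-parsing every raw JSON string.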

Installation Guide

Prerequisites

  • Docker and Docker Compose
  • 4GB RAM minimum (8GB recommended)
  • 10GB disk space

Quick Start

# Clone repository
git clone https://github.com/uptrace/uptrace
cd uptrace/example/docker

# Start all services
docker-compose up -d

Services included:

  • ClickHouse - Data storage
  • Uptrace - UI and API
  • PostgreSQL - Metadata storage

Access the UI at http://localhost:14318:

  • Username: admin@uptrace.local
  • Password: admin

Docker Compose Structure

The default setup includes:

services:
  clickhouse:    # Data storage engine
  postgres:      # Metadata and configuration
  uptrace:       # Main application
  otelcol:       # OpenTelemetry Collector
  redis:         # Caching layer

Configuration

Project Setup via UI

Navigate to Organization → New Org → New Project to create your first project through the web interface.

Required fields:

  • Name: Project identifier (e.g., production)
  • Organization: Company/team name
  • Token: Auto-generated for OTLP authentication

Configuration File Alternative

For infrastructure-as-code deployments, use uptrace.yml:

seed_data:
  users:
    - key: admin_user
      name: Admin
      email: admin@example.com
      password: change_this_password

  orgs:
    - key: main_org
      name: Company Name

  org_users:
    - key: org_admin
      org: main_org
      user: admin_user
      role: owner

  projects:
    - key: prod_project
      name: production
      org: main_org

Key feature: The key field enables declarative resource management. Uptrace automatically creates, updates, or removes resources based on configuration changes.

Sending Traces with OpenTelemetry

Example Node.js integration:

const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

// Configure exporter
const exporter = new OTLPTraceExporter({
  url: 'http://localhost:14318/v1/traces',
  headers: {
    'uptrace-dsn': 'http://project_token@localhost:14318/1'
  }
});

// Setup provider
const provider = new NodeTracerProvider({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'api-service',
    [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
  })
});

provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

// Use in application
const tracer = provider.getTracer('api-service');

async function handleRequest(req, res) {
  const span = tracer.startSpan('handleRequest', {
    attributes: {
      'http.method': req.method,
      'http.target': req.url,
      'user_id': req.user?.id
    }
  });

  try {
    await processOrder(req.body);
    span.setStatus({ code: 1 }); // OK
  } catch (error) {
    span.recordException(error);
    span.setStatus({ code: 2, message: error.message }); // ERROR
    throw error;
  } finally {
    span.end();
  }
}

Query Builder Features

Uptrace v2.0 introduces an enhanced query builder with several powerful features:

1. Attribute Autocomplete

The query builder provides intelligent suggestions for available attributes as you type. Start typing user_ and see all user-related attributes.

2. Toggle Query Parts

Temporarily disable query conditions without deleting them, useful for debugging and exploration.

3. Search Clause

Combine structured filters with full-text search:

where service_name = "checkout" | search error|timeout|failed

This query finds all spans from the checkout service containing the words "error", "timeout", or "failed" in any attribute or log message.
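
The matching semantics can be sketched in a few lines of Python. This is not Uptrace's implementation, just a rough model of what the pipe-separated `search` clause does: a span matches if any of its attribute values contains at least one term.

```python
# Rough model (not Uptrace's implementation) of `search error|timeout|failed`:
# a span matches if any attribute value contains any pipe-separated term.
def matches_search(span: dict, query: str) -> bool:
    terms = [t.lower() for t in query.split("|")]
    for value in span.values():
        text = str(value).lower()
        if any(term in text for term in terms):
            return True
    return False

span = {"service_name": "checkout", "log": "payment timeout after 30s"}
print(matches_search(span, "error|timeout|failed"))  # True
print(matches_search(span, "oom|segfault"))          # False
```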

Example: Debugging Failed Payment

where attributes.user_id = "user_12345" 
  and timestamp >= now() - interval 1 hour
  | search payment|stripe|checkout

This query locates all payment-related spans for a specific user within the last hour.

Data Transformations

Data transformations process telemetry data before storage, enabling:

  • Attribute normalization
  • PII removal
  • Data type conversion
  • Sampling strategies

Use Case 1: Reducing URL Cardinality

Problem: Dynamic URLs create high cardinality:

/user/123/orders/456
/user/124/orders/457
/user/125/orders/458
...

This causes index bloat and slow queries.

Solution:

// Project → Transformations → New Operation
setAttr("http_target", 
  replaceGlob(
    attr("http_target"), 
    "/user/*/orders/*", 
    "/user/{userId}/orders/{orderId}"
  )
);

Result: All URLs normalized to:

/user/{userId}/orders/{orderId}
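
For intuition, a regex equivalent of that `replaceGlob` call can be sketched in Python (hypothetical helper name; `replaceGlob` itself is Uptrace's expr-lang function, where each `*` matches a single path segment):

```python
import re

# Illustrative equivalent of the replaceGlob transformation above:
# each glob `*` becomes a non-slash wildcard, so only two-segment
# /user/<id>/orders/<id> paths are rewritten.
def normalize_target(target: str) -> str:
    pattern = re.compile(r"^/user/[^/]+/orders/[^/]+$")
    if pattern.match(target):
        return "/user/{userId}/orders/{orderId}"
    return target

print(normalize_target("/user/123/orders/456"))  # /user/{userId}/orders/{orderId}
print(normalize_target("/health"))               # /health (unchanged)
```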

Use Case 2: Type Conversion

Some libraries send numeric values as strings:

{
  "elapsed_ms": "1234.56"  // String instead of number
}

Fix with transformation:

setAttr("elapsed_ms", parseFloat(attr("elapsed_ms")))

Benefit: Enable aggregate functions:

SELECT avg(attributes.elapsed_ms) FROM spans;

Use Case 3: PII Removal

Remove personally identifiable information for GDPR compliance:

if (has(attr("user.email"))) {
  setAttr("user.email", "***@***")
}
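
The same masking can be modeled outside the transformation runtime. This Python sketch (illustrative, not Uptrace's expr-lang engine) replaces any email-shaped value with the fixed placeholder so the raw address never reaches storage:

```python
import re

# Illustrative PII scrubber: mask email-shaped values with the same
# "***@***" placeholder used in the transformation above.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def scrub_email(attributes: dict) -> dict:
    value = attributes.get("user.email")
    if value is not None and EMAIL_RE.fullmatch(str(value)):
        attributes = {**attributes, "user.email": "***@***"}
    return attributes

print(scrub_email({"user.email": "jane@example.com", "user_id": "u1"}))
# {'user.email': '***@***', 'user_id': 'u1'}
```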

Transformation language: Uptrace uses expr-lang for transformation logic, providing a simple yet powerful expression language.

Configuration location: Project → Transformations → New Operation

Retention Policies

Different data types require different retention periods:

  • Traces: Recent debugging (7 days)
  • Logs: Audit trail (30 days)
  • Metrics: Long-term trends (90 days)

Configuration

projects:
  - name: production
    retention:
      spans: 168h      # 7 days
      logs: 720h       # 30 days  
      events: 720h     # 30 days
      metrics: 2160h   # 90 days

Alternative: Settings → Project → Data Retention

Cost Impact

Example calculation:

  • Traces: ~100GB/day
  • Retaining all data types for 90 days: 9TB
  • Retaining with a 7/30/90-day split: 4TB
  • Storage savings: 56%
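
The arithmetic generalizes to any volumes. In this sketch, the ~100GB/day traces figure comes from the article, while the logs and metrics volumes are hypothetical placeholders chosen only to illustrate the calculation (so the totals differ from the example above):

```python
# Back-of-the-envelope retention cost model.
# traces volume is from the article; logs/metrics volumes are hypothetical.
DAILY_GB = {"traces": 100, "logs": 30, "metrics": 10}
SPLIT_DAYS = {"traces": 7, "logs": 30, "metrics": 90}

flat_90 = sum(gb * 90 for gb in DAILY_GB.values())          # keep everything 90 days
split = sum(DAILY_GB[k] * SPLIT_DAYS[k] for k in DAILY_GB)  # per-type retention

print(f"90 days for everything: {flat_90 / 1000:.1f} TB")  # 12.6 TB
print(f"7/30/90 split:          {split / 1000:.1f} TB")    # 2.5 TB
print(f"savings:                {1 - split / flat_90:.0%}")
```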

Migration Strategy

For zero-downtime migration from v1.x or other systems, use parallel deployment:

┌─────────────┐
│   Services  │
└──────┬──────┘
       │
       │ OpenTelemetry
       │
┌──────▼───────────────┐
│ OTel Collector       │
└──┬────────────────┬──┘
   │                │
   │                │
┌──▼────┐      ┌───▼─────┐
│ v1.x  │      │ v2.0    │
│ (old) │      │ (new)   │
└───────┘      └─────────┘

OpenTelemetry Collector Configuration

exporters:
  otlphttp/v1:
    endpoint: http://uptrace-v1:14318
    headers:
      uptrace-dsn: "http://token_v1@uptrace-v1:14318/1"

  otlphttp/v2:
    endpoint: http://uptrace-v2:14318
    headers:
      uptrace-dsn: "http://token_v2@uptrace-v2:14318/1"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/v1, otlphttp/v2]  # Dual export

Migration Steps

  1. Deploy v2.0 alongside existing v1.x
  2. Configure dual export to both instances
  3. Monitor for 7 days to ensure stability
  4. Migrate dashboards and alerts to v2.0
  5. Decommission v1.x

Performance Benchmarks

Real-world performance comparison on 500M spans over 7 days:

Query 1: Top 5 Slowest Endpoints

SELECT 
  attributes.http_target as endpoint,
  count() as requests,
  quantile(0.95)(duration) as p95_duration
FROM spans
WHERE service_name = 'api-gateway'
  AND timestamp >= now() - interval 24 hour
GROUP BY endpoint
ORDER BY p95_duration DESC
LIMIT 5;
  • v1.x: 4.2 seconds
  • v2.0: 0.38 seconds
  • Improvement: 11x faster

Query 2: User Error Traces

SELECT 
  trace_id,
  span_name,
  attributes.error_message
FROM spans
WHERE attributes.user_id = 'user_12345'
  AND attributes.error = true
  AND timestamp >= now() - interval 7 day
ORDER BY timestamp DESC;
  • v1.x: 6.8 seconds
  • v2.0: 0.52 seconds
  • Improvement: 13x faster

Performance Factors

  1. Columnar JSON storage
  2. Native indexing on nested fields
  3. Zero JSON parsing overhead per query

SSL Configuration

Uptrace v2.0 includes built-in Let's Encrypt integration:

# uptrace.yml
certmagic:
  enabled: true
  staging_ca: false  # Use true for testing
  http_challenge_addr: :80

listen:
  https:
    addr: :443
    domain: uptrace.example.com

Automatic features:

  • Certificate issuance
  • HTTP to HTTPS redirect
  • Auto-renewal (certificates are reissued around day 60 of their 90-day lifetime)

Requirements:

  • Domain DNS points to server
  • Port 80 open for HTTP-01 challenge

When to Use Uptrace v2.0

Ideal Use Cases

✅ Microservices architecture (5+ services)

✅ High query performance requirements

✅ Need unified traces, logs, and metrics

✅ Comfortable running ClickHouse

✅ Want flexible data transformations

Not Recommended For

❌ Small projects (1-2 services)

❌ Existing Grafana stack working well

❌ Need managed SaaS only

❌ No infrastructure management capability

Conclusion

Uptrace v2.0's adoption of ClickHouse JSON type provides substantial performance improvements for observability workloads. The 10x query acceleration comes from architectural changes that eliminate JSON parsing overhead while maintaining storage flexibility.

Key benefits:

  • Native JSON storage with columnar performance
  • Real-time data transformations
  • Flexible retention policies
  • Built-in SSL automation
  • Unified observability platform
