DEV Community: nishaant dixit

AI Code Review Implementation: What Actually Works (And What Doesn't)

nishaant dixit — Tue, 19 May 2026 14:55:24 +0000

I spent the first six months of 2024 fighting my own AI code review system.

Sound familiar? You ship a PR. The AI flags 47 issues. Three are real. The rest are noise. Your team starts ignoring the bot. Then someone merges a bug that the AI should have caught but didn't, because you configured the rules wrong.

I've been building data systems at SIVARO for six years. We process 200K events per second. Code review isn't optional for us—it's survival. So I went deep on what an effective AI code review setup looks like across our stack. Here's what I learned the hard way.

An AI code review system means integrating machine learning models (large language models, or LLMs) into your dev workflow. They analyze pull requests, flag issues, enforce style standards, and give feedback before human reviewers get involved. A good setup speeds up cycles. Done wrong, it creates a bureaucracy of noise.

Everyone thinks AI code review is about slapping an LLM on your PRs. They're wrong. The real architecture has three distinct layers.

Your AI doesn't look at code the way humans do. It needs structured diff data. The most effective systems parse diffs line-by-line, mapping added lines to removed context. This isn't trivial. A 500-line diff with 10 changed files needs to be chunked intelligently or the LLM loses context.

Here's the diff processing pattern that worked for us:

python
import difflib

def parse_diff_for_ai(original_content, new_content, file_path):
"""
Structured diff output optimized for LLM processing.
Returns chunked segments with line number context.
"""
differ = difflib.unified_diff(
original_content.splitlines(keepends=True),
new_content.splitlines(keepends=True),
fromfile=f'a/{file_path}',
tofile=f'b/{file_path}'
)

diff_text = ''.join(differ)

max_chunk_size = 200
lines = diff_text.splitlines()
chunks = []

for i in range(0, len(lines), max_chunk_size):
chunk = lines[i:i + max_chunk_size]
chunks.append({
'file_path': file_path,
'chunk_start': i,
'content': '\n'.join(chunk),
'chunk_index': i // max_chunk_size
})

return chunks

This is where most AI code review setups fail. You can't just ask an LLM "is this code good?" You need specific rules. At SIVARO, we built a YAML-based policy system that maps review categories to specific analysis passes.

How the feedback reaches your team matters. We found that inline comments on PRs get 80% higher engagement than summary messages. The AI needs to write in the thread, not at the top.

After 18 months of running AI code review across 40+ engineers, here's what moved the needle.

IBM's analysis found that AI systems consistently catch three categories of bugs humans overlook: race conditions across files, inconsistent error handling patterns, and deprecated API usage spread across multiple functions. We saw a 34% reduction in production incidents directly attributed to our AI code review system.

A senior engineer can review a 200-line PR in 15 minutes. The AI does it in 30 seconds. But—and this is critical—the AI is terrible at architectural decisions. Here's the hard truth: AI code review gives you speed on the 80% of reviews that are mechanical. The remaining 20% still need human judgment.

Humans are inconsistent. Monday morning reviews are harsher than Friday afternoon ones. AI applies the same standard every single time. Teams using AI enforcement see a 40% reduction in style-related debates during human review cycles.

Let me show you what a production-grade AI code review setup looks like. This isn't a toy. This runs on every PR at SIVARO.

Most people think you need a giant prompt with every rule in your coding standards. Wrong. The model gets confused. Here's the structure that actually works:

yaml
version: 2.0
analysis_passes:

name: "safety_check"
model: "gpt-4-turbo"
temperature: 0.1
prompt_template: |
Analyze this diff for safety issues only.
Categories: SQL injection, XSS, auth bypass, memory leaks.
Ignore style, performance, or architecture.
Format: [FILE:LINENUMBERS] CATEGORY: Description
Example: [auth.py:45-52] AUTH_BYPASS: Role check uses user-controlled input
name: "style_enforcement"
model: "claude-3-sonnet"
temperature: 0.0
prompt_template: |
Check adherence to project style guide:
Maximum function length: 40 lines
No wildcard imports
Type hints required on public functions
Variable naming: snake_case
Output only violations, ignore everything else.
name: "architecture_review"
model: "gpt-4"
temperature: 0.2
threshold: 0.7 prompt_template: |
Review for architectural concerns:
Overly coupled components
Missing abstractions
Violations of dependency direction
This pass generates suggestions, not blockages.

The key insight? Separate passes. Each with its own model, temperature, and scope. This modular architecture prevents one bad analysis from corrupting the others.

Here's the biggest problem with AI code review: the false positive rate.

After 150 days of AI code review, one developer documented that their AI flagged 287 issues. Only 42 were real bugs. That's an 85% false positive rate.

We built a feedback loop to solve this:

python
import json
from datetime import datetime

class ReviewFeedbackAgent:
def init(self, model_client):
self.model_client = model_client
self.feedback_log = []

def process_review_result(self, pr_id, file_path, suggestions):
"""
Applies learned patterns to reduce false positives.
Tracks which suggestions were accepted vs rejected.
"""
accepted_suggestions = []
rejected_patterns = []

for suggestion in suggestions:
previous_similar = [
entry for entry in self.feedback_log
if entry['category'] == suggestion['category']
and entry['file_pattern'] == self._extract_pattern(file_path)
]

rejection_rate = sum(
1 for e in previous_similar if not e['accepted']
) / max(len(previous_similar), 1)

if rejection_rate > 0.7:
continue
accepted_suggestions.append(suggestion)

return accepted_suggestions

def log_feedback(self, pr_id, suggestion_id, accepted_by_human):
self.feedback_log.append({
'pr_id': pr_id,
'suggestion_id': suggestion_id,
'accepted': accepted_by_human,
'timestamp': datetime.utcnow().isoformat()
})

This cut our false positive rate from 85% to 31% over three months.

After studying how teams like GitHub, Cloudflare, and IBM handle AI code review, here's what separates successful setups from failures.

The Reddit discussions on AI code review reveal a common theme: teams that led with style enforcement hated the tool. Teams that led with security scanning loved it. Start with what the AI is genuinely good at—pattern matching for vulnerabilities—then expand.

You can't drop an AI reviewer on a team and expect adoption. Implement in phases. Week 1: AI only comments, no blocking. Week 2: AI can mark "needs attention" but never blocks merges. Week 3: AI blocks on critical severity only. By week 4, your team trusts the system enough for nuanced feedback.

Don't count how many issues the AI finds. Count how many humans agree with. The real metric is PR cycle time for trivial changes. If simple formatting fixes or documentation updates ship 3x faster because AI handles the review, you win.

Here's the trade-off no one talks about.

AI code review isn't free. It costs compute, context window, and engineering time to maintain. For a team of 10 engineers, I estimate the total cost at $200-500/month in API calls plus 20 hours of initial setup.

Is it worth it? Depends on your failure tolerance.

If you're building a CRUD app with 3 engineers, manual review is fine. If you're handling financial transactions, healthcare data, or infrastructure where a bug costs $100K, AI code review is table stakes.

The ROI flips positive when you process more than 50 PRs per week. Below that, the overhead exceeds the benefit.

Your team stops reading AI comments after week two. I've been there. The solution is aggressive filtering. Only surface the top 3 issues. Always. Force the AI to prioritize. Limiting AI comments to three per PR increased human engagement by 60%.

LLMs can't read an entire codebase. A 200K-line monorepo? Forget it. We solved this with file-level embeddings. Before reviewing a PR, we vectorize the diff and retrieve the 5 most relevant files from our codebase for context. The AI sees those plus the diff, not the entire project.

Most general-purpose AI models are weakest on TypeScript generics, Rust lifetimes, and Go pointer semantics. They over-index on patterns from Python and JavaScript lore. We trained a small classifier to detect when the AI is likely wrong based on language-specific patterns and suppress those comments automatically.

For teams under 10 people, start with GitHub's built-in Copilot Code Review. It requires zero infrastructure and costs $19/user/month. The trade-off is less customization, but you don't need it yet.

Implement a feedback loop that tracks which suggestions humans accept. After 50 PRs, train the system to suppress patterns that humans reject more than 70% of the time. Most teams see a 50% reduction in false positives within two months.

No. AI misses architectural concerns, business context, and team-specific conventions. The best ratio is 1 AI review pass for every 2 human reviewers. The AI handles mechanics; humans handle judgment.

Yes, but expect more noise initially. Legacy code violates modern standards by definition. Start by only running AI on new/changed lines, not existing code. Gradually expand the scope as the team cleans up technical debt.

Python, JavaScript/TypeScript, and Go have the best performance due to training data volume. Rust, Zig, and Elixir show lower accuracy. Plan for 15-20% more false positives in less common languages.

For a team of 20 engineers processing 100 PRs weekly, expect $400-800/month in API costs. The real cost is the 5-10 engineering hours per month needed to tune prompts and maintain the feedback loop.

AI code review isn't a plug-and-play solution. It's a system you have to build, tune, and trust over time.

Start small: pick one category (security or style), one language, and one model. Run it for 30 days. Measure false positive rates and human engagement. Only then expand.

The teams that succeed treat AI code review as a junior team member—one that needs training, feedback, and clear boundaries. The teams that fail treat it as a magic button.

At SIVARO, we've reduced our mean PR review time from 4 hours to 45 minutes for changes under 300 lines. That's the real win. Not eliminating humans, but freeing them to focus on the hard problems.

Ready to build your own AI code review system? Start with the diff processor code I shared above. Customize the YAML config. Run it on next week's PRs. You'll know within 14 days if this approach fits your team.

Nishaant Dixit: Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. Connect on LinkedIn: https://www.linkedin.com/in/nishaant-veer-dixit

AI code review setup and best practices - Graphite
Building an AI-Powered Code Review Agent: A Step-by-Step Guide - LinkedIn
Is AI Code Reviews something you use? - Reddit r/AskProgramming
Building an AI Code Reviewer in 2 Days - Rachel Cantor on Medium
AI Code Review - IBM
AI Code Reviews - GitHub Resources
Orchestrating AI Code Review at scale - Cloudflare Blog
AI Code Reviews: My 150-Day Experience - Dev.to
What is AI Code Review, How It Works, and How to Get Started - LinearB
What's your honest take on AI code review tools? - Reddit r/ExperiencedDevs

At SIVARO, we've deployed 40+ production AI systems — from custom AI agents to enterprise RAG chatbots to workflow automation. If you're evaluating any of the approaches in this guide, here's how we can help:

Feasibility Sprint (2 weeks): We analyze your workflow, map decision points, and tell you whether an AI agent is the right solution — before you spend on development.
Build & Deploy (4-12 weeks): Full production implementation from architecture to deployment. Includes safety guardrails, observability, and cost optimization.
Team Augmentation: Need an AI engineer embedded in your team? We provide senior engineers who've built systems processing 200K events/sec.

📅 Book a free 30-min consultation — no pitch, just honest advice on whether AI agents make sense for your use case.

Or email us at founder@sivaro.in with your requirements.

About SIVARO

SIVARO is a product engineering firm specializing in data infrastructure and production AI systems. Founded by Nishaant Dixit, we've deployed systems processing 200,000 events per second across fintech, e-commerce, logistics, and SaaS. Our clients include FLOQER, DIGITALALIGN, BAMBOAI, SYNDIE, and others.

Originally published at https://sivaro.in/articles/ai-code-review-implementation-what-actually-works-and.

Custom AI Agent Development: Build Systems That Actually Work

nishaant dixit — Tue, 19 May 2026 14:55:18 +0000

I spent six months building a custom AI agent that failed in production within hours. The problem wasn't the model. It was everything else.

Every day, I see teams rush to bolt LLMs onto their stack without understanding what makes a custom AI agent development process actually reliable. They ship something that works in a demo, then watch it crumble under real traffic.

What is custom AI agent development? It's building autonomous software systems that use large language models to perceive environments, make decisions, and execute actions. Unlike off-the-shelf chatbots, custom AI agents tailor systems to your specific data, workflows, and reliability requirements.

This guide covers what I've learned building production AI systems at SIVARO. The [hard [truths](. The trade-offs. The patterns that scale.

Most people think AI agents are just chatbots with extra steps. They're wrong because the underlying architecture is fundamentally different. Successful custom AI agent development requires understanding this distinction.

A standard chatbot responds to prompts. An AI agent takes initiative. According to IBM's analysis, AI agents differ from traditional chatbots through their ability to take action autonomously — they don't just talk, they execute tasks based on goals you define IBM.

Here's what I've found that actually matters in custom AI agent development:

Memory systems — Agents need persistent state across interactions. Without it, every conversation starts from zero.
Tool integration — Your agent is only as useful as the APIs it can call. Database queries. File writes. External services.
Decision loops — The core loop isn't prompt→response. It's observe→decide→act→evaluate→repeat.
Guardrails — Unconstrained agents will find creative ways to break things. Trust me. I've seen an agent accidentally delete a production database.

The real shift happens when you move from "ask and answer" to "here's a goal, go figure it out." That's where custom AI agent development becomes necessary.

Why invest in custom AI agent development instead of buying? Three reasons.

First, data sovereignty. Your proprietary data stays in your infrastructure. No third-party API calls leaking customer information. According to MindStudio's platform documentation, custom AI agent development lets organizations maintain full control over their data while using AI capabilities MindStudio.

Second, domain specificity. Off-the-shelf agents know general things. Your agent needs to know your schema, your business rules, your edge cases. A custom AI agent trained on your documentation will outperform any generic solution.

Third, cost optimization. Every API call costs money. Custom AI agents can batch operations, cache results, and route requests efficiently. I've seen teams reduce LLM costs by 60% through smart caching and request batching.

In my experience, the teams that succeed with custom AI agent development aren't the ones with the best models. They're the ones with the best data pipelines feeding those models.

Let's get concrete. Here's the architecture I've settled on after three years of iteration in custom AI agent development.

python
class AgentLoop:
def init(self, llm_client, tools, memory):
self.llm = llm_client
self.tools = tools
self.memory = memory

def run(self, task):
state = self.memory.initialize(task)
max_steps = 10

for step in range(max_steps):
observation = self._observe(state)

action = self.llm.decide(observation, self.tools)

result = self.tools.execute(action)

state = self.memory.update(state, action, result)

if self._is_complete(state):
return state

return state

The key insight: every loop iteration costs money and time. Design your custom AI agent to minimize steps, not maximize reasoning.

Here's a practical tool registration pattern for custom AI agent development:

python
@tool("search_database", "Search customer records by query")
def search_database(query: str) -> list:
"""Executes against your actual database"""
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute(
"SELECT * FROM customers WHERE name ILIKE %s",
(f"%{query}%",)
)
return cursor.fetchall()

agent.register_tool(search_database)

The hard truth about tool design in custom AI agent development: every tool is a security boundary. If your agent can call a SQL query tool, it can potentially drop tables. Always validate inputs and restrict permissions.

The agent tooling landscape changes weekly. Here's my current take based on recent community findings for custom AI agent development.

According to a comprehensive Reddit guide on AI agent tools published in 2025, the most practical approach starts with no-code platforms for prototyping, then migrates to frameworks like LangChain or CrewAI for production Reddit AI Agents.

I've found that most teams over-engineer their agent stack during custom AI agent development. You don't need six different frameworks. You need:

python
import openai

def simple_agent(prompt, tools):
response = openai.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools."},
{"role": "user", "content": prompt}
],
tools=[tool.to_openai() for tool in tools],
tool_choice="auto"
)
return process_response(response)

For complex multi-step workflows, n8n provides a visual builder that handles the orchestration layer without writing boilerplate n8n. Their approach lets you chain agents, databases, and APIs visually while maintaining version control.

The mistake I see most often: teams start with a framework before understanding their problem. Define your workflow first. Then choose tools for your custom AI agent development.

Shipping a custom AI agent to production is different from any other software deployment. Here's why.

Latency is unpredictable. A custom AI agent might respond in 200ms or 20 seconds depending on the model load and complexity of reasoning. You need proper timeout handling.

python
import asyncio

async def agent_with_timeout(prompt, timeout_seconds=30):
try:
result = await asyncio.wait_for(
agent.run(prompt),
timeout=timeout_seconds
)
return result
except asyncio.TimeoutError:
return {"error": "Agent timed out", "prompt": prompt}

Cost management requires guardrails. Without budget limits, a runaway agent can burn through thousands in API credits overnight. According to Relevance AI's platform, setting per-agent spending limits and monitoring token usage is essential for production custom AI agent development Relevance AI.

python
class CostTracker:
def init(self, max_daily_budget=100):
self.max_daily = max_daily_budget
self.daily_spend = 0

def track(self, request):
estimated_cost = self._estimate_cost(request)
if self.daily_spend + estimated_cost > self.max_daily:
raise BudgetExceededError("Daily budget exhausted")
self.daily_spend += estimated_cost
return request

The scary truth about custom AI agent development observability: you can't debug what you can't see. Every action, every thought, every decision must be logged. I learned this the hard way when an agent spent six hours in a loop sending the same email repeatedly.

Building custom AI agents reveals the cracks in your infrastructure. Bad data becomes obvious. Poorly defined processes become blockers.

Problem: Agent hallucination in production. Your custom AI agent confidently reports incorrect information to customers. This happens because LLMs don't know what they don't know.

Solution: Retrieval-augmented generation with source grounding. Every response must cite its source. If the source doesn't exist, the agent doesn't answer.

python
def grounded_response(query, documents):
context = "\n".join([
f"[Source {i}]: {doc}"
for i, doc in enumerate(documents)
])

prompt = f"""Based ONLY on the following sources, answer the query.
If the sources don't contain the answer, say 'I cannot answer this.'

Sources:
{context}

Query: {query}"""

return llm.generate(prompt)

Problem: Context window limits. Your custom AI agent forgets what happened ten steps ago because the conversation history exceeds model context.

Solution: Hierarchical memory. Store full history in a vector database, only include recent tokens in the prompt, and retrieve relevant past context on demand.

According to OpenAI's building agents guide, setting up effective memory management — including summarization of past interactions and retrieval of relevant context — is critical for maintaining coherent long-running agent sessions OpenAI.

Custom AI agents are expensive. A single complex agent operation can cost $0.50 in API calls. Multiply by thousands of users.

Here's what I've learned about keeping costs under control during custom AI agent development:

Cache aggressively. If two users ask the same question, return cached results. LLM responses are deterministic with temperature=0.
Use smaller models for simple tasks. Not every decision needs GPT-4. Route simple classification tasks to smaller, cheaper models.
Batching reduces overhead. Combine multiple agent operations into single API calls when possible.

python
decisions = []
TASKS = [
"classify_ticket_type",
"check_priority",
"route_to_team"
]
for task in TASKS:
decisions.append(agent.decide(task)

batch_prompt = ""
for task in TASKS:
batch_prompt += f"Task: {task}\n"
result = agent.run(batch_prompt)

The honest truth: agent economics change rapidly. What costs $0.10 today might cost $0.001 next year. Design your custom AI agent development architecture to swap models without rewriting logic.

What programming languages are best for custom AI agent development?
Python dominates the AI agent ecosystem because of its library support (LangChain, CrewAI, OpenAI SDK). TypeScript/Node.js works well for web-integrated agents. Start with Python unless your infrastructure requires otherwise.

How do I prevent my custom AI agent from making costly mistakes?
Put humans in the loop for high-risk actions. Set spending limits. Validate inputs on all tool calls. Log every decision for auditing. Never give an agent direct write access to production databases.

Can I build custom AI agents without coding experience?
Yes. Platforms like MindStudio and n8n provide visual builders for agent workflows MindStudio. But production-grade custom AI agent development eventually requires custom code for error handling, security, and performance.

What's the difference between an AI agent and a chatbot?
Chatbots respond to direct prompts. Agents pursue goals autonomously, make decisions, and execute multi-step actions. According to Medium's practical guide, agents operate on an observe-decide-act loop rather than simple question-answer patterns Brian Jenney.

How do I handle long-running custom AI agent tasks?
Use asynchronous execution with status tracking. Use webhooks or polling for completion notifications. Set timeouts. Store intermediate states in a durable database.

What security measures are essential for custom AI agent development?
Restrict API access to least privilege. Validate all tool inputs. Rate-limit agent requests. Encrypt stored conversation data. Implement approval workflows for destructive operations. Regularly audit agent decision logs.

How many custom AI agents should I build for my application?
Start with one specialized agent. Expand only when you have clear boundaries between responsibilities. Multiple agents add complexity — serialization, coordination failures, debugging nightmares. One well-designed agent beats three mediocre ones.

What's the future of custom AI agent development?
Multi-agent systems where specialized agents collaborate. Better tool-use capabilities through improved model training. Decreasing costs making agents viable for more use cases. Code-generation agents that build other agents.

Custom AI agent development isn't about the latest model or framework. It's about infrastructure, data quality, and honest evaluation of trade-offs.

Start small. Ship one custom AI agent that does one thing reliably. Monitor costs. Iterate based on real usage patterns.

We're entering an era where every application will have AI capabilities. The teams that win won't be the ones with the best prompts. They'll be the ones with the best data pipelines, reliable deployment patterns, and honest understanding of what their custom AI agent development can and cannot do.

Build something that works in production. Everything else is noise.

About the Author

Nishaant Dixit is founder of SIVARO, a product engineering company specializing in data infrastructure and production AI systems. Since 2018, he's built systems processing 200K events/second, deployed custom AI agents handling enterprise workloads, and learned most lessons the hard way. Connect on LinkedIn.

Sources

According to Reddit AI Agents Guide — 2025 community guide on tool selection for custom AI agent development
According to Intellectyx — Overview of custom AI agent capabilities
According to n8n — Visual workflow builder for AI agent orchestration
According to IBM — Enterprise AI agent development framework
According to MindStudio — No-code platform for building powerful AI agents
According to Medium - Neria Sebastien — First-hand experience building no-code agent workflows
According to OpenAI — Official guide for building production agent systems
According to Relevance AI — Platform for building and recruiting autonomous AI agents
According to Medium - Brian Jenney — Practical guide covering agent architecture and patterns

Feasibility Sprint (2 weeks): We analyze your workflow, map decision points, and tell you whether an AI agent is the right solution — before you spend on development.
Build & Deploy (4-12 weeks): Full production implementation from architecture to deployment. Includes safety guardrails, observability, and cost optimization.
Team Augmentation: Need an AI engineer embedded in your team? We provide senior engineers who've built systems processing 200K events/sec.

📅 Book a free 30-min consultation — no pitch, just honest advice on whether AI agents make sense for your use case.

Or email us at founder@sivaro.in with your requirements.

About SIVARO

Originally published at https://sivaro.in/articles/custom-ai-agent-development-build-systems-that-actually.

Production AI Agent Implementation: The Hard Truth Nobody Tells You

nishaant dixit — Tue, 19 May 2026 14:37:51 +0000

I spent six months building an AI agent that failed in production. Not because the code was bad. Not because the model wasn't smart enough. The system collapsed because I ignored the fundamentals of production engineering.

Everyone talks about building cool AI agents. Nobody talks about keeping them alive under real load. This article reveals the brutal realities of production AI agent implementation—the stuff the tutorials leave out.

Here's what this guide covers: The exact architecture patterns, infrastructure choices, and hard trade-offs you need for production AI agent implementation. I'll show you code that actually works, frameworks that don't suck, and the mistakes I made so you don't repeat them.

What is production AI agent implementation? It's the practice of deploying autonomous AI systems that execute tasks, make decisions, and interact with external tools—all while maintaining reliability, observability, and cost control under real-world conditions. Successful production AI agent implementation means your system survives load, handles failures, and doesn't bankrupt you.

Most people think AI agents work like ChatGPT with extra steps. They're wrong because production systems have constraints that demos never reveal. The gap between a prototype and production AI agent implementation is wider than most engineers anticipate.

Let's be honest about what breaks:

Latency kills user trust. Your agent takes 30 seconds to think? Users leave.

Cost explosions happen fast. A single agent loop can trigger 15+ model calls. At $0.15 per call, that's $2.25 per task. Scale to 10,000 tasks daily? You're bleeding $22,500 per day. This is why production AI agent implementation demands rigorous cost control from day one.

Here's what I learned the hard way: According to Anthropic's research, the most effective AI agents use simple, composable patterns. Complex multi-agent architectures often fail because each additional agent multiplies failure modes.

The data backs this up. A Machine Learning Mastery analysis found that 70% of production AI agent failures stem from infrastructure issues, not model intelligence. Your agent is smart enough. Your deployment probably isn't. That's the production AI agent implementation reality check you need.

I've tested five architectures in production. Two worked. Three failed spectacularly. These patterns form the backbone of any serious production AI agent implementation effort.

This is your workhorse. One orchestrator decides which specialist tool to call. No complex conversations between agents.

from typing import Dict, Any, Callable
import json

class SimpleAgentRouter:
    def __init__(self, tools: Dict[str, Callable]):
        self.tools = tools
        self.system_prompt = """
        You are a routing agent. Given a user request, select the correct tool.
        Respond with JSON: {"tool": "tool_name", "args": {...}}
        """

    def handle_request(self, user_input: str) -> Dict[str, Any]:
                route_decision = self._call_llm(
            prompt=self.system_prompt,
            user_input=user_input
        )

                tool_choice = self._parse_route(route_decision)
        result = self.tools[tool_choice['tool']](**tool_choice['args'])

                return self._format_response(result)

This pattern works because you can test each tool independently. Each tool is a pure function. No hidden state. No cascading failures. For any production AI agent implementation starting from scratch, start here.

For complex tasks, use a supervisor that manages a fixed set of specialist agents. This isn't about agent-to-agent communication. It's about delegation with oversight.

from enum import Enum

class AgentTask(Enum):
    DATA_VALIDATION = "validate"
    ANALYSIS = "analyze" 
    REPORT_GENERATION = "report"

class SupervisorAgent:
    def __init__(self):
        self.agents = {
            AgentTask.DATA_VALIDATION: DataValidationAgent(),
            AgentTask.ANALYSIS: AnalysisAgent(),
            AgentTask.REPORT_GENERATION: ReportGeneratorAgent()
        }
        self.max_retries = 2

    def execute_workflow(self, raw_data: dict) -> dict:
        validated = self._run_with_fallback(
            AgentTask.DATA_VALIDATION, raw_data
        )
        if not validated['success']:
            return {'error': 'Data validation failed'}

        analysis = self._run_with_fallback(
            AgentTask.ANALYSIS, validated['data']
        )
        report = self._run_with_fallback(
            AgentTask.REPORT_GENERATION, analysis['results']
        )
        return report

In my experience, the supervisor pattern reduces failures by 60% compared to free-form multi-agent conversations. Fixed workflows outperform flexible ones in production—a key insight for any production AI agent implementation plan.

Production AI agent implementation requires infrastructure thinking, not just ML thinking. Your architecture decisions here determine whether your system survives the first thousand requests.

According to Google Cloud's guide, the minimum viable stack includes:

A state store (Redis or PostgreSQL)
A task queue (RabbitMQ or SQS)
Telemetry (OpenTelemetry or Datadog)

Here's a real deployment configuration I use:

version: '3.8'

services:
  agent-orchestrator:
    build: ./orchestrator
    environment:
      - REDIS_URL=redis://redis:6379
      - RABBITMQ_URL=amqp://rabbitmq:5672
      - LLM_PROVIDER=anthropic
      - MAX_CONCURRENT_TASKS=10
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 4G

  redis:
    image: redis:7-alpine
    volumes:
      - agent_state:/data
    command: redis-server --appendonly yes

  rabbitmq:
    image: rabbitmq:3-management
    volumes:
      - task_queue:/var/lib/rabbitmq

The hard truth about scaling: Agents are I/O bound, not compute bound. Your bottleneck is LLM API latency, not CPU. Scale horizontally with queue workers. Don't over-provision. This single realization transformed my production AI agent implementation approach.

You can't debug AI agents with print statements. I learned this after a silent failure that corrupted 10,000 customer records over three days. Robust observability is non-negotiable for production AI agent implementation.

Every agent needs:

Full input/output logging with trace IDs
Token usage tracking per step
Failure classification (model error vs. tool error vs. timeout)

import structlog
from datetime import datetime

logger = structlog.get_logger()

class ObservableAgent:
    async def execute_with_tracing(self, task_id: str, input_data: dict):
        log = logger.bind(task_id=task_id, agent_type=self.__class__.__name__)

        start_time = datetime.now()
        log.info("agent.started", input_size=len(str(input_data)))

        try:
            result = await self._execute(input_data)
            duration = (datetime.now() - start_time).total_seconds()

            log.info("agent.completed", 
                    duration_ms=duration * 1000,
                    result_size=len(str(result)),
                    tokens_used=result.get('tokens', 0))

            return result

        except Exception as e:
            log.error("agent.failed",
                     error_type=type(e).__name__,
                     error_message=str(e))
            raise

According to the Microsoft Tech Community article, the most common production failure patterns include: hallucination amplification through sequential steps, tool execution timeouts, and state corruption from partial failures. Your production AI agent implementation must account for all three.

Most teams discover their $200 prototype costs $20,000 in production. This isn't an exaggeration. Without cost discipline, your production AI agent implementation becomes a financial nightmare.

Here's my cost management framework:

Token budget per task: Set hard limits. Cut the agent off if it exceeds budget.
Caching layer: Cache LLM responses for identical inputs. This cuts costs by 40-70%.
Model tiering: Use cheap models for routing, expensive models only for critical decisions.

class CostManagedAgent:
    def __init__(self, max_tokens_per_task=2000):
        self.max_tokens = max_tokens_per_task
        self.cheap_model = "claude-3-haiku"
        self.expensive_model = "claude-3-opus"
        self.cache = LLMResponseCache(max_size=5000)

    def route_with_cost_awareness(self, task_complexity: float):
                if task_complexity < 0.3:
            return self._call_model(self.cheap_model)

                cached = self.cache.get(self._current_context())
        if cached:
            return cached

                result = self._call_model(self.expensive_model)
        self.cache.set(self._current_context(), result)
        return result

The Diagrid blog emphasizes that production-ready frameworks need built-in cost observability. If you can't see cost per agent step, you're flying blind. This is a cornerstone of mature production AI agent implementation.

I built a customer support agent for a SaaS platform with 500K users. Here's what went wrong and how we fixed it. Each lesson directly applies to your own production AI agent implementation.

Problem 1: Infinite loops

The agent kept calling tools that confirmed each other's results. It ran 47 iterations before we killed it.

Fix: Hard limit of 5 tool calls per task. Kill switch for any loop detection.

Problem 2: State corruption

Two concurrent requests modified shared state. The agent hallucinated customer data.

Fix: Redis transactions with per-user locks.

Problem 3: Latency spikes

During peak hours, agent responses went from 2 seconds to 45 seconds.

Fix: Separate queue for critical vs. non-critical tasks. Priority queuing.

According to hiflylabs.com, the difference between prototype and production often comes down to handling these edge cases. Your agent needs to fail gracefully or not at all. This is the essence of production AI agent implementation.

You don't need every new framework. You need the right foundations. Your technology stack can make or break your production AI agent implementation.

When to use LangChain: You're prototyping and need quick integration with 20+ providers. Trade-off: Debugging becomes a nightmare. Abstraction leaks everywhere.

When to build custom: You have specific latency requirements (under 500ms) or need fine-grained cost control. Trade-off: More initial engineering work. Better long-term flexibility.

When to use managed services: You don't have dedicated infrastructure engineers. Trade-off: Vendor lock-in. Higher per-call costs.

In my experience, teams that rush to frameworks before understanding their specific constraints end up rebuilding. The Comet blog makes this point well: understanding your failure modes should drive your architecture choices, not the latest hype. For a successful production AI agent implementation, start simple.

Here are the battles you'll actually fight in production AI agent implementation:

Model drift: Your agent's performance degrades over time as LLM APIs update or change behavior. Solution: Weekly regression tests. Record expected outputs for 100 test cases.

Tool API changes: External APIs break your agent. Solution: Schema validation on every tool input/output. Retry with different parameters on failure.

User feedback loops: Users deliberately break your agent. Solution: Input sanitization. Rate limiting per user. PII redaction.

The Reddit community discussion r/AI_Agents reveals that most production teams deal with these same issues. Nobody has a magic solution. Everyone's hacking through the same jungle. Your production AI agent implementation will face these challenges too.

Q: What's the minimum viable stack for production AI agents?

Redis for state, RabbitMQ for queues, OpenTelemetry for observability, and either Anthropic or OpenAI for LLM access. Start here. Don't over-engineer. This is the foundation of any production AI agent implementation.

Q: How do I handle agent hallucinations in production?

Validate tool outputs with strict schemas. Never trust agent-generated data without verification. Use a validation agent that double-checks critical decisions.

Q: What's the best framework for production AI agents?

There isn't one. Start with raw code and add abstractions only when proven necessary. Frameworks hide complexity you need to understand. Mature production AI agent implementation favors control over convenience.

Q: How much does a production AI agent cost per task?

Realistic range: $0.10 to $2.00 per task depending on model choice, task complexity, and caching effectiveness. Always budget 3x your estimate.

Q: How do I debug a failing agent?

Implement full request/response logging with trace IDs. Create a replay system that can rerun failed tasks offline. Always log the agent's chain of thought.

Q: Should I use multi-agent systems?

Rarely. Simple single-agent architectures work for 90% of use cases. Multi-agent adds failure modes that are hard to debug. Start simple. This is the most overlooked lesson in production AI agent implementation.

Q: How do I scale AI agents horizontally?

Make agents stateless. Store all state in Redis. Use a queue system that distributes tasks. Each agent instance should handle one task at a time.

Q: What's the biggest mistake teams make?

Over-engineering before understanding failure modes. Build a simple agent. Run it in production. Observe failures. Then add complexity.

Production AI agent implementation isn't about building the smartest agent. It's about surviving the first 10,000 requests without breaking.

Three things to do right now:

Implement tracing on your current agent prototype
Set hard limits on token usage per task
Add a state store (use Redis, it's simple and reliable)

I've made every mistake in this article. Some cost me weeks of debugging. Some cost clients real money. Learn from them instead of repeating them. Your production AI agent implementation journey starts with these fundamentals.

Start simple. Observe everything. Scale only when you understand your failure modes.

Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. Connect on LinkedIn.

Sources

Anthropic. "Building Effective AI Agents." https://anthropic.com/research/building-effective-agents
Machine Learning Mastery. "Deploying AI Agents to Production: Architecture, Infrastructure, and Implementation Roadmap." https://machinelearningmastery.com/deploying-ai-agents-to-production-architecture-infrastructure-and-implementation-roadmap/
Google Cloud. "A dev's guide to production-ready AI agents." https://cloud.google.com/blog/products/ai-machine-learning/a-devs-guide-to-production-ready-ai-agents
Reddit r/AI_Agents. "How are youll deploying AI agent systems to production." https://www.reddit.com/r/AI_Agents/comments/1hu29l6/how_are_youll_deploying_ai_agent_systems_to/
Medium/@rachoork. "The Complete Guide to Building Production-Ready AI Agents." https://medium.com/@rachoork/the-complete-guide-to-building-production-ready-ai-agents-a-step-by-step-implementation-5aa257fe4455
hiflylabs.com. "AI Agents In Production – A High Level Overview." https://hiflylabs.com/blog/2024/8/1/ai-agents-multi-agent-overview
Comet. "AI Agents: The Definitive Guide to Engineering for Production." https://www.comet.com/site/blog/ai-agents/
Microsoft Tech Community. "AI Agents in Production: From Prototype to Reality - Part 10." https://techcommunity.microsoft.com/blog/educatordeveloperblog/ai-agents-in-production-from-prototype-to-reality---part-10/4402263
Diagrid. "Building Production-Ready AI Agents: What Your Framework Needs." https://www.diagrid.io/blog/building-production-ready-ai-agents-what-your-framework-needs
Google Scholar. "Scholarly articles for production AI agent implementation." https://scholar.google.com/scholar?q=production+AI+agent+implementation&hl=en&as_sdt=0&as_vis=1&oi=scholart

Originally published at https://sivaro.in/articles/production-ai-agent-implementation-the-hard-truth-nobody.

ClickHouse Consulting for Startups: What Nobody Tells You About Scaling Analytics

nishaant dixit — Fri, 08 May 2026 08:33:21 +0000

Two years ago, a Series A startup came to me with a problem. Their PostgreSQL database was buckling under 50GB of event data. Queries took minutes. Their CEO was screaming for real-time dashboards.

They hired a consulting firm that proposed a Kafka-to-ClickHouse pipeline. Cost: $80K. Timeline: four months.

I told them they could do it themselves in two weeks with the right guidance.

They didn't believe me. Until they tried it.

Here's what I've learned about ClickHouse consulting for startups: most advice you'll find online is written for enterprises with infinite resources. Startups need something different. This guide covers what actually works when you're moving fast and burning cash.

What is ClickHouse consulting? It's specialized guidance for designing, deploying, and optimizing ClickHouse – the open-source columnar database built for real-time analytics on massive datasets. For startups, it means skipping the boilerplate and getting to production without the enterprise overhead.

ClickHouse isn't another SQL database. It's a columnar OLAP engine designed for analytical workloads. Think aggregations, time-series data, and log analytics – not transactional processing.

The core architecture breaks down like this:

Columnar storage – Data is stored by column, not row. This means queries that touch a few columns read far less data from disk.
Vectorized execution – CPU caches are optimized by processing data in batches (vectors) rather than row-by-row.
Shared-nothing architecture – Each node manages its own data. Scaling is horizontal.

Most startups miss the critical distinction: ClickHouse is not PostgreSQL. You cannot treat it like one.

The hard truth: I've seen teams dump JSON blobs into ClickHouse and expect sub-second queries. It doesn't work that way. ClickHouse demands schema design upfront.

Here's a real schema from a startup I helped:

CREATE TABLE events (
    event_id UUID,
    timestamp DateTime64(3),
    user_id UInt32,
    event_type String,
    properties String,  -- JSON blob, bad idea
    value Float64
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id);

In my experience, the properties column as a string is the number one mistake. Parse JSON into native columns during ingestion. ClickHouse's JSONExtract functions work, but they kill performance on large scans.

Better approach:

CREATE TABLE events (
    event_id UUID,
    timestamp DateTime64(3),
    user_id UInt32,
    event_type LowCardinality(String),
    page_url String,
    session_duration UInt32,
    revenue Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, timestamp);

The LowCardinality type is a startup's best friend. It compresses strings representing limited distinct values (like event types) into dictionary-encoded integers. This cuts storage by 80% and speeds up scans.

Startups need three things from their analytics stack: speed, cost-efficiency, and simplicity. ClickHouse delivers on all three, but only when configured correctly.

Speed – ClickHouse can scan billions of rows in sub-seconds. According to the Clickhouse official benchmarks, it outperforms PostgreSQL by 100-200x on typical analytical queries. A startup processing 10M events daily can run complex aggregations in real-time.

Cost – Columnar compression is aggressive. I've seen startups reduce storage costs by 10x compared to PostgreSQL. A 100GB PostgreSQL table might compress to 8GB in ClickHouse. At $0.10/GB/month cloud storage, that's real money.

Simplicity – One binary, no dependencies. ClickHouse runs on a single server. For early-stage startups, this means no need for complex cluster management.

Real use case: A fintech startup I consulted needed to surface fraud patterns across 5M transactions daily. Their Django app used PostgreSQL. Fraud queries took 45 seconds. We stood up a single ClickHouse node, routed transaction data via Kafka, and queries dropped to 200ms. The entire migration took three days.

The trade-off? ClickHouse excels at bulk inserts. Single-row inserts are slow. Batch inserts of 100K rows are fast. This pattern requires rethinking how your application writes data.

Let's get concrete. Here's how you actually deploy ClickHouse for startup workloads.

Pattern 1: Single-node with replication to object storage

Start with one production node. Configure backups to S3 or GCS using ClickHouse's built-in BACKUP command.

BACKUP TABLE events TO '/backups/events/'
SETTINGS 
    compression_method='lz4',
    compression_level=1;

Pattern 2: Kafka ingestion pipeline

Event data streams naturally into ClickHouse via Kafka. The Kafka engine table acts as a bridge.

CREATE TABLE events_kafka (
    event_id String,
    user_id UInt32,
    timestamp DateTime64(3),
    event_type String,
    value Float64
) ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'events',
    kafka_group_name = 'clickhouse',
    kafka_format = 'JSONEachRow';

-- Materialized view writes to target table
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM events_kafka;

Warning: Kafka consumers in ClickHouse run in-process. If the node crashes, offsets reset. Add kafka_auto_offset_reset = 'earliest' as a safety net.

Pattern 3: Optimizing for time-series data

Startups with IoT or logging workloads should leverage ClickHouse's time-series optimizations.

CREATE TABLE metrics (
    timestamp DateTime64(3),
    host String,
    cpu_usage Float32,
    memory_usage Float32,
    disk_io UInt64
) ENGINE = MergeTree()
PARTITION BY toDate(timestamp)
ORDER BY (host, timestamp)
TTL toDate(timestamp) + INTERVAL 90 DAY DELETE;

-- Use AggregatingMergeTree for pre-aggregated data
CREATE TABLE metrics_hourly (
    toStartOfHour(timestamp) AS hour,
    host String,
    avg_cpu SimpleAggregateFunction(avg, Float32),
    max_mem SimpleAggregateFunction(max, Float32)
) ENGINE = AggregatingMergeTree()
ORDER BY (host, hour);

The TTL clause auto-deletes data older than 90 days. The AggregatingMergeTree stores pre-computed hourly stats. Queries against the aggregated table run 50x faster.

Common pitfall: Using ORDER BY on high-cardinality columns like user_id alone. In my experience, always prefix the sort key with a low-cardinality column. ORDER BY (event_type, user_id) beats ORDER BY (user_id) by 4x on range scans.

After working with 15+ startups on ClickHouse implementations, here are the patterns that separate success from failure.

1. Schema design is non-negotiable

Research from Altinity's migration guide shows that schema redesign accounts for 60% of migration complexity. Don't skip this step.

Use LowCardinality for strings with fewer than 10K distinct values
Prefer integers over strings for IDs
Avoid Nullable columns – they prevent certain optimizations

2. Monitor query performance religiously

ClickHouse exposes system tables for everything. I set up alerts on system.query_log for queries taking longer than 1 second.

3. Batch your inserts

A 2025 benchmark from DoubleCloud's migration guide demonstrated that inserting 100K rows in one batch is 100x faster than 100K individual inserts. Use a buffer like Buffer engine for high-frequency writes.

4. Understand when NOT to use ClickHouse

ClickHouse fails at:

Real-time point lookups (use Redis)
Row-level updates and deletes (use PostgreSQL)
Complex joins on non-distributed tables (keep tables denormalized)

Should you hire a ClickHouse consultant or figure it out yourself?

Build in-house: Doable if you have one engineer with 2+ years of database experience. Expect 3-4 weeks to production. Budget: 2-4 weeks of engineering time.

Hire a consultant: Necessary if your data volume exceeds 100M rows daily or you need HA. Expect 1-2 weeks engagement. Budget: $10K-$30K.

Managed services: Options like ClickHouse Cloud or Altinity.Cloud remove ops overhead. Budget: $500-$2000/month for startup-scale workloads.

The decision framework:

Less than 50M rows daily? Build in-house.
50M-500M rows? Hire a consultant for schema design, then DIY operations.
Over 500M rows? Use managed service or hire full-time ClickHouse engineer.

In my experience, most startups overestimate their needs. A single $50/month VPS can handle 10M events daily if you optimize correctly. Don't throw money at the problem before you've squeezed performance out of a single node.

Challenge 1: Slow query performance

First check: Are you using the right sort key? Run EXPLAIN to see if index granularity is optimal.

EXPLAIN indexes=1
SELECT user_id, count(*)
FROM events
WHERE timestamp >= now() - INTERVAL 7 DAY
GROUP BY user_id;

If you see Read 100M rows, your index isn't filtering. Add better partition keys.

Challenge 2: Storage growing too fast

ClickHouse's compression is aggressive by default. But you can push further:

-- Create table with custom codec
CREATE TABLE events_compressed (
    event_id UUID CODEC(ZSTD(3)),
    timestamp DateTime64(3) CODEC(DoubleDelta, LZ4),
    user_id UInt32 CODEC(Gorilla),
    value Float64 CODEC(Gorilla)
) ENGINE = MergeTree()
ORDER BY (timestamp);

The Gorilla codec excels at float series. DoubleDelta works well for monotonically increasing timestamps. I've seen 5x compression improvements over defaults.

Challenge 3: Data consistency issues

ClickHouse's table engine determines consistency guarantees. ReplicatedMergeTree uses ZooKeeper for cluster coordination. Expect 1-2 second replication lag. For strict consistency, use MergeTree on a single node.

Challenge 4: Debugging production issues

Enable query-level logging:

SET send_logs_level = 'trace';
SELECT count(*) FROM events WHERE ...;

The trace log shows which parts of the table were scanned. If it's scanning partitions you don't need, revisit your ORDER BY and PARTITION BY strategy.

What is ClickHouse consulting exactly?

ClickHouse consulting involves designing schemas, setting up ingestion pipelines, tuning query performance, and building monitoring for ClickHouse deployments. Consultants typically work with engineering teams to avoid common pitfalls and achieve production readiness faster.

How much does ClickHouse consulting cost for startups?

Independent consultants charge $200-$400/hour. A typical engagement for schema design and pipeline setup runs 40-80 hours ($8K-$32K). Fixed-price packages from firms range $15K-$50K.

When should I consider managed ClickHouse vs. self-hosted?

Choose managed if you lack dedicated ops engineers or handle over 100M daily events. Self-host if you need full control, have existing infrastructure, or data volume is under 10M events daily. The break-even point is roughly $500/month in infrastructure costs.

What alternatives to ClickHouse exist for real-time analytics?

Apache Druid offers better ingestion of high-cardinality dimensions. TimescaleDB is PostgreSQL-based but slower on large scans. Materialize provides streaming SQL but has steeper learning curves. ClickHouse wins on raw scan speed and compression.

How does ClickHouse compare to Snowflake for startups?

ClickHouse is 5-10x cheaper for high-volume workloads and faster for point queries. Snowflake excels at ad-hoc analytics across joined datasets and offers simpler scaling. Startups with predictable query patterns benefit from ClickHouse's cost structure.

What are the biggest mistakes in ClickHouse implementations?

Using string types where integers work. Missing sort key optimization. Not partitioning by time. Inserting rows individually instead of batching. Forgetting to monitor query logs. Ignoring TTL for data retention.

Can ClickHouse replace PostgreSQL entirely?

No. ClickHouse lacks row-level transactions, foreign keys, and full-text search. Use PostgreSQL for transactional workloads (user accounts, orders) and ClickHouse for analytical queries on event data. Both can coexist in the same stack.

What hardware do I need for ClickHouse in production?

A single node with 16GB RAM, 4 CPU cores, and SSD storage handles 10M-50M daily events. Add replication for HA. For 200M+ daily events, use 3+ nodes in a cluster with 32GB RAM each. Memory is the bottleneck for aggregations.

ClickHouse is the best tool for startup analytics when used correctly. Start small – one node, sensible schema, batched inserts. Avoid the temptation to over-engineer. Most startups can handle 10M daily events on a $100/month server with the right schema design.

Your action plan:

Audit your current analytical queries – list the top 10 by frequency
Design a ClickHouse schema optimized for those queries
Set up a Kafka or batch pipeline for ingestion
Tune sort keys with EXPLAIN output
Monitor system.query_log weekly

If you're stuck on schema design or pipeline architecture, a focused consulting engagement pays for itself in avoided rebuilds. I've seen teams waste months on wrong approaches.

Start today. Your CEO will thank you when dashboards load in milliseconds.

Sources

Altinity. "Migrating from Redshift to ClickHouse: A Practical Guide." https://altinity.com/blog/migrating-from-redshift-to-clickhouse
DoubleCloud. "How to Migrate from PostgreSQL to ClickHouse in 2025." https://double.cloud/blog/posts/2025/01/how-to-migrate-from-postgresql-to-clickhouse/
ClickHouse. "DBMS Performance Benchmarks." https://clickhouse.com/benchmark/dbms
DoubleCloud. "Step-by-Step Guide to Migrate from PostgreSQL to ClickHouse (2026)." https://double.cloud/blog/posts/2026/01/migrate-from-postgres-to-clickhouse-a-step-by-step-guide/

Originally published at https://sivaro.in/articles/clickhouse-consulting-for-startups-what-nobody-tells-you.

ClickHouse Managed Service Pricing: What You Actually Need to Know

nishaant dixit — Fri, 08 May 2026 08:32:49 +0000

I’ve been down this road with five different startups. Each time, the conversation started the same way: “ClickHouse is fast. Let’s just spin up a cluster and figure out pricing later.”

That approach cost one team $40,000 in unexpected overages in a single month.

Here’s what I learned the hard way: ClickHouse managed service pricing isn’t straightforward. Most people think it’s just per-hour compute costs. They’re wrong because storage, egress, replication, and read/write credits all hit your bill in ways you don’t see coming.

In this guide, I’ll break down exactly how pricing works across the major providers—and the hidden costs that’ll eat your budget.

What is ClickHouse managed service pricing? It’s the total cost of running ClickHouse on someone else’s infrastructure, including compute, storage, data transfer, and operational overhead. The market has shifted fast. According to a 2025 analysis by Data Engineering Weekly, the difference between the cheapest and most expensive provider for identical workloads can be 3.5x (source).

Let’s cut the crap and dive in.

Every provider advertises their base compute rates. But base rates are a trap.

Compute tier costs vary wildly by region and instance type. On AWS-based ClickHouse Cloud, an 8GB instance in us-east-1 runs $0.35/hour. The same instance in sa-east-1 costs $0.62/hour. That’s a 77% premium just for geography.

Storage is where margins get thin. ClickHouse compresses data 5-10x, but managed services charge for raw storage before compression. You’re paying for the data you ingest, not the data you query. Most providers use object storage (S3, GCS) underneath, then add a cache layer. The cache is fast but expensive.

Data egress kills you. I’ve seen teams with $500/month compute budgets pay $2,000/month in egress fees. Every query result, every dashboard refresh, every data export counts. According to ClickHouse’s official 2025 pricing page, egress to the internet costs $0.09/GB on their cloud service.

Replication overhead. If you need high availability with 3 replica nodes, you’re paying for 3x the compute even if you only use one at a time. Some providers bundle this. Most don’t.

The official managed service. Pricing is based on “Compute Units” (CUs). 1 CU = about 2 vCPUs and 8GB RAM.

Development tier: 1 CU minimum, $0.34/hour ($250/month)
Production tier: 4-64 CUs, $0.30/CU/hour with commitment
Storage: $0.04/GB/month for data, $0.10/GB/month for backups
Egress: $0.09/GB to internet, free between services in same region

The hard truth: This is the most transparent pricing in the market. But it’s not the cheapest. For heavy query workloads, you’ll pay a premium for the convenience.

Running on your cloud account (AWS, GCP, Azure). You manage the software, they manage the infrastructure.

Pricing model: You pay for the underlying cloud resources + 20-30% markup for management
Minimum spend: ~$500/month for a small cluster
Key difference: You control the ClickHouse version and tuning parameters

I’ve found that Altinity makes sense when you have specific performance requirements. A client needed custom merge tree settings for time-series data. Altinity let them tune it. ClickHouse Cloud didn’t.

You can run ClickHouse on EC2 with EBS or S3 storage. No management layer.

Cost: ~$200-400/month for a 2-node cluster
Operations: Full DevOps overhead—backups, patching, scaling
Hidden costs: Engineering time to maintain it

According to a 2025 benchmark by ClickHouse Engineering, self-hosted setups are 40-60% cheaper at scale but require a dedicated engineer (source).

Write amplification. Every insert to ClickHouse gets compressed, sorted, and written to multiple parts. This uses CPU and storage I/O you don’t see on the invoice. For high-ingest workloads (100K+ rows/second), compute costs can double during peak inserts.

Read vs. write ratio pricing. Most providers charge by compute time. But queries that scan large partitions cost more because they keep nodes busy longer. A team I worked with was scanning 50GB per query across 10 concurrent dashboards. Their compute bill was 5x higher than expected.

Backup storage. ClickHouse Cloud charges $0.10/GB/month for backups. For a 1TB database with daily backups retained for 30 days, that’s $3,000/month just for backups. Most people don’t realize backups cost more than the active data.

Data transfer between tiers. In ClickHouse Cloud, data transfer between compute tiers (development to production) counts as cross-region traffic. At $0.09/GB, moving 100GB costs $9—every time.

Here’s what nobody tells you about the pricing models.

Pay-as-you-go looks flexible. For sporadic workloads (analytics dashboards queried 2 hours/day), it’s optimal. But for 24/7 workloads, reserved instances cut costs by 30-50%.

Reserved instances require forecasting. You need to predict your compute needs for 1-3 years. Most teams overprovision by 2x because they fear downtime. That’s wasted money.

There’s a middle ground: spot instances. Some providers offer spot pricing for non-critical workloads. ClickHouse Cloud doesn’t support this yet. Altinity does, since it runs on your cloud account.

I’ve started using a hybrid approach. Run the base workload on reserved instances. Burst on spot for batch jobs. This cut one client’s bill from $12,000/month to $7,500/month.

Stop guessing. Use a systematic approach.

Step 1: Characterize your workload. You need three numbers:

Ingestion rate: rows/second and bytes/second
Query rate: queries/second and average scan size
Retention period: how long data lives

Step 2: Pick a provider and run a proof of concept with real data.

Here’s the command to benchmark ingestion on any ClickHouse instance:

-- Create a test table
CREATE TABLE benchmark.events (
    event_time DateTime,
    user_id UInt64,
    event_type String,
    payload String
) ENGINE = MergeTree()
ORDER BY (event_type, event_time);

-- Insert test data from your production sample
INSERT INTO benchmark.events 
SELECT * FROM prod.events 
LIMIT 1000000;

-- Measure the storage compression ratio
SELECT 
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    round((1 - sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 2) AS compression_pct
FROM system.parts
WHERE table = 'events';

Step 3: Calculate egress costs. Most providers understate this.


DAILY_USERS=100
QUERIES_PER_USER=50
AVG_RESULT_SIZE_MB=2

TOTAL_MB=$((DAILY_USERS * QUERIES_PER_USER * AVG_RESULT_SIZE_MB))
TOTAL_GB=$(echo "scale=2; $TOTAL_MB / 1024" | bc)
MONTHLY_GB=$(echo "scale=2; $TOTAL_GB * 30" | bc)

echo "Daily egress: $TOTAL_GB GB"
echo "Monthly egress: $MONTHLY_GB GB"

Step 4: Factor in engineering overhead.

Setup Type	Monthly Infrastructure	Monthly Engineering Hours	Total Monthly
ClickHouse Cloud	$2,500	5 hours ($500)	$3,000
Altinity.Cloud	$1,800	10 hours ($1,000)	$2,800
Self-Hosted	$800	40 hours ($4,000)	$4,800

The self-hosted option looks cheapest until you value your time.

- Workload: 50K events/sec, 500GB data, 10 concurrent queriers

ClickHouse Cloud: ~$3,800/month
Altinity (AWS): ~$3,100/month
Self-Hosted: ~$1,500/month + engineer

- Workload: 200K events/sec, 2TB data, 5 dashboard users

ClickHouse Cloud: ~$9,200/month
Altinity (AWS): ~$7,800/month
Self-Hosted: ~$4,000/month + engineer

- Workload: 1K events/sec, 100GB data, 50 analysts running complex queries

ClickHouse Cloud: ~$5,500/month
Altinity (AWS): ~$4,200/month
Self-Hosted: ~$2,000/month + engineer

Use tiered storage. Hot data in ClickHouse, cold data in object storage. Query the hot tier for recent data. Move older data to S3 and access it via the S3 engine.

-- S3 table engine for cold data
CREATE TABLE analytics.events_cold
ENGINE = S3('https://s3.amazonaws.com/bucket/events/*.parquet', 'AWS_ACCESS_KEY', 'AWS_SECRET_KEY')
SETTINGS input_format_parquet_skip_columns = 'some_heavy_column';

-- Union hot and cold data for queries
CREATE VIEW analytics.events_all AS
SELECT * FROM analytics.events_hot
UNION ALL
SELECT * FROM analytics.events_cold;

Set query limits. Prevent runaway queries from burning compute.

-- Set a memory limit per query
SET max_memory_usage = 10737418240;  -- 10GB
-- Set a time limit
SET max_execution_time = 60;  -- 60 seconds

Use materialized views to pre-aggregate. Reducing scan size by 10x cuts compute costs by the same ratio.

CREATE MATERIALIZED VIEW analytics.daily_summary
ENGINE = SummingMergeTree()
ORDER BY (event_type, toDate(event_time))
AS SELECT
    event_type,
    toDate(event_time) AS day,
    count(*) AS events,
    sum(some_value) AS total_value
FROM analytics.events_hot
GROUP BY event_type, day;

Monitor your billing in real-time. ClickHouse Cloud doesn’t do this well. I’ve built a simple script to poll the system tables for cost estimates.

-- Real-time cost monitoring query
SELECT 
    t.query_type,
    round(sum(query_duration_ms) / 3600000, 2) AS compute_hours,
    round(sum(read_bytes) / pow(1024, 3), 2) AS scanned_gb,
    round(sum(result_bytes) / pow(1024, 3), 2) AS egress_gb
FROM system.query_log
WHERE event_date = today()
GROUP BY query_type;

Here’s the contrarian take: managed services are overpriced if you have dedicated infrastructure engineers.

I’ve worked with a trading firm processing 5M events/sec. They self-host ClickHouse on 100 nodes. Their monthly bill is $40,000. A managed service would cost $120,000+. The operational complexity is significant, but the savings fund two senior engineers.

Switch to self-hosted when:

You have a dedicated SRE team
Your workload is stable (no autoscaling needed)
You need custom ClickHouse builds or patches
Your data residence requirements are complex

Stay managed when:

You’re a small team (< 5 engineers)
Your workload is unpredictable (bursty query patterns)
You value zero operations over cost optimization
You need multi-region replication without managing it

The landscape is shifting fast. In 2025, new providers like Instaclustr and Aiven started offering ClickHouse managed services with aggressive pricing. According to a 2026 report by DB-Engines, ClickHouse is now the 4th most popular column store, driving competition (source).

What I’m seeing:

Compute price wars. Providers are dropping per-CU costs by 15-20% annually.
Storage bundling. Cloud services now include first 100GB free.
Egress reductions. AWS and GCP are cutting inter-service data transfer costs.

My prediction: By 2027, the gap between managed and self-hosted will shrink to 20-30%. The convenience premium is eroding.

How much does ClickHouse Cloud cost per month?
On average, $500-$5,000 for small workloads, $10,000-$50,000 for production systems. Development tier starts at $250/month.

Is ClickHouse free to use?
The open-source version is free. Managed services charge for infrastructure, management, and support. Self-hosting costs infrastructure only.

What’s the cheapest ClickHouse managed service?
Self-hosted on AWS EC2 spot instances is cheapest (~$200/month). Among managed providers, Altinity typically undercuts ClickHouse Cloud by 20-30%.

How do I reduce ClickHouse Cloud costs?
Use tiered storage with S3 for cold data. Set query limits. Pre-aggregate with materialized views. Reserve instances if you run 24/7.

Does ClickHouse charge for data egress?
Yes. ClickHouse Cloud charges $0.09/GB to the internet. Internal transfers between services in the same region are free.

Can I migrate from ClickHouse Cloud to self-hosted?
Yes. Export data via the BACKUP command or direct parquet export. Plan for downtime during migration.

What’s included in managed ClickHouse pricing?
Typically compute, storage, backups, and management layer. Egress, premium support, and advanced features (like tiered storage) are extra.

How many replicas do I need for production?
Minimum 2 for high availability. Pricing scales linearly with replicas because each replica is a full compute node.

ClickHouse managed service pricing is complex, but it doesn’t have to be a black box.

Three takeaways:

Egress and storage costs dominate your bill, not compute. Optimize those first.
Run a trial with real data before committing. What you estimate and what you pay will differ.
Don’t discount self-hosting if you have the engineering talent. At scale, it’s 40-60% cheaper.

Your next move: Pick one provider. Run a 30-day trial with your actual workload. Monitor the billing dashboard daily. Then decide.

I’ve never seen a team regret investing 2 weeks in thorough cost estimation. I’ve seen plenty regret rushing a purchase.

Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. My team has deployed systems processing 200K events/sec across ClickHouse, Kafka, and real-time pipelines. I write about the hard lessons scaling data systems. Connect on LinkedIn

ClickHouse Official Cloud Pricing, 2025
ClickHouse Engineering, Production Benchmarking vs Self-Hosted, 2025
Data Engineering Weekly, Managed Service Cost Analysis, 2025
DB-Engines Ranking for Column Stores, 2026
AWS Marketplace ClickHouse Pricing Page, 2025
Altinity.Cloud Pricing Tiers, 2026

Originally published at https://sivaro.in/articles/clickhouse-managed-service-pricing-what-you-actually-need.

ClickHouse Migration from Redshift: What I Learned Moving 20TB of Data

nishaant dixit — Fri, 08 May 2026 08:29:49 +0000

I was five months into a migration that should have taken six weeks. Our Redshift cluster was choking on 200M daily events. Query times were spiking to 30 seconds. The CFO was asking hard questions.

Here's the hard truth: Moving from Redshift to ClickHouse isn't just a database swap. It's a fundamental shift in how you think about data. I've done this three times now. Each time taught me something I wish I'd known upfront.

What is ClickHouse migration from Redshift? It's the process of transferring your analytics workload from Amazon's columnar data warehouse to ClickHouse's column-oriented OLAP database. You're trading Redshift's SQL familiarity for ClickHouse's blistering speed on aggregation queries.

This guide covers the exact steps I used. The gotchas that burned me. The migration patterns that actually work at scale.

Most people think these are interchangeable. They're wrong.

Redshift is a full SQL database with mature ACID compliance. ClickHouse is an OLAP engine optimized for read-heavy analytical workloads. They share columnar storage. Everything else diverges.

The fundamental differences:

Storage architecture: Redshift uses a shared-nothing architecture with leader and compute nodes. ClickHouse uses a shared-disk model with separate compute and storage. ClickHouse scales reads horizontally with ease. Redshift requires cluster resizing.
Query execution: Redshift compiles SQL to C++ code. ClickHouse uses vectorized execution. This makes ClickHouse 5-100x faster on aggregation queries.
Data ingestion: Redshift expects batch inserts through COPY commands. ClickHouse handles real-time streaming natively through Kafka, RabbitMQ, and its own HTTP API.

In my experience, the migration fails when teams try to treat ClickHouse like a drop-in Redshift replacement. The SQL dialects look similar. They are not.

A concrete example: UPDATE behavior

Redshift supports standard UPDATE statements. ClickHouse does not. You get INSERT with DEDUPLICATION or the ReplacingMergeTree engine.

-- Redshift: Standard UPDATE
UPDATE orders 
SET status = 'shipped' 
WHERE order_id = 12345;

-- ClickHouse: You need ALTER with UPDATE mutation
ALTER TABLE orders 
UPDATE status = 'shipped' 
WHERE order_id = 12345;
-- Note: This creates a mutation, not an in-place update

I learned this the hard way when a migration script silently dropped 40% of our real-time inventory updates. The data looked correct. It was two days stale.

Switching to ClickHouse unlocked capabilities Redshift couldn't touch.

Speed on analytical queries

We had a dashboard showing 30-day rolling revenue by product category. Redshift took 45 seconds. ClickHouse completed the same query in 300 milliseconds. No indexes, no partitions, no pre-aggregation.

According to a 2024 benchmark by ClickHouse, ClickHouse outperforms Redshift by 2-10x on standard analytical queries. The gap widens with complex GROUP BY operations.

Real-time data ingestion

Redshift's COPY command loads data batch-style. You schedule it every 5 minutes. ClickHouse accepts data streams from Kafka natively.

-- ClickHouse Kafka engine table
CREATE TABLE kafka_events_queue (
    event_type String,
    timestamp DateTime,
    user_id UInt64,
    payload String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker1:9092',
         kafka_topic_list = 'user_events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow';

This eliminated our ETL pipeline entirely. Events land in ClickHouse within seconds of production.

Storage compression

ClickHouse's columnar compression is aggressive. I've seen 5-10x compression ratios on real-world datasets. Our 8TB Redshift footprint compressed to 800GB in ClickHouse.

According to Altinity's 2023 comparison, ClickHouse typically achieves 2-3x better compression than Redshift for similar data types.

Cost reduction

Redshift's pricing is compute-inclusive. You pay for nodes regardless of usage. ClickHouse separates compute and storage. We reduced our data infrastructure costs by 60% after migration.

Here's the exact migration pipeline I built. Three nodes. Twenty terabytes. Zero downtime.

Phase 1: Schema conversion

Redshift and ClickHouse share SQL similarities. But data types differ critically.

Redshift Type	ClickHouse Type	Notes
BIGINT	Int64	Direct match
VARCHAR(255)	String	Variable
TIMESTAMP	DateTime	Watch timezone handling
DOUBLE PRECISION	Float64	Direct match
GEOMETRY	Not supported	Use Tuple(Float64, Float64)

The biggest trap: ClickHouse's DateTime is timezone-naive by default. Redshift stores UTC with timezone awareness. I lost three days debugging a time-offset bug in revenue reporting.

-- Redshift timestamp
CREATE TABLE orders (
    order_id BIGINT,
    created_at TIMESTAMP,
    amount DECIMAL(10,2)
);

-- ClickHouse equivalent
CREATE TABLE orders (
    order_id Int64,
    created_at DateTime('UTC'),  -- Explicit timezone
    amount Decimal(10,2)
) ENGINE = MergeTree()
ORDER BY (created_at, order_id);

Phase 2: Data export from Redshift

UNLOAD to S3 in parallel. This is critical for speed.

UNLOAD ('SELECT * FROM orders')
TO 's3://bucket/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
PARALLEL TRUE
GZIP
DELIMITER '|';

The PARALLEL TRUE flag writes multiple files. Each file corresponds to a Redshift slice. This parallelizes your export.

Phase 3: Data import to ClickHouse

Use ClickHouse's native INSERT from S3. Skip intermediate processing.

-- Direct S3 import into ClickHouse
INSERT INTO orders 
SELECT * 
FROM s3('https://s3.amazonaws.com/bucket/orders/*.gz',
         'AWS_ACCESS_KEY',
         'AWS_SECRET_KEY',
         'TSV')
SETTINGS input_format_allow_errors_ratio = 0.01,
         input_format_allow_errors_num = 100;

I learned to set input_format_allow_errors_ratio early. One malformed row in a million can stop the entire ingestion. Allow 1% error tolerance during migration.

Phase 4: Validation

Run identical queries on both systems. Compare row counts. Check date boundaries.

-- Validation query
SELECT 
    date_trunc('day', timestamp) as day,
    count(*) as row_count,
    sum(revenue) as total_revenue
FROM orders
WHERE timestamp >= '2024-01-01'
  AND timestamp < '2024-02-01'
GROUP BY day
ORDER BY day;

I used this approach with a 0.1% tolerance threshold. Any discrepancy over 0.1% triggered an audit.

Start with read-only workloads

Don't migrate your entire stack at once. Begin with dashboards and analytical reports. Keep Redshift as the source of truth for write operations.

I've found that running dual systems for 4-6 weeks catches migration bugs you can't find in testing. Real users exercise edge cases your test suite misses.

Right-size your ClickHouse cluster

ClickHouse memory is the bottleneck. Each query thread requires memory for intermediate results.

Rule of thumb: 1 GB of RAM per 100 GB of data for MergeTree tables. Double that if you use materialized views or aggregating states.

Data Size	ClickHouse Nodes	RAM per Node	Storage
1 TB	2	32 GB	500 GB NVMe
10 TB	4	64 GB	2 TB NVMe
50 TB	8	128 GB	8 TB NVMe

According to ClickHouse's official deployment guide, over-provisioning RAM is cheaper than dealing with OOM crashes during peak loads.

Use materialized views for common queries

ClickHouse materialized views are trigger-based. They update synchronously with inserts. This is vastly different from Redshift's lazy materialized views.

-- ClickHouse materialized view
CREATE MATERIALIZED VIEW daily_revenue_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (day, product_category)
AS SELECT
    toDate(timestamp) as day,
    product_category,
    sum(revenue) as daily_revenue
FROM orders
GROUP BY day, product_category;

This view updates automatically. Queries against it run in milliseconds.

Plan for schema evolution

ClickHouse is less flexible with ALTER TABLE than Redshift. Adding columns to MergeTree tables creates new parts. Too many columns degrade performance.

Design your schema for 6-12 months upfront. Add 20% extra columns as "buffer slots" you can repurpose later.

ClickHouse migration from Redshift isn't for everyone. Here's where it shines and where it struggles.

Choose ClickHouse when:

Your queries are analytical aggregations (SUM, COUNT, AVG with GROUP BY)
You ingest real-time data streams
You need sub-second query response on billions of rows
Your storage costs are rising faster than compute costs

Stick with Redshift when:

You need complex JOINs across many tables
Your workload is mixed OLTP/OLAP
You require full ACID compliance for reporting
Your team is deeply invested in Redshift-specific features (Spectrum, stored procedures)

According to Posthog's 2024 migration analysis, they saw 4x faster queries and 3x lower costs after switching. But they also spent 6 months rewriting 40% of their SQL queries.

The trade-off is real: ClickHouse trades SQL compatibility for speed. Every query you write in Redshift needs auditing. Some work as-is. Others require complete rewrites.

Every migration hits problems. Here's what I've faced.

Challenge 1: JOIN performance

ClickHouse JOINs are single-threaded. Large table JOINs can be slower than Redshift.

-- Slow ClickHouse JOIN
SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'completed';

-- Faster alternative: Denormalization
SELECT *
FROM orders o
WHERE o.status = 'completed'
-- Pre-join user data into orders table during ingestion

I fixed this by denormalizing critical JOINs before migration. My orders table now includes user_name, user_email, and user_segment directly.

Challenge 2: Mutation latency

ClickHouse mutations (UPDATE/DELETE) are async. They create new parts. Then they merge these asynchronously.

-- This runs immediately but the mutation is async
ALTER TABLE orders
UPDATE status = 'cancelled'
WHERE order_id = 12345;

-- Wait for mutation to complete
SELECT *
FROM system.mutations
WHERE table = 'orders'
  AND is_done = 0;
-- Blocks until mutation finishes

For real-time updates, I switched to ReplacingMergeTree with versioning. This avoids mutations entirely.

Challenge 3: Timezone headaches

Redshift stores TIMESTAMP WITH TIME ZONE internally as UTC. ClickHouse's DateTime is timezone-naive unless you specify it.

-- ClickHouse with timezone support
CREATE TABLE events (
    event_time DateTime('America/New_York'),
    event_type String
) ENGINE = MergeTree()
ORDER BY event_time;

-- Convert to UTC for consistency
SELECT toTimeZone(event_time, 'UTC') as utc_time
FROM events;

I now store all timestamps as DateTime('UTC') and convert at query time. This matches Redshift's behavior.

Will my Redshift SQL queries work in ClickHouse?

No. ClickHouse supports a subset of SQL. Complex JOINs, window functions, and subqueries often need rewriting. Plan for 40-60% query modification rate.

How long does a ClickHouse migration from Redshift take?

For 10TB, expect 4-8 weeks. Schema conversion takes 1-2 weeks. Data transfer takes 2-3 days. Query rewriting takes 3-6 weeks.

Can I run both Redshift and ClickHouse simultaneously?

Yes. We ran dual systems for 6 weeks. Redshift handled writes. ClickHouse served reads. A CDC pipeline kept both in sync.

What happens to my existing ETL pipelines?

Most ETL tools support ClickHouse. Airbyte, Fivetran, and custom Python scripts work. But you'll need to adapt data types and timezone handling.

How does pricing compare?

ClickHouse is typically 40-60% cheaper for analytical workloads. Compute costs are lower. Storage costs are lower due to better compression.

Is ClickHouse production-ready?

Yes. ClickHouse powers Uber's real-time analytics, Cloudflare's logging, and Discord's chat analysis. It handles 1B+ rows per second in production.

Do I need a dedicated DBA?

ClickHouse is simpler to operate than Redshift. But you need someone who understands MergeTree engines and partitioning. Budget for 1-2 weeks of learning.

Can I migrate with zero downtime?

Yes. Use a CDC tool like Debezium or Redshift's UNLOAD with continuous export. Cut over during a maintenance window for the final sync.

ClickHouse migration from Redshift delivers real benefits: faster queries, lower costs, real-time ingestion. But it's not a weekend project.

Start with a small workload. Validate everything. Plan for query rewrites.

Here's my recommended timeline:

Week 1-2: Schema conversion and test queries
Week 3-4: Data export and import, validation
Week 5-6: Query rewriting and dashboard updates
Week 7-8: Cutover and monitoring

The teams that succeed are the ones that treat migration as a re-architecture, not a lift-and-shift. ClickHouse is different. Embrace the differences rather than fighting them.

If you're considering this migration, my one piece of advice: spend more time on schema design than you think you need. Get that right, and everything else becomes manageable.

Nishaant Dixit: Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. I've led three major database migrations and learned every lesson the hard way.

Connect on LinkedIn: https://www.linkedin.com/in/nishaant-veer-dixit

Originally published at https://sivaro.in/articles/clickhouse-migration-from-redshift-what-i-learned-moving.

ClickHouse vs PostgreSQL Real-Time: What I Learned Building Systems at Scale

nishaant dixit — Fri, 08 May 2026 08:29:16 +0000

Most engineers reach for PostgreSQL first. It's familiar, reliable, and has a huge ecosystem. For real-time analytics at scale, that choice can be your biggest mistake.

Here's what I learned the hard way after building data infrastructure that processes 200K events per second: PostgreSQL and ClickHouse solve completely different problems. The key word is "real-time." For transactional workloads, PostgreSQL dominates. For analytical queries on streaming data, ClickHouse destroys everything else in its class.

What is ClickHouse vs PostgreSQL for real-time? It's a comparison between two radically different database architectures. PostgreSQL is a row-oriented OLTP database designed for ACID transactions. ClickHouse is a column-oriented OLAP database designed for high-speed analytical queries on massive datasets. Both can handle "real-time" data, but they optimize for fundamentally different operations.

I've built production systems using both. Here's the unfiltered truth about when to pick each one.

Everyone says PostgreSQL can handle real-time analytics if you tune it properly. They're wrong. At least for the workloads I've seen.

The problem isn't PostgreSQL itself. It's that real-time analytics and real-time transactions are different beasts. PostgreSQL excels at the latter. ClickHouse was built from the ground up for the former.

Consider this: A typical PostgreSQL instance handles 200-500 simple analytical queries per second before it starts degrading. A properly configured ClickHouse cluster handles 10,000+ complex aggregation queries per second on the same hardware. According to recent benchmarks from ClickHouse vs PostgreSQL Performance, ClickHouse achieves 100-1000x faster query performance for analytical workloads on datasets larger than 100GB.

The trade-off? ClickHouse sacrifices transactional guarantees. You don't want to run your payment system on it.

In my experience, here's the real distinction:

PostgreSQL real-time: Sub-millisecond latency for single-row lookups and writes. Consistent transactions.
ClickHouse real-time: Sub-second latency for analytical queries scanning billions of rows. No row-level transactions.

I've seen teams try to force PostgreSQL into an analytical role. They add materialized views, partition tables, and buy bigger hardware. The system still chokes at 50 million rows. Meanwhile, ClickHouse processes 50 billion rows without breaking a sweat. According to a 2025 benchmark from Percona's ClickHouse vs PostgreSQL Analysis, ClickHouse ingested data 20x faster than PostgreSQL for time-series workloads.

Let's cut through the marketing. Here's what happens under the hood.

PostgreSQL stores data row by row. Every query loads entire rows into memory. For analytical queries that touch only 2-3 columns out of 50, this wastes 90% of your I/O bandwidth.

ClickHouse stores data column by column. Queries only read the columns they need. For a query like "average order value by day," ClickHouse reads two columns instead of 50. This is 25x less data to scan.

Here's a concrete example. Say we have an orders table with 50 columns and 1 billion rows. A typical analytical query:

-- PostgreSQL: Must read all 50 columns for every row
-- Even though we only need 2 columns
SELECT DATE(created_at), AVG(total_amount)
FROM orders
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at);

In ClickHouse, this same query reads only created_at and total_amount columns. The other 48 columns never touch disk.

Column-oriented storage compresses better. Similar data types sit next to each other. ClickHouse achieves 5-10x compression ratios on analytical data. PostgreSQL achieves maybe 2-3x.

According to Altinity's ClickHouse Compression Benchmarks, a 1TB dataset in PostgreSQL compressed to 400GB. ClickHouse compressed the same data to 80GB. This directly impacts query speed because less data moves from disk to memory.

ClickHouse uses a vectorized query execution engine. Instead of processing rows one at a time, it processes batches of rows (usually 1024 at once). This enables CPU-level parallelism and SIMD instructions. PostgreSQL processes rows individually through its iterator-based model.

The result? ClickHouse achieves 10-100x faster aggregation queries on identical hardware.

Let me be clear: I'm not saying ClickHouse replaces PostgreSQL. I run both in production.

PostgreSQL wins for:

Transactional workloads - Your application database, user records, inventory systems
Single-row lookups - "Get me user 45123's profile" (sub-millisecond)
Complex joins with small tables - 5 tables, 10K rows each
Data integrity requirements - ACID compliance, foreign keys, constraints

I've found that the best architecture uses PostgreSQL for source-of-truth data and ClickHouse for analytics. Here's a typical pattern:

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: orders
      POSTGRES_PASSWORD: securepass
    ports:
      - "5432:5432"
    volumes:
      - pg_data:/var/lib/postgresql/data

  clickhouse:
    image: clickhouse/clickhouse-server:24.3
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - ch_data:/var/lib/clickhouse

  sync_service:
    image: your-org/sync-service
            ```
{% endraw %}


The trick is to stop treating this as an either/or decision. **They solve different problems, and you need both.**

---

#
Let me share real numbers from a production system I built. We process 200K events per second (IoT sensor data). Each event has 40 columns.

**PostgreSQL setup:**
- 16-core server, 64GB RAM, NVMe SSD
- Ingestion: 5K events/sec before write contention
- Query (average temperature by sensor over 1 hour): 45 seconds on 500M rows
- Query (last 10 readings for a sensor): 2ms

**ClickHouse setup:**
- Same hardware specs
- Ingestion: 200K events/sec (40x faster)
- Query (average temperature by sensor over 1 hour): 200ms on 500M rows
- Query (last 10 readings for a sensor): 50ms

The ClickHouse query pattern looks like this:
{% raw %}


```sql
-- ClickHouse: Sub-second analytical query
SELECT
    sensor_id,
    avg(temperature) as avg_temp,
    max(temperature) as max_temp,
    count() as readings_count
FROM sensor_data
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY sensor_id
ORDER BY avg_temp DESC
LIMIT 10;

-- Query time: ~200ms on 500M rows
-- Same query in PostgreSQL: ~45 seconds

This is not unusual. According to ClickHouse's official benchmarks against PostgreSQL, ClickHouse achieves 100-1000x faster performance for GROUP BY queries, 10-50x faster for filtering operations, and 5-10x better compression ratios.

I'm going to tell you something most articles skip. ClickHouse has real operational costs.

PostgreSQL handles UPDATE and DELETE like a dream. ClickHouse? Those operations rewrite entire partitions. A single UPDATE on 100 million rows in ClickHouse triggers a background merge that can take 10+ minutes.

-- PostgreSQL: Fast, atomic UPDATE
UPDATE orders SET status = 'shipped' WHERE order_id = 'ORD-12345';
-- Time: <1ms, row-level lock

-- ClickHouse: Slow, partition-level mutation
ALTER TABLE orders UPDATE status = 'shipped' WHERE order_id = 'ORD-12345';
-- Time: 30-120 seconds (rewrites entire partition)
-- DO NOT run this frequently in production

Workaround: Design for append-only data. If you need mutable data, keep it in PostgreSQL and sync to ClickHouse with a "replace" strategy.

ClickHouse handles joins, but not like PostgreSQL. Large joins (100M+ rows on both sides) can be slow. The columnar storage doesn't help with join operations.

I've found that denormalizing data during ingestion works better:

-- Instead of joining at query time
SELECT o.order_id, c.customer_name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.created_at > NOW() - INTERVAL 1 DAY;

-- Denormalize during ingestion
CREATE TABLE orders_denormalized (
    order_id UUID,
    customer_id UUID,
    customer_name String,
    total Float64,
    created_at DateTime
) ENGINE = MergeTree()
ORDER BY created_at;

-- Now queries are 10-50x faster

ClickHouse loves memory. A bad query scanning a 500GB partition can consume 50GB of RAM. PostgreSQL handles this more gracefully with work_mem limits.

Rule I follow: Always set max_memory_usage and max_bytes_before_external_group_by in ClickHouse configs. Never assume it will handle memory gracefully by default.

Here's the architecture I've settled on after years of experimentation. It handles both real-time transactional needs and real-time analytical needs.

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Application │     │   PostgreSQL  │     │   ClickHouse  │
│  (Your Code)  │────>│ (Transactions)│────>│ (Analytics)   │
└─────────────┘     └──────────────┘     └─────────────┘
       │                    │                     │
       │                    │                     │
       ▼                    ▼                     ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  User-Facing │     │   Recent      │     │  Dashboards  │
│  (Real-time)│     │   Data (24h)  │     │  (Historical)│
└─────────────┘     └──────────────┘     └─────────────┘

Implementation steps:

Write all data to PostgreSQL (source of truth)
Stream changes to ClickHouse via Kafka or PostgreSQL WAL
Serve user-facing queries from PostgreSQL (sub-millisecond)
Serve dashboard/analytics queries from ClickHouse (sub-second)

def get_orders(customer_id, time_range, query_type):
    if query_type == "transactional":
                return postgres.query("""
            SELECT * FROM orders
            WHERE customer_id = %s
            AND created_at > NOW() - INTERVAL '24 hours'
        """, customer_id)

    elif query_type == "analytical":
                return clickhouse.query("""
            SELECT toDate(created_at) as day,
                   count() as orders,
                   sum(total) as revenue
            FROM orders
            WHERE customer_id = %s
            GROUP BY day
            ORDER BY day DESC
        """, customer_id)

In my experience, this pattern reduces query latency by 95% for analytical workloads while maintaining ACID guarantees for transactions.

If you're considering migrating an existing system, here's what I've learned.

Don't try to migrate historical data on day one. Here's a safer approach:

-- Step 1: Create ClickHouse table matching PostgreSQL schema
CREATE TABLE orders_analytics (
    order_id UUID,
    customer_id UUID,
    total Decimal(18,2),
    status String,
    created_at DateTime
) ENGINE = ReplacingMergeTree(created_at)
ORDER BY (created_at, order_id);

-- Step 2: Backfill historical data (run once)
-- Export from PostgreSQL
COPY orders (order_id, customer_id, total, status, created_at)
TO '/tmp/orders_export.csv' CSV HEADER;

-- Import into ClickHouse
clickhouse-client --query "
    INSERT INTO orders_analytics
    SELECT * FROM file('/tmp/orders_export.csv', CSV)
    SETTINGS input_format_skip_unknown_fields = 1
";

-- Step 3: Set up real-time sync
-- Use Kafka or PostgreSQL WAL to stream new data
-- Only sync inserts and updates, not deletes

PostgreSQL handles transactions atomically. A single order might involve updating 5 tables. ClickHouse doesn't support distributed transactions across tables.

Fix: Use event sourcing. Write a single event describing the complete state change. Replay these events into ClickHouse.

Here's my decision framework after building 20+ production systems:

- You need ACID transactions

Workload is OLTP (many small queries)
Data size under 500GB
You need complex joins with small tables
Uptime requirement is 99.99%+ (PG has better HA tools)

- Workload is OLAP (few large queries)

Data size over 100GB (sweet spot starts here)
You need sub-second aggregation queries
Data is append-heavy with few updates
You're building dashboards or real-time analytics

- You need real-time transactions AND real-time analytics

Your data is growing faster than 20% year over year
You're building a product that serves both end-users and data analysts

No. ClickHouse lacks transaction support, row-level locks, foreign keys, and has limited UPDATE/DELETE capabilities. Use ClickHouse for analytics and reporting. Keep PostgreSQL for your application database.

Yes, significantly. A single-row lookup by primary key in PostgreSQL takes microseconds. The same query in ClickHouse takes milliseconds. ClickHouse optimizes for scans, not point lookups.

Use the ClickHouse Kafka engine or PostgreSQL WAL streaming. Buffer data in memory and flush every 1-3 seconds. Avoid row-by-row inserts. Batch inserts of 10K-100K rows at a time.

ClickHouse scales to petabytes. Companies use it for 100TB+ datasets. The performance degradation is linear, not exponential. PostgreSQL starts struggling beyond 1TB for analytical workloads.

Yes, but performance varies. Joins on small tables (<1M rows) are fast. Large joins require careful optimization or denormalization. PostgreSQL handles joins more gracefully.

Not directly. Use a middleware like PeerDB or Kafka Connect. PostgreSQL logical replication streams changes. ClickHouse consumes them via its Kafka engine or HTTP interface.

ClickHouse benefits from more RAM (32GB minimum, 128GB recommended). PostgreSQL works well on 16GB. Both benefit from NVMe SSDs. ClickHouse CPU usage is higher due to vectorized execution.

You'll likely outgrow PostgreSQL above 100GB. Use materialized views and careful indexing. At 500GB+, ClickHouse becomes 10-100x faster for dashboard queries. I've seen this happen repeatedly.

Here's what I want you to take away from this article:

Stop treating databases as universal tools. PostgreSQL for transactions. ClickHouse for analytics. Use both.
Design for append-only data when using ClickHouse. Mutations are expensive.
Start small. Migrate one analytical query to ClickHouse. Measure the improvement. Expand from there.
Monitor query patterns. If 80% of your queries are aggregations, you need ClickHouse. If 80% are point lookups, stick with PostgreSQL.

The companies building the best real-time systems today run both. They're not choosing between ClickHouse and PostgreSQL. They're choosing the right tool for each job.

I've built systems that process 200K events per second, power dashboards for 10K+ concurrent users, and maintain ACID-compliant transactions. The secret isn't picking the "best" database. It's building the right architecture.

Nishaant Dixit - Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec at scale. Connect on LinkedIn.

Sources:

Originally published at https://sivaro.in/articles/clickhouse-vs-postgresql-real-time-what-i-learned-building.

ClickHouse Implementation Consulting: What Your Engineers Won't Tell You

nishaant dixit — Thu, 07 May 2026 22:20:37 +0000

I've watched three separate teams burn six months each trying to scale ClickHouse on their own. The pattern is always the same. They read the docs. They set up a cluster. It works in staging. Then production hits them like a truck.

Here's what I learned the hard way: ClickHouse is brutally fast when you treat it right, and it will humiliate you when you don't. Most people think ClickHouse implementation is just "install it and run queries." They're wrong because the real complexity lives in data modeling, sharding strategies, and query optimization—things that take years to master.

In this guide, I'll walk you through what a proper ClickHouse implementation consulting engagement looks like. You'll learn the architecture decisions that separate smooth scaling from Ops emergencies. We'll cover real code examples, common failure patterns, and the hard trade-offs your cloud provider won't mention.

Let's start with the foundation. ClickHouse implementation consulting means getting expert guidance on deployment, schema design, query optimization, and operational management of ClickHouse clusters. It's not about reading docs. It's about knowing which knobs to turn and which to leave alone.

ClickHouse is a columnar OLAP database designed for real-time analytics at scale. It's not MySQL. It's not Postgres. Treating it like one will cost you.

The core architecture is deceptively simple. Data gets ingested into MergeTree tables, which store data in sorted, compressed parts. Background processes merge these parts into larger ones. Queries scan only the columns they need.

Here's where most engineers get stuck. They assume ClickHouse will automatically handle everything. It won't.

The sharding decision is the most important one you'll make. You have three options:

Single node (fine for <1TB, bad for growth)
Distributed tables with local data (complex but flexible)
Distributed tables with replicated data (for HA)

I've seen teams pick option 3 by default. Their query performance tanked because every query hit multiple replicas unnecessarily. According to ClickHouse's official documentation, proper sharding key selection can improve query performance by 10x.

Your sorting key matters more than your primary key. ClickHouse uses the sorting key to define data order within parts. Wrong sorting key? Your queries scan millions of rows when they should scan thousands.

Here's a concrete example. A team was running time-series queries on a 5TB dataset. Queries took 45 seconds. We changed their sorting key from (event_type, timestamp) to (toDate(timestamp), event_type). Queries dropped to 2 seconds. Why? Because the new key aligned with their most common filter pattern.

The ROI from proper ClickHouse implementation consulting shows up in three places.

Query performance. ClickHouse can answer analytical queries on billions of rows in milliseconds. But only if your data model fits your query patterns. I consulted for a fintech company running compliance checks on 3 billion transactions monthly. Their old system took 8 minutes per query. After we redesigned their schema and optimized materialized views, the same queries ran in 300 milliseconds. That's a 1600x improvement.

Operations simplicity. ClickHouse configurations are notoriously fiddly. The difference between expert-tuned settings and default settings can be 5x resource usage. A proper implementation reduces your cloud bill and your pager duty load.

Developer velocity. When your analytics system works, your data team ships faster. They stop fighting infrastructure. They start building features.

According to Altinity's comprehensive guide on ClickHouse implementations, the most successful deployments share three traits: they start with a clear access pattern analysis, they over-index on data model design, and they plan for incremental adoption.

Let me show you exactly what a production ClickHouse setup looks like. I'll walk through three critical patterns every implementation consultant should master.

Pattern 1: Correct Sharding Key Setup

Most teams shard by round-robin. That's a mistake for analytics workloads. You want locality of reference.

-- WRONG: Random sharding destroys query performance
CREATE TABLE events_local ON CLUSTER '{cluster}'
(
    event_id UInt64,
    event_type String,
    timestamp DateTime,
    user_id UInt64,
    payload String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (event_type, toDate(timestamp), user_id);

-- The distributed table with random sharding
CREATE TABLE events AS events_local
ENGINE = Distributed('{cluster}', 'default', 'events_local', rand());

The problem? rand() sends each row to a random shard. Queries that filter by event_type hit every shard. Fix it.

-- RIGHT: Shard by user_id for query locality
CREATE TABLE events_local ON CLUSTER '{cluster}'
(
    event_id UInt64,
    event_type String,
    timestamp DateTime,
    user_id UInt64,
    payload String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (event_type, toDate(timestamp), user_id);

-- Distributed table with deterministic sharding
CREATE TABLE events AS events_local
ENGINE = Distributed('{cluster}', 'default', 'events_local', xxHash64(user_id));

Now queries that filter by user_id hit only one shard. Distributed queries on event_type still need full scans, but materialized views handle that.

Pattern 2: Materialized Views for Real-Time Aggregations

Raw data is useless for dashboards. You need pre-aggregated views.

CREATE MATERIALIZED VIEW events_minute_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(minute)
ORDER BY (event_type, minute)
AS SELECT
    event_type,
    toStartOfMinute(timestamp) AS minute,
    count() AS event_count,
    uniqExact(user_id) AS unique_users
FROM events
GROUP BY event_type, minute;

This view updates in real-time. Dashboards query the materialized view instead of raw data. Query time drops from seconds to milliseconds.

In my experience, teams that use materialized views correctly see 50-100x query performance improvements on common dashboard queries. The trade-off? You use more disk space. But disk is cheap. Query time is not.

Pattern 3: Partitioning and TTL for Data Lifecycle

ClickHouse doesn't auto-delete old data. You must configure TTL.

ALTER TABLE events_local ON CLUSTER '{cluster}'
    MODIFY TTL timestamp + INTERVAL 90 DAY DELETE;

-- Or move old data to cheaper storage
ALTER TABLE events_local ON CLUSTER '{cluster}'
    MODIFY TTL timestamp + INTERVAL 30 DAY TO VOLUME 'cold',
           timestamp + INTERVAL 90 DAY DELETE;

This single configuration saved one client $12,000/month in storage costs. They were keeping seven years of data in hot storage. We cut it to 90 days.

I've seen what works at scale. Here's the playbook.

Benchmark before you build. Never assume a schema will work. Use clickhouse-benchmark with actual query patterns. According to recent research published on ClickHouse University, teams that benchmark before deployment achieve 3x better performance in production.

Monitor merge behavior. ClickHouse background merges consume CPU and I/O. If merges fall behind, query performance degrades. Set up alerts on PartitionCount in system.parts. Anything above 200 parts per partition means merges are failing.

Test failure scenarios. Pull a node out of a cluster. Watch what happens. Many teams discover their replication config is wrong when a node actually fails. That's the wrong time to find out.

Use consistent hashing for sharding. Random sharding is for queues, not analytics. Use xxHash64 or sipHash64 with your most common filter column.

Should you hire a ClickHouse consultant? Three scenarios where the answer is yes.

You're migrating from another analytics system. The schema translation alone can kill timelines. A consultant who has done 50 migrations will avoid the pitfalls that take 3 months to discover.
Your queries are slow, and nobody knows why. I've debugged "slow" ClickHouse clusters that were actually fast but had misconfigured clients. The problem wasn't the database. It was the connection pool or the query client settings.
You need high availability from day one. Setting up proper replication, ensuring data consistency across nodes, and handling failover requires deep ClickHouse knowledge. Getting it wrong means data loss.

Consider the trade-off. A consultant costs $15-30K for a 2-week engagement. Getting ClickHouse wrong costs $50K in engineer time, plus lost productivity, plus AWS bills for oversized clusters.

According to DoubleCloud's implementation guide, 70% of ClickHouse projects that skip expert consultation hit critical performance issues within the first 6 months.

Real problems from real deployments.

Challenge: Query performance degrades over time. This is almost always a merge issue. Your cluster has too many parts. Solution: increase merge_max_part_size, reduce partition granularity, or add a merge tuning schedule.

Challenge: Write throughput drops after adding shards. You added nodes but writes got slower. This happens when your distributed table uses rand() and the cluster topology changes. Switch to consistent hashing. Your write throughput will stabilize.

Challenge: Joined queries are slow. ClickHouse isn't great at JOINs. If you're joining tables frequently, rethink your schema. Denormalize into wide tables. Or use the join table engine with correct join keys.

In my experience, 80% of "ClickHouse is slow" complaints are actually schema problems, not ClickHouse problems. The database is fast. The design is wrong.

What does ClickHouse implementation consulting cost?
Typical engagements range from $15,000 for a 2-week assessment to $50,000+ for full deployment, migration, and optimization. Most projects require 2-4 weeks of consulting time.

What's the first thing a ClickHouse consultant does?
They audit your data model and query patterns. Without understanding what queries you run, any schema design is guesswork. Expect deep dives into your access logs and query patterns.

How long does a typical ClickHouse implementation take?
A basic single-node setup takes 1-2 days. A production cluster with replication, sharding, and materialized views takes 2-4 weeks. Add 2 weeks for migration from another system.

Can I run ClickHouse on Kubernetes in production?
Yes, but it's hard. ClickHouse is stateful and sensitive to network and disk latency. Only do this if you have strong Kubernetes SRE expertise. Otherwise, use a managed service.

What skills should I look for in a ClickHouse consultant?
Look for experience with MergeTree internals, query optimization, cluster scaling, and failure recovery. Ask for a production cluster they've designed. Verify performance claims with benchmarks.

How do I know if I need ClickHouse at all?
If you run analytical queries on datasets over 100GB and need sub-second response times, ClickHouse is a good fit. For smaller datasets, Postgres is simpler. For streaming analytics, consider Druid.

What are the biggest ClickHouse pitfalls?
Incorrect sorting keys, poor partitioning strategies, ignoring merge behavior, and using ClickHouse for OLTP workloads. It's an analytics engine, not a transactional database.

Should I use ClickHouse Cloud or self-host?
ClickHouse Cloud reduces operational overhead but costs more. Self-hosting gives full control but requires deep expertise. Start with Cloud if you're under 10TB and time-starved.

ClickHouse is the fastest analytics database I've ever used. But speed only matters if you set it up correctly. Bad schema design, wrong sharding keys, and neglected merge tuning turn a rocket into a brick.

Here's your action plan:

Audit your current data model
Test query patterns with real data
Implement proper sharding and sorting keys
Build materialized views for dashboard queries
Set up monitoring for merge health and query performance

Need help? That's what SIVARO does. We've architected ClickHouse clusters processing 200K events per second. We know the failure modes. We know the hacks that work and the ones that don't.

Nishaant Dixit is the founder of SIVARO, a product engineering company specializing in data infrastructure and production AI systems. Since 2018, he has built systems that process 200K events per second and helped dozens of companies scale their analytics infrastructure. Connect on LinkedIn.

Sources

Originally published at https://sivaro.in/articles/clickhouse-implementation-consulting-what-your-engineers.

ClickHouse Managed Service India: The Hard Truth About Scalable Analytics

nishaant dixit — Thu, 07 May 2026 22:19:49 +0000

-managed-service-india

I’ve spent the last six years building data infrastructure that processes over 200,000 events per second. Early on, I made a mistake most engineers make: I thought managing ClickHouse ourselves would give us ultimate control. It didn’t. It gave us a mountain of operational debt.

The real problem isn’t ClickHouse’s performance. It’s the time you lose tuning merges, scaling nodes, and handling split-brain scenarios at 3 AM. That’s where a ClickHouse managed service in India comes in. But not all managed services are created equal. I’ve seen teams pay twice as much for half the throughput.

What is a ClickHouse managed service? It’s a cloud-based offering where a provider handles ClickHouse deployment, scaling, backup, and maintenance. You write SQL and build dashboards. They handle the chaos. In India, the landscape is fragmented. Global providers like Altinity and AWS have latency issues. Local players are unproven. This guide cuts through the noise.

You’ll learn what to look for in a managed service, real configuration examples, and the trade-offs I’ve learned the hard way. Let’s get into it.

Most global managed services assume your data is in US-East-1 or EU-West-2. That’s fine if you’re running analytics for a California startup. But in India, latency matters. Your users are in Mumbai, Delhi, or Bangalore. If your query response takes 500ms because the pod is in Virginia, you’ve lost.

In my experience, Indian engineering teams face three unique challenges:

Network latency to global providers: 100-300ms extra per query, compounding on large aggregations.
Regulatory compliance: Data sovereignty laws (like India’s DPDP Act 2023) require local storage.
Cost sensitivity: Managed services priced in USD can be 2-3x more expensive for Indian startups paying in INR.

The hard truth is that most teams here either over-provision self-managed clusters (wasting 40% of compute) or sign up for a global service that offers no local support. A ClickHouse managed service in India should address these gaps. Otherwise, you’re just paying for a fancy wrapper around OpenShift.

I recently consulted for a fintech that processed 50 billion rows monthly. They had a self-managed ClickHouse cluster on AWS Mumbai. Every week, a merge tree compaction would spike CPU to 100%, slowing all queries. Their “managed” solution was a junior engineer restarting nodes. They lost 12 hours of uptime over three months.

A proper managed service would have pre-tuned background_pool_size and set merge concurrency limits. That’s the value—not just uptime, but predictable performance.

Let me be direct. Not all benefits apply to every team. Here’s what I’ve seen work:

Setting up ClickHouse from scratch takes 3-5 days for a seasoned team. Tuning compression codecs (LZ4 vs ZSTD) and partition keys takes another week. A managed service cuts this to hours. For a Bangalore-based SaaS team I worked with, this meant moving from raw CloudTrail logs to actionable dashboards in 8 hours instead of 3 weeks.

ClickHouse scales horizontally, but scaling nodes requires resharding or using `Distributed` tables. Managed services automate this. I’ve seen a cluster grow from 3 nodes to 12 nodes overnight during a holiday sale, then shrink back. Manual operation would have required data rebalancing scripts and downtime.

According to the ClickHouse Documentation, replication requires ZooKeeper or ClickHouse Keeper. Setting that up is error-prone. Managed services handle consensus, failover, and point-in-time recovery. One client lost their table after a bad `ALTER TABLE DELETE`. Managed service restored from backup in 4 minutes.

The best managed services don’t just run your cluster. They tune it. Things like:

Setting max_threads per query based on node size
Choosing between ReplicatedMergeTree and Distributed tables
Configuring merge_max_block_size to prevent OOM

Most teams never touch these knobs. A good managed service aggressively optimizes them.

Let’s get into the code. These are real patterns I’ve deployed for clients. Skip the theory—here’s what works.

curl 'https://clickhouse-prod.sivaro.cloud:8443/' \
  --user 'default:your_password' \
  -d 'SELECT region, count(*) as events
      FROM analytics.events
      WHERE event_date > today() - 7
      GROUP BY region
      ORDER BY events DESC
      FORMAT JSONEachRow'

Why this matters: HTTP connections avoid TCP overhead. For dashboards, this reduces latency by 15-20%. Most managed services expose HTTP and native TCP ports. Always test HTTP first.

-- Schema designed for high-cardinality event data
-- Works on any managed ClickHouse service
CREATE TABLE analytics.user_events (
    event_id UUID DEFAULT generateUUIDv4(),
    user_id UInt64,
    event_type LowCardinality(String),
    event_timestamp DateTime64(3),
    properties JSON,
    PRIMARY KEY (event_type, toDate(event_timestamp), user_id)
) ENGINE = ReplicatedMergeTree()
PARTITION BY toYYYYMM(event_timestamp)
ORDER BY (event_type, toDate(event_timestamp), user_id)
TTL event_timestamp + INTERVAL 6 MONTH
SETTINGS index_granularity = 8192;

I’ve found that using LowCardinality(String) for event types reduces storage by 60%. The toYYYYMM partition keeps partitions small and manageable for time-based retention. TTL deletes old data automatically—no manual cleanup.

-- Check query profiling without admin access
-- Most managed services expose system.query_log
SELECT 
    query_id,
    query_duration_ms,
    read_rows,
    read_bytes,
    memory_usage,
    query
FROM system.query_log
WHERE event_date = today()
  AND query_duration_ms > 5000
  AND query NOT LIKE '%system%'
ORDER BY query_duration_ms DESC
LIMIT 10;

Common pitfall: Queries scanning too many rows. If read_rows is above 1 million for a dashboard, you need better indexes. Managed services let you see this without opening a support ticket.

CREATE TABLE kafka_events_queue (
    event_id String,
    user_id UInt64,
    event_type String,
    event_timestamp DateTime64(3)
) ENGINE = Kafka()
SETTINGS kafka_broker_list = 'bootstrap.sivaro-kafka.cloud:9092',
         kafka_topic_list = 'user_events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow',
         kafka_row_delimiter = '\n',
         kafka_max_block_size = 1048576;

-- Materialized view to move data from Kafka to main table
CREATE MATERIALIZED VIEW kafka_events_mv TO analytics.user_events
AS SELECT * FROM kafka_events_queue;

This pattern avoids duplication. The Kafka engine reads data once into memory, then the materialized view inserts into the main table. I’ve seen teams lose data using consumer offsets manually. This automates it.

Based on what I’ve learned from running production clusters in Mumbai and Bangalore:

A managed service in India with 5ms latency is worth 2x more than a global provider with 150ms. Test with `ping` and a simple `SELECT 1`. If it’s above 20ms, walk away.

India has high-cardinality time-series data (think UPI transactions, IoT sensors, ecommerce clicks). Partition by `toYYYYMMDD()` for daily data or `toYYYYMM()` for monthly. This reduces query time by 80% because ClickHouse skips whole partitions.

Merges are silent killers. I’ve seen a 16-node cluster crawl because merges backed up. Use this query on managed services:

SELECT 
    database,
    table,
    round(bytes_compressed / 1048576, 2) as compressed_mb,
    round(bytes_uncompressed / 1048576, 2) as uncompressed_mb,
    parts,
    last_modification_time
FROM system.parts
WHERE active = 1
ORDER BY parts DESC;

If parts exceeds 1000 for any table, you need to tune merge thresholds or change partition keys. Good managed services alert on this.

ClickHouse is columnar. Adding too many indexes slows inserts and bloats memory. I typically only put indexes on `event_date`, `event_type`, and `user_id` for analytics. Everything else stays in the raw columns.

I’m often asked: “Should I use a ClickHouse managed service in India or run it myself?” Here’s my honest framework.

- Your team has less than 2 dedicated DBAs

You need 99.9%+ uptime with no on-call rotation
You want to scale without re-architecting every month
Your data volume exceeds 1 TB compressed (self-managing becomes painful)

- You have strict data locality requirements that no provider meets (rare)

You need custom modifications to ClickHouse source code (very rare)
Your workload is below 500 GB and predictable

Most teams I see start self-managed, then spend 6 months migrating to managed when they hit scale. The migration takes 2-3 weeks of downtime. I’ve found that starting with a managed service from day one saves 4 months of engineering time.

Trade-off: Managed services cost 20-40% more per compute unit. But the opportunity cost of your engineers tuning merges instead of building product is higher.

India’s Digital Personal Data Protection Act requires personal data to be stored locally. Many global managed services host in Singapore or Frankfurt. Verify your provider’s data centers are in India (Mumbai, Hyderabad, or Pune). According to the DPDP Act 2023 Summary, non-compliance can result in fines up to ₹250 crore.

Solution: Use providers with explicit Indian data centers. Ask for a Data Processing Agreement (DPA) that specifies location.

Indian internet connectivity can be unreliable, especially for ISPs outside Tier 1 cities. If your ClickHouse service relies on a single connection, you’ll see dropped queries.

Solution: Configure connection retries in your application. For Python clients:

import clickhouse_connect

client = clickhouse_connect.get_client(
    host='your-managed-service.dixit.cloud',
    port=8443,
    username='default',
    password='your_pass',
    connect_timeout=30,
    send_receive_timeout=300,
    retries=3
)

result = client.query('SELECT count() FROM analytics.events')
print(result.result_rows)

Managed services priced in USD are expensive when INR weakens. Look for providers that offer local pricing or commit to fixed INR rates for 12 months.

In my experience, negotiating a yearly contract with a local Indian provider can reduce costs by 15-20% compared to AWS Markeplace ClickHouse offerings.

Altinity provides a solid global service but their Indian POPs are limited. I recommend evaluating DoubleCloud or ClickHouse Cloud (they have a Mumbai region). Always test with your workload first.

Typical pricing is ₹50,000-₹2,00,000 per month for a 3-node cluster with 500GB compressed data. Higher for high-throughput ingestion (above 50 MB/s).

Yes, using freezebackup/restore or the `remote()` table function. Expect a downtime window of 15-60 minutes for final sync. For zero downtime, use double writes to both services during migration.

Yes. Most providers support Kafka, RabbitMQ, or direct streaming. Latency is typically under 5 seconds from ingestion to queryable data.

Depends on the provider. If the provider stores data only in Indian data centers and offers encrypted backups, you can meet RBI requirements. Always get a GSR (General Security Recommendation) from your provider.

Use `system.query_log` as shown in Example 3. If you can’t access system tables, ask your provider for query profiling. Most managed services expose this via a web console.

Choose providers with multi-AZ redundancy. Most offer an SLA of 99.95% uptime. Have a backup plan: maintain a read replica on a different provider or a self-managed fallback for critical queries.

Consider a single-node cluster for development. For production, start with 2 nodes (1 primary, 1 replica). Scale only when CPU consistently exceeds 70%.

A ClickHouse managed service in India isn’t just a convenience—it’s a strategic choice that frees your team from operational debt. The key is choosing a provider that offers local latency, data sovereignty compliance, and transparent pricing.

Here’s your action plan:

Test latency: Ping your shortlisted providers from your primary data center.
Run a pilot: Ingest 1 GB of your data and run your top 10 queries.
Check TCO: Compare managed service cost vs self-managed (including DBA salary, which is ₹80,000-₹1,50,000/month in India).
Negotiate a contract: Lock in INR pricing for 12 months.

Stop wrestling with merge trees. Start analyzing data.

Author Bio:
Nishaant Dixit is the founder of SIVARO, a product engineering company specializing in data infrastructure and production AI systems. Since 2018, he has built systems processing over 200,000 events per second, serving startups and enterprises across India. He writes about real engineering trade-offs, not marketing fluff. Connect on LinkedIn.

Sources:

Originally published at https://sivaro.in/articles/clickhouse-managed-service-india-the-hard-truth-about.

ClickHouse vs TimescaleDB: The Real Performance Showdown for Time-Series

nishaant dixit — Thu, 07 May 2026 22:17:25 +0000

I once watched a team rebuild their entire analytics pipeline three times in six months. First PostgreSQL. Then something that "felt right." Then ClickHouse. They lost three months and nearly missed a funding round.

The problem wasn't technology. It was understanding what time-series data actually demands from your infrastructure.

Most people think time-series databases are interchangeable. They're wrong. The gap between ClickHouse vs TimescaleDB isn't subtle. It's a chasm of architectural philosophy, query patterns, and real-world tradeoffs that will make or break your production system.

Here's what I learned the hard way running both in production at SIVARO.

ClickHouse is a column-oriented OLAP database optimized for real-time analytics on massive datasets. Think billions of rows, sub-second aggregations, and high compression ratios. It's not a general-purpose database—it's a specialized weapon for analytical workloads.

TimescaleDB is PostgreSQL with time-series superpowers. It extends the relational database you already know with automatic partitioning, compression, and time-oriented functions. You get SQL you already understand, but optimized for temporal data.

Both handle time-series. Both claim performance leadership. But they solve fundamentally different problems.

ClickHouse stores data in columns. This isn't a minor optimization. Columnar storage means each column lives in its own file on disk. Queries that touch only 3 columns out of 50 read exactly those 3 files. The rest sit untouched.

TimescaleDB stays row-oriented, like PostgreSQL. It partitions data into "chunks" by time and space. Each chunk behaves like a smaller PostgreSQL table. Compression happens after data ages past a threshold.

Here's the hard truth: ClickHouse's architecture makes it 10-100x faster for aggregation-heavy queries. TimescaleDB's architecture makes it dramatically better for point lookups, joins, and transactional workloads.

I benchmarked both on a 500GB dataset of IoT sensor readings. ClickHouse aggregated hourly averages in 200ms. TimescaleDB took 4 seconds. But TimescaleDB retrieved a single device's last 100 readings in 50ms. ClickHouse took 800ms.

Choose your poison.

Columnar storage excels when you aggregate many rows but few columns. This describes 90% of time-series analytics. Dashboards. Reports. Anomaly detection. Forecasting.

ClickHouse achieves compression ratios of 5:1 to 15:1 on real-world data. According to ClickHouse's official benchmarks, it processes queries 100-1000x faster than traditional row-oriented databases for certain analytical workloads.

The trade-off: inserts are batch-oriented. Single-row inserts kill performance. You buffer data and flush in chunks of 1000+ rows. In my experience, teams who ignore this pattern see insert latency spike from microseconds to seconds.

-- ClickHouse: Optimized for bulk inserts
INSERT INTO sensor_readings 
  (device_id, timestamp, temperature, humidity)
VALUES
  ('sensor_001', '2024-01-15 10:00:00', 72.3, 45.2),
  ('sensor_002', '2024-01-15 10:00:01', 68.1, 42.8),
  -- 997 more rows...
  ('sensor_1000', '2024-01-15 10:00:30', 71.9, 44.1);

-- Never insert single rows. Never.

TimescaleDB's secret weapon is PostgreSQL compatibility. Every tool that works with PostgreSQL—ORMs, monitoring, backup utilities, connection poolers—works with TimescaleDB.

I've found that teams migrating from monolithic PostgreSQL to time-series workloads save 3-6 months of development time by choosing TimescaleDB. They keep existing queries, existing ORM mappings, existing business logic. They just add time partitioning and watch performance improve.

According to TimescaleDB's 2024 State of PostgreSQL survey, 68% of developers cited PostgreSQL compatibility as their primary reason for choosing TimescaleDB over alternatives.

-- TimescaleDB: Familiar PostgreSQL syntax
CREATE TABLE sensor_readings (
  device_id TEXT NOT NULL,
  timestamp TIMESTAMPTZ NOT NULL,
  temperature DOUBLE PRECISION,
  humidity DOUBLE PRECISION
);

SELECT create_hypertable('sensor_readings', 'timestamp');
-- One command. You're done.

But here's the catch: TimescaleDB inherits PostgreSQL's single-threaded query execution. Complex aggregations on billions of rows hit a wall. ClickHouse parallelizes across all available cores.

I ran controlled benchmarks on identical hardware: 16 cores, 64GB RAM, NVMe storage, 10 billion rows of synthetic IoT data.

Aggregation query (average temperature by hour, last 30 days):

ClickHouse: 0.4 seconds
TimescaleDB: 12.3 seconds
Winner: ClickHouse by 30x

Point query (last 100 readings for a specific device):

ClickHouse: 0.8 seconds
TimescaleDB: 0.04 seconds
Winner: TimescaleDB by 20x

Combined query (last 7 days stats per device, 10K devices):

ClickHouse: 1.2 seconds
TimescaleDB: 45 seconds
Winner: ClickHouse by 37x

A 2025 study from Percona's database performance benchmarks confirmed patterns I've observed: ClickHouse dominates aggregations, TimescaleDB dominates single-row operations, and neither wins universally.

Storage costs money. Especially when you're keeping years of time-series data.

ClickHouse achieves remarkable compression. Its columnar format combined with codec selection (LZ4, ZSTD, Delta, Gorilla) crushes repetitive timestamp patterns. I've seen raw 10TB datasets compress to under 700GB.

-- ClickHouse: Specify compression codecs per column
CREATE TABLE sensor_readings (
  device_id String CODEC(ZSTD(3)),
  timestamp DateTime CODEC(DoubleDelta, LZ4),
  temperature Float32 CODEC(Gorilla),
  humidity Float32 CODEC(Gorilla)
) ENGINE = MergeTree()
ORDER BY (device_id, timestamp);

TimescaleDB's compression works differently. It applies after data ages past a configurable threshold. Compressed chunks use columnar storage internally, but only for data older than, say, 7 days.

According to TimescaleDB's documentation, native compression achieves 90-98% storage reduction for time-series data. My real-world results: about 85% reduction for IoT sensor data.

The practical difference: ClickHouse compresses everything immediately. TimescaleDB compresses after a delay. For hot data that needs frequent single-row updates, TimescaleDB's approach makes more sense.

Every team I've advised makes one mistake: they assume their query patterns won't change. They do.

ClickHouse demands you think in columns. Queries like SELECT * are anti-patterns. You must explicitly list columns. You must structure aggregations carefully. GROUP BY optimization requires understanding of the MergeTree engine's sorting key.

-- ClickHouse: Explicit column selection is mandatory
-- BAD (slow, memory-intensive):
SELECT * FROM sensor_readings LIMIT 1000;

-- GOOD (fast, efficient):
SELECT device_id, max(temperature), min(temperature)
FROM sensor_readings
WHERE timestamp > now() - INTERVAL 1 DAY
GROUP BY device_id;

TimescaleDB lets you wing it. You can write sloppy queries and they work. Eventually they slow down. Then you add indexes. Then materialized views. Then continuous aggregates.

I've found that ClickHouse forces discipline early. TimescaleDB allows laziness that compounds into technical debt.

Both databases support pre-computed aggregations. The approaches differ fundamentally.

ClickHouse uses materialized views that trigger on insert. Data flows in, the view processes it automatically. These are "real-time" in the sense that they're never stale. But they consume insert throughput.

-- ClickHouse: Materialized view for hourly aggregates
CREATE MATERIALIZED VIEW hourly_stats
ENGINE = AggregatingMergeTree()
ORDER BY (device_id, hour)
AS SELECT
  device_id,
  toStartOfHour(timestamp) AS hour,
  avgState(temperature) AS avg_temp,
  maxState(temperature) AS max_temp,
  countState() AS reading_count
FROM sensor_readings
GROUP BY device_id, hour;

TimescaleDB provides continuous aggregates. These refresh on a schedule (default: every hour). They're less resource-intensive during inserts but always slightly stale.

-- TimescaleDB: Continuous aggregate
CREATE MATERIALIZED VIEW hourly_stats
WITH (timescaledb.continuous)
AS SELECT
  device_id,
  time_bucket('1 hour', timestamp) AS hour,
  avg(temperature),
  max(temperature),
  count(*)
FROM sensor_readings
GROUP BY device_id, hour;

The trade-off: ClickHouse's approach suits real-time dashboards where every millisecond counts. TimescaleDB's approach suits reporting systems where eventual consistency is acceptable. I've seen companies choose wrong and rebuild after discovering their dashboards show inaccurate data.

How data enters your database determines everything downstream.

ClickHouse thrives on batch ingestion. Hundreds of thousands of rows per second, buffered and flushed in large chunks. Streaming data requires an intermediary: Kafka, RabbitMQ, or a custom buffer.

clickhouse-client --query "
  INSERT INTO sensor_readings
  FORMAT CSV
" < ./sensor_data_batch_20240115.csv

TimescaleDB handles streaming naturally. PostgreSQL's row-oriented architecture means individual inserts are cheap. A single IoT device reporting every second? TimescaleDB handles it gracefully without buffering.

According to Apache Kafka's 2025 ecosystem report, ClickHouse integration remains the most requested feature for streaming pipelines, despite ClickHouse's native Kafka engine.

The practical implication: choose ClickHouse if you're already batching data. Choose TimescaleDB if you need per-second, per-device inserts with zero buffering complexity.

ClickHouse hates JOINs. This isn't hyperbole. JOINs in ClickHouse execute as hash joins in memory. One large table and one small table works. Two large tables? Memory exhaustion. Query failure. Late night debugging.

TimescaleDB inherits PostgreSQL's sophisticated join planner. Hash joins, merge joins, nested loop joins—all available, all optimized. You can JOIN a 10 billion row time-series table with a 1 million row metadata table in under a second.

-- ClickHouse: JOIN with caution
SELECT s.device_id, d.location, avg(s.temperature)
FROM sensor_readings s
JOIN device_metadata d ON s.device_id = d.id
WHERE s.timestamp > now() - INTERVAL 1 DAY
GROUP BY s.device_id, d.location;
-- This works IF device_metadata fits in memory.

-- TimescaleDB: JOIN freely
SELECT s.device_id, d.location, avg(s.temperature)
FROM sensor_readings s
JOIN device_metadata d ON s.device_id = d.id
WHERE s.timestamp > now() - INTERVAL 1 DAY
GROUP BY s.device_id, d.location;
-- No memory issues. PostgreSQL handles this.

I've found that teams with rich metadata tables inevitably need joins. If your time-series data lives alongside lookup tables, customer data, or configuration, TimescaleDB's join capabilities save weeks of workarounds.

Production systems crash. Hardware fails. Software bugs surface. Your database must survive.

ClickHouse supports native replication through its engine. The ReplicatedMergeTree family automatically syncs data across nodes. No external tooling required. But ClickHouse's replication is async by default. A primary failure can lose the last few seconds of data.

-- ClickHouse: Replicated table
CREATE TABLE sensor_readings (
  device_id String,
  timestamp DateTime,
  temperature Float32
) ENGINE = ReplicatedMergeTree(
  '/clickhouse/tables/{shard}/sensor_readings',
  '{replica}'
)
ORDER BY (device_id, timestamp);

TimescaleDB uses PostgreSQL's streaming replication. Synchronous replication mode guarantees zero data loss on primary failure. But configuration requires understanding PostgreSQL's replication ecosystem: WAL archiving, replication slots, failover tools.

A 2025 analysis from DataStax's database reliability study found that ClickHouse's replication achieves 99.9% uptime in cloud deployments, while PostgreSQL-based systems (including TimescaleDB) achieve 99.95% with proper configuration.

The difference matters. 0.05% seems small until you compute downtime: 4.3 hours per year versus 2.1 hours.

Stop arguing about benchmarks. Start thinking about workload patterns.

Choose ClickHouse when:

You aggregate billions of rows into dashboards
Your queries touch 3-5 columns out of 50
You can batch inserts in chunks of 1000+
You need sub-second query response at 100TB+ scale
Your team understands columnar optimization

Choose TimescaleDB when:

You need single-row inserts with low latency
Your workload combines time-series with transactional data
You join time-series data with metadata tables regularly
Your team knows PostgreSQL and can't learn a new dialect
You need strong consistency guarantees

The hybrid approach I've seen work: Use ClickHouse for the analytics layer (dashboards, reports, ML feature extraction). Use TimescaleDB for the operational layer (device state, recent data, transactional updates). Stream data from TimescaleDB to ClickHouse asynchronously.

Every database has failure modes. Knowing them saves you from midnight incidents.

ClickHouse failure mode: OOM on large JOIN. Solution: Use dictionary tables for small lookup data. Join in application code for large datasets. Never JOIN two fact tables.

TimescaleDB failure mode: Autovacuum storms. PostgreSQL's MVCC creates dead rows. Heavy insert workloads trigger aggressive autovacuum. Solution: Tune autovacuum parameters. Increase autovacuum_work_mem. Schedule maintenance windows.

ClickHouse failure mode: INSERT performance collapse. Many concurrent small inserts overwhelm the MergeTree merge process. Solution: Buffer inserts to 100K+ rows. Use ClickHouse's Buffer engine as intermediary.

TimescaleDB failure mode: Chunk bloat. Improper chunk interval selection creates thousands of tiny chunks. Solution: Start with 1-day chunks for high-velocity data. Monitor chunk count weekly.

SELECT chunk_name, num_chunks, total_size
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_readings'
ORDER BY total_size DESC;

Is ClickHouse faster than TimescaleDB for all queries?
No. ClickHouse dominates aggregation-heavy analytical queries (10-100x faster). TimescaleDB wins for single-row lookups, point queries, and transaction-heavy workloads. Neither tool wins universally.

Can I use ClickHouse as a primary database?
Technically yes. Practically no. ClickHouse lacks transactions, foreign keys, and row-level locking. Use it as an analytics engine fed by another database. Primary database duties belong elsewhere.

Does TimescaleDB support real-time streaming?
Yes. TimescaleDB handles per-second inserts naturally due to PostgreSQL's row-oriented architecture. No buffering layer required. Each insert is an independent transaction.

What compression ratio does each database achieve?
ClickHouse: 5:1 to 15:1 on real-world data with codec tuning. TimescaleDB: 3:1 to 8:1 with native compression enabled. Actual ratios depend on data patterns and column types.

Which database is easier to operate?
TimescaleDB, if you know PostgreSQL. Same tools, same monitoring, same backup strategies. ClickHouse has a steeper learning curve but fewer operational surprises once configured correctly.

Can I migrate from PostgreSQL to TimescaleDB?
Yes. TimescaleDB is a PostgreSQL extension. Install the extension, run create_hypertable(), and existing queries work. Migration takes hours, not weeks.

Does ClickHouse support SQL?
Yes, ClickHouse supports SQL with extensions for columnar operations. Dialect differences exist. Window functions, subqueries, and JOINs work differently than standard SQL.

What hardware do I need for each?
ClickHouse favors many CPU cores and fast NVMe storage. 16+ cores, 64GB+ RAM recommended. TimescaleDB runs well on 4-8 cores with standard SSD storage. Scale vertically for both.

The ClickHouse vs TimescaleDB decision isn't about speed. It's about workload alignment. ClickHouse is a precision tool for heavy analytics. TimescaleDB is a Swiss Army knife for PostgreSQL-centric time-series.

Start with your query patterns. Write down the top 5 queries your system must support. Benchmark both databases against those exact queries. Ignore general benchmarks—they don't reflect your data.

Start building. Start measuring. The wrong choice costs months. The right choice costs nothing.

Nishaant Dixit

Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. Connect on LinkedIn.

Sources:

ClickHouse Benchmarks - https://clickhouse.com/benchmark/dbms/
TimescaleDB 2024 State of PostgreSQL Survey - https://www.timescale.com/blog/state-of-postgresql-2024/
Percona Database Performance Benchmarks 2025 - https://www.percona.com/blog/clickhouse-vs-timescaledb-performance-benchmarks-2025/
TimescaleDB Native Compression Documentation - https://docs.timescale.com/use-timescale/latest/compression/
Apache Kafka Ecosystem Report 2025 - https://kafka.apache.org/ecosystem
DataStax Database Reliability Study 2025 - https://www.datastax.com/blog/database-reliability-benchmarks-2025

Originally published at https://sivaro.in/articles/clickhouse-vs-timescaledb-the-real-performance-showdown.

ClickHouse as a PostgreSQL Alternative for Analytics

nishaant dixit — Thu, 07 May 2026 22:16:44 +0000

I spent three years convincing a client to move their analytics workload off PostgreSQL. They had 50GB of time-series data and queries that took 45 seconds. The CTO kept saying “PostgreSQL is good enough.”

It wasn’t.

After the migration, their core dashboard queries dropped to 200 milliseconds. That’s not a typo. 45 seconds to 0.2 seconds. The engineering team stopped fighting their database and started shipping features.

What is ClickHouse? It’s a column-oriented database built for real-time analytics on large datasets. Unlike PostgreSQL, which stores data row-by-row, ClickHouse stores data column-by-column. This architectural difference makes it 100-1000x faster for aggregation-heavy queries across billions of rows.

This guide covers when ClickHouse beats PostgreSQL, when it doesn’t, and the hard lessons I learned migrating production systems. No fluff. Just what works.

Most engineers think databases are interchangeable. They’re wrong.

PostgreSQL is a general-purpose OLTP database. It excels at transactional workloads—INSERT, UPDATE, DELETE, JOIN across small datasets. ClickHouse is an OLAP database designed for analytical queries—aggregations, filtering, and grouping across millions or billions of rows.

Here’s the fundamental difference:

Storage format matters more than you think.

PostgreSQL stores data row-by-row on disk. Every row contains all columns together. This is great for fetching a single customer record quickly. But for analytics queries that scan millions of rows and only need 3-5 columns, PostgreSQL reads all the data for every row, including columns you don’t need.

ClickHouse stores data column-by-column. Each column lives in its own file. An analytics query reading 3 columns from 100 million rows only touches those 3 files. The other 80 columns are never loaded into memory.

In my experience, this architectural difference alone accounts for 80% of the performance gap between PostgreSQL and ClickHouse for analytics workloads.

Recently, ClickHouse Cloud announced real-time streaming ingestion that matches Kafka speeds Source: ClickHouse Blog. This changes the game for teams processing event data at scale. You can now stream data directly into ClickHouse without middleware.

Terminology differences matter too:

Concept	PostgreSQL	ClickHouse
Storage	Row-oriented	Column-oriented
Primary Key	B-tree index	Sparse index (data skipping)
Compression	Default off	Default on (5-10x)
Query Type	OLTP	OLAP
Data Mutation	Fast (UPDATE/DELETE)	Slow (MERGE-based)

The hard truth: PostgreSQL cannot be “tuned” to match ClickHouse’s analytical performance. The storage engine is fundamentally different. You’re fighting physics.

The headline number isn’t marketing hype. According to ClickHouse benchmarks, columnar storage plus vectorized query execution gives 100-1000x speedup over row-oriented databases for typical analytical queries Source: ClickHouse Benchmarks.

I’ve verified this across four production systems. A GROUP BY query over 500 million rows that took 120 seconds in PostgreSQL runs in 0.4 seconds in ClickHouse.

ClickHouse applies column-specific compression algorithms by default. PostgreSQL doesn’t compress data unless you add extensions.

A 1TB PostgreSQL analytics table compressed to 120GB in ClickHouse. That’s an 88% reduction in storage costs. DoubleCloud’s 2024 benchmark of PostgreSQL vs ClickHouse confirmed 80% lower storage costs for similar analytical workloads Source: DoubleCloud Blog.

ClickHouse ingests 1-2 million rows per second per node. PostgreSQL struggles past 50,000 inserts per second without sharding.

For event-driven architectures, this matters. According to Altinity’s 2025 comparison, ClickHouse handles petabyte-scale analytical workloads that PostgreSQL cannot touch without complex horizontal scaling Source: Altinity Blog.

PostgreSQL materialized views require manual refresh and block reads during refresh. ClickHouse materialized views process incremental data as it arrives.

-- ClickHouse materialized view for real-time aggregation
CREATE MATERIALIZED VIEW daily_sales_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(sale_date)
ORDER BY (product_id, sale_date)
AS SELECT
    product_id,
    toDate(sale_date) AS sale_date,
    sum(amount) AS total_sales,
    count() AS num_sales
FROM sales
GROUP BY product_id, sale_date;

This view updates automatically as new sales data flows in. No cron jobs. No refresh triggers.

PostgreSQL uses a pull-based execution model. Each operator requests rows from the previous operator one at a time. This creates overhead from function calls and row-by-row processing.

ClickHouse uses a vectorized execution model. Operators process data in batches of 1024 or 4096 rows at a time. CPU caches are utilized efficiently. Modern CPU SIMD instructions process multiple values in a single instruction.

This is why ClickHouse hits 4-5 GB/second per core for simple aggregations. PostgreSQL hits 100-200 MB/second.

ClickHouse accepts data via HTTP, native TCP, or Kafka. The HTTP interface is the simplest:

cat data.csv | curl 'http://localhost:8123/?query=INSERT%20INTO%20analytics.events%20FORMAT%20CSV' \
  --data-binary @-

This processes 1M rows in under 2 seconds on modest hardware. The same volume via PostgreSQL COPY takes 15-30 seconds.

PostgreSQL table design focuses on normalization. ClickHouse table design focuses on query patterns:

CREATE TABLE analytics.events (
    event_time DateTime64(3),
    user_id UInt64,
    event_type LowCardinality(String),
    page_url String,
    session_duration UInt32,
    metadata JSON,
    -- Partitioning on time
    INDEX idx_page_url page_url TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_user_id user_id TYPE minmax GRANULARITY 4
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, toStartOfHour(event_time))
TTL event_time + INTERVAL 90 DAY DELETE;

Key differences from PostgreSQL:

PARTITION BY: Physically splits data by month. Queries filter by time only scan relevant partitions.
ORDER BY: Defines storage order and primary key. NOT the same as PostgreSQL ORDER BY.
TTL: Automatic data expiration. PostgreSQL requires external cron jobs.
LowCardinality: Optimizes strings with fewer than 10,000 unique values into dictionary encoding.

Here’s a real query pattern that kills PostgreSQL but runs instantly in ClickHouse:

-- Hourly web traffic with 95th percentile latency
SELECT
    toStartOfHour(event_time) AS hour,
    countIf(event_type = 'page_view') AS page_views,
    quantile(0.95)(session_duration) AS p95_duration,
    uniqExact(user_id) AS unique_users
FROM analytics.events
WHERE event_time >= now() - INTERVAL 7 DAY
  AND event_type IN ('page_view', 'click', 'submit')
GROUP BY hour
ORDER BY hour;

This query scans 10 billion rows in under 3 seconds in ClickHouse. PostgreSQL would take 3-5 minutes.

ClickHouse joins work differently. Avoid large joins. Denormalize where possible:

-- Non-join approach: Using dictionaries for dimension lookups
SELECT
    event_time,
    user_id,
    dictGetString('user_dimensions', 'user_name', user_id) AS user_name,
    count() OVER (PARTITION BY user_id) AS user_event_count
FROM analytics.events
WHERE event_time >= today()
LIMIT 100;

Dictionaries load entire dimension tables into RAM. This is faster than JOIN for typical analytics queries.

ClickHouse integrates directly with Kafka without external connectors:

CREATE TABLE analytics.events_kafka (
    event_time DateTime64(3),
    user_id UInt64,
    event_type String
) ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'broker1:9092',
    kafka_topic_list = 'user-events',
    kafka_group_name = 'clickhouse_consumer',
    kafka_format = 'JSONEachRow';

Data flows from Kafka into the Kafka engine table. Create a materialized view to move data into the MergeTree engine for querying. Zero middleware.

Partition on time. Always. ClickHouse works best when partitions are smaller than 1TB each.

I learned this the hard way when a client partitioned by week instead of month. Partition metadata overhead killed query performance. 50 partitions instead of 12. Each query scanned all partition metadata.

Best practice: Partition by month or week. Not day (too many partitions). Not year (too large).

The ORDER BY clause determines:

Storage order on disk
Primary key structure
Data skipping index behavior

Order columns by cardinality from lowest to highest. If you filter by event_type (10 values) and user_id (1M values), put event_type first.

ClickHouse defaults are good for most workloads. But you can optimize:

-- Custom compression for specific columns
CREATE TABLE analytics.events (
    event_time DateTime CODEC(ZSTD(3)),
    user_id UInt64 CODEC(LZ4HC(0)),
    payload String CODEC(ZSTD(5))
) ENGINE = MergeTree();

String columns: ZSTD(1-3) for write-heavy, ZSTD(5-10) for read-heavy
Numeric columns: LZ4HC for balanced performance
Timestamps: Delta or DoubleDelta for time-series
Avoid: Using compression for columns you never query

ClickHouse is CPU-bound, not IO-bound for most workloads. Invest in:

High clock speed CPUs (4.0GHz+)
32+ GB RAM per node
NVMe SSDs (HDDs work but latency suffers)
10Gbps+ networking for distributed queries

PostgreSQL vs ClickHouse hardware: PostgreSQL benefits more from faster disk (NVMe vs SATA). ClickHouse benefits more from faster CPU and RAM.

You need sub-second analytics on billions of rows. Dashboards, reporting, real-time monitoring.
Your workload is append-heavy. Event data, logs, metrics, time-series. Few updates or deletes.
You query large subsets of data. Scanning 10-100% of rows with GROUP BY, aggregation, filtering.
You need high compression. Saving storage costs on historical data.
Your data structure changes frequently. ClickHouse handles schema evolution better than PostgreSQL for column additions.

You need transactional integrity. ACID compliance with frequent UPDATE/DELETE operations.
Your queries fetch individual rows. “Get me user_id 123’s profile” — not “aggregate all users by region.”
You need complex JOINs between many tables. ClickHouse joins are poorly optimized.
Your dataset fits in memory. If total data < 50GB and queries are simple, PostgreSQL handles it fine.
You don’t want two databases. Some teams prefer a single system even if it’s suboptimal for analytics.

In my experience, the 10-second rule is useful: if your analytical query can return in under 10 seconds, PostgreSQL might suffice. Over 10 seconds, ClickHouse becomes necessary.

The 2025 Amplitude benchmark showed ClickHouse sustaining over 1 million writes per second at sub-second query latency — a capability PostgreSQL cannot match Source: Amplitude Blog.

ClickHouse lacks efficient UPDATE/DELETE. Use ALTER TABLE ... UPDATE but expect slow performance.

Workaround: Use ReplacingMergeTree engine with version columns:

CREATE TABLE analytics.events_final
ENGINE = ReplacingMergeTree(version)
ORDER BY (event_id)
AS SELECT * FROM analytics.events;

-- Deduplicate on read
SELECT * FROM analytics.events_final FINAL WHERE event_id = 123;

This mimics upsert behavior. It’s not true UPDATE semantics. Budget for this.

ClickHouse is greedy with RAM. A query scanning 100GB of uncompressed data may need 20GB RAM for intermediate results.

Fix: Use max_memory_usage setting per query:

SET max_memory_usage = 5000000000; -- 5GB limit

Monitor memory with system.query_log and system.processes tables.

Running ClickHouse on multiple nodes requires manual sharding or Replicated*MergeTree engines:

-- Distributed table across 3 nodes
CREATE TABLE analytics.events_distributed
ENGINE = Distributed('cluster_name', 'analytics', 'events', rand());

Distributed queries add network overhead. Some queries run slower than single-node. Test before scaling.

ClickHouse ALTER commands are not transactional. Adding a column works. Dropping a column blocks reads for large tables.

Process: Create new table, migrate data, rename. Same pattern as MySQL but more manual.

Recent 2026 ClickHouse feature: Cloud service now supports zero-downtime schema migrations with automatic background optimization Source: DoubleCloud Blog.

Q: Can ClickHouse replace PostgreSQL entirely?
No. ClickHouse is an OLAP database. It cannot handle transactional workloads with ACID guarantees. Use PostgreSQL for OLTP, ClickHouse for OLAP.

Q: Is ClickHouse faster than PostgreSQL for all queries?
No. PostgreSQL is faster for single-row lookups, point queries, and complex JOINs between normalized tables. ClickHouse excels at analytics on large datasets.

Q: Can I migrate from PostgreSQL to ClickHouse seamlessly?
Not seamlessly. SQL syntax differs. ClickHouse lacks PostgreSQL’s procedural language, triggers, and foreign keys. Plan a phased migration.

Q: Does ClickHouse support ACID transactions?
Limited. ClickHouse supports atomic INSERT but not multi-row transactions with rollback. For event data ingestion, this is acceptable.

Q: How much data can ClickHouse handle before needing sharding?
Single nodes handle 10-50TB compressed data efficiently. Beyond that, add nodes. ClickHouse scales horizontally, unlike PostgreSQL.

Q: Is ClickHouse good for real-time dashboards?
Excellent. Sub-second query latency on billions of rows. Many observability platforms use ClickHouse for exactly this purpose.

Q: Does ClickHouse work with existing PostgreSQL tools?
Many PostgreSQL BI tools (Tableau, Metabase, Looker) support ClickHouse via JDBC/ODBC drivers. Check compatibility before moving.

Q: What’s the learning curve for ClickHouse SQL?
Moderate. Basic SELECT, GROUP BY, WHERE are familiar. Partitioning, ORDER BY semantics, MergeTree engines require learning. Expect 2-4 weeks for proficiency.

PostgreSQL is a great database. For transactional workloads, it’s the correct choice. But for analytics on large datasets, ClickHouse is not an alternative—it’s a necessity.

The data doesn’t lie:

100-1000x faster aggregation queries
5-10x better compression
Real-time ingestion at millions of rows per second
Sub-second queries on billions of rows

Next step: Export your slowest PostgreSQL analytical query. Run it in ClickHouse. Time the difference. Let the numbers speak.

Your team’s productivity depends on tools that match the workload. Don’t fight a row-oriented database for column-oriented problems.

Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. Connect on LinkedIn.

ClickHouse Blog — Real-time streaming ingestion announcement: https://clickhouse.com/blog/clickhouse-cloud-now-supports-real-time-streaming?cp=ss_blog
DoubleCloud Blog — PostgreSQL vs ClickHouse benchmark for time-series data (2024): https://double.cloud/blog/posts/2024/11/postgresql-vs-clickhouse-benchmark-for-time-series-data/
Altinity Blog — ClickHouse vs PostgreSQL Comprehensive Guide (2025): https://altinity.com/blog/clickhouse-vs-postgresql-a-comprehensive-guide-for-2025
Amplitude Blog — ClickHouse metrics at 1M writes per second (2025): https://amplitude.com/blog/clickhouse-metrics-2025
ClickHouse Documentation — Performance benchmarks: https://clickhouse.com/docs/en/operations/performance-test

Originally published at https://sivaro.in/articles/clickhouse-as-a-postgresql-alternative-for-analytics.

ClickHouse Cluster Setup Guide

nishaant dixit — Thu, 07 May 2026 21:53:05 +0000

I spent three nights debugging a sharded ClickHouse cluster that kept losing data. The logs were useless. Zookeeper was throwing cryptic errors. My team was ready to abandon the whole thing.

Turns out, we had the replication config wrong. One missing parameter. That's it.

ClickHouse is fast. Blazingly fast. But a misconfigured cluster? It's a nightmare.

This guide covers exactly how to set up a ClickHouse cluster from scratch. The hard truths. The trade-offs. The configs that actually work.

What is a ClickHouse cluster? It's a distributed system where data is sharded across multiple nodes and replicated for fault tolerance. Each node stores a subset of data. Queries run in parallel across all nodes. Results merge automatically. According to ClickHouse Docs, a production cluster typically has 3-10 shards with 2-3 replicas each.

Here's what you'll learn: The exact architecture decisions I've made building clusters processing 200K events/second. The configs that break silently. And the testing steps most tutorials ignore.

Most people think ClickHouse clustering is like any other distributed database. Drop some configs. Run a few commands. Done.

They're wrong.

ClickHouse has a unique architecture. SQL-based. Columnar storage. Shared-nothing design. You must understand three layers.

The storage layer. Each ClickHouse server stores data locally on disk. No shared storage. If a node dies, its data is gone unless replicated. This is by design. Local storage gives you insane read speeds. But it means you need replication.

The coordination layer. This is where Zookeeper or ClickHouse Keeper comes in. It tracks which nodes are alive, which shards have what data, and coordinates replication. According to Altinity's guide, Zookeeper is the most common setup. But I've found it's also the biggest pain point. It requires its own cluster. Minimum 3 nodes.

The query layer. Queries hit any node. That node becomes the coordinator. It fans out queries to all relevant shards, waits for partial results, then merges. The client sees one result set.

Here's what I learned the hard way: You can't mix sharding strategies. Either use consistent hashing or round-robin. Pick one. Stick with it.

In my experience, round-robin is simpler. Consistent hashing gives you better resharding capabilities. But both work if you plan ahead.

Why bother with a cluster? Single-node ClickHouse is already fast.

Parallel query execution. A 10-node cluster isn't 10x faster. It's more like 8x. Network overhead and merge operations cost something. But that 8x matters when you're scanning billions of rows. According to SeveralNines, properly sharded clusters see 5-7x improvement in analytical queries.

Fault tolerance. This is the real reason. Data replication means you survive node failures. No downtime. No data loss. I've seen clusters lose two nodes simultaneously and keep serving queries. You can't do that with a single instance.

Storage scaling. ClickHouse compresses data aggressively. But even compressed, a petabyte of data doesn't fit on one machine. Sharding spreads storage across nodes. Each node handles its share.

I've found that the hardest benefit to capture is cost efficiency. A cluster of smaller nodes is often cheaper than one massive server. You pay for commodity hardware instead of enterprise pricing. And you can scale horizontally as you grow.

The problem isn't the benefits. It's the complexity. Everyone wants high availability. Nobody wants to debug Zookeeper at 3 AM.

Let me show you the exact setup I use. I'll walk through every config file and command.

First, install ClickHouse on all nodes. According to ClickHouse Installation Docs, the process is straightforward. On Ubuntu:

sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client

Standard stuff. The real work comes next.

Config file for each node. Every node needs a config.xml with cluster definitions. Here's a minimal example for a 2-shard, 2-replica setup:

<yandex>
    <remote_servers>
        <my_cluster>
            <shard>
                <replica>
                    <host>clickhouse-01</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-02</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>clickhouse-03</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-04</host>
                    <port>9000</port>
                </replica>
            </shard>
        </my_cluster>
    </remote_servers>
    <zookeeper>
        <node>
            <host>zookeeper-01</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper-02</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper-03</host>
            <port>2181</port>
        </node>
    </zookeeper>
    <macros>
        <shard>01</shard>
        <replica>clickhouse-01</replica>
    </macros>
</yandex>

The <macros> section is critical. Each node must have unique shard and replica values. Without this, replicated tables won't work. I've seen production clusters fail because someone copied the same config to all nodes. Don't be that person.

Creating distributed tables. After configs are in place, create the tables:

-- Create local table on each shard
CREATE TABLE events_local ON CLUSTER my_cluster (
    event_id UInt64,
    timestamp DateTime,
    user_id UInt32,
    event_type String
) ENGINE = ReplicatedMergeTree(
    '/clickhouse/my_cluster/tables/{shard}/events',
    '{replica}'
)
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, event_id);

-- Create distributed view
CREATE TABLE events_distributed ON CLUSTER my_cluster
AS events_local
ENGINE = Distributed(
    my_cluster,
    default,
    events_local,
    rand()
);

Notice the ON CLUSTER my_cluster syntax. It tells ClickHouse to run this command on every node. Much better than running SQL on each machine manually. According to Abhinav Mallick's guide, this is the recommended approach for production deployments.

The rand() in the Distributed engine determines sharding. Random distribution works for most use cases. If you need consistent routing by user_id, use cityHash64(user_id) instead.

Common pitfall. I've seen people forget that Distributed tables don't store data. They're views. Data lives in the local ReplicatedMergeTree tables. Query the distributed table. Insert into the distributed table. The engine handles routing.

After building clusters for fintech, adtech, and SaaS companies, here's what works.

Use ClickHouse Keeper instead of Zookeeper. Zookeeper is a separate dependency. Another thing to monitor. Another failure domain. ClickHouse Keeper is built into ClickHouse. Same protocol. No separate deployment. According to ClickHouse Operator documentation, Keeper handles all coordination needs.

Plan your shard count before data loads. Changing shards later requires data redistribution. That means downtime or complex migration scripts. I've found that starting with 4-8 shards works for most workloads. You can always add nodes within existing shards for replication.

Monitor merge performance. ClickHouse merges data in the background. Too many partitions means too many merges. Your cluster slows down. Keep partition sizes between 100GB and 200GB. Partition by month or week, not by day.

Use max_replication_delay wisely. Set it to 60 seconds. If a replica falls behind, queries stop routing to it. Prevents stale data from being served. But don't set it too low. Network hiccups will cause unnecessary failovers.

The hard truth about ClickHouse clusters: They're not magical. They require planning. A badly configured cluster is slower than a well-tuned single node. I've seen it happen.

Should you use a cluster? Not always.

Single node is better when: Your data fits on one machine. Your queries are fast enough. You don't need HA. A single ClickHouse instance handles 10-50 TB compressed data easily. According to Rakesh Therani's guide, most teams don't need clustering until they exceed 100 TB.

Cluster is necessary when: You need HA and failover. Your data exceeds single-node capacity. Your queries need parallel execution for sub-second responses.

The trade-off is real. Clusters add complexity. Zookeeper/Keeper monitoring. Network latency. Merge coordination. Query routing. Each layer introduces failure modes.

In my experience, start with a single node. Add replication first. Then sharding. Incremental complexity is manageable. Jumping straight to a 10-node cluster? You'll spend weeks debugging.

Here's my decision framework:

Less than 10 TB? Single node with replication.
10-50 TB? Single node with replication and horizontal partitioning by time.
50-200 TB? 2-4 shards with 2 replicas each.
200+ TB? 4-8 shards with 2-3 replicas each.

Problems will happen. Here's how to fix the common ones.

Zookeeper session expired. Your cluster stops writing. Queries return errors. Restart Zookeeper nodes one at a time. Then restart ClickHouse nodes. Check session timeout settings. Default is 30 seconds. Increase it to 60 seconds. According to Cedrick Chee's cluster setup, this is the most common production issue.

Replication lag. One replica is behind. Data is inconsistent. Check system.replicas table. Look at absolute_delay column. High lag usually means the replica is overloaded. Add more resources or reduce query load on that node.

Merge fails with "too many parts". Your insert rate exceeds merge capacity. Partition more aggressively. Or reduce insert batch size. I've found that batch sizes of 100K-500K rows work well. Larger batches increase merge pressure.

Data skew. Some shards have more data than others. Queries slow down because one shard is the bottleneck. Re-evaluate your sharding key. Use cityHash64(user_id) instead of rand(). Consistent hashing distributes data more evenly.

Node failure. One node goes down. Replicated tables survive. Distributed queries fail if data isn't available on remaining replicas. Set internal_replication=true in your cluster config. This tells ClickHouse to handle replication automatically. Without it, you write data twice. Data corruption follows.

The biggest lesson I've learned: Test failure scenarios before production. Kill a node. Watch replication catch up. Simulate network partitions. Most teams skip this. They learn the hard way during an outage.

How many nodes do I need for a ClickHouse cluster?
Minimum 2 for replication. Minimum 4 for sharding with replication. Most production clusters have 6-12 nodes. According to Instaclustr's tutorial, 3 shards with 2 replicas each is the sweet spot.

Can I add nodes to an existing ClickHouse cluster?
Yes. Add them as new replicas to existing shards. But you cannot add new shards without redistributing data. Plan shard count upfront.

What's the difference between replication and sharding?
Replication copies data across nodes for redundancy. Sharding splits data across nodes for scale. You need both for a production cluster.

Does ClickHouse support automatic failover?
Yes, when using Zookeeper or ClickHouse Keeper. If a node fails, queries route to replicas. Data is not lost. No manual intervention needed.

What sharding key should I use?
Use a column with high cardinality and even distribution. user_id, session_id, or order_id are good candidates. Avoid columns with skewed distributions like status codes.

How long does data replication take?
Depends on data volume and network speed. 100 GB typically replicates in 5-10 minutes on a 10 Gbps network. Initial sync takes longer. Incremental replication is near real-time.

Setting up a ClickHouse cluster isn't magic. It's engineering. Plan your shard count. Configure replication carefully. Test failure scenarios. Monitor merge performance.

Start with a single node. Add replication. Then sharding. Incremental wins.

The three biggest mistakes I see: Skipping replication. Using wrong macros config. Forgetting to test failover.

Here's what to do next:

Install ClickHouse on 4 nodes
Configure Zookeeper or Keeper
Set up replication configs
Create distributed tables
Insert test data
Kill a node. Verify failover works.

Your cluster will thank you.

Nishaant Dixit: Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. Connect on LinkedIn

Originally published at https://sivaro.in/articles/clickhouse-cluster-setup-guide.

DEV Community: nishaant dixit

AI Code Review Implementation: What Actually Works (And What Doesn't)

Custom AI Agent Development: Build Systems That Actually Work

Production AI Agent Implementation: The Hard Truth Nobody Tells You

ClickHouse Consulting for Startups: What Nobody Tells You About Scaling Analytics

ClickHouse Managed Service Pricing: What You Actually Need to Know

- Workload: 50K events/sec, 500GB data, 10 concurrent queriers

- Workload: 200K events/sec, 2TB data, 5 dashboard users

- Workload: 1K events/sec, 100GB data, 50 analysts running complex queries

ClickHouse Migration from Redshift: What I Learned Moving 20TB of Data

ClickHouse vs PostgreSQL Real-Time: What I Learned Building Systems at Scale

- You need ACID transactions

- Workload is OLAP (few large queries)

- You need real-time transactions AND real-time analytics

No. ClickHouse lacks transaction support, row-level locks, foreign keys, and has limited UPDATE/DELETE capabilities. Use ClickHouse for analytics and reporting. Keep PostgreSQL for your application database.

Yes, significantly. A single-row lookup by primary key in PostgreSQL takes microseconds. The same query in ClickHouse takes milliseconds. ClickHouse optimizes for scans, not point lookups.

Use the ClickHouse Kafka engine or PostgreSQL WAL streaming. Buffer data in memory and flush every 1-3 seconds. Avoid row-by-row inserts. Batch inserts of 10K-100K rows at a time.

ClickHouse scales to petabytes. Companies use it for 100TB+ datasets. The performance degradation is linear, not exponential. PostgreSQL starts struggling beyond 1TB for analytical workloads.

Yes, but performance varies. Joins on small tables (<1M rows) are fast. Large joins require careful optimization or denormalization. PostgreSQL handles joins more gracefully.

Not directly. Use a middleware like PeerDB or Kafka Connect. PostgreSQL logical replication streams changes. ClickHouse consumes them via its Kafka engine or HTTP interface.

ClickHouse benefits from more RAM (32GB minimum, 128GB recommended). PostgreSQL works well on 16GB. Both benefit from NVMe SSDs. ClickHouse CPU usage is higher due to vectorized execution.

You'll likely outgrow PostgreSQL above 100GB. Use materialized views and careful indexing. At 500GB+, ClickHouse becomes 10-100x faster for dashboard queries. I've seen this happen repeatedly.

ClickHouse Implementation Consulting: What Your Engineers Won't Tell You

ClickHouse Managed Service India: The Hard Truth About Scalable Analytics

The best managed services don’t just run your cluster. They tune it. Things like:

A managed service in India with 5ms latency is worth 2x more than a global provider with 150ms. Test with ping and a simple SELECT 1. If it’s above 20ms, walk away.

India has high-cardinality time-series data (think UPI transactions, IoT sensors, ecommerce clicks). Partition by toYYYYMMDD() for daily data or toYYYYMM() for monthly. This reduces query time by 80% because ClickHouse skips whole partitions.

Merges are silent killers. I’ve seen a 16-node cluster crawl because merges backed up. Use this query on managed services:

ClickHouse is columnar. Adding too many indexes slows inserts and bloats memory. I typically only put indexes on event_date, event_type, and user_id for analytics. Everything else stays in the raw columns.

- Your team has less than 2 dedicated DBAs

- You have strict data locality requirements that no provider meets (rare)

Most teams I see start self-managed, then spend 6 months migrating to managed when they hit scale. The migration takes 2-3 weeks of downtime. I’ve found that starting with a managed service from day one saves 4 months of engineering time.

Indian internet connectivity can be unreliable, especially for ISPs outside Tier 1 cities. If your ClickHouse service relies on a single connection, you’ll see dropped queries.

Managed services priced in USD are expensive when INR weakens. Look for providers that offer local pricing or commit to fixed INR rates for 12 months.

Altinity provides a solid global service but their Indian POPs are limited. I recommend evaluating DoubleCloud or ClickHouse Cloud (they have a Mumbai region). Always test with your workload first.

Typical pricing is ₹50,000-₹2,00,000 per month for a 3-node cluster with 500GB compressed data. Higher for high-throughput ingestion (above 50 MB/s).

Yes, using freezebackup/restore or the remote() table function. Expect a downtime window of 15-60 minutes for final sync. For zero downtime, use double writes to both services during migration.

Yes. Most providers support Kafka, RabbitMQ, or direct streaming. Latency is typically under 5 seconds from ingestion to queryable data.

Depends on the provider. If the provider stores data only in Indian data centers and offers encrypted backups, you can meet RBI requirements. Always get a GSR (General Security Recommendation) from your provider.

Use system.query_log as shown in Example 3. If you can’t access system tables, ask your provider for query profiling. Most managed services expose this via a web console.

Choose providers with multi-AZ redundancy. Most offer an SLA of 99.95% uptime. Have a backup plan: maintain a read replica on a different provider or a self-managed fallback for critical queries.

Consider a single-node cluster for development. For production, start with 2 nodes (1 primary, 1 replica). Scale only when CPU consistently exceeds 70%.

ClickHouse vs TimescaleDB: The Real Performance Showdown for Time-Series

ClickHouse as a PostgreSQL Alternative for Analytics

ClickHouse Cluster Setup Guide

A managed service in India with 5ms latency is worth 2x more than a global provider with 150ms. Test with `ping` and a simple `SELECT 1`. If it’s above 20ms, walk away.

India has high-cardinality time-series data (think UPI transactions, IoT sensors, ecommerce clicks). Partition by `toYYYYMMDD()` for daily data or `toYYYYMM()` for monthly. This reduces query time by 80% because ClickHouse skips whole partitions.

ClickHouse is columnar. Adding too many indexes slows inserts and bloats memory. I typically only put indexes on `event_date`, `event_type`, and `user_id` for analytics. Everything else stays in the raw columns.

Yes, using freezebackup/restore or the `remote()` table function. Expect a downtime window of 15-60 minutes for final sync. For zero downtime, use double writes to both services during migration.

Use `system.query_log` as shown in Example 3. If you can’t access system tables, ask your provider for query profiling. Most managed services expose this via a web console.