Aditya Gread

Posted on Jun 13 • Edited on Jun 20

Scaling to 10M+ Daily Records: Engineering a High-Throughput Enterprise-scale Audit Trail with Go, Kafka and ClickHouse

#go #distributedsystems #kafka #database

Section 1: Introduction and the Core Architectural Challenge

When designing audit and compliance logging systems for enterprise-scale platforms, the baseline engineering expectations are unforgiving. Security events, access logs, and policy compliance records are not things an organization can afford to lose. A single dropped log can mean a failed compliance audit or a critical blind spot during a post-incident security analysis.

Recently, I tackled the challenge of building a high-volume activity logging service designed to securely process over 10 million records on a daily basis. While an average of 10 million events a day translates to roughly 115 records per second, production traffic is rarely a flat, predictable line. Real-world systems run on sudden spikes, system-wide access storms, or high-burst traffic anomalies where throughput can instantly jump to thousands of events per second.

To address this, I established a clear, dual architectural mandate:

0% Data Loss Durability: Under no condition, whether network partitions, high traffic bursts, or downstream database maintenance, could the microservice silently drop a single security log.
Low-Latency Analytical Queries: Compliance dashboards and security analysts required complex aggregations and cross-tenant lookups over historical records to execute with sub-150ms query response times.

Why the Traditional Database Stack Fails

In a standard web application, the default instinct is to write log records directly into an existing relational database like MySQL or PostgreSQL, or a general-use NoSQL database like MongoDB. However, at a production scale of tens of millions of rows, this approach falls apart due to two fundamental architectural bottlenecks:

Write Amplification and Row Contention: Relational engines use B-Tree indexes to keep data sorted for quick lookups. Every time a new log is inserted, the database must not only write the row to disk but also dynamically update every associated index structure. As the table grows into hundreds of millions of rows, these index updates require frequent, expensive random disk I/O operations and row-level locks, causing incoming write pipelines to stall completely.
The Inherent Conflict Between Reads and Writes: Traditional transactional databases store data row-by-row on disk. This is fantastic for transactional writes, but it is highly inefficient for heavy analytical processing. If a security engineer runs a heavy reporting query to calculate aggregate log patterns across a month's worth of data, the database engine must scan entire rows into memory just to read a few specific columns. This starves the database of CPU and disk bandwidth, bottlenecking live write ingestion.

Decoupling Ingestion from Analytics

To resolve this bottleneck, the core design principle I implemented is the complete decoupling of the synchronous write layer from the asynchronous analytical storage layer.

By inserting a distributed event streaming buffer between the Go microservice and the analytical database, I ensured that the system can absorb massive traffic bursts instantly, while the database consumes the data cleanly in structured batches.

This structural separation ensures that even if the analytical database goes down entirely for emergency maintenance, the ingestion microservice remains fully operational, safely capturing every transaction into a fault-tolerant message queue.

Section 2: Data Governance, Serialization, and Edge-Batching (Go, Avro, and Schema Registry)

At a volume of tens of millions of records, the formatting of data on the wire dictates both network costs and CPU utilization. The industry-standard default for web microservices is plaintext JSON. While JSON is highly human-readable and trivial to implement, I recognized it as a severe architectural bottleneck at scale.

Plaintext JSON requires repetitive key-value structures for every single row. If an audit log contains fifteen fields, those fifteen field names are serialized and transmitted over the network millions of times a day, resulting in significant data bloat. Furthermore, parsing plaintext strings into application memory requires intensive CPU cycle allocation for reflection and string manipulation.

To overcome this, I enforced a strict data governance model using Apache Avro for binary serialization, a Centralized Schema Registry, and an optimized edge-batching layer built natively into the Go producer application.

Schema Management and In-Memory Go Optimization

Apache Avro compiles data into a compact binary format that strips away all field names, leaving only the raw data bytes. To decode these bytes downstream, the consumer must know the exact schema version used to write them. I integrated a Centralized Schema Registry to handle this dependency seamlessly.

To eliminate the latency overhead of making an HTTP network request to the Schema Registry for every single incoming log event, I implemented an in-memory optimization strategy within the Go microservice. At application startup, the Go service executes a synchronous call to the Schema Registry to fetch the latest schema definitions for all six core logging topics. These schema structures are compiled and cached directly in the application's RAM.

When a live audit event hits the ingestion endpoint, the Go application passes the payload through the pre-compiled in-memory Avro schema. The data converts into raw binary instantly, and the 5-byte Confluent wire-format header containing the specific Schema ID is prepended to the payload. This optimization keeps producer-side serialization latency down to the microsecond level.

Go Edge-Batching: Protecting Kafka from Request Exhaustion

Once the data is serialized into a compact binary format, I design the system to hold off on instantly firing it over the network to Kafka. Doing so would exhaust the Go network socket pool and flood the Kafka brokers with millions of small, fragmented TCP requests. Instead, I built an edge-batching mechanism using native Go primitives like channels and worker pools.

When an Avro-encoded message is ready, it is dispatched into a buffered Go channel. A pool of background worker goroutines monitors this channel, accumulating individual messages into an in-memory slice.

The edge-batcher relies on a dual-trigger strategy to flush data to Kafka:

Size Threshold: The batch immediately flushes if the accumulated messages reach a specific memory threshold or a record count of 5,000 records.
Time Threshold: A background time ticker triggers a mandatory flush every few hundred milliseconds, even if the record count has not been met. This guarantees that log data never sits stale in application memory, keeping real-time streaming constraints intact.

By grouping individual events into highly compressed binary micro-batches right at the edge of the microservice, I maximize network packet utilization and ensure that the producer communicates with the Kafka brokers with optimal efficiency.

Asynchronous Batching on the Consumer Side

I mirror this batching philosophy on the consumer side. Apache Kafka is built to handle sequential block disk reads and writes exceptionally well. The pipeline completely avoids single-row processing loops where a consumer pulls one record, writes it to the database, and commits the offset.

Instead, the analytical database layer leverages native streaming connectors designed to pull massive chunks of records from Kafka topics in parallel. The database engine buffers these streams, performing heavy bulk-inserts that align perfectly with the message batches produced by the Go server. This creates an end-to-end, stream-batched pipeline that reduces network friction and ensures systemic durability.

Section 3: Hardening the Ingestion Pipeline (Sarama Go Producer & KRaft)

A distributed messaging pipeline is only as reliable as its weakest link. If a network partition occurs between the microservice and the message broker, or if a hosting node drops out unexpectedly, standard default cluster setups will silently leak data. To build a system that guarantees 0% data loss under high-burst traffic anomalies, I had to tightly coordinate and harden both the physical topography of the Apache Kafka cluster and the configuration of the Go producer client.

In this architecture, I moved completely away from legacy ZooKeeper deployments, choosing instead a modern Kafka Raft (KRaft) consensus model to manage cluster metadata natively. I then paired this cluster with a highly tuned Go implementation using the Sarama library.

The Cluster Topography and Partitioning Strategy

Managing data state across separate infrastructure nodes requires a deliberate layout. I structured the message broker layer with a strict, active-active topology designed for immediate failover.

3-Node Co-located KRaft Cluster: The infrastructure consists of 3 dedicated Kafka brokers. Running in KRaft mode means these nodes act simultaneously as brokers and metadata controllers. Metadata is managed as an internal raft log, eliminating the synchronization bottlenecks and external JVM memory footprint associated with ZooKeeper. If a leader broker fails, a new controller election happens within milliseconds.
Topic Architecture: The microservice publishes data across 6 core logging topics, with each topic segregated by log classification or source category.
3x Replication Factor: I explicitly provisioned every single topic with a replication factor of 3. For every log packet written, one primary broker holds the partition leader, while the remaining two brokers hold matching partition followers. The cluster can tolerate the absolute catastrophic failure of a physical node without any data exposure or service disruption.
3-Partition Parallelism: I divided each topic into 3 distinct partitions. This matches the broker count, distributing the layout so that each physical node acts as a leader for one partition per topic. This prevents a single broker from bearing the brunt of the disk I/O, allowing parallel write streams across the machines.

Deep-Dive: Hardening the Sarama Go Producer Configuration

In Golang, the Sarama client library provides low-level control over Kafka communication protocol behavior. To enforce a true zero-data-loss mandate while maximizing the edge-batching capabilities, I explicitly tuned the sarama.Config initialization away from default values.

The configuration layout is defined through three core pillars: Durability, Idempotency, and Producer-Side Batching.

1. Strict Durability Guarantees

config := sarama.NewConfig() config.Producer.RequiredAcks = sarama.WaitForAll

By default, a producer might use a weaker acknowledgment setting, which returns a success status as soon as the partition leader writes the record to its local memory cache. If that broker fails before replicating the data to its peers, that log is lost forever. Setting sarama.WaitForAll forces an atomic write transaction: the Go producer blocks and waits until the leader broker and all in-sync replicas (ISRs) have successfully committed the batch to their write-ahead logs.

2. Idempotency and Ordering Safeguards

config.Producer.Idempotent = true config.Net.MaxOpenRequests = 1

Enabling Idempotent = true is a critical mechanism for zero-loss integrity. If a network glitch occurs exactly after Kafka writes a batch of data but right before it can send the success acknowledgment back to the Go microservice, the application will naturally attempt a retry. Without idempotency, this results in duplicate data downstream.

With idempotency active, Kafka assigns a unique producer ID and tracking sequence number to every batch sent by the Go client. If Kafka detects a sequence number it has already committed, it discards the duplicate write while returning a success status to Go. To ensure idempotency behaves correctly under the hood, I clamped MaxOpenRequests to 1, guaranteeing that the pipeline preserves absolute chronological sorting order during in-flight retries.

3. Optimizing the Sarama Batching Layer

To work hand-in-hand with the edge-batching goroutine architecture, I explicitly configured Sarama's internal flush triggers to align with the high-throughput requirements.

config.Producer.Flush.Messages = 5000 config.Producer.Flush.Frequency = 200 * time.Millisecond config.Producer.Flush.Bytes = 1024 * 1024 // 1 MB Buffer size config.Producer.Return.Successes = true config.Producer.Return.Errors = true

These parameters ensure that the Sarama client acts as a secondary efficiency gate. Rather than flooding the network with individual requests, Sarama groups incoming messages. It will hold data in memory and execute a high-speed sequential socket flush only when 5,000 logs have accumulated, 1 megabyte of binary data is reached, or 200 milliseconds have elapsed, whichever boundary condition is satisfied first.

4. Resiliency and Exponential Backoff Retries

config.Producer.Retry.Max = 15 config.Producer.Retry.Backoff = 100 * time.Millisecond

If a broker undergoes a brief network hiccup or partition leader rebalancing, I must prevent the producer from instantly giving up and throwing a critical error. I configured the application to retry up to 15 times, introducing a structural sleep buffer between retries that scales dynamically. This allows transient infrastructural blips to self-heal transparently in the background, keeping the data pipeline active and uninterrupted.

Section 4: ClickHouse Architecture for High-Volume Writes (Kafka Engine and Keeper)

Choosing the right database for an analytical pipeline requires understanding the physical trade-offs of how data hits the disk. ClickHouse is an incredibly fast, column-oriented OLAP database capable of scanning billions of rows in milliseconds. However, it achieves this performance through a highly specialized storage design that introduces a major architectural challenge when handling streaming data: the "Too Many Parts" problem.

To scale the storage layer to support tens of millions of rows cleanly, I had to design an ingestion pipeline that respects ClickHouse's structural boundaries while removing any reliance on external infrastructure like ZooKeeper. I achieved this by pairing ClickHouse’s native Kafka Engine with a distributed replication topology managed entirely by ClickHouse Keeper.

Mitigating the "Too Many Parts" Bottleneck

ClickHouse stores data in immutable directory structures on disk called "parts." When you insert data, ClickHouse creates a new part, and a background merge process continuously combines these smaller parts into larger, optimized ones.

If you execute small, frequent, single-row inserts (such as writing individual log events as they happen), ClickHouse will generate parts faster than the background threads can merge them. This eventually triggers a protective database panic: Too many parts in all data parts in table, which instantly stalls all incoming writes.

To completely bypass this constraint, I avoided writing a custom consumer middleware in Go to bridge Kafka and ClickHouse. Instead, I let ClickHouse manage its own batching using its native Kafka Engine table modifier.

I set up two of the three nodes in the ClickHouse cluster as active ingestion servers. I split the workload evenly between them, routing three of the six Kafka topics to the first node and the remaining three topics to the second node.

The Kafka Engine table functions as a persistent, streaming consumer group. Rather than writing individual entries, it polls Kafka and pulls data in large blocks. It buffers these blocks in memory and flushes them to disk only when it satisfies an optimal boundary condition, such as hitting 10,000 records or waiting for a specific time threshold to elapse.

Piping Streams via Materialized Views

Because a Kafka Engine table behaves like a shifting stream, it cannot be queried directly for analytics. When a message is read, the offset advances and the data vanishes from the engine's perspective. To store the logs permanently, I used a design pattern combining a streaming engine table, a permanent storage table, and an active Materialized View acting as the glue between them.

First, I defined the source streaming interface table using the AvroConfluent format parameters discussed in Section 2:

CREATE TABLE default.kafka_audit_stream
(
    event_id String,
    timestamp Int64,
    actor_id String,
    action_type String,
    source_ip String,
    status String
)
ENGINE = Kafka
SETTINGS 
    kafka_broker_list = 'kafka-1:9092,kafka-2:9092,kafka-3:9092',
    kafka_topic_list = 'security-audit-logs',
    kafka_group_name = 'clickhouse-ingest-group',
    kafka_format = 'AvroConfluent',
    format_avro_schema_registry_url = 'http://schema-registry:8081';

Next, I created the destination table where the data resides permanently. I chose the ReplicatedReplacingMergeTree engine, which handles data deduplication automatically based on a primary sorting key during background disk merges:

CREATE TABLE default.replicated_audit_logs
(
    event_id String,
    event_date Date,
    timestamp Int64,
    actor_id String,
    action_type String,
    source_ip String,
    status String
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/replicated_audit_logs', '{replica}')
PARTITION BY toYYYYMM(event_date)
PRIMARY KEY (action_type, event_date)
ORDER BY (action_type, event_date, timestamp, event_id);

Finally, I bound the two structures together with a MATERIALIZED VIEW. The Materialized View acts as a reactive database trigger. The moment the Kafka Engine table finishes pooling a block of messages, the Materialized View wakes up, transforms the data types if necessary, and writes the batch into the permanent columnar table:

CREATE MATERIALIZED VIEW default.mv_kafka_to_audit_logs TO default.replicated_audit_logs AS
SELECT 
    event_id,
    toDate(toDateTime(timestamp)) AS event_date,
    timestamp,
    actor_id,
    action_type,
    source_ip,
    status
FROM default.kafka_audit_stream;

Multi-Node Coordination via ClickHouse Keeper

To manage data replication and cluster synchronization without the operational overhead of ZooKeeper, I implemented ClickHouse Keeper. Keeper is an internal coordination system written in C++ that executes the Raft consensus protocol natively within the ClickHouse binary.

I deployed ClickHouse Keeper across all three nodes in the cluster. This native coordination layer manages distributed DDL queries and handles replication logs with a minimal CPU and memory footprint compared to a separate Java-based ZooKeeper stack.

By declaring the destination table as a ReplicatedReplacingMergeTree, I established a resilient, circular replication topology across the entire three-node cluster. When Ingestion Node 1 processes a micro-batch from its Kafka topics and commits the blocks to disk, ClickHouse Keeper records the transaction and automatically coordinates the asynchronous replication of those data parts to Node 2 and Node 3 in the background.

This design means that while individual nodes specialize in handling specific ingestion streams to balance the write workload, the underlying consensus layer ensures that every node in the cluster eventually holds a matching copy of the entire analytical dataset.

Section 5: High Availability Reads and Smart Routing (chproxy)

In a distributed cluster where data eventually replicates across all instances, any individual node can technically serve analytical queries. However, running heavy, user-facing reporting queries on the exact same nodes executing high-speed, continuous batch insertions from Kafka creates severe hardware resource contention. CPU cycles, RAM, and disk I/O are finite resources. If an application executes a complex monthly aggregate query on a node currently flushing a massive, compressed Avro batch, query latencies will spike, or worse, the ingestion pipeline will begin to lag.

To enforce a strict separation of concerns at the compute layer while completely eliminating any single point of failure for our dashboards, I integrated chproxy. This is a dedicated HTTP proxy and load balancer designed specifically to handle traffic routing, user queuing, and caching for ClickHouse clusters.

Implementing Sticky Read Isolation

Instead of treating all three database servers as identical targets in a generic round-robin load-balancing rotation, I designed a configuration that splits the cluster into distinct functional profiles based on priority routing.

I designated Ingestion Node 1 and Ingestion Node 2 to handle the heavy lifting of streaming writes from Kafka. I then dedicated Node 3 primarily to executing user-facing analytical queries. By leveraging chproxy's native user routing and node routing configurations, I achieved total isolation of the query traffic under normal operating conditions.

To implement this architecture seamlessly without changing application-side query logic, I configured the routing rules inside a centralized chproxy.yaml file:

# chproxy.yaml - Operational Topology Configuration
clusters:
  - name: "production_logging_cluster"
    nodes:
      # Node 3 is assigned the highest priority (1) for analytics reading
      - { url: "http://clickhouse-node-03:8123", priority: 1 }

      # Nodes 1 and 2 are assigned priority 2, acting strictly as read failovers
      - { url: "http://clickhouse-node-01:8123", priority: 2 }
      - { url: "http://clickhouse-node-02:8123", priority: 2 }

users:
  - name: "reporting_service"
    password: "secure_api_password"
    to_cluster: "production_logging_cluster"
    to_user: "default"

    # Enforce traffic constraints inside the proxy layer
    max_execution_time: 30s
    max_concurrent_queries: 32

The Failover Mechanics: Maintaining 100% Query Uptime

This proxy configuration achieves two critical engineering objectives simultaneously:

The Steady State (Normal Operations): When the Go Reporting API requests data using the reporting_service credentials, chproxy evaluates the cluster priorities. It sees that Node 3 holds priority: 1, meaning it handles 100% of the analytical query traffic. Nodes 1 and 2 preserve all of their hardware resources exclusively for streaming Kafka logs into disk parts. This isolation ensures that dashboard query response times consistently hover around the 150 ms mark, unbothered by write amplification.
The Failover State (High Availability): If Node 3 undergoes routine OS patches, suffers a hardware failure, or drops offline due to a network blip, chproxy instantly detects the health-check failure. Within milliseconds, it marks Node 3 as unhealthy and dynamically shifts all incoming reporting queries to the priority: 2 servers.

Because ClickHouse Keeper maintains continuous circular asynchronous replication across the entire cluster, Nodes 1 and 2 can serve the exact same analytical data seamlessly. The corporate dashboards remain online without a single second of downtime, and the system gracefully degrades performance slightly until Node 3 is recovered and automatically reassumes the primary read workload.

Section 6: Operational Visibility and Native Alerting

When engineering a distributed system, it is easy to default to deploying an industry-standard monitoring matrix like Prometheus, Alertmanager, and Grafana. While that ecosystem is incredibly powerful for macro-level infrastructure monitoring, it introduces substantial operational complexity, a non-trivial memory footprint, and separate storage engines that must be maintained.

Because the pipeline relies on the foundational robustness of Apache Kafka as a durable shock absorber, I realized that I did not need real-time, sub-second scraping matrices to protect the system data. Instead, I built a lightweight, specialized background daemon service that runs natively as a scheduled cron execution utility to provide direct operational visibility.

The Five-Minute Health Matrix

This monitoring service wakes up every five minutes and executes two distinct, low-overhead check routines designed to catch systemic ingestion failures before they escalate into data loss incidents.

1. Calculating Consumer Ingestion Lag
The primary metric that indicates the health of a streaming ingestion pipeline is consumer lag. Lag represents the distance between the latest data produced to a Kafka partition and the latest data read by the database consumer group.

To fetch this information without installing heavy external monitoring agents, my Go utility interfaces directly with Kafka's KRaft admin metadata endpoints using a clean client interface. The utility calculates the delta programmatically:

Consumer Lag = Partition Log End Offset - ClickHouse Consumer Committed Offset

If the calculated lag across any of the six logging topics crosses a safety threshold of 50,000 records, it means that while the Go producer is functioning perfectly, ClickHouse has stalled. This behavior typically points to an underlying issue, such as an exhausted disk I/O pool, a table locking exception, or network degradation between the brokers and the database nodes.

2. ClickHouse Node Availability Verifications
The second routine performs a rapid validation of the storage cluster's physical layer. The daemon issues lightweight HTTP requests to the chproxy interface targeting the native /ping health endpoints of each individual database instance.

If a node fails to respond with a standard HTTP 200 OK status within a strict three-second timeout window, the daemon flags that specific node instance as offline.

Micro-Alerting and the Kafka Safety Buffer

If either check routine registers a violation, the monitoring utility immediately bypasses complex alerting middleware and shoots a structured JSON payload directly to a configured corporate webhook (such as Slack or Microsoft Teams). This high-priority alert drops critical diagnostic data directly into the engineering team's triage channel, highlighting the exact topic experiencing lag or the precise node IP that has stopped responding.

The beauty of this architectural approach lies in the interplay between the lightweight visibility tool and Kafka's retention policies. Because every topic in the KRaft cluster is hardened with a strict 72-hour log retention lifecycle, an engineer does not need to wake up at 3:00 AM for a minor database hiccup.

If ClickHouse crashes entirely, the Go microservice continues to serialize and dump micro-batches into Kafka safely. The background daemon flags the failure within five minutes, giving the engineering team a comfortable window of hours or even days to log in, analyze the infrastructure, and restart the database services. Once ClickHouse comes back online, it reads its last committed offset from the KRaft metadata layer and processes the accumulated backlog at maximum disk speed without dropping a single byte of data.

Section 7: Summary of Architectural Trade-offs and System Matrix

Engineering a distributed system is rarely about finding a flawless solution; it is about choosing which trade-offs you are willing to accept. By choosing to decouple the ingestion layer from the analytical storage layer, I traded real-time, strong transactional consistency for massive write throughput and high availability.

Because data replicates asynchronously across the ClickHouse cluster, a log written to Kafka might take a few seconds to reflect on the reporting dashboard. For compliance and audit trail use cases, this eventual consistency is an incredibly reasonable trade-off. The immediate safety of the data on disk inside the KRaft cluster is significantly more valuable than seeing a dashboard chart update a fraction of a second faster.

To summarize how this architecture behaves under various real-world operational scenarios, I mapped out a definitive system state matrix. This serves as a playbook for what happens when infrastructure encounters turbulent production environments.

The Operational State Matrix

1. Steady State Operation

Producer App Behavior: The Go microservice smoothly groups incoming logs into Avro-serialized micro-batches, instantly flushing them to Kafka every 5,000 records or 200 milliseconds.
Kafka Cluster Status: Healthy. Partitions are evenly balanced across the 3 KRaft nodes.
ClickHouse Cluster Status: Healthy. Ingestion Nodes 1 and 2 pull blocks via the native Kafka Engine. ClickHouse Keeper replicates data parts to Node 3 asynchronously.
Dashboard Read Path: chproxy routes 100% of read traffic directly to Node 3 Query latencies remain rock-solid at sub-150ms.
Data Loss Risk: 0% Risk.

2. Single Kafka Broker Failure

Producer App Behavior: The Sarama client transparently routes traffic away from the failed broker using local metadata updates. It experiences zero dropped messages thanks to retry loops.
Kafka Cluster Status: Degraded but functional. The remaining 2 KRaft nodes elect new partition leaders instantly.
ClickHouse Cluster Status: Healthy. The Kafka Engine consumers reconnect to the surviving brokers and continue polling streams.
Dashboard Read Path: Unaffected. chproxy continues routing reporting queries to Node 3.
Data Loss Risk: 0% Risk. The 3x replication factor preserves complete data integrity across the remaining infrastructure.

3. ClickHouse Ingestion Node Failure (Node 1 or Node 2)

Producer App Behavior: Unaffected. The Go producer continues dumping micro-batches into the Kafka cluster without knowing a database node is offline.
Kafka Cluster Status: Healthy. The brokers act as a buffer, safely holding logs and advancing offsets only when acknowledged.
ClickHouse Cluster Status: Degraded. One ingestion stream halts. The remaining active ClickHouse node continues to ingest its assigned topics normally.
Dashboard Read Path: Unaffected. chproxy continues serving dashboard analytics from Node 3.
Data Loss Risk: 0% Risk. Kafka securely retains unread logs for up to 72 hours. Once the failed ingestion node recovers, it resumes reading from its last tracked offset.

4. ClickHouse Analytical Node Failure (Node 3)

Producer App Behavior: Unaffected. The write pipeline continues moving at maximum velocity.
Kafka Cluster Status: Healthy.
ClickHouse Cluster Status: Functional. Nodes 1 and 2 continue consuming Kafka streams and storing raw logs to disk normally.
Dashboard Read Path: Handled gracefully. chproxy instantly flags Node 3 as unhealthy and reroutes 100% of user reporting queries to Nodes 1 and 2. Dashboard uptime remains at 100%.
Data Loss Risk: 0% Risk. Data replication is unaffected.

5. Total Database Cluster Outage (Nodes 1, 2, and 3 Offline)

Producer App Behavior: Unaffected. The Go application experiences zero pressure because it communicates exclusively with the messaging layer.
Kafka Cluster Status: Healthy. The 3-node KRaft cluster acts as a massive shock absorber, storing all incoming binary payloads sequentially on disk.
ClickHouse Cluster Status: Critical. Ingestion halts completely and dashboards become temporarily unavailable.
Dashboard Read Path: chproxy returns HTTP 502 Bad Gateway errors as all backend targets fail health checks.
Data Loss Risk: 0% Risk, provided the database cluster is recovered within the 72-hour window. As soon as the nodes are restarted, ClickHouse triggers massive parallel bulk-reads from Kafka, clearing the backlog at maximum disk speed until consumer lag drops back to zero.

Final Thoughts

By moving away from standard relational database paradigms and leaning heavily into distributed patterns, I constructed an architecture capable of processing tens of millions of records with absolute durability.

Using Go’s native concurrency primitives allows for highly efficient serialization and batching right at the edge of the application. Pairing that performance with the raw ingestion power of a column-oriented database like ClickHouse creates a resilient production pipeline that handles massive traffic anomalies with ease. This design proves that with the right combination of data governance, architectural decoupling, and smart proxy routing, you never have to compromise between write durability and analytical read speed.

DEV Community