
Alexander Alten

What Breaks When Kafka Meets Iceberg at Scale

Disclaimer: I work on KafScale. But I've also spent time in GitHub issues for Kafka Connect, Flink, and Hudi trying to understand why Kafka-to-Iceberg pipelines break in production. I wrote a longer version of this on our company blog with more detail on each failure mode.

The marketing makes it look simple. The reality is different.

The streaming integration tax


The Problem

Every data team eventually wants the same thing: streaming data in queryable tables. Kafka handles the streaming. Iceberg won the table format war. Getting data from one to the other should be straightforward.

It's not.

Search GitHub for "Iceberg sink" and you'll find 344 open issues. "Kafka Connect Iceberg" adds 89 more. "Flink Iceberg checkpoint" brings 127. "Hudi streaming" pulls up over 1,200.

I read through enough of them to see the same failures repeating.


What Actually Breaks

Kafka Connect Iceberg Sink

The connector looks straightforward in demos. Production is different.

Silent coordinator failures. Mismatch the consumer group IDs and the connector fails silently. No error messages. Data flows in, nothing comes out. You find out three hours later.

Dual offset tracking. Offsets stored in two different consumer groups. Reset one, forget the other, lose data or duplicate it.

Schema evolution crashes. Drop a column, recreate it with a different type. Connector crashes.

Version hell. The Avro converter wants Avro 1.11.4; Iceberg 1.8.1 ships with 1.12.0. ClassNotFoundException at startup.

Timeout storms. Under load: TimeoutException: Timeout expired after 60000ms while awaiting TxnOffsetCommitHandler. Task killed. Manual intervention required.
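
The silent-coordinator and dual-offset failures above look the same from the outside: records keep landing in the topic while committed offsets stall, or the connector's two consumer groups drift apart. Here's a minimal watchdog sketch in Python using kafka-python; the group names are assumptions, not something the connector guarantees, so check what your deployment actually registers.

from kafka.admin import KafkaAdminClient

WORKER_GROUP = "connect-iceberg-sink"       # Connect worker's consumer group (assumed name)
CONTROL_GROUP = "cg-control-iceberg-sink"   # connector's coordination group (assumed name)

admin = KafkaAdminClient(bootstrap_servers="kafka:9092")

def committed(group_id):
    # {(topic, partition): committed offset} for one consumer group
    return {
        (tp.topic, tp.partition): meta.offset
        for tp, meta in admin.list_consumer_group_offsets(group_id).items()
    }

worker = committed(WORKER_GROUP)
control = committed(CONTROL_GROUP)
for tp, offset in worker.items():
    drift = offset - control.get(tp, 0)
    if drift > 10_000:  # tune the threshold; the point is to find out in minutes, not hours
        print(f"offset drift on {tp}: worker at {offset}, control group {drift} behind")

It won't fix anything, but it turns a three-hour silent failure into a five-minute alert.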

Flink + Iceberg

Flink is the standard answer. It comes with its own problems.

Small file apocalypse. Frequent checkpoints create thousands of KB-sized files. Query performance collapses. Metadata overhead explodes.

Compaction conflicts. Compact the same partition your streaming job is writing to and you get write failures or corruption.

Checkpoint ghost commits. Checkpoints complete but metadata files don't update. Tencent built a custom operator because the default "will be invalid."

Recovery failures. FileNotFoundException on checkpoint recovery. No automatic fix.
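
The usual mitigations: stretch the checkpoint interval (the Flink Iceberg sink commits on checkpoint, so fewer checkpoints mean fewer, larger files) and run compaction as a separate batch job that never touches partitions the streaming job is still writing. Here's a sketch of the second half using Iceberg's standard rewrite_data_files procedure from Spark; the catalog, table, and partition column names are placeholders.

from datetime import date, timedelta
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-compaction").getOrCreate()

# Only compact partitions the streaming writer has already moved past, so the
# live partition is never rewritten underneath the Flink job.
cutoff = (date.today() - timedelta(days=1)).isoformat()

spark.sql(f"""
    CALL my_catalog.system.rewrite_data_files(
        table   => 'analytics.events',
        where   => "event_date < DATE '{cutoff}'",
        options => map('target-file-size-bytes', '536870912')
    )
""")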

Hudi

Similar story.

30-minute latency. Kafka to Spark to Hudi to AWS. A pipeline that should take seconds takes half an hour.

Upgrade breakage. Upgrading from 0.12.1 to 0.13.0 breaks the second micro-batch.

Connection pool exhaustion. With the metadata service enabled, HTTP connections leak.


The Cost

A 1 GiB/s streaming pipeline writing to Iceberg through connectors can cost $3.4 million annually in duplicate storage and transfer fees.

Data gets written to Kafka, copied to a connector, transformed, written to S3, then registered in a catalog. Four hops. Four failure points. Four cost centers.
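
The exact figure depends on retention, replication, and region, but the shape of the math is easy to sanity-check. The rates below are illustrative, not anyone's published price list.

GIB = 1024**3
SECONDS_PER_MONTH = 60 * 60 * 24 * 30

ingest_gb_per_month = 1 * GIB * SECONDS_PER_MONTH / 1e9   # ~2.8 million GB/month at a sustained 1 GiB/s

copies = 2               # raw segments in Kafka's storage plus Parquet in the Iceberg table
storage_rate = 0.023     # USD per GB-month, roughly S3 Standard
hops = 2                 # extra network hops through the connector tier
transfer_rate = 0.02     # USD per GB per hop, roughly cross-AZ / inter-service transfer

# Storing one month's worth of data, in two copies, for a year; real retention
# policies make this bigger because every month adds another slice.
storage_per_year = ingest_gb_per_month * copies * storage_rate * 12
transfer_per_year = ingest_gb_per_month * hops * transfer_rate * 12

print(f"storage ~${storage_per_year / 1e6:.1f}M/yr, transfer ~${transfer_per_year / 1e6:.1f}M/yr")

Change the assumptions and the total moves, but with two copies of every byte and a couple of extra hops, millions per year is the right order of magnitude.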


The Options

1. Kafka Connect Iceberg Sink

Works if you have simple schemas, moderate throughput, and an ops team that knows Connect.

Doesn't work if you have schema evolution, high throughput, or need reliability without manual intervention.

2. Flink

Works if you have Flink expertise and can tune checkpoints, manage compaction separately, handle the small file problem.

Doesn't work if you want something simple or don't have Flink ops experience.

3. Confluent Tableflow

Works if you're on Confluent Cloud and topics have schemas.

Doesn't work for topics without schemas, self-managed Kafka, or external catalog sync.

Upsert mode has limits: 30B unique keys, 20K events/sec under 6B rows. Additional charges coming 2026.

4. Storage-Native Architecture

This is the approach I took with KafScale.

Write streaming data directly to S3 in a format analytical tools can read. No connector layer. No broker involvement for reads.

The Iceberg Processor reads .kfs segments from S3, converts them to Parquet, and writes to Iceberg tables. It works with Unity Catalog, Polaris, and AWS Glue.

Zero broker load for analytical workloads.

apiVersion: kafscale.io/v1
kind: IcebergProcessor
metadata:
  name: events-to-iceberg
spec:
  source:
    topics: [events]
    s3:
      bucket: kafscale-data
      prefix: segments/
  sink:
    catalog:
      type: unity
      endpoint: https://workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg
      warehouse: /Volumes/main/default/warehouse
    database: analytics
  processing:
    parallelism: 4
    commitIntervalSeconds: 60

The tradeoff: you're coupled to the .kfs format, not just the Kafka protocol. But the format is documented and public, and it's more stable than trying to keep Kafka Connect and Flink versions aligned.


When to Use What

Simple schemas, low throughput, Connect expertise? Kafka Connect Iceberg Sink.

Complex transformations, Flink team on staff? Flink.

Confluent Cloud, schema registry everywhere? Tableflow.

Want to skip the connector layer entirely? KafScale or similar storage-native approach.

Need transactions? Flink or Tableflow. Not KafScale.


Why I Built Another Thing

I kept seeing the same pattern: teams build Kafka-to-Iceberg pipelines, hit one of these issues, spend weeks debugging, then either add more infrastructure or accept the latency.

The connector model assumes brokers are the only way to access data. That made sense when storage was expensive. S3 at $0.02/GB/month changed the math.

If your storage format is documented, processors can read directly from S3. No broker load. No connector framework. Kubernetes pods that scale independently.

The .kfs format is public. Build your own processors if you want. The Iceberg Processor is just the one we needed first.
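
For a sense of what "build your own" means, here's the shape of such a processor. This is not KafScale's Iceberg Processor, just a generic sketch of the pattern; decode_segment stands in for whatever parsing the documented segment format actually requires, and the bucket, prefix, and table names are placeholders.

import boto3
import pyarrow as pa
from pyiceberg.catalog import load_catalog

s3 = boto3.client("s3")
catalog = load_catalog("default")                # any Iceberg catalog: REST, Glue, Hive, ...
table = catalog.load_table("analytics.events")

def decode_segment(raw: bytes) -> pa.Table:
    # Hypothetical stand-in: parse one segment object into an Arrow table.
    raise NotImplementedError

resp = s3.list_objects_v2(Bucket="kafscale-data", Prefix="segments/")
for obj in resp.get("Contents", []):
    raw = s3.get_object(Bucket="kafscale-data", Key=obj["Key"])["Body"].read()
    table.append(decode_segment(raw))            # PyIceberg writes Parquet and commits to the catalog

No broker in the read path, no Connect cluster to babysit; the scaling unit is whatever runs this loop.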

