<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alexander Alten</title>
    <description>The latest articles on DEV Community by Alexander Alten (@novatechflow).</description>
    <link>https://dev.to/novatechflow</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3672851%2F8fbb013f-e3bb-467b-acec-a661b4fe9151.jpeg</url>
      <title>DEV Community: Alexander Alten</title>
      <link>https://dev.to/novatechflow</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/novatechflow"/>
    <language>en</language>
    <item>
      <title>What Breaks When Kafka Meets Iceberg at Scale</title>
      <dc:creator>Alexander Alten</dc:creator>
      <pubDate>Sat, 24 Jan 2026 10:29:26 +0000</pubDate>
      <link>https://dev.to/novatechflow/what-breaks-when-kafka-meets-iceberg-at-scale-2e82</link>
      <guid>https://dev.to/novatechflow/what-breaks-when-kafka-meets-iceberg-at-scale-2e82</guid>
      <description>&lt;p&gt;I work on &lt;a href="https://kafscale.io" rel="noopener noreferrer"&gt;KafScale&lt;/a&gt;, as disclaimer. But I've also spent time in GitHub issues for Kafka Connect, Flink, and Hudi trying to understand why Kafka-to-Iceberg pipelines break in production. I wrote a &lt;a href="https://www.scalytics.io/blog/47-github-issues-that-explain-your-iceberg-latency" rel="noopener noreferrer"&gt;longer version of this on our company blog&lt;/a&gt; with more detail on each failure mode.&lt;/p&gt;

&lt;p&gt;The marketing makes it look simple. The reality is different.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fth1qz6ar7tlvrlfenrom.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fth1qz6ar7tlvrlfenrom.png" alt="The streaming integration tax" width="800" height="628"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Every data team eventually wants the same thing: streaming data in queryable tables. Kafka handles the streaming. Iceberg won the table format war. Getting data from one to the other should be straightforward.&lt;/p&gt;

&lt;p&gt;It's not.&lt;/p&gt;

&lt;p&gt;Search GitHub for "Iceberg sink" and you'll find 344 open issues. "Kafka Connect Iceberg" adds 89 more. "Flink Iceberg checkpoint" brings 127. "Hudi streaming" pulls up over 1,200.&lt;/p&gt;

&lt;p&gt;I read through enough of them to see the same failures repeating.&lt;/p&gt;




&lt;h2&gt;What Actually Breaks&lt;/h2&gt;

&lt;h3&gt;Kafka Connect Iceberg Sink&lt;/h3&gt;

&lt;p&gt;The connector looks straightforward in demos. Production is different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Silent coordinator failures.&lt;/strong&gt; A mismatched consumer group ID makes the connector &lt;a href="https://github.com/apache/iceberg/issues/12610" rel="noopener noreferrer"&gt;fail silently&lt;/a&gt;. No error messages. Data flows in, nothing comes out. You find out three hours later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dual offset tracking.&lt;/strong&gt; Offsets stored in &lt;a href="https://github.com/databricks/iceberg-kafka-connect" rel="noopener noreferrer"&gt;two different consumer groups&lt;/a&gt;. Reset one, forget the other, lose data or duplicate it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema evolution crashes.&lt;/strong&gt; Drop a column, recreate it with a different type. &lt;a href="https://github.com/getindata/kafka-connect-iceberg-sink" rel="noopener noreferrer"&gt;Connector crashes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version hell.&lt;/strong&gt; The Avro converter &lt;a href="https://github.com/apache/iceberg/issues/12571" rel="noopener noreferrer"&gt;wants Avro 1.11.4&lt;/a&gt;; Iceberg 1.8.1 ships with 1.12.0. ClassNotFoundException at startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timeout storms.&lt;/strong&gt; Under load: &lt;code&gt;TimeoutException: Timeout expired after 60000ms while awaiting TxnOffsetCommitHandler&lt;/code&gt;. &lt;a href="https://github.com/apache/iceberg/issues/13457" rel="noopener noreferrer"&gt;Task killed&lt;/a&gt;. Manual intervention required.&lt;/p&gt;

&lt;h3&gt;Flink + Iceberg&lt;/h3&gt;

&lt;p&gt;Flink is the standard answer. It comes with its own problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Small file apocalypse.&lt;/strong&gt; Frequent checkpoints create &lt;a href="https://github.com/apache/iceberg/issues/7568" rel="noopener noreferrer"&gt;thousands of KB-sized files&lt;/a&gt;. Query performance collapses. Metadata overhead explodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compaction conflicts.&lt;/strong&gt; Compact the same partition your streaming job writes to. Get &lt;a href="https://github.com/apache/iceberg/issues/9089" rel="noopener noreferrer"&gt;write failures or corruption&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Checkpoint ghost commits.&lt;/strong&gt; Checkpoints complete but metadata files don't update. Tencent &lt;a href="https://github.com/apache/iceberg/issues/4557" rel="noopener noreferrer"&gt;built a custom operator&lt;/a&gt; because the default "will be invalid."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recovery failures.&lt;/strong&gt; FileNotFoundException on checkpoint recovery. No automatic fix.&lt;/p&gt;

&lt;h3&gt;Hudi&lt;/h3&gt;

&lt;p&gt;Similar story.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;30-minute latency.&lt;/strong&gt; Kafka to Spark to Hudi to AWS. A pipeline that should take seconds takes &lt;a href="https://github.com/apache/hudi/issues/11118" rel="noopener noreferrer"&gt;half an hour&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upgrade breakage.&lt;/strong&gt; Version 0.12.1 to 0.13.0 &lt;a href="https://github.com/apache/hudi/issues/8890" rel="noopener noreferrer"&gt;breaks second micro-batch&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection pool exhaustion.&lt;/strong&gt; Metadata service enabled, &lt;a href="https://github.com/apache/hudi/issues/8191" rel="noopener noreferrer"&gt;HTTP connections leak&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;The Cost&lt;/h2&gt;

&lt;p&gt;A 1 GiB/s streaming pipeline writing to Iceberg through connectors can cost &lt;a href="https://aiven.io/blog/why-dont-apache-kafka-and-iceberg-get-along" rel="noopener noreferrer"&gt;$3.4 million annually&lt;/a&gt; in duplicate storage and transfer fees.&lt;/p&gt;

&lt;p&gt;Data gets written to Kafka, copied to a connector, transformed, written to S3, then registered in a catalog. Four hops. Four failure points. Four cost centers.&lt;/p&gt;
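&lt;p&gt;The scale of that tax is easy to sanity-check with back-of-envelope arithmetic. The hop count comes from the paragraph above; everything else here is an illustrative assumption, not a reproduction of Aiven's cost model:&lt;/p&gt;

```python
# Back-of-envelope volume for a 1 GiB/s pipeline. Illustrative only:
# the $3.4M figure cited above is Aiven's, and this does not reproduce
# their cost model.
SECONDS_PER_YEAR = 365 * 24 * 3600

rate_gib_per_s = 1.0
yearly_pib = rate_gib_per_s * SECONDS_PER_YEAR / 1024 ** 2  # GiB to PiB
print(f"{yearly_pib:.1f} PiB ingested per year")  # about 30 PiB

# Every hop that stores or transfers the data again (broker, connector,
# S3 object, catalog-managed copy) multiplies the exposure.
hops = 4
print(f"up to {hops}x storage/transfer exposure across {hops} hops")
```

&lt;p&gt;At tens of PiB per year, even a fraction of a cent per GB per extra copy adds up to seven figures.&lt;/p&gt;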




&lt;h2&gt;The Options&lt;/h2&gt;

&lt;h3&gt;1. Kafka Connect Iceberg Sink&lt;/h3&gt;

&lt;p&gt;Works if you have simple schemas, moderate throughput, and an ops team that knows Connect.&lt;/p&gt;

&lt;p&gt;Doesn't work if you have schema evolution, high throughput, or need reliability without manual intervention.&lt;/p&gt;

&lt;h3&gt;2. Flink&lt;/h3&gt;

&lt;p&gt;Works if you have Flink expertise and can tune checkpoints, manage compaction separately, and handle the small file problem.&lt;/p&gt;

&lt;p&gt;Doesn't work if you want something simple or don't have Flink ops experience.&lt;/p&gt;

&lt;h3&gt;3. Confluent Tableflow&lt;/h3&gt;

&lt;p&gt;Works if you're on Confluent Cloud and topics have schemas.&lt;/p&gt;

&lt;p&gt;Doesn't work for &lt;a href="https://docs.confluent.io/cloud/current/topics/tableflow/overview.html" rel="noopener noreferrer"&gt;topics without schemas&lt;/a&gt;, self-managed Kafka, or external catalog sync.&lt;/p&gt;

&lt;p&gt;Upsert mode has limits: 30B unique keys, 20K events/sec under 6B rows. Additional charges coming 2026.&lt;/p&gt;

&lt;h3&gt;4. Storage-Native Architecture&lt;/h3&gt;

&lt;p&gt;This is the approach I took with KafScale.&lt;/p&gt;

&lt;p&gt;Write streaming data directly to S3 in a format analytical tools can read. No connector layer. No broker involvement for reads.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://kafscale.io" rel="noopener noreferrer"&gt;Iceberg Processor&lt;/a&gt; reads .kfs segments from S3, converts to Parquet, writes to Iceberg tables. Works with Unity Catalog, Polaris, AWS Glue.&lt;/p&gt;

&lt;p&gt;Zero broker load for analytical workloads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafscale.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IcebergProcessor&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;events-to-iceberg&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;topics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;events&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;s3&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;bucket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafscale-data&lt;/span&gt;
      &lt;span class="na"&gt;prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;segments/&lt;/span&gt;
  &lt;span class="na"&gt;sink&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;catalog&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unity&lt;/span&gt;
      &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg&lt;/span&gt;
      &lt;span class="na"&gt;warehouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/Volumes/main/default/warehouse&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;analytics&lt;/span&gt;
  &lt;span class="na"&gt;processing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;parallelism&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
    &lt;span class="na"&gt;commitIntervalSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tradeoff: you're coupled to the .kfs format, not just the Kafka protocol. But the format is &lt;a href="https://kafscale.io" rel="noopener noreferrer"&gt;documented and public&lt;/a&gt;, and it's more stable than trying to keep Kafka Connect and Flink versions aligned.&lt;/p&gt;




&lt;h2&gt;When to Use What&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Simple schemas, low throughput, Connect expertise?&lt;/strong&gt; Kafka Connect Iceberg Sink.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complex transformations, Flink team on staff?&lt;/strong&gt; Flink.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confluent Cloud, schema registry everywhere?&lt;/strong&gt; Tableflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want to skip the connector layer entirely?&lt;/strong&gt; KafScale or similar storage-native approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Need transactions?&lt;/strong&gt; Flink or Tableflow. Not KafScale.&lt;/p&gt;




&lt;h2&gt;Why I Built Another Thing&lt;/h2&gt;

&lt;p&gt;I kept seeing the same pattern: teams build Kafka-to-Iceberg pipelines, hit one of these issues, spend weeks debugging, then either add more infrastructure or accept the latency.&lt;/p&gt;

&lt;p&gt;The connector model assumes brokers are the only way to access data. That made sense when storage was expensive. S3 at $0.02/GB/month changed the math.&lt;/p&gt;

&lt;p&gt;If your storage format is documented, processors can read directly from S3. No broker load. No connector framework. Kubernetes pods that scale independently.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://kafscale.io" rel="noopener noreferrer"&gt;.kfs format&lt;/a&gt; is public. Build your own processors if you want. The Iceberg Processor is just the one we needed first.&lt;/p&gt;




&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.scalytics.io/blog/47-github-issues-that-explain-your-iceberg-latency" rel="noopener noreferrer"&gt;Full writeup on Scalytics blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kafscale.io" rel="noopener noreferrer"&gt;KafScale Iceberg Processor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://iceberg.apache.org/docs/latest/kafka-connect/" rel="noopener noreferrer"&gt;Apache Iceberg Kafka Connect Sink&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/cloud/current/topics/tableflow/overview.html" rel="noopener noreferrer"&gt;Confluent Tableflow Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aiven.io/blog/why-dont-apache-kafka-and-iceberg-get-along" rel="noopener noreferrer"&gt;Aiven: Why Kafka and Iceberg Don't Get Along&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kafka</category>
      <category>dataengineering</category>
      <category>iceberg</category>
      <category>devops</category>
    </item>
    <item>
      <title>SQL on Kafka Data Does Not Require a Streaming Engine</title>
      <dc:creator>Alexander Alten</dc:creator>
      <pubDate>Wed, 14 Jan 2026 07:55:00 +0000</pubDate>
      <link>https://dev.to/novatechflow/sql-on-kafka-data-does-not-require-a-streaming-engine-3kfe</link>
      <guid>https://dev.to/novatechflow/sql-on-kafka-data-does-not-require-a-streaming-engine-3kfe</guid>
      <description>&lt;p&gt;Stream processing engines solved a real problem: continuous computation over unbounded data. Flink, ksqlDB, and Kafka Streams gave teams a way to run SQL-like queries against event streams without writing custom consumers.&lt;/p&gt;

&lt;p&gt;The operational cost of that solution is widely acknowledged. Confluent's own documentation notes that Flink "poses difficulties with deployment and cluster operations, such as tuning performance or resolving checkpoint failures" and that "organizations using Flink tend to require teams of experts dedicated to developing and maintaining it."&lt;/p&gt;

&lt;p&gt;For a large share of the questions teams ask their Kafka data, a simpler architecture exists: SQL on immutable segments in object storage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ahgbacjjt30lnc0pf0n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ahgbacjjt30lnc0pf0n.png" alt="Most teams need a streaming interface, not a streaming engine." width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;What engineers actually ask Kafka&lt;/h3&gt;

&lt;p&gt;In production debugging sessions and ops reviews, the questions are repetitive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is in this topic right now?&lt;/li&gt;
&lt;li&gt;What happened around an incident window?&lt;/li&gt;
&lt;li&gt;Where is the message with this key?&lt;/li&gt;
&lt;li&gt;Are all partitions still producing data?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not streaming problems. They are bounded lookups over historical data. They run once, terminate, and do not need windows, watermarks, checkpoints, or state recovery.&lt;/p&gt;

&lt;h3&gt;Kafka data is already structured for this&lt;/h3&gt;

&lt;p&gt;Kafka does not persist records individually. It appends them to log segments and rolls those segments by size or time. Each partition is an ordered sequence of records; once a segment is closed, it is immutable.&lt;/p&gt;

&lt;p&gt;Kafka also maintains sparse indexes so readers can seek by offset and timestamp efficiently. Each segment file is accompanied by lightweight offset and timestamp indexes that allow consumers to seek directly to specific message positions without scanning entire files.&lt;/p&gt;
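&lt;p&gt;The lookup those sparse indexes enable can be sketched in a few lines. This illustrates the idea (find the rightmost index entry at or below the target, then scan forward), not Kafka's actual &lt;code&gt;.index&lt;/code&gt; file format:&lt;/p&gt;

```python
import bisect

# Sparse offset index sketch: (relative_offset, byte_position) pairs, one
# entry per indexed interval. This mirrors the idea behind Kafka's .index
# files, not their on-disk format.
index = [(0, 0), (100, 4096), (200, 8192), (300, 12288)]
offsets = [entry[0] for entry in index]

def seek(target_offset):
    """Byte position to start scanning from to reach target_offset."""
    # Rightmost index entry at or below the target.
    i = bisect.bisect_right(offsets, target_offset) - 1
    return index[max(i, 0)][1]

print(seek(250))  # 8192: jump there, then read forward to offset 250
```

&lt;p&gt;Because the index is sparse, it stays small; the cost is a short forward scan from the indexed position.&lt;/p&gt;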

&lt;p&gt;Retention deletes whole segments. Compaction rewrites segments. This means Kafka data is already organized like a SQL-on-files dataset. The only difference is where the files live.&lt;/p&gt;

&lt;p&gt;Since Kafka 3.6.0, tiered storage allows these segments to live in object storage like S3. As of Kafka 3.9.0, this feature is production-ready. Durability is now decoupled from compute without changing the data model.&lt;/p&gt;

&lt;h3&gt;The streaming engine "tax"&lt;/h3&gt;

&lt;p&gt;Streaming engines pay for capabilities most queries never use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed state backends&lt;/li&gt;
&lt;li&gt;Coordinated checkpoints&lt;/li&gt;
&lt;li&gt;Watermark tracking&lt;/li&gt;
&lt;li&gt;Long-running cluster operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That cost is justified for continuous aggregation, joins, and real-time inference.&lt;/p&gt;

&lt;p&gt;It is wasted for "show me the last 10 messages".&lt;/p&gt;

&lt;p&gt;Production experience confirms this. Riskified migrated from ksqlDB to Flink, noting that ksqlDB's strict limitations on evolving schemas made it impractical for real-world production use cases and that operational complexity required fighting the system more than working with it.&lt;/p&gt;

&lt;p&gt;The scale mismatch is also documented. Vendor surveys from Confluent and Redpanda show that approximately 56% of all Kafka clusters run at or below 1 MB/s. Most Kafka usage is small-data, yet teams pay big-data operational costs.&lt;/p&gt;

&lt;h3&gt;SQL on immutable segments&lt;/h3&gt;

&lt;p&gt;If Kafka data lives as immutable segments with sparse indexes, querying it looks like any other SQL-on-files workload.&lt;/p&gt;

&lt;p&gt;The query planner:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resolves the topic to segment files&lt;/li&gt;
&lt;li&gt;Filters by timestamp or offset metadata&lt;/li&gt;
&lt;li&gt;Reads only relevant segments&lt;/li&gt;
&lt;li&gt;Applies predicates and returns results&lt;/li&gt;
&lt;/ul&gt;
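&lt;p&gt;The planner steps above can be sketched as a plain function over segment metadata. The dict shape here is hypothetical, purely for illustration:&lt;/p&gt;

```python
# Illustrative bounded-query planner over segment metadata. The segment
# dict shape is hypothetical, not a real on-disk format.
segments = [
    {"path": "seg-000", "min_ts": 100, "max_ts": 199,
     "records": [{"ts": 150, "key": "order-a"}]},
    {"path": "seg-001", "min_ts": 200, "max_ts": 299,
     "records": [{"ts": 250, "key": "order-b"}]},
    {"path": "seg-002", "min_ts": 300, "max_ts": 399,
     "records": [{"ts": 350, "key": "order-a"}]},
]

def query(lo, hi, predicate):
    # Prune: keep only segments whose [min_ts, max_ts] overlaps [lo, hi].
    relevant = [s for s in segments
                if s["max_ts"] >= lo and hi >= s["min_ts"]]
    # Scan the survivors, applying time bounds and the row predicate.
    return [rec for seg in relevant for rec in seg["records"]
            if rec["ts"] >= lo and hi >= rec["ts"] and predicate(rec)]

print(query(100, 299, lambda r: r["key"] == "order-a"))  # the ts=150 record
```

&lt;p&gt;Segment pruning is the whole trick: the time bounds eliminate files before a single byte is read.&lt;/p&gt;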

&lt;p&gt;No consumer groups. No offset commits. No streaming job lifecycle.&lt;/p&gt;

&lt;p&gt;Expose Kafka-native fields as columns and the common queries become trivial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;TAIL&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-08 09:00'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-08 09:05'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'order-12345'&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="s1"&gt;'24 hours'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not stream processing. It is indexed file access with SQL semantics.&lt;/p&gt;

&lt;h3&gt;Latency, realistically&lt;/h3&gt;

&lt;p&gt;Yes, object storage is slower than broker-local disk: remote reads carry noticeably higher latency than local block storage.&lt;/p&gt;

&lt;p&gt;That is fine.&lt;/p&gt;

&lt;p&gt;Most of these queries are debugging and ops workflows. Waiting one or two seconds is acceptable. Waiting minutes to deploy or restart a streaming job is not.&lt;/p&gt;

&lt;p&gt;If you need sub-second continuous results, use a streaming engine. That boundary is clear.&lt;/p&gt;

&lt;h3&gt;Cost visibility beats hidden complexity&lt;/h3&gt;

&lt;p&gt;The real risk with SQL on object storage is unbounded scans. Object storage pricing is calculated based on the amount of data stored and the number of API calls made.&lt;/p&gt;

&lt;p&gt;The solution is not more infrastructure. It is transparency.&lt;/p&gt;

&lt;p&gt;Every query should show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many segments will be read&lt;/li&gt;
&lt;li&gt;How many bytes will be scanned&lt;/li&gt;
&lt;li&gt;The estimated request cost&lt;/li&gt;
&lt;/ul&gt;
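&lt;p&gt;A pre-flight report like that is a few lines of arithmetic. The GET price below is an assumed ballpark figure, not a quoted rate; plug in your provider's pricing:&lt;/p&gt;

```python
# Pre-flight scan report for a bounded query. The GET price is an assumed
# ballpark figure, not a quoted rate.
GET_COST_PER_1K_USD = 0.0004

def scan_report(num_segments, bytes_per_segment):
    total_bytes = num_segments * bytes_per_segment
    return {
        "segments": num_segments,
        "gib_scanned": round(total_bytes / 1024 ** 3, 2),
        "request_cost_usd": round(num_segments / 1000 * GET_COST_PER_1K_USD, 6),
    }

# 1,200 segments of 256 MiB each: 300 GiB scanned for well under a cent
# in request fees. The surprise comes from unbounded scans, not bounded ones.
report = scan_report(num_segments=1200, bytes_per_segment=256 * 1024 ** 2)
print(report)
```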

&lt;p&gt;Queries without time bounds should require explicit opt-in.&lt;/p&gt;

&lt;p&gt;This keeps cost a conscious decision instead of a surprise.&lt;/p&gt;

&lt;h3&gt;Where streaming engines still belong&lt;/h3&gt;

&lt;p&gt;Streaming engines are still the right tool for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continuous aggregations&lt;/li&gt;
&lt;li&gt;Joins over live streams&lt;/li&gt;
&lt;li&gt;Real-time scoring&lt;/li&gt;
&lt;li&gt;Exactly-once outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most Kafka interactions are not those.&lt;/p&gt;

&lt;p&gt;They are lookups and inspections that were forced into streaming infrastructure because no better interface existed.&lt;/p&gt;

&lt;p&gt;Once Kafka data is durable as immutable segments, SQL becomes the simpler tool.&lt;/p&gt;




&lt;h3&gt;The takeaway&lt;/h3&gt;

&lt;p&gt;Most teams do not need a streaming engine to answer Kafka questions.&lt;/p&gt;

&lt;p&gt;They need a clean, bounded way to query immutable data.&lt;/p&gt;

&lt;p&gt;SQL on Kafka segments does exactly that.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Read a more detailed post at &lt;a href="https://www.novatechflow.com/2026/01/sql-on-streaming-data-does-not-require.html" rel="noopener noreferrer"&gt;https://www.novatechflow.com/2026/01/sql-on-streaming-data-does-not-require.html&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>sql</category>
      <category>dataengineering</category>
      <category>eventdriven</category>
    </item>
    <item>
      <title>S3-Native Kafka Alternatives: What's Actually Different</title>
      <dc:creator>Alexander Alten</dc:creator>
      <pubDate>Fri, 02 Jan 2026 15:03:01 +0000</pubDate>
      <link>https://dev.to/novatechflow/s3-native-kafka-alternatives-whats-actually-different-11d2</link>
      <guid>https://dev.to/novatechflow/s3-native-kafka-alternatives-whats-actually-different-11d2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fon9fbzlvet5sea9t37gn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fon9fbzlvet5sea9t37gn.png" alt="KafScale stateless processor architecture" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I work on KafScale, so take this with the appropriate grain of salt. But I've also spent time looking at WarpStream, AutoMQ, and Bufstream, and the marketing pages don't tell you what you actually need to know.&lt;/p&gt;

&lt;p&gt;They all store data in S3. They all claim to be cheaper than Kafka. Here's what's actually different.&lt;/p&gt;




&lt;h2&gt;WarpStream&lt;/h2&gt;

&lt;p&gt;Confluent bought them in September 2024. The agents run in your VPC, but metadata and coordination run in Confluent's cloud.&lt;/p&gt;

&lt;p&gt;Latency is 400-600ms p99. That's the cost of writing directly to S3 with no local buffer.&lt;/p&gt;

&lt;p&gt;If you're already a Confluent shop and want S3 pricing without running infrastructure, this makes sense. If you don't want a cloud dependency, look elsewhere.&lt;/p&gt;




&lt;h2&gt;AutoMQ&lt;/h2&gt;

&lt;p&gt;Fork of Kafka with a new storage layer. Uses EBS as a write-ahead log, then tiers to S3.&lt;/p&gt;

&lt;p&gt;Latency is around 10ms p99 because of the EBS buffer. That's close to real Kafka.&lt;/p&gt;

&lt;p&gt;The catch: you're still managing EBS volumes. It's simpler than Kafka, but it's not stateless. The license has also changed over time (BSL until May 2025), so read the current terms if you're building a platform.&lt;/p&gt;




&lt;h2&gt;Bufstream&lt;/h2&gt;

&lt;p&gt;From the Buf/Protobuf people. S3 for storage, PostgreSQL for metadata. Native Iceberg output.&lt;/p&gt;

&lt;p&gt;Latency around 500ms p99. Similar to WarpStream.&lt;/p&gt;

&lt;p&gt;If you're building on Iceberg and want Kafka-compatible ingestion, this is purpose-built for that. If you're not in the lakehouse world, the PostgreSQL dependency is extra infrastructure for no benefit.&lt;/p&gt;




&lt;h2&gt;KafScale&lt;/h2&gt;

&lt;p&gt;Stateless Go brokers, S3 for storage, etcd for coordination. Apache 2.0 license.&lt;/p&gt;

&lt;p&gt;Latency around 400ms p99. Same ballpark as WarpStream.&lt;/p&gt;

&lt;p&gt;No transactions. No compacted topics. If you need those, use something else.&lt;/p&gt;

&lt;p&gt;What's different: the segment format is documented and open. You can write processors that read directly from S3 without hitting brokers. That matters if you have analytical workloads (batch replay, Iceberg materialization, AI agents pulling context) that you want to keep separate from your streaming traffic.&lt;/p&gt;

&lt;p&gt;The tradeoff is coupling. Your processors depend on the &lt;code&gt;.kfs&lt;/code&gt; format, not just the Kafka protocol.&lt;/p&gt;




&lt;h2&gt;When to use what&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Need low latency (&amp;lt;100ms)?&lt;/strong&gt; AutoMQ or stick with Kafka/Redpanda.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want managed S3 streaming with Confluent ecosystem?&lt;/strong&gt; WarpStream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building on Iceberg?&lt;/strong&gt; Bufstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want Apache 2.0 license and direct S3 reads?&lt;/strong&gt; KafScale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Need transactions?&lt;/strong&gt; Not KafScale. Kafka or Bufstream.&lt;/p&gt;




&lt;h2&gt;The latency thing&lt;/h2&gt;

&lt;p&gt;400-500ms is fine for most workloads. Log aggregation, ETL, async events, audit trails. If you're honest about your actual requirements, you probably don't need 10ms.&lt;/p&gt;

&lt;p&gt;But if you do need it, the pure S3 options won't work for you. AutoMQ with EBS is the compromise.&lt;/p&gt;




&lt;h2&gt;The license thing&lt;/h2&gt;

&lt;p&gt;WarpStream is proprietary (Confluent). AutoMQ was BSL until May 2025[1]. Bufstream is proprietary. KafScale is Apache 2.0.&lt;/p&gt;

&lt;p&gt;If you care about this, you already know why. If you don't, it probably won't matter until it does.&lt;/p&gt;




&lt;h2&gt;Why I built another one&lt;/h2&gt;

&lt;p&gt;After looking at all of these, I still wrote KafScale. Here's why.&lt;/p&gt;

&lt;p&gt;WarpStream got the architecture right: stateless brokers, S3 storage, no disk ops. But it's proprietary and now owned by Confluent. I wanted that architecture without the dependency.&lt;/p&gt;

&lt;p&gt;More importantly, I wanted processors that bypass brokers entirely. Kubernetes pods that read directly from S3, process historical data, write to Iceberg, feed AI agents. No connector framework. No fighting for broker resources. Just pods and object storage.&lt;/p&gt;

&lt;p&gt;Someone pointed out that this makes processors "fat clients" coupled to the storage format. Fair. But Kafka's message format has had three versions in 15 years. V2 has been stable since 2017. The entire ecosystem depends on it not changing. That's a bet I'm willing to make.&lt;/p&gt;

&lt;p&gt;The alternative is routing everything through brokers. Then your batch replay jobs compete with your real-time consumers. Your AI training pipeline spikes latency for everyone. That's the problem I was trying to solve.&lt;/p&gt;

&lt;p&gt;Open format. Open license. Processors that scale independently from brokers. That's the gap none of the others filled.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;More on the architecture: &lt;a href="https://www.scalytics.io/blog/streaming-data-becomes-storage-native" rel="noopener noreferrer"&gt;Streaming Data Becomes Storage-Native&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;[1] Was BSL until May 2025. Changed for Strimzi compatibility to support K8s rollouts. No community announcement. &lt;/p&gt;

</description>
      <category>kafka</category>
      <category>dataengineering</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
    <item>
      <title>Data Processing Does Not Belong in the Message Broker</title>
      <dc:creator>Alexander Alten</dc:creator>
      <pubDate>Mon, 29 Dec 2025 14:10:08 +0000</pubDate>
      <link>https://dev.to/novatechflow/data-processing-does-not-belong-in-the-message-broker-54mn</link>
      <guid>https://dev.to/novatechflow/data-processing-does-not-belong-in-the-message-broker-54mn</guid>
      <description>&lt;p&gt;Kafka changed the industry by making event streaming practical at scale.&lt;/p&gt;

&lt;p&gt;Over time, people started pushing data processing into the streaming platform itself. Kafka Streams, ksqlDB, broker-side transforms. It looks convenient on paper. In production, it often turns into operational friction.&lt;/p&gt;

&lt;p&gt;Incidents, benchmarks, and vendor documentation all point to the same conclusion: &lt;strong&gt;data processing does not belong in the streaming platform.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;State recovery does not scale&lt;/h2&gt;

&lt;p&gt;Kafka Streams restores state by replaying changelog topics. There is no checkpointing mechanism. Recovery time grows with state size.&lt;/p&gt;

&lt;p&gt;One publicly documented incident: a state store restored from offset 0 through more than 2.8 million records, taking over two minutes. The producer transaction timeout was one minute. The application entered an ERROR state with no automatic recovery.&lt;/p&gt;

&lt;p&gt;Practitioners regularly raise this problem when running Kafka Streams at scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://stackoverflow.com/questions/48277466/kafka-streams-state-store-restore-time" rel="noopener noreferrer"&gt;Kafka Streams state restore discussion (Stack Overflow)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/platform/current/streams/architecture.html#state-store-recovery" rel="noopener noreferrer"&gt;Kafka Streams state recovery explanation (Confluent docs)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recovery by replay means restart time is proportional to how much state you accumulated. Once the state grows beyond "small", recovery becomes part of your availability risk.&lt;/p&gt;
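&lt;p&gt;That proportionality is easy to put numbers on. A rough model; the restore rate is an assumed figure, while the 2.8 million records and the one-minute timeout come from the incident above:&lt;/p&gt;

```python
# Replay-based restore: restart time grows linearly with changelog size.
# The restore rate is an assumption; the 2.8M records and one-minute
# transaction timeout come from the incident described above.
TIMEOUT_S = 60.0
RESTORE_RATE = 20_000.0  # records/second, assumed

def restore_seconds(changelog_records):
    return changelog_records / RESTORE_RATE

for records in (500_000, 2_800_000, 10_000_000):
    t = restore_seconds(records)
    status = "ok" if TIMEOUT_S >= t else "exceeds producer txn timeout"
    print(f"{records:>10,} records: {t:6.0f}s ({status})")
```

&lt;p&gt;At the assumed rate, 2.8 million records take 140 seconds, consistent with the "more than two minutes" in the incident, and well past the one-minute timeout.&lt;/p&gt;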

&lt;p&gt;Processing engines took a different approach years ago. They checkpoint state and restore from snapshots instead of replaying everything from the beginning. That difference shows up the first time you actually need to recover under load.&lt;/p&gt;
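&lt;p&gt;A toy model makes the difference concrete. This is plain Python, not Kafka Streams or Flink code; the record counts and the checkpoint interval are illustrative numbers, not measured values:&lt;/p&gt;

```python
# Replay-based recovery (changelog style): every record since offset 0
# must be reapplied, so recovery work grows with total state size.
def replay_recovery_records(changelog_len: int) -> int:
    return changelog_len

# Checkpoint-based recovery (snapshot style): restore the last snapshot,
# then replay only the records written since that checkpoint.
def checkpoint_recovery_records(changelog_len: int, checkpoint_interval: int) -> int:
    return changelog_len % checkpoint_interval

print(replay_recovery_records(2_837_500))               # 2837500 records to reapply
print(checkpoint_recovery_records(2_837_500, 100_000))  # 37500 records to reapply
```

&lt;p&gt;The first number keeps growing with your state; the second is bounded by the checkpoint interval, which is why snapshot-based engines recover in roughly constant time.&lt;/p&gt;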




&lt;h2&gt;Exactly-once is more limited than it sounds&lt;/h2&gt;

&lt;p&gt;Kafka's exactly-once semantics apply inside Kafka. Spring's official documentation states it clearly: the read and process steps are still at-least-once.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.spring.io/spring-kafka/reference/html/#exactly-once" rel="noopener noreferrer"&gt;Spring Kafka exactly-once semantics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As soon as you write to a database, call an external service, or touch anything outside Kafka, duplicate handling is your problem again.&lt;/p&gt;
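&lt;p&gt;The standard mitigation is an idempotent sink: identify each record by a stable key so a redelivered record is applied at most once. A minimal sketch in plain Python (no Kafka client; the record shape and key choice are hypothetical, and a real sink would use a unique index or upsert in the target database):&lt;/p&gt;

```python
class IdempotentSink:
    """Absorbs at-least-once redeliveries by tracking applied record keys."""

    def __init__(self):
        self.applied = set()   # in production: a unique constraint in the sink DB
        self.rows = []

    def write(self, key, value) -> bool:
        # key must uniquely identify the record, e.g. (topic, partition, offset)
        if key in self.applied:
            return False       # duplicate delivery after a retry or rebalance: skip
        self.applied.add(key)
        self.rows.append(value)
        return True

sink = IdempotentSink()
sink.write(("orders", 0, 42), {"amount": 10})
sink.write(("orders", 0, 42), {"amount": 10})  # redelivered: applied only once
print(len(sink.rows))  # 1
```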

&lt;p&gt;Kafka also documents the scaling problems this creates. Before Kafka 2.5, exactly-once required one transactional producer per input partition. At scale, that meant thousands of producers, each with its own buffers, threads, and network connections.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics" rel="noopener noreferrer"&gt;KIP-447: Producer scalability for EOS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka explicitly calls this an architecture that does not scale well as partition counts increase.&lt;/p&gt;
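&lt;p&gt;Back-of-the-envelope arithmetic shows why. The producer's &lt;code&gt;buffer.memory&lt;/code&gt; default is 32 MiB, so with one transactional producer per input partition, buffer memory alone scales linearly with partition count (the partition count below is an illustrative figure):&lt;/p&gt;

```python
BUFFER_MEMORY = 33_554_432  # Kafka producer buffer.memory default: 32 MiB

def producer_buffer_bytes(input_partitions: int) -> int:
    # Pre-KIP-447 model: one transactional producer per input partition.
    return input_partitions * BUFFER_MEMORY

# 1,000 input partitions -> ~31 GiB of producer buffers,
# before counting threads, sockets, or any actual data.
print(producer_buffer_bytes(1_000) / 2**30)
```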




&lt;h2&gt;ksqlDB made the limits obvious&lt;/h2&gt;

&lt;p&gt;Riskified published their migration story in 2025. Schema evolution in ksqlDB did not automatically include new fields. Fixing it required dropping and recreating streams, disrupting offsets and production pipelines. Shared clusters made recovery unpredictable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/big-data/unlock-self-serve-streaming-sql-with-amazon-managed-service-for-apache-flink/" rel="noopener noreferrer"&gt;Riskified migration case (AWS Big Data Blog)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ksqlDB was not sustainable for their production workloads, so they moved to Flink.&lt;/p&gt;

&lt;p&gt;Confluent's own documentation backs this up. Push queries create continuous consumers. Pull queries create burst consumers. Both add load that is hard to predict and can affect other workloads in the same cluster.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/platform/current/ksqldb/concepts/queries.html" rel="noopener noreferrer"&gt;ksqlDB query types and resource usage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Even vendors draw a boundary&lt;/h2&gt;

&lt;p&gt;Redpanda's Data Transforms documentation is explicit. Transforms are limited to single-message operations. No joins. No aggregations. No external access. A small number of output topics. At-least-once semantics only.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.redpanda.com/current/develop/data-transforms/" rel="noopener noreferrer"&gt;Redpanda Data Transforms limitations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For anything more complex, their recommendation is to use a dedicated processing engine like Apache Flink.&lt;/p&gt;
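&lt;p&gt;The boundary is easy to state in code: a single-message transform is a pure function from one record to zero or more output records, with no state carried across calls. A hedged sketch in plain Python (not the Redpanda transform API; the field names are made up):&lt;/p&gt;

```python
def redact_and_filter(record: dict) -> list:
    # Single-message transform: the decision uses only this one record.
    # This is the kind of logic broker-side transforms allow.
    if record.get("type") == "debug":
        return []                  # filter: emit nothing
    out = dict(record)
    out.pop("email", None)         # reshape: drop a PII field
    return [out]

print(redact_and_filter({"type": "debug"}))                      # []
print(redact_and_filter({"type": "order", "id": 1, "email": "x"}))

# A join or aggregation needs state shared across records (e.g. a count
# per user), which is exactly what this model excludes -- that logic
# belongs in a dedicated engine like Flink.
```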

&lt;p&gt;Confluent acquired Immerok, a managed Flink provider, and is integrating Flink into its cloud offering. That move acknowledges what the architecture already tells you: serious stream processing requires a different execution model than a Kafka-native library.&lt;/p&gt;




&lt;h2&gt;The architectural issue&lt;/h2&gt;

&lt;p&gt;Streaming platforms are built for durable logs, ordering guarantees, fan-out, and backpressure.&lt;/p&gt;

&lt;p&gt;They are not built to be stateful compute engines with fast recovery, checkpoint coordination, or complex query runtimes with strong resource isolation.&lt;/p&gt;

&lt;p&gt;Once transport and processing are coupled, scaling, recovery, and cost are coupled too. You cannot scale processing without scaling brokers. You cannot tune recovery independently. Compute costs get buried inside your Kafka bill.&lt;/p&gt;




&lt;h2&gt;What works in practice&lt;/h2&gt;

&lt;p&gt;Separating concerns works.&lt;/p&gt;

&lt;p&gt;Kafka or Redpanda handle transport. A dedicated processing engine handles state, checkpoints, and complex logic. When pipelines span multiple engines, something like &lt;a href="https://wayang.apache.org/" rel="noopener noreferrer"&gt;Apache Wayang&lt;/a&gt; can orchestrate across them.&lt;/p&gt;

&lt;p&gt;Lightweight transformations inside the streaming layer still make sense. Filtering, format normalization, and simple enrichment cover many cases.&lt;/p&gt;

&lt;p&gt;Core business logic with state, joins, and external writes does not.&lt;/p&gt;
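&lt;p&gt;"Format normalization" in the streaming layer can be as small as this sketch, which rewrites an epoch-milliseconds timestamp into ISO-8601 before records reach downstream consumers (plain Python; the field name &lt;code&gt;ts&lt;/code&gt; is a made-up example):&lt;/p&gt;

```python
from datetime import datetime, timezone

def normalize(record: dict) -> dict:
    # Stateless, per-record normalization: safe to run in the streaming layer.
    out = dict(record)
    if isinstance(out.get("ts"), int):
        out["ts"] = datetime.fromtimestamp(out["ts"] / 1000, tz=timezone.utc).isoformat()
    return out

print(normalize({"event": "click", "ts": 1735689600000}))
# {'event': 'click', 'ts': '2025-01-01T00:00:00+00:00'}
```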

&lt;p&gt;&lt;strong&gt;If you are running joins, aggregations, or external writes inside your streaming platform today, what happens when you need to recover?&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;I published the full version with vendor documentation quotes, Kafka KIPs, migration case studies, and architecture diagrams on my blog:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.novatechflow.com/2025/12/data-processing-does-not-belong-in.html" rel="noopener noreferrer"&gt;Data Processing Does Not Belong in the Message Broker&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>dataengineering</category>
      <category>data</category>
    </item>
    <item>
      <title>Stateless Kafka-compatible brokers backed by object storage, k8s native</title>
      <dc:creator>Alexander Alten</dc:creator>
      <pubDate>Sun, 21 Dec 2025 16:04:15 +0000</pubDate>
      <link>https://dev.to/novatechflow/stateless-kafka-compatible-brokers-backed-by-object-storage-k8s-native-4lo1</link>
      <guid>https://dev.to/novatechflow/stateless-kafka-compatible-brokers-backed-by-object-storage-k8s-native-4lo1</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox2ytyzduzsuzh7b3dwq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox2ytyzduzsuzh7b3dwq.png" alt=" " width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Running Kafka-style systems on Kubernetes is possible, but it often feels like fighting the model rather than working with it.&lt;/p&gt;

&lt;p&gt;Common operational pain points include stateful brokers, local disks, tight coupling between compute and storage, painful scaling events, and recovery paths that are harder than they should be. This becomes especially visible when clusters grow, traffic patterns fluctuate, or upgrades are frequent.&lt;/p&gt;

&lt;p&gt;We started experimenting with an alternative design that keeps Kafka protocol compatibility but changes the underlying assumptions. &lt;/p&gt;

&lt;p&gt;The core ideas are simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Brokers are stateless and disposable&lt;/li&gt;
&lt;li&gt;Message segments are stored in object storage (S3 or compatible)&lt;/li&gt;
&lt;li&gt;Scaling brokers becomes a compute concern, not a data migration problem&lt;/li&gt;
&lt;li&gt;Retention and durability are handled by object storage lifecycle policies&lt;/li&gt;
&lt;/ol&gt;
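&lt;p&gt;Points 1 and 2 can be sketched as brokers that only translate between the protocol and immutable segment objects. This toy Python model is illustrative only; the object-key layout is invented, not KafScale's actual format:&lt;/p&gt;

```python
class ObjectStore:
    """Stand-in for S3 or a compatible store."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]

class StatelessBroker:
    """Holds no local log: all durable state lives in the object store."""

    def __init__(self, store: ObjectStore):
        self.store = store

    def _key(self, topic, partition, base_offset):
        # Hypothetical layout: topic/partition/zero-padded base offset
        return f"{topic}/{partition}/{base_offset:020d}.segment"

    def append_segment(self, topic, partition, base_offset, records):
        self.store.put(self._key(topic, partition, base_offset), records)

    def read_segment(self, topic, partition, base_offset):
        return self.store.get(self._key(topic, partition, base_offset))

store = ObjectStore()
# Two broker instances share the store, so either can serve any partition;
# replacing a broker pod is a pure compute operation, with no data migration.
b1, b2 = StatelessBroker(store), StatelessBroker(store)
b1.append_segment("events", 0, 0, [b"r0", b"r1"])
print(b2.read_segment("events", 0, 0))  # [b'r0', b'r1']
```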

&lt;p&gt;KafScale had its initial release yesterday and is not meant to replace Kafka everywhere. It’s a DevOps-friendly drop-in designed with minimal ops in mind. &lt;/p&gt;

&lt;p&gt;We’ve been building this as an open-source project, licensed under Apache 2.0 and designed to be fully self-hosted.&lt;/p&gt;

&lt;p&gt;Repository and technical details:&lt;br&gt;
&lt;a href="https://github.com/novatechflow/kafscale" rel="noopener noreferrer"&gt;https://github.com/novatechflow/kafscale&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Architecture and Docs:&lt;br&gt;
&lt;a href="https://kafscale.io" rel="noopener noreferrer"&gt;https://kafscale.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this stage, the most valuable thing for us is feedback from people who operate streaming systems in production: where this model makes sense, where it breaks down, and what tradeoffs matter most in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;:&lt;br&gt;
A deeper architectural and historical analysis of stateless Kafka-compatible brokers is now available here:&lt;br&gt;
&lt;a href="https://www.novatechflow.com/2025/12/kafka-on-object-storage-was-inevitable.html" rel="noopener noreferrer"&gt;https://www.novatechflow.com/2025/12/kafka-on-object-storage-was-inevitable.html&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>kubernetes</category>
      <category>opensource</category>
      <category>kafka</category>
    </item>
  </channel>
</rss>
