<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Conduktor</title>
    <description>The latest articles on DEV Community by Conduktor (conduktor).</description>
    <link>https://dev.to/conduktor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F7353%2Fe70b4d17-904a-4b56-9963-4dd2ac5aa071.png</url>
      <title>DEV Community: Conduktor</title>
      <link>https://dev.to/conduktor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/conduktor"/>
    <language>en</language>
    <item>
      <title>Kafka Cost Optimization Starts with Usage</title>
      <dc:creator>Stéphane Derosiaux</dc:creator>
      <pubDate>Mon, 29 Jun 2026 14:32:43 +0000</pubDate>
      <link>https://dev.to/conduktor/kafka-cost-optimization-starts-with-usage-3lfb</link>
      <guid>https://dev.to/conduktor/kafka-cost-optimization-starts-with-usage-3lfb</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fpfvlbwdkw5n53i3qecfi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fpfvlbwdkw5n53i3qecfi.png" alt="Kafka costs are about usage not just infra" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I sit in a lot of Kafka reviews: vendors, instances, replication, tiered storage, advanced stuff like fetch-from-follower, networking, partitions, best practices, etc. Most discussions are driven by tech only and miss the big picture: how this beautiful infra is actually being used.&lt;/p&gt;

&lt;p&gt;Unpopular opinion: &lt;strong&gt;most of your Kafka cost is not due to infrastructure, it's due to a usage problem.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the cost actually comes from
&lt;/h2&gt;

&lt;p&gt;Vendor calculators are hard to compare because each one bakes in different assumptions: replication multipliers, disk class, compression ratio, tiered storage (billed at the replicated rate or the actual S3 rate). The price you see is almost never what you pay.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RF=3 multiplies the per-GB price by 3 everywhere.&lt;/strong&gt; And tiered storage is often &lt;em&gt;still&lt;/em&gt; billed at the replicated rate even though only one copy lives in S3. You're paying the RF=3 rate for data Kafka no longer replicates. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-VPC, in-region traffic&lt;/strong&gt; between your account and the vendor's lands on &lt;em&gt;your&lt;/em&gt; cloud bill, roughly 1c/GB each way depending on the path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Without fetch-from-follower, most consumer fetches cross AZ boundaries.&lt;/strong&gt; With three balanced AZs, ~2/3 of consumer reads go cross-AZ: each partition's leader lives in one AZ, so consumers in the other two AZs have to reach across to read from it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compression is often just... off.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With zstd at sane batch sizes, JSON-ish logs and metrics commonly compress 8–10x:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;compression.type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;zstd&lt;/span&gt;
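&lt;span class="c"&gt;# 64 KB; on its own line because .properties files do not support trailing comments
&lt;/span&gt;&lt;span class="py"&gt;batch.size&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;65536&lt;/span&gt;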
&lt;span class="py"&gt;linger.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Going from 5x to 10x halves your stored bytes &lt;em&gt;and&lt;/em&gt; halves the replication bytes flowing inside the cluster. At RF=3 every byte is stored three times and shipped to two followers, so the ratio matters.&lt;/p&gt;
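&lt;p&gt;To make that arithmetic concrete, here is a minimal sketch in Python. The 1,000 GB raw volume is a made-up figure, and &lt;code&gt;stored_gb&lt;/code&gt; is a hypothetical helper that only models compressed size times replication factor, ignoring indexes and per-batch overhead:&lt;/p&gt;

```python
def stored_gb(raw_gb: float, compression_ratio: float, replication_factor: int = 3) -> float:
    """Disk footprint across the cluster: compressed size times number of replicas."""
    return raw_gb / compression_ratio * replication_factor


raw_gb = 1000.0  # hypothetical uncompressed volume

at_5x = stored_gb(raw_gb, compression_ratio=5)
at_10x = stored_gb(raw_gb, compression_ratio=10)

print(at_5x)   # 600.0
print(at_10x)  # 300.0
```

&lt;p&gt;The absolute saving grows with the replication factor, which is why a better ratio pays off more at RF=3 than at RF=1.&lt;/p&gt;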

&lt;p&gt;And fetch-from-follower, available since Kafka 2.4, is a broker + consumer config away. Same-AZ traffic inside your VPC is free on AWS, so no cross-AZ tax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# broker
&lt;/span&gt;&lt;span class="py"&gt;replica.selector.class&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;org.apache.kafka.common.replica.RackAwareReplicaSelector&lt;/span&gt;
&lt;span class="py"&gt;broker.rack&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;us-east-1a&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# consumer: set this to the consumer's own AZ, using the same naming as broker.rack
&lt;/span&gt;&lt;span class="py"&gt;client.rack&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;us-east-1a&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
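&lt;p&gt;For a rough idea of what that config is worth, the ~2/3 estimate can be sketched in Python. This assumes leaders and consumers are spread evenly across AZs; &lt;code&gt;cross_az_read_fraction&lt;/code&gt; and &lt;code&gt;monthly_cross_az_cost&lt;/code&gt; are hypothetical helpers, and the per-GB price is a placeholder to replace with your own rate:&lt;/p&gt;

```python
from fractions import Fraction


def cross_az_read_fraction(num_azs: int) -> Fraction:
    """Share of consumer reads that leave the consumer's AZ when every fetch
    goes to the partition leader (no fetch-from-follower).

    Assumes leaders and consumers are spread evenly across AZs, so a consumer
    shares an AZ with the leader 1 time out of num_azs.
    """
    if 1 > num_azs:
        raise ValueError("num_azs must be at least 1")
    return 1 - Fraction(1, num_azs)


def monthly_cross_az_cost(read_gb: float, num_azs: int, usd_per_gb: float) -> float:
    """Rough monthly bill for consumer reads that cross an AZ boundary."""
    return float(read_gb * cross_az_read_fraction(num_azs) * usd_per_gb)


print(cross_az_read_fraction(3))  # 2/3

# Placeholder inputs: 90,000 GB read per month at an assumed 0.02 USD per GB.
print(round(monthly_cross_az_cost(90_000, 3, usd_per_gb=0.02), 2))
```

&lt;p&gt;With fetch-from-follower and matching rack values, consumers can read from a replica in their own AZ, so this fraction can drop toward zero for any partition that has a replica there.&lt;/p&gt;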



&lt;p&gt;Do all of it: fetch-from-follower, tiered storage, compression enforcement, partition right-sizing, BYOC to apply your existing cloud discount, single-AZ topics where you can tolerate it. But notice that &lt;em&gt;it's still infrastructure tuning.&lt;/em&gt; Let's go up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost is a stack, not a line item
&lt;/h2&gt;

&lt;p&gt;When you tune anything in Kafka, you think in layers, bottom-up: hardware, JVM, broker config, producer/consumer tuning, topic design, application code. Same for cost:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cloud infrastructure&lt;/strong&gt;: instance types, AZ placement, networking, BYOC negotiation. At big contract sizes, negotiated networking discounts can hit 90%, but only if traffic flows through &lt;em&gt;your&lt;/em&gt; account. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broker &amp;amp; protocol tuning&lt;/strong&gt;: compression, retention, RF, fetch-from-follower, tiered storage, partition count. Easy, they're config changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture&lt;/strong&gt;: diskless topics, Iceberg topics, single-AZ topics, proxies between clients and brokers, virtual clusters for multi-tenancy and non-prod consolidation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Usage&lt;/strong&gt;: fan-out, governance, discovery, self-service. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Often, &lt;strong&gt;the payoff goes up as you go higher&lt;/strong&gt; (more systems thinking). Everyone's comfortable arguing about gp2 vs gp3 (EBS volume types on AWS). Almost nobody asks "why are 40% of these partitions doing nothing?"&lt;/p&gt;

&lt;p&gt;Speaking of which: most clusters carry &lt;a href="https://www.conduktor.io/blog/the-surprising-cost-of-kafka-partition-waste" rel="noopener noreferrer"&gt;40–70% partition waste&lt;/a&gt;, did you know that? On managed Kafka that's per-partition-hour billing. On self-managed, you hit the ~4,000–6,000 partition-replicas-per-broker ceiling (RF=3 turns 100k partitions into 300k replicas to host and track). KRaft raises the ceiling but it doesn't make the waste free.&lt;/p&gt;
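&lt;p&gt;The replica math is easy to check. A small sketch, assuming replicas are balanced evenly and using the 4,000 and 6,000 per-broker limits quoted above (&lt;code&gt;brokers_needed&lt;/code&gt; is a hypothetical helper; real sizing also depends on throughput and storage):&lt;/p&gt;

```python
def brokers_needed(partitions: int, replication_factor: int, replicas_per_broker: int) -> int:
    """Minimum broker count so that no broker hosts more than replicas_per_broker replicas."""
    total_replicas = partitions * replication_factor
    # Ceiling division without going through floats.
    return -(-total_replicas // replicas_per_broker)


print(100_000 * 3)                        # 300000
print(brokers_needed(100_000, 3, 4_000))  # 75
print(brokers_needed(100_000, 3, 6_000))  # 50

# Same cluster after reclaiming 40% of the partitions.
print(brokers_needed(60_000, 3, 4_000))   # 45
```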

&lt;h2&gt;
  
  
  Fan-out is the whole point of Kafka
&lt;/h2&gt;

&lt;p&gt;Kafka exists so that one byte written can be read by N independent consumers, decoupled in time, with zero coordination back to the producer. That's the log abstraction's reason to live.&lt;/p&gt;

&lt;p&gt;Do you measure your average fan-out? If it's 1, you probably shouldn't be running Kafka at all: you're paying for a distributed log to do a point-to-point queue's job. LinkedIn famously ran at ~5.4: the same bytes, written once, read on average by 5.4 independent consumers.&lt;/p&gt;
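&lt;p&gt;One way to approximate fan-out is bytes consumed divided by bytes produced over the same window. The sketch below assumes you can export those two numbers per topic with replication traffic excluded; the topic names and volumes are invented. It is only a proxy for "independent teams", since replays and reprocessing inflate it:&lt;/p&gt;

```python
# Hypothetical per-topic client traffic over the same time window, in GB:
# (produced, consumed). Replication traffic must be excluded.
traffic_gb = {
    "orders": (100.0, 500.0),
    "clickstream": (300.0, 300.0),
    "legacy-audit": (100.0, 0.0),
}


def fan_out(produced_gb: float, consumed_gb: float) -> float:
    """Average number of times each produced byte is read."""
    return consumed_gb / produced_gb


per_topic = {name: fan_out(p, c) for name, (p, c) in traffic_gb.items()}
total_produced = sum(p for p, _ in traffic_gb.values())
total_consumed = sum(c for _, c in traffic_gb.values())

print(per_topic)  # {'orders': 5.0, 'clickstream': 1.0, 'legacy-audit': 0.0}
print(fan_out(total_produced, total_consumed))  # 1.6

# Topics nobody reads are the first candidates for cleanup.
print([name for name, ratio in per_topic.items() if ratio == 0])  # ['legacy-audit']
```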

&lt;p&gt;Cluster cost stays roughly flat as consumers are added, so the cost per business outcome drops the more you consume existing topics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cost_per_use_case = cluster_cost / fan_out

fan-out 1  -&amp;gt;  $X      (one team carries the whole bill)
fan-out 3  -&amp;gt;  $X / 3
fan-out 5  -&amp;gt;  $X / 5  (same hardware, five outcomes)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"Is our Kafka usage growing?" is the wrong question. More business use-cases reading existing data is the best money you'll ever spend. Topics duplicated because nobody could find the existing one are pure waste: more storage, more replication, more pipelines, all because discovery and ownership are missing.&lt;/p&gt;

&lt;p&gt;The same goes for partitions: people over-provision because nobody knows how to size them, and Kafka doesn't let you reduce a topic's partition count after the fact (the only way down is a new topic and a migration, which changes which partition each key lands on). The only way to surface that waste is &lt;a href="https://www.conduktor.io/blog/chargeback-attribute-map-kafka-costs-to-your-business" rel="noopener noreferrer"&gt;chargeback at the team-and-topic level&lt;/a&gt;. You can't optimize what you can't attribute.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A third of our traffic, we know what it has to do with, but we don't know exactly what they're doing."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the usage layer leaking. It costs money, and nobody can fix it because nobody knows how, where to look, or who owns it. It's not an infra problem, it's governance, discovery, and self-service.&lt;/p&gt;

&lt;p&gt;Cost optimization is &lt;strong&gt;everybody's concern and nobody's objective.&lt;/strong&gt; Teams over-provision because &lt;em&gt;what if we need it later&lt;/em&gt; and &lt;em&gt;what if it breaks when we touch it&lt;/em&gt; are rational fears. "It's expensive" is not a business case. What works is showing the waste, the annual dollar number, and the effort to reclaim it, with a name next to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  2026: Where to spend your effort
&lt;/h2&gt;

&lt;p&gt;Most deployments I see have way more headroom in the usage layer than the infra layer: topics nobody reads, partitions nobody needs, teams who'd benefit from streaming but find it too painful to onboard.&lt;/p&gt;

&lt;p&gt;There's a funny industry reflex here too. We chase the next shiny architectural thing (diskless, Iceberg topics, single-AZ) before we've answered the boring questions: who's using this, for what, and why aren't more teams using it?&lt;/p&gt;

&lt;p&gt;My actual recommendation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Do the infrastructure pass once.&lt;/strong&gt; Instance types, AZ placement, BYOC.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do the config pass once.&lt;/strong&gt; Compression, retention, partition right-sizing, fetch-from-follower, tiered storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spend the rest of the year on the usage layer.&lt;/strong&gt; Fan-out, ownership, discovery, chargeback, self-service.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Steps 1 and 2 are a sprint. Step 3 is the marathon.&lt;/p&gt;

&lt;p&gt;If you want to see where your usage layer is leaking, Conduktor's field engineering team does a &lt;a href="https://www.conduktor.io/contact/kafka-cost-analysis" rel="noopener noreferrer"&gt;free Kafka cost analysis&lt;/a&gt;: they'll map cost back to teams and topics and show you where the payoff sits. And if you just want to keep reading, &lt;a href="https://www.conduktor.io/blog/a-better-conversation-about-kafka-costs" rel="noopener noreferrer"&gt;Why Kafka Costs Keep Rising&lt;/a&gt; and &lt;a href="https://www.conduktor.io/blog/the-surprising-cost-of-kafka-partition-waste" rel="noopener noreferrer"&gt;the partition waste deep-dive&lt;/a&gt; are good next stops.&lt;/p&gt;

&lt;p&gt;What's your average fan-out? If you don't know it off the top of your head, that's probably where I'd start.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>dataengineering</category>
      <category>devops</category>
      <category>architecture</category>
    </item>
    <item>
      <title>You can do WHAT with a Kafka proxy?</title>
      <dc:creator>Stéphane Derosiaux</dc:creator>
      <pubDate>Mon, 15 Jun 2026 13:43:15 +0000</pubDate>
      <link>https://dev.to/conduktor/you-can-do-what-with-a-kafka-proxy-42b1</link>
      <guid>https://dev.to/conduktor/you-can-do-what-with-a-kafka-proxy-42b1</guid>
      <description>&lt;p&gt;At Current 2026, I realized that nobody knows exactly what a Kafka proxy can do.&lt;/p&gt;

&lt;p&gt;Most engineers and architects think it's just some kind of reverse proxy for Kafka (think nginx) that does routing, or a bridge that connects a legacy or non-native client to the cluster.&lt;/p&gt;

&lt;p&gt;That's not it. It's barely the start of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Encryption
&lt;/h2&gt;

&lt;p&gt;For instance, an engineer at a UK building society had a hard requirement: encrypt personally identifiable fields (emails, national insurance numbers, that kind of data) before they ever hit Kafka.&lt;/p&gt;

&lt;p&gt;His team built encryption into the application layer. Every producer that touched PII got encryption code. Every consumer got decryption code. Key handling, rotation, and the rest had to be managed across services.&lt;/p&gt;

&lt;p&gt;Something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In every producer that touches PII...&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Customer&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;encrypt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Customer&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setEmail&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crypto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;encrypt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getEmail&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;keyRef&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"pii-key"&lt;/span&gt;&lt;span class="o"&gt;)));&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setSsn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crypto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;encrypt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getSsn&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;keyRef&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"pii-key"&lt;/span&gt;&lt;span class="o"&gt;)));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"customers"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// And the mirror image in every consumer...&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Customer&lt;/span&gt; &lt;span class="nf"&gt;decrypt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Customer&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setEmail&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crypto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;decrypt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getEmail&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setSsn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crypto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;decrypt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getSsn&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multiply that by every microservice you now have to update and maintain (across languages, versions, KMS access, etc.). That's expensive, both to implement and to maintain.&lt;/p&gt;

&lt;p&gt;He didn't know a Kafka proxy could have done the whole thing at the record level, outside the apps. When we chatted about it, he realized he might have made a mistake.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Kafka proxy does
&lt;/h2&gt;

&lt;p&gt;A Kafka proxy sits between your clients and your brokers and speaks the Kafka protocol. Clients connect to it exactly like they'd connect to a broker. No SDK, no app changes. It works for Kafka clients, Kafka Connect, Kafka Streams, Flink, Spark, etc. It's fully transparent to them.&lt;/p&gt;

&lt;p&gt;That makes it a natural place to put policy that doesn't belong inside your application and doesn't belong inside the cluster either.&lt;/p&gt;

&lt;p&gt;Encryption is the obvious one. Instead of touching dozens of applications, you declare the rule once. With &lt;a href="https://www.conduktor.io/gateway" rel="noopener noreferrer"&gt;Conduktor Gateway&lt;/a&gt; it's what we call an interceptor: a small piece of config applied to traffic matching a topic pattern. Roughly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"kind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Interceptor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"apiVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gateway/v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"encrypt-customer-pii"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"spec"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"pluginClass"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.conduktor.gateway.interceptor.EncryptionPlugin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"topic"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customers.*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"kmsConfig"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"kms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"VAULT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"vault"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://vault:8200"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"recordValue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"fieldName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"algorithm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AES256_GCM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"keySecretId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pii-key"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"fieldName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ssn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nl"&gt;"algorithm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AES256_GCM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"keySecretId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pii-key"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The proxy encrypts the fields on the way in, and authorized consumers get them decrypted on the way out; everyone else gets ciphertext. The application code shrinks back to just... sending a record, with no KMS or secrets to deal with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Same producer, after.&lt;/span&gt;
&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"customers"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the rule lives in one declarative place (the proxy), you can do things that are painful at the app layer.&lt;/p&gt;

&lt;p&gt;For instance, crypto-shredding for GDPR. Delete the key, and every message encrypted with it becomes unreadable, instantly, across all your retention. You don't go hunting through topics for one person's data. You revoke a key. Done.&lt;/p&gt;

&lt;h2&gt;
  
  
  Masking, validation, isolation: same one place
&lt;/h2&gt;

&lt;p&gt;Once the proxy is in the Kafka path, the same pattern opens a lot of doors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Field-level masking&lt;/strong&gt;: show &lt;code&gt;j***@example.com&lt;/code&gt; to one team, the real value to another, same topic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema and payload validation&lt;/strong&gt;: reject malformed records at the edge instead of poisoning a downstream consumer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topic aliasing for migration&lt;/strong&gt;: point clients at a stable name while you move the real topic between clusters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Virtual clusters&lt;/strong&gt;: carve one physical cluster into isolated tenants without standing up new infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit and policy enforcement&lt;/strong&gt;: log and gate access without patching the broker or the client.&lt;/li&gt;
&lt;/ul&gt;
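&lt;p&gt;To picture what field-level masking does to a record on the way out, here is a small Python sketch. It is only an illustration of the transformation, not Conduktor Gateway's implementation or configuration; &lt;code&gt;mask_email&lt;/code&gt;, &lt;code&gt;view_for&lt;/code&gt;, and the &lt;code&gt;can_see_pii&lt;/code&gt; flag are hypothetical names:&lt;/p&gt;

```python
def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not something we recognise as an email: mask it entirely.
        return "***"
    return f"{local[0]}***@{domain}"


def view_for(record: dict, can_see_pii: bool) -> dict:
    """Return the record as a given reader should see it."""
    if can_see_pii:
        return record
    return {**record, "email": mask_email(record["email"])}


record = {"id": "42", "email": "jane@example.com"}
print(view_for(record, can_see_pii=False))  # {'id': '42', 'email': 'j***@example.com'}
print(view_for(record, can_see_pii=True))   # {'id': '42', 'email': 'jane@example.com'}
```

&lt;p&gt;In the proxy this logic is declared as a rule per topic and per reader rather than written in each consumer, and the stored record stays the same for everyone.&lt;/p&gt;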

&lt;p&gt;None of that touches application code or broker config. It's policy, declared once, enforced in the path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this connects to cost and self-service
&lt;/h2&gt;

&lt;p&gt;The other big topic at the conference was cost. Conduktor published a &lt;a href="https://www.conduktor.io/resources/ebooks/where-kafka-costs-hide-a-field-guide" rel="noopener noreferrer"&gt;field guide on where Kafka costs hide&lt;/a&gt; in April.&lt;/p&gt;

&lt;p&gt;Then the self-service conversation: teams want developers to create topics and request access autonomously, but a simple topics-in-GitOps solution is not enough, because self-service without guardrails easily turns Kafka into a mess:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If somebody goes onto the tool and adds in something ridiculous, like a thousand partitions, we need someone to have eyes on that. That's something we've learned we can't let go of."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Look at what encryption-at-the-proxy, cost guardrails, and self-service approval gates have in common. They're all policy that belongs &lt;em&gt;between&lt;/em&gt; your developers and your brokers, not baked into either one. Push it into the app and you copy-paste it dozens of times. Push it into the cluster and you can't change it without a migration. Put it in the layer in between, declare it once, and you can actually govern it.&lt;/p&gt;

&lt;p&gt;To build AI agents on top of streaming, the plumbing must come first: ownership, schema discipline, key custody, data quality before the data even enters Kafka, etc. A proxy that enforces structure and policy is a big chunk of that plumbing.&lt;/p&gt;

&lt;h2&gt;
  
  
  So where does your policy live?
&lt;/h2&gt;

&lt;p&gt;People think a proxy just passes packets. In practice, it's more of a safekeeper that holds the policy.&lt;/p&gt;

&lt;p&gt;If you've got encryption, masking, validation, or multi-tenant isolation scattered across your services right now, it's worth asking whether any of it should be living one layer down instead.&lt;/p&gt;

&lt;p&gt;Want to go deeper? The &lt;a href="https://www.conduktor.io/gateway" rel="noopener noreferrer"&gt;Gateway overview&lt;/a&gt; walks through the interceptor model, and the &lt;a href="https://www.conduktor.io/blog/what-we-learned-at-current-2026" rel="noopener noreferrer"&gt;original Current 2026 write-up&lt;/a&gt; has the rest of what we heard on the floor.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>architecture</category>
      <category>dataengineering</category>
      <category>devops</category>
    </item>
    <item>
      <title>I let an AI agent set up my entire Kafka platform. Here's what actually happened.</title>
      <dc:creator>Stéphane Derosiaux</dc:creator>
      <pubDate>Mon, 08 Jun 2026 13:34:35 +0000</pubDate>
      <link>https://dev.to/conduktor/i-let-an-ai-agent-set-up-my-entire-kafka-platform-heres-what-actually-happened-220m</link>
      <guid>https://dev.to/conduktor/i-let-an-ai-agent-set-up-my-entire-kafka-platform-heres-what-actually-happened-220m</guid>
      <description>&lt;p&gt;Your AI coding assistant can explain consumer groups, rebalancing, and exactly-once semantics. Ask it to actually &lt;em&gt;set up&lt;/em&gt; a Kafka platform with governance, though, and it won't be able to do that on its own.&lt;/p&gt;

&lt;p&gt;Between hallucinations, misunderstandings, production impact (I really did see Claude mess up a rolling upgrade of Kafka brokers), and the lack of knowledge of the products your Kafka infra relies on, there's a lot working against it.&lt;/p&gt;

&lt;p&gt;Beyond their training data, the models have zero context about your infra. They've never seen your cluster, don't know your policies (technical, governance), and often have no way to check anything against your actual environment.&lt;/p&gt;

&lt;p&gt;You can give it the missing context using Conduktor.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing that was missing
&lt;/h2&gt;

&lt;p&gt;There is an open-source &lt;a href="https://github.com/conduktor/skills" rel="noopener noreferrer"&gt;Conduktor skill&lt;/a&gt; you can install into your AI assistant. It works with Claude Code, Cursor, VS Code Copilot, and Gemini CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add conduktor/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This teaches the agent the whole platform (Console, Gateway, and the CLI) and how to run processes against it, so it can be efficient and not hallucinate.&lt;/p&gt;

&lt;p&gt;After the install, the agent discovers your environment (Kafka clusters, Schema Registry, policies, etc.), asks questions based on what it finds, generates configs with &lt;em&gt;real&lt;/em&gt; values and best practices, and runs everything with dry-run validation before it touches anything.&lt;/p&gt;

&lt;p&gt;The CLI is really its "hands", and it goes deeper than MCP alone. The skill is the playbook where the experience and practices from years of usage are written down. This makes a big difference versus "generate some YAML and cross your fingers".&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting from absolutely nothing
&lt;/h2&gt;

&lt;p&gt;You can start from scratch with just Docker running and nothing else. No Kafka, no Conduktor, no config. With the Conduktor skill set up, I asked just this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;install Conduktor and set it up so I can login&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It checked my environment, asked what I was trying to do, wrote a &lt;code&gt;docker-compose.yml&lt;/code&gt;, spun up the containers, hit one error along the way, self-corrected, and handed me a working platform, Kafka &amp;amp; Console perfectly configured.&lt;/p&gt;

&lt;p&gt;I could ask the same but on my production Kubernetes. It would follow best practices too, use Helm, discover my environment, etc., and in minutes everything would be wired perfectly, with policies already in place.&lt;/p&gt;

&lt;p&gt;This is much more powerful than a "human" quickstart: it covers a wider range of setups and is closer to production-ready from the start. The agent knows the Kafka domain, and with the skill it knows Conduktor, so the combination of both makes it ask me the right questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Governance, without becoming a Kafka lawyer
&lt;/h2&gt;

&lt;p&gt;Running Kafka isn't the hard part anymore. Making it &lt;em&gt;safe for a team to share&lt;/em&gt; is the hard part: naming conventions, ownership boundaries, policies. This is what prevents a Kafka cluster from turning into a wasteland of &lt;code&gt;test-topic-final-v2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The beautiful thing is that you can now give it broad prompts like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;set up governance for two teams, Payments and Analytics, with topic policies and cross-team permissions&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It worked in stages and figured out the dependency ordering itself. When the API rejected something, it read the rejection, restructured the YAML, and retried, with minimal hand-holding from me (just asking what policies I want based on what's possible). It ended up creating the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;TopicPolicy&lt;/code&gt; objects: locking down naming per team, enforcing safe defaults (retention, replication, required labels) across every topic. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Application&lt;/code&gt; objects with non-overlapping resource boundaries to define ownership of resources and teams.&lt;/li&gt;
&lt;li&gt;Topics with descriptions and labels in the catalog.&lt;/li&gt;
&lt;li&gt;Cross-team permission giving Analytics read access to &lt;code&gt;payments.orders.*&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is federated ownership in practice: the platform team sets the boundaries, developers move freely inside them. Normally that knowledge takes months to accumulate and lives in spreadsheets or Jira tickets. Here it lives in a skill file that every agent on the team can read.&lt;/p&gt;

&lt;h2&gt;
  
  
  Now flip to the developer side
&lt;/h2&gt;

&lt;p&gt;Once those guardrails exist, a developer on the Payments team installs the &lt;em&gt;same skill&lt;/em&gt; and never has to know any of it happened. No &lt;code&gt;ApplicationInstance&lt;/code&gt;, no &lt;code&gt;TopicPolicy&lt;/code&gt;, no YAML. They just talk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What topics do we have?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent runs &lt;code&gt;conduktor get Topic&lt;/code&gt; and shows the catalog — descriptions, owners, labels, visibility. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need a topic for my service."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent checks their &lt;code&gt;ApplicationInstance&lt;/code&gt;, reads the policy constraints (naming prefix &lt;code&gt;payments.*&lt;/code&gt;, retention between one and seven days, a required &lt;code&gt;data-criticality&lt;/code&gt; label), asks what the topic is for, generates compliant YAML, dry-runs it, and applies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic/payments.fulfillment.shipped: Created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The developer just got a topic that's compliant by default. Without the skill, that's most likely a Jira ticket, plus asking the platform team what the right shape is and what to put in it.&lt;/p&gt;
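
&lt;p&gt;To make the policy constraints above concrete, here is a minimal plain-Python sketch of the same three checks (name prefix, retention window, required label). It is not the Conduktor API: &lt;code&gt;check_topic&lt;/code&gt;, &lt;code&gt;DAY_MS&lt;/code&gt;, and the argument shapes are hypothetical names chosen for illustration.&lt;/p&gt;

```python
# Hypothetical sketch: a plain-Python version of the constraints described above.
# This is NOT the Conduktor API; names and structure are assumptions.

DAY_MS = 24 * 60 * 60 * 1000


def check_topic(name, retention_ms, labels):
    """Return a list of human-readable violations (empty means compliant)."""
    violations = []
    if not name.startswith("payments."):
        violations.append("name must start with 'payments.'")
    # Retention must fall between one and seven days, inclusive.
    if retention_ms not in range(DAY_MS, 7 * DAY_MS + 1):
        violations.append("retention must be between 1 and 7 days")
    if "data-criticality" not in labels:
        violations.append("missing required label 'data-criticality'")
    return violations


# A compliant topic yields no violations; a stray test topic fails all three checks.
print(check_topic("payments.fulfillment.shipped", 3 * DAY_MS, {"data-criticality": "high"}))
print(check_topic("test-topic-final-v2", 30 * DAY_MS, {}))
```

&lt;p&gt;In the workflow described here the platform enforces these rules at dry-run time, so a check like this only illustrates what "compliant by default" means for the developer.&lt;/p&gt;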

&lt;p&gt;&lt;strong&gt;"How do I produce to my topic?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It reads the cluster config, grabs the real bootstrap server, and hands back working code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;confluent_kafka&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Producer&lt;/span&gt;

&lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Producer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bootstrap.servers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost:19092&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payments.fulfillment.shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ord-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orderId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ord-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy, paste, run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need to read the Analytics team's clickstream."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent finds that &lt;code&gt;analytics.clickstream.pageviews&lt;/code&gt; belongs to the Analytics team, then writes a read-only permission scoped to exactly that topic, at both the Kafka and Console layers. The developer doesn't know what an ACL is or what &lt;code&gt;patternType: LITERAL&lt;/code&gt; means. They asked in English and got access. &lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually take away from this
&lt;/h2&gt;

&lt;p&gt;This walkthrough only touched governance and onboarding. The skill also covers Gateway (Kafka proxy) encryption, data quality rules, Terraform export, and CI/CD scaffolding.&lt;/p&gt;

&lt;p&gt;Try it, it's one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add conduktor/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's &lt;a href="https://github.com/conduktor/skills" rel="noopener noreferrer"&gt;open source&lt;/a&gt;, so if you hit a workflow it handles badly, open a PR. And if you're new to Conduktor, the &lt;a href="https://www.conduktor.io/community" rel="noopener noreferrer"&gt;Community Edition&lt;/a&gt; is free and self-hosted, and the skill will do the install for you.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was adapted from the &lt;a href="https://www.conduktor.io/blog/set-up-a-kafka-platform-with-an-ai-agent" rel="noopener noreferrer"&gt;original on the Conduktor blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>dataengineering</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to analyze the cost of Kafka?</title>
      <dc:creator>Stéphane Derosiaux</dc:creator>
      <pubDate>Mon, 25 May 2026 15:19:36 +0000</pubDate>
      <link>https://dev.to/conduktor/how-to-analyze-the-cost-of-kafka-2a4b</link>
      <guid>https://dev.to/conduktor/how-to-analyze-the-cost-of-kafka-2a4b</guid>
      <description>&lt;p&gt;Which side are you on: "This is just what Kafka costs at scale" or "We should switch to a cheaper Kafka provider"?&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://conduktor.io" rel="noopener noreferrer"&gt;Conduktor&lt;/a&gt;, our field team works inside Kafka environments that have been running for a long time. We see this: most Kafka teams are overpaying by 25 to 40 percent. Not because anyone did anything wrong, but because of how Kafka got built up over time.&lt;/p&gt;

&lt;p&gt;The cost drivers of Kafka are weirdly context-dependent: the infrastructure and the provider are a tiny part of the full picture. &lt;/p&gt;

&lt;p&gt;How it's being used is the real question.&lt;/p&gt;




&lt;h2&gt;
  
  
  Five bad patterns eating budget
&lt;/h2&gt;

&lt;p&gt;Below is what we see. The same patterns show up everywhere, and they are the first things we work on with our customers.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Partition overprovisioning
&lt;/h3&gt;

&lt;p&gt;"How many partitions?" is the most common question with Kafka. Last week someone told me their org just defaults to "64". I was shocked. Not only may providers price per partition, but from Kafka's point of view every partition also costs metadata, open file handles, and so on.&lt;/p&gt;

&lt;p&gt;The partition count depends on the expected throughput and concurrency (consumer parallelism). If a 64-partition topic is sitting in a cluster with barely any traffic, you're just losing money on all sides. Multiply by dozens or hundreds of topics at scale.&lt;/p&gt;
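
&lt;p&gt;A common rule of thumb turns that into arithmetic: take the target throughput, divide it by what one partition sustains on the produce side and on the consume side, and keep the larger result. The sketch below assumes you have measured those per-partition figures yourself. The function name and the numbers are illustrative, not a recommendation.&lt;/p&gt;

```python
import math


def suggested_partitions(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition):
    """Rule-of-thumb partition count covering both the produce and consume sides."""
    for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    # Never go below one partition.
    return max(1, for_producers, for_consumers)


# Assumed, illustrative numbers: measure your own per-partition throughput.
busy = suggested_partitions(target_mb_s=20, producer_mb_s_per_partition=10, consumer_mb_s_per_partition=5)
quiet = suggested_partitions(target_mb_s=0.1, producer_mb_s_per_partition=10, consumer_mb_s_per_partition=5)
print(busy, quiet)  # 4 1
```

&lt;p&gt;With these assumed figures the busy topic needs 4 partitions and the quiet one needs 1, which is a long way from a blanket default of 64.&lt;/p&gt;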

&lt;h3&gt;
  
  
  2. Retention that makes no sense
&lt;/h3&gt;

&lt;p&gt;Long retention on topics that nobody reads past the last few hours. Do you need replay? Default is 7-day retention, but it's often applied uniformly, when some topics only need a couple of hours and others genuinely need weeks.&lt;/p&gt;

&lt;p&gt;Tip: with compacted topics and/or Kafka Streams (changelog topics, etc.), data can be stored indefinitely, which can cause security and regulatory issues.&lt;/p&gt;
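
&lt;p&gt;To see why uniform retention matters, a back-of-the-envelope estimate helps: at steady state, retained data is roughly ingress rate times retention time times replication factor. The sketch below ignores compression and segment rounding, and the function name, the 1 MB/s rate, and the replication factor of 3 are assumptions.&lt;/p&gt;

```python
def retained_gb(ingress_mb_s, retention_hours, replication_factor=3):
    """Approximate steady-state disk footprint of a topic, before compression."""
    seconds = retention_hours * 3600
    return ingress_mb_s * seconds * replication_factor / 1024


# Same assumed 1 MB/s topic, with the 7-day default vs. a 2-hour retention.
week = retained_gb(ingress_mb_s=1, retention_hours=7 * 24)
two_hours = retained_gb(ingress_mb_s=1, retention_hours=2)
print(round(week), round(two_hours))  # 1772 21
```

&lt;p&gt;Under these assumptions the 7-day default keeps 84 times more data on disk than a 2-hour retention for the same topic.&lt;/p&gt;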

&lt;h3&gt;
  
  
  3. Let's spin up another cluster
&lt;/h3&gt;

&lt;p&gt;One-cluster-per-team was a reasonable isolation strategy a long time ago. We have seen this multiple times: more than 500 clusters, with tons of mirroring to share data. That's money down the drain.&lt;/p&gt;

&lt;p&gt;You're paying for underutilized clusters instead of consolidating onto fewer well-managed ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Zombie topics
&lt;/h3&gt;

&lt;p&gt;Topics created for experiments, migrations, or one-off tests that were never cleaned up. It's a simple thing, but it costs a lot of money because no one is looking. Every one of them is replicated and has retention costs. We've seen enterprises with hundreds of zombie topics that were surprised when we showed them.&lt;/p&gt;
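
&lt;p&gt;Finding candidates is mostly a filtering exercise once you have activity data per topic. The sketch below assumes a hypothetical inventory (last write time and number of consumer groups per topic) and a 30-day idle window. Where that data comes from, and the field names, are assumptions.&lt;/p&gt;

```python
from datetime import datetime, timedelta

# Hypothetical inventory: in practice this would come from your metrics or catalog.
NOW = datetime(2026, 1, 1)
topics = [
    {"name": "payments.orders", "last_produce": NOW - timedelta(hours=1), "consumer_groups": 3},
    {"name": "migration-tmp", "last_produce": NOW - timedelta(days=200), "consumer_groups": 0},
    {"name": "test-topic-final-v2", "last_produce": NOW - timedelta(days=45), "consumer_groups": 0},
]


def zombie_topics(topics, now, idle_days=30):
    """Topics with no consumer groups and no writes within the idle window."""
    cutoff = now - timedelta(days=idle_days)
    return [
        t["name"]
        for t in topics
        if t["consumer_groups"] == 0 and cutoff >= t["last_produce"]
    ]


print(zombie_topics(topics, NOW))  # ['migration-tmp', 'test-topic-final-v2']
```

&lt;p&gt;Treat the result as a review list rather than a delete list: a topic with no recent traffic may still be a legitimate low-volume or seasonal feed.&lt;/p&gt;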

&lt;h3&gt;
  
  
  5. Runaway egress
&lt;/h3&gt;

&lt;p&gt;We had a customer where egress was running 30x higher than ingress on a single topic because of a misconfigured consumer. Buggy consumers, unnecessary fan-out, and chatty clients create traffic patterns that are invisible without dedicated infra monitoring. Egress is rarely free.&lt;/p&gt;
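
&lt;p&gt;A simple way to surface this pattern is to compare bytes out with bytes in per topic over the same window and flag outliers. The sketch below uses hypothetical counters and an assumed threshold of 10x. A healthy ratio depends on how many consumer groups legitimately read the topic.&lt;/p&gt;

```python
# Hypothetical per-topic byte counters over the same time window.
traffic = {
    "payments.orders": {"bytes_in": 10_000, "bytes_out": 30_000},
    "analytics.clickstream.pageviews": {"bytes_in": 5_000, "bytes_out": 150_000},
    "audit.events": {"bytes_in": 0, "bytes_out": 0},
}


def runaway_egress(traffic, max_ratio=10):
    """Return {topic: egress/ingress ratio} for topics above max_ratio."""
    flagged = {}
    for topic, counters in traffic.items():
        if counters["bytes_in"] == 0:
            continue  # no ingress in the window, so the ratio is undefined
        ratio = counters["bytes_out"] / counters["bytes_in"]
        if ratio > max_ratio:
            flagged[topic] = ratio
    return flagged


print(runaway_egress(traffic))  # {'analytics.clickstream.pageviews': 30.0}
```

&lt;p&gt;Three consumer groups reading everything once gives a ratio of 3, which is expected. A ratio of 30 on a single topic is the kind of signal worth investigating.&lt;/p&gt;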




&lt;h2&gt;
  
  
  How to deal with it
&lt;/h2&gt;

&lt;p&gt;Pick your starting point based on where the waste is concentrated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stop the bleeding: better defaults
&lt;/h3&gt;

&lt;p&gt;Low-coordination work that pays off over time. It's better to grant exceptions than to live with wrong defaults you can't roll back.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set sensible low partition defaults (3) and short retention (1 day). Increase only when necessary.&lt;/li&gt;
&lt;li&gt;Enforce client-side compression. (Conduktor Gateway)&lt;/li&gt;
&lt;li&gt;Require ownership metadata at topic creation. (Conduktor)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This won't reduce your bill right away, but it will prevent it from getting worse.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trim the fat: optimize what's running
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Tune retention where it's drifted, analyze consumer patterns.&lt;/li&gt;
&lt;li&gt;Retire topics with no active producers or consumers.&lt;/li&gt;
&lt;li&gt;Right-size partition counts (this is the hard one, since it means recreating topics and coordinating with every producer and consumer).&lt;/li&gt;
&lt;li&gt;Consolidate Kafka clusters, introduce multi-tenancy. (Conduktor)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This work moves the infrastructure bill quickly: we have seen reductions of $500k from this alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  Now, keep it clean, be disciplined
&lt;/h2&gt;

&lt;p&gt;After a cleanup, the same drift will set in again.&lt;/p&gt;

&lt;p&gt;To stay on course, you need three things: full visibility into what your Kafka ecosystem contains and what it costs (&lt;a href="https://conduktor.io/blog/chargeback-attribute-map-kafka-costs-to-your-business" rel="noopener noreferrer"&gt;chargeback&lt;/a&gt; is powerful for this), clear ownership so every topic and cluster has a team accountable for it, and a regular review cadence to catch drift before it becomes permanent. Not heavyweight governance. Just enough discipline that the cleanup doesn't have to be repeated every year.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;The diagnostic question is simple: which of these patterns are present in your environment, and what are they costing you?&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://conduktor.io/blog/a-better-conversation-about-kafka-costs" rel="noopener noreferrer"&gt;original deep-dive&lt;/a&gt; goes further into the four layers of Kafka cost (infrastructure, ecosystem tooling, vendor/licensing, and operational) and includes a framework for sequencing the work.&lt;/p&gt;

&lt;p&gt;If you want to look at your own estate, Conduktor's field team does a &lt;a href="https://conduktor.io/contact/demo" rel="noopener noreferrer"&gt;free cost analysis&lt;/a&gt; where they walk through your environment with you and give you concrete numbers.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>datastreaming</category>
      <category>devops</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
