<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Conduktor</title>
    <description>The latest articles on DEV Community by Conduktor (@conduktor).</description>
    <link>https://dev.to/conduktor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F7353%2Fe70b4d17-904a-4b56-9963-4dd2ac5aa071.png</url>
      <title>DEV Community: Conduktor</title>
      <link>https://dev.to/conduktor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/conduktor"/>
    <language>en</language>
    <item>
      <title>I let an AI agent set up my entire Kafka platform. Here's what actually happened.</title>
      <dc:creator>Stéphane Derosiaux</dc:creator>
      <pubDate>Mon, 08 Jun 2026 13:34:35 +0000</pubDate>
      <link>https://dev.to/conduktor/i-let-an-ai-agent-set-up-my-entire-kafka-platform-heres-what-actually-happened-220m</link>
      <guid>https://dev.to/conduktor/i-let-an-ai-agent-set-up-my-entire-kafka-platform-heres-what-actually-happened-220m</guid>
      <description>&lt;p&gt;Your AI coding assistant can explain consumer groups, rebalancing, and exactly-once semantics. Ask it to actually &lt;em&gt;set up&lt;/em&gt; a Kafka platform with governance, though, and it won't be able to do that on its own.&lt;/p&gt;

&lt;p&gt;Between hallucinations, misunderstandings, production impact (I have actually seen Claude mess up a rolling upgrade of Kafka brokers), and a lack of knowledge of the products your Kafka infra relies on, there's a lot working against it.&lt;/p&gt;

&lt;p&gt;Beyond their training, the models have zero context about your infra. They've never seen your cluster, don't know your policies (technical, governance), and often have no way to check anything against your actual environment.&lt;/p&gt;

&lt;p&gt;You can give it the missing context using Conduktor.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing that was missing
&lt;/h2&gt;

&lt;p&gt;There is an open-source &lt;a href="https://github.com/conduktor/skills" rel="noopener noreferrer"&gt;Conduktor skill&lt;/a&gt; you install into your AI assistant. It works with Claude Code, Cursor, VS Code Copilot, and Gemini CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add conduktor/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This teaches the agent the whole platform and how to run processes against it: Console, Gateway, and the CLI, so it can be efficient and not hallucinate.&lt;/p&gt;

&lt;p&gt;After the install, the agent discovers your environment (Kafka clusters, Schema Registry, policies, etc.), asks questions based on what it finds, generates configs with &lt;em&gt;real&lt;/em&gt; values and best practices, and runs everything with dry-run validation before it touches anything.&lt;/p&gt;

&lt;p&gt;The CLI is really its "hands", and it goes deeper than MCP alone. The skill is the playbook where the experience and practices from years of usage are written down. This makes a big difference versus "generate some YAML and cross your fingers".&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting from absolutely nothing
&lt;/h2&gt;

&lt;p&gt;You can start from scratch with just Docker running and nothing else. No Kafka, no Conduktor, no config. This is all I asked (with the Conduktor skill set up):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;install Conduktor and set it up so I can login&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It checked my environment, asked what I was trying to do, wrote a &lt;code&gt;docker-compose.yml&lt;/code&gt;, spun up the containers, hit one error along the way, self-corrected, and handed me a working platform, Kafka &amp;amp; Console perfectly configured.&lt;/p&gt;

&lt;p&gt;I could ask the same for my production Kubernetes cluster. It would follow best practices there too, use Helm, discover my environment, etc., and in minutes everything would be wired up, with policies already in place.&lt;/p&gt;

&lt;p&gt;This is much more powerful than a "human" quickstart: it covers a wider range of setups, and the result is already closer to production-ready. The agent knows the Kafka domain, and with the skill it knows Conduktor, so the combination of both makes it ask me the right questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Governance, without becoming a Kafka lawyer
&lt;/h2&gt;

&lt;p&gt;Running Kafka isn't the hard part anymore. Making it &lt;em&gt;safe for a team to share&lt;/em&gt; is the hard part: naming conventions, ownership boundaries, policies. This is what prevents a Kafka cluster from turning into a wasteland of &lt;code&gt;test-topic-final-v2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The beautiful thing is that you can now give it broad prompts like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;set up governance for two teams, Payments and Analytics, with topic policies and cross-team permissions&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It worked in stages and figured out the dependency ordering itself. When the API rejected something, it read the rejection, restructured the YAML, and retried, with minimal hand-holding from me (it only asked which policies I wanted, based on what's possible). It ended up creating the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;TopicPolicy&lt;/code&gt; objects: locking down naming per team, enforcing safe defaults (retention, replication, required labels) across every topic. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Application&lt;/code&gt; objects with non-overlapping resource boundaries to define ownership of resources and teams.&lt;/li&gt;
&lt;li&gt;Topics with descriptions and labels in the catalog.&lt;/li&gt;
&lt;li&gt;Cross-team permission giving Analytics read access to &lt;code&gt;payments.orders.*&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is federated ownership in practice: the platform team sets the boundaries, developers move freely inside them. Normally that knowledge takes months to accumulate and lives in spreadsheets or Jira tickets. Here it lives in a skill file that every agent on the team can read.&lt;/p&gt;

&lt;h2&gt;
  
  
  Now flip to the developer side
&lt;/h2&gt;

&lt;p&gt;Once those guardrails exist, a developer on the Payments team installs the &lt;em&gt;same skill&lt;/em&gt; and never has to know any of it happened. No &lt;code&gt;ApplicationInstance&lt;/code&gt;, no &lt;code&gt;TopicPolicy&lt;/code&gt;, no YAML. They just talk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What topics do we have?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent runs &lt;code&gt;conduktor get Topic&lt;/code&gt; and shows the catalog — descriptions, owners, labels, visibility. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need a topic for my service."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent checks their &lt;code&gt;ApplicationInstance&lt;/code&gt;, reads the policy constraints (naming prefix &lt;code&gt;payments.*&lt;/code&gt;, retention between one and seven days, a required &lt;code&gt;data-criticality&lt;/code&gt; label), asks what the topic is for, generates compliant YAML, dry-runs it, and applies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic/payments.fulfillment.shipped: Created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The developer just got a topic that's compliant by default. Without the skill, that would most likely be a Jira ticket, plus asking the platform team what the right shape is and what to put in it.&lt;/p&gt;
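
&lt;p&gt;To make the constraints above concrete (the &lt;code&gt;payments.&lt;/code&gt; prefix, the one-to-seven-day retention window, the required &lt;code&gt;data-criticality&lt;/code&gt; label), here is a minimal local sketch of that kind of check. It is not Conduktor's implementation or API: the function name, its parameters, and the violation messages are assumptions made up for illustration.&lt;/p&gt;

```python
# Hypothetical local sketch of the checks described above; this is not
# Conduktor's implementation or API. All names here are assumptions.
ONE_DAY_MS = 24 * 60 * 60 * 1000


def check_topic_request(name, retention_ms, labels):
    """Return a list of human-readable policy violations (empty if compliant)."""
    violations = []
    # Naming: Payments topics must live under the payments. prefix.
    if not name.startswith("payments."):
        violations.append("name must start with 'payments.'")
    # Retention: between one and seven days, inclusive (integer milliseconds).
    if retention_ms not in range(ONE_DAY_MS, 7 * ONE_DAY_MS + 1):
        violations.append("retention must be between 1 and 7 days")
    # Labels: a data-criticality label is required.
    if "data-criticality" not in labels:
        violations.append("missing required label 'data-criticality'")
    return violations


# A compliant request yields no violations; a sloppy one yields three.
print(check_topic_request("payments.fulfillment.shipped", 3 * ONE_DAY_MS,
                          {"data-criticality": "high"}))
print(check_topic_request("test-topic-final-v2", 30 * ONE_DAY_MS, {}))
```

&lt;p&gt;Returning every violation at once, rather than stopping at the first, mirrors the workflow described earlier, where a rejection is read and the YAML is fixed in one retry.&lt;/p&gt;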

&lt;p&gt;&lt;strong&gt;"How do I produce to my topic?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It reads the cluster config, grabs the real bootstrap server, and hands back working code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;confluent_kafka&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Producer&lt;/span&gt;

&lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Producer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bootstrap.servers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost:19092&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payments.fulfillment.shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ord-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orderId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ord-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy, paste, run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need to read the Analytics team's clickstream."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent finds that &lt;code&gt;analytics.clickstream.pageviews&lt;/code&gt; belongs to the Analytics team, then writes a read-only permission scoped to exactly that topic, at both the Kafka and Console layers. The developer doesn't know what an ACL is or what &lt;code&gt;patternType: LITERAL&lt;/code&gt; means. They asked in English and got access. &lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually take away from this
&lt;/h2&gt;

&lt;p&gt;This walkthrough only touched governance and onboarding. The skill also covers Gateway (Kafka proxy) encryption, data quality rules, Terraform export, and CI/CD scaffolding.&lt;/p&gt;

&lt;p&gt;Try it, it's one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add conduktor/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's &lt;a href="https://github.com/conduktor/skills" rel="noopener noreferrer"&gt;open source&lt;/a&gt;, so if you hit a workflow it handles badly, open a PR. And if you're new to Conduktor, the &lt;a href="https://www.conduktor.io/community" rel="noopener noreferrer"&gt;Community Edition&lt;/a&gt; is free and self-hosted, and the skill will do the install for you.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was adapted from the &lt;a href="https://www.conduktor.io/blog/set-up-a-kafka-platform-with-an-ai-agent" rel="noopener noreferrer"&gt;original on the Conduktor blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>dataengineering</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to analyze the cost of Kafka?</title>
      <dc:creator>Stéphane Derosiaux</dc:creator>
      <pubDate>Mon, 25 May 2026 15:19:36 +0000</pubDate>
      <link>https://dev.to/conduktor/how-to-analyze-the-cost-of-kafka-2a4b</link>
      <guid>https://dev.to/conduktor/how-to-analyze-the-cost-of-kafka-2a4b</guid>
      <description>&lt;p&gt;Which side are you on: "This is just what Kafka costs at scale" or "We should switch to a cheaper Kafka provider"?&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://conduktor.io" rel="noopener noreferrer"&gt;Conduktor&lt;/a&gt;, our field team works inside Kafka environments that have been running for a long time. We see this: most Kafka teams are overpaying by 25 to 40 percent. Not because anyone did anything wrong, but because of how Kafka got built up over time.&lt;/p&gt;

&lt;p&gt;The cost drivers of Kafka are weirdly context-dependent: the infrastructure and the provider are a tiny part of the full picture. &lt;/p&gt;

&lt;p&gt;How it's being used is the real question.&lt;/p&gt;




&lt;h2&gt;
  
  
  Five bad patterns eating budget
&lt;/h2&gt;

&lt;p&gt;Below is what we see: the same patterns show up everywhere, and they are the first things we work on with our customers.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Partition overprovisioning
&lt;/h3&gt;

&lt;p&gt;"How many partitions?" is the most common question with Kafka. Last week someone told me their org just defaults to "64". I was shocked. Not only may providers price per partition, but from Kafka's point of view each partition also costs metadata, open file handles, and so on.&lt;/p&gt;

&lt;p&gt;The partition count should depend on the expected throughput and concurrency (consumer parallelism). If a 64-partition topic is sitting in a cluster with barely any traffic, you're losing money on all sides. Multiply that by dozens or hundreds of topics at scale.&lt;/p&gt;
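
&lt;p&gt;A common rule of thumb for sizing from throughput and consumer parallelism is to take the larger of "target throughput divided by what one partition sustains on the producer side" and "target throughput divided by what one consumer instance sustains". The sketch below assumes you have measured those per-partition figures yourself. The function name and the sample numbers are illustrative, not a recommendation.&lt;/p&gt;

```python
import math


def suggested_partitions(target_mb_s, producer_mb_s_per_partition,
                         consumer_mb_s_per_partition):
    """Rule-of-thumb partition count: enough partitions for both sides."""
    for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    # Never go below one partition, even for near-idle topics.
    return max(1, for_producers, for_consumers)


# Illustrative numbers only: 20 MB/s target, 10 MB/s per partition on the
# producer side, 5 MB/s per consumer instance.
print(suggested_partitions(20, 10, 5))    # 4
print(suggested_partitions(0.5, 10, 5))   # 1
```

&lt;p&gt;With these assumed figures, a near-idle topic comes out at a single partition, a long way from a blanket default of 64.&lt;/p&gt;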

&lt;h3&gt;
  
  
  2. Retention that makes no sense
&lt;/h3&gt;

&lt;p&gt;Long retention on topics that nobody reads past the last few hours. Do you actually need replay? The default retention is 7 days, and it's often applied uniformly, even though some topics only need a couple of hours and others genuinely need weeks.&lt;/p&gt;

&lt;p&gt;Tip: with compacted topics and/or Kafka Streams (changelog topics, etc.), data can be stored indefinitely, which can cause security or regulatory issues.&lt;/p&gt;
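
&lt;p&gt;To see why uniform retention matters, here is a back-of-the-envelope estimate of steady-state disk usage for a time-retained topic. It assumes a constant ingress rate, ignores compression and index overhead, and counts every replica. The function name and the 1 MB/s figure are made up for illustration.&lt;/p&gt;

```python
def retained_gb(ingress_mb_s, retention_hours, replication_factor):
    """Approximate disk footprint of a topic at steady state, in GB."""
    seconds = retention_hours * 3600
    # Every byte written is kept for the retention window on each replica.
    return ingress_mb_s * seconds * replication_factor / 1000


# Same 1 MB/s topic, replication factor 3, two retention settings.
print(retained_gb(1, 7 * 24, 3))  # 7-day retention
print(retained_gb(1, 2, 3))       # 2-hour retention
```

&lt;p&gt;Under these assumptions the 7-day setting holds roughly 1.8 TB while the 2-hour setting holds about 22 GB, for the same traffic.&lt;/p&gt;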

&lt;h3&gt;
  
  
  3. Let's spin up another cluster
&lt;/h3&gt;

&lt;p&gt;One-cluster-per-team was a reasonable isolation strategy a long time ago. We have seen this multiple times: more than 500 clusters, with tons of mirroring to share data. That is money down the drain.&lt;/p&gt;

&lt;p&gt;You're paying for underutilized clusters instead of consolidating onto fewer well-managed ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Zombie topics
&lt;/h3&gt;

&lt;p&gt;Topics created for experiments, migrations, or one-off tests that were never cleaned up. It's a simple thing, but it costs a lot of money because no one is looking. Every one of them is replicated and has retention costs. We've seen enterprises with hundreds of zombie topics that were surprised when we showed them.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Runaway egress
&lt;/h3&gt;

&lt;p&gt;We had a customer where egress was running 30x higher than ingress on a single topic because of a misconfigured consumer. Buggy consumers, unnecessary fan-out, and chatty clients create traffic patterns that are invisible without dedicated infra monitoring. Egress is rarely free.&lt;/p&gt;
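
&lt;p&gt;One simple way to surface this kind of pattern is to rank topics by their egress-to-ingress ratio over a fixed window. The sketch below assumes you already export per-topic byte counters from your monitoring. The function name, the input shape, and the sample numbers are all made up for illustration.&lt;/p&gt;

```python
def rank_by_fanout(traffic):
    """Sort topics by egress/ingress ratio, highest first.

    traffic maps topic name to (ingress_bytes, egress_bytes) for one window.
    """
    ratios = {}
    for topic, (ingress, egress) in traffic.items():
        # Guard against idle topics to avoid dividing by zero.
        ratios[topic] = egress / max(ingress, 1)
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Made-up sample numbers, not real measurements.
sample = {
    "payments.orders.created": (1_000_000, 3_000_000),
    "analytics.clickstream.pageviews": (2_000_000, 60_000_000),
}
for topic, ratio in rank_by_fanout(sample):
    print(topic, round(ratio, 1))
```

&lt;p&gt;A high ratio is not wrong by itself, since legitimate fan-out exists. It is only a cue for which topics and consumers to look at first.&lt;/p&gt;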




&lt;h2&gt;
  
  
  How to deal with it
&lt;/h2&gt;

&lt;p&gt;Pick your starting point based on where the waste is concentrated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stop the bleeding: better defaults
&lt;/h3&gt;

&lt;p&gt;Low-coordination work that pays off over time. It's better to grant exceptions than to live with wrong defaults you can't roll back.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set sensible, low defaults: few partitions (3) and short retention (1 day). Increase only if necessary.&lt;/li&gt;
&lt;li&gt;Enforce client-side compression. (Conduktor Gateway)&lt;/li&gt;
&lt;li&gt;Require ownership metadata at topic creation. (Conduktor)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This won't reduce your bill right away, but it will prevent it from getting worse.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trim the fat: optimize what's running
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Tune retention where it's drifted, analyze consumer patterns.&lt;/li&gt;
&lt;li&gt;Retire topics with no active producers or consumers.&lt;/li&gt;
&lt;li&gt;Right-size partition counts (this is the hard one, since it means recreating topics and coordinating with every producer and consumer).&lt;/li&gt;
&lt;li&gt;Consolidate Kafka clusters, introduce multi-tenancy. (Conduktor)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This work moves the infrastructure bill quickly: we have seen reductions of $500k from this alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  Now, keep it clean, be disciplined
&lt;/h2&gt;

&lt;p&gt;After a cleanup, the same "drift" will set in again.&lt;/p&gt;

&lt;p&gt;To stay on course, you need full visibility into what your Kafka ecosystem contains and what it costs (&lt;a href="https://conduktor.io/blog/chargeback-attribute-map-kafka-costs-to-your-business" rel="noopener noreferrer"&gt;chargeback&lt;/a&gt; is powerful for this), clear ownership so every topic and cluster has a team accountable for it, and a regular review cadence to catch drift before it becomes permanent. Not heavyweight governance. Just enough discipline that the cleanup doesn't have to be repeated every year.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;The diagnostic question is simple: which of these patterns are present in your environment, and what are they costing you?&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://conduktor.io/blog/a-better-conversation-about-kafka-costs" rel="noopener noreferrer"&gt;original deep-dive&lt;/a&gt; goes further into the four layers of Kafka cost (infrastructure, ecosystem tooling, vendor/licensing, and operational) and includes a framework for sequencing the work.&lt;/p&gt;

&lt;p&gt;If you want to look at your own estate, Conduktor's field team does a &lt;a href="https://conduktor.io/contact/demo" rel="noopener noreferrer"&gt;free cost analysis&lt;/a&gt; where they walk through your environment with you and give you concrete numbers.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>datastreaming</category>
      <category>devops</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
