<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Byron Hsieh</title>
    <description>The latest articles on DEV Community by Byron Hsieh (@hantedyou_0106).</description>
    <link>https://dev.to/hantedyou_0106</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3569973%2Fd6adf80b-720f-412c-979c-c45050958577.jpg</url>
      <title>DEV Community: Byron Hsieh</title>
      <link>https://dev.to/hantedyou_0106</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hantedyou_0106"/>
    <language>en</language>
    <item>
      <title>Why Data Teams Need Data Lineage: From Common Pain Points to Real-World Challenges</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Thu, 12 Mar 2026 09:24:31 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/why-data-teams-need-data-lineage-from-common-pain-points-to-real-world-challenges-1m9c</link>
      <guid>https://dev.to/hantedyou_0106/why-data-teams-need-data-lineage-from-common-pain-points-to-real-world-challenges-1m9c</guid>
      <description>&lt;h2&gt;
  
  
  Why Data Teams Need Data Lineage
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;From Common Pain Points to Real-World Challenges&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Data Lineage has become a core component of modern data platforms. It provides the transparency, traceability, and observability needed for maintainability and governance. This article first covers the common challenges faced by data teams, then explores the additional real-world difficulties encountered in highly heterogeneous SQL ecosystems — the context that motivated building an automated SQL lineage extraction tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Why Data Teams Need Data Lineage (General)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Complex dependencies make incident investigation expensive
&lt;/h3&gt;

&lt;p&gt;In mature data warehouses, data flows across multiple transformation layers.&lt;br&gt;&lt;br&gt;
When issues occur, teams must determine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where does this column come from?
&lt;/li&gt;
&lt;li&gt;Which transformation introduced the issue?
&lt;/li&gt;
&lt;li&gt;What downstream tables will be impacted?
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without lineage, this requires manual code tracing — slow and error-prone.&lt;/p&gt;


&lt;h3&gt;
  
  
  Risky schema and logic changes without impact analysis
&lt;/h3&gt;

&lt;p&gt;Schema updates or ETL refactoring require an understanding of downstream dependencies.&lt;br&gt;&lt;br&gt;
Without lineage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dashboards may break
&lt;/li&gt;
&lt;li&gt;ML pipelines may fail
&lt;/li&gt;
&lt;li&gt;Incidents appear only after deployment
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lineage makes changes predictable.&lt;/p&gt;


&lt;h3&gt;
  
  
  Slow onboarding and weak knowledge transfer
&lt;/h3&gt;

&lt;p&gt;Most data platforms lack complete documentation.&lt;br&gt;&lt;br&gt;
Newcomers must learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table relationships
&lt;/li&gt;
&lt;li&gt;Column semantics
&lt;/li&gt;
&lt;li&gt;End-to-end data flow
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lineage accelerates onboarding by offering a visual data map.&lt;/p&gt;


&lt;h3&gt;
  
  
  Data teams become a support center for business users
&lt;/h3&gt;

&lt;p&gt;Common questions include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“How is this metric calculated?”
&lt;/li&gt;
&lt;li&gt;“What is the source of this column?”
&lt;/li&gt;
&lt;li&gt;“Why does this month’s number differ?”
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without lineage, engineers manually search SQL each time.&lt;/p&gt;


&lt;h3&gt;
  
  
  Inefficient Data Quality (DQ) incident handling and reprocessing
&lt;/h3&gt;

&lt;p&gt;After identifying a data quality issue, teams must decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which downstream tables need reprocessing?
&lt;/li&gt;
&lt;li&gt;How far does the impact propagate?
&lt;/li&gt;
&lt;li&gt;What is the safe order for backfilling data?
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without lineage, reprocessing scope is often guesswork — teams either miss affected systems or waste time rerunning unnecessary jobs. Lineage provides the dependency map needed for surgical, efficient remediation.&lt;/p&gt;
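
&lt;p&gt;As an illustration, the "safe order" question above reduces to a topological sort over the dependency map. The sketch below uses hypothetical table names and Python's standard-library &lt;code&gt;graphlib&lt;/code&gt;:&lt;/p&gt;

```python
# Sketch: derive a safe backfill order from a lineage dependency map.
# Table names and edges are hypothetical illustrations.
from graphlib import TopologicalSorter

# downstream table -> the upstream tables it reads from
lineage = {
    "stg_orders": {"raw_orders"},
    "stg_payments": {"raw_payments"},
    "fct_orders": {"stg_orders", "stg_payments"},
    "rpt_revenue": {"fct_orders"},
}

def backfill_order(lineage, dirty_table):
    """Return affected tables in dependency-safe reprocessing order."""
    # Invert the map: upstream -> its direct downstream tables
    downstream = {}
    for table, upstreams in lineage.items():
        for up in upstreams:
            downstream.setdefault(up, set()).add(table)

    # Collect everything reachable downstream of the dirty table
    affected, stack = set(), [dirty_table]
    while stack:
        for child in downstream.get(stack.pop(), ()):
            if child not in affected:
                affected.add(child)
                stack.append(child)

    # Topologically sort only the affected subgraph
    subgraph = {t: lineage.get(t, set()).intersection(affected) for t in affected}
    return list(TopologicalSorter(subgraph).static_order())

print(backfill_order(lineage, "raw_orders"))
```

&lt;p&gt;Only the affected subgraph is sorted, so untouched pipelines are never rerun.&lt;/p&gt;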


&lt;h2&gt;
  
  
  2. Real-World Challenges: A Highly Heterogeneous SQL Ecosystem
&lt;/h2&gt;

&lt;p&gt;Beyond typical issues, certain enterprise environments present extreme complexity due to decades of organic growth and diverse technology stacks. Drawing from experience in large-scale data warehouses, this section describes challenges that make automated lineage extraction particularly difficult — challenges that motivated building a specialized preprocessing and parsing pipeline.&lt;/p&gt;


&lt;h3&gt;
  
  
  Thousands of legacy batch jobs
&lt;/h3&gt;

&lt;p&gt;In mature data warehouses, jobs accumulate over years, created by different teams with inconsistent conventions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Highly coupled dependencies
&lt;/li&gt;
&lt;li&gt;Missing or outdated metadata
&lt;/li&gt;
&lt;li&gt;Dependency tracking that is impractical to maintain manually at scale
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Automation becomes essential.&lt;/p&gt;


&lt;h3&gt;
  
  
  SQL is not pure SQL: templates, embedded code, and vendor dialects
&lt;/h3&gt;

&lt;p&gt;SQL rarely exists in isolation in production systems. Common patterns include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Template-based SQL:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;SOURCE_SCHEMA&lt;/span&gt;&lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="n"&gt;customer_data&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;batch_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;RUN_DATE&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
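
&lt;p&gt;Incidentally, Python's standard-library &lt;code&gt;string.Template&lt;/code&gt; understands exactly this &lt;code&gt;${VAR}&lt;/code&gt; syntax, so a first-pass expansion can be sketched without extra tooling (the variable values below are hypothetical stand-ins for a real job config):&lt;/p&gt;

```python
# Sketch: expand ${VAR}-style template variables before parsing.
# The variable values are hypothetical stand-ins for a job config.
from string import Template

raw_sql = """SELECT * FROM ${SOURCE_SCHEMA}.customer_data
WHERE batch_date = ${RUN_DATE}"""

variables = {"SOURCE_SCHEMA": "prod_dw", "RUN_DATE": "DATE '2026-03-01'"}

# safe_substitute leaves unknown placeholders intact instead of raising,
# which is useful when a job config is incomplete
expanded = Template(raw_sql).safe_substitute(variables)
print(expanded)
```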



&lt;p&gt;&lt;strong&gt;SQL embedded in application code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python: Dynamic query construction with f-strings or string concatenation&lt;/li&gt;
&lt;li&gt;COBOL: SQL embedded in EXEC SQL blocks&lt;/li&gt;
&lt;li&gt;Perl: Template systems mixing procedural logic with SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Vendor-specific dialects:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Teradata procedural extensions (.SET, .IF, EXEC)&lt;/li&gt;
&lt;li&gt;Oracle PL/SQL blocks&lt;/li&gt;
&lt;li&gt;T-SQL stored procedures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each approach complicates lineage extraction — the SQL parser must first extract the query from its host language context.&lt;/p&gt;
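
&lt;p&gt;For the COBOL case, a minimal extraction sketch can rely on the &lt;code&gt;EXEC SQL ... END-EXEC&lt;/code&gt; delimiters; the host program below is a hypothetical fragment:&lt;/p&gt;

```python
# Sketch: pull embedded SQL out of a COBOL-style host program.
# The source snippet is a hypothetical illustration, not production parsing.
import re

cobol_source = """
       MOVE 1 TO WS-FLAG.
       EXEC SQL
           SELECT customer_id, balance
           INTO :WS-ID, :WS-BAL
           FROM prod_dw.accounts
       END-EXEC.
       DISPLAY WS-ID.
"""

# EXEC SQL ... END-EXEC delimits embedded SQL in COBOL
blocks = re.findall(r"EXEC\s+SQL(.*?)END-EXEC", cobol_source, re.DOTALL | re.IGNORECASE)
queries = [b.strip() for b in blocks]
print(queries[0])
```

&lt;p&gt;Python f-strings and Perl templates need analogous, language-specific extraction steps before any SQL parser can run.&lt;/p&gt;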




&lt;h3&gt;
  
  
  Noise from comments, multilingual content, and inconsistent naming
&lt;/h3&gt;

&lt;p&gt;Legacy codebases often accumulate various forms of noise:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comments and documentation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mix of inline, multi-line, and vendor-specific comment styles&lt;/li&gt;
&lt;li&gt;Documentation in multiple languages (reflecting global or offshore teams)&lt;/li&gt;
&lt;li&gt;Outdated comments referencing deprecated logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Naming inconsistencies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Column names in different languages (English, local language, or mixed)&lt;/li&gt;
&lt;li&gt;Meaningless aliases (&lt;code&gt;t1&lt;/code&gt;, &lt;code&gt;x&lt;/code&gt;, &lt;code&gt;temp_final_final&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Reused table names across contexts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Extracting clean lineage requires distinguishing signal from noise — comments must be removed, but not at the expense of losing vendor-specific SQL syntax.&lt;/p&gt;
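
&lt;p&gt;A simplified sketch of such a cleaner, which strips inline and block comments but leaves Teradata dot-commands untouched (a real implementation needs a tokenizer, since string literals containing &lt;code&gt;--&lt;/code&gt; would defeat these regexes):&lt;/p&gt;

```python
# Sketch: strip SQL comments while keeping Teradata dot-commands.
# Simplified rules; a tokenizer is needed for correctness in general.
import re

def strip_comments(sql):
    out = []
    for line in sql.splitlines():
        # Teradata commands like .SET or .IF start with a dot: keep as-is
        if line.lstrip().startswith("."):
            out.append(line)
            continue
        out.append(re.sub(r"--.*$", "", line))  # inline comments
    text = "\n".join(out)
    # block comments, possibly spanning multiple lines
    return re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)

sql = """.SET ERRORLEVEL 3807 SEVERITY 0
SELECT acct_id  -- 帳戶編號 (account id)
FROM prod_dw.accounts /* legacy logic below */
"""
print(strip_comments(sql))
```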




&lt;h3&gt;
  
  
  SQL patterns too complex for off-the-shelf lineage tools
&lt;/h3&gt;

&lt;p&gt;Production SQL often contains patterns that defeat generic parsers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Computed column dependencies within the same SELECT:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;acct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; 
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;acct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'00'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'A'&lt;/span&gt;
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;acct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'07'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'C'&lt;/span&gt;
        &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'X'&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;derived_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;-- References the column defined above&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; 
        &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;derived_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'C'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;acct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;effective_balance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;-- Further chained reference&lt;/span&gt;
    &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;effective_balance&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversion_rate&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;converted_balance&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;source_table&lt;/span&gt; &lt;span class="n"&gt;acct&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;exchange_rates&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;acct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currency&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is common in financial ETL but problematic for parsers: &lt;code&gt;derived_status&lt;/code&gt; is defined and immediately referenced in the same SELECT clause. Generic parsers fail because they expect column references to resolve to source tables, not to computed columns in the same projection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Other challenging patterns:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deeply nested subqueries (5+ levels) with alias renaming at each layer&lt;/li&gt;
&lt;li&gt;Large UNION chains combining 10+ tables with overlapping column names&lt;/li&gt;
&lt;li&gt;Dynamic table/column name resolution (metadata-driven ETL)&lt;/li&gt;
&lt;li&gt;Vendor-specific clauses (QUALIFY in Teradata, TOP in T-SQL)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each pattern requires specialized AST traversal and context tracking beyond what standard SQL parsers provide.&lt;/p&gt;
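
&lt;p&gt;One way to handle the chained-alias pattern is a fixed-point expansion of alias references. The sketch below assumes the projection list has already been parsed into alias-to-referenced-names pairs; in practice an AST library such as &lt;code&gt;sqlglot&lt;/code&gt; would supply them:&lt;/p&gt;

```python
# Sketch: resolve chained intra-SELECT alias references to base columns.
# Projections are given as already-parsed pairs; a real implementation
# would obtain them from an AST.
def resolve_to_sources(projections):
    """Map each output alias to the base columns it ultimately depends on."""
    aliases = set(projections)
    def expand(name, seen=()):
        if name not in aliases:          # a real source column
            return {name}
        if name in seen:                 # guard against cyclic references
            return set()
        cols = set()
        for ref in projections[name]:
            cols = cols.union(expand(ref, seen + (name,)))
        return cols
    return {alias: expand(alias) for alias in projections}

# Mirrors the chained CASE expressions in the example above
projections = {
    "derived_status": ["acct.status_code"],
    "effective_balance": ["derived_status", "acct.balance"],
    "converted_balance": ["effective_balance", "rate.conversion_rate"],
}
print(resolve_to_sources(projections)["converted_balance"])
```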




&lt;h3&gt;
  
  
  A preprocessing pipeline becomes required
&lt;/h3&gt;

&lt;p&gt;To reliably feed SQL into lineage engines (e.g., sqllineage, sqlglot), a multi-stage preprocessing pipeline is necessary:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleaning and normalization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Comment removal (preserving vendor syntax)&lt;/li&gt;
&lt;li&gt;Template variable expansion and macro resolution&lt;/li&gt;
&lt;li&gt;Vendor-specific syntax normalization (Teradata, Oracle, T-SQL, etc.)&lt;/li&gt;
&lt;li&gt;Removal of multilingual content and non-standard formatting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Structure extraction:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separating DDL metadata from DML queries&lt;/li&gt;
&lt;li&gt;Extracting SQL from host language contexts (Python, COBOL, shell scripts)&lt;/li&gt;
&lt;li&gt;Tokenization and syntax tree preparation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only after this normalization can standard lineage engines produce reliable results. The preprocessing layer becomes as critical as the parser itself.&lt;/p&gt;
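
&lt;p&gt;Structurally, such a pipeline is just an ordered composition of stage functions. The sketch below uses deliberately trivial placeholder stages to show the shape, not real normalization logic:&lt;/p&gt;

```python
# Sketch: preprocessing stages composed into one pipeline.
# Stage bodies are trivial placeholders; each real stage would carry
# the cleaning and extraction logic described above.
from functools import reduce
from string import Template

def strip_comments(sql):
    # placeholder: drop whole-line "--" comments only
    return "\n".join(l for l in sql.splitlines() if not l.lstrip().startswith("--"))

def expand_templates(sql):
    # placeholder: a hypothetical job config would supply real values
    return Template(sql).safe_substitute({"SOURCE_SCHEMA": "prod_dw"})

def normalize_dialect(sql):
    # placeholder: drop Teradata dot-commands before parsing
    return "\n".join(l for l in sql.splitlines() if not l.lstrip().startswith("."))

PIPELINE = [strip_comments, expand_templates, normalize_dialect]

def preprocess(sql):
    return reduce(lambda acc, stage: stage(acc), PIPELINE, sql)

raw = """.SET SESSION TRANSACTION BTET
-- daily batch extract
SELECT * FROM ${SOURCE_SCHEMA}.orders"""
print(preprocess(raw))
```

&lt;p&gt;Keeping each stage a pure function makes the pipeline easy to test in isolation and to reorder as new SQL sources are onboarded.&lt;/p&gt;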




&lt;h2&gt;
  
  
  3. Conclusion
&lt;/h2&gt;

&lt;p&gt;Data Lineage is not optional — it is foundational to modern data operations. In environments with large amounts of legacy SQL and heterogeneous scripting, a robust lineage pipeline is crucial for maintainability and reliability.&lt;/p&gt;

&lt;p&gt;This context motivated the development of an automated SQL lineage extraction tool. In the next articles of this series, we'll explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 2&lt;/strong&gt;: Getting started with Python and &lt;code&gt;sqllineage&lt;/code&gt; for standard SQL scenarios&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3&lt;/strong&gt;: System Design - A Production-Ready SQL Lineage Pipeline

&lt;ul&gt;
&lt;li&gt;Architecture designed for real production environments with thousands of legacy SQL files&lt;/li&gt;
&lt;li&gt;Multi-stage processing pipeline: preprocessing → parsing → lineage extraction → export&lt;/li&gt;
&lt;li&gt;Happy path workflow and edge case handling strategies&lt;/li&gt;
&lt;li&gt;Error isolation, graceful degradation, and comprehensive logging&lt;/li&gt;
&lt;li&gt;Design decisions: single-threaded processing, step-based architecture, and scalability considerations&lt;/li&gt;
&lt;li&gt;Real-world deployment and operational characteristics&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

</description>
      <category>dataengineering</category>
      <category>sql</category>
      <category>datalineage</category>
      <category>database</category>
    </item>
    <item>
      <title>dbt + OpenLineage #1: Why dbt-ol Is a Post-Processor (Not a Plugin) — and Why It Matters</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Wed, 04 Mar 2026 14:29:19 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/dbt-openlineage-1-why-dbt-ol-is-a-post-processor-not-a-plugin-and-why-it-matters-897</link>
      <guid>https://dev.to/hantedyou_0106/dbt-openlineage-1-why-dbt-ol-is-a-post-processor-not-a-plugin-and-why-it-matters-897</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;After exploring cloud migration patterns in my current work, I started thinking about one of the next layers: &lt;strong&gt;data lineage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Legacy lineage systems built on top of SQLLineage often struggle with on-premise codebases that lack enforced standards — SQL mixed across legacy scripts and raw &lt;code&gt;.sql&lt;/code&gt; files, runtime variables, shared temp tables, duplicate aliases. Accuracy tends to top out around 70%.&lt;/p&gt;

&lt;p&gt;The cloud migration changed the equation. A common modern pattern is to use a Spark-based ingestion layer for data landing and dbt for transformation — and with that comes actual standards: consistent naming conventions, declarative SQL, and reproducible artifacts. That's the foundation automated lineage needs. But it also introduced a new challenge: the ingestion layer and dbt are separate tools, and getting end-to-end lineage means both need to emit events in a common format. OpenLineage is designed exactly for this. This series starts with the dbt side: given that &lt;code&gt;manifest.json&lt;/code&gt; already captures the dependency graph, what does OpenLineage actually emit?&lt;/p&gt;

&lt;p&gt;This series documents my hands-on exploration of &lt;code&gt;openlineage-dbt&lt;/code&gt;. In this first post, I'll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How &lt;code&gt;dbt-ol&lt;/code&gt; works under the hood (it's not what I initially assumed)&lt;/li&gt;
&lt;li&gt;The anatomy of an OpenLineage event: Job, Run, and Dataset&lt;/li&gt;
&lt;li&gt;What column-level lineage actually looks like in a raw &lt;code&gt;.ndjson&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;Why &lt;code&gt;inputs: []&lt;/code&gt; is empty — and why that matters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full project is on GitHub: &lt;a href="https://github.com/hantedyou/openlineage-dbt" rel="noopener noreferrer"&gt;openlineage-dbt&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transformation&lt;/td&gt;
&lt;td&gt;dbt-core + dbt-duckdb&lt;/td&gt;
&lt;td&gt;Lightweight, no server needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;DuckDB&lt;/td&gt;
&lt;td&gt;File-based, perfect for local learning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lineage&lt;/td&gt;
&lt;td&gt;openlineage-dbt&lt;/td&gt;
&lt;td&gt;Official dbt integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transport&lt;/td&gt;
&lt;td&gt;File (&lt;code&gt;.ndjson&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Start simple, switch to Marquez later&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Project Architecture
&lt;/h3&gt;

&lt;p&gt;One design decision I made early on: keep the dbt project completely clean of OpenLineage configuration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openlineage_dbt/
├── jaffle_shop/          ← pure dbt project (deployable independently)
│   ├── dbt_project.yml
│   ├── models/
│   │   ├── staging/      stg_customers, stg_orders, stg_payments
│   │   ├── intermediate/
│   │   └── marts/
│   └── seeds/            raw_customers.csv, raw_orders.csv, raw_payments.csv
├── openlineage/
│   ├── openlineage.yml   ← OL config lives here, not inside jaffle_shop/
│   └── events/           ← file transport output (.gitignored)
└── docker/               ← Marquez compose (Milestone 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The OpenLineage config is injected via environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENLINEAGE_CONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;../openlineage/openlineage.yml
&lt;span class="nb"&gt;cd &lt;/span&gt;jaffle_shop
uv run dbt-ol run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; The dbt project stays pure transformation logic; lineage config is an infrastructure concern. The same dbt project runs unchanged across environments — only the injected config differs. For this learning project, &lt;code&gt;OPENLINEAGE_CONFIG&lt;/code&gt; is sufficient. In production, the newer &lt;a href="https://openlineage.io/docs/client/python/configuration" rel="noopener noreferrer"&gt;&lt;code&gt;OPENLINEAGE__&lt;/code&gt; double-underscore env var system&lt;/a&gt; (e.g., &lt;code&gt;OPENLINEAGE__TRANSPORT__TYPE=http&lt;/code&gt;) is the recommended approach — no config file needed at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Learning #1: How dbt-ol Actually Works
&lt;/h2&gt;

&lt;p&gt;My first assumption was that &lt;code&gt;dbt-ol&lt;/code&gt; intercepts dbt's execution and emits events in real time — like a plugin hooked into each model run. That turned out to be wrong.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dbt-ol&lt;/code&gt; is a &lt;strong&gt;post-processing wrapper&lt;/strong&gt;. Here's what actually happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dbt-ol run
    │
    ├── 1. Run dbt normally (identical to `dbt run`)
    │       └── produces: manifest.json, run_results.json
    │
    └── 2. After dbt completes, read the artifacts
            └── parse manifest.json + run_results.json
                └── emit OpenLineage events to transport
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The implications of this design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Events are emitted &lt;strong&gt;after&lt;/strong&gt; execution, not during&lt;/li&gt;
&lt;li&gt;All lineage information comes from &lt;strong&gt;static artifact parsing&lt;/strong&gt;, not runtime introspection&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;dbt run&lt;/code&gt; fails mid-way, only the completed models get events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also why &lt;code&gt;catalog.json&lt;/code&gt; matters — &lt;code&gt;dbt-ol&lt;/code&gt; reads it (if available) to enrich output datasets with schema information (field names and DuckDB types). More on this in a later section.&lt;/p&gt;
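
&lt;p&gt;The static parsing step can be illustrated on a hand-built &lt;code&gt;manifest.json&lt;/code&gt; fragment. Real manifests are far larger, but the &lt;code&gt;depends_on.nodes&lt;/code&gt; field is where the model graph lives:&lt;/p&gt;

```python
# Sketch: the kind of static artifact parsing dbt-ol performs, on a
# minimal hand-built manifest fragment (real files are much larger).
import json

manifest = json.loads("""{
  "nodes": {
    "model.jaffle_shop.stg_customers": {
      "depends_on": {"nodes": ["seed.jaffle_shop.raw_customers"]}
    },
    "model.jaffle_shop.customers": {
      "depends_on": {"nodes": ["model.jaffle_shop.stg_customers"]}
    }
  }
}""")

# model-level lineage: node -> its direct upstream nodes
model_lineage = {
    node_id: node["depends_on"]["nodes"]
    for node_id, node in manifest["nodes"].items()
}
print(model_lineage["model.jaffle_shop.customers"])
```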

&lt;h3&gt;
  
  
  The Three Artifacts
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Artifact&lt;/th&gt;
&lt;th&gt;Produced by&lt;/th&gt;
&lt;th&gt;What dbt-ol reads from it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;manifest.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;dbt run&lt;/code&gt; / &lt;code&gt;dbt compile&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Model graph, compiled SQL, dependencies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run_results.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dbt run&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execution status, timing per model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;catalog.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dbt docs generate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Column names and types (output schema)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Running It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set config path (relative to execution directory)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENLINEAGE_CONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;../openlineage/openlineage.yml

&lt;span class="nb"&gt;cd &lt;/span&gt;jaffle_shop
uv run dbt-ol run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One subtle gotcha: &lt;code&gt;log_file_path&lt;/code&gt; in &lt;code&gt;openlineage.yml&lt;/code&gt; is resolved &lt;strong&gt;relative to where you run &lt;code&gt;dbt-ol&lt;/code&gt;&lt;/strong&gt; (i.e., &lt;code&gt;jaffle_shop/&lt;/code&gt;), not relative to the config file itself. So the path needs to account for that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# openlineage/openlineage.yml&lt;/span&gt;
&lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;file&lt;/span&gt;
  &lt;span class="na"&gt;log_file_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;../openlineage/events/events.ndjson&lt;/span&gt;  &lt;span class="c1"&gt;# relative to jaffle_shop/&lt;/span&gt;
  &lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Key Learning #2: Anatomy of an OpenLineage Event
&lt;/h2&gt;

&lt;p&gt;Running &lt;code&gt;dbt-ol run&lt;/code&gt; with 3 staging models produces 8 events in &lt;code&gt;events.ndjson&lt;/code&gt; — one JSON object per line (NDJSON format):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Line 1: parent START       → the entire dbt run
Line 2: stg_customers START
Line 3: stg_orders START
Line 4: stg_payments START
Line 5: stg_customers COMPLETE
Line 6: stg_orders COMPLETE
Line 7: stg_payments COMPLETE
Line 8: parent COMPLETE    → the entire dbt run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every event shares the same top-level structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"eventType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"START"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"eventTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-27T14:55:51.150612+00:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"job"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
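
&lt;p&gt;Since each line is an independent JSON object, pairing START and COMPLETE events per run is a few lines of stdlib Python. The two-line sample below stands in for a real &lt;code&gt;events.ndjson&lt;/code&gt;:&lt;/p&gt;

```python
# Sketch: group NDJSON events by runId to pair START/COMPLETE,
# using a hand-built two-line sample instead of a real events file.
import json

ndjson = """{"eventType": "START", "run": {"runId": "abc"}, "job": {"name": "stg_customers"}}
{"eventType": "COMPLETE", "run": {"runId": "abc"}, "job": {"name": "stg_customers"}}"""

runs = {}
for line in ndjson.splitlines():
    event = json.loads(line)
    runs.setdefault(event["run"]["runId"], []).append(event["eventType"])

print(runs)  # each runId should see START followed by COMPLETE
```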



&lt;h3&gt;
  
  
  The Three Core Entities
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Job&lt;/strong&gt; — what the work &lt;em&gt;is&lt;/em&gt; (static, doesn't change between runs)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"job"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dev.main.jaffle_shop.stg_customers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dbt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"facets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"jobType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"jobType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MODEL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"integration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DBT"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sql"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"dialect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"duckdb"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"with source as (...) select ..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Run&lt;/strong&gt; — this specific &lt;em&gt;execution instance&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"runId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"019c9f99-59c5-75bd-..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;UUID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;v&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;new&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;every&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;execution&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"facets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"parent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"job"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dbt-run-jaffle_shop"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"runId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"019c9f99-516e-7d6a-..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;same&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;across&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;models&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;dbt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;run&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dataset&lt;/strong&gt; — the data being read or written&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dev.main.stg_customers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"duckdb://dev.duckdb"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why the parent Facet Matters
&lt;/h3&gt;

&lt;p&gt;Every model's &lt;code&gt;run&lt;/code&gt; points to the same parent &lt;code&gt;runId&lt;/code&gt; — the ID of the overall &lt;code&gt;dbt run&lt;/code&gt;. This is what allows Marquez (or any OpenLineage backend) to group all models from a single execution together, and eventually reconstruct the full DAG from a single pipeline run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dbt-run-jaffle_shop  (runId: 019c9f99-516e...)
    ├── stg_customers  (runId: 019c9f99-59c5-75bd..., parent → 019c9f99-516e...)
    ├── stg_orders     (runId: 019c9f99-59c5-7bf9..., parent → 019c9f99-516e...)
    └── stg_payments   (runId: 019c9f99-59c5-7a35..., parent → 019c9f99-516e...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
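&lt;p&gt;This grouping is easy to reproduce once the events are collected. The following is a minimal sketch, not part of the project: it assumes the events are already loaded as Python dicts (runIds shortened for readability) and follows the facet paths shown above.&lt;/p&gt;

```python
from collections import defaultdict

def group_by_parent(events):
    """Group model-level OpenLineage events under their parent runId."""
    tree = defaultdict(list)
    for event in events:
        parent = (event["run"].get("facets", {})
                  .get("parent", {}).get("run", {}).get("runId"))
        if parent:
            tree[parent].append(event["job"]["name"])
    return dict(tree)

# Two model events from one dbt run (runIds shortened for readability)
events = [
    {"run": {"runId": "59c5-75bd",
             "facets": {"parent": {"run": {"runId": "516e"}}}},
     "job": {"name": "stg_customers"}},
    {"run": {"runId": "59c5-7bf9",
             "facets": {"parent": {"run": {"runId": "516e"}}}},
     "job": {"name": "stg_orders"}},
]

print(group_by_parent(events))  # {'516e': ['stg_customers', 'stg_orders']}
```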






&lt;h2&gt;
  
  
  Key Learning #3: Column-Level Lineage in Practice
&lt;/h2&gt;

&lt;p&gt;Column-level lineage (CLL) is where OpenLineage gets interesting. Without any extra configuration, &lt;code&gt;dbt-ol&lt;/code&gt; already parses the compiled SQL and produces field-level mappings in &lt;code&gt;outputs[].facets.columnLineage&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here's &lt;code&gt;stg_customers.sql&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;source&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'raw_customers'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;                             &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;full_name&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="k"&gt;source&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And what OpenLineage extracts from it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"columnLineage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"inputFields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"source"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"first_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"inputFields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"first_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"source"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"last_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"inputFields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"last_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"source"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"full_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"inputFields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"first_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"source"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"last_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"source"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For simple renames, pass-throughs, and concatenations, the tracking is accurate:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Output column&lt;/th&gt;
&lt;th&gt;Source column(s)&lt;/th&gt;
&lt;th&gt;Tracking&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;customer_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ rename detected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;first_name&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;first_name&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ pass-through&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;full_name&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;first_name&lt;/code&gt; + &lt;code&gt;last_name&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;✅ multi-source detected&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The CTE Alias Problem
&lt;/h3&gt;

&lt;p&gt;Notice &lt;code&gt;"name": "source"&lt;/code&gt; in every &lt;code&gt;inputFields&lt;/code&gt; entry. That's the CTE alias — not the actual table name &lt;code&gt;raw_customers&lt;/code&gt;. OpenLineage parsed the SQL correctly, but without knowing what &lt;code&gt;source&lt;/code&gt; resolves to, the lineage chain is broken at the CTE boundary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw_customers  →  source (CTE)  →  stg_customers
                  ↑
              lineage stops here (CTE alias, not resolved to actual table)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a SQL parser limitation — the parser sees the CTE alias &lt;code&gt;source&lt;/code&gt; but doesn't trace it back to &lt;code&gt;raw_customers&lt;/code&gt;. Adding &lt;code&gt;catalog.json&lt;/code&gt; does not resolve this; &lt;code&gt;"name": "source"&lt;/code&gt; persists in both runs. That's exactly what Milestone 2 investigates.&lt;/p&gt;
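&lt;p&gt;One pragmatic workaround is to post-process the facet with an alias map. The sketch below is hypothetical and not part of &lt;code&gt;dbt-ol&lt;/code&gt;: it assumes the alias-to-table mapping has already been extracted elsewhere (for example, by parsing the CTE definitions with a SQL parser) and simply rewrites the &lt;code&gt;name&lt;/code&gt; of each &lt;code&gt;inputFields&lt;/code&gt; entry.&lt;/p&gt;

```python
def resolve_cte_aliases(column_lineage, alias_map):
    """Rewrite CTE aliases in a columnLineage facet to real table names.

    alias_map is assumed to be derived separately, e.g. by parsing
    the model's CTE definitions: {"source": "raw_customers"}.
    """
    for field in column_lineage["fields"].values():
        for entry in field["inputFields"]:
            # Only replace names we can actually resolve
            entry["name"] = alias_map.get(entry["name"], entry["name"])
    return column_lineage

facet = {"fields": {"customer_id": {
    "inputFields": [{"field": "id", "name": "source"}]}}}
resolved = resolve_cte_aliases(facet, {"source": "raw_customers"})
print(resolved["fields"]["customer_id"]["inputFields"][0]["name"])  # raw_customers
```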




&lt;h2&gt;
  
  
  What's Missing: The Empty inputs[]
&lt;/h2&gt;

&lt;p&gt;Every model event in this run has &lt;code&gt;"inputs": []&lt;/code&gt;. The output dataset and its column lineage are present, but there's no record of what was &lt;em&gt;read&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"eventType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"COMPLETE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;←&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;empty&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dev.main.stg_customers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"facets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"columnLineage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason: these staging models read from &lt;strong&gt;seed tables&lt;/strong&gt; (&lt;code&gt;raw_customers&lt;/code&gt;, &lt;code&gt;raw_orders&lt;/code&gt;, &lt;code&gt;raw_payments&lt;/code&gt;). Seeds are created by &lt;code&gt;dbt seed&lt;/code&gt; and exist as real tables in DuckDB, but &lt;code&gt;dbt-ol&lt;/code&gt; does not treat them as upstream lineage datasets — they don't appear in &lt;code&gt;inputs[]&lt;/code&gt; regardless of what artifacts are present.&lt;/p&gt;
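&lt;p&gt;If the seed edges matter to you, the dependencies are still recoverable: &lt;code&gt;manifest.json&lt;/code&gt; records them under &lt;code&gt;depends_on&lt;/code&gt;. Here is a hedged sketch of backfilling &lt;code&gt;inputs[]&lt;/code&gt; from the manifest; the node layout is simplified, and real manifests carry many more keys.&lt;/p&gt;

```python
def seed_inputs(manifest, model_id, namespace):
    """Build OpenLineage input datasets for a model's seed dependencies."""
    node = manifest["nodes"][model_id]
    inputs = []
    for dep_id in node["depends_on"]["nodes"]:
        dep = manifest["nodes"].get(dep_id, {})
        if dep.get("resource_type") == "seed":
            name = ".".join([dep["database"], dep["schema"], dep["name"]])
            inputs.append({"namespace": namespace, "name": name})
    return inputs

# Simplified manifest: one staging model depending on one seed
manifest = {"nodes": {
    "model.jaffle_shop.stg_customers": {
        "depends_on": {"nodes": ["seed.jaffle_shop.raw_customers"]}},
    "seed.jaffle_shop.raw_customers": {
        "resource_type": "seed", "database": "dev",
        "schema": "main", "name": "raw_customers"},
}}

print(seed_inputs(manifest, "model.jaffle_shop.stg_customers", "duckdb://dev.duckdb"))
# [{'namespace': 'duckdb://dev.duckdb', 'name': 'dev.main.raw_customers'}]
```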

&lt;h3&gt;
  
  
  What catalog.json actually changes
&lt;/h3&gt;

&lt;p&gt;Running &lt;code&gt;dbt docs generate&lt;/code&gt; produces &lt;code&gt;target/catalog.json&lt;/code&gt;, which records the actual schema (column names and types) of every table and view in the database as DuckDB sees them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run dbt docs generate   &lt;span class="c"&gt;# produces target/catalog.json&lt;/span&gt;
uv run dbt-ol run          &lt;span class="c"&gt;# now reads catalog.json alongside manifest.json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;catalog.json&lt;/code&gt; present, the &lt;strong&gt;output&lt;/strong&gt; dataset gains a &lt;code&gt;SchemaDatasetFacet&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"outputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dev.main.stg_customers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"facets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"columnLineage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INTEGER"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"first_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"VARCHAR"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"last_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"VARCHAR"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"full_name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"VARCHAR"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;inputs[]&lt;/code&gt; remains empty. &lt;code&gt;catalog.json&lt;/code&gt; enriches the &lt;em&gt;output&lt;/em&gt; side with type information — it does not resolve the upstream seed tables into input datasets.&lt;/p&gt;

&lt;h3&gt;
  
  
  What actually changes with catalog.json
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Without catalog.json&lt;/th&gt;
&lt;th&gt;With catalog.json&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;inputs[]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;[]&lt;/code&gt; empty&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;[]&lt;/code&gt; still empty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output &lt;code&gt;schema&lt;/code&gt; facet&lt;/td&gt;
&lt;td&gt;absent&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;present&lt;/strong&gt; (field names + DuckDB types)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;columnLineage&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;present&lt;/td&gt;
&lt;td&gt;present (unchanged)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CTE alias in &lt;code&gt;inputFields&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"name": "source"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;"name": "source"&lt;/code&gt; (unchanged)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gap this leaves: there's no dataset-to-dataset edge connecting &lt;code&gt;raw_customers&lt;/code&gt; → &lt;code&gt;stg_customers&lt;/code&gt; in the emitted events. For this learning project the seeds act as the raw data layer, but in a real pipeline where staging models read from &lt;code&gt;source()&lt;/code&gt; references (external tables), &lt;code&gt;dbt-ol&lt;/code&gt; &lt;em&gt;would&lt;/em&gt; populate &lt;code&gt;inputs[]&lt;/code&gt; — that's the scenario where catalog.json matters for input schema enrichment.&lt;/p&gt;

&lt;p&gt;This is exactly what Milestone 2 tests: intermediate and mart models that ref staging models (not seeds) — where &lt;code&gt;inputs[]&lt;/code&gt; is no longer empty.&lt;/p&gt;




&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. dbt-ol is a post-processor, not a runtime interceptor&lt;/strong&gt;&lt;br&gt;
It reads &lt;code&gt;manifest.json&lt;/code&gt; and &lt;code&gt;run_results.json&lt;/code&gt; after &lt;code&gt;dbt run&lt;/code&gt; completes. This means lineage accuracy depends entirely on what's in the artifacts — and &lt;code&gt;catalog.json&lt;/code&gt; is what adds output schema (field names and DuckDB types) to the emitted events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Job/Run/Dataset model maps cleanly onto dbt concepts&lt;/strong&gt;&lt;br&gt;
Job = dbt model definition. Run = one execution instance. Dataset = a table or view in the database. The &lt;code&gt;parent&lt;/code&gt; facet ties all model runs within a single &lt;code&gt;dbt run&lt;/code&gt; together, which is what makes pipeline-level lineage possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Column-level lineage works out of the box for simple SQL&lt;/strong&gt;&lt;br&gt;
Renames, pass-throughs, and multi-column expressions are all tracked correctly. The CTE alias issue (&lt;code&gt;"name": "source"&lt;/code&gt; instead of the actual table name) is a known parser limitation — &lt;code&gt;catalog.json&lt;/code&gt; does not resolve it. The &lt;code&gt;inputFields&lt;/code&gt; still reference the CTE alias even after &lt;code&gt;dbt docs generate&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Keep lineage config separate from the dbt project&lt;/strong&gt;&lt;br&gt;
Keeping &lt;code&gt;openlineage.yml&lt;/code&gt; outside the dbt project enforces a clean boundary: dbt handles transformation, lineage config is infrastructure. &lt;code&gt;OPENLINEAGE_CONFIG&lt;/code&gt; works well for local learning. In production, the newer &lt;a href="https://openlineage.io/docs/client/python/configuration" rel="noopener noreferrer"&gt;&lt;code&gt;OPENLINEAGE__&lt;/code&gt; double-underscore env var system&lt;/a&gt; (e.g., &lt;code&gt;OPENLINEAGE__TRANSPORT__TYPE=http&lt;/code&gt;) is the recommended path — each value injected directly, no config file required. The separation habit transfers either way.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;In Milestone 2, I'll keep &lt;code&gt;catalog.json&lt;/code&gt; in the loop, add intermediate and mart models with more complex SQL (CTEs, JOINs, and window functions), and examine how column-level lineage accuracy degrades as SQL complexity increases.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openlineage.io/docs/integrations/dbt/" rel="noopener noreferrer"&gt;OpenLineage dbt Integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openlineage.io/docs/client/python/configuration" rel="noopener noreferrer"&gt;OpenLineage Python Client Configuration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openlineage.io/blog/dynamic-env-variables/" rel="noopener noreferrer"&gt;Simplify OpenLineage Configuration with Dynamic Env Vars (Sept 2024)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openlineage.io/docs/spec/examples/" rel="noopener noreferrer"&gt;OpenLineage Event Spec &amp;amp; Examples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/hantedyou/openlineage-dbt" rel="noopener noreferrer"&gt;Project Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dbt</category>
      <category>openlineage</category>
      <category>dataengineering</category>
      <category>python</category>
    </item>
    <item>
      <title>Why Multi-Agent Deployment Might Slow Down Your Redshift DDL Deployments</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Thu, 05 Feb 2026 09:16:53 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/why-multi-agent-deployment-might-slow-down-your-redshift-ddl-deployments-hkp</link>
      <guid>https://dev.to/hantedyou_0106/why-multi-agent-deployment-might-slow-down-your-redshift-ddl-deployments-hkp</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Our data warehouse team was scaling up deployment volumes. Based on prior experience (approximately 100 tables in 30 minutes), we knew larger deployments would take considerably longer.&lt;/p&gt;

&lt;p&gt;When concerns about execution time arose, I noticed Azure DevOps offered multi-agent deployment capabilities. The idea seemed straightforward: distribute the workload across multiple agents to speed things up.&lt;/p&gt;

&lt;p&gt;Before implementing, I investigated what would actually happen at the database level.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Deployment Pattern
&lt;/h2&gt;

&lt;p&gt;For each table deployment, we execute DDL and DML operations across a typical multi-layered data warehouse architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data ingestion and transformation layers&lt;/strong&gt; — Create tables for staging, integration, and historical tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Presentation layer&lt;/strong&gt; — Create views for data consumption and abstraction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ETL orchestration metadata&lt;/strong&gt; — Update control tables that manage job dependencies and execution tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When scaling from ~100 tables to several hundred, this translates to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thousands of DDL statements (&lt;code&gt;CREATE TABLE&lt;/code&gt; / &lt;code&gt;CREATE VIEW&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Hundreds of DML operations on shared control tables&lt;/li&gt;
&lt;/ul&gt;
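&lt;p&gt;The back-of-envelope projection behind the concern, assuming throughput scales linearly from the prior run (the 500-table count is illustrative):&lt;/p&gt;

```python
# Prior observation: roughly 100 tables deployed in 30 minutes
baseline_tables, baseline_minutes = 100, 30

# Naive linear projection for a hypothetical 500-table deployment
tables = 500
projected_minutes = tables * baseline_minutes / baseline_tables
print(projected_minutes)  # 150.0, i.e. 2.5 hours on a single agent
```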




&lt;h2&gt;
  
  
  Azure DevOps Multi-Agent: Understanding the Mechanism
&lt;/h2&gt;

&lt;p&gt;Azure DevOps supports &lt;strong&gt;parallel job execution&lt;/strong&gt; through two primary interfaces, both of which exhibit the same underlying behavior:&lt;/p&gt;

&lt;h3&gt;
  
  
  Classic Editor (Release Pipelines)
&lt;/h3&gt;

&lt;p&gt;In Azure DevOps Classic Editor, you can configure parallelism via:&lt;br&gt;
&lt;strong&gt;&lt;code&gt;Agent Job &amp;gt; Execution Plan &amp;gt; Parallelism &amp;gt; Multi-agent&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This setting duplicates the entire Agent Job across multiple agents.&lt;/p&gt;
&lt;h3&gt;
  
  
  YAML Pipelines
&lt;/h3&gt;

&lt;p&gt;The equivalent in YAML pipelines uses the &lt;code&gt;parallel&lt;/code&gt; strategy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;job&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DeployTables&lt;/span&gt;
  &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;parallel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;  &lt;span class="c1"&gt;# This DUPLICATES the job 10 times&lt;/span&gt;
  &lt;span class="na"&gt;pool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;vmImage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ubuntu-latest'&lt;/span&gt;
  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;# WARNING: Without work distribution logic,&lt;/span&gt;
      &lt;span class="s"&gt;# ALL 10 agents will execute THE SAME steps!&lt;/span&gt;
      &lt;span class="s"&gt;psql -h redshift-cluster -f deploy_all_tables.sql&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Both approaches have the same fundamental behavior:&lt;/strong&gt; They create multiple identical jobs where each agent executes the same tasks unless you explicitly write logic to distribute the work.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Actually Distribute Work
&lt;/h3&gt;

&lt;h4&gt;
  
  
  In Classic Editor (Release Pipelines)
&lt;/h4&gt;

&lt;p&gt;You need to use predefined variables or custom logic in your tasks to determine which portion of work each agent should handle.&lt;/p&gt;

&lt;h4&gt;
  
  
  In YAML Pipelines
&lt;/h4&gt;

&lt;p&gt;Azure DevOps provides system variables for manual work distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;job&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DeployTables&lt;/span&gt;
  &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;parallel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;bash&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;# Use system variables to determine which tables THIS agent should handle&lt;/span&gt;
      &lt;span class="s"&gt;POSITION=$(System.JobPositionInPhase)  # Values: 1, 2, 3, ..., 10&lt;/span&gt;
      &lt;span class="s"&gt;TOTAL=$(System.TotalJobsInPhase)       # Value: 10&lt;/span&gt;

      &lt;span class="s"&gt;# Calculate this agent's workload (50 tables each)&lt;/span&gt;
      &lt;span class="s"&gt;START=$((($POSITION - 1) * 50 + 1))&lt;/span&gt;
      &lt;span class="s"&gt;END=$(($POSITION * 50))&lt;/span&gt;

      &lt;span class="s"&gt;echo "Agent $POSITION deploying tables $START to $END"&lt;/span&gt;
      &lt;span class="s"&gt;for i in $(seq $START $END); do&lt;/span&gt;
        &lt;span class="s"&gt;psql -f "table_${i}.sql"&lt;/span&gt;
      &lt;span class="s"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
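&lt;p&gt;The range arithmetic in the script above can be sketched as standalone Python (the 10-agent, 50-tables-per-agent split is the article's illustrative example, not a requirement):&lt;/p&gt;

```python
def assign_range(position, tables_per_agent=50):
    """Return the 1-based (start, end) table range for one agent.

    Mirrors the bash arithmetic: agent N handles a contiguous
    block of tables_per_agent tables.
    """
    start = (position - 1) * tables_per_agent + 1
    end = position * tables_per_agent
    return start, end

# Agent 1 handles tables 1-50, agent 2 handles 51-100, and so on.
for position in range(1, 11):
    start, end = assign_range(position)
    print(f"Agent {position}: tables {start} to {end}")
```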



&lt;p&gt;&lt;strong&gt;Even with correct work distribution, DDL operations on Redshift face a fundamental limitation&lt;/strong&gt; that makes parallelization ineffective.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Critical Understanding
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What developers often assume:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Azure DevOps will automatically split my tables across 10 agents."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What actually happens (in both Classic Editor and YAML):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Azure DevOps creates 10 identical jobs. Each job runs the exact same steps unless you explicitly write logic to distribute the work.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Redshift Architecture: The Leader Node Constraint
&lt;/h2&gt;

&lt;p&gt;This is where my investigation became crucial. To understand whether multi-agent deployment would help, I needed to understand how Redshift actually handles DDL operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Redshift Handles DDL
&lt;/h3&gt;

&lt;p&gt;Redshift uses a &lt;strong&gt;Leader Node + Compute Nodes&lt;/strong&gt; architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────┐
│     Leader Node             │  ← All DDL executes here
│  - System Catalog Tables    │
│  - Query Planning           │
│  - Metadata Management      │
└─────────────────────────────┘
         │
    ┌────┴────┬────────┬────────┐
    ▼         ▼        ▼        ▼
[Compute] [Compute] [Compute] [Compute]  ← Only for data processing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key finding&lt;/strong&gt;: All DDL operations execute on the &lt;strong&gt;Leader Node&lt;/strong&gt; only, regardless of how many compute nodes you have.&lt;/p&gt;

&lt;h3&gt;
  
  
  System Catalog Table Locking
&lt;/h3&gt;

&lt;p&gt;When you execute DDL statements, Redshift must update its &lt;strong&gt;system catalog tables&lt;/strong&gt;. While Redshift wraps PostgreSQL system catalogs with its own views (like &lt;code&gt;PG_CLASS_INFO&lt;/code&gt;, &lt;code&gt;PG_TABLE_DEF&lt;/code&gt;), the underlying PostgreSQL catalog tables are where the actual locking occurs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core PostgreSQL Catalog Tables (where locks occur):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pg_class&lt;/code&gt; — stores table/view metadata&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pg_attribute&lt;/code&gt; — stores column definitions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pg_namespace&lt;/code&gt; — stores schema information&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pg_depend&lt;/code&gt; — stores object dependencies&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pg_type&lt;/code&gt; — stores data type definitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Redshift Information Views (built on top):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PG_CLASS_INFO&lt;/code&gt;, &lt;code&gt;PG_ATTRIBUTE_INFO&lt;/code&gt; — Redshift wrappers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PG_TABLE_DEF&lt;/code&gt; — Redshift-specific comprehensive view&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SVV_TABLE_INFO&lt;/code&gt; — System view with distribution information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Critical finding&lt;/strong&gt;: DDL operations require &lt;strong&gt;ACCESS EXCLUSIVE LOCKS&lt;/strong&gt; on the underlying &lt;code&gt;pg_class&lt;/code&gt; and &lt;code&gt;pg_attribute&lt;/code&gt; tables, which are &lt;strong&gt;global singleton resources&lt;/strong&gt; on the Leader Node. This forces all DDL statements to execute &lt;strong&gt;serially&lt;/strong&gt;, regardless of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many compute nodes you have (4, 10, or 100)&lt;/li&gt;
&lt;li&gt;How many Azure DevOps agents you use&lt;/li&gt;
&lt;li&gt;How well you distribute the work
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- What would happen with 10 parallel agents executing CREATE TABLE:&lt;/span&gt;

&lt;span class="n"&gt;Agent&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;Session&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;table_001&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;  &lt;span class="c1"&gt;-- Executes (holds pg_class lock)&lt;/span&gt;
&lt;span class="n"&gt;Agent&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;Session&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;table_021&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;  &lt;span class="c1"&gt;-- Waiting for pg_class lock ⏳&lt;/span&gt;
&lt;span class="n"&gt;Agent&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;Session&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;table_041&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;  &lt;span class="c1"&gt;-- Waiting for pg_class lock ⏳&lt;/span&gt;
&lt;span class="n"&gt;Agent&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;Session&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;table_061&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;  &lt;span class="c1"&gt;-- Waiting for pg_class lock ⏳&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Agents&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="n"&gt;waiting&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;-- Result: Serialized execution despite parallel agents&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why This Happens:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Redshift's architecture inherits PostgreSQL's catalog design, where DDL operations must maintain ACID properties across system metadata. The Leader Node must ensure:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Consistent metadata across all catalog tables&lt;/li&gt;
&lt;li&gt;No concurrent modifications to object definitions&lt;/li&gt;
&lt;li&gt;Proper dependency tracking for views and constraints&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This design choice prioritizes &lt;strong&gt;data integrity over DDL parallelism&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Control Table Locking
&lt;/h3&gt;

&lt;p&gt;Additionally, our deployment pattern includes updates to shared control tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;metadata_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;etl_control&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'example_table'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;metadata_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;etl_control&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These operations acquire &lt;strong&gt;table-level exclusive locks&lt;/strong&gt;, forcing all agents to queue for access to the same tables.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance Analysis: What Would Actually Happen
&lt;/h2&gt;

&lt;p&gt;Based on our baseline data (~100 tables in 30 minutes) and understanding of Redshift's architecture, I projected what would happen with different approaches:&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 1: Multi-Agent Without Work Distribution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;parallel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;execute_all_tables.sql&lt;/span&gt;  &lt;span class="c1"&gt;# No work distribution!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Projected result:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each of 10 agents executes all tables&lt;/li&gt;
&lt;li&gt;Total DDL operations: N_tables × 10 agents&lt;/li&gt;
&lt;li&gt;Database receives 10x the operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outcome&lt;/strong&gt;: Severe database overload, significantly slower than baseline&lt;/li&gt;
&lt;/ul&gt;
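&lt;p&gt;The blow-up is easy to quantify. A quick sketch, using the illustrative 10-agents-by-50-tables figures from earlier:&lt;/p&gt;

```python
tables = 500   # illustrative: the 10 agents x 50 tables example above
agents = 10

# Without distribution logic, every agent executes every statement.
ddl_without_distribution = tables * agents
ddl_with_distribution = tables

print(ddl_without_distribution)  # 5000 -> the cluster sees 10x the load
print(ddl_with_distribution)     # 500
```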

&lt;h3&gt;
  
  
  Scenario 2: Multi-Agent With Correct Work Distribution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;parallel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;bash&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;# Correctly distribute: each agent handles subset of tables&lt;/span&gt;
      &lt;span class="s"&gt;TABLES_PER_AGENT=$(( TOTAL_TABLES / 10 ))&lt;/span&gt;
      &lt;span class="s"&gt;# ... execute only assigned tables&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Projected result based on architecture analysis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Expected (naive calculation):
  Sequential time: ~300 minutes (extrapolated from baseline)
  With 10 agents: 300 ÷ 10 = 30 minutes ✓

Actual (with Redshift catalog locks):
  Agent 1: [████████████████████] ~300 min (executing)
  Agent 2: [⏳⏳⏳⏳⏳⏳⏳⏳] wait → execute
  Agent 3: [⏳⏳⏳⏳⏳⏳⏳⏳] wait → execute
  ...
  Total time: ~300 minutes (serialized) + orchestration overhead
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
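&lt;p&gt;The projection above reduces to a simple model. A sketch (the sequential time is the article's estimate, and the one-minute-per-agent overhead is an assumed placeholder):&lt;/p&gt;

```python
def projected_minutes(sequential_minutes, agents, ddl_serialized=True,
                      per_agent_overhead=1.0):
    """Toy wall-clock model for the deployment."""
    if not ddl_serialized:
        # Naive expectation: the work divides perfectly across agents.
        return sequential_minutes / agents
    # Catalog locks serialize DDL, so agents only add orchestration overhead.
    return sequential_minutes + per_agent_overhead * agents

print(projected_minutes(300, 10, ddl_serialized=False))  # 30.0 (on paper)
print(projected_minutes(300, 10, ddl_serialized=True))   # 310.0 (in practice)
```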



&lt;p&gt;&lt;strong&gt;Why the overhead?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Connection Overhead&lt;/strong&gt;: 10 simultaneous connections to Redshift (vs. 1)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lock Contention Monitoring&lt;/strong&gt;: Redshift Leader Node processing lock queues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transaction Retry Logic&lt;/strong&gt;: Agents detecting timeouts and retrying&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Latency&lt;/strong&gt;: Multiple agents competing for responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging Overhead&lt;/strong&gt;: 10x more connection logs and audit entries&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Scenario 3: Single-Agent Sequential (Current Approach)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Projected result:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean serial execution&lt;/li&gt;
&lt;li&gt;Predictable linear scaling from baseline&lt;/li&gt;
&lt;li&gt;No orchestration overhead&lt;/li&gt;
&lt;li&gt;Expected time: Scales linearly from baseline (~3x for 300 tables)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance Comparison Matrix
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation Type&lt;/th&gt;
&lt;th&gt;Lock Scope&lt;/th&gt;
&lt;th&gt;Without Distribution&lt;/th&gt;
&lt;th&gt;With Distribution&lt;/th&gt;
&lt;th&gt;Actual Benefit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CREATE TABLE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pg_class&lt;/code&gt; (global)&lt;/td&gt;
&lt;td&gt;❌ N×agents duplication&lt;/td&gt;
&lt;td&gt;❌ Serialized&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CREATE VIEW&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pg_class&lt;/code&gt;, &lt;code&gt;pg_depend&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;❌ N×agents duplication&lt;/td&gt;
&lt;td&gt;❌ Serialized&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control table DML&lt;/td&gt;
&lt;td&gt;Table-level exclusive&lt;/td&gt;
&lt;td&gt;❌ N×agents duplication&lt;/td&gt;
&lt;td&gt;❌ Serialized (same table)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;SELECT&lt;/code&gt; queries&lt;/td&gt;
&lt;td&gt;Row-level/Data blocks&lt;/td&gt;
&lt;td&gt;⚠️ N×agents queries&lt;/td&gt;
&lt;td&gt;✅ True parallel&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80-90%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Decision: Stay with Single-Agent
&lt;/h2&gt;

&lt;p&gt;Based on this analysis, the decision was clear: &lt;strong&gt;do not implement multi-agent deployment&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rationale:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No performance benefit&lt;/strong&gt;: DDL operations would serialize at the database level regardless of CI/CD parallelism&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Added complexity&lt;/strong&gt;: Multi-agent requires work distribution logic, monitoring, and coordination&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Potential risks&lt;/strong&gt;: Connection overhead and lock contention could actually slow things down&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simpler is better&lt;/strong&gt;: Single-agent deployment is more predictable and easier to debug&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Alternative optimizations considered:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Batch Transactions&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;table_001&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
   &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;view_001&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
   &lt;span class="c1"&gt;-- More statements...&lt;/span&gt;
   &lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Split into Multiple Smaller Deployments&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy in batches across multiple maintenance windows&lt;/li&gt;
&lt;li&gt;Reduces single-deployment duration&lt;/li&gt;
&lt;li&gt;More manageable rollback if issues occur&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimize Individual DDL Statements&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove unnecessary &lt;code&gt;IF NOT EXISTS&lt;/code&gt; checks&lt;/li&gt;
&lt;li&gt;Defer &lt;code&gt;COMMENT&lt;/code&gt; statements to post-deployment&lt;/li&gt;
&lt;li&gt;Minimize column count where possible&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Architecture Understanding Matters More Than Tooling
&lt;/h3&gt;

&lt;p&gt;More CI/CD resources don't help if the database architecture doesn't support parallelism. Understanding Redshift's Leader Node serialization was more valuable than adding more Azure DevOps agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Challenge "Obvious" Solutions
&lt;/h3&gt;

&lt;p&gt;"More parallelism = faster" is true for data processing (SELECT queries on compute nodes), but not for DDL operations on shared catalog tables. The "obvious" solution would have wasted implementation effort with no benefit.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Analyze Before Implementing
&lt;/h3&gt;

&lt;p&gt;By investigating the architecture before implementation, we avoided:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wasted engineering time building work distribution logic&lt;/li&gt;
&lt;li&gt;Added operational complexity with no performance gain&lt;/li&gt;
&lt;li&gt;Potential performance degradation from connection overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This preventive analysis saved resources and kept our deployment process simple and predictable.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Simple Can Be Better
&lt;/h3&gt;

&lt;p&gt;Single-agent deployment with optimized batching beat complex multi-agent orchestration. Sometimes the best optimization is recognizing when complexity doesn't add value.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For Redshift DDL Deployments:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Single-agent sequential deployment&lt;/strong&gt; is optimal for DDL-heavy workloads&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Batch transactions&lt;/strong&gt; reduce round-trip overhead&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Multiple smaller deployments&lt;/strong&gt; manage risk better than one large deployment&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Optimize individual DDL statements&lt;/strong&gt; for the most impact  &lt;/p&gt;

&lt;p&gt;❌ &lt;strong&gt;Multi-agent parallelism&lt;/strong&gt; provides no benefit (serialized at Leader Node)&lt;br&gt;&lt;br&gt;
❌ &lt;strong&gt;More compute nodes&lt;/strong&gt; don't speed up DDL (only data queries benefit)&lt;br&gt;&lt;br&gt;
❌ &lt;strong&gt;Complex orchestration&lt;/strong&gt; adds overhead without performance gain  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Universal Lesson:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Understanding system constraints prevents premature optimization.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The best solution isn't always the most sophisticated—it's the one that works with your architecture, not against it.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official Documentation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_high_level_system_architecture.html" rel="noopener noreferrer"&gt;AWS Redshift Architecture - System Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_intro_catalog_views.html" rel="noopener noreferrer"&gt;AWS Redshift - System Catalog Tables&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/devops/pipelines/process/phases?view=azure-devops&amp;amp;tabs=yaml" rel="noopener noreferrer"&gt;Azure DevOps - Specify Jobs in Your Pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/devops/pipelines/licensing/concurrent-jobs?view=azure-devops" rel="noopener noreferrer"&gt;Azure DevOps - Parallel Jobs Licensing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/current/catalogs.html" rel="noopener noreferrer"&gt;PostgreSQL Documentation - System Catalogs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Documentation Insights
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Redshift DDL Processing&lt;/strong&gt;: All DDL operations execute on the Leader Node, not compute nodes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure DevOps Multi-Agent&lt;/strong&gt;: Both Classic Editor and YAML approaches duplicate jobs; require manual work distribution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System Catalog Locking&lt;/strong&gt;: PostgreSQL (and Redshift) use ACCESS EXCLUSIVE locks on metadata tables during DDL operations&lt;/li&gt;
&lt;/ol&gt;




</description>
      <category>aws</category>
      <category>redshift</category>
      <category>devops</category>
      <category>database</category>
    </item>
    <item>
      <title>Kafka Producer Deep Dive: From Basics to Production-Ready Configuration</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Tue, 20 Jan 2026 15:48:11 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/kafka-producer-deep-dive-from-basics-to-production-ready-configuration-26c6</link>
      <guid>https://dev.to/hantedyou_0106/kafka-producer-deep-dive-from-basics-to-production-ready-configuration-26c6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When I started learning Apache Kafka, the Producer seemed simple at first - just call &lt;code&gt;send()&lt;/code&gt; and you're done, right?&lt;/p&gt;

&lt;p&gt;Wrong.&lt;/p&gt;

&lt;p&gt;As I dug deeper, I discovered a rich set of configurations that determine whether your messages are delivered reliably or potentially lost. Understanding these concepts transformed how I think about building data pipelines.&lt;/p&gt;

&lt;p&gt;In this article, I'll walk you through everything I learned about Kafka Producers, from the basics to production-ready configurations.&lt;/p&gt;

&lt;p&gt;This guide is based on the excellent course &lt;a href="https://www.udemy.com/course/apache-kafka/" rel="noopener noreferrer"&gt;"Apache Kafka Series - Learn Apache Kafka for Beginners v3"&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Producer Basics&lt;/li&gt;
&lt;li&gt;The Sticky Partitioner&lt;/li&gt;
&lt;li&gt;Producer Acknowledgements (acks)&lt;/li&gt;
&lt;li&gt;Producer Retries&lt;/li&gt;
&lt;li&gt;Idempotent Producer&lt;/li&gt;
&lt;li&gt;Production-Ready Configuration&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Producer Basics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Setting Up a Producer
&lt;/h3&gt;

&lt;p&gt;Every Kafka Producer starts with configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="n"&gt;properties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Properties&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bootstrap.servers"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"127.0.0.1:9092"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key.serializer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StringSerializer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"value.serializer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StringSerializer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

&lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Critical Trio: send(), flush(), close()
&lt;/h3&gt;

&lt;p&gt;This is where things get interesting:&lt;/p&gt;

&lt;h4&gt;
  
  
  send() - Asynchronous Operation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous&lt;/strong&gt;: The message goes into a buffer, NOT sent immediately&lt;/li&gt;
&lt;li&gt;If your program exits right after this, the message might never reach Kafka&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  flush() - Synchronous Operation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous&lt;/strong&gt;: Forces all buffered messages to be sent and blocks until complete&lt;/li&gt;
&lt;li&gt;Useful for learning/demos, but &lt;strong&gt;rarely used in production&lt;/strong&gt; (impacts performance)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  close() - Cleanup
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Shuts down the Producer and releases resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internally calls flush()&lt;/strong&gt; - ensures all messages are sent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MUST be called&lt;/strong&gt; in production to prevent resource leaks
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;send()          flush()         close()
  ↓               ↓               ↓
[Buffer] -----&amp;gt; [Send] -----&amp;gt; [Clean up]
(async)        (sync)         (includes flush)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
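&lt;p&gt;The diagram above can be modeled with a toy class. This is not the real client, just a sketch of the buffering semantics: &lt;code&gt;send()&lt;/code&gt; buffers, &lt;code&gt;flush()&lt;/code&gt; drains, &lt;code&gt;close()&lt;/code&gt; flushes and then shuts down:&lt;/p&gt;

```python
class ToyProducer:
    """Minimal model of Producer buffering semantics (not the real client)."""

    def __init__(self):
        self.buffer = []
        self.delivered = []
        self.closed = False

    def send(self, record):
        # Asynchronous: only appends to the in-memory buffer.
        self.buffer.append(record)

    def flush(self):
        # Synchronous: drains everything buffered so far.
        self.delivered.extend(self.buffer)
        self.buffer.clear()

    def close(self):
        # close() flushes first, then releases resources.
        self.flush()
        self.closed = True

p = ToyProducer()
p.send("hello")
print(p.delivered)  # [] - nothing delivered yet
p.close()
print(p.delivered)  # ['hello'] - close() flushed the buffer
```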



&lt;h3&gt;
  
  
  Callbacks for Monitoring
&lt;/h3&gt;

&lt;p&gt;Callbacks let you track message delivery:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Sent to topic: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                &lt;span class="s"&gt;" partition: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;partition&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                &lt;span class="s"&gt;" offset: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error while producing"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What you can get from metadata:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;topic&lt;/code&gt;: Which topic the message was sent to&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;partition&lt;/code&gt;: Which partition number&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;offset&lt;/code&gt;: The position of this message in the partition&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;timestamp&lt;/code&gt;: When the message was created&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. The Sticky Partitioner
&lt;/h2&gt;

&lt;p&gt;When I ran my producer sending 100 messages to a topic with 3 partitions, I noticed something odd: &lt;strong&gt;all messages went to partition 0&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Was my code broken? Not quite. This is the &lt;strong&gt;Sticky Partitioner&lt;/strong&gt; at work.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;Introduced in Kafka 2.4, the Sticky Partitioner is the default when messages &lt;strong&gt;don't have a key&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Messages "stick" to one partition until a batch is full&lt;/li&gt;
&lt;li&gt;When the batch is sent, switch to a different partition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goal&lt;/strong&gt;: Reduce network requests by batching more efficiently&lt;/li&gt;
&lt;/ul&gt;
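&lt;p&gt;A quick way to watch the switching happen is to shrink the batch size, add a short linger, and log the partition from the send callback. A minimal sketch — the broker address and topic name are placeholders, and it assumes the same &lt;code&gt;log&lt;/code&gt; and serializer setup as earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;properties.setProperty("batch.size", "400");  // tiny batches fill fast
properties.setProperty("linger.ms", "10");    // give each batch a moment to fill

KafkaProducer&amp;lt;String, String&amp;gt; producer = new KafkaProducer&amp;lt;&amp;gt;(properties);
for (int i = 0; i &amp;lt; 100; i++) {
    // no key, so the Sticky Partitioner picks the partition
    producer.send(new ProducerRecord&amp;lt;&amp;gt;("demo-topic", "message " + i),  // placeholder topic
            (metadata, exception) -&amp;gt; log.info("partition: " + metadata.partition()));
}
producer.close();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these settings, the logged partition should change every few messages instead of staying at 0.&lt;/p&gt;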

&lt;h3&gt;
  
  
  When Does a Batch Get Sent?
&lt;/h3&gt;

&lt;p&gt;A batch is sent when &lt;strong&gt;ANY&lt;/strong&gt; of these conditions is met:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Size limit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;batch.size&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;16384 bytes (16 KB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time limit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;linger.ms&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;&lt;code&gt;flush()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Common Misconception
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;batch.size is in BYTES, not message count!&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Make batches smaller to observe partition switching&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"batch.size"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"400"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// 400 bytes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Sticky is Better Than Round-Robin
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Old Way (Pre-Kafka 2.4):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each message goes to a different partition&lt;/li&gt;
&lt;li&gt;100 messages = potentially 100 network requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;New Way (Sticky Partitioner):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Messages batch together per partition&lt;/li&gt;
&lt;li&gt;100 messages = maybe 5 network requests&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fewer network calls = better throughput&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. Producer Acknowledgements (acks)
&lt;/h2&gt;

&lt;p&gt;This is one of the &lt;strong&gt;most important&lt;/strong&gt; producer configurations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Three Levels
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Data Loss Risk&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;acks=0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fire and forget - don't wait for any confirmation&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;High&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;acks=1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wait for leader broker only&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Medium&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;acks=all&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wait for leader + all in-sync replicas&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;None&lt;/strong&gt; (with proper configuration)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  acks=0 (Fire and Forget)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → [send] → Broker
              ↓
         (don't wait)
              ↓
         [continue]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Producer doesn't wait for any acknowledgement&lt;/li&gt;
&lt;li&gt;Highest throughput, but &lt;strong&gt;data loss is possible&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Use case: Metrics collection where some loss is acceptable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  acks=1 (Leader Only)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → [send] → Leader Broker → [commit]
                          ↓
                    [send ack back]
                          ↓
                 Replicas sync later (background)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Producer waits for leader to acknowledge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Problem&lt;/strong&gt;: If leader fails before replication, data is lost&lt;/li&gt;
&lt;li&gt;Was the default from Kafka 1.0 through 2.8; Kafka 3.0 changed the default to &lt;code&gt;acks=all&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  acks=all (Full Replication)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → [send] → Leader → Replica 1 → [ack]
                      ↓           ↓
                  Replica 2 → [ack]
                      ↓
              [all acks received]
                      ↓
              [send ack to producer]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Producer waits for leader AND all in-sync replicas (ISR)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No data loss&lt;/strong&gt; (with proper configuration)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Default since Kafka 3.0&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The min.insync.replicas Setting
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;acks=all&lt;/code&gt; works together with &lt;code&gt;min.insync.replicas&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Replication Factor = 3
min.insync.replicas = 2

Scenario: Only 1 broker available
Result: Producer receives NOT_ENOUGH_REPLICAS exception
        → Better to fail than to lose data!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best practice formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Brokers that can fail = Replication Factor - min.insync.replicas

Example: RF=3, min.insync=2 → Can tolerate 1 broker failure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
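&lt;p&gt;The formula is simple enough to encode as a helper. This is purely illustrative arithmetic, not part of the Kafka API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Illustrative: how many brokers can fail before acks=all producers
// start receiving NOT_ENOUGH_REPLICAS
static int tolerableFailures(int replicationFactor, int minInsyncReplicas) {
    return replicationFactor - minInsyncReplicas;
}

tolerableFailures(3, 2);  // 1 broker can fail
tolerableFailures(3, 3);  // 0 - any single failure stops acks=all producers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;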






&lt;h2&gt;
  
  
  4. Producer Retries
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Retry Mechanism
&lt;/h3&gt;

&lt;p&gt;When sending fails, Kafka retries automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|&amp;lt;------------ delivery.timeout.ms (default: 2 min) ------------&amp;gt;|
|                                                                  |
send() → [batch] → [send] → [fail] → [wait] → [retry] → [success]
                              ↑         ↑
                         network    retry.backoff.ms
                          error      (default: 100ms)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Settings
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;retries&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2147483647 (Kafka 2.1+)&lt;/td&gt;
&lt;td&gt;Max retry attempts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;retry.backoff.ms&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;Wait time between retries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;delivery.timeout.ms&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;120000 (2 min)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Upper bound&lt;/strong&gt; for total delivery time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
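&lt;p&gt;These settings are not independent: the producer validates at startup that &lt;code&gt;delivery.timeout.ms&lt;/code&gt; is at least &lt;code&gt;linger.ms + request.timeout.ms&lt;/code&gt;, and throws a &lt;code&gt;ConfigException&lt;/code&gt; otherwise. Expressed as a sanity check (an illustrative helper, not a Kafka API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// The total delivery budget must cover at least one linger period
// plus one full request timeout
static boolean isValidTimeoutConfig(long deliveryTimeoutMs, long lingerMs, long requestTimeoutMs) {
    return deliveryTimeoutMs &amp;gt;= lingerMs + requestTimeoutMs;
}

isValidTimeoutConfig(120000, 0, 30000);  // true  - the defaults are consistent
isValidTimeoutConfig(20000, 0, 30000);   // false - the producer would refuse to start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;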

&lt;h3&gt;
  
  
  Understanding delivery.timeout.ms
&lt;/h3&gt;

&lt;p&gt;This is the &lt;strong&gt;most important&lt;/strong&gt; timeout setting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Total time from send() to success or failure&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"delivery.timeout.ms"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"120000"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 2 minutes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happens when timeout is reached:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nc"&gt;TimeoutException&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// delivery.timeout.ms exceeded&lt;/span&gt;
        &lt;span class="c1"&gt;// Message was NOT delivered - handle it!&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Out-of-Order Problem
&lt;/h3&gt;

&lt;p&gt;With retries enabled and &lt;code&gt;max.in.flight.requests.per.connection &amp;gt; 1&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Timeline:
1. Send batch A (messages 1-10)
2. Send batch B (messages 11-20)
3. Batch A fails (network error)
4. Batch B succeeds ← commits first!
5. Batch A retries and succeeds

Result in Kafka: [11-20, 1-10]  ← Out of order!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
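&lt;p&gt;Before idempotence existed, the usual workaround was to allow only one in-flight request per connection, which preserves ordering at the cost of throughput:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Pre-idempotence workaround: strict ordering, lower throughput
properties.setProperty("max.in.flight.requests.per.connection", "1");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;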



&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Use Idempotent Producer (next section)&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Idempotent Producer
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Duplicate Problem
&lt;/h3&gt;

&lt;p&gt;Without idempotence, network errors can cause duplicates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Producer sends message
2. Kafka commits message to log
3. Kafka sends ack
4. Ack is LOST (network error)
5. Producer thinks it failed → retries
6. Kafka commits AGAIN → Duplicate!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"enable.idempotence"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;Each producer gets a &lt;strong&gt;Producer ID (PID)&lt;/strong&gt; and each message batch gets a &lt;strong&gt;sequence number&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer (PID=1) sends:
  Batch 1: seq=0 → Kafka commits
  Batch 1: seq=0 → Kafka says "already have seq=0, ignoring"

Result: No duplicates!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
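&lt;p&gt;Conceptually, the broker remembers the last sequence number seen from each producer and ignores anything it has already appended. A heavily simplified sketch of that check — real brokers track this per partition and per batch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Simplified sketch of broker-side duplicate detection
Map&amp;lt;Long, Integer&amp;gt; lastSeqByProducer = new HashMap&amp;lt;&amp;gt;();

boolean tryAppend(long producerId, int sequence) {
    Integer lastSeq = lastSeqByProducer.get(producerId);
    if (lastSeq != null &amp;amp;&amp;amp; sequence &amp;lt;= lastSeq) {
        return false;  // duplicate (or stale) batch: drop it
    }
    lastSeqByProducer.put(producerId, sequence);
    return true;       // new batch: append to the log
}

tryAppend(1L, 0);  // true  - first time seq=0 is seen
tryAppend(1L, 0);  // false - retry of the same batch, ignored
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;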



&lt;h3&gt;
  
  
  What Idempotence Automatically Sets
&lt;/h3&gt;

&lt;p&gt;When you enable idempotence, Kafka automatically configures:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;acks&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;all&lt;/td&gt;
&lt;td&gt;Need confirmation from all replicas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;retries&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;MAX_VALUE&lt;/td&gt;
&lt;td&gt;Retry until delivery.timeout.ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max.in.flight.requests.per.connection&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;With ordering guaranteed!&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Magic of Ordering with max.in.flight=5
&lt;/h3&gt;

&lt;p&gt;You might wonder: "How can we have 5 in-flight requests AND maintain order?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer&lt;/strong&gt;: Kafka uses sequence numbers to detect out-of-order batches and rejects them, forcing the producer to retry in correct order.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Batch A (seq=0) fails, Batch B (seq=1) arrives first
Kafka: "I expected seq=0, got seq=1 - rejecting!"
Producer retries Batch A, then Batch B
Order maintained!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Default Since Kafka 3.0
&lt;/h3&gt;

&lt;p&gt;Idempotent producer is &lt;strong&gt;enabled by default&lt;/strong&gt; in Kafka 3.0+, but explicitly enable it for older versions or for clarity.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Production-Ready Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Safe Producer Settings
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="n"&gt;properties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Properties&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Connection&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bootstrap.servers"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"broker1:9092,broker2:9092,broker3:9092"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key.serializer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StringSerializer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"value.serializer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StringSerializer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

&lt;span class="c1"&gt;// Reliability (Kafka 3.0+ defaults, but explicit is better)&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"enable.idempotence"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"acks"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"all"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"retries"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;valueOf&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Integer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MAX_VALUE&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"max.in.flight.requests.per.connection"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"5"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Timeouts&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"delivery.timeout.ms"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"120000"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// 2 minutes&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"request.timeout.ms"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"30000"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// 30 seconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  High Throughput Settings (Optional)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Batching - wait a bit to collect more messages&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"linger.ms"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"20"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"batch.size"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;valueOf&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt; &lt;span class="c1"&gt;// 32KB&lt;/span&gt;

&lt;span class="c1"&gt;// Compression - reduce network bandwidth&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"compression.type"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"snappy"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// or "lz4", "zstd"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuration Cheat Sheet
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                    PRODUCER SETTINGS                        │
├─────────────────────────────────────────────────────────────┤
│  RELIABILITY                                                │
│  ├── enable.idempotence = true     (prevent duplicates)    │
│  ├── acks = all                    (full replication)      │
│  └── retries = MAX_VALUE           (bounded by timeout)    │
├─────────────────────────────────────────────────────────────┤
│  TIMEOUTS                                                   │
│  ├── delivery.timeout.ms = 120000  (total time bound)      │
│  ├── request.timeout.ms = 30000    (per request)           │
│  └── retry.backoff.ms = 100        (between retries)       │
├─────────────────────────────────────────────────────────────┤
│  THROUGHPUT                                                 │
│  ├── linger.ms = 20                (batch collection time) │
│  ├── batch.size = 32768            (batch size in bytes)   │
│  └── compression.type = snappy     (reduce network I/O)    │
├─────────────────────────────────────────────────────────────┤
│  BROKER/TOPIC (not producer settings)                       │
│  ├── replication.factor = 3                                 │
│  └── min.insync.replicas = 2                               │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;send() is asynchronous&lt;/strong&gt; - always call &lt;code&gt;flush()&lt;/code&gt; or &lt;code&gt;close()&lt;/code&gt; to ensure delivery&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sticky Partitioner&lt;/strong&gt; batches messages by partition for better throughput&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;acks=all + min.insync.replicas=2&lt;/strong&gt; = no data loss (with RF=3)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;delivery.timeout.ms&lt;/strong&gt; is the upper bound for all retries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Idempotent Producer&lt;/strong&gt; prevents duplicates AND maintains ordering&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Default since Kafka 3.0&lt;/strong&gt; - idempotence is on, but be explicit&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Kafka Producer looks simple on the surface but is remarkably powerful once you understand its internals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Batching&lt;/strong&gt; improves throughput&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acknowledgements&lt;/strong&gt; control durability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retries&lt;/strong&gt; handle transient failures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotence&lt;/strong&gt; prevents duplicates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight: Kafka's defaults have gotten much better over time. In Kafka 3.0+, the producer is idempotent out of the box, so retries no longer produce duplicates (full exactly-once semantics additionally require transactions). But understanding &lt;em&gt;why&lt;/em&gt; these settings matter helps you tune them for your specific use case.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more tutorials!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Course Reference:&lt;/strong&gt; &lt;a href="https://www.udemy.com/course/apache-kafka/" rel="noopener noreferrer"&gt;Apache Kafka Series - Learn Apache Kafka for Beginners v3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>java</category>
    </item>
    <item>
      <title>Building a Kafka Wikimedia Producer: Solving 403 Errors and Understanding Java Fundamentals</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Sat, 10 Jan 2026 10:19:56 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/building-a-kafka-wikimedia-producer-understanding-constructors-and-threading-1nmo</link>
      <guid>https://dev.to/hantedyou_0106/building-a-kafka-wikimedia-producer-understanding-constructors-and-threading-1nmo</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When I started building a Kafka producer to consume real-time data from Wikimedia, I quickly realized this wasn't just about Kafka—it was also a crash course in fundamental Java concepts and real-world troubleshooting. Coming from Python, I had to slow down and properly learn concepts like constructors and multithreading.&lt;/p&gt;

&lt;p&gt;What I didn't expect was hitting a &lt;strong&gt;403 Forbidden&lt;/strong&gt; error when connecting to Wikimedia's API, and a confusing &lt;strong&gt;import error&lt;/strong&gt; that taught me an important lesson about Maven dependencies versus Java packages.&lt;/p&gt;

&lt;p&gt;In this article, I'll share what I learned while building a &lt;code&gt;WikimediaChangeHandler&lt;/code&gt; that processes real-time Wikipedia edits and sends them to Kafka, including the problems I encountered and how to solve them.&lt;/p&gt;

&lt;p&gt;This guide is based on the excellent course &lt;a href="https://www.udemy.com/course/apache-kafka/" rel="noopener noreferrer"&gt;"Apache Kafka Series - Learn Apache Kafka for Beginners v3"&lt;/a&gt;, with additional troubleshooting for issues not covered in the course materials.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Goal: Building a Real-Time Wikimedia Producer
&lt;/h2&gt;

&lt;p&gt;We want to build a Kafka producer that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Connects to Wikimedia's real-time change stream&lt;/li&gt;
&lt;li&gt;Processes incoming events&lt;/li&gt;
&lt;li&gt;Sends them to a Kafka topic for downstream processing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The architecture looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Wikimedia SSE Stream → EventHandler → Kafka Producer → Kafka Topic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Part 1: Understanding Constructors and Dependency Injection
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: Sharing Objects Between Classes
&lt;/h3&gt;

&lt;p&gt;When building the handler, I encountered a fundamental question: &lt;strong&gt;How do I use the KafkaProducer (created in one class) inside my WikimediaChangeHandler (another class)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Python, I might just import and use it, but Java has a more structured approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Constructors
&lt;/h3&gt;

&lt;p&gt;Here's the key insight from the instructor:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"To pass in an object from one class to another in Java, you need to implement a constructor."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's see this in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WikimediaChangeHandler&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;EventHandler&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Constructor receives dependencies when the object is created&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;WikimediaChangeHandler&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;kafkaProducer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;MessageEvent&lt;/span&gt; &lt;span class="n"&gt;messageEvent&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Now we can use kafkaProducer here!&lt;/span&gt;
        &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messageEvent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getData&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Do We Need a Constructor?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The scenario:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We create a &lt;code&gt;KafkaProducer&lt;/code&gt; in the &lt;code&gt;WikimediaChangesProducer&lt;/code&gt; class&lt;/li&gt;
&lt;li&gt;We need to use that same producer instance in &lt;code&gt;WikimediaChangeHandler&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Specifically in the &lt;code&gt;onMessage&lt;/code&gt; method, which gets called every time Wikimedia sends us data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The constructor's role:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Acts as a "receiver" when creating the object&lt;/li&gt;
&lt;li&gt;Stores the received objects in instance variables&lt;/li&gt;
&lt;li&gt;Makes them available to all methods in the class&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In WikimediaChangesProducer.java&lt;/span&gt;
&lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"wikimedia.recentchange"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Pass producer and topic via constructor&lt;/span&gt;
&lt;span class="nc"&gt;EventHandler&lt;/span&gt; &lt;span class="n"&gt;eventHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WikimediaChangeHandler&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  This is Dependency Injection!
&lt;/h3&gt;

&lt;p&gt;This pattern has a name: &lt;strong&gt;Dependency Injection (DI)&lt;/strong&gt;. Instead of creating dependencies inside a class, we "inject" them from outside.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ More flexible code&lt;/li&gt;
&lt;li&gt;✅ Easier to test (you can inject mock objects)&lt;/li&gt;
&lt;li&gt;✅ Clear dependencies&lt;/li&gt;
&lt;/ul&gt;
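&lt;p&gt;To make the testability benefit concrete, here is a minimal sketch. The &lt;code&gt;MessageSender&lt;/code&gt;, &lt;code&gt;ChangeHandler&lt;/code&gt;, and &lt;code&gt;FakeSender&lt;/code&gt; names are invented stand-ins, not the real Kafka classes, but they show how constructor injection lets a test swap in a fake with no broker running:&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration: a tiny "producer" interface,
// a handler that receives it via its constructor, and a fake for tests.
interface MessageSender {
    void send(String topic, String data);
}

class ChangeHandler {
    private final MessageSender sender;
    private final String topic;

    // Constructor injection: the dependency comes from outside
    ChangeHandler(MessageSender sender, String topic) {
        this.sender = sender;
        this.topic = topic;
    }

    void onMessage(String data) {
        sender.send(topic, data);
    }
}

// A fake sender that just records what was sent; no Kafka needed
class FakeSender implements MessageSender {
    final List<String> sent = new ArrayList<>();
    public void send(String topic, String data) {
        sent.add(topic + ": " + data);
    }
}

public class DiDemo {
    public static List<String> run() {
        FakeSender fake = new FakeSender();
        ChangeHandler handler = new ChangeHandler(fake, "wikimedia.recentchange");
        handler.onMessage("{\"type\":\"edit\"}");
        return fake.sent; // the test can inspect exactly what was "produced"
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```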

&lt;h3&gt;
  
  
  This Pattern Exists in Other Languages
&lt;/h3&gt;

&lt;p&gt;The constructor-based dependency injection pattern isn't unique to Java:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WikimediaChangeHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kafka_producer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# Python's constructor
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kafka_producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kafka_producer&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message_event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kafka_producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;JavaScript/TypeScript:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WikimediaChangeHandler&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kafkaProducer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  &lt;span class="c1"&gt;// JS constructor&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kafkaProducer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;kafkaProducer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;onMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messageEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kafkaProducer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messageEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a universal &lt;strong&gt;Object-Oriented Programming&lt;/strong&gt; concept!&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: Multi-Threading and Blocking the Main Thread
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Threading Challenge
&lt;/h3&gt;

&lt;p&gt;Here's something the instructor explained:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"When we call eventSource.start(), it starts a background thread to process events. If we don't block the main thread, the program will finish and all threads will stop."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Without blocking:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;eventSource&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// Start background thread&lt;/span&gt;
&lt;span class="c1"&gt;// Main method ends immediately&lt;/span&gt;
&lt;span class="c1"&gt;// → Main thread exits → Background thread also terminates → No data is processed!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Start EventSource in a background thread&lt;/span&gt;
&lt;span class="n"&gt;eventSource&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Block the main program for 10 minutes to let the background thread work&lt;/span&gt;
&lt;span class="nc"&gt;TimeUnit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MINUTES&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When the main thread (the program's last non-daemon thread) exits, the JVM shuts down and kills any daemon background threads, including the one EventSource started&lt;/li&gt;
&lt;li&gt;By calling &lt;code&gt;TimeUnit.MINUTES.sleep(10)&lt;/code&gt; (a readable wrapper around &lt;code&gt;Thread.sleep()&lt;/code&gt;), we keep the main thread alive&lt;/li&gt;
&lt;li&gt;This gives the background thread time to receive and process Wikimedia data&lt;/li&gt;
&lt;/ul&gt;
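&lt;p&gt;Here is a small, self-contained sketch of the same idea (the class and variable names are invented for illustration, this is not the course code): a daemon background thread only gets work done if the main thread stays alive long enough.&lt;/p&gt;

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingDemo {
    // Simulates a background "event stream" thread; main must stay alive
    // long enough for it to do any work.
    public static int run() throws InterruptedException {
        AtomicInteger eventsProcessed = new AtomicInteger();

        Thread background = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                eventsProcessed.incrementAndGet(); // stand-in for handling one event
                try {
                    TimeUnit.MILLISECONDS.sleep(20);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        // Marked daemon so it mirrors the scenario where background
        // threads die as soon as the JVM shuts down.
        background.setDaemon(true);
        background.start();

        // Without this sleep, main would return immediately and the
        // daemon thread would be killed before processing anything.
        TimeUnit.MILLISECONDS.sleep(500);
        return eventsProcessed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("events processed: " + run());
    }
}
```

&lt;p&gt;Comment out the &lt;code&gt;sleep(500)&lt;/code&gt; line and the count drops to zero or near zero, which is exactly the failure mode described above.&lt;/p&gt;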

&lt;h3&gt;
  
  
  Real-World Analogy
&lt;/h3&gt;

&lt;p&gt;Think of it like running a restaurant:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ Without blocking:&lt;/strong&gt; You hire a chef, then immediately close the restaurant—the chef can't cook anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✅ With blocking:&lt;/strong&gt; You hire a chef and keep the restaurant open for 10 minutes—the chef can do their work.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 3: Troubleshooting - The 403 Forbidden Error
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: Course Code is Outdated
&lt;/h3&gt;

&lt;p&gt;When I ran the program with the exact code from the course, I immediately hit an error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[okhttp-eventsource-events-[]-0] ERROR WikimediaChangeHandler - Error in Stream Reading
com.launchdarkly.eventsource.UnsuccessfulResponseException:
Unsuccessful response code received from stream: 403
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The program kept trying to reconnect but failed every time with &lt;strong&gt;HTTP 403 Forbidden&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Happens
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Wikimedia's API Policy Changed&lt;/strong&gt;: Wikimedia now requires all clients connecting to their streaming API to include a &lt;code&gt;User-Agent&lt;/code&gt; HTTP header. Without it, the server rejects the connection.&lt;/p&gt;

&lt;p&gt;This is similar to a security guard at a building asking you to sign in. If you refuse to identify yourself, you're not allowed to enter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The course materials haven't been updated&lt;/strong&gt; to reflect this API policy change, so the code provided will fail.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Add User-Agent Header
&lt;/h3&gt;

&lt;p&gt;We need to modify the EventSource creation to include HTTP headers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Original code (from course - doesn't work):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eventHandler&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;URI&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="nc"&gt;EventSource&lt;/span&gt; &lt;span class="n"&gt;eventSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fixed code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Add User-Agent header required by Wikimedia&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;okhttp3.Headers&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;Headers&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Headers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"User-Agent"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"KafkaLearningProject/1.0"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eventHandler&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;URI&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Add headers here&lt;/span&gt;
&lt;span class="nc"&gt;EventSource&lt;/span&gt; &lt;span class="n"&gt;eventSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What's happening:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We create a &lt;code&gt;Headers&lt;/code&gt; object using the Builder pattern (from OkHttp library)&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;User-Agent&lt;/code&gt; header with our application identifier&lt;/li&gt;
&lt;li&gt;Pass the headers to EventSource using &lt;code&gt;.headers(headers)&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;User-Agent format:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard format: &lt;code&gt;ApplicationName/Version&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;With contact info: &lt;code&gt;"KafkaLearningProject/1.0 (your.email@example.com)"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
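&lt;p&gt;If you want to sanity-check what a &lt;code&gt;User-Agent&lt;/code&gt; header looks like without the okhttp dependency, here is a small sketch using the JDK's own HTTP client (Java 11+). It only builds the request, it never connects; the real project attaches the header through okhttp's &lt;code&gt;Headers.Builder&lt;/code&gt; as shown above:&lt;/p&gt;

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class UserAgentDemo {
    // Builds a request carrying the kind of User-Agent header Wikimedia
    // expects, then reads it back. No network call is made.
    public static String userAgentOf(String agent) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://stream.wikimedia.org/v2/stream/recentchange"))
                .header("User-Agent", agent)
                .build();
        return request.headers().firstValue("User-Agent").orElse("");
    }

    public static void main(String[] args) {
        System.out.println(userAgentOf("KafkaLearningProject/1.0 (your.email@example.com)"));
    }
}
```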




&lt;h2&gt;
  
  
  Part 4: Troubleshooting - Maven Coordinates vs Java Packages
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Import Error Mystery
&lt;/h3&gt;

&lt;p&gt;When I tried to import the &lt;code&gt;Headers&lt;/code&gt; class, I encountered another confusing error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error: package com.squareup.okhttp3 does not exist
import com.squareup.okhttp3.Headers;
                           ^
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I thought: "But I have the dependency in &lt;code&gt;build.gradle&lt;/code&gt;! Let me check..."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gradle"&gt;&lt;code&gt;&lt;span class="k"&gt;dependencies&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;implementation&lt;/span&gt; &lt;span class="s1"&gt;'com.squareup.okhttp3:okhttp:4.9.3'&lt;/span&gt;  &lt;span class="c1"&gt;// It's here!&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dependency was there, and Gradle successfully downloaded it. So why couldn't Java find the class?&lt;/p&gt;

&lt;h3&gt;
  
  
  The Key Insight: Maven Coordinates ≠ Java Packages
&lt;/h3&gt;

&lt;p&gt;This taught me an important lesson about the difference between &lt;strong&gt;Maven coordinates&lt;/strong&gt; and &lt;strong&gt;Java package names&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maven Coordinates&lt;/strong&gt; (for dependency management):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;groupId:artifactId:version
↓       ↓          ↓
com.squareup.okhttp3:okhttp:4.9.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Java Package&lt;/strong&gt; (for import statements):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;okhttp3.Headers&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// Not com.squareup.okhttp3!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Are They Different?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Maven Coordinates&lt;/th&gt;
&lt;th&gt;Java Package&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Uniquely identify library in repository&lt;/td&gt;
&lt;td&gt;Organize code in the codebase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;com.squareup.okhttp3:okhttp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;okhttp3&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Usage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;build.gradle&lt;/code&gt; dependencies&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;import&lt;/code&gt; statements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reason&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Avoid naming conflicts across organizations&lt;/td&gt;
&lt;td&gt;Keep code concise and clean&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Real-world examples:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Maven Dependency&lt;/th&gt;
&lt;th&gt;Java Import&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OkHttp&lt;/td&gt;
&lt;td&gt;&lt;code&gt;com.squareup.okhttp3:okhttp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;import okhttp3.Headers;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gson&lt;/td&gt;
&lt;td&gt;&lt;code&gt;com.google.code.gson:gson&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;import com.google.gson.Gson;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jackson&lt;/td&gt;
&lt;td&gt;&lt;code&gt;com.fasterxml.jackson.core:jackson-databind&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;import com.fasterxml.jackson.databind.ObjectMapper;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  How to Find the Correct Package Name
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Method 1: Check Official Documentation&lt;/strong&gt; (Fastest)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// OkHttp docs clearly show:&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;okhttp3.*&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Method 2: Use IDE Auto-Complete&lt;/strong&gt; (Most Reliable)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Type: &lt;code&gt;Headers headers = new Headers...&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;IDE shows error, press &lt;code&gt;Cmd + .&lt;/code&gt; (Mac) or &lt;code&gt;Ctrl + .&lt;/code&gt; (Windows)&lt;/li&gt;
&lt;li&gt;Select "Import 'Headers' (okhttp3)"&lt;/li&gt;
&lt;li&gt;IDE adds correct import automatically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Method 3: Inspect the JAR File&lt;/strong&gt; (When in doubt)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;jar tf okhttp-4.9.3.jar | &lt;span class="nb"&gt;grep &lt;/span&gt;Headers
&lt;span class="c"&gt;# Output:&lt;/span&gt;
&lt;span class="c"&gt;# okhttp3/Headers.class          ← The actual package!&lt;/span&gt;
&lt;span class="c"&gt;# okhttp3/Headers$Builder.class&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Wrong import:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.squareup.okhttp3.Headers&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// ❌ Compilation error!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Correct import:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;okhttp3.Headers&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// ✅ Works!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Lesson Learned
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Never assume the package name from the Maven coordinates!&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maven coordinates use organization namespacing (&lt;code&gt;com.squareup.okhttp3&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Java packages follow the library author's choice: OkHttp 3 deliberately uses the short &lt;code&gt;okhttp3&lt;/code&gt; package so it can coexist on the same classpath with OkHttp 2 (&lt;code&gt;com.squareup.okhttp&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Always verify the actual package name before importing&lt;/li&gt;
&lt;/ul&gt;
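&lt;p&gt;A quick programmatic way to verify a fully qualified class name is &lt;code&gt;Class.forName&lt;/code&gt;: it resolves the name against the current classpath and throws if the package/class doesn't exist. The sketch below demonstrates it with JDK classes; with okhttp on your classpath you could check &lt;code&gt;"okhttp3.Headers"&lt;/code&gt; versus &lt;code&gt;"com.squareup.okhttp3.Headers"&lt;/code&gt; the same way:&lt;/p&gt;

```java
public class PackageCheck {
    // Does this fully qualified class name actually resolve
    // on the current classpath?
    public static boolean resolves(String fqcn) {
        try {
            Class.forName(fqcn);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolves("java.util.ArrayList"));      // exists in the JDK
        System.out.println(resolves("com.example.DoesNotExist")); // not on any classpath
    }
}
```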




&lt;h2&gt;
  
  
  Complete Code Overview
&lt;/h2&gt;

&lt;p&gt;Here's the complete, working code with all fixes applied:&lt;/p&gt;

&lt;h3&gt;
  
  
  WikimediaChangeHandler.java
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;package&lt;/span&gt; &lt;span class="nn"&gt;io.conduktor.demos.kafka.wikimedia&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.apache.kafka.clients.producer.KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.apache.kafka.clients.producer.ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.slf4j.Logger&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.slf4j.LoggerFactory&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.launchdarkly.eventsource.EventHandler&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.launchdarkly.eventsource.MessageEvent&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WikimediaChangeHandler&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;EventHandler&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Logger&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoggerFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getLogger&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;WikimediaChangeHandler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/*
     * Why do we need a constructor?
     * 1. We created a KafkaProducer in the Producer class
     * 2. We need to use it in this WikimediaChangeHandler class
     * 3. Specifically in the onMessage method to send data to Kafka
     *
     * In Java, to pass objects between classes, we use constructors
     * The constructor receives parameters when creating the object and stores them for later use
     */&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;WikimediaChangeHandler&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;kafkaProducer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;MessageEvent&lt;/span&gt; &lt;span class="n"&gt;messageEvent&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// When we receive a message from the stream, send it to Kafka&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messageEvent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getData&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messageEvent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getData&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onOpen&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Connection opened&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onClosed&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Close the producer when stream closes&lt;/span&gt;
        &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onComment&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;comment&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Handle comments (not used in this case)&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onError&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Throwable&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error in Stream Reading"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  WikimediaChangesProducer.java
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;package&lt;/span&gt; &lt;span class="nn"&gt;io.conduktor.demos.kafka.wikimedia&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.launchdarkly.eventsource.EventHandler&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.launchdarkly.eventsource.EventSource&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;okhttp3.Headers&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// Correct import!&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.net.URI&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.Properties&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.concurrent.TimeUnit&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.apache.kafka.clients.producer.KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.apache.kafka.common.serialization.StringSerializer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WikimediaChangesProducer&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;bootstrapServers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"127.0.0.1:9092"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

        &lt;span class="c1"&gt;// Create and set Producer properties&lt;/span&gt;
        &lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="n"&gt;properties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Properties&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bootstrap.servers"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bootstrapServers&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key.serializer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StringSerializer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"value.serializer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StringSerializer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

        &lt;span class="c1"&gt;// Create the Producer&lt;/span&gt;
        &lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"wikimedia.recentchange"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

        &lt;span class="c1"&gt;// Create EventHandler with dependency injection&lt;/span&gt;
        &lt;span class="nc"&gt;EventHandler&lt;/span&gt; &lt;span class="n"&gt;eventHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WikimediaChangeHandler&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://stream.wikimedia.org/v2/stream/recentchange"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

        &lt;span class="c1"&gt;// Wikimedia requires a User-Agent header to identify the client&lt;/span&gt;
        &lt;span class="nc"&gt;Headers&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Headers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"User-Agent"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"KafkaLearningProject/1.0"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

        &lt;span class="c1"&gt;// Create EventSource with headers&lt;/span&gt;
        &lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eventHandler&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;URI&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;EventSource&lt;/span&gt; &lt;span class="n"&gt;eventSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

        &lt;span class="c1"&gt;// Start EventSource, which continuously receives real-time data from Wikimedia in a background thread&lt;/span&gt;
        &lt;span class="n"&gt;eventSource&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

        &lt;span class="c1"&gt;// Block the main program to prevent the main thread from terminating and stopping the background thread&lt;/span&gt;
        &lt;span class="c1"&gt;// Let the program run for 10 minutes to give the background thread enough time to receive and process data&lt;/span&gt;
        &lt;span class="nc"&gt;TimeUnit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MINUTES&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Constructors Enable Dependency Injection&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Constructors receive objects when creating instances&lt;/li&gt;
&lt;li&gt;This is how we share objects between classes in Java&lt;/li&gt;
&lt;li&gt;It's a universal OOP pattern, not unique to Java&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Multi-Threading Requires Careful Management&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Background threads die when the main thread exits&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;TimeUnit.MINUTES.sleep()&lt;/code&gt; (or &lt;code&gt;Thread.sleep()&lt;/code&gt;) to keep the main thread alive&lt;/li&gt;
&lt;li&gt;In production, use shutdown hooks or latches for graceful shutdown&lt;/li&gt;
&lt;/ul&gt;
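The "shutdown hooks or latches" point can be sketched with a `CountDownLatch` plus a JVM shutdown hook. This is a generic pattern sketch, not code from the course; the class name is illustrative:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {
    private final CountDownLatch latch = new CountDownLatch(1);

    // Register this from main:
    //   Runtime.getRuntime().addShutdownHook(new Thread(app::requestShutdown));
    // The hook fires on Ctrl+C or SIGTERM, where you could also call
    // eventSource.close() and producer.close() before counting down.
    public void requestShutdown() {
        latch.countDown();
    }

    // Replaces TimeUnit.MINUTES.sleep(10) in main: blocks until shutdown is
    // requested (or the timeout elapses) instead of sleeping a fixed time.
    // Returns true if shutdown was requested, false if the timeout elapsed.
    public boolean awaitShutdown(long timeout, TimeUnit unit) throws InterruptedException {
        return latch.await(timeout, unit);
    }
}
```

With this in place the producer runs until the process is asked to stop, and the hook gives the Kafka producer a chance to flush and close cleanly before the JVM exits.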

&lt;h3&gt;
  
  
  3. &lt;strong&gt;APIs Change - Always Test Course Code&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Course materials can become outdated as APIs evolve&lt;/li&gt;
&lt;li&gt;Wikimedia now requires a User-Agent header (it wasn't required when the course was recorded)&lt;/li&gt;
&lt;li&gt;Always check the API documentation if you encounter 403 or 401 errors&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Maven Coordinates ≠ Java Packages&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Don't assume package names from &lt;code&gt;build.gradle&lt;/code&gt; dependencies&lt;/li&gt;
&lt;li&gt;Maven uses organizational namespacing (&lt;code&gt;com.squareup.okhttp3&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Java packages prioritize simplicity (&lt;code&gt;okhttp3&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Use IDE auto-complete or check documentation for correct imports&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Builder Pattern Provides Flexibility&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Builder pattern allows optional configuration&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.headers()&lt;/code&gt; wasn't needed before, but easy to add when required&lt;/li&gt;
&lt;li&gt;Two-step process: configure → build&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building this Kafka Wikimedia producer taught me fundamental Java concepts and valuable troubleshooting skills:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Dependency Injection&lt;/strong&gt; through constructors - how to share objects between classes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-threading&lt;/strong&gt; management - why we need to block the main thread&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API troubleshooting&lt;/strong&gt; - dealing with 403 errors and outdated course materials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency management&lt;/strong&gt; - understanding Maven coordinates vs Java packages&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The most important lesson? &lt;strong&gt;When following online courses, the code might not work as-is.&lt;/strong&gt; APIs change, libraries update, and policies evolve. Being able to troubleshoot these issues is just as valuable as learning the core concepts.&lt;/p&gt;

&lt;p&gt;These challenges made me a better developer by forcing me to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read error messages carefully&lt;/li&gt;
&lt;li&gt;Understand the tools I'm using (Gradle, Maven, HTTP)&lt;/li&gt;
&lt;li&gt;Check library documentation&lt;/li&gt;
&lt;li&gt;Learn the difference between dependency management and code organization&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Troubleshooting Checklist
&lt;/h2&gt;

&lt;p&gt;If you encounter issues while building this project:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;403 Forbidden Error:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Add a User-Agent header to the EventSource request&lt;/li&gt;
&lt;li&gt;✅ Ensure headers are properly built with OkHttp's Headers.Builder&lt;/li&gt;
&lt;li&gt;✅ Check Wikimedia's API documentation for current requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Import Errors:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Don't assume package name from Maven coordinates&lt;/li&gt;
&lt;li&gt;✅ Use &lt;code&gt;import okhttp3.Headers;&lt;/code&gt; not &lt;code&gt;import com.squareup.okhttp3.Headers;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ Let IDE auto-complete suggest the correct import&lt;/li&gt;
&lt;li&gt;✅ Run &lt;code&gt;gradle build --refresh-dependencies&lt;/code&gt; if needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Connection Errors:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Ensure Kafka broker is running on port 9092&lt;/li&gt;
&lt;li&gt;✅ Create topic: &lt;code&gt;wikimedia.recentchange&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ Check network connectivity&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more Kafka tutorials!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Course Reference:&lt;/strong&gt; &lt;a href="https://www.udemy.com/course/apache-kafka/" rel="noopener noreferrer"&gt;Apache Kafka Series - Learn Apache Kafka for Beginners v3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>java</category>
      <category>tutorial</category>
      <category>backend</category>
    </item>
    <item>
      <title>Kafka Consumer Rebalancing: From Stop-the-World to Cooperative Protocol</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Sun, 04 Jan 2026 14:10:34 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/kafka-consumer-rebalancing-from-stop-the-world-to-cooperative-protocol-1kh</link>
      <guid>https://dev.to/hantedyou_0106/kafka-consumer-rebalancing-from-stop-the-world-to-cooperative-protocol-1kh</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When learning about Kafka consumer groups, I discovered an important concept called &lt;strong&gt;rebalancing&lt;/strong&gt; - the process where Kafka redistributes partitions among consumers when the group changes.&lt;/p&gt;

&lt;p&gt;What I learned is that when a consumer joins or leaves a group, by default all consumers pause briefly during the partition reassignment. This behavior isn't a bug - it's a deliberate design choice, and Kafka actually offers two different strategies for handling it.&lt;/p&gt;

&lt;p&gt;In this article, I'll explain the two rebalancing strategies (Eager and Cooperative), show real logs from my experiments, and discuss the trade-offs to help you choose the right strategy for your use case.&lt;/p&gt;

&lt;p&gt;This guide is based on the excellent course &lt;a href="https://www.udemy.com/course/apache-kafka/" rel="noopener noreferrer"&gt;"Apache Kafka Series - Learn Apache Kafka for Beginners v3"&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Rebalancing Strategies
&lt;/h2&gt;

&lt;p&gt;Kafka offers two fundamentally different approaches to rebalancing. Each has trade-offs depending on your use case.&lt;/p&gt;




&lt;h2&gt;
  
  
  Strategy 1: Eager Rebalance (Stop-the-World)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Trigger event occurs (consumer joins/leaves)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ALL consumers stop consuming&lt;/strong&gt; (stop-the-world event)&lt;/li&gt;
&lt;li&gt;All consumers give up their partition assignments&lt;/li&gt;
&lt;li&gt;Kafka reassigns partitions to all consumers&lt;/li&gt;
&lt;li&gt;Consumers resume processing with new assignments&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Characteristics
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Simplicity:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Single-step process - easy to reason about&lt;/li&gt;
&lt;li&gt;✅ Clean state transitions - all consumers synchronized&lt;/li&gt;
&lt;li&gt;✅ Simpler implementation and debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚠️ Complete pause in processing during rebalance&lt;/li&gt;
&lt;li&gt;⚠️ All consumers affected, even if their partitions don't change&lt;/li&gt;
&lt;li&gt;⚠️ Local state/caches must be rebuilt after reassignment&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;p&gt;Eager rebalancing works well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small consumer groups (2-5 consumers)&lt;/li&gt;
&lt;li&gt;Infrequent scaling events&lt;/li&gt;
&lt;li&gt;Rebalance duration is acceptable (typically seconds)&lt;/li&gt;
&lt;li&gt;Simplicity is valued over minimal disruption&lt;/li&gt;
&lt;li&gt;Consumers are stateless or have minimal state&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When Does Rebalancing Trigger?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Consumer joins or leaves the group&lt;/li&gt;
&lt;li&gt;Consumer crashes or becomes unresponsive&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;session.timeout.ms&lt;/code&gt; expires&lt;/li&gt;
&lt;/ul&gt;
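The timeouts that govern these triggers are ordinary consumer properties. A minimal sketch of the relevant settings (the values shown are, to my knowledge, the modern Java client defaults; treat them as a starting point, not a recommendation):

```java
import java.util.Properties;

public class RebalanceTimeouts {
    // Consumer properties that control when a rebalance is triggered.
    public static Properties build() {
        Properties properties = new Properties();
        // Broker evicts the consumer if no heartbeat arrives within this window
        properties.setProperty("session.timeout.ms", "45000");
        // How often the consumer sends heartbeats (keep well under the session timeout)
        properties.setProperty("heartbeat.interval.ms", "3000");
        // Max gap between poll() calls before the consumer is considered stuck
        properties.setProperty("max.poll.interval.ms", "300000");
        return properties;
    }
}
```

Raising `session.timeout.ms` makes the group slower to notice a crashed consumer but more tolerant of brief pauses such as GC or restarts.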




&lt;h2&gt;
  
  
  Strategy 2: Cooperative Rebalance (Incremental)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Trigger event occurs&lt;/li&gt;
&lt;li&gt;Kafka identifies &lt;strong&gt;only the partitions that need to move&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Only affected consumers&lt;/strong&gt; pause those specific partitions&lt;/li&gt;
&lt;li&gt;Other partitions continue processing uninterrupted&lt;/li&gt;
&lt;li&gt;May take multiple iterations to reach a stable state&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Characteristics
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Minimal Disruption:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Only revoked partitions pause&lt;/li&gt;
&lt;li&gt;✅ Non-affected partitions keep consuming&lt;/li&gt;
&lt;li&gt;✅ Sticky assignment - partitions stay with consumers when possible&lt;/li&gt;
&lt;li&gt;✅ Lower latency impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚠️ More complex - multiple rebalance steps&lt;/li&gt;
&lt;li&gt;⚠️ Harder to debug (multi-phase process)&lt;/li&gt;
&lt;li&gt;⚠️ Requires all consumers to support the protocol&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;p&gt;Cooperative rebalancing is beneficial when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large consumer groups (10+ consumers)&lt;/li&gt;
&lt;li&gt;Frequent scaling events (auto-scaling, deployments)&lt;/li&gt;
&lt;li&gt;Stateful consumers with large local caches&lt;/li&gt;
&lt;li&gt;Processing interruption is costly&lt;/li&gt;
&lt;li&gt;High-throughput systems where pauses impact SLAs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example Scenario
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt; 3 partitions, 2 consumers, then 1 new consumer joins&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Eager Rebalance:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before: Consumer 1: [P0, P1]    Consumer 2: [P2]
        ↓ [ALL STOP]
After:  Consumer 1: [P0]    Consumer 2: [P1]    Consumer 3: [P2]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All consumers stopped, all partitions reassigned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cooperative Rebalance:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before: Consumer 1: [P0, P1]    Consumer 2: [P2]
        ↓ [Only P1 pauses]
After:  Consumer 1: [P0]    Consumer 2: [P2]    Consumer 3: [P1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only partition P1 moved, others kept consuming.&lt;/p&gt;




&lt;h2&gt;
  
  
  Partition Assignment Strategies
&lt;/h2&gt;

&lt;p&gt;Kafka provides multiple assignment strategies via the &lt;code&gt;partition.assignment.strategy&lt;/code&gt; config.&lt;/p&gt;

&lt;h3&gt;
  
  
  Eager Strategies (Stop-the-World)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. RangeAssignor
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Assigns partitions on a per-topic basis&lt;/li&gt;
&lt;li&gt;Can lead to imbalanced assignments&lt;/li&gt;
&lt;li&gt;Old default strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. RoundRobinAssignor
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Distributes partitions evenly across consumers&lt;/li&gt;
&lt;li&gt;Each consumer ends up with the same number of partitions, within one&lt;/li&gt;
&lt;li&gt;Better balance than RangeAssignor&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. StickyAssignor
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Balanced like RoundRobin initially&lt;/li&gt;
&lt;li&gt;Minimizes partition movements during rebalance&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Still causes a stop-the-world event&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cooperative Strategy
&lt;/h3&gt;

&lt;h4&gt;
  
  
  4. CooperativeStickyAssignor
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Uses cooperative rebalancing protocol&lt;/li&gt;
&lt;li&gt;Minimizes partition movements&lt;/li&gt;
&lt;li&gt;Consumers keep processing non-moved partitions&lt;/li&gt;
&lt;li&gt;Preferred for large-scale, stateful systems&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Default Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Kafka 3.0+ Default
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;partition.assignment.strategy = [RangeAssignor, CooperativeStickyAssignor]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why both?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides backward compatibility&lt;/li&gt;
&lt;li&gt;Allows gradual migration from eager to cooperative&lt;/li&gt;
&lt;li&gt;Group coordinator picks the first strategy supported by all members&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Other Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka Connect&lt;/strong&gt;: Cooperative rebalance enabled by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kafka Streams&lt;/strong&gt;: Uses &lt;code&gt;StreamsPartitionAssignor&lt;/code&gt; (cooperative) by default&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Implementing Cooperative Rebalancing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;p&gt;Add this property to your consumer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"partition.assignment.strategy"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;CooperativeStickyAssignor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor,
                                  org.apache.kafka.clients.consumer.CooperativeStickyAssignor]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;partition.assignment.strategy = [org.apache.kafka.clients.consumer.CooperativeStickyAssignor]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Real Logs: Observing Cooperative Rebalance
&lt;/h2&gt;

&lt;p&gt;I ran consumers with CooperativeStickyAssignor enabled and captured the logs during different scaling events.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 1: Single Consumer Starts
&lt;/h3&gt;

&lt;p&gt;First consumer joins and gets all 3 partitions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[main] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator -
  [Consumer clientId=consumer-my-java-application-1, groupId=my-java-application]
  Updating assignment with
        Assigned partitions:                       [demo_java-0, demo_java-1, demo_java-2]
        Current owned partitions:                  []
        Added partitions (assigned - owned):       [demo_java-0, demo_java-1, demo_java-2]
        Revoked partitions (owned - assigned):     []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;State:&lt;/strong&gt; Consumer 1 owns partitions 0, 1, 2&lt;/p&gt;




&lt;h3&gt;
  
  
  Scenario 2: Second Consumer Joins (Scale Up)
&lt;/h3&gt;

&lt;p&gt;A new consumer joins the group. Watch how only partition 2 is revoked:&lt;/p&gt;

&lt;h4&gt;
  
  
  Consumer 1 - Revokes partition 2
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[main] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator -
  [Consumer clientId=consumer-my-java-application-1, groupId=my-java-application]
  Updating assignment with
        Assigned partitions:                       [demo_java-0, demo_java-1]
        Current owned partitions:                  [demo_java-0, demo_java-1, demo_java-2]
        Added partitions (assigned - owned):       []
        Revoked partitions (owned - assigned):     [demo_java-2]  ← Only this one!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Consumer 1 continues processing partitions 0 and 1 during this operation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Consumer 1 - Assignment stabilizes
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[main] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator -
  [Consumer clientId=consumer-my-java-application-1, groupId=my-java-application]
  Updating assignment with
        Assigned partitions:                       [demo_java-0, demo_java-1]
        Current owned partitions:                  [demo_java-0, demo_java-1]
        Added partitions (assigned - owned):       []
        Revoked partitions (owned - assigned):     []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Consumer 2 - Receives the revoked partition
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[main] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator -
  [Consumer clientId=consumer-my-java-application-1, groupId=my-java-application]
  Updating assignment with
        Assigned partitions:                       [demo_java-2]
        Current owned partitions:                  []
        Added partitions (assigned - owned):       [demo_java-2]
        Revoked partitions (owned - assigned):     []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consumer 1: partitions 0, 1 (kept processing throughout)&lt;/li&gt;
&lt;li&gt;Consumer 2: partition 2 (received smoothly)&lt;/li&gt;
&lt;li&gt;Only 1 partition moved!&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Scenario 3: Consumer Leaves (Scale Down)
&lt;/h3&gt;

&lt;p&gt;Consumer 2 shuts down, and Consumer 1 picks up the orphaned partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[main] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator -
  [Consumer clientId=consumer-my-java-application-1, groupId=my-java-application]
  Updating assignment with
        Assigned partitions:                       [demo_java-0, demo_java-1, demo_java-2]
        Current owned partitions:                  [demo_java-0, demo_java-1]
        Added partitions (assigned - owned):       [demo_java-2]  ← Picked up orphan
        Revoked partitions (owned - assigned):     []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Consumer 1 seamlessly adds partition 2 while continuing to process 0 and 1.&lt;/p&gt;




&lt;h2&gt;
  
  
  Choosing Between Strategies
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Eager (RangeAssignor)&lt;/th&gt;
&lt;th&gt;Cooperative (CooperativeStickyAssignor)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Partition Revocation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ALL partitions&lt;/td&gt;
&lt;td&gt;Only affected partitions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumption During Rebalance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;STOPPED&lt;/td&gt;
&lt;td&gt;CONTINUES on non-revoked partitions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple (single step)&lt;/td&gt;
&lt;td&gt;Complex (multiple steps)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Debugging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easier to trace&lt;/td&gt;
&lt;td&gt;Multi-phase, harder to debug&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumer Lag Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher (all partitions pause)&lt;/td&gt;
&lt;td&gt;Lower (only moved partitions pause)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;State Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All state reset&lt;/td&gt;
&lt;td&gt;Partial state retention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Small groups, stateless consumers&lt;/td&gt;
&lt;td&gt;Large groups, stateful consumers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Good Fit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Infrequent changes, simple systems&lt;/td&gt;
&lt;td&gt;Frequent scaling, high-throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Static Group Membership
&lt;/h2&gt;

&lt;p&gt;Cooperative rebalancing is great, but what if you don't want &lt;strong&gt;any&lt;/strong&gt; rebalance during brief restarts?&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;By default:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Consumer leaves → Loses member ID&lt;/li&gt;
&lt;li&gt;Consumer rejoins → Gets new member ID&lt;/li&gt;
&lt;li&gt;Rebalance triggered (even for a brief restart)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Solution: Static Members
&lt;/h3&gt;

&lt;p&gt;Configure consumers with fixed IDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"group.instance.id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"consumer-1"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Behavior
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Consumer rejoins within &lt;code&gt;session.timeout.ms&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Keeps same partition assignment&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;NO rebalance triggered&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consumer away longer than &lt;code&gt;session.timeout.ms&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ Rebalance triggered&lt;/li&gt;
&lt;li&gt;❌ Partitions reassigned&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use Cases
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Kubernetes/Container Environments&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pod restarts don't trigger rebalance&lt;/li&gt;
&lt;li&gt;Rolling updates happen smoothly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Local Cache/State Maintenance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consumers maintain local state for their partitions&lt;/li&gt;
&lt;li&gt;Avoid rebuilding cache on restart&lt;/li&gt;
&lt;li&gt;Ensure partition affinity&lt;/li&gt;
&lt;/ul&gt;
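For example, in a Kubernetes StatefulSet each pod gets a stable hostname ending in an ordinal, which makes a natural `group.instance.id`. A sketch of the wiring (the `HOSTNAME` fallback and the 60-second timeout are my assumptions, not values from the course):

```java
import java.util.Properties;

public class StaticMembershipConfig {
    // Builds consumer properties with a stable instance id so that brief
    // restarts keep their partition assignment and trigger no rebalance.
    public static Properties build(String instanceId) {
        Properties properties = new Properties();
        // Stable id reused across restarts, e.g. "consumer-" + pod ordinal
        properties.setProperty("group.instance.id", instanceId);
        // Restarts that finish within this window avoid a rebalance entirely
        properties.setProperty("session.timeout.ms", "60000");
        return properties;
    }

    public static void main(String[] args) {
        // In a StatefulSet, HOSTNAME is something like "my-consumer-0"
        String host = System.getenv().getOrDefault("HOSTNAME", "local");
        Properties props = build("consumer-" + host);
        System.out.println(props.getProperty("group.instance.id"));
    }
}
```

The trade-off is the one described above: a larger `session.timeout.ms` delays detection of a consumer that is genuinely gone.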




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;✅ &lt;strong&gt;Rebalancing strategies are design choices&lt;/strong&gt; - not one-size-fits-all&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Eager rebalancing&lt;/strong&gt; is simpler but pauses all consumers&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Cooperative rebalancing&lt;/strong&gt; minimizes disruption but adds complexity&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Choose based on your use case&lt;/strong&gt; - group size, scaling frequency, state management&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Multiple strategies in config&lt;/strong&gt; allow backward compatibility and migration&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Static group membership&lt;/strong&gt; prevents rebalance during brief restarts&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Logs reveal the process&lt;/strong&gt; - watch "Revoked partitions" to understand impact&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Understanding rebalancing strategies was a turning point in my Kafka learning journey. Rather than one being "better," each strategy solves different problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Eager rebalancing&lt;/strong&gt; works well for simple, small-scale systems where simplicity matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cooperative rebalancing&lt;/strong&gt; shines in large-scale, stateful, high-throughput scenarios&lt;/li&gt;
&lt;li&gt;The "best" strategy depends on your specific requirements and constraints&lt;/li&gt;
&lt;li&gt;Static group membership complements both strategies for handling restarts&lt;/li&gt;
&lt;li&gt;Real logs help you understand what's actually happening during rebalances&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's no universal "right" answer - choose the strategy that fits your system's characteristics and operational needs.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more Kafka tutorials!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Course Reference:&lt;/strong&gt; &lt;a href="https://www.udemy.com/course/apache-kafka/" rel="noopener noreferrer"&gt;Apache Kafka Series - Learn Apache Kafka for Beginners v3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>java</category>
      <category>tutorial</category>
      <category>backend</category>
    </item>
    <item>
      <title>Learning Apache Kafka with Python - Part 1: Producers</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Fri, 02 Jan 2026 09:25:22 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/learning-apache-kafka-with-python-part-1-producers-356b</link>
      <guid>https://dev.to/hantedyou_0106/learning-apache-kafka-with-python-part-1-producers-356b</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;After setting up a local Kafka environment with Docker (covered in my &lt;a href="https://dev.to/hantedyou_0106/setting-up-kafka-40-locally-with-docker-a-learning-journey-dif"&gt;previous post&lt;/a&gt;), I started converting Java Kafka producer examples to Python as part of my learning process. The Udemy course uses Java, but I want to practice Kafka concepts using Python.&lt;/p&gt;

&lt;p&gt;This journal documents the practical challenges, solutions, and patterns I discovered while converting two producer implementations: a basic producer and a producer with key-based partitioning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; For Kafka producer concepts (sticky partitioning, acknowledgments, etc.), see my separate article "Understanding Kafka Producer From Basics to Sticky Partitioning". This journal focuses on the Python implementation journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenge #1: Translating Java Error Handling to Python
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Java Try-Catch-Finally Pattern
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Python Translation
&lt;/h3&gt;

&lt;p&gt;Converting Java's try-catch-finally pattern to Python, I learned several important details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# ← Must initialize outside try!
&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_producer_config&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Producer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;demo_python&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hello&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;BufferError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Producer queue full: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;  &lt;span class="c1"&gt;# ← Bare raise preserves traceback
&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;

    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Producer closed successfully&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cleanup error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ← Don't crash on cleanup
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Differences from Java
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Java&lt;/th&gt;
&lt;th&gt;Python&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Syntax&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;try-catch-finally&lt;/td&gt;
&lt;td&gt;try-except-finally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Variable scope&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Inside try block&lt;/td&gt;
&lt;td&gt;Must initialize before try&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Specific exceptions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple catch blocks&lt;/td&gt;
&lt;td&gt;Multiple except blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Traceback logging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;log.error("msg", e)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;log.error("msg", exc_info=True)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Re-throw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;throw e;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;raise&lt;/code&gt; (bare, no argument)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Python-Specific Gotchas I Learned
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Why Initialize &lt;code&gt;producer = None&lt;/code&gt; Outside Try?
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ WRONG: Variable undefined if Producer() fails
&lt;/span&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Producer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# May fail here
&lt;/span&gt;    &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# NameError if Producer() failed!
&lt;/span&gt;
&lt;span class="c1"&gt;# ✅ CORRECT: Always defined
&lt;/span&gt;&lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Producer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Safe check
&lt;/span&gt;        &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Java, block scoping forces you to declare the variable before the &lt;code&gt;try&lt;/code&gt; if &lt;code&gt;finally&lt;/code&gt; needs it, so the compiler catches this mistake for you. Python has no declarations: if the line that assigns the variable raises, the name never comes into existence, and referencing it in &lt;code&gt;finally&lt;/code&gt; throws a &lt;code&gt;NameError&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Bare &lt;code&gt;raise&lt;/code&gt; vs &lt;code&gt;raise e&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;BufferError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Queue full: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt;  &lt;span class="c1"&gt;# ← Nothing after raise!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Three ways to raise:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;raise&lt;/code&gt; - Re-raises exact exception with &lt;strong&gt;full traceback&lt;/strong&gt; ✅&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;raise e&lt;/code&gt; - Re-raises, but &lt;strong&gt;the traceback now also points at the &lt;code&gt;raise e&lt;/code&gt; line&lt;/strong&gt; (and in Python 2 it lost the original traceback entirely) ❌&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;raise CustomError()&lt;/code&gt; - Raises different exception ⚠️&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why bare &lt;code&gt;raise&lt;/code&gt; matters:&lt;/strong&gt; It shows where the error originally occurred, not just where you re-raised it.&lt;/p&gt;
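&lt;p&gt;A minimal, self-contained demonstration of this (the function names are my own, not from the producer code):&lt;/p&gt;

```python
import traceback

def fail():
    raise ValueError("boom")  # where the error actually originates

def reraise_bare():
    try:
        fail()
    except ValueError:
        raise  # bare raise: the traceback still begins inside fail()

try:
    reraise_bare()
except ValueError:
    tb = traceback.format_exc()

# The deepest frame (inside fail) is still present in the traceback,
# so you can see the true origin, not just the re-raise site.
```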

&lt;h4&gt;
  
  
  3. Nested Try-Except in Finally
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# ← Nested try for cleanup
&lt;/span&gt;            &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cleanup error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Don't re-raise
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cleanup can fail too! Use &lt;code&gt;log.warning()&lt;/code&gt; instead of &lt;code&gt;log.error()&lt;/code&gt; since we're already shutting down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenge #2: The Mysterious &lt;code&gt;poll()&lt;/code&gt; Method
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Confusing Part
&lt;/h3&gt;

&lt;p&gt;This was the most confusing difference between Java and Python Kafka clients:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Java (kafka-clients library):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Callback executes automatically&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python (confluent-kafka library):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;demo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Callback WON'T execute yet!
&lt;/span&gt;
&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ← Must explicitly call this
# NOW callback executes
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What I Learned
&lt;/h3&gt;

&lt;p&gt;The confluent-kafka library (Python) wraps librdkafka (C library). Unlike Java's client, you must explicitly call &lt;code&gt;poll()&lt;/code&gt; to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trigger network operations (send queued messages)&lt;/li&gt;
&lt;li&gt;Receive acknowledgments from broker&lt;/li&gt;
&lt;li&gt;Execute callbacks
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;  &lt;span class="c1"&gt;# Step 1: Queue in memory (fast)
&lt;/span&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# Step 2: Send + trigger callback
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  poll() Variations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# Non-blocking: process ready events, return immediately
&lt;/span&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# Blocking: wait up to 1 second for events
&lt;/span&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;      &lt;span class="c1"&gt;# Blocking: wait for ALL messages + callbacks
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Visual Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Without poll() ❌:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;  &lt;span class="c1"&gt;# Queue all 10 messages
&lt;/span&gt;    &lt;span class="c1"&gt;# No poll() - callbacks delayed
&lt;/span&gt;
&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# All 10 callbacks fire here at once
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With poll(0) ✅:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;  &lt;span class="c1"&gt;# Queue message
&lt;/span&gt;    &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# Callback fires immediately
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives real-time feedback about partition assignment, which is crucial for the key-partitioning demo!&lt;/p&gt;
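&lt;p&gt;The snippets above pass &lt;code&gt;callback=callback&lt;/code&gt; without showing the function itself. A minimal delivery callback might look like this; the &lt;code&gt;(err, msg)&lt;/code&gt; signature is what confluent-kafka invokes, while the function name and output format are my own:&lt;/p&gt;

```python
def delivery_report(err, msg):
    """Called by confluent-kafka from poll()/flush() for each produced message."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        # msg.key() returns bytes, or None for unkeyed messages
        key = msg.key().decode('utf-8') if msg.key() else None
        print(f"Key: {key} | Partition: {msg.partition()} | Offset: {msg.offset()}")
```

&lt;p&gt;Wire it in with &lt;code&gt;producer.produce(topic='demo', value=b'msg', callback=delivery_report)&lt;/code&gt;.&lt;/p&gt;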

&lt;h2&gt;
  
  
  Challenge #3: Callback Timing - Sync vs Async
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Unexpected Output
&lt;/h3&gt;

&lt;p&gt;When I first ran &lt;code&gt;producer_demo_keys.py&lt;/code&gt;, the output was confusing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--- Round 1 ---

--- Round 2 ---
Key: id_0 | Partition: 1  ← Where's Round 1?
Key: id_3 | Partition: 1
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All Round 1 callbacks fired during Round 2! This revealed an important decision point.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cause
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;round_num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
        &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Non-blocking
&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Not long enough for callbacks to complete
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Network latency (even on localhost) meant callbacks from Round 1 hadn't executed before Round 2 started.&lt;/p&gt;

&lt;h3&gt;
  
  
  Two Valid Approaches
&lt;/h3&gt;

&lt;p&gt;I realized both patterns have valid use cases, so I made it configurable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach 1: Synchronized (Educational)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;round_num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
        &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# ← Block until all callbacks complete
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Clean output showing Round 1 complete before Round 2&lt;br&gt;&lt;br&gt;
✅ Perfect for demos and testing&lt;br&gt;&lt;br&gt;
❌ Lower throughput (blocking waits)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach 2: Async (Production)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;round_num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
        &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ← Non-blocking
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Higher throughput&lt;br&gt;&lt;br&gt;
✅ More realistic production behavior&lt;br&gt;&lt;br&gt;
⚠️ Callbacks may interleave&lt;/p&gt;
&lt;h3&gt;
  
  
  Making It Toggleable
&lt;/h3&gt;

&lt;p&gt;Added configuration comments in the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ============================================================
# TWO APPROACHES: Choose based on your use case
# ============================================================
&lt;/span&gt;
&lt;span class="c1"&gt;# APPROACH 1: Synchronized Rounds (Educational/Testing)
&lt;/span&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# ← Active by default
&lt;/span&gt;
&lt;span class="c1"&gt;# APPROACH 2: Async High-Throughput (Production)
# Comment out flush() above, uncomment below:
# time.sleep(0.5)
# ============================================================
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Either way, the key-to-partition mapping remains consistent - that's what matters!&lt;/p&gt;
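&lt;p&gt;To see why that mapping is stable, here is a toy sketch of key-based partitioning. Kafka's default keyed partitioner actually uses murmur2 hashing inside librdkafka; SHA-1 below is a stand-in just to show that the partition is a pure function of the key bytes:&lt;/p&gt;

```python
import hashlib

def toy_partition(key: bytes, num_partitions: int) -> int:
    # Illustration only: Kafka uses murmur2, not SHA-1, but the principle
    # is identical - hash the key bytes, then take modulo partition count.
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], 'big') % num_partitions

# The same key always maps to the same partition (for a fixed partition count),
# which is exactly the guarantee the demo relies on.
assert toy_partition(b'id_0', 3) == toy_partition(b'id_0', 3)
```

&lt;p&gt;The corollary: changing the number of partitions changes the mapping, which is why adding partitions to a keyed topic breaks per-key ordering assumptions.&lt;/p&gt;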

&lt;h2&gt;
  
  
  Java to Python Conversion Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Configuration: Properties → dict
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Java:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="n"&gt;properties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Properties&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bootstrap.servers"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"127.0.0.1:9092"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key.serializer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StringSerializer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"value.serializer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StringSerializer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bootstrap.servers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;localhost:9092&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# No serializer config needed for simple strings
&lt;/span&gt;    &lt;span class="c1"&gt;# confluent-kafka handles it automatically
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Producing Messages
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Java:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; 
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"topic"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"key"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"value"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;# Must encode to bytes
&lt;/span&gt;    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Must encode to bytes
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key difference:&lt;/strong&gt; confluent-kafka transmits keys and values as raw bytes, so encode strings explicitly with &lt;code&gt;.encode('utf-8')&lt;/code&gt; to keep the wire format unambiguous.&lt;/p&gt;
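A small helper makes the encoding rule explicit and tolerates values that are already bytes (the name `to_bytes` is mine for illustration, not part of the library):

```python
def to_bytes(value):
    """Coerce a str (or pass through bytes) to UTF-8 bytes for produce()."""
    if isinstance(value, bytes):
        return value
    return str(value).encode('utf-8')
```

Both `to_bytes('key')` and `to_bytes(b'key')` yield the same wire bytes, so call sites never have to care which form they hold.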

&lt;h3&gt;
  
  
  Callbacks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Java:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Callback&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onCompletion&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RecordMetadata&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Partition: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;partition&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMessage&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delivery_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Delivery failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Partition: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;msg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;delivery_callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ← Must explicitly trigger!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
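Because `poll()` simply runs any pending delivery callbacks on the calling thread, the callback itself is a plain function you can exercise without a broker. This sketch uses a stub (`_StubMsg` and the `events` list are illustrative, not part of confluent-kafka) to show both paths:

```python
events = []

def delivery_callback(err, msg):
    # err is set on failure; msg carries delivery metadata on success
    if err is not None:
        events.append(('error', str(err)))
    else:
        events.append(('ok', msg.partition()))

class _StubMsg:
    """Minimal stand-in for confluent_kafka.Message."""
    def partition(self):
        return 3

# Simulate what producer.poll() does after a successful delivery...
delivery_callback(None, _StubMsg())
# ...and after a failed one:
delivery_callback('broker unreachable', None)
```

After both calls, `events` holds `('ok', 3)` followed by `('error', 'broker unreachable')`, which is exactly the branching the real callback performs.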



&lt;h3&gt;
  
  
  Resource Cleanup
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Java:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# No explicit close() - flush() is sufficient
# Python handles cleanup via garbage collection
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final Implementations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  producer_demo_keys.py (With Partitioning)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;confluent_kafka&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Producer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KafkaException&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;kafka_logging&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_logger&lt;/span&gt;

&lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_logger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delivery_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Message delivery failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;key_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;key&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;key&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Key: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | Partition: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bootstrap.servers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;localhost:9092&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Producer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;demo_python&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;round_num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;--- Round &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;round_num&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello world &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

                &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;delivery_callback&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Trigger callbacks
&lt;/span&gt;
            &lt;span class="c1"&gt;# Choose approach:
&lt;/span&gt;            &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="c1"&gt;# Synchronized
&lt;/span&gt;            &lt;span class="c1"&gt;# time.sleep(0.5)  # Async
&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All messages sent!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;BufferError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Queue full: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;KafkaException&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Kafka error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Producer closed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cleanup error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Running the Producers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Basic producer&lt;/span&gt;
uv run python &lt;span class="nt"&gt;-m&lt;/span&gt; kafka_basics.producer_demo

&lt;span class="c"&gt;# Producer with keys&lt;/span&gt;
uv run python &lt;span class="nt"&gt;-m&lt;/span&gt; kafka_basics.producer_demo_keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verifying with Console Consumer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; broker kafka-console-consumer.sh &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--topic&lt;/span&gt; demo_python &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-beginning&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; localhost:9092
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Press &lt;code&gt;Ctrl+C&lt;/code&gt; to exit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Python-Specific Lessons
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Bytes Encoding&lt;/strong&gt;: Always &lt;code&gt;.encode('utf-8')&lt;/code&gt; for keys and values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;poll() is Required&lt;/strong&gt;: Unlike Java, callbacks need explicit triggering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variable Scope&lt;/strong&gt;: Initialize resources before try block in Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bare raise&lt;/strong&gt;: Use &lt;code&gt;raise&lt;/code&gt; without arguments to preserve full traceback&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Error Handling Patterns
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Initialize resources to &lt;code&gt;None&lt;/code&gt; before try block&lt;/li&gt;
&lt;li&gt;Use bare &lt;code&gt;raise&lt;/code&gt; to preserve tracebacks&lt;/li&gt;
&lt;li&gt;Nest try-except in finally block for safe cleanup&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;log.warning()&lt;/code&gt; for cleanup errors (not &lt;code&gt;log.error()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Always add timeouts to prevent indefinite blocking&lt;/li&gt;
&lt;/ol&gt;
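The five points above can be condensed into one skeleton; `FakeProducer` is a stand-in resource (my own stub, not confluent-kafka) so the pattern can be shown without a broker:

```python
import logging

log = logging.getLogger(__name__)

class FakeProducer:
    """Stand-in resource with flush(), used only to exercise the pattern."""
    def __init__(self):
        self.flushed = False
    def flush(self, timeout=None):
        self.flushed = True

def run(make_producer=FakeProducer):
    producer = None                 # 1. init before try so finally can test it
    try:
        producer = make_producer()
        # ... produce messages ...
    except Exception:
        log.error("producing failed", exc_info=True)
        raise                       # 2. bare raise preserves the traceback
    finally:
        if producer is not None:
            try:                    # 3. nested try so cleanup can't mask errors
                producer.flush(timeout=5)   # 5. bounded blocking
            except Exception as e:
                log.warning(f"cleanup error: {e}")  # 4. warning, not error
    return producer
```

Calling `run()` flushes the producer even when the body raises, which is the whole point of the pattern.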

&lt;h3&gt;
  
  
  Conversion Strategy
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Java Concept&lt;/th&gt;
&lt;th&gt;Python Equivalent&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Properties&lt;/td&gt;
&lt;td&gt;dict&lt;/td&gt;
&lt;td&gt;Same dotted config keys (e.g. &lt;code&gt;bootstrap.servers&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;try-catch-finally&lt;/td&gt;
&lt;td&gt;try-except-finally&lt;/td&gt;
&lt;td&gt;Different syntax, same pattern&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anonymous Callback&lt;/td&gt;
&lt;td&gt;Function callback&lt;/td&gt;
&lt;td&gt;Define as regular function&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;producer.send()&lt;/td&gt;
&lt;td&gt;producer.produce() + poll()&lt;/td&gt;
&lt;td&gt;Explicit callback trigger needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;producer.close()&lt;/td&gt;
&lt;td&gt;producer.flush()&lt;/td&gt;
&lt;td&gt;flush() is sufficient&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What I Learned About Python Development
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Resource Management&lt;/strong&gt;: Python's finally block works like Java's, but a name assigned only inside try can be unbound when it runs, so initialize resources first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Library Quirks&lt;/strong&gt;: confluent-kafka requires manual poll() unlike Java client&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration Flexibility&lt;/strong&gt;: Python's dict makes config more readable than Java Properties&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async Patterns&lt;/strong&gt;: Understanding when to use flush() vs poll() for different use cases&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/kafka-clients/python/current/overview.html" rel="noopener noreferrer"&gt;Confluent Kafka Python Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/hantedyou_0106/setting-up-kafka-40-locally-with-docker-a-learning-journey-dif"&gt;Previous Post: Setting Up Kafka 4.0 Locally with Docker&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Session Date: January 2, 2026&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Python 3.13, uv package manager, confluent-kafka, Kafka 4.0.1&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Graceful Shutdown in Kafka: Understanding Shutdown Hooks and Thread Management</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Tue, 30 Dec 2025 15:38:11 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/graceful-shutdown-in-kafka-understanding-shutdown-hooks-and-thread-management-4g5l</link>
      <guid>https://dev.to/hantedyou_0106/graceful-shutdown-in-kafka-understanding-shutdown-hooks-and-thread-management-4g5l</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When I first saw the &lt;code&gt;ConsumerDemoWithShutdown.java&lt;/code&gt; code, I was puzzled by this comment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// get a reference to the main thread&lt;/span&gt;
&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt; &lt;span class="n"&gt;mainThread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why do we need a reference to the main thread? What's a Shutdown Hook? And what does &lt;code&gt;join()&lt;/code&gt; actually do?&lt;/p&gt;

&lt;p&gt;As a Java and Kafka beginner, these concepts were confusing. But after diving deep into the code, I realized this is one of the most important patterns for building reliable Kafka applications.&lt;/p&gt;

&lt;p&gt;In this article, I'll explain everything from &lt;strong&gt;Shutdown Hooks&lt;/strong&gt; to &lt;strong&gt;Singleton patterns&lt;/strong&gt; to &lt;strong&gt;Thread.join()&lt;/strong&gt; - all the foundational concepts you need to understand graceful shutdown.&lt;/p&gt;

&lt;p&gt;This guide is based on the excellent course &lt;a href="https://www.udemy.com/course/apache-kafka/" rel="noopener noreferrer"&gt;"Apache Kafka Series - Learn Apache Kafka for Beginners v3"&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: The Problem - Why We Need Graceful Shutdown
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Naive Approach (Without Shutdown Hook)
&lt;/h3&gt;

&lt;p&gt;Let's start with a simple consumer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BadConsumer&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;KafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Arrays&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;asList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"demo_java"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;ConsumerRecords&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
            &lt;span class="c1"&gt;// Process messages...&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// This line is NEVER reached!&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happens when you press Ctrl+C?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;❌ The JVM terminates immediately, cutting off the &lt;code&gt;while(true)&lt;/code&gt; loop mid-iteration&lt;/li&gt;
&lt;li&gt;❌ &lt;code&gt;consumer.close()&lt;/code&gt; is NEVER called&lt;/li&gt;
&lt;li&gt;❌ Offsets are NOT committed&lt;/li&gt;
&lt;li&gt;❌ Resources are NOT released&lt;/li&gt;
&lt;li&gt;❌ Next time you start: &lt;strong&gt;duplicate message processing&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a serious problem in production systems!&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: Understanding Shutdown Hooks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is a Shutdown Hook?
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;Shutdown Hook&lt;/strong&gt; is a special mechanism provided by the JVM (Java Virtual Machine) that allows you to run cleanup code when your program is about to exit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Think of it as:&lt;/strong&gt; "Hey JVM, before you shut down, please run this cleanup code for me!"&lt;/p&gt;

&lt;h3&gt;
  
  
  When Does a Shutdown Hook Trigger?
&lt;/h3&gt;

&lt;p&gt;✅ &lt;strong&gt;It WILL trigger when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You press &lt;code&gt;Ctrl+C&lt;/code&gt; (SIGINT signal)&lt;/li&gt;
&lt;li&gt;You call &lt;code&gt;System.exit()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Your program finishes normally&lt;/li&gt;
&lt;li&gt;Operating system sends SIGTERM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;It will NOT trigger when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Force kill: &lt;code&gt;kill -9&lt;/code&gt; (SIGKILL)&lt;/li&gt;
&lt;li&gt;JVM crashes&lt;/li&gt;
&lt;li&gt;Operating system crashes&lt;/li&gt;
&lt;/ul&gt;
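&lt;p&gt;The "finishes normally" and &lt;code&gt;System.exit()&lt;/code&gt; cases are easy to verify with a minimal, Kafka-free sketch (class name is mine):&lt;/p&gt;

```java
public class ShutdownTriggers {
    public static void main(String[] args) {
        // Register a hook before doing any work
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                System.out.println("hook: cleanup ran")));

        System.out.println("main: exiting normally");
        // main returns here; the JVM then runs the hook.
        // The hook would run just the same if we called System.exit(0) instead.
    }
}
```

&lt;p&gt;Running it prints "main: exiting normally" followed by "hook: cleanup ran", with no Ctrl+C involved.&lt;/p&gt;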

&lt;h3&gt;
  
  
  Basic Shutdown Hook Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ShutdownHookDemo&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Application starting..."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// Register a Shutdown Hook&lt;/span&gt;
        &lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getRuntime&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;addShutdownHook&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Shutdown detected! Cleaning up..."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}));&lt;/span&gt;

        &lt;span class="c1"&gt;// Simulate work&lt;/span&gt;
        &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Working..."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Sleep for 10 seconds&lt;/span&gt;

        &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Work done!"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Try it yourself:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run the program&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;Ctrl+C&lt;/code&gt; during the 10-second sleep&lt;/li&gt;
&lt;li&gt;You'll see: "Shutdown detected! Cleaning up..."&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Part 3: Understanding Runtime.getRuntime()
&lt;/h2&gt;

&lt;p&gt;Before diving into the Kafka code, I needed to understand what &lt;code&gt;Runtime.getRuntime()&lt;/code&gt; means.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Runtime?
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Runtime&lt;/code&gt; class represents the JVM (Java Virtual Machine) environment. Since there's only ONE JVM per application, &lt;code&gt;Runtime&lt;/code&gt; uses the &lt;strong&gt;Singleton pattern&lt;/strong&gt; - ensuring only one instance exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key point:&lt;/strong&gt; You can't create a Runtime object with &lt;code&gt;new Runtime()&lt;/code&gt;. Instead, you must use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Runtime&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getRuntime&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This always returns the same Runtime instance throughout your application.&lt;/p&gt;
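&lt;p&gt;That claim is easy to check directly, because both calls hand back the very same object (a tiny sketch, class name mine):&lt;/p&gt;

```java
public class RuntimeSingletonCheck {
    public static void main(String[] args) {
        Runtime a = Runtime.getRuntime();
        Runtime b = Runtime.getRuntime();
        // Reference equality: both variables point at the JVM's one Runtime
        System.out.println(a == b); // true
    }
}
```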

&lt;h3&gt;
  
  
  Why This Matters for Shutdown Hooks
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Register a shutdown hook to the JVM&lt;/span&gt;
&lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getRuntime&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;addShutdownHook&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Cleanup code runs here"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Runtime&lt;/code&gt; is how we access JVM-level operations like adding shutdown hooks.&lt;/p&gt;
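&lt;p&gt;&lt;code&gt;Runtime&lt;/code&gt; can also unregister a hook with &lt;code&gt;removeShutdownHook(thread)&lt;/code&gt;, which returns &lt;code&gt;true&lt;/code&gt; only if that exact thread had been registered. A quick sketch (class name mine):&lt;/p&gt;

```java
public class HookRegistrationDemo {
    public static void main(String[] args) {
        Thread hook = new Thread(() -> System.out.println("cleanup"));
        Runtime.getRuntime().addShutdownHook(hook);

        // Unregister it again; true confirms it had been registered
        boolean wasRegistered = Runtime.getRuntime().removeShutdownHook(hook);
        System.out.println(wasRegistered);
    }
}
```

&lt;p&gt;This prints "true", and because the hook was removed before exit, "cleanup" never appears.&lt;/p&gt;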




&lt;h2&gt;
  
  
  Part 4: Understanding Thread.join()
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;join()&lt;/code&gt; method was initially confusing to me. Here's what I learned.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Does join() Do?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;thread.join()&lt;/code&gt; makes the current thread &lt;strong&gt;wait&lt;/strong&gt; until another thread finishes executing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple analogy:&lt;/strong&gt; It's like waiting for someone to finish their task before you continue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Working..."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Done!"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;});&lt;/span&gt;

&lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Wait here until worker finishes&lt;/span&gt;
&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Worker completed, continuing..."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Visual Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Without join():&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Main Thread:    [start worker] → [END]
Worker Thread:                [Working...] → [END]
                               ↑ Main doesn't wait!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With join():&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Main Thread:    [start worker] → [join - waiting...] → [END]
Worker Thread:                [Working...] → [Done!] ┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why It Throws InterruptedException
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;join()&lt;/code&gt; method can be interrupted by other threads, so we need to handle the exception:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printStackTrace&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
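&lt;p&gt;&lt;code&gt;join()&lt;/code&gt; also has a timed overload, &lt;code&gt;join(millis)&lt;/code&gt;, which waits at most that long. In a shutdown hook this can bound how long a stuck main thread is allowed to delay the exit. A sketch (names and timings are mine):&lt;/p&gt;

```java
public class JoinTimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(1000); // simulate slow work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
            }
        });
        worker.setDaemon(true); // let the JVM exit without waiting for it
        worker.start();

        worker.join(100); // wait at most 100 ms, then give up waiting
        System.out.println(worker.isAlive()); // the worker is still busy
    }
}
```

&lt;p&gt;Here &lt;code&gt;join(100)&lt;/code&gt; returns after the timeout while the worker is still sleeping, so the program prints "true".&lt;/p&gt;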






&lt;h2&gt;
  
  
  Part 5: The Complete Kafka Shutdown Pattern
&lt;/h2&gt;

&lt;p&gt;Now let's put all the pieces together and understand the full Kafka consumer shutdown code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Get a Reference to the Main Thread
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// get a reference to the main thread&lt;/span&gt;
&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt; &lt;span class="n"&gt;mainThread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why do we need this?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Shutdown Hook runs in a &lt;strong&gt;different thread&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;That thread needs to know which thread to wait for&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Thread.currentThread()&lt;/code&gt; returns the currently executing thread (in this case, main thread)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;final&lt;/code&gt; keyword allows the anonymous inner class to access this variable&lt;/li&gt;
&lt;/ul&gt;
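&lt;p&gt;You can confirm which thread this captures: the JVM names its initial thread &lt;code&gt;"main"&lt;/code&gt; (class name is mine):&lt;/p&gt;

```java
public class MainThreadRef {
    public static void main(String[] args) {
        final Thread mainThread = Thread.currentThread();
        // Inside main(), the current thread is the JVM's initial "main" thread
        System.out.println(mainThread.getName()); // main
    }
}
```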

&lt;h3&gt;
  
  
  Step 2: Register the Shutdown Hook
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getRuntime&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;addShutdownHook&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Detected a shutdown, let's exit by calling consumer.wakeup()..."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;wakeup&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;mainThread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printStackTrace&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Breaking it down:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Runtime.getRuntime()&lt;/code&gt;&lt;/strong&gt; - Get the singleton Runtime instance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;addShutdownHook(new Thread() {...})&lt;/code&gt;&lt;/strong&gt; - Register a new thread to run on shutdown&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;new Thread() { public void run() {...} }&lt;/code&gt;&lt;/strong&gt; - Anonymous inner class defining thread behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;consumer.wakeup()&lt;/code&gt;&lt;/strong&gt; - Wake up the consumer from &lt;code&gt;poll()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;mainThread.join()&lt;/code&gt;&lt;/strong&gt; - Wait for main thread to finish cleanup&lt;/li&gt;
&lt;/ol&gt;
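&lt;p&gt;Side note: this anonymous &lt;code&gt;Thread&lt;/code&gt; subclass and the lambda form from Part 2 are interchangeable; both just supply the body of &lt;code&gt;run()&lt;/code&gt;. A sketch (class name mine):&lt;/p&gt;

```java
public class HookStylesDemo {
    public static void main(String[] args) {
        // Anonymous subclass overriding run(), as in the Kafka example
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                System.out.println("anonymous subclass hook");
            }
        });

        // Equivalent lambda passed as a Runnable to the Thread constructor
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                System.out.println("lambda hook")));
    }
}
```

&lt;p&gt;One caveat: when several hooks are registered, the JVM starts them all concurrently at shutdown, in no guaranteed order.&lt;/p&gt;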

&lt;h3&gt;
  
  
  Step 3: Understanding consumer.wakeup()
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Main thread is stuck here, waiting for messages&lt;/span&gt;
&lt;span class="nc"&gt;ConsumerRecords&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;poll()&lt;/code&gt; method &lt;strong&gt;blocks&lt;/strong&gt; (waits) for up to 1000ms looking for new messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does wakeup() do?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interrupts the &lt;code&gt;poll()&lt;/code&gt; operation&lt;/li&gt;
&lt;li&gt;Makes &lt;code&gt;poll()&lt;/code&gt; throw &lt;code&gt;WakeupException&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Lets the code break out of the &lt;code&gt;while&lt;/code&gt; loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;wakeup()&lt;/code&gt; is also the only &lt;code&gt;KafkaConsumer&lt;/code&gt; method that is safe to call from another thread, which is exactly what the shutdown hook thread needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without wakeup():&lt;/strong&gt;&lt;br&gt;
The main thread never notices the shutdown: it just keeps looping and polling, and the shutdown hook's &lt;code&gt;mainThread.join()&lt;/code&gt; would wait forever!&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4: The Main Consumer Loop
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Arrays&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;asList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Polling"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;ConsumerRecords&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ConsumerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;", Value: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Partition: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;partition&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;", Offset: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;WakeupException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Consumer is starting to shut down"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;){&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Unexpected exception in consumer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// This is GUARANTEED to run&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"The consumer is now gracefully shut down"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;The flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Normal operation:&lt;/strong&gt; Continuously poll for messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ctrl+C pressed:&lt;/strong&gt; Shutdown Hook calls &lt;code&gt;wakeup()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WakeupException thrown:&lt;/strong&gt; Caught by the catch block&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finally block:&lt;/strong&gt; Always executes, closing the consumer&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  Part 6: Complete Execution Flow
&lt;/h2&gt;

&lt;p&gt;Let's trace what happens when you press Ctrl+C:&lt;/p&gt;
&lt;h3&gt;
  
  
  Timeline Visualization
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─ Time ─────────────────────────────────────────────────┐

1. Normal Operation
   Main Thread: [polling...polling...polling...]

2. User Presses Ctrl+C
   ↓
   JVM Detects Shutdown Signal

3. JVM Starts Shutdown Hook Thread
   ┌─ Shutdown Hook Thread ─────────────────┐
   │ 1. log("Detected a shutdown...")       │
   │ 2. consumer.wakeup()  ─────────┐       │
   │ 3. mainThread.join()           │       │
   │    [WAITING...]                │       │
   └────────────────────────────────┼───────┘
                                    │
                                    │ wakeup signal
                                    ↓
   ┌─ Main Thread ─────────────────────────┐
   │ poll() receives wakeup signal         │
   │ → Throws WakeupException              │
   │ → Enters catch block                  │
   │ → log("Consumer is shutting down")    │
   │ → Enters finally block                │
   │ → consumer.close()                    │
   │    - Commits offsets                  │
   │    - Releases resources               │
   │ → log("Gracefully shut down")         │
   │ → Main thread ENDS ──────────┐        │
   └──────────────────────────────┼────────┘
                                  │
                                  │ main thread finished
                                  ↓
   ┌─ Shutdown Hook Thread ────────────────┐
   │ join() returns                        │
   │ Shutdown Hook thread ENDS             │
   └───────────────────────────────────────┘

4. JVM Exits Cleanly
   All resources properly released! ✓

└────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Why join() is Critical
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Without join():&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shutdown Hook:  [wakeup()] → [END]
                     ↓
Main Thread:        [processing...] → [close()] → [END]
                                      ↑
                                   Might not finish!
                                   JVM exits too early!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With join():&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shutdown Hook:  [wakeup()] → [join() - WAITING...] → [END]
                     ↓                        ↑
Main Thread:        [processing] → [close()] ┘ → [END]
                                  ↑
                              Guaranteed to complete!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Part 7: My Understanding - How It All Connects
&lt;/h2&gt;

&lt;p&gt;After going through all the concepts above, here's how I finally understood the complete shutdown mechanism:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Shutdown Hook is a JVM Mechanism
&lt;/h3&gt;

&lt;p&gt;I learned that the shutdown hook is provided by the JVM itself. When I press Ctrl+C (or when the program exits in other ways), the JVM triggers this hook. The &lt;code&gt;run()&lt;/code&gt; function inside the shutdown hook is what gets executed when this trigger happens.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Happens in the run() Method
&lt;/h3&gt;

&lt;p&gt;In the shutdown hook's &lt;code&gt;run()&lt;/code&gt; method, two key things happen:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;consumer.wakeup()&lt;/code&gt; is called&lt;/strong&gt; - This interrupts the consumer that's stuck in the infinite polling loop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;mainThread.join()&lt;/code&gt; is called&lt;/strong&gt; - This makes the shutdown hook thread wait for the main thread to finish&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Try-Catch-Finally Structure
&lt;/h3&gt;

&lt;p&gt;The infinite polling loop is wrapped with a try-catch-finally structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(...);&lt;/span&gt;  &lt;span class="c1"&gt;// Infinite loop polling for messages&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;WakeupException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Catches the exception thrown by consumer.wakeup()&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Consumer is starting to shut down"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// This ALWAYS executes&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How the Exception Flow Works
&lt;/h3&gt;

&lt;p&gt;Here's the key insight that helped me understand the flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The shutdown hook's &lt;code&gt;run()&lt;/code&gt; calls &lt;code&gt;consumer.wakeup()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;This causes &lt;code&gt;consumer.poll()&lt;/code&gt; to throw a &lt;code&gt;WakeupException&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The exception breaks out of the infinite &lt;code&gt;while(true)&lt;/code&gt; loop&lt;/li&gt;
&lt;li&gt;The catch block catches &lt;code&gt;WakeupException&lt;/code&gt; and logs "Consumer is starting to shut down"&lt;/li&gt;
&lt;li&gt;The finally block ALWAYS executes and calls &lt;code&gt;consumer.close()&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
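&lt;p&gt;To see this flow in isolation, here is a minimal plain-Java sketch (class and field names are made up for illustration, and there is no Kafka dependency): a local unchecked exception stands in for &lt;code&gt;WakeupException&lt;/code&gt;, and a flag stands in for &lt;code&gt;consumer.close()&lt;/code&gt;.&lt;/p&gt;

```java
public class WakeupFlowDemo {
    // Stand-in for Kafka's unchecked WakeupException
    static class WakeupException extends RuntimeException {}

    static boolean wakeupRequested = false;
    static boolean closed = false;

    static void poll() {
        // The real poll() checks a flag set by wakeup() and throws
        if (wakeupRequested) throw new WakeupException();
    }

    public static void main(String[] args) {
        wakeupRequested = true;  // simulate the shutdown hook calling wakeup()
        try {
            while (true) {
                poll();  // throws WakeupException, breaking the infinite loop
            }
        } catch (WakeupException e) {
            System.out.println("Consumer is starting to shut down");
        } finally {
            closed = true;  // stands in for consumer.close(); always runs
        }
    }
}
```

&lt;p&gt;Running it prints the shutdown message and then sets &lt;code&gt;closed&lt;/code&gt;, mirroring steps 2 through 5 above.&lt;/p&gt;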

&lt;h3&gt;
  
  
  Why This Design Makes Sense
&lt;/h3&gt;

&lt;p&gt;The way I see it now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without the shutdown hook:&lt;/strong&gt; The infinite loop would never break, &lt;code&gt;consumer.close()&lt;/code&gt; would never run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Without &lt;code&gt;wakeup()&lt;/code&gt;:&lt;/strong&gt; The main thread would be stuck in &lt;code&gt;poll()&lt;/code&gt;, not knowing it needs to shut down&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Without &lt;code&gt;join()&lt;/code&gt;:&lt;/strong&gt; The JVM might exit before &lt;code&gt;consumer.close()&lt;/code&gt; finishes, losing uncommitted offsets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Without try-catch-finally:&lt;/strong&gt; We couldn't handle the &lt;code&gt;WakeupException&lt;/code&gt; properly and guarantee cleanup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pattern ensures that whenever the program terminates normally (Ctrl+C, &lt;code&gt;System.exit()&lt;/code&gt;, or the main method returning), the consumer always closes gracefully; only a &lt;code&gt;kill -9&lt;/code&gt; or a JVM crash can bypass it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 8: Key Concepts Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Shutdown Hook
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Execute cleanup code before JVM exits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Registration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Runtime.getRuntime().addShutdownHook(thread)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Triggers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ctrl+C, System.exit(), normal termination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Does NOT trigger&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;kill -9, JVM crash&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
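&lt;p&gt;A minimal registration sketch, independent of Kafka (the message text is illustrative):&lt;/p&gt;

```java
public class ShutdownHookDemo {
    public static void main(String[] args) {
        // Register cleanup to run on normal JVM exit
        // (Ctrl+C, System.exit(), or main returning)
        Thread hook = new Thread(() -> System.out.println("cleanup: closing resources"));
        Runtime.getRuntime().addShutdownHook(hook);

        System.out.println("main: doing work");
        // When main returns, the JVM runs all registered hooks before exiting.
        // Note: hooks do NOT run on kill -9 or a JVM crash.
    }
}
```

&lt;p&gt;A hook can also be unregistered with &lt;code&gt;Runtime.getRuntime().removeShutdownHook(hook)&lt;/code&gt;, which returns &lt;code&gt;true&lt;/code&gt; if the hook was previously registered.&lt;/p&gt;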

&lt;h3&gt;
  
  
  Singleton Pattern
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Definition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ensures only ONE instance of a class exists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Private constructor + static getInstance()&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runtime class&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Why&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Some resources should be unique (JVM environment)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
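&lt;p&gt;A minimal sketch of the pattern. The &lt;code&gt;Config&lt;/code&gt; class is a made-up example; &lt;code&gt;Runtime.getRuntime()&lt;/code&gt; follows the same shape.&lt;/p&gt;

```java
public class Config {
    // Single shared instance, created eagerly when the class loads
    private static final Config INSTANCE = new Config();

    // Private constructor prevents `new Config()` from outside the class
    private Config() {}

    // The only way to obtain the instance
    public static Config getInstance() {
        return INSTANCE;
    }
}
```

&lt;p&gt;Every call to &lt;code&gt;getInstance()&lt;/code&gt; returns the same object, just as every call to &lt;code&gt;Runtime.getRuntime()&lt;/code&gt; returns the one &lt;code&gt;Runtime&lt;/code&gt; for the JVM.&lt;/p&gt;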

&lt;h3&gt;
  
  
  Thread.join()
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Wait for another thread to complete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Syntax&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;thread.join()&lt;/code&gt; or &lt;code&gt;thread.join(timeout)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Throws&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;InterruptedException (if interrupted while waiting)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use in Kafka&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ensure main thread completes cleanup before JVM exits&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
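&lt;p&gt;A small self-contained example (names are illustrative) showing the waiting behavior:&lt;/p&gt;

```java
public class JoinDemo {
    static int result = 0;

    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            result = 42;  // simulated work done on the worker thread
        });
        worker.start();
        try {
            worker.join();  // main blocks here until worker finishes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve interrupt status
        }
        // join() also guarantees main sees the worker's writes (happens-before)
        System.out.println("result=" + result);  // prints "result=42"
    }
}
```

&lt;p&gt;In the Kafka shutdown hook, &lt;code&gt;mainThread.join()&lt;/code&gt; plays the same role: the hook thread waits until the main thread has finished &lt;code&gt;consumer.close()&lt;/code&gt;.&lt;/p&gt;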

&lt;h3&gt;
  
  
  consumer.wakeup()
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Interrupt a consumer that's blocked in poll()&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Effect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Throws WakeupException in the polling thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Thread-safe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Can be called from a different thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Graceful shutdown from Shutdown Hook&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Part 9: Common Mistakes and Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: Not Using final for mainThread
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Wrong - Compiler error!&lt;/span&gt;
&lt;span class="nc"&gt;Thread&lt;/span&gt; &lt;span class="n"&gt;mainThread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getRuntime&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;addShutdownHook&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;mainThread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Error: Cannot access non-final variable&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Correct&lt;/span&gt;
&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt; &lt;span class="n"&gt;mainThread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getRuntime&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;addShutdownHook&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;mainThread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Works!&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; Anonymous inner classes can only access &lt;code&gt;final&lt;/code&gt; or effectively final variables from the enclosing scope.&lt;/p&gt;
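&lt;p&gt;"Effectively final" simply means the variable is assigned once and never reassigned. A tiny sketch (names are illustrative) makes the rule visible:&lt;/p&gt;

```java
public class CaptureDemo {
    public static void main(String[] args) {
        int x = 1;  // effectively final: assigned once, never reassigned
        Runnable r = () -> System.out.println("captured x = " + x);  // legal capture
        // x = 2;   // uncommenting this reassignment would turn the capture
        //          // above into a compile error ("must be final or effectively final")
        r.run();
    }
}
```

&lt;p&gt;Declaring &lt;code&gt;mainThread&lt;/code&gt; as &lt;code&gt;final&lt;/code&gt; makes this guarantee explicit and protects against accidental reassignment.&lt;/p&gt;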

&lt;h3&gt;
  
  
  Mistake 2: Calling wakeup() From Main Thread
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Wrong - This doesn't help!&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;wakeup&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// This is in the SAME thread!&lt;/span&gt;
    &lt;span class="nc"&gt;ConsumerRecords&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;wakeup()&lt;/code&gt; must be called from a &lt;strong&gt;different thread&lt;/strong&gt; (like a Shutdown Hook) to interrupt &lt;code&gt;poll()&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: Forgetting the catch Block
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Wrong - WakeupException propagates!&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Correct - Catch WakeupException&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;WakeupException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Expected exception - handle gracefully&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Mistake 4: Not Handling InterruptedException
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Wrong - Ignoring the exception&lt;/span&gt;
&lt;span class="n"&gt;mainThread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Correct - Always handle it&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mainThread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printStackTrace&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;✅ &lt;strong&gt;Shutdown Hooks&lt;/strong&gt; provide a way to run cleanup code before JVM exits&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Runtime is a Singleton&lt;/strong&gt; - there's only one JVM environment per application&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Thread.join()&lt;/strong&gt; makes one thread wait for another to complete&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;consumer.wakeup()&lt;/strong&gt; interrupts poll() from a different thread&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;final (or effectively final)&lt;/strong&gt; is required for local variables captured by anonymous inner classes&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;try-catch-finally&lt;/strong&gt; pattern ensures resources are always released&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Graceful shutdown&lt;/strong&gt; prevents data loss and duplicate processing&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;When I started learning Kafka, I didn't understand why we needed all this complexity just to stop a consumer. But now I realize that &lt;strong&gt;graceful shutdown is fundamental to building reliable systems&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The key insights:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Shutdown Hooks&lt;/strong&gt; give you a chance to clean up before the JVM exits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Singleton pattern&lt;/strong&gt; (like Runtime) ensures system resources are managed correctly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thread coordination&lt;/strong&gt; (join, wakeup) allows different threads to work together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proper exception handling&lt;/strong&gt; ensures cleanup code always runs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This pattern isn't just for Kafka - it applies to any Java application that needs to clean up resources on shutdown: database connections, file handles, network sockets, and more.&lt;/p&gt;

&lt;p&gt;Understanding these fundamentals will make you a better Java developer and help you build more robust applications.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more Kafka tutorials!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Course Reference:&lt;/strong&gt; &lt;a href="https://www.udemy.com/course/apache-kafka/" rel="noopener noreferrer"&gt;Apache Kafka Series - Learn Apache Kafka for Beginners v3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>java</category>
      <category>threading</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Setting Up Kafka 4.0 Locally with Docker: A Learning Journey</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Tue, 23 Dec 2025 08:23:05 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/setting-up-kafka-40-locally-with-docker-a-learning-journey-dif</link>
      <guid>https://dev.to/hantedyou_0106/setting-up-kafka-40-locally-with-docker-a-learning-journey-dif</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As part of my journey through the "Apache Kafka Series - Learn Apache Kafka for Beginners v3" Udemy course, I needed to set up a local Kafka environment using Docker. What started as a simple container setup evolved into a deeper understanding of Kafka architecture, Docker best practices, and the transition from Zookeeper to KRaft mode.&lt;/p&gt;

&lt;p&gt;In this post, I'll share my learning process, the decisions I made, and the final production-ready configuration I arrived at.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting Point: Finding the Right Docker Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Initial Research - Confluent vs Bitnami vs Apache Official
&lt;/h3&gt;

&lt;p&gt;When searching for Kafka Docker setups, I encountered three main options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Confluent's official tutorial&lt;/strong&gt; - &lt;a href="https://developer.confluent.io/confluent-tutorials/kafka-on-docker/" rel="noopener noreferrer"&gt;https://developer.confluent.io/confluent-tutorials/kafka-on-docker/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bitnami Kafka image&lt;/strong&gt; - Popular for its ease of use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Kafka official image&lt;/strong&gt; - The source of truth&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Initially, I was torn between Bitnami (known for simplified configuration) and Apache's official image (more control but steeper learning curve).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Confluent Tutorial Discovery
&lt;/h3&gt;

&lt;p&gt;The Confluent tutorial provided an excellent starting point with this configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;broker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apache/kafka:latest&lt;/span&gt;
    &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;broker&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;broker&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;9092:9092&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_BROKER_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_LISTENER_SECURITY_PROTOCOL_MAP&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,CONTROLLER:PLAINTEXT&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_ADVERTISED_LISTENERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_TRANSACTION_STATE_LOG_MIN_ISR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_PROCESS_ROLES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;broker,controller&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_NODE_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_CONTROLLER_QUORUM_VOTERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1@broker:29093&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_LISTENERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_INTER_BROKER_LISTENER_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PLAINTEXT&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_CONTROLLER_LISTENER_NAMES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CONTROLLER&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_LOG_DIRS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/tmp/kraft-combined-logs&lt;/span&gt;
      &lt;span class="na"&gt;CLUSTER_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MkU3OEVBNTcwNTJENDM2Qk&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why I chose the Apache official image:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Direct from the source (Apache Foundation)&lt;/li&gt;
&lt;li&gt;✅ Production-ready and enterprise-grade&lt;/li&gt;
&lt;li&gt;✅ Better alignment with official documentation&lt;/li&gt;
&lt;li&gt;✅ Latest features and security updates&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Learning #1: Understanding KRaft Mode
&lt;/h2&gt;

&lt;p&gt;One of the biggest revelations was learning about &lt;strong&gt;KRaft&lt;/strong&gt; (Kafka Raft) - Kafka's replacement for Zookeeper.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is KRaft?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;KRaft&lt;/strong&gt; stands for &lt;strong&gt;K&lt;/strong&gt;afka &lt;strong&gt;Raft&lt;/strong&gt;: Kafka's built-in implementation of the Raft consensus protocol for managing cluster metadata&lt;/li&gt;
&lt;li&gt;Eliminates Zookeeper dependency&lt;/li&gt;
&lt;li&gt;Single process can handle both broker and controller roles&lt;/li&gt;
&lt;li&gt;Faster startup and simpler architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Configuration Breakdown:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;KAFKA_PROCESS_ROLES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;broker,controller&lt;/span&gt;        &lt;span class="c1"&gt;# Single node handles both roles&lt;/span&gt;
&lt;span class="na"&gt;KAFKA_NODE_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;                             &lt;span class="c1"&gt;# Unique node identifier&lt;/span&gt;
&lt;span class="na"&gt;KAFKA_CONTROLLER_QUORUM_VOTERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1@broker:29093&lt;/span&gt;  &lt;span class="c1"&gt;# Controller election&lt;/span&gt;
&lt;span class="na"&gt;KAFKA_CONTROLLER_LISTENER_NAMES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CONTROLLER&lt;/span&gt;   &lt;span class="c1"&gt;# Controller communication&lt;/span&gt;
&lt;span class="na"&gt;CLUSTER_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MkU3OEVBNTcwNTJENDM2Qk&lt;/span&gt;            &lt;span class="c1"&gt;# Unique cluster identifier&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits of KRaft mode:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚡ Faster startup (no Zookeeper coordination)&lt;/li&gt;
&lt;li&gt;🏗️ Simpler architecture&lt;/li&gt;
&lt;li&gt;📈 Better scalability&lt;/li&gt;
&lt;li&gt;🔄 Single point of configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Learning #2: Docker Image Layering and Customization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Command Path Problem
&lt;/h3&gt;

&lt;p&gt;Initially, running Kafka commands required full paths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec &lt;/span&gt;broker /opt/kafka/bin/kafka-topics.sh &lt;span class="nt"&gt;--list&lt;/span&gt; &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; localhost:9092
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was verbose and error-prone. I learned about Docker image layering and decided to create a custom image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution: Custom Dockerfile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; apache/kafka:4.0.1&lt;/span&gt;

&lt;span class="c"&gt;# Add Kafka bin directory to PATH for convenient command usage&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PATH="/opt/kafka/bin:${PATH}"&lt;/span&gt;

&lt;span class="c"&gt;# Set working directory&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /opt/kafka&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Updated docker-compose.yml
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;broker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-kafka-kraft:4.0.1&lt;/span&gt;               &lt;span class="c1"&gt;# Custom image name&lt;/span&gt;
    &lt;span class="c1"&gt;# ... rest of configuration&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Original Apache image remains unchanged&lt;/li&gt;
&lt;li&gt;✅ New layer adds convenience without bloat&lt;/li&gt;
&lt;li&gt;✅ Commands now work directly: &lt;code&gt;kafka-topics.sh --list --bootstrap-server localhost:9092&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Learning #3: Understanding Kafka Listeners
&lt;/h2&gt;

&lt;p&gt;The listener configuration was initially confusing but crucial for proper networking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;KAFKA_LISTENERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092&lt;/span&gt;
&lt;span class="na"&gt;KAFKA_ADVERTISED_LISTENERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  LISTENERS vs ADVERTISED_LISTENERS:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LISTENERS&lt;/strong&gt; = "Where Kafka actually listens" (Server-side binding)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;ADVERTISED_LISTENERS&lt;/strong&gt; = "How clients should connect" (Client-side addressing)&lt;/p&gt;
&lt;h3&gt;
  
  
  Listener Breakdown:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Listener&lt;/th&gt;
&lt;th&gt;Binding Address&lt;/th&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Access From&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PLAINTEXT://broker:29092&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Container hostname&lt;/td&gt;
&lt;td&gt;29092&lt;/td&gt;
&lt;td&gt;Inter-service communication&lt;/td&gt;
&lt;td&gt;Docker network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CONTROLLER://broker:29093&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Container hostname&lt;/td&gt;
&lt;td&gt;29093&lt;/td&gt;
&lt;td&gt;KRaft metadata operations&lt;/td&gt;
&lt;td&gt;Internal only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PLAINTEXT_HOST://0.0.0.0:9092&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;All interfaces&lt;/td&gt;
&lt;td&gt;9092&lt;/td&gt;
&lt;td&gt;External client access&lt;/td&gt;
&lt;td&gt;Host machine&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Why Different Addresses?
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Internal Communication&lt;/strong&gt;: &lt;code&gt;broker:29092&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Other Docker services connect using container hostname&lt;/li&gt;
&lt;li&gt;Fast, low-latency container-to-container networking&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Controller Operations&lt;/strong&gt;: &lt;code&gt;broker:29093&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KRaft protocol for cluster coordination&lt;/li&gt;
&lt;li&gt;Replaces Zookeeper functionality&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;External Access&lt;/strong&gt;: &lt;code&gt;localhost:9092&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Host machine applications connect via port forwarding&lt;/li&gt;
&lt;li&gt;Docker maps container port to host port&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Network Flow Diagram:
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;External App → localhost:9092 → Docker Port Mapping → PLAINTEXT_HOST://0.0.0.0:9092
                                                           ↓
Internal Service → broker:29092 → PLAINTEXT://broker:29092 → Kafka Broker
                                                           ↓
KRaft System → broker:29093 → CONTROLLER://broker:29093 ↗
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Key Insight&lt;/strong&gt;: The address Kafka binds to (0.0.0.0) differs from what it advertises to clients (localhost) because clients can't connect to 0.0.0.0 directly.&lt;/p&gt;
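&lt;p&gt;In practice this means the same client code needs a different &lt;code&gt;bootstrap.servers&lt;/code&gt; value depending on where it runs. A small sketch using the addresses from the compose file above (the class name is illustrative):&lt;/p&gt;

```java
import java.util.Properties;

public class ClientConfig {
    // For an application running on the host machine:
    // connect via the advertised PLAINTEXT_HOST listener (port-forwarded by Docker)
    public static Properties forHost() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");
        return p;
    }

    // For another container on the same Docker network:
    // connect via the internal PLAINTEXT listener using the container hostname
    public static Properties forContainer() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "broker:29092");
        return p;
    }
}
```

&lt;p&gt;Using the wrong address is the classic symptom of listener misconfiguration: a host application pointed at &lt;code&gt;broker:29092&lt;/code&gt; cannot resolve the container hostname, and a container pointed at &lt;code&gt;localhost:9092&lt;/code&gt; connects to itself instead of the broker.&lt;/p&gt;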
&lt;h2&gt;
  
  
  Key Learning #4: Data Persistence Strategy
&lt;/h2&gt;
&lt;h3&gt;
  
  
  The Problem: Data Loss on Container Restart
&lt;/h3&gt;

&lt;p&gt;Initially, running &lt;code&gt;docker-compose down&lt;/code&gt; would delete all topics and data. This happened because:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;KAFKA_LOG_DIRS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/tmp/kraft-combined-logs&lt;/span&gt;  &lt;span class="c1"&gt;# Data stored in container's temp directory&lt;/span&gt;
&lt;span class="c1"&gt;# No volumes configured = data loss on container deletion&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Solution: Docker Volumes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;broker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# ... other configuration&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data:/tmp/kraft-combined-logs&lt;/span&gt;  &lt;span class="c1"&gt;# Bind mount for data persistence&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What Gets Persisted:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./data/
├── __cluster_metadata-0/              # KRaft metadata (replaces Zookeeper)
├── __consumer_offsets-*/              # Consumer group offsets
├── my-topic-0/                        # Topic partition data
│   ├── 00000000000000000000.log       # Actual messages
│   ├── 00000000000000000000.index     # Message index
│   └── partition.metadata             # Partition metadata
├── meta.properties                    # Broker metadata
└── ...                               # Other Kafka state files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Volume strategy comparison:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No volumes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple setup&lt;/td&gt;
&lt;td&gt;Data loss on restart&lt;/td&gt;
&lt;td&gt;Learning/testing only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Named volumes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker managed&lt;/td&gt;
&lt;td&gt;Hidden location&lt;/td&gt;
&lt;td&gt;Development&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bind mounts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full control, easy backup&lt;/td&gt;
&lt;td&gt;Manual directory management&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Production&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
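For comparison, the named-volume variant from the middle row of the table would look like this (the volume name `kafka-data` is illustrative):

```yaml
services:
  broker:
    # ... other configuration
    volumes:
      - kafka-data:/tmp/kraft-combined-logs   # Docker-managed named volume

volumes:
  kafka-data: {}                              # Docker picks the on-disk location
```

With this form, `docker compose down` keeps the data; the volume is only deleted when you explicitly pass `-v`.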

&lt;h2&gt;
  
  
  Final Local Development Configuration
&lt;/h2&gt;

&lt;p&gt;Here's my complete local development setup:&lt;/p&gt;

&lt;h3&gt;
  
  
  docker-compose.yml
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;broker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;                              &lt;span class="c1"&gt;# Build context&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile&lt;/span&gt;                  &lt;span class="c1"&gt;# Custom Dockerfile&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-kafka-kraft:4.0.1&lt;/span&gt;              &lt;span class="c1"&gt;# Explicit image name&lt;/span&gt;
    &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;broker&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;broker&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;9092:9092&lt;/span&gt;                            &lt;span class="c1"&gt;# External client port&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data:/tmp/kraft-combined-logs&lt;/span&gt;      &lt;span class="c1"&gt;# Data persistence&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_BROKER_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_LISTENER_SECURITY_PROTOCOL_MAP&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,CONTROLLER:PLAINTEXT&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_ADVERTISED_LISTENERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_TRANSACTION_STATE_LOG_MIN_ISR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_PROCESS_ROLES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;broker,controller&lt;/span&gt;  &lt;span class="c1"&gt;# KRaft mode&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_NODE_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_CONTROLLER_QUORUM_VOTERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1@broker:29093&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_LISTENERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_INTER_BROKER_LISTENER_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PLAINTEXT&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_CONTROLLER_LISTENER_NAMES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CONTROLLER&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_LOG_DIRS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/tmp/kraft-combined-logs&lt;/span&gt;
      &lt;span class="na"&gt;CLUSTER_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MkU3OEVBNTcwNTJENDM2Qk&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dockerfile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; apache/kafka:4.0.1&lt;/span&gt;

&lt;span class="c"&gt;# Add Kafka bin directory to PATH for convenient command usage&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PATH="/opt/kafka/bin:${PATH}"&lt;/span&gt;

&lt;span class="c"&gt;# Set working directory&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /opt/kafka&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Project Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kafka-docker/
├── docker-compose.yml      # Main configuration
├── Dockerfile              # Custom image definition
├── data/                   # Kafka data (auto-created)
│   ├── __cluster_metadata-0/
│   ├── my-topic-0/
│   └── ...
└── docker_commands         # Command reference file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Takeaways and Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Choose the Right Base Image&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use official Apache Kafka for production environments&lt;/li&gt;
&lt;li&gt;Bitnami is excellent for quick prototyping&lt;/li&gt;
&lt;li&gt;Always pin a specific version (&lt;code&gt;apache/kafka:4.0.1&lt;/code&gt;) rather than &lt;code&gt;latest&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Embrace KRaft Mode&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simpler than Zookeeper-based setups&lt;/li&gt;
&lt;li&gt;Better performance and reliability&lt;/li&gt;
&lt;li&gt;Future-proof (ZooKeeper support was removed entirely in Kafka 4.0)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Layer Docker Images Thoughtfully&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Keep customizations minimal and purpose-driven&lt;/li&gt;
&lt;li&gt;Document why each layer exists&lt;/li&gt;
&lt;li&gt;Use multi-stage builds for complex setups&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Plan for Data Persistence&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Always use volumes in production&lt;/li&gt;
&lt;li&gt;Bind mounts offer better control than named volumes&lt;/li&gt;
&lt;li&gt;Backup strategy should include volume data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Network Configuration Matters&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Understand internal vs external listeners&lt;/li&gt;
&lt;li&gt;Plan port allocation carefully&lt;/li&gt;
&lt;li&gt;Test connectivity from both inside and outside containers&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This journey from a simple Docker container to a well-configured local Kafka setup taught me valuable lessons about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modern Kafka architecture&lt;/strong&gt; (KRaft vs Zookeeper)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker best practices&lt;/strong&gt; (layering, volumes, networking)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration decisions&lt;/strong&gt; (persistence, networking, image customization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Development environment setup&lt;/strong&gt; (network restrictions, data management)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The final configuration is suitable for local development and learning, with a solid foundation that could be enhanced for production use when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developer.confluent.io/confluent-tutorials/kafka-on-docker/" rel="noopener noreferrer"&gt;Confluent Kafka Docker Tutorial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kafka.apache.org/documentation/" rel="noopener noreferrer"&gt;Apache Kafka Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kafka.apache.org/documentation/#kraft" rel="noopener noreferrer"&gt;KRaft Mode Overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>kafka</category>
      <category>docker</category>
      <category>kraft</category>
    </item>
    <item>
      <title>Understanding Kafka Producer: From Basics to Sticky Partitioner</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Sun, 21 Dec 2025 10:18:02 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/understanding-kafka-producer-from-basics-to-sticky-partitioner-48ap</link>
      <guid>https://dev.to/hantedyou_0106/understanding-kafka-producer-from-basics-to-sticky-partitioner-48ap</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When I started learning Apache Kafka, one of the first questions I had was: "How exactly does a Producer work?" The documentation was thorough, but I wanted to understand it through practical examples.&lt;/p&gt;

&lt;p&gt;In this article, I'll walk you through what I learned about Kafka Producers, from sending your first message to understanding the mysterious "Sticky Partitioner." We'll build two simple Java programs and explore some interesting behaviors along the way.&lt;/p&gt;

&lt;p&gt;This guide is based on the excellent course &lt;a href="https://www.udemy.com/course/apache-kafka/" rel="noopener noreferrer"&gt;"Apache Kafka Series - Learn Apache Kafka for Beginners v3"&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: My First Kafka Producer
&lt;/h2&gt;

&lt;p&gt;Let's start with the simplest possible Kafka Producer that sends a single message.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up Producer Properties
&lt;/h3&gt;

&lt;p&gt;Every Kafka Producer needs configuration. We use Java's &lt;code&gt;Properties&lt;/code&gt; class for this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Properties&lt;/span&gt; &lt;span class="n"&gt;properties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Properties&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bootstrap.servers"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"127.0.0.1:9092"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key.serializer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StringSerializer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"value.serializer"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StringSerializer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getName&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What's happening here?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bootstrap.servers&lt;/code&gt;: The address of your Kafka broker(s)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;key.serializer&lt;/code&gt;: Converts your key object into bytes for transmission&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;value.serializer&lt;/code&gt;: Converts your value object into bytes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;code&gt;java.util.Properties&lt;/code&gt; is similar to Python's &lt;code&gt;dict&lt;/code&gt;, but it only stores string key-value pairs. It's part of the Java standard library, not specific to Kafka.&lt;/p&gt;
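The string-only behavior is easy to verify with plain `java.util.Properties`, no Kafka required:

```java
import java.util.Properties;

public class PropertiesDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // setProperty accepts only String keys and String values
        props.setProperty("bootstrap.servers", "127.0.0.1:9092");

        // getProperty returns the stored String, or null when absent
        System.out.println(props.getProperty("bootstrap.servers")); // prints 127.0.0.1:9092

        // An optional second argument supplies a default for missing keys
        System.out.println(props.getProperty("linger.ms", "0"));    // prints 0
    }
}
```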

&lt;h3&gt;
  
  
  Creating the Producer
&lt;/h3&gt;

&lt;p&gt;With properties configured, we can create our producer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generic types &lt;code&gt;&amp;lt;String, String&amp;gt;&lt;/code&gt; are the key and value types, respectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating and Sending a Message
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;producerRecord&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"demo_java"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;producerRecord&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;"demo_java"&lt;/code&gt; is the topic name, and &lt;code&gt;"hello world"&lt;/code&gt; is our message. Notice we didn't specify a key - it defaults to &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Critical Trio: send(), flush(), close()
&lt;/h3&gt;

&lt;p&gt;This is where things get interesting. Here's what each method does:&lt;/p&gt;

&lt;h4&gt;
  
  
  send() - Asynchronous Operation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;producerRecord&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous&lt;/strong&gt;: The message goes into a buffer; it's NOT sent immediately&lt;/li&gt;
&lt;li&gt;If your program exits right after this, the message might never reach Kafka&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  flush() - Synchronous Operation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous&lt;/strong&gt;: Forces all buffered messages to be sent and blocks until complete&lt;/li&gt;
&lt;li&gt;Useful for learning/demos to ensure messages are sent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rarely used in production&lt;/strong&gt; because it impacts performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  close() - Cleanup
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Shuts down the Producer and releases resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internally calls flush()&lt;/strong&gt; to ensure all messages are sent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MUST be called&lt;/strong&gt; in production to prevent resource leaks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's the relationship:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;send()          flush()         close()
  ↓               ↓               ↓
[Buffer] -----&amp;gt; [Send] -----&amp;gt; [Clean up]
(async)        (sync)         (includes flush)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best Practice:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always call &lt;code&gt;close()&lt;/code&gt; in production&lt;/li&gt;
&lt;li&gt;Avoid calling &lt;code&gt;flush()&lt;/code&gt; unless absolutely necessary&lt;/li&gt;
&lt;li&gt;Let Kafka handle batching automatically for better performance&lt;/li&gt;
&lt;/ul&gt;
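Since `KafkaProducer` implements `Closeable`, the safest way to honor these rules is try-with-resources, which guarantees `close()` (and therefore the final flush) even when an exception is thrown. A minimal pure-Java sketch of the contract, using a hypothetical `BufferingSender` in place of the real producer:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mimicking the producer's send/flush/close contract
class BufferingSender implements AutoCloseable {
    private final List<String> buffer = new ArrayList<>();
    final List<String> delivered = new ArrayList<>();

    void send(String msg) { buffer.add(msg); }   // async: only buffered

    void flush() {                               // sync: drain the buffer
        delivered.addAll(buffer);
        buffer.clear();
    }

    @Override
    public void close() { flush(); }             // close() flushes first, like the real producer
}

public class LifecycleDemo {
    public static void main(String[] args) {
        BufferingSender sender = new BufferingSender();
        try (sender) {                           // Java 9+ try-with-resources
            sender.send("hello world");
        }                                        // close() runs here
        System.out.println(sender.delivered);    // prints [hello world]
    }
}
```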




&lt;h2&gt;
  
  
  Part 2: Producer with Callbacks
&lt;/h2&gt;

&lt;p&gt;Now let's level up and add callbacks to track message metadata.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Use Callbacks?
&lt;/h3&gt;

&lt;p&gt;Callbacks let you know when a message is successfully sent (or if it failed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;producerRecord&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Callback&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onCompletion&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RecordMetadata&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Success!&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Received new metadata \n"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                    &lt;span class="s"&gt;"Topic:\t"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"\n"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                    &lt;span class="s"&gt;"Partition:\t"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;partition&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"\n"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                    &lt;span class="s"&gt;"Offset:\t"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"\n"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                    &lt;span class="s"&gt;"Timestamp:\t"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Something went wrong&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error while producing"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What metadata can you get?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;topic&lt;/code&gt;: Which topic the message was sent to&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;partition&lt;/code&gt;: Which partition number within that topic&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;offset&lt;/code&gt;: The position of this message in the partition&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;timestamp&lt;/code&gt;: When the message was created&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Part 3: Understanding the Sticky Partitioner
&lt;/h2&gt;

&lt;p&gt;This is where it gets really interesting. When I ran my producer sending 100 messages to a topic with 3 partitions, I noticed something odd: &lt;strong&gt;all messages went to partition 0&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Was my code broken? Not quite. I needed to understand the &lt;strong&gt;Sticky Partitioner&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the Sticky Partitioner?
&lt;/h3&gt;

&lt;p&gt;Introduced in Kafka 2.4 (KIP-480), &lt;strong&gt;sticky partitioning&lt;/strong&gt; is the default behavior when messages &lt;strong&gt;don't have a key&lt;/strong&gt;; since Kafka 3.3 (KIP-794) it is built into the default partitioner itself rather than the separate &lt;code&gt;UniformStickyPartitioner&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Messages "stick" to one partition until a batch is full&lt;/li&gt;
&lt;li&gt;When that batch is sent, the producer switches to a different partition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goal&lt;/strong&gt;: Improve performance by reducing network requests&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When Does a Batch Get Sent?
&lt;/h3&gt;

&lt;p&gt;A batch is sent when &lt;strong&gt;ANY&lt;/strong&gt; of these conditions is met:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;batch.size&lt;/code&gt; - Batch reaches configured size (in &lt;strong&gt;bytes&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;linger.ms&lt;/code&gt; - Time limit reached (in &lt;strong&gt;milliseconds&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flush()&lt;/code&gt; - Manually forced&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Common Misconception: batch.size is NOT time!
&lt;/h3&gt;

&lt;p&gt;I initially thought &lt;code&gt;batch.size&lt;/code&gt; was related to time. &lt;strong&gt;It's not!&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;batch.size&lt;/code&gt; = Size in &lt;strong&gt;bytes&lt;/strong&gt; (default: 16384 bytes or 16 KB)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;linger.ms&lt;/code&gt; = Time in &lt;strong&gt;milliseconds&lt;/strong&gt; (default: 0)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Demonstrating Sticky Partitioner Behavior
&lt;/h3&gt;

&lt;p&gt;To observe partition switching, we need to make batches fill up quickly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Make batches smaller so they fill up faster&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"batch.size"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"400"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// 400 bytes instead of 16KB&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why 400?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each record is roughly 15-20 bytes: the "hello worldXX" payload plus per-record overhead&lt;/li&gt;
&lt;li&gt;400 bytes ÷ 20 bytes ≈ &lt;strong&gt;20 messages per batch&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;When a batch fills up → it's sent → switches to next partition&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Loop with Delay
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Pause every 10 messages&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;producerRecord&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ProducerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"demo_java"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;producerRecord&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Expected Behavior
&lt;/h3&gt;

&lt;p&gt;With 3 partitions and &lt;code&gt;batch.size=400&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First ~20 messages → Partition 0&lt;/li&gt;
&lt;li&gt;Next ~20 messages → Partition 1&lt;/li&gt;
&lt;li&gt;Next ~20 messages → Partition 2&lt;/li&gt;
&lt;li&gt;Cycle continues...&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can observe this in the callback logs showing the partition number!&lt;/p&gt;
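The cycling can be simulated in plain Java by counting value bytes and rotating once the 400-byte budget would overflow. This is a simplification: the real partitioner also counts per-record overhead (so real batches hold fewer messages, roughly the ~20 estimated above) and picks the next partition at random rather than round-robin:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StickySimulation {
    public static void main(String[] args) {
        int numPartitions = 3, batchSize = 400;
        int partition = 0, batchBytes = 0;
        int[] perPartition = new int[numPartitions];

        for (int i = 0; i < 100; i++) {
            byte[] value = ("hello world" + i).getBytes(StandardCharsets.UTF_8);
            if (batchBytes + value.length > batchSize) {
                // Batch is full: "unstick" and move to the next partition
                partition = (partition + 1) % numPartitions;
                batchBytes = 0;
            }
            batchBytes += value.length;
            perPartition[partition]++;
        }
        // Messages land in long runs per partition, not one-by-one round-robin
        System.out.println(Arrays.toString(perPartition)); // prints [40, 30, 30]
    }
}
```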




&lt;h2&gt;
  
  
  Key Configuration Parameters
&lt;/h2&gt;

&lt;p&gt;Here's a quick reference table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Unit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bootstrap.servers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kafka broker address&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;key.serializer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Key serializer class&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;value.serializer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Value serializer class&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;batch.size&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Maximum batch size per partition&lt;/td&gt;
&lt;td&gt;16384&lt;/td&gt;
&lt;td&gt;bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;linger.ms&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;How long to wait for more records before sending a batch&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;milliseconds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
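Pulled together, the table above corresponds to a minimal producer configuration like the following (a sketch: the broker address is illustrative, and the serializer entries use the standard Kafka client `StringSerializer`):

```properties
# Required: these three have no defaults
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer

# Batching: defaults shown explicitly
batch.size=16384
linger.ms=0
```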




&lt;h2&gt;
  
  
  Keyed vs Keyless Messages
&lt;/h2&gt;

&lt;p&gt;The partitioning behavior changes based on whether you specify a key:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Partitioner Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;With key&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Messages with the same key go to the same partition (hash-based)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Without key&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Uses Sticky Partitioner (batch-based)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
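The keyed row above relies on one invariant: equal keys hash to the same partition. A minimal sketch of that invariant, using CRC32 as a stand-in hash (Kafka's default partitioner actually uses murmur2, so the concrete partition numbers below are not what Kafka would compute):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Illustrative hash-based partitioning. Kafka uses murmur2, not
    CRC32; the determinism property demonstrated is the same."""
    return zlib.crc32(key) % num_partitions

# Equal keys always map to the same partition:
assert partition_for(b"user-42", 3) == partition_for(b"user-42", 3)

# Different keys spread across partitions (exact spread depends on the hash):
spread = {partition_for(f"user-{i}".encode(), 3) for i in range(100)}
assert spread <= {0, 1, 2}
```

One practical consequence of the modulo: adding partitions to a topic changes the result, so key-to-partition mappings are only stable while the partition count stays fixed.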




&lt;h2&gt;
  
  
  Troubleshooting Common Issues
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Error: Invalid value null for configuration value.serializer
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Typo in property name&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ &lt;code&gt;values.serializer&lt;/code&gt; (extra 's')&lt;/li&gt;
&lt;li&gt;✅ &lt;code&gt;value.serializer&lt;/code&gt; (correct, singular)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  All Messages Going to Partition 0
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Possible causes:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Topic only has 1 partition&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;batch.size&lt;/code&gt; is too large - all messages fit in one batch&lt;/li&gt;
&lt;li&gt;Wrong delay logic (&lt;code&gt;i/10&lt;/code&gt; instead of &lt;code&gt;i%10&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
Check your topic configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kafka-topics.sh &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; localhost:9092 &lt;span class="nt"&gt;--describe&lt;/span&gt; &lt;span class="nt"&gt;--topic&lt;/span&gt; demo_java
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adjust batch size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"batch.size"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"400"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix the delay logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;  &lt;span class="c1"&gt;// Correct!&lt;/span&gt;
    &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why Sticky Partitioner is Better Than Round-Robin
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Old Way: Round-Robin (Pre-Kafka 2.4)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each message goes to a different partition in sequence&lt;/li&gt;
&lt;li&gt;100 messages = potentially 100 network requests&lt;/li&gt;
&lt;li&gt;More overhead, more latency&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  New Way: Sticky Partitioner (Kafka 2.4+)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Messages stick to one partition until the batch fills (or &lt;code&gt;linger.ms&lt;/code&gt; expires)&lt;/li&gt;
&lt;li&gt;100 messages = maybe 5 network requests (assuming 20 messages per batch)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fewer network calls = better throughput&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
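The request arithmetic in those bullets can be made explicit with a toy model (a deliberate simplification: real request counts also depend on `linger.ms`, message sizes, and how many brokers host the partitions):

```python
import math

def produce_requests(num_messages: int, batch_capacity: int, sticky: bool) -> int:
    """Rough count of produce requests.

    sticky=True: batches are packed full before switching partitions.
    sticky=False: round-robin worst case (linger.ms=0 and a slow
    producer), where every message is flushed as its own request.
    """
    if sticky:
        return math.ceil(num_messages / batch_capacity)
    return num_messages

assert produce_requests(100, 20, sticky=True) == 5
assert produce_requests(100, 20, sticky=False) == 100
```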




&lt;h2&gt;
  
  
  When to Use flush()
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Good use cases for flush():&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Financial transaction systems (need immediate confirmation)&lt;/li&gt;
&lt;li&gt;Critical logging where every message must be guaranteed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Avoid flush() in:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular log aggregation&lt;/li&gt;
&lt;li&gt;High-throughput data pipelines&lt;/li&gt;
&lt;li&gt;Real-time analytics (slight delay is acceptable)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Remember:&lt;/strong&gt; Kafka's automatic batching is designed for performance. Only override it when you have a specific requirement!&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;✅ &lt;strong&gt;Properties configuration&lt;/strong&gt; is the foundation - get it right&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;send() is async&lt;/strong&gt; - use flush() or close() to ensure delivery&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Callbacks provide metadata&lt;/strong&gt; - use them to track message status&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Sticky Partitioner batches by partition&lt;/strong&gt; - improves performance&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;batch.size is in bytes&lt;/strong&gt;, not time - linger.ms is the time setting&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Keyless messages&lt;/strong&gt; use Sticky Partitioner&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Always call close()&lt;/strong&gt; - prevents resource leaks&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Understanding how Kafka Producers work is fundamental to building robust event-driven systems. The key insights I gained were:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The asynchronous nature of &lt;code&gt;send()&lt;/code&gt; and why &lt;code&gt;close()&lt;/code&gt; is critical&lt;/li&gt;
&lt;li&gt;How callbacks provide visibility into message delivery&lt;/li&gt;
&lt;li&gt;The performance benefits of the Sticky Partitioner&lt;/li&gt;
&lt;li&gt;The importance of proper configuration (especially &lt;code&gt;batch.size&lt;/code&gt; vs &lt;code&gt;linger.ms&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These concepts form the foundation for more advanced Kafka patterns like transactions, idempotent producers, and exactly-once semantics.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more Kafka tutorials!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Course Reference:&lt;/strong&gt; &lt;a href="https://www.udemy.com/course/apache-kafka/" rel="noopener noreferrer"&gt;Apache Kafka Series - Learn Apache Kafka for Beginners v3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>java</category>
      <category>tutorial</category>
      <category>backend</category>
    </item>
    <item>
      <title>From 8 Minutes to 40 Seconds: Solving Data Pipeline Deployment Bottlenecks with Git Sparse Checkout</title>
      <dc:creator>Byron Hsieh</dc:creator>
      <pubDate>Thu, 06 Nov 2025 04:40:00 +0000</pubDate>
      <link>https://dev.to/hantedyou_0106/from-30-minutes-to-5-solving-data-pipeline-deployment-bottlenecks-with-git-sparse-checkout-3m3d</link>
      <guid>https://dev.to/hantedyou_0106/from-30-minutes-to-5-solving-data-pipeline-deployment-bottlenecks-with-git-sparse-checkout-3m3d</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Modern data engineering projects typically use Python for orchestration (Airflow DAGs) and data transformation, along with DBT or SQL for data ingestion. At scale, however, deployment becomes a significant bottleneck.&lt;/p&gt;

&lt;p&gt;My data engineering team manages ingestion for a data warehouse containing nearly 10,000 tables. We follow a standardized approach where each table requires at least 5 programs covering the standard pipeline: file ingestion → staging → transformation → ODS layer.&lt;/p&gt;

&lt;p&gt;This results in &lt;strong&gt;over 50,000 program files&lt;/strong&gt; requiring deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 8-Minute Deployment Problem
&lt;/h2&gt;

&lt;p&gt;Our Azure DevOps CI/CD pipeline was taking nearly 8 minutes per deployment — unacceptable for any development workflow. Having previously managed deployment pipelines for Java microservices with comprehensive test suites, I found this excessive for our comparatively simple process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Root Cause Analysis
&lt;/h2&gt;

&lt;p&gt;The deployment process consisted of four straightforward steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Checkout&lt;/strong&gt; codebase from repository&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract&lt;/strong&gt; programs listed in deployment manifest&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Package&lt;/strong&gt; selected programs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy&lt;/strong&gt; to target server directories&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Investigation revealed the bottleneck: &lt;strong&gt;checkout consumed over 7 minutes&lt;/strong&gt; — over 80% of total deployment time.&lt;/p&gt;

&lt;p&gt;With 50,000+ files in our monorepo, Git was downloading the entire codebase even when we only needed a small subset for deployment. This led me to explore &lt;strong&gt;Git sparse checkout&lt;/strong&gt; as a solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Sparse Checkout
&lt;/h2&gt;

&lt;p&gt;Git sparse checkout lets you materialize only specific files or directories from a repository, rather than the entire working tree. The &lt;code&gt;git sparse-checkout&lt;/code&gt; command, introduced in Git 2.25 (2020), is designed for exactly our use case: large monorepos where you only need a subset of files.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Traditional Problem
&lt;/h3&gt;

&lt;p&gt;Azure DevOps' default &lt;code&gt;checkout&lt;/code&gt; task doesn't support sparse checkout natively. The standard approach looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;checkout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;self&lt;/span&gt;
    &lt;span class="na"&gt;fetchDepth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;  &lt;span class="c1"&gt;# Only helps with history, not file count&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even with &lt;code&gt;fetchDepth: 1&lt;/code&gt;, Git still downloads &lt;strong&gt;all 50,000+ files&lt;/strong&gt; from our repository. Shallow clones reduce history but don't reduce the working tree size.&lt;/p&gt;

&lt;h3&gt;
  
  
  Our Deployment Reality
&lt;/h3&gt;

&lt;p&gt;In our data engineering workflow, deployments are selective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production deployment of 15 modified DAGs&lt;/li&gt;
&lt;li&gt;Staging deployment of 3 new data models&lt;/li&gt;
&lt;li&gt;Hotfix deployment of 2 SQL transformations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We don't need all 50,000 files — we need the &lt;strong&gt;specific files listed in our deployment manifest&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Four-Layer Optimization
&lt;/h3&gt;

&lt;p&gt;To maximize performance, we combine sparse checkout with three other Git optimizations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Blobless clone&lt;/strong&gt; (&lt;code&gt;--filter=blob:none&lt;/code&gt;) — Download tree structure, not file contents initially&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shallow clone&lt;/strong&gt; (&lt;code&gt;--depth 1&lt;/code&gt;) — Skip commit history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single branch&lt;/strong&gt; (&lt;code&gt;--single-branch&lt;/code&gt;) — Ignore other branches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sparse checkout&lt;/strong&gt; (non-cone mode) — Get only files in deployment manifest&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Together, these techniques cut our total deployment time from 8 minutes to under 40 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Guide
&lt;/h2&gt;

&lt;p&gt;The complete implementation is available in my &lt;a href="https://github.com/hantedyou/azure-pipeline-sparse-checkout-demo" rel="noopener noreferrer"&gt;GitHub demo repository&lt;/a&gt;. Here's the step-by-step breakdown:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Repository Structure
&lt;/h3&gt;

&lt;p&gt;The demo simulates a realistic financial data platform with 293 files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;financial-data-pipeline-demo/
├── dags/                        # 65 Airflow DAGs
├── data_model/                  # 67 DDL files (staging/marts/dimensions)
├── dbt/models/                  # DBT transformation models
├── metadata/                    # 27 schemas &amp;amp; configs
├── deployment/
│   └── deploy-list.txt         # Deployment manifest (10 files)
└── azure-pipeline-sparse-checkout.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Create the Deployment Manifest
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;deployment/deploy-list.txt&lt;/code&gt; defines exactly which files to deploy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dags/stock_price_ingestion.py
dags/bond_yield_analysis.py
dags/forex_rates_pipeline.py
metadata/schemas/stock_price_schema.json
metadata/schemas/bond_yield_schema.json
data_model/staging/stg_stock_prices.sql
data_model/staging/stg_bond_yields.sql
dbt/models/staging/stg_stock_prices.sql
dbt/models/staging/stg_bond_yields.sql
dbt/models/marts/fact_daily_prices.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Only 10 files are checked out from 293 total (a 96.6% reduction).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: The Sparse Checkout Pipeline
&lt;/h3&gt;

&lt;p&gt;The core optimization is in &lt;code&gt;azure-pipeline-sparse-checkout.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;checkout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;none&lt;/span&gt;  &lt;span class="c1"&gt;# Disable default Azure DevOps checkout&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;echo "=== Sparse Checkout: Selective File Download ==="&lt;/span&gt;

    &lt;span class="s"&gt;# Four-layer optimization&lt;/span&gt;
    &lt;span class="s"&gt;git clone \&lt;/span&gt;
      &lt;span class="s"&gt;--filter=blob:none \      # Blobless clone&lt;/span&gt;
      &lt;span class="s"&gt;--no-checkout \            # Don't materialize files yet&lt;/span&gt;
      &lt;span class="s"&gt;--depth 1 \                # Shallow clone (no history)&lt;/span&gt;
      &lt;span class="s"&gt;--single-branch \          # Only current branch&lt;/span&gt;
      &lt;span class="s"&gt;--branch $(Build.SourceBranchName) \&lt;/span&gt;
      &lt;span class="s"&gt;$(repositoryUrl)&lt;/span&gt;

    &lt;span class="s"&gt;cd $(Build.SourcesDirectory)&lt;/span&gt;

    &lt;span class="s"&gt;# Enable file-level sparse checkout (non-cone mode)&lt;/span&gt;
    &lt;span class="s"&gt;git sparse-checkout init --no-cone&lt;/span&gt;

    &lt;span class="s"&gt;# Two-stage checkout process&lt;/span&gt;
    &lt;span class="s"&gt;# Stage 1: Get only the manifest file&lt;/span&gt;
    &lt;span class="s"&gt;echo "$(deployListFile)" &amp;gt; .git/info/sparse-checkout&lt;/span&gt;
    &lt;span class="s"&gt;git checkout $(Build.SourceBranchName)&lt;/span&gt;

    &lt;span class="s"&gt;# Stage 2: Read manifest and checkout actual files&lt;/span&gt;
    &lt;span class="s"&gt;grep -v '^#' $(deployListFile) | grep -v '^$' &amp;gt; .git/info/sparse-checkout&lt;/span&gt;
    &lt;span class="s"&gt;git checkout $(Build.SourceBranchName)&lt;/span&gt;

    &lt;span class="s"&gt;# Performance tracking&lt;/span&gt;
    &lt;span class="s"&gt;TOTAL_FILES=$(find . -type f -not -path './.git/*' | wc -l)&lt;/span&gt;
    &lt;span class="s"&gt;echo "Files checked out: $TOTAL_FILES"&lt;/span&gt;

  &lt;span class="na"&gt;displayName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sparse&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Checkout&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Selective&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Download'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Verification &amp;amp; Validation
&lt;/h3&gt;

&lt;p&gt;The pipeline automatically verifies the sparse checkout worked correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;cd $(Build.SourcesDirectory)&lt;/span&gt;

    &lt;span class="s"&gt;echo "=== Verifying Sparse Checkout ==="&lt;/span&gt;
    &lt;span class="s"&gt;missing_count=0&lt;/span&gt;
    &lt;span class="s"&gt;found_count=0&lt;/span&gt;

    &lt;span class="s"&gt;# Check each file from deploy-list.txt exists&lt;/span&gt;
    &lt;span class="s"&gt;while IFS= read -r file_path; do&lt;/span&gt;
      &lt;span class="s"&gt;if [ -f "$file_path" ]; then&lt;/span&gt;
        &lt;span class="s"&gt;found_count=$((found_count + 1))&lt;/span&gt;
      &lt;span class="s"&gt;else&lt;/span&gt;
        &lt;span class="s"&gt;echo "Missing: $file_path"&lt;/span&gt;
        &lt;span class="s"&gt;missing_count=$((missing_count + 1))&lt;/span&gt;
      &lt;span class="s"&gt;fi&lt;/span&gt;
    &lt;span class="s"&gt;done &amp;lt; &amp;lt;(grep -v '^#' $(deployListFile) | grep -v '^$')&lt;/span&gt;

    &lt;span class="s"&gt;echo "Files found: $found_count"&lt;/span&gt;
    &lt;span class="s"&gt;echo "Files missing: $missing_count"&lt;/span&gt;

  &lt;span class="na"&gt;displayName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Verify&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Sparse&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Checkout'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Implementation Details
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why &lt;code&gt;--no-cone&lt;/code&gt; Mode?
&lt;/h3&gt;

&lt;p&gt;Traditional cone mode only works with directories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cone mode (directory-only)&lt;/span&gt;
git sparse-checkout &lt;span class="nb"&gt;set &lt;/span&gt;dags/ metadata/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Non-cone mode enables file-level precision:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Non-cone mode (file-level)&lt;/span&gt;
git sparse-checkout init &lt;span class="nt"&gt;--no-cone&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;deploy-list.txt &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .git/info/sparse-checkout
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets us cherry-pick specific files across multiple directories.&lt;/p&gt;
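Concretely, a non-cone `.git/info/sparse-checkout` accepts gitignore-style patterns, so exact paths and globs can be mixed (the paths below reuse the demo repository's layout):

```plaintext
dags/stock_price_ingestion.py
metadata/schemas/stock_price_schema.json
data_model/staging/*.sql
```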

&lt;h3&gt;
  
  
  Why Two-Stage Checkout?
&lt;/h3&gt;

&lt;p&gt;The manifest file itself must be checked out before we can read it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;First checkout&lt;/strong&gt;: Get &lt;code&gt;deployment/deploy-list.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Second checkout&lt;/strong&gt;: Use manifest contents to get actual files&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without this, the pipeline would fail with "file not found" errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Results
&lt;/h2&gt;

&lt;p&gt;Running the pipeline on the demo repository:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Full Checkout&lt;/th&gt;
&lt;th&gt;Sparse Checkout&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Files Downloaded&lt;/td&gt;
&lt;td&gt;293 files&lt;/td&gt;
&lt;td&gt;10 files&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96.6% reduction&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Checkout Time&lt;/td&gt;
&lt;td&gt;~45-60s&lt;/td&gt;
&lt;td&gt;~5-10s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80-90% faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk Usage&lt;/td&gt;
&lt;td&gt;Full repo&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;Significant savings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network Transfer&lt;/td&gt;
&lt;td&gt;All objects&lt;/td&gt;
&lt;td&gt;Blob-less + selective&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;90%+ reduction&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Production Environment Impact
&lt;/h3&gt;

&lt;p&gt;In our real-world deployment with 50,000+ program files:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before Optimization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deployment time: &lt;strong&gt;8 minutes&lt;/strong&gt; per deployment&lt;/li&gt;
&lt;li&gt;Checkout phase: &lt;strong&gt;6+ minutes&lt;/strong&gt; (83% of total time)&lt;/li&gt;
&lt;li&gt;Files downloaded: All 50,000+ files every time&lt;/li&gt;
&lt;li&gt;Network transfer: ~2GB per deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After Sparse Checkout:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deployment time: &lt;strong&gt;40 seconds&lt;/strong&gt; (90% reduction)&lt;/li&gt;
&lt;li&gt;Checkout phase: &lt;strong&gt;&amp;lt;20 seconds&lt;/strong&gt; (93% improvement)&lt;/li&gt;
&lt;li&gt;Files downloaded: ~500 files (only what's needed)&lt;/li&gt;
&lt;li&gt;Network transfer: ~200MB (90% reduction)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to Use This Approach
&lt;/h2&gt;

&lt;p&gt;✅ &lt;strong&gt;Ideal for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large monorepos (1,000+ files)&lt;/li&gt;
&lt;li&gt;Selective deployments (deploying subset of changed files)&lt;/li&gt;
&lt;li&gt;Frequent deployments with small change sets&lt;/li&gt;
&lt;li&gt;Self-hosted agents with disk constraints&lt;/li&gt;
&lt;li&gt;High network transfer costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;Not recommended for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small repositories (&amp;lt;100 files)&lt;/li&gt;
&lt;li&gt;Full application deployments requiring all files&lt;/li&gt;
&lt;li&gt;First-time repository setups&lt;/li&gt;
&lt;li&gt;Teams unfamiliar with Git sparse checkout&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By combining four optimization techniques (partial clone, shallow fetch, single-branch fetch, and file-level sparse checkout, the last being the most critical for selective deployments), we achieved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;90% faster deployments&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;96.6% fewer files downloaded&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;90% reduction in network bandwidth&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A ~12x faster deployment feedback loop&lt;/strong&gt; (8 minutes → 40 seconds)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The complete working implementation is available on &lt;a href="https://github.com/hantedyou/azure-pipeline-sparse-checkout-demo" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; for you to try.&lt;/p&gt;

&lt;p&gt;Have you faced similar deployment bottlenecks in your data engineering pipelines? Share your experiences in the comments!&lt;/p&gt;




</description>
      <category>git</category>
      <category>devops</category>
      <category>dataengineering</category>
      <category>azure</category>
    </item>
  </channel>
</rss>
