<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mohammed Saifulhuq</title>
    <description>The latest articles on DEV Community by Mohammed Saifulhuq (@mohammed_saifulhuq).</description>
    <link>https://dev.to/mohammed_saifulhuq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3908419%2Fc9d63b26-a5b4-4926-a633-2adcb6a5b03f.png</url>
      <title>DEV Community: Mohammed Saifulhuq</title>
      <link>https://dev.to/mohammed_saifulhuq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mohammed_saifulhuq"/>
    <language>en</language>
    <item>
      <title>Kafka DLQ Messages Stuck After Schema Change? Here's Why and How to Fix It</title>
      <dc:creator>Mohammed Saifulhuq</dc:creator>
      <pubDate>Mon, 01 Jun 2026 02:43:06 +0000</pubDate>
      <link>https://dev.to/mohammed_saifulhuq/kafka-dlq-messages-stuck-after-schema-change-heres-why-and-how-to-fix-it-9e3</link>
      <guid>https://dev.to/mohammed_saifulhuq/kafka-dlq-messages-stuck-after-schema-change-heres-why-and-how-to-fix-it-9e3</guid>
      <description>&lt;p&gt;&lt;em&gt;I asked 4 senior Kafka engineers this question on Reddit. Nobody named a tool. So I built one.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You deployed a hotfix at 9:45 PM.&lt;/p&gt;

&lt;p&gt;The fix was correct. The &lt;code&gt;status&lt;/code&gt; field had been accepting invalid values — &lt;code&gt;"ok"&lt;/code&gt;, &lt;code&gt;"done"&lt;/code&gt;, &lt;code&gt;"finished"&lt;/code&gt; — from different teams. Converting it to a strict Enum with four valid values was the right call: &lt;code&gt;PENDING&lt;/code&gt;, &lt;code&gt;PROCESSING&lt;/code&gt;, &lt;code&gt;COMPLETED&lt;/code&gt;, &lt;code&gt;FAILED&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Nobody checked the DLQ first.&lt;/p&gt;

&lt;p&gt;There were 23,000 payment events sitting in &lt;code&gt;payments.dlq&lt;/code&gt; from a consumer failure that afternoon. Your new V2 consumer tries to process them. Each one fails immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;com.fasterxml.jackson.databind.exc.InvalidFormatException: 
Cannot deserialize value of type `PaymentStatus` from String "pending": 
not one of the values accepted for Enum class: 
[PENDING, PROCESSING, COMPLETED, FAILED]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PagerDuty fires at 2:00 AM.&lt;/p&gt;

&lt;p&gt;You cannot roll back — the bug fix must stay. You cannot delete the messages — they are real payment transactions. You cannot redrive them as-is — the V2 consumer will reject every one.&lt;/p&gt;

&lt;p&gt;You need to &lt;strong&gt;transform&lt;/strong&gt; the messages first. And there is no tool for this.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Schema Registry Doesn't Solve This
&lt;/h2&gt;

&lt;p&gt;The first suggestion you'll get is "just use Schema Registry."&lt;/p&gt;

&lt;p&gt;Schema Registry is a &lt;strong&gt;prevention tool&lt;/strong&gt;. Your producer's serializer checks with it before writing to Kafka, so it enforces compatibility only for &lt;strong&gt;new&lt;/strong&gt; messages being produced.&lt;/p&gt;

&lt;p&gt;It does nothing for messages &lt;strong&gt;already sitting in your DLQ&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Those 23,000 messages are stored as bytes inside Kafka. Schema Registry cannot see them. Cannot transform them. Cannot validate them. Cannot redrive them.&lt;/p&gt;

&lt;p&gt;This is a recovery problem, not a prevention problem. They are different.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Senior Engineers Actually Do (This Is The Problem)
&lt;/h2&gt;

&lt;p&gt;I posted this exact scenario on &lt;a href="https://reddit.com/r/apachekafka" rel="noopener noreferrer"&gt;r/apachekafka&lt;/a&gt; and asked how teams handle it. Four experienced engineers replied. &lt;strong&gt;Not one named a tool.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;BroBroMate:&lt;/strong&gt; &lt;em&gt;"Something I've done in the past is just throw a quick Kafka Streams app up to do mass transformations... easy to unit test before rolling out."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KTCrisis:&lt;/strong&gt; &lt;em&gt;"You could spin up a v2 topic and keep a specific v1 consumer around just to drain the DLQ. But it adds a new topic and a dedicated consumer for a one-shot issue."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every solution is one of three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a temporary Kafka Streams application, use it once, delete it&lt;/li&gt;
&lt;li&gt;Keep zombie v1 consumer infrastructure alive until the queue drains
&lt;/li&gt;
&lt;li&gt;Write a throwaway Python/Java script with no tests, no validation, no audit trail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Time cost: 3–8 hours per incident. At 2am. Every single time.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Existing Tools and Their Gaps
&lt;/h2&gt;

&lt;p&gt;Before building anything, I looked at what exists:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Browse DLQ&lt;/th&gt;
&lt;th&gt;Transform Schema&lt;/th&gt;
&lt;th&gt;Idempotent Redrive&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DLQMan (irori-ab)&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗ No&lt;/td&gt;
&lt;td&gt;✓ lifecycle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Confluent Control Center&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗ No&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;kafka-rewind-tools&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗ No&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom Kafka Streams&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The transformation step — converting broken message format to valid format before redriving — exists nowhere as a productized tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building DLQ Revive
&lt;/h2&gt;

&lt;p&gt;I spent 5 weeks building &lt;a href="https://github.com/Saifulhuq01/dlq-revive" rel="noopener noreferrer"&gt;DLQ Revive&lt;/a&gt; — an open-source Kafka Dead Letter Queue mutation and redrive engine.&lt;/p&gt;

&lt;p&gt;Here are the four decisions that matter most.&lt;/p&gt;




&lt;h3&gt;
  
  
  Decision 1: assign() + seek(), Never subscribe()
&lt;/h3&gt;

&lt;p&gt;This is the most critical Kafka safety decision in the codebase.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ WRONG - what most tutorials show&lt;/span&gt;
&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"payments.dlq"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt; 
&lt;span class="c1"&gt;// This JOINS a consumer group. If the viewer runs with the same&lt;/span&gt;
&lt;span class="c1"&gt;// group.id as your production consumer, a rebalance can hand the&lt;/span&gt;
&lt;span class="c1"&gt;// partition to your read-only viewer tool instead.&lt;/span&gt;
&lt;span class="c1"&gt;// Your debug tool just took down your live pipeline.&lt;/span&gt;

&lt;span class="c1"&gt;// ✅ CORRECT - what DLQ Revive does&lt;/span&gt;
&lt;span class="nc"&gt;TopicPartition&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TopicPartition&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"payments.dlq"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;partition&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;assign&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tp&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;seek&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tp&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fromOffset&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Direct partition access. No group membership.&lt;/span&gt;
&lt;span class="c1"&gt;// No rebalance risk. Production consumer untouched.&lt;/span&gt;
&lt;span class="c1"&gt;// NEVER calls commitSync() in view mode.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;subscribe()&lt;/code&gt; participates in Kafka's consumer group protocol: the group coordinator decides which member owns each partition and can reassign partitions at any rebalance. If the viewer ever runs with the same &lt;code&gt;group.id&lt;/code&gt; as the production consumer, your "read-only" DLQ viewer can suddenly become the assigned consumer for a production partition, and any offsets it commits make your application skip messages it still needs.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;assign()&lt;/code&gt; + &lt;code&gt;seek()&lt;/code&gt; bypasses all of this. You specify exactly which partition and offset. You read at most &lt;code&gt;limit&lt;/code&gt; records and stop. The group coordinator never hears about you, so your reads cannot trigger a rebalance.&lt;/p&gt;




&lt;h3&gt;
  
  
  Decision 2: JSONata for Transformation, Not Groovy
&lt;/h3&gt;

&lt;p&gt;The transformation engine was the most debated architectural decision.&lt;/p&gt;

&lt;p&gt;The obvious first choice was Groovy — it's powerful, familiar to Java developers, and can handle any transformation logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I rejected it immediately.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User-submitted Groovy executes arbitrary Java code on your backend. A careless or malicious user could submit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight groovy"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This is a valid Groovy transformation that will be executed:&lt;/span&gt;
&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getRuntime&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"rm -rf /"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// Or extract your AWS credentials:&lt;/span&gt;
&lt;span class="n"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"AWS_SECRET_ACCESS_KEY"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is an RCE vulnerability built into the product's core feature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://jsonata.org/" rel="noopener noreferrer"&gt;JSONata&lt;/a&gt; is purely declarative — a JSON-to-JSON mapping language with no access to the file system, network, or system calls. For our String to Enum scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "orderId": orderId,
  "amount": amount,
  "currency": currency,
  "status": $uppercase(status),
  "processedAt": $now()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One expression. Zero RCE surface. Applied to all 23,000 messages identically.&lt;/p&gt;




&lt;h3&gt;
  
  
  Decision 3: Idempotency at the Kafka Offset Level
&lt;/h3&gt;

&lt;p&gt;What happens if your application crashes halfway through redriving 10,000 messages?&lt;/p&gt;

&lt;p&gt;Without idempotency: it restarts, processes the first 5,000 messages again, double-charges 5,000 customers.&lt;/p&gt;

&lt;p&gt;DLQ Revive records every message before producing it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Table created on startup:&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;redrive_log&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;"offset"&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- quoted: OFFSET is a reserved word in PostgreSQL&lt;/span&gt;
  &lt;span class="n"&gt;redriven_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;redriven_by&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"offset"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- ← this is the guarantee&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before producing ANY message:&lt;/span&gt;
&lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;alreadyRedriven&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"SELECT COUNT(*) FROM redrive_log WHERE topic=? AND partition=? AND \"offset\"=?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;Integer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;partition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alreadyRedriven&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;skippedCount&lt;/span&gt;&lt;span class="o"&gt;++;&lt;/span&gt;
    &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// Skip. Already processed.&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// INSERT before produce — not after:&lt;/span&gt;
&lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;update&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"INSERT INTO redrive_log (topic, partition, \"offset\", redriven_at) VALUES (?,?,?,NOW())"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;partition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;kafkaTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;targetTopic&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transformedMessage&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



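&lt;p&gt;The &lt;code&gt;UNIQUE&lt;/code&gt; constraint enforces this at the database level. Even if two concurrent requests try to redrive the same offset, only one INSERT succeeds. The second gets a &lt;code&gt;DataIntegrityViolationException&lt;/code&gt; and skips the message. The trade-off of inserting before producing is at-most-once delivery: if the process dies between the INSERT and the send, the offset is logged but was never produced, so such rows need a separate reconciliation pass.&lt;/p&gt;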

&lt;p&gt;&lt;strong&gt;Pod restart during a 10k bulk redrive cannot double-process a single message.&lt;/strong&gt;&lt;/p&gt;
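
&lt;p&gt;To make the insert-before-produce ordering concrete, here is a minimal in-memory sketch. It is not DLQ Revive's code: a &lt;code&gt;HashSet&lt;/code&gt; stands in for the &lt;code&gt;UNIQUE(topic, partition, offset)&lt;/code&gt; constraint, a list stands in for the target topic, and &lt;code&gt;RedriveSketch&lt;/code&gt;, &lt;code&gt;LogKey&lt;/code&gt;, and &lt;code&gt;redrive&lt;/code&gt; are hypothetical names. Running an overlapping range twice simulates a restart after a partial run.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RedriveSketch {
    // Hypothetical key mirroring UNIQUE(topic, partition, offset).
    record LogKey(String topic, int partition, long offset) {}

    private final Set&lt;LogKey&gt; redriveLog = new HashSet&lt;&gt;(); // stands in for the table
    final List&lt;Long&gt; produced = new ArrayList&lt;&gt;();          // stands in for the target topic
    int skipped = 0;

    // Redrives offsets [from, to): claim each offset first, then "produce" it.
    void redrive(String topic, int partition, long from, long to) {
        for (long offset = from; offset &lt; to; offset++) {
            // Set.add returns false when the key already exists, like a rejected INSERT.
            if (!redriveLog.add(new LogKey(topic, partition, offset))) {
                skipped++;
                continue;
            }
            produced.add(offset);
        }
    }

    public static void main(String[] args) {
        RedriveSketch sketch = new RedriveSketch();
        sketch.redrive("payments.dlq", 0, 0, 5);   // first run stops after 5 messages
        sketch.redrive("payments.dlq", 0, 0, 10);  // restart replays the whole range
        // prints "produced=10 skipped=5"
        System.out.println("produced=" + sketch.produced.size() + " skipped=" + sketch.skipped);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second call produces only offsets 5 through 9 and skips the five that were already claimed, which is the behavior the real table is meant to give across pod restarts.&lt;/p&gt;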




&lt;h3&gt;
  
  
  Decision 4: Cursor-Based Pagination at the Kafka Offset Level
&lt;/h3&gt;

&lt;p&gt;Standard REST pagination (page 1, page 2) breaks for Kafka because page boundaries shift as new messages arrive and retention deletes old ones.&lt;/p&gt;

&lt;p&gt;DLQ Revive uses cursor-based pagination at the offset level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /dlq/payments.dlq/messages?partition=0&amp;amp;fromOffset=0&amp;amp;limit=50
→ Returns messages at offsets 0-49

GET /dlq/payments.dlq/messages?partition=0&amp;amp;fromOffset=50&amp;amp;limit=50  
→ Returns messages at offsets 50-99
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cursor is the Kafka offset itself. Stable, consistent, and memory-safe regardless of topic size. A DLQ with 500,000 messages loads exactly 50 at a time. No JVM heap exhaustion.&lt;/p&gt;
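
&lt;p&gt;Here is a small sketch of how a client might advance the cursor, assuming the API returns the offset of every record in a page. The &lt;code&gt;nextFromOffset&lt;/code&gt; helper is hypothetical and not part of DLQ Revive. Deriving the cursor from the last offset actually returned, rather than from &lt;code&gt;fromOffset + limit&lt;/code&gt;, stays correct when offsets have gaps, which can happen on compacted or transactional topics.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;public class OffsetCursor {
    // Next cursor = last returned offset + 1; an empty page leaves the cursor unchanged.
    static long nextFromOffset(long fromOffset, long[] returnedOffsets) {
        if (returnedOffsets.length == 0) {
            return fromOffset;
        }
        return returnedOffsets[returnedOffsets.length - 1] + 1;
    }

    public static void main(String[] args) {
        long[] contiguousPage = {0, 1, 2, 3, 4};
        long[] pageWithGaps = {50, 51, 53, 57};
        System.out.println(nextFromOffset(0, contiguousPage)); // 5
        System.out.println(nextFromOffset(50, pageWithGaps));  // 58
        System.out.println(nextFromOffset(58, new long[0]));   // 58
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;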




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Saifulhuq01/dlq-revive.git
&lt;span class="nb"&gt;cd &lt;/span&gt;dlq-revive
docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; docker/docker-compose.yml up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;span class="c"&gt;# Dashboard at http://localhost:4200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This starts Kafka, ZooKeeper, PostgreSQL, the Spring Boot backend, and the Angular dashboard together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Java 17, Spring Boot 3.x, Spring Kafka, Apache Kafka 3.x, Angular 13+, PostgreSQL 15, JSONata, Docker Compose, GitHub Actions CI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;License:&lt;/strong&gt; MIT. Self-hosted. Free core forever.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 2am Scenario, Resolved
&lt;/h2&gt;

&lt;p&gt;Back to those 23,000 stuck payment events:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without DLQ Revive:&lt;/strong&gt; Engineer wakes at 2am, spends 4 hours writing a throwaway transformer script, runs it with no validation, hopes it works, deletes the script. Next incident: start over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With DLQ Revive:&lt;/strong&gt; Engineer opens browser, connects to Kafka broker, sees 23,000 stuck messages paginated safely, writes one JSONata expression (&lt;code&gt;"status": $uppercase(status)&lt;/code&gt;), previews the transformation on 5 sample messages, validates all 23,000, clicks redrive. PostgreSQL audit trail records every message processed. Done in 15 minutes. The expression is saved as a template for next time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Missing (Honest)
&lt;/h2&gt;

&lt;p&gt;The current open-source version handles plain JSON DLQ topics. Avro and Protobuf deserialization (reading the magic byte + Schema ID prefix and fetching schema from Confluent Schema Registry) is on the roadmap for v1.1.&lt;/p&gt;
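
&lt;p&gt;For context, the Confluent wire format puts a one-byte magic value (0) and a four-byte big-endian schema ID in front of the serialized payload. Below is a rough sketch of reading that prefix with the JDK only. It is not DLQ Revive code, the class and method names are made up for illustration, and fetching the schema from the registry is left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

public class WireFormatPrefix {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_BYTES = 5; // 1 magic byte + 4-byte schema ID

    // Returns the schema ID if the value starts with the Confluent prefix, else empty.
    static OptionalInt schemaId(byte[] value) {
        if (value.length >= HEADER_BYTES) {
            ByteBuffer buffer = ByteBuffer.wrap(value); // big-endian by default
            if (buffer.get() == MAGIC_BYTE) {
                return OptionalInt.of(buffer.getInt());
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        byte[] framed = {0, 0, 0, 0, 42, 1, 2, 3};
        byte[] plainJson = "{\"status\":\"pending\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(schemaId(framed));    // OptionalInt[42]
        System.out.println(schemaId(plainJson)); // OptionalInt.empty
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A check like this would also let a tool tell plain JSON records apart from registry-framed ones before choosing a deserializer.&lt;/p&gt;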

&lt;p&gt;The free tier is limited to 100 messages per redrive session. For bulk redrives, a cloud tier is in progress.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It, Break It, Tell Me What's Wrong
&lt;/h2&gt;

&lt;p&gt;If you've dealt with DLQ schema incompatibility in production — or if you think my Kafka consumer approach has an edge case I've missed — I genuinely want to know.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Saifulhuq01/dlq-revive" rel="noopener noreferrer"&gt;github.com/Saifulhuq01/dlq-revive&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Drop a comment here or open a GitHub issue. The idempotency design and the JSONata sandbox are the two areas I'm most paranoid about.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Mohammed Saifulhuq — Apache Fineract contributor (SQL injection patch, CI/CD hardening), building DLQ Revive&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>java</category>
      <category>opensource</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>How to Recover Kafka DLQ Messages After a Schema Change Broke Your Consumer</title>
      <dc:creator>Mohammed Saifulhuq</dc:creator>
      <pubDate>Thu, 21 May 2026 11:56:24 +0000</pubDate>
      <link>https://dev.to/mohammed_saifulhuq/how-to-recover-kafka-dlq-messages-after-a-schema-change-broke-your-consumer-24nb</link>
      <guid>https://dev.to/mohammed_saifulhuq/how-to-recover-kafka-dlq-messages-after-a-schema-change-broke-your-consumer-24nb</guid>
      <description>&lt;h2&gt;
  
  
  The Problem Nobody Has a Tool For
&lt;/h2&gt;

&lt;p&gt;Here is the scenario that wakes engineers up at 2am.&lt;/p&gt;

&lt;p&gt;Your payment service has been running fine for months. The consumer reads events from a Kafka topic, processes them, sends confirmations. The &lt;code&gt;status&lt;/code&gt; field is a plain String - &lt;code&gt;"pending"&lt;/code&gt;, &lt;code&gt;"failed"&lt;/code&gt;, &lt;code&gt;"done"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A bug is discovered: the status field accepts invalid values like &lt;code&gt;"ok"&lt;/code&gt; and &lt;code&gt;"finished"&lt;/code&gt; from different producer teams, causing downstream analytics to break. The correct fix is converting &lt;code&gt;status&lt;/code&gt; from &lt;code&gt;String&lt;/code&gt; to a strict &lt;code&gt;Enum&lt;/code&gt; with only four valid values: &lt;code&gt;PENDING&lt;/code&gt;, &lt;code&gt;PROCESSING&lt;/code&gt;, &lt;code&gt;COMPLETED&lt;/code&gt;, &lt;code&gt;FAILED&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The fix gets deployed at 9:45 PM.&lt;/p&gt;

&lt;p&gt;Nobody checked the DLQ first.&lt;/p&gt;

&lt;p&gt;There were 23,000 messages sitting in &lt;code&gt;payments.dlq&lt;/code&gt; from an earlier consumer failure that afternoon. The new V2 consumer starts processing them. Each one fails immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;InvalidFormatException: Cannot deserialize value of type `PaymentStatus` 
from String "pending": not one of the values accepted for Enum class: 
[PENDING, PROCESSING, COMPLETED, FAILED]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All 23,000 messages are now permanently stuck. The V2 consumer cannot read them. They cannot be redriven as-is. PagerDuty fires at 2:00 AM.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Schema Registry Doesn't Help Here
&lt;/h2&gt;

&lt;p&gt;The first suggestion you'll get is "just use Schema Registry."&lt;/p&gt;

&lt;p&gt;Schema Registry is a seatbelt. It prevents &lt;strong&gt;new&lt;/strong&gt; schema-incompatible events from being produced. It does nothing for messages &lt;strong&gt;already sitting in your DLQ&lt;/strong&gt; from before the schema change.&lt;/p&gt;

&lt;p&gt;Those messages are stored as bytes in Kafka. Schema Registry has no knowledge of them. It cannot transform them. Cannot validate them. Cannot redrive them.&lt;/p&gt;

&lt;p&gt;The 23,000 stuck messages are entirely your problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Engineers Actually Do (The Manual Reality)
&lt;/h2&gt;

&lt;p&gt;I posted this exact scenario on r/apachekafka and asked how senior teams handle it. Four experienced engineers replied. Not one named a tool.&lt;/p&gt;

&lt;p&gt;Their answers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BroBroMate (Senior Engineer):&lt;/strong&gt; &lt;em&gt;"Something I've done in the past is just throw a quick Kafka Streams app up to do mass transformations... And easy to unit test before rolling out."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KTCrisis:&lt;/strong&gt; &lt;em&gt;"You could also spin up a v2 topic for the new consumers and keep a specific v1 consumer around just to drain the DLQ. But it adds a new topic and a dedicated consumer for a one-shot issue."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every solution described is either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building a temporary application from scratch, using it once, deleting it&lt;/li&gt;
&lt;li&gt;Keeping zombie infrastructure alive until the queue drains&lt;/li&gt;
&lt;li&gt;Writing a throwaway Python script with no tests, no audit trail, no validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The time cost: 3-8 hours of a senior engineer's time. At 2am. Every single incident.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Gap No Existing Tool Fills
&lt;/h2&gt;

&lt;p&gt;I looked at every existing solution:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DLQMan (irori-ab)&lt;/strong&gt; - routes messages based on exception type headers. No transformation engine. If your messages have the wrong data format, DLQMan cannot help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confluent Control Center&lt;/strong&gt; - has basic redrive. No schema transformation. Requires full Confluent Enterprise stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;kafka-rewind-tools&lt;/strong&gt; - handles the redrive part. You still need to write the mutation yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom Kafka Streams apps&lt;/strong&gt; - what everyone builds from scratch, uses once, and throws away.&lt;/p&gt;

&lt;p&gt;The gap: &lt;strong&gt;no tool handles the transformation step between "message has wrong format" and "message is safe to redrive".&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Building DLQ Revive
&lt;/h2&gt;

&lt;p&gt;I spent 5 weeks building the tool that should exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Saifulhuq01/dlq-revive" rel="noopener noreferrer"&gt;DLQ Revive&lt;/a&gt;&lt;/strong&gt; is an open-source Kafka Dead Letter Queue mutation and redrive engine. Here is what it does and the architectural decisions behind each feature.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Browse DLQ Messages Safely
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// WRONG - what most people do&lt;/span&gt;
&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"payments.dlq"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;  &lt;span class="c1"&gt;// joins consumer group!&lt;/span&gt;

&lt;span class="c1"&gt;// CORRECT - what DLQ Revive does&lt;/span&gt;
&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;assign&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TopicPartition&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"payments.dlq"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)));&lt;/span&gt;
&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;seek&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TopicPartition&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"payments.dlq"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="n"&gt;fromOffset&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;subscribe()&lt;/code&gt; joins your application to a Kafka consumer group. If the viewer shares a &lt;code&gt;group.id&lt;/code&gt; with your production consumer, Kafka can trigger a rebalance and assign partitions to your read-only viewer, &lt;strong&gt;stealing them from your production consumer&lt;/strong&gt;. Your 2am debugging tool accidentally takes down your production pipeline.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;assign()&lt;/code&gt; + &lt;code&gt;seek()&lt;/code&gt; bypasses group membership entirely. You read exactly what you want. The production consumer is untouched. DLQ Revive never calls &lt;code&gt;commitSync()&lt;/code&gt; in view mode.&lt;/p&gt;

&lt;p&gt;Reads are paginated at max 100 messages per API call. No full topic loads. No JVM OutOfMemoryError on topics with 500,000 stuck messages.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Transform Schema With JSONata
&lt;/h3&gt;

&lt;p&gt;The transformation step is what makes DLQ recovery possible without writing custom code.&lt;/p&gt;

&lt;p&gt;DLQ Revive uses &lt;strong&gt;JSONata&lt;/strong&gt; - a declarative JSON-to-JSON mapping language. To fix the String to Enum problem from our scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "orderId": orderId,
  "amount": amount,
  "currency": currency,
  "status": $uppercase(status),
  "processedAt": $now()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One expression. Applies to all 23,000 messages. Preview 5 samples before committing. See the before and after side by side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not Groovy?&lt;/strong&gt; User-submitted Groovy on your backend can call &lt;code&gt;Runtime.getRuntime().exec("rm -rf /")&lt;/code&gt;. It is an RCE vulnerability built into the product if you allow it. JSONata is purely declarative - it has no access to the file system, network, or system commands. Zero RCE surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Validate Before Redriving
&lt;/h3&gt;

&lt;p&gt;BroBroMate's advice from that Reddit thread was exactly right:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Whatever way you do it, validate every 'fixed' record against an agreed good schema before producing it - there's nothing worse than dumping another X thousand bad messages on the DLQ."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;DLQ Revive validates each transformed message against the target schema before producing. Failed validation shows per-message errors. You see exactly which messages will fail before redriving a single one.&lt;/p&gt;
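
&lt;p&gt;As an illustration of why this step matters for the scenario above, here is a minimal sketch that checks whether an uppercased status is one of the four enum constants. The names are hypothetical and this is not DLQ Revive's validator. It assumes the only constraint the target schema places on &lt;code&gt;status&lt;/code&gt; is the enum.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Locale;

public class StatusValidation {
    enum PaymentStatus { PENDING, PROCESSING, COMPLETED, FAILED }

    // Returns true when the uppercased value names a PaymentStatus constant.
    static boolean isValidAfterUppercase(String rawStatus) {
        try {
            PaymentStatus.valueOf(rawStatus.toUpperCase(Locale.ROOT));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Legacy values taken from the scenario described earlier.
        for (String raw : new String[] {"pending", "failed", "done", "ok"}) {
            System.out.println(raw + " -> " + isValidAfterUppercase(raw));
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Uppercasing rescues &lt;code&gt;"pending"&lt;/code&gt; and &lt;code&gt;"failed"&lt;/code&gt;, but legacy values such as &lt;code&gt;"done"&lt;/code&gt; or &lt;code&gt;"ok"&lt;/code&gt; still fail the check and would need an explicit mapping in the transformation expression.&lt;/p&gt;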

&lt;h3&gt;
  
  
  4. Idempotency Guard - Pod-Restart Safe
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Before producing any message:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;redrive_log&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nv"&gt;"offset"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;

&lt;span class="c1"&gt;-- If not found, insert THEN produce:&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;redrive_log&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"offset"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redriven_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redriven_by&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;redrive_log&lt;/code&gt; table has a &lt;code&gt;UNIQUE(topic, partition, offset)&lt;/code&gt; constraint, so even if two workers pass the &lt;code&gt;SELECT&lt;/code&gt; check at the same moment, only one &lt;code&gt;INSERT&lt;/code&gt; can succeed.&lt;/p&gt;

&lt;p&gt;If your application crashes halfway through redriving 10,000 messages and restarts, it skips every message already recorded in &lt;code&gt;redrive_log&lt;/code&gt; instead of producing it a second time. In a fintech payment pipeline, that is the difference between a successful recovery and double-charging 5,000 customers.&lt;/p&gt;
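&lt;p&gt;To make the restart behaviour concrete, here is a small sketch in Python that uses an in-memory SQLite database as a stand-in for PostgreSQL. It is not DLQ Revive's code: &lt;code&gt;claim&lt;/code&gt; and &lt;code&gt;redrive&lt;/code&gt; are hypothetical names, and it folds the check and the insert into one statement that leans on the unique constraint (&lt;code&gt;INSERT OR IGNORE&lt;/code&gt; in SQLite; PostgreSQL's equivalent is &lt;code&gt;INSERT ... ON CONFLICT DO NOTHING&lt;/code&gt;).&lt;/p&gt;

```python
import sqlite3

db = sqlite3.connect(":memory:")
# "partition" and "offset" are quoted because both are SQL keywords.
db.execute(
    """
    CREATE TABLE redrive_log (
        topic       TEXT    NOT NULL,
        "partition" INTEGER NOT NULL,
        "offset"    INTEGER NOT NULL,
        redriven_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        redriven_by TEXT    NOT NULL,
        UNIQUE (topic, "partition", "offset")
    )
    """
)


def claim(topic: str, partition: int, offset: int, user: str) -> bool:
    """Record the message; return True only for the first caller to claim it."""
    cursor = db.execute(
        'INSERT OR IGNORE INTO redrive_log (topic, "partition", "offset", redriven_by) '
        "VALUES (?, ?, ?, ?)",
        (topic, partition, offset, user),
    )
    db.commit()
    return cursor.rowcount == 1  # 0 rows changed means it was already logged


def redrive(records, user, produce):
    """Produce each (topic, partition, offset, payload) record at most once."""
    for topic, partition, offset, payload in records:
        if claim(topic, partition, offset, user):
            produce(payload)


sent = []  # stands in for the Kafka producer
records = [("orders.dlq", 0, n, f"payload-{n}") for n in range(3)]
redrive(records[:2], "alice", sent.append)  # first run stops after two records
redrive(records, "alice", sent.append)      # the rerun skips the two already claimed
print(sent)
```

&lt;p&gt;One trade-off of logging before producing: a crash between the insert and the produce leaves a record logged but unsent, so this ordering gives at-most-once delivery. For payments that is usually the safer failure mode, but those rows still need a reconciliation pass.&lt;/p&gt;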

&lt;h3&gt;
  
  
  5. Full Audit Trail
&lt;/h3&gt;

&lt;p&gt;Every browse and redrive action is logged to PostgreSQL with timestamp, user, session ID, message count, and source/target topics. For teams in regulated industries (PCI-DSS, SOC 2), this is the kind of compliance evidence that a throwaway manual script rarely leaves behind.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Saifulhuq01/dlq-revive.git
&lt;span class="nb"&gt;cd &lt;/span&gt;dlq-revive
docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; docker/docker-compose.yml up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;span class="c"&gt;# Open http://localhost:4200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That starts Kafka, ZooKeeper, PostgreSQL, the Spring Boot backend, and the Angular dashboard together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Java 17, Spring Boot 3.x, Apache Kafka 3.x, Angular 13+, PostgreSQL 15, JSONata, Docker Compose, GitHub Actions CI.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Kafka Consumer Safety Details
&lt;/h2&gt;

&lt;p&gt;A few things worth understanding if you are building Kafka tooling:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;assign()&lt;/code&gt; instead of &lt;code&gt;subscribe()&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;subscribe()&lt;/code&gt; joins a consumer group, and the group coordinator can reassign partitions among that group's members whenever a rebalance occurs. For a read-only tool this is dangerous: if it ever joined with the production consumer's &lt;code&gt;group.id&lt;/code&gt;, a rebalance could move DLQ partitions between your tool and the production consumer mid-operation.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;assign()&lt;/code&gt; with &lt;code&gt;seek()&lt;/code&gt; gives you direct partition access. No group membership. No rebalance risk. The offset you seek to is exactly where you start reading. You consume up to &lt;code&gt;limit&lt;/code&gt; records (fewer if the partition ends first) and stop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why pagination at the Kafka offset level:&lt;/strong&gt;&lt;br&gt;
Page-number pagination (page 1, page 2) doesn't map well onto Kafka topics, because the "next page" has to start at the exact offset where the previous page ended. DLQ Revive uses cursor-based pagination: &lt;code&gt;fromOffset=0&amp;amp;limit=50&lt;/code&gt; returns up to 50 messages starting at offset 0 (offsets 0-49 when the partition has no gaps). The next call uses &lt;code&gt;fromOffset=50&lt;/code&gt;. Memory use is bounded by &lt;code&gt;limit&lt;/code&gt; regardless of topic size.&lt;/p&gt;
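&lt;p&gt;The cursor idea can be sketched in a few lines of Python, with a plain list standing in for one partition. This is an illustration under assumptions, not DLQ Revive's implementation: &lt;code&gt;fetch_page&lt;/code&gt; is a hypothetical name, and the sketch resumes from the last returned offset plus one so that it also copes with gaps in the offset sequence.&lt;/p&gt;

```python
def fetch_page(partition_log, from_offset: int, limit: int):
    """Return up to `limit` records at or after `from_offset`, plus the next cursor."""
    page = []
    # A real consumer would seek() straight to from_offset instead of scanning.
    for offset, value in partition_log:  # records are ordered by offset
        if offset >= from_offset:
            page.append((offset, value))
            if len(page) == limit:
                break
    # Resume after the last offset actually returned; offsets can have gaps.
    next_offset = page[-1][0] + 1 if page else from_offset
    return page, next_offset


# Offsets 2 and 5 are missing, as can happen after log compaction.
log = [(0, "a"), (1, "b"), (3, "c"), (4, "d"), (6, "e")]
first, cursor = fetch_page(log, from_offset=0, limit=3)
second, cursor2 = fetch_page(log, from_offset=cursor, limit=3)
print(first, cursor)
print(second, cursor2)
```

&lt;p&gt;Each call holds at most &lt;code&gt;limit&lt;/code&gt; records, which is where the bounded memory use comes from.&lt;/p&gt;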

&lt;p&gt;&lt;strong&gt;Why no &lt;code&gt;commitSync()&lt;/code&gt; in view mode:&lt;/strong&gt;&lt;br&gt;
Committing offsets in view mode would move the consumer group's committed position, which changes what the production consumer reads next if the two share a group ID. DLQ Revive uses a unique, isolated consumer group and never commits in view mode.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The open-source core handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paginated DLQ browsing&lt;/li&gt;
&lt;li&gt;JSONata schema transformation&lt;/li&gt;
&lt;li&gt;Preview before redriving&lt;/li&gt;
&lt;li&gt;Idempotency-safe redrive&lt;/li&gt;
&lt;li&gt;Full PostgreSQL audit trail&lt;/li&gt;
&lt;li&gt;One-command Docker setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Free tier limit: 100 messages per redrive session. This is enough for testing and small incidents. Bulk redrive (&amp;gt;100 messages) and team features are in the cloud tier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Saifulhuq01/dlq-revive" rel="noopener noreferrer"&gt;github.com/Saifulhuq01/dlq-revive&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed. If you've dealt with DLQ schema recovery at work and want to try it against a real scenario, I'd genuinely value your feedback, especially on the JSONata approach and whether the safety guarantees hold up under your specific Kafka setup.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Mohammed Saifulhuq - Apache Fineract contributor, building DLQ Revive&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>java</category>
      <category>opensource</category>
      <category>distributedsystems</category>
    </item>
  </channel>
</rss>
