<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Fathma Siddique</title>
    <description>The latest articles on DEV Community by Fathma Siddique (@fathma).</description>
    <link>https://dev.to/fathma</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F576254%2F8e104d42-04b3-4083-a7ad-3e9184835e27.JPG</url>
      <title>DEV Community: Fathma Siddique</title>
      <link>https://dev.to/fathma</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fathma"/>
    <language>en</language>
    <item>
      <title>Dealing with Long-Running Kafka Consumers and Message Backlogs</title>
      <dc:creator>Fathma Siddique</dc:creator>
      <pubDate>Wed, 25 Feb 2026 18:44:52 +0000</pubDate>
      <link>https://dev.to/fathma/dealing-with-long-running-kafka-consumers-and-message-backlogs-522h</link>
      <guid>https://dev.to/fathma/dealing-with-long-running-kafka-consumers-and-message-backlogs-522h</guid>
      <description>&lt;p&gt;For a while I genuinely could not figure out what was wrong.&lt;br&gt;
Nothing was throwing errors. The service was running. But messages were piling up, some were being processed twice, and the lag just kept climbing. I kept waiting for it to sort itself out. It did not.&lt;br&gt;
Eventually I had to sit down and actually trace through what was happening.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
Our consumer was doing too much. Each message triggered external API calls, some heavy business logic, blocking operations. Individually, a message might take a couple of minutes to get through. That feels manageable until you remember that Kafka's default &lt;code&gt;max.poll.records&lt;/code&gt; is 500. Pull a batch of even a handful of slow messages, and the cumulative processing time blows past Kafka's default &lt;code&gt;max.poll.interval.ms&lt;/code&gt; of 5 minutes without much effort.&lt;br&gt;
When that happens, Kafka assumes the consumer has died. It triggers a rebalance, reassigns the partitions, and those same messages get picked up and processed all over again.&lt;br&gt;
That was our loop. Consumer pulls a batch, gets bogged down processing it, Kafka loses patience, rebalance happens, repeat.&lt;/p&gt;
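&lt;p&gt;The arithmetic behind that loop is worth spelling out. A quick sketch in Python (illustrative numbers only, using the defaults mentioned above):&lt;/p&gt;

```python
# Back-of-envelope check: with slow messages, how many fit in one poll
# interval before Kafka assumes the consumer is dead and rebalances?
MAX_POLL_INTERVAL_MS = 5 * 60 * 1000   # Kafka's default: 5 minutes
PER_MESSAGE_MS = 2 * 60 * 1000         # a slow message: roughly 2 minutes

def slow_messages_before_timeout(per_message_ms, max_poll_interval_ms):
    """Number of slow messages a consumer can finish inside one poll interval."""
    return max_poll_interval_ms // per_message_ms

budget = slow_messages_before_timeout(PER_MESSAGE_MS, MAX_POLL_INTERVAL_MS)
print(budget)  # 2: a batch with a third slow message already overruns the interval
```

&lt;p&gt;Against a default batch size of 500, a budget of two slow messages per poll is nothing.&lt;/p&gt;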

&lt;p&gt;&lt;strong&gt;What We Did&lt;/strong&gt;&lt;br&gt;
The first thing was just to stop the bleeding. We bumped &lt;code&gt;max.poll.interval.ms&lt;/code&gt; up to 8 minutes to give the consumer a bit more breathing room. Rebalances stopped almost immediately. That was a relief, but it was a band-aid not a fix.&lt;/p&gt;

&lt;p&gt;Next we set &lt;code&gt;max.poll.records = 1&lt;/code&gt;. One message at a time. With each message taking a couple of minutes, pulling any larger batch was just asking for trouble. Throughput dropped considerably, but at least the system was stable and we could reason about it.&lt;/p&gt;
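&lt;p&gt;Put together, the tuned settings look roughly like this. This is a hedged sketch only: the keys are written in Java-client property style, the broker address and group id are placeholders, and the exact names depend on your client library.&lt;/p&gt;

```python
# Illustrative consumer settings (property names as in the Java client;
# check your own client's spelling before copying anything).
consumer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder
    "group.id": "slow-worker-group",        # placeholder
    "max.poll.interval.ms": 8 * 60 * 1000,  # raised from the 5-minute default
    "max.poll.records": 1,                  # Java client: one message per poll
    "enable.auto.commit": False,            # commit manually after each success
}
```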

&lt;p&gt;We also dropped auto-commit and switched to manual offset commits. Honestly we should have done this from the start. Auto-commit quietly marks messages as done on a timer whether processing actually succeeded or not. Manual commits meant we knew exactly what had been handled and what had not.&lt;/p&gt;
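&lt;p&gt;What manual commits buy you is easiest to see in a toy in-memory model, no broker or client library involved, just the offset logic:&lt;/p&gt;

```python
# The offset advances only after a message is fully processed, so a
# failure mid-message means redelivery rather than silent loss.
def consume(messages, handler, committed_offset=0):
    """Process messages from committed_offset on; return the new committed offset."""
    for offset in range(committed_offset, len(messages)):
        try:
            handler(messages[offset])
        except Exception:
            break  # offset stays at the last success; this message gets retried
        committed_offset = offset + 1  # manual commit, only after success
    return committed_offset

def flaky(msg):
    if msg == "bad":
        raise ValueError("processing failed")

print(consume(["a", "b", "bad", "c"], flaky))  # 2: "bad" is never marked done
```

&lt;p&gt;With auto-commit, the timer could have marked that failed message as done before it ever succeeded.&lt;/p&gt;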

&lt;p&gt;Kafka consumers are not meant to do heavy, long-running work.&lt;br&gt;
After things stabilised, we redesigned the flow so that the consumer became lightweight. It would validate the message and quickly hand off the heavy work to background workers. Kafka went back to doing what it is good at: moving data fast. And our system stopped fighting it.&lt;/p&gt;
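&lt;p&gt;The hand-off pattern, sketched with Python's standard library (our real pipeline used proper background workers, but the shape is the same: validate cheaply, enqueue, return to polling):&lt;/p&gt;

```python
import queue
import threading

work = queue.Queue()
results = []

def worker():
    # Heavy, slow processing lives here, off the consumer's polling path.
    while True:
        item = work.get()
        if item is None:
            break  # shutdown signal
        results.append(item.upper())  # stand-in for the expensive work
        work.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

def on_message(msg):
    # The consumer callback: cheap validation only, no blocking calls.
    if msg:
        work.put(msg)

for msg in ["a", "b", "c"]:
    on_message(msg)
work.join()  # a real consumer would keep polling instead of joining
for _ in threads:
    work.put(None)
```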

&lt;p&gt;We also added retries and a dead letter queue so one broken message could not drag everything else down with it.&lt;/p&gt;
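&lt;p&gt;A minimal sketch of the retry-then-dead-letter logic (plain Python, all names illustrative):&lt;/p&gt;

```python
# A message gets a few attempts; if it still fails, it is parked in a
# dead letter queue so the rest of the stream keeps flowing.
dead_letters = []

def handle_with_retries(msg, handler, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            handler(msg)
            return True
        except Exception:
            continue  # a real system would back off between attempts
    dead_letters.append(msg)  # park it for inspection and manual replay
    return False

def always_fails(msg):
    raise RuntimeError("downstream service rejected the message")

handle_with_retries("poison-pill", always_fails)
print(dead_letters)  # ['poison-pill']
```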

&lt;p&gt;&lt;strong&gt;What Stuck With Me&lt;/strong&gt;&lt;br&gt;
I think I was treating Kafka like a job queue because that is what felt familiar. But it is not that. It is a streaming system that expects you to keep up with it. The moment you do slow, heavy work inside the consumer, you are borrowing time you do not have.&lt;br&gt;
Once we aligned with how Kafka actually works, everything got simpler. The lag cleared. The rebalances stopped. The system finally felt like it was running the way it was supposed to.&lt;br&gt;
Sometimes the fix is technical. But sometimes you just have to admit the design was wrong.&lt;/p&gt;

</description>
      <category>backend</category>
      <category>dataengineering</category>
      <category>distributedsystems</category>
      <category>performance</category>
    </item>
    <item>
      <title>Optimizing Real-Time Location Tracking: A System-Wide Approach</title>
      <dc:creator>Fathma Siddique</dc:creator>
      <pubDate>Fri, 12 Dec 2025 15:24:56 +0000</pubDate>
      <link>https://dev.to/fathma/optimizing-real-time-location-tracking-a-system-wide-approach-2oa2</link>
      <guid>https://dev.to/fathma/optimizing-real-time-location-tracking-a-system-wide-approach-2oa2</guid>
      <description>&lt;p&gt;I recently worked on a location tracking feature that was causing major problems. The app would drain phone batteries quickly, the server costs were getting expensive, and users were complaining about lag. Here's how I fixed it.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔴 What Was Wrong
&lt;/h2&gt;

&lt;p&gt;The system had some serious issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Phones were losing &lt;strong&gt;20-30% battery every hour&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The app was sending way too many updates through Socket.IO&lt;/li&gt;
&lt;li&gt;The database was handling &lt;strong&gt;thousands of writes every minute&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Server costs kept increasing as more people used the app&lt;/li&gt;
&lt;li&gt;The map would freeze and lag when updating locations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🔍 Why These Problems Happened
&lt;/h2&gt;

&lt;p&gt;After checking the logs and monitoring the system, I found the main causes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The app was sending location updates &lt;strong&gt;every single second&lt;/strong&gt; via Socket.IO&lt;/li&gt;
&lt;li&gt;Every time someone moved, the server sent &lt;strong&gt;everyone's locations to all users&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Every location update was being saved to the database &lt;strong&gt;immediately&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;GPS was set to &lt;strong&gt;maximum accuracy all the time&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;There was &lt;strong&gt;no caching system&lt;/strong&gt; to handle the load&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  ⚡ How I Fixed It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Sending Only What Changed and Filtering Insignificant Movements
&lt;/h3&gt;

&lt;p&gt;This was the biggest improvement. Instead of the previous approach, I implemented two key changes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Client-side filtering:&lt;/strong&gt; The phone only sends location updates when movement exceeds 10 meters, eliminating unnecessary network calls&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Selective broadcasting:&lt;/strong&gt; The server broadcasts only the changed user's location instead of sending everyone's data to all users&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Socket.IO traffic dropped by &lt;strong&gt;60%&lt;/strong&gt;, which made everything much faster.&lt;/p&gt;
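&lt;p&gt;The filter itself is just a distance check against the last fix that was actually sent. A sketch of the logic in Python (the production code runs on the phone, so treat this as pseudocode for the idea rather than the actual client code):&lt;/p&gt;

```python
import math

EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def should_send(last_sent, current, threshold_m=10.0):
    """Send only the first fix, or a fix more than threshold_m from the last sent one."""
    if last_sent is None:
        return True
    return haversine_m(*last_sent, *current) > threshold_m

print(should_send((23.8103, 90.4125), (23.8103, 90.4125)))  # False: no movement
```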

&lt;h3&gt;
  
  
  2. Adding a Caching System
&lt;/h3&gt;

&lt;p&gt;I set up Redis to handle the constant location updates:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The caching strategy:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The server stores the latest location in Redis with a &lt;strong&gt;2-minute TTL&lt;/strong&gt; to remove stale data&lt;/li&gt;
&lt;li&gt;A background job saves active users' locations to the main database &lt;strong&gt;every 60 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;When a user stops tracking, their &lt;strong&gt;final location is immediately saved&lt;/strong&gt; to the database before marking them inactive. This ensures we never lose the end point of a journey&lt;/li&gt;
&lt;li&gt;Start/stop events instantly update the active-users list in Redis so the map doesn't show inactive devices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Database writes dropped by &lt;strong&gt;90%&lt;/strong&gt;, which saved a lot on server costs.&lt;/p&gt;
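&lt;p&gt;The caching behaviour can be modelled without Redis at all. The sketch below (Python for illustration; the real system uses Redis from Node) keeps the two moving parts, a TTL on each fix and a periodic batch flush, with the clock injected so expiry is easy to see:&lt;/p&gt;

```python
class LocationCache:
    TTL_SECONDS = 120  # the 2-minute TTL from the real Redis setup

    def __init__(self, clock):
        self.clock = clock   # callable returning the current time in seconds
        self.store = {}      # user_id -> (lat, lon, written_at); the Redis stand-in
        self.database = {}   # stand-in for the durable database

    def update(self, user_id, lat, lon):
        self.store[user_id] = (lat, lon, self.clock())

    def active_locations(self):
        """Fixes younger than the TTL; everything else counts as stale."""
        now = self.clock()
        return {uid: (lat, lon)
                for uid, (lat, lon, ts) in self.store.items()
                if self.TTL_SECONDS >= now - ts}

    def flush(self):
        """The 60-second background job: persist active fixes in one batch."""
        self.database.update(self.active_locations())

t = [0]
cache = LocationCache(clock=lambda: t[0])
cache.update("u1", 23.8, 90.4)
t[0] = 200  # past the TTL with no fresh fix from u1
print(cache.active_locations())  # {}: the stale entry has expired
```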

&lt;h3&gt;
  
  
  3. Better GPS Settings
&lt;/h3&gt;

&lt;p&gt;I changed how the phone's GPS works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key changes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;balanced accuracy&lt;/strong&gt; instead of maximum (saves battery)&lt;/li&gt;
&lt;li&gt;Send updates only when movement exceeds &lt;strong&gt;10 meters&lt;/strong&gt; (reduces network usage and battery drain)&lt;/li&gt;
&lt;li&gt;Background updates use OS-recommended &lt;strong&gt;minimum intervals&lt;/strong&gt; to reduce battery drain further&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Battery life improved by &lt;strong&gt;60-70%&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Fixing the Map Display
&lt;/h3&gt;

&lt;p&gt;I optimized map re-rendering by updating only the affected markers and memoizing static elements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key optimizations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only update markers for users whose locations actually changed&lt;/li&gt;
&lt;li&gt;Memoize static map elements to prevent unnecessary re-renders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; The map stayed smooth with no lag, even with many active users.&lt;/p&gt;
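&lt;p&gt;The marker diffing boils down to comparing two snapshots and re-rendering only the difference. Sketched in Python for brevity (the real code lives in the map component):&lt;/p&gt;

```python
def changed_markers(prev, curr):
    """User ids whose marker moved, appeared, or disappeared between snapshots."""
    changed = set()
    for uid in set(prev) | set(curr):
        if prev.get(uid) != curr.get(uid):
            changed.add(uid)
    return changed

prev = {"a": (1.0, 2.0), "b": (3.0, 4.0)}
curr = {"a": (1.0, 2.0), "b": (3.5, 4.0), "c": (5.0, 6.0)}
print(sorted(changed_markers(prev, curr)))  # ['b', 'c']: 'a' is left untouched
```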

&lt;h2&gt;
  
  
  📊 The Results
&lt;/h2&gt;

&lt;p&gt;After all these changes, the improvements were significant:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Battery life&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;60-70% better&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Socket.IO traffic&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;60% reduction&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database writes&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;90% reduction&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server costs&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50% reduction&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Map performance&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No lag&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data integrity&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Zero data loss&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  💡 What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Real-time doesn't mean every second:&lt;/strong&gt; Most apps don't need constant updates. Sending data only when the user has moved a meaningful distance (10+ meters) saves battery and network resources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Only sending what changed:&lt;/strong&gt; Broadcasting only the updated user's location instead of everyone's data was a game-changer for network efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use caching wisely:&lt;/strong&gt; Redis helped handle the constant flow of updates without overloading the database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPS accuracy costs battery:&lt;/strong&gt; High accuracy mode drains battery really fast. Balanced mode works great for most cases and users can't tell the difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handle stop events properly:&lt;/strong&gt; Immediately saving the final location when users stop tracking prevents data loss and ensures complete trip records.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filter on the client when possible:&lt;/strong&gt; Processing location changes on the phone before sending them to the server reduces network traffic and server load.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔄 How It All Works Now
&lt;/h2&gt;

&lt;p&gt;Here's the complete flow:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐
│  User starts    │
│   tracking      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Phone GPS with  │
│ balanced mode   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Filter: Only    │
│ send if moved   │
│ 10+ meters      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Server stores   │
│ in Redis        │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Broadcast only  │
│ changed user    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Background job  │
│ saves to DB     │
│ every 60s       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ User stops:     │
│ Immediate save  │
│ to database     │
└─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Detailed steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User starts tracking:&lt;/strong&gt; Phone GPS begins collecting location with balanced accuracy mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Location filtering:&lt;/strong&gt; Phone only sends updates to server when movement exceeds 10 meters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server processing:&lt;/strong&gt; Server stores the latest location in Redis and immediately broadcasts only the changed user's data via Socket.IO&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Periodic saves:&lt;/strong&gt; Background job runs every 60 seconds, saving all active users' locations from Redis to the database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User stops tracking:&lt;/strong&gt; Server immediately saves the final location to database with &lt;code&gt;active=false&lt;/code&gt;, removes user from active list in Redis, and broadcasts stop event to other users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New user joins:&lt;/strong&gt; When a user opens the app, the server fetches all current active locations from Redis and sends them once as an initial payload&lt;/li&gt;
&lt;/ol&gt;
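&lt;p&gt;The steps above can be compressed into a toy end-to-end model, with Socket.IO replaced by plain in-memory inboxes and Redis by a dict (Python for illustration; every name here is made up for the sketch). A new client gets one snapshot on join, and an update fans out only the changed user's fix:&lt;/p&gt;

```python
class Tracker:
    def __init__(self):
        self.latest = {}   # user_id -> (lat, lon); the Redis stand-in
        self.clients = {}  # client_id -> list of delivered events

    def join(self, client_id):
        # Step 6: one full snapshot as the initial payload.
        self.clients[client_id] = [("snapshot", dict(self.latest))]

    def update(self, user_id, lat, lon):
        # Step 3: cache the fix, then broadcast only this user's change.
        self.latest[user_id] = (lat, lon)
        for cid, inbox in self.clients.items():
            if cid != user_id:
                inbox.append(("update", user_id, (lat, lon)))

tracker = Tracker()
tracker.join("alice")
tracker.update("bob", 23.8, 90.4)
print(tracker.clients["alice"][-1])  # ('update', 'bob', (23.8, 90.4))
```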

&lt;h2&gt;
  
  
  🎯 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Fixing location tracking isn't about finding one magic solution—it's about making thoughtful decisions at each layer of the system. From GPS collection to network transmission to database storage, small improvements compound into significant gains.&lt;br&gt;
I learned that handling edge cases (like user stops and starts) and filtering unnecessary updates early in the pipeline can prevent much bigger problems downstream.&lt;br&gt;
These optimizations helped turn a problematic feature into something more reliable and cost-effective, though there's always room to learn and improve further. Every system has its unique constraints, and what worked here might need adjustment for different use cases.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>optimization</category>
      <category>architecture</category>
      <category>node</category>
    </item>
  </channel>
</rss>
