<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: lowkey dev</title>
    <description>The latest articles on DEV Community by lowkey dev (@lowkey_dev_591).</description>
    <link>https://dev.to/lowkey_dev_591</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3236539%2F4508123a-eb14-4b6f-8c87-3edb9cf352a1.jpg</url>
      <title>DEV Community: lowkey dev</title>
      <link>https://dev.to/lowkey_dev_591</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lowkey_dev_591"/>
    <language>en</language>
    <item>
      <title>How I Saved My System Through Peak Season</title>
      <dc:creator>lowkey dev</dc:creator>
      <pubDate>Sun, 21 Sep 2025 04:02:46 +0000</pubDate>
      <link>https://dev.to/lowkey_dev_591/how-i-saved-my-system-through-peak-season-3m79</link>
      <guid>https://dev.to/lowkey_dev_591/how-i-saved-my-system-through-peak-season-3m79</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: Peak Season and the Challenge Ahead
&lt;/h2&gt;

&lt;p&gt;The travel season was here, and the atmosphere at our company was hotter than the sun outside. Our system—the heartbeat of all operations—was about to face &lt;strong&gt;peak traffic&lt;/strong&gt; 8–10 times higher than usual. I opened my laptop and accessed the dashboard like a normal user, but immediately felt the pressure: everything was slow and laggy, each click sent a flurry of requests that were hard to control.&lt;/p&gt;

&lt;p&gt;Every analytics table, every chart was a potential “CPU and memory bomb.” The &lt;strong&gt;server was under stress&lt;/strong&gt;, and an OOM (Out of Memory) crash was almost guaranteed if traffic kept spiking. This marked the start of my journey to save the system, where every decision would directly affect the user experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  Investigating the Frontend: The Tip of the Iceberg
&lt;/h2&gt;

&lt;p&gt;Opening the browser dev tools (F12), I saw hundreds of requests continuously hitting endpoints, many fetching entire customer, transaction, and payment tables. The dashboard tried to compute everything in real-time, and CPU and memory jumped with every click.&lt;/p&gt;

&lt;p&gt;I applied &lt;strong&gt;lazy loading&lt;/strong&gt; for non-critical data, cached some tables temporarily in localStorage, and sacrificed a little smoothness in UX. Instantly, the dashboard became more responsive and the backend felt lighter. But I knew this was just the tip of the iceberg—the real danger was lurking deeper.&lt;/p&gt;




&lt;h2&gt;
  
  
  Investigating the Backend: Where the Pressure Truly Lies
&lt;/h2&gt;

&lt;p&gt;The frontend was only the tip of the iceberg. I opened server logs, enabled APM, and tracked &lt;strong&gt;slow queries&lt;/strong&gt; and profiling metrics. Many endpoints computed analytics in real-time on massive tables. &lt;strong&gt;Read-heavy queries&lt;/strong&gt; were unoptimized, fetching all data on each dashboard load and sending CPU and memory into overdrive.&lt;/p&gt;

&lt;p&gt;I tried &lt;strong&gt;precomputing&lt;/strong&gt; heavy metrics and storing them in Redis. Initially, data was a few minutes behind real-time, making me anxious, but the dashboard ran smoothly and the backend stabilized. A clear &lt;strong&gt;trade-off&lt;/strong&gt;: sacrificing some accuracy to save the system. Redis hit rates increased, and I felt both relief and tension.&lt;/p&gt;
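&lt;p&gt;A minimal sketch of the precompute-and-cache idea, using an in-process dictionary in place of Redis; the names and the 5-minute TTL are illustrative, not the production code:&lt;/p&gt;

```python
import time

class MetricCache:
    """Tiny stand-in for Redis: caches precomputed metrics with a TTL."""
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, computed_at)

    def get_or_compute(self, key, compute):
        entry = self.store.get(key)
        now = time.time()
        if entry is None or (now - entry[1]) > self.ttl:
            value = compute()              # the heavy query runs only on miss or expiry
            self.store[key] = (value, now)
            return value
        return entry[0]                    # cache hit: possibly a few minutes stale

cache = MetricCache(ttl_seconds=300)
revenue = cache.get_or_compute("daily_revenue", lambda: sum([120, 80, 40]))
print(revenue)  # 240 -- repeat calls within the TTL return the cached value
```

&lt;p&gt;The TTL is exactly the trade-off described above: a longer TTL means a lighter backend but staler charts.&lt;/p&gt;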




&lt;h2&gt;
  
  
  CQRS and Read-Heavy Queries: A Long-Term Solution
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;read-heavy queries&lt;/strong&gt; continued to stress the server. I tried scaling MySQL, adding replicas, increasing RAM—but memory spikes still occurred. I decided to implement &lt;strong&gt;CQRS&lt;/strong&gt;, separating write and read operations, using OpenSearch to serve read-heavy queries.&lt;/p&gt;

&lt;p&gt;Data synchronization was complex and the logic intricate, but the dashboard finally responded fast and reliably. Complexity increased—more services in the codebase, listeners syncing data, added monitoring for OpenSearch, Redis, and MySQL. Yet the heavy analytics tables now ran smoothly, and CPU and memory no longer jumped wildly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Precomputing the Dashboard: Sacrificing Realtime
&lt;/h2&gt;

&lt;p&gt;The most critical analytics tables, if computed in real-time, would leave the &lt;strong&gt;server under stress&lt;/strong&gt; and prone to crashing. I precomputed results and stored them in Redis. When &lt;strong&gt;peak traffic&lt;/strong&gt; hit, the dashboard ran smoothly, though data was no longer fully real-time. I remember clicking through the dashboard and seeing charts lag by a few minutes—a &lt;strong&gt;trade-off&lt;/strong&gt; worth accepting to keep the system alive.&lt;/p&gt;

&lt;p&gt;Exports and dashboard queries now returned data from Redis lightning-fast; CPU dropped from 95% to 60%, and memory stabilized.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cache Promise, Request Coalescing, and Pre-Warming
&lt;/h2&gt;

&lt;p&gt;Before peak traffic, many concurrent requests hitting the same data made Redis and the database shaky. I implemented &lt;strong&gt;Cache Promise&lt;/strong&gt; and &lt;strong&gt;request coalescing&lt;/strong&gt;, merging multiple requests so that only one query actually hit the database. The code became more complex, but the backend stood firm—I felt like we had weathered a storm.&lt;/p&gt;
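&lt;p&gt;Request coalescing can be sketched in a few lines with asyncio: concurrent requests for the same key share one in-flight task, so only the first actually hits the database. The &lt;code&gt;Coalescer&lt;/code&gt; name and the timings are illustrative:&lt;/p&gt;

```python
import asyncio

class Coalescer:
    """Merge concurrent requests for the same key into a single backend call."""
    def __init__(self):
        self.inflight = {}  # key -> asyncio.Task of the one real query

    async def fetch(self, key, loader):
        task = self.inflight.get(key)
        if task is None:
            task = asyncio.ensure_future(loader(key))
            self.inflight[key] = task
            # forget the task once it resolves so later requests re-query
            task.add_done_callback(lambda _: self.inflight.pop(key, None))
        return await task

calls = 0

async def load_from_db(key):
    global calls
    calls += 1                    # count how often the "database" is hit
    await asyncio.sleep(0.01)     # simulate a slow query
    return f"rows-for-{key}"

async def main():
    c = Coalescer()
    # ten concurrent dashboard requests for the same data
    return await asyncio.gather(*[c.fetch("dashboard", load_from_db) for _ in range(10)])

results = asyncio.run(main())
print(calls)  # 1 -- only one query reached the database
```

&lt;p&gt;This is the same idea as a “cache promise”: the shared value is the pending task itself, not the finished result.&lt;/p&gt;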

&lt;p&gt;I also scheduled &lt;strong&gt;pre-warming cache jobs&lt;/strong&gt;. The server absorbed a light load during off-peak hours, but when traffic peaked, data was ready. The dashboard stayed smooth, and the backend calmly handled 8–10x traffic without faltering.&lt;/p&gt;




&lt;h2&gt;
  
  
  Request Prioritization and Selective Querying
&lt;/h2&gt;

&lt;p&gt;Some Excel exports or analytics requests used to slow down critical operations. I implemented &lt;strong&gt;bulkhead&lt;/strong&gt; and &lt;strong&gt;request prioritization&lt;/strong&gt;, ensuring critical requests were processed first. Some analytics exports were slower, but the system remained responsive.&lt;/p&gt;

&lt;p&gt;To avoid OOM, I queried only necessary fields and processed large exports in batches. Real-time data integrity was partially sacrificed, but the server survived, the dashboard remained smooth, and the feeling of victory ran through the system.&lt;/p&gt;
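&lt;p&gt;Batch processing like this can be sketched as a generator that pages through the data, so at most one batch is ever in memory; the fake table below stands in for the real query:&lt;/p&gt;

```python
def export_in_batches(fetch_batch, batch_size=1000):
    """Stream a large export batch by batch instead of loading every row at once."""
    offset = 0
    while True:
        rows = fetch_batch(offset, batch_size)  # e.g. SELECT ... LIMIT ? OFFSET ?
        if not rows:
            break
        yield from rows                         # the caller writes rows out as they arrive
        offset += batch_size

# A fake 2,500-row table standing in for the real query:
table = list(range(2500))
fetch = lambda offset, limit: table[offset:offset + limit]
exported = list(export_in_batches(fetch, batch_size=1000))
print(len(exported))  # 2500, fetched in batches of at most 1000 rows
```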




&lt;h2&gt;
  
  
  Monitoring and Alerting: Better Safe Than Sorry
&lt;/h2&gt;

&lt;p&gt;During preparation, I set up continuous &lt;strong&gt;monitoring&lt;/strong&gt;: CPU, memory, Redis hits, OpenSearch query latency, successful and failed request counts. I configured &lt;strong&gt;alerts&lt;/strong&gt; for threshold breaches, so we received warnings before the system truly failed.&lt;/p&gt;
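&lt;p&gt;The threshold checks boil down to something like the sketch below; the metric names and limits are made up for illustration, and a real setup would feed these rules into the monitoring stack rather than hand-rolled code:&lt;/p&gt;

```python
# Illustrative thresholds; real values and the alert channel are deployment-specific.
THRESHOLDS = {"cpu_percent": 80, "memory_percent": 85, "redis_miss_rate": 0.4}

def check_thresholds(metrics):
    """Return a human-readable alert for every breached threshold."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name, 0)
        if value > limit:
            alerts.append(f"{name}={value} exceeds {limit}")
    return alerts

print(check_thresholds({"cpu_percent": 95, "memory_percent": 60, "redis_miss_rate": 0.1}))
# ['cpu_percent=95 exceeds 80']
```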

&lt;p&gt;This way, I didn’t wait for the server to crash to know something was wrong—memory spikes or slow queries were reported immediately, allowing timely intervention.&lt;/p&gt;




&lt;h2&gt;
  
  
  Chaos Testing and Load Testing
&lt;/h2&gt;

&lt;p&gt;Before the peak season, my team ran &lt;strong&gt;load tests&lt;/strong&gt; simulating peak traffic and performed &lt;strong&gt;chaos testing&lt;/strong&gt;, intentionally breaking some services. Through these tests we learned a lot: redundant caches, request queues stacking up, potential deadlocks in the OpenSearch sync listeners. These exercises helped us prepare rollback plans, increase replicas, and adjust batch sizes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Rollout &amp;amp; Hotfix During Peak Hours
&lt;/h2&gt;

&lt;p&gt;One night, during the traffic peak, a minor bug in the precomputed dashboard caused data to lag more than usual. I had to apply a &lt;strong&gt;hotfix&lt;/strong&gt; directly in production, deploying carefully step by step while monitoring Redis and OpenSearch. It was tense and stressful, but once everything stabilized, it felt like we had truly survived a data storm.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion: Lessons Learned
&lt;/h2&gt;

&lt;p&gt;After surviving the peak traffic, the dashboard ran smoothly, the backend was stable, and users were unaffected. Reflecting on the experience, I realized that preparation is everything: setting up monitoring, alerts, load testing, chaos testing, and pre-warming caches beforehand can make the difference between success and disaster.&lt;/p&gt;

&lt;p&gt;Equally important is finding the root cause of issues. It’s easy to patch symptoms, but unless you understand the underlying problems—whether it’s read-heavy queries, unoptimized endpoints, or poorly synchronized data—the system will eventually break under stress.&lt;/p&gt;

&lt;p&gt;Finally, there’s no perfect solution. Every choice involves trade-offs: sacrificing some UX smoothness, accepting minor delays in real-time data, increasing system complexity. Recognizing these trade-offs and planning for them ahead of time is the key to keeping a system alive during high-pressure peak traffic seasons.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>backend</category>
      <category>performance</category>
    </item>
    <item>
      <title>Understanding the Saga Pattern in 5 Minutes</title>
      <dc:creator>lowkey dev</dc:creator>
      <pubDate>Tue, 02 Sep 2025 12:04:03 +0000</pubDate>
      <link>https://dev.to/lowkey_dev_591/understanding-the-saga-pattern-in-5-minutes-3kea</link>
      <guid>https://dev.to/lowkey_dev_591/understanding-the-saga-pattern-in-5-minutes-3kea</guid>
      <description>&lt;p&gt;If you are new to &lt;strong&gt;microservices&lt;/strong&gt;, you’ve probably heard of the &lt;strong&gt;Saga Pattern&lt;/strong&gt; – a &lt;strong&gt;design pattern for managing distributed transactions in microservices&lt;/strong&gt;. It helps services coordinate smoothly, maintain data consistency, and achieve &lt;strong&gt;eventual consistency&lt;/strong&gt; even when a service fails. This article will help you &lt;strong&gt;quickly understand the Saga Pattern&lt;/strong&gt;, with clear examples and fundamental technical concepts.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;1. Context and Problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In &lt;strong&gt;traditional (monolithic) systems&lt;/strong&gt;, you can use a &lt;strong&gt;transaction&lt;/strong&gt; to ensure data consistency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If all steps succeed → &lt;strong&gt;commit&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If any step fails → &lt;strong&gt;rollback&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example of order processing in a monolithic system:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deduct customer payment&lt;/li&gt;
&lt;li&gt;Deduct product inventory&lt;/li&gt;
&lt;li&gt;Send confirmation email&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All steps are in &lt;strong&gt;one transaction&lt;/strong&gt;, so if any step fails → rollback everything, keeping the data &lt;strong&gt;consistent&lt;/strong&gt;.&lt;/p&gt;
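&lt;p&gt;The commit-or-rollback behaviour is easy to see with SQLite as a stand-in for any transactional database: if any step raises, everything inside the transaction is undone:&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES ('customer', 100), ('shop', 0)")
con.commit()

try:
    with con:  # one transaction: commit on success, rollback on any exception
        con.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'customer'")
        raise RuntimeError("inventory step failed")  # simulate step 2 failing
except RuntimeError:
    pass

balance = con.execute("SELECT balance FROM accounts WHERE name = 'customer'").fetchone()[0]
print(balance)  # 100 -- the payment deduction was rolled back automatically
```

&lt;p&gt;This is exactly what a single database gives you for free, and what microservices lose once each step lives in its own database.&lt;/p&gt;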

&lt;p&gt;However, in &lt;strong&gt;microservices&lt;/strong&gt;, each step is usually managed by a &lt;strong&gt;separate service&lt;/strong&gt; with its &lt;strong&gt;own database&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Payment Service:&lt;/strong&gt; deduct money&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inventory Service:&lt;/strong&gt; deduct stock&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notification Service:&lt;/strong&gt; send email&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a step fails, previous steps may have already &lt;strong&gt;committed&lt;/strong&gt;, leading to &lt;strong&gt;data inconsistency&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example: the customer is charged, but the &lt;strong&gt;product is out of stock&lt;/strong&gt;, or the confirmation email was not sent.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;problem the Saga Pattern solves&lt;/strong&gt;: helping services in microservices &lt;strong&gt;coordinate smoothly&lt;/strong&gt; and &lt;strong&gt;keep data consistent even in case of failures&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;2. What is the Saga Pattern?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Saga Pattern&lt;/strong&gt; is a &lt;strong&gt;design pattern for managing distributed transactions in microservices&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of using a &lt;strong&gt;traditional transaction&lt;/strong&gt; (rollback everything if one step fails), &lt;strong&gt;each service manages its own transaction&lt;/strong&gt;, and if a subsequent step fails, the system performs &lt;strong&gt;compensation&lt;/strong&gt; for previous steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example of an online order:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Payment Service:&lt;/strong&gt; deducts money → success&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inventory Service:&lt;/strong&gt; deducts stock → fails (out of stock)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notification Service:&lt;/strong&gt; sends email → not executed&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without Saga Pattern:&lt;/strong&gt; Payment Service already charged the customer → the customer loses money but gets no product&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With Saga Pattern:&lt;/strong&gt; Inventory Service fails → Payment Service &lt;strong&gt;refunds&lt;/strong&gt;, email not sent → avoids confusion&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Core idea: Each step &lt;strong&gt;takes responsibility&lt;/strong&gt; and has a &lt;strong&gt;compensation mechanism&lt;/strong&gt;, allowing steps in a &lt;strong&gt;distributed transaction&lt;/strong&gt; to coordinate without breaking the system.&lt;/p&gt;
&lt;/blockquote&gt;
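&lt;p&gt;The core idea can be sketched as a small saga runner that records each completed step with its compensation and unwinds them in reverse on failure; all names here are illustrative:&lt;/p&gt;

```python
def run_saga(steps):
    """Run saga steps in order; on failure, run compensations in reverse."""
    done = []
    for name, action, compensate in steps:
        try:
            action()
            done.append((name, compensate))
        except RuntimeError:
            for _, undo in reversed(done):
                undo()                      # compensate already-committed steps
            return f"failed at {name}, compensated: {[n for n, _ in reversed(done)]}"
    return "completed"

log = []

def fail_out_of_stock():
    raise RuntimeError("out of stock")

steps = [
    ("payment",   lambda: log.append("charged"), lambda: log.append("refunded")),
    ("inventory", fail_out_of_stock,             lambda: None),
    ("email",     lambda: log.append("emailed"), lambda: None),
]
result = run_saga(steps)
print(result)  # failed at inventory, compensated: ['payment']
print(log)     # ['charged', 'refunded'] -- the email step never ran
```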




&lt;h2&gt;
  
  
  &lt;strong&gt;3. Two Approaches to Implement Saga Pattern&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.1 Event-Driven Saga&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each step &lt;strong&gt;emits an event&lt;/strong&gt; on success or failure&lt;/li&gt;
&lt;li&gt;The next step &lt;strong&gt;listens to events&lt;/strong&gt; to decide whether to execute or compensate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Payment Service deducts money → emits event &lt;code&gt;"PaymentSuccess"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Inventory Service listens → deducts stock&lt;/li&gt;
&lt;li&gt;If Inventory Service fails → emits event &lt;code&gt;"InventoryFailed"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Payment Service listens → performs &lt;strong&gt;refund&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No central orchestrator needed; services &lt;strong&gt;coordinate flexibly on their own&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Easy to scale when adding new services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hard to track the overall transaction state&lt;/li&gt;
&lt;li&gt;Susceptible to &lt;strong&gt;duplicate or delayed events&lt;/strong&gt;, requiring &lt;strong&gt;idempotency&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3.2 Orchestration Saga&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;central orchestrator&lt;/strong&gt; coordinates all steps&lt;/li&gt;
&lt;li&gt;If a step fails, the orchestrator commands &lt;strong&gt;rollback of previous steps&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Orchestrator commands Payment Service → success&lt;/li&gt;
&lt;li&gt;Orchestrator commands Inventory Service → fails&lt;/li&gt;
&lt;li&gt;Orchestrator commands Payment Service &lt;strong&gt;refund&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Notification Service does not send email&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy to manage complex processes, centralized state control&lt;/li&gt;
&lt;li&gt;Easier to track and reduce risk of &lt;strong&gt;duplicate or missing events&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Orchestrator becomes a &lt;strong&gt;single point of failure&lt;/strong&gt;; if it fails or lags → affects the entire transaction&lt;/li&gt;
&lt;li&gt;Adds a central component → increases deployment complexity&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;4. Illustrative Example: Online Order&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Assume a 3-step order process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Payment Service:&lt;/strong&gt; deduct customer money → success&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inventory Service:&lt;/strong&gt; deduct stock → fails (out of stock)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notification Service:&lt;/strong&gt; send email → not executed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Without Saga Pattern:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Payment Service already charged → customer loses money but gets no product&lt;/li&gt;
&lt;li&gt;Inventory Service fails → data inconsistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With Saga Pattern (Event-Driven or Orchestration):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inventory Service fails → Payment Service &lt;strong&gt;refunds&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Email not sent → avoids confusion&lt;/li&gt;
&lt;li&gt;Process remains &lt;strong&gt;consistent&lt;/strong&gt;, ensuring good customer experience&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Saga Pattern allows &lt;strong&gt;each step in a distributed transaction to be independent&lt;/strong&gt; while still coordinating effectively, ensuring &lt;strong&gt;data consistency and good user experience&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;5. Key Technical Terms&lt;/strong&gt;
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transaction:&lt;/strong&gt; A sequence of operations on data that ensures &lt;strong&gt;ACID&lt;/strong&gt; (Atomicity, Consistency, Isolation, Durability).&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; Transferring money from account A to B; if deducting A succeeds but adding B fails → rollback.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Distributed Transaction:&lt;/strong&gt; A transaction spanning multiple services or separate databases, requiring &lt;strong&gt;compensation or eventual consistency&lt;/strong&gt;.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; Online order: Payment Service deducts money, Inventory Service deducts stock, Notification Service sends email.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Saga Pattern:&lt;/strong&gt; Design pattern managing &lt;strong&gt;distributed transactions&lt;/strong&gt; by performing &lt;strong&gt;compensation&lt;/strong&gt; if a subsequent step fails.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; Inventory Service reports out of stock → Payment Service refunds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compensation:&lt;/strong&gt; Undo a committed step if another step fails.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; Payment Service deducted money but Inventory Service fails → Payment Service refunds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Event:&lt;/strong&gt; Asynchronous message between services indicating transaction status.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; Payment Service sends &lt;code&gt;"PaymentSuccess"&lt;/code&gt;, Inventory Service listens and deducts stock.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Orchestrator:&lt;/strong&gt; Central component in &lt;strong&gt;Orchestration Saga&lt;/strong&gt; coordinating steps and rollbacks.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; Orchestrator commands Payment → Inventory → rollback if necessary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Partial Failure:&lt;/strong&gt; One step in a distributed transaction fails while others have committed.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; Payment Service succeeds, but Inventory Service reports out of stock.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consistency:&lt;/strong&gt; Data always satisfies business rules after a transaction.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; After ordering, total money deducted = total order price, stock decreases correctly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Eventual Consistency:&lt;/strong&gt; The system will become consistent over time, not immediately.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; Payment Service commits first, Inventory Service commits later, overall state eventually correct.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Idempotency:&lt;/strong&gt; Performing an operation multiple times &lt;strong&gt;does not corrupt data&lt;/strong&gt;, preventing duplicate events.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;"PaymentSuccess"&lt;/code&gt; event sent twice → Payment Service only deducts once.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Orchestration Saga:&lt;/strong&gt; Saga Pattern implemented with a &lt;strong&gt;central orchestrator&lt;/strong&gt; coordinating steps.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; Orchestrator commands Payment → Inventory → Notification; rollback if Inventory fails.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Event-Driven Saga:&lt;/strong&gt; Saga Pattern implemented with &lt;strong&gt;each service managing its own transaction&lt;/strong&gt;, emitting/listening to events without a central coordinator.&lt;br&gt;
&lt;em&gt;Example:&lt;/em&gt; Payment sends &lt;code&gt;"PaymentSuccess"&lt;/code&gt; → Inventory deducts stock → Inventory sends &lt;code&gt;"InventoryFailed"&lt;/code&gt; → Payment refunds.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
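&lt;p&gt;Of these terms, idempotency is the one you need from day one of an Event-Driven Saga. A minimal sketch, deduplicating by event id (a real consumer would persist the processed ids, not keep them in a set):&lt;/p&gt;

```python
processed = set()  # in production this would be a durable store, not an in-memory set

def handle_payment_success(event):
    """Process each event at most once, keyed by its id (idempotent consumer)."""
    if event["id"] in processed:
        return "skipped duplicate"
    processed.add(event["id"])
    return f"deducted stock for order {event['order_id']}"

evt = {"id": "evt-42", "order_id": "o-1"}
print(handle_payment_success(evt))  # deducted stock for order o-1
print(handle_payment_success(evt))  # skipped duplicate -- same event delivered twice
```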




&lt;h2&gt;
  
  
  &lt;strong&gt;6. Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The Saga Pattern is &lt;strong&gt;a design pattern for managing distributed transactions in microservices&lt;/strong&gt;, helping:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each service &lt;strong&gt;manages its own transaction&lt;/strong&gt; and can perform &lt;strong&gt;compensation&lt;/strong&gt; in case of failure&lt;/li&gt;
&lt;li&gt;Services remain &lt;strong&gt;independent but coordinated&lt;/strong&gt;, ensuring &lt;strong&gt;overall process stability&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Reduces risk while preserving &lt;strong&gt;data consistency&lt;/strong&gt;, a &lt;strong&gt;good user experience&lt;/strong&gt;, and &lt;strong&gt;continuous operation&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Saga Pattern is an &lt;strong&gt;essential design pattern&lt;/strong&gt; that makes complex systems &lt;strong&gt;efficient, reliable, and easier to manage&lt;/strong&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Saga Pattern: When Theory Collides with Reality</title>
      <dc:creator>lowkey dev</dc:creator>
      <pubDate>Tue, 02 Sep 2025 09:27:41 +0000</pubDate>
      <link>https://dev.to/lowkey_dev_591/saga-pattern-when-theory-collides-with-reality-4enj</link>
      <guid>https://dev.to/lowkey_dev_591/saga-pattern-when-theory-collides-with-reality-4enj</guid>
      <description>&lt;p&gt;You start your computer, open your IDE, ready to implement the order flow in your microservices. In your mind, you still have a clear picture of what you read about the &lt;strong&gt;Saga Pattern&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Oh, easy. Each service handles its own transaction, if it fails, just rollback using a compensate. Eventual consistency? No problem, Saga’s got it covered.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sounds neat, sounds simple… but when you actually start coding, you realize nothing is that smooth.&lt;/p&gt;

&lt;p&gt;You imagine the ideal flow in your head:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Order Service&lt;/strong&gt; creates an order.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payment Service&lt;/strong&gt; deducts money.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inventory Service&lt;/strong&gt; reduces stock.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shipping Service&lt;/strong&gt; creates a shipment.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the books, if any step fails → compensate → everything returns to the original state, and the system is perfect. In your mind, it’s &lt;strong&gt;a smooth dance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But in reality… it’s a &lt;strong&gt;completely different dance&lt;/strong&gt;. A network timeout, a duplicate event, or an imperfect compensate, and the dance quickly becomes… &lt;strong&gt;an operational nightmare&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1afg6ma2ntfq94goa7eu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1afg6ma2ntfq94goa7eu.png" alt="image.png" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Partial Failure – The First Shock
&lt;/h2&gt;

&lt;p&gt;You imagine: &lt;strong&gt;Payment Service successfully deducts money&lt;/strong&gt;, but &lt;strong&gt;Order Service hasn’t received the event&lt;/strong&gt; due to a network timeout.&lt;/p&gt;

&lt;p&gt;Result? &lt;strong&gt;The customer lost money, but the order hasn’t been created.&lt;/strong&gt; You try retrying, but it gets worse: duplicate events → money deducted twice, wrong stock reduction, double shipment.&lt;/p&gt;

&lt;p&gt;Partial failure and duplicate events are &lt;strong&gt;not exceptions&lt;/strong&gt;, they are the reality in microservices.&lt;/p&gt;

&lt;p&gt;You realize: if partial failures are already complex, can &lt;strong&gt;rollback and compensate&lt;/strong&gt; really save the day?&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Compensate – When Rollback Is Never Perfect
&lt;/h2&gt;

&lt;p&gt;Books teach: rollback is just calling a compensate function → everything returns to the original state.&lt;/p&gt;

&lt;p&gt;Reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Email already sent&lt;/strong&gt; → can’t undo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shipment label created&lt;/strong&gt; → can’t reverse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Third-party booking&lt;/strong&gt; → rollback almost impossible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: a service sends a payment confirmation SMS. If the transaction fails, you can’t “take back” the SMS. A compensate can only &lt;strong&gt;make up for it with another action&lt;/strong&gt;, like sending a cancellation notice or issuing a credit.&lt;/p&gt;

&lt;p&gt;Saga is &lt;strong&gt;not magic&lt;/strong&gt;. Compensate is only &lt;strong&gt;approximate&lt;/strong&gt;, sometimes requiring &lt;strong&gt;manual intervention&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But the story doesn’t stop there. If states aren’t synchronized, what does the customer see? This is when &lt;strong&gt;Eventual Consistency&lt;/strong&gt; comes into play.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Eventual Consistency – The Inevitable Trade-Off
&lt;/h2&gt;

&lt;p&gt;Data will eventually be consistent, but customers might see: “Processing…” while &lt;strong&gt;money is deducted, but order isn’t created&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You realize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UX must hide temporary states.&lt;/li&gt;
&lt;li&gt;The system needs monitoring, retries, reconciliation.&lt;/li&gt;
&lt;li&gt;Alerts must be clear.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Eventual consistency &lt;strong&gt;isn’t free&lt;/strong&gt;. It requires accepting &lt;strong&gt;temporary risk&lt;/strong&gt;. Otherwise, you’ll face &lt;strong&gt;a flood of support tickets&lt;/strong&gt; from customers.&lt;/p&gt;

&lt;p&gt;While working out the UX, a question arises: &lt;strong&gt;should the flow be managed by a central “director,” or should the services handle it themselves?&lt;/strong&gt; This is where &lt;strong&gt;Orchestration vs. Choreography&lt;/strong&gt; comes in.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Orchestration or Choreography – A Painful Choice
&lt;/h2&gt;

&lt;p&gt;You must choose:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;Orchestration&lt;/th&gt;
&lt;th&gt;Choreography&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Debug &amp;amp; Monitoring&lt;/td&gt;
&lt;td&gt;Easy to track Saga states&lt;/td&gt;
&lt;td&gt;Hard to debug, needs detailed logging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single Point of Failure&lt;/td&gt;
&lt;td&gt;Has orchestrator&lt;/td&gt;
&lt;td&gt;No SPoF, distributed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate Event&lt;/td&gt;
&lt;td&gt;Easy to control&lt;/td&gt;
&lt;td&gt;Likely, requires idempotency &amp;amp; retry queue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flexibility&lt;/td&gt;
&lt;td&gt;Fixed flow, less flexible&lt;/td&gt;
&lt;td&gt;Flexible when adding/removing services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment &amp;amp; Scaling&lt;/td&gt;
&lt;td&gt;Orchestrator requires special scaling&lt;/td&gt;
&lt;td&gt;Each service can scale independently&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Example: you want to add a service to send promotional vouchers after order completion.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Orchestration: update orchestrator flow, easy to control.&lt;/li&gt;
&lt;li&gt;Choreography: add a listener for the event, but must ensure idempotency and retry queue; errors arise if events are delayed or duplicated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You realize: &lt;strong&gt;there is no perfect choice&lt;/strong&gt;. Easier debugging, or no single point of failure? Temporary inconsistency, or strict consistency? Saga isn’t just a technique – it’s a &lt;strong&gt;constant trade-off&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And when you consider it, a red warning flashes: &lt;strong&gt;Saga won’t always save the day&lt;/strong&gt;, especially in systems requiring &lt;strong&gt;strong consistency&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Saga Is Not a Solution for Every Case
&lt;/h2&gt;

&lt;p&gt;Imagine: a &lt;strong&gt;bank&lt;/strong&gt;, transferring money between two accounts. You decide to use Saga: deduct money from A, add to B, log the transaction.&lt;/p&gt;

&lt;p&gt;At first, you are confident: any step fails → compensate → all good.&lt;/p&gt;

&lt;p&gt;Then disaster strikes. Payment Service deducted the money, but Ledger Service hasn’t received the event. Customers panic, support is busy. Compensate? Doesn’t help. Only &lt;strong&gt;manual intervention&lt;/strong&gt; can save it.&lt;/p&gt;

&lt;p&gt;Now you understand: &lt;strong&gt;Saga is not suitable for banking transactions&lt;/strong&gt;. A safer solution: &lt;strong&gt;2-Phase Commit (2PC)&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2PC ensures strong consistency&lt;/strong&gt;: commit synchronously, fail → rollback immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoids dangerous partial failures&lt;/strong&gt;: customers don’t see temporary wrong balances.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Absolute integrity&lt;/strong&gt;: critical transactions are always correct.&lt;/li&gt;
&lt;/ul&gt;
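&lt;p&gt;As a toy sketch, 2PC is a coordinator that collects a vote from every participant before anything commits; this deliberately ignores the hard parts (coordinator crashes, participants blocked holding locks), which is exactly the availability price 2PC pays for strong consistency:&lt;/p&gt;

```python
class Participant:
    """One resource in a two-phase commit; `can_commit` models its vote."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "pending"

    def prepare(self):
        return self.can_commit          # phase 1: vote yes/no, hold locks

    def commit(self):
        self.state = "committed"        # phase 2a: everyone voted yes

    def rollback(self):
        self.state = "rolled_back"      # phase 2b: at least one voted no

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled_back"

accounts = [Participant("account_a"), Participant("account_b", can_commit=False)]
print(two_phase_commit(accounts))   # rolled_back
print([p.state for p in accounts])  # ['rolled_back', 'rolled_back'] -- no partial failure
```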

&lt;p&gt;Lesson: choose the wrong tool, and microservices can turn into &lt;strong&gt;an operational nightmare&lt;/strong&gt;, even if you just wanted “to apply a cool technique.”&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Real Lessons from Applying Saga
&lt;/h2&gt;

&lt;p&gt;After all the shocks from &lt;strong&gt;partial failures, approximate compensation, duplicate events&lt;/strong&gt;, and choosing a deployment model, you begin to draw some “painful” lessons.&lt;/p&gt;

&lt;p&gt;You recall the first time you deployed Saga: events arrived late, compensations fired in the wrong order, customers kept calling support. Only then did you understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uncontrolled retries = disaster. Idempotency is mandatory.&lt;/li&gt;
&lt;li&gt;Compensation can’t save everything. It only reduces risk; sometimes manual intervention is still needed.&lt;/li&gt;
&lt;li&gt;Customers will see temporarily inconsistent states, so the UX must be clever, alerts clear, and reconciliation always ready.&lt;/li&gt;
&lt;li&gt;No deployment model is a perfect choice. Orchestration is easier to debug but adds a SPoF; Choreography is decentralized but hard to trace. Choose per flow, not on a whim.&lt;/li&gt;
&lt;li&gt;Saga is not for every system. If the business requires strong consistency – e.g., banking – 2PC or other synchronous transactions are safer.&lt;/li&gt;
&lt;/ul&gt;
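&lt;p&gt;The “idempotency is mandatory” point can be sketched as a consumer that remembers processed event IDs, so a duplicate delivery becomes a no-op. In production the set of seen IDs would live in a durable store (DB, Redis); the in-memory version below is purely illustrative:&lt;/p&gt;

```java
import java.util.HashSet;
import java.util.Set;

// Idempotent event handler sketch: remember processed event IDs so that
// redelivered events do not re-run side effects. The in-memory Set is for
// illustration; a real system persists it durably.
public class IdempotentConsumer {
    private final Set<String> processed = new HashSet<>();

    // Returns true if the event was handled, false if it was a duplicate.
    public boolean handle(String eventId, Runnable action) {
        if (!processed.add(eventId)) {
            return false;  // already seen: skip, don't repeat side effects
        }
        action.run();
        return true;
    }
}
```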

&lt;p&gt;Looking back, you realize: Saga isn’t magic; it’s a &lt;strong&gt;sophisticated tool&lt;/strong&gt;. Applied correctly → it reduces risk and increases flexibility. Applied wrongly → an operational nightmare.&lt;/p&gt;

&lt;p&gt;Most importantly: &lt;strong&gt;don’t use it because it’s “cool,” use it because it truly fits your business needs.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Saga Pattern is a &lt;strong&gt;powerful tool&lt;/strong&gt; for &lt;strong&gt;complex distributed transactions&lt;/strong&gt;, but &lt;strong&gt;not a solution for every problem&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand &lt;strong&gt;trade-offs and edge cases&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Prepare &lt;strong&gt;monitoring, alerting, retry, reconciliation&lt;/strong&gt;, and even &lt;strong&gt;manual intervention&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Choose between &lt;strong&gt;Orchestration and Choreography&lt;/strong&gt; based on flow, debugging, SPoF.&lt;/li&gt;
&lt;li&gt;Evaluate &lt;strong&gt;system specifics before deploying Saga&lt;/strong&gt;, avoiding environments needing strong consistency, where &lt;strong&gt;2PC or synchronous transactions&lt;/strong&gt; are safer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After reading this, you’ll ask yourself:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Does this business really need Saga, or am I just adding complexity for myself?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Understanding this, you can implement Saga &lt;strong&gt;safely, flexibly, effectively&lt;/strong&gt;, instead of getting caught in &lt;strong&gt;an entirely avoidable operational nightmare&lt;/strong&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Be careful with retries — don't DDoS your own system</title>
      <dc:creator>lowkey dev</dc:creator>
      <pubDate>Sun, 22 Jun 2025 16:13:46 +0000</pubDate>
      <link>https://dev.to/lowkey_dev_591/be-careful-with-retries-dont-ddos-your-own-system-i6a</link>
      <guid>https://dev.to/lowkey_dev_591/be-careful-with-retries-dont-ddos-your-own-system-i6a</guid>
      <description>&lt;p&gt;&lt;strong&gt;Retry isn't bad. But used incorrectly, you could unknowingly become a "DDoS hacker"... of your own system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Retry — the mechanism of repeating a request upon failure — is a crucial part of distributed system design. When one API call to another service fails due to network errors, timeouts, or temporary issues, retries are often configured to increase the chance of success.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;From a supporting mechanism, retry can easily turn into the culprit of a domino failure effect if left uncontrolled.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  1. When Retry Is a Double-Edged Sword
&lt;/h2&gt;

&lt;p&gt;Imagine a simple scenario:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service A calls Service B.&lt;/li&gt;
&lt;li&gt;Service B is under heavy load and returns a 503 (Service Unavailable).&lt;/li&gt;
&lt;li&gt;Service A retries 3 times, with a 100ms delay between each attempt.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now suppose 1000 requests hit Service A at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each request makes 4 calls to Service B (1 original + 3 retries).&lt;/li&gt;
&lt;li&gt;Total: &lt;strong&gt;1000 × 4 = 4000 requests&lt;/strong&gt; to Service B.&lt;/li&gt;
&lt;li&gt;While Service B is already overloaded, these retries &lt;strong&gt;choke it completely&lt;/strong&gt;, leading to &lt;strong&gt;cascading failure&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Uncontrolled retries = shooting yourself in the foot.&lt;/p&gt;
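&lt;p&gt;The arithmetic above generalizes: total downstream load is the number of concurrent requests times (1 + retries per request). A trivial sketch (the class name is illustrative):&lt;/p&gt;

```java
// Retry amplification: each failing request costs 1 original call + N retries.
public class RetryMath {
    public static int totalCalls(int concurrentRequests, int retriesPerRequest) {
        return concurrentRequests * (1 + retriesPerRequest);
    }

    public static void main(String[] args) {
        // 1000 concurrent requests, 3 retries each -> 4000 downstream calls.
        System.out.println(totalCalls(1000, 3));
    }
}
```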

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzuhtzervu0nu04v0gmg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzuhtzervu0nu04v0gmg.png" alt="image.png" width="278" height="181"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Dangerous Retry Patterns
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Retry without delay&lt;/strong&gt;&lt;br&gt;
→ Causes request storms when errors occur.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simultaneous retries from multiple instances&lt;/strong&gt;&lt;br&gt;
→ Multiple services retrying at once → sudden traffic spikes → downstream crashes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infinite retries&lt;/strong&gt;&lt;br&gt;
→ Can cause memory leaks, jammed queues, and unstoppable request storms.&lt;/p&gt;




&lt;h2&gt;
  
  
  3.5 When to Retry and When Not To
&lt;/h2&gt;

&lt;p&gt;Not every error should be retried.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retry if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Temporary issues: timeouts, connection resets&lt;/li&gt;
&lt;li&gt;System errors: HTTP 5xx like 500, 502, 503, 504&lt;/li&gt;
&lt;li&gt;Downstream service is restarting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Do NOT retry if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client errors: 400, 401, 403, 404&lt;/li&gt;
&lt;li&gt;Business logic errors: user not found, insufficient funds, validation failed&lt;/li&gt;
&lt;li&gt;422 – Unprocessable Entity&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Only retry if the error is recoverable.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
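&lt;p&gt;These rules boil down to a tiny classifier: 5xx responses are usually transient and worth retrying, while 4xx responses will fail identically on every attempt. A minimal sketch (class and method names are illustrative):&lt;/p&gt;

```java
// Sketch of a retry decision following the rules above:
// 5xx -> retry (server-side, often transient); 4xx -> don't retry
// (client/business errors repeat identically on every attempt).
public class RetryPolicy {
    public static boolean isRetryable(int httpStatus) {
        return httpStatus >= 500 && httpStatus <= 599;
    }
}
```

&lt;p&gt;In practice you would combine this with exception types (timeouts, connection resets) for errors that never produce a status code at all.&lt;/p&gt;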




&lt;h2&gt;
  
  
  3.6 How to Retry the Right Way
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Limit retry attempts&lt;/strong&gt;&lt;br&gt;
Never retry infinitely. Use a max of 2–3 tries depending on the context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use delay and jitter&lt;/strong&gt;&lt;br&gt;
Add delays between retries (exponential or linear), with jitter to avoid synchronized spikes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Only retry idempotent actions&lt;/strong&gt;&lt;br&gt;
E.g., GET and PUT are safer than POST — avoid duplicate orders or repeated payments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use a circuit breaker&lt;/strong&gt;&lt;br&gt;
Temporarily cut off retries when the downstream service keeps failing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deferred Retry – Smart retries using jobs&lt;/strong&gt;&lt;br&gt;
Instead of retrying immediately, queue the task or store it in a DB, and process later via background jobs. Helps avoid additional load during a system failure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Log everything&lt;/strong&gt;&lt;br&gt;
Record the error reason, retry count, and retry time for easier debugging and alerting.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
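&lt;p&gt;The delay-and-jitter advice is often implemented as “full jitter”: pick a random delay between 0 and an exponentially growing, capped maximum, so many clients never wake up in lockstep. A minimal sketch (base and cap values are illustrative):&lt;/p&gt;

```java
import java.util.Random;

// Exponential backoff with "full jitter": the delay for a given attempt is
// a random value in [0, min(cap, base * 2^attempt)). Values are illustrative.
public class Backoff {
    public static long delayMillis(int attempt, long baseMillis, long capMillis, Random rnd) {
        // Cap the exponent to avoid overflow on large attempt counts.
        long exp = Math.min(capMillis, baseMillis * (1L << Math.min(attempt, 20)));
        // Full jitter: spread retries uniformly so clients don't synchronize.
        return (long) (rnd.nextDouble() * exp);
    }
}
```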




&lt;h2&gt;
  
  
  3.7 How Do You Know When It's Safe to Retry?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use circuit breakers&lt;/strong&gt;&lt;br&gt;
Stop retrying temporarily when a service fails repeatedly. Probe recovery through the half-open state before fully closing the circuit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitor health checks and metrics&lt;/strong&gt;&lt;br&gt;
Check &lt;code&gt;/health&lt;/code&gt; endpoints or tools like Prometheus and Grafana to see if services have recovered.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Respect the &lt;code&gt;Retry-After&lt;/code&gt; header&lt;/strong&gt;&lt;br&gt;
Some APIs return this to indicate the recommended wait time before retrying.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate-limit retries&lt;/strong&gt;&lt;br&gt;
Avoid flooding the service again after it starts recovering.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
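&lt;p&gt;The circuit-breaker state machine referenced above (closed → open → half-open) fits in a few dozen lines. This is a toy sketch, not a substitute for a library like Resilience4j; the threshold, cool-down, and names are invented for illustration:&lt;/p&gt;

```java
// Minimal circuit-breaker sketch: CLOSED -> OPEN after N consecutive
// failures, OPEN -> HALF_OPEN after a cool-down, HALF_OPEN -> CLOSED on a
// successful probe. Thresholds and names are illustrative.
public class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;
    private final int threshold;
    private final long coolDownMillis;

    public CircuitBreaker(int threshold, long coolDownMillis) {
        this.threshold = threshold;
        this.coolDownMillis = coolDownMillis;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= coolDownMillis) {
            state = State.HALF_OPEN;  // cool-down elapsed: let one probe through
        }
        return state != State.OPEN;
    }

    public synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    public synchronized void recordFailure(long nowMillis) {
        failures++;
        if (failures >= threshold || state == State.HALF_OPEN) {
            state = State.OPEN;  // trip (or re-trip after a failed probe)
            openedAt = nowMillis;
        }
    }
}
```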




&lt;h2&gt;
  
  
  4. Tools for Effective Retry Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Java / Spring Ecosystem:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Spring Retry&lt;/strong&gt;&lt;br&gt;
Supports &lt;code&gt;@Retryable&lt;/code&gt;, configurable delays, backoff, and fallback with &lt;code&gt;@Recover&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resilience4j&lt;/strong&gt;&lt;br&gt;
Combines retry, circuit breaker, rate limiter, and bulkhead into one library. Works well with Spring Boot and Micrometer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kafka Retry Topic&lt;/strong&gt;&lt;br&gt;
Separate retry topics with delay, avoids blocking the main consumer. Combine with dead-letter topics for reliability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quartz / Spring Task&lt;/strong&gt;&lt;br&gt;
Schedule deferred retries using background jobs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Other Languages / Platforms:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Python&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tenacity&lt;/code&gt;: powerful retry decorator&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;celery&lt;/code&gt;: built-in retry policy for async tasks&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Node.js&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;retry&lt;/code&gt;, &lt;code&gt;bull&lt;/code&gt;, &lt;code&gt;agenda&lt;/code&gt;: retry support with timing and retry limits&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Go&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;go-retryablehttp&lt;/code&gt;, &lt;code&gt;backoff&lt;/code&gt;: lightweight and effective&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cloud-native:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;AWS&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQS + Lambda + DLQ&lt;/li&gt;
&lt;li&gt;Step Functions with retry/catch blocks&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;GCP&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud Tasks, Pub/Sub retry + DLQ&lt;/li&gt;
&lt;li&gt;Workflows with built-in retry logic&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Azure&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service Bus with configurable retry policy&lt;/li&gt;
&lt;li&gt;Azure Durable Functions with built-in retry&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  5. Real Case: Saving the System During Peak Load with Strategic Retry
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Context:&lt;/strong&gt;&lt;br&gt;
At year-end, the system was under heavy traffic due to a promotional campaign. A payment processing service got overloaded, frequently timing out. Meanwhile, a batch job was firing thousands of requests per minute, with 5 retries per request, no delay, no jitter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;br&gt;
Massive retry storm completely choked the payment service → triggered cascading failures in related systems → 15 minutes of downtime during peak hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced retries to 2&lt;/li&gt;
&lt;li&gt;Added exponential backoff and jitter&lt;/li&gt;
&lt;li&gt;Applied circuit breaker on the job&lt;/li&gt;
&lt;li&gt;Moved retries to a queue and processed via background jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Outcome:&lt;/strong&gt;&lt;br&gt;
System stabilized in under 10 minutes. Retries no longer overwhelmed the backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Retry isn’t about “hammering through” — it’s about &lt;strong&gt;helping the system recover gracefully&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  6. Conclusion
&lt;/h2&gt;

&lt;p&gt;Retry is a powerful tool when used correctly. But if applied without control, it can bring down your system faster than the original error.&lt;/p&gt;

&lt;p&gt;Keep in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Retry only for temporary, recoverable errors&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Always limit retries, add delay + jitter, and use circuit breakers&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Effective retry isn’t about "how many times you call back", but "knowing when to stop and wait"&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Retry is medicine — used wisely, it heals. Used wrong, it poisons your system.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
    </item>
    <item>
      <title>Hundreds of orders vanished in just 3 minutes – all because of one forgotten config line</title>
      <dc:creator>lowkey dev</dc:creator>
      <pubDate>Wed, 18 Jun 2025 15:18:11 +0000</pubDate>
      <link>https://dev.to/lowkey_dev_591/500-orders-lost-in-just-3-minutes-all-because-of-one-forgotten-config-line-4o9d</link>
      <guid>https://dev.to/lowkey_dev_591/500-orders-lost-in-just-3-minutes-all-because-of-one-forgotten-config-line-4o9d</guid>
      <description>&lt;p&gt;&lt;strong&gt;Prologue: A Seemingly Normal Afternoon&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It was a Friday, 4:30 PM. My team was about to deploy an update for the &lt;code&gt;order-service&lt;/code&gt; – one of the most critical microservices in our order processing pipeline.&lt;/p&gt;

&lt;p&gt;Everything looked smooth. Tests passed. CI/CD was all green. I confidently hit the Deploy button to production.&lt;/p&gt;

&lt;p&gt;“Just a small rollout… what could go wrong?”&lt;/p&gt;

&lt;p&gt;Five minutes later, Slack lit up. Channels like &lt;code&gt;#alert&lt;/code&gt;, &lt;code&gt;#ops&lt;/code&gt;, and &lt;code&gt;#order-system&lt;/code&gt; turned red with pings.&lt;br&gt;
Grafana showed a strange spike: the failure rate of orders shot up.&lt;br&gt;
Log entries appeared, and they weren’t friendly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;java.net.SocketException: Connection reset
org.apache.kafka.common.errors.TimeoutException
Connection refused: no further information
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I froze. Within minutes, nearly 500 orders vanished without a trace. Each one was abruptly halted—as if someone pressed “pause” then hit “delete.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flysyv9gkjnp90x9qam4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flysyv9gkjnp90x9qam4u.png" alt="Image description" width="800" height="1075"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Investigation: Something Wasn't Right&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We jumped into a quick incident meeting.&lt;br&gt;
No bugs in the code.&lt;br&gt;
No Kafka issues.&lt;br&gt;
No database outages.&lt;br&gt;
But one thing was consistent: all failed orders happened during the new deployment.&lt;/p&gt;

&lt;p&gt;Then someone from the team asked:&lt;/p&gt;

&lt;p&gt;“Did anyone set up graceful shutdown for this service?”&lt;/p&gt;

&lt;p&gt;I went silent. It all started to make sense.&lt;/p&gt;

&lt;p&gt;The old pod had just received requests when Kubernetes sent it a &lt;code&gt;SIGTERM&lt;/code&gt;.&lt;br&gt;
But we hadn’t configured Spring Boot for graceful shutdown.&lt;br&gt;
So the pod was killed—instantly and brutally. Kafka didn’t get a chance to send messages. Database transactions were left hanging. Half-processed data disappeared.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aftermath: Production Fell Apart Because of One Missing Config&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Who would’ve thought a single missing line could cause so much damage?&lt;/p&gt;

&lt;p&gt;Nearly 500 lost orders, all of which had to be recovered manually, one by one.&lt;br&gt;
We spent 4 hours of overtime tracing Kafka logs to reconstruct the requests.&lt;br&gt;
An apology email went out to customers—along with compensation vouchers.&lt;/p&gt;

&lt;p&gt;At that point, all I could think was: “I wish I’d known this earlier.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Realization: How a Service Dies Is Just as Important as How It Starts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That incident pushed me to dig into &lt;strong&gt;graceful shutdown&lt;/strong&gt;—a concept I had only glanced over before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson #1: Enable shutdown with empathy&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;shutdown&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;graceful&lt;/span&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;lifecycle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;timeout-per-shutdown-phase&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes Spring wait for in-flight requests to finish before shutting down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson #2: Say goodbye to Kafka properly&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@PreDestroy&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;cleanUp&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Kafka producer closed."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don’t close your producer correctly, you’re basically throwing messages into the void.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson #3: Don’t forget your thread pools&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Executor&lt;/span&gt; &lt;span class="nf"&gt;taskExecutor&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;ThreadPoolTaskExecutor&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolTaskExecutor&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setWaitForTasksToCompleteOnShutdown&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setAwaitTerminationSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
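&lt;p&gt;The same idea applies outside Spring with a plain &lt;code&gt;ExecutorService&lt;/code&gt;: stop accepting new work, wait a bounded time for in-flight tasks, then force-stop whatever remains. A sketch under that assumption (the timeout value is illustrative):&lt;/p&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Plain-Java equivalent of the Spring config above: shutdown() stops new
// submissions but lets in-flight tasks finish; shutdownNow() interrupts
// anything still running after the deadline. Timeout is illustrative.
public class GracefulStop {
    public static boolean stop(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown();  // reject new tasks, keep running the queued ones
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow();  // deadline passed: interrupt stragglers
                return false;
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```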



&lt;p&gt;&lt;strong&gt;Lesson #4: Readiness probes are your safety net&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@EventListener&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onAppShutdown&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ContextClosedEvent&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;isReady&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;set&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// readiness = false =&amp;gt; K8s stops sending new traffic&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a pod is in the middle of dying and still receiving traffic, it’s like asking a patient on life support to keep working.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That incident was a painful but valuable lesson. It taught me that a system shouldn’t just be designed to run well—it must also be designed to &lt;strong&gt;shut down safely&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In a microservices world, where everything is interconnected in real-time, a single service dying unexpectedly can cause a &lt;strong&gt;domino effect&lt;/strong&gt;—disrupting data, user experience, and system reputation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Graceful shutdown is not optional – it's essential.&lt;/strong&gt;&lt;br&gt;
Especially for services dealing with requests, Kafka, RabbitMQ, databases, or external APIs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Always configure&lt;/strong&gt; &lt;code&gt;server.shutdown: graceful&lt;/code&gt; &lt;strong&gt;and set an appropriate&lt;/strong&gt; &lt;code&gt;timeout-per-shutdown-phase&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ensure all critical resources are properly released:&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Kafka producers&lt;/li&gt;
&lt;li&gt;Thread pools&lt;/li&gt;
&lt;li&gt;DB connections&lt;/li&gt;
&lt;li&gt;External clients&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="4"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use readiness probes&lt;/strong&gt; to signal Kubernetes to stop sending new traffic during shutdown.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test shutdown scenarios in staging – not just startup ones.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;And finally: avoid Friday deployments if you can.&lt;/strong&gt;&lt;br&gt;
Systems may fail—but people deserve their weekends.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Writing clean code is one thing.&lt;br&gt;
&lt;strong&gt;Running a system responsibly and safely is another—and it’s often the part that’s overlooked.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I hope this story saves you from facing a black Friday like I did.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AI won’t take your job – but your outdated thinking might!</title>
      <dc:creator>lowkey dev</dc:creator>
      <pubDate>Sun, 08 Jun 2025 17:11:02 +0000</pubDate>
      <link>https://dev.to/lowkey_dev_591/ai-wont-take-your-job-but-your-complacent-laziness-might-39d6</link>
      <guid>https://dev.to/lowkey_dev_591/ai-wont-take-your-job-but-your-complacent-laziness-might-39d6</guid>
      <description>&lt;p&gt;These days, coders are busy debating whether AI is a “work savior” or a “nightmare that causes unemployment.”&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“AI writes code so fast, I’m about to lose my job!”&lt;br&gt;
“Developers nowadays have all become vibe coders.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Calm down, skilled coders!&lt;/p&gt;

&lt;p&gt;AI &lt;strong&gt;does not take your job away&lt;/strong&gt; — it only does well at repetitive, mechanical tasks. Honestly, AI won’t make you unemployed; rather, not knowing how to leverage AI to innovate and grow your work is what will leave you behind.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6wbb4pjyw7dhi0uw24pd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6wbb4pjyw7dhi0uw24pd.png" alt="Image description" width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  1️⃣ Software engineers are not typists, but software developers
&lt;/h2&gt;

&lt;p&gt;Many still think: “I’m a dev = I know how to code = I’m safe.”&lt;/p&gt;

&lt;p&gt;Then AI shows up and hits hard with the harsh truth: you’re just a typist, while AI types faster, with fewer bugs, and never takes lunch breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Software engineers are not typists — they are software developers, meaning they design and build solutions, not copy-paste on command.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A coder asks: &lt;strong&gt;“Give me the specs, I’ll just type.”&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A software engineer asks: &lt;strong&gt;“Who is this feature for? Should it be prioritized over others? Is the data flowing correctly? Can the API scale?”&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only know how to type code according to specs without understanding why you’re typing it, AI will replace that “typing” part for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  2️⃣ AI generates code fast, but can’t replace you if you… know what you’re doing
&lt;/h2&gt;

&lt;p&gt;I’ve tried many AI tools: ChatGPT, Copilot, Claude, Gemini… and here’s the truth I found:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI works like a skilled chef, but it doesn’t know what the customer really wants. If you don’t understand the menu, no matter how fast the kitchen is, the dish will be a messy, tasteless plate.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It helps me write API controllers in 30 seconds but can’t help me decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to expose APIs in a proper RESTful way?&lt;/li&gt;
&lt;li&gt;Should authentication be bypassed?&lt;/li&gt;
&lt;li&gt;What data does the mobile app actually need?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI is a tool, but you still have to lead and decide. Otherwise, no matter how fast it is, the result will just be “a messy table.”&lt;/p&gt;




&lt;h2&gt;
  
  
  3️⃣ Learn deeply so you don’t get “trapped” by AI, learn broadly so life doesn’t “hit” you
&lt;/h2&gt;

&lt;p&gt;Common mindset:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I know enough Java, the rest can be handled by AI.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then when CI/CD breaks, JSON is malformed, or the app lags, you only know how to shout:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“AI, save me!” 😭&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Don’t think AI means you can stop learning — if you do, you’re pushing yourself down the path of unemployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Learn broadly so you’re not “technologically blind” when collaborating with teammates
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;With Product Owners — to know what they really want (not 47 unnecessary APIs).&lt;/li&gt;
&lt;li&gt;With UX — so you don’t build an app that looks good on desktop but breaks on phones.&lt;/li&gt;
&lt;li&gt;With DevOps — so you don’t panic when deploying.&lt;/li&gt;
&lt;li&gt;With Data teams — so you understand data pipelines aren’t a joke.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Learn deeply so you don’t get fooled by AI “nonsense”
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;AI generates impressive code, but bugs still appear with clockwork regularity.&lt;/li&gt;
&lt;li&gt;Without deep understanding, you’ll be misled by a bot that has never shipped a real product.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4️⃣ AI is a learning weapon, not a reason to stop learning
&lt;/h2&gt;

&lt;p&gt;I use AI every day to learn faster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;YAML is hard to remember? AI suggests.&lt;/li&gt;
&lt;li&gt;Dockerfile is confusing? AI fixes it.&lt;/li&gt;
&lt;li&gt;Bash script sounds like Klingon? AI translates it to human.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;I don’t learn less, I learn faster.&lt;/strong&gt; Then I use AI as a weapon.&lt;/p&gt;

&lt;p&gt;From a backend dev who used to spam &lt;code&gt;System.out.println&lt;/code&gt;, now I:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand how CI/CD works.&lt;/li&gt;
&lt;li&gt;Know what ETL means in data pipelines.&lt;/li&gt;
&lt;li&gt;Read and understand UX/UI well enough to avoid shipping clunky, awkward apps.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not because I’m smarter, but because I learned how to leverage AI.&lt;/p&gt;




&lt;h2&gt;
  
  
  5️⃣ Losing your job is not because of AI — it’s because of stubbornness
&lt;/h2&gt;

&lt;p&gt;Frankly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI is not the enemy, but stubbornness and refusal to change make you miss opportunities.&lt;/li&gt;
&lt;li&gt;Those who refuse to learn how to use AI will be rated low by their bosses because they no longer meet job requirements in this new era.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the other hand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;People who know how to use AI as a powerful tool work more efficiently, faster, and extend their influence.&lt;/li&gt;
&lt;li&gt;Those who cling to old mindsets and refuse to learn AI will quickly be left behind and easily lose their jobs.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  6️⃣ The AI era is the era of “adaptable devs,” not “complaining devs”
&lt;/h2&gt;

&lt;p&gt;Ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Am I using AI to &lt;em&gt;speed up&lt;/em&gt; or just &lt;em&gt;sit and fear being replaced&lt;/em&gt;?&lt;/li&gt;
&lt;li&gt;Am I learning something new beyond my old, rusty stack?&lt;/li&gt;
&lt;li&gt;Do I understand who the product I build serves?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If yes — &lt;strong&gt;AI is your ally&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If no — &lt;strong&gt;AI is a mirror showing you’re... outdated.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ✅ Conclusion: AI doesn’t take your job — but those who know how to use AI will take your job
&lt;/h2&gt;

&lt;p&gt;We are not just typists — we are solution designers, system integrators, and people who understand users.&lt;/p&gt;

&lt;p&gt;In the AI era:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Know deeply&lt;/strong&gt; to tell if AI code is good or nonsense.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Know broadly&lt;/strong&gt; to connect teams, understand products, and support teammates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Know how to use AI&lt;/strong&gt; as an assistant — not an “online teacher” you rely on every second.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI isn’t scary — stubbornness is what makes you lag behind.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;p&gt;👉 Don’t fear AI — turn AI into a powerful ally for creativity and growth.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I don’t like microservices, and here’s why</title>
      <dc:creator>lowkey dev</dc:creator>
      <pubDate>Sun, 01 Jun 2025 14:16:20 +0000</pubDate>
      <link>https://dev.to/lowkey_dev_591/i-dont-like-microservices-and-heres-why-2mja</link>
      <guid>https://dev.to/lowkey_dev_591/i-dont-like-microservices-and-heres-why-2mja</guid>
      <description>&lt;p&gt;Hello everyone! I’m Hung Pham, a backend developer who used to think that microservices were the standard for every system—until I actually deployed and maintained one myself. After many nights struggling with dozens of logs from 4–5 different services, I realized one thing: microservices aren’t the right solution for every system. Why? Let me share my story in detail, hoping it will give you a more realistic perspective on microservices.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. In the beginning, microservices seemed like the “holy grail”
&lt;/h3&gt;

&lt;p&gt;When I first discovered microservices, it felt incredibly &lt;em&gt;exciting&lt;/em&gt;. The hype, the case studies from big players like Netflix, Amazon, Uber, and Google made me almost obsessed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Break your app into small parts that can be developed and deployed independently, as you wish!”&lt;/li&gt;
&lt;li&gt;“Scale each service separately—no need to scale the whole bloated app!”&lt;/li&gt;
&lt;li&gt;“If you’re not doing microservices, you’re falling behind—it’s the future of software development!”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I dove straight into Docker, Kubernetes, service mesh, CI/CD automation, API Gateway… The list of things to learn was endless—longer than the actual project deadlines! I thought I was opening a brand-new chapter for my backend career.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01632eejl322bqt2xixn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01632eejl322bqt2xixn.png" alt="Image description" width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  2. But reality was not as dreamy
&lt;/h3&gt;

&lt;p&gt;When I finally “microfied” some of my team’s projects—our small team had only 4–5 devs—I realized that microservices are &lt;em&gt;not&lt;/em&gt; just about splitting up code. They’re a complex web that nearly drove me crazy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Network latency and timeouts:&lt;/strong&gt; If a single service is slow, the whole system can fall like a line of dominoes. One seemingly simple request might pass through a dozen services—timeouts and partial failures became routine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complicated deployment management:&lt;/strong&gt; Each service had its own CI/CD pipeline, its own configs, its own versioning. Deployments weren’t just a click anymore—they turned into a campaign.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data consistency headaches:&lt;/strong&gt; No more simple transactions. Now we had to think about eventual consistency, complex patterns like Saga, Orchestrator (Camunda, Temporal…)—just hearing those words made me want to give up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debug logs were a nightmare:&lt;/strong&gt; When production issues hit, I had to dig through logs from multiple services, tracing requests across systems. I felt like Sherlock Holmes stumbling in the dark!&lt;/li&gt;
&lt;/ul&gt;
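&lt;p&gt;To make the data-consistency pain concrete: the Saga pattern mentioned above replaces one ACID transaction with a chain of local steps, each paired with a compensating action. A minimal sketch (plain Python with hypothetical booking steps, not any real framework API):&lt;/p&gt;

```python
def run_saga(steps):
    """steps: list of (action, compensate) callables.
    Runs each action in order; if one raises, runs the compensations
    of the already-completed steps in reverse order and returns False."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Roll back what already succeeded, newest first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True

# Toy booking flow: the payment step fails, so the seat reservation
# is compensated (cancelled) instead of leaving half-finished state.
log = []

def reserve_seat():
    log.append("seat reserved")

def cancel_seat():
    log.append("seat cancelled")

def charge_card():
    raise RuntimeError("payment service timed out")

ok = run_saga([(reserve_seat, cancel_seat), (charge_card, lambda: None)])
print(ok, log)  # False ['seat reserved', 'seat cancelled']
```

&lt;p&gt;Even this toy version hints at the real cost: every step now needs a hand-written “undo,” and the orchestration logic lives in your own code rather than in a database transaction.&lt;/p&gt;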




&lt;h3&gt;
  
  
  3. The things microservices “stole” from our small team
&lt;/h3&gt;

&lt;p&gt;For our small team, I realized microservices were stealing many valuable things from us:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Focus:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Monolith: One repo, one codebase, everyone working together—easy to communicate, easy to grasp the big picture.&lt;/li&gt;
&lt;li&gt;Microservices: Everyone camping in their own service, talking only via APIs, creating silos in the team.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Initial development speed:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Monolith: Deploy once, roll back once, small changes went live quickly.&lt;/li&gt;
&lt;li&gt;Microservices: Deployments scattered across services, config tweaking everywhere, rollbacks were trickier and took much longer.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The joy of releasing features:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Monolith: Release a feature immediately, get instant feedback from users.&lt;/li&gt;
&lt;li&gt;Microservices: Release in pieces, carefully coordinate to avoid breaking APIs—stressful and slow.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  4. But microservices aren’t the villain
&lt;/h3&gt;

&lt;p&gt;I’m not denying that microservices have some &lt;em&gt;real&lt;/em&gt; strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Independent scaling:&lt;/strong&gt; Hot services can be scaled separately, saving resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Empowered teams:&lt;/strong&gt; Teams can work independently on their services, reducing dependencies and speeding up long-term development.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy to evolve and replace:&lt;/strong&gt; Updating a single part doesn’t require messing with a huge monolith.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  5. So when should you actually use microservices?
&lt;/h3&gt;

&lt;p&gt;I think microservices only truly shine when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your project has a &lt;strong&gt;large backend team&lt;/strong&gt; (10+ devs), so you can split by domain.&lt;/li&gt;
&lt;li&gt;Your infrastructure is &lt;strong&gt;mature enough&lt;/strong&gt; (CI/CD automation, great observability—logging, tracing, metrics…), so deploying doesn’t feel like rocket science.&lt;/li&gt;
&lt;li&gt;Your application has clear, separate domains—like payments, user management, logistics, each operating almost independently.&lt;/li&gt;
&lt;li&gt;Your traffic is huge, and you really need to &lt;strong&gt;scale specific components&lt;/strong&gt; to save costs and boost performance.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  6. And when you probably shouldn’t “play fancy”
&lt;/h3&gt;

&lt;p&gt;If you’re in one of these situations, think twice before jumping into microservices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small team (3–5 devs), drowning in backlog with tons of features to build.&lt;/li&gt;
&lt;li&gt;Simple application with just a few main modules, no real need for complex scaling.&lt;/li&gt;
&lt;li&gt;No experience with CI/CD or DevOps—microservices will force you to learn DevOps first.&lt;/li&gt;
&lt;li&gt;Tight deadlines (like 1 month to launch) rather than a year for sustainable growth.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  7. Conclusion: I don’t hate microservices—I just don’t like meaningless “hype-following”
&lt;/h3&gt;

&lt;p&gt;Microservices aren’t evil. They’re not “automatically great,” either. I just don’t like when small teams chase trends blindly and burden themselves with unnecessary complexity.&lt;/p&gt;

&lt;p&gt;For me, the most important thing is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Understand your actual problem.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Choose the architecture that fits your team size, your app’s nature, and the real complexity you need.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small team, few features, tight deadlines: &lt;strong&gt;Monolith is king.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Big team, complex domains, heavy traffic: &lt;strong&gt;Microservices is the savior.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  8. And what about you?
&lt;/h3&gt;

&lt;p&gt;Have you had a wildly successful microservices experience? Or a complete disaster? I’d love to hear your stories—so we can learn from each other and avoid repeating the same mistakes I did.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
