<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Randa</title>
    <description>The latest articles on DEV Community by Randa (@randazraik).</description>
    <link>https://dev.to/randazraik</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2504407%2F4b825956-933d-4015-99f7-aa785685c005.gif</url>
      <title>DEV Community: Randa</title>
      <link>https://dev.to/randazraik</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/randazraik"/>
    <language>en</language>
    <item>
      <title>Microservices Security: From Fundamentals to Advanced Patterns</title>
      <dc:creator>Randa</dc:creator>
      <pubDate>Tue, 09 Sep 2025 13:47:24 +0000</pubDate>
      <link>https://dev.to/randazraik/microservices-security-from-fundamentals-to-advanced-patterns-2h2k</link>
      <guid>https://dev.to/randazraik/microservices-security-from-fundamentals-to-advanced-patterns-2h2k</guid>
      <description>&lt;p&gt;This article explores key security principles and practical tools for protecting distributed microservices. From foundational ideas like least privilege and defense in depth to real-world practices including zero trust, encryption, observability, and service meshes, it guides you through making security decisions in microservice environments.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Distributed Security Challenge&lt;/li&gt;
&lt;li&gt;
The Three Core Security Principles

&lt;ul&gt;
&lt;li&gt;Least Privilege&lt;/li&gt;
&lt;li&gt;Defense in Depth&lt;/li&gt;
&lt;li&gt;Automation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
The Five Functions of Cybersecurity

&lt;ul&gt;
&lt;li&gt;Identify&lt;/li&gt;
&lt;li&gt;Protect&lt;/li&gt;
&lt;li&gt;Detect&lt;/li&gt;
&lt;li&gt;Respond&lt;/li&gt;
&lt;li&gt;Recover&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Zero Trust

&lt;ul&gt;
&lt;li&gt;Zero Trust Principles&lt;/li&gt;
&lt;li&gt;Zero Trust Use Cases&lt;/li&gt;
&lt;li&gt;Zero Trust Architecture&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Protection Mechanisms

&lt;ul&gt;
&lt;li&gt;Patching&lt;/li&gt;
&lt;li&gt;Authentication and Authorization&lt;/li&gt;
&lt;li&gt;Data in Transit&lt;/li&gt;
&lt;li&gt;Data at Rest&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Service Meshes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Wrap-Up&lt;/li&gt;
&lt;li&gt;Further Reading&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Distributed Security Challenge
&lt;/h2&gt;

&lt;p&gt;Breaking apart a monolith into microservices creates a fundamental trade-off: you gain flexibility but multiply your security challenges.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5r22pjtgkr3teyiipz2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5r22pjtgkr3teyiipz2g.png" alt="The Distributed Security Challenge" width="699" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Bigger Attack Surface
&lt;/h4&gt;

&lt;p&gt;A monolith typically has three main security concerns: one application server, one database, and a few external APIs. With microservices, each service has its own endpoints, databases, and dependencies, creating a larger attack surface. If each service independently has a 1% chance of being vulnerable on a given day, 10 services raise the chance that at least one is vulnerable to nearly 10%, and 100 services raise it to about 63%.&lt;/p&gt;
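
&lt;p&gt;The combined risk can be computed rather than estimated. The sketch below assumes that services fail independently and share the same per-service risk, which is a simplification; the function name is illustrative.&lt;/p&gt;

```python
def chance_of_any_vulnerability(per_service_risk: float, services: int) -> float:
    """Probability that at least one of `services` independent services is vulnerable."""
    # The chance that no service is vulnerable is (1 - risk) ** services.
    return 1 - (1 - per_service_risk) ** services


for count in (1, 10, 100):
    print(count, round(chance_of_any_vulnerability(0.01, count), 3))
# 1 0.01
# 10 0.096
# 100 0.634
```

&lt;p&gt;The risk grows quickly with the number of services but never reaches certainty, which is why reducing per-service risk and containing breaches both matter.&lt;/p&gt;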

&lt;h4&gt;
  
  
  More Problems, Better Defenses
&lt;/h4&gt;

&lt;p&gt;Microservices create more problems: more endpoints to attack, more network traffic to intercept, more systems to patch, and far more complexity. But they also improve defenses through service isolation, precise permissions, and better breach containment. Microservices can offer stronger security, but only if you handle the complexity with focused security at every boundary, automation, and distributed monitoring.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three Core Security Principles
&lt;/h2&gt;

&lt;p&gt;Before diving into specific patterns and practices, we must establish the three core principles that should guide all security decisions in distributed systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqa9sfwnsboqxahq6mcjq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqa9sfwnsboqxahq6mcjq.png" alt="The Three Core Security Principles" width="717" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Least Privilege
&lt;/h3&gt;

&lt;p&gt;Grant the minimum access needed for each service to do its job. Nothing more.&lt;/p&gt;

&lt;h4&gt;
  
  
  Database Access Control
&lt;/h4&gt;

&lt;p&gt;Ensure services only have access to the data they truly need. For example, the Order service needs read/write access to the orders table but no access to the payment table. If an attacker compromises the Order service, they can't touch payment data, which minimizes the blast radius.&lt;/p&gt;

&lt;h4&gt;
  
  
  Network Segmentation
&lt;/h4&gt;

&lt;p&gt;Restrict which services can communicate with each other to limit attacker movement between them. For example, the Order service needs access to Payment but not Inventory. Many organizations deploy on "open by default" networks, which violates least privilege. Use network policies to allowlist permitted connections and reduce lateral movement.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Default-Deny Approach
&lt;/h4&gt;

&lt;p&gt;Start secure, then open access as needed. This requires more setup but creates stronger security and makes systems easier to reason about. So start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No network ports open by default&lt;/li&gt;
&lt;li&gt;No database connections allowed by default&lt;/li&gt;
&lt;li&gt;No service-to-service communication permitted by default&lt;/li&gt;
&lt;/ul&gt;
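
&lt;p&gt;The default-deny idea can be sketched as an explicit allowlist check. The service names and the &lt;code&gt;is_allowed&lt;/code&gt; helper below are hypothetical; in practice the same rule is usually enforced by network policies, firewalls, or database grants rather than application code.&lt;/p&gt;

```python
# Hypothetical allowlist: each (source, destination) pair must be listed explicitly.
ALLOWED_CONNECTIONS = frozenset({
    ("order-service", "payment-service"),
    ("order-service", "orders-db"),
    ("payment-service", "payments-db"),
})


def is_allowed(source: str, destination: str) -> bool:
    """Default deny: any connection that is not explicitly listed is rejected."""
    return (source, destination) in ALLOWED_CONNECTIONS


print(is_allowed("order-service", "payment-service"))  # True
print(is_allowed("order-service", "payments-db"))      # False
print(is_allowed("inventory-service", "orders-db"))    # False
```

&lt;p&gt;Because nothing is permitted unless it is listed, the allowlist doubles as documentation of which services are expected to talk to each other.&lt;/p&gt;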

&lt;h3&gt;
  
  
  2. Defense in Depth
&lt;/h3&gt;

&lt;p&gt;Don't rely on a single security measure. Build overlapping protections so attackers have to breach multiple defenses to cause real damage.&lt;/p&gt;

&lt;h4&gt;
  
  
  Security Controls
&lt;/h4&gt;

&lt;p&gt;Security controls are the specific measures you put in place to protect your system, the actual defensive tools and processes. We group them into three types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preventative&lt;/strong&gt; - Stop attacks (encryption, authentication, firewalls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detective&lt;/strong&gt; - Spot attacks happening (monitoring, intrusion detection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Responsive&lt;/strong&gt; - Handle attacks (incident response, backups, recovery)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A robust security system requires all three types. Having only preventative controls means you won't know when they fail. Having only detective controls means attacks succeed before you can respond.&lt;/p&gt;

&lt;h4&gt;
  
  
  Layered Defense in Microservices
&lt;/h4&gt;

&lt;p&gt;Microservices introduce multiple layers where you can implement these security controls. These typically include the network layer (securing North-South and East-West traffic through segmentation and encryption), the service layer (enforcing access rules and validation within each microservice), and the data layer (encrypting and restricting sensitive information).&lt;/p&gt;

&lt;p&gt;Each layer protects against different threats. An SQL injection might bypass your network perimeter but gets stopped by input validation. A stolen credential might pass authentication but fails at role-based access controls.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Automation
&lt;/h3&gt;

&lt;p&gt;Manual security processes don’t scale with microservices. Automation accelerates repetitive tasks, reduces human error, and ensures consistency. As your system grows with microservices, automation becomes essential for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consistently applying security configurations&lt;/li&gt;
&lt;li&gt;Continuously monitoring and responding to security events&lt;/li&gt;
&lt;li&gt;Efficiently applying patches and updates&lt;/li&gt;
&lt;li&gt;Automating service-to-service communication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Throughout this article, we'll explore how automation supports these protection mechanisms.&lt;/p&gt;

&lt;h4&gt;
  
  
  Infrastructure as Code (IaC)
&lt;/h4&gt;

&lt;p&gt;IaC allows you to manage and provision infrastructure through configuration files and scripts rather than manual processes. These files specify network rules, access controls, which services can talk to each other, and more.&lt;/p&gt;

&lt;p&gt;Storing security configurations in version control (just like application code) lets automation tools apply them consistently across your infrastructure. This removes manual steps, reduces errors, and enables quick recovery after failures, because environments can be rebuilt from the version-controlled configurations.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Five Functions of Cybersecurity
&lt;/h2&gt;

&lt;p&gt;The US National Institute of Standards and Technology (NIST) defines a Cybersecurity Framework that breaks cybersecurity into five core functions (version 2.0, published in 2024, adds a sixth, Govern). It encourages a broad, strategic approach rather than focusing only on technical protection mechanisms.&lt;/p&gt;

&lt;p&gt;As developers and architects, we often focus on the "Protect" function because it involves the technical challenges we enjoy solving (ironically, this is also the focus of our article). But truly secure systems require all five functions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa923ho2csoi8j0b96xp5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa923ho2csoi8j0b96xp5.png" alt="The Five Functions of Cybersecurity" width="800" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Identify
&lt;/h3&gt;

&lt;p&gt;You cannot secure what you don't know exists. In microservices, this challenge multiplies as services spread across teams and environments. Identification involves the following activities:&lt;/p&gt;

&lt;h4&gt;
  
  
  Asset Inventory
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;List all deployed services and where they run&lt;/li&gt;
&lt;li&gt;Track the version of each service in use&lt;/li&gt;
&lt;li&gt;Map what dependencies each service has&lt;/li&gt;
&lt;li&gt;Identify data each service handles or stores&lt;/li&gt;
&lt;li&gt;Assign ownership - who maintains each service&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Threat Modeling
&lt;/h4&gt;

&lt;p&gt;Threat modeling is the process of identifying what attackers might want, how they might try to get it, and what impact they could have. To achieve this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnst31dw2efulp07v5c90.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnst31dw2efulp07v5c90.png" alt="Attack Tree" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build attack trees: Start with the attacker’s goal and work backward to explore possible attack paths.&lt;/li&gt;
&lt;li&gt;Assign costs and impact for each attack path:

&lt;ul&gt;
&lt;li&gt;Cost from the attacker's perspective ($ to $$$$)&lt;/li&gt;
&lt;li&gt;Potential impact to your business (High, Medium, Low)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Handle microservices complexities:

&lt;ul&gt;
&lt;li&gt;Attack paths can span multiple services&lt;/li&gt;
&lt;li&gt;Service dependencies may cause cascading risk&lt;/li&gt;
&lt;li&gt;Rapid development cycles require frequent updates to threat models&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Create multiple threat models:

&lt;ul&gt;
&lt;li&gt;System-level model covering overall architecture&lt;/li&gt;
&lt;li&gt;Service-level models for high-risk services&lt;/li&gt;
&lt;li&gt;Integration models for critical service-to-service communications&lt;/li&gt;
&lt;li&gt;Regular cross-team modeling sessions to identify risks&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
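
&lt;p&gt;An attack tree with attacker costs can be sketched as a small recursive structure. The goals, costs, and the &lt;code&gt;AttackNode&lt;/code&gt; and &lt;code&gt;cheapest_cost&lt;/code&gt; names below are illustrative assumptions: an OR node is satisfied by any one child, an AND node needs all of its children, and the cheapest path shows where an attacker is most likely to go.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttackNode:
    goal: str
    cost: int = 0        # attacker cost of a leaf step, in arbitrary units
    mode: str = "OR"     # "OR": any child suffices; "AND": all children are required
    children: List["AttackNode"] = field(default_factory=list)


def cheapest_cost(node: AttackNode) -> int:
    """Return the lowest total attacker cost needed to achieve this node's goal."""
    if not node.children:
        return node.cost
    child_costs = [cheapest_cost(child) for child in node.children]
    return sum(child_costs) if node.mode == "AND" else min(child_costs)


# Hypothetical tree: the path through the Order service spans two services.
root = AttackNode("Steal payment data", children=[
    AttackNode("Exploit unpatched Payment service", cost=3),
    AttackNode("Pivot through Order service", mode="AND", children=[
        AttackNode("Phish Order service credentials", cost=1),
        AttackNode("Call Payment API from Order service", cost=1),
    ]),
])

print(cheapest_cost(root))  # 2
```

&lt;p&gt;In this example the multi-service path is the cheapest, so blocking the Order service from calling the Payment API would raise the attacker's minimum cost from 2 to 3.&lt;/p&gt;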

&lt;h4&gt;
  
  
  Threat Intelligence
&lt;/h4&gt;

&lt;p&gt;While threat modeling analyzes your own system, threat intelligence tracks real-world attacks, so you focus on actual threats and not just theoretical ones. Use it to direct your security efforts where they matter most. A good resource is the Verizon &lt;a href="https://www.verizon.com/business/resources/reports/" rel="noopener noreferrer"&gt;Data Breach Investigations Report&lt;/a&gt;, which annually analyzes thousands of real security incidents. Key takeaways from recent reports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Credential theft remains one of the most common attack vectors (past editions tied roughly 80% of hacking-related breaches to stolen or weak credentials)&lt;/li&gt;
&lt;li&gt;Unpatched vulnerabilities are being exploited more and more quickly&lt;/li&gt;
&lt;li&gt;Social engineering attacks are growing more sophisticated&lt;/li&gt;
&lt;li&gt;Insider threats remain a significant risk&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Prioritize Risks
&lt;/h4&gt;

&lt;p&gt;Create a risk prioritization matrix based on impact and likelihood. Focus first on high-impact attacks that are cheap for the attacker, and on attack paths where you can easily raise the attacker's costs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Impact \ Likelihood&lt;/th&gt;
&lt;th&gt;High Likelihood&lt;/th&gt;
&lt;th&gt;Medium Likelihood&lt;/th&gt;
&lt;th&gt;Low Likelihood&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Critical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Critical Risk&lt;/td&gt;
&lt;td&gt;High Risk&lt;/td&gt;
&lt;td&gt;Medium Risk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;High&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High Risk&lt;/td&gt;
&lt;td&gt;Medium Risk&lt;/td&gt;
&lt;td&gt;Low Risk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Medium&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Medium Risk&lt;/td&gt;
&lt;td&gt;Low Risk&lt;/td&gt;
&lt;td&gt;Very Low Risk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low Risk&lt;/td&gt;
&lt;td&gt;Very Low Risk&lt;/td&gt;
&lt;td&gt;Minimal Risk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
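
&lt;p&gt;In the matrix above, each step down in impact or likelihood lowers the risk by one level, so the lookup can be sketched as an index sum. The names below are illustrative, and a real organization may prefer a hand-tuned table over this regular pattern.&lt;/p&gt;

```python
RISK_LEVELS = [
    "Critical Risk", "High Risk", "Medium Risk",
    "Low Risk", "Very Low Risk", "Minimal Risk",
]
IMPACTS = ["Critical", "High", "Medium", "Low"]
LIKELIHOODS = ["High", "Medium", "Low"]


def risk_level(impact: str, likelihood: str) -> str:
    """Look up the matrix cell: row index plus column index selects the risk level."""
    return RISK_LEVELS[IMPACTS.index(impact) + LIKELIHOODS.index(likelihood)]


print(risk_level("Critical", "High"))   # Critical Risk
print(risk_level("High", "Medium"))     # Medium Risk
print(risk_level("Low", "Low"))         # Minimal Risk
```

&lt;p&gt;Sorting findings by this level gives a simple, repeatable order in which to address them.&lt;/p&gt;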

&lt;h4&gt;
  
  
  Make It Ongoing
&lt;/h4&gt;

&lt;p&gt;Threat modeling isn't a one-time exercise. You need to schedule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quarterly threat model reviews as part of architecture planning&lt;/li&gt;
&lt;li&gt;Post-incident threat model updates incorporating lessons learned&lt;/li&gt;
&lt;li&gt;Threat modeling for new features during design phases&lt;/li&gt;
&lt;li&gt;Regular review and integration of threat intelligence&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Protect
&lt;/h3&gt;

&lt;p&gt;Protection means implementing security controls to prevent incidents before they happen. We will talk more about this in Protection Mechanisms, where we will cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication and authorization&lt;/li&gt;
&lt;li&gt;Data encryption (in transit and at rest)&lt;/li&gt;
&lt;li&gt;Vulnerability management and patching&lt;/li&gt;
&lt;li&gt;Key management&lt;/li&gt;
&lt;li&gt;Service meshes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Detect
&lt;/h3&gt;

&lt;p&gt;Protection systems can eventually fail or be bypassed. Detection capabilities help identify security incidents quickly to minimize their impact.&lt;/p&gt;

&lt;h4&gt;
  
  
  Detection strategies:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Centralized logging and security event correlation&lt;/li&gt;
&lt;li&gt;Behavioral analysis to identify unusual patterns&lt;/li&gt;
&lt;li&gt;Automated threat detection using known attack signatures&lt;/li&gt;
&lt;li&gt;Service mesh observability for network-level monitoring&lt;/li&gt;
&lt;li&gt;Application performance monitoring to detect anomalies&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Detection challenges in microservices:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Increased monitoring scope - dozens of services generating security events&lt;/li&gt;
&lt;li&gt;Distributed attack patterns spanning multiple services&lt;/li&gt;
&lt;li&gt;Managing false positives due to numerous alerts&lt;/li&gt;
&lt;li&gt;Complexity in correlating events across services&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Respond
&lt;/h3&gt;

&lt;p&gt;When detection systems alert you to a potential security incident, you need well-defined response procedures. During an active incident, people are stressed and don't think clearly, so predefined playbooks and decision trees are essential.&lt;/p&gt;

&lt;h4&gt;
  
  
  Response planning considerations:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Escalation procedures - Who needs to be notified and when?&lt;/li&gt;
&lt;li&gt;Communication plans - How to notify customers, partners, and regulators?&lt;/li&gt;
&lt;li&gt;Containment strategies - How will you isolate compromised services?&lt;/li&gt;
&lt;li&gt;Evidence preservation - How will you maintain forensic evidence?&lt;/li&gt;
&lt;li&gt;Decision-making authority - Who decides during an incident?&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Microservices-specific response challenges:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Service isolation - Can you take services offline without breaking everything?&lt;/li&gt;
&lt;li&gt;Blast radius assessment - How do you quickly determine the impact scope?&lt;/li&gt;
&lt;li&gt;Rollback procedures - Can you revert to safe versions fast?&lt;/li&gt;
&lt;li&gt;Communication coordination - How to align teams’ actions?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Recover
&lt;/h3&gt;

&lt;p&gt;Recovery involves restoring systems and applying lessons learned to prevent future incidents and improve resilience.&lt;/p&gt;

&lt;h4&gt;
  
  
  Recovery considerations:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Service restoration priorities - Which services need to come back online first?&lt;/li&gt;
&lt;li&gt;Data integrity - How do you ensure your data hasn't been corrupted?&lt;/li&gt;
&lt;li&gt;Dependency management - How to handle interdependent services in recovery?&lt;/li&gt;
&lt;li&gt;Customer communication - How do you rebuild trust after an incident?&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Learning and improvement:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Blameless post-mortems to understand what went wrong&lt;/li&gt;
&lt;li&gt;System and process improvements based on lessons learned&lt;/li&gt;
&lt;li&gt;Training and awareness to improve future response&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Zero Trust
&lt;/h2&gt;

&lt;p&gt;Zero Trust is a modern security architecture built on one core idea: Never trust, always verify, no matter where a request comes from. Traditional models rely on implicit trust, assuming anything inside the perimeter (like a VPN or internal network) is safe. This assumption fails once attackers breach that perimeter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8y87390a24junkrcku3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8y87390a24junkrcku3.png" alt="Zero Trust" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Zero Trust model embodies the three core security principles introduced earlier, as the next sections show. It has been adopted by Google (&lt;a href="https://cloud.google.com/beyondcorp" rel="noopener noreferrer"&gt;BeyondCorp&lt;/a&gt;), Netflix (&lt;a href="https://www.usenix.org/conference/enigma2018/presentation/zimmer" rel="noopener noreferrer"&gt;LISA&lt;/a&gt;), and Microsoft (&lt;a href="https://www.microsoft.com/en-us/security/business/zero-trust" rel="noopener noreferrer"&gt;ZT Model&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero Trust Principles
&lt;/h3&gt;

&lt;p&gt;Zero Trust assumes no one is trusted by default, inside or outside the network. The core principles are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Verify Explicitly&lt;/strong&gt;: Authenticate and authorize every request at every layer, based on identity, device, and context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Least Privilege Access&lt;/strong&gt;: Limit access by role, resource, and action, not just broad user groups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assume Breach&lt;/strong&gt;: Design your system as if an attacker is already inside.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Zero Trust Use Cases
&lt;/h3&gt;

&lt;p&gt;Zero Trust isn't one-size-fits-all. Whether to adopt it should be driven by your &lt;strong&gt;threat model&lt;/strong&gt; and business requirements.&lt;/p&gt;

&lt;h5&gt;
  
  
  Use it when:
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;You manage sensitive data (e.g. PII, finance, healthcare)&lt;/li&gt;
&lt;li&gt;You operate in regulated industries (e.g. PCI DSS, HIPAA, FedRAMP)&lt;/li&gt;
&lt;li&gt;Your systems span multiple networks or cloud providers&lt;/li&gt;
&lt;li&gt;You face advanced or persistent threats&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Avoid it when:
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;You're a small team with internal-only systems&lt;/li&gt;
&lt;li&gt;Your threat model shows low risk or low attacker motivation&lt;/li&gt;
&lt;li&gt;You lack time and expertise to maintain strong identity and policy infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Zero Trust Architecture
&lt;/h3&gt;

&lt;p&gt;Modern Zero Trust systems apply security controls across multiple layers for defense in depth. We’ll explore many of these mechanisms throughout the article:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identity Layer&lt;/strong&gt; - &lt;em&gt;Verify who or what is making the request&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;OAuth2/OIDC for user authentication (Auth0, Azure AD, Okta)&lt;/li&gt;
&lt;li&gt;Workload identities for services (SPIFFE/SPIRE, AWS IAM roles)&lt;/li&gt;
&lt;li&gt;Short-lived credentials and cert rotation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Layer&lt;/strong&gt; - &lt;em&gt;Don’t trust internal networks&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;mTLS for service-to-service encryption&lt;/li&gt;
&lt;li&gt;Use microsegmentation to isolate workloads (e.g. Kubernetes Network Policies)&lt;/li&gt;
&lt;li&gt;Block unauthenticated east-west traffic&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service Layer&lt;/strong&gt; - &lt;em&gt;Each service enforces its own policy&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;Service meshes to manage traffic and identity&lt;/li&gt;
&lt;li&gt;Policy enforcement via tools like OPA and Gatekeeper&lt;/li&gt;
&lt;li&gt;Per-request, context-aware authorization&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Layer&lt;/strong&gt; - &lt;em&gt;Limit who can access what data and how&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;Encrypt sensitive data at rest and in transit&lt;/li&gt;
&lt;li&gt;Authorize access at the service or method level&lt;/li&gt;
&lt;li&gt;Monitor and audit access to critical data sources&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Protection Mechanisms
&lt;/h2&gt;

&lt;p&gt;Now let's dive into the practical mechanisms you can use to protect your microservices. We'll cover the most critical areas where microservices create new security challenges or require different approaches than monolithic applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Patching
&lt;/h2&gt;

&lt;p&gt;Patching is the process of applying updates to software, operating systems, and hardware to fix security vulnerabilities and enhance system performance. In microservices, where multiple layers and dependencies interact, patching is critical to reduce exposure to risks.&lt;/p&gt;

&lt;p&gt;Microservices create a multi-layered environment that requires attention for patching at each level:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag1a7gk38vws69v4u8cn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag1a7gk38vws69v4u8cn.png" alt="Microservices Layers" width="381" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each of these layers, down to the hardware, needs regular patching. Container OS vulnerabilities, for instance, can accumulate even if your application code hasn't changed, making it essential to patch every layer and dependency in your architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why We Care About Patching
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security Vulnerabilities&lt;/strong&gt;&lt;br&gt;
Patching mitigates known vulnerabilities, and unpatched systems are easy targets. The 2017 Equifax breach, which affected about 147 million people, exploited an unpatched Apache Struts vulnerability (CVE-2017-5638) about two months after a patch had become available. It reportedly cost the company over $1.7 billion in damages and regulatory fines.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Operational Stability&lt;/strong&gt;&lt;br&gt;
Outdated or rarely exercised components can behave unpredictably, leading to service disruptions. In the 2017 AWS S3 outage, a mistyped command during routine maintenance removed more servers than intended, and subsystems that had not been fully restarted in years took hours to recover, disrupting many websites and apps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Regulatory Compliance&lt;/strong&gt;&lt;br&gt;
Many industries face stringent compliance requirements (e.g. GDPR, HIPAA) that mandate timely patching of security vulnerabilities. Failing to patch systems can result in non-compliance, leading to fines and reputational damage.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Challenges with Patching
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Complex Dependency Chains&lt;/strong&gt;&lt;br&gt;
Microservices rely on hundreds of third-party libraries, creating tangled dependency trees. A vulnerability in one dependency can affect your entire system. The 2021 Log4Shell incident is a prime example: the vulnerable Log4j library was often buried deep in a service's transitive dependencies, so many organizations did not even know they relied on it, let alone that they were at risk.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Vendor-Supplied Updates&lt;/strong&gt;&lt;br&gt;
Even security tools can introduce failures. In July 2024, a faulty CrowdStrike update containing a logic error crashed Windows systems worldwide. The incident underlines the importance of vetting and testing third-party updates before deploying them to production.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Operational Overhead&lt;/strong&gt;&lt;br&gt;
As microservices grow in size and complexity, keeping track of the patches for each service and component becomes a daunting task. With thousands of containers and dependencies, ensuring timely patching requires heavy automation and monitoring.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Patching Can Cause Disruptions&lt;/strong&gt;&lt;br&gt;
Even when patches are available, applying them often involves downtime, which may not always be feasible. This issue is exacerbated in production environments, where continuous availability is a requirement.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Protection Mechanisms
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automated Dependency Scanning&lt;/strong&gt;&lt;br&gt;
Use tools like Snyk, GitHub Advanced Security, or OWASP Dependency-Check to automatically scan for vulnerabilities in both direct and transitive dependencies. These tools integrate seamlessly with CI/CD pipelines, ensuring vulnerabilities are caught early in the development process before they reach production.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Container Image Scanning&lt;/strong&gt;&lt;br&gt;
Use image scanning tools such as Aqua Security, Twistlock (now part of Prisma Cloud), or Snyk Container to detect vulnerabilities in container images. Integrate them with your container registry so that vulnerable images are blocked from being deployed to production.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Software Bill of Materials (SBOM)&lt;/strong&gt;&lt;br&gt;
An SBOM lists every component and dependency in your microservices stack, so when a new vulnerability is announced you can quickly determine which services need patching.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automate Patching with Infrastructure as Code (IaC)&lt;/strong&gt;&lt;br&gt;
Use IaC tools (Terraform, CloudFormation) to roll out patched images and configurations consistently across your infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Staged Rollouts and Canary Deployments&lt;/strong&gt;&lt;br&gt;
Apply patches through staged rollouts or canary deployments to catch issues early, before a full production rollout.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Managed Services&lt;/strong&gt;&lt;br&gt;
Offload patching of infrastructure layers (e.g. VMs, container orchestration) to managed cloud services (e.g. AWS ECS, Azure AKS, GKE) to minimize the patching burden on your team and ensure that lower layers of the stack are updated automatically.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
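
&lt;p&gt;To illustrate how an SBOM speeds up patching decisions, here is a minimal sketch that finds the services affected by a vulnerable component. The data, versions, and the &lt;code&gt;services_using&lt;/code&gt; helper are hypothetical and heavily simplified; real SBOMs typically use a standard format such as SPDX or CycloneDX and are produced by tooling.&lt;/p&gt;

```python
# Hypothetical, simplified SBOM data: service name -> {component: version}.
SBOMS = {
    "order-service": {"log4j-core": "2.14.1", "jackson-databind": "2.15.2"},
    "payment-service": {"log4j-core": "2.17.1", "spring-web": "6.1.3"},
    "inventory-service": {"requests": "2.31.0"},
}


def services_using(component: str, vulnerable_versions: set) -> list:
    """Return the services whose SBOM lists the component at a vulnerable version."""
    return sorted(
        service
        for service, components in SBOMS.items()
        if components.get(component) in vulnerable_versions
    )


print(services_using("log4j-core", {"2.14.1", "2.15.0"}))  # ['order-service']
```

&lt;p&gt;With an inventory like this, answering "are we affected?" becomes a lookup rather than a search through every repository.&lt;/p&gt;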




&lt;h2&gt;
  
  
  Authentication and Authorization
&lt;/h2&gt;

&lt;p&gt;Authentication and authorization represent two of the most critical security challenges in microservices architectures. Authentication checks who is making a request, usually at the system edge. Authorization decides what they’re allowed to do and must be enforced across all services. You authenticate once, but authorize everywhere.&lt;/p&gt;

&lt;p&gt;In a monolithic architecture, this is simpler. The entire system runs as a single application, so authentication and authorization can be handled centrally. Access control logic has full visibility into user identity and data, and is enforced consistently across all layers (UI, backend, and database).&lt;/p&gt;

&lt;p&gt;In a microservices architecture, responsibilities are split across independent services. Authentication is typically handled at the API gateway, which verifies identity and passes it along. Authorization is more complex: each service must make its own decisions based on the identity and claims it receives. Data is distributed, context is limited, and consistent enforcement requires clear token design and local policy checks within each service.&lt;/p&gt;
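
&lt;p&gt;A local policy check inside a service might look like the sketch below. It assumes the token's signature and expiry were already validated upstream; the claim names (&lt;code&gt;sub&lt;/code&gt;, &lt;code&gt;scope&lt;/code&gt;, &lt;code&gt;roles&lt;/code&gt;), the &lt;code&gt;orders:read&lt;/code&gt; scope, and the &lt;code&gt;can_read_order&lt;/code&gt; function are illustrative assumptions, not a prescribed token design.&lt;/p&gt;

```python
def can_read_order(claims: dict, order_owner_id: str) -> bool:
    """Local policy: requires the orders:read scope; non-admins may read only their own orders."""
    scopes = set(claims.get("scope", "").split())
    if "orders:read" not in scopes:
        return False
    is_admin = "admin" in claims.get("roles", [])
    return is_admin or claims.get("sub") == order_owner_id


# Hypothetical claims, as the service might receive them from the gateway.
customer = {"sub": "user-1", "scope": "orders:read", "roles": []}
admin = {"sub": "user-3", "scope": "orders:read", "roles": ["admin"]}
no_scope = {"sub": "user-1", "scope": "profile"}

print(can_read_order(customer, "user-1"))  # True
print(can_read_order(customer, "user-2"))  # False
print(can_read_order(admin, "user-2"))     # True
print(can_read_order(no_scope, "user-1"))  # False
```

&lt;p&gt;Keeping the decision inside the service means it still holds if a request reaches the service through an unexpected path.&lt;/p&gt;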

&lt;h3&gt;
  
  
  Authentication
&lt;/h3&gt;

&lt;p&gt;Authentication verifies the identity of users or systems, usually at the perimeter of the system via two &lt;strong&gt;centralized&lt;/strong&gt; components, an identity provider and an edge proxy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identity Provider (IdP): Authenticates users and issues identity tokens using standards like OAuth 2.0 and OIDC. Can be cloud-based (Okta, Auth0) or self-hosted.&lt;/li&gt;
&lt;li&gt;Edge proxy/gateway: Validates tokens, forwards unauthenticated requests to the IdP, and passes authenticated traffic to backend services. These proxies/gateways may take the form of traditional API gateways, ingress controllers, service mesh sidecars, or lightweight reverse proxies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By delegating authentication to the proxy and IdP, the system avoids duplicating authentication logic across services and eliminates the need for individual services to store or validate credentials directly.&lt;/p&gt;

&lt;h4&gt;
  
  
  Single Sign-On (SSO)
&lt;/h4&gt;

&lt;p&gt;In distributed systems, SSO allows users to authenticate once with the IdP and access multiple services without logging in again. It's typically implemented using identity protocols like OIDC on top of OAuth 2.0. This simplifies the login experience and avoids duplicating authentication logic across microservices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwlysknqjxelm5r0s4nsg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwlysknqjxelm5r0s4nsg.png" alt="Authentication Flow" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User requests the &lt;code&gt;/login&lt;/code&gt; page and is redirected to the IdP to submit credentials.&lt;/li&gt;
&lt;li&gt;IdP authenticates the user and issues a JWT containing identity and claims.&lt;/li&gt;
&lt;li&gt;User requests &lt;code&gt;/checkout&lt;/code&gt; with the JWT.&lt;/li&gt;
&lt;li&gt;API gateway validates the JWT locally. If it is invalid, the user is redirected to the IdP.&lt;/li&gt;
&lt;li&gt;API gateway passes the request and token to the downstream service.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Subsequent requests include the token in &lt;code&gt;Authorization: Bearer&lt;/code&gt; header.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If no API gateway is used, each microservice should validate the JWT.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Best practices
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Use standard IdPs supporting OAuth 2.0 and OIDC.&lt;/li&gt;
&lt;li&gt;Centralize authentication enforcement at the gateway level.&lt;/li&gt;
&lt;li&gt;Avoid embedding authentication logic or credentials in microservices.&lt;/li&gt;
&lt;li&gt;Enable MFA, especially for privileged users.&lt;/li&gt;
&lt;li&gt;Choose short-lived credentials to limit risk.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Authorization
&lt;/h3&gt;

&lt;p&gt;Authorization decides what a user or system is allowed to do and must be enforced throughout the system: inside services, between them, and at data boundaries. There are four common models for handling authorization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Role-Based Access Control (RBAC): Based on user roles (&lt;code&gt;admin&lt;/code&gt;, &lt;code&gt;editor&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Attribute-Based Access Control (ABAC): Based on user/resource/env attributes.&lt;/li&gt;
&lt;li&gt;Permission-Based Access Control: Use fine-grained explicit permissions (&lt;code&gt;read:order&lt;/code&gt;, &lt;code&gt;write:order&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Policy-Based Access Control (PBAC): External policy engines (OPA) manage centralized policies.&lt;/li&gt;
&lt;/ul&gt;
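
&lt;p&gt;To make the difference between the first two models concrete, here is a minimal Python sketch. The role names, permission strings, and attribute names (&lt;code&gt;department&lt;/code&gt;, &lt;code&gt;owner_department&lt;/code&gt;, &lt;code&gt;network&lt;/code&gt;) are illustrative assumptions, not part of any specific framework.&lt;/p&gt;

```python
# Hypothetical role-to-permission mapping used for the RBAC check.
ROLE_PERMISSIONS = {
    "admin": {"read:order", "write:order"},
    "editor": {"read:order"},
}


def rbac_allows(roles, permission):
    """RBAC: allow if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def abac_allows(user, resource, environment):
    """ABAC: decide from attributes of the user, the resource, and the environment."""
    same_department = user["department"] == resource["owner_department"]
    trusted_network = environment["network"] == "internal"
    return same_department and trusted_network
```

&lt;p&gt;The RBAC check only looks at who the user is, while the ABAC check can also take the resource and the request context into account, which is why ABAC rules tend to be more expressive and harder to audit.&lt;/p&gt;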

&lt;h4&gt;
  
  
  Centralized Authorization
&lt;/h4&gt;

&lt;p&gt;One implementation is to use a centralized authorization service that every other service asks whether a user is allowed to perform a certain action. This approach adds latency, creates a bottleneck, risks downtime if the central service fails, couples services tightly, and loses business context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyeufwz1m0g635ec49tje.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyeufwz1m0g635ec49tje.png" alt="Centralized Service - Authorization Flow" width="788" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another implementation is to centralize all authorization logic at the API gateway, so that every request goes through the gateway, where access is evaluated before the request is routed to the services. The services themselves perform no authorization checks. This approach causes network overhead, added latency, complex configuration, and tight coupling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fennjfg0soznpzobd1jzq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fennjfg0soznpzobd1jzq.png" alt="Centralized App Gateway - Authorization Flow" width="800" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Decentralized Authorization
&lt;/h4&gt;

&lt;p&gt;A more scalable and resilient approach is to use self-contained tokens, typically JWTs (JSON Web Tokens), to carry authorization data with each request. This allows services to enforce policies locally without relying on a central service.&lt;/p&gt;

&lt;p&gt;A JWT is a compact, signed token composed of three parts: &lt;em&gt;Header.Payload.Signature&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Header: Specifies the token type and signing algorithm.&lt;/li&gt;
&lt;li&gt;Payload: Contains user identity, roles, permissions, and other claims.&lt;/li&gt;
&lt;li&gt;Signature: Verifies token integrity using cryptographic keys.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since a JWT contains all required information, each microservice can validate and authorize requests independently, improving scalability and fault isolation.&lt;/p&gt;
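
&lt;p&gt;The three-part structure can be sketched with the Python standard library alone. This is a minimal illustration, not a production validator: it assumes a shared-secret HS256 signature only to avoid third-party dependencies (an IdP would typically sign with an asymmetric key so that services need only the public key), and it does not check expiry, issuer, or audience. The function names are hypothetical; in practice, use a maintained JWT library.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data):
    """Base64url-encode bytes without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    """Decode base64url text, restoring the stripped padding."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def sign_token(claims, secret):
    """Build Header.Payload.Signature using HMAC-SHA256 (HS256)."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"


def verify_token(token, secret):
    """Return the claims if the signature matches, otherwise None."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        # Not exactly three dot-separated parts, so not a JWT.
        return None
    signing_input = f"{header}.{payload}".encode()
    expected = b64url_encode(hmac.new(secret, signing_input, hashlib.sha256).digest())
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    return json.loads(b64url_decode(payload))
```

&lt;p&gt;Note that the payload is only encoded, not encrypted: anyone holding the token can read its claims, so the signature protects integrity rather than confidentiality.&lt;/p&gt;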

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gfpeqaje5bny6x93ji7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gfpeqaje5bny6x93ji7.png" alt="Decentralized Authentication Flow" width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Authenticate user via &lt;code&gt;/login&lt;/code&gt; then IdP as explained before.&lt;/li&gt;
&lt;li&gt;IdP authenticates user and issues a JWT containing identity and permissions.&lt;/li&gt;
&lt;li&gt;User requests a &lt;code&gt;/checkout&lt;/code&gt; with JWT.&lt;/li&gt;
&lt;li&gt;API gateway validates JWT and passes it to the downstream services.&lt;/li&gt;
&lt;li&gt;Order service checks JWT for &lt;code&gt;read:order&lt;/code&gt; and &lt;code&gt;write:order&lt;/code&gt; permissions.&lt;/li&gt;
&lt;li&gt;Payment service checks JWT for &lt;code&gt;write:payment&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Inventory service checks JWT for &lt;code&gt;write:inventory&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If no API gateway is used, each microservice should validate the JWT.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
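
&lt;p&gt;The per-service checks in the flow above can be sketched as a local lookup against the token's claims. The sketch assumes the JWT has already been validated and that permissions arrive in a claim named &lt;code&gt;permissions&lt;/code&gt;; both the claim name and the mapping below are illustrative assumptions.&lt;/p&gt;

```python
# Hypothetical permissions each service requires during checkout.
REQUIRED_PERMISSIONS = {
    "order": {"read:order", "write:order"},
    "payment": {"write:payment"},
    "inventory": {"write:inventory"},
}


def is_authorized(service, claims):
    """Allow the call only if the token grants every permission the service needs."""
    granted = set(claims.get("permissions", []))
    return REQUIRED_PERMISSIONS[service].issubset(granted)
```

&lt;p&gt;Because each service evaluates only its own entry, no call to a central authorization service is needed on the request path.&lt;/p&gt;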

&lt;h4&gt;
  
  
  JWT Considerations
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validation&lt;/strong&gt;: Every service must validate the JWT (signature, expiry, issuer, and audience) before trusting its claims.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public Key Distribution&lt;/strong&gt;: Services need the public key to validate JWT signatures. Use JWKS endpoints, service mesh integration, or secrets managers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Size&lt;/strong&gt;: Keep tokens small by including only necessary claims. Large tokens can exceed header size limits. Fetch additional details with separate calls when needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request-Scoped vs. Session-Scoped Tokens&lt;/strong&gt;: Use session-scoped tokens for general-purpose, longer-lived access, and request-scoped tokens for short-lived, narrowly scoped operations. Request-scoped tokens enforce least privilege and reduce risk if leaked.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Data in Transit
&lt;/h2&gt;

&lt;p&gt;Data in transit refers to information actively moving between services, across the internet, internal networks, or within distributed systems. This includes API calls, service-to-service communication, or any data exchanged over a network.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Protect Data in Transit
&lt;/h3&gt;

&lt;p&gt;When services communicate over networks, four major risks arise:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o5llj6czbzcmdjaqxdz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o5llj6czbzcmdjaqxdz.png" alt="Data in Transit  - Risks" width="706" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Observation&lt;/strong&gt; - &lt;em&gt;Can attackers see your data?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Unencrypted traffic can be intercepted and read. Leaks PII, credit card info, internal APIs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Manipulation&lt;/strong&gt; - &lt;em&gt;Can attackers modify your data?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Intercepted data can be altered before reaching its destination. Alters payments, injects malicious payloads, breaks logic.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Access&lt;/strong&gt; - &lt;em&gt;Can attackers reach your endpoints?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Exposed services can be directly hit. Bypasses checks, hits internal APIs, performs actions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Impersonation&lt;/strong&gt; - &lt;em&gt;Can attackers pretend to be your services?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Without identity checks, attackers act as legit services. Enables MITM attacks, fake data, unauthorized access.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  TLS vs Mutual TLS
&lt;/h3&gt;

&lt;p&gt;To secure data in transit, systems rely on Transport Layer Security (TLS) or, in more secure environments, Mutual TLS (mTLS). Both are cryptographic protocols that encrypt communication, but they differ in how they authenticate the parties involved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ficeh66req5ipm5subwex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ficeh66req5ipm5subwex.png" alt="TLS vs Mutual TLS" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLS&lt;/strong&gt;: Encrypts data in transit and authenticates the server, but the client is not verified during the handshake. It is the foundation of secure communication on the internet and internal systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Observation: Data is encrypted, preventing attackers from reading it in transit.&lt;/li&gt;
&lt;li&gt;Manipulation: Integrity checks reject altered data.&lt;/li&gt;
&lt;li&gt;Impersonation: The server proves its identity via certificate, but the client isn't verified.&lt;/li&gt;
&lt;li&gt;Access: Any client can connect. TLS does not authenticate the client. Access control must be handled at the application layer using tokens, keys, or credentials.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mutual TLS&lt;/strong&gt;: mTLS builds on TLS by requiring both the client and server to present valid certificates, enforcing mutual authentication during the handshake.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Observation: Data is encrypted in transit, as with TLS.&lt;/li&gt;
&lt;li&gt;Manipulation: Integrity checks reject altered data.&lt;/li&gt;
&lt;li&gt;Impersonation: Both the client and server prove their identity using certificates.&lt;/li&gt;
&lt;li&gt;Access: Only clients with valid certificates can connect, so access is enforced before the request reaches the application layer, unlike with plain TLS.&lt;/li&gt;
&lt;/ul&gt;
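
&lt;p&gt;On the server side, the difference between the two often comes down to one setting: whether a client certificate is required. The sketch below uses Python's standard &lt;code&gt;ssl&lt;/code&gt; module; the function name and its parameters are hypothetical, and the certificate, key, and CA file paths are placeholders you would supply.&lt;/p&gt;

```python
import ssl


def make_server_context(require_client_cert, cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context; with require_client_cert it becomes mTLS."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        # The server's own certificate, presented in both TLS and mTLS.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if require_client_cert:
        # mTLS: the handshake fails unless the client presents a trusted certificate.
        context.verify_mode = ssl.CERT_REQUIRED
        if ca_file:
            # CA bundle used to decide which client certificates are trusted.
            context.load_verify_locations(cafile=ca_file)
    else:
        # Plain TLS: only the server is authenticated.
        context.verify_mode = ssl.CERT_NONE
    return context
```

&lt;p&gt;With &lt;code&gt;require_client_cert&lt;/code&gt; enabled, an untrusted client is rejected during the handshake, before any application code runs.&lt;/p&gt;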

&lt;h3&gt;
  
  
  Application-Layer Protocols with TLS/mTLS
&lt;/h3&gt;

&lt;p&gt;TLS and mTLS secure data as it moves over the network, but they’re applied through the protocols your services actually use to communicate with each other in a distributed environment.&lt;/p&gt;

&lt;p&gt;Most of these are application-layer protocols built on top of TCP. Here are some of the most commonly used in modern systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;HTTPS (HTTP over TLS)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The standard for web and API communication. Built on HTTP and secured by TLS.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;gRPC&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A high-performance communication framework that runs on HTTP/2 and supports TLS and mTLS natively. Suitable for service-to-service communication.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Message Brokers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Systems like Kafka or RabbitMQ support TLS for client-to-broker and broker-to-broker communication.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Custom Protocols&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Any custom protocol built on TCP can be secured by layering TLS over the connection.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Protection Mechanisms
&lt;/h3&gt;

&lt;p&gt;To ensure that communication across your systems is private and authenticated, implement the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Encrypt All Internal and External Traffic&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All external and internal services should communicate over HTTPS or TLS, ensuring sensitive data remains protected at every hop. This fits a zero trust model.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Avoid Terminating HTTPS Too Early&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Encryption should not end at the gateway or load balancer. If TLS is terminated there, re-encrypt traffic before forwarding it, so that internal traffic also remains encrypted and is not exposed inside the network. Even better, use separate public/internal certificates.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4xh006yfdwv5mw7vm7qq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4xh006yfdwv5mw7vm7qq.png" alt="HTTPS public and internal certs" width="663" height="202"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use Mutual TLS (mTLS)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Enforce mTLS between services that require strong identity validation. It allows you to reject unauthorized clients before the request even reaches the application layer. This fits naturally with a zero trust architecture.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Automate with a Service Mesh&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Managing TLS and mTLS manually at scale is difficult. Service meshes automate certificate issuance, renewal, and rotation, handling encryption and authentication transparently across all traffic. We will cover service meshes in more detail later.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Apply the Same Standards to Non-HTTP Protocols&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;TLS and mTLS aren’t just for HTTP. Protocols like gRPC, message brokers, and custom protocols also support them and should be secured at the transport layer.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Data at Rest
&lt;/h2&gt;

&lt;p&gt;Data at rest refers to any stored data (inside databases, file systems, backups, or logs) on disk, SSDs, or cloud storage. Unlike data in transit, it's not moving between systems but sits idle, waiting to be accessed.&lt;/p&gt;

&lt;p&gt;In microservices, data is spread across many services, increasing the attack surface. That’s why defense in depth is critical: even with strong network and API security, assume breaches can happen and make sure stolen data is useless.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Data to Protect
&lt;/h3&gt;

&lt;p&gt;Not all data is equally sensitive. Start by classifying sensitive data per service or database. Common examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PII (Personally Identifiable Information): names, emails, addresses&lt;/li&gt;
&lt;li&gt;Authentication credentials: hashed passwords, session tokens, API keys&lt;/li&gt;
&lt;li&gt;Payment data: credit card info, billing history&lt;/li&gt;
&lt;li&gt;Business data: pricing models, analytics, trade secrets&lt;/li&gt;
&lt;li&gt;Logs: which may unintentionally contain PII or secrets&lt;/li&gt;
&lt;li&gt;Backups: often overlooked, but contain full data snapshots&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Protect Data at Rest
&lt;/h3&gt;

&lt;p&gt;Protect sensitive data with encryption and minimize data exposure:&lt;/p&gt;

&lt;h4&gt;
  
  
  Encryption Strategies
&lt;/h4&gt;

&lt;p&gt;Encrypt sensitive data early, decrypt only when needed, and never store plain text:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full Disk Encryption: Encrypt the entire disk. Simple to implement, but doesn't protect data if the app is compromised.&lt;/li&gt;
&lt;li&gt;Transparent Data Encryption (TDE): Supported by many databases. Automatically encrypts data files and logs.&lt;/li&gt;
&lt;li&gt;Column-Level Encryption: Encrypt specific database columns.&lt;/li&gt;
&lt;li&gt;Application-Level Encryption: Encrypt data in code before storing it. The application manages encryption itself, which offers the most control but adds complexity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Avoid implementing your own encryption algorithms; use proven and maintained libraries. Keep them updated and track vulnerabilities.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Management
&lt;/h4&gt;

&lt;p&gt;Encryption is ineffective without proper key management. If you store the encryption keys alongside the data they protect, an attacker gets both.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a dedicated key management system (KMS) or secret manager&lt;/li&gt;
&lt;li&gt;Separate data and key storage&lt;/li&gt;
&lt;li&gt;Restrict key access by service identity and role&lt;/li&gt;
&lt;li&gt;Rotate keys regularly, and remove expired keys once no stored data still depends on them&lt;/li&gt;
&lt;li&gt;Audit key usage in production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools like HashiCorp Vault, AWS KMS, Azure Key Vault, and Google Cloud KMS help automate and secure key management.&lt;/p&gt;

&lt;h4&gt;
  
  
  Data Minimization
&lt;/h4&gt;

&lt;p&gt;The less data you collect and retain, the less you have to protect, and the less an attacker can steal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collect only what's necessary for your service to function&lt;/li&gt;
&lt;li&gt;Avoid storing sensitive data long-term unless required&lt;/li&gt;
&lt;li&gt;Mask, hash, or anonymize data when full details aren’t needed&lt;/li&gt;
&lt;li&gt;Regularly delete stale or unused data&lt;/li&gt;
&lt;/ul&gt;
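
&lt;p&gt;Masking and hashing can be sketched in a few lines of Python. The helper names are hypothetical, and the keyed hash (HMAC-SHA256) is one possible choice for pseudonymizing identifiers, assuming the key is stored separately from the data, for example in a KMS.&lt;/p&gt;

```python
import hashlib
import hmac


def mask_card_number(card_number):
    """Keep only the last four digits for display or logging."""
    digits = [ch for ch in card_number if ch.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed hash so records stay joinable but not readable."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

&lt;p&gt;The same input and key always yield the same pseudonym, so analytics can still group records by user without storing the original identifier.&lt;/p&gt;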




&lt;h2&gt;
  
  
  Observability
&lt;/h2&gt;

&lt;p&gt;Observability gives you visibility into how your system behaves, critical in microservices where many services interact. It doesn't just help with spotting bugs, but also helps you detect threats, misconfigurations, or breaches by collecting telemetry that includes logs, metrics, and traces - the three pillars of observability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logs&lt;/strong&gt; - Timestamped event records with structured format for easy search and correlation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt; - Aggregated data like failure rates, latency, auth attempts used for alerting and trend tracking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traces&lt;/strong&gt; - Show the path of a request across services, to spot abnormal access or performance bottlenecks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To collect and analyze these, teams often use tools like Prometheus, Grafana, Jaeger, and OpenTelemetry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Authentication/Authorization monitoring - Track failed logins, permission failures. Alert on unusual spikes or suspicious patterns.&lt;/li&gt;
&lt;li&gt;Internal movement detection - Observe unexpected service-to-service calls to prevent internal compromise.&lt;/li&gt;
&lt;li&gt;Incident audits and compliance - Maintain logs and metrics to trace issues and support regulatory requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use structured logs (e.g. JSON) collected in a central store (e.g. the ELK stack), with correlation IDs to trace requests across services.&lt;/li&gt;
&lt;li&gt;Track key health and security metrics, and watch for anything unusual.&lt;/li&gt;
&lt;li&gt;Combine logs, metrics, and traces under a unified system to spot problems faster.&lt;/li&gt;
&lt;li&gt;Build observability into your system from the start, not after things break.&lt;/li&gt;
&lt;/ul&gt;
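
&lt;p&gt;As a minimal sketch of the first practice, the formatter below emits one JSON object per log line and carries a correlation ID. It uses only Python's standard &lt;code&gt;logging&lt;/code&gt; module; the field names and the &lt;code&gt;correlation_id&lt;/code&gt; attribute are illustrative assumptions rather than a fixed schema.&lt;/p&gt;

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
            # Set via the `extra` argument; "unknown" if the caller omitted it.
            "correlation_id": getattr(record, "correlation_id", "unknown"),
        }
        return json.dumps(entry)


def new_correlation_id():
    """Generate an ID at the edge and pass it along with every downstream call."""
    return str(uuid.uuid4())
```

&lt;p&gt;A service would attach the formatter to a handler and log with &lt;code&gt;logger.warning("permission denied", extra={"correlation_id": request_id})&lt;/code&gt;, so that the same ID appears in the logs of every service that handled the request.&lt;/p&gt;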




&lt;h2&gt;
  
  
  Service Meshes
&lt;/h2&gt;

&lt;p&gt;A service mesh is an infrastructure layer that manages secure communication between microservices without requiring code changes in each service. It simplifies certificate management, enforces strong service identities, and ensures encrypted traffic. Widely used solutions include Istio, Linkerd, and Consul Connect.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture Overview
&lt;/h3&gt;

&lt;p&gt;Let's walk through the main components of a service mesh and how a request flows through it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi6274qlpzk8iergqmau3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi6274qlpzk8iergqmau3.png" alt="Service Mesh" width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Plane&lt;/strong&gt;: Composed of sidecar proxies deployed alongside each service. These proxies handle all service-to-service communication (routing, retries, mTLS encryption, and telemetry) without modifying service code. The service communicates locally with its sidecar over plain HTTP, while sidecars handle all outbound/inbound network communication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control Plane&lt;/strong&gt;: A centralized component that configures proxies, applies policies, manages certificates, and aggregates telemetry.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;User sends a &lt;code&gt;/checkout&lt;/code&gt; request via Ingress Gateway:&lt;br&gt;
The request enters the mesh through the gateway, which terminates TLS and handles external-to-mesh traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ingress Gateway validates and forwards to Order service sidecar:&lt;br&gt;
The gateway validates external identity (JWT, OAuth), applies mesh-level policies (rate limits, IP restrictions), and then establishes mTLS with Order service sidecar using certificates issued by the mesh control plane.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Order service sidecar forwards to local Order service instance:&lt;br&gt;
The Order service sidecar receives the request and forwards it to the local Order service instance over HTTP on localhost.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sidecar-to-sidecar communication between Order and Payment services:&lt;br&gt;
The Order service sends a &lt;code&gt;/payment&lt;/code&gt; request to its sidecar, which establishes an mTLS connection with the Payment service sidecar, and the Payment sidecar then forwards the request to the local Payment service. This process repeats for all other internal service calls.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Telemetry is captured throughout:&lt;br&gt;
Each sidecar emits telemetry, which the control plane aggregates and analyzes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Security in Service Meshes
&lt;/h3&gt;

&lt;p&gt;As seen, service meshes enhance security by default. The following features enable secure communication and consistent policy enforcement across services.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Automatic mTLS between services&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All service-to-service traffic is encrypted using mTLS, enforced by sidecar proxies.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Centralized certificate management&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Certificates are automatically issued, rotated, and revoked by the control plane.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Service identity and authentication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each service gets a cryptographic identity issued by the control plane, which peers verify during the mTLS handshake.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fine-grained authorization policies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sidecars enforce detailed access rules, controlling which services can communicate.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Centralized JWT validation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Offloads token validation from service code to sidecars.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User identity propagation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Meshes can forward external user identities (from OAuth or SSO) across service calls.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Zero trust enforcement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All connections are authenticated, authorized, and encrypted. No implicit trust.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Observability and resilience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Built-in telemetry, retries, circuit-breaking, and load balancing.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Use a Service Mesh
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;In large microservice architectures with many services&lt;/li&gt;
&lt;li&gt;Ideal for zero-trust environments&lt;/li&gt;
&lt;li&gt;When strict security policies require mTLS everywhere&lt;/li&gt;
&lt;li&gt;Polyglot environments where consistent security is hard to maintain manually&lt;/li&gt;
&lt;li&gt;In multi-team or multi-tenant environments requiring strong isolation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;Securing distributed systems requires designing with resilience and layered defenses, knowing that failures and breaches can happen. The key is to assume compromise and build security controls that work together smoothly.&lt;/p&gt;

&lt;p&gt;We’ve discussed core principles (least privilege, defense in depth, and automation) and examined how these translate into practical and scalable protections like encryption, zero trust, observability, and service mesh integration.&lt;/p&gt;

&lt;p&gt;No single control is enough on its own. Strong security comes from a consistent use of these strategies across the entire architecture, early and continuously, making sure that when one layer weakens, others keep the system safe and reliable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.oreilly.com/library/view/building-microservices-2nd/9781492034018/" rel="noopener noreferrer"&gt;Building Microservices Book - Sam Newman&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.schneier.com/academic/archives/1999/12/attack_trees.html" rel="noopener noreferrer"&gt;Attack Trees - Bruce Schneier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Threat_Modeling_Cheat_Sheet.html" rel="noopener noreferrer"&gt;Threat Modeling Cheat Sheet - OWASP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://snyk.io/articles/infrastructure-as-code-iac/" rel="noopener noreferrer"&gt;Infrastructure as Code - Snyk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://csrc.nist.gov/pubs/sp/800/207/final" rel="noopener noreferrer"&gt;Zero Trust Architecture - NIST&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jwt.io/" rel="noopener noreferrer"&gt;JWT Web Tokens&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/docs/security/encryption-in-transit" rel="noopener noreferrer"&gt;Encryption in Transit - Google Cloud&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://istio.io/latest/about/service-mesh/" rel="noopener noreferrer"&gt;Service Mesh - Istio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opentelemetry.io/docs/concepts/observability-primer/" rel="noopener noreferrer"&gt;Observability - OpenTelemetry&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>microservices</category>
      <category>webdev</category>
      <category>security</category>
    </item>
    <item>
      <title>Designing Distributed Systems: Sagas and Trade-Offs</title>
      <dc:creator>Randa</dc:creator>
      <pubDate>Thu, 05 Jun 2025 13:36:23 +0000</pubDate>
      <link>https://dev.to/randazraik/designing-distributed-systems-sagas-and-trade-offs-2o0p</link>
      <guid>https://dev.to/randazraik/designing-distributed-systems-sagas-and-trade-offs-2o0p</guid>
      <description>&lt;p&gt;This article breaks down the three core forces behind designing distributed systems (communication, coordination and consistency) and shows how they combine into eight saga patterns. You’ll see how each pattern works, where it fits, and what trade-offs come with it. Whether designing a new workflow or improving old ones, this guide helps you reason through the options and make informed design decisions.&lt;/p&gt;

&lt;p&gt;Throughout this article, we’ll explain things using an order checkout flow example.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Three Forces of Service Interaction&lt;/li&gt;
&lt;li&gt;
Communication

&lt;ul&gt;
&lt;li&gt;Synchronous Communication&lt;/li&gt;
&lt;li&gt;Asynchronous Communication&lt;/li&gt;
&lt;li&gt;Choosing Between Synchronous and Asynchronous&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Coordination

&lt;ul&gt;
&lt;li&gt;Orchestration Pattern&lt;/li&gt;
&lt;li&gt;Choreography Pattern&lt;/li&gt;
&lt;li&gt;Choosing Between Orchestration and Choreography&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Consistency

&lt;ul&gt;
&lt;li&gt;ACID vs BASE&lt;/li&gt;
&lt;li&gt;Atomic Transactions&lt;/li&gt;
&lt;li&gt;Eventual Transactions&lt;/li&gt;
&lt;li&gt;Choosing Between Atomic and Distributed Transactions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Saga Patterns

&lt;ul&gt;
&lt;li&gt;Epic Saga&lt;/li&gt;
&lt;li&gt;Phone Tag Saga&lt;/li&gt;
&lt;li&gt;Fairy Tale Saga&lt;/li&gt;
&lt;li&gt;Time Travel Saga&lt;/li&gt;
&lt;li&gt;Fantasy Fiction Saga&lt;/li&gt;
&lt;li&gt;Horror Story Saga&lt;/li&gt;
&lt;li&gt;Parallel Saga&lt;/li&gt;
&lt;li&gt;Anthology Saga&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Wrap-Up&lt;/li&gt;
&lt;li&gt;Further Reading&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Three Forces of Service Interaction
&lt;/h2&gt;

&lt;p&gt;Software has evolved from &lt;strong&gt;monoliths&lt;/strong&gt; (one deployable, one database) to &lt;strong&gt;SOA&lt;/strong&gt; (multiple deployables, often one shared database) and finally to &lt;strong&gt;microservices&lt;/strong&gt; (each service owns its data and deploys on its own).&lt;/p&gt;

&lt;p&gt;Splitting a system into separate services with the right modularity and granularity is hard, but getting those services to work together is even harder. Business requests like placing an order often span multiple services (&lt;strong&gt;Order&lt;/strong&gt;, &lt;strong&gt;Inventory&lt;/strong&gt;, &lt;strong&gt;Payment&lt;/strong&gt;, &lt;strong&gt;Shipping&lt;/strong&gt;) requiring coordination and introducing new design decisions and trade-offs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fff99dejuyyf5z9zzh8q9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fff99dejuyyf5z9zzh8q9.png" alt="Service Interaction Forces" width="664" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To make sense of those trade-offs, Mark Richards and Neal Ford introduced in their &lt;a href="https://www.oreilly.com/library/view/software-architecture-the/9781492086888/" rel="noopener noreferrer"&gt;book&lt;/a&gt; a useful way to think about service interactions. They identified &lt;strong&gt;three forces&lt;/strong&gt; that show up every time services need to work together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Communication&lt;/strong&gt; - How does one service talk to another?

&lt;ul&gt;
&lt;li&gt;Synchronous (like REST or gRPC): Caller waits for a response.&lt;/li&gt;
&lt;li&gt;Asynchronous (messaging or events): Caller sends a message and moves on.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coordination&lt;/strong&gt; - Who drives the workflow?

&lt;ul&gt;
&lt;li&gt;Orchestrator: Central service tells each service what to do.&lt;/li&gt;
&lt;li&gt;Choreography: Services listen and react to events independently.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt; - When must the data be correct?

&lt;ul&gt;
&lt;li&gt;Atomic: All-or-nothing, like a traditional transaction.&lt;/li&gt;
&lt;li&gt;Eventual: Some inconsistency is fine, resolved over time.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These forces trade off against each other. Atomic consistency leans on sync calls and orchestration. Async flows favor eventual consistency and choreography. Most systems mix styles, like orchestration for payments, choreography for notifications.&lt;/p&gt;

&lt;p&gt;Next, we'll explore each of these forces in more detail, then show how they come together in &lt;strong&gt;eight saga patterns&lt;/strong&gt;, practical approaches to handling distributed transactions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Communication
&lt;/h2&gt;

&lt;p&gt;When two services need to coordinate a task, how they communicate is just as critical as what they exchange. This choice directly impacts system responsiveness, fault tolerance, scalability, and the degree of coupling between services.&lt;/p&gt;

&lt;p&gt;The fundamental communication styles are &lt;strong&gt;synchronous&lt;/strong&gt; and &lt;strong&gt;asynchronous&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Synchronous Communication
&lt;/h3&gt;

&lt;p&gt;In synchronous communication, one service sends a request to another service and waits for the response before continuing. This is a blocking interaction: the caller is stalled until it hears back. This pattern is common in protocols like HTTP/REST and gRPC.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgcv8yfyofdpb66t0j2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgcv8yfyofdpb66t0j2n.png" alt="Synchronous Communication" width="756" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The frontend sends a &lt;code&gt;POST /checkout&lt;/code&gt; to Order Service.&lt;/li&gt;
&lt;li&gt;Order Service calls Payment Service and waits for it to confirm the charge.&lt;/li&gt;
&lt;li&gt;Once payment is confirmed, it calls Inventory Service to reserve stock.&lt;/li&gt;
&lt;li&gt;Inventory Service calls Shipping Service to arrange delivery after successful reservation.&lt;/li&gt;
&lt;li&gt;Only once all steps succeed, Order Service returns "Order confirmed." to the user.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We now have &lt;strong&gt;tight temporal coupling&lt;/strong&gt;: all services must be online, responsive, and agree in real-time, or the whole system stalls.&lt;/p&gt;
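&lt;p&gt;The chain above can be sketched in a few lines. This is a minimal illustration, not a real implementation: the services are plain in-process functions standing in for HTTP or gRPC calls, and the names &lt;code&gt;ServiceDown&lt;/code&gt;, &lt;code&gt;payment_service&lt;/code&gt;, &lt;code&gt;inventory_service&lt;/code&gt;, &lt;code&gt;shipping_service&lt;/code&gt; and &lt;code&gt;checkout&lt;/code&gt; are assumptions made for the example.&lt;/p&gt;

```python
# Hypothetical in-process stand-ins for the services.
# In a real system each call would be a blocking HTTP or gRPC request.
class ServiceDown(Exception):
    """Raised when a downstream service cannot answer."""


def payment_service(order_id: str, up: bool = True) -> str:
    if not up:
        raise ServiceDown("payment")
    return f"charged:{order_id}"


def shipping_service(order_id: str) -> str:
    return f"delivery-arranged:{order_id}"


def inventory_service(order_id: str) -> str:
    # Inventory reserves stock, then calls Shipping and waits for it.
    shipping_service(order_id)
    return f"reserved:{order_id}"


def checkout(order_id: str, payment_up: bool = True) -> str:
    # Order Service blocks on every call; one failure stops the whole chain.
    try:
        payment_service(order_id, up=payment_up)
        inventory_service(order_id)
    except ServiceDown as err:
        return f"Checkout failed: {err} service unavailable"
    return "Order confirmed."
```

&lt;p&gt;Calling &lt;code&gt;checkout("o-1")&lt;/code&gt; succeeds only when every downstream call returns, while &lt;code&gt;checkout("o-2", payment_up=False)&lt;/code&gt; fails the whole request because a single dependency is down.&lt;/p&gt;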

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Upsides&lt;/th&gt;
&lt;th&gt;Downsides&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Immediate, deterministic feedback to the caller&lt;/td&gt;
&lt;td&gt;Lower availability, one service failure breaks the chain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Simple control flow and debugging&lt;/td&gt;
&lt;td&gt;Tight coupling between services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fits user actions that &lt;em&gt;must&lt;/em&gt; finish now (login, payment)&lt;/td&gt;
&lt;td&gt;Requires resilience mechanisms (retries, timeouts, circuit breakers)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Asynchronous Communication
&lt;/h3&gt;

&lt;p&gt;In asynchronous communication, one service places a message on a queue and moves on without waiting for a response. This is a non-blocking interaction. The other service picks up the message when ready, often using a message broker like Kafka or RabbitMQ. This decouples services in time and allows for more parallelism.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faf18n65v04sjb4uuq0gi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faf18n65v04sjb4uuq0gi.png" alt="Asynchronous Communication" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The frontend sends a &lt;code&gt;POST /checkout&lt;/code&gt; to Order Service.&lt;/li&gt;
&lt;li&gt;Order Service saves the order and emits an &lt;code&gt;OrderPlaced&lt;/code&gt; event.&lt;/li&gt;
&lt;li&gt;Order Service immediately responds to the user: "Your order is being processed."&lt;/li&gt;
&lt;li&gt;Payment Service listens to that event, charges the card, then emits &lt;code&gt;PaymentCaptured&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Inventory Service sees &lt;code&gt;PaymentCaptured&lt;/code&gt;, reserves the stock, and emits &lt;code&gt;StockReserved&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Shipping Service sees &lt;code&gt;StockReserved&lt;/code&gt;, ships the item, and emits &lt;code&gt;OrderShipped&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Email Service sees &lt;code&gt;OrderShipped&lt;/code&gt; and sends the confirmation email.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No service blocks another, and messages queue safely while any service is down, but this also introduces &lt;strong&gt;eventual consistency&lt;/strong&gt;. We will talk about consistency in a later section.&lt;/p&gt;
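&lt;p&gt;As a rough sketch of this flow, the example below uses a tiny in-memory queue in place of a real broker such as Kafka or RabbitMQ. The &lt;code&gt;EventBus&lt;/code&gt; class, its &lt;code&gt;drain&lt;/code&gt; method and the lambda handlers are hypothetical and exist only to show that publishing returns before any consumer runs.&lt;/p&gt;

```python
from collections import defaultdict, deque


class EventBus:
    """Tiny in-memory stand-in for a message broker (illustrative only)."""

    def __init__(self):
        self.queue = deque()
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, order_id):
        # Publishing only enqueues; the publisher never waits for consumers.
        self.queue.append((event_type, order_id))

    def drain(self):
        # Deliver queued events in order; handlers may publish follow-ups.
        delivered = []
        while self.queue:
            event_type, order_id = self.queue.popleft()
            delivered.append(event_type)
            for handler in self.handlers[event_type]:
                handler(order_id)
        return delivered


bus = EventBus()
sent_emails = []

# Each service reacts to one event and emits the next one.
bus.subscribe("OrderPlaced", lambda oid: bus.publish("PaymentCaptured", oid))
bus.subscribe("PaymentCaptured", lambda oid: bus.publish("StockReserved", oid))
bus.subscribe("StockReserved", lambda oid: bus.publish("OrderShipped", oid))
bus.subscribe("OrderShipped", lambda oid: sent_emails.append(oid))


def checkout(order_id):
    # Order Service saves the order (omitted), emits the event and returns.
    bus.publish("OrderPlaced", order_id)
    return "Your order is being processed."
```

&lt;p&gt;The user gets a response as soon as &lt;code&gt;checkout&lt;/code&gt; returns. The remaining steps run only when the queue is drained, which is where the window of eventual consistency comes from.&lt;/p&gt;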

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Upsides&lt;/th&gt;
&lt;th&gt;Downsides&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High availability: If the receiver is down, messages queue and are processed once it recovers&lt;/td&gt;
&lt;td&gt;No immediate feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Loose temporal coupling, highly resilient&lt;/td&gt;
&lt;td&gt;Eventual consistency, caller sees only "accepted"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High parallelism and scalability&lt;/td&gt;
&lt;td&gt;Requires extra infrastructure (brokers, tracing)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Choosing Between Synchronous and Asynchronous
&lt;/h3&gt;

&lt;p&gt;The choice depends on the trade-offs you're willing to make between responsiveness, reliability, and coupling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use synchronous communication when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The caller needs an immediate result (e.g. credit-card charge, login).&lt;/li&gt;
&lt;li&gt;The service's response directly controls what happens next.&lt;/li&gt;
&lt;li&gt;Dependencies are reliable and low-latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use asynchronous communication when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loose coupling and resilience matter more than speed.&lt;/li&gt;
&lt;li&gt;The task can be done later or retried (e.g., sending emails, logging, bulk imports).&lt;/li&gt;
&lt;li&gt;You need high throughput or resilience. Services need to keep working even if others are down.&lt;/li&gt;
&lt;li&gt;Services are independently deployable or might be temporarily unavailable.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Coordination
&lt;/h2&gt;

&lt;p&gt;When a business request spans multiple services, those services need to work in sync to get the job done. But who drives the workflow? Should one service take charge, or should each one act on its own? That's what coordination is all about.&lt;/p&gt;

&lt;p&gt;The coordination style you choose shapes everything, from how you handle errors to where state lives to how complex things get. There are two main patterns: &lt;strong&gt;orchestration&lt;/strong&gt; and &lt;strong&gt;choreography&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Orchestration Pattern
&lt;/h3&gt;

&lt;p&gt;A dedicated service (orchestrator) is in charge. It drives the flow by calling each participating service, waiting for their responses, and deciding what happens next. It also owns the workflow state, often storing it in a local table or event log (&lt;code&gt;CREATED&lt;/code&gt;, &lt;code&gt;PAID&lt;/code&gt;, &lt;code&gt;SHIPPED&lt;/code&gt;, etc.). This makes it easy to know exactly where a request stands.&lt;/p&gt;

&lt;h4&gt;
  
  
  Happy Path
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5eupym0ganmyszo3nxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5eupym0ganmyszo3nxr.png" alt="Orchestration Pattern - Happy Path" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The frontend sends a &lt;code&gt;POST /checkout&lt;/code&gt; to the Orchestrator.&lt;/li&gt;
&lt;li&gt;The orchestrator calls Order Service (sync) to create the order.&lt;/li&gt;
&lt;li&gt;Then it calls Payment Service (sync) to charge the card.&lt;/li&gt;
&lt;li&gt;Then it calls Inventory Service (sync) to reserve the stock.&lt;/li&gt;
&lt;li&gt;Then it notifies Shipping Service (async) to ship the item.&lt;/li&gt;
&lt;li&gt;Then it notifies Email Service (async) to send confirmation.&lt;/li&gt;
&lt;li&gt;Finally, it responds to the user with "Order confirmed.".&lt;/li&gt;
&lt;li&gt;In each step the orchestrator updates the workflow state.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Failure Path
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrytjnhufv9f1z18psv0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrytjnhufv9f1z18psv0.png" alt="Orchestration Pattern - Failure Path" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Payment Service says "declined".&lt;/li&gt;
&lt;li&gt;The orchestrator updates workflow state to &lt;code&gt;FAILED_PAYMENT&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Then it asks Order Service to undo its changes. This is known as a compensating action.&lt;/li&gt;
&lt;li&gt;Then it asks Email Service to notify the user.&lt;/li&gt;
&lt;li&gt;Then it responds to the user with "Payment has failed".&lt;/li&gt;
&lt;li&gt;No extra communication links are needed because the orchestrator already talks to every service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These examples illustrate the Fairy Tale Saga. We will talk about sagas later.&lt;/p&gt;
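&lt;p&gt;A minimal sketch of such an orchestrator might look like the following. Service calls are recorded in a list instead of going over the network, and the &lt;code&gt;Orchestrator&lt;/code&gt; class, the &lt;code&gt;charge_card&lt;/code&gt; callable, the log entries and the &lt;code&gt;RESERVED&lt;/code&gt; state are assumptions made for illustration.&lt;/p&gt;

```python
class Orchestrator:
    """Drives the checkout workflow and owns its state (illustrative only)."""

    def __init__(self, charge_card):
        self.charge_card = charge_card  # callable: True when the charge succeeds
        self.state = {}                 # order_id -> workflow state
        self.log = []                   # calls made to the hypothetical services

    def checkout(self, order_id):
        self.log.append("order.create")
        self.state[order_id] = "CREATED"

        if not self.charge_card(order_id):
            self.state[order_id] = "FAILED_PAYMENT"
            self.log.append("order.cancel")          # compensating action
            self.log.append("email.payment_failed")
            return "Payment has failed"
        self.state[order_id] = "PAID"

        self.log.append("inventory.reserve")
        self.state[order_id] = "RESERVED"            # assumed state name

        # Shipping and Email are notified asynchronously in the article;
        # here they are only recorded.
        self.log.append("shipping.ship")
        self.log.append("email.confirm")
        return "Order confirmed."
```

&lt;p&gt;Because every decision and state change lives in one class, asking "where does order X stand?" is a single lookup in &lt;code&gt;state&lt;/code&gt;.&lt;/p&gt;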

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Upsides&lt;/th&gt;
&lt;th&gt;Downsides&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single source of truth for progress and errors&lt;/td&gt;
&lt;td&gt;Extra network hops add latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Central place for timeouts, retries, compensations&lt;/td&gt;
&lt;td&gt;Orchestrator can bottleneck or fail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Easier to reason about and unit-test complex flows&lt;/td&gt;
&lt;td&gt;Limits parallelism, steps are often serialized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Tighter coupling between orchestrator and services&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Choreography Pattern
&lt;/h3&gt;

&lt;p&gt;Choreography works without a central service. Each service reacts to events and publishes its own events. Together, these event-driven reactions form the workflow. Since there's no orchestrator, managing state is trickier. Here are common options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Front Controller&lt;/strong&gt;: The first service in the chain (e.g. Order Service) tracks the state. Others report back. Easy to query, but adds responsibilities and coupling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateless&lt;/strong&gt;: No service tracks workflow state. To know what happened, you query each service and reconstruct the state on the fly. Loose coupling, but lots of network chatter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stamp Coupling&lt;/strong&gt;: Instead of storing state, pass it along. Each service adds its progress to the shared message or event as it moves through the workflow. No extra queries, but messages get heavier.&lt;/li&gt;
&lt;/ul&gt;
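&lt;p&gt;Of the three options, stamp coupling is the easiest to show in code. The sketch below is only one possible shape: the &lt;code&gt;stamp&lt;/code&gt; and &lt;code&gt;current_status&lt;/code&gt; helpers, the &lt;code&gt;history&lt;/code&gt; field and the status names are all hypothetical.&lt;/p&gt;

```python
def stamp(event: dict, service: str, status: str) -> dict:
    """Return a copy of the event with this service's progress appended."""
    history = event.get("history", []) + [{"service": service, "status": status}]
    return {**event, "history": history}


def current_status(event: dict) -> str:
    """Read the workflow status straight from the message, with no extra queries."""
    last = event["history"][-1]
    return f'{last["service"]}:{last["status"]}'


# Each service stamps the message before passing it on.
event = {"order_id": "o-1"}
event = stamp(event, "order", "PLACED")
event = stamp(event, "payment", "CAPTURED")
event = stamp(event, "inventory", "RESERVED")
```

&lt;p&gt;The message now carries the full workflow history, which is exactly why it grows heavier with every hop.&lt;/p&gt;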

&lt;h4&gt;
  
  
  Happy Path
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshz2ra4xovvs2bc99kfj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshz2ra4xovvs2bc99kfj.png" alt="Choreography - Happy Path" width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The frontend sends a &lt;code&gt;POST /checkout&lt;/code&gt; to Order Service.&lt;/li&gt;
&lt;li&gt;Order Service saves the order, emits &lt;code&gt;OrderPlaced&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Order Service returns immediately to the user "Your order is being processed".&lt;/li&gt;
&lt;li&gt;Payment Service listens, charges the card, emits &lt;code&gt;PaymentCaptured&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Inventory Service listens, reserves the stock, emits &lt;code&gt;StockReserved&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Shipping Service hears &lt;code&gt;StockReserved&lt;/code&gt;, ships the item, emits &lt;code&gt;OrderShipped&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Email Service listens for &lt;code&gt;OrderShipped&lt;/code&gt; and sends confirmation to the user.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Failure Path
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F97gplvaueb2z56g2cxnk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F97gplvaueb2z56g2cxnk.png" alt="Choreography - Failure Path" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shipping Service emits &lt;code&gt;OutOfStock&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Payment, Inventory and Order services listen to &lt;code&gt;OutOfStock&lt;/code&gt; to undo their changes.&lt;/li&gt;
&lt;li&gt;Email Service listens to &lt;code&gt;OutOfStock&lt;/code&gt; and notifies the user.&lt;/li&gt;
&lt;li&gt;New communication links are added each time you discover a new error path.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These examples illustrate the Anthology Saga.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Upsides&lt;/th&gt;
&lt;th&gt;Downsides&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High parallelism, steps run in parallel&lt;/td&gt;
&lt;td&gt;Debugging involves multiple logs and topics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Loose coupling, services scale independently&lt;/td&gt;
&lt;td&gt;No built-in global state, must design your own approach&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Better fault-isolation, no single point of failure&lt;/td&gt;
&lt;td&gt;Error handling scatters across services&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Choosing Between Orchestration and Choreography
&lt;/h3&gt;

&lt;p&gt;Start with the workflow's priorities, then pick the style that matches.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex logic or many ways to fail? Orchestration wins. A single component tracks steps, rolls back work, and hides complexity from others.&lt;/li&gt;
&lt;li&gt;Need fast responses and high parallelism? Choreography fits. Each service does its job and moves on, letting the rest catch up through events.&lt;/li&gt;
&lt;li&gt;Want an easy way to track the workflow status? Orchestration gives a single source of truth. With choreography, you'll need to reconstruct state from events.&lt;/li&gt;
&lt;li&gt;Worried about a single point of failure? Choreography removes the central brain at the cost of more scattered error handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most production systems mix the two. Keep orchestration for high-risk, money-moving steps such as payment and refunds, where clear control and fast rollback matter. Use choreography for high-volume, low-risk steps like sending emails, updating analytics, or syncing inventory, where speed and autonomy pay off.&lt;/p&gt;




&lt;h2&gt;
  
  
  Consistency
&lt;/h2&gt;

&lt;p&gt;Consistency is the guarantee (strong or weak) that when one service updates data, all other services will immediately or eventually see the same result.&lt;/p&gt;

&lt;p&gt;In a distributed system, as soon as a business request involves more than one service, you have to decide how much inconsistency you can tolerate between them, and for how long. Whether you aim for strict, all-or-nothing guarantees (atomic consistency) or let things settle over time (eventual consistency), your consistency strategy shapes how reliable, responsive, and maintainable your system really is.&lt;/p&gt;

&lt;p&gt;There are two approaches to consistency: &lt;strong&gt;atomic consistency&lt;/strong&gt; and &lt;strong&gt;eventual consistency&lt;/strong&gt;. Before exploring them, let's look at how consistency works in the monolith world.&lt;/p&gt;

&lt;h3&gt;
  
  
  ACID vs BASE
&lt;/h3&gt;

&lt;p&gt;Inside a single service with a single database the "order checkout" workflow is simple. A request starts and triggers a single transaction: insert the order row, reserve stock, charge the card, mark the order ready to ship. If the card step fails, the database rolls everything back. That comes from the four &lt;strong&gt;ACID&lt;/strong&gt; guarantees for transactions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Atomicity: All-or-nothing. All updates commit or none do.&lt;/li&gt;
&lt;li&gt;Consistency: Business rules and constraints stay valid throughout the transaction.&lt;/li&gt;
&lt;li&gt;Isolation: During a transaction, other requests can't see its uncommitted changes.&lt;/li&gt;
&lt;li&gt;Durability: Once committed, it's permanent, a crash can't erase the data.&lt;/li&gt;
&lt;/ul&gt;
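&lt;p&gt;To make the all-or-nothing behavior concrete, here is a small sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module with an in-memory database. The table layout, the &lt;code&gt;READY_TO_SHIP&lt;/code&gt; status, and the &lt;code&gt;checkout&lt;/code&gt; function with its &lt;code&gt;card_ok&lt;/code&gt; flag are assumptions for the example. The card charge is simulated by raising an error inside the transaction.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('sku-1', 5)")
conn.commit()


def checkout(order_id, card_ok):
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "INSERT INTO orders VALUES (?, 'READY_TO_SHIP')", (order_id,)
            )
            conn.execute("UPDATE stock SET qty = qty - 1 WHERE sku = 'sku-1'")
            if not card_ok:
                raise RuntimeError("card declined")  # simulated failed charge
    except RuntimeError:
        return "rolled back"
    return "committed"
```

&lt;p&gt;When the simulated charge fails, neither the order row nor the stock update survives, which is the atomicity guarantee at work.&lt;/p&gt;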

&lt;p&gt;Move the same workflow into four microservices (Order, Inventory, Payment and Shipping), each with its own database, and &lt;strong&gt;ACID&lt;/strong&gt; breaks. Order and Inventory commit, Payment times out, and there is no global rollback, so constraints drift and partial updates leak to users. &lt;strong&gt;ACID&lt;/strong&gt; guarantees only hold within a single database transaction.&lt;/p&gt;

&lt;p&gt;You could try a global XA transaction using 2PC, but it means extra network round-trips and long-held locks. The single coordinator can stall the system and kill availability, and every datastore must support the same XA protocol. Most modern teams decide the cost is too high.&lt;/p&gt;

&lt;p&gt;Instead, you swap &lt;strong&gt;ACID&lt;/strong&gt; for &lt;strong&gt;BASE&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic availability: Services respond quickly, even if data is temporarily inconsistent.&lt;/li&gt;
&lt;li&gt;Soft state: State may temporarily be incorrect or incomplete.&lt;/li&gt;
&lt;li&gt;Eventual consistency: Given retries, compensations or human help, the data will line up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;BASE&lt;/strong&gt; is a promise to converge, not a guarantee of instant correctness.&lt;/p&gt;

&lt;h3&gt;
  
  
  Atomic Transactions
&lt;/h3&gt;

&lt;p&gt;If you want an ACID-like experience across services, you typically introduce a central service (orchestrator) that drives the whole workflow. It synchronously invokes each service, and each service commits locally. If something fails, the orchestrator triggers compensating transactions that undo all completed work, as if it never happened. A response is returned to the caller once all steps succeed or the rollback completes.&lt;/p&gt;

&lt;h4&gt;
  
  
  Happy path
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwom4rxx3d6bwe2a5s75.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwom4rxx3d6bwe2a5s75.png" alt="Atomic Transactions - Happy Path" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The frontend sends a &lt;code&gt;POST /checkout&lt;/code&gt; to the Orchestrator.&lt;/li&gt;
&lt;li&gt;The orchestrator calls Order, Payment, Inventory and Shipping services in sequence.&lt;/li&gt;
&lt;li&gt;Each service commits to its local database immediately with no failures.&lt;/li&gt;
&lt;li&gt;The orchestrator returns "Order confirmed." to the user.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Failure path
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fliq457pivwp9rtm3y829.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fliq457pivwp9rtm3y829.png" alt="Atomic Transactions - Failure Path" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Order, Payment, and Inventory services have already committed.&lt;/li&gt;
&lt;li&gt;Shipping Service times out.&lt;/li&gt;
&lt;li&gt;The orchestrator immediately issues three compensating transactions to undo the earlier steps.&lt;/li&gt;
&lt;li&gt;The orchestrator returns "Unable to ship" to the user once every compensation succeeds.&lt;/li&gt;
&lt;/ul&gt;
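&lt;p&gt;The failure path can be sketched as a generic saga runner. This is an illustration under assumptions, not a definitive implementation: the &lt;code&gt;run_saga&lt;/code&gt; function and the step names are hypothetical, and undoing steps in reverse order is a common choice rather than something the pattern mandates.&lt;/p&gt;

```python
def run_saga(steps):
    """Run (name, action, compensation) steps; undo completed ones on failure."""
    done = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo already-committed steps, most recent first (assumed order).
            for _, undo in reversed(done):
                undo()
            return f"Failed at {name}"
        done.append((name, compensate))
    return "Order confirmed."


log = []


def shipping_times_out():
    raise TimeoutError("shipping")


steps = [
    ("order", lambda: log.append("order.create"), lambda: log.append("order.cancel")),
    ("payment", lambda: log.append("payment.charge"), lambda: log.append("payment.refund")),
    ("inventory", lambda: log.append("inventory.reserve"), lambda: log.append("inventory.release")),
    ("shipping", shipping_times_out, lambda: None),
]

result = run_saga(steps)
```

&lt;p&gt;Three steps commit, the fourth fails, and three compensations run before the caller gets an answer. A production version would also need to handle a compensation that itself fails.&lt;/p&gt;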

&lt;h4&gt;
  
  
  Points to watch
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;This gives you &lt;strong&gt;ACD&lt;/strong&gt; but no &lt;strong&gt;Isolation&lt;/strong&gt;. Other requests can see intermediate states before compensation finishes, so dirty reads can happen and other requests might overwrite in-progress changes.&lt;/li&gt;
&lt;li&gt;Compensation itself might fail (e.g. refund gateway offline), so you need retries or a dashboard for manual repair.&lt;/li&gt;
&lt;li&gt;Side-effects (email, analytics) already triggered may not be reversible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the Epic Saga, one way to handle atomic transactions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Upsides&lt;/th&gt;
&lt;th&gt;Downsides&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data consistency and invariants are restored immediately once compensations finish&lt;/td&gt;
&lt;td&gt;Lower availability, response time grows with each hop and compensation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User sees one clear success/failure result&lt;/td&gt;
&lt;td&gt;Orchestrator is a coordination hot-spot and potential bottleneck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic rollback logic lives in one place&lt;/td&gt;
&lt;td&gt;Isolation is gone, other requests may see half-done state until compensation finishes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Eventual Transactions
&lt;/h3&gt;

&lt;p&gt;The more scalable alternative is to let each service act independently. Services commit changes locally, publish asynchronous events, return immediately, and rely on other services to react to these events in their own time. Failures are not undone immediately. Instead, they are handled through retries, fallback states, or human intervention.&lt;/p&gt;

&lt;h4&gt;
  
  
  Happy Path
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fapopx65fkvb2vprzv2gb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fapopx65fkvb2vprzv2gb.png" alt="Eventual Transactions - Happy Path" width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The frontend sends a &lt;code&gt;POST /checkout&lt;/code&gt; to Order Service.&lt;/li&gt;
&lt;li&gt;Order Service saves and commits the order, emits &lt;code&gt;OrderCreated&lt;/code&gt; event.&lt;/li&gt;
&lt;li&gt;Order Service responds to the user immediately "Your order is being processed".&lt;/li&gt;
&lt;li&gt;Payment Service processes &lt;code&gt;OrderCreated&lt;/code&gt;, charges card and emits &lt;code&gt;PaymentCaptured&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Inventory Service processes &lt;code&gt;PaymentCaptured&lt;/code&gt;, reserves stock and emits &lt;code&gt;StockReserved&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Shipping Service hears &lt;code&gt;StockReserved&lt;/code&gt;, ships the item and emits &lt;code&gt;OrderShipped&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Email Service hears &lt;code&gt;OrderShipped&lt;/code&gt; and notifies the user.&lt;/li&gt;
&lt;li&gt;Order Service hears &lt;code&gt;OrderShipped&lt;/code&gt; and marks the order as &lt;code&gt;FULFILLED&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Failure Path
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokjogo052a6ymdq1yc2s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokjogo052a6ymdq1yc2s.png" alt=" " width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Payment Service declines the charge and emits &lt;code&gt;PaymentFailed&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Order Service hears &lt;code&gt;PaymentFailed&lt;/code&gt;, marks order as &lt;code&gt;PAYMENT_FAILED&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;From here, we have several recovery paths:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retry Policy&lt;/strong&gt;: Payment Service retries the charge and emits &lt;code&gt;PaymentCaptured&lt;/code&gt; or &lt;code&gt;PaymentFailed&lt;/code&gt; again.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human Intervention&lt;/strong&gt;: A support dashboard highlights stuck orders with &lt;code&gt;PAYMENT_FAILED&lt;/code&gt; for a human to manually fix or retry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback State&lt;/strong&gt;: The system gives up and issues compensating transactions to clean up. Here Order Service hears &lt;code&gt;PaymentFailed&lt;/code&gt;, marks the order as &lt;code&gt;CANCELLED&lt;/code&gt; and emails the user about the issue. This is similar to the example in Choreography - Failure Path.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
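&lt;p&gt;Two of these recovery paths, retry and fallback, combine naturally. The sketch below assumes a hypothetical &lt;code&gt;settle_payment&lt;/code&gt; helper and a &lt;code&gt;charge&lt;/code&gt; callable that returns &lt;code&gt;True&lt;/code&gt; on success. The attempt limit and the &lt;code&gt;PAID&lt;/code&gt; status are illustrative choices, and a real retry policy would also wait between attempts.&lt;/p&gt;

```python
def settle_payment(charge, max_attempts: int = 3) -> str:
    """Retry a charge a bounded number of times, then fall back to cancelling."""
    for _ in range(max_attempts):
        if charge():
            return "PAID"
    # Fallback state: give up and let compensations clean up.
    return "CANCELLED"


# A flaky gateway that succeeds on the third attempt.
flaky_attempts = iter([False, False, True])
recovered = settle_payment(lambda: next(flaky_attempts))

# A gateway that never succeeds.
gave_up = settle_payment(lambda: False)
```

&lt;p&gt;Bounding the retries matters: without a limit, an order could stay in &lt;code&gt;PAYMENT_FAILED&lt;/code&gt; forever with no one responsible for resolving it.&lt;/p&gt;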

&lt;p&gt;This is the Anthology Saga, one way to handle eventual transactions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Points to Watch
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Decide where the status lives (row column, side-car table, or event stream). Splitting state across multiple places invites race conditions.&lt;/li&gt;
&lt;li&gt;Idempotency is crucial. Every step may be retried. Services must handle duplicate events without breaking state.&lt;/li&gt;
&lt;li&gt;For every non-terminal failure state (e.g. &lt;code&gt;PAYMENT_FAILED&lt;/code&gt;), identify who's responsible for fixing it and how (automatic retry, human help, or another event).&lt;/li&gt;
&lt;li&gt;Failures that can't recover should be moved to a holding queue or flagged for investigation.&lt;/li&gt;
&lt;/ul&gt;
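&lt;p&gt;Idempotency is often implemented by remembering which events were already handled. The following is a minimal sketch, assuming every event carries a unique &lt;code&gt;event_id&lt;/code&gt;. The &lt;code&gt;PaymentConsumer&lt;/code&gt; class and its fields are hypothetical, and a real service would keep the processed IDs in durable storage rather than in memory.&lt;/p&gt;

```python
class PaymentConsumer:
    """Handles OrderCreated events at most once per event_id (illustrative only)."""

    def __init__(self):
        self.processed_ids = set()  # would be a durable store in production
        self.charges = []

    def handle(self, event: dict) -> bool:
        """Charge the card once; return False when the event is a duplicate."""
        if event["event_id"] in self.processed_ids:
            return False
        self.charges.append(event["order_id"])
        self.processed_ids.add(event["event_id"])
        return True


consumer = PaymentConsumer()
order_created = {"event_id": "evt-1", "order_id": "o-1"}
first = consumer.handle(order_created)
second = consumer.handle(order_created)  # redelivered by the broker
```

&lt;p&gt;The redelivered event is ignored, so the card is charged only once even though the message arrived twice.&lt;/p&gt;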

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Upsides&lt;/th&gt;
&lt;th&gt;Downsides&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High availability&lt;/td&gt;
&lt;td&gt;Short windows of data drift. Dashboards, users, and code must tolerate it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Services scale and deploy independently&lt;/td&gt;
&lt;td&gt;Requires retry logic, compensating transactions, or human help to clean up&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High throughput, no tight transaction boundaries&lt;/td&gt;
&lt;td&gt;Debugging spans multiple event hops&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Choosing Between Atomic and Distributed Transactions
&lt;/h3&gt;

&lt;p&gt;The choice depends on the trade-offs you're willing to make between responsiveness, level of consistency, and the effort needed to recover from failure. Ask yourself a few questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How strict is consistency?
If any mismatch causes serious issues (money, security), atomic wins. If delay is fine, eventual scales better.&lt;/li&gt;
&lt;li&gt;Can you undo steps?
Atomic needs safe rollbacks. If not possible, prefer retries or manual repair.&lt;/li&gt;
&lt;li&gt;Do users need fast responses?
Atomic blocks until all steps finish. Eventual responds fast, even if some parts run later.&lt;/li&gt;
&lt;li&gt;What's your fault tolerance?
Atomic isolates failure but can reduce availability. Distributed keeps moving, but errors may surface later.&lt;/li&gt;
&lt;li&gt;How autonomous are your services?
Atomic often requires orchestration. Distributed keeps services decoupled and event-driven.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most production systems combine atomic transactions for local operations with distributed, asynchronous messaging across services. Some steps might use synchronous calls for strong feedback, while others rely on eventual consistency and retries.&lt;/p&gt;




&lt;h2&gt;
  
  
  Saga Patterns
&lt;/h2&gt;

&lt;p&gt;We've already explored different ways to handle business workflows that span multiple services. These are known as sagas. A saga breaks the workflow into local transactions, each owned by one service. After each step commits, the next is triggered via a call or an event, depending on the communication style. If any step fails, the saga issues compensations or moves into an error-handling path, depending on the consistency and coordination model.&lt;/p&gt;

&lt;p&gt;There are eight saga patterns. They're simply every possible combination of the three forces we've been using throughout the article. Mark Richards and Neal Ford gave these sagas memorable names:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern name&lt;/th&gt;
&lt;th&gt;Communication&lt;/th&gt;
&lt;th&gt;Consistency&lt;/th&gt;
&lt;th&gt;Coordination&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Epic Saga&lt;/td&gt;
&lt;td&gt;synchronous&lt;/td&gt;
&lt;td&gt;atomic&lt;/td&gt;
&lt;td&gt;orchestrated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phone Tag Saga&lt;/td&gt;
&lt;td&gt;synchronous&lt;/td&gt;
&lt;td&gt;atomic&lt;/td&gt;
&lt;td&gt;choreographed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fairy Tale Saga&lt;/td&gt;
&lt;td&gt;synchronous&lt;/td&gt;
&lt;td&gt;eventual&lt;/td&gt;
&lt;td&gt;orchestrated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time Travel Saga&lt;/td&gt;
&lt;td&gt;synchronous&lt;/td&gt;
&lt;td&gt;eventual&lt;/td&gt;
&lt;td&gt;choreographed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fantasy Fiction Saga&lt;/td&gt;
&lt;td&gt;asynchronous&lt;/td&gt;
&lt;td&gt;atomic&lt;/td&gt;
&lt;td&gt;orchestrated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Horror Story Saga&lt;/td&gt;
&lt;td&gt;asynchronous&lt;/td&gt;
&lt;td&gt;atomic&lt;/td&gt;
&lt;td&gt;choreographed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel Saga&lt;/td&gt;
&lt;td&gt;asynchronous&lt;/td&gt;
&lt;td&gt;eventual&lt;/td&gt;
&lt;td&gt;orchestrated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthology Saga&lt;/td&gt;
&lt;td&gt;asynchronous&lt;/td&gt;
&lt;td&gt;eventual&lt;/td&gt;
&lt;td&gt;choreographed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
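&lt;p&gt;To see why there are exactly eight patterns, the three forces can be enumerated as a Cartesian product of two options each. The sketch below uses the pattern names from this article; the &lt;code&gt;SAGA_NAMES&lt;/code&gt; lookup and the &lt;code&gt;saga_for&lt;/code&gt; helper are illustrative names, not part of any library.&lt;/p&gt;

```python
from itertools import product

# Keyed by (communication, consistency, coordination).
SAGA_NAMES = {
    ("synchronous", "atomic", "orchestrated"): "Epic Saga",
    ("synchronous", "atomic", "choreographed"): "Phone Tag Saga",
    ("synchronous", "eventual", "orchestrated"): "Fairy Tale Saga",
    ("synchronous", "eventual", "choreographed"): "Time Travel Saga",
    ("asynchronous", "atomic", "orchestrated"): "Fantasy Fiction Saga",
    ("asynchronous", "atomic", "choreographed"): "Horror Story Saga",
    ("asynchronous", "eventual", "orchestrated"): "Parallel Saga",
    ("asynchronous", "eventual", "choreographed"): "Anthology Saga",
}


def saga_for(communication, consistency, coordination):
    """Look up the pattern name for one combination of the three forces."""
    return SAGA_NAMES[(communication, consistency, coordination)]


# Two options per force gives 2 * 2 * 2 = 8 combinations.
combos = list(product(
    ("synchronous", "asynchronous"),
    ("atomic", "eventual"),
    ("orchestrated", "choreographed"),
))

for combo in combos:
    print(saga_for(*combo), combo)
```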

&lt;p&gt;&lt;em&gt;In the diagrams below, dotted boxes show atomic consistency. No box means eventual consistency.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Epic Saga
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Synchronous • Atomic • Orchestrated&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqy8qwimxn22uhsn8zi87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqy8qwimxn22uhsn8zi87.png" alt="Epic Saga" width="800" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This pattern enforces all-or-nothing behavior via an orchestrator that makes blocking, synchronous calls and triggers compensating actions on failure. This makes the system behave like a monolith.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The orchestrator receives the request and manages the workflow.&lt;/li&gt;
&lt;li&gt;It calls each service one after the other, waiting for each to respond.&lt;/li&gt;
&lt;li&gt;If all services succeed, the saga completes successfully.&lt;/li&gt;
&lt;li&gt;If any step fails, the orchestrator triggers compensating actions in reverse order.&lt;/li&gt;
&lt;li&gt;Guarantees atomicity but suffers from bottlenecks and tight coupling.&lt;/li&gt;
&lt;/ul&gt;
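&lt;p&gt;The steps above can be sketched as a small in-process orchestrator. This is a simplified illustration, not a production design: the service calls are plain Python functions standing in for blocking remote calls, and &lt;code&gt;run_epic_saga&lt;/code&gt;, &lt;code&gt;StepFailed&lt;/code&gt;, and the step names are hypothetical.&lt;/p&gt;

```python
class StepFailed(Exception):
    """Raised by a step to signal that its local transaction failed."""


def run_epic_saga(steps):
    """Run (name, action, compensation) steps one by one, blocking on each.

    On failure, undo the completed steps in reverse order and return False.
    """
    completed = []
    for name, action, compensation in steps:
        try:
            action()
        except StepFailed:
            for _done_name, undo in reversed(completed):
                undo()
            return False
        completed.append((name, compensation))
    return True


log = []


def fail_shipping():
    raise StepFailed("no courier available")


steps = [
    ("order", lambda: log.append("create order"), lambda: log.append("cancel order")),
    ("payment", lambda: log.append("charge card"), lambda: log.append("refund card")),
    ("shipping", fail_shipping, lambda: log.append("cancel shipment")),
]

ok = run_epic_saga(steps)
print(ok, log)
```

&lt;p&gt;Note that the failed step itself is not compensated, only the steps that completed before it. A real orchestrator would also need to handle compensations that fail.&lt;/p&gt;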

&lt;p&gt;Choose Epic Saga when you need all-or-nothing behavior and the workflow is relatively short-lived. It’s a familiar approach, but should be avoided for long chains or highly distributed systems.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coupling&lt;/td&gt;
&lt;td&gt;Very High&lt;/td&gt;
&lt;td&gt;Sync calls, atomicity, and an orchestrator maximize coupling between services.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Sync calls and rollback logic are centralised in the orchestrator.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;One service failure aborts the whole flow. All-or-nothing behavior will affect responsiveness.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;Very Low&lt;/td&gt;
&lt;td&gt;Orchestrator and atomicity coupling create bottlenecks and limit scaling.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phone Tag Saga
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Synchronous • Atomic • Choreographed&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwcuf1m49vu52pq895g3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwcuf1m49vu52pq895g3i.png" alt="Phone Tag Saga" width="800" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A fully choreographed version of the Epic Saga where services call each other in a strict order and handle their own rollback logic.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The initiating service starts the chain and calls the next service synchronously.&lt;/li&gt;
&lt;li&gt;Each service commits locally and calls the next service.&lt;/li&gt;
&lt;li&gt;If any step fails, services must independently send compensating messages upstream.&lt;/li&gt;
&lt;li&gt;No orchestrator exists, so each service carries its own coordination and rollback logic, which increases complexity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pattern suits only simple, linear workflows that rarely fail. Once there are many error-handling paths and conditional flows, the code becomes unmanageable, so it is best treated as a transitional or legacy-friendly model.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coupling&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Atomicity and sync calls cause high coupling, but distributed coordination makes it less coupled than Epic Saga.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Each service has coordination and rollback logic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Error handling without an orchestrator requires callbacks and multiple round-trips.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Sync calls and atomicity prevent parallelism.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Fairy Tale Saga
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Synchronous • Eventual • Orchestrated&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjb0m2q6dscavd3ytm9ye.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjb0m2q6dscavd3ytm9ye.png" alt="Fairy Tale Saga" width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Orchestration with synchronous calls, but each service manages its own commit. Consistency is achieved eventually, not atomically.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The orchestrator sends synchronous calls to services in sequence.&lt;/li&gt;
&lt;li&gt;Each service commits its changes independently.&lt;/li&gt;
&lt;li&gt;The orchestrator listens for success or failure after each step.&lt;/li&gt;
&lt;li&gt;If any step fails, services may be out of sync for a while, but the data will eventually line up.&lt;/li&gt;
&lt;li&gt;The orchestrator can still trigger compensating actions, but they won't be part of an active transaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ideal for business processes where a central controller is valuable and consistency can be delayed. Think of checkout, signup, or account setup flows that need visibility and control but don’t require strict atomicity. This makes the saga a common choice in many microservices architectures.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coupling&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Uses an orchestrator and sync calls, but avoids global transactions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;Very Low&lt;/td&gt;
&lt;td&gt;Sync calls and rollback logic are centralised in the orchestrator, and consistency is loosened.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Still blocks on each call, but allows for eventual consistency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Better scalability due to lack of transactional coupling.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Time Travel Saga
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Synchronous • Eventual • Choreographed&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvrk815krfwaery4je9c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvrk815krfwaery4je9c.png" alt="Time Travel Saga" width="800" height="142"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fully decentralized version of the Fairy Tale Saga. Services call each other in sequence and own all workflow logic, including failures.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A service begins and completes its local transaction.&lt;/li&gt;
&lt;li&gt;It then calls the next service synchronously and passes control forward.&lt;/li&gt;
&lt;li&gt;Each service continues this chain until the workflow ends.&lt;/li&gt;
&lt;li&gt;If an error occurs, each service must handle its own compensations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Best for throughput-focused, one-way and linear flows, such as ETL pipelines and simple chains where each step progresses naturally, independently and in-order.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coupling&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;No orchestrator and no atomicity reduce coupling, but sync calls retain some coupling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;No transactional logic, services handle only local logic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Still blocks on each call, but no central bottleneck means fewer hops.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Choreographed flows with local commits scale well.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Fantasy Fiction Saga
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Asynchronous • Atomic • Orchestrated&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxhz9aalf3uqva05l5ku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxhz9aalf3uqva05l5ku.png" alt="Fantasy Fiction Saga" width="800" height="224"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An orchestrated saga that attempts atomic coordination over asynchronous calls, introducing heavy complexity in managing order and state.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The orchestrator sends asynchronous commands to each participating service.&lt;/li&gt;
&lt;li&gt;Services perform local transactions and respond, possibly out of order.&lt;/li&gt;
&lt;li&gt;The orchestrator tracks progress and handles pending state.&lt;/li&gt;
&lt;li&gt;On failure, it issues compensating commands asynchronously.&lt;/li&gt;
&lt;li&gt;Coordination logic must handle race conditions and retries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only consider this pattern when atomic guarantees are a must and you need some parallelism or better performance. It is hard to get right because managing transactional consistency asynchronously is challenging, and it requires advanced orchestration and observability tooling.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coupling&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Atomic guarantees demand coordination, async makes timing harder.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Orchestrator must manage out-of-order events, rollbacks, retries, and partial states.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Async compensations mean long recovery paths, and one service failure affects the whole flow.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High scale is still challenging with atomic services. Async alone can't offset coordination bottlenecks.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Horror Story Saga
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Asynchronous • Atomic • Choreographed&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjfof7qyn46k1ews9e05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjfof7qyn46k1ews9e05.png" alt="Horror Story Saga" width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most difficult model that tries to achieve atomic consistency with no orchestrator and only async messaging (the two loosest coupling factors). All services must coordinate rollbacks without global state.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Services exchange messages asynchronously and commit locally.&lt;/li&gt;
&lt;li&gt;There is no orchestrator, so each service must track workflow state and handle compensation.&lt;/li&gt;
&lt;li&gt;Compensation logic must handle failures across out-of-order, possibly incomplete message chains.&lt;/li&gt;
&lt;li&gt;High risk of race conditions, cascading failures, and coordination errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Avoid this pattern wherever you can. It's considered a red flag, signaling accidental complexity or under-designed coordination. Consider it only if you truly require atomicity but cannot introduce orchestration due to organizational boundaries.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coupling&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;No orchestrator helps loosen structure, but atomicity still enforces shared state constraints.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;Very High&lt;/td&gt;
&lt;td&gt;Services must coordinate rollbacks asynchronously, tracking transaction state and order.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Async chatter to achieve atomicity hurts responsiveness.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Parallelism is possible with async calls. No orchestrator helps as well.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Parallel Saga
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Asynchronous • Eventual • Orchestrated&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc20hxb2eor11q9gnrj2d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc20hxb2eor11q9gnrj2d.png" alt="Parallel Saga" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A scalable and resilient pattern where the orchestrator coordinates async service calls with eventual consistency, enabling high throughput.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The orchestrator sends async requests to all participating services.&lt;/li&gt;
&lt;li&gt;Services execute independently and manage their own commits.&lt;/li&gt;
&lt;li&gt;Results are returned asynchronously to the orchestrator.&lt;/li&gt;
&lt;li&gt;If the orchestrator receives a failure, it sends async messages to services to compensate for this failed change.&lt;/li&gt;
&lt;li&gt;Enables parallel execution and graceful recovery at scale.&lt;/li&gt;
&lt;/ul&gt;
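&lt;p&gt;A rough sketch of this flow using Python's &lt;code&gt;asyncio&lt;/code&gt; follows. It assumes the services can be modelled as coroutines; &lt;code&gt;call_service&lt;/code&gt;, &lt;code&gt;compensate&lt;/code&gt;, &lt;code&gt;run_parallel_saga&lt;/code&gt;, and the service names are hypothetical stand-ins for real asynchronous messaging.&lt;/p&gt;

```python
import asyncio


async def call_service(name, should_fail=False):
    """Stand-in for an async request to a service that commits locally."""
    await asyncio.sleep(0)  # simulates the message round trip
    if should_fail:
        raise RuntimeError(name + " failed")
    return name


async def compensate(name, log):
    """Stand-in for an async compensating message sent to a service."""
    await asyncio.sleep(0)
    log.append("compensate " + name)


async def run_parallel_saga(requests, log):
    """Fan out to all services at once, then compensate the ones that succeeded."""
    names = list(requests)
    results = await asyncio.gather(
        *(call_service(name, requests[name]) for name in names),
        return_exceptions=True,  # collect failures instead of raising the first one
    )
    succeeded = [
        name for name, result in zip(names, results)
        if not isinstance(result, Exception)
    ]
    if len(succeeded) == len(names):
        return "COMPLETED"
    await asyncio.gather(*(compensate(name, log) for name in succeeded))
    return "COMPENSATED"


log = []
outcome = asyncio.run(run_parallel_saga(
    {"inventory": False, "payment": True, "email": False}, log))
print(outcome, sorted(log))
```

&lt;p&gt;Because the calls run concurrently, services that already committed are compensated after the fact. Between the failure and the compensation the system is temporarily inconsistent, which is the eventual-consistency trade-off this pattern accepts.&lt;/p&gt;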

&lt;p&gt;Perfect for high-volume complex business flows, e.g., onboarding, order processing, subscription handling, where speed and observability matter more than atomic guarantees. Great balance of control, resilience, and performance.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coupling&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;No global transaction, services react to events, the orchestrator only sequences steps.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;The orchestrator's logic is simple due to low coupling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Fast responses, non-blocking flows.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;No atomicity guarantee, services scale at their own pace.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Anthology Saga
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Asynchronous • Eventual • Choreographed&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmvztofkqdsl2ztonwlq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmvztofkqdsl2ztonwlq.png" alt="Anthology Saga" width="800" height="138"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most decoupled pattern: services communicate via events without orchestration, each maintaining its own state and reacting to changes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Services emit events upon completion of local work.&lt;/li&gt;
&lt;li&gt;Other services listen and react to those events asynchronously.&lt;/li&gt;
&lt;li&gt;Each service is responsible for its own transaction scope and compensation.&lt;/li&gt;
&lt;li&gt;There is no orchestrator and there are no synchronous links. State emerges from the event flow.&lt;/li&gt;
&lt;li&gt;Maximizes scalability and autonomy at the cost of visibility and control.&lt;/li&gt;
&lt;/ul&gt;
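&lt;p&gt;The choreography described above can be sketched with a tiny in-memory event bus. The &lt;code&gt;PaymentCaptured&lt;/code&gt; and &lt;code&gt;PaymentFailed&lt;/code&gt; events and the &lt;code&gt;CANCELLED&lt;/code&gt; status come from the earlier example; the &lt;code&gt;EventBus&lt;/code&gt; class, the &lt;code&gt;OrderPlaced&lt;/code&gt; event, the &lt;code&gt;PAID&lt;/code&gt; status, and the handler names are assumptions made for illustration. A real system would use a message broker rather than an in-process queue.&lt;/p&gt;

```python
from collections import defaultdict, deque


class EventBus:
    """Minimal in-memory stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.history = []

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def drain(self):
        """Deliver queued events until no service has anything left to emit."""
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.history.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


bus = EventBus()
orders = {}  # state owned by the order service


def payment_on_order_placed(order):
    # The payment service reacts to the event and emits its own result.
    event = "PaymentCaptured" if order["card_ok"] else "PaymentFailed"
    bus.publish(event, order)


def order_on_payment_captured(order):
    orders[order["id"]] = "PAID"


def order_on_payment_failed(order):
    # Compensation is local: the order service cancels its own order.
    orders[order["id"]] = "CANCELLED"


bus.subscribe("OrderPlaced", payment_on_order_placed)
bus.subscribe("PaymentCaptured", order_on_payment_captured)
bus.subscribe("PaymentFailed", order_on_payment_failed)

bus.publish("OrderPlaced", {"id": "o-1", "card_ok": False})
bus.drain()
print(orders, bus.history)
```

&lt;p&gt;No component sees the whole workflow here. The overall state has to be reconstructed from the event history, which is why observability is harder with this pattern.&lt;/p&gt;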

&lt;p&gt;Choose it when scale and service independence are the priority. Ideal for data ingestion, analytics pipelines, or any process tolerant of loose consistency. Expect reduced observability, but maximum throughput and fault isolation. It's common in many microservices architectures.&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coupling&lt;/td&gt;
&lt;td&gt;Very Low&lt;/td&gt;
&lt;td&gt;No orchestrator, no global transaction, and fully decoupled via events.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Error handling and state reconstruction are tricky.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Services operate independently, queues absorb load spikes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;Very High&lt;/td&gt;
&lt;td&gt;No coupling factors. Ideal for massive scale.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;There’s no one-size-fits-all saga. Each pattern involves trade-offs across key characteristics like consistency, availability, scalability and performance. You can't maximize them all at once. Strong control often limits scalability, while loose coupling increases flexibility but demands stronger coordination and observability.&lt;/p&gt;

&lt;p&gt;In practice, many systems adopt multiple saga patterns. For example, you might use the Epic Saga for critical and atomic flows like payments, and the Parallel Saga for scalable tasks that don't require immediate consistency, like order fulfillment. The key is to choose the right trade-offs for each workflow, guided by the characteristics your business values most and can’t afford to sacrifice.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;p&gt;Most of the material here is drawn from the book &lt;em&gt;Software Architecture: The Hard Parts&lt;/em&gt; by Mark Richards and Neal Ford.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.oreilly.com/library/view/software-architecture-the/9781492086888/" rel="noopener noreferrer"&gt;Software Architecture: The Hard Parts Book - Mark Richards and Neal Ford&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.developertoarchitect.com/lessons" rel="noopener noreferrer"&gt;Distributed Transactions and Sagas Lessons - Mark Richards&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://microservices.io/patterns/data/saga.html" rel="noopener noreferrer"&gt;Microservices.io - Chris Richardson&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/architecture/patterns/saga" rel="noopener noreferrer"&gt;Saga Pattern - Microsoft Learn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/saga.html" rel="noopener noreferrer"&gt;Saga Choreography and Orchestration Patterns - AWS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>microservices</category>
      <category>programming</category>
      <category>architecture</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Microservices Caching: Strategies, Topologies, and Best Practices</title>
      <dc:creator>Randa</dc:creator>
      <pubDate>Wed, 26 Feb 2025 23:32:20 +0000</pubDate>
      <link>https://dev.to/randazraik/microservices-caching-demystified-strategies-topologies-and-best-practices-43ad</link>
      <guid>https://dev.to/randazraik/microservices-caching-demystified-strategies-topologies-and-best-practices-43ad</guid>
      <description>&lt;p&gt;This article offers a thorough look at caching in microservices from the fundamental to more advanced techniques and patterns. Along the way, we’ll see how caching can accelerate performance, keep services decoupled, and respect each microservice’s autonomy. We will go through the following topics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
Introduction: Core Concepts and Definitions

&lt;ul&gt;
&lt;li&gt;What Are Microservices?&lt;/li&gt;
&lt;li&gt;Bounded Context in Microservices&lt;/li&gt;
&lt;li&gt;What Is Caching?&lt;/li&gt;
&lt;li&gt;Consistency vs Eventual Consistency&lt;/li&gt;
&lt;li&gt;Why Caching Matters in Microservices?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Cache Implementation Approaches

&lt;ul&gt;
&lt;li&gt;IMDG (In-Memory Data Grid)&lt;/li&gt;
&lt;li&gt;IMDB (In-Memory Database)&lt;/li&gt;
&lt;li&gt;IMDG vs. IMDB&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Caching Strategies

&lt;ul&gt;
&lt;li&gt;Read-Through&lt;/li&gt;
&lt;li&gt;Write-Through&lt;/li&gt;
&lt;li&gt;Write-Behind&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Caching Topologies

&lt;ul&gt;
&lt;li&gt;Single In-Memory Caching&lt;/li&gt;
&lt;li&gt;Distributed Caching (Client-Server)&lt;/li&gt;
&lt;li&gt;Replicated Caching (In-Process)&lt;/li&gt;
&lt;li&gt;Near-Cache Hybrids&lt;/li&gt;
&lt;li&gt;Topologies Comparison&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Caching Patterns and Use Cases

&lt;ul&gt;
&lt;li&gt;Data Sharing&lt;/li&gt;
&lt;li&gt;Data Sidecars&lt;/li&gt;
&lt;li&gt;Multi-Instance Caching&lt;/li&gt;
&lt;li&gt;Tuple-Space Pattern&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Data Collisions

&lt;ul&gt;
&lt;li&gt;Understanding Data Collisions&lt;/li&gt;
&lt;li&gt;Avoiding Data Collisions&lt;/li&gt;
&lt;li&gt;Calculating Collision Probability&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Eviction Policies

&lt;ul&gt;
&lt;li&gt;Time-to-Live (TTL)&lt;/li&gt;
&lt;li&gt;Archive (ARC) Policy&lt;/li&gt;
&lt;li&gt;Least Frequently Used (LFU)&lt;/li&gt;
&lt;li&gt;Least Recently Used (LRU)&lt;/li&gt;
&lt;li&gt;Random Replacement (RR)&lt;/li&gt;
&lt;li&gt;Selecting the Right Eviction Policy&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Wrap-Up&lt;/li&gt;
&lt;li&gt;Further Reading&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Introduction: Core Concepts and Definitions
&lt;/h2&gt;

&lt;p&gt;Let's first clarify a few key concepts and definitions related to microservices and caching before we dive deep into the caching topologies and strategies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdabmx0ul3l7w1nf7k1kd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdabmx0ul3l7w1nf7k1kd.png" alt="Microservices and Bounded Contexts" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What Are Microservices?
&lt;/h3&gt;

&lt;p&gt;Microservices is an architectural style where software is composed of multiple independent services, each focused on a single purpose. These services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can be deployed, scaled, and updated independently.&lt;/li&gt;
&lt;li&gt;Communicate (often via HTTP or messaging) rather than relying on a single monolithic database.&lt;/li&gt;
&lt;li&gt;Avoid tightly coupled monolithic structures, enabling faster iteration and smaller failure cycles.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation helps teams iterate faster and isolate failures. However, data management across microservices can become more complex, especially when different services need overlapping sets of information.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bounded Context in Microservices
&lt;/h3&gt;

&lt;p&gt;A bounded context is a principle from domain-driven design, crucial for microservices. It means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each microservice owns its domain logic and data.&lt;/li&gt;
&lt;li&gt;Internally, the service can structure or store data however it wants (e.g., a relational database schema, NoSQL documents, or a simple file system).&lt;/li&gt;
&lt;li&gt;Other services cannot directly query or modify that data store.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is often called a share-nothing approach at the data level: each service controls its own resources. However, this does not necessarily require each service to have a completely separate physical database instance. A common setup is one database (e.g., PostgreSQL) where each microservice is assigned a dedicated schema or set of tables it alone manages. As long as the service is the only one reading/writing those specific tables (and no other service bypasses it), the bounded context principle holds.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is Caching?
&lt;/h3&gt;

&lt;p&gt;Caching means temporarily storing data in a faster medium (often memory) to make subsequent requests for the same data quicker. By avoiding repeated expensive queries or computations, caching can significantly boost performance and scalability. It’s a common technique everywhere from simple in-memory lookups to distributed systems that replicate large data sets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Consistency vs Eventual Consistency
&lt;/h3&gt;

&lt;p&gt;Consistency or strong consistency means that whenever you read data, you always get the most recent write (like in a traditional database with full ACID guarantees). This is great for correctness but can slow down distributed systems.&lt;/p&gt;

&lt;p&gt;Eventual Consistency means data might be out of date for a short while, but eventually, all replicas or caches catch up. In microservices, we often accept a brief window of staleness in exchange for better speed and uptime. For example, if you update user preferences, a remote cache might still have the old version for a few seconds until it’s invalidated or refreshed. That’s “eventual consistency”.&lt;/p&gt;

&lt;p&gt;If you want absolute consistency, you might do synchronous writes, which can slow the system or cause partial unavailability. If you accept occasional staleness, you get better performance and resilience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Caching Matters in Microservices
&lt;/h3&gt;

&lt;p&gt;In microservices, caching can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Improve Performance: Serve data from memory instead of re-fetching from databases or external APIs. This is crucial when a microservice must repeatedly call another microservice or run expensive queries.&lt;/li&gt;
&lt;li&gt;Enhance Scalability: Offloading repeated reads to a cache lightens the load on the original data store or service, allowing the overall system to handle more traffic.&lt;/li&gt;
&lt;li&gt;Reduce Inter-Service Chatter: Some services might rely heavily on data “owned” by another service. Instead of making many network calls, a local or shared cache can speed things up.&lt;/li&gt;
&lt;li&gt;Partially Decouple Services: If the owner goes offline temporarily, other services can still serve cached data (for read-only cases).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yet, caching in microservices introduces additional complexity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consistency: Cached data can become stale or out-of-sync.&lt;/li&gt;
&lt;li&gt;Collision Handling: Multiple services or instances writing the same cached data can overwrite each other.&lt;/li&gt;
&lt;li&gt;Bounded Context: We must ensure that caching external data doesn’t break the share-nothing principle by bypassing the owning service’s authority over updates.&lt;/li&gt;
&lt;li&gt;Eviction Policies: Which data gets removed when the cache is full or out-of-date?&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Cache Implementation Approaches
&lt;/h2&gt;

&lt;p&gt;In many caching products, you’ll find two broad ways to store and query data: IMDG (In-Memory Data Grid) and IMDB (In-Memory Database).&lt;/p&gt;

&lt;h3&gt;
  
  
  IMDG (In-Memory Data Grid)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Definition: A distributed key-value store kept entirely in RAM.&lt;/li&gt;
&lt;li&gt;Data Model: Typically a map or dictionary of name-value pairs, plus some metadata.&lt;/li&gt;
&lt;li&gt;Use Case: Fast get/put caching with minimal overhead, primarily for simple data access.&lt;/li&gt;
&lt;li&gt;Examples: Hazelcast, Apache Ignite, Infinispan, Coherence, GemFire.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your caching usage centers on straightforward queries, i.e., fetching or updating objects by key, an IMDG is ideal for its simplicity and speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  IMDB (In-Memory Database)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Definition: An in-memory system that can behave more like a database, often supporting SQL-like queries, indexing, or advanced data operations.&lt;/li&gt;
&lt;li&gt;Data Model: Potentially relational or table-like, capable of handling more complex queries (joins, aggregates).&lt;/li&gt;
&lt;li&gt;Use Case: You need robust query capabilities or analytics on cached data, not just key-based lookups.&lt;/li&gt;
&lt;li&gt;Trade-Off: Usually higher memory/CPU usage than an IMDG due to indexing and query engines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An IMDB is valuable if your cache must support complex queries, like filtering or joining multiple data sets in-memory. This can be a big performance gain for analytics or specialized read patterns but requires more resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  IMDG vs. IMDB
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simplicity: If your data is basically a series of name-value pairs, an IMDG suffices.&lt;/li&gt;
&lt;li&gt;Complex Queries: If you want advanced querying (e.g., partial scans, joins, SQL), an IMDB is a better fit.&lt;/li&gt;
&lt;li&gt;Performance Overhead: IMDB’s query engines can be slower and more memory-intensive compared to IMDG.&lt;/li&gt;
&lt;li&gt;Purpose: Evaluate whether the cache is just a performance booster for repeated gets or a mini-database in memory for more elaborate data logic.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Caching Strategies
&lt;/h2&gt;

&lt;p&gt;These strategies describe how reads and writes flow between your service, the cache, and the underlying data store. You can apply them to almost any caching topology (single in-memory or distributed), though they’re commonly used with local caches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Read-Through
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbkdk5re8rlzab84eiyvd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbkdk5re8rlzab84eiyvd.png" alt="Read-Through" width="638" height="173"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The microservice always reads from the cache.&lt;/li&gt;
&lt;li&gt;If the data is missing (cache miss), the cache itself fetches from the database, updates the cache, and returns the result.&lt;/li&gt;
&lt;li&gt;From the microservice’s perspective, it’s only talking to the cache.&lt;/li&gt;
&lt;li&gt;Simplifies reading, but if the database belongs to another microservice domain, you bypass the actual owner’s logic.&lt;/li&gt;
&lt;li&gt;For read-only usage within your own domain, this can be straightforward.&lt;/li&gt;
&lt;/ol&gt;
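&lt;p&gt;The flow above can be sketched in C# roughly as follows. This is a minimal illustration rather than a production read-through provider: it assumes the Microsoft.Extensions.Caching.Memory package, and LoadFromDatabase, Get, and the key format are hypothetical names standing in for a query against the service’s own data store.&lt;/p&gt;

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions());
var loads = 0;

// Hypothetical stand-in for a query against this service's own database.
string LoadFromDatabase(string key)
{
    loads++;
    return $"row-for-{key}";
}

// Read-through: callers only ask the cache; on a miss the cache layer
// invokes the loader, stores the result, and returns it.
string Get(string key) =>
    cache.GetOrCreate(key, entry =>
    {
        entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
        return LoadFromDatabase(key);
    })!;

var first = Get("product:1");   // cache miss: the loader runs
var second = Get("product:1");  // cache hit: served from memory
Console.WriteLine($"{first} / {second} / loads={loads}");
```

&lt;p&gt;Because the loader is wired into the cache layer, calling code never touches the database directly. That is convenient inside your own domain, and it is also why pointing the loader at another service’s database would bypass that service.&lt;/p&gt;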

&lt;h3&gt;
  
  
  Write-Through
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xbkeff01mlhpeq3potv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xbkeff01mlhpeq3potv.png" alt="Write-Through" width="646" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The microservice writes directly to the cache.&lt;/li&gt;
&lt;li&gt;The cache synchronously writes the change to the underlying database.&lt;/li&gt;
&lt;li&gt;From the microservice’s perspective, it’s only talking to the cache.&lt;/li&gt;
&lt;li&gt;Keeps the cache and the database consistent, but every write must wait for the database call to complete, so a slow database slows down the write path.&lt;/li&gt;
&lt;li&gt;As with read-through, this can break domain boundaries if you are writing to another microservice’s database.&lt;/li&gt;
&lt;/ol&gt;
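&lt;p&gt;A minimal write-through sketch, under the same assumptions as before (Microsoft.Extensions.Caching.Memory as the cache, and hypothetical SaveToDatabase and Put helpers). One common ordering is shown here: the store is written first and the cache is updated only once the store has accepted the write, so a failed write never leaves a value in the cache that the database does not have.&lt;/p&gt;

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions());
var databaseUp = true;

// Hypothetical stand-in for a synchronous write to the service's own database.
void SaveToDatabase(string key, string value)
{
    if (!databaseUp) throw new InvalidOperationException("database unavailable");
    Console.WriteLine($"persisted {key}={value}");
}

// Write-through: the caller waits until both the store and the cache are updated.
void Put(string key, string value)
{
    SaveToDatabase(key, value); // 1. synchronous write; an exception aborts the Put
    cache.Set(key, value);      // 2. cache is updated only after the store accepted it
}

Put("cart:7", "3 items");

databaseUp = false;
try
{
    Put("cart:7", "4 items");
}
catch (InvalidOperationException)
{
    Console.WriteLine("write rejected, cache left untouched");
}

cache.TryGetValue("cart:7", out string? current);
Console.WriteLine(current);
```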

&lt;h3&gt;
  
  
  Write-Behind (Write-Back)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72rm4f0ggz8xp6ztcziv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72rm4f0ggz8xp6ztcziv.png" alt="Write-Behind" width="636" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The microservice writes to the cache and returns quickly.&lt;/li&gt;
&lt;li&gt;The cache asynchronously updates the database afterward.&lt;/li&gt;
&lt;li&gt;Reduces write latency since it does not wait for the database write, but risks data loss if the cache node fails before persisting to the database, and can cause timing issues if other processes expect immediate writes.&lt;/li&gt;
&lt;li&gt;Similar boundary issues if updating another microservice’s database.&lt;/li&gt;
&lt;/ol&gt;
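&lt;p&gt;Write-behind can be sketched by queuing the store write instead of waiting for it. The version below is a deliberately small illustration: it chains each write onto a background task so that writes are applied in order, and it leaves out retries and error handling. SaveToDatabase and Put are hypothetical helpers, and the 50 ms sleep only simulates a slow database.&lt;/p&gt;

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions());
var gate = new object();
Task pending = Task.CompletedTask;
var persisted = 0;

// Hypothetical stand-in for a slow write to the service's own database.
void SaveToDatabase(string key, string value)
{
    Thread.Sleep(50);
    persisted++;
    Console.WriteLine($"persisted {key}={value}");
}

// Write-behind: update the cache and return; the store write happens later.
void Put(string key, string value)
{
    cache.Set(key, value);
    lock (gate)
    {
        // Chain the write behind earlier ones so the original order is kept.
        pending = pending.ContinueWith(_ => SaveToDatabase(key, value), TaskScheduler.Default);
    }
}

Put("cart:7", "3 items");
Put("cart:7", "4 items");

cache.TryGetValue("cart:7", out string? current);
Console.WriteLine($"cache already sees: {current}");

// Drain the queue, for example during a graceful shutdown.
await pending;
Console.WriteLine($"writes persisted: {persisted}");
```

&lt;p&gt;If the process dies before the pending task completes, the queued writes are gone. That is the data-loss risk described in the list above.&lt;/p&gt;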

&lt;p&gt;In strict microservices, letting a cache talk directly to another service's database can undermine the bounded context principle unless carefully encapsulated. Often, you'd prefer to apply these strategies only to your own domain data and rely on read-only caching for external data. For the latter, consider a data sidecar or a data sharing approach (both discussed later), which avoid direct database calls that bypass the rightful domain owner.&lt;/p&gt;




&lt;h2&gt;
  
  
  Caching Topologies
&lt;/h2&gt;

&lt;p&gt;In microservices, caching can take several architectural forms, each physically arranged in distinct ways. Each topology has strengths and limitations, particularly regarding fault tolerance, data consistency, scalability, and complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single In-Memory Caching
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug5qmunh21ze9t59dn6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug5qmunh21ze9t59dn6b.png" alt="Single In-Memory Caching" width="636" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, you simply load data (e.g., user preferences, some small reference set) into local RAM within a microservice instance. Each instance keeps its own cache.&lt;/p&gt;

&lt;h4&gt;
  
  
  Suitable for:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Small or mostly static data sets.&lt;/li&gt;
&lt;li&gt;Services that run as a single instance, or that can tolerate infrequent updates and some data skew between instances.&lt;/li&gt;
&lt;li&gt;Data that belongs to your own domain (bounded context), so you’re not breaking ownership rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Pros:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Performance: Extremely fast, as data is stored in local memory with no network latency.&lt;/li&gt;
&lt;li&gt;Complexity: Simple to implement. Requires no extra infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Consistency and Multiple Instances: If your microservice is scaled across containers, each instance has its own local cache. Updates in one instance aren’t automatically propagated to others, leading to data skew or stale data if the data changes often.&lt;/li&gt;
&lt;li&gt;Scalability: A single instance’s memory might not handle large data sets.&lt;/li&gt;
&lt;li&gt;Write-Heavy Scenarios: Single in-memory caching suits read-heavy loads. For writes, multiple instances might each update local data, leading to divergent caches or stale state if no synchronization is in place.&lt;/li&gt;
&lt;li&gt;Bounded Context: If you rely on read-through/write-through for data that belongs to another domain, you skip that domain’s service logic unless you encapsulate calls through their API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still, single in-memory caching is simple and great for static or rarely updated data, or small reference sets that every request needs. For bigger or more complex systems, you’ll often turn to more advanced topologies.&lt;/p&gt;

&lt;h4&gt;
  
  
  Code Snippet:
&lt;/h4&gt;

&lt;p&gt;This example demonstrates how to use an in-memory cache in .NET with IMemoryCache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Register IMemoryCache in Program.cs&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddMemoryCache&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Get or create a cached value&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="k"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_memoryCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetOrCreateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"value 1"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
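&lt;p&gt;Because each instance keeps its own copy, it usually helps to bound how long an entry may live. The sketch below shows one way to do that with MemoryCacheEntryOptions. The key name and the durations are illustrative assumptions, and a standalone MemoryCache is created only so the example runs outside ASP.NET Core.&lt;/p&gt;

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

// Standalone cache instance; in ASP.NET Core you would inject IMemoryCache instead.
var cache = new MemoryCache(new MemoryCacheOptions());

var entryOptions = new MemoryCacheEntryOptions
{
    // Hard upper bound on an entry's lifetime, which also bounds staleness.
    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5),
    // Drop the entry earlier if nobody reads it for a minute.
    SlidingExpiration = TimeSpan.FromMinutes(1)
};

cache.Set("user:42:theme", "dark", entryOptions);

if (cache.TryGetValue("user:42:theme", out string? theme))
{
    Console.WriteLine($"cache hit: {theme}");
}
else
{
    Console.WriteLine("cache miss: reload from the owning store");
}
```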



&lt;h3&gt;
  
  
  Distributed Caching (Client-Server)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpwbse5ecg6xo9p87yh2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpwbse5ecg6xo9p87yh2t.png" alt="Distributed Caching" width="636" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A distributed cache keeps data in an external caching cluster, often a separate server or group of servers, while microservices connect to it through a client library over the network. Examples include Redis, Memcached, or Apache Ignite/Hazelcast in client-server mode.&lt;/p&gt;

&lt;h4&gt;
  
  
  How It Works:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;You have one unified external caching cluster (e.g., Redis).&lt;/li&gt;
&lt;li&gt;You have a cache library in each microservice instance.&lt;/li&gt;
&lt;li&gt;Your code calls this library’s API.&lt;/li&gt;
&lt;li&gt;The library uses the cache product’s own wire protocol to talk to the external cluster.&lt;/li&gt;
&lt;li&gt;The cluster stores and replicates data as configured.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The bounded context is not violated, because no service is hitting another service's database: each service keeps its own cached data on the caching server. Either an IMDG or an IMDB can be used here. If you only need key-value access, you would likely configure an IMDG mode; if you want to run queries, you might pick an IMDB mode (though that’s less common for a simple caching scenario).&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Consistency: All instances share one cache to read and update. Consistency is simpler to manage.&lt;/li&gt;
&lt;li&gt;Scalability: If the distributed cache cluster is robust (e.g., horizontally sharded or replicated), it can handle large data volumes and concurrency.&lt;/li&gt;
&lt;li&gt;Many real-world microservices rely on distributed caching (e.g., Redis) because it’s straightforward to manage and widely supported.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Performance: Slower reads/writes compared to local memory (due to network latency).&lt;/li&gt;
&lt;li&gt;Complexity: Must manage an external caching layer (e.g., multiple Redis nodes, replication, or clustering).&lt;/li&gt;
&lt;li&gt;Availability: If the external cluster is unreachable, caching fails for all microservice instances.&lt;/li&gt;
&lt;li&gt;Fault Tolerance: Potential single point of failure unless replicated or clustered properly. Losing the cache node can disrupt everything.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Code Snippet:
&lt;/h4&gt;

&lt;p&gt;This example demonstrates how to use a distributed cache in .NET with Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Wire Redis in Program.cs and use IDistributedCache to get/set data&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddStackExchangeRedisCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Configuration&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Configuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetConnectionString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Redis"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Store a value in the cache&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_distributedCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SetStringAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"value 1"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Get the value from the cache&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="k"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_distributedCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetStringAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key 1"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
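&lt;p&gt;In practice, the get and set calls above are usually combined into a “check the cache, then fall back to the source” flow, with a time-to-live and a fallback for when the cache cluster is unreachable. The sketch below is one possible shape of that flow, not a prescribed implementation. To stay runnable without a Redis server it uses MemoryDistributedCache, the in-memory implementation of IDistributedCache; LoadFromDatabase and the key name are hypothetical.&lt;/p&gt;

```csharp
using System;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;

// In-memory IDistributedCache so the sketch runs without a Redis server.
// With AddStackExchangeRedisCache, the injected IDistributedCache is used the same way.
IDistributedCache cache = new MemoryDistributedCache(
    Options.Create(new MemoryDistributedCacheOptions()));

var entryOptions = new DistributedCacheEntryOptions
{
    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
};

// Hypothetical stand-in for a query against the service's own database.
string LoadFromDatabase(string id) => $"row-for-{id}";

const string productKey = "product:1";
string? value = null;

try
{
    value = await cache.GetStringAsync(productKey);
}
catch (Exception ex)
{
    // Cache cluster unreachable: fall back to the source of truth.
    Console.WriteLine($"cache read failed: {ex.Message}");
}

if (value is null)
{
    value = LoadFromDatabase(productKey);
    try
    {
        await cache.SetStringAsync(productKey, value, entryOptions);
    }
    catch (Exception ex)
    {
        Console.WriteLine($"cache write failed: {ex.Message}");
    }
}

Console.WriteLine(value);
```

&lt;p&gt;Treating cache failures as misses keeps the service available when the cluster is down, at the cost of extra load on the database during the outage.&lt;/p&gt;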



&lt;h3&gt;
  
  
  Replicated Caching (In-Process)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fapg36hu894c02fcren8m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fapg36hu894c02fcren8m.png" alt="Replicated Caching" width="634" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This type doesn't require an external server. Each microservice instance has an in-process cache, but updates are replicated to all other nodes, and this is handled by the cache engine. Products like Hazelcast, Apache Ignite, GemFire, Coherence, and Infinispan support this mode.&lt;/p&gt;

&lt;h4&gt;
  
  
  How It Works:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;You still use a library (e.g., Hazelcast, Ignite) in each microservice instance.&lt;/li&gt;
&lt;li&gt;Each instance has its own in-process memory cache.&lt;/li&gt;
&lt;li&gt;When your app writes to the local cache, updates are automatically replicated to other instances via a proprietary protocol.&lt;/li&gt;
&lt;li&gt;So every node eventually has the same data in memory.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Pros:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Performance: Extremely fast local reads (nanosecond-level) because data is in the same process memory.&lt;/li&gt;
&lt;li&gt;Fault tolerance: If one instance fails, others still hold a full copy of the data in memory (assuming no partition issues).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Scalability: Large data sets can cause scaling issues as every instance must store it.&lt;/li&gt;
&lt;li&gt;Collisions: High update rates risk collisions or “split-brain” scenarios if replication lags. This is discussed later in the Data Collisions section.&lt;/li&gt;
&lt;li&gt;Complexity: More complex coordination among large numbers of instances.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Code Snippet:
&lt;/h4&gt;

&lt;p&gt;This example demonstrates how to use a replicated cache in .NET with Hazelcast:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;HazelcastOptionsBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;With&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Create a Hazelcast client and connect to a server running on localhost&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;HazelcastClientFactory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;StartNewClientAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Get the replicated map from the cluster&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;replicatedMap&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetReplicatedMapAsync&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"replicated-map-1"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Store a value in the replicated map&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;replicatedMap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;PutAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"value 1"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Get the value from the replicated map&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="k"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;replicatedMap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key 1"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Near-Cache Hybrids
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f13eqsl9y73c7msnncm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f13eqsl9y73c7msnncm.png" alt="Near-Cache Hybrids" width="641" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A near-cache approach combines distributed caching with in-process caching: each instance keeps a small local cache in front of a shared distributed cache.&lt;/p&gt;

&lt;h4&gt;
  
  
  How It Works:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Each microservice instance has a local “front” cache for “hot” items, configured with a capacity limit and an eviction policy (eviction policies are covered later).&lt;/li&gt;
&lt;li&gt;There's also a distributed “backing” cache (such as a Hazelcast or Ignite cluster) that holds the full data set.&lt;/li&gt;
&lt;li&gt;Reads go to the local near/front cache first. On a miss, the value is retrieved from the backing cache.&lt;/li&gt;
&lt;li&gt;Writes usually go to the backing cache, which then sends invalidations or updates to the near caches of other instances via a proprietary protocol so that they stay in sync.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Pros:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Blends scalability of a distributed store with fast local reads for frequently accessed keys.&lt;/li&gt;
&lt;li&gt;Reduces repeated remote calls if the item is “hot”.&lt;/li&gt;
&lt;li&gt;Limits local memory usage (only “most recently/frequently used” items).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Additional complexity in configuring two-tier caching.&lt;/li&gt;
&lt;li&gt;Brief staleness possible unless invalidation updates propagate instantaneously.&lt;/li&gt;
&lt;li&gt;Doesn’t store the entire data set locally, so cache misses still require network access to the backing store.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Code Snippet:
&lt;/h4&gt;

&lt;p&gt;This example demonstrates how to use a near cache in .NET with Hazelcast:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;HazelcastOptionsBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;With&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Configure NearCache&lt;/span&gt;
&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NearCaches&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"near-cache-map-1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;NearCacheOptions&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Eviction&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;Hazelcast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EvictionOptions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Evicts least recently used entries&lt;/span&gt;
        &lt;span class="n"&gt;EvictionPolicy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EvictionPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lru&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;// Max number of entries in the Near Cache&lt;/span&gt;
        &lt;span class="n"&gt;Size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;// Max number of seconds for each entry to stay in the Near Cache&lt;/span&gt;
    &lt;span class="n"&gt;TimeToLiveSeconds&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// Max number of seconds an entry can stay in the Near Cache untouched&lt;/span&gt;
    &lt;span class="n"&gt;MaxIdleSeconds&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;InvalidateOnChange&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Create a Hazelcast client and connect to a server running on localhost&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;HazelcastClientFactory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;StartNewClientAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Get the distributed map from the cluster&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;map&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetMapAsync&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"near-cache-map-1"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Store a value in the cache&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;SetAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"value 1"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Get the value from the cache by key&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="k"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"key 1"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
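&lt;p&gt;The Hazelcast configuration above hides the two-tier read path inside the client. To make the mechanics visible, here is a hand-rolled sketch of the same idea. It is an illustration under simplifying assumptions, not how Hazelcast implements it: the front cache is a size-limited MemoryCache, the backing cache is an in-memory IDistributedCache standing in for the shared cluster, and Get and Put are hypothetical helpers. Invalidation is local only, whereas a real product also notifies the near caches of other instances.&lt;/p&gt;

```csharp
using System;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;

// Front (near) cache: small, size-limited, lives in this process.
var front = new MemoryCache(new MemoryCacheOptions { SizeLimit = 10_000 });

// Backing cache: an in-memory stand-in for the shared cluster.
IDistributedCache backing = new MemoryDistributedCache(
    Options.Create(new MemoryDistributedCacheOptions()));

string? Get(string key)
{
    // 1. Try the near cache first.
    if (front.TryGetValue(key, out string? hot)) return hot;

    // 2. Fall back to the backing cache and remember the value locally.
    var value = backing.GetString(key);
    if (value is not null)
    {
        front.Set(key, value, new MemoryCacheEntryOptions
        {
            Size = 1, // each entry counts as one unit against SizeLimit
            AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(60)
        });
    }
    return value;
}

void Put(string key, string value)
{
    // Writes go to the backing cache; the local copy is invalidated.
    backing.SetString(key, value);
    front.Remove(key);
}

Put("product:1", "v1");
Console.WriteLine(Get("product:1")); // read from the backing cache, then kept locally
Put("product:1", "v2");
Console.WriteLine(Get("product:1")); // local copy was invalidated, so the new value is read
```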



&lt;h3&gt;
  
  
  Topologies Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Single In-Memory&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Distributed (Client-Server)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Replicated (In-Process)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Near-Cache (Hybrid)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extremely fast local&lt;/td&gt;
&lt;td&gt;Network-based reads&lt;/td&gt;
&lt;td&gt;Nanosecond local reads&lt;/td&gt;
&lt;td&gt;Local + distributed store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Volume&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Small, mostly static&lt;/td&gt;
&lt;td&gt;Potentially large&lt;/td&gt;
&lt;td&gt;Usually smaller sets&lt;/td&gt;
&lt;td&gt;Large in backing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Update Rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very low changes&lt;/td&gt;
&lt;td&gt;Handles high writes&lt;/td&gt;
&lt;td&gt;Moderate updates&lt;/td&gt;
&lt;td&gt;Moderate / High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fault Tolerance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None if multi-instance&lt;/td&gt;
&lt;td&gt;Cluster config dependent&lt;/td&gt;
&lt;td&gt;Node-level replication&lt;/td&gt;
&lt;td&gt;Partial replication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cache is per instance, no unification&lt;/td&gt;
&lt;td&gt;Central store&lt;/td&gt;
&lt;td&gt;Collision risk under concurrency&lt;/td&gt;
&lt;td&gt;Local front can be stale briefly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Caching Patterns and Use Cases
&lt;/h2&gt;

&lt;p&gt;We will discuss some higher-level, application-focused solutions for typical microservice challenges. These patterns can be built on top of different topologies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Sharing
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7w6gagaf34mqm81f9ppq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7w6gagaf34mqm81f9ppq.png" alt="Data Sharing" width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scenario: The Product microservice owns product information, while the Order microservice needs to read that data regularly. If the Order microservice calls the Product microservice’s API on every request, that API can become a bottleneck and adds unnecessary network overhead.&lt;/p&gt;

&lt;p&gt;How It Works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Product microservice remains the sole owner of the data (bounded context).&lt;/li&gt;
&lt;li&gt;Order microservice, which needs that data, sets up a local cache to store read-only copies.&lt;/li&gt;
&lt;li&gt;When Order microservice needs the data, it checks its cache first. If the entry is stale or missing, it calls Product microservice’s API.&lt;/li&gt;
&lt;li&gt;Order microservice never writes directly to Product microservice’s data store. Product microservice remains the only service responsible for modifying its own data.&lt;/li&gt;
&lt;/ol&gt;
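
&lt;p&gt;The steps above can be sketched roughly as follows. This is a minimal illustration, not a production cache: &lt;code&gt;ReadThroughCache&lt;/code&gt;, &lt;code&gt;fetch_product&lt;/code&gt;, and the 30-second TTL are hypothetical names and values chosen for the example, and the fake clock exists only so the expiry can be shown without waiting.&lt;/p&gt;

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Read-only local copy of data owned by another service (illustrative)."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # hypothetical call to the owning service's API
        self._ttl = ttl_seconds    # how long a cached copy is trusted
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}   # key -> (stored_at, value)

    def get(self, key: str) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and self._ttl >= now - entry[0]:
            return entry[1]        # fresh hit: no network call
        value = self._fetch(key)   # missing or stale: ask the owner
        self._entries[key] = (now, value)
        return value


# Usage: the Order service wraps its (hypothetical) Product API client.
calls = []


def fetch_product(product_id: str) -> dict:
    calls.append(product_id)       # record each simulated API call
    return {"id": product_id, "price": 10}


fake_now = [0.0]
cache = ReadThroughCache(fetch_product, ttl_seconds=30, clock=lambda: fake_now[0])
cache.get("p1")      # miss: calls the Product service
cache.get("p1")      # hit: served from the local cache
fake_now[0] = 31.0
cache.get("p1")      # stale: refreshed from the Product service
print(len(calls))    # prints 2
```

&lt;p&gt;Note that the cache never writes back to the owner; it only stores what the fetch function returns, which keeps Product microservice as the single source of truth.&lt;/p&gt;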

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Respects boundaries and achieves strong decoupling.&lt;/li&gt;
&lt;li&gt;Performance: Faster reads due to the local cache for the other services that need the data.&lt;/li&gt;
&lt;li&gt;Fault Tolerance: The other services can continue to operate even if the original service is unavailable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consistency: The other services might not immediately see changes made by the original service.&lt;/li&gt;
&lt;li&gt;Cache Invalidation: Each consuming service must decide how long it trusts the cached data before refreshing from the original service. Avoid this pattern if the owned data is write-heavy.&lt;/li&gt;
&lt;li&gt;Memory Overhead: If the dataset is large, the cache can consume significant memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Sidecars
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Forke759o1lkbt5vsvadm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Forke759o1lkbt5vsvadm.png" alt="Data Sidecars" width="742" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scenario: Profile microservice owns detailed user profile data. Several other microservices need to read it heavily. They shouldn’t directly connect to Profile microservice’s database, nor spam the Profile microservice API every time.&lt;/p&gt;

&lt;p&gt;How It Works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Profile microservice writes changes to its domain data as usual.&lt;/li&gt;
&lt;li&gt;Whenever data changes, Profile microservice also updates a distributed cache (the “sidecar”).&lt;/li&gt;
&lt;li&gt;Other microservices read from the sidecar, which is effectively read-only for them. The domain logic for writes remains in Profile microservice.&lt;/li&gt;
&lt;/ol&gt;
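
&lt;p&gt;A rough sketch of that flow, under simplifying assumptions: a plain dictionary stands in for the distributed cache, and &lt;code&gt;ProfileService&lt;/code&gt;, &lt;code&gt;sidecar_store&lt;/code&gt;, and &lt;code&gt;sidecar_view&lt;/code&gt; are hypothetical names. A real sidecar would be a networked cache with its own access controls.&lt;/p&gt;

```python
from types import MappingProxyType
from typing import Dict, Mapping


class ProfileService:
    """Owner of profile data; the only writer to the sidecar cache (illustrative)."""

    def __init__(self, sidecar: Dict[str, dict]) -> None:
        self._database: Dict[str, dict] = {}   # stand-in for the service's own store
        self._sidecar = sidecar                # stand-in for a distributed cache

    def update_profile(self, user_id: str, profile: dict) -> None:
        self._database[user_id] = dict(profile)   # 1. write domain data as usual
        self._sidecar[user_id] = dict(profile)    # 2. push the change to the sidecar


sidecar_store: Dict[str, dict] = {}
profiles = ProfileService(sidecar_store)

# Other services only receive a view that rejects key assignment.
sidecar_view: Mapping[str, dict] = MappingProxyType(sidecar_store)

profiles.update_profile("u1", {"name": "Ada"})
print(sidecar_view["u1"]["name"])   # prints Ada
```

&lt;p&gt;The read-only view models the rule that consumers never write to the sidecar. In a real deployment that rule would typically be enforced by cache permissions or by convention rather than by a language feature.&lt;/p&gt;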

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Respects boundaries and achieves strong decoupling.&lt;/li&gt;
&lt;li&gt;Performance: Less load on the owning microservice. Others read from the sidecar cache instead of calling the service or querying its database directly.&lt;/li&gt;
&lt;li&gt;Consistency: Everyone sees a consistent (or eventually consistent) picture from the sidecar.&lt;/li&gt;
&lt;li&gt;Scalability: Sidecar is scalable and can handle large volumes of data efficiently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fault Tolerance: If the cache node goes down, reading services lose access to the data unless there’s replication or a fallback path.&lt;/li&gt;
&lt;li&gt;Extra Complexity: Setting up the push/refresh logic or using events to keep the sidecar in sync.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Multi-Instance Caching
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m5e707lmkpy7g7wzjza.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m5e707lmkpy7g7wzjza.png" alt="Multi-Instance Caching" width="660" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scenario: One microservice, say Order microservice, needs to be scaled to 10 containers to handle high traffic. Each container needs the same reference data, or reads and writes shared domain data. You want local caching, but the copies must stay consistent enough.&lt;/p&gt;

&lt;p&gt;How It Works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If each container does single in-memory caching independently, you get data skew.&lt;/li&gt;
&lt;li&gt;Instead, you pick a replicated or near-cache approach so that changes can propagate among instances.

&lt;ul&gt;
&lt;li&gt;Replicated: All instances store the full data set in memory. When one node updates a key, it’s broadcast to others via a proprietary protocol.&lt;/li&gt;
&lt;li&gt;Near-Cache: Each node has a partial local cache and fetches from a backing store if missing or stale.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
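
&lt;p&gt;To make the replicated option concrete, here is a toy sketch in which every instance holds the full data set and pushes each write to its peers. &lt;code&gt;ReplicatedCache&lt;/code&gt; and &lt;code&gt;join&lt;/code&gt; are hypothetical names, and the broadcast is a synchronous in-process loop; real products replicate over the network, asynchronously, which is where replication lag comes from.&lt;/p&gt;

```python
from typing import Dict, List


class ReplicatedCache:
    """Each instance keeps a full copy and forwards writes to its peers."""

    def __init__(self) -> None:
        self.data: Dict[str, object] = {}
        self.peers: List["ReplicatedCache"] = []

    def put(self, key: str, value: object) -> None:
        self.data[key] = value
        for peer in self.peers:
            peer.data[key] = value   # simplified: instant, in-process broadcast

    def get(self, key: str):
        return self.data.get(key)    # always a local, in-memory read


def join(caches: List[ReplicatedCache]) -> None:
    """Wire every cache to every other cache in the group."""
    for cache in caches:
        cache.peers = [other for other in caches if other is not cache]


instances = [ReplicatedCache() for _ in range(3)]
join(instances)
instances[0].put("country:DE", "Germany")
print([c.get("country:DE") for c in instances])
```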

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Performance: Each instance can quickly respond to read requests from memory.&lt;/li&gt;
&lt;li&gt;Scalability: You can add more containers without manually syncing caches.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collisions: If multiple nodes write the same key concurrently, overwrites can happen.&lt;/li&gt;
&lt;li&gt;Memory Usage (replicated) or Complex Invalidation (near-cache).&lt;/li&gt;
&lt;li&gt;Consistency: Some nodes might see outdated data briefly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tuple-Space Pattern
&lt;/h3&gt;

&lt;p&gt;Scenario: You have a system that does high-speed processing (e.g., a stock trading platform), relies on all data being in memory for lightning-fast reads, and can accept the memory overhead.&lt;/p&gt;

&lt;p&gt;How It Works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You load all relevant data into an IMDG or IMDB (like a huge in-memory store).&lt;/li&gt;
&lt;li&gt;Reads are basically memory-speed lookups, no disk or external service.&lt;/li&gt;
&lt;li&gt;Writes must also sync with the store or an underlying database eventually.&lt;/li&gt;
&lt;li&gt;The entire microservice logic might revolve around the in-memory “space” (hence the name “tuple-space pattern”, and also the “space-based” architecture style).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Performance: Ultra-Fast Reads. Everything is in memory.&lt;/li&gt;
&lt;li&gt;Ideal for: Very high read or compute-intensive tasks (e.g., real-time analytics, stock trading, or matching engines).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Huge Memory usage: Storing all data in RAM can be expensive.&lt;/li&gt;
&lt;li&gt;Complex Writes: If multiple services or instances attempt to update data, concurrency and collisions can be tough to handle.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Wrap-Up on Patterns
&lt;/h3&gt;

&lt;p&gt;These patterns often overlap, for example, a sidecar approach might also leverage multi-instance caching or near-cache logic. The key is to keep the domain lines clear so you never override someone else’s data domain rules and choose a pattern that balances performance with the reality of concurrency, staleness, and memory cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Collisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Understanding Data Collisions
&lt;/h3&gt;

&lt;p&gt;When using replicated caching (or multi-master distributed caching), two instances can update the same record at nearly the same time, before replication has propagated either change. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instance A decrements an inventory count from 700 to 690.&lt;/li&gt;
&lt;li&gt;Instance B decrements from 700 to 695.&lt;/li&gt;
&lt;li&gt;The two replication messages cross in flight and overwrite each other. The end result incorrectly shows 690 or 695 instead of the correct total of 685.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These inconsistencies are typically called split-brain or data collisions.&lt;/p&gt;
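
&lt;p&gt;The inventory example can be replayed in a few lines. This is only a simulation with two dictionaries standing in for the replicas, and it assumes replication messages carry the new absolute value rather than the delta.&lt;/p&gt;

```python
replica_a = {"inventory": 700}
replica_b = {"inventory": 700}

# Both instances start from 700 and apply their own decrement locally.
replica_a["inventory"] -= 10     # A now holds 690
replica_b["inventory"] -= 5      # B now holds 695

# Assumption: each replication message carries the new absolute value.
message_from_a = replica_a["inventory"]
message_from_b = replica_b["inventory"]

# The messages cross in flight and each replica applies the other's value.
replica_a["inventory"] = message_from_b
replica_b["inventory"] = message_from_a

print(replica_a["inventory"], replica_b["inventory"])   # prints "695 690"
```

&lt;p&gt;Neither replica ends at 685, and the two replicas no longer agree with each other.&lt;/p&gt;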

&lt;h3&gt;
  
  
  Avoiding Data Collisions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Queueing: Instead of writing to the cache directly, each instance sends a message to a queue. A separate service processes these messages sequentially, which prevents collisions. The trade-off is eventual consistency.&lt;/li&gt;
&lt;li&gt;Compare-and-Set (Version or Timestamp Checks): The microservice checks a version (timestamp or sequence) before updating. If the version changed, it means someone else updated the data and the operation should be retried.&lt;/li&gt;
&lt;/ul&gt;
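
&lt;p&gt;A compare-and-set loop might look like the sketch below. &lt;code&gt;VersionedStore&lt;/code&gt; is a hypothetical in-memory stand-in; a real cache product would expose its own atomic primitive, so treat the method names here as assumptions made for illustration.&lt;/p&gt;

```python
import threading
from typing import Dict, Tuple


class VersionedStore:
    """In-memory store where every value carries a version number."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: Dict[str, Tuple[int, int]] = {}   # key -> (version, value)

    def read(self, key: str) -> Tuple[int, int]:
        with self._lock:
            return self._items.get(key, (0, 0))

    def compare_and_set(self, key: str, expected_version: int, value: int) -> bool:
        with self._lock:
            current_version, _ = self._items.get(key, (0, 0))
            if current_version != expected_version:
                return False                 # someone else wrote first
            self._items[key] = (current_version + 1, value)
            return True


def decrement(store: VersionedStore, key: str, amount: int) -> None:
    while True:                              # retry until our write wins
        version, value = store.read(key)
        if store.compare_and_set(key, version, value - amount):
            return


store = VersionedStore()
store.compare_and_set("inventory", 0, 700)   # initial load, becomes version 1
workers = [threading.Thread(target=decrement, args=(store, "inventory", n))
           for n in (10, 5)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(store.read("inventory"))               # prints (3, 685)
```

&lt;p&gt;Whichever thread loses the race sees a version mismatch, re-reads the value, and retries, so both decrements are applied.&lt;/p&gt;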

&lt;h3&gt;
  
  
  Calculating Collision Probability
&lt;/h3&gt;

&lt;p&gt;Collision probability can be approximated by the following formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Collision_Rate ≈ Number_of_Instances × (Update_Rate² / Cache_Size) × Replication_Latency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
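&lt;li&gt;Number_of_Instances: Number of service instances that each hold a copy of the cache.&lt;/li&gt;
&lt;li&gt;Update_Rate: Writes per second.&lt;/li&gt;
&lt;li&gt;Cache_Size: Total distinct data entries. The bigger it is, the less often the exact same entry collides.&lt;/li&gt;
&lt;li&gt;Replication_Latency: Average time for an update to propagate to the other instances. It is usually measured in milliseconds, so convert it to seconds before multiplying by a per-second update rate.&lt;/li&gt;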
&lt;/ul&gt;

&lt;p&gt;If the collision rate is low (for example, under 1% of all updates), you might be fine. If it’s high, you’ll need a concurrency mechanism.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Number_of_Instances = 8&lt;/li&gt;
&lt;li&gt;Update_Rate (updates per second) = 300&lt;/li&gt;
&lt;li&gt;Cache_Size (rows) = 30000&lt;/li&gt;
&lt;li&gt;Replication_Latency = 50 milliseconds (0.05 seconds)&lt;/li&gt;
&lt;li&gt;Then Collision_Rate ≈ 8 × (300² / 30000) × 0.05 = 1.2 collisions per second. Measured against 300 updates per second, that is roughly 0.4% of writes, which is under the 1% rule of thumb but still around 4,320 collisions per hour, so a concurrency mechanism is worth considering.&lt;/li&gt;
&lt;/ul&gt;
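
&lt;p&gt;The arithmetic for this example can be checked with a small helper. The function name and parameters are hypothetical, and two assumptions are made explicit: the latency is converted from milliseconds to seconds so the units line up, and the result is also expressed as a share of the update rate as one way to compare it against a percentage threshold.&lt;/p&gt;

```python
def collision_rate(instances: int, updates_per_second: float,
                   cache_size: int, replication_latency_ms: float) -> float:
    """Approximate collisions per second using the formula above."""
    latency_seconds = replication_latency_ms / 1000.0   # ms to seconds (assumption)
    return instances * (updates_per_second ** 2 / cache_size) * latency_seconds


rate = collision_rate(instances=8, updates_per_second=300,
                      cache_size=30000, replication_latency_ms=50)
share_of_updates = rate / 300     # collisions as a fraction of all updates

print(round(rate, 3))                     # prints 1.2
print(round(share_of_updates * 100, 2))   # prints 0.4
```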




&lt;h2&gt;
  
  
  Eviction Policies
&lt;/h2&gt;

&lt;p&gt;Caches are finite. When they fill up, something must be removed to make room for new entries. Various eviction policies address different usage patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Time-to-Live (TTL)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Definition: Each entry has an expiration timer. After the time elapses, the cache discards it.&lt;/li&gt;
&lt;li&gt;Pros: Good for data that “naturally” becomes stale quickly (like real-time bidding info).&lt;/li&gt;
&lt;li&gt;Cons: Does not handle the scenario where the cache is simply full (some items might still be unexpired).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Archive (ARC) Policy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Definition: Evicts items based on creation date, e.g., only keep entries under 6 months old.&lt;/li&gt;
&lt;li&gt;Pros: Excellent for storing recent transactions (user orders for the last 6 months) and automatically discarding older data.&lt;/li&gt;
&lt;li&gt;Cons: Also doesn’t handle the scenario of a “full” cache. If the cache is at capacity but none of the data is older than X months, new entries cannot be added.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Least Frequently Used (LFU)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Definition: Evicts the entry with the lowest access frequency.&lt;/li&gt;
&lt;li&gt;Pros: If data is heavily read over time but rarely updated, this can keep popular items in memory.&lt;/li&gt;
&lt;li&gt;Cons: When new items are inserted, some LFU implementations reset counters, so frequently used items might get evicted if a series of puts occurs. This can cause surprising evictions in “put-heavy” workloads.&lt;/li&gt;
&lt;/ul&gt;
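
&lt;p&gt;A minimal LFU sketch, for illustration only. &lt;code&gt;LfuCache&lt;/code&gt; is a hypothetical class; it counts a put as one access and scans all entries on eviction, which real implementations avoid with more elaborate bookkeeping.&lt;/p&gt;

```python
from collections import Counter


class LfuCache:
    """Evicts the entry with the lowest access count when full."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._values = {}
        self._hits = Counter()

    def get(self, key):
        if key not in self._values:
            return None
        self._hits[key] += 1
        return self._values[key]

    def put(self, key, value) -> None:
        if key not in self._values and len(self._values) >= self._capacity:
            # Linear scan for the least frequently used key (fine for a sketch).
            coldest = min(self._values, key=lambda k: self._hits[k])
            del self._values[coldest]
            del self._hits[coldest]
        self._values[key] = value
        self._hits[key] += 1      # a put counts as one access in this sketch


cache = LfuCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")
cache.get("a")
cache.put("c", 3)        # evicts "b", the least frequently used entry
print(cache.get("b"))    # prints None
```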

&lt;h3&gt;
  
  
  Least Recently Used (LRU)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Definition: Evicts items that have not been accessed for the longest period.&lt;/li&gt;
&lt;li&gt;Pros: Generally the most intuitive for interactive data. Items used recently remain in cache.&lt;/li&gt;
&lt;li&gt;Cons: Has overhead in tracking recency (often via a linked list or timestamps).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LRU is a common default for the front cache in a near-cache topology, because it keeps the most recently used items in memory. Do not confuse this with an MRU eviction policy, which does the opposite: it evicts the most recently used item and is rarely beneficial.&lt;/p&gt;
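
&lt;p&gt;For comparison, LRU eviction can be sketched with Python’s &lt;code&gt;OrderedDict&lt;/code&gt;, which tracks insertion order and lets an entry be moved to the end on each access. &lt;code&gt;LruCache&lt;/code&gt; is a hypothetical class name used only for this example.&lt;/p&gt;

```python
from collections import OrderedDict


class LruCache:
    """Evicts the entry that has gone unused for the longest time."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._entries: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)          # mark as most recently used
        return self._entries[key]

    def put(self, key, value) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # drop the least recently used


cache = LruCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" is now more recent than "b"
cache.put("c", 3)        # evicts "b"
print(cache.get("b"))    # prints None
```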

&lt;h3&gt;
  
  
  Random Replacement (RR)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Definition: When the cache is full, pick an item at random to evict.&lt;/li&gt;
&lt;li&gt;Pros: Minimal overhead, extremely fast.&lt;/li&gt;
&lt;li&gt;Cons: No intelligence about usage patterns; can evict the most popular item.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Selecting the Right Eviction Policy
&lt;/h3&gt;

&lt;p&gt;A recommended approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with Random (RR) if usage patterns are unknown. Measure cache hit rates (via logs, counters, or built-in metrics).&lt;/li&gt;
&lt;li&gt;Experiment with LRU or LFU for a trial period, measuring the difference in hit ratio and overall performance.&lt;/li&gt;
&lt;li&gt;Choose the best performer for your data behavior.&lt;/li&gt;
&lt;li&gt;Time-based policies (TTL, ARC) shine when data is stale after a certain window or you only want to keep recent or valid data.&lt;/li&gt;
&lt;/ol&gt;
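
&lt;p&gt;Step 1 could be tried out with something like the sketch below: a random-replacement cache that counts hits and misses, driven by a synthetic skewed workload. Everything here is an assumption for illustration, including the class name, the capacity of 50, and the Pareto-distributed keys; real measurements should come from production traffic or the cache’s built-in metrics.&lt;/p&gt;

```python
import random


class RandomReplacementCache:
    """Evicts a random entry when full and tracks hit/miss counters."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._capacity = capacity
        self._values = {}
        self._rng = random.Random(seed)   # seeded only to make runs repeatable
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, load):
        if key in self._values:
            self.hits += 1
            return self._values[key]
        self.misses += 1
        if len(self._values) >= self._capacity:
            victim = self._rng.choice(list(self._values))
            del self._values[victim]      # no usage information is consulted
        self._values[key] = load(key)
        return self._values[key]

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = RandomReplacementCache(capacity=50)
workload = random.Random(1)
for _ in range(10000):
    key = int(workload.paretovariate(1.2))   # skewed: small keys are requested most
    cache.get_or_load(key, load=lambda k: k * 2)
print(f"hit ratio: {cache.hit_ratio():.2%}")
```

&lt;p&gt;Running the same workload against an LRU or LFU implementation and comparing the hit ratios gives a concrete basis for steps 2 and 3.&lt;/p&gt;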




&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;Caching in microservices isn’t just about speed; it’s about reducing network calls, managing concurrency, and respecting domain boundaries. Make sure to understand your application's characteristics, data behavior, and the trade-offs of each caching approach before committing to any caching strategy.&lt;/p&gt;




&lt;h3&gt;
  
  
  Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/microservices" rel="noopener noreferrer"&gt;Microservice Architecture Style - Microsoft Learn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://martinfowler.com/bliki/BoundedContext.html" rel="noopener noreferrer"&gt;Bounded Context - Martin Fowler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://systemdesignschool.io/blog/eventual-consistency-vs-strong-consistency" rel="noopener noreferrer"&gt;Strong Consistency vs Eventual Consistency&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.developertoarchitect.com/lessons" rel="noopener noreferrer"&gt;Microservices Caching Topologies Lessons [76-80] - Mark Richards&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://redis.io/glossary/distributed-caching/" rel="noopener noreferrer"&gt;Distributed Caching - Redis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.hazelcast.com/hazelcast/5.5/cache/overview" rel="noopener noreferrer"&gt;Caching Data - Hazelcast&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hazelcast.com/blog/architectural-patterns-for-caching-microservices/" rel="noopener noreferrer"&gt;Architectural Patterns for Caching Microservices&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>microservices</category>
      <category>programming</category>
      <category>architecture</category>
      <category>beginners</category>
    </item>
    <item>
      <title>The Ultimate Cheat Sheet: CLI Man Pages, tldr, and cheat.sh</title>
      <dc:creator>Randa</dc:creator>
      <pubDate>Mon, 13 Jan 2025 18:09:58 +0000</pubDate>
      <link>https://dev.to/randazraik/the-ultimate-cheat-sheet-cli-man-pages-tldr-and-cheatsh-19bc</link>
      <guid>https://dev.to/randazraik/the-ultimate-cheat-sheet-cli-man-pages-tldr-and-cheatsh-19bc</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When you’re coding or working extensively in the command line, a quick reference for how commands work is incredibly handy. Typically, you might Google a command (or now, ask ChatGPT) to find its usage and examples. However, that often means leaving the terminal, and context-switching can slow you down. You might also need to look up multiple online resources to get what you want.&lt;/p&gt;

&lt;p&gt;This is where CLI cheat-sheet tools come in. They allow you to search or recall command examples on the fly, directly from your shell. In this article, we’ll explore three major solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;man pages: Built-in, offline, and highly detailed documentation on Unix-like systems.&lt;/li&gt;
&lt;li&gt;tldr: Short, simple, example-driven cheat sheets for popular commands.&lt;/li&gt;
&lt;li&gt;cheat.sh: A curl-friendly tool that aggregates both CLI commands and programming snippets (e.g., Python, JavaScript, Go, etc.).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Man Pages
&lt;/h2&gt;

&lt;p&gt;Man pages (short for "manual pages") are the official documentation method on Unix-like systems. They’re typically installed by default, providing offline references for nearly every system command, library, or config file.&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Offline &amp;amp; Detailed: Perfect for advanced or obscure flags.&lt;/li&gt;
&lt;li&gt;Official Documentation: Maintained by the system or package authors.&lt;/li&gt;
&lt;li&gt;Searchable: &lt;code&gt;/&amp;lt;pattern&amp;gt;&lt;/code&gt; inside the man page for quick navigation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verbose: Can be overwhelming if you just want a quick example.&lt;/li&gt;
&lt;li&gt;Windows Support: Not available natively on Windows. You need WSL, or you can rely on PowerShell’s &lt;code&gt;Get-Help&lt;/code&gt; as an alternative.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Windows

&lt;ul&gt;
&lt;li&gt;Use WSL (Windows Subsystem for Linux) with a distro like Ubuntu installed to get a real &lt;code&gt;man&lt;/code&gt; (no pun intended) out of the box.&lt;/li&gt;
&lt;li&gt;Or rely on PowerShell’s &lt;code&gt;Get-Help&lt;/code&gt; for Windows-native commands (e.g. &lt;code&gt;Get-Help dir&lt;/code&gt;). While not exactly "man," it serves a similar purpose.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Linux/macOS:

&lt;ul&gt;
&lt;li&gt;Typically pre-installed. Just type &lt;code&gt;man &amp;lt;command&amp;gt;&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Basic Lookup - Shows very detailed information about the given command. If you want to search in the results for the word &lt;code&gt;variable&lt;/code&gt;, type &lt;code&gt;/variable&lt;/code&gt;, and press &lt;code&gt;n&lt;/code&gt; to jump to the next match:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;man &amp;lt;&lt;span class="nb"&gt;command&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
man &lt;span class="nb"&gt;grep&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Find Commands by Keyword - Shows all man-page entries related to the given keyword:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;man &lt;span class="nt"&gt;-k&lt;/span&gt; zip       &lt;span class="c"&gt;# Shows results for `unzip`, `gzip`, etc.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;whatis - Searches the manual page names and displays the descriptions of any names matched. &lt;code&gt;man -f ip&lt;/code&gt; is equivalent to running &lt;code&gt;whatis ip&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;man &lt;span class="nt"&gt;-f&lt;/span&gt; &amp;lt;term&amp;gt;
man &lt;span class="nt"&gt;-f&lt;/span&gt; ip        &lt;span class="c"&gt;# Displays the man page descriptions matching `ip`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Man Section Control - Displays detailed documentation for a specific topic within the specified manual section. In Linux, different "sections" exist (e.g., 2 for system calls, 3 for library calls). This is crucial if you want library-level details vs. userland commands.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;man &amp;lt;section&amp;gt; &amp;lt;topic&amp;gt;
man 2 open       &lt;span class="c"&gt;# Displays the man page for the `open` system call&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Learn More
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;An online collection of Linux man pages is available at: &lt;a href="https://man7.org/linux/man-pages/" rel="noopener noreferrer"&gt;https://man7.org/linux/man-pages/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  tldr
&lt;/h2&gt;

&lt;p&gt;Tldr (short for "too long; didn't read") is a community-driven project providing concise, example-focused cheat sheets. Instead of swimming through 100 lines in a man page, you get 5–10 lines of the most common usage patterns.&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimal &amp;amp; Fast: Great for everyday tasks.&lt;/li&gt;
&lt;li&gt;Actively Updated: Large open-source community.&lt;/li&gt;
&lt;li&gt;Offline Cache: While internet is needed initially, tldr can be used offline afterwards thanks to its caching feature.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limited Depth: Doesn’t always show advanced flags or environment info.&lt;/li&gt;
&lt;li&gt;Requires Installation: Typically not pre-installed, so you need to install a client.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Windows&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Common approach via npm (assuming Node.js is installed):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;npm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-g&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;tldr&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Linux/macOS&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;npm again is straightforward:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; tldr
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Alternatively, install the official Rust Client using Homebrew (or other package managers on other operating systems):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;tlrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Basic Lookup - Shows minimal information about the given command:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tldr &amp;lt;&lt;span class="nb"&gt;command&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
tldr &lt;span class="nb"&gt;cat&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Update Cache - Pulls the latest cheat sheets from the official repo to be used offline:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tldr &lt;span class="nt"&gt;--update&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Random Page - Shows the tldr page for a randomly chosen command:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tldr &lt;span class="nt"&gt;-r&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Learn More
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Official Site: &lt;a href="https://tldr.sh" rel="noopener noreferrer"&gt;https://tldr.sh&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;tldr Clients: &lt;a href="https://github.com/tldr-pages/tldr/wiki/Clients" rel="noopener noreferrer"&gt;https://github.com/tldr-pages/tldr/wiki/Clients&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub Repo: &lt;a href="https://github.com/tldr-pages/tldr" rel="noopener noreferrer"&gt;https://github.com/tldr-pages/tldr&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  cheat.sh
&lt;/h2&gt;

&lt;p&gt;cheat.sh is a web-based service you can query via &lt;code&gt;curl&lt;/code&gt;. Unlike tldr or man pages, it includes programming language snippets (&lt;code&gt;python/regex&lt;/code&gt;, &lt;code&gt;go/http&lt;/code&gt;, etc.). Perfect for devs who want both CLI commands and language cheat sheets in one place.&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero-Install: Just &lt;code&gt;curl&lt;/code&gt; to fetch results.&lt;/li&gt;
&lt;li&gt;Broad Coverage: Covers 56 programming languages, several DBMSes, and more than 1000 of the most important UNIX/Linux commands.&lt;/li&gt;
&lt;li&gt;Ultrafast: Typically returns answers within 100 ms.&lt;/li&gt;
&lt;li&gt;Ability to add more cheat sheets and modify existing ones.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires Internet: Unless you self-host cheat.sh.&lt;/li&gt;
&lt;li&gt;Inconsistent Formatting: Pulled from various sources, so the style can vary.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;No formal installation is needed; just use the &lt;code&gt;curl&lt;/code&gt; command. Because cheat.sh is a web service, as long as you have internet access and a terminal, you’re all set.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Windows

&lt;ul&gt;
&lt;li&gt;Windows PowerShell 5.1 defines &lt;code&gt;curl&lt;/code&gt; as an alias for &lt;code&gt;Invoke-WebRequest&lt;/code&gt;, and recent Windows 10+ builds also ship the real &lt;code&gt;curl.exe&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Git Bash and WSL either include &lt;code&gt;curl&lt;/code&gt; by default or make it easy to install.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Linux/macOS

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;curl&lt;/code&gt; is usually pre-installed on major distros. Check with &lt;code&gt;curl --version&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Basic Lookup - Shows multiple usage examples for the given command, sometimes more extensive than tldr:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl cheat.sh/&amp;lt;&lt;span class="nb"&gt;command&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
curl cheat.sh/tar
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Subtopic Filtering - Use &lt;code&gt;/~&amp;lt;keyword&amp;gt;&lt;/code&gt; to focus on specific usage:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl cheat.sh/scala/~currying   &lt;span class="c"&gt;# Looks for currying in scala cheat sheets&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Programming languages cheat sheets - For each supported programming language there are several special cheat sheets: its own sheet, &lt;code&gt;hello&lt;/code&gt;, &lt;code&gt;:list&lt;/code&gt; and &lt;code&gt;:learn&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl cheat.sh/lua
curl cheat.sh/lua/hello
curl cheat.sh/lua/:list
curl cheat.sh/lua/:learn
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;To know how to randomize numbers in C# for example:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl cheat.sh/csharp/random
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;The output will look something like this:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="cm"&gt;/*
 * The [`Random` class][1] is used to create random numbers. (Pseudo-
 * random that is of course.).
 *
 * Example:
 *
 * &amp;lt;!-- language: c# --&amp;gt;
 */&lt;/span&gt;

 &lt;span class="n"&gt;Random&lt;/span&gt; &lt;span class="n"&gt;rnd&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Random&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
 &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rnd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;13&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// creates a number between 1 and 12&lt;/span&gt;
 &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;dice&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rnd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;7&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// creates a number between 1 and 6&lt;/span&gt;
 &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;card&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rnd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;52&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;     &lt;span class="c1"&gt;// creates a number between 0 and 51&lt;/span&gt;
 &lt;span class="c1"&gt;// Rest of details&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Special pages - A few examples:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl cheat.sh/:help     &lt;span class="c"&gt;# Description of all special pages and options&lt;/span&gt;
curl cheat.sh/:intro    &lt;span class="c"&gt;# cheat.sh introduction, covering the most important usage questions&lt;/span&gt;
curl cheat.sh/:list     &lt;span class="c"&gt;# Lists all cheat sheets&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pipe into fzf (for super-advanced searching) - If you have fzf installed, you can interactively sift through cheat.sh’s output:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl cheat.sh/python | fzf
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Alias
&lt;/h3&gt;

&lt;p&gt;The curl command is a bit long to type every time (at least for me), so we can wrap it in a short shell function:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
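&lt;p&gt;Windows: Add the following function to &lt;code&gt;$PROFILE&lt;/code&gt; (in Windows PowerShell 5.1, where &lt;code&gt;curl&lt;/code&gt; is an alias for &lt;code&gt;Invoke-WebRequest&lt;/code&gt;, call &lt;code&gt;curl.exe&lt;/code&gt; instead):&lt;br&gt;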
&lt;/p&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="kr"&gt;function&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;cheat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="kr"&gt;param&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="nv"&gt;$topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;curl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cheat.sh/&lt;/span&gt;&lt;span class="nv"&gt;$topic&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Then you can use it this way:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cheat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;tar&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cheat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;csharp/random&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Linux: Add the following function to &lt;code&gt;~/.bashrc&lt;/code&gt; or &lt;code&gt;~/.zshrc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cheat&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    curl &lt;span class="s2"&gt;"cheat.sh/&lt;/span&gt;&lt;span class="nv"&gt;$*&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Then you can use it this way:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cheat csharp/random
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Learn More
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GitHub Repo: &lt;a href="https://github.com/chubin/cheat.sh" rel="noopener noreferrer"&gt;https://github.com/chubin/cheat.sh&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;You're most likely coding in an editor, so you might wonder how to access cheat sheets specific to your programming language. One option is to open a terminal within your editor and run the commands there. Alternatively, you can check out the cheat.sh repo for instructions on how to integrate your editor with cheat.sh directly.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick Compare Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature/Tool&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Man Pages&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;tldr&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;cheat.sh&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Installation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pre-installed on most Unix-like systems&lt;br&gt;(Use WSL on Win)&lt;/td&gt;
&lt;td&gt;Install a tldr client (e.g., via npm)&lt;/td&gt;
&lt;td&gt;No install needed—just &lt;code&gt;curl&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Offline Usage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fully offline&lt;/td&gt;
&lt;td&gt;Cached offline after initial update&lt;/td&gt;
&lt;td&gt;Requires internet unless you self-host&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Detail Level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extremely comprehensive (official docs)&lt;/td&gt;
&lt;td&gt;Concise, covers common commands/features&lt;/td&gt;
&lt;td&gt;Varies; includes code snippets &amp;amp; subtopics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Advanced Flags&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Thorough coverage&lt;/td&gt;
&lt;td&gt;Limited coverage of advanced flags&lt;/td&gt;
&lt;td&gt;Medium coverage (from multiple community sources)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Programming&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;System-level docs only&lt;/td&gt;
&lt;td&gt;CLI commands only&lt;/td&gt;
&lt;td&gt;Includes language cheat sheets (Python, JS, Go, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instant offline results&lt;/td&gt;
&lt;td&gt;Very fast for typical usage&lt;/td&gt;
&lt;td&gt;Usually sub-100ms response, but must be online&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unix-like systems (WSL for Win)&lt;/td&gt;
&lt;td&gt;Cross-platform (Win, Linux, macOS), with a client&lt;/td&gt;
&lt;td&gt;Cross-platform (Win, Linux, macOS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deep dive into official docs&lt;/td&gt;
&lt;td&gt;Quick references for everyday commands&lt;/td&gt;
&lt;td&gt;Quick references &lt;strong&gt;plus&lt;/strong&gt; code snippets in multiple languages&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Other Tools to Explore
&lt;/h2&gt;

&lt;p&gt;Beyond man, tldr, and cheat.sh, there are similar tools worth exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/srsudar/eg" rel="noopener noreferrer"&gt;eg&lt;/a&gt;: Provides simple, practical command-line examples, acting as a quick-reference companion to man pages.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/cheat/cheat" rel="noopener noreferrer"&gt;Cheat&lt;/a&gt;: Enables creating and viewing interactive command-line cheatsheets, helping *nix admins recall options for commands they use occasionally.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://devhints.io" rel="noopener noreferrer"&gt;devhints&lt;/a&gt;: Provides quick, easy-to-navigate cheatsheets for developers, offering concise references for various tools, frameworks, and programming languages.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;Whether you're diving into man pages for detailed offline docs, using tldr for quick command overviews, or exploring cheat.sh for filtered subtopics and snippets, you'll have everything you need right at your fingertips. You can also mix and match these tools to cover all your bases and tackle any situation. We’ve only touched on the basics here, so consider playing with these tools to explore their full potential.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>productivity</category>
      <category>cli</category>
      <category>cheatsheet</category>
    </item>
    <item>
      <title>MSBuild and .NET Project Files Explained</title>
      <dc:creator>Randa</dc:creator>
      <pubDate>Thu, 26 Dec 2024 20:32:22 +0000</pubDate>
      <link>https://dev.to/randazraik/msbuild-and-project-files-3d4o</link>
      <guid>https://dev.to/randazraik/msbuild-and-project-files-3d4o</guid>
      <description>&lt;h2&gt;
  
  
  What is MSBuild?
&lt;/h2&gt;

&lt;p&gt;MSBuild (Microsoft Build Engine) is a build system and platform for building applications, primarily in the .NET ecosystem. It orchestrates how code is compiled, tested, packaged, and deployed by processing XML project files like &lt;code&gt;.csproj&lt;/code&gt;, &lt;code&gt;.fsproj&lt;/code&gt; and &lt;code&gt;.vbproj&lt;/code&gt;. Visual Studio uses MSBuild, but you can use MSBuild without Visual Studio to build .NET applications.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Features of MSBuild
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Project Building:

&lt;ul&gt;
&lt;li&gt;Compiles your source code into Intermediate Language (IL) and packages it into binaries (&lt;code&gt;.dll&lt;/code&gt;, &lt;code&gt;.exe&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Resolves project dependencies (e.g., NuGet packages) and includes them in the build process.&lt;/li&gt;
&lt;li&gt;Executes additional build tasks (e.g., running tests, creating packages, deployment tasks).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;XML-Based Configuration:

&lt;ul&gt;
&lt;li&gt;Uses XML-based project files to define build instructions in a clear and extensible format.&lt;/li&gt;
&lt;li&gt;Project files are used to define build steps, configurations, dependencies, and more.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Highly Customizable:

&lt;ul&gt;
&lt;li&gt;You can write custom targets and tasks to extend its functionality.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Integrated with .NET CLI:

&lt;ul&gt;
&lt;li&gt;Commands from .NET CLI like &lt;code&gt;dotnet build&lt;/code&gt;, &lt;code&gt;dotnet restore&lt;/code&gt;, and &lt;code&gt;dotnet publish&lt;/code&gt; use MSBuild under the hood to build projects.&lt;/li&gt;
&lt;li&gt;Visual Studio has built-in support for MSBuild. For editors that are not integrated with MSBuild, like VS Code and Zed, the .NET CLI is commonly used to manage builds.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Build Automation and CI/CD Integration:

&lt;ul&gt;
&lt;li&gt;Integrates with CI/CD systems like GitHub Actions, Azure Pipelines, and Jenkins to automate builds, tests, and deployments.&lt;/li&gt;
&lt;li&gt;Lets you define build and deployment steps in MSBuild scripts, so the same logic runs locally and in the pipeline.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Cross-Platform:

&lt;ul&gt;
&lt;li&gt;Initially Windows-only, MSBuild became cross-platform starting with .NET Core, allowing builds on Linux and macOS.&lt;/li&gt;
&lt;li&gt;Ensures the same build logic works across operating systems, making it ideal for CI/CD pipelines.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  MSBuild Project File
&lt;/h2&gt;

&lt;p&gt;MSBuild processes XML-based project files that describe build items, configurations, and reusable build rules, which keeps builds consistent across projects.&lt;br&gt;
There are different types of project files, such as &lt;code&gt;.csproj&lt;/code&gt; for C# projects, &lt;code&gt;.fsproj&lt;/code&gt; for F# projects, and &lt;code&gt;.vbproj&lt;/code&gt; for Visual Basic projects.&lt;/p&gt;


&lt;h2&gt;
  
  
  MSBuild Project File Structure
&lt;/h2&gt;

&lt;p&gt;The following is an overview of the key elements that make up an MSBuild project file, explaining how each contributes to the build process and project configuration.&lt;/p&gt;
&lt;h3&gt;
  
  
  File Root
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The project file begins with the &lt;code&gt;&amp;lt;Project&amp;gt;&lt;/code&gt; root element which acts as the container for all other elements.&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;Project&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/Project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;




&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Modern .NET projects use SDK-style projects, where the SDK specifies a predefined set of build logic, properties, and imports. Some available SDKs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Microsoft.NET.Sdk&lt;/code&gt;: For console apps or libraries.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Microsoft.NET.Sdk.Web&lt;/code&gt;: For web projects like Web APIs or MVC apps.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Microsoft.NET.Sdk.Worker&lt;/code&gt;: For worker services and background jobs.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Aspire.AppHost.Sdk&lt;/code&gt;: For Aspire app host.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MSTest.Sdk&lt;/code&gt;: For MSTest apps.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Ways of declaring the SDK:&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- Inline SDK declaration --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;Project&lt;/span&gt; &lt;span class="na"&gt;Sdk=&lt;/span&gt;&lt;span class="s"&gt;"Microsoft.NET.Sdk"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/Project&amp;gt;&lt;/span&gt;

&lt;span class="c"&gt;&amp;lt;!-- Using the `&amp;lt;Sdk&amp;gt;` element --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;Project&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Sdk&lt;/span&gt; &lt;span class="na"&gt;Name=&lt;/span&gt;&lt;span class="s"&gt;"Microsoft.NET.Sdk"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/Project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;




&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Properties
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Key-value pairs used to configure builds and global settings like target framework, build configuration, and output paths.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;They are defined within a &lt;code&gt;&amp;lt;PropertyGroup&amp;gt;&lt;/code&gt;. Multiple &lt;code&gt;&amp;lt;PropertyGroup&amp;gt;&lt;/code&gt; sections can be added.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;PropertyGroup&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;TargetFramework&amp;gt;&lt;/span&gt;net9.0&lt;span class="nt"&gt;&amp;lt;/TargetFramework&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/PropertyGroup&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Conditions can be specified to dynamically enable properties:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;PropertyGroup&lt;/span&gt; &lt;span class="na"&gt;Condition=&lt;/span&gt;&lt;span class="s"&gt;"'$(Configuration)' == 'Release'"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Optimize&amp;gt;&lt;/span&gt;true&lt;span class="nt"&gt;&amp;lt;/Optimize&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/PropertyGroup&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;&lt;strong&gt;Common properties:&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Target Framework&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;&amp;lt;TargetFramework&amp;gt;&lt;/code&gt;/&lt;code&gt;&amp;lt;TargetFrameworks&amp;gt;&lt;/code&gt;: Used to specify the .NET version. This property is required.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;TargetFramework&amp;gt;&lt;/span&gt;net9.0&lt;span class="nt"&gt;&amp;lt;/TargetFramework&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;TargetFrameworks&amp;gt;&lt;/span&gt;net9.0;net40;net45&lt;span class="nt"&gt;&amp;lt;/TargetFrameworks&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Common TFMs: &lt;code&gt;net9.0&lt;/code&gt;, &lt;code&gt;net8.0&lt;/code&gt;, &lt;code&gt;netstandard2.1&lt;/code&gt;, &lt;code&gt;netcoreapp3.1&lt;/code&gt;, &lt;code&gt;net481&lt;/code&gt;. OS-specific TFMs (e.g., &lt;code&gt;net5.0-windows&lt;/code&gt;, &lt;code&gt;net6.0-ios&lt;/code&gt;) include platform-specific bindings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You can add preprocessor directives to your source code for conditional compilation by framework:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#if NET40
&lt;/span&gt;&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Target framework: .NET Framework 4.0"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="cp"&gt;#endif
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Output Type&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;OutputType&amp;gt;&lt;/code&gt;: Used to specify the application type.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Available types:&lt;br&gt;
          - &lt;code&gt;Exe&lt;/code&gt; for console apps.&lt;br&gt;
          - &lt;code&gt;Library&lt;/code&gt; for class libraries. (Default)&lt;br&gt;
          - &lt;code&gt;Module&lt;/code&gt; for modules.&lt;br&gt;
          - &lt;code&gt;Winexe&lt;/code&gt; for Windows-based programs.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;PropertyGroup&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;OutputType&amp;gt;&lt;/span&gt;Exe&lt;span class="nt"&gt;&amp;lt;/OutputType&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/PropertyGroup&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Implicit using Directives&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Starting with .NET 6, C# projects automatically include commonly used namespaces via implicit global using directives, reducing the need to manually add them.&lt;/li&gt;
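&lt;li&gt;Supported by SDKs like &lt;code&gt;Microsoft.NET.Sdk&lt;/code&gt;, &lt;code&gt;Microsoft.NET.Sdk.Web&lt;/code&gt;, &lt;code&gt;Microsoft.NET.Sdk.Worker&lt;/code&gt;, and &lt;code&gt;Microsoft.NET.Sdk.WindowsDesktop&lt;/code&gt;. The feature is off unless the property is set; the &lt;code&gt;dotnet new&lt;/code&gt; templates for .NET 6 and later set it to &lt;code&gt;enable&lt;/code&gt; for you.&lt;/li&gt;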
&lt;li&gt;
&lt;p&gt;&lt;code&gt;&amp;lt;ImplicitUsings&amp;gt;&lt;/code&gt;: Used to enable/disable the feature:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;PropertyGroup&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;ImplicitUsings&amp;gt;&lt;/span&gt;enable&lt;span class="nt"&gt;&amp;lt;/ImplicitUsings&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/PropertyGroup&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;&amp;lt;Using&amp;gt;&lt;/code&gt;: Used to add extra namespaces to the global using directives. &lt;code&gt;Using&lt;/code&gt; is an item; items are covered in the next section:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;ItemGroup&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Using&lt;/span&gt; &lt;span class="na"&gt;Include=&lt;/span&gt;&lt;span class="s"&gt;"System.IO.Pipes"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/ItemGroup&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Compiler and Code Analyzer Warnings&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;TreatWarningsAsErrors&amp;gt;&lt;/code&gt;: Converts all compiler warnings into errors.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;WarningsAsErrors&amp;gt;&lt;/code&gt;: Converts specific compiler warnings into errors.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;CodeAnalysisTreatWarningsAsErrors&amp;gt;&lt;/code&gt;: Converts code analysis warnings into errors.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;&amp;lt;NoWarn&amp;gt;&lt;/code&gt;: Suppresses specific warnings and doesn't show them in build outputs.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;PropertyGroup&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;TreatWarningsAsErrors&amp;gt;&lt;/span&gt;true&lt;span class="nt"&gt;&amp;lt;/TreatWarningsAsErrors&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;WarningsAsErrors&amp;gt;&lt;/span&gt;CS0168&lt;span class="nt"&gt;&amp;lt;/WarningsAsErrors&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;CodeAnalysisTreatWarningsAsErrors&amp;gt;&lt;/span&gt;true&lt;span class="nt"&gt;&amp;lt;/CodeAnalysisTreatWarningsAsErrors&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;NoWarn&amp;gt;&lt;/span&gt;CS2002&lt;span class="nt"&gt;&amp;lt;/NoWarn&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/PropertyGroup&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Package properties&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These properties are used when generating a NuGet package from a project. They define the metadata for the package.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;PackageId&amp;gt;&lt;/code&gt;: A unique identifier for the package. Default value is &lt;code&gt;AssemblyName&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;Version&amp;gt;&lt;/code&gt;: The version of the package. Default value is &lt;code&gt;1.0.0&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;Authors&amp;gt;&lt;/code&gt;: The authors of the package. Default value is &lt;code&gt;AssemblyName&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;&amp;lt;Company&amp;gt;&lt;/code&gt;: The company name associated with the project. Default value is &lt;code&gt;AssemblyName&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;PropertyGroup&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;PackageId&amp;gt;&lt;/span&gt;ClassLibDotNetStandard&lt;span class="nt"&gt;&amp;lt;/PackageId&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Version&amp;gt;&lt;/span&gt;1.0.0&lt;span class="nt"&gt;&amp;lt;/Version&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Authors&amp;gt;&lt;/span&gt;your_name&lt;span class="nt"&gt;&amp;lt;/Authors&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Company&amp;gt;&lt;/span&gt;your_company&lt;span class="nt"&gt;&amp;lt;/Company&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/PropertyGroup&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
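
&lt;p&gt;As a small sketch of how properties compose, the fragment below defines a custom property and references it with the &lt;code&gt;$(Name)&lt;/code&gt; syntax. &lt;code&gt;ArtifactsRoot&lt;/code&gt; is a hypothetical name chosen for illustration; &lt;code&gt;OutputPath&lt;/code&gt; and &lt;code&gt;Configuration&lt;/code&gt; are standard MSBuild properties.&lt;/p&gt;

&lt;pre class="highlight xml"&gt;&lt;code&gt;&amp;lt;PropertyGroup&amp;gt;
  &amp;lt;!-- Hypothetical custom property: apply a default only if no value was passed in --&amp;gt;
  &amp;lt;ArtifactsRoot Condition="'$(ArtifactsRoot)' == ''"&amp;gt;artifacts&amp;lt;/ArtifactsRoot&amp;gt;
  &amp;lt;!-- Reference other properties with $(Name); Configuration is typically Debug or Release --&amp;gt;
  &amp;lt;OutputPath&amp;gt;$(ArtifactsRoot)/$(Configuration)/&amp;lt;/OutputPath&amp;gt;
&amp;lt;/PropertyGroup&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A value passed on the command line, such as &lt;code&gt;dotnet build -p:ArtifactsRoot=out&lt;/code&gt;, takes precedence over the default above.&lt;/p&gt;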

&lt;h3&gt;
  
  
  Items
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Specify inputs to the build process, such as source files, packages, dependencies, and resources.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;They are defined within an &lt;code&gt;&amp;lt;ItemGroup&amp;gt;&lt;/code&gt;. Multiple &lt;code&gt;&amp;lt;ItemGroup&amp;gt;&lt;/code&gt; sections can be added.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Common items:&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;&amp;lt;PackageReference&amp;gt;&lt;/code&gt;: Represents a reference to a package. For simplicity, use &lt;code&gt;dotnet add package&lt;/code&gt; to add a package instead of manually adding it to the project file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;&amp;lt;ProjectReference&amp;gt;&lt;/code&gt;: Represents a reference to another project.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;&amp;lt;EmbeddedResource&amp;gt;&lt;/code&gt;: Represents a resource to be embedded in the generated assembly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
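&lt;p&gt;&lt;code&gt;&amp;lt;Compile&amp;gt;&lt;/code&gt;: Represents the source files for the compiler. SDK-style projects include all &lt;code&gt;*.cs&lt;/code&gt; files under the project folder by default, so there is no need to add each source file explicitly. Listing a file that the defaults already cover (as the last item below does, for illustration only) produces a duplicate-item build error.&lt;br&gt;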
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;ItemGroup&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;PackageReference&lt;/span&gt; &lt;span class="na"&gt;Include=&lt;/span&gt;&lt;span class="s"&gt;"Swashbuckle.AspNetCore"&lt;/span&gt; &lt;span class="na"&gt;Version=&lt;/span&gt;&lt;span class="s"&gt;"6.6.2"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;ProjectReference&lt;/span&gt; &lt;span class="na"&gt;Include=&lt;/span&gt;&lt;span class="s"&gt;"..\OtherProject\OtherProject.csproj"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;EmbeddedResource&lt;/span&gt; &lt;span class="na"&gt;Include=&lt;/span&gt;&lt;span class="s"&gt;"fonts\OpenSans.ttf"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Compile&lt;/span&gt; &lt;span class="na"&gt;Include=&lt;/span&gt;&lt;span class="s"&gt;"Program.cs"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/ItemGroup&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;
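
&lt;p&gt;Items also accept wildcards and the &lt;code&gt;Remove&lt;/code&gt; and &lt;code&gt;Update&lt;/code&gt; attributes, which is usually how the default includes are adjusted in an SDK-style project. The sketch below assumes a project using &lt;code&gt;Microsoft.NET.Sdk&lt;/code&gt;; the folder and file names (&lt;code&gt;Scratch&lt;/code&gt;, &lt;code&gt;fonts&lt;/code&gt;, &lt;code&gt;data/sample.csv&lt;/code&gt;) are hypothetical.&lt;/p&gt;

&lt;pre class="highlight xml"&gt;&lt;code&gt;&amp;lt;ItemGroup&amp;gt;
  &amp;lt;!-- Drop a folder from the default Compile includes --&amp;gt;
  &amp;lt;Compile Remove="Scratch/**/*.cs" /&amp;gt;
  &amp;lt;!-- Wildcards add every matching file as a separate item --&amp;gt;
  &amp;lt;EmbeddedResource Include="fonts/*.ttf" /&amp;gt;
  &amp;lt;!-- Update changes metadata on an item that is already included by default --&amp;gt;
  &amp;lt;None Update="data/sample.csv" CopyToOutputDirectory="PreserveNewest" /&amp;gt;
&amp;lt;/ItemGroup&amp;gt;
&lt;/code&gt;&lt;/pre&gt;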

&lt;h3&gt;
  
  
  Tasks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Individual steps within targets to perform certain actions.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;MSBuild includes built-in tasks (e.g., &lt;code&gt;Copy&lt;/code&gt;, &lt;code&gt;Exec&lt;/code&gt;, &lt;code&gt;MakeDir&lt;/code&gt;, &lt;code&gt;Csc&lt;/code&gt;) and supports custom ones (by implementing &lt;code&gt;ITask&lt;/code&gt; or deriving from the helper class &lt;code&gt;Task&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;Target&lt;/span&gt; &lt;span class="na"&gt;Name=&lt;/span&gt;&lt;span class="s"&gt;"CustomTarget"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Exec&lt;/span&gt; &lt;span class="na"&gt;Command=&lt;/span&gt;&lt;span class="s"&gt;"dotnet restore"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Copy&lt;/span&gt; &lt;span class="na"&gt;SourceFiles=&lt;/span&gt;&lt;span class="s"&gt;"README.md"&lt;/span&gt; &lt;span class="na"&gt;DestinationFolder=&lt;/span&gt;&lt;span class="s"&gt;"bin\docs\"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Csc&lt;/span&gt; &lt;span class="na"&gt;Sources=&lt;/span&gt;&lt;span class="s"&gt;"@(Compile)"&lt;/span&gt; &lt;span class="na"&gt;OutputAssembly=&lt;/span&gt;&lt;span class="s"&gt;"bin\MyApp.dll"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/Target&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;
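
&lt;p&gt;Tasks take their inputs as attributes and can hand results back to the project through an &lt;code&gt;&amp;lt;Output&amp;gt;&lt;/code&gt; element. The following is a minimal sketch using the built-in &lt;code&gt;Exec&lt;/code&gt; and &lt;code&gt;Message&lt;/code&gt; tasks; the target name &lt;code&gt;ShowSdkVersion&lt;/code&gt; and the property name &lt;code&gt;SdkVersion&lt;/code&gt; are made up for this example.&lt;/p&gt;

&lt;pre class="highlight xml"&gt;&lt;code&gt;&amp;lt;Target Name="ShowSdkVersion"&amp;gt;
  &amp;lt;!-- ConsoleToMSBuild captures the command's console output --&amp;gt;
  &amp;lt;Exec Command="dotnet --version" ConsoleToMSBuild="true"&amp;gt;
    &amp;lt;!-- Store the captured output in a property for later use --&amp;gt;
    &amp;lt;Output TaskParameter="ConsoleOutput" PropertyName="SdkVersion" /&amp;gt;
  &amp;lt;/Exec&amp;gt;
  &amp;lt;Message Importance="high" Text="SDK version: $(SdkVersion)" /&amp;gt;
&amp;lt;/Target&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With this target in a project file, &lt;code&gt;dotnet msbuild -t:ShowSdkVersion&lt;/code&gt; should run it on its own.&lt;/p&gt;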

&lt;h3&gt;
  
  
  Targets
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Group tasks into named units that serve as entry points for the build process. For example, one target cleans build artifacts, while another compiles the source code and outputs binaries.&lt;/li&gt;
&lt;li&gt;
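&lt;p&gt;&lt;code&gt;BeforeTargets&lt;/code&gt;, &lt;code&gt;AfterTargets&lt;/code&gt; and &lt;code&gt;DependsOnTargets&lt;/code&gt; attributes can be used to order targets. &lt;code&gt;BeforeTargets&lt;/code&gt; and &lt;code&gt;AfterTargets&lt;/code&gt; hook a target into the build so it runs whenever the targets it names run, while &lt;code&gt;DependsOnTargets&lt;/code&gt; only lists prerequisites: the target itself runs only when something invokes it (for example, &lt;code&gt;dotnet msbuild -t:PostPostBuild&lt;/code&gt;).&lt;br&gt;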
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;Target&lt;/span&gt; &lt;span class="na"&gt;Name=&lt;/span&gt;&lt;span class="s"&gt;"PreBuild"&lt;/span&gt; &lt;span class="na"&gt;BeforeTargets=&lt;/span&gt;&lt;span class="s"&gt;"PreBuildEvent"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Exec&lt;/span&gt; &lt;span class="na"&gt;Command=&lt;/span&gt;&lt;span class="s"&gt;"echo pre build"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/Target&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;Target&lt;/span&gt; &lt;span class="na"&gt;Name=&lt;/span&gt;&lt;span class="s"&gt;"PostBuild"&lt;/span&gt; &lt;span class="na"&gt;AfterTargets=&lt;/span&gt;&lt;span class="s"&gt;"PostBuildEvent"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Exec&lt;/span&gt; &lt;span class="na"&gt;Command=&lt;/span&gt;&lt;span class="s"&gt;"echo post build"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/Target&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;Target&lt;/span&gt; &lt;span class="na"&gt;Name=&lt;/span&gt;&lt;span class="s"&gt;"PostPostBuild"&lt;/span&gt; &lt;span class="na"&gt;DependsOnTargets=&lt;/span&gt;&lt;span class="s"&gt;"PostBuild"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;Exec&lt;/span&gt; &lt;span class="na"&gt;Command=&lt;/span&gt;&lt;span class="s"&gt;"echo post post build"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/Target&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;
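
&lt;p&gt;To see both ordering styles side by side, here is a small sketch of a complete project file. It assumes a &lt;code&gt;README.md&lt;/code&gt; may sit next to the project; the target names &lt;code&gt;CopyReadme&lt;/code&gt; and &lt;code&gt;Report&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;pre class="highlight xml"&gt;&lt;code&gt;&amp;lt;Project Sdk="Microsoft.NET.Sdk"&amp;gt;

  &amp;lt;PropertyGroup&amp;gt;
    &amp;lt;TargetFramework&amp;gt;net9.0&amp;lt;/TargetFramework&amp;gt;
  &amp;lt;/PropertyGroup&amp;gt;

  &amp;lt;!-- Hooked into the build: runs every time the standard Build target finishes --&amp;gt;
  &amp;lt;Target Name="CopyReadme" AfterTargets="Build"&amp;gt;
    &amp;lt;Copy SourceFiles="README.md" DestinationFolder="$(OutputPath)docs" Condition="Exists('README.md')" /&amp;gt;
  &amp;lt;/Target&amp;gt;

  &amp;lt;!-- Runs only when requested; Build is executed first as a prerequisite --&amp;gt;
  &amp;lt;Target Name="Report" DependsOnTargets="Build"&amp;gt;
    &amp;lt;Message Importance="high" Text="Built $(AssemblyName) to $(OutputPath)" /&amp;gt;
  &amp;lt;/Target&amp;gt;

&amp;lt;/Project&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A plain &lt;code&gt;dotnet build&lt;/code&gt; runs &lt;code&gt;Build&lt;/code&gt; and then &lt;code&gt;CopyReadme&lt;/code&gt;, but not &lt;code&gt;Report&lt;/code&gt;. Running &lt;code&gt;dotnet build -t:Report&lt;/code&gt; asks for &lt;code&gt;Report&lt;/code&gt; explicitly, which pulls in &lt;code&gt;Build&lt;/code&gt; (and with it &lt;code&gt;CopyReadme&lt;/code&gt;) before the message is printed.&lt;/p&gt;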




&lt;h2&gt;
  
  
  C# Web API Project Example
&lt;/h2&gt;

&lt;p&gt;Let's create a dummy C# Web API project from scratch, without using Visual Studio or the &lt;code&gt;dotnet new&lt;/code&gt; command to generate a template. Instead, we will use the terminal to manually create the required files, use the dotnet CLI to build and run the project, and test it using a simple HTTP request. Feel free to use your favorite editor to edit the files.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open your favorite terminal.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create the project directory and navigate to it:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;DemoApp
&lt;span class="nb"&gt;cd &lt;/span&gt;DemoApp
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create the &lt;code&gt;.csproj&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vim DemoApp.csproj
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add the following content to the &lt;code&gt;.csproj&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;Project&lt;/span&gt; &lt;span class="na"&gt;Sdk=&lt;/span&gt;&lt;span class="s"&gt;"Microsoft.NET.Sdk.Web"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

  &lt;span class="nt"&gt;&amp;lt;PropertyGroup&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;TargetFramework&amp;gt;&lt;/span&gt;net9.0&lt;span class="nt"&gt;&amp;lt;/TargetFramework&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;ImplicitUsings&amp;gt;&lt;/span&gt;enable&lt;span class="nt"&gt;&amp;lt;/ImplicitUsings&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/PropertyGroup&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;/Project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create the &lt;code&gt;Program.cs&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vim Program.cs
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add the following content to &lt;code&gt;Program.cs&lt;/code&gt; to set up the web app and add a simple minimal API:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WebApplication&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;MapGet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/welcome"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"Hello, you!"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Build the project - The output binaries will be placed in the &lt;code&gt;bin/Debug/net9.0&lt;/code&gt; directory by default.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet build
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run the project - The output will indicate that the app is listening on: &lt;code&gt;http://localhost:5000&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet run
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Test the project - The output should be: &lt;code&gt;Hello, you!&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:5000/welcome
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Final output via Zed editor:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wircg7cbpjrdkmjeq4c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wircg7cbpjrdkmjeq4c.png" alt="Web API Project via Zed" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;And that's all! You can play around with &lt;code&gt;.csproj&lt;/code&gt; configurations and explore other properties.&lt;/p&gt;




&lt;h2&gt;
  
  
  Learning Resources
&lt;/h2&gt;

&lt;p&gt;Refer to the following resources if you would like to learn more about MSBuild and project files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/core/project-sdk/overview" rel="noopener noreferrer"&gt;Microsoft Learn - .NET Project SDKs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/visualstudio/msbuild/msbuild" rel="noopener noreferrer"&gt;Microsoft Learn - MSBuild&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/visualstudio/msbuild/msbuild-concepts" rel="noopener noreferrer"&gt;Microsoft Learn - MSBuild Concepts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/visualstudio/msbuild/msbuild-reference" rel="noopener noreferrer"&gt;Microsoft Learn - MSBuild Reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dotnet</category>
      <category>msbuild</category>
      <category>csharp</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
