<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kader Khan</title>
    <description>The latest articles on DEV Community by Kader Khan (@abirk).</description>
    <link>https://dev.to/abirk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2883625%2F92e2a1d4-15e6-45de-8914-c7d43590966b.png</url>
      <title>DEV Community: Kader Khan</title>
      <link>https://dev.to/abirk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abirk"/>
    <language>en</language>
    <item>
      <title>WebRTC P2P vs MCU vs SFU</title>
      <dc:creator>Kader Khan</dc:creator>
      <pubDate>Tue, 06 Jan 2026 09:07:39 +0000</pubDate>
      <link>https://dev.to/abirk/webrtc-p2p-vs-mcu-vs-sfu-1b89</link>
      <guid>https://dev.to/abirk/webrtc-p2p-vs-mcu-vs-sfu-1b89</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;1. What Is WebRTC (Quick Overview)?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;WebRTC stands for &lt;strong&gt;Web Real-Time Communication&lt;/strong&gt; — an open standard that enables &lt;strong&gt;audio and video streaming directly between browsers and apps&lt;/strong&gt; without plugins. It’s the foundation of modern video calling on the web because it:&lt;/p&gt;

&lt;p&gt;📌 Works in most browsers&lt;br&gt;
📌 Uses real-time protocols (RTP/UDP) for low delay&lt;br&gt;
📌 Secures streams with encryption&lt;br&gt;
📌 Doesn’t require installation of special plugins&lt;/p&gt;

&lt;p&gt;But at its core, WebRTC was originally designed for &lt;strong&gt;peer-to-peer connections&lt;/strong&gt; — meaning &lt;em&gt;one peer connects directly to another&lt;/em&gt;. This is great &lt;em&gt;for 1-to-1 calls&lt;/em&gt;, but becomes complicated with more participants.&lt;/p&gt;


&lt;h2&gt;
  
  
  &lt;strong&gt;2. Peer-to-Peer (P2P) ➝ Mesh Architecture&lt;/strong&gt;
&lt;/h2&gt;
&lt;h3&gt;
  
  
  🌐 How P2P works
&lt;/h3&gt;

&lt;p&gt;Imagine you and one other person want a video call. WebRTC makes a &lt;strong&gt;direct connection&lt;/strong&gt; between your device and theirs. Both devices send and receive streams &lt;em&gt;directly&lt;/em&gt; — no server in the middle.&lt;/p&gt;

&lt;p&gt;This is ideal for &lt;strong&gt;one-to-one video calls&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;✔ Low latency&lt;br&gt;
✔ No central server required&lt;br&gt;
✔ No additional cost&lt;/p&gt;
&lt;h3&gt;
  
  
  🧠 But what if more people join?
&lt;/h3&gt;

&lt;p&gt;If you add a &lt;strong&gt;third person&lt;/strong&gt;, each participant must connect with &lt;em&gt;each other&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A ↔ B
A ↔ C
B ↔ C
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s 3 connections. If you add a fourth, it becomes more tangled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;6 total connections:
A↔B, A↔C, A↔D,
B↔C, B↔D,
C↔D
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is called a &lt;strong&gt;mesh&lt;/strong&gt; — each peer connects to all others directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  📉 Problems with Mesh
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🔄 &lt;strong&gt;Bandwidth explosion:&lt;/strong&gt; Each peer must send its video stream to every other peer — quickly saturating upload bandwidth.&lt;/li&gt;
&lt;li&gt;🖥 &lt;strong&gt;CPU &amp;amp; encoding cost:&lt;/strong&gt; Each peer must encode its video separately for every other participant.&lt;/li&gt;
&lt;li&gt;🧪 &lt;strong&gt;Not reliable when peers &amp;gt; ~4–6&lt;/strong&gt;, especially over mobile or slow networks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thus, &lt;strong&gt;mesh works only for very small groups&lt;/strong&gt; (usually up to ~5 participants).&lt;/p&gt;
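&lt;p&gt;The growth above is just "n choose 2." A tiny Python sketch (illustrative only, not WebRTC API code) shows how quickly mesh link counts climb:&lt;/p&gt;

```python
def mesh_connections(n):
    # Every pair of peers needs its own direct link: n choose 2.
    return n * (n - 1) // 2

for n in (2, 3, 4, 5, 8):
    # 3 peers need 3 links, 4 need 6, and 8 already need 28.
    print(n, "participants:", mesh_connections(n), "connections")
```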




&lt;h2&gt;
  
  
  &lt;strong&gt;3. Beyond Mesh — Server-Mediated Architectures&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To build scalable multi-party calling, we introduce a &lt;strong&gt;central media server&lt;/strong&gt;. This server can relieve peers from uploading to every other peer. There are &lt;em&gt;two major ways&lt;/em&gt; to do this:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A. SFU — Selective Forwarding Unit&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  🧠 What SFU does
&lt;/h4&gt;

&lt;p&gt;With SFU:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every peer sends &lt;strong&gt;their stream once&lt;/strong&gt; to the server.&lt;/li&gt;
&lt;li&gt;The SFU &lt;strong&gt;forwards streams&lt;/strong&gt; to all other participants — but it &lt;em&gt;doesn’t decode or re-encode&lt;/em&gt; them.&lt;/li&gt;
&lt;li&gt;Each peer receives the streams it wants and renders them.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;SFU acts like a &lt;strong&gt;traffic hub&lt;/strong&gt;: one upload from each user, and multiple forwards.&lt;/p&gt;

&lt;h4&gt;
  
  
  📊 Example
&lt;/h4&gt;

&lt;p&gt;Imagine 5 participants:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You send your stream _once_ → SFU  
SFU sends out your video to Bl, B2, B3, B4 → each gets the streams they subscribed to
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each participant still receives (N-1) streams, but they &lt;em&gt;only upload once&lt;/em&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  ⭐ Advantages of SFU
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;📈 Scales better than mesh — because upload cost on the user side doesn’t explode.&lt;/li&gt;
&lt;li&gt;⚡ Lower server load — the server only &lt;strong&gt;forwards&lt;/strong&gt; packets; it never decodes or re-encodes media.&lt;/li&gt;
&lt;li&gt;🎛 Clients can choose which streams to show (e.g., pin a speaker).&lt;/li&gt;
&lt;li&gt;📱 Supports &lt;em&gt;simulcast&lt;/em&gt; (multiple quality layers) — better adapts to bandwidth.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  ⚠ Limitations
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Still sends multiple streams to each client (could be heavy on download).&lt;/li&gt;
&lt;li&gt;Server introduces another hop — slightly more latency than direct mesh.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;B. MCU — Multipoint Control Unit&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  💡 What MCU does
&lt;/h4&gt;

&lt;p&gt;MCU also receives streams from all peers. But unlike SFU, it &lt;strong&gt;decodes and mixes them&lt;/strong&gt; into a &lt;em&gt;single combined stream&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;✔ Every participant receives &lt;strong&gt;just one stream&lt;/strong&gt; — no matter how many others are in the call.&lt;br&gt;
✔ MCU handles mixing, layout, encoding, and then sends that one stream to all clients.&lt;/p&gt;

&lt;h4&gt;
  
  
  🎨 Example
&lt;/h4&gt;

&lt;p&gt;In a call with 5 users:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Each user sends their stream to the MCU.&lt;/li&gt;
&lt;li&gt;MCU combines all 5 videos into a tiled layout (e.g., a 2×2 grid plus one larger tile).&lt;/li&gt;
&lt;li&gt;That single mixed video is sent back to each participant.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  💎 Advantages of MCU
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;📉 Clients receive only one video stream — minimal CPU &amp;amp; bandwidth.&lt;/li&gt;
&lt;li&gt;📺 Easy consistent layout for all participants.&lt;/li&gt;
&lt;li&gt;📼 Good for legacy devices that can’t handle many streams.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  🔥 Downsides
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;🧠 Very heavy server processing — mixing + encoding is CPU intensive.&lt;/li&gt;
&lt;li&gt;💰 Expensive to scale — server resources grow with participants.&lt;/li&gt;
&lt;li&gt;😴 Less flexible — clients get one view determined by server (can’t rearrange locally).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;4. SFU vs MCU — A Quick Comparison&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Mesh (P2P)&lt;/th&gt;
&lt;th&gt;SFU&lt;/th&gt;
&lt;th&gt;MCU&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Server Required&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upload per peer&lt;/td&gt;
&lt;td&gt;N-1 streams&lt;/td&gt;
&lt;td&gt;1 stream&lt;/td&gt;
&lt;td&gt;1 stream&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Download per peer&lt;/td&gt;
&lt;td&gt;N-1 streams&lt;/td&gt;
&lt;td&gt;N-1 streams&lt;/td&gt;
&lt;td&gt;1 stream&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server CPU Load&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Very High&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Client CPU Load&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Very Low&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;Poor&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Moderate-High&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layout Flexibility&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
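&lt;p&gt;The upload/download rows can be sanity-checked with a small helper (a sketch; the architecture labels are just illustration values):&lt;/p&gt;

```python
def streams_per_peer(n, architecture):
    # Stream counts per participant in an n-person call.
    if architecture == "mesh":
        # Mesh: every peer both sends to and receives from all others.
        return {"upload": n - 1, "download": n - 1}
    if architecture == "sfu":
        # One upload to the SFU; it forwards everyone else's streams back.
        return {"upload": 1, "download": n - 1}
    if architecture == "mcu":
        # One upload; the MCU mixes everything into a single stream.
        return {"upload": 1, "download": 1}
    raise ValueError(architecture)

for arch in ("mesh", "sfu", "mcu"):
    print(arch, streams_per_peer(5, arch))
```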




&lt;h2&gt;
  
  
  &lt;strong&gt;5. Why SFU Is Dominating Modern Video Apps&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Today, services like &lt;strong&gt;Zoom, Google Meet, Jitsi, and many WebRTC SaaS platforms&lt;/strong&gt; rely on SFU for group calls because it:&lt;/p&gt;

&lt;p&gt;✔ Offers the best balance between scalability and performance&lt;br&gt;
✔ Allows custom layouts and controls&lt;br&gt;
✔ Supports simulcast adaptation to network conditions&lt;br&gt;
✔ Doesn’t overwhelm the server like a classic MCU does&lt;/p&gt;

&lt;p&gt;MCU is still used for special cases like &lt;strong&gt;webinar broadcasting or legacy device support&lt;/strong&gt;, but SFU is the most widely deployed.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;6. Signaling, STUN &amp;amp; TURN — The Supporting Cast&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Real-world WebRTC calls don’t magically connect peers:&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ &lt;strong&gt;Signaling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;WebRTC uses &lt;strong&gt;signaling servers&lt;/strong&gt; (your app’s backend) to exchange metadata so peers can &lt;em&gt;discover&lt;/em&gt; each other and initiate connections.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ &lt;strong&gt;STUN&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Helps discover each peer’s public IP address through NAT.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ &lt;strong&gt;TURN&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Acts as a relay &lt;strong&gt;when direct connection isn’t possible&lt;/strong&gt; (e.g., firewalls).&lt;/p&gt;

&lt;p&gt;All of these &lt;em&gt;help establish&lt;/em&gt; WebRTC connections before any media is sent.&lt;/p&gt;
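&lt;p&gt;A toy sketch of the signaling step (pure Python, no real WebRTC; the peer names and message shapes are made up for illustration). The backend only ferries opaque session descriptions between peers; media never touches it:&lt;/p&gt;

```python
# Mailboxes stand in for the signaling server's per-client channels
# (in practice these are usually WebSocket connections).
mailboxes = {"alice": [], "bob": []}

def signal(sender, recipient, message):
    # The server just relays; it does not inspect the SDP payload.
    mailboxes[recipient].append({"from": sender, **message})

# Offer/answer exchange before any media flows:
signal("alice", "bob", {"type": "offer", "sdp": "..."})
signal("bob", "alice", {"type": "answer", "sdp": "..."})

print(mailboxes["bob"][0]["type"], mailboxes["alice"][0]["type"])  # offer answer
```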




&lt;h2&gt;
  
  
  &lt;strong&gt;7. Practical Examples to Visualize&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🧑‍🤝‍🧑 1-to-1 Call
&lt;/h3&gt;

&lt;p&gt;✔ Mesh / P2P&lt;br&gt;
✔ Direct connection — minimal cost&lt;br&gt;
✔ Best for simple calls&lt;/p&gt;

&lt;h3&gt;
  
  
  👩‍👩‍👦 Small Group (3–6 users)
&lt;/h3&gt;

&lt;p&gt;✔ Mesh still kinda works&lt;br&gt;
✔ But upload &amp;amp; CPU start suffering&lt;/p&gt;

&lt;h3&gt;
  
  
  🧑‍💻 Large Group (8–50+ users)
&lt;/h3&gt;

&lt;p&gt;✔ &lt;strong&gt;Best with SFU&lt;/strong&gt;&lt;br&gt;
✔ Each user uploads once, downloads only what they want&lt;br&gt;
✔ Clients can choose video layout&lt;/p&gt;

&lt;h3&gt;
  
  
  📺 Webinar / Broadcast
&lt;/h3&gt;

&lt;p&gt;✔ &lt;strong&gt;MCU or Hybrid&lt;/strong&gt;&lt;br&gt;
✔ Mixed stream broadcast to many viewers&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;8. Summary — How WebRTC Makes Video Conferencing Work&lt;/strong&gt;
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;WebRTC enables real-time audio/video streaming&lt;/strong&gt; in browsers and apps.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;two peers&lt;/strong&gt;, direct P2P works fine.&lt;/li&gt;
&lt;li&gt;As participants grow, P2P becomes inefficient (mesh).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SFU&lt;/strong&gt; solves this by forwarding streams through a central server with minimal processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCU&lt;/strong&gt; mixes all media into one stream but at high server cost.&lt;/li&gt;
&lt;li&gt;Real apps often use hybrid models — e.g., P2P when only 2 users, SFU for groups, and even MCU for broadcasting large sessions.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>systemdesign</category>
      <category>devops</category>
      <category>webrtc</category>
      <category>webdev</category>
    </item>
    <item>
      <title>WebSocket VS Polling VS SSE</title>
      <dc:creator>Kader Khan</dc:creator>
      <pubDate>Sat, 03 Jan 2026 20:40:21 +0000</pubDate>
      <link>https://dev.to/abirk/websocket-vs-polling-vs-sse-17ii</link>
      <guid>https://dev.to/abirk/websocket-vs-polling-vs-sse-17ii</guid>
      <description>&lt;h2&gt;
  
  
  📌 The Classic Request-Response Model (and Its Limitations)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How Standard Web Apps Work
&lt;/h3&gt;

&lt;p&gt;In a typical web app:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong&gt;client&lt;/strong&gt; (browser/app) sends a request to the server.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;server&lt;/strong&gt; processes it (DB access, computation, etc.).&lt;/li&gt;
&lt;li&gt;The server sends back a &lt;strong&gt;response&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The connection closes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This cycle is simple and efficient for most applications.&lt;/p&gt;

&lt;p&gt;👉 But here’s the key problem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Once the response is done, the server &lt;em&gt;cannot&lt;/em&gt; send fresh data to the client unless the client asks again.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Example: A Stock Market App
&lt;/h3&gt;

&lt;p&gt;Suppose you have a simple stock application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧑‍💻 Clients A, B, C connect and request current stock prices.&lt;/li&gt;
&lt;li&gt;📡 The server responds — and bam! connection closes.&lt;/li&gt;
&lt;li&gt;📉 Later, prices change on the server.&lt;/li&gt;
&lt;li&gt;But clients A, B, C still only have &lt;em&gt;old&lt;/em&gt; (stale) data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This becomes a real-time problem:&lt;br&gt;
👉 How does the server tell clients that data has changed?&lt;/p&gt;


&lt;h2&gt;
  
  
  🚀 Solution 1: WebSockets
&lt;/h2&gt;

&lt;p&gt;WebSockets let you keep a &lt;strong&gt;persistent full-duplex connection&lt;/strong&gt; open between clients and servers.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Does This Mean?
&lt;/h3&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → Server → Response → Connection closes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WebSockets keep the connection open:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client ↔ Server ↔ Client ↔ Server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The server to push updates anytime.&lt;/li&gt;
&lt;li&gt;The client to send data anytime.&lt;/li&gt;
&lt;li&gt;Both sides talk without closing the connection.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How It Works (Simple Diagram)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client                         Server
  | — WebSocket handshake →     |
  |                             |
  | ← Accept &amp;amp; open channel —   |
  |                             |
  | — Updates can flow both →   |
  |                             |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the connection is open, either side can send data.&lt;/p&gt;
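&lt;p&gt;The key property is that either side can send at any time over one open channel. A small asyncio sketch models this (two queues stand in for the two directions of a WebSocket; no real networking involved):&lt;/p&gt;

```python
import asyncio

async def demo():
    # Two queues model one persistent, full-duplex connection:
    # one direction per queue, both open for the life of the "socket".
    to_server, to_client = asyncio.Queue(), asyncio.Queue()

    async def server():
        msg = await to_server.get()           # client-to-server message
        await to_client.put(f"echo:{msg}")    # reply pushed to the client
        await to_client.put("price-update")   # ...and another unprompted push

    async def client():
        await to_server.put("subscribe")
        return [await to_client.get(), await to_client.get()]

    _, received = await asyncio.gather(server(), client())
    return received

received = asyncio.run(demo())
print(received)  # ['echo:subscribe', 'price-update']
```

Note the server pushes "price-update" without being asked, which is exactly what plain request-response cannot do.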

&lt;h3&gt;
  
  
  Pros of WebSockets
&lt;/h3&gt;

&lt;p&gt;✅ True real-time updates&lt;br&gt;
✅ Low latency&lt;br&gt;
✅ Full duplex (two-way communication)&lt;/p&gt;
&lt;h3&gt;
  
  
  Cons of WebSockets
&lt;/h3&gt;

&lt;p&gt;❌ Hard to scale — it’s &lt;strong&gt;stateful&lt;/strong&gt; (server must remember every connected client)&lt;br&gt;
❌ If you have millions of connections, scaling horizontally becomes expensive&lt;br&gt;
❌ Servers must synchronize updates among themselves in clustered systems&lt;/p&gt;


&lt;h2&gt;
  
  
  🚀 Solution 2: Polling
&lt;/h2&gt;

&lt;p&gt;Polling is the simplest alternative to WebSockets.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Is Polling?
&lt;/h3&gt;

&lt;p&gt;Instead of keeping a connection alive, the client asks the server again and again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client: “Any new updates?”
Server: “Nope.”
Client: “Any new updates?”
Server: “Yes — here you go!”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Simple Polling Example
&lt;/h3&gt;

&lt;p&gt;Let’s say the client checks every &lt;strong&gt;2 seconds&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0s → “Give me new data”
2s → “Give me new data”
4s → “Give me new data”
…
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If new data appears at 3.5s, the client will only get it at the next poll (4s).&lt;/p&gt;

&lt;p&gt;👉 That means the &lt;em&gt;maximum delay&lt;/em&gt; is equal to your poll interval — 2 seconds in this example.&lt;/p&gt;
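&lt;p&gt;That worst-case delay is easy to verify with a one-liner (polls assumed to happen at 0, interval, 2×interval, and so on):&lt;/p&gt;

```python
import math

def delivery_time(data_ready_at, poll_interval):
    # First poll at or after the moment the data became available.
    return math.ceil(data_ready_at / poll_interval) * poll_interval

print(delivery_time(3.5, 2))  # 4: data from 3.5s waits until the 4s poll
```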

&lt;h3&gt;
  
  
  Pros of Polling
&lt;/h3&gt;

&lt;p&gt;✅ Easy to implement&lt;br&gt;
✅ Works with load balancers and many servers&lt;br&gt;
✅ Stateless — each request is independent&lt;/p&gt;

&lt;h3&gt;
  
  
  Cons of Polling
&lt;/h3&gt;

&lt;p&gt;❌ Not truly real-time&lt;br&gt;
❌ Can waste requests if no new data&lt;br&gt;
❌ Frequent polling may still add network load&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Solution 3: Long Polling
&lt;/h2&gt;

&lt;p&gt;Long polling is an optimized form of polling.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is Long Polling?
&lt;/h3&gt;

&lt;p&gt;Instead of responding immediately, the server &lt;strong&gt;holds the request open&lt;/strong&gt; until:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New data arrives, or&lt;/li&gt;
&lt;li&gt;A timeout expires&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it responds with data in one shot.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: Long Polling for 5 Seconds
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → Server: “Any updates?”  
Server: Hold request for 5 seconds

If updates come within 5s:
  Server → Client: Latest updates
Then client immediately re-requests.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
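&lt;p&gt;The hold-until-data-or-timeout rule can be modeled as a tiny pure function (a sketch of the timing logic only; the parameter names are invented for illustration):&lt;/p&gt;

```python
def long_poll(request_start, timeout, data_ready_at=None):
    # The server holds the request open: it answers as soon as data
    # arrives, or sends an empty response when the timeout expires.
    deadline = request_start + timeout
    if data_ready_at is not None and min(data_ready_at, deadline) == data_ready_at:
        return max(data_ready_at, request_start), "data"
    return deadline, "empty"

print(long_poll(0, 5, data_ready_at=3.5))  # (3.5, 'data')
print(long_poll(0, 5))                     # (5, 'empty')
```

Compare with short polling: here the update at 3.5s is delivered at 3.5s, not at the next fixed poll tick.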



&lt;h3&gt;
  
  
  Pros of Long Polling
&lt;/h3&gt;

&lt;p&gt;✅ Fewer requests than short polling&lt;br&gt;
✅ More “real-time” feel than simple polling&lt;br&gt;
✅ Still stateless&lt;/p&gt;

&lt;h3&gt;
  
  
  Cons of Long Polling
&lt;/h3&gt;

&lt;p&gt;❌ Can still hold server resources&lt;br&gt;
❌ Not as instant as WebSockets&lt;br&gt;
❌ Server must manage held requests&lt;/p&gt;




&lt;h2&gt;
  
  
  📊 Comparing the Approaches
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Real-Time&lt;/th&gt;
&lt;th&gt;Scalability&lt;/th&gt;
&lt;th&gt;Server Load&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Polling&lt;/td&gt;
&lt;td&gt;Moderate (delayed)&lt;/td&gt;
&lt;td&gt;🔥 High&lt;/td&gt;
&lt;td&gt;🔥 Medium&lt;/td&gt;
&lt;td&gt;🟢 Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long Polling&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;🔥 Good&lt;/td&gt;
&lt;td&gt;🔥 Medium&lt;/td&gt;
&lt;td&gt;🟡 Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WebSockets&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;🔻 Low&lt;/td&gt;
&lt;td&gt;🔻 High&lt;/td&gt;
&lt;td&gt;🟡 Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🧠 Real-World Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Do You Always Need Full Real-Time?
&lt;/h3&gt;

&lt;p&gt;Not always.&lt;/p&gt;

&lt;p&gt;For example, in a stock chart app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You might only need fresh price &lt;em&gt;updates&lt;/em&gt;, not two-way communication.&lt;/li&gt;
&lt;li&gt;Buying/selling can still happen via regular POST API routes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WebSockets might be &lt;em&gt;overkill&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Polling or long polling might be perfectly fine.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Polling Works Well with Load Balancers
&lt;/h3&gt;

&lt;p&gt;When you scale with many backend servers and a load balancer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Polling requests get distributed across servers,&lt;/li&gt;
&lt;li&gt;You avoid being tied to one server connection,&lt;/li&gt;
&lt;li&gt;If a server goes down, your next poll goes to another healthy server.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🏁 My Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Real-time systems aren’t magic — they’re about choosing the right tool for the job:&lt;/p&gt;

&lt;p&gt;🔹 Need instant push updates? → &lt;strong&gt;WebSockets&lt;/strong&gt;&lt;br&gt;
🔹 Need lightweight, scalable updates? → &lt;strong&gt;Polling / Long Polling&lt;/strong&gt;&lt;br&gt;
🔹 Want a mix of both? → Start with polling, evolve as needed&lt;/p&gt;

&lt;p&gt;Every choice has trade-offs. Understanding the fundamental communication patterns helps you make the best architectural decision — and prevents unnecessary complexity early on.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>networking</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Consistent Hashing - System Design</title>
      <dc:creator>Kader Khan</dc:creator>
      <pubDate>Wed, 31 Dec 2025 22:10:07 +0000</pubDate>
      <link>https://dev.to/abirk/consistent-hashing-system-design-4167</link>
      <guid>https://dev.to/abirk/consistent-hashing-system-design-4167</guid>
      <description>&lt;h2&gt;
  
  
  📌 1) 💥 The Core Problem: Traditional Hashing Breaks in Distributed Systems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❓ The Scenario
&lt;/h3&gt;

&lt;p&gt;In a distributed system (lots of servers handling data), we must decide &lt;strong&gt;which server stores what data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A naive approach might be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;serverIndex = hash(key) % N
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;code&gt;N&lt;/code&gt; = number of servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  🚨 What Goes Wrong with This?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Reassignment on Scale Changes:&lt;/strong&gt;&lt;br&gt;
Suppose you start with 3 servers and store data using &lt;code&gt;hash(key) % 3&lt;/code&gt;. If you add a 4th server, &lt;code&gt;hash(key) % N&lt;/code&gt; changes for &lt;em&gt;almost all keys&lt;/em&gt;, not just the ones that belong on the new server, because &lt;code&gt;N&lt;/code&gt; changed. This forces &lt;strong&gt;huge data reshuffling&lt;/strong&gt; across every server — terrible at scale.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Server Failures Reassign All Keys:&lt;/strong&gt;&lt;br&gt;
If one server dies, now &lt;code&gt;N&lt;/code&gt; changes again, so most keys will get recomputed to new locations — even if the data &lt;em&gt;itself&lt;/em&gt; didn’t move — causing many cache or lookup failures.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;➡ That means &lt;strong&gt;every server change leads to data migrations proportional to the size of the dataset&lt;/strong&gt; — extremely expensive for millions of keys.&lt;/p&gt;
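&lt;p&gt;You can see the damage directly. The sketch below hashes 10,000 made-up keys with &lt;code&gt;hash(key) % N&lt;/code&gt; and counts how many land on a different server when N goes from 3 to 4 (md5 is used only to get a stable, uniformly distributed hash):&lt;/p&gt;

```python
import hashlib

def bucket(key, n):
    # Stable integer hash of the key, reduced modulo the server count.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n

keys = [f"key-{i}" for i in range(10_000)]
moved = sum(1 for k in keys if bucket(k, 3) != bucket(k, 4))
print(f"{moved / len(keys):.0%} of keys moved")  # typically about 75%
```

A key stays put only when its hash gives the same remainder mod 3 and mod 4, which happens for roughly a quarter of uniformly distributed hashes.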




&lt;h2&gt;
  
  
  📌 2) 🧠 The Core Idea of Consistent Hashing
&lt;/h2&gt;

&lt;p&gt;Consistent hashing solves exactly the above problems by reshaping the hashing strategy:&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ Both &lt;em&gt;servers&lt;/em&gt; and &lt;em&gt;keys&lt;/em&gt; are placed onto the same &lt;strong&gt;circular hash space&lt;/strong&gt; (“hash ring”).
&lt;/h3&gt;

&lt;p&gt;Each server and each data key gets a &lt;strong&gt;hash value&lt;/strong&gt; that represents a position on this circle.&lt;/p&gt;

&lt;p&gt;Imagine the hash output as degrees on a clock:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 ——————————————— 359
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It wraps around like a circle — meaning address &lt;code&gt;359&lt;/code&gt; is next to &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔ The rule for placing data:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;To decide where a piece of data belongs, hash the key and then move &lt;em&gt;clockwise around the circle&lt;/em&gt; until you find the first server.&lt;/strong&gt;&lt;br&gt;
That server becomes the owner of that piece of data.&lt;/p&gt;

&lt;p&gt;This &lt;em&gt;clockwise traversal&lt;/em&gt; is the fundamental idea — and here’s why it matters.&lt;/p&gt;


&lt;h2&gt;
  
  
  📌 3) 🌀 How Clockwise Traversal Works — Step by Step
&lt;/h2&gt;
&lt;h3&gt;
  
  
  📍 Step A — Place Servers on a Ring
&lt;/h3&gt;

&lt;p&gt;When the system starts, each server’s identity (e.g., IP address) is hashed to a position:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server A -&amp;gt; hash = 50  
Server B -&amp;gt; hash = 150  
Server C -&amp;gt; hash = 300  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the hash ring, that might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 — A(50) — B(150) — C(300) — (wraps to 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This division implicitly creates &lt;em&gt;ranges&lt;/em&gt; of the ring managed by each server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From after C back to A covers one region&lt;/li&gt;
&lt;li&gt;From after A to B covers another&lt;/li&gt;
&lt;li&gt;And so on&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  📍 Step B — Assign Data Keys
&lt;/h3&gt;

&lt;p&gt;Now if you receive a data key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Key1 hashed -&amp;gt; 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You traverse &lt;em&gt;clockwise&lt;/em&gt; from position &lt;code&gt;100&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;100 -&amp;gt; next server clockwise = B(150)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So &lt;strong&gt;Key1 is stored on server B&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Another example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Key2 hashed -&amp;gt; 320  
320 -&amp;gt; next server clockwise = A(50, after wraparound)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key2 is stored on A — because after you go past the highest server hash, you wrap to the lowest one.&lt;/p&gt;

&lt;p&gt;This &lt;strong&gt;clockwise rule&lt;/strong&gt; ensures:&lt;/p&gt;

&lt;p&gt;👉 Every key maps to exactly &lt;em&gt;one&lt;/em&gt; server&lt;br&gt;
👉 You never have gaps — because the ring loops indefinitely&lt;/p&gt;
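&lt;p&gt;The clockwise lookup maps neatly onto a sorted list plus binary search. A minimal sketch using the example positions above (the 360-position ring is just for illustration):&lt;/p&gt;

```python
import bisect

RING_SIZE = 360
servers = {50: "A", 150: "B", 300: "C"}
positions = sorted(servers)

def owner(key_hash):
    # First server position at or after the key, wrapping past the
    # top of the ring back to the lowest position.
    i = bisect.bisect_left(positions, key_hash % RING_SIZE)
    return servers[positions[i % len(positions)]]

print(owner(100))  # B: next clockwise from 100 is 150
print(owner(320))  # A: wraps past 359 back to 50
```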


&lt;h2&gt;
  
  
  📌 4) 🧩 What Happens When a Server Is Added?
&lt;/h2&gt;
&lt;h3&gt;
  
  
  📌 The Problem Before Consistent Hashing
&lt;/h3&gt;

&lt;p&gt;Adding a new server normally forces remapping of &lt;em&gt;all keys&lt;/em&gt;. That means huge data movement.&lt;/p&gt;
&lt;h3&gt;
  
  
  📌 What Consistent Hashing Does Instead
&lt;/h3&gt;

&lt;p&gt;Suppose we add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server D -&amp;gt; hash = 200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the ring looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 — A(50) — B(150) — D(200) — C(300)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before D existed, keys whose hashes fell between &lt;strong&gt;B(150) and D(200)&lt;/strong&gt; belonged to C, the next server clockwise.&lt;/p&gt;

&lt;p&gt;Now when you insert D, data whose hashes lie between &lt;code&gt;B(150)&lt;/code&gt; and &lt;code&gt;D(200)&lt;/code&gt; will be transferred to D — but &lt;strong&gt;all other keys stay exactly where they are&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is the critical benefit:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🧠 &lt;strong&gt;Only the keys in the range that D takes over change their assignment. Everything else stays the same.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that’s &lt;em&gt;exactly&lt;/em&gt; what “consistent” means — only a small, &lt;em&gt;predictable&lt;/em&gt; subset is redistributed.&lt;/p&gt;
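&lt;p&gt;A short self-contained check (again on an illustrative 360-position ring) confirms that adding D(200) reassigns only the hashes between B(150) and D(200):&lt;/p&gt;

```python
import bisect

def owner(ring, key_hash):
    # Clockwise rule: first server position at or after the key,
    # wrapping around to the lowest position past the top of the ring.
    positions = sorted(ring)
    i = bisect.bisect_left(positions, key_hash)
    return ring[positions[i % len(positions)]]

before = {50: "A", 150: "B", 300: "C"}
after = {50: "A", 150: "B", 200: "D", 300: "C"}

moved = [h for h in range(360) if owner(before, h) != owner(after, h)]
print(moved == list(range(151, 201)))  # True: only hashes 151..200 move, all to D
```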




&lt;h2&gt;
  
  
  📌 5) 🧠 What Happens When a Server Is Removed or Fails?
&lt;/h2&gt;

&lt;p&gt;Let’s say server B (at hash 150) fails.&lt;/p&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All keys that were assigned to B go to the &lt;em&gt;next server clockwise&lt;/em&gt; — which now is D (at 200).&lt;/li&gt;
&lt;li&gt;Keys originally mapped to A and C remain untouched.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means most keys stay where they were, &lt;strong&gt;only the ones belonging to the removed server migrate&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  📌 6) Why This Minimizes Disruption
&lt;/h2&gt;

&lt;p&gt;Traditional &lt;code&gt;% N&lt;/code&gt; hashing redistributes almost &lt;strong&gt;all keys&lt;/strong&gt; when &lt;code&gt;N&lt;/code&gt; changes.&lt;/p&gt;

&lt;p&gt;Consistent hashing redistributes only the keys that were mapped to:&lt;/p&gt;

&lt;p&gt;✔ the area between the new server’s predecessor and itself (on addition)&lt;/p&gt;

&lt;p&gt;✔ the removed server’s range (on removal)&lt;/p&gt;

&lt;p&gt;That’s only ~&lt;strong&gt;1/N&lt;/strong&gt; of the total keys — meaning only a &lt;strong&gt;small portion&lt;/strong&gt; moves.&lt;/p&gt;

&lt;p&gt;This is why consistent hashing scales beautifully.&lt;/p&gt;




&lt;h2&gt;
  
  
  📌 7) 🧠 Load Balancing &amp;amp; Virtual Nodes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ⚠ Uneven Load Problem
&lt;/h3&gt;

&lt;p&gt;Without extra care, a server could accidentally be placed such that it covers a large arc of the ring — leading to &lt;em&gt;uneven load&lt;/em&gt;: one server gets many keys, others get few.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 Solution: Virtual Nodes
&lt;/h3&gt;

&lt;p&gt;Instead of mapping each server &lt;em&gt;once&lt;/em&gt; on the ring, each server gets &lt;em&gt;many virtual points&lt;/em&gt; (replicas) scattered around the circle.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server A -&amp;gt; spots at 10, 110, 210  
Server B -&amp;gt; spots at 40, 140, 240  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This spreads the data load more evenly, because each server participates in many regions of the hash space — smoothing out uneven gaps.&lt;/p&gt;
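&lt;p&gt;A minimal sketch of virtual nodes (the replica count and ring size are arbitrary illustration values): each physical server is hashed onto the ring several times, once per replica label:&lt;/p&gt;

```python
import hashlib

def build_ring(server_names, replicas=4, ring_size=1000):
    ring = {}
    for name in server_names:
        for r in range(replicas):
            # Hash "A#0", "A#1", ... so one server lands in many places.
            pos = int(hashlib.md5(f"{name}#{r}".encode()).hexdigest(), 16) % ring_size
            ring[pos] = name
    return dict(sorted(ring.items()))

ring = build_ring(["A", "B", "C"])
print(len(ring), sorted(set(ring.values())))
```

With more replicas per server, the arcs each server owns get smaller and more numerous, so the per-server key counts even out.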




&lt;h2&gt;
  
  
  📌 8) 🔎 Practical Uses &amp;amp; Why It Matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Consistent hashing is widely used in real production systems&lt;/strong&gt; to enable:&lt;/p&gt;

&lt;p&gt;✅ Distributed caching (e.g., Memcached, Redis) — so cache nodes can scale without evictions everywhere.&lt;br&gt;
✅ Distributed databases (e.g., Cassandra, Dynamo) — to shard data efficiently.&lt;br&gt;
✅ Content Delivery Networks (CDNs) — to cache content close to clients with minimal reshuffle.&lt;br&gt;
✅ Load Balancing in microservices — to route requests consistently by user/session.&lt;/p&gt;




&lt;h2&gt;
  
  
  📌 9) Summary: Why It Matters in Real Systems
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Traditional Hashing&lt;/th&gt;
&lt;th&gt;Consistent Hashing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Key mapping&lt;/td&gt;
&lt;td&gt;Simple&lt;/td&gt;
&lt;td&gt;Circular traversal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node addition&lt;/td&gt;
&lt;td&gt;Redistributes &lt;em&gt;almost all keys&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Only ~1/N keys move&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node removal&lt;/td&gt;
&lt;td&gt;Redistributes &lt;em&gt;almost all keys&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Only keys from removed node move&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load balance&lt;/td&gt;
&lt;td&gt;Can be uneven&lt;/td&gt;
&lt;td&gt;Virtual nodes smooth it&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Consistent hashing turns what would be a chaotic, system-wide reshuffle into a &lt;em&gt;local, predictable relocation&lt;/em&gt; — ideal for high-scale, dynamic infrastructure.&lt;/p&gt;




</description>
      <category>systemdesign</category>
      <category>algorithms</category>
      <category>architecture</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>Event Sourcing - System Design Pattern</title>
      <dc:creator>Kader Khan</dc:creator>
      <pubDate>Tue, 30 Dec 2025 13:02:03 +0000</pubDate>
      <link>https://dev.to/abirk/event-sourcing-system-design-pattern-10k7</link>
      <guid>https://dev.to/abirk/event-sourcing-system-design-pattern-10k7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;“Imagine every action in your system writes to a timeline. This timeline can be read later to rebuild any version of the system — like time travel.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  ✅ &lt;strong&gt;The Problem with Traditional CRUD Systems&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In traditional systems (like most apps we’ve built):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We update the database to change state (e.g., set &lt;code&gt;status = "processed"&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;We &lt;em&gt;overwrite&lt;/em&gt; old values&lt;/li&gt;
&lt;li&gt;We lose history — we only store the &lt;em&gt;latest&lt;/em&gt; state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📌 This leads to real problems such as:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No audit trail&lt;/strong&gt;&lt;br&gt;
We often can’t answer questions like: &lt;em&gt;“What exactly happened to this order between 10:01 and 10:03?”&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Inconsistencies due to partial failures&lt;/strong&gt;&lt;br&gt;
If part of a workflow fails (e.g., processing succeeds, but updating state fails), the system goes into an &lt;em&gt;inconsistent state&lt;/em&gt; with no clear way to fix it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hard to debug or replay history&lt;/strong&gt;&lt;br&gt;
We cannot rewind to a &lt;em&gt;point in time&lt;/em&gt; and reconstruct what state should have been.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;👉 As systems scale with heavy workloads, these problems get worse. We need a better way to track changes than just “update this value now.”&lt;/p&gt;




&lt;h2&gt;
  
  
  ✅ &lt;strong&gt;Event Sourcing — The Core Idea (Solved Problem)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Event Sourcing says:&lt;/strong&gt;&lt;br&gt;
👉 Instead of saving &lt;em&gt;only the current state&lt;/em&gt; in the database, save &lt;em&gt;every change as an event&lt;/em&gt; in order.&lt;/p&gt;

&lt;p&gt;These events are:&lt;/p&gt;

&lt;p&gt;✔ Immutable (never changed after they’re written)&lt;br&gt;
✔ Ordered (every event has a timestamp or sequence)&lt;br&gt;
✔ Replayed to reconstruct the current state&lt;/p&gt;

&lt;p&gt;So instead of doing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We store events like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PriceChanged from 90 ➝ 100 at 10:01AM&lt;/li&gt;
&lt;li&gt;PriceChanged from 100 ➝ 110 at 10:10AM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To &lt;em&gt;compute&lt;/em&gt; the current state, we simply &lt;strong&gt;replay&lt;/strong&gt; those events.&lt;/p&gt;
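&lt;p&gt;A tiny illustration of replaying those price events (the event shape is a hypothetical choice for this sketch):&lt;/p&gt;

```python
# Hypothetical event shape for the price-change log above.
events = [
    {"type": "PriceChanged", "old": 90, "new": 100, "at": "10:01"},
    {"type": "PriceChanged", "old": 100, "new": 110, "at": "10:10"},
]

def replay(event_log, initial=None):
    # Fold over the log in order; the latest change wins.
    price = initial
    for event in event_log:
        price = event["new"]
    return price

print(replay(events))  # 110: the current price, derived from history
```

Replaying a prefix of the log (e.g. only the first event) gives the state at that earlier point in time, which is what makes "time travel" possible.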


&lt;h2&gt;
  
  
  💡 &lt;strong&gt;What Event Sourcing Solves (In Simple Terms)&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traditional CRUD&lt;/th&gt;
&lt;th&gt;Event Sourcing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Only current state&lt;/td&gt;
&lt;td&gt;Full history of all changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard to track why something happened&lt;/td&gt;
&lt;td&gt;We can replay to see &lt;em&gt;why&lt;/em&gt; something happened&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Race conditions can corrupt data&lt;/td&gt;
&lt;td&gt;We always record events in a safe, append-only log&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard to debug&lt;/td&gt;
&lt;td&gt;We get a complete audit trail&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So the &lt;strong&gt;problem&lt;/strong&gt; being solved is not just scaling — it’s:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“How do we store &lt;em&gt;every&lt;/em&gt; change in a way we can trace, debug, and rebuild the system state reliably?”&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  📦 &lt;strong&gt;Event Sourcing Architecture (AWS)&lt;/strong&gt;
&lt;/h2&gt;


&lt;h2&gt;
  
  
  🧱 &lt;strong&gt;AWS Architecture Example — Ride Booking (From AWS Guidance)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;AWS provides a real architecture pattern for event sourcing:&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;1. User Action — Client Calls API Gateway&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A user does something, e.g., &lt;em&gt;Book a Ride&lt;/em&gt;.&lt;br&gt;
This request first hits &lt;strong&gt;Amazon API Gateway&lt;/strong&gt;, which exposes a public API endpoint.&lt;/p&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;2. Lambda Writes an Event to Amazon Kinesis (AWS’s Kafka Counterpart)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A Lambda function, invoked by API Gateway, acts as a &lt;em&gt;command handler&lt;/em&gt;:&lt;/p&gt;

&lt;p&gt;✔ It checks business logic&lt;br&gt;
✔ It creates an event like &lt;code&gt;RideBooked&lt;/code&gt;&lt;br&gt;
✔ It sends this event to &lt;strong&gt;Amazon Kinesis Data Streams&lt;/strong&gt; — an append-only event storage and streaming service&lt;/p&gt;

&lt;p&gt;📌 &lt;strong&gt;Why Kinesis?&lt;/strong&gt;&lt;br&gt;
Because it can handle very high write throughput and acts as an &lt;strong&gt;event log&lt;/strong&gt; we can replay.&lt;/p&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;3. Events Are Stored &amp;amp; Archived&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Kinesis doesn’t just stream — we can also:&lt;/p&gt;

&lt;p&gt;✔ Archive events in &lt;strong&gt;Amazon S3&lt;/strong&gt; for long-term retention (for compliance &amp;amp; audits)&lt;br&gt;
✔ Retain events for replay or future analysis&lt;/p&gt;

&lt;p&gt;This means our system generates a complete history of every change, backed up indefinitely.&lt;/p&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;4. Event Processor Lambda Builds Materialized Views&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Another Lambda function consumes events from Kinesis to build &lt;strong&gt;read models&lt;/strong&gt; (optimized tables that are easy to query). Typical read stores are:&lt;/p&gt;

&lt;p&gt;✔ Amazon Aurora (MySQL/PostgreSQL)&lt;br&gt;
✔ Amazon DynamoDB&lt;/p&gt;

&lt;p&gt;This process creates &lt;em&gt;current state views&lt;/em&gt; for read-heavy use cases.&lt;/p&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;5. Replay to Rebuild State (Hydration Model)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If something goes wrong, or we want to compute state at any point in time, we simply &lt;strong&gt;replay the events&lt;/strong&gt; stored in Kinesis and archived in S3.&lt;/p&gt;

&lt;p&gt;This is called &lt;strong&gt;Hydration&lt;/strong&gt; — re-deriving the current or historical state of the system from the event log.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧠 &lt;strong&gt;Hydration Model Explained (Simple)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Think of hydration as:&lt;/p&gt;

&lt;p&gt;🎬 &lt;strong&gt;Re-running the entire timeline of events&lt;/strong&gt;&lt;br&gt;
so that our system always ends up in the correct state.&lt;/p&gt;

&lt;p&gt;For example, in a &lt;strong&gt;video streaming platform&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event 1: VideoUploaded&lt;/li&gt;
&lt;li&gt;Event 2: VideoProcessingStarted&lt;/li&gt;
&lt;li&gt;Event 3: VideoProcessingSucceeded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To know current state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;state = "initial"
apply VideoUploaded → state="uploaded"
apply VideoProcessingStarted → state="processing"
apply VideoProcessingSucceeded → state="success"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s &lt;strong&gt;Hydration&lt;/strong&gt; — it rebuilds state by replaying events in order, not by reading a single “status” value.&lt;/p&gt;
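&lt;p&gt;The replay above can be written as a left fold over the event log (the event names match the example; the transition table itself is an assumption for illustration):&lt;/p&gt;

```python
# Transition table for the video events above (an illustrative assumption).
TRANSITIONS = {
    "VideoUploaded": "uploaded",
    "VideoProcessingStarted": "processing",
    "VideoProcessingSucceeded": "success",
}

def hydrate(event_log, initial="initial"):
    state = initial
    for event in event_log:
        # Unknown events leave the state unchanged.
        state = TRANSITIONS.get(event, state)
    return state

print(hydrate(["VideoUploaded", "VideoProcessingStarted",
               "VideoProcessingSucceeded"]))  # success
```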




&lt;h2&gt;
  
  
  🐘 &lt;strong&gt;Why Kafka or Kinesis Are Used&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Both Kafka (used in the transcript example) and Kinesis (the AWS alternative) are &lt;strong&gt;event streaming platforms&lt;/strong&gt; — essentially massive, durable, ordered logs of events. In particular, their &lt;strong&gt;consumer group and topic partition&lt;/strong&gt; concepts ensure that processors receive events in sequence and apply them sequentially as well.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why this matters&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;✔ We can &lt;em&gt;replay events&lt;/em&gt; — essential for event sourcing&lt;br&gt;
✔ We can scale horizontally (many consumers)&lt;br&gt;
✔ We guarantee event order within partitions — crucial for replay and consistent state reconstruction&lt;/p&gt;


&lt;h2&gt;
  
  
  📌 &lt;strong&gt;Consumer Groups &amp;amp; Topic Partitions (Why They Matter)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When the event volume is large, we cannot have &lt;em&gt;one&lt;/em&gt; server read everything.&lt;/p&gt;

&lt;p&gt;So we use:&lt;/p&gt;
&lt;h3&gt;
  
  
  🔹 Kafka Consumer Group
&lt;/h3&gt;

&lt;p&gt;Multiple workers that form a group and share work.&lt;br&gt;
Each worker gets assigned &lt;em&gt;partitions&lt;/em&gt; so no duplicates occur.&lt;/p&gt;
&lt;h3&gt;
  
  
  🔹 Topic Partitions
&lt;/h3&gt;

&lt;p&gt;A topic (event category) is split into partitions — think of partitions as &lt;em&gt;divided lanes of the event log&lt;/em&gt;. This allows:&lt;/p&gt;

&lt;p&gt;✔ Parallel processing&lt;br&gt;
✔ Ordered event consumption &lt;em&gt;per partition&lt;/em&gt;&lt;br&gt;
✔ Scale without losing order for each entity&lt;/p&gt;

&lt;p&gt;For example, in the video streaming pipeline, &lt;strong&gt;video A’s events always land in partition 0&lt;/strong&gt; and &lt;strong&gt;video B’s in partition 1&lt;/strong&gt;, so events for each video are always processed in order, even across many workers.&lt;/p&gt;
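&lt;p&gt;A sketch of key-based partitioning (CRC32 stands in for Kafka’s key hash here, and the partition count and event names are illustrative assumptions):&lt;/p&gt;

```python
import zlib

NUM_PARTITIONS = 4  # assumed partition count for this sketch

def partition_for(entity_id):
    # A stable hash of the record key picks the partition; Kafka's default
    # partitioner applies the same idea (murmur2 on the key bytes).
    return zlib.crc32(entity_id.encode()) % NUM_PARTITIONS

partitions = {}
for event in ["VideoUploaded", "ProcessingStarted", "ProcessingSucceeded"]:
    p = partition_for("video-A")  # same key, same partition, every time
    partitions.setdefault(p, []).append(event)

# All of video A's events land in one partition, preserving their order.
print(partitions)
```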


&lt;h3&gt;
  
  
  &lt;strong&gt;Problem Being Solved&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Traditional system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Database:
video_id | status
------------------
123      | "processing"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Problems:&lt;br&gt;
✔ What if the update failed?&lt;br&gt;
✔ What do you show to the user?&lt;br&gt;
✔ What if you need to know the exact steps the video went through?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event Sourcing Pattern&lt;/strong&gt; solves it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event Log:
1. VideoUploaded(videoID=123)
2. VideoProcessingStarted(videoID=123)
3. VideoProcessingProgress(videoID=123, percent=50)
4. VideoProcessingFailed(videoID=123, error="timeout")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To get state:&lt;/p&gt;

&lt;p&gt;Hydration Model reads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apply VideoUploaded → status="uploaded"
apply VideoProcessingStarted → status="processing"
apply VideoProcessingProgress → status="processing:50%"
apply VideoProcessingFailed → status="failed"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;We can even show why the failure happened&lt;/strong&gt; — something impossible with simple CRUD.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧩 &lt;strong&gt;AWS Services we can use&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;AWS Service&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API entrypoint&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;API Gateway&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command processor&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Lambda&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event storage&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Kinesis Data Streams&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Archive &amp;amp; audit log&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event distribution&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EventBridge / DynamoDB Streams&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read-optimized views&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Aurora / DynamoDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async processing&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Lambda consumers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>cloudnative</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>CQRS Pattern and Event Sourcing System Design</title>
      <dc:creator>Kader Khan</dc:creator>
      <pubDate>Mon, 29 Dec 2025 16:13:35 +0000</pubDate>
      <link>https://dev.to/abirk/cqrs-pattern-and-event-sourcing-system-design-leb</link>
      <guid>https://dev.to/abirk/cqrs-pattern-and-event-sourcing-system-design-leb</guid>
      <description>&lt;p&gt;&lt;strong&gt;Core Concepts and Overview&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CQRS (Command Query Responsibility Segregation) separates the operations that modify data (commands) from those that read data (queries).&lt;/li&gt;
&lt;li&gt;Traditional applications handle CRUD (Create, Read, Update, Delete) operations in a single database and layer, potentially causing bottlenecks during heavy read/write loads.&lt;/li&gt;
&lt;li&gt;CQRS addresses this by splitting the system into two parts:&lt;/li&gt;
&lt;li&gt;Command side: handles all data mutation (create, update, delete).&lt;/li&gt;
&lt;li&gt;Query side: handles all read operations.&lt;/li&gt;
&lt;li&gt;This separation helps optimize system performance, scalability, and maintainability, especially in high-complexity systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Traditional Application Architecture and Its Limitations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users interact with a server layer exposing REST API endpoints (GET, POST, PATCH, DELETE).&lt;/li&gt;
&lt;li&gt;The server processes requests via controllers and service layers, directly performing CRUD operations on a single database.&lt;/li&gt;
&lt;li&gt;Scaling traditional apps involves vertical scaling (adding CPU/RAM) or horizontal scaling (adding more server instances).&lt;/li&gt;
&lt;li&gt;Bottleneck: When reads and writes compete on the same database, locks during updates cause delays and slow queries, especially under high load (example: Amazon product price updates vs reads).&lt;/li&gt;
&lt;li&gt;This leads to database contention and performance degradation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;CQRS Pattern Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Presentation Layer&lt;/td&gt;
&lt;td&gt;User Interface and REST API endpoints that act as the entry point for all requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Gateway&lt;/td&gt;
&lt;td&gt;Routes read (query) requests to the query side and mutation (command) requests to the command side&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command Side&lt;/td&gt;
&lt;td&gt;Handles commands (create, update, delete) and writes to a dedicated write database&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query Side&lt;/td&gt;
&lt;td&gt;Handles queries (read operations) from a separate read database&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event System&lt;/td&gt;
&lt;td&gt;Synchronizes changes from the write database to the read database using events and queues&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Separate Databases for reads and writes: read database is optimized for queries (often denormalized), write database is normalized and optimized for transactions.&lt;/li&gt;
&lt;li&gt;The write model processes commands validating and authorizing them before updating the write database.&lt;/li&gt;
&lt;li&gt;The read model processes queries against the read database, which is updated asynchronously via events emitted after writes.&lt;/li&gt;
&lt;li&gt;This results in eventual consistency between read and write databases, acceptable in many business scenarios but unsuitable for highly real-time systems (e.g., stock markets).&lt;/li&gt;
&lt;/ul&gt;
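&lt;p&gt;A toy sketch of the command/query split with asynchronous projection (all names are hypothetical, and an in-memory list stands in for the real event queue/broker):&lt;/p&gt;

```python
write_db = {}     # normalized source of truth (command side)
read_db = {}      # denormalized view (query side)
event_queue = []  # stands in for Kafka/SQS between the two sides

def handle_command(product_id, price):
    # Command side: validate and authorize, write, then emit an event.
    write_db[product_id] = price
    event_queue.append(("PriceChanged", product_id, price))

def project_events():
    # In production this runs asynchronously (queue consumer / Lambda).
    while event_queue:
        _, product_id, price = event_queue.pop(0)
        read_db[product_id] = {"id": product_id, "price": price}

def handle_query(product_id):
    return read_db.get(product_id)  # may briefly lag behind writes

handle_command("p1", 100)
print(handle_query("p1"))  # None: the projection hasn't run yet
project_events()
print(handle_query("p1"))  # {'id': 'p1', 'price': 100}
```

The window between the two queries is exactly the eventual-consistency lag described above.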

&lt;p&gt;&lt;strong&gt;Event Sourcing Integration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CQRS can be combined with Event Sourcing, where every change is stored as an append-only event log rather than directly updating the state.&lt;/li&gt;
&lt;li&gt;The system stores immutable logs of all commands/events, which can be replayed to rebuild the current state of the database (hydration).&lt;/li&gt;
&lt;li&gt;This provides fault tolerance; if the read database becomes corrupt or stale, it can be regenerated from the event log.&lt;/li&gt;
&lt;li&gt;Event logs can also trigger side effects such as sending promotional or notification emails.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Practical AWS-Based System Design Example&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;AWS Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API Gateway&lt;/td&gt;
&lt;td&gt;Routes requests based on HTTP method to command or query services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elastic Load Balancer (ELB)&lt;/td&gt;
&lt;td&gt;Distributes requests among multiple horizontally scaled EC2 instances for command/query services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EC2 Instances (Command Handlers)&lt;/td&gt;
&lt;td&gt;Execute commands, perform validation and authorization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka (or AWS Kinesis)&lt;/td&gt;
&lt;td&gt;Event/message broker for append-only event logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQS Queues&lt;/td&gt;
&lt;td&gt;Handle asynchronous event processing and fan-out to services like email notifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda Functions&lt;/td&gt;
&lt;td&gt;Process events to update read database and trigger other actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB (Read DB)&lt;/td&gt;
&lt;td&gt;Stores denormalized data optimized for fast queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse or similar&lt;/td&gt;
&lt;td&gt;Example write database storing append-only logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudFront CDN&lt;/td&gt;
&lt;td&gt;Caches GET requests for faster read performance with cache invalidation upon updates&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;The architecture enables horizontal scalability, fault tolerance, and efficient separation of concerns.&lt;/li&gt;
&lt;li&gt;Read and write paths can be independently optimized with different database technologies (SQL for writes, NoSQL for reads).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits and Trade-offs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Improved scalability by separating reads and writes.&lt;/li&gt;
&lt;li&gt;Reduced contention and locking issues on databases.&lt;/li&gt;
&lt;li&gt;Flexibility to use different databases optimized for different workloads.&lt;/li&gt;
&lt;li&gt;Fault tolerance and recoverability via event sourcing.&lt;/li&gt;
&lt;li&gt;Ability to implement complex business logic and authorization in command handlers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Eventual consistency model means read data may lag slightly behind writes.&lt;/li&gt;
&lt;li&gt;Added architectural complexity unsuitable for small or simple applications.&lt;/li&gt;
&lt;li&gt;Complexity in keeping read and write databases synchronized.&lt;/li&gt;
&lt;li&gt;Not ideal for systems requiring strong real-time consistency guarantees.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to Use CQRS&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suitable for complex, large-scale, distributed systems.&lt;/li&gt;
&lt;li&gt;When read and write workloads have different performance, scaling, or consistency requirements.&lt;/li&gt;
&lt;li&gt;When multiple microservices and databases are involved, requiring data segregation.&lt;/li&gt;
&lt;li&gt;When eventual consistency is acceptable for the business domain.&lt;/li&gt;
&lt;li&gt;Not recommended for small/simple applications or those needing immediate strong consistency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Insights&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CQRS is a powerful pattern for scaling complex applications by segregating commands and queries.&lt;/li&gt;
&lt;li&gt;The use of different databases for read and write sides is central to the pattern.&lt;/li&gt;
&lt;li&gt;Event sourcing complements CQRS by maintaining a reliable audit log and enabling system state reconstruction.&lt;/li&gt;
&lt;li&gt;AWS ecosystem components like API Gateway, ELB, EC2, Lambda, DynamoDB, Kafka/Kinesis, and SQS can effectively implement CQRS with event sourcing.&lt;/li&gt;
&lt;li&gt;Eventual consistency is a core characteristic and must be carefully evaluated against application needs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffzxlesqmmppcla631th5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffzxlesqmmppcla631th5.png" alt=" " width="800" height="659"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Meet Pulsimo - Monitor Your Systems with Precision &amp; Power</title>
      <dc:creator>Kader Khan</dc:creator>
      <pubDate>Sun, 16 Nov 2025 13:58:07 +0000</pubDate>
      <link>https://dev.to/abirk/meet-pulsimo-monitor-your-systems-with-precision-power-32ji</link>
      <guid>https://dev.to/abirk/meet-pulsimo-monitor-your-systems-with-precision-power-32ji</guid>
      <description>&lt;h3&gt;
  
  
  Have you ever wondered—
&lt;/h3&gt;

&lt;p&gt;If your production backend or database service crashes, &lt;strong&gt;how fast&lt;/strong&gt; do you actually get notified, and &lt;strong&gt;how quickly&lt;/strong&gt; can you jump into troubleshooting?&lt;/p&gt;

&lt;h3&gt;
  
  
  My Personal Research:
&lt;/h3&gt;

&lt;h3&gt;
  
  
  🏭 Typical Industrial Use Case
&lt;/h3&gt;

&lt;p&gt;If a Prometheus + Alertmanager setup is properly tuned, you usually get notified within &lt;strong&gt;1–1.5 minutes&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⏱️ As-Fast-As-Possible Estimated Timeline Theory
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Scrape Interval&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let’s assume Prometheus scrapes metrics every &lt;strong&gt;15–30 seconds&lt;/strong&gt;, which is common in well-optimized setups.&lt;br&gt;
If we take &lt;strong&gt;15 seconds&lt;/strong&gt; as the fastest scenario, the earliest delay starts here.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Rule Evaluation Interval&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After scraping, alerting rules are evaluated every &lt;strong&gt;15 seconds&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Rules Manifest (for: 1m or reduced)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Assume you've configured the rule such that if the service is down for &lt;strong&gt;10 seconds&lt;/strong&gt;, Prometheus should fire an alert.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Alertmanager buffering (minimal assumptions)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Ignoring &lt;code&gt;group_wait&lt;/code&gt;, &lt;code&gt;group_interval&lt;/code&gt;, and &lt;code&gt;repeat_interval&lt;/code&gt; to keep it raw—&lt;br&gt;
Let’s assume Alertmanager needs ~&lt;strong&gt;10 seconds&lt;/strong&gt; to process and send the first notification.&lt;/p&gt;




&lt;h2&gt;
  
  
  📌 Combined Timeline
&lt;/h2&gt;

&lt;p&gt;Putting it all together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scrape delay → &lt;strong&gt;~15s&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Rule evaluation delay → &lt;strong&gt;~15s&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Down detection threshold → &lt;strong&gt;~10s&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Alertmanager handling → &lt;strong&gt;~10s&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Network jitter → (Optional small fluctuation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;strong&gt;Total: ~50 seconds – ~1 minute&lt;/strong&gt;&lt;br&gt;
In real-world noisy networks → &lt;strong&gt;up to 1.5 minutes&lt;/strong&gt;&lt;br&gt;
This means you start taking action &lt;strong&gt;1–1.5 minutes &lt;em&gt;after&lt;/em&gt; the actual outage&lt;/strong&gt;.&lt;/p&gt;
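&lt;p&gt;The estimate is simply the sum of the assumed stage delays:&lt;/p&gt;

```python
# Summing the assumed stage delays from the timeline above (seconds).
delays = {"scrape": 15, "rule_evaluation": 15,
          "down_threshold": 10, "alertmanager": 10}
total = sum(delays.values())
print(f"~{total}s before the first notification fires")  # ~50s
```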

&lt;p&gt;During this time, your data loss may be small or large—depending on how critical the endpoint is.&lt;br&gt;
But for mission-critical endpoints, &lt;strong&gt;data loss &lt;em&gt;will&lt;/em&gt; happen&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 But what if you could know &lt;em&gt;within just 10 seconds&lt;/em&gt;?
&lt;/h2&gt;

&lt;p&gt;Imagine receiving outage alerts &lt;strong&gt;~50 seconds earlier&lt;/strong&gt; than Prometheus.&lt;/p&gt;

&lt;p&gt;Not just faster alerts—you could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Closely monitor application behavior in real-time&lt;/li&gt;
&lt;li&gt;Understand performance patterns&lt;/li&gt;
&lt;li&gt;Visualize dependency graphs&lt;/li&gt;
&lt;li&gt;Analyze blast radius&lt;/li&gt;
&lt;li&gt;Improve MTTR, SLA, SPOF detection&lt;/li&gt;
&lt;li&gt;Perform critical path analysis&lt;/li&gt;
&lt;li&gt;And much more...&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Introducing &lt;strong&gt;Pulsimo&lt;/strong&gt; 🎉
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;on-premise focused endpoint monitoring platform&lt;/strong&gt; designed to give ultra-fast detection and deep observability.&lt;/p&gt;

&lt;p&gt;Currently in &lt;strong&gt;public beta&lt;/strong&gt;.&lt;br&gt;
Any kind of feedback is truly appreciated.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://pulsimo.github.io" rel="noopener noreferrer"&gt;https://pulsimo.github.io&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If anyone is interested in contributing — feel free to reach out!&lt;/p&gt;




</description>
      <category>devops</category>
      <category>pulsimo</category>
      <category>monitoring</category>
      <category>prometheus</category>
    </item>
    <item>
      <title>Calico Node Readiness Probe Failed Issues</title>
      <dc:creator>Kader Khan</dc:creator>
      <pubDate>Wed, 22 Oct 2025 18:11:04 +0000</pubDate>
      <link>https://dev.to/abirk/calico-node-readiness-probe-failed-issues-42i8</link>
      <guid>https://dev.to/abirk/calico-node-readiness-probe-failed-issues-42i8</guid>
      <description>&lt;h1&gt;
  
  
  🛠️ Resolving Calico Node Readiness Issues: A Practical Guide
&lt;/h1&gt;

&lt;h2&gt;
  
  
  🧩 Problem Overview
&lt;/h2&gt;

&lt;p&gt;In Kubernetes clusters utilizing Calico as the networking solution, nodes may occasionally report a "not ready" status because BIRD (the BGP daemon Calico uses) fails to initialize properly. This issue often stems from Calico's IP autodetection mechanism selecting an unintended network interface, leading to misconfigured BGP sessions that impact node-to-node communication.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍 Symptoms
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pods on the affected node cannot communicate with pods on other nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The node's status is "not ready" in the Kubernetes cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;BIRD logs indicate errors like:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  bird: Unable to open configuration file /etc/calico/confd/config/bird.cfg: No such file or directory
  bird: Unable to open configuration file /etc/calico/confd/config/bird6.cfg: No such file or directory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These errors suggest that BIRD cannot find its configuration files, often due to incorrect IP autodetection.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧭 Root Cause
&lt;/h2&gt;

&lt;p&gt;Calico's default IP autodetection method (&lt;code&gt;first-found&lt;/code&gt;) may select an unintended interface, especially in nodes with multiple network interfaces. This misconfiguration can lead to BIRD being unable to establish proper BGP sessions, resulting in the node being marked as "not ready".&lt;/p&gt;

&lt;h2&gt;
  
  
  ✅ Solution Approach
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Identify the Correct Network Interface&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Determine the appropriate network interface for Calico's BGP peering. Typically, this would be the primary network interface used for inter-node communication.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Set IP Autodetection Method Temporarily&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To test the new configuration, set the &lt;code&gt;IP_AUTODETECTION_METHOD&lt;/code&gt; environment variable on the Calico node DaemonSet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;set env &lt;/span&gt;daemonset/calico-node &lt;span class="nt"&gt;-n&lt;/span&gt; calico-system &lt;span class="nv"&gt;IP_AUTODETECTION_METHOD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eth0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;eth0&lt;/code&gt; with the interface name identified in the previous step. If interface names vary across nodes, Calico also supports other autodetection methods, such as &lt;code&gt;can-reach=&amp;lt;destination-IP&amp;gt;&lt;/code&gt;, which selects the interface that can route to the given address.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Verify the Configuration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Check the status of the Calico node pods to ensure they are running correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; calico-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, inspect the logs of the Calico node pods to confirm that BIRD has initialized without errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; calico-system calico-node-&amp;lt;pod-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. &lt;strong&gt;Set IP Autodetection Method Permanently&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To make the change permanent, update the Calico Installation resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;operator.tigera.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Installation&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tigera-operator&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;calicoNetwork&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nodeAddressAutodetectionV4&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;interface&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eth0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply the updated configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; &amp;lt;your-installation-file&amp;gt;.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that the specified interface is used for IP autodetection across all nodes in the cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Restart Calico Node Pods&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After applying the changes, restart the Calico node pods to apply the new configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl rollout restart daemonset/calico-node &lt;span class="nt"&gt;-n&lt;/span&gt; calico-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command restarts the Calico node DaemonSet, ensuring that all pods pick up the new configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧪 Verification
&lt;/h2&gt;

&lt;p&gt;After completing the steps above, verify that the node has transitioned to a "ready" state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ensure that the node in question is listed as "Ready".&lt;/p&gt;

&lt;p&gt;Also, confirm that BIRD is running without errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; calico-system calico-node-&amp;lt;pod-id&amp;gt; &lt;span class="nt"&gt;--&lt;/span&gt; birdcl show status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output should indicate that BIRD is initialized and ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  💡 Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consistent Configuration&lt;/strong&gt;: Ensure that the IP autodetection method is consistently configured across all nodes to avoid network inconsistencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Regular Monitoring&lt;/strong&gt;: Regularly monitor the status of Calico node pods and BIRD to detect and resolve issues promptly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Documentation&lt;/strong&gt;: Document the network interfaces and configurations used for IP autodetection to facilitate troubleshooting and future configurations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;By following this approach, you can resolve Calico node readiness issues related to IP autodetection and ensure stable networking within your Kubernetes cluster.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>networking</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>Local Docker Registry Setup Guide</title>
      <dc:creator>Kader Khan</dc:creator>
      <pubDate>Fri, 14 Mar 2025 16:16:14 +0000</pubDate>
      <link>https://dev.to/abirk/local-docker-registry-setup-guide-1cno</link>
      <guid>https://dev.to/abirk/local-docker-registry-setup-guide-1cno</guid>
      <description>&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Make sure your machine has a public IP address associated with it.&lt;/li&gt;
&lt;li&gt;Ensure you have &lt;code&gt;sudo&lt;/code&gt; privileges on your system.&lt;/li&gt;
&lt;li&gt;Update your system's package list and upgrade existing packages.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Install Docker
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Update Your System:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install Docker:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; docker.io
   &lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--now&lt;/span&gt; docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add User to Docker Group:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;usermod &lt;span class="nt"&gt;-aG&lt;/span&gt; docker &lt;span class="nv"&gt;$USER&lt;/span&gt;
   newgrp docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Verify Docker Installation:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   docker &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Run a Local Docker Registry
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run the Registry:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 5000:5000 &lt;span class="nt"&gt;--name&lt;/span&gt; registry &lt;span class="nt"&gt;--restart&lt;/span&gt; always registry:2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Verify the Registry is Running:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   curl http://localhost:5000/v2/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Check Available Registry Images:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   curl http://localhost:5000/v2/_catalog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
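&lt;p&gt;For reference, both endpoints return JSON. Against a freshly started, empty registry the responses look like the following (shapes defined by the Docker Registry HTTP API v2; shown here as a sketch rather than live output):&lt;/p&gt;

```shell
# Expected bodies from an empty registry (sketch, not live output):
v2_response='{}'                            # GET /v2/        API version check
catalog_response='{"repositories":[]}'      # GET /v2/_catalog  no images yet
echo "$catalog_response"
# After pushing an image named "myapp", the catalog would instead return:
#   {"repositories":["myapp"]}
```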



&lt;h2&gt;
  
  
  Step 3: Secure the Registry with Authentication
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create Authentication Credentials:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /etc/docker/registry
   &lt;span class="nb"&gt;sudo chmod &lt;/span&gt;777 /etc/docker/registry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install Apache Utilities (htpasswd):&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; apache2-utils
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generate Credentials:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   htpasswd &lt;span class="nt"&gt;-Bbn&lt;/span&gt; &amp;lt;username&amp;gt; &amp;lt;password&amp;gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /etc/docker/registry/htpasswd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Login to the Private Registry:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   docker login localhost:5000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Secure the Registry with SSL/TLS
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install Certbot for SSL Certificates:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; certbot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generate an SSL Certificate:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;certbot certonly &lt;span class="nt"&gt;--standalone&lt;/span&gt; &lt;span class="nt"&gt;-d-&lt;/span&gt;&amp;lt;your_domain_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Run the Registry with SSL &amp;amp; Authentication:&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;First, stop and remove the running registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   docker stop registry &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; docker &lt;span class="nb"&gt;rm &lt;/span&gt;registry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the registry again with authentication and TLS enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 5000:5000 &lt;span class="nt"&gt;--name&lt;/span&gt; registry &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;-v&lt;/span&gt; /etc/docker/registry:/auth &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;-v&lt;/span&gt; /etc/letsencrypt:/certs &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"REGISTRY_AUTH=htpasswd"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"REGISTRY_AUTH_HTPASSWD_REALM=&amp;lt;your_realm&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"REGISTRY_HTTP_TLS_CERTIFICATE=/certs/live/&amp;lt;domain&amp;gt;/fullchain.pem"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"REGISTRY_HTTP_TLS_KEY=/certs/live/&amp;lt;domain&amp;gt;/privkey.pem"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   registry:2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Test Secure Connection:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   curl &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &amp;lt;user&amp;gt;:&lt;span class="s1"&gt;'&amp;lt;password&amp;gt;'&lt;/span&gt; https://&amp;lt;domain&amp;gt;:5000/v2/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;If the registry fails to read the certificates or the &lt;code&gt;htpasswd&lt;/code&gt; file, run the following commands to adjust permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo chmod&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; 755 /etc/letsencrypt/
&lt;span class="nb"&gt;sudo chmod&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; 755 /etc/letsencrypt/live/
&lt;span class="nb"&gt;sudo chmod&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; 644 /etc/letsencrypt/live/&amp;lt;domain&amp;gt;/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="nb"&gt;sudo chmod&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; 644 /etc/letsencrypt/archive/&amp;lt;domain&amp;gt;/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="nb"&gt;sudo chmod &lt;/span&gt;640 /etc/docker/registry/htpasswd
&lt;span class="nb"&gt;sudo chown &lt;/span&gt;root:docker /etc/docker/registry/htpasswd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>docker</category>
      <category>kubernetes</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Longhorn CSI pvc attachment issues fixing with multipath</title>
      <dc:creator>Kader Khan</dc:creator>
      <pubDate>Wed, 19 Feb 2025 18:15:56 +0000</pubDate>
      <link>https://dev.to/abirk/longhorn-pvc-attachment-issues-fixing-with-multipath-5cd6</link>
      <guid>https://dev.to/abirk/longhorn-pvc-attachment-issues-fixing-with-multipath-5cd6</guid>
      <description>&lt;h1&gt;
  
  
  🚀 Longhorn CSI Mount Issue Fix
&lt;/h1&gt;

&lt;h2&gt;
  
  
  ❗ Issue
&lt;/h2&gt;

&lt;p&gt;Pods using Longhorn volumes may fail to start due to errors in &lt;code&gt;longhorn-csi-plugin&lt;/code&gt;, specifically related to &lt;strong&gt;mount failures&lt;/strong&gt; caused by &lt;code&gt;multipathd&lt;/code&gt;.  &lt;/p&gt;

&lt;h3&gt;
  
  
  🔍 Error Message
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/longhorn/pvc-xxxx /var/lib/kubelet/pods/xxx/mount
Output: mount: /var/lib/kubelet/pods/xxx/mount: /dev/longhorn/pvc-xxxx already mounted or mount point busy.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🎯 Root Cause
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;multipath daemon (&lt;code&gt;multipathd&lt;/code&gt;)&lt;/strong&gt; automatically creates multipath devices for block devices, including Longhorn volumes. This results in &lt;strong&gt;conflicts when mounting Longhorn volumes&lt;/strong&gt;, preventing pods from starting.&lt;/p&gt;




&lt;h2&gt;
  
  
  ✅ Solution
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1️⃣ Check Longhorn Devices
&lt;/h3&gt;

&lt;p&gt;Run the following command to &lt;strong&gt;list devices created by Longhorn&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lsblk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔹 Longhorn devices typically have names like &lt;code&gt;/dev/sd[x]&lt;/code&gt;.&lt;/p&gt;
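&lt;p&gt;A sketch of what to look for, using hypothetical &lt;code&gt;lsblk&lt;/code&gt; output rather than a live node (device names and sizes will differ on your cluster):&lt;/p&gt;

```shell
# Hypothetical `lsblk -d -o NAME,SIZE,TYPE` sample; Longhorn-backed volumes
# appear as plain sdX disks alongside the node's root disk.
sample='NAME    SIZE TYPE
nvme0n1 80G  disk
sda     10G  disk
sdb     20G  disk'

# Filter out the sdX devices, which are the Longhorn candidates here.
candidates=$(echo "$sample" | awk '/^sd/ {print $1}')
echo "$candidates"
```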




&lt;h3&gt;
  
  
  2️⃣ Modify &lt;code&gt;multipath.conf&lt;/code&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create the configuration file&lt;/strong&gt; (if it doesn’t exist):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo touch&lt;/span&gt; /etc/multipath.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add the following blacklist rule&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;   &lt;span class="n"&gt;blacklist&lt;/span&gt; {
       &lt;span class="n"&gt;devnode&lt;/span&gt; &lt;span class="s2"&gt;"^sd[a-z0-9]+"&lt;/span&gt;
   }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
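&lt;p&gt;Note that this rule blacklists &lt;em&gt;every&lt;/em&gt; &lt;code&gt;sd*&lt;/code&gt; device, which is the intended behaviour here but worth double-checking if the node also has legitimate multipath SCSI disks. A small sanity check of the regex (a sketch using sample device names, and &lt;code&gt;grep -E&lt;/code&gt; as an approximation of multipathd's regex matching):&lt;/p&gt;

```shell
# Check which device names the blacklist regex would match.
regex='^sd[a-z0-9]+'
check() {
  # Prints "blacklisted" if the name matches the devnode rule, else "kept".
  if printf '%s' "$1" | grep -Eq "$regex"; then
    echo "blacklisted"
  else
    echo "kept"
  fi
}
check sda        # Longhorn-style device
check nvme0n1    # typical root disk, untouched by the rule
```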






&lt;h3&gt;
  
  
  3️⃣ Restart Multipath Service
&lt;/h3&gt;

&lt;p&gt;Apply the changes by restarting the multipath daemon:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart multipathd.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  4️⃣ Verify Configuration
&lt;/h3&gt;

&lt;p&gt;Check if the new configuration is applied:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;multipath &lt;span class="nt"&gt;-t&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🎉 &lt;strong&gt;Your pods should now be able to mount Longhorn volumes correctly!&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  📌 Additional Tips
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ensure that &lt;code&gt;longhorn-csi-plugin&lt;/code&gt; logs are clear of mount errors.&lt;/li&gt;
&lt;li&gt;If the issue persists, consider rebooting the node after applying the fix.&lt;/li&gt;
&lt;li&gt;Check the status of multipath with:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  systemctl status multipathd.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🛠️ Need More Help?
&lt;/h3&gt;

&lt;p&gt;🔹 Visit the &lt;a href="https://longhorn.io/docs/" rel="noopener noreferrer"&gt;Longhorn Documentation&lt;/a&gt;&lt;br&gt;&lt;br&gt;
🔹 Join the &lt;a href="https://github.com/longhorn/longhorn" rel="noopener noreferrer"&gt;Longhorn Community&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Happy Deploying!&lt;/strong&gt;  &lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>webdev</category>
      <category>tutorial</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
    <item>
      <title>AWS EBS Multi-attach Clustered Storage System with GlusterFS</title>
      <dc:creator>Kader Khan</dc:creator>
      <pubDate>Wed, 19 Feb 2025 18:04:40 +0000</pubDate>
      <link>https://dev.to/abirk/aws-ebs-multi-attach-clustered-storage-system-with-glusterfs-37l7</link>
      <guid>https://dev.to/abirk/aws-ebs-multi-attach-clustered-storage-system-with-glusterfs-37l7</guid>
      <description>&lt;h1&gt;
  
  
  Clustered Storage System with GlusterFS on AWS EC2 Instances
&lt;/h1&gt;

&lt;p&gt;This guide describes how to set up a clustered storage system using GlusterFS on two AWS EC2 instances, utilizing EBS Multi-Attach for shared storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  At least 2 EC2 instances (c5.large or larger; EBS Multi-Attach requires Nitro-based instances)&lt;/li&gt;
&lt;li&gt;  Both instances in the same AWS region and availability zone&lt;/li&gt;
&lt;li&gt;  Tested on AWS Provided Ubuntu 24.04 LTS&lt;/li&gt;
&lt;li&gt;  EBS Volume Type: Provisioned IOPS SSD (io2) to support Multi-Attach&lt;/li&gt;
&lt;li&gt;  SSH access to EC2 instances (ensure both instances have public IPs)&lt;/li&gt;
&lt;li&gt;  EBS Multi-Attach enabled on the volumes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Solution Overview
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Create EC2 Instances and EBS Volumes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Attach EBS Volumes to Both Instances&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Format and Mount EBS Volumes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Install GlusterFS and Set Up Cluster&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Create GlusterFS Volume and Mount Shared Storage&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Auto-mount EBS Volume on Instance Reboot&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Architecture Diagram
&lt;/h3&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xxxiacorp4mmdcpqtar.png" alt="Image description" width="800" height="428"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Step 1: Create EC2 Instances and EBS Volumes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1.1 Create EC2 Instances
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Create 2 EC2 instances in the same region and availability zone in the AWS console.&lt;/li&gt;
&lt;li&gt;  Assign public IPs to the instances so that they can be accessed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1.2 Create EBS Volumes (io2)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Create EBS volumes with the type &lt;strong&gt;Provisioned IOPS SSD (io2)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Ensure that these volumes are in the same availability zone as your EC2 instances.&lt;/li&gt;
&lt;li&gt;  Enable &lt;strong&gt;Multi-Attach&lt;/strong&gt; for both volumes so they can be attached to multiple instances.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1.3 Attach EBS Volumes to EC2 Instances
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Attach the created EBS volumes to both EC2 instances.&lt;/li&gt;
&lt;li&gt;  Go to Volumes in the EC2 console, select the io2 volume, then choose Actions, then Attach Volume in the upper right corner.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 2: Format and Mount EBS Volumes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Verify EBS Volume Connection
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;SSH into both EC2 instances and verify the connection of the EBS volume by running:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lsblk

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure that the volume is attached to the instance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2.2 Format the EBS Volume
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Format the EBS volume once on either of the instances (only format it once for the shared file system):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;mkfs.xfs /dev/nvme1n1  &lt;span class="c"&gt;# Replace 'nvme1n1' with your disk name&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2.3 Mount the EBS Volume
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Create a directory to mount the EBS volume:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; /home/ubuntu/data

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mount the EBS volume to the newly created directory:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;mount /dev/nvme1n1 /home/ubuntu/data

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Verify the mount:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt; /home/ubuntu/data

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 3: Install GlusterFS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Install GlusterFS on Both Instances
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Add the GlusterFS repository and install the GlusterFS server package:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;add-apt-repository ppa:gluster/glusterfs-10
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; glusterfs-server

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3.2 Start GlusterFS Service
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Start the GlusterFS service on both instances:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start glusterd
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;glusterd

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 4: Set Up GlusterFS Cluster
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Peer Probe
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Choose one instance as the primary and the other as the secondary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On the &lt;strong&gt;primary instance&lt;/strong&gt;, run the following command to add the secondary instance to the cluster:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;gluster peer probe &amp;lt;secondary_instance_privateIP&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If successful, the output will display &lt;code&gt;peer probe: success&lt;/code&gt;. You can confirm the peering afterwards with &lt;code&gt;sudo gluster peer status&lt;/code&gt; on either instance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 5: Create GlusterFS Volume
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Create Shared Volume
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;On the &lt;strong&gt;primary instance&lt;/strong&gt;, create a GlusterFS volume called &lt;code&gt;shared-volume&lt;/code&gt; using the mounted EBS volumes. This volume will be replicated across both instances (note that two-way replication is prone to split-brain; the Gluster documentation recommends &lt;code&gt;replica 3&lt;/code&gt; or an arbiter brick for production use):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;gluster volume create shared-volume replica 2 transport tcp &amp;lt;primary_instance_privateIP&amp;gt;:/home/ubuntu/data &amp;lt;secondary_instance_privateIP&amp;gt;:/home/ubuntu/data force

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5.2 Start the GlusterFS Volume
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Start &lt;code&gt;shared-volume&lt;/code&gt; from either instance (this only needs to be done once):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;gluster volume start shared-volume

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 6: Mount GlusterFS Volume
&lt;/h2&gt;

&lt;h3&gt;
  
  
  6.1 Mount the GlusterFS Volume on Both Instances
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Create a mount point directory (e.g., &lt;code&gt;/mnt/shared&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; /mnt/shared

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mount the &lt;code&gt;shared-volume&lt;/code&gt; GlusterFS volume to this directory:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;mount &lt;span class="nt"&gt;-t&lt;/span&gt; glusterfs &amp;lt;primary_instance_privateIP&amp;gt;:/shared-volume /mnt/shared

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify that the GlusterFS volume is mounted correctly by creating or updating a file in the &lt;code&gt;/mnt/shared&lt;/code&gt; directory on either instance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 7: Auto-Mount EBS Volume on Reboot
&lt;/h2&gt;

&lt;h3&gt;
  
  
  7.1 Add to &lt;code&gt;/etc/fstab&lt;/code&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Edit &lt;code&gt;/etc/fstab&lt;/code&gt; to automatically mount the EBS volume on reboot:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'/dev/nvme1n1 /home/ubuntu/data xfs defaults 0 0'&lt;/span&gt; | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; /etc/fstab

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;
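&lt;p&gt;Two caveats worth noting as a hedged sketch: NVMe device names like &lt;code&gt;/dev/nvme1n1&lt;/code&gt; can change between reboots, so a UUID-based entry is safer, and the GlusterFS client mount from Step 6 also needs its own &lt;code&gt;fstab&lt;/code&gt; line (with &lt;code&gt;_netdev&lt;/code&gt; so mounting waits for the network). &lt;code&gt;REPLACE_WITH_UUID&lt;/code&gt; and &lt;code&gt;PRIMARY_IP&lt;/code&gt; below are placeholders:&lt;/p&gt;

```shell
# Find the filesystem UUID first (run on the instance):
#   sudo blkid /dev/nvme1n1
# REPLACE_WITH_UUID and PRIMARY_IP are placeholders for the UUID that blkid
# prints and the primary instance's private IP.
ebs_entry='UUID=REPLACE_WITH_UUID /home/ubuntu/data xfs defaults,nofail 0 0'
gluster_entry='PRIMARY_IP:/shared-volume /mnt/shared glusterfs defaults,_netdev 0 0'
# Append both lines with: printf the entries piped through `sudo tee -a /etc/fstab`
printf '%s\n' "$ebs_entry" "$gluster_entry"
```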




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You have successfully set up a clustered storage system using GlusterFS with shared EBS volumes in AWS. The shared storage is now accessible from both EC2 instances, and the GlusterFS volume ensures that data is synchronized between the two instances.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>aws</category>
      <category>learning</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
