<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: LÊ PHAN TẤN LỘC</title>
    <description>The latest articles on DEV Community by LÊ PHAN TẤN LỘC (@tan_loc).</description>
    <link>https://dev.to/tan_loc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3904644%2F9bd14d1d-fc6c-4cbe-9721-adada58a2f92.jpg</url>
      <title>DEV Community: LÊ PHAN TẤN LỘC</title>
      <link>https://dev.to/tan_loc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tan_loc"/>
    <language>en</language>
    <item>
      <title>Cross-Zone Load Balancing on AWS NLB: Lessons from Deploying RabbitMQ on EKS</title>
      <dc:creator>LÊ PHAN TẤN LỘC</dc:creator>
      <pubDate>Tue, 05 May 2026 13:38:33 +0000</pubDate>
      <link>https://dev.to/tan_loc/cross-zone-load-balancing-tren-aws-nlb-bai-hoc-tu-trien-khai-rabbitmq-tren-eks-m9m</link>
      <guid>https://dev.to/tan_loc/cross-zone-load-balancing-tren-aws-nlb-bai-hoc-tu-trien-khai-rabbitmq-tren-eks-m9m</guid>
      <description>&lt;h1&gt;
  
  
  Cross-Zone Load Balancing on AWS NLB: Lessons from Deploying RabbitMQ on EKS
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Lê Phan Tấn Lộc, DevOps Engineer&lt;br&gt;
&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;AWS&lt;/code&gt;, &lt;code&gt;NLB&lt;/code&gt;, &lt;code&gt;EKS&lt;/code&gt;, &lt;code&gt;RabbitMQ&lt;/code&gt;, &lt;code&gt;Kubernetes&lt;/code&gt;, &lt;code&gt;Load Balancing&lt;/code&gt;, &lt;code&gt;Networking&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;While deploying RabbitMQ to Amazon EKS with the Kubernetes Operator pattern, I ran into a strange failure: connections from an external application to RabbitMQ through an AWS Network Load Balancer (NLB) worked on one attempt and failed on the next, with no consistency at all. Testing port 5672 with &lt;code&gt;bash /dev/tcp&lt;/code&gt; showed the TCP handshake succeeding once, failing the next time, then succeeding again. There were no application errors and no obvious logs, and every NLB target reported &lt;code&gt;healthy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After digging into the AWS documentation and the network architecture, I found the root cause: &lt;strong&gt;Cross-Zone Load Balancing is disabled by default on NLB&lt;/strong&gt;. This post is a technical analysis of that mechanism, why it caused problems in this architecture, and how to fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment Architecture
&lt;/h2&gt;

&lt;p&gt;Before getting into the problem, here is the overall picture of the system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each project needs its own independent RabbitMQ cluster&lt;/li&gt;
&lt;li&gt;Applications outside the VPC (connected via VPC Peering) need to reach AMQP port 5672&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architecture Decisions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────────┐
│                         EKS Cluster                              │
│                       (ap-northeast-1)                           │
│                                                                  │
│   ┌──────────────────────────────────────────────────────────┐  │
│   │  Namespace: devops                                        │  │
│   │                                                          │  │
│   │  ┌─────────────────────┐   ┌────────────────────────┐   │  │
│   │  │  RabbitmqCluster     │   │  RabbitmqCluster       │   │  │
│   │  │  rabbitmq-1          │   │  rabbitmq-2            │   │  │
│   │  │  (2 replicas)        │   │  (2 replicas)          │   │  │
│   │  └─────────────────────┘   └────────────────────────┘   │  │
│   │                                                          │  │
│   │  ┌────────────────────────────────────────────────────┐  │  │
│   │  │  RabbitMQ Cluster Operator (controller)            │  │  │
│   │  └────────────────────────────────────────────────────┘  │  │
│   └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
         │                              │
         ▼                              ▼
   ┌──────────┐                  ┌──────────────┐
   │   ALB    │                  │  NLB Internal│
   │(Layer 7) │                  │  (Layer 4)   │
   │ port 443 │                  │  port 5672+  │
   └──────────┘                  └──────────────┘
         │                              │
         ▼                              ▼
  rabbitmq-1.abc.io           Apps over VPC Peering
  (Management UI)               (10.21.0.0/16)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Two ways to expose the services:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;ALB (Layer 7)&lt;/th&gt;
&lt;th&gt;NLB (Layer 4)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Used for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Management UI (HTTP 15672)&lt;/td&gt;
&lt;td&gt;AMQP protocol (TCP 5672)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Why&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ALB understands HTTP and supports host-based routing and sticky sessions&lt;/td&gt;
&lt;td&gt;AMQP is a plain TCP protocol, and ALB does not support arbitrary TCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Exposed via&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ingress (AWS Load Balancer Controller, formerly ALB Ingress Controller)&lt;/td&gt;
&lt;td&gt;TargetGroupBinding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Use TargetGroupBinding?
&lt;/h3&gt;

&lt;p&gt;Instead of creating a new &lt;code&gt;Service: LoadBalancer&lt;/code&gt; (which would provision a new NLB and add cost), we use &lt;strong&gt;TargetGroupBinding&lt;/strong&gt;, a CRD from the AWS Load Balancer Controller that binds a Kubernetes Service to an existing NLB Target Group. Each project uses a different port on the same NLB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# TargetGroupBinding cho project 1&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;elbv2.k8s.aws/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TargetGroupBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rabbitmq-tgb&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;devops&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rabbitmq-1&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5672&lt;/span&gt;
  &lt;span class="na"&gt;targetGroupARN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:elasticloadbalancing:ap-northeast-1:...&lt;/span&gt;
  &lt;span class="na"&gt;targetType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ip&lt;/span&gt;
  &lt;span class="na"&gt;networking&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ipBlock&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cidr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.0.0.0/16&lt;/span&gt;    &lt;span class="c1"&gt;# VPC EKS&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5672&lt;/span&gt;
            &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ipBlock&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cidr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.21.0.0/16&lt;/span&gt;   &lt;span class="c1"&gt;# VPC Peering&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5672&lt;/span&gt;
            &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Problem: Connections Work Only Some of the Time
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptoms
&lt;/h3&gt;

&lt;p&gt;Testing from an EC2 instance in the peered VPC (10.21.x.x) against the internal NLB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Lần 1: OK&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"echo &amp;gt;/dev/tcp/internal-nlb.amazonaws.com/5672"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"OPEN"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"CLOSED"&lt;/span&gt;
OPEN

&lt;span class="c"&gt;# Lần 2: Fail&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"echo &amp;gt;/dev/tcp/internal-nlb.amazonaws.com/5672"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"OPEN"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"CLOSED"&lt;/span&gt;
CLOSED

&lt;span class="c"&gt;# Lần 3: OK lại&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"echo &amp;gt;/dev/tcp/internal-nlb.amazonaws.com/5672"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"OPEN"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"CLOSED"&lt;/span&gt;
OPEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NLB Target Health:&lt;/strong&gt; All targets are &lt;code&gt;healthy&lt;/code&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;RabbitMQ pods:&lt;/strong&gt; Running, with no crashes or restarts&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Security Group / NACL:&lt;/strong&gt; Verified, all rules are correct&lt;/p&gt;

&lt;p&gt;The problem is not in the application. It is in the network layer, specifically in how the NLB routes traffic across Availability Zones.&lt;/p&gt;


&lt;h2&gt;
  
  
  Cross-Zone Load Balancing: Mechanism and Behavior
&lt;/h2&gt;
&lt;h3&gt;
  
  
  How Does an NLB Distribute Traffic?
&lt;/h3&gt;

&lt;p&gt;AWS NLB is a Layer 4 load balancer. Unlike ALB, NLB does &lt;strong&gt;not&lt;/strong&gt; inspect HTTP headers or URLs. It looks only at IP addresses and ports to make routing decisions.&lt;/p&gt;

&lt;p&gt;When you create an NLB, AWS deploys a &lt;strong&gt;load balancer node&lt;/strong&gt; (backed by a real ENI with its own IP address) in &lt;strong&gt;each Availability Zone enabled for the NLB&lt;/strong&gt;. The NLB's DNS name resolves to all of these IPs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;NLB&lt;/span&gt; &lt;span class="n"&gt;DNS&lt;/span&gt;: &lt;span class="n"&gt;internal&lt;/span&gt;-&lt;span class="n"&gt;xxxx&lt;/span&gt;.&lt;span class="n"&gt;elb&lt;/span&gt;.&lt;span class="n"&gt;ap&lt;/span&gt;-&lt;span class="n"&gt;northeast&lt;/span&gt;-&lt;span class="m"&gt;1&lt;/span&gt;.&lt;span class="n"&gt;amazonaws&lt;/span&gt;.&lt;span class="n"&gt;com&lt;/span&gt;
    → &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;1&lt;/span&gt;.&lt;span class="m"&gt;45&lt;/span&gt;  (&lt;span class="n"&gt;AZ&lt;/span&gt;: &lt;span class="n"&gt;ap&lt;/span&gt;-&lt;span class="n"&gt;northeast&lt;/span&gt;-&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; — &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="n"&gt;NLB&lt;/span&gt;)
    → &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;2&lt;/span&gt;.&lt;span class="m"&gt;67&lt;/span&gt;  (&lt;span class="n"&gt;AZ&lt;/span&gt;: &lt;span class="n"&gt;ap&lt;/span&gt;-&lt;span class="n"&gt;northeast&lt;/span&gt;-&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; — &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="n"&gt;NLB&lt;/span&gt;)
    → &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;3&lt;/span&gt;.&lt;span class="m"&gt;89&lt;/span&gt;  (&lt;span class="n"&gt;AZ&lt;/span&gt;: &lt;span class="n"&gt;ap&lt;/span&gt;-&lt;span class="n"&gt;northeast&lt;/span&gt;-&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; — &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="n"&gt;NLB&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A client connects to the NLB → the DNS resolver returns one of the IPs above (via round-robin DNS or from a cached answer within the TTL) → the NLB node that owns that IP receives the connection.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is Cross-Zone Load Balancing?
&lt;/h3&gt;

&lt;p&gt;This setting determines whether an NLB node may route traffic to targets in other AZs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-Zone = OFF (default):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → NLB node AZ-A → CHỈ forward đến targets trong AZ-A
Client → NLB node AZ-C → CHỈ forward đến targets trong AZ-C
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cross-Zone = ON:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → NLB node AZ-A → có thể forward đến targets ở BẤT KỲ AZ nào
Client → NLB node AZ-C → có thể forward đến targets ở BẤT KỲ AZ nào
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AWS Documentation:&lt;/strong&gt; &lt;em&gt;"By default, each Network Load Balancer node distributes traffic across the registered targets in its Availability Zone only."&lt;/em&gt;&lt;br&gt;&lt;br&gt;
— &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html" rel="noopener noreferrer"&gt;AWS NLB Documentation&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  What Goes Wrong When an AZ Has No Targets
&lt;/h3&gt;

&lt;p&gt;This is the crux of the issue. Suppose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The NLB is enabled in 3 AZs: ap-northeast-1a, 1c, and 1d&lt;/li&gt;
&lt;li&gt;RabbitMQ pods run on nodes in ap-northeast-1a and ap-northeast-1c&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No pods run in ap-northeast-1d&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With cross-zone OFF:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NLB node 1a → healthy targets ✓ → forward OK
NLB node 1c → healthy targets ✓ → forward OK  
NLB node 1d → NO targets → Connection REFUSED/TIMEOUT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;AWS DNS hands out all 3 IPs evenly, so roughly 1 in 3 connection attempts fails.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Doesn't the NLB Remove the Problem AZ Automatically?
&lt;/h3&gt;

&lt;p&gt;The natural question: if 1d has no targets, shouldn't the NLB remove that IP from DNS on its own?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The answer: it does, but only when health checks fail, not when there are no targets at all.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The exact behavior:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;AZ state&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Has targets, all healthy&lt;/td&gt;
&lt;td&gt;DNS keeps the IP, traffic flows normally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Has targets, all unhealthy&lt;/td&gt;
&lt;td&gt;DNS &lt;strong&gt;removes the IP&lt;/strong&gt; for that AZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No targets at all&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DNS &lt;strong&gt;still keeps the IP&lt;/strong&gt; for that AZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Every AZ is unhealthy&lt;/td&gt;
&lt;td&gt;Fail-open: DNS returns all IPs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;This is the key trap: the NLB does not "know" that an AZ is empty. Health checks run only against registered targets. With no targets there are no health checks, so no failure signal is raised and DNS does not change.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  DNS TTL and Caching Effects
&lt;/h2&gt;

&lt;p&gt;Even when a problem AZ has been removed from DNS (the unhealthy-targets case), another layer remains: &lt;strong&gt;DNS TTL caching&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;NLB DNS records have a TTL of &lt;strong&gt;60 seconds&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The DNS entry also specifies the time-to-live (TTL) of 60 seconds. This helps ensure that the IP addresses can be remapped quickly in response to changing traffic."&lt;/em&gt;&lt;br&gt;&lt;br&gt;
— &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html" rel="noopener noreferrer"&gt;AWS ELB Documentation&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But 60 seconds is only the TTL of the DNS record on the AWS side. In practice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OS DNS cache&lt;/strong&gt;: many Linux distros do not cache DNS answers at the OS level, but &lt;code&gt;nscd&lt;/code&gt; and &lt;code&gt;systemd-resolved&lt;/code&gt; do&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application DNS cache&lt;/strong&gt;: the JVM caches DNS aggressively (30s by default, and it can be configured to cache forever)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NLB internal propagation&lt;/strong&gt;: when the NLB changes its IP set, propagation takes a few tens of seconds&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Timeline when an AZ is removed:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;T+0s:   AZ bị mark unhealthy/empty → NLB starts removing from DNS
T+30s:  NLB DNS propagated, nhưng client vẫn dùng cached IP cũ
T+60s:  DNS TTL expire → client refresh → IP cũ không còn trong response
T+90s:  Tất cả connections mới đi đúng AZ có targets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Between T+0s and T+60s you get &lt;strong&gt;intermittent failures&lt;/strong&gt;, exactly what I observed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Analyzing Our Architecture
&lt;/h2&gt;

&lt;p&gt;Back to our specific deployment. The internal NLB is attached to private subnets spread across 3 AZs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;NLB internal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;├── ap-northeast-1a → subnet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.0.1.0/24 (private)&lt;/span&gt;
&lt;span class="na"&gt;├── ap-northeast-1c → subnet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.0.2.0/24 (private)&lt;/span&gt;  
&lt;span class="na"&gt;└── ap-northeast-1d → subnet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.0.3.0/24 (private)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Karpenter runs the RabbitMQ pods on the &lt;code&gt;node-on-demand&lt;/code&gt; pool. Pods are scheduled onto whichever nodes Karpenter has provisioned, with no guarantee of an even spread across the 3 AZs. When the pods land in only 2 of the 3 AZs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AZ 1a: rabbitmq-pod-0 → target healthy ✓
AZ 1c: rabbitmq-pod-1 → target healthy ✓
AZ 1d: (no pods) → NLB node 1d has no targets → connection refused
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A client in the peered VPC (10.21.x.x) resolves the DNS name → receives all 3 IPs → sometimes hits the IP in AZ 1d → &lt;strong&gt;CLOSED&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Enable Cross-Zone Load Balancing (Recommended)
&lt;/h3&gt;

&lt;p&gt;This is the simplest fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Bật cross-zone trên NLB&lt;/span&gt;
aws elbv2 modify-load-balancer-attributes &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--load-balancer-arn&lt;/span&gt; arn:aws:elasticloadbalancing:ap-northeast-1:xxx:loadbalancer/net/... &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--attributes&lt;/span&gt; &lt;span class="nv"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;load_balancing.cross_zone.enabled,Value&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in Terraform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lb"&lt;/span&gt; &lt;span class="s2"&gt;"rabbitmq_nlb"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"rabbitmq-internal"&lt;/span&gt;
  &lt;span class="nx"&gt;internal&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;load_balancer_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"network"&lt;/span&gt;

  &lt;span class="nx"&gt;enable_cross_zone_load_balancing&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# Quan trọng!&lt;/span&gt;

  &lt;span class="nx"&gt;subnets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_1a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_1c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_1d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; AWS charges for cross-AZ data transfer (~$0.01/GB). For typical AMQP traffic this is a small amount and well worth the stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result after enabling it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → NLB node 1d (không có target local) → cross-zone → forward đến 1a hoặc 1c → OK ✓
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 2: Restrict the NLB to AZs That Have Pods
&lt;/h3&gt;

&lt;p&gt;If you know in advance which AZs the pods run in, enable NLB subnets only for those AZs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Chỉ enable 2 subnets thay vì 3&lt;/span&gt;
aws elbv2 set-subnets &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--load-balancer-arn&lt;/span&gt; arn:aws:elasticloadbalancing:... &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subnets&lt;/span&gt; subnet-1a subnet-1c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drawback: it has to be maintained manually, which fits poorly with Karpenter's dynamic scheduling.&lt;/p&gt;
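
&lt;p&gt;One way to reduce that manual upkeep, sketched here under the assumption that the cluster uses Karpenter's &lt;code&gt;karpenter.sh/v1&lt;/code&gt; API and that the &lt;code&gt;node-on-demand&lt;/code&gt; pool is a NodePool of that name, is to pin the pool to the same AZs as the NLB subnets. The &lt;code&gt;nodeClassRef&lt;/code&gt; name &lt;code&gt;default&lt;/code&gt; is hypothetical, and the rest of the pool configuration is omitted:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: node-on-demand
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # hypothetical EC2NodeClass name
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        # Only provision nodes in the AZs that are enabled on the NLB
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["ap-northeast-1a", "ap-northeast-1c"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This keeps nodes, and therefore pods, out of ap-northeast-1d, at the cost of giving up one AZ of capacity. The zone list still has to be kept in sync with the NLB subnets.&lt;/p&gt;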

&lt;h3&gt;
  
  
  Option 3: Use Pod Topology Spread Constraints
&lt;/h3&gt;

&lt;p&gt;Make sure pods are placed in the AZs that have NLB subnets. In the &lt;code&gt;RabbitmqCluster&lt;/code&gt; spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;affinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;podAntiAffinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rabbitmq-1&lt;/span&gt;
          &lt;span class="na"&gt;topologyKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/hostname&lt;/span&gt;
  &lt;span class="c1"&gt;# Kết hợp với topology spread để đảm bảo phân bố đều AZ&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
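
&lt;p&gt;The snippet above only shows the anti-affinity half. As a rough sketch of the topology spread half, assuming the operator's &lt;code&gt;spec.override.statefulSet&lt;/code&gt; mechanism is used to reach the pod template, the zone constraint could look like this. The empty &lt;code&gt;containers&lt;/code&gt; list is there only because the pod spec schema requires the field, and &lt;code&gt;maxSkew&lt;/code&gt; and &lt;code&gt;whenUnsatisfiable&lt;/code&gt; are illustrative values rather than the ones from our cluster:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq-1
  namespace: devops
spec:
  replicas: 2
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers: []   # required by the schema; no container is overridden here
            topologySpreadConstraints:
              - maxSkew: 1                                # assumed value
                topologyKey: topology.kubernetes.io/zone  # spread across AZs, not hosts
                whenUnsatisfiable: DoNotSchedule          # assumed; ScheduleAnyway is the softer choice
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: rabbitmq-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that with 2 replicas this can place pods in at most 2 zones, so an NLB enabled in 3 AZs would still have one zone without targets.&lt;/p&gt;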



&lt;p&gt;With Karpenter, however, this still does not fully guarantee that pods are spread across every AZ that has an NLB subnet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ Recommendation: combine Option 1 (enable cross-zone) with Option 3 (anti-affinity).&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TargetGroupBinding: A Pattern for Connecting Kubernetes to an NLB
&lt;/h2&gt;

&lt;p&gt;In this section I want to share more about TargetGroupBinding, a lesser-known pattern that is very useful in production EKS environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem with Service Type: LoadBalancer
&lt;/h3&gt;

&lt;p&gt;The usual way to expose a TCP service outside a Kubernetes cluster is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LoadBalancer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AWS Load Balancer Controller then creates a new NLB automatically. But when many services need to expose TCP:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 NLB per service gets expensive (an NLB costs ~$16/month plus data transfer)&lt;/li&gt;
&lt;li&gt;Ports are hard to manage when many projects share the setup&lt;/li&gt;
&lt;/ul&gt;
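
&lt;p&gt;For reference, a fuller version of the one-NLB-per-service approach might look like the sketch below. The Service name is hypothetical and the selector assumes the operator's default pod label; the annotations are the ones the AWS Load Balancer Controller reads to provision an internal NLB with IP targets, including the cross-zone attribute discussed earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-1-amqp          # hypothetical name
  namespace: devops
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    # Load balancer attributes are passed through as key=value pairs
    service.beta.kubernetes.io/aws-load-balancer-attributes: load_balancing.cross_zone.enabled=true
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: rabbitmq-1   # assumed pod label
  ports:
    - name: amqp
      protocol: TCP
      port: 5672
      targetPort: 5672
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every such Service provisions its own NLB, which is exactly the cost that the shared-NLB pattern below avoids.&lt;/p&gt;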

&lt;h3&gt;
  
  
  TargetGroupBinding: One Shared NLB, Different Ports
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;NLB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;internal-rabbitmq.example.com&lt;/span&gt;
&lt;span class="s"&gt;├── Port 5672 → Target Group 1   → rabbitmq-1 pods&lt;/span&gt;
&lt;span class="s"&gt;├── Port 5673 → Target Group 2 → rabbitmq-2 pods&lt;/span&gt;
&lt;span class="s"&gt;├── Port 5674 → Target Group 3 rabbitmq-3 pods&lt;/span&gt;
&lt;span class="s"&gt;└── Port 5675 → Target Group 4 → rabbitmq-4 pods&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each project has its own TargetGroupBinding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;elbv2.k8s.aws/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TargetGroupBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rabbitmq-2-tgb&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;devops&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rabbitmq-2&lt;/span&gt;  &lt;span class="c1"&gt;# ClusterIP service của RabbitmqCluster&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5672&lt;/span&gt;
  &lt;span class="na"&gt;targetGroupARN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:elasticloadbalancing:...:targetgroup/rabbitmq-2/xxx&lt;/span&gt;
  &lt;span class="na"&gt;targetType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ip&lt;/span&gt;
  &lt;span class="na"&gt;networking&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ipBlock&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cidr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.21.0.0/16&lt;/span&gt;  &lt;span class="c1"&gt;# VPC peering&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5673&lt;/span&gt;  &lt;span class="c1"&gt;# Port trên NLB cho project này&lt;/span&gt;
            &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AWS Load Balancer Controller automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Discovers the pods behind the &lt;code&gt;rabbitmq-2&lt;/code&gt; service&lt;/li&gt;
&lt;li&gt;Registers the pod IPs in the corresponding target group&lt;/li&gt;
&lt;li&gt;Deregisters pods when they terminate&lt;/li&gt;
&lt;li&gt;Updates the registrations as pods scale up or down&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Không cần tạo NLB mới.&lt;/strong&gt; Tiết kiệm chi phí đáng kể cho môi trường nhiều project.&lt;/p&gt;




&lt;h2&gt;
  
  
  ALB for the Management UI: Sticky Sessions Are Required
&lt;/h2&gt;

&lt;p&gt;Another lesson from this deployment: the RabbitMQ Management UI requires session affinity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why?
&lt;/h3&gt;

&lt;p&gt;The RabbitMQ Management UI is served by every node of a multi-node cluster. When you log in on node A, the session token is stored on node A. If the next request is routed to node B, the result is &lt;strong&gt;401 Unauthorized or 502&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring Sticky Sessions on the ALB
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In the Ingress annotations&lt;/span&gt;
&lt;span class="na"&gt;alb.ingress.kubernetes.io/target-group-attributes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="s"&gt;stickiness.enabled=true,&lt;/span&gt;
  &lt;span class="s"&gt;stickiness.lb_cookie.duration_seconds=86400&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With duration-based stickiness, the ALB uses the &lt;code&gt;AWSALB&lt;/code&gt; cookie to pin a session to a specific target, here for 24 hours (86400 seconds).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This is only needed for the Management UI (HTTP). AMQP connections (port 5672) do not need sticky sessions because they are persistent TCP connections: once a connection is established to a node, it stays on that node for its entire lifetime.&lt;/p&gt;
&lt;/blockquote&gt;
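
&lt;p&gt;For context, the stickiness annotation could sit in an Ingress like the sketch below. This is only an illustration, not the manifest used in this deployment: the Ingress name, the &lt;code&gt;internal&lt;/code&gt; scheme, and the catch-all path are assumptions, and the backend reuses the &lt;code&gt;rabbitmq-2&lt;/code&gt; service with the Management UI port 15672. The attributes are written on a single line so that no whitespace ends up between the key=value pairs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: name, scheme and path are assumptions
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rabbitmq-2-management   # hypothetical name
  annotations:
    alb.ingress.kubernetes.io/scheme: internal   # assumed; use internet-facing if required
    alb.ingress.kubernetes.io/target-type: ip    # pin sessions to pod IPs, not to nodes
    alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.lb_cookie.duration_seconds=86400
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rabbitmq-2
                port:
                  number: 15672   # Management UI port
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;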




&lt;h2&gt;
  
  
  Summary Table: ALB vs NLB for RabbitMQ
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;ALB&lt;/th&gt;
&lt;th&gt;NLB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7 (HTTP/HTTPS)&lt;/td&gt;
&lt;td&gt;4 (TCP/UDP)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RabbitMQ use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Management UI (15672)&lt;/td&gt;
&lt;td&gt;AMQP (5672)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sticky Sessions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✓ Cookie-based&lt;/td&gt;
&lt;td&gt;✓ Source IP-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSL Termination&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✓ ACM Certificate&lt;/td&gt;
&lt;td&gt;✗ with a TCP listener (pass-through); a TLS listener can terminate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Zone Default&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✓ Enabled&lt;/td&gt;
&lt;td&gt;✗ Disabled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher (LCU)&lt;/td&gt;
&lt;td&gt;Lower for TCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~1 ms extra (HTTP processing)&lt;/td&gt;
&lt;td&gt;Very low (Layer 4)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Checklist for Deploying RabbitMQ with an NLB on EKS
&lt;/h2&gt;

&lt;p&gt;Based on hands-on experience, here is the checklist to go through before go-live:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;□ Cross-Zone Load Balancing is enabled on the NLB
□ TargetGroupBinding uses targetType: ip (not instance)
□ Security Groups / NACLs allow traffic from every required source
□ PodAntiAffinity: required (not preferred) so that 2 pods never share a node
□ ALB sticky sessions are enabled for the Management UI
□ PodDisruptionBudget: minAvailable &amp;gt;= 1 to avoid losing quorum during a node drain
□ Persistence: storageClassName is set explicitly (not left empty)
□ RabbitMQ additionalConfig: consumer_timeout is set so long-running consumers do not time out
□ Connectivity is tested from every subnet/VPC that will use the broker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
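
&lt;p&gt;Several of these items live in the &lt;code&gt;RabbitmqCluster&lt;/code&gt; manifest itself. The sketch below shows one way they might look together. It assumes the cluster is named &lt;code&gt;rabbitmq-2&lt;/code&gt; (matching the service name used earlier) and that the operator labels pods with &lt;code&gt;app.kubernetes.io/name&lt;/code&gt;. The replica count, StorageClass name, volume size, and timeout value are placeholders, not the values from this deployment.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: replicas, storage class, size and timeout are assumed values
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq-2
spec:
  replicas: 2                  # assumed; adjust to your cluster size
  persistence:
    storageClassName: gp3      # assumed StorageClass; set it explicitly
    storage: 20Gi              # assumed size
  affinity:
    podAntiAffinity:
      # required, not preferred: never place two RabbitMQ pods on one node
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: rabbitmq-2
          topologyKey: kubernetes.io/hostname
  rabbitmq:
    # consumer_timeout is in milliseconds; 1 hour is an assumed value
    additionalConfig: |
      consumer_timeout = 3600000
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: rabbitmq-2
spec:
  minAvailable: 1              # keep at least one broker pod during a drain
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbitmq-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a required anti-affinity rule, a pod stays Pending when no eligible node is free, which is usually the safer failure mode for a broker than silently co-locating replicas.&lt;/p&gt;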






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A network problem that &lt;strong&gt;works one moment and fails the next&lt;/strong&gt; is usually a symptom of one of three causes: inconsistent DNS resolution, uneven load balancing across AZs, or asymmetric routing over VPC peering.&lt;/p&gt;

&lt;p&gt;In this case, the root cause was that &lt;strong&gt;Cross-Zone Load Balancing is disabled by default on NLB&lt;/strong&gt;. With cross-zone off, each NLB node forwards only to targets in its own AZ. When Karpenter scheduled the pods into 2 of the 3 AZs enabled on the NLB, roughly 1 in 3 requests hit an NLB node with no local target, and the connection was refused.&lt;/p&gt;

&lt;p&gt;Key lessons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Always enable Cross-Zone Load Balancing on an NLB&lt;/strong&gt; when the workload is not guaranteed to cover every AZ evenly, especially with dynamic schedulers such as Karpenter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TargetGroupBinding&lt;/strong&gt; is a powerful pattern for sharing one NLB across many services, which saves cost in multi-project environments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ALB sticky sessions are required&lt;/strong&gt; for the RabbitMQ Management UI and unnecessary for AMQP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NLB DNS TTL = 60s&lt;/strong&gt;: after changing the NLB configuration, wait for DNS to propagate before concluding that the fix has taken effect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PodAntiAffinity: required&lt;/strong&gt; instead of preferred gives real HA: in a live environment the scheduler did pack both replicas onto the same node&lt;/li&gt;
&lt;/ol&gt;
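
&lt;p&gt;For the first lesson, the setup in this article shares an existing NLB through TargetGroupBinding, so cross-zone is an attribute of that NLB rather than of any Kubernetes object. If the NLB is instead created by the AWS Load Balancer Controller from a Service, the same attribute can be requested through an annotation. The sketch below assumes that alternative setup; the Service name and selector are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: applies when the controller manages the NLB via a Service
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-amqp   # hypothetical name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    # Turn on cross-zone so every NLB node can reach targets in any AZ
    service.beta.kubernetes.io/aws-load-balancer-attributes: load_balancing.cross_zone.enabled=true
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: rabbitmq-2   # assumed pod label
  ports:
    - name: amqp
      port: 5672
      targetPort: 5672
      protocol: TCP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;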




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html" rel="noopener noreferrer"&gt;AWS NLB Documentation — Cross-Zone Load Balancing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html" rel="noopener noreferrer"&gt;AWS ELB — How Elastic Load Balancing Works&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/guide/targetgroupbinding/targetgroupbinding/" rel="noopener noreferrer"&gt;AWS Load Balancer Controller — TargetGroupBinding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.rabbitmq.com/kubernetes/operator/install-operator" rel="noopener noreferrer"&gt;RabbitMQ Cluster Operator — Installation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/rabbitmq/cluster-operator" rel="noopener noreferrer"&gt;RabbitMQ Cluster Operator — GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/whitepapers/latest/advanced-multi-az-resilience-patterns/network-load-balancer-nlb-architectures.html" rel="noopener noreferrer"&gt;AWS Advanced Multi-AZ Resilience Patterns&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/optimizing-data-transfer-costs-when-using-aws-network-load-balancer/" rel="noopener noreferrer"&gt;Optimizing Data Transfer Costs with AWS NLB&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>k8s</category>
      <category>kubernetes</category>
      <category>elb</category>
    </item>
  </channel>
</rss>
