<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dayul Lee</title>
    <description>The latest articles on DEV Community by Dayul Lee (@lukyday007).</description>
    <link>https://dev.to/lukyday007</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3907102%2F0410295f-6d9a-4d67-83e9-b3508e712838.png</url>
      <title>DEV Community: Dayul Lee</title>
      <link>https://dev.to/lukyday007</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lukyday007"/>
    <language>en</language>
    <item>
      <title>Network Part 4 - Where to Split, Why to Read?</title>
      <dc:creator>Dayul Lee</dc:creator>
      <pubDate>Fri, 01 May 2026 09:04:36 +0000</pubDate>
      <link>https://dev.to/lukyday007/network-part-4-where-to-split-why-to-read-37n1</link>
      <guid>https://dev.to/lukyday007/network-part-4-where-to-split-why-to-read-37n1</guid>
      <description>&lt;p&gt;Published: April 29, 2026&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;October 4, 2021. Facebook, Instagram, and WhatsApp went completely dark for roughly six hours — all at once. The servers were fine. No bad deploy. A single command run during routine maintenance withdrew every one of Facebook's BGP routes. The internet forgot how to reach Facebook's data centers. Traffic had nowhere to go. Facebook ceased to exist on the internet.&lt;/p&gt;

&lt;p&gt;The servers were running. The load balancers were healthy. Everything was fine. Requests just couldn't get in. That's what happens when traffic distribution breaks at the routing layer. No matter how well-built the system behind the load balancer is — if requests can't reach it, none of it matters.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;Not all load balancers work the same way. Some look only at the outside of a packet and route it fast. Others open the packet, read what's inside, and decide based on the contents. In Part 1, the trade-off was clear: L4 is fast because it stays ignorant, L7 is precise because it pays to know. Load balancers face the same choice. Which layer do you split traffic at?&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  DNS Round Robin — Blind by Design
&lt;/h3&gt;

&lt;p&gt;The most primitive form of load balancing starts at DNS. Register multiple server IPs under one domain, and hand out a different IP in rotation for each incoming request. That's &lt;strong&gt;DNS round-robin&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://www.cloudflare.com/learning/dns/glossary/round-robin-dns/" rel="noopener noreferrer"&gt;Cloudflare Learning: What is round-robin DNS?&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;ref. &lt;a href="https://www.cloudflare.com/learning/performance/what-is-dns-load-balancing/" rel="noopener noreferrer"&gt;Cloudflare Learning: What is DNS load balancing?&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                ┌───────────────┐
                │     Client    │
                └───────────────┘
                        ↓
              "What's example.com?"
                        ↓
    ┌──────────────────────────────────────┐
    │               DNS Server             │
    │  (Returns a different IP each time)  │
    └──────────────────────────────────────┘
        ┌───────────────┼───────────────┐
  [1st request]   [2nd request]    [3rd request]
       ↙                ↓                ↘
 ┌────────────┐   ┌────────────┐   ┌────────────┐
 │  Server A  │   │ Server B   │   │  Server C  │
 │192.168.0.1 │   │192.168.0.2 │   │192.168.0.3 │
 └────────────┘   └────────────┘   └────────────┘

[Structural limits]

✗ Blind to server state
  → DNS keeps returning Server A even when it's overloaded
✗ Can't detect failures
  → DNS keeps responding with Server B's IP even after it goes down
✗ TTL caching
  → once a client receives an IP, it keeps hitting that server until TTL expires
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
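
&lt;p&gt;The rotation is visible from any client. A minimal Python sketch, assuming &lt;code&gt;example.com&lt;/code&gt; stands in for a domain that publishes multiple A records:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import socket

# Ask the resolver for every A record behind one name. A round-robin
# zone returns the same set of IPs, rotated, on successive queries.
infos = socket.getaddrinfo("example.com", 80, proto=socket.IPPROTO_TCP)
ips = [info[4][0] for info in infos]
print(ips)  # e.g. ['192.168.0.1', '192.168.0.2', '192.168.0.3']

# Most clients simply take the first entry -- and keep reusing the
# cached answer until the record's TTL expires.
print("client will connect to:", ips[0])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;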



&lt;p&gt;DNS round-robin looks balanced in theory.&lt;/p&gt;

&lt;p&gt;Imagine a theme park with three parking lots — A, B, and C. The navigation app at the entrance sends cars in rotation: first to A, next to B, then C. Arithmetically balanced.&lt;/p&gt;

&lt;p&gt;But the navigation app doesn't check the server every time. It trusts the answer it got for a fixed window. "This information is valid for 10 minutes" — timer starts, server goes unchecked. That's &lt;strong&gt;TTL (Time-To-Live)&lt;/strong&gt;: an expiration date on information.&lt;/p&gt;

&lt;p&gt;This is where the breakdown happens. Picture a convoy of tourist buses arriving back-to-back. The first bus gets "go to Lot A." Every bus behind it copies that answer without checking — their devices already have it cached. The DNS server is ready to send the next convoy to Lots B and C. But the buses aren't asking anymore. Lot A is jammed. Lots B and C sit empty.&lt;/p&gt;

&lt;p&gt;Economist George Akerlof described this structure in his 1970 paper "The Market for Lemons" as &lt;strong&gt;Information Asymmetry&lt;/strong&gt;. In the used car market, sellers know the defects; buyers don't. That gap alone distorts the entire market.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://en.wikipedia.org/wiki/The_Market_for_Lemons" rel="noopener noreferrer"&gt;Information Asymmetry&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;DNS round-robin works the same way. The DNS server knows Server A is overloaded. The client won't find out until TTL expires. The distribution gets skewed — not because caching is broken, but because of a structural disconnect between the party that has the information and the party that needs it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DNS round-robin looks like load balancing. In practice, it's blind rotation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  L4 Load Balancer — Fast by Choice
&lt;/h3&gt;

&lt;p&gt;The L4 load balancer follows the same philosophy introduced in Part 1. It doesn't open the packet. It reads only the destination address (IP) and port number on the envelope, and decides where to send it from there.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Transport Layer]

            ┌───────────────────┐
            │   Client Request  │
            └───────────────────┘      
                      ↓
      ┌─────────────────────────────────┐
      │         L4 Load Balancer        │
      │                                 │
      │         ✓ IP address            │
      │       ✓ Check Port number       │
      │        ✗ Packet content         │
      └─────────────────────────────────┘    
        ↙             ↓             ↘
  ┌───────────┐ ┌───────────┐ ┌───────────┐
  │ Server A  │ │ Server B  │ │ Server C  │
  └───────────┘ └───────────┘ └───────────┘      
  ( Based on IP hash or least connections )        
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No content inspection means fast decisions. Millions of concurrent connections, handled. It fits environments where large numbers of clients open simple TCP connections simultaneously — game servers, for example.&lt;/p&gt;

&lt;p&gt;L4 is a strategy that accepts the information gap. It makes routing decisions without knowing what's inside the packet. Where DNS round-robin failed because it lacked information, L4 turns that same ignorance into a deliberate choice. DNS misdistributes because it doesn't know. L4 trades knowing for speed.&lt;/p&gt;

&lt;p&gt;The limits follow from that choice. No content visibility means no URL-based routing. You can't send &lt;code&gt;/api/payments&lt;/code&gt; to the payments cluster and &lt;code&gt;/api/products&lt;/code&gt; to the product cluster. You can't read cookies. Session persistence isn't possible.&lt;/p&gt;
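
&lt;p&gt;Everything an L4 balancer can compute comes from the connection tuple. A minimal sketch of the two strategies named in the diagram above, IP hash and least connections, using a hypothetical static backend list:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import hashlib

backends = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]  # placeholders
active = {"192.168.0.1": 12, "192.168.0.2": 3, "192.168.0.3": 7}  # sample open-connection counts

def pick_by_ip_hash(client_ip: str) -&gt; str:
    """Same client IP always lands on the same backend."""
    digest = hashlib.md5(client_ip.encode()).digest()
    return backends[digest[0] % len(backends)]

def pick_by_least_connections() -&gt; str:
    """Route to the backend holding the fewest open connections."""
    return min(active, key=active.get)

print(pick_by_ip_hash("203.0.113.7"))   # deterministic per client IP
print(pick_by_least_connections())      # '192.168.0.2'

# Note what is absent: no URL, no headers, no cookies.
# Both decisions use only connection-level facts.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;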

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  L7 Load Balancer — Informed by Design
&lt;/h3&gt;

&lt;p&gt;The L7 load balancer reads the packet. HTTP headers, URL paths, cookies, request body. It opens the envelope, reads the letter, and routes it to whoever handles that specific content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Application Layer]

              ┌───────────────────┐
              │   Client Request  │
              └───────────────────┘
                        ↓
            ┌────────────────────────┐
            │    L7 Load Balancer    │
            │                        │
            │  ✓ IP address / port   │
            │  ✓ HTTP method / URL   │
            │  ✓ Host header         │
            │  ✓ Cookies / body      │
            └────────────────────────┘
          ↙             ↓             ↘
    ┌───────────┐  ┌──────────┐   ┌──────────┐
    │  Payment  │  │  Product │   │   User   │
    │  Server   │  │  Server  │   │  Server  │
    └───────────┘  └──────────┘   └──────────┘
               Routing based on URL path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reading the URL means &lt;code&gt;/api/payments&lt;/code&gt; goes to the payments server and &lt;code&gt;/api/products&lt;/code&gt; goes to the product server. Reading cookies enables &lt;strong&gt;session persistence&lt;/strong&gt; — if a user's cart data lives only on Server A, L7 reads the user ID from the cookie and keeps sending that user back to Server A.&lt;/p&gt;
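
&lt;p&gt;In code, the difference is that the request has to be parsed before a decision exists. A sketch of L7-style dispatch, with hypothetical pool names and a made-up &lt;code&gt;server_id&lt;/code&gt; cookie:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical routing table: URL prefix to server pool.
ROUTES = {
    "/api/payments": "payment-pool",
    "/api/products": "product-pool",
}

def route(path: str, cookies: dict) -&gt; str:
    # Session persistence: a pinned server wins over path rules.
    if "server_id" in cookies:
        return cookies["server_id"]
    # Content-based routing: longest-prefix matching would be more
    # robust; first match keeps the sketch short.
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return pool
    return "default-pool"

print(route("/api/payments/123", {}))              # payment-pool
print(route("/api/products", {"server_id": "A"}))  # A (sticky)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;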

&lt;p&gt;L7 is a strategy that pays to close the information gap. Transaction Cost Theory from Part 2 applies here too. Acquiring information has a cost. Parsing headers, inspecting URLs, reading cookies — all of it is the price of knowing. In exchange for paying that cost, L7 can make decisions L4 simply cannot.&lt;/p&gt;

&lt;p&gt;The trade-off is structural. Every request gets parsed and interpreted. That overhead is categorically higher than L4. As traffic scales, the cost compounds.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  L4 vs L7 — Where the Bottleneck Is
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;L4 Load Balancer&lt;/th&gt;
      &lt;th&gt;L7 Load Balancer&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Sees&lt;/td&gt;
      &lt;td&gt;IP address, port number&lt;/td&gt;
      &lt;td&gt;HTTP headers, URL, cookies, request body&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Speed&lt;/td&gt;
      &lt;td&gt;Fast&lt;/td&gt;
      &lt;td&gt;Slower&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Routes by&lt;/td&gt;
      &lt;td&gt;Connection count, IP hash&lt;/td&gt;
      &lt;td&gt;URL path, cookies, headers&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Can do&lt;/td&gt;
      &lt;td&gt;Simple TCP distribution&lt;/td&gt;
      &lt;td&gt;Content-based routing&lt;br&gt;A/B testing&lt;br&gt;Session persistence&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Common use&lt;/td&gt;
      &lt;td&gt;
        &lt;span&gt;Game servers&lt;/span&gt;&lt;br&gt;
        &lt;span&gt;High-volume streaming&lt;/span&gt;
      &lt;/td&gt;
      &lt;td&gt;
        &lt;span&gt;API Gateway&lt;/span&gt;&lt;br&gt;
        &lt;span&gt;Microservices&lt;/span&gt;
      &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Goldratt's Theory of Constraints from Network Part 1 applies directly here. The constraint is never fixed — &lt;strong&gt;it's wherever the system is closest to 100% saturation&lt;/strong&gt;. The question of which OSI layer is the bottleneck becomes the question of which load balancer to use.&lt;/p&gt;

&lt;p&gt;Concurrent connections approaching the limit: L4. Requests that need to be routed based on their content: L7. In practice, many production systems layer both — L4 receives traffic first and distributes it across server groups, then L7 handles fine-grained routing within each group.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://www.haproxy.com/blog/layer-4-and-layer-7-proxy-mode" rel="noopener noreferrer"&gt;HAProxy Blog: Layer 4 and Layer 7 Proxy Mode&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Own the Information, or Let It Go
&lt;/h3&gt;

&lt;p&gt;Three systems. Same problem. Three different answers.&lt;/p&gt;

&lt;p&gt;DNS round-robin failed because it couldn't see server state. It rotated blind. The information gap distorted the distribution.&lt;/p&gt;

&lt;p&gt;L4 chose to give up information. It makes decisions without knowing the contents — and converts that ignorance directly into speed. The gap becomes an asset.&lt;/p&gt;

&lt;p&gt;L7 chose to buy information. It pays in parsing time and gets precision in return. The gap gets closed at a cost.&lt;/p&gt;

&lt;p&gt;What Akerlof showed wasn't that information gaps are inherently bad — it's that &lt;strong&gt;how you handle the gap is what determines the outcome&lt;/strong&gt;. Used car markets that ignored the gap collapsed. Markets that bridged it with warranties survived.&lt;/p&gt;

&lt;p&gt;Load balancers work the same way. Ignore the gap and you get DNS. Accept it and you get L4. Close it and you get L7.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The question isn't whether the information gap exists. It's what you do with it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the same structure that's run through every part of this series. Goldratt asked where the constraint is. Coase and Williamson explained the conditions under which paying transaction costs makes sense. Akerlof showed how information gaps split behavior. Different names across four parts — but the same question underneath.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where is the bottleneck right now, and what are you willing to give up to clear it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  The Bottom Line
&lt;/h3&gt;

&lt;p&gt;DNS round-robin assigns slots without knowing server state. Information asymmetry distorts the distribution. L4 gives up information and gets speed. L7 acquires information and gets precision. Each approach makes a different call on where to absorb the cost.&lt;/p&gt;

&lt;p&gt;Which layer you split traffic at isn't a technical preference — it's a trade-off decision. The same question this series has been asking from Part 1. Where is the constraint, and what do you give up to resolve it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Know where the bottleneck is, and you'll know where to split. Know how to handle the information gap, and you'll know how to split.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next up: everything covered so far — OSI layers, TCP handshake costs, HTTP evolution, load balancing — comes together in real systems. Three scenarios: an e-commerce platform, a live chat service, and a payment system. Where does the bottleneck form, and which choices resolve it?&lt;/p&gt;

</description>
      <category>network</category>
      <category>dns</category>
      <category>l4</category>
      <category>l7</category>
    </item>
    <item>
      <title>Network Part 3 - The Evolution of HTTP and the Cost of Every Trade-off</title>
      <dc:creator>Dayul Lee</dc:creator>
      <pubDate>Fri, 01 May 2026 08:57:17 +0000</pubDate>
      <link>https://dev.to/lukyday007/network-part-3-the-evolution-of-http-and-the-cost-of-every-trade-off-i7i</link>
      <guid>https://dev.to/lukyday007/network-part-3-the-evolution-of-http-and-the-cost-of-every-trade-off-i7i</guid>
      <description>&lt;p&gt;Published: April 25, 2026&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When the past answer becomes the present problem, we call it Path Dependency. TCP was designed in 1981. It solved the right problems for its time. By the time HTTP was carrying the modern web, that 40-year-old foundation was starting to show its age.&lt;/p&gt;

&lt;p&gt;Keep-Alive solved the contract problem. One connection, many requests. Cheaper by design. But the queue was still single-file. Fix the engine, and suddenly the road is the problem. That road was TCP.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP/1.1 — One Lane, No Exceptions
&lt;/h3&gt;

&lt;p&gt;HTTP/1.1 had one rigid rule: one connection handles one request at a time, in order.&lt;/p&gt;

&lt;p&gt;Keep-Alive meant you didn't have to renegotiate a new contract for every exchange. But the delivery itself was still sequential. One large image delayed at the front of the queue, and every lightweight text file behind it had to wait. The connection was alive — it just couldn't move two things at once. This is &lt;strong&gt;Head-of-Line Blocking (HOLB)&lt;/strong&gt;.&lt;/p&gt;
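
&lt;p&gt;The queueing math is easy to reproduce. A toy timeline, assuming one slow 3-second image ahead of two fast text files on a single HTTP/1.1 connection:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sequential service on one connection: every response waits for
# everything queued ahead of it. Durations are illustrative.
queue = [("big-image.jpg", 3.0), ("style.css", 0.1), ("app.js", 0.1)]

clock = 0.0
for name, duration in queue:
    clock += duration
    print(f"{name:13} done at t={clock:.1f}s")

# big-image.jpg done at t=3.0s
# style.css     done at t=3.1s   (0.1s of work, 3.0s of waiting)
# app.js        done at t=3.2s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;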

&lt;p&gt;Browsers tried to route around it by opening up to six parallel connections per server. But that just brought back the port exhaustion and handshake overhead from Part 2. The problem wasn't solved. It was transferred — into a different form of cost.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP/2 — Same Road, Different Lane
&lt;/h3&gt;

&lt;p&gt;Released in 2015, HTTP/2 attacked HOL Blocking at the application layer with &lt;strong&gt;multiplexing&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of sending whole files in sequence, HTTP/2 breaks requests and responses into small frames and interleaves them over a single connection. Small payloads can slip through between chunks of a large one. On paper, it looked like true parallel processing.&lt;/p&gt;

&lt;p&gt;But TCP was still underneath. And TCP is obsessed with order.&lt;/p&gt;

&lt;p&gt;If a single packet is lost in transit, TCP halts everything — including frames from entirely unrelated requests — until that packet is retransmitted and order is restored. HTTP/2 had widened the lanes at the application layer. The transport layer's old rules froze them anyway.&lt;/p&gt;

&lt;p&gt;This is Path Dependency in action. &lt;strong&gt;The choice to build HTTP on top of TCP meant every improvement had to work within TCP's constraints.&lt;/strong&gt; HTTP/2 was a sustaining innovation — the best possible improvement within the existing system. But it couldn't break the structural ceiling, because the ceiling was TCP itself.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://web.dev/articles/performance-http2" rel="noopener noreferrer"&gt;web.dev: HTTP/2&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
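
&lt;p&gt;The negotiation is observable from Python. A sketch using the third-party &lt;code&gt;httpx&lt;/code&gt; client, assuming it's installed with its HTTP/2 extra (&lt;code&gt;pip install httpx[http2]&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import httpx

# Offer HTTP/2 during the TLS handshake (ALPN); the server picks
# the best version it speaks.
with httpx.Client(http2=True) as client:
    resp = client.get("https://www.cloudflare.com/")
    print(resp.http_version)  # "HTTP/2" if negotiated, else "HTTP/1.1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;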

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP/3 — Cutting the Path
&lt;/h3&gt;

&lt;p&gt;Google made the call: ditch TCP entirely.&lt;/p&gt;

&lt;p&gt;But this is where a common misconception takes hold. Dropping TCP didn't mean dropping reliability. It meant replacing everything TCP did with something that did it better.&lt;/p&gt;

&lt;p&gt;The new foundation is UDP. Unlike TCP, UDP makes no guarantees — no ordering, no retransmission, no reliability. It just fires packets and moves on. Google built QUIC directly on top of that bare foundation — taking on everything TCP and TLS used to handle (retransmission, connection management, encryption), but doing it per stream. One stream stalls, the rest keep moving. HOL Blocking, gone at the root.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐        ┌─────────────────┐
│      Before     │        │      After      │
├─────────────────┤        ├─────────────────┤
│   HTTP/1.1·2    │        │     HTTP/3      │
│        ↕        │    →   │        ↕        │
│       TCP       │        │      QUIC       │
│        ↕        │        │        ↕        │
│       IP        │        │       UDP       │
└─────────────────┘        └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is why HTTP/3 is called "UDP-based." QUIC sits on top of UDP, and HTTP/3 runs on top of QUIC. TCP wasn't abandoned — its role was replaced.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why YouTube and Zoom were already on UDP&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Real-time streaming services made this call long ago. When a packet drops during a video call, waiting for TCP to retransmit it freezes the screen. It's better to skip that moment and move to the next frame. When continuity matters more than completeness, TCP's reliability becomes a liability.&lt;/p&gt;

&lt;p&gt;QUIC brought that same instinct to general web traffic. A lost packet stalls only the stream it belongs to. Everything else keeps moving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;0-RTT — Cutting the cost of security too&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The 1.5 RTT cost from Part 2 was just the TCP handshake. Add HTTPS, and TLS negotiation stacks on top. One connection, up to 3 RTT before a single byte of data moves.&lt;/p&gt;

&lt;p&gt;QUIC has TLS 1.3 built in — connection and security negotiation happen simultaneously. Return visits skip the handshake entirely. The 1.5 RTT from Part 2 collapses to zero.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────┬────────────────────┐
│     TCP + TLS       │        QUIC        │
├─────────────────────┼────────────────────┤
│    TCP   1.5 RTT    │ First visit 1 RTT  │
│    TLS   1.5 RTT    │ Return visit 0 RTT │
│    ──────────────   │                    │
│    Total  3 RTT     │                    │
└─────────────────────┴────────────────────┘

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
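
&lt;p&gt;Plugging the 150ms Seoul-to-US round trip from Part 2 into those columns makes the gap concrete:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RTT_MS = 150  # Seoul to a US server, per Part 2

setups = {
    "TCP + TLS (3 RTT)":         3.0 * RTT_MS,
    "QUIC first visit (1 RTT)":  1.0 * RTT_MS,
    "QUIC return visit (0-RTT)": 0.0 * RTT_MS,
}
for name, cost in setups.items():
    print(f"{name:28} {cost:5.0f} ms before any payload moves")
# 450 ms vs 150 ms vs 0 ms -- all spent on process, not data.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;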



&lt;p&gt;The trade-off is real. By abandoning TCP, the application now owns packet loss handling, connection reliability, and security. Simpler to use. Far more complex underneath.&lt;/p&gt;

&lt;p&gt;This is what the Innovator's Dilemma calls disruptive innovation. Not an improvement on what existed — a replacement of it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://www.chromium.org/quic/" rel="noopener noreferrer"&gt;Chromium: QUIC&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;ref. &lt;a href="https://www.rfc-editor.org/rfc/rfc9000" rel="noopener noreferrer"&gt;RFC 9000: QUIC&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  How the Three Versions Compare
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP/1.1  — Requests must wait in line
time →  t1     t2     t3     t4     t5
R1     [====] [====]
R2                   [====] [====]
R3                                 [====]

R2 can't start until R1 finishes. R3 can't start until R2 finishes.
──────────────────────────────────────────
HTTP/2  — One lost packet freezes everything
time →  t1     t2     t3     t4     t5
R1      [====] [====] [====]
R2      [====] ✕ ← packet lost
R3      [====]        ← waiting...  ← waiting...

When ✕ occurs, R1, R2, and R3 all freeze.
──────────────────────────────────────────
HTTP/3  — Only the affected stream pauses
time →  t1     t2     t3     t4     t5
R1      [====] [====] [====] [====]
R2      [====] ✕ ← packet lost        [retransmit]
R3      [====] [====] [====] [====] ← keeps moving

When ✕ occurs, only R2 pauses. R1 and R3 continue.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each version made a different call on where to absorb the cost.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Path Dependency and the Innovator's Dilemma
&lt;/h3&gt;

&lt;p&gt;The evolution of HTTP is a case study in two management concepts that explain why the right answer takes so long to arrive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path Dependency.&lt;/strong&gt; Every router and firewall on the internet was optimized for TCP. Even when better alternatives existed, leaving TCP behind wasn't just a technical decision — it was an infrastructure negotiation with the entire global network. A choice made in 1981 constrained technical decisions well into the 2020s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Innovator's Dilemma.&lt;/strong&gt; HTTP/1.1 to HTTP/2 was sustaining innovation — improving performance while staying within the existing system. Safe, but structurally limited. HTTP/3 was disruptive innovation — abandoning the standard entirely and rebuilding on a new foundation. Risky, but the only way to break the ceiling.&lt;/p&gt;

&lt;p&gt;Where the two concepts meet is HTTP/3 itself. Path Dependency explains why it took 40 years. The Innovator's Dilemma explains why it had to happen at all.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://www.hbs.edu/faculty/Pages/item.aspx?num=46" rel="noopener noreferrer"&gt;Clayton Christensen: The Innovator's Dilemma&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  The Bottom Line
&lt;/h3&gt;

&lt;p&gt;HTTP/1.1 trimmed the negotiation fee. HTTP/2 increased the transaction density. HTTP/3 rejected the legacy constraints entirely to win back speed. The evolution of protocols isn't a search for the right answer. It's a history of choosing the best trade-off for the era.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Improve along the path, or change the path itself. That question doesn't only apply to protocols.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next up: even with HTTP/3 handling requests efficiently, traffic still has to land somewhere. When tens of thousands of requests arrive at once, something has to decide where they go. That's the load balancer — and the choice between L4 and L7 turns out to be another trade-off worth understanding.&lt;/p&gt;

</description>
      <category>largescale</category>
      <category>network</category>
      <category>backend</category>
      <category>http</category>
    </item>
    <item>
      <title>Network Part 2 - The Cost of a TCP Handshake</title>
      <dc:creator>Dayul Lee</dc:creator>
      <pubDate>Fri, 01 May 2026 08:50:01 +0000</pubDate>
      <link>https://dev.to/lukyday007/prologue-what-is-large-scale-processing-1jd9</link>
      <guid>https://dev.to/lukyday007/prologue-what-is-large-scale-processing-1jd9</guid>
      <description>&lt;p&gt;Published: April 13, 2026&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Picture a ticket drop. Tens of thousands of people clicking at the same moment. The server collapses — before a single byte of real data has moved. That's the strange part. Nothing was actually exchanged yet. So what wore the server out?&lt;/p&gt;

&lt;p&gt;There's work that has to happen before the data can flow. Until that work is done, the transaction hasn't started. It can't.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;Every transaction has a setup cost. Verifying the other party. Aligning on terms. Confirming readiness on both sides. The more a transaction depends on trust, the more that setup costs.&lt;/p&gt;

&lt;p&gt;Networks are no different. Before data can move reliably, both sides have to establish a shared understanding — that packets will arrive, that order will be preserved, that nothing will go missing without a response. None of that is free. It takes time. And that time adds up faster than most people expect.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  TCP's Call — Contract Before Data
&lt;/h3&gt;

&lt;p&gt;TCP made a deliberate choice: &lt;strong&gt;no data moves until a contract is in place.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both sides exchange signals to confirm they're ready. Until that exchange completes, not a single byte of actual payload is transmitted. The entire window is spent on process. On paperwork.&lt;/p&gt;

&lt;p&gt;That's the handshake. Trust purchased at the cost of speed.&lt;/p&gt;

&lt;p&gt;The deeper problem is that this contract doesn't carry over. Every new connection starts from scratch. One user, one handshake — manageable. Ten thousand users hitting the server simultaneously — the setup cost alone is enough to bring it down, before the real work has even begun.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client                            Server
  |                                 |
  |  ———————————— SYN ——————————&amp;gt;   |  "Can we connect?"
  |                                 |
  |  &amp;lt;————————— SYN-ACK —————————   |  "Yes. Are you ready?"
  |                                 |
  |  ———————————— ACK ——————————&amp;gt;   |  "Ready. Let's go."
  |                                 |
  |       [ Data transfer ]         |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three signals. Only then does data flow. SYN and SYN-ACK are the negotiation. ACK is the signature. The actual transaction — the data — doesn't start until all three are done.&lt;/p&gt;
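
&lt;p&gt;The setup cost is measurable from any client. A sketch that times only the handshake, with &lt;code&gt;example.com&lt;/code&gt; as a placeholder host:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import socket
import time

start = time.perf_counter()
# create_connection() returns once the client's side of the
# SYN / SYN-ACK / ACK exchange is complete -- pure setup,
# zero bytes of payload.
conn = socket.create_connection(("example.com", 80), timeout=5)
elapsed_ms = (time.perf_counter() - start) * 1000
conn.close()
print(f"handshake took {elapsed_ms:.1f} ms before any data could move")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;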

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  The Bill Comes Twice
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Opening the connection — RTT&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The time it takes to complete the handshake is tied to physical distance. Seoul to a US server: roughly 150ms per round trip. That's RTT — Round Trip Time.&lt;/p&gt;

&lt;p&gt;TCP needs at least 1.5 round trips before the server receives the first byte of data. From the moment a user clicks to the moment the server registers what was sent: &lt;strong&gt;225ms (150ms × 1.5)&lt;/strong&gt; is already gone.&lt;/p&gt;

&lt;p&gt;One request, 225ms. Ten thousand concurrent users, ten thousand instances of that cost. No amount of server-side optimization touches it. RTT is a fixed cost — locked to physics, not infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Closing the connection — TIME_WAIT&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The bill doesn't stop when the connection ends. TCP holds a closed port in TIME_WAIT for up to two minutes. The reason is defensive: late-arriving packets from the old connection shouldn't collide with a new one using the same port.&lt;/p&gt;

&lt;p&gt;From Part 1: roughly 28,000 ports are available. A server closing 500 connections per second holds 60,000 ports in TIME_WAIT at steady state (500/s × 120s). That's more than double the limit. New connections stop being possible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;After connection closes:

Port 5001  [TIME_WAIT ——————————— 2 min ———————————]
Port 5002  [TIME_WAIT ——————————— 2 min ———————————]
Port 5003  [TIME_WAIT ——————————— 2 min ———————————]
  ...
Port 5028  [TIME_WAIT ——————————— 2 min ———————————]

→ 28,000 ports exhausted. No new connections accepted.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
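
&lt;p&gt;The arithmetic behind that diagram, using the article's own figures:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;USABLE_PORTS = 28_000    # realistic client port range (Part 1)
TIME_WAIT_S = 120        # how long a closed port stays reserved

closes_per_second = 500
parked = closes_per_second * TIME_WAIT_S  # ports stuck at steady state
print(parked)                             # 60000 -- over double the pool

# The pool actually runs dry at 28000 / 500 = 56 seconds,
# long before the first TIME_WAIT port is released at t=120s.
print(USABLE_PORTS / closes_per_second)   # 56.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;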



&lt;p&gt;RTT is the cost of opening. TIME_WAIT is the cost of closing. TCP charges on both ends.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://www.cloudflare.com/learning/cdn/glossary/round-trip-time-rtt/" rel="noopener noreferrer"&gt;Cloudflare Learning: What is round-trip time?&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;ref. &lt;a href="https://www.cloudflare.com/learning/ddos/glossary/tcp-ip/" rel="noopener noreferrer"&gt;Cloudflare Learning: What is TCP/IP?&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Transaction Cost Theory and Keep-Alive
&lt;/h3&gt;

&lt;p&gt;In 1937, Ronald Coase asked why firms exist at all if markets are so efficient. His answer was simple: every market transaction carries a hidden tax — the cost of searching for partners, negotiating terms, and signing contracts. TCP is a network implementation of that idea. Every connection comes with a negotiation fee. And Transaction Cost Theory points straight at the remedy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://www.ubs.com/microsites/nobel-perspectives/en/laureates/oliver-williamson.html" rel="noopener noreferrer"&gt;UBS Nobel Perspectives: Oliver Williamson — Transaction Cost Theory&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The most effective way to reduce transaction costs is to reduce the number of transactions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not faster contracts — fewer contracts. That's the logic behind HTTP Keep-Alive. Instead of opening and closing a connection for every request, Keep-Alive holds the connection open across multiple requests. The handshake cost gets distributed — not paid once per request, but once per session.&lt;/p&gt;
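
&lt;p&gt;This is what mainstream HTTP clients do by default. A sketch with the third-party &lt;code&gt;requests&lt;/code&gt; library:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

# One Session = one pooled, persistent connection per host.
# The TCP (and TLS) handshake is paid once, then amortized.
with requests.Session() as session:
    for i in range(10):
        resp = session.get("https://example.com/")
        print(i, resp.status_code)

# Calling requests.get() in a loop instead would open a fresh
# connection each time -- ten handshakes instead of one.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;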

&lt;p&gt;Keep-Alive solved the contract problem. But it didn't solve everything. Even inside a persistent connection, there was still a hard rule: requests had to be handled in the order they arrived. Fix the negotiation fee, and suddenly the queue itself becomes the bottleneck.&lt;/p&gt;

&lt;p&gt;That's where the next part picks up.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  The Bottom Line
&lt;/h3&gt;

&lt;p&gt;The TCP handshake is the price of trust. RTT is what you pay to open a connection. TIME_WAIT is what you owe after closing one. In the language of Transaction Cost Theory, TCP is a protocol that charges a negotiation fee on every single connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The optimization insight isn't "connect faster." It's "connect less." HTTP has been moving in that direction ever since.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next up: HTTP/1.1 solved the connection frequency problem with Keep-Alive. Then it ran into a different wall. One queue, no passing. Fix the engine — and suddenly the road is one lane.&lt;/p&gt;

</description>
      <category>largescale</category>
      <category>network</category>
      <category>backend</category>
      <category>tcp</category>
    </item>
    <item>
      <title>Network Part 1 - The OSI Model as a Fault Map</title>
      <dc:creator>Dayul Lee</dc:creator>
      <pubDate>Fri, 01 May 2026 07:09:21 +0000</pubDate>
      <link>https://dev.to/lukyday007/network-part-1-the-osi-model-as-a-fault-map-2846</link>
      <guid>https://dev.to/lukyday007/network-part-1-the-osi-model-as-a-fault-map-2846</guid>
      <description>&lt;p&gt;Published: March 27, 2026&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In a previous post, we watched a single DNS misconfiguration on one AWS server bring 3,500 companies across 60 countries to a standstill. DNS lives at Layer 7. The failure started there.&lt;/p&gt;

&lt;p&gt;This kind of thing repeats. On June 21, 2022, a misconfigured BGP route change at Cloudflare took down the 19 data centers that carry a large share of its global traffic. No server was overloaded. No application deploy had gone wrong. Packets simply had no route left to follow. This time, the failure was at Layer 3.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both incidents share one thing: it took far too long to find the cause. Because no one knew which layer had failed.&lt;/p&gt;

&lt;p&gt;The OSI model is not a taxonomy for networking textbooks. &lt;strong&gt;It's a fault map — a way to pinpoint exactly where a system breaks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://blog.cloudflare.com/cloudflare-outage-on-june-21-2022/" rel="noopener noreferrer"&gt;Cloudflare Blog: Cloudflare outage on June 21, 2022&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;



&lt;h3&gt;
  
  
  Why the Layers Don't Talk to Each Other
&lt;/h3&gt;

&lt;p&gt;Before the fault map makes sense, this question needs an answer. Why does the OSI model split into 7 layers at all? Wouldn't it be more efficient if each layer could see what the others were doing?&lt;/p&gt;

&lt;p&gt;In 1968, software engineer Melvin Conway proposed what has since become foundational in systems design:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure."&lt;/em&gt; — Conway's Law&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The OSI model is that principle applied to network architecture. Each layer communicates only through a defined interface. Internal implementation stays private. Layer 4 has no idea whether Layer 7 is speaking HTTP or gRPC. Layer 3 doesn't know — or care — whether Layer 4 is TCP or UDP.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;deliberate ignorance&lt;/strong&gt;. And that ignorance produces two trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Freedom to change:&lt;/strong&gt; Migrating from HTTP/1.1 to HTTP/2 happens entirely within Layer 7. Everything below stays untouched. The layers are decoupled by design.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fault isolation:&lt;/strong&gt; A routing failure at Layer 3 has no bearing on your application logic at Layer 7. The blast radius is contained to one layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's why the Cloudflare outage could be called "a Layer 3 problem" immediately. Without the layered design, the cause would have been buried somewhere in the full stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Each layer chose not to know the others. That's exactly what makes it possible to know which layer broke.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://martinfowler.com/bliki/ConwaysLaw.html" rel="noopener noreferrer"&gt;Martin Fowler: Conway's Law&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;



&lt;h3&gt;
  
  
  Every Layer Has Its Own Breaking Point
&lt;/h3&gt;

&lt;p&gt;Goldratt's Theory of Constraints is direct: the output of any system is capped by its weakest link. Networks are no exception. But the &lt;em&gt;nature&lt;/em&gt; of the bottleneck changes depending on which layer you're looking at.&lt;/p&gt;

&lt;p&gt;Packets travel down from L7 to L1 on the sender's side — each layer wrapping the data in its own envelope. On the receiving end, they unwrap back up from L1 to L7. Seven layers. Seven handoffs. Under high-volume traffic, one of those handoffs will crack first. The question is which one, and why.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;L4 — Speed was the goal. Awareness was the price.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Layer 4 is deliberately blind to content. It sees an IP address, a port number, a protocol — TCP or UDP — and nothing else. It never opens the packet. Think of it as a courier that delivers sealed envelopes without knowing what's inside. That's why it's fast.&lt;/p&gt;

&lt;p&gt;But that choice has structural consequences. Every TCP connection occupies a port. Port numbers top out at 65,535 — with a realistic working range of around 28,000. Once concurrent connections hit that ceiling, the system stops accepting new ones. No exceptions.&lt;/p&gt;
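
&lt;p&gt;That ~28,000 figure isn't arbitrary. It falls out of the default client-side port range on Linux, shown here as an assumption since the range is tunable:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Default Linux ephemeral port range, as reported by
# /proc/sys/net/ipv4/ip_local_port_range on an untuned kernel.
low, high = 32768, 60999
print(high - low + 1)  # 28232 ports available for outbound connections
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;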

&lt;p&gt;&lt;strong&gt;L4's bottleneck is connection count.&lt;/strong&gt; Ticketing drops, flash sales, live-streamed events — any scenario where thousands of users connect simultaneously runs straight into this wall.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;L7 — Awareness was the goal. Speed was the price.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Layer 7 sees everything: HTTP headers, URL paths, cookies, request bodies. It reads the packet, understands the context, and makes decisions accordingly. That's enormously powerful.&lt;/p&gt;

&lt;p&gt;But that knowledge is expensive. Parsing takes time. Authentication takes time. Decompression, routing logic, business rules — they all stack. The per-request Logic Latency at L7 is higher than anywhere below it by design. As traffic scales, those costs don't just add — they compound.&lt;/p&gt;

&lt;p&gt;L4 stays blind and stays fast. L7 stays aware and pays for it. Neither is a flawed design. They made different trade-offs.&lt;/p&gt;



&lt;p&gt;Pull back to all seven layers, and the picture looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         Rate of Saturation → 100%
L7  [████████████░░]  Logic Latency spikes   ← felt first
L4  [███████░░░░░░░]  Concurrency ceiling
L3  [█████░░░░░░░░░]  Routing overhead
L1  [███░░░░░░░░░░░]  Throughput saturation  ← when this goes, everything goes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;L7 hits the wall first. L1 going down means nothing gets through at all. Under high-volume load, there's only one question that matters: &lt;strong&gt;which layer is closest to 100% Saturation right now?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How to resolve L4 and L7 bottlenecks in practice — that's Part 4 (Load Balancers).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://sre.google/sre-book/monitoring-distributed-systems/" rel="noopener noreferrer"&gt;Google SRE Book: Monitoring Distributed Systems&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;ref. &lt;a href="https://www.rfc-editor.org/rfc/rfc793" rel="noopener noreferrer"&gt;RFC 793: Transmission Control Protocol&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;



&lt;h3&gt;
  
  
  The Bottom Line
&lt;/h3&gt;

&lt;p&gt;The OSI model isn't a protocol classification system. Each layer is an independent failure candidate with its own breaking point. And the reason those layers exist in the first place is itself a trade-off — give up awareness to gain speed, or give up speed to gain awareness.&lt;/p&gt;

&lt;p&gt;The layer where Saturation hits 100% first is the constraint. The boundaries between layers are what make that constraint findable — and fixable — without touching everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engineers who understand this don't panic when something breaks. They don't touch the whole system. They ask which layer. Then they fix that layer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next up: Layer 4, up close. We'll look at the hidden cost of TCP's 3-way handshake — the process every connection must complete before a single byte of real data moves. Under load, that turns out to be anything but cheap.&lt;/p&gt;

</description>
      <category>osi7</category>
      <category>network</category>
      <category>backend</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Prologue - What is Large-scale Processing?</title>
      <dc:creator>Dayul Lee</dc:creator>
      <pubDate>Fri, 01 May 2026 06:58:27 +0000</pubDate>
      <link>https://dev.to/lukyday007/prologue-what-is-large-scale-processing-ma1</link>
      <guid>https://dev.to/lukyday007/prologue-what-is-large-scale-processing-ma1</guid>
      <description>&lt;p&gt;Published: March 18, 2026&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;3 AM, October 2025. A single DNS configuration error on an AWS server brought Snapchat, Roblox, and McDonald's to a standstill. 3,500 companies across 60 countries were stopped cold by one small crack.&lt;/p&gt;

&lt;p&gt;Systems are far more fragile than we think. Large-scale processing isn't a trend about boosting server specs. It's the engineering discipline that keeps services alive at the edge of their limits.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;So where does "large-scale" actually begin? 10,000 users? A million? That's the wrong question. Large-scale isn't a number. &lt;strong&gt;It's the moment a system hits the ceiling of its available resources.&lt;/strong&gt; That's why what's a normal Tuesday for Amazon can be a catastrophe for a growing startup.&lt;/p&gt;

&lt;p&gt;This series is about how to detect that ceiling, understand why systems break, and build things that hold.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  The Signals Before a System Breaks
&lt;/h3&gt;

&lt;p&gt;Systems don't collapse without warning. There are always signs. The Google SRE team calls them the &lt;strong&gt;Four Golden Signals&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://sre.google/sre-book/monitoring-distributed-systems/" rel="noopener noreferrer"&gt;Google SRE: Monitoring Distributed Systems&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; How long does it take to handle a request? A gap between successful and failed response times is often the first sign something's wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traffic:&lt;/strong&gt; How much demand is hitting the system right now? Think RPS — requests per second.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Errors:&lt;/strong&gt; How many requests are failing? Explicit 500s, silent wrong responses — both count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Saturation:&lt;/strong&gt; How "full" is the system? This is the most direct signal of large-scale stress. When latency starts climbing, saturation is usually already on its way up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any one of these looks off, the system is already approaching its limit.&lt;/p&gt;
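
&lt;p&gt;The first three signals can be read off a plain request log. A toy sketch over hypothetical records of (duration in seconds, HTTP status):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# One second's worth of hypothetical request records.
window = [(0.12, 200), (0.10, 200), (2.30, 500), (0.11, 200), (1.90, 500)]

ok     = [d for d, s in window if s &lt; 500]
failed = [d for d, s in window if s &gt;= 500]

print("traffic:", len(window), "rps")                # Traffic
print(f"errors:  {len(failed) / len(window):.0%}")   # Errors
print(f"latency: ok avg {sum(ok)/len(ok):.2f}s, "    # Latency gap
      f"failed avg {sum(failed)/len(failed):.2f}s")
# Saturation is the exception: it comes from the resource side
# (CPU, memory, connection pools), not from the request log.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;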

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  So What Exactly is "Large"?
&lt;/h3&gt;

&lt;p&gt;The Golden Signals tell you the &lt;em&gt;state&lt;/em&gt; of a system. But to actually fix things, you need to understand the &lt;em&gt;nature&lt;/em&gt; of the load. The same word — "large-scale" — means something completely different depending on what's overwhelming the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traffic (Too many requests)&lt;/strong&gt;&lt;br&gt;
How many requests per unit time? How many connections can the system hold?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TPS / QPS:&lt;/strong&gt; Transactions or queries per second. The real measure of system throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency:&lt;/strong&gt; Simultaneous active connections. The deciding factor during flash sales or ticketing rushes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Volume (Too much data)&lt;/strong&gt;&lt;br&gt;
How large is the data, and how fast does it need to move?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Throughput:&lt;/strong&gt; Data transferred per second (MB/s). The usual bottleneck in video streaming or large file uploads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Complexity (Too hard to process)&lt;/strong&gt;&lt;br&gt;
How much computation does a single request require? How many systems does it touch?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logic Latency:&lt;/strong&gt; The more complex the logic, the slower the response — and the faster saturation spikes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real outages usually involve all three at once. But if you can't separate the causes, you can't fix them.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Where Business Thinking Meets Engineering
&lt;/h3&gt;

&lt;p&gt;Picture a factory floor. One slow machine holds up the entire line. It doesn't matter how fast everything else runs.&lt;/p&gt;

&lt;p&gt;Eliyahu M. Goldratt formalized this as the &lt;strong&gt;Theory of Constraints (TOC)&lt;/strong&gt;: &lt;em&gt;"The throughput of any system is determined by its weakest link — the constraint."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://www.lean.org/lexicon-terms/theory-of-constraints/" rel="noopener noreferrer"&gt;Lean Enterprise Institute: TOC&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Servers work the same way. &lt;strong&gt;The point where Saturation hits 100% first — that's the Bottleneck.&lt;/strong&gt; Large-scale engineering is about finding which component saturates first as traffic grows, then eliminating that constraint with the right strategy.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  When You Hit a Wall
&lt;/h3&gt;

&lt;p&gt;Once you've found the bottleneck, you need to increase capacity. There are two ways to do it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://www.geeksforgeeks.org/overview-of-scaling-vertical-and-horizontal-scaling/" rel="noopener noreferrer"&gt;GeeksforGeeks: Vertical and Horizontal Scaling&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vertical Scaling (Scale-up):&lt;/strong&gt; Upgrade the single node — more CPU, more RAM. Fast to implement, but there's a ceiling. And it's expensive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal Scaling (Scale-out):&lt;/strong&gt; Add more nodes and distribute the load. More complex, but theoretically limitless.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  [Single Server]         [Multiple Servers]

  ┌─────────────┐         ┌───┐ ┌───┐ ┌───┐
  │   CPU ↑↑↑   │         │ S │ │ S │ │ S │
  │   RAM ↑↑↑   │    →    │ 1 │ │ 2 │ │ 3 │
  │   SSD ↑↑↑   │         └───┘ └───┘ └───┘
  └─────────────┘           Load Balancer

      Scale-up                Scale-out
    (Has limits)        (Infinitely expandable)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scale-up buys simplicity at the cost of a ceiling.&lt;br&gt;&lt;br&gt;
Scale-out removes the ceiling at the cost of complexity.&lt;br&gt;&lt;br&gt;
Neither is the right answer. There's only the right trade-off for the constraint you're solving.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  What's Next
&lt;/h3&gt;

&lt;p&gt;At the end of the day, large-scale processing is &lt;strong&gt;Strategic Bottleneck Management&lt;/strong&gt; — controlling &lt;strong&gt;Latency&lt;/strong&gt; and &lt;strong&gt;Errors&lt;/strong&gt; by managing &lt;strong&gt;Saturation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ref. &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/performance-efficiency-pillar/welcome.html" rel="noopener noreferrer"&gt;AWS Well-Architected Framework&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Next up: a single HTTP request makes its way to a server by passing through 7 layers — the &lt;strong&gt;OSI model&lt;/strong&gt;. We'll trace that journey and see exactly where large-scale traffic creates bottlenecks at each layer, and what engineers have done about it.&lt;/p&gt;

</description>
      <category>largescale</category>
      <category>network</category>
      <category>backend</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
