Naz Quadri

Posted on Mar 31 • Originally published at nazquadri.dev

What "Connected" Means in TCP

#linux #programming #systems #tutorial

The Three-Packet Handshake Between Two Kernels Who've Never Met

Reading time: ~13 minutes

You called connect(). Your code moved on. You're "connected."

Nothing physical connected. No wire was plugged in. No circuit was closed. Three packets flew across the network and landed in two kernel data structures — a hash table entry on your machine and a hash table entry on the server — and that's it. That's the whole "connection." A gentleman's agreement between two kernels who've never met, maintained by nothing more than both sides keeping their word.

The moment either kernel loses that state — crash, memory pressure, a firewall that forgets to tell anyone — your "connection" evaporates. The wire is still there. The bytes stop flowing.

Most of us debug socket code for years without understanding what connect() actually does. Every mysterious hang is a debt being collected. Let's fix it.

The State Machine Behind `connect()`

The kernel maintains a state machine for every TCP connection. You've probably seen a diagram of it in a textbook and immediately forgotten it. That's fair — the full diagram has eleven states and looks like a fire escape route — which it is, in a way. Every state represents a moment when one side might crash and the other needs a graceful exit. RFC 793 defined all eleven in 1981.

But here's the part that matters: when you call connect(), your kernel doesn't "connect" to anything. It starts a negotiation. A very specific, three-packet negotiation called the handshake.

Here's what actually happens:

Your machine                    Remote machine
     │                               │
     │  ──── SYN ──────────────────► │   "I want to talk. My sequence starts at X."
     │                               │
     │  ◄─── SYN-ACK ─────────────── │   "OK. Mine starts at Y. I got your X."
     │                               │
     │  ──── ACK ──────────────────► │   "Got it. Now we're both synchronized."
     │                               │
   ESTABLISHED                  ESTABLISHED

Three packets. That's the whole handshake.

The SYN packet contains a randomly chosen sequence number — a 32-bit integer that will be used to label every byte your machine sends. The server responds with its own SYN (it needs to start its own sequence), plus an ACK for yours. Your ACK completes the circle.
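The numbering in those three packets is plain arithmetic, and it can be sketched in a few lines. This is a toy model, not kernel code: `Segment` and `three_way_handshake` are hypothetical names made up for illustration. The one protocol fact it relies on is that a SYN consumes one sequence number, so each side acknowledges the other's starting value plus one, modulo 2^32.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional

SEQ_MOD = 2**32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    """Hypothetical, stripped-down view of a TCP header."""
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three handshake segments with their seq/ack fields."""
    syn = Segment("SYN", seq=client_isn)
    # The SYN itself occupies one sequence number, hence the +1.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=(client_isn + 1) % SEQ_MOD)
    ack = Segment("ACK", seq=(client_isn + 1) % SEQ_MOD, ack=(server_isn + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


x = secrets.randbelow(SEQ_MOD)  # stand-in for the client's random starting number
y = secrets.randbelow(SEQ_MOD)  # stand-in for the server's random starting number
segments = three_way_handshake(x, y)
for seg in segments:
    print(seg)
```

The X and Y in the diagram are `x` and `y` here. Real segments carry many more fields (ports, window, options) that this sketch leaves out.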

After those three packets, both kernels have entered the ESTABLISHED state in their connection tables. The "connection" now exists.
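You can watch the kernels do this without your code's help. The sketch below assumes a loopback listener on a kernel-chosen port; the variable names are just for illustration. The point is that connect() succeeds before the server program has called accept(), because the server's kernel answered the SYN on its own.

```python
import socket

# A listening socket on loopback; port 0 lets the kernel pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_addr = server.getsockname()

# connect() returns once the handshake is done. The server program has
# not called accept() yet: its kernel completed the handshake for it.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(5)
client.connect(server_addr)
connected_before_accept = client.getpeername() == server_addr
client_local = client.getsockname()

# accept() only hands over a connection that is already established.
conn, client_addr = server.accept()
print("connected before accept():", connected_before_accept)

for s in (conn, client, server):
    s.close()
```

accept() doesn't create the connection. It dequeues one that the kernel already finished negotiating.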

A Pair of Hash Table Entries

What does that state actually look like? On Linux, every socket is a file descriptor — the same integer handle the kernel uses for files, pipes, and everything else. The kernel maintains a hash table of sockets keyed by a four-tuple:

(source IP, source port, destination IP, destination port)

This four-tuple uniquely identifies a connection. Your machine can have thousands of connections to the same server on the same port — as long as the source port is different, they're different entries in the hash table.

When connect() returns, there is one entry in your kernel's connection table and one in the server's. That's your "connection." It's not a circuit. It's not a reserved channel. It's two structs in two hash tables on two machines, agreeing to honor a sequence number protocol.
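To make the four-tuple concrete, here is a small sketch that opens two connections to the same listener and prints how each looks from the client side. It assumes a loopback listener, and `four_tuple` is a hypothetical helper, not a library function.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(2)
server_addr = server.getsockname()


def four_tuple(sock: socket.socket) -> tuple:
    """(source IP, source port, destination IP, destination port), seen locally."""
    return sock.getsockname() + sock.getpeername()


# Two connections to the same server address and port.
clients = [socket.create_connection(server_addr, timeout=5) for _ in range(2)]
tuples = [four_tuple(c) for c in clients]
for t in tuples:
    print(t)

for c in clients:
    c.close()
server.close()
```

The two printed tuples should match in every field except the source port, which the kernel picked from the ephemeral range for each connect().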

When you're not sending anything, nothing is happening on the wire. The connection just... sits there. In RAM. Two kernels keeping state about each other.

Why Sequence Numbers Exist

The original problem TCP solved is: the internet is unreliable. Packets get dropped. They arrive out of order. Routers duplicate them. They might take wildly different paths.

IP doesn't care. IP's job is to route packets best-effort and move on. TCP's job is to build a reliable, ordered byte stream on top of that chaos.

The way it does this is with sequence numbers. Every byte you send has a position in the stream. If packet 3 arrives before packet 2, the receiver buffers it and waits. If packet 2 never arrives, the receiver keeps acknowledging only the data it has received in order, and the sender retransmits the gap when those repeated ACKs pile up or its retransmission timer fires. If the same data arrives twice (duplicated in transit), the sequence number identifies the duplicate and it gets discarded.
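The receiver's side of that bookkeeping fits in a toy class. This is a deliberately simplified sketch: `Reassembler` is a hypothetical name, it ignores sequence-number wraparound and partially overlapping segments, and a real kernel does far more. It does show the core idea: buffer what arrives early, drop what you already have, and acknowledge only the contiguous prefix.

```python
class Reassembler:
    """Toy receiver: rebuilds an ordered byte stream from (seq, payload) segments."""

    def __init__(self, initial_seq: int = 0):
        self.next_seq = initial_seq   # next byte position we expect
        self.pending = {}             # out-of-order segments, keyed by seq
        self.stream = bytearray()     # bytes handed to the application

    def receive(self, seq: int, payload: bytes) -> int:
        """Accept one segment and return the cumulative ACK number."""
        if seq + len(payload) <= self.next_seq:
            return self.next_seq      # duplicate of delivered data: discard
        self.pending.setdefault(seq, payload)
        # Deliver every buffered segment that is now contiguous.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.stream += chunk
            self.next_seq += len(chunk)
        return self.next_seq


rx = Reassembler()
acks = [
    rx.receive(0, b"hell"),   # in order
    rx.receive(8, b"rld"),    # arrives early: buffered, ACK does not advance
    rx.receive(8, b"rld"),    # duplicated in transit
    rx.receive(4, b"o wo"),   # fills the gap, releasing the buffered segment
    rx.receive(0, b"hell"),   # late duplicate: discarded
]
print(acks, bytes(rx.stream))
```

The ACK that stays stuck at the same value while later data arrives is how the sender learns there is a hole to retransmit.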

The sequence number you start with isn't zero. It's chosen randomly, for security: if it were predictable, an attacker could inject packets into your stream by guessing the next expected sequence number. This attack is called a blind injection attack — Kevin Mitnick used predictable sequence numbers in his 1994 attack on Tsutomu Shimomura to hijack a trusted connection. RFC 6528 randomizes initial sequence numbers specifically to prevent this. The randomness is the defense.
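RFC 6528 describes the starting number as a clock plus a keyed hash of the four-tuple: ISN = M + F(localip, localport, remoteip, remoteport, secretkey). Here is a rough sketch of that shape. Treat the details as assumptions: HMAC-SHA256 is swapped in as the pseudorandom function F, `SECRET_KEY` and `initial_sequence_number` are hypothetical names, and this is not how any particular kernel implements it.

```python
import hashlib
import hmac
import secrets
import struct
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(16)  # hypothetical stand-in for a per-boot secret


def initial_sequence_number(local_ip: str, local_port: int,
                            remote_ip: str, remote_port: int,
                            now_us: Optional[int] = None) -> int:
    """ISN = (M + F(four-tuple, secret)) mod 2**32, in the spirit of RFC 6528."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4  # M: a counter that ticks once every 4 microseconds
    msg = f"{local_ip}:{local_port}>{remote_ip}:{remote_port}".encode()
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
    f = struct.unpack("!I", digest[:4])[0]  # F: keyed hash truncated to 32 bits
    return (m + f) % 2**32


isn_a = initial_sequence_number("10.0.0.1", 50000, "10.0.0.2", 443, now_us=100_000_000)
isn_a_later = initial_sequence_number("10.0.0.1", 50000, "10.0.0.2", 443, now_us=100_001_000)
isn_b = initial_sequence_number("10.0.0.1", 50001, "10.0.0.2", 443, now_us=100_000_000)
print(isn_a, isn_a_later, isn_b)
```

Within one four-tuple the value creeps forward with the clock, which keeps successive connections from colliding. Across four-tuples the offset depends on a secret the attacker doesn't have.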

The Buffer Between You and the Wire

Here's the piece of the mental model that surprises people most.

When you call send(), your bytes do not go on the wire. They go into a buffer in the kernel.

The kernel's socket send buffer sits between your application and the network stack. send() copies your bytes there and returns immediately. The kernel sends them when it decides to — based on network conditions, the receiver's capacity, and a timer.

# sock = socket already in ESTABLISHED state
sock.send(b"hello world")
# At this point: bytes are in kernel buffer.
# They have NOT left your machine.
# The remote end has NOT received them.
# send() returned anyway.

This has a corollary that catches people off guard: send() can return before the bytes leave your machine. It can return while your laptop is offline. It just means "I accepted your bytes." Delivery is a separate promise, made later, without your involvement.

The receive buffer works the other way. When packets arrive, the kernel puts the data in the receive buffer and sends an ACK back to the sender — "got it" — before your application calls recv(). Your code might be sleeping. The kernel is already acknowledging on your behalf.

This decoupling is what makes TCP reliable without making your code complicated. The kernel manages retransmits, flow control, and reordering. You see a clean byte stream.

That's why send() returns before the data is delivered. There's a kernel buffer between send() and the wire. send() accepts bytes; it doesn't send them.
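Here is a minimal way to see the gap between "accepted" and "delivered" on your own machine. It assumes a loopback listener whose application has not called accept() or recv() by the time the client sends, and the buffer size it prints will vary by system.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.create_connection(server.getsockname(), timeout=5)

# Nobody on the server side has called accept() or recv() yet.
payload = b"hello world"
accepted = client.send(payload)  # how many bytes the kernel took from us
print("send() accepted", accepted, "bytes")
print("send buffer size:", client.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

# Only now does the application on the other end show up and read.
conn, _ = server.accept()
conn.settimeout(5)
received = b""
while len(received) < len(payload):
    data = conn.recv(1024)
    if not data:
        break
    received += data
print("application finally read:", received)

for s in (conn, client, server):
    s.close()
```

The return value of send() is a byte count, not a delivery receipt. It can also be smaller than what you passed in, which is why sendall() exists.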

Window Size and the Invisible Flow Controller

The receiver tells the sender how much space it has in its receive buffer. This is the window size, advertised in every TCP segment.

If the receiver's buffer fills up — because the application is slow to call recv() — the window size shrinks. When it hits zero, the sender stops sending. Entirely.

This is called flow control, and it's operating silently in every TCP connection you've ever used. Your send() call doesn't hang because of the network. It hangs because the application on the other end isn't reading fast enough, and that information has propagated backward through two kernel buffers and a TCP window advertisement.

That's why your socket send() sometimes blocks. The buffer is full. The buffer is full because the window is zero. The window is zero because the remote application is backed up. You're feeling the pressure of something happening three hops away.

That's why database connection pools exhibit backpressure: a slow consumer in the application tier propagates all the way back to the TCP send buffer of the client.
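You can provoke that backpressure on loopback. The sketch below assumes a receiver that accepts the connection and then never reads. The sender is made non-blocking so that a full buffer surfaces as BlockingIOError instead of a hang. `LIMIT` is only a safety cap for the demo, and the number of bytes queued depends on your kernel's buffer sizes.

```python
import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname(), timeout=5)
conn, _ = server.accept()        # the receiver exists but never calls recv()

client.setblocking(False)        # report "would block" instead of hanging
chunk = b"x" * 65536
LIMIT = 256 * 1024 * 1024        # safety cap for this demo
queued = 0
blocked = False
while queued < LIMIT:
    try:
        queued += client.send(chunk)
    except BlockingIOError:
        blocked = True           # receive buffer and send buffer are both full
        break
print("kernel accepted", queued, "bytes before pushing back")

# Once the slow application finally reads, the window reopens.
conn.settimeout(5)
drained = 0
while drained < queued:
    drained += len(conn.recv(65536))
_, writable, _ = select.select([], [client], [], 5)
can_send_again = bool(writable)
print("sender writable again:", can_send_again)

for s in (conn, client, server):
    s.close()
```

With a blocking socket, the point where this sketch catches BlockingIOError is exactly where your send() call would sit and wait.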

Nagle's Algorithm: The Uninvited Optimizer

In the 1980s, a network engineer named John Nagle noticed that people were sending single-character packets over slow serial links. Every keypress in a terminal session became a 41-byte TCP/IP frame — 40 bytes of headers, 1 byte of data. The network was clogging up with tiny packets.

His fix was Nagle's algorithm: don't send a small packet if there's outstanding unacknowledged data. Wait until you have a full packet's worth, or until your outstanding data gets acknowledged.

It made a lot of sense in 1984. It still makes sense for bulk data transfers.

It is a disaster for latency-sensitive protocols.

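The classic symptom: you're writing a client that sends a small request and waits for a response. The request is 10 bytes. Nagle holds it back because you sent a header 5 milliseconds ago that hasn't been ACK'd yet. Meanwhile the receiver may be delaying that very ACK in the hope of piggybacking it on a reply, so the two optimizations wait on each other and a request that needed one round trip stalls for tens to hundreds of milliseconds.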

The fix is TCP_NODELAY:

sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

That's why many latency-sensitive network libraries set TCP_NODELAY by default now. Nagle's algorithm is opt-out behavior inherited from a world with 1200 baud modems.

That's why Redis, PostgreSQL, and most low-latency RPC frameworks explicitly disable Nagle. The moment you're doing request/response over TCP, you want control over when the packet leaves.
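If you want to confirm what a socket is actually doing, you can read the option back. A minimal sketch, assuming a fresh TCP socket that hasn't been connected yet (the option can be set before or after connect()):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Nagle is on by default, so TCP_NODELAY starts out as 0.
nodelay_before = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay_after = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

print("TCP_NODELAY before:", nodelay_before)
print("TCP_NODELAY after: ", nodelay_after)
sock.close()
```

Disabling Nagle doesn't excuse chatty writes. Building the header and body into one buffer and handing it over in a single sendall() avoids the write-write-read pattern whether Nagle is on or off.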

The Slow Close: `close()` vs `shutdown()`

Here's where the state machine comes back to bite you.

When you're done with a connection, you call close(). What actually happens is more involved.

TCP is a full-duplex protocol. You have a stream going in each direction, independently. Closing a connection means closing both streams, but they don't have to close at the same time.

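close() closes the whole socket: both directions at once. Any incoming data you haven't read yet is thrown away, and on Linux closing with unread data in the receive buffer makes the kernel send a RST instead of a normal FIN. The peer then sees a connection reset and may lose data it had not read yet. This bites HTTP in particular: a server that replies with an error and calls close() without draining the rest of the request can leave the client with a connection reset instead of the error message it needed.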

shutdown() is more precise:

sock.shutdown(socket.SHUT_WR)   # I'm done sending. Remote can still send to me.
sock.shutdown(socket.SHUT_RD)   # I'm done receiving.
sock.shutdown(socket.SHUT_RDWR) # Both.

When you shutdown(SHUT_WR), your kernel sends a FIN packet to the remote end. That FIN means "I'm done sending data." The remote end can still send data back to you, and you can still receive it. Both sides have to send a FIN before the connection is truly closed.

The four-packet close handshake (FIN → ACK → FIN → ACK) mirrors the three-packet open, but split across time: the two sides often close independently, because one side might have more to say.
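A half-close is easier to believe once you've seen it work. This sketch runs both ends in one process over loopback, which is an assumption made for brevity. `read_until_eof` is a hypothetical helper. The client finishes sending, the server notices end-of-stream, and the reply still makes it back.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname(), timeout=5)
conn, _ = server.accept()
conn.settimeout(5)


def read_until_eof(sock: socket.socket) -> bytes:
    """Collect bytes until recv() returns b"", meaning the peer's FIN arrived."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


client.sendall(b"request")
client.shutdown(socket.SHUT_WR)          # our FIN: "I'm done sending"

request = read_until_eof(conn)           # server sees EOF after the request
conn.sendall(b"response to " + request)  # the other direction is still open
conn.close()                             # the server's FIN

response = read_until_eof(client)        # the client can still receive
print(response)
client.close()
server.close()
```

From the application's side, a FIN shows up as recv() returning an empty bytes object. That is the only end-of-stream marker TCP gives you.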

TIME_WAIT: The Ghost That Haunts Your Port Numbers

After the final ACK, the connection isn't immediately gone. On the side that closed first, the kernel enters the TIME_WAIT state and holds the four-tuple (source IP, source port, destination IP, destination port) for 2 × MSL (Maximum Segment Lifetime).

On Linux, TCP_TIMEWAIT_LEN is hardcoded to 60 seconds — Linux sets this constant directly rather than computing 2×MSL, though the 2×MSL formula from RFC 793 describes the intent. This is not configurable via sysctl. The similarly named tcp_fin_timeout controls something else entirely — the FIN_WAIT_2 timeout, not TIME_WAIT. Confusing the two is a common mistake, and one I've made more than once.

Why? Because that final ACK might have been lost. If the remote end re-sends its FIN (because it didn't get the ACK), your kernel needs to be able to ACK it. If the connection were immediately gone, your kernel would send a RST instead, which is rude and could leave the remote end confused.

There's a second reason: the internet is not instantaneous. Old duplicate packets from the previous connection on this same four-tuple might still be in transit somewhere. TIME_WAIT prevents a new connection from misinterpreting those ancient packets as belonging to it.

This matters when you're running a server that handles thousands of short-lived connections. Every connection that your side closes first leaves a TIME_WAIT entry behind. On a busy server, you can accumulate tens of thousands of these entries, each holding its four-tuple (and, for outbound connections, an ephemeral port) hostage for 60 seconds on Linux.

That's why servers run out of ports. Not because they're out of listening ports. Because the ephemeral port range — the range the kernel uses for outbound connections — is full of TIME_WAIT ghosts.
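The arithmetic behind that exhaustion is short enough to do by hand. The sketch below assumes the usual Linux default for net.ipv4.ip_local_port_range (32768 to 60999; check your own system) and the 60-second TIME_WAIT. It estimates the sustained rate of new outbound connections to a single destination IP and port when your side is the one closing first.

```python
PORT_RANGE = (32768, 60999)  # assumed: Linux default net.ipv4.ip_local_port_range
TIME_WAIT_SECONDS = 60       # TCP_TIMEWAIT_LEN on Linux

ports = PORT_RANGE[1] - PORT_RANGE[0] + 1
sustained_rate = ports / TIME_WAIT_SECONDS

print(f"{ports} ephemeral ports")
print(f"about {sustained_rate:.0f} new connections per second to one destination")
```

That is a rough ceiling, not a hard law. The limit is per destination because the four-tuple is what has to be unique, and connection reuse sidesteps it entirely.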

# See the carnage
ss -s
# The "TCP:" summary line includes a timewait count, e.g. timewait 18432
# To list the individual sockets: ss -tan state time-wait

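The fix for the restart case on Linux is SO_REUSEADDR, which lets you bind a listening socket to an address/port combination that still has connections in TIME_WAIT. Most server frameworks set this automatically. When you see mysterious "address already in use" errors after restarting a server, you've met TIME_WAIT in person. SO_REUSEADDR does nothing for ephemeral port exhaustion on outbound connections, though; there the usual remedies are reusing connections (keep-alive, pooling), widening net.ipv4.ip_local_port_range, or enabling net.ipv4.tcp_tw_reuse.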
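Here is roughly what a framework does for you when it sets SO_REUSEADDR. The sketch simulates a server restart inside one process, which is an assumption made to keep it self-contained, and `make_listener` is a hypothetical helper. The server closes an accepted connection first, so its side lingers in TIME_WAIT, and then a new listener binds the same port straight away.

```python
import socket


def make_listener(port: int = 0) -> socket.socket:
    """Create a listening TCP socket with SO_REUSEADDR set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen(1)
    return s


# First "server process": accept one connection and close it first,
# which leaves the server side of that connection in TIME_WAIT.
first = make_listener()
port = first.getsockname()[1]
client = socket.create_connection(("127.0.0.1", port), timeout=5)
conn, _ = first.accept()
conn.close()     # server closes first (active close)
client.close()   # client answers with its own FIN
first.close()

# "Restart": bind the same port again without waiting 60 seconds.
second = make_listener(port)
rebound_port = second.getsockname()[1]
print("rebound to port", rebound_port)
second.close()
```

The option has to be set before bind(). Setting it afterwards is too late to help.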

That's why your server runs out of ports after handling many short-lived connections. Not listening ports — ephemeral ports. The range is full of TIME_WAIT ghosts.

The Gentleman's Agreement

Let's put it all together.

When you call connect(), you initiate a negotiation. Three packets create state in two kernel tables. The "connection" is those two table entries, nothing more. No circuit. No reserved bandwidth. No wire that's "yours."

While the connection is open, two kernel buffers — yours and theirs — mediate every byte. send() means "here, kernel, deal with this." recv() means "give me whatever showed up." The network happens in between, managed entirely by kernel code you didn't write, running on a schedule you don't control.
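"Whatever showed up" has a practical consequence: recv() may hand you part of a message, or several messages glued together, because TCP carries bytes and not messages. A common way to cope is a small loop like the one below. `recv_exactly` is a hypothetical helper, and socketpair() stands in for a real TCP connection so the sketch needs no network setup.

```python
import socket


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)


# socketpair() gives two connected stream sockets inside one process.
a, b = socket.socketpair()
a.sendall(b"hel")
a.sendall(b"lo world")         # two writes on the sending side...
message = recv_exactly(b, 11)  # ...read back as one 11-byte message
print(message)
a.close()
b.close()
```

How many bytes to ask for is the application protocol's job: a length prefix, a delimiter, or a fixed-size header.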

The receiver's window size limits how fast you can send. Nagle's algorithm may hold your packets hostage for milliseconds. Congestion control — a whole other topic we're glossing over here — may slow your throughput based on packet loss detected on the path between you.

When you close the connection, the kernel sends FINs, waits for the remote side, and then holds the ghost of the connection for up to 60 seconds.

None of this is visible to your code. You called connect(). You called send(). You called recv(). The bytes arrived in order and the stream made sense.

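The "connection" is a gentleman's agreement. Both sides promise to track sequence numbers, ACK each other's bytes, respect each other's window sizes, and retransmit anything that goes unacknowledged. Neither side promises to stay alive, respond quickly, or tell the other if the machine loses power or the path between them goes dark. (A crashed application is the easy case: its kernel is still alive and closes the socket for it.) By default there's no heartbeat. No "are you still there?" unless you opt in to TCP keepalive, and its default idle timer on Linux is two hours. The agreement holds until one side breaks it, or just goes silent and lets the retransmission timer expire.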
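The "no heartbeat" point holds for a default socket. TCP does have an opt-in keepalive probe, and turning it on looks roughly like this. The timer values are illustrative assumptions, not recommendations, and the three tuning constants are platform-specific, so the sketch checks for them before using them.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask the kernel to probe an idle connection on our behalf.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Tuning knobs are not available everywhere, so probe for each one.
# Values are illustrative: idle seconds, seconds between probes, probe count.
for name, value in (("TCP_KEEPIDLE", 60), ("TCP_KEEPINTVL", 10), ("TCP_KEEPCNT", 5)):
    option = getattr(socket, name, None)
    if option is not None:
        sock.setsockopt(socket.IPPROTO_TCP, option, value)

keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("keepalive enabled:", keepalive_on)
sock.close()
```

Many protocols skip this and send their own application-level pings, which also prove that the remote application is responsive and not just its kernel.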

Every TCP connection on the planet is two structs in two hash tables, honouring a handshake that happened milliseconds or hours ago. The wire doesn't care. The agreement is all there is.
