1. Netcat (nc): The Swiss Army Knife
Netcat reads and writes data across network connections using TCP or UDP. It is the rawest form of network communication.
Core Modes
- Client Mode (Connect): Acts like Telnet. Used to test if a port is open and accepting traffic.
nc -vz 192.168.1.5 80
# -v: Verbose (tells you what happened)
# -z: Zero-I/O mode (scans for listening daemons, doesn't send data)
- Server Mode (Listen): Creates a temporary server. Great for testing firewall rules (e.g., "Can Server A reach Server B on port 9090?").
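# On Server B (Receiver):
nc -l 9090
# -l: Listen mode (traditional netcat needs: nc -l -p 9090)
# On Server A (Sender), test the path to Server B:
nc -vz [Server_B_IP] 9090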
- File Transfer (The "Hack"): If scp or rsync aren't available, you can pipe files through raw sockets.
# Receiver:
nc -l 9090 > received_file.txt
# Sender:
nc [Receiver_IP] 9090 < original_file.txt
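As a rough sketch of what those two nc commands do under the hood, the same transfer can be written with plain TCP sockets in Python. The function names (listen, receive_file, send_file) and the chunk size are illustrative choices, not part of netcat. Like the nc version, nothing here is encrypted or authenticated.

```python
import socket

CHUNK = 64 * 1024  # bytes per read/write; an arbitrary choice


def listen(host="127.0.0.1", port=0):
    """Open a listening TCP socket, like `nc -l`. Port 0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def receive_file(srv, out_path):
    """Accept one connection and write everything it sends to out_path."""
    conn, _addr = srv.accept()
    with conn, open(out_path, "wb") as out:
        while True:
            chunk = conn.recv(CHUNK)
            if not chunk:  # empty read means the sender closed the connection
                break
            out.write(chunk)


def send_file(host, port, in_path):
    """Connect to the receiver and stream the file, like `nc host port < file`."""
    with socket.create_connection((host, port)) as sock, open(in_path, "rb") as src:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            sock.sendall(chunk)
```

The receiver knows the file is complete only because the sender closes the connection, which is also how the nc pipeline signals end of file.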
2. Tcpdump: The CLI Microscope
When you can't use Wireshark (because there is no GUI), you use tcpdump. It captures packets directly from the kernel.
Key Flags to Memorize
- -i eth0: Listen on interface eth0 (or any for all interfaces).
- -n: Crucial. Don't resolve hostnames or ports. (Shows 1.2.3.4:80 instead of google.com:http). This speeds up output significantly.
- -w capture.pcap: Write output to a file (so you can open it in Wireshark later).
- -v: Verbose (show more header details like TTL, ID).
The Filter Syntax (BPF)
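tcpdump filters use the BPF (pcap-filter) syntax. This is the same language as Wireshark's capture filters, but not Wireshark's display filters (e.g., tcp.port == 80), which use a different syntax.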
# Capture only traffic from a specific IP on port 80
sudo tcpdump -i eth0 -n src 192.168.1.5 and dst port 80
# Capture everything EXCEPT SSH (so your own SSH session doesn't flood the capture)
sudo tcpdump -i eth0 -n not port 22
3. Dig (dig): The DNS Scalpel
nslookup is the older tool. dig (Domain Information Groper) is the preferred choice for debugging because it shows the exact query and response structure.
Understanding the Output
Running dig google.com gives you:
- HEADER: Status (e.g., NOERROR or NXDOMAIN). If you see NXDOMAIN, the domain doesn't exist.
- QUESTION SECTION: What you asked for.
- ANSWER SECTION: The result (IPs).
- AUTHORITY SECTION: The authoritative nameservers for the domain.
- ADDITIONAL SECTION: IPs of the nameservers.
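To make the section layout concrete, here is a minimal Python sketch that pulls the status and the answer records out of dig's text output. The SAMPLE text is a hand-written, abbreviated illustration (example.com and 192.0.2.10 are placeholders, not real output), and the helper names dig_status and section_records are hypothetical.

```python
import re

# Hand-written, abbreviated stand-in for dig output (placeholder values).
SAMPLE = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4242
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; QUESTION SECTION:
;example.com.            IN  A

;; ANSWER SECTION:
example.com.     300  IN  A  192.0.2.10
"""

STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def dig_status(output):
    """Return the status from the HEADER line (e.g. NOERROR), or None if absent."""
    for line in output.splitlines():
        if "->>HEADER<<-" in line:
            match = STATUS_RE.search(line)
            if match:
                return match.group(1)
    return None


def section_records(output, section):
    """Return the records of one section as lists of whitespace-separated fields.

    Lines starting with ';' (comments, and the question itself) are skipped.
    """
    records, inside = [], False
    for line in output.splitlines():
        if line.startswith(";; ") and line.endswith("SECTION:"):
            inside = line == f";; {section} SECTION:"
            continue
        if inside:
            if not line.strip():  # a blank line ends the section
                inside = False
            elif not line.startswith(";"):
                records.append(line.split())
    return records


print(dig_status(SAMPLE))
print(section_records(SAMPLE, "ANSWER"))
```

In a script, the text would typically come from running dig and capturing its standard output. For simple cases, dig +short is less work.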
Power User Commands
- Trace the Recursion: See the full path from Root(.) to TLD(.com) to Auth Server.
dig +trace google.com
- Short Mode: Great for scripting. Returns only the answer data (here, the IPs).
dig +short google.com
- Direct Query: Bypass your local DNS and ask a specific server (e.g., ask Google's 8.8.8.8 directly).
dig @8.8.8.8 google.com
4. Nmap: The Cartographer
Nmap scans a network to map "live" hosts and open ports. It works by sending packets and analyzing the subtle differences in responses.
Scan Types
- SYN Scan (-sS): The "Stealth" scan. It sends a SYN packet. If the server replies SYN-ACK, Nmap knows the port is open but sends a RST (Reset) immediately. It never completes the 3-way handshake, so it often doesn't show up in application logs. Requires sudo.
- Version Detection (-sV): Connects to the port, reads the "Banner", and sends probes to guess the software version (e.g., "Apache 2.4.41").
- OS Detection (-O): Analyzes IP TTLs, TCP Window sizes, and other TCP/IP stack differences to guess the Operating System (e.g., Linux, Windows).
# The "Aggressive" Scan (OS detection, Version detection, Script scanning, Traceroute)
nmap -A 192.168.1.5
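The decision logic behind a SYN scan is small enough to write down. This Python sketch only models how a reply maps to the port states Nmap reports (open, closed, filtered). It sends no packets, and classify_syn_probe is a hypothetical name used for illustration.

```python
def classify_syn_probe(reply):
    """Map the reply to a SYN probe onto a port state.

    reply is "SYN-ACK", "RST", or None (nothing came back before the timeout).
    """
    if reply == "SYN-ACK":
        return "open"      # something is listening; the scanner answers with RST
    if reply == "RST":
        return "closed"    # host is reachable, but nothing listens on the port
    if reply is None:
        return "filtered"  # probe was silently dropped, typically by a firewall
    raise ValueError(f"unexpected reply: {reply!r}")


for reply in ("SYN-ACK", "RST", None):
    print(reply, "->", classify_syn_probe(reply))
```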
5. Debugging: Latency vs. Bandwidth
In DevOps, "The network is slow" is a vague complaint. You must distinguish between two completely different bottlenecks.
A. Latency (The "Distance")
- Definition: The time it takes for a single packet to travel from Source to Destination.
- Analogy: The speed limit of the road. Even if the road is empty, it takes time to drive from New York to London.
- The Cause: Physical distance (fiber optic length), number of router hops, congested queues.
- Tools:
  - ping: Measures RTT (Round Trip Time).
  - mtr (My Traceroute): Combines ping and traceroute. Shows packet loss at each hop.
  - Tip: If loss starts at Hop 3 and continues to the end, Hop 3 is the problem. If loss is only at Hop 3 but Hop 4 is 0%, Hop 3 is just de-prioritizing ICMP (ignoring pings), which is fine.
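The mtr rule of thumb can be expressed as a few lines of Python. This is a sketch under the assumption that the per-hop loss percentages have already been read from an mtr report; first_real_loss_hop is a hypothetical helper, and the loss figures are made up.

```python
def first_real_loss_hop(loss_by_hop, threshold=0.0):
    """Return the 1-based hop where loss begins and persists through the final hop.

    Returns None when the final hop is clean, which means any loss at earlier
    hops is most likely just ICMP de-prioritization.
    """
    culprit = None
    # Walk backwards from the destination while every hop still shows loss.
    for hop in range(len(loss_by_hop), 0, -1):
        if loss_by_hop[hop - 1] > threshold:
            culprit = hop
        else:
            break
    return culprit


print(first_real_loss_hop([0, 0, 20, 20, 25]))  # 3: loss carries through to the end
print(first_real_loss_hop([0, 0, 60, 0, 0]))    # None: hop 3 only ignores pings
```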
B. Bandwidth (The "Width")
- Definition: The maximum amount of data that can be transmitted in a fixed amount of time.
- Analogy: The number of lanes on the highway.
- The Cause: Link capacity (1Gbps cable vs 100Mbps cable).
- Tools:
  - iperf3: The gold standard. Requires installation on both ends (client and server). It floods the link with data to test pure capacity.
# Server side
iperf3 -s
# Client side
iperf3 -c [Server_IP]
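The lane analogy becomes concrete with a little arithmetic. The Python sketch below is an idealized estimate that ignores protocol overhead and latency. transfer_seconds is a hypothetical helper, and the 1 GB file size is an arbitrary example.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Best-case time to move size_bytes over a link of the given capacity."""
    return size_bytes * 8 / link_bits_per_second


ONE_GB = 1_000_000_000  # bytes

print(transfer_seconds(ONE_GB, 100_000_000))    # 100 Mbps link: 80.0 seconds
print(transfer_seconds(ONE_GB, 1_000_000_000))  # 1 Gbps link: 8.0 seconds
```

If a measured iperf3 result is far below the link's rated capacity, the bottleneck is somewhere other than the cable.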
C. The Hidden Trap: Throughput & Window Size
You can have huge Bandwidth (10Gbps) and low Throughput if Latency is high.
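- TCP Window Size: The sender may have only one window's worth of unacknowledged data in flight. Once the window is full, it must wait for an acknowledgment (ACK) before sending more. If the Latency (RTT) is high, the sender spends most of its time waiting, not sending.
- Bandwidth-Delay Product (BDP): In "Long Fat Networks" (High Bandwidth + High Latency, like Trans-Atlantic cables), you must tune the TCP Window Size to keep the pipe full.
- Formula: BDP (bits) = Bandwidth (bits/s) × RTT (s). For a single TCP connection, maximum throughput ≈ Window Size / RTT.
- DevOps Fix: Tune Linux Kernel parameters. Keep net.ipv4.tcp_window_scaling enabled (the default on modern kernels) and raise the socket buffer limits (net.ipv4.tcp_rmem, net.ipv4.tcp_wmem, net.core.rmem_max, net.core.wmem_max).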
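A short worked example shows how large the gap can be. The Python sketch below assumes a 10 Gbps link with an 80 ms RTT (an illustrative figure, not a measurement) and compares it against a 65,535-byte window, the largest TCP allows without window scaling. The function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_ms / 1000 / 8


def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound for one TCP connection: one window per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


LINK_BPS = 10_000_000_000  # 10 Gbps
RTT_MS = 80                # assumed round-trip time

print(bdp_bytes(LINK_BPS, RTT_MS))        # 100000000.0 bytes (100 MB)
print(max_throughput_bps(65535, RTT_MS))  # 6553500.0 bits/s (about 6.5 Mbps)
```

Under these assumptions, a single connection limited to a 64 KB window uses well under 0.1% of the 10 Gbps link, which is why the window must grow toward the BDP.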
6. Real-World Troubleshooting Cheat Sheet
The Scenario:
You are a DevOps Engineer. A developer complains: "The Web App can't connect to the Database (PostgreSQL), or it's extremely slow."
Your Mission: Isolate the root cause using the tools we just discussed.
Step 1: The "Is it Alive?" Check (Layer 3 - Network)
Goal: Determine if the Database server is reachable network-wise.
Tool: mtr (or ping)
Run this from the Web Server:
mtr -r -c 10 db.prod.internal
Analyze the Output:
- Scenario A (Good): 0% Packet Loss, Low Latency (<1ms for LAN).
Verdict: Network path is fine. Proceed to Step 2.
- Scenario B (Bad - 100% Loss): "Destination Host Unreachable."
Verdict: The server is down, or there is no route (Routing Table issue).
- Scenario C (Bad - High Loss): Loss starts at Hop 2 and continues through the final hop.
Verdict: A specific router/switch in the path is failing.
Step 2: The "Address Book" Check (Layer 7 - DNS)
Goal: Ensure the application is trying to connect to the correct IP address.
Tool: dig
dig +short db.prod.internal
Analyze the Output:
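- Output: 10.0.1.50
- Action: Compare this IP with your AWS Console/Inventory. Is it the correct DB server?
- Trap: Sometimes a developer hardcodes an old IP in /etc/hosts. dig queries DNS directly and ignores that file, so the application may resolve a different IP than dig shows. Check that file too!
- Trap: If the output is empty, rerun without +short and check the status. NXDOMAIN means the DNS record is missing entirely.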
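Because dig talks to DNS directly, it helps to also ask the system resolver, which is what most applications use and which, on a typical Linux setup, consults /etc/hosts first. The Python sketch below does that with the standard library; resolve_like_app is a hypothetical helper name, and 5432 is simply the PostgreSQL port from this scenario.

```python
import socket


def resolve_like_app(hostname, port=5432):
    """Resolve a name through the system resolver (getaddrinfo), as an app would.

    Returns a sorted list of unique IP addresses, or an empty list when the
    name cannot be resolved.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry ends with a sockaddr tuple whose first element is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_like_app("db.prod.internal") on the Web Server and comparing the result with the dig output would reveal a stale /etc/hosts entry.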
Step 3: The "Is the Door Open?" Check (Layer 4 - Transport)
Goal: The server is up, and the IP is right. Is the Database software listening on Port 5432, or is a Firewall blocking us?
Tool: nc (Netcat) or telnet
nc -zv 10.0.1.50 5432
Analyze the Output:
- Scenario A (Success): Connection to 10.0.1.50 5432 port [tcp/postgresql] succeeded!
Verdict: Firewall is open, DB is listening. The issue is likely Application Layer (wrong password, DB overload).
- Scenario B (Connection Refused): Ncat: Connection refused.
Verdict: Packet reached the server, but the Server said "Go Away." The DB service is likely crashed/stopped.
- Scenario C (Timeout): It hangs until the connection attempt times out (add -w 5 to give up after 5 seconds).
Verdict: Firewall Drop. The packet hit a black hole (Security Group/UFW). It never got a reply.
The exact message wording depends on which netcat variant is installed.
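The same three outcomes can be told apart from code, which is handy in health checks. This is a minimal Python sketch using only the standard library; check_port and its return strings are hypothetical names chosen for illustration.

```python
import socket


def check_port(host, port, timeout=3.0):
    """Attempt a TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"     # handshake completed: something is listening
    except ConnectionRefusedError:
        return "refused"      # RST came back: host reachable, nothing listening
    except socket.timeout:
        return "timeout"      # no reply at all: packets are probably dropped
    except OSError as exc:
        return f"error: {exc}"  # e.g. no route to host, name resolution failure
```

For the scenario above, the call would be check_port("10.0.1.50", 5432). The explicit timeout plays the role of nc's -w flag.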
Step 4: The "Deep Dive" (Packet Analysis)
Goal: The connection is "flaky" or "slow," but netcat works intermittently. We need to see the handshake.
Tool: tcpdump
Run this on the Web Server while triggering the database connection:
# Capture traffic to the DB IP on port 5432, don't resolve names (-n)
sudo tcpdump -i eth0 -n host 10.0.1.50 and port 5432
Analyze the Output:
- Case 1: The "Unanswered SYN" (Firewall/Packet Loss)
12:01:01 IP WebServer > DBServer: Flags [S], seq 123...
12:01:02 IP WebServer > DBServer: Flags [S], seq 123... (Retransmission)
12:01:04 IP WebServer > DBServer: Flags [S], seq 123... (Retransmission)
Diagnosis: You see only [S] (SYN) packets going out, but no reply. The other side is ignoring you. Confirm Firewall/Security Groups.
- Case 2: The "Reset" (Service Down)
12:01:01 IP WebServer > DBServer: Flags [S]
12:01:01 IP DBServer > WebServer: Flags [R.], seq 0
Diagnosis: You see an [R] (RST) flag immediately. The server OS received the request but no application was bound to that port to handle it. Check if Postgres Service is running.
- Case 3: The "Zero Window" (Overload)
12:01:01 IP DBServer > WebServer: Flags [.], win 0
Diagnosis: win 0 means the Database Server is screaming "STOP! My buffer is full." It cannot process data fast enough. The DB is CPU/Memory starved.
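The three patterns are regular enough to spot mechanically. The Python sketch below scans tcpdump-style text lines for them. It assumes the simplified line layout shown in the cases above (IP source > destination: Flags [...]); the function name diagnose and the finding labels are hypothetical.

```python
import re

# Source, destination, TCP flags, and (optionally) the advertised window.
LINE_RE = re.compile(r"IP (\S+) > (\S+): Flags \[([^\]]+)\](?:.*?\bwin (\d+))?")


def diagnose(lines):
    """Return a list of findings for the three cases described above."""
    seen, syn_only = set(), set()
    reset = zero_window = False
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        src, dst, flags, win = match.groups()
        seen.add((src, dst))
        if flags == "S":
            syn_only.add((src, dst))
        if "R" in flags:
            reset = True
        # RST packets often carry win 0 themselves, so they are excluded here.
        elif win is not None and int(win) == 0:
            zero_window = True

    findings = []
    # A SYN counts as unanswered when nothing was seen in the reverse direction.
    if any((dst, src) not in seen for src, dst in syn_only):
        findings.append("unanswered SYN")
    if reset:
        findings.append("reset")
    if zero_window:
        findings.append("zero window")
    return findings


print(diagnose(["12:01:01 IP WebServer > DBServer: Flags [S], seq 123..."]))
```

Feeding it the output of tcpdump -n piped through a script is one way to turn a long capture into a one-line verdict.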
Summary Checklist
| Symptom | Tool to Use | Likely Cause |
|---|---|---|
| "Host Unreachable" | ping / mtr | Network down, Routing issue. |
| "NXDOMAIN" | dig | DNS typo or missing record. |
| "Connection Refused" | nc -zv | Service (Postgres) is stopped. |
| "Connection Timed Out" | nc -zv | Firewall (AWS Security Group) Dropping packets. |
| "Connection Reset" | tcpdump | Service crashed or misconfigured Proxy. |
| "Slow / Stalling" | tcpdump | Packet Loss (Retransmissions) or Server Overload (Zero Window). |