<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mordecai </title>
    <description>The latest articles on DEV Community by Mordecai  (@mordecai_amehson).</description>
    <link>https://dev.to/mordecai_amehson</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1155526%2F571a50ca-44ca-4bc9-93a1-9339de58110c.jpeg</url>
      <title>DEV Community: Mordecai </title>
      <link>https://dev.to/mordecai_amehson</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mordecai_amehson"/>
    <language>en</language>
    <item>
      <title>How I Built a Real-Time DDoS Detection Engine from Scratch</title>
      <dc:creator>Mordecai </dc:creator>
      <pubDate>Mon, 27 Apr 2026 03:12:54 +0000</pubDate>
      <link>https://dev.to/mordecai_amehson/how-i-built-a-real-time-ddos-detection-engine-from-scratch-cll</link>
      <guid>https://dev.to/mordecai_amehson/how-i-built-a-real-time-ddos-detection-engine-from-scratch-cll</guid>
      <description>&lt;p&gt;Imagine you run a cloud storage platform serving thousands of users. One day, an attacker floods your server with millions of requests per second. Your server crashes. Real users can't access their files. You lose money and trust.&lt;br&gt;
This is a DDoS attack — Distributed Denial of Service. The goal of this project was to build a tool that detects these attacks automatically and blocks them before they cause damage.&lt;br&gt;
No off-the-shelf tools. No Fail2Ban. Pure Python, built from scratch.&lt;/p&gt;

&lt;p&gt;What the Tool Does&lt;br&gt;
Here's the full picture of what I built:&lt;br&gt;
Nginx (logs every request as JSON)&lt;br&gt;
         ↓&lt;br&gt;
Detector daemon reads logs in real time&lt;br&gt;
         ↓&lt;br&gt;
Sliding window tracks request rates&lt;br&gt;
         ↓&lt;br&gt;
Baseline learns what normal traffic looks like&lt;br&gt;
         ↓&lt;br&gt;
Anomaly detector compares current rate to baseline&lt;br&gt;
         ↓&lt;br&gt;
If anomalous → block IP with iptables + send Slack alert&lt;br&gt;
         ↓&lt;br&gt;
Auto-unban after cooldown period&lt;/p&gt;

&lt;p&gt;Everything runs continuously as a background service on a Linux server.&lt;/p&gt;

&lt;p&gt;Part 1: Reading Nginx Logs in Real Time&lt;br&gt;
The first challenge was getting the tool to watch incoming traffic live. Nginx was configured to write every HTTP request as a JSON line to a log file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "source_ip": "102.91.99.217",
  "timestamp": "2026-04-27T02:31:00+00:00",
  "method": "GET",
  "path": "/",
  "status": 200,
  "response_size": 6674
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;To read this in real time, I used a technique called log tailing — the same thing &lt;code&gt;tail -f&lt;/code&gt; does in Linux. The program opens the file, jumps to the end, and then sits in a loop reading new lines as they appear:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with open(log_path, "r") as f:
    f.seek(0, 2)  # jump to end of file
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.1)  # wait for new data
            continue
        yield parse_line(line)  # process the line
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Every time Nginx writes a new request, the detector picks it up within 100 milliseconds.&lt;/p&gt;
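&lt;p&gt;The tailing loop calls parse_line, which the post doesn't show. A minimal sketch, assuming the JSON log format above (the function name and returned fields here are illustrative, not the project's exact code):&lt;/p&gt;

```python
import json

def parse_line(line):
    """Parse one JSON-formatted Nginx access log line (illustrative sketch)."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None  # skip partial or malformed lines rather than crash
    return {
        "ip": entry.get("source_ip"),
        "status": entry.get("status"),
        "path": entry.get("path"),
    }
```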

&lt;p&gt;Part 2: The Sliding Window&lt;br&gt;
Now that we're reading requests in real time, we need to know how fast each IP is sending requests. This is where the sliding window comes in.&lt;br&gt;
A sliding window answers the question: "How many requests has this IP sent in the last 60 seconds?"&lt;br&gt;
The naive approach would be a counter that resets every minute. But that's inaccurate — an attacker could send 1000 requests in the last 10 seconds of one minute and the first 10 seconds of the next, and the counter would never catch it.&lt;br&gt;
Instead, I used Python's collections.deque — a double-ended queue that lets us add to one end and remove from the other efficiently.&lt;br&gt;
Here's how it works:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import deque
import time

ip_window = deque()  # stores timestamps of recent requests

def record_request(ip):
    now = time.time()
    cutoff = now - 60  # 60 second window

    # Add current timestamp
    ip_window.append(now)

    # Evict timestamps older than 60 seconds from the left
    while ip_window and ip_window[0] &amp;lt; cutoff:
        ip_window.popleft()

    # Rate = number of requests in window / window size
    rate = len(ip_window) / 60
    return rate
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Every time a request comes in, we add its timestamp. Every time we check the rate, we first remove any timestamps older than 60 seconds from the left side of the deque. The rate is simply the count of remaining timestamps divided by 60.&lt;br&gt;
This gives us an accurate, always up-to-date requests-per-second count for every IP on the server.&lt;/p&gt;
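&lt;p&gt;The snippet above tracks a single window; to track every IP independently, the same idea extends naturally to a dict of deques keyed by IP. A sketch (illustrative names, not the project's exact code):&lt;/p&gt;

```python
import time
from collections import deque

WINDOW_SECONDS = 60
windows = {}  # one deque of request timestamps per source IP

def record_request(ip, now=None):
    """Record a request from ip and return its current requests-per-second rate."""
    now = time.time() if now is None else now
    window = windows.setdefault(ip, deque())
    window.append(now)
    # Evict timestamps that have fallen out of the 60-second window
    cutoff = now - WINDOW_SECONDS
    while window and cutoff > window[0]:
        window.popleft()
    return len(window) / WINDOW_SECONDS
```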

&lt;p&gt;Part 3: The Baseline — Teaching the Tool What "Normal" Looks Like&lt;br&gt;
Knowing the current rate isn't enough. We need to know if that rate is unusual.&lt;br&gt;
At 3am, 5 requests per second might be suspicious. At noon, it might be completely normal. The tool needs to learn from actual traffic patterns — not from hardcoded values.&lt;br&gt;
This is the rolling baseline. Here's how it works:&lt;/p&gt;

&lt;p&gt;Every second, we record how many requests the server received that second&lt;br&gt;
We keep a 30-minute history of these per-second counts&lt;br&gt;
Every 60 seconds, we calculate the mean (average) and standard deviation of these counts&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;samples = [count for _, count in rolling_window]

mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
stddev = math.sqrt(variance)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The mean tells us what a typical second looks like. The standard deviation tells us how much variation is normal.&lt;br&gt;
I also maintain per-hour slots — separate baselines for each hour of the day. If the current hour has enough data (at least 5 samples), I prefer that over the general baseline. This means the tool naturally adapts to rush hours vs quiet hours.&lt;br&gt;
To prevent the tool from failing on a fresh start with no data, I set floor values:&lt;/p&gt;

&lt;p&gt;Minimum mean: 0.1 req/s&lt;br&gt;
Minimum stddev: 0.1&lt;/p&gt;
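&lt;p&gt;Putting the statistics and the floors together, the baseline computation can look like this (a sketch under the assumptions above; the function and constant names are mine):&lt;/p&gt;

```python
import math

MIN_MEAN = 0.1    # req/s floor so a fresh start never has a near-zero baseline
MIN_STDDEV = 0.1  # stddev floor so z-scores stay finite

def compute_baseline(samples):
    """Return (mean, stddev) of per-second request counts, with floor values."""
    if not samples:
        return MIN_MEAN, MIN_STDDEV
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    stddev = math.sqrt(variance)
    return max(mean, MIN_MEAN), max(stddev, MIN_STDDEV)
```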

&lt;p&gt;Part 4: Detecting Anomalies&lt;br&gt;
With a baseline established, detection becomes a statistical question: "Is this IP's current rate unusually high compared to normal?"&lt;br&gt;
I use two detection methods — whichever fires first:&lt;br&gt;
Method 1: Z-Score&lt;br&gt;
The z-score measures how many standard deviations above the mean a value is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;z_score = (current_rate - baseline_mean) / baseline_stddev
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If the z-score exceeds 1.5, the IP is anomalous. A z-score of 1.5 means the rate is 1.5 standard deviations above normal — statistically unusual.&lt;br&gt;
For example:&lt;/p&gt;

&lt;p&gt;Baseline mean: 1.0 req/s&lt;br&gt;
Baseline stddev: 0.5&lt;br&gt;
Current rate: 2.5 req/s&lt;br&gt;
Z-score: (2.5 - 1.0) / 0.5 = 3.0 → anomalous!&lt;/p&gt;
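&lt;p&gt;As a quick sanity check of that arithmetic in code:&lt;/p&gt;

```python
baseline_mean = 1.0   # req/s
baseline_stddev = 0.5
current_rate = 2.5    # req/s

z_score = (current_rate - baseline_mean) / baseline_stddev
print(z_score)  # 3.0 -> well past the 1.5 threshold, so anomalous
```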

&lt;p&gt;Method 2: Rate Multiplier&lt;br&gt;
Sometimes the stddev is very small and the z-score math doesn't capture obvious spikes. So I also check if the rate is more than 1.5x the baseline mean:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if current_rate &amp;gt; 1.5 * baseline_mean:
    # flag as anomalous
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Error Rate Tightening&lt;br&gt;
If an IP is sending a lot of 4xx or 5xx errors (bad requests, unauthorized attempts), I automatically tighten the thresholds by 30%. An IP probing for vulnerabilities gets less tolerance.&lt;/p&gt;
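&lt;p&gt;Both detection methods plus the error-rate tightening can be combined into a single check. A sketch (the names and exact tightening mechanics are mine, not the project's verbatim code):&lt;/p&gt;

```python
Z_THRESHOLD = 1.5
RATE_MULTIPLIER = 1.5
TIGHTEN_FACTOR = 0.7  # "tighten by 30%" for IPs with many 4xx/5xx errors

def is_anomalous(current_rate, mean, stddev, high_error_rate=False):
    """Flag an IP whose rate is unusual by z-score OR rate multiplier."""
    z_threshold = Z_THRESHOLD
    multiplier = RATE_MULTIPLIER
    if high_error_rate:
        z_threshold *= TIGHTEN_FACTOR
        multiplier *= TIGHTEN_FACTOR
    z_score = (current_rate - mean) / stddev
    return z_score > z_threshold or current_rate > multiplier * mean
```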

&lt;p&gt;Part 5: Blocking with iptables&lt;br&gt;
When an IP is flagged as anomalous, we block it at the kernel level using iptables. This is more powerful than blocking at the application level because the packets are dropped before they even reach Nginx.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import subprocess

def ban_ip(ip):
    subprocess.run([
        "iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"
    ])
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;-I INPUT&lt;/code&gt; inserts the rule at the top of the INPUT chain, and &lt;code&gt;-j DROP&lt;/code&gt; silently drops all packets from that IP, so the attacker's requests never reach the application.&lt;br&gt;
You can verify bans are active with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo iptables -L INPUT -n
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Auto-Unban with Backoff Schedule&lt;br&gt;
Permanent bans aren't always appropriate — the IP might be a legitimate user who got flagged by mistake. So I implemented an automatic unban system with a backoff schedule:&lt;/p&gt;

&lt;p&gt;First offence: banned for 10 minutes&lt;br&gt;
Second offence: banned for 30 minutes&lt;br&gt;
Third offence: banned for 2 hours&lt;br&gt;
Fourth offence: permanently banned&lt;/p&gt;
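&lt;p&gt;The schedule above reduces to a simple lookup. A sketch (my names; durations in seconds):&lt;/p&gt;

```python
BACKOFF_SCHEDULE = [600, 1800, 7200]  # 10 min, 30 min, 2 hours

def ban_duration(offence_count):
    """Return the ban length in seconds, or None for a permanent ban."""
    if offence_count > len(BACKOFF_SCHEDULE):
        return None  # fourth offence and beyond: permanent
    return BACKOFF_SCHEDULE[offence_count - 1]
```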

&lt;p&gt;Each time an IP is unbanned, a Slack notification is sent with the next ban duration if they reoffend.&lt;/p&gt;

&lt;p&gt;Part 6: Slack Alerts&lt;br&gt;
Every ban and unban sends an immediate Slack notification via webhook:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;requests.post(webhook_url, json={
    "text": f"🚨 IP BANNED: {ip}\nRate: {rate} req/s\nBaseline: {mean} req/s\nDuration: {duration}"
})
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The alert includes the condition that fired, the current rate, the baseline, and the ban duration — everything needed to understand what happened without digging through logs.&lt;/p&gt;

&lt;p&gt;Part 7: The Live Dashboard&lt;br&gt;
The tool serves a web dashboard on port 8080 that refreshes every 3 seconds showing:&lt;/p&gt;

&lt;p&gt;Global requests per second&lt;br&gt;
Current baseline mean and stddev&lt;br&gt;
List of banned IPs with reasons&lt;br&gt;
Top 10 source IPs&lt;br&gt;
CPU and memory usage&lt;/p&gt;

&lt;p&gt;Built with Python's built-in &lt;code&gt;http.server&lt;/code&gt; — no frameworks needed.&lt;/p&gt;
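&lt;p&gt;A minimal sketch of what a framework-free stats endpoint can look like with &lt;code&gt;http.server&lt;/code&gt; (the handler and the STATS dict are illustrative, not the project's actual code):&lt;/p&gt;

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative in-memory stats; the real daemon updates these continuously.
STATS = {"global_rps": 0.0, "baseline_mean": 0.1, "banned_ips": []}

class DashboardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the current stats as JSON; a page can poll this every 3 seconds
        body = json.dumps(STATS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve on port 8080:
# HTTPServer(("0.0.0.0", 8080), DashboardHandler).serve_forever()
```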

&lt;p&gt;What I Learned&lt;br&gt;
Building this from scratch taught me things no tutorial ever could:&lt;/p&gt;

&lt;p&gt;Statistical anomaly detection is surprisingly approachable once you understand z-scores&lt;br&gt;
deque is one of the most useful Python data structures for time-based problems&lt;br&gt;
iptables is incredibly powerful — blocking at kernel level is orders of magnitude more efficient than application-level blocking&lt;br&gt;
Baselines must be dynamic — hardcoded thresholds always fail in production because traffic patterns change by hour, day, and season&lt;br&gt;
A daemon is not a cron job — continuous processing requires careful thought about memory, threading, and graceful shutdown&lt;/p&gt;
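&lt;p&gt;On the last point, a common graceful-shutdown pattern for a daemon under systemd (a sketch, not necessarily this project's exact code) is to catch SIGTERM and let the main loop drain:&lt;/p&gt;

```python
import signal
import threading

stop = threading.Event()

def handle_sigterm(signum, frame):
    # systemd sends SIGTERM on "systemctl stop"; exit the loop cleanly
    stop.set()

signal.signal(signal.SIGTERM, handle_sigterm)

# Main loop shape: check the event each iteration instead of looping forever
# while not stop.is_set():
#     process_next_log_line()
```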

&lt;p&gt;The Stack&lt;/p&gt;

&lt;p&gt;Python 3.12 — detector daemon&lt;br&gt;
Docker + Docker Compose — Nextcloud and Nginx deployment&lt;br&gt;
Nginx — reverse proxy with JSON access logging&lt;br&gt;
iptables — kernel-level IP blocking&lt;br&gt;
Slack webhooks — real-time alerts&lt;br&gt;
systemd — keeps the daemon running persistently&lt;/p&gt;

&lt;p&gt;Repository&lt;br&gt;
The full source code is available at:&lt;br&gt;
&lt;a href="https://github.com/Hacker-Dark/hng-stage3-devops" rel="noopener noreferrer"&gt;https://github.com/Hacker-Dark/hng-stage3-devops&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Built as part of the HNG DevOps Internship Stage 3 task.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
