How We Trapped 1.4 Million Bots and Hackers Using a Kubernetes Honeypot (And What They Taught Us)

#programming #tutorial

The Real Cost of Bot Traffic Nobody Talks About

You've probably checked your analytics dashboard and wondered: "Who are all these visitors?" If you're running any kind of web service at scale, the answer is uncomfortable. Somewhere between 30% and 60% of your traffic isn't human. It's bots. Some are benign crawlers. Others are actively trying to break into your infrastructure.

But here's the thing that kept me up at night: we were blocking them at the edge, which meant we were flying blind. Every bot we rate-limited just adjusted its tactics. Every IP we blacklisted was replaced by three more. We were treating symptoms while the disease evolved faster than our defenses.

Then we asked a different question: What if we stopped trying to keep them out and instead invited them in?

That question led to a 90-day experiment that changed how we think about security. We deployed Krawl, a deception honeypot, on our Kubernetes cluster, and the results were staggering: 1.4 million bot sessions, 539 distinct attacker profiles, and an 18% rate of command injection attempts. More importantly, we learned exactly what we're up against.

Let me walk you through everything.

Why Traditional Bot Defense Fails

Before we dive into the honeypot solution, you need to understand why your current defenses are essentially fighting with one hand tied behind your back.

Traditional approaches to bot mitigation follow a simple playbook: detect, block, repeat. You set up rate limiting rules. You implement CAPTCHA challenges. You maintain IP blocklists. And yeah, this works... for about two weeks. Then the attackers adapt.

The fundamental problem is visibility. When you're blocking bots at the edge (Cloudflare, a WAF, whatever), you're making binary decisions with incomplete information. Is this a legitimate data center? Is this a residential proxy being used maliciously? Is this a new AI scraper we haven't seen before? You're guessing.

Meanwhile, the bot operators are running their own optimization loop. They're testing payloads. They're probing for injection points. They're harvesting credentials. And every time you block them, they just add another layer of obfuscation and try again from a different angle.

The conversation your security team should be having is this: Stop trying to keep them out. Observe them in a contained environment. Learn their patterns. Then make informed decisions.

That's where deception comes in.

The Honeypot Strategy: Catch Them in the Act

A honeypot is infrastructure deliberately built to look vulnerable so that it attracts attackers and lets you watch what they do. In the security world, the idea has been around for decades. But deploying one on Kubernetes at scale? That's where it gets interesting.

Here's how Krawl works conceptually:

Step 1: Create a shadow infrastructure that looks exactly like your real system. Fake API endpoints that return structurally correct but completely fabricated data. Fake credentials scattered throughout crawlable HTML. Fake database connection strings in .env files. To a bot, it looks like a goldmine. To you, it's a mousetrap.

Step 2: Deploy it alongside your real service. This is where Kubernetes makes things elegant. You're running both your legitimate application and the honeypot in the same cluster, on the same network. Bots don't know the difference.

Step 3: Log everything they do. Every request. Every payload. Every credential they try. Every injection vector they test. You're building a complete behavioral profile of each attacker.

Step 4: Use this intelligence to inform your real defenses. You're not guessing anymore. You know exactly what attacks are coming and where they're coming from.
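
Steps 3 and 4 can be sketched in a few lines. This is a minimal illustration rather than Krawl's actual code: it assumes each logged event is a dict with `source_ip`, `path`, and `threat_score` fields, and the names `build_profiles` and `derive_blocklist`, along with the threshold of 60, are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def build_profiles(events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Group honeypot events by source IP into a simple behavioral profile."""
    profiles: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"requests": 0, "paths": set(), "max_threat_score": 0.0}
    )
    for event in events:
        profile = profiles[event["source_ip"]]
        profile["requests"] += 1
        profile["paths"].add(event["path"])
        profile["max_threat_score"] = max(
            profile["max_threat_score"], event.get("threat_score", 0.0)
        )
    return dict(profiles)


def derive_blocklist(
    profiles: Dict[str, Dict[str, Any]], threshold: float = 60.0
) -> List[str]:
    """Return IPs whose worst observed request meets the (assumed) threshold."""
    return sorted(
        ip for ip, profile in profiles.items()
        if profile["max_threat_score"] >= threshold
    )


# Sample events using documentation-reserved IP ranges
events = [
    {"source_ip": "203.0.113.7", "path": "/.env", "threat_score": 20.0},
    {"source_ip": "203.0.113.7", "path": "/admin", "threat_score": 70.0},
    {"source_ip": "198.51.100.4", "path": "/robots.txt", "threat_score": 0.0},
]
profiles = build_profiles(events)
print(derive_blocklist(profiles))  # ['203.0.113.7']
```

The point of keying on the worst observed request, rather than an average, is that a single injection attempt is enough to treat the source as hostile even if most of its traffic looks like ordinary crawling.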

Setting Up Krawl: A Practical Example

Let me show you how we actually deployed this. The setup is simpler than you'd think.

First, here's the basic Kubernetes deployment. We created a separate namespace to keep things organized:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: honeypot

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: krawl-honeypot
  namespace: honeypot
spec:
  replicas: 3
  selector:
    matchLabels:
      app: honeypot
  template:
    metadata:
      labels:
        app: honeypot
    spec:
      containers:
      - name: krawl
        image: krawl:latest
        ports:
        - containerPort: 8080
        env:
        - name: LOG_LEVEL
          value: "DEBUG"
        - name: HONEYPOT_MODE
          value: "aggressive"
        - name: ELASTICSEARCH_HOST
          value: "elasticsearch.monitoring:9200"
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        volumeMounts:
        # The fake endpoint definitions from the ConfigMap below
        - name: config
          mountPath: /etc/krawl/config
      volumes:
      - name: config
        configMap:
          name: honeypot-config

---
apiVersion: v1
kind: Service
metadata:
  name: honeypot-service
  namespace: honeypot
spec:
  selector:
    app: honeypot
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer
  # Preserve the client source IP; with the default (Cluster) policy,
  # traffic is SNATed and the honeypot would log node IPs instead
  externalTrafficPolicy: Local

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: honeypot-config
  namespace: honeypot
data:
  endpoints.json: |
    {
      "fake_endpoints": [
        {
          "path": "/admin",
          "method": "GET",
          "response": {
            "status": 401,
            "body": {"error": "Unauthorized"}
          }
        },
        {
          "path": "/api/v1/users",
          "method": "GET",
          "response": {
            "status": 200,
            "body": {
              "users": [
                {"id": 1, "username": "admin", "email": "admin@internal.corp"},
                {"id": 2, "username": "dba", "email": "db_admin@internal.corp"}
              ]
            }
          }
        },
        {
          "path": "/.env",
          "method": "GET",
          "response": {
            "status": 200,
            "body": "DB_HOST=prod-db-internal.aws.amazonaws.com\nDB_USER=admin\nDB_PASS=fake_password_123\nAPI_KEY=sk-fake-api-key-here"
          }
        },
        {
          "path": "/api/v1/secrets",
          "method": "POST",
          "response": {
            "status": 200,
            "body": {"secret_key": "fake-secret-that-looks-real"}
          }
        }
      ],
      "logging_config": {
        "capture_payloads": true,
        "capture_headers": true,
        "capture_user_agents": true,
        "log_injection_attempts": true
      }
    }
```
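
How the honeypot consumes `endpoints.json` is not shown here, so the following is only a sketch of one plausible approach: look up the incoming method and path in `fake_endpoints` and return the configured status and body. The function name `match_fake_endpoint` and the 404 fallback are assumptions, not Krawl's documented behavior, and the inline config is a trimmed copy of the ConfigMap above.

```python
import json
from typing import Any, Dict, Tuple

# Trimmed copy of the endpoints.json structure, inlined so the sketch is self-contained
CONFIG_JSON = """
{
  "fake_endpoints": [
    {"path": "/admin", "method": "GET",
     "response": {"status": 401, "body": {"error": "Unauthorized"}}},
    {"path": "/.env", "method": "GET",
     "response": {"status": 200, "body": "DB_USER=admin\\nDB_PASS=fake_password_123"}}
  ]
}
"""


def match_fake_endpoint(config: Dict[str, Any], method: str, path: str) -> Tuple[int, Any]:
    """Return (status, body) for the first endpoint matching method and path."""
    for endpoint in config.get("fake_endpoints", []):
        if endpoint["path"] == path and endpoint["method"] == method.upper():
            response = endpoint["response"]
            return response["status"], response["body"]
    # Assumed fallback for paths the config does not describe
    return 404, {"error": "Not Found"}


config = json.loads(CONFIG_JSON)
print(match_fake_endpoint(config, "GET", "/admin"))    # (401, {'error': 'Unauthorized'})
print(match_fake_endpoint(config, "get", "/missing"))  # (404, {'error': 'Not Found'})
```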

The magic happens in the logging layer. Here's a simplified version of the logger we used to capture each request:


```python
import re
from datetime import datetime, timezone
from typing import Any, Dict, List

from elasticsearch import Elasticsearch
from flask import Flask, jsonify, request


class HoneypotLogger:
    # Common injection signatures, compiled once rather than on every request
    DANGEROUS_PATTERNS = [
        re.compile(pattern, re.IGNORECASE)
        for pattern in (
            r"';.*DROP.*TABLE",
            r"\$\{.*\}",
            r"`.*`",
            r"<.*script.*>",
            r"exec\(",
            r"system\(",
            r"os\.system",
            r"\|\s*bash",
            r"&&\s*rm\s*-rf",
        )
    ]

    def __init__(self, es_host: str = "http://localhost:9200"):
        # The 8.x Python client requires a full URL, including the scheme
        self.es = Elasticsearch([es_host])
        self.index_name = "honeypot-events"

    @staticmethod
    def _as_text(value: Any) -> str:
        """Coerce a missing, bytes, or non-string value to str so string checks never raise."""
        if value is None:
            return ""
        if isinstance(value, bytes):
            return value.decode("utf-8", errors="replace")
        return str(value)

    def log_request(self, request_data: Dict[str, Any]) -> None:
        """
        Log every request hitting the honeypot with full behavioral context
        """
        event = {
            # datetime.utcnow() is deprecated; use a timezone-aware UTC timestamp
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "source_ip": request_data.get("source_ip"),
            "user_agent": request_data.get("user_agent"),
            "path": request_data.get("path"),
            "method": request_data.get("method"),
            "query_params": request_data.get("query_params"),
            "payload": request_data.get("body"),
            "headers": request_data.get("headers"),
            "injection_detected": self._detect_injection(request_data),
            "bot_type": self._classify_bot(request_data),
            "threat_score": self._calculate_threat_score(request_data),
            "requested_secrets": self._extract_secret_attempts(request_data),
        }

        self.es.index(index=self.index_name, document=event)

    def _detect_injection(self, request_data: Dict[str, Any]) -> bool:
        """
        Look for common injection patterns in the request
        """
        # Query parameters are included because GET-based injections live there,
        # not in the path or body
        payload = " ".join(
            self._as_text(request_data.get(field))
            for field in ("body", "path", "query_params")
        )
        return any(pattern.search(payload) for pattern in self.DANGEROUS_PATTERNS)

    def _classify_bot(self, request_data: Dict[str, Any]) -> str:
        """
        Identify the type of bot based on behavioral signals
        """
        user_agent = self._as_text(request_data.get("user_agent")).lower()
        path = self._as_text(request_data.get("path")).lower()

        # "gptbot" added because the lowercased "GPTBot" user agent does not contain "gpt-"
        if any(ua in user_agent for ua in ["chatgpt", "gpt-", "gptbot", "claude"]):
            return "ai_scraper"
        elif any(ua in user_agent for ua in ["sqlmap", "nikto", "nessus"]):
            return "security_scanner"
        elif any(pattern in path for pattern in ["admin", "wp-admin", ".env", "credentials"]):
            if self._detect_injection(request_data):
                return "active_exploit_kit"
            return "credential_harvester"
        elif request_data.get("method") == "OPTIONS":
            return "reconnaissance"

        return "generic_bot"

    def _calculate_threat_score(self, request_data: Dict[str, Any]) -> float:
        """
        Assign a threat level from 0-100
        """
        score = 0.0
        path = self._as_text(request_data.get("path")).lower()

        if self._detect_injection(request_data):
            score += 40

        if any(keyword in path for keyword in
               ["admin", ".env", "credentials", "secrets", "keys"]):
            score += 20

        if request_data.get("method") == "POST":
            score += 10

        # Crude proxy for a randomized user agent: a high ratio of unique characters.
        # Note that short user agents also score high on this measure.
        ua = self._as_text(request_data.get("user_agent"))
        if ua and len(set(ua)) / len(ua) > 0.7:
            score += 15

        return min(score, 100.0)

    def _extract_secret_attempts(self, request_data: Dict[str, Any]) -> List[str]:
        """
        Track which fake secrets the attacker tried to access
        """
        path = self._as_text(request_data.get("path")).lower()
        body = self._as_text(request_data.get("body")).lower()

        secret_keywords = ["api_key", "password", "token", "secret", "key", "auth"]
        return [kw for kw in secret_keywords if kw in path or kw in body]


# Usage in your honeypot service
app = Flask(__name__)
logger = HoneypotLogger(es_host="http://elasticsearch.monitoring:9200")

# OPTIONS is listed explicitly so reconnaissance probes reach the handler
METHODS = ["GET", "POST", "PUT", "DELETE", "OPTIONS"]


# Inside your request handler
@app.route("/", defaults={"path": ""}, methods=METHODS)
@app.route("/<path:path>", methods=METHODS)
def honeypot_handler(path):
    request_data = {
        "source_ip": request.remote_addr,
        "user_agent": request.headers.get("User-Agent", ""),
        # Assumed completion: the remaining fields mirror the keys log_request reads
        "path": request.path,
        "method": request.method,
        "query_params": request.args.to_dict(flat=False),
        "body": request.get_data(as_text=True),
        "headers": dict(request.headers),
    }
    logger.log_request(request_data)
    # Placeholder response (assumption); a real handler would serve the
    # fake responses defined in endpoints.json
    return jsonify({"error": "Not Found"}), 404
```

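Once events are indexed, a breakdown by `bot_type` can be pulled out with a terms aggregation. The sketch below only builds the query body, so it runs without a cluster. It assumes `bot_type` is mapped as `keyword` and `source_ip` as `ip` or `keyword` (with default dynamic mapping the fields would be `bot_type.keyword` and `source_ip.keyword`), and `build_bot_type_query` is a hypothetical helper name.

```python
from typing import Any, Dict


def build_bot_type_query(min_threat_score: float = 0.0, top_n: int = 10) -> Dict[str, Any]:
    """Build a search body that counts events and unique source IPs per bot_type."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the raw hits
        "query": {"range": {"threat_score": {"gte": min_threat_score}}},
        "aggs": {
            "by_bot_type": {
                "terms": {"field": "bot_type", "size": top_n},
                "aggs": {
                    # Approximate count of distinct attackers in each bucket
                    "unique_ips": {"cardinality": {"field": "source_ip"}}
                },
            }
        },
    }


query = build_bot_type_query(min_threat_score=40.0)
print(query["aggs"]["by_bot_type"]["terms"])  # {'field': 'bot_type', 'size': 10}
```

With the 8.x Python client, this body should be usable as `es.search(index="honeypot-events", **query)`, since `size`, `query`, and `aggs` are accepted as keyword arguments there. Treat that call as an assumption to verify against the client version you run.
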