<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 云微</title>
    <description>The latest articles on DEV Community by 云微 (@yunwei37).</description>
    <link>https://dev.to/yunwei37</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1139584%2F27142ebf-c0a3-449b-9482-d63e79238a26.jpeg</url>
      <title>DEV Community: 云微</title>
      <link>https://dev.to/yunwei37</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yunwei37"/>
    <language>en</language>
    <item>
      <title>ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Tue, 02 Jun 2026 11:11:23 +0000</pubDate>
      <link>https://dev.to/yunwei37/acrfence-preventing-semantic-rollback-attacks-in-agent-checkpoint-restore-5eja</link>
      <guid>https://dev.to/yunwei37/acrfence-preventing-semantic-rollback-attacks-in-agent-checkpoint-restore-5eja</guid>
      <description>&lt;p&gt;AI agent frameworks are bringing checkpoint/restore, time travel, and rewind into everyday developer workflows. If an agent makes a mistake, it can go back to a checkpoint. If a user wants to explore another path, the agent can branch from an earlier state. This is useful for debugging and human-in-the-loop control, but it becomes dangerous once the agent has already called external tools.&lt;/p&gt;

&lt;p&gt;Traditional checkpoint/restore rolls back local state. It cannot undo side effects that have already happened in the external world. For ordinary programs, the usual answer is idempotency: retry the external call with the same request id, and the server returns the previous result instead of executing the action again. But an LLM agent is not an ordinary deterministic program. After restore, it may synthesize a semantically equivalent tool call with slightly different fields, such as a new UUID, timestamp, nonce, or reference number. The server cannot see that this is a retry of the same intent. It only sees a new valid request.&lt;/p&gt;

&lt;p&gt;This post is based on our arXiv paper &lt;a href="https://arxiv.org/abs/2603.20625" rel="noopener noreferrer"&gt;&lt;strong&gt;ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore&lt;/strong&gt;&lt;/a&gt;. We introduce &lt;strong&gt;semantic rollback attacks&lt;/strong&gt;: attacks that exploit the gap between rolled-back agent state and non-rolled-back external state to trigger duplicate irreversible actions or revive consumed authority.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Simple Transfer Example
&lt;/h2&gt;

&lt;p&gt;Suppose a user asks an agent to transfer $500 to Bob. The agent calls a bank API, generates a unique reference id &lt;code&gt;a1b2c3d4&lt;/code&gt;, and the transfer succeeds. The agent then calls Bob's MCP service to confirm the receipt. Bob's service returns a malformed response that crashes the agent. The framework restores the agent to a checkpoint before the transfer.&lt;/p&gt;

&lt;p&gt;After restore, the agent again executes the intent "transfer $500 to Bob." This time, however, it generates a different reference id, &lt;code&gt;f9a8b7c6&lt;/code&gt;. The bank's duplicate detection logic only sees two different references, so it accepts the second transfer. Bob receives $1000, while the agent's local view remains "I transferred once."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb49uvxu6e68vznmry8p3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb49uvxu6e68vznmry8p3.png" alt="Action Replay attack flow" width="800" height="558"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1: Action Replay. A malicious MCP service triggers a crash after a successful transfer. After restore, the agent reissues the transfer with a new reference id, so the bank treats it as a new transaction.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The key point is not that the transfer API lacks idempotency. The problem is that the precondition for idempotency is broken. Systems such as Stripe and AWS ECS rely on the caller retrying with the same idempotency key or the same critical parameters. An LLM agent rethinks after restore and may produce a different token sequence. Even at temperature 0, byte-identical tool calls are not guaranteed. As a result, traditional server-side deduplication cannot recognize a "semantically same" retry.&lt;/p&gt;
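
&lt;p&gt;A minimal sketch of that broken precondition, assuming a hypothetical in-memory &lt;code&gt;Bank&lt;/code&gt; that deduplicates only on the caller-supplied reference id. The class and method names are illustrative and do not correspond to a real bank API; the reference ids are the ones from the example above.&lt;/p&gt;

```python
class Bank:
    """Hypothetical bank that deduplicates transfers by reference id only."""

    def __init__(self):
        self.seen = {}     # reference id mapped to the receipt of the first commit
        self.ledger = []   # committed transfers, in order

    def transfer(self, reference, recipient, amount):
        # Classic idempotency: a repeated reference returns the old receipt.
        if reference in self.seen:
            return self.seen[reference]
        self.ledger.append((recipient, amount))
        receipt = {"reference": reference, "status": "committed"}
        self.seen[reference] = receipt
        return receipt


bank = Bank()

# Byte-identical retry: the second call is absorbed by deduplication.
bank.transfer("a1b2c3d4", "bob", 500)
bank.transfer("a1b2c3d4", "bob", 500)
after_identical_retry = len(bank.ledger)

# Post-restore retry: same intent, but a freshly generated reference id.
bank.transfer("f9a8b7c6", "bob", 500)
after_semantic_retry = len(bank.ledger)

total_sent = sum(amount for _, amount in bank.ledger)
print(after_identical_retry, after_semantic_retry, total_sent)
```

&lt;p&gt;The deduplication logic behaves exactly as designed in both cases. It simply has no way to tell that the third call expresses the same intent as the first.&lt;/p&gt;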

&lt;h2&gt;
  
  
  Root Cause: Local Rollback, External Progress
&lt;/h2&gt;

&lt;p&gt;Checkpoint/restore systems can save local process state, conversation context, variables, file descriptors, and related runtime state. They cannot automatically undo committed external effects. Transfers, emails, cloud resource creation, data deletion, and one-time token consumption are all irreversible side effects from the framework's point of view.&lt;/p&gt;

&lt;p&gt;In the agent setting, three facts combine badly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The agent state is rolled back.&lt;/strong&gt; The agent returns to an old checkpoint and no longer remembers that the transfer succeeded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The external state is not rolled back.&lt;/strong&gt; The bank ledger, approval system, or cloud control plane still records the previous successful action.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The post-restore tool call may differ.&lt;/strong&gt; The LLM may regenerate UUIDs, nonces, timestamps, or even change the target object under user guidance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxiv5f4q9aif0yojtxtif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxiv5f4q9aif0yojtxtif.png" alt="Divergence between agent state and external state" width="800" height="310"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 2: Restore only affects local agent state. External state keeps moving forward. This divergence is the core of semantic rollback attacks.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This resembles the classic output commit problem in distributed systems: once output has been committed to the outside world, rolling back the local process alone cannot take the whole system back in time. The new twist is that an LLM agent may synthesize a different request after restore, blurring the boundary between "retry" and "new request."&lt;/p&gt;

&lt;h2&gt;
  
  
  Attack 1: Action Replay
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Action Replay&lt;/strong&gt; targets irreversible tool calls that have already succeeded. The attacker does not need to control the bank or compromise the agent. It is enough to control a later service in the agent's tool chain, such as Bob's invoice-confirmation MCP service or a seemingly harmless callback endpoint.&lt;/p&gt;

&lt;p&gt;The attack path is direct:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The agent executes an irreversible action after a checkpoint, such as a transfer or cloud resource creation.&lt;/li&gt;
&lt;li&gt;The external service returns success, and the side effect is committed.&lt;/li&gt;
&lt;li&gt;An attacker-controlled later tool returns a malformed response, triggering crash or restore.&lt;/li&gt;
&lt;li&gt;The agent returns to the old checkpoint and repeats the same task.&lt;/li&gt;
&lt;li&gt;The LLM generates a fresh request id, so the target service cannot recognize the repeated intent and commits again.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcls3mys0lmtpn12nceie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcls3mys0lmtpn12nceie.png" alt="Normal execution compared with attack execution" width="800" height="424"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 3: Normal execution transfers once. In the attack path, crash-induced restore causes the same semantic action to execute twice.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In our experiments, we used Claude Code CLI backed by Qwen3-32B. External services were simulated as MCP tool servers: a bank service with UUID-based duplicate detection and a malicious payee service that crashes the agent after a successful transfer. Across 10 checkpoint/restore trials, all 10 produced duplicate commits. A no-checkpoint baseline produced none. This confirms that the vulnerability comes from the interaction between restore and external side effects, not from ordinary model randomness alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attack 2: Authority Resurrection
&lt;/h2&gt;

&lt;p&gt;The second attack class is &lt;strong&gt;Authority Resurrection&lt;/strong&gt;, which targets one-time authorization tokens or short-lived credentials.&lt;/p&gt;

&lt;p&gt;Consider an enterprise data deletion workflow. The agent first obtains manager approval, and the approval service returns a one-time token. The agent uses that token to delete Alice's data, and the server marks the token as consumed. A user or malicious insider then rewinds the agent to the checkpoint immediately after approval was granted. In the agent's local state, the token appears again. In the external approval system, the token should already be consumed.&lt;/p&gt;

&lt;p&gt;If the target service validates tokens statelessly, for example by checking only a signature and expiration time, the agent may reuse the same token on another target, such as Bob's data. The audit log may show that the manager approved deletion for Alice, while Bob's data was also deleted. The discrepancy is visible only by correlating approval and execution logs.&lt;/p&gt;

&lt;p&gt;Our experiment simulated two approval services:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Validation mode&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Stateless validation, checking only token signature&lt;/td&gt;
&lt;td&gt;2/2 reuse attempts succeeded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stateful validation, recording token consumption server-side&lt;/td&gt;
&lt;td&gt;All reuse attempts were rejected&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This shows that checkpoint/restore can do more than duplicate financial side effects. It can also break authorization semantics by reviving authority that should have been consumed.&lt;/p&gt;
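
&lt;p&gt;The difference between the two validation modes can be sketched as follows. This is an illustration under assumptions, not the services used in the experiment: the signing key, token format, and &lt;code&gt;StatefulValidator&lt;/code&gt; class are hypothetical, and the token is deliberately not bound to a deletion target.&lt;/p&gt;

```python
import hashlib
import hmac

SECRET = b"demo-signing-key"  # hypothetical key, for illustration only


def sign(token_id):
    return hmac.new(SECRET, token_id.encode(), hashlib.sha256).hexdigest()


def issue_token(token_id):
    return {"id": token_id, "sig": sign(token_id)}


def stateless_validate(token):
    # Checks the signature only; knows nothing about prior use.
    return hmac.compare_digest(token["sig"], sign(token["id"]))


class StatefulValidator:
    """Also records consumption server-side, outside the agent's checkpoint."""

    def __init__(self):
        self.consumed = set()

    def validate(self, token):
        if not stateless_validate(token):
            return False
        if token["id"] in self.consumed:
            return False
        self.consumed.add(token["id"])
        return True


token = issue_token("approval-001")  # held in agent state at the checkpoint

# First use deletes Alice's data; the rewound agent then presents the token again.
stateless_results = [stateless_validate(token), stateless_validate(token)]

validator = StatefulValidator()
stateful_results = [validator.validate(token), validator.validate(token)]

print(stateless_results, stateful_results)
```

&lt;p&gt;Because the consumed set lives on the server, rewinding the agent cannot rewind it. The stateless check has no such anchor in external state.&lt;/p&gt;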

&lt;h2&gt;
  
  
  Why This Is Not One Framework's Bug
&lt;/h2&gt;

&lt;p&gt;The paper surveys reports across multiple frameworks and communities. The concrete symptoms differ, but they point to the same boundary: restore, retry, approval, preemption, and human-in-the-loop flows can cause tool calls to execute more than once, while frameworks generally do not enforce exactly-once semantics at the tool boundary.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework or system&lt;/th&gt;
&lt;th&gt;Observed issue type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;Tool nodes may re-execute after resume or interrupt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CrewAI&lt;/td&gt;
&lt;td&gt;Workflows run twice, causing repeated emails or actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google ADK&lt;/td&gt;
&lt;td&gt;Rewind documentation warns that external side effects are not undone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AutoGen / OpenAI Agents&lt;/td&gt;
&lt;td&gt;Graph nodes or function calls are triggered repeatedly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code / Cursor&lt;/td&gt;
&lt;td&gt;Duplicate tool behavior around approval, checkpoint, or undo flows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenHands / Vercel AI / LiveKit / n8n&lt;/td&gt;
&lt;td&gt;Duplicate messages, repeated tool calls, doubled token cost, or repeated charges&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These cases do not mean every framework has the same bug. They show that "restoring agent state to the past" while "the external world remains in the present" is a systemic issue. Relying on developers to make tools idempotent is not enough, because the post-restore agent request may not be the same request.&lt;/p&gt;

&lt;h2&gt;
  
  
  ACRFence: Replay-or-Fork at the Tool Boundary
&lt;/h2&gt;

&lt;p&gt;ACRFence does not try to make every LLM agent deterministic. Instead, it records irreversible effects at the tool boundary and enforces &lt;strong&gt;replay-or-fork&lt;/strong&gt; semantics after restore.&lt;/p&gt;

&lt;p&gt;ACRFence can be deployed as an MCP proxy or a similar tool-call proxy between the agent and external services. For each irreversible tool call, ACRFence records an effect log that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;thread and branch identifiers, to distinguish execution branches in the same session;&lt;/li&gt;
&lt;li&gt;tool name and arguments;&lt;/li&gt;
&lt;li&gt;return value or error;&lt;/li&gt;
&lt;li&gt;runtime context, such as process, network, and file-access context, which can be enriched by eBPF-based system-level monitors such as &lt;a href="https://github.com/eunomia-bpf/agentsight/" rel="noopener noreferrer"&gt;AgentSight&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;consumed credentials or authorization objects, when applicable.&lt;/li&gt;
&lt;/ul&gt;
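
&lt;p&gt;One possible shape for such a record, as a rough sketch. The field names, the &lt;code&gt;EffectLog&lt;/code&gt; class, and the sample values are assumptions made for illustration; the paper does not prescribe this schema.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EffectRecord:
    """One irreversible tool call, as a proxy might record it (illustrative)."""
    thread_id: str
    branch_id: str
    tool: str
    arguments: dict
    outcome: dict  # return value or error
    runtime_context: dict = field(default_factory=dict)
    consumed_credentials: list = field(default_factory=list)


class EffectLog:
    """Append-only log kept outside the agent, so restore cannot rewind it."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def for_thread(self, thread_id):
        return [r for r in self.records if r.thread_id == thread_id]


log = EffectLog()
log.append(EffectRecord(
    thread_id="thread-1",
    branch_id="main",
    tool="transfer",
    arguments={"reference": "a1b2c3d4", "recipient": "bob", "amount": 500},
    outcome={"status": "committed"},
    runtime_context={"pid": 4242, "remote": "bank.example:443"},
))

# Records serialize to JSON, so they can be persisted outside the checkpoint.
serialized = json.dumps(asdict(log.records[0]), sort_keys=True)
print(serialized)
```

&lt;p&gt;The property that matters is where the log lives: it has to sit outside the state that checkpoint/restore rolls back.&lt;/p&gt;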

&lt;p&gt;When the agent restores from a checkpoint and issues another tool call, ACRFence does not immediately forward it. It first compares the new call with the historical effect log:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantically equivalent: replay.&lt;/strong&gt; If the new call only changes non-intent fields such as request id or timestamp, while recipient, amount, resource target, and other intent fields are the same, ACRFence returns the previously recorded response without re-executing the external operation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantically divergent: fork.&lt;/strong&gt; If the new call changes intent-critical fields, such as a different recipient or a different customer deletion target, ACRFence blocks the call, shows the prior effect log, and requires an explicit fork.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credential reuse: reject or inform.&lt;/strong&gt; If the call tries to reuse a consumed token, ACRFence rejects the call or informs the agent that the token was already consumed, before the request reaches the target service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We use an analyzer LLM for semantic comparison instead of requiring every tool to provide a hand-written schema and idempotency rule. For example, two &lt;code&gt;transfer&lt;/code&gt; calls with different UUIDs but the same amount and recipient should be treated as the same intent. Two &lt;code&gt;delete_customer_data&lt;/code&gt; calls with the same approval token but different customer ids should be treated as dangerous divergence. The analyzer runs only on the restore path, not on every normal tool call.&lt;/p&gt;
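
&lt;p&gt;The decision logic on the restore path can be sketched as below. To keep the example runnable, the analyzer LLM is replaced by a hand-written table of intent fields, which is exactly the per-tool rule the design tries to avoid requiring. The function names, the decision labels, the &lt;code&gt;approval_token&lt;/code&gt; argument name, and the recipient &lt;code&gt;carol&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
# Stand-in for the analyzer LLM: a hand-written list of intent fields per tool.
INTENT_FIELDS = {
    "transfer": ["recipient", "amount"],
    "delete_customer_data": ["customer_id"],
}


def intent_of(tool, arguments):
    return tuple(arguments.get(name) for name in INTENT_FIELDS[tool])


def decide(effect_log, consumed_tokens, tool, arguments):
    """Return (decision, payload) for a tool call issued after restore."""
    prior = [e for e in effect_log if e["tool"] == tool]
    for entry in prior:
        if intent_of(tool, entry["arguments"]) == intent_of(tool, arguments):
            return ("replay", entry["result"])  # same intent: cached result
    token = arguments.get("approval_token")
    if token is not None and token in consumed_tokens:
        return ("reject", "token already consumed: " + token)
    if prior:
        return ("fork_required", prior)  # divergent intent: show prior effects
    return ("forward", None)  # nothing recorded for this tool


effect_log = [
    {"tool": "transfer",
     "arguments": {"reference": "a1b2c3d4", "recipient": "bob", "amount": 500},
     "result": {"status": "committed", "reference": "a1b2c3d4"}},
    {"tool": "delete_customer_data",
     "arguments": {"customer_id": "alice", "approval_token": "tok-1"},
     "result": {"status": "deleted"}},
]
consumed = {"tok-1"}

same_intent = decide(effect_log, consumed, "transfer",
                     {"reference": "f9a8b7c6", "recipient": "bob", "amount": 500})
new_recipient = decide(effect_log, consumed, "transfer",
                       {"reference": "0000ffff", "recipient": "carol", "amount": 500})
token_reuse = decide(effect_log, consumed, "delete_customer_data",
                     {"customer_id": "bob", "approval_token": "tok-1"})

print(same_intent[0], new_recipient[0], token_reuse[0])
```

&lt;p&gt;A real analyzer would need to cope with tools whose intent fields are not known in advance, which is where the misclassification risk comes from.&lt;/p&gt;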

&lt;p&gt;ACRFence aims to provide two guarantees:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Replay safety:&lt;/strong&gt; semantically equivalent irreversible calls after restore do not execute again; ACRFence returns the cached result.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Divergence detection:&lt;/strong&gt; semantically different calls after restore must explicitly fork; they cannot silently inherit external effects or authority from an old branch.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How This Differs from Idempotency and Durable Execution
&lt;/h2&gt;

&lt;p&gt;Idempotency is still important, but it solves the problem of "the same request is retried." ACRFence works one level higher, at agent intent: request fields may change while intent stays the same, or the fields may look valid while intent has drifted to a new target.&lt;/p&gt;

&lt;p&gt;Durable execution systems usually require deterministic orchestrator logic, with nondeterministic values recorded as side effects and replayed on recovery. That works well for traditional workflows. LLM agents, however, generate their next action from context. Rather than assuming post-restore calls will be byte-identical, ACRFence treats divergence as expected and makes replay versus fork explicit at the tool boundary.&lt;/p&gt;
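
&lt;p&gt;For contrast, here is a minimal sketch of the record-and-replay pattern that durable execution relies on. It is not the API of any particular workflow engine: the &lt;code&gt;Run&lt;/code&gt; class and its &lt;code&gt;side_effect&lt;/code&gt; method are hypothetical, and the history is a plain dictionary.&lt;/p&gt;

```python
import uuid


class Run:
    """One execution of a workflow against a durable history (illustrative)."""

    def __init__(self, history):
        self.history = history  # dict: step index mapped to recorded value
        self.step = 0

    def side_effect(self, fn):
        index = self.step
        self.step += 1
        if index not in self.history:
            self.history[index] = fn()  # first execution: record the value
        return self.history[index]      # recovery: reuse the recorded value


def workflow(run):
    # Deterministic orchestrator logic; the only nondeterminism is recorded.
    reference = run.side_effect(lambda: uuid.uuid4().hex)
    return {"tool": "transfer", "reference": reference,
            "recipient": "bob", "amount": 500}


history = {}
first = workflow(Run(history))
recovered = workflow(Run(history))  # same history: byte-identical request
fresh = workflow(Run({}))           # no history: a new reference is generated

print(first == recovered, first["reference"] == fresh["reference"])
```

&lt;p&gt;The pattern works because the orchestrator code is fixed and only the recorded values vary. A restored LLM agent is closer to the last case: nothing forces it to regenerate the same request.&lt;/p&gt;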

&lt;p&gt;In this division of labor, checkpoint/restore lets the agent return to an earlier state. ACRFence ensures that reconnecting that old state to the external world does not duplicate irreversible side effects or revive consumed authority.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations and Next Steps
&lt;/h2&gt;

&lt;p&gt;The work validates the two attack classes, while ACRFence itself remains a design that still needs a full implementation and system evaluation. Several challenges remain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The analyzer LLM may misclassify calls, so false replay and false fork risks need careful evaluation.&lt;/li&gt;
&lt;li&gt;An adaptive attacker who knows the comparison logic may craft ambiguous parameters to evade semantic detection.&lt;/li&gt;
&lt;li&gt;The boundary between "intent fields" and "non-intent fields" is not always obvious for every tool.&lt;/li&gt;
&lt;li&gt;The current experiments cover one model and one framework; more agent frameworks, models, and real tool ecosystems should be evaluated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core conclusion is clear: once agent frameworks introduce checkpoint, rewind, time travel, and branch exploration, external tool calls cannot rely only on traditional idempotency keys. The restore path is a new security boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Checkpoint/restore makes AI agents easier to debug, recover, and steer across multiple execution paths. But once agents can call external tools, local rollback and external non-rollback create a semantic gap. Action Replay can turn one payment, one resource creation, or one email into many. Authority Resurrection can make consumed authorization reappear in local agent state.&lt;/p&gt;

&lt;p&gt;ACRFence records irreversible effects at the tool boundary and enforces replay-or-fork after restore: same intent replays the result without re-execution, different intent must explicitly fork, and consumed credentials cannot be silently reused. As more agent frameworks support checkpoint and time travel, this kind of tool-boundary semantics will become part of the reliability and security foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2603.20625" rel="noopener noreferrer"&gt;ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://criu.org/" rel="noopener noreferrer"&gt;CRIU: Checkpoint/Restore In Userspace&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.langchain.com/oss/python/langgraph/persistence" rel="noopener noreferrer"&gt;LangGraph Persistence and Time Travel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.langchain.com/oss/python/langgraph/durable-execution" rel="noopener noreferrer"&gt;LangGraph Durable Execution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/specification/2025-03-26" rel="noopener noreferrer"&gt;Model Context Protocol Specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.stripe.com/api/idempotent_requests" rel="noopener noreferrer"&gt;Stripe API: Idempotent Requests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.temporal.io/develop/go/side-effects" rel="noopener noreferrer"&gt;Temporal Side Effects&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://google.github.io/adk-docs/sessions/session/rewind/" rel="noopener noreferrer"&gt;Google ADK Session Rewind&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/hashicorp/vault/issues/28378" rel="noopener noreferrer"&gt;Vault issue #28378: single-use token reappears after snapshot restore&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>security</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Runtime Observability and Enforcement for Opaque AI Agents with eBPF: Beyond Sandboxes and Approvals</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Tue, 02 Jun 2026 11:11:21 +0000</pubDate>
      <link>https://dev.to/yunwei37/runtime-observability-and-enforcement-for-opaque-ai-agents-with-ebpf-beyond-sandboxes-and-approvals-8n6</link>
      <guid>https://dev.to/yunwei37/runtime-observability-and-enforcement-for-opaque-ai-agents-with-ebpf-beyond-sandboxes-and-approvals-8n6</guid>
      <description>&lt;p&gt;AI coding agents now run for hours, complete entire features end-to-end,&lt;br&gt;
optimize production GPU kernels, and merge thousands of pull requests&lt;br&gt;
autonomously. Meanwhile, most agent security still relies on human-in-the-loop&lt;br&gt;
approval, and Anthropic's own data shows Claude Code users approve 93% of permission&lt;br&gt;
prompts, a rate consistent with rubber-stamping. The result is predictable: products add bypass modes, users&lt;br&gt;
disable permission gates, and 65% of firms report agent security incidents.&lt;/p&gt;

&lt;p&gt;But the deeper problem is not approval fatigue. It is that the agent harness&lt;br&gt;
(the prompt loop, tool routing, permission logic, and sandbox defaults) is&lt;br&gt;
increasingly a third-party product the platform team did not write, running in a&lt;br&gt;
sandbox the platform team may not own. The harness is not a trusted security&lt;br&gt;
boundary. This post argues for separating agent security into three layers with&lt;br&gt;
three different owners: intent authorization (harness-owned), execution&lt;br&gt;
isolation (ownership contested), and side-effect verification (must be&lt;br&gt;
platform-owned). When the layers agree, you have confidence. When they&lt;br&gt;
disagree, you need independent observability and enforcement at the OS level to&lt;br&gt;
detect it, and that is exactly the layer most agent platforms are missing. We&lt;br&gt;
are building two projects in this direction:&lt;br&gt;
&lt;a href="https://github.com/eunomia-bpf/agentsight/" rel="noopener noreferrer"&gt;AgentSight&lt;/a&gt; for runtime observation and&lt;br&gt;
&lt;a href="https://github.com/eunomia-bpf/ActPlane" rel="noopener noreferrer"&gt;ActPlane&lt;/a&gt; for runtime harness enforcement. Both use eBPF to provide&lt;br&gt;
independent runtime observability and enforcement below the agent harness.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Now: Complexity Up, Guardrails Behind
&lt;/h2&gt;

&lt;p&gt;The important change in 2026 is not that agents exist. It is the scale and&lt;br&gt;
duration of what they do.&lt;/p&gt;

&lt;p&gt;A year ago, the typical agent task was "fix this bug" or "write this function."&lt;br&gt;
In 2026, agents routinely run for hours on complex, multi-step work. OpenAI&lt;br&gt;
documented a Codex session that &lt;a href="https://developers.openai.com/blog/run-long-horizon-tasks-with-codex" rel="noopener noreferrer"&gt;ran for 25 hours uninterrupted&lt;/a&gt;,&lt;br&gt;
consuming 13 million tokens and producing 30,000 lines of code from a blank&lt;br&gt;
repository. Anthropic's agentic coding report cites a &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;12.5-million-line&lt;br&gt;
codebase change completed in a single 7-hour run&lt;/a&gt;. Meta's&lt;br&gt;
&lt;a href="https://engineering.fb.com/2026/04/02/developer-tools/kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure/" rel="noopener noreferrer"&gt;KernelEvolve&lt;/a&gt; uses multi-agent coordination to write and optimize&lt;br&gt;
production GPU kernels, compressing work that previously required weeks of&lt;br&gt;
expert systems engineering into hours. On SWE-bench Verified, &lt;a href="https://www.vals.ai/benchmarks/swebench" rel="noopener noreferrer"&gt;top agents now&lt;br&gt;
resolve 60–70%&lt;/a&gt; of real GitHub issues, up from under 30% earlier.&lt;br&gt;
Devin has &lt;a href="https://cognition.ai/blog/devin-annual-performance-review-2025" rel="noopener noreferrer"&gt;merged hundreds of thousands of pull requests&lt;/a&gt;&lt;br&gt;
across enterprise customers with a 67% merge rate. Goldman Sachs &lt;a href="https://www.cnbc.com/2025/07/11/goldman-sachs-autonomous-coder-pilot-marks-major-ai-milestone.html" rel="noopener noreferrer"&gt;deployed&lt;br&gt;
hundreds of Devin instances&lt;/a&gt; across a 12,000-person engineering team.&lt;/p&gt;

&lt;p&gt;Beyond coding, general-purpose autonomous agents have gone mainstream.&lt;br&gt;
&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;, an open-source agent with&lt;br&gt;
over 300,000 GitHub stars, connects to LLMs and executes shell commands,&lt;br&gt;
browser automation, email, calendar, and file operations on the user's machine.&lt;br&gt;
CrowdStrike called it &lt;a href="https://www.crowdstrike.com/en-us/blog/what-security-teams-need-to-know-about-openclaw-ai-super-agent/" rel="noopener noreferrer"&gt;"the AI Super Agent" security teams need to worry&lt;br&gt;
about&lt;/a&gt;:&lt;br&gt;
between January and April 2026, &lt;a href="https://www.reco.ai/blog/openclaw-the-ai-agent-security-crisis-unfolding-right-now" rel="noopener noreferrer"&gt;470 security advisories&lt;/a&gt;&lt;br&gt;
were filed against it across three disclosure waves.&lt;/p&gt;

&lt;p&gt;These are not research demos. They are production workflows: background tasks,&lt;br&gt;
parallel execution, multi-hour sessions, end-to-end feature development, kernel&lt;br&gt;
optimization, and enterprise-scale code changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meanwhile, the guardrails designed to keep agents safe have not kept pace.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most agent security still relies on human-in-the-loop approval: a prompt asks&lt;br&gt;
the user to approve or deny each action before it executes. This works for short&lt;br&gt;
sessions with a few tool calls. It does not work when an agent makes hundreds of&lt;br&gt;
decisions over hours of autonomous operation.&lt;/p&gt;

&lt;p&gt;The evidence suggests that approval-based control is already failing in&lt;br&gt;
practice. Anthropic's own data shows that &lt;a href="https://www.anthropic.com/engineering/claude-code-auto-mode" rel="noopener noreferrer"&gt;Claude Code users approve 93% of&lt;br&gt;
permission prompts&lt;/a&gt;, a rate consistent with rubber-stamping&lt;br&gt;
rather than meaningful review. An independent stress test of Claude Code's auto&lt;br&gt;
mode found an &lt;a href="https://arxiv.org/abs/2604.04978" rel="noopener noreferrer"&gt;81% false negative rate&lt;/a&gt; on ambiguous&lt;br&gt;
state-changing actions, meaning the classifier allowed 4 out of 5 actions that&lt;br&gt;
should have required human review. Real incidents have followed: in documented&lt;br&gt;
cases, users running agents without permission gates had their &lt;a href="https://gist.github.com/hartphoenix/698eb8ef8b08ad2ce6a99cf7346cd7cc" rel="noopener noreferrer"&gt;home directories&lt;br&gt;
deleted&lt;/a&gt; by &lt;code&gt;rm -rf&lt;/code&gt; commands the agent generated. A 2026&lt;br&gt;
industry survey found that &lt;a href="https://www.kiteworks.com/cybersecurity-risk-management/ai-agent-security-incidents-2026/" rel="noopener noreferrer"&gt;65% of firms reported AI agent security&lt;br&gt;
incidents&lt;/a&gt;, primarily&lt;br&gt;
unauthorized data access, credential exposure, and exfiltration to external&lt;br&gt;
endpoints, with most involving organizations lacking proper agent access&lt;br&gt;
controls.&lt;/p&gt;

&lt;p&gt;Products have responded by adding bypass mechanisms. Claude Code offers&lt;br&gt;
&lt;code&gt;--dangerously-skip-permissions&lt;/code&gt;. Windsurf's Cascade agent &lt;a href="https://stackbuilt.co/blog/windsurf-vs-cursor-2026" rel="noopener noreferrer"&gt;proceeds&lt;br&gt;
autonomously&lt;/a&gt; where Cursor stops to ask. Community guides now&lt;br&gt;
focus on "how to safely use YOLO mode." Anthropic researcher Nicholas Carlini&lt;br&gt;
ran &lt;a href="https://x.com/nicholas_carlini" rel="noopener noreferrer"&gt;16 parallel Claude agents with permissions bypassed&lt;/a&gt;, with the&lt;br&gt;
caveat: "Run this in a container, not your actual machine."&lt;/p&gt;

&lt;p&gt;This is the tension: &lt;strong&gt;the more capable agents become, the more users want to&lt;br&gt;
let them run uninterrupted, and the less effective human-in-the-loop becomes as&lt;br&gt;
the primary security boundary.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That tension is what creates the need for a different security model.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Accountability Gap
&lt;/h2&gt;

&lt;p&gt;The deeper issue is not just that agents are more capable. It is that the agent&lt;br&gt;
harness, the component that decides what the agent does, is increasingly a&lt;br&gt;
third-party product the platform team did not write.&lt;/p&gt;

&lt;p&gt;A modern agent harness is not a thin wrapper around a model. It includes a&lt;br&gt;
prompt loop, planning and retry logic, tool routing, MCP clients, permission&lt;br&gt;
modes, approval gates, hooks, memory, logs, credential handling, and sometimes&lt;br&gt;
sandbox defaults. In many deployments, that harness comes from a hosted&lt;br&gt;
coding-agent service or an open-source framework the platform team does not&lt;br&gt;
control.&lt;/p&gt;

&lt;p&gt;This is already visible across the ecosystem. GitHub Copilot's &lt;a href="https://docs.github.com/en/copilot/concepts/about-copilot-coding-agent" rel="noopener noreferrer"&gt;coding&lt;br&gt;
agent&lt;/a&gt; runs autonomously in GitHub Actions, researching&lt;br&gt;
repositories, creating plans, making changes, and opening pull requests. OpenAI&lt;br&gt;
&lt;a href="https://developers.openai.com/codex/cloud" rel="noopener noreferrer"&gt;Codex&lt;/a&gt; runs background tasks in sandboxed cloud environments with&lt;br&gt;
controlled network access. Claude Code runs cloud sessions in Anthropic-managed&lt;br&gt;
VMs with scoped credentials. A Kubernetes SIG is defining &lt;a href="https://agent-sandbox.sigs.k8s.io/" rel="noopener noreferrer"&gt;Agent&lt;br&gt;
Sandbox&lt;/a&gt; for isolated, stateful agent workloads. Recent research&lt;br&gt;
datasets show &lt;a href="https://arxiv.org/abs/2602.09185" rel="noopener noreferrer"&gt;agent-authored pull requests at scale&lt;/a&gt; across real&lt;br&gt;
repositories.&lt;/p&gt;

&lt;p&gt;The ownership split is now explicit in major platforms. Anthropic's shared&lt;br&gt;
responsibility framework &lt;a href="https://www.anthropic.com/research/trustworthy-agents" rel="noopener noreferrer"&gt;divides agent security into four&lt;br&gt;
layers&lt;/a&gt; (Model, Harness, Tools, Environment) and&lt;br&gt;
stresses that an agent's behavior depends on all four working together, so the&lt;br&gt;
harness, tools, and environment (the layers shaped by the deploying party) are&lt;br&gt;
as decisive as the model itself. Anthropic also notes that even together,&lt;br&gt;
these layered safeguards are not a guarantee. The question the framework&lt;br&gt;
leaves open is what happens when a failure crosses these layers, and whether&lt;br&gt;
the deployer has independent observability to detect it. In cloud infrastructure,&lt;br&gt;
the analogous gap in shared responsibility led to independent observability&lt;br&gt;
and audit services (CloudTrail, Config, GuardDuty) controlled by the&lt;br&gt;
customer, not the provider. Agent infrastructure has no equivalent yet: the&lt;br&gt;
deployer is told it owns harness, tools, and environment, but often has no&lt;br&gt;
independent way to verify what those layers actually did at runtime.&lt;/p&gt;

&lt;p&gt;GitHub's agentic&lt;br&gt;
workflow architecture starts from the premise that &lt;a href="https://github.blog/ai-and-ml/generative-ai/under-the-hood-security-architecture-of-github-agentic-workflows/" rel="noopener noreferrer"&gt;"agents cannot be trusted by&lt;br&gt;
default, especially in the presence of untrusted inputs"&lt;/a&gt;,&lt;br&gt;
using kernel-enforced communication boundaries that hold even if the agent&lt;br&gt;
container is compromised. OpenAI's Codex documentation &lt;a href="https://developers.openai.com/codex/agent-approvals-security" rel="noopener noreferrer"&gt;acknowledges&lt;/a&gt;&lt;br&gt;
that "devcontainers provide substantial protection, but they do not prevent&lt;br&gt;
every attack."&lt;/p&gt;

&lt;p&gt;The platform team still owns the repository, the CI runner, the Kubernetes&lt;br&gt;
cluster, the service accounts, the secrets, and the internal network. But the&lt;br&gt;
runtime acting on those assets may be opaque.&lt;/p&gt;

&lt;p&gt;There is also a second split that matters even more for platform teams: &lt;strong&gt;the&lt;br&gt;
sandbox may not be controlled by the environment owner either.&lt;/strong&gt; If the agent&lt;br&gt;
runs in a provider-managed cloud (Claude Code on the web runs in&lt;br&gt;
&lt;a href="https://docs.anthropic.com/en/docs/claude-code/security" rel="noopener noreferrer"&gt;Anthropic-managed isolated VMs&lt;/a&gt; with scoped credential&lt;br&gt;
proxies; Codex runs in &lt;a href="https://developers.openai.com/codex/concepts/sandboxing" rel="noopener noreferrer"&gt;OpenAI-managed containers&lt;/a&gt;), the&lt;br&gt;
platform team cannot attach its own monitoring, modify isolation policy, or&lt;br&gt;
inspect the sandbox internals. Even Anthropic's own managed agent architecture&lt;br&gt;
explicitly &lt;a href="https://www.anthropic.com/engineering/managed-agents" rel="noopener noreferrer"&gt;decouples the "brain" (Claude + harness) from the&lt;br&gt;
"hands" (sandboxes)&lt;/a&gt;, treating containers as disposable and ensuring tokens are never reachable&lt;br&gt;
from the sandbox where generated code runs. This is good architecture, but it is the provider's architecture,&lt;br&gt;
not the platform team's.&lt;/p&gt;

&lt;p&gt;When agents run locally or on self-hosted infrastructure (GitHub now &lt;a href="https://github.blog/changelog/2025-10-28-copilot-coding-agent-now-supports-self-hosted-runners/" rel="noopener noreferrer"&gt;supports&lt;br&gt;
self-hosted runners&lt;/a&gt; for its coding agent, and Kubernetes&lt;br&gt;
Agent Sandbox provides &lt;a href="https://agent-sandbox.sigs.k8s.io/" rel="noopener noreferrer"&gt;gVisor/Kata-backed isolation&lt;/a&gt; under the&lt;br&gt;
platform operator's control), the environment owner can wrap the agent in its&lt;br&gt;
own sandbox and observability. When agents run in provider-managed&lt;br&gt;
environments, independent observability and enforcement must move to the&lt;br&gt;
boundaries the platform team does control.&lt;/p&gt;

&lt;p&gt;This creates the accountability gap: &lt;strong&gt;the platform team is responsible for&lt;br&gt;
production impact from a workload it cannot fully inspect, running in a sandbox&lt;br&gt;
it may not own.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The old mental model was simple: the agent is risky, so put it in a sandbox.&lt;br&gt;
The new reality has a different trust boundary: the agent and its harness are&lt;br&gt;
part of the workload, and the environment owner needs independent runtime observability.&lt;/p&gt;
&lt;h2&gt;
  
  
  Three Layers, Three Questions
&lt;/h2&gt;

&lt;p&gt;MCP, sandboxes, and OS-level observability are all necessary for agent security.&lt;br&gt;
They are not interchangeable. Each answers a fundamentally different question,&lt;br&gt;
and each has a different owner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intent authorization&lt;/strong&gt; (MCP, tool gateways, approval prompts) answers: what&lt;br&gt;
is the agent &lt;em&gt;supposed&lt;/em&gt; to do? Which tools may it call, under which identity,&lt;br&gt;
with which scopes? This is the right place to enforce access control before a&lt;br&gt;
dangerous action happens. But a tool approval is not proof of side effects. A&lt;br&gt;
framework log saying "run tests" does not prove that the process tree only ran&lt;br&gt;
tests. An MCP server can be well-authenticated and still be part of a workflow&lt;br&gt;
that causes unexpected local effects. This layer is typically owned or mediated&lt;br&gt;
by the agent harness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execution isolation&lt;/strong&gt; (containers, VMs, network policy, namespaces) answers:&lt;br&gt;
what &lt;em&gt;can&lt;/em&gt; the agent reach? Which files, network endpoints, credentials, and&lt;br&gt;
syscalls are available? This is the right place to limit blast radius. But a&lt;br&gt;
sandbox does not automatically record what the agent attempted within its&lt;br&gt;
constraints: which process read a secret, which subprocess opened a network&lt;br&gt;
connection, whether the sandbox policy matched the approved intent. This layer's&lt;br&gt;
ownership is contested: it may belong to the agent provider, the platform team,&lt;br&gt;
or both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Side-effect verification&lt;/strong&gt; (OS/runtime observability) answers: what &lt;em&gt;actually&lt;br&gt;
happened&lt;/em&gt;? Which processes ran, which files were read, which network connections&lt;br&gt;
were opened, which credentials were accessed? This layer provides facts about&lt;br&gt;
execution, independent of what the framework reported or the sandbox intended.&lt;br&gt;
This layer must be owned by the environment operator. Otherwise there is no&lt;br&gt;
independent source of truth.&lt;/p&gt;

&lt;p&gt;The security model is the combination:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;authorize intent  →  isolate execution  →  verify side effects
(harness-owned)      (ownership contested)  (must be platform-owned)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When all three layers agree, you have confidence. When they disagree, you need&lt;br&gt;
OS-level observability and controls, independent of the harness, to detect the&lt;br&gt;
mismatch, contain the damage, and reconstruct what happened.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Independence Matters
&lt;/h2&gt;

&lt;p&gt;The reason to keep these layers independent follows from the trends above, but&lt;br&gt;
also from a deeper structural argument about ownership and trust.&lt;/p&gt;
&lt;h3&gt;
  
  
  Approval fatigue
&lt;/h3&gt;

&lt;p&gt;When approvals are relaxed (as the evidence above shows they routinely are),&lt;br&gt;
the other two layers must compensate. If you auto-approve routine actions, you&lt;br&gt;
need an independent way to verify what those actions actually did. If you&lt;br&gt;
bypass permissions for speed, you need stronger containment and stronger observability.&lt;/p&gt;
&lt;h3&gt;
  
  
  Harness opacity
&lt;/h3&gt;

&lt;p&gt;When the harness is opaque, application-level telemetry cannot be the sole&lt;br&gt;
source of truth. OpenTelemetry GenAI conventions and framework-level tracing are&lt;br&gt;
valuable when you own the framework. But opaque agent apps, closed-source&lt;br&gt;
runtimes, hosted execution, stripped binaries, and arbitrary subprocess trees&lt;br&gt;
can all break the assumption that the framework trace is complete. OpenClaw&lt;br&gt;
illustrates this directly: its behavior is &lt;a href="https://arxiv.org/html/2603.27517v2" rel="noopener noreferrer"&gt;non-deterministic across&lt;br&gt;
runs&lt;/a&gt;, producing different tool-calling&lt;br&gt;
sequences for the same input, which makes static code review inadequate and&lt;br&gt;
has driven multiple teams to build dedicated runtime observability tools for it&lt;br&gt;
(&lt;a href="https://www.sentinelone.com/blog/oneclaw-discovery-and-observability-for-the-agentic-era/" rel="noopener noreferrer"&gt;OneClaw&lt;/a&gt;,&lt;br&gt;
&lt;a href="https://www.epsilla.com/blogs/clawtrace-launch-openclaw-agent-observability" rel="noopener noreferrer"&gt;ClawTrace&lt;/a&gt;).&lt;br&gt;
Security researchers have already found &lt;a href="https://thehackernews.com/2025/12/researchers-uncover-30-flaws-in-ai.html" rel="noopener noreferrer"&gt;30+ vulnerabilities across all major AI&lt;br&gt;
IDEs&lt;/a&gt; (Cursor, Copilot, Windsurf, Claude Code), enabling data theft&lt;br&gt;
and remote code execution through prompt injection into agent tool chains.&lt;/p&gt;

&lt;p&gt;The MCP layer records intended tool calls. The OS layer records actual side&lt;br&gt;
effects. When the harness is opaque, the gap between these two is exactly where&lt;br&gt;
security incidents live.&lt;/p&gt;
&lt;h3&gt;
  
  
  The trust boundary is an ownership boundary
&lt;/h3&gt;

&lt;p&gt;The deepest reason for independence is that the three layers serve different&lt;br&gt;
owners with different incentives.&lt;/p&gt;

&lt;p&gt;The harness provider's goal is to complete the user's task: maximize&lt;br&gt;
autonomous coding productivity, reduce permission friction, deliver results.&lt;br&gt;
The platform team's goal is to protect the repository, secrets, cluster,&lt;br&gt;
CI runner, internal network, and production APIs. These goals are not opposed,&lt;br&gt;
but they are not identical. When they conflict, for example when the fastest&lt;br&gt;
path to task completion involves reading credentials, opening network&lt;br&gt;
connections, or modifying files outside the workspace, the harness will&lt;br&gt;
optimize for completion unless an independent boundary stops it.&lt;/p&gt;

&lt;p&gt;This is why &lt;a href="https://arxiv.org/abs/2602.09947" rel="noopener noreferrer"&gt;Bhattarai and Vu argue&lt;/a&gt; that&lt;br&gt;
"probabilistic compliance is not compliance": training-based and&lt;br&gt;
classifier-based defenses may reduce empirical attack rates, but cannot provide&lt;br&gt;
deterministic guarantees under adversarial conditions. Only architectural&lt;br&gt;
enforcement can. Red Hat's experience deploying multi-agent systems on Kagenti&lt;br&gt;
frames the same insight differently: this is &lt;a href="https://next.redhat.com/2026/03/05/zero-trust-ai-agents-on-kubernetes-what-i-learned-deploying-multi-agent-systems-on-kagenti/" rel="noopener noreferrer"&gt;"a multi-tenancy problem disguised&lt;br&gt;
as an AI problem"&lt;/a&gt;. The agent is an untrusted tenant. The&lt;br&gt;
platform needs the same kind of isolation, identity, and audit controls it would&lt;br&gt;
apply to any untrusted workload.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/" rel="noopener noreferrer"&gt;OWASP Top 10 for Agentic Applications&lt;/a&gt; reinforces this&lt;br&gt;
framing. Its top risk (ASI01, Agent Goal Hijack) is that "agents cannot&lt;br&gt;
reliably distinguish instructions from data," and a single malicious input from a&lt;br&gt;
repository, issue, MCP response, or web page can redirect the agent to perform&lt;br&gt;
harmful actions using its legitimate tools. This is not a hypothetical:&lt;br&gt;
&lt;a href="https://bishopfox.com/blog/otto-support-confused-deputy" rel="noopener noreferrer"&gt;Bishop Fox demonstrated&lt;/a&gt; confused deputy attacks where&lt;br&gt;
instructions embedded in support tickets caused agents to exfiltrate data using&lt;br&gt;
authorized tools, with "the user's name on every audit log entry." &lt;a href="https://www.docker.com/blog/mcp-horror-stories-github-prompt-injection/" rel="noopener noreferrer"&gt;Docker&lt;br&gt;
documented&lt;/a&gt; a GitHub prompt injection chain where a&lt;br&gt;
malicious issue hijacked an MCP-connected agent to steal confidential data from&lt;br&gt;
private repositories.&lt;/p&gt;

&lt;p&gt;The threat model for platform teams therefore has three threat categories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threat&lt;/th&gt;
&lt;th&gt;Which layer fails&lt;/th&gt;
&lt;th&gt;Runtime observability detects&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Compromised agent&lt;/strong&gt; (prompt injection, malicious repo/issue/MCP response)&lt;/td&gt;
&lt;td&gt;Intent layer: agent is tricked into unintended actions&lt;/td&gt;
&lt;td&gt;Actual side effects diverge from stated intent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Untrusted harness&lt;/strong&gt; (opaque permission logic, incomplete logs, unauditable internal state)&lt;/td&gt;
&lt;td&gt;Intent layer: harness reporting cannot be verified as complete&lt;/td&gt;
&lt;td&gt;OS-level facts independent of harness reporting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Sandbox escape or policy gap&lt;/strong&gt; (container breakout, mounted credentials, network bypass)&lt;/td&gt;
&lt;td&gt;Isolation layer fails or is misconfigured&lt;/td&gt;
&lt;td&gt;Behavior outside the expected sandbox boundary&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;AISI's &lt;a href="https://arxiv.org/abs/2603.02277" rel="noopener noreferrer"&gt;SandboxEscapeBench&lt;/a&gt; makes the third category concrete:&lt;br&gt;
frontier models can reliably escape container sandboxes under&lt;br&gt;
misconfigurations that plausibly occur in real systems, and the researchers&lt;br&gt;
discovered four unintended escape paths the benchmark designers had missed.&lt;br&gt;
Their recommendation: &lt;a href="https://arxiv.org/abs/2603.02277" rel="noopener noreferrer"&gt;"treat plain Docker isolation as insufficient by&lt;br&gt;
default."&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In all three cases, OS/runtime observability is the independent control&lt;br&gt;
that lets the platform team detect the problem, regardless of which other layer&lt;br&gt;
failed.&lt;/p&gt;
&lt;h2&gt;
  
  
  What OS-Level Monitoring Captures
&lt;/h2&gt;

&lt;p&gt;At the OS/runtime layer, observability captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process lineage&lt;/strong&gt;: the full tree from agent to subprocess to network call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File access&lt;/strong&gt;: which paths were read or written, including credential paths&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network behavior&lt;/strong&gt;: connections, destinations, timing, data volume&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container metadata&lt;/strong&gt;: namespace, cgroup, pod identity, service account&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subprocess behavior&lt;/strong&gt;: commands that bypass framework instrumentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This data is collected below the application layer, typically via eBPF,&lt;br&gt;
audit subsystems, or kernel instrumentation. It does not require modifying the&lt;br&gt;
agent app. Its key property is independence: the observability is owned and&lt;br&gt;
operated by the environment operator, not by the agent provider.&lt;/p&gt;
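&lt;p&gt;As a rough illustration of the process-lineage item, the sketch below rebuilds an ancestry chain from process events. It assumes a collector has already produced records with a pid, a parent pid, and a command name; the &lt;code&gt;ProcEvent&lt;/code&gt; record and &lt;code&gt;build_lineage&lt;/code&gt; helper are hypothetical names for this example, not the output format of any particular tool.&lt;/p&gt;

```python
from collections import namedtuple

# Hypothetical event shape: a real eBPF or audit collector emits richer records.
ProcEvent = namedtuple("ProcEvent", ["pid", "ppid", "comm"])


def build_lineage(events, pid):
    """Return command names from the oldest known ancestor down to pid."""
    # If a pid appears more than once, the latest record wins.
    by_pid = {e.pid: e for e in events}
    chain = []
    seen = set()  # guards against cycles caused by pid reuse
    current = pid
    while current in by_pid and current not in seen:
        seen.add(current)
        event = by_pid[current]
        chain.append(event.comm)
        current = event.ppid
    chain.reverse()
    return chain


# Illustrative events only; the pids are made up.
events = [
    ProcEvent(pid=100, ppid=1, comm="agent"),
    ProcEvent(pid=101, ppid=100, comm="shell"),
    ProcEvent(pid=102, ppid=101, comm="python"),
    ProcEvent(pid=103, ppid=102, comm="curl"),
]

lineage = build_lineage(events, 103)
print(" → ".join(lineage))
```

&lt;p&gt;The walk stops at the first parent the collector never saw, so the chain is only as complete as the event stream. A production collector would also need to handle pid reuse and processes that started before tracing began.&lt;/p&gt;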

&lt;p&gt;This makes cross-layer comparison possible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Framework report:    run tests
Sandbox policy:      workspace mounted, registry allowed, SA token mounted
OS observability:    agent → shell → python → curl
                     read: /var/run/secrets/.../token
                     connect: unknown external host
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer saw a different part of the event. Without the OS layer, this is an&lt;br&gt;
undetected credential theft: a service account token read and exfiltrated while&lt;br&gt;
the framework logged only "running tests." The platform team discovers the&lt;br&gt;
breach days later, if at all. OS-level observability is what turns an invisible data leak into a real-time&lt;br&gt;
detection.&lt;/p&gt;
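&lt;p&gt;The comparison above can be sketched as a small check over observed events. This is a minimal sketch under stated assumptions: the event dictionaries, the &lt;code&gt;find_mismatches&lt;/code&gt; helper, the registry hostname, and the documentation-range IP address are all hypothetical, and the token path keeps the elision from the example rather than naming a real file.&lt;/p&gt;

```python
# Hypothetical policy inputs for illustration.
SENSITIVE_PREFIXES = ("/var/run/secrets/",)


def find_mismatches(observed, allowed_hosts, sensitive_prefixes=SENSITIVE_PREFIXES):
    """Flag OS-level events that the declared intent does not explain."""
    findings = []
    prefixes = tuple(sensitive_prefixes)
    for event in observed:
        kind = event["kind"]
        if kind == "read" and event["path"].startswith(prefixes):
            findings.append("credential read: " + event["path"])
        elif kind == "connect" and event["host"] not in allowed_hosts:
            findings.append("unexpected egress: " + event["host"])
    return findings


# The framework reported only "run tests"; these are the events seen below it.
observed = [
    {"kind": "exec", "comm": "python"},
    {"kind": "read", "path": "/var/run/secrets/.../token"},
    {"kind": "connect", "host": "registry.example.internal"},
    {"kind": "connect", "host": "203.0.113.7"},
]

findings = find_mismatches(observed, allowed_hosts={"registry.example.internal"})
for finding in findings:
    print(finding)
```

&lt;p&gt;The check knows nothing about what the framework logged. That is the point: it evaluates side effects against the policy the platform team wrote, so an incomplete or misleading transcript does not change the result.&lt;/p&gt;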
&lt;h2&gt;
  
  
  Deployment Reality
&lt;/h2&gt;

&lt;p&gt;OS-level observability is strongest when you control the host, node, or VM where the&lt;br&gt;
agent executes. If the agent runs entirely in a provider-managed environment,&lt;br&gt;
you may not be able to attach eBPF inside it.&lt;/p&gt;

&lt;p&gt;In that case, the same model applies, but observability shifts to the boundaries you do control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repository permissions and branch protection&lt;/li&gt;
&lt;li&gt;Scoped credentials with minimal lifetime&lt;/li&gt;
&lt;li&gt;CI/CD and GitHub audit logs&lt;/li&gt;
&lt;li&gt;Network proxies and webhook events&lt;/li&gt;
&lt;li&gt;Artifact access logs&lt;/li&gt;
&lt;li&gt;Provider-supplied session logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Observability at these boundaries is weaker than observability inside the runtime, but it is still better&lt;br&gt;
than treating the agent transcript as the only source of truth.&lt;/p&gt;

&lt;p&gt;The design question for platform teams is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Where is the lowest layer I actually control?&lt;br&gt;
That is where independent observability should live.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  AgentSight and ActPlane: Observe, Then Enforce
&lt;/h2&gt;

&lt;p&gt;We are building open-source tools that implement the verification layer&lt;br&gt;
described above, each addressing a different half of the problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/eunomia-bpf/agentsight/" rel="noopener noreferrer"&gt;AgentSight&lt;/a&gt;&lt;/strong&gt; is a zero-instrumentation observability tool for&lt;br&gt;
AI agents. It uses eBPF to intercept SSL/TLS traffic and monitor process&lt;br&gt;
behavior at the system boundary, with no code changes, no SDKs, and no&lt;br&gt;
framework integration required. Point it at any agent process (Claude Code,&lt;br&gt;
Codex, a custom Python agent) and it captures the full picture: process&lt;br&gt;
lineage, LLM API calls (prompts and completions), file access, network&lt;br&gt;
connections, and tool invocations, all correlated into a live timeline. This is&lt;br&gt;
the "see what actually happened" layer. Because it operates below the&lt;br&gt;
application, it works even when the agent runtime is opaque, closed-source, or&lt;br&gt;
running arbitrary subprocesses that bypass framework-level tracing. In&lt;br&gt;
practice, this means detecting credential access, data exfiltration attempts,&lt;br&gt;
and unauthorized network connections as they happen, not days later when an&lt;br&gt;
external party reports the breach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/eunomia-bpf/ActPlane" rel="noopener noreferrer"&gt;ActPlane&lt;/a&gt;&lt;/strong&gt; is an OS-level harness for AI agents. Where AgentSight&lt;br&gt;
observes, ActPlane enforces. You write behavioral contracts in a YAML-based&lt;br&gt;
rule language (labeled information-flow control, not static allow-lists), and&lt;br&gt;
ActPlane compiles them into an eBPF program that enforces constraints at the&lt;br&gt;
kernel level: every &lt;code&gt;exec&lt;/code&gt;, file open, and network connect in the agent's&lt;br&gt;
entire process tree is checked against the policy. When a rule is violated,&lt;br&gt;
ActPlane blocks the action and feeds a human-readable reason back to the agent&lt;br&gt;
through its hook system, so the agent self-corrects rather than failing&lt;br&gt;
silently. The rule language supports data-flow tracking across fork/exec&lt;br&gt;
chains, causal ordering ("run tests before committing"), and staleness&lt;br&gt;
invalidation, going well beyond what sandboxes or tool-layer guards can&lt;br&gt;
express.&lt;/p&gt;
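&lt;p&gt;To make the idea of labeled information-flow control across fork/exec chains more concrete, here is a toy user-space model. It is not ActPlane's rule language or its eBPF implementation; the &lt;code&gt;FlowTracker&lt;/code&gt; class, its method names, the hostnames, and the pids are assumptions made up for this sketch. It only shows the core difference from a static allow-list: the decision depends on what a process (or its ancestors) has already touched.&lt;/p&gt;

```python
class FlowTracker:
    """Toy model: labels follow processes across fork and gate egress."""

    def __init__(self, secret_prefixes, allowed_hosts):
        self.secret_prefixes = tuple(secret_prefixes)
        self.allowed_hosts = set(allowed_hosts)
        self.labels = {}  # pid mapped to the set of labels it carries

    def on_fork(self, parent_pid, child_pid):
        # A child starts with a copy of its parent's labels.
        self.labels[child_pid] = set(self.labels.get(parent_pid, set()))

    def on_open(self, pid, path):
        # Reading a secret path attaches the "secret" label to the reader.
        if path.startswith(self.secret_prefixes):
            self.labels.setdefault(pid, set()).add("secret")

    def on_connect(self, pid, host):
        """Return (allowed, reason) for an outbound connection attempt."""
        tainted = "secret" in self.labels.get(pid, set())
        if tainted and host not in self.allowed_hosts:
            reason = "pid %d carries secret-labeled data; egress to %s denied" % (pid, host)
            return False, reason
        return True, "ok"


tracker = FlowTracker(["/var/run/secrets/"], ["registry.example.internal"])
tracker.on_fork(100, 101)                            # agent spawns a shell
tracker.on_open(101, "/var/run/secrets/.../token")   # shell reads a token
tracker.on_fork(101, 102)                            # shell spawns a child
allowed, reason = tracker.on_connect(102, "203.0.113.7")
print(allowed, reason)
```

&lt;p&gt;In this model the child is denied because it inherited the label, while the original agent process (pid 100), which never read the token, would still be allowed to connect. The sketch ignores data passed through pipes, files, or shared memory, which a real enforcement layer has to account for.&lt;/p&gt;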

&lt;p&gt;The two tools are complementary. AgentSight provides runtime observability:&lt;br&gt;
independent, below-the-application visibility into what the agent did. ActPlane&lt;br&gt;
provides the enforcement plane: deterministic, kernel-level guarantees about&lt;br&gt;
what the agent cannot do. Together they implement the "verify side effects"&lt;br&gt;
layer of the three-layer model, independent of the harness provider and&lt;br&gt;
independent of who owns the sandbox.&lt;/p&gt;

&lt;p&gt;Both are possible implementations of this architecture, not the only ones.&lt;br&gt;
The important point is the separation: observe and enforce at a layer the&lt;br&gt;
environment operator controls, regardless of which agent runtime sits above.&lt;/p&gt;

&lt;p&gt;This also addresses ecosystem gaps Anthropic identifies: the need for&lt;br&gt;
cross-deployment security telemetry sharing and open standards for agent&lt;br&gt;
security. Independent runtime observability that travels with the workload,&lt;br&gt;
rather than being locked to a specific harness or provider, is the foundation&lt;br&gt;
for both.&lt;/p&gt;
&lt;h2&gt;
  
  
  Practical Checklist
&lt;/h2&gt;

&lt;p&gt;If you are building or evaluating an agent platform, ask these questions at&lt;br&gt;
each layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intent authorization (MCP / tool access):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are MCP servers allowlisted?&lt;/li&gt;
&lt;li&gt;Are OAuth scopes minimal and audience-bound?&lt;/li&gt;
&lt;li&gt;Are local MCP servers treated as a code execution risk?&lt;/li&gt;
&lt;li&gt;Are high-risk tools gated by human approval?&lt;/li&gt;
&lt;li&gt;Are tool calls logged with enough context for audit?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Execution isolation (sandboxing):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is filesystem access default-deny, or is it a broad workspace mount?&lt;/li&gt;
&lt;li&gt;Can the agent reach cloud metadata endpoints?&lt;/li&gt;
&lt;li&gt;Is network egress restricted by domain, IP, or proxy?&lt;/li&gt;
&lt;li&gt;Are service account tokens mounted into the environment?&lt;/li&gt;
&lt;li&gt;Are process, memory, CPU, and runtime duration bounded?&lt;/li&gt;
&lt;li&gt;Who owns the sandbox policy: the platform team or the agent provider?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Side-effect verification (runtime observability):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you reconstruct process lineage for an agent session?&lt;/li&gt;
&lt;li&gt;Can you see file and credential access below the framework?&lt;/li&gt;
&lt;li&gt;Can you correlate network egress with pod, service account, and command?&lt;/li&gt;
&lt;li&gt;Can you detect a mismatch between tool intent and OS side effects?&lt;/li&gt;
&lt;li&gt;Can you replay an incident without trusting only framework logs?&lt;/li&gt;
&lt;li&gt;Can you demonstrate to auditors (SOC 2, ISO 27001) how automated agent
access to production data and credentials is monitored and logged?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Guardrail integration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which side effects should be blocked immediately?&lt;/li&gt;
&lt;li&gt;Which should trigger an alert or human review?&lt;/li&gt;
&lt;li&gt;Which policies belong in MCP config, sandbox config, Kubernetes policy,
eBPF/LSM, or network controls?&lt;/li&gt;
&lt;li&gt;What happens when framework logs and OS-level observability disagree?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Agent runtimes are becoming more capable, more managed, and more opaque. The&lt;br&gt;
security model cannot depend on any single layer, especially when the layers&lt;br&gt;
have different owners.&lt;/p&gt;

&lt;p&gt;The harness is not a trusted boundary. Sandbox ownership depends on the&lt;br&gt;
deployment model. The only layer the environment operator can guarantee it&lt;br&gt;
owns is OS/runtime observability.&lt;/p&gt;

&lt;p&gt;MCP authorizes intent. Sandboxes constrain execution. OS-level observability verifies side&lt;br&gt;
effects. Each is necessary; none is sufficient. The practical model is their&lt;br&gt;
separation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;authorize intent  →  isolate execution  →  verify side effects
(harness-owned)      (ownership contested)  (must be platform-owned)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The implementation details vary by deployment, but the separation and the&lt;br&gt;
ownership question are the parts that should remain stable.&lt;/p&gt;

&lt;p&gt;If you are exploring this space, &lt;a href="https://github.com/eunomia-bpf/agentsight/" rel="noopener noreferrer"&gt;AgentSight&lt;/a&gt; and&lt;br&gt;
&lt;a href="https://github.com/eunomia-bpf/ActPlane" rel="noopener noreferrer"&gt;ActPlane&lt;/a&gt; are our open-source starting points for the observation&lt;br&gt;
and enforcement layers respectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/copilot/concepts/about-copilot-coding-agent" rel="noopener noreferrer"&gt;GitHub Docs: About Copilot coding agent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/ai-and-ml/generative-ai/under-the-hood-security-architecture-of-github-agentic-workflows/" rel="noopener noreferrer"&gt;GitHub: Security Architecture of Agentic Workflows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/changelog/2025-10-28-copilot-coding-agent-now-supports-self-hosted-runners/" rel="noopener noreferrer"&gt;GitHub: Copilot coding agent supports self-hosted runners&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/cloud" rel="noopener noreferrer"&gt;OpenAI Codex cloud documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/blog/run-long-horizon-tasks-with-codex" rel="noopener noreferrer"&gt;OpenAI: Run long horizon tasks with Codex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/agent-approvals-security" rel="noopener noreferrer"&gt;OpenAI: Codex Agent Approvals &amp;amp; Security&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;Anthropic 2026 Agentic Coding Trends Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/research/trustworthy-agents" rel="noopener noreferrer"&gt;Anthropic: Trustworthy Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/claude-code-auto-mode" rel="noopener noreferrer"&gt;Anthropic Engineering: Claude Code auto mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/claude-code-sandboxing" rel="noopener noreferrer"&gt;Anthropic Engineering: Making Claude Code More Secure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/managed-agents" rel="noopener noreferrer"&gt;Anthropic Engineering: Scaling Managed Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www-cdn.anthropic.com/43ec7e770925deabc3f0bc1dbf0133769fd03812.pdf" rel="noopener noreferrer"&gt;Anthropic NIST RFI on Agentic Security&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.anthropic.com/en/docs/claude-code/security" rel="noopener noreferrer"&gt;Claude Code security documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/permission-modes" rel="noopener noreferrer"&gt;Claude Code permission modes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices" rel="noopener noreferrer"&gt;MCP Security Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/docs/tutorials/security/authorization" rel="noopener noreferrer"&gt;MCP Authorization documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://agent-sandbox.sigs.k8s.io/" rel="noopener noreferrer"&gt;Kubernetes SIGs Agent Sandbox&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/agent-sandbox" rel="noopener noreferrer"&gt;Google Cloud: Agent Sandbox on GKE&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/" rel="noopener noreferrer"&gt;OpenTelemetry GenAI semantic conventions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://engineering.fb.com/2026/04/02/developer-tools/kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure/" rel="noopener noreferrer"&gt;Meta KernelEvolve&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.vals.ai/benchmarks/swebench" rel="noopener noreferrer"&gt;SWE-Bench Verified Leaderboard&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cognition.ai/blog/devin-annual-performance-review-2025" rel="noopener noreferrer"&gt;Devin's 2025 Performance Review&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2025/07/11/goldman-sachs-autonomous-coder-pilot-marks-major-ai-milestone.html" rel="noopener noreferrer"&gt;Goldman Sachs autonomous coder pilot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://next.redhat.com/2026/03/05/zero-trust-ai-agents-on-kubernetes-what-i-learned-deploying-multi-agent-systems-on-kagenti/" rel="noopener noreferrer"&gt;Red Hat: Zero trust AI agents on Kubernetes with Kagenti&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2602.09185" rel="noopener noreferrer"&gt;AIDev: Studying AI Coding Agents on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2605.07135" rel="noopener noreferrer"&gt;Agentic Workflow Injection in GitHub Actions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.04978" rel="noopener noreferrer"&gt;Measuring the Permission Gate: Claude Code Auto Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2602.09947" rel="noopener noreferrer"&gt;Trustworthy Agentic AI Requires Deterministic Architectural Boundaries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.13630" rel="noopener noreferrer"&gt;SafeHarness: Security Architecture for LLM-based Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2603.02277" rel="noopener noreferrer"&gt;SandboxEscapeBench: Can AI Agents Escape Their Sandboxes?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/" rel="noopener noreferrer"&gt;OWASP Top 10 for Agentic Applications 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://bishopfox.com/blog/otto-support-confused-deputy" rel="noopener noreferrer"&gt;Bishop Fox: The Confused Deputy, MCP Attack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.docker.com/blog/mcp-horror-stories-github-prompt-injection/" rel="noopener noreferrer"&gt;Docker: MCP Horror Stories, GitHub Prompt Injection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thehackernews.com/2025/12/researchers-uncover-30-flaws-in-ai.html" rel="noopener noreferrer"&gt;30+ Vulnerabilities in AI Coding Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kiteworks.com/cybersecurity-risk-management/ai-agent-security-incidents-2026/" rel="noopener noreferrer"&gt;AI Agent Security Incidents Hit 65% of Firms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.bvp.com/atlas/securing-ai-agents-the-defining-cybersecurity-challenge-of-2026" rel="noopener noreferrer"&gt;Bessemer: Securing AI agents in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoq.com/articles/securing-autonomous-ai-agents-kubernetes/" rel="noopener noreferrer"&gt;InfoQ: Securing Autonomous AI Agents on Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.crowdstrike.com/en-us/blog/what-security-teams-need-to-know-about-openclaw-ai-super-agent/" rel="noopener noreferrer"&gt;CrowdStrike: What Security Teams Need to Know About OpenClaw&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reco.ai/blog/openclaw-the-ai-agent-security-crisis-unfolding-right-now" rel="noopener noreferrer"&gt;Reco.ai: The OpenClaw Agent Security Crisis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.27517v2" rel="noopener noreferrer"&gt;OpenClaw Security Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.sentinelone.com/blog/oneclaw-discovery-and-observability-for-the-agentic-era/" rel="noopener noreferrer"&gt;SentinelOne: OneClaw Discovery and Observability&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.epsilla.com/blogs/clawtrace-launch-openclaw-agent-observability" rel="noopener noreferrer"&gt;Epsilla: ClawTrace Agent Observability&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//agentsight_paper.md"&gt;AgentSight blog post&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/eunomia-bpf/agentsight/" rel="noopener noreferrer"&gt;AgentSight repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/eunomia-bpf/ActPlane" rel="noopener noreferrer"&gt;ActPlane repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ebpf</category>
      <category>ai</category>
      <category>security</category>
      <category>observability</category>
    </item>
    <item>
      <title>When CPU Noise Slows Down GPU Inference: Measuring Scheduler and IRQ Impact with eBPF</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Sun, 31 May 2026 23:35:33 +0000</pubDate>
      <link>https://dev.to/yunwei37/when-cpu-noise-slows-down-gpu-inference-measuring-scheduler-and-irq-impact-with-ebpf-egg</link>
      <guid>https://dev.to/yunwei37/when-cpu-noise-slows-down-gpu-inference-measuring-scheduler-and-irq-impact-with-ebpf-egg</guid>
      <description>&lt;p&gt;GPU inference often looks like a GPU problem, but the CPU still sits on the critical path. It prepares inputs, launches CUDA kernels, manages synchronization, handles runtime calls, and shares cores with system work, interrupts, and other tenants. If that CPU-side launch path is delayed, the GPU can be left waiting even when the GPU kernels themselves are fast.&lt;/p&gt;

&lt;p&gt;This post asks a concrete question: when an LLM inference workload is running on a GPU, how much do Linux CPU scheduling decisions and IRQ handling actually matter?&lt;/p&gt;

&lt;p&gt;To answer it, we built an eBPF tracing tool, &lt;code&gt;cuda_sched_trace&lt;/code&gt;, that records CUDA kernel launches, scheduler context switches, and hard/soft IRQ events with nanosecond timestamps. We then ran Qwen3 0.6B inference under clean and noisy-neighbor conditions: CPU load from &lt;code&gt;stress-ng&lt;/code&gt;, network load from &lt;code&gt;iperf3&lt;/code&gt;, disk load from &lt;code&gt;fio&lt;/code&gt;, a combined heavy-load case, and a mitigation case using CPU pinning and priority adjustment.&lt;/p&gt;

&lt;p&gt;The short version: in a clean environment, scheduler and IRQ overhead are small. Under production-like noisy-neighbor conditions, they can become very real. Combined CPU, network, and disk interference reduced throughput by &lt;strong&gt;20.5%&lt;/strong&gt;. Under CPU contention alone, simple CPU pinning with a priority adjustment reduced context switches by &lt;strong&gt;96.3%&lt;/strong&gt; and cut the slowdown from 8.8% to 1.9%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why CPU Scheduling Shows Up in GPU Inference
&lt;/h2&gt;

&lt;p&gt;Modern GPU workloads, particularly LLM inference and training, require tight coordination between CPU and GPU execution. The CPU is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;preparing input data and kernel parameters&lt;/li&gt;
&lt;li&gt;launching GPU kernels through CUDA APIs&lt;/li&gt;
&lt;li&gt;managing memory transfers and synchronization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An interruption to that CPU-side workflow can delay GPU kernel submission. In the worst case, the GPU has available compute capacity but no new work to execute.&lt;/p&gt;

&lt;p&gt;The motivation comes partly from Meta's work on &lt;code&gt;sched_ext&lt;/code&gt; for AI training optimization, where production issues include "IRQs preempting our important tasks." Network interrupts (&lt;code&gt;NET_RX&lt;/code&gt;/&lt;code&gt;NET_TX&lt;/code&gt;) and block device interrupts can matter for large distributed training jobs, and custom scheduling policies can improve AI workload performance by 5-20%.&lt;/p&gt;

&lt;p&gt;But the impact is workload-dependent. A single-node LLM inference loop is not the same as distributed training with all-reduce traffic. Before investing in custom scheduling, we wanted measurements that separate scheduler problems from normal application behavior.&lt;/p&gt;

&lt;p&gt;The study has four goals:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Measure the baseline impact of CPU scheduling on GPU kernel launches.&lt;/li&gt;
&lt;li&gt;Characterize IRQ interference patterns and their performance cost.&lt;/li&gt;
&lt;li&gt;Quantify noisy-neighbor impact under CPU, network, disk, and combined load.&lt;/li&gt;
&lt;li&gt;Evaluate how much CPU pinning and priority adjustment help.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Tracing the Launch Path
&lt;/h2&gt;

&lt;p&gt;We developed &lt;code&gt;cuda_sched_trace&lt;/code&gt;, an eBPF-based tracing tool that combines CUDA API uprobes, Linux scheduler tracepoints, and IRQ tracepoints.&lt;/p&gt;

&lt;h3&gt;
  
  
  CUDA API Tracing
&lt;/h3&gt;

&lt;p&gt;The tool attaches uprobes to CUDA Driver and Runtime APIs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Attach to CUDA Driver API&lt;/span&gt;
&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"uprobe/cuLaunchKernel"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;trace_cuLaunchKernel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Capture: timestamp, pid, tid, grid/block dimensions, shared memory, stream&lt;/span&gt;
    &lt;span class="c1"&gt;// Mark process as GPU process for scheduler tracking&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Attach to CUDA Runtime API&lt;/span&gt;
&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"uprobe/cudaLaunchKernel"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;trace_cudaLaunchKernel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"uprobe/cudaDeviceSynchronize"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;trace_cudaDeviceSynchronize_enter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"uretprobe/cudaDeviceSynchronize"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;trace_cudaDeviceSynchronize_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scheduler Event Tracing
&lt;/h3&gt;

&lt;p&gt;Scheduler activity is captured through &lt;code&gt;sched_switch&lt;/code&gt;, filtered to GPU-related processes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tp_btf/sched_switch"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;BPF_PROG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sched_switch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;preempt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;task_struct&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;task_struct&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Only track if prev or next is a GPU process&lt;/span&gt;
    &lt;span class="c1"&gt;// Record: timestamp, prev/next pid, off-cpu/on-cpu duration&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  IRQ Tracing
&lt;/h3&gt;

&lt;p&gt;Hard and soft IRQs are tracked through kernel tracepoints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tp_btf/irq_handler_entry"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;BPF_PROG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;irq_handler_entry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;irq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;irqaction&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Track hard IRQ entry, record IRQ number and handler name&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tp_btf/irq_handler_exit"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;BPF_PROG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;irq_handler_exit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;irq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;irqaction&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Calculate IRQ duration&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tp_btf/softirq_entry"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;BPF_PROG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;softirq_entry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;vec_nr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Track soft IRQ: TIMER, NET_RX, NET_TX, BLOCK, SCHED, RCU, etc.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tp_btf/softirq_exit"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;BPF_PROG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;softirq_exit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;vec_nr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Calculate soft IRQ duration&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The data path is straightforward: the GPU application issues CUDA calls; eBPF programs observe CUDA, scheduler, and IRQ events in kernel space; events are sent through a BPF ring buffer; analysis scripts parse the resulting CSV.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────────┐
│                         User Space                               │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐  │
│  │ GPU App     │    │ cuda_sched  │    │ Analysis Scripts    │  │
│  │ (qwen3.cu)  │    │ _trace      │    │ (Python)            │  │
│  └──────┬──────┘    └──────┬──────┘    └──────────┬──────────┘  │
│         │                  │                       │             │
│         │ CUDA calls       │ perf_event            │ CSV parsing │
│         ▼                  ▼                       ▼             │
├─────────────────────────────────────────────────────────────────┤
│                         Kernel Space                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐  │
│  │ uprobes     │    │ tracepoints │    │ BPF Ring Buffer     │  │
│  │ (CUDA API)  │    │ (sched,irq) │    │ (Event Queue)       │  │
│  └─────────────┘    └─────────────┘    └─────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Benchmark and Environment
&lt;/h2&gt;

&lt;p&gt;The benchmark is Qwen3 0.6B LLM inference using &lt;code&gt;qwen3.cu&lt;/code&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;Qwen3-0.6B-FP32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task&lt;/td&gt;
&lt;td&gt;Single-turn Q&amp;amp;A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input&lt;/td&gt;
&lt;td&gt;"What is eBPF?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;~30-50 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel Pattern&lt;/td&gt;
&lt;td&gt;Burst submission (~950 launches per token)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU Memory&lt;/td&gt;
&lt;td&gt;~3 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This benchmark is useful because it resembles modern LLM inference, mixes compute-bound and memory-bound kernels, shows a clear burst submission pattern, and produces a measurable throughput metric in tokens per second.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Specification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;24 cores (specific model TBD)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU&lt;/td&gt;
&lt;td&gt;NVIDIA GPU with CUDA support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Sufficient for model + system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OS&lt;/td&gt;
&lt;td&gt;Linux 6.15.11-061511-generic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel&lt;/td&gt;
&lt;td&gt;BTF-enabled for CO-RE eBPF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CUDA&lt;/td&gt;
&lt;td&gt;Driver API + Runtime API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We used three interference tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;stress-ng&lt;/td&gt;
&lt;td&gt;CPU load&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;--cpu 0 --cpu-method fft&lt;/code&gt; (all cores)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iperf3&lt;/td&gt;
&lt;td&gt;Network I/O&lt;/td&gt;
&lt;td&gt;Server + Client, 10 parallel streams, 60s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fio&lt;/td&gt;
&lt;td&gt;Disk I/O&lt;/td&gt;
&lt;td&gt;&lt;code&gt;randwrite, bs=4k, iodepth=32, 4 jobs&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The full experiment has six scenarios:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Interference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;Clean environment&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy CPU&lt;/td&gt;
&lt;td&gt;CPU-intensive&lt;/td&gt;
&lt;td&gt;stress-ng on all cores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy Network&lt;/td&gt;
&lt;td&gt;Network I/O&lt;/td&gt;
&lt;td&gt;iperf3 localhost loopback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy Disk&lt;/td&gt;
&lt;td&gt;Disk I/O&lt;/td&gt;
&lt;td&gt;fio random write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heavy Load&lt;/td&gt;
&lt;td&gt;Combined&lt;/td&gt;
&lt;td&gt;CPU + Network + Disk simultaneously&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimized&lt;/td&gt;
&lt;td&gt;CPU pinning&lt;/td&gt;
&lt;td&gt;stress-ng + taskset -c 0-3 + nice -n -10&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Data collection follows the same pattern in every run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start tracing&lt;/span&gt;
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./cuda_sched_trace &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; trace.csv 2&amp;gt; trace.log &amp;amp;
&lt;span class="nv"&gt;TRACE_PID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$!&lt;/span&gt;

&lt;span class="c"&gt;# Run benchmark&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;qwen3.cu
/usr/bin/time &lt;span class="nt"&gt;-v&lt;/span&gt; ./runcu Qwen3-0.6B-FP32.gguf &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"What is eBPF?"&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; 1

&lt;span class="c"&gt;# Stop tracing&lt;/span&gt;
&lt;span class="nb"&gt;sudo kill&lt;/span&gt; &lt;span class="nt"&gt;-SIGINT&lt;/span&gt; &lt;span class="nv"&gt;$TRACE_PID&lt;/span&gt;

&lt;span class="c"&gt;# Analyze results&lt;/span&gt;
python3 analyze_scheduler_impact.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Analysis Method
&lt;/h2&gt;

&lt;p&gt;The central analysis compares consecutive CUDA kernel launches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Launch_i -&amp;gt; [interval] -&amp;gt; Launch_i+1

Group A: Launches with NO context switch in interval (normal flow)
Group B: Launches with context switch in interval (preempted)

Preemption Penalty = median(Group B interval) - median(Group A interval)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
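
&lt;p&gt;The grouping above can be sketched in a few lines of Python. This is a minimal illustration, not the code in &lt;code&gt;analyze_scheduler_impact.py&lt;/code&gt;: the function name &lt;code&gt;preemption_penalty&lt;/code&gt;, its arguments, and the timestamps are assumptions made for the example. It also assumes that launch and context-switch timestamps belong to the same thread and share one clock.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from bisect import bisect_left
from statistics import median


def preemption_penalty(launch_ts, switch_ts):
    """Return (penalty, clean, preempted) for ascending launch timestamps.

    launch_ts: kernel launch timestamps for one thread, ascending.
    switch_ts: timestamps at which that thread was switched off the CPU.
    """
    switch_ts = sorted(switch_ts)
    clean, preempted = [], []
    for start, end in zip(launch_ts, launch_ts[1:]):
        # Number of switches inside the half-open window [start, end),
        # found with two binary searches instead of a linear scan.
        hits = bisect_left(switch_ts, end) - bisect_left(switch_ts, start)
        (preempted if hits else clean).append(end - start)
    # median() raises StatisticsError if either group is empty.
    return median(preempted) - median(clean), clean, preempted


# Synthetic timestamps in microseconds, not data from the experiment.
launches = [0, 2, 4, 15304, 15306]
switches = [10]
penalty, clean, preempted = preemption_penalty(launches, switches)
print(f"Group A: {len(clean)} pairs, Group B: {len(preempted)} pairs")
print(f"Preemption penalty: {penalty} us")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using medians rather than means keeps a single multi-second stall from dominating the penalty estimate.&lt;/p&gt;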



&lt;p&gt;To compare runs of different lengths, scheduler and IRQ counts are normalized per 1,000 kernel launches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sched/1K = (Total Context Switches / Total Kernel Launches) x 1000
IRQ/1K = (Total IRQs / Total Kernel Launches) x 1000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Performance impact is reported as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Slowdown % = (Baseline tok/s - Scenario tok/s) / Baseline tok/s x 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
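
&lt;p&gt;As a quick sanity check, both formulas are easy to express in Python. The helper names &lt;code&gt;per_1k&lt;/code&gt; and &lt;code&gt;slowdown_pct&lt;/code&gt; are hypothetical. The inputs are the throughput figures and the soft IRQ and launch counts reported in the results sections of this post.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def per_1k(event_count, kernel_launches):
    """Events (context switches or IRQs) per 1,000 kernel launches."""
    return event_count / kernel_launches * 1000


def slowdown_pct(baseline_tok_s, scenario_tok_s):
    """Throughput loss relative to baseline; negative means faster."""
    return (baseline_tok_s - scenario_tok_s) / baseline_tok_s * 100


# Tokens per second as reported in the RQ3 performance table.
BASELINE_TOK_S = 54.77
scenarios = {
    "Noisy CPU": 49.93,
    "Noisy Disk": 54.95,
    "Heavy Load": 43.56,
    "Optimized": 53.75,
}
for name, tok_s in scenarios.items():
    print(f"{name}: {slowdown_pct(BASELINE_TOK_S, tok_s):.1f}% slowdown")

# Soft IRQ count and launch count from the RQ2 clean run.
print(f"{per_1k(653, 125236):.1f} soft IRQs per 1K launches")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Normalizing by launches rather than by wall-clock time makes runs of different lengths comparable, because the tracer sees a different number of launches in each scenario.&lt;/p&gt;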



&lt;h2&gt;
  
  
  RQ1: Does the CPU Scheduler Significantly Impact GPU Performance in Clean Environments?
&lt;/h2&gt;

&lt;p&gt;The first question is whether scheduler preemption matters when the machine is otherwise clean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Experiment design&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Condition: clean system, no artificial interference&lt;/li&gt;
&lt;li&gt;Metrics: context switch frequency, preemption penalty, total runtime impact&lt;/li&gt;
&lt;li&gt;Analysis: launch-pair comparison with and without context switches&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total Runtime&lt;/td&gt;
&lt;td&gt;79.5 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel Launches&lt;/td&gt;
&lt;td&gt;51,464&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Switches&lt;/td&gt;
&lt;td&gt;592 (7.44 Hz)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OFF-CPU Time&lt;/td&gt;
&lt;td&gt;7.88 ms (0.01%)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Launch-pair analysis shows that almost every consecutive launch pair is unaffected by context switches:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Group&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Percentage&lt;/th&gt;
&lt;th&gt;P50 Interval&lt;/th&gt;
&lt;th&gt;P90 Interval&lt;/th&gt;
&lt;th&gt;P99 Interval&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No Context Switch&lt;/td&gt;
&lt;td&gt;51,401&lt;/td&gt;
&lt;td&gt;99.9%&lt;/td&gt;
&lt;td&gt;2 us&lt;/td&gt;
&lt;td&gt;4 us&lt;/td&gt;
&lt;td&gt;4 us&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;With Context Switch&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;0.1%&lt;/td&gt;
&lt;td&gt;15.3 ms&lt;/td&gt;
&lt;td&gt;15.5 ms&lt;/td&gt;
&lt;td&gt;5.0 s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The median preemption penalty is &lt;strong&gt;15.3 ms&lt;/strong&gt;. That is large for the affected pairs, but only 62 of the 51,463 launch pairs were affected.&lt;/p&gt;

&lt;p&gt;Tail-latency attribution confirms that most outliers are not caused by scheduler preemption:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Percentile&lt;/th&gt;
&lt;th&gt;Total Outliers&lt;/th&gt;
&lt;th&gt;With Context Switch&lt;/th&gt;
&lt;th&gt;Attribution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P95+&lt;/td&gt;
&lt;td&gt;2,580&lt;/td&gt;
&lt;td&gt;62 (2.4%)&lt;/td&gt;
&lt;td&gt;97.6% application&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P99+&lt;/td&gt;
&lt;td&gt;515&lt;/td&gt;
&lt;td&gt;62 (12.0%)&lt;/td&gt;
&lt;td&gt;88.0% application&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
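
&lt;p&gt;One way to compute this kind of attribution is to rank launch intervals, take the slowest tail, and count how many of those intervals overlapped a context switch. The sketch below assumes each interval already carries a boolean flag for that overlap. The name &lt;code&gt;tail_attribution&lt;/code&gt; and the data are illustrative, not taken from the analysis scripts.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def tail_attribution(intervals, switched, percentile):
    """Return (tail size, tail entries with a switch, share in percent).

    intervals: launch-to-launch gaps; switched: matching booleans that say
    whether a context switch fell inside each gap. Needs a non-empty tail.
    """
    ranked = sorted(zip(intervals, switched), reverse=True)
    tail_size = math.ceil(len(ranked) * (100 - percentile) / 100)
    tail = ranked[:tail_size]
    with_switch = sum(1 for _, flag in tail if flag)
    return tail_size, with_switch, 100 * with_switch / tail_size


# Synthetic data: 95 fast pairs, 4 slow pairs with no switch, 1 preempted pair.
intervals = [2] * 95 + [20] * 4 + [15300]
switched = [False] * 99 + [True]
for pct in (95, 99):
    size, hits, share = tail_attribution(intervals, switched, pct)
    print(f"P{pct}+: {size} outliers, {hits} with context switch ({share:.1f}%)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Whatever share is not explained by context switches is what the table labels as application behavior.&lt;/p&gt;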

&lt;p&gt;The total scheduler impact is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Impact = Affected Pairs x Penalty = 62 x 15ms = 0.93 seconds
Percentage = 0.93 / 79.5 = 1.2%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Finding:&lt;/strong&gt; in clean environments, CPU scheduler impact is minimal at &lt;strong&gt;1.2%&lt;/strong&gt;. The vast majority of kernel launch pairs, &lt;strong&gt;99.9%&lt;/strong&gt;, are unaffected by context switches. Tail latency mostly comes from application behavior such as token-generation boundaries, not scheduler preemption.&lt;/p&gt;

&lt;h2&gt;
  
  
  RQ2: What Is the Impact of IRQs on GPU Performance?
&lt;/h2&gt;

&lt;p&gt;The second question is whether IRQs directly interfere with the CPU-side launch path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Experiment design&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Condition: clean system with IRQ tracing enabled&lt;/li&gt;
&lt;li&gt;Metrics: IRQ frequency, duration, type distribution&lt;/li&gt;
&lt;li&gt;Analysis: IRQ time as percentage of total runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total Runtime&lt;/td&gt;
&lt;td&gt;4.99 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel Launches&lt;/td&gt;
&lt;td&gt;125,236&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soft IRQs&lt;/td&gt;
&lt;td&gt;653 events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard IRQs&lt;/td&gt;
&lt;td&gt;0 events&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Soft IRQ type distribution:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Total Time&lt;/th&gt;
&lt;th&gt;Avg Time&lt;/th&gt;
&lt;th&gt;Max Time&lt;/th&gt;
&lt;th&gt;Percentage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TIMER&lt;/td&gt;
&lt;td&gt;317&lt;/td&gt;
&lt;td&gt;0.77 ms&lt;/td&gt;
&lt;td&gt;2.4 us&lt;/td&gt;
&lt;td&gt;30.1 us&lt;/td&gt;
&lt;td&gt;49%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RCU&lt;/td&gt;
&lt;td&gt;291&lt;/td&gt;
&lt;td&gt;0.40 ms&lt;/td&gt;
&lt;td&gt;1.4 us&lt;/td&gt;
&lt;td&gt;17.2 us&lt;/td&gt;
&lt;td&gt;45%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NET_RX&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;0.13 ms&lt;/td&gt;
&lt;td&gt;4.5 us&lt;/td&gt;
&lt;td&gt;14.0 us&lt;/td&gt;
&lt;td&gt;4.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SCHED&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;0.07 ms&lt;/td&gt;
&lt;td&gt;4.9 us&lt;/td&gt;
&lt;td&gt;18.9 us&lt;/td&gt;
&lt;td&gt;2.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Total IRQ impact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total IRQ Time: 1.38 ms
Percentage of Runtime: 0.0276%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
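
&lt;p&gt;The per-type table and the runtime percentage can be derived from matched entry and exit events. The sketch below assumes those pairs have already been reduced to &lt;code&gt;(name, duration_us)&lt;/code&gt; tuples. The function name &lt;code&gt;summarize_softirqs&lt;/code&gt; and the sample events are hypothetical and do not reproduce the measured numbers.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import defaultdict


def summarize_softirqs(events, runtime_s):
    """Aggregate (name, duration_us) pairs into per-type stats.

    Returns (stats, total_us, runtime_pct), where runtime_pct is the total
    soft IRQ time as a percentage of the run's wall-clock time.
    """
    stats = defaultdict(lambda: {"count": 0, "total_us": 0.0, "max_us": 0.0})
    for name, duration_us in events:
        row = stats[name]
        row["count"] += 1
        row["total_us"] += duration_us
        row["max_us"] = max(row["max_us"], duration_us)
    total_us = sum(row["total_us"] for row in stats.values())
    runtime_pct = total_us / (runtime_s * 1e6) * 100
    return dict(stats), total_us, runtime_pct


# Synthetic events, durations in microseconds.
events = [("TIMER", 2.0), ("TIMER", 30.0), ("RCU", 1.5)]
stats, total_us, runtime_pct = summarize_softirqs(events, runtime_s=1.0)
for name, row in stats.items():
    avg_us = row["total_us"] / row["count"]
    print(f"{name}: n={row['count']} avg={avg_us:.1f} us max={row['max_us']:.1f} us")
print(f"Total: {total_us:.1f} us ({runtime_pct:.5f}% of runtime)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;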



&lt;p&gt;There are real reasons to worry about IRQs: direct handler time, cache pollution, CPU pipeline disruption, and delay accumulation on critical paths. But for this local inference workload, actual IRQ impact is small.&lt;/p&gt;

&lt;p&gt;The reason is the workload shape. Qwen3 submits about 950 launches in a burst lasting less than 100 us, so IRQs rarely land inside the burst. Most IRQs happen between bursts during CPU compute. TIMER interrupts dominate and have a small cache footprint. There is little network I/O, so &lt;code&gt;NET_RX&lt;/code&gt; appears only 30 times, and there are no hard IRQs from NVMe or SSD block-device interrupts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finding:&lt;/strong&gt; IRQ impact is negligible for local LLM inference at &lt;strong&gt;0.0276%&lt;/strong&gt;. This does not mean IRQs never matter. Distributed training with network communication or on-the-fly data loading can see much higher IRQ impact, estimated around &lt;strong&gt;5-20%&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  RQ3: How Do Noisy Neighbors Affect GPU Performance?
&lt;/h2&gt;

&lt;p&gt;The third question is the most production-relevant one: what happens when the GPU workload shares a machine with other CPU, network, and disk activity?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Experiment design&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Interference&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Reference point&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy CPU&lt;/td&gt;
&lt;td&gt;stress-ng (all cores)&lt;/td&gt;
&lt;td&gt;CPU contention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy Network&lt;/td&gt;
&lt;td&gt;iperf3 (10 streams)&lt;/td&gt;
&lt;td&gt;Network IRQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy Disk&lt;/td&gt;
&lt;td&gt;fio (4 jobs, randwrite)&lt;/td&gt;
&lt;td&gt;Block IRQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heavy Load&lt;/td&gt;
&lt;td&gt;All three combined&lt;/td&gt;
&lt;td&gt;Production simulation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimized&lt;/td&gt;
&lt;td&gt;CPU stress + taskset + nice&lt;/td&gt;
&lt;td&gt;Mitigation test&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;Normalized metrics per 1,000 kernel launches:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Launches&lt;/th&gt;
&lt;th&gt;Sched/1K&lt;/th&gt;
&lt;th&gt;Soft IRQ/1K&lt;/th&gt;
&lt;th&gt;Hard IRQ/1K&lt;/th&gt;
&lt;th&gt;IRQ Time (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;56,882&lt;/td&gt;
&lt;td&gt;22.8&lt;/td&gt;
&lt;td&gt;5.8&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.62&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy CPU&lt;/td&gt;
&lt;td&gt;61,184&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11,932.8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6.4&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy Network&lt;/td&gt;
&lt;td&gt;154,394&lt;/td&gt;
&lt;td&gt;6.0&lt;/td&gt;
&lt;td&gt;2.7&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy Disk&lt;/td&gt;
&lt;td&gt;126,670&lt;/td&gt;
&lt;td&gt;29.3&lt;/td&gt;
&lt;td&gt;3.9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Heavy Load&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99,424&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6,044.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.4&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimized&lt;/td&gt;
&lt;td&gt;108,984&lt;/td&gt;
&lt;td&gt;445.2&lt;/td&gt;
&lt;td&gt;2.8&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.71&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Performance impact:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;tok/s&lt;/th&gt;
&lt;th&gt;Runtime (s)&lt;/th&gt;
&lt;th&gt;Slowdown&lt;/th&gt;
&lt;th&gt;Context Switch Increase&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;54.77&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;1x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy CPU&lt;/td&gt;
&lt;td&gt;49.93&lt;/td&gt;
&lt;td&gt;4.15&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;524x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy Network&lt;/td&gt;
&lt;td&gt;53.23&lt;/td&gt;
&lt;td&gt;7.22&lt;/td&gt;
&lt;td&gt;2.8%&lt;/td&gt;
&lt;td&gt;0.26x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy Disk&lt;/td&gt;
&lt;td&gt;54.95&lt;/td&gt;
&lt;td&gt;5.60&lt;/td&gt;
&lt;td&gt;-0.3%&lt;/td&gt;
&lt;td&gt;1.3x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Heavy Load&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;43.56&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.97&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;20.5%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;265x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimized&lt;/td&gt;
&lt;td&gt;53.75&lt;/td&gt;
&lt;td&gt;5.10&lt;/td&gt;
&lt;td&gt;1.9%&lt;/td&gt;
&lt;td&gt;19.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Scenario Analysis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Noisy CPU (&lt;code&gt;stress-ng&lt;/code&gt;)&lt;/strong&gt; causes the most direct scheduling pressure. Context switches increase &lt;strong&gt;524x&lt;/strong&gt;, from 22.8 to 11,932.8 per 1,000 launches, and throughput drops by &lt;strong&gt;8.8%&lt;/strong&gt;. The mechanism is simple: the CFS scheduler time-slices between the GPU process and &lt;code&gt;stress-ng&lt;/code&gt; workers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Noisy Network (&lt;code&gt;iperf3&lt;/code&gt;)&lt;/strong&gt; behaves differently. Context switches per 1,000 launches actually decrease, from 22.8 to 6.0, because the network load changes CPU competition patterns. The per-launch soft IRQ rate also falls (5.8 to 2.7 per 1,000 launches), but IRQ time rises from 0.62 ms to 0.92 ms. Throughput drops only &lt;strong&gt;2.8%&lt;/strong&gt;. In this local setup, network I/O shows up primarily as IRQ overhead rather than scheduler pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Noisy Disk (&lt;code&gt;fio&lt;/code&gt;)&lt;/strong&gt; introduces the first hard IRQs (0.1 per 1,000 launches), corresponding to block-device interrupts. Context switches stay close to baseline (29.3 versus 22.8 per 1,000 launches), and throughput is effectively unchanged: the &lt;strong&gt;-0.3%&lt;/strong&gt; slowdown means tok/s is marginally above baseline. Disk I/O has little impact on this workload.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heavy Load (CPU + Network + Disk)&lt;/strong&gt; is the worst case. Throughput drops by &lt;strong&gt;20.5%&lt;/strong&gt;, and scheduler events rise to 6,044.6 per 1,000 launches, a &lt;strong&gt;265x&lt;/strong&gt; increase over baseline. Interestingly, that is only &lt;strong&gt;50.7%&lt;/strong&gt; of the context-switch rate in the Noisy CPU case. The interference sources compete with each other, but their combined effect is still worst overall.&lt;/p&gt;
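
&lt;p&gt;The derived columns in the tables above can be recomputed from the raw measurements. The following is a minimal Python sketch, not the tooling used for the experiments; the names &lt;code&gt;slowdown_pct&lt;/code&gt; and &lt;code&gt;ratio&lt;/code&gt; are illustrative. Because it works from the rounded table values, results can differ slightly from the published figures (for example, 11,932.8 / 22.8 comes out near 523x rather than 524x).&lt;/p&gt;

```python
# Recompute the derived metrics from the rounded values in the tables above.
BASELINE_TOKS = 54.77      # tok/s in the clean baseline
BASELINE_SCHED = 22.8      # context switches per 1,000 launches


def slowdown_pct(toks, baseline=BASELINE_TOKS):
    """Throughput lost relative to the baseline, as a percentage."""
    return (baseline - toks) / baseline * 100


def ratio(value, baseline=BASELINE_SCHED):
    """How many times the baseline rate a per-1K rate is."""
    return value / baseline


scenarios = {
    # name: (tok/s, Sched/1K)
    "Noisy CPU": (49.93, 11932.8),
    "Noisy Network": (53.23, 6.0),
    "Noisy Disk": (54.95, 29.3),
    "Heavy Load": (43.56, 6044.6),
    "Optimized": (53.75, 445.2),
}

for name, (toks, sched) in scenarios.items():
    print(f"{name:14s} slowdown={slowdown_pct(toks):5.1f}%  sched={ratio(sched):6.1f}x")

# Heavy Load context-switch rate as a share of the Noisy CPU rate.
heavy_vs_cpu = 6044.6 / 11932.8 * 100
print(f"Heavy Load / Noisy CPU = {heavy_vs_cpu:.1f}%")
```

&lt;p&gt;Keeping the formulas explicit makes it clear that "Slowdown" is measured against baseline tok/s, while "Context Switch Increase" is a ratio of per-1,000-launch rates.&lt;/p&gt;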

&lt;p&gt;Heavy-load soft IRQ breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Total Time&lt;/th&gt;
&lt;th&gt;Avg Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RCU&lt;/td&gt;
&lt;td&gt;213&lt;/td&gt;
&lt;td&gt;217.4 us&lt;/td&gt;
&lt;td&gt;1.0 us&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TIMER&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;122.9 us&lt;/td&gt;
&lt;td&gt;7.2 us&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SCHED&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;33.3 us&lt;/td&gt;
&lt;td&gt;6.7 us&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Finding:&lt;/strong&gt; noisy neighbors significantly affect GPU performance. Combined CPU, network, and disk interference causes &lt;strong&gt;20.5%&lt;/strong&gt; degradation. The signatures differ by source: CPU contention increases context switches, network I/O affects IRQ overhead, disk I/O introduces block interrupts with little throughput impact here, and combined load is worst due to cumulative effects.&lt;/p&gt;

&lt;h2&gt;
  
  
  RQ4: Can CPU Pinning Effectively Mitigate Scheduler Impact?
&lt;/h2&gt;

&lt;p&gt;The fourth question is whether a simple deployment-level mitigation helps before reaching for a custom scheduler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Experiment design&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Baseline: Noisy CPU scenario with &lt;code&gt;stress-ng&lt;/code&gt; on all cores&lt;/li&gt;
&lt;li&gt;Optimized: same &lt;code&gt;stress-ng&lt;/code&gt; load, but the GPU process runs with:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;taskset -c 0-3&lt;/code&gt; to pin it to cores 0-3&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nice -n -10&lt;/code&gt; to give it higher priority&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
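
&lt;p&gt;The same pinning and priority settings can also be applied from inside a launcher process. The sketch below is one hypothetical way to do it with Python's standard &lt;code&gt;os&lt;/code&gt; module on Linux; &lt;code&gt;parse_cpu_list&lt;/code&gt; and &lt;code&gt;pin_and_prioritize&lt;/code&gt; are illustrative names, not part of the experiment scripts. Lowering the nice value below zero normally requires root or &lt;code&gt;CAP_SYS_NICE&lt;/code&gt;.&lt;/p&gt;

```python
import os


def parse_cpu_list(spec):
    """Expand a taskset-style CPU list such as '0-3,6' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def pin_and_prioritize(cpu_spec, niceness):
    """Pin the calling process to cpu_spec and set its nice value (Linux only)."""
    wanted = parse_cpu_list(cpu_spec)
    # Only keep CPUs this process is currently allowed to run on.
    usable = wanted.intersection(os.sched_getaffinity(0))
    if not usable:
        raise ValueError(f"none of the CPUs in {cpu_spec!r} are available")
    os.sched_setaffinity(0, usable)
    # Negative nice values raise PermissionError without CAP_SYS_NICE.
    os.setpriority(os.PRIO_PROCESS, 0, niceness)


# Example (not executed here):
#   pin_and_prioritize("0-3", -10)
#   os.execvp("./gpu_app", ["./gpu_app"])
```

&lt;p&gt;Both the affinity mask and the nice value survive &lt;code&gt;exec&lt;/code&gt;, so calling the helper before replacing the process image has the same effect as wrapping the command in &lt;code&gt;taskset&lt;/code&gt; and &lt;code&gt;nice&lt;/code&gt;.&lt;/p&gt;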

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Noisy CPU&lt;/th&gt;
&lt;th&gt;Optimized&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sched/1K&lt;/td&gt;
&lt;td&gt;11,932.8&lt;/td&gt;
&lt;td&gt;445.2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96.3% reduction&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tok/s&lt;/td&gt;
&lt;td&gt;49.93&lt;/td&gt;
&lt;td&gt;53.75&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.6% improvement&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vs. Baseline&lt;/td&gt;
&lt;td&gt;8.8% slower&lt;/td&gt;
&lt;td&gt;1.9% slower&lt;/td&gt;
&lt;td&gt;Significant recovery&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;CPU pinning and priority adjustment recover most of the lost throughput. The optimized case still sees 445.2 scheduler events per 1,000 launches, however, which is &lt;strong&gt;19.5x&lt;/strong&gt; the clean baseline's 22.8.&lt;/p&gt;

&lt;p&gt;Complete elimination is hard because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;taskset&lt;/code&gt; constrains only the GPU process, so &lt;code&gt;stress-ng&lt;/code&gt; workers can still be scheduled on cores 0-3.&lt;/li&gt;
&lt;li&gt;System daemons and kernel threads cannot be fully excluded by &lt;code&gt;taskset&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;IRQ affinity may still route interrupts to pinned cores.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For stronger isolation, the next steps are kernel-level isolation and IRQ placement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Use isolcpus kernel parameter (boot time)&lt;/span&gt;
&lt;span class="nv"&gt;isolcpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4-7 &lt;span class="nv"&gt;nohz_full&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4-7

&lt;span class="c"&gt;# 2. Bind GPU process to isolated cores&lt;/span&gt;
taskset &lt;span class="nt"&gt;-c&lt;/span&gt; 4-7 ./gpu_app

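&lt;span class="c"&gt;# 3. Bind IRQs away from GPU cores (as root). A redirect cannot target a glob,&lt;/span&gt;
&lt;span class="c"&gt;#    so write each file in a loop; some IRQs reject affinity changes.&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;f &lt;span class="k"&gt;in&lt;/span&gt; /proc/irq/&lt;span class="k"&gt;*&lt;/span&gt;/smp_affinity_list&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="nb"&gt;echo &lt;/span&gt;0-3 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# 4. Use a cpuset cgroup for CPU isolation (libcgroup tools, cgroup v1)&lt;/span&gt;
cgcreate &lt;span class="nt"&gt;-g&lt;/span&gt; cpuset:gpu_workload
cgset &lt;span class="nt"&gt;-r&lt;/span&gt; cpuset.cpus&lt;span class="o"&gt;=&lt;/span&gt;4-7 gpu_workload
cgset &lt;span class="nt"&gt;-r&lt;/span&gt; cpuset.mems&lt;span class="o"&gt;=&lt;/span&gt;0 gpu_workload  &lt;span class="c"&gt;# assumes NUMA node 0&lt;/span&gt;
cgexec &lt;span class="nt"&gt;-g&lt;/span&gt; cpuset:gpu_workload ./gpu_app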
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
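
&lt;p&gt;After changing IRQ affinity, it is worth checking which interrupts can still land on the GPU cores, since some IRQs refuse affinity changes. The following Python sketch reads &lt;code&gt;/proc/irq/*/smp_affinity_list&lt;/code&gt; and reports the overlap; the helper names are hypothetical, and the 4-7 core range is simply the one used in the commands above.&lt;/p&gt;

```python
import glob
import os


def parse_cpu_list(spec):
    """Expand a kernel CPU list such as '0-3,6' into a set of CPU ids."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def read_irq_affinity(root="/proc/irq"):
    """Map each IRQ number to its allowed CPUs, skipping unreadable entries."""
    affinity = {}
    for path in glob.glob(os.path.join(root, "*", "smp_affinity_list")):
        irq = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                affinity[irq] = parse_cpu_list(f.read())
        except (OSError, ValueError):
            continue
    return affinity


def irqs_touching(gpu_cpus, affinity):
    """Return IRQs whose affinity still includes at least one GPU core."""
    return sorted(irq for irq, cpus in affinity.items()
                  if not cpus.isdisjoint(gpu_cpus))


# Example (not executed here):
#   print(irqs_touching(parse_cpu_list("4-7"), read_irq_affinity()))
```

&lt;p&gt;An empty result suggests that no interrupt is configured to target the isolated cores; any IRQ that remains in the list is a candidate for manual affinity tuning.&lt;/p&gt;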



&lt;p&gt;&lt;strong&gt;Finding:&lt;/strong&gt; CPU pinning is highly effective. It reduces context switches by &lt;strong&gt;96.3%&lt;/strong&gt; and improves throughput by &lt;strong&gt;7.6%&lt;/strong&gt; over the unpinned Noisy CPU run, leaving a 1.9% gap to baseline. Closing that gap, or coping with heavier combined load, likely requires deeper isolation such as &lt;code&gt;isolcpus&lt;/code&gt;, &lt;code&gt;nohz_full&lt;/code&gt;, cpusets, and IRQ affinity management.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Results Mean
&lt;/h2&gt;

&lt;p&gt;The results point to four practical insights.&lt;/p&gt;

&lt;p&gt;First, environment matters. Scheduler impact ranges from &lt;strong&gt;1.2%&lt;/strong&gt; in a clean environment to &lt;strong&gt;20.5%&lt;/strong&gt; under combined heavy load. Optimizing the scheduler on a quiet dedicated server may not be worth the complexity. On a shared host, it can be the difference between stable and degraded inference.&lt;/p&gt;

&lt;p&gt;Second, workload shape matters. Qwen3 has bursty kernel submission, roughly 950 launches in less than 100 us per token burst. That shape makes it resilient to many IRQs because interrupts usually occur between bursts. A different workload with continuous network communication, streaming input, or tighter CPU-GPU handoff might behave differently.&lt;/p&gt;

&lt;p&gt;Third, interference sources have distinct signatures:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Interference&lt;/th&gt;
&lt;th&gt;Primary Impact&lt;/th&gt;
&lt;th&gt;Secondary Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;Context switches&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;IRQ overhead&lt;/td&gt;
&lt;td&gt;Slight scheduling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk&lt;/td&gt;
&lt;td&gt;Hard IRQs&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Combined&lt;/td&gt;
&lt;td&gt;All of above&lt;/td&gt;
&lt;td&gt;Worst overall&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fourth, simple mitigations work, but only up to a point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU pinning: very effective, &lt;strong&gt;96%&lt;/strong&gt; context-switch reduction&lt;/li&gt;
&lt;li&gt;Priority adjustment: helpful but limited&lt;/li&gt;
&lt;li&gt;Full isolation: requires kernel configuration and IRQ affinity management&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparison with Meta's sched_ext Findings
&lt;/h2&gt;

&lt;p&gt;Our results differ from Meta's AI training observations because the workload is different.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Meta (AI Training)&lt;/th&gt;
&lt;th&gt;Our Study (LLM Inference)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Primary Issue&lt;/td&gt;
&lt;td&gt;Network IRQ (NET_RX)&lt;/td&gt;
&lt;td&gt;CPU scheduling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IRQ Impact&lt;/td&gt;
&lt;td&gt;5-20%&lt;/td&gt;
&lt;td&gt;0.03% (local inference)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimization&lt;/td&gt;
&lt;td&gt;sched_ext layer&lt;/td&gt;
&lt;td&gt;taskset + nice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workload&lt;/td&gt;
&lt;td&gt;Distributed training&lt;/td&gt;
&lt;td&gt;Single-node inference&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key difference is communication. Distributed training constantly exchanges data through all-reduce, making &lt;code&gt;NET_RX&lt;/code&gt; a major bottleneck. Local inference has minimal network I/O, so the dominant issue under noise is CPU scheduling rather than network interrupts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;There are several limits to this study:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;eBPF tracing itself adds &lt;strong&gt;1-5%&lt;/strong&gt; overhead.&lt;/li&gt;
&lt;li&gt;The tool only supports CUDA, not OpenCL or HIP.&lt;/li&gt;
&lt;li&gt;The trace does not include GPU-side execution timing, so it cannot directly measure actual kernel runtime.&lt;/li&gt;
&lt;li&gt;IRQ attribution is limited: the trace cannot always identify which process caused a given IRQ.&lt;/li&gt;
&lt;li&gt;The experiments use a single GPU and do not cover multi-GPU behavior.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Practical Recommendations
&lt;/h2&gt;

&lt;p&gt;For production deployments:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;th&gt;Expected Benefit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dedicated Server&lt;/td&gt;
&lt;td&gt;No optimization needed&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared Server (light)&lt;/td&gt;
&lt;td&gt;taskset + nice&lt;/td&gt;
&lt;td&gt;5-10% improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared Server (heavy)&lt;/td&gt;
&lt;td&gt;isolcpus + IRQ affinity&lt;/td&gt;
&lt;td&gt;15-20% improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes&lt;/td&gt;
&lt;td&gt;CPU limits + nodeSelector&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The decision tree is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Is GPU workload latency-sensitive?
├── No -&amp;gt; No optimization needed
└── Yes -&amp;gt; Is server shared?
    ├── No -&amp;gt; Monitor only, optimize if needed
    └── Yes -&amp;gt; How heavy is colocated load?
        ├── Light -&amp;gt; taskset + nice
        └── Heavy -&amp;gt; isolcpus + dedicated cores
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;CPU scheduling and IRQ handling do not always matter for GPU inference, but they matter under the conditions where production systems often run: shared hosts, background load, and noisy neighbors.&lt;/p&gt;

&lt;p&gt;The clean baseline shows minimal overhead: &lt;strong&gt;1.2%&lt;/strong&gt; scheduler impact and &lt;strong&gt;0.03%&lt;/strong&gt; IRQ impact. But combined CPU, network, and disk interference causes &lt;strong&gt;20.5%&lt;/strong&gt; throughput degradation. CPU pinning cuts context switches by &lt;strong&gt;96.3%&lt;/strong&gt; and recovers most of the lost performance, but not all of it.&lt;/p&gt;

&lt;p&gt;The practical lesson is to measure first. Use tracing to identify whether your workload is scheduler-bound, IRQ-sensitive, or mostly application-limited. Then choose the mitigation that matches the signature: CPU pinning for CPU contention, IRQ affinity for interrupt interference, I/O tuning for block-device pressure, and full CPU isolation when the workload is latency-sensitive and colocated load is heavy.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Meta Platforms, Inc. "Accelerating AI Training with sched_ext." Linux Plumbers Conference 2025. &lt;a href="https://lpc.events/event/19/contributions/2039/" rel="noopener noreferrer"&gt;https://lpc.events/event/19/contributions/2039/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;NVIDIA Corporation. "CUDA Driver API Reference." &lt;a href="https://docs.nvidia.com/cuda/cuda-driver-api/" rel="noopener noreferrer"&gt;https://docs.nvidia.com/cuda/cuda-driver-api/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Linux Kernel Documentation. "BPF Documentation." &lt;a href="https://www.kernel.org/doc/html/latest/bpf/" rel="noopener noreferrer"&gt;https://www.kernel.org/doc/html/latest/bpf/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;stress-ng. "A tool to load and stress a computer system." &lt;a href="https://github.com/ColinIanKing/stress-ng" rel="noopener noreferrer"&gt;https://github.com/ColinIanKing/stress-ng&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;iperf3. "A TCP, UDP, and SCTP network bandwidth measurement tool." &lt;a href="https://github.com/esnet/iperf" rel="noopener noreferrer"&gt;https://github.com/esnet/iperf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;fio. "Flexible I/O Tester." &lt;a href="https://github.com/axboe/fio" rel="noopener noreferrer"&gt;https://github.com/axboe/fio&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ebpf</category>
      <category>gpu</category>
      <category>llm</category>
      <category>performance</category>
    </item>
    <item>
      <title>eBPF Tutorial by Example 50: Composable Traffic Control with TCX Links</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Sun, 31 May 2026 23:34:31 +0000</pubDate>
      <link>https://dev.to/yunwei37/ebpf-tutorial-by-example-50-composable-traffic-control-with-tcx-links-5hmo</link>
      <guid>https://dev.to/yunwei37/ebpf-tutorial-by-example-50-composable-traffic-control-with-tcx-links-5hmo</guid>
      <description>&lt;p&gt;Ever tried attaching multiple BPF programs to the TC ingress path and got frustrated managing qdisc handles, filter priorities, and the &lt;code&gt;tc&lt;/code&gt; CLI? Or needed one application's TC program to coexist safely with another's without accidentally overwriting it? Traditional &lt;code&gt;cls_bpf&lt;/code&gt; attachment through &lt;code&gt;tc&lt;/code&gt; works, but it inherits decades of queueing discipline plumbing that was never designed for the BPF-centric world. What if you could attach, order, and manage TC programs using the same link-based API that XDP and cgroup programs already enjoy?&lt;/p&gt;

&lt;p&gt;This is what &lt;strong&gt;TCX&lt;/strong&gt; (Traffic Control eXtension) solves. Introduced by Daniel Borkmann and merged in Linux 6.6, TCX provides a lightweight, fd-based multi-program attach infrastructure for the TC ingress and egress data path. Programs get BPF link semantics (safe ownership, auto-detachment on close, and explicit ordering through &lt;code&gt;BPF_F_BEFORE&lt;/code&gt; / &lt;code&gt;BPF_F_AFTER&lt;/code&gt; flags) without touching a single qdisc or filter priority.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll attach two TCX ingress programs to the loopback interface, place one before the other, query the kernel's live chain state, and generate traffic to verify execution order.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The complete source code: &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/50-tcx" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/50-tcx&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction to TCX: Why Classic TC Attachment Needed a Rethink
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: Qdisc Plumbing and Unsafe Ownership
&lt;/h3&gt;

&lt;p&gt;Classic &lt;code&gt;tc&lt;/code&gt; BPF attachment (&lt;code&gt;cls_bpf&lt;/code&gt;) was bolted onto the existing Traffic Control framework. To attach a BPF program, you first needed a &lt;code&gt;clsact&lt;/code&gt; qdisc on the interface, then added a filter with a handle and priority. This worked fine for a single operator, but created real problems in cloud-native environments where multiple applications need to attach TC programs to the same interface:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No ownership model&lt;/strong&gt;: A &lt;code&gt;tc filter del&lt;/code&gt; from one application can accidentally remove another application's program. There's no protection against this because classic tc filters are identified by handle/priority, not by the process that created them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Priority conflicts&lt;/strong&gt;: Two applications might pick the same priority number. The second attachment silently replaces the first.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Permanent attachment by default&lt;/strong&gt;: Classic tc filters persist until explicitly removed. If the application that attached a filter crashes without cleanup, the filter remains, potentially with stale program logic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CLI dependency&lt;/strong&gt;: Even with libbpf, the attachment model was tied to netlink, the same mechanism the &lt;code&gt;tc&lt;/code&gt; CLI uses. This meant your BPF application was sharing a control plane with every other tc user on the system.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These issues became acute in projects like Cilium, where the BPF dataplane needs to coexist with third-party CNI plugins, observability agents, and security tools that all want to hook into TC.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Link-Based Multi-Program Management
&lt;/h3&gt;

&lt;p&gt;TCX takes a fundamentally different approach. Instead of piggybacking on qdisc infrastructure, it provides a dedicated, qdisc-less extension point for BPF programs at the TC ingress and egress hooks. The key design principles:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BPF Link Semantics&lt;/strong&gt;: &lt;code&gt;bpf_program__attach_tcx()&lt;/code&gt; creates a &lt;code&gt;BPF_LINK_TYPE_TCX&lt;/code&gt; link. Like XDP links and cgroup links, TCX links give you safe ownership: the link's lifetime is tied to its file descriptor, so it auto-detaches when the fd is closed (unless the link has been pinned in bpffs), and it cannot be accidentally overridden by another application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explicit Ordering&lt;/strong&gt;: Instead of implicit priority numbers, you place programs relative to each other using &lt;code&gt;BPF_F_BEFORE&lt;/code&gt; and &lt;code&gt;BPF_F_AFTER&lt;/code&gt;. You can also use &lt;code&gt;BPF_F_REPLACE&lt;/code&gt; to atomically swap a specific program. All operations support an &lt;code&gt;expected_revision&lt;/code&gt; field that prevents race conditions during concurrent modifications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chain Return Codes&lt;/strong&gt;: TCX defines simplified return codes that make multi-program composition explicit:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Return Code&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TCX_NEXT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;td&gt;Non-terminating; pass the packet to the next program in the chain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TCX_PASS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Accept the packet and terminate the chain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TCX_DROP&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Drop the packet and terminate the chain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TCX_REDIRECT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Redirect the packet and terminate the chain&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Unknown return codes are mapped to &lt;code&gt;TCX_NEXT&lt;/code&gt; for forward compatibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coexistence with Classic TC&lt;/strong&gt;: TCX links can coexist with traditional &lt;code&gt;cls_bpf&lt;/code&gt; filters on the same interface. The kernel runs TCX programs first, then falls through to classic &lt;code&gt;tcf_classify()&lt;/code&gt; if present. This allows gradual migration from classic tc to TCX without a disruptive cutover.&lt;/p&gt;

&lt;h2&gt;
  
  
  Writing the eBPF Program
&lt;/h2&gt;

&lt;p&gt;Our BPF object contains two programs that demonstrate chain composition. Here is the complete source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: GPL-2.0&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;linux/bpf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_endian.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_helpers.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#ifndef TCX_NEXT
#define TCX_NEXT -1
#endif
&lt;/span&gt;
&lt;span class="cp"&gt;#ifndef TCX_PASS
#define TCX_PASS 0
#endif
&lt;/span&gt;
&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;LICENSE&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"license"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"GPL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;__u64&lt;/span&gt; &lt;span class="n"&gt;stats_hits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;__u64&lt;/span&gt; &lt;span class="n"&gt;classifier_hits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;last_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;__u16&lt;/span&gt; &lt;span class="n"&gt;last_protocol&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;last_ifindex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcx/ingress"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;tcx_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;__sk_buff&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;skb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;stats_hits&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;last_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skb&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;last_protocol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ntohs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skb&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;protocol&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;last_ifindex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skb&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ifindex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TCX_NEXT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcx/ingress"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;tcx_classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;__sk_buff&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;skb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;classifier_hits&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TCX_PASS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's walk through this step by step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Section Names: &lt;code&gt;SEC("tcx/ingress")&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;SEC("tcx/ingress")&lt;/code&gt; annotation tells libbpf that this program should be attached to the TCX ingress hook rather than the classic TC classifier. This is not just a naming convention; libbpf maps this section name to &lt;code&gt;BPF_PROG_TYPE_SCHED_CLS&lt;/code&gt; with the appropriate attach type for TCX. The corresponding egress variant is &lt;code&gt;SEC("tcx/egress")&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Note that &lt;code&gt;SEC("tc")&lt;/code&gt;, &lt;code&gt;SEC("classifier")&lt;/code&gt;, and &lt;code&gt;SEC("action")&lt;/code&gt; are now considered deprecated by libbpf in favor of the &lt;code&gt;tcx/*&lt;/code&gt; section names.&lt;/p&gt;

&lt;h3&gt;
  
  
  Global Variables as Counters
&lt;/h3&gt;

&lt;p&gt;Instead of using a BPF map for counters, we use global variables (&lt;code&gt;stats_hits&lt;/code&gt;, &lt;code&gt;classifier_hits&lt;/code&gt;, &lt;code&gt;last_len&lt;/code&gt;, etc.). The libbpf skeleton exposes these through &lt;code&gt;skel-&amp;gt;bss-&amp;gt;stats_hits&lt;/code&gt;, which makes the user-space code simpler. The plain &lt;code&gt;++&lt;/code&gt; increments are not atomic, so packets processed concurrently on different CPUs can lose updates. This is fine for a single-CPU demo; for production use, you would want per-CPU maps or atomic increments to avoid data races.&lt;/p&gt;

&lt;h3&gt;
  
  
  Return Codes: &lt;code&gt;TCX_NEXT&lt;/code&gt; vs &lt;code&gt;TCX_PASS&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the heart of TCX composition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tcx_stats&lt;/code&gt; returns &lt;code&gt;TCX_NEXT&lt;/code&gt;, which means "I've done my work, now pass the packet to the next program in the chain." The chain continues executing.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tcx_classifier&lt;/code&gt; returns &lt;code&gt;TCX_PASS&lt;/code&gt;, which is a terminal verdict: the packet is accepted and no further programs in the chain run.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we had placed &lt;code&gt;tcx_classifier&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; &lt;code&gt;tcx_stats&lt;/code&gt; in the chain, &lt;code&gt;tcx_stats&lt;/code&gt; would never execute because &lt;code&gt;TCX_PASS&lt;/code&gt; terminates the chain. Ordering matters, and TCX makes it explicit.&lt;/p&gt;
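
&lt;p&gt;To make the chain semantics concrete, here is a toy user-space model in Python. It is only a sketch of the behavior described above, not kernel code: &lt;code&gt;run_chain&lt;/code&gt; is a hypothetical helper, and the two lambdas stand in for the BPF programs.&lt;/p&gt;

```python
# Return codes from the TCX return-code table.
TCX_NEXT, TCX_PASS, TCX_DROP, TCX_REDIRECT = -1, 0, 2, 7
TERMINAL = {TCX_PASS, TCX_DROP, TCX_REDIRECT}


def run_chain(chain, packet):
    """Run programs in order and stop at the first terminal verdict.

    Returns (verdict, names_of_programs_that_ran). Any non-terminal or
    unknown return value is treated like TCX_NEXT.
    """
    ran = []
    for name, prog in chain:
        ran.append(name)
        verdict = prog(packet)
        if verdict in TERMINAL:
            return verdict, ran
    return TCX_NEXT, ran


# Stand-ins for the two BPF programs in this tutorial.
tcx_stats = ("tcx_stats", lambda pkt: TCX_NEXT)
tcx_classifier = ("tcx_classifier", lambda pkt: TCX_PASS)

in_order = run_chain([tcx_stats, tcx_classifier], b"payload")
reversed_order = run_chain([tcx_classifier, tcx_stats], b"payload")
print(in_order)
print(reversed_order)
```

&lt;p&gt;In the model, the reversed chain reports only &lt;code&gt;tcx_classifier&lt;/code&gt; as having run, which mirrors why the stats program must come first.&lt;/p&gt;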

&lt;h2&gt;
  
  
  User-Space Loader: Attaching and Querying the Chain
&lt;/h2&gt;

&lt;p&gt;The user-space code demonstrates three key TCX operations: attaching programs, ordering them relative to each other, and querying the live chain.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Attach the First Program
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;classifier_link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_program__attach_tcx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;progs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tcx_classifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="n"&gt;ifindex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This attaches &lt;code&gt;tcx_classifier&lt;/code&gt; to the TCX ingress hook on the specified interface. Passing &lt;code&gt;NULL&lt;/code&gt; for options means "use defaults", so the program gets appended to the chain. At this point, the chain has one program.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Insert the Second Program &lt;em&gt;Before&lt;/em&gt; the First
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;LIBBPF_OPTS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_tcx_opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;before_opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BPF_F_BEFORE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relative_fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_program__fd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;progs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tcx_classifier&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="n"&gt;stats_link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_program__attach_tcx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;progs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tcx_stats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;ifindex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;before_opts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;bpf_tcx_opts&lt;/code&gt; structure tells the kernel to insert &lt;code&gt;tcx_stats&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; &lt;code&gt;tcx_classifier&lt;/code&gt; in the chain. The &lt;code&gt;.relative_fd&lt;/code&gt; field identifies the reference point, which is the fd of the already-attached classifier program. After this, the chain is: &lt;code&gt;tcx_stats&lt;/code&gt; → &lt;code&gt;tcx_classifier&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You could get the same ordering from the opposite direction: attach &lt;code&gt;tcx_stats&lt;/code&gt; first, then attach &lt;code&gt;tcx_classifier&lt;/code&gt; with &lt;code&gt;BPF_F_AFTER&lt;/code&gt; and &lt;code&gt;tcx_stats&lt;/code&gt; as the reference. Either way, you state the desired order directly instead of hoping that two numeric priorities sort correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Query the Chain
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
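&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Output buffers; each must hold at least query.count entries&lt;/span&gt;
&lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;prog_ids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;link_ids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="n"&gt;LIBBPF_OPTS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_prog_query_opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;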

&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prog_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prog_ids&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;link_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;link_ids&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_prog_query_opts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ifindex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BPF_TCX_INGRESS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After attachment, the loader queries the kernel for the live chain state. The returned data includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;revision&lt;/code&gt;&lt;/strong&gt;: A monotonically increasing counter that changes on every chain modification. This is the value you would pass as &lt;code&gt;expected_revision&lt;/code&gt; if you wanted to perform atomic updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;prog_ids[]&lt;/code&gt;&lt;/strong&gt;: The BPF program IDs in chain order.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;link_ids[]&lt;/code&gt;&lt;/strong&gt;: The corresponding BPF link IDs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows any observer to determine exactly which programs are attached and in what order, which is invaluable for debugging multi-program pipelines.&lt;/p&gt;
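
&lt;p&gt;To make the ordering and revision rules concrete, here is a small user-space toy model of such a chain. It is only a sketch under stated assumptions: &lt;code&gt;toy_chain&lt;/code&gt;, &lt;code&gt;toy_chain_init&lt;/code&gt;, and &lt;code&gt;toy_chain_attach&lt;/code&gt; are hypothetical names invented for illustration, not kernel or libbpf APIs, and the model assumes that an empty chain starts at revision 1 and that every successful attach bumps the revision by one. The last argument mimics &lt;code&gt;expected_revision&lt;/code&gt;: a stale value makes the attach fail instead of modifying a chain that has changed in the meantime.&lt;/p&gt;

```c
/* Toy user-space model of a TCX-style program chain; not kernel code. */
enum { TOY_CHAIN_MAX = 8 };

struct toy_chain {
    unsigned int prog_ids[TOY_CHAIN_MAX]; /* program IDs in execution order */
    int count;                            /* number of attached programs */
    unsigned long long revision;          /* bumped on every successful change */
};

/* Assumption of this model: an empty chain starts at revision 1. */
void toy_chain_init(struct toy_chain *c)
{
    c->count = 0;
    c->revision = 1;
}

/*
 * Insert new_id directly before before_id, or append when before_id is 0.
 * expected_revision == 0 skips the revision check; any other value must
 * match the current revision. Returns 0 on success, -1 on failure.
 */
int toy_chain_attach(struct toy_chain *c, unsigned int new_id,
                     unsigned int before_id,
                     unsigned long long expected_revision)
{
    int pos = c->count; /* default: append at the tail */
    int i;

    if (expected_revision != 0) {
        if (expected_revision != c->revision)
            return -1; /* chain changed since the caller last queried it */
    }
    if (c->count == TOY_CHAIN_MAX)
        return -1; /* no room left */

    if (before_id != 0) {
        pos = -1;
        for (i = 0; i != c->count; i++) {
            if (c->prog_ids[i] == before_id) {
                pos = i;
                break;
            }
        }
        if (pos == -1)
            return -1; /* reference program is not in the chain */
    }

    /* Shift the tail right by one slot to open a gap at pos. */
    for (i = c->count; i > pos; i--)
        c->prog_ids[i] = c->prog_ids[i - 1];

    c->prog_ids[pos] = new_id;
    c->count++;
    c->revision++;
    return 0;
}
```

&lt;p&gt;In this model, appending ID 811 and then inserting ID 812 before it leaves the order 812, 811 at revision 3. A further attach that still passes revision 2 is rejected, which is the kind of lost-update protection that &lt;code&gt;expected_revision&lt;/code&gt; is meant to give.&lt;/p&gt;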

&lt;h3&gt;
  
  
  Step 4: Generate Traffic and Read Counters
&lt;/h3&gt;

&lt;p&gt;The loader sends a UDP packet to &lt;code&gt;127.0.0.1&lt;/code&gt; (port 9, discard) to trigger the chain, waits briefly, then reads the global variables to verify both programs executed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  tcx_stats hits      : %llu&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bss&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;stats_hits&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  tcx_classifier hits : %llu&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bss&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;classifier_hits&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If both counters are 1, both programs in the chain ran for the test packet: &lt;code&gt;tcx_stats&lt;/code&gt; recorded metadata and returned &lt;code&gt;TCX_NEXT&lt;/code&gt;, which handed the packet to &lt;code&gt;tcx_classifier&lt;/code&gt;, which counted it and returned &lt;code&gt;TCX_PASS&lt;/code&gt;. The counters alone do not prove the order; the slot listing from the query in Step 3 does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compilation and Execution
&lt;/h2&gt;

&lt;p&gt;This example requires Linux 6.6 or newer (the first release with TCX) and libbpf 1.3 or newer, which provides &lt;code&gt;bpf_program__attach_tcx()&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;bpf-developer-tutorial/src/50-tcx
make
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./tcx_demo &lt;span class="nt"&gt;-i&lt;/span&gt; lo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output (program and link IDs will differ on your system):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attached TCX programs to lo (ifindex=1)
TCX ingress chain revision: 3
  slot 0: prog_id=812 link_id=901
  slot 1: prog_id=811 link_id=900

Counters:
  tcx_stats hits      : 1
  tcx_classifier hits : 1
  last ifindex        : 1
  last protocol       : 0x0800
  last length         : 46
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The revision is 3 because an empty chain starts at revision 1 and every modification increments it: attaching &lt;code&gt;tcx_classifier&lt;/code&gt; moved it to 2, and inserting &lt;code&gt;tcx_stats&lt;/code&gt; before it moved it to 3. Querying the chain is read-only and does not change the revision.&lt;/p&gt;

&lt;p&gt;If you want to inspect the attach behavior without traffic, add &lt;code&gt;-n&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./tcx_demo &lt;span class="nt"&gt;-i&lt;/span&gt; lo &lt;span class="nt"&gt;-n&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;-v&lt;/code&gt; to enable libbpf debug output, which is helpful for seeing the low-level BPF syscall sequence.&lt;/p&gt;

&lt;h2&gt;
  
  
  How This Differs from Lesson 20 (Classic TC)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/blob/main/src/20-tc/README.md" rel="noopener noreferrer"&gt;Lesson 20-tc&lt;/a&gt; teaches the classic TC data path: creating a &lt;code&gt;clsact&lt;/code&gt; qdisc, attaching a &lt;code&gt;SEC("tc")&lt;/code&gt; program as a filter, and using &lt;code&gt;__sk_buff&lt;/code&gt; for packet inspection. That lesson is still valuable because the &lt;strong&gt;packet processing model&lt;/strong&gt; is identical: TCX programs receive the same &lt;code&gt;__sk_buff&lt;/code&gt; context and use the same helpers for packet parsing.&lt;/p&gt;

&lt;p&gt;What TCX replaces is the &lt;strong&gt;control plane&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Classic TC (Lesson 20)&lt;/th&gt;
&lt;th&gt;TCX (Lesson 50)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Attach mechanism&lt;/td&gt;
&lt;td&gt;Netlink / &lt;code&gt;tc&lt;/code&gt; CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bpf_program__attach_tcx()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ownership&lt;/td&gt;
&lt;td&gt;None; any process with &lt;code&gt;CAP_NET_ADMIN&lt;/code&gt; can &lt;code&gt;tc filter del&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;BPF link; detaches automatically once the last link fd is closed (unless pinned)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ordering&lt;/td&gt;
&lt;td&gt;Implicit priority numbers&lt;/td&gt;
&lt;td&gt;Explicit &lt;code&gt;BPF_F_BEFORE&lt;/code&gt; / &lt;code&gt;BPF_F_AFTER&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-program&lt;/td&gt;
&lt;td&gt;Manual priority management&lt;/td&gt;
&lt;td&gt;Built-in chain with revision tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Section name&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SEC("tc")&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SEC("tcx/ingress")&lt;/code&gt; / &lt;code&gt;SEC("tcx/egress")&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel requirement&lt;/td&gt;
&lt;td&gt;Any modern kernel&lt;/td&gt;
&lt;td&gt;Linux 6.6+&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you are building new libbpf-based networking tools and can require Linux 6.6+, TCX is the better default. Cilium, for example, already attaches its datapath programs through TCX on kernels that support it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we learned how TCX modernizes TC program attachment by replacing qdisc-based plumbing with BPF link semantics. We attached two ingress programs, controlled their execution order with &lt;code&gt;BPF_F_BEFORE&lt;/code&gt;, queried the live chain with &lt;code&gt;bpf_prog_query_opts()&lt;/code&gt;, and verified that both programs executed in the correct order. TCX provides safe ownership, explicit ordering, revision-aware updates, and coexistence with classic TC, making it the foundation for composable, multi-program traffic control in modern eBPF applications.&lt;/p&gt;

&lt;p&gt;If you'd like to learn more about eBPF, visit our tutorial code repository at &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial&lt;/a&gt; or website &lt;a href="https://eunomia.dev/tutorials/" rel="noopener noreferrer"&gt;https://eunomia.dev/tutorials/&lt;/a&gt; for more examples and complete tutorials.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://lore.kernel.org/bpf/20230707172455.7634-3-daniel@iogearbox.net/" rel="noopener noreferrer"&gt;TCX kernel commit: fd-based tcx multi-prog infra with link support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_SCHED_CLS/" rel="noopener noreferrer"&gt;BPF_PROG_TYPE_SCHED_CLS documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.ebpf.io/ebpf-library/libbpf/userspace/bpf_program__attach_tcx/" rel="noopener noreferrer"&gt;bpf_program__attach_tcx libbpf API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://bpfconf.ebpf.io/bpfconf2024/bpfconf2024_material/tcx_netkit_update_and_global_sk_iter.pdf" rel="noopener noreferrer"&gt;Cilium TCX &amp;amp; Netkit update (BPFConf 2024)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://oldvger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf" rel="noopener noreferrer"&gt;Generic multi-prog API, tcx links and meta device (BPFConf 2023)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.kernel.org/bpf/" rel="noopener noreferrer"&gt;Linux kernel BPF documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ebpf</category>
      <category>tutorial</category>
      <category>network</category>
    </item>
    <item>
      <title>eBPF Tutorial by Example: BPF Token for Delegated Privilege and Secure Program Loading</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Tue, 17 Mar 2026 07:48:37 +0000</pubDate>
      <link>https://dev.to/yunwei37/ebpf-tutorial-by-example-bpf-token-for-delegated-privilege-and-secure-program-loading-3b5i</link>
      <guid>https://dev.to/yunwei37/ebpf-tutorial-by-example-bpf-token-for-delegated-privilege-and-secure-program-loading-3b5i</guid>
      <description>&lt;p&gt;Ever needed to let a container or CI job load an eBPF program without giving it full &lt;code&gt;CAP_BPF&lt;/code&gt; or &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt;? Or wanted to expose XDP packet processing to a tenant workload while ensuring it can only create the specific map types and program types you've approved? Before BPF token, the answer was binary: either you had the capabilities to do &lt;em&gt;everything&lt;/em&gt; in BPF, or you could do &lt;em&gt;nothing&lt;/em&gt;. There was no middle ground.&lt;/p&gt;

&lt;p&gt;This is what &lt;strong&gt;BPF Token&lt;/strong&gt; solves. Introduced by Andrii Nakryiko and merged in Linux 6.9, BPF token is a delegation mechanism that lets a privileged process (like a container runtime or systemd) create a precisely scoped permission set for BPF operations, then hand it to an unprivileged process through a bpffs mount. The unprivileged process can load programs, create maps, and attach hooks, but only the types that were explicitly allowed. No broad capabilities required.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll set up a delegated bpffs mount in a user namespace, derive a BPF token from it, and use libbpf to load and attach a minimal XDP program, all from a process that holds no BPF capabilities in the initial user namespace.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The complete source code: &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_token" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_token&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction to BPF Token: Solving the Privilege Problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: All-or-Nothing BPF Capabilities
&lt;/h3&gt;

&lt;p&gt;Traditional eBPF requires &lt;code&gt;CAP_BPF&lt;/code&gt; for program loading and map creation, plus additional capabilities like &lt;code&gt;CAP_PERFMON&lt;/code&gt; for tracing, &lt;code&gt;CAP_NET_ADMIN&lt;/code&gt; for networking hooks, and &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; for certain advanced operations. These capabilities are inherently &lt;strong&gt;system-wide&lt;/strong&gt;: you cannot namespace or sandbox &lt;code&gt;CAP_BPF&lt;/code&gt;. As the kernel documentation explains, this is by design: BPF tracing helpers like &lt;code&gt;bpf_probe_read_kernel()&lt;/code&gt; can access arbitrary kernel memory, which fundamentally cannot be scoped to a single namespace.&lt;/p&gt;

&lt;p&gt;This creates a real problem in multi-tenant environments:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Container isolation&lt;/strong&gt;: A Kubernetes pod that needs to run a simple XDP program must be given &lt;code&gt;CAP_BPF&lt;/code&gt; + &lt;code&gt;CAP_NET_ADMIN&lt;/code&gt;, which also grants it the ability to load &lt;em&gt;any&lt;/em&gt; BPF program type and create &lt;em&gt;any&lt;/em&gt; map type. There's no way to say "you can load XDP programs but not kprobes."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CI/CD pipelines&lt;/strong&gt;: A build job that tests an eBPF-based observability tool needs root-equivalent capabilities to load programs, even though the test only exercises a specific, well-known program type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Third-party integrations&lt;/strong&gt;: A service mesh sidecar that attaches sockops programs needs capabilities that also grant it the ability to trace every process on the host.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result is that organizations either give broad BPF capabilities (weakening their security posture) or prohibit BPF entirely in unprivileged contexts (limiting the technology's adoption).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Scoped Delegation Through bpffs
&lt;/h3&gt;

&lt;p&gt;BPF token takes a different approach. Instead of trying to namespace capabilities (which is fundamentally unsafe for BPF), it introduces an explicit delegation model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong&gt;privileged process&lt;/strong&gt; (container runtime, init system, platform daemon) creates a bpffs instance with specific delegation options that define exactly which BPF operations are allowed.&lt;/li&gt;
&lt;li&gt;The privileged process passes this bpffs mount to an &lt;strong&gt;unprivileged process&lt;/strong&gt; (container, CI job, tenant workload).&lt;/li&gt;
&lt;li&gt;The unprivileged process derives a &lt;strong&gt;BPF token&lt;/strong&gt; from the bpffs mount. The token is a file descriptor that carries the delegated permission set.&lt;/li&gt;
&lt;li&gt;When the unprivileged process makes &lt;code&gt;bpf()&lt;/code&gt; syscalls (through libbpf or directly), it passes the token fd. The kernel checks permissions against the token instead of against the process's capabilities.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The token is scoped along four independent axes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Delegation Option&lt;/th&gt;
&lt;th&gt;What It Controls&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;delegate_cmds&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Which &lt;code&gt;bpf()&lt;/code&gt; commands are allowed&lt;/td&gt;
&lt;td&gt;&lt;code&gt;prog_load:map_create:btf_load:link_create&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;delegate_maps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Which map types can be created&lt;/td&gt;
&lt;td&gt;&lt;code&gt;array:hash:ringbuf&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;delegate_progs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Which program types can be loaded&lt;/td&gt;
&lt;td&gt;&lt;code&gt;xdp:socket_filter&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;delegate_attachs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Which attach types are allowed&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;xdp:cgroup_inet_ingress&lt;/code&gt; or &lt;code&gt;any&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each axis is a bitmask. If a bit isn't set, the corresponding operation is denied even if the token is present. This gives platform engineers fine-grained control: you can allow a container to load XDP programs with array maps but deny it access to kprobes, perf events, or hash-of-maps.&lt;/p&gt;
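
&lt;p&gt;The bitmask check itself is simple, and a toy model may help make it concrete. The sketch below is an illustration only: the &lt;code&gt;toy_&lt;/code&gt; names and the bit values are made up for this example (the kernel uses its own enum values and internal helpers), and the attach-type axis is left out to keep it short. It shows the rule described above: an operation passes only if both the command bit and the matching type bit were delegated.&lt;/p&gt;

```c
/* Toy model of BPF token delegation masks; names and bit values are made up. */
enum toy_cmd  { TOY_CMD_MAP_CREATE = 0x1, TOY_CMD_PROG_LOAD = 0x2 };
enum toy_map  { TOY_MAP_ARRAY = 0x1, TOY_MAP_HASH = 0x2 };
enum toy_prog { TOY_PROG_XDP = 0x1, TOY_PROG_KPROBE = 0x2 };

struct toy_token {
    unsigned long long cmds;  /* models delegate_cmds */
    unsigned long long maps;  /* models delegate_maps */
    unsigned long long progs; /* models delegate_progs */
};

/* Returns 1 when every bit in wanted is also set in allowed, else 0. */
static int toy_mask_allows(unsigned long long allowed,
                           unsigned long long wanted)
{
    return (allowed | wanted) == allowed;
}

/* Map creation needs both the command bit and the map-type bit. */
int toy_can_create_map(const struct toy_token *t, unsigned long long map_bit)
{
    if (!toy_mask_allows(t->cmds, TOY_CMD_MAP_CREATE))
        return 0;
    return toy_mask_allows(t->maps, map_bit);
}

/* Program loading needs both the command bit and the program-type bit. */
int toy_can_load_prog(const struct toy_token *t, unsigned long long prog_bit)
{
    if (!toy_mask_allows(t->cmds, TOY_CMD_PROG_LOAD))
        return 0;
    return toy_mask_allows(t->progs, prog_bit);
}
```

&lt;p&gt;With a token that delegates both commands but only array maps and XDP programs, this model accepts an array map and an XDP program and rejects a hash map and a kprobe. A token with the right type bits but no command bits rejects everything.&lt;/p&gt;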

&lt;h3&gt;
  
  
  The User Namespace Constraint
&lt;/h3&gt;

&lt;p&gt;One critical design decision: &lt;strong&gt;a BPF token must be created inside the same user namespace as the bpffs instance, and that user namespace must not be &lt;code&gt;init_user_ns&lt;/code&gt;&lt;/strong&gt;. This is intentional. It means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A host-namespace bpffs (the one at &lt;code&gt;/sys/fs/bpf&lt;/code&gt;) does &lt;strong&gt;not&lt;/strong&gt; produce usable tokens. Tokens only work when the bpffs is associated with a non-init user namespace.&lt;/li&gt;
&lt;li&gt;The privileged parent configures the bpffs before passing it to the child, but the child (in its own user namespace) is the one that creates and uses the token.&lt;/li&gt;
&lt;li&gt;This design prevents a process with an existing token from using it to escalate privileges outside its namespace boundary.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How libbpf Makes It Transparent
&lt;/h3&gt;

&lt;p&gt;For applications built with libbpf (which is most of them), token usage is nearly transparent. You have three options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explicit path&lt;/strong&gt;: Set &lt;code&gt;bpf_object_open_opts.bpf_token_path&lt;/code&gt; when opening the BPF object. libbpf will derive the token from the specified bpffs mount.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment variable&lt;/strong&gt;: Set &lt;code&gt;LIBBPF_BPF_TOKEN_PATH&lt;/code&gt; to point to the bpffs mount. libbpf picks it up automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default path&lt;/strong&gt;: If the default &lt;code&gt;/sys/fs/bpf&lt;/code&gt; is a delegated bpffs in the current user namespace, libbpf uses it implicitly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once the token is derived, libbpf passes it to every relevant syscall (&lt;code&gt;BPF_MAP_CREATE&lt;/code&gt;, &lt;code&gt;BPF_BTF_LOAD&lt;/code&gt;, &lt;code&gt;BPF_PROG_LOAD&lt;/code&gt;, and &lt;code&gt;BPF_LINK_CREATE&lt;/code&gt;) without any source-code changes in the BPF application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Writing the eBPF Program
&lt;/h2&gt;

&lt;p&gt;The BPF side of this demo is intentionally minimal: a tiny XDP program on loopback. This keeps the focus on the token workflow. Here's the complete source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: GPL-2.0&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;vmlinux.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_helpers.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;LICENSE&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"license"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"GPL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;token_stats&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__u64&lt;/span&gt; &lt;span class="n"&gt;packets&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;last_ifindex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BPF_MAP_TYPE_ARRAY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_entries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;__u32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;token_stats&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;stats_map&lt;/span&gt; &lt;span class="nf"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".maps"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xdp"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;handle_packet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;xdp_md&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;token_stats&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_map_lookup_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;stats_map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
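        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;XDP_PASS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// returning 0 would mean XDP_ABORTED and drop the packet&lt;/span&gt;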

    &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;packets&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;last_ifindex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ingress_ifindex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;XDP_PASS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few design choices to note:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;BPF_MAP_TYPE_ARRAY&lt;/code&gt;&lt;/strong&gt; was chosen because the delegation policy explicitly allows &lt;code&gt;array&lt;/code&gt; maps. If we had used a hash map instead, loading would fail because the token doesn't grant &lt;code&gt;hash&lt;/code&gt; map creation permission. This is the token model in action; even trivial program changes can be caught by the delegation policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;SEC("xdp")&lt;/code&gt;&lt;/strong&gt; matches the &lt;code&gt;delegate_progs=xdp&lt;/code&gt; policy. If you changed this to &lt;code&gt;SEC("kprobe/...")&lt;/code&gt;, the kernel would reject it at load time with &lt;code&gt;EPERM&lt;/code&gt; because kprobe isn't among the allowed program types.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;XDP_PASS&lt;/code&gt;&lt;/strong&gt; simply lets every packet through. The program's only purpose is to prove that a token-backed load and attach succeeded. In production, you'd replace this with real packet-processing logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  User-Space Loader: Token-Backed Loading
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;token_trace.c&lt;/code&gt; loader is a standard libbpf skeleton program with one key addition: it passes a &lt;code&gt;bpf_token_path&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_object_open_opts&lt;/span&gt; &lt;span class="n"&gt;open_opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

&lt;span class="n"&gt;open_opts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;open_opts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;open_opts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bpf_token_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token_path&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;skel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;token_trace_bpf__open_opts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;open_opts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From this point on, libbpf takes over. When it calls &lt;code&gt;bpf(BPF_MAP_CREATE)&lt;/code&gt; to create &lt;code&gt;stats_map&lt;/code&gt;, it includes the token fd. When it calls &lt;code&gt;bpf(BPF_PROG_LOAD)&lt;/code&gt; for the XDP program, it includes the token fd. When it calls &lt;code&gt;bpf(BPF_LINK_CREATE)&lt;/code&gt; to attach to the interface, it includes the token fd.&lt;/p&gt;

&lt;p&gt;The rest of the loader is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;token_trace_bpf__load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// token used for map_create + prog_load&lt;/span&gt;
&lt;span class="n"&gt;link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_program__attach_xdp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;progs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handle_packet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ifindex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// token used for link_create&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After attaching, the loader reads the map before and after generating a test packet to verify the program executed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_map_lookup_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;map_fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;before&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// ... generate UDP packet to 127.0.0.1 ...&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_map_lookup_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;map_fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;after&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"delta          : %llu&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;after&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;packets&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;before&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;packets&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the delta is 1, the XDP program was successfully loaded and attached using only delegated capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Namespace Orchestrator: &lt;code&gt;token_userns_demo&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Because BPF token requires a non-init user namespace, running a bare &lt;code&gt;token_trace -t /sys/fs/bpf&lt;/code&gt; on the host won't work. The &lt;code&gt;token_userns_demo.c&lt;/code&gt; wrapper automates the complex namespace choreography. Here's the full sequence:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Fork and Create Namespaces
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;parent (root, init_user_ns)          child (unprivileged, new userns)
         │                                        │
         │   fork()                               │
         ├────────────────────────────────────────&amp;gt;│
         │                                        │
         │                            unshare(CLONE_NEWUSER)
         │                            unshare(CLONE_NEWNS | CLONE_NEWNET)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The child creates a new user namespace (where it maps itself to uid/gid 0), a new mount namespace (so bpffs mounts are private), and a new network namespace (so &lt;code&gt;lo&lt;/code&gt; is a fresh interface it can attach to).&lt;/p&gt;
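
&lt;p&gt;As a rough illustration of this step, the sketch below enters the three namespaces from C. It is not the code from &lt;code&gt;token_userns_demo.c&lt;/code&gt;: the helper names &lt;code&gt;format_id_map&lt;/code&gt;, &lt;code&gt;write_file&lt;/code&gt;, and &lt;code&gt;enter_namespaces&lt;/code&gt; are hypothetical, and we assume the child maps only its own uid/gid to 0.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#define _GNU_SOURCE
#include &lt;fcntl.h&gt;
#include &lt;sched.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;

/* Format one uid_map/gid_map line: "&lt;inside&gt; &lt;outside&gt; 1\n". */
static int format_id_map(char *buf, size_t len, unsigned inside, unsigned outside)
{
    return snprintf(buf, len, "%u %u 1\n", inside, outside);
}

/* Write a short string to a /proc file; returns 0 on success, -1 on error. */
static int write_file(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY);
    if (fd &lt; 0)
        return -1;
    ssize_t n = write(fd, data, strlen(data));
    close(fd);
    return n == (ssize_t)strlen(data) ? 0 : -1;
}

/* Hypothetical helper: enter new user, mount, and network namespaces. */
int enter_namespaces(void)
{
    char map[64];
    /* Read the ids first: after unshare() they show up as the overflow id. */
    unsigned uid = (unsigned)geteuid(), gid = (unsigned)getegid();

    if (unshare(CLONE_NEWUSER) &lt; 0)
        return -1;
    /* Needed before gid_map if the caller lacks CAP_SETGID in the parent ns. */
    if (write_file("/proc/self/setgroups", "deny") &lt; 0)
        return -1;
    format_id_map(map, sizeof(map), 0, uid);
    if (write_file("/proc/self/uid_map", map) &lt; 0)
        return -1;
    format_id_map(map, sizeof(map), 0, gid);
    if (write_file("/proc/self/gid_map", map) &lt; 0)
        return -1;
    return unshare(CLONE_NEWNS | CLONE_NEWNET);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The ids are captured before the first &lt;code&gt;unshare()&lt;/code&gt; because the mapping lines must name the caller's ids as seen from the parent namespace.&lt;/p&gt;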

&lt;h3&gt;
  
  
  Step 2: Create bpffs and Configure Delegation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;parent (root, init_user_ns)          child (new userns)
         │                                        │
         │                            fs_fd = fsopen("bpf", 0)
         │   &amp;lt;───── send fs_fd via SCM_RIGHTS ────│
         │                                        │
    fsconfig(fs_fd, "delegate_cmds", ...)         │  (waiting for ack)
    fsconfig(fs_fd, "delegate_maps", "array")     │
    fsconfig(fs_fd, "delegate_progs", "xdp:...")  │
    fsconfig(fs_fd, "delegate_attachs", "any")    │
    fsconfig(fs_fd, FSCONFIG_CMD_CREATE)          │
         │                                        │
         │   ───────── send ack ─────────────────&amp;gt;│
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The child calls &lt;code&gt;fsopen("bpf", 0)&lt;/code&gt; to create a bpffs filesystem context in its user namespace, then sends the file descriptor to the parent via a Unix socket (&lt;code&gt;SCM_RIGHTS&lt;/code&gt;). The parent, running as root in the init namespace, configures the delegation policy with &lt;code&gt;fsconfig()&lt;/code&gt;, then materializes the filesystem with &lt;code&gt;FSCONFIG_CMD_CREATE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This two-step handoff is necessary for two reasons. First, the bpffs must be created in the child's user namespace so that the token is valid there. Second, only the privileged parent can set the delegation options, because those options grant BPF capabilities.&lt;/p&gt;
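
&lt;p&gt;The parent's side of this exchange can be sketched with raw syscalls, which avoids depending on a recent glibc wrapper. This is an illustration, not the demo's source: &lt;code&gt;apply_delegation&lt;/code&gt; and the &lt;code&gt;delegation&lt;/code&gt; table are hypothetical names, and the option values are an assumption borrowed from the helper-script policy shown later in this tutorial.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#define _GNU_SOURCE
#include &lt;linux/mount.h&gt;   /* FSCONFIG_SET_STRING, FSCONFIG_CMD_CREATE */
#include &lt;stddef.h&gt;
#include &lt;sys/syscall.h&gt;   /* SYS_fsconfig */
#include &lt;unistd.h&gt;

struct delegation_opt {
    const char *key;
    const char *value;
};

/* Assumed policy: same strings as the helper-script example in this tutorial. */
static const struct delegation_opt delegation[] = {
    { "delegate_cmds",    "prog_load:map_create:btf_load:link_create" },
    { "delegate_maps",    "array" },
    { "delegate_progs",   "xdp:socket_filter" },
    { "delegate_attachs", "any" },
};

/* Parent side: set each option on the child's fs context, then create the
 * superblock. Returns 0 on success, -1 on the first failing fsconfig(). */
int apply_delegation(int fs_fd)
{
    size_t n = sizeof(delegation) / sizeof(delegation[0]);

    for (size_t i = 0; i &lt; n; i++) {
        if (syscall(SYS_fsconfig, fs_fd, FSCONFIG_SET_STRING,
                    delegation[i].key, delegation[i].value, 0) &lt; 0)
            return -1;
    }
    if (syscall(SYS_fsconfig, fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) &lt; 0)
        return -1;
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here &lt;code&gt;fs_fd&lt;/code&gt; is the descriptor received from the child over the Unix socket. Keeping the policy in a table makes it easy to audit exactly what is being delegated.&lt;/p&gt;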

&lt;h3&gt;
  
  
  Step 3: Mount and Load
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;child &lt;span class="o"&gt;(&lt;/span&gt;new userns&lt;span class="o"&gt;)&lt;/span&gt;
         │
    mnt_fd &lt;span class="o"&gt;=&lt;/span&gt; fsmount&lt;span class="o"&gt;(&lt;/span&gt;fs_fd, 0, 0&lt;span class="o"&gt;)&lt;/span&gt;
    token_path &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/proc/self/fd/&amp;lt;mnt_fd&amp;gt;"&lt;/span&gt;
    set_loopback_up&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="nb"&gt;exec&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"./token_trace"&lt;/span&gt;, &lt;span class="s2"&gt;"-t"&lt;/span&gt;, token_path, &lt;span class="s2"&gt;"-i"&lt;/span&gt;, &lt;span class="s2"&gt;"lo"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The child materializes the bpffs as a detached mount (no mount point needed, since &lt;code&gt;/proc/self/fd/&amp;lt;mnt_fd&amp;gt;&lt;/code&gt; gives a path), brings the loopback interface up in its network namespace, and &lt;code&gt;exec&lt;/code&gt;s &lt;code&gt;token_trace&lt;/code&gt; with the bpffs path. From &lt;code&gt;token_trace&lt;/code&gt;'s perspective, it's just opening a BPF object with a token path. It doesn't know or care about the namespace setup.&lt;/p&gt;
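
&lt;p&gt;A minimal sketch of the child's final step follows, again using raw syscalls. The function names &lt;code&gt;fd_path&lt;/code&gt; and &lt;code&gt;mount_and_exec&lt;/code&gt; are hypothetical, and bringing up the loopback interface is left out for brevity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#define _GNU_SOURCE
#include &lt;stdio.h&gt;
#include &lt;sys/syscall.h&gt;   /* SYS_fsmount */
#include &lt;unistd.h&gt;

/* Build the "/proc/self/fd/&lt;fd&gt;" path that names a detached mount. */
int fd_path(char *buf, size_t len, int fd)
{
    int n = snprintf(buf, len, "/proc/self/fd/%d", fd);
    return (n &lt; 0 || (size_t)n &gt;= len) ? -1 : 0;
}

/* Child side: turn the configured fs context into a detached mount,
 * then replace this process with the loader. Returns only on failure. */
int mount_and_exec(int fs_fd)
{
    char token_path[64];
    /* flags = 0: the fd is not close-on-exec, so it survives execl(). */
    int mnt_fd = (int)syscall(SYS_fsmount, fs_fd, 0, 0);

    if (mnt_fd &lt; 0 || fd_path(token_path, sizeof(token_path), mnt_fd) &lt; 0)
        return -1;
    execl("./token_trace", "token_trace", "-t", token_path, "-i", "lo",
          (char *)NULL);
    return -1; /* only reached if execl() fails */
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the mount descriptor stays open across &lt;code&gt;exec&lt;/code&gt;, the &lt;code&gt;/proc/self/fd/&lt;/code&gt; path still resolves inside &lt;code&gt;token_trace&lt;/code&gt;.&lt;/p&gt;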

&lt;h2&gt;
  
  
  Preparing a bpffs Mount Manually
&lt;/h2&gt;

&lt;p&gt;If you want to experiment with the mount syntax outside the demo wrapper, the repository includes a helper script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;bpf-developer-tutorial/src/features/bpf_token
bash setup_token_bpffs.sh /tmp/bpf-token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mounts bpffs at &lt;code&gt;/tmp/bpf-token&lt;/code&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;delegate_cmds=prog_load:map_create:btf_load:link_create
delegate_maps=array
delegate_progs=xdp:socket_filter
delegate_attachs=any
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;socket_filter&lt;/code&gt;?&lt;/strong&gt; libbpf performs a trivial program-load probe before loading the real BPF object. This probe uses a generic &lt;code&gt;BPF_PROG_TYPE_SOCKET_FILTER&lt;/code&gt; program to detect kernel feature support. Without &lt;code&gt;socket_filter&lt;/code&gt; in the delegation policy, the probe fails and libbpf refuses to proceed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;delegate_attachs=any&lt;/code&gt;?&lt;/strong&gt; The same libbpf probe path also triggers attach-type validation in the kernel's token checking code. Using &lt;code&gt;any&lt;/code&gt; avoids having to enumerate every possible attach type for probe compatibility.&lt;/p&gt;

&lt;p&gt;Note that a host-namespace mount like this is useful for inspecting the delegation policy (e.g., with &lt;code&gt;bpftool token list&lt;/code&gt;), but won't produce working tokens unless the &lt;code&gt;bpf(BPF_TOKEN_CREATE)&lt;/code&gt; syscall comes from a matching non-init user namespace.&lt;/p&gt;
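
&lt;p&gt;For readers who prefer C over the shell script, the same four options can be passed through the classic &lt;code&gt;mount(2)&lt;/code&gt; call as one comma-separated data string. This is a sketch under assumptions: we have not reproduced the contents of &lt;code&gt;setup_token_bpffs.sh&lt;/code&gt;, and &lt;code&gt;build_mount_data&lt;/code&gt; and &lt;code&gt;mount_delegated_bpffs&lt;/code&gt; are hypothetical names.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &lt;stdio.h&gt;
#include &lt;sys/mount.h&gt;

/* Join the four delegation options into one comma-separated data string. */
int build_mount_data(char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s,%s,%s,%s",
                     "delegate_cmds=prog_load:map_create:btf_load:link_create",
                     "delegate_maps=array",
                     "delegate_progs=xdp:socket_filter",
                     "delegate_attachs=any");
    return (n &lt; 0 || (size_t)n &gt;= len) ? -1 : 0;
}

/* Mount bpffs at target with that policy. Needs root, and target must
 * already exist as a directory. Returns 0 on success, -1 on error. */
int mount_delegated_bpffs(const char *target)
{
    char data[256];

    if (build_mount_data(data, sizeof(data)) &lt; 0)
        return -1;
    return mount("bpffs", target, "bpf", 0, data);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;mount_delegated_bpffs("/tmp/bpf-token")&lt;/code&gt; as root would correspond to the mount described above. The same namespace caveat applies to tokens derived from it.&lt;/p&gt;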

&lt;h2&gt;
  
  
  Compilation and Execution
&lt;/h2&gt;

&lt;p&gt;Build all binaries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;bpf-developer-tutorial/src/features/bpf_token
make
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the end-to-end demo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./token_userns_demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;token path     : /proc/self/fd/5
interface      : lo (ifindex=1)
packets before : 0
packets after  : 1
delta          : 1
last ifindex   : 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;delta: 1&lt;/code&gt; confirms that the XDP program was successfully loaded and attached using a BPF token, even though the child process holds no &lt;code&gt;CAP_BPF&lt;/code&gt; or &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; in the init user namespace.&lt;/p&gt;

&lt;p&gt;Add &lt;code&gt;-v&lt;/code&gt; for verbose libbpf output to see the token being created and used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./token_userns_demo &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you already manage your own delegated bpffs in a user namespace, you can run the loader directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./token_trace &lt;span class="nt"&gt;-t&lt;/span&gt; /proc/self/fd/&amp;lt;mnt-fd&amp;gt; &lt;span class="nt"&gt;-i&lt;/span&gt; lo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Applications
&lt;/h2&gt;

&lt;p&gt;While this tutorial uses a minimal XDP program, the BPF token pattern scales to production scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Container runtimes&lt;/strong&gt; (LXD, Docker, Kubernetes): Mount a delegated bpffs into a container with only the program and map types the workload needs. LXD already supports this through its &lt;code&gt;security.delegate_bpf&lt;/code&gt; option.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CI/CD testing&lt;/strong&gt;: Give build jobs the ability to load and test specific eBPF programs without granting them host-level capabilities. The delegation policy acts as an allowlist for BPF operations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-tenant BPF platforms&lt;/strong&gt;: A platform daemon creates per-tenant bpffs mounts with different delegation policies. One tenant might be allowed XDP + array maps, while another might get tracepoint + ringbuf access.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LSM integration&lt;/strong&gt;: Because BPF tokens integrate with Linux Security Modules, you can combine token delegation with SELinux or AppArmor policies for defense-in-depth. Each token gets its own security context that LSM hooks can inspect.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we learned how BPF token provides a delegation model for eBPF privilege that goes beyond the binary "all or nothing" of Linux capabilities. We walked through the complete flow: a privileged parent configures a bpffs instance with specific delegation options, an unprivileged child in a user namespace derives a token from that bpffs, and libbpf transparently uses the token for map creation, program loading, and attachment. The result is a minimal XDP program running in an unprivileged context, something that was impossible before Linux 6.9.&lt;/p&gt;

&lt;p&gt;BPF token is not a niche feature. It represents the kernel's answer to a fundamental question in the eBPF ecosystem: how do you safely share BPF capabilities in a multi-tenant world without granting unconstrained access to the BPF subsystem?&lt;/p&gt;

&lt;p&gt;If you'd like to learn more about eBPF, visit our tutorial code repository at &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial&lt;/a&gt; or our website at &lt;a href="https://eunomia.dev/tutorials/" rel="noopener noreferrer"&gt;https://eunomia.dev/tutorials/&lt;/a&gt; for more examples and complete tutorials.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.ebpf.io/linux/concepts/token/" rel="noopener noreferrer"&gt;BPF Token concept documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lore.kernel.org/bpf/20240103222034.2582628-1-andrii@kernel.org/T/" rel="noopener noreferrer"&gt;BPF token kernel patch series (Andrii Nakryiko)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lwn.net/Articles/959350/" rel="noopener noreferrer"&gt;BPF token LWN article&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lwn.net/Articles/947173/" rel="noopener noreferrer"&gt;Finer-grained BPF tokens LWN discussion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://documentation.ubuntu.com/lxd/latest/explanation/bpf/" rel="noopener noreferrer"&gt;Privilege delegation using BPF Token (LXD documentation)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.ebpf.io/ebpf-library/libbpf/userspace/bpf_token_create/" rel="noopener noreferrer"&gt;bpf_token_create() libbpf API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.kernel.org/bpf/" rel="noopener noreferrer"&gt;Linux kernel BPF documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ebpf</category>
      <category>tutorial</category>
      <category>linux</category>
    </item>
    <item>
      <title>eBPF Tutorial: cgroup-based Policy Control</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Tue, 24 Feb 2026 07:43:56 +0000</pubDate>
      <link>https://dev.to/yunwei37/ebpf-tutorial-cgroup-based-policy-control-1k2d</link>
      <guid>https://dev.to/yunwei37/ebpf-tutorial-cgroup-based-policy-control-1k2d</guid>
      <description>&lt;p&gt;Do you need to enforce network access control on containers or specific process groups without affecting the entire system? Or do you need to restrict certain processes from accessing specific devices while allowing others to use them normally? Traditional iptables and device permissions are global, making fine-grained per-process-group control impossible.&lt;/p&gt;

&lt;p&gt;This is the problem &lt;strong&gt;cgroup eBPF&lt;/strong&gt; solves. By attaching eBPF programs to cgroups (control groups), you can implement policy control based on process membership—only processes belonging to a specific cgroup are affected. This enables container isolation, multi-tenant security, and sandbox environments. In this tutorial, we'll build a complete "policy guard" program that demonstrates TCP connection filtering, device access control, and sysctl read restrictions—three types of cgroup eBPF usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is cgroup eBPF?
&lt;/h2&gt;

&lt;p&gt;The core idea of cgroup eBPF is simple: attach an eBPF program to a cgroup, and every process in that cgroup is subject to that program. Unlike XDP/tc, which filter traffic per network interface, cgroup eBPF filters by process membership. Put a container in a cgroup, attach a policy program, and that container's network access, device access, and sysctl reads/writes are all under your control. Processes in other cgroups are unaffected.&lt;/p&gt;

&lt;p&gt;This model fits container and multi-tenant scenarios well. eBPF-based Kubernetes networking stacks such as Cilium attach programs to cgroup hooks, for example for socket-level load balancing. You can also use it for device isolation (e.g., restricting which containers can access GPUs), security sandboxes (preventing reads of sensitive sysctls), and more. When a cgroup eBPF program denies an operation, the affected syscall fails in userspace with &lt;code&gt;EPERM&lt;/code&gt; (Operation not permitted).&lt;/p&gt;

&lt;h2&gt;
  
  
  cgroup eBPF Hook Points
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;code&gt;BPF_PROG_TYPE_CGROUP_SOCK_ADDR&lt;/code&gt; - Socket Address Hooks
&lt;/h3&gt;

&lt;p&gt;Triggered on socket address syscalls (bind/connect/sendmsg/recvmsg):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hook&lt;/th&gt;
&lt;th&gt;Section Name&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;IPv4 bind&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cgroup/bind4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filter bind() calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IPv6 bind&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cgroup/bind6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filter bind() calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IPv4 connect&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cgroup/connect4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filter connect() calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IPv6 connect&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cgroup/connect6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filter connect() calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UDP sendmsg&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;cgroup/sendmsg4&lt;/code&gt;, &lt;code&gt;cgroup/sendmsg6&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Filter UDP sends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UDP recvmsg&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;cgroup/recvmsg4&lt;/code&gt;, &lt;code&gt;cgroup/recvmsg6&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Filter UDP receives&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unix connect&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cgroup/connect_unix&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filter Unix socket connect&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Context&lt;/strong&gt;: &lt;code&gt;struct bpf_sock_addr&lt;/code&gt; - contains &lt;code&gt;user_ip4&lt;/code&gt;, &lt;code&gt;user_port&lt;/code&gt; (network byte order)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Return semantics&lt;/strong&gt;: &lt;code&gt;return 1&lt;/code&gt; = allow, &lt;code&gt;return 0&lt;/code&gt; = deny (EPERM)&lt;/p&gt;
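
&lt;p&gt;The byte-order detail matters in practice: the port in the context is in network order, while a policy port is usually configured in host order. The plain userspace sketch below shows the comparison, assuming a hypothetical helper &lt;code&gt;connect4_verdict&lt;/code&gt; and the convention that a policy port of 0 means "no port configured". Inside a BPF program, &lt;code&gt;bpf_ntohs()&lt;/code&gt; plays the role of &lt;code&gt;ntohs()&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &lt;arpa/inet.h&gt;
#include &lt;stdint.h&gt;

/* user_port_be: destination port as the hook sees it (network byte order).
 * blocked_host: policy port in host byte order, 0 = nothing to block.
 * Returns 1 = allow, 0 = deny, matching the hook's return semantics. */
int connect4_verdict(uint16_t user_port_be, uint16_t blocked_host)
{
    if (blocked_host == 0)
        return 1;
    return ntohs(user_port_be) == blocked_host ? 0 : 1;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Comparing the raw network-order value against a host-order constant would silently never match on little-endian machines, so the conversion is easy to forget and worth testing.&lt;/p&gt;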

&lt;h3&gt;
  
  
  2. &lt;code&gt;BPF_PROG_TYPE_CGROUP_DEVICE&lt;/code&gt; - Device Access Control
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hook&lt;/th&gt;
&lt;th&gt;Section Name&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Device access&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cgroup/dev&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filter device open/read/write/mknod&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Context&lt;/strong&gt;: &lt;code&gt;struct bpf_cgroup_dev_ctx&lt;/code&gt; - contains &lt;code&gt;major&lt;/code&gt;, &lt;code&gt;minor&lt;/code&gt;, &lt;code&gt;access_type&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Return semantics&lt;/strong&gt;: &lt;code&gt;return 0&lt;/code&gt; = deny (EPERM), non-zero = allow&lt;/p&gt;
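
&lt;p&gt;The &lt;code&gt;access_type&lt;/code&gt; field packs two values: the kernel header documents it as &lt;code&gt;(BPF_DEVCG_ACC_* &amp;lt;&amp;lt; 16) | BPF_DEVCG_DEV_*&lt;/code&gt;. The userspace sketch below decodes it. The &lt;code&gt;DEMO_*&lt;/code&gt; constants are local copies of the corresponding uapi bit values, and &lt;code&gt;dev_verdict&lt;/code&gt; is a hypothetical policy that only blocks one character device. It is not necessarily the check this tutorial's program performs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &lt;stdint.h&gt;

/* Local copies of the uapi BPF_DEVCG_* bit values, renamed to stay self-contained. */
enum { DEMO_DEV_BLOCK = 1, DEMO_DEV_CHAR = 2 };
enum { DEMO_ACC_MKNOD = 1, DEMO_ACC_READ = 2, DEMO_ACC_WRITE = 4 };

/* Low 16 bits of access_type: device type (block or char). */
uint32_t dev_type(uint32_t access_type) { return access_type &amp; 0xFFFF; }

/* High 16 bits of access_type: requested access mask (mknod/read/write). */
uint32_t dev_access(uint32_t access_type) { return access_type &gt;&gt; 16; }

/* Deny (0) any access to the char device blocked_major:blocked_minor,
 * allow (non-zero) everything else. */
int dev_verdict(uint32_t access_type, uint32_t major, uint32_t minor,
                uint32_t blocked_major, uint32_t blocked_minor)
{
    if (dev_type(access_type) == DEMO_DEV_CHAR &amp;&amp;
        major == blocked_major &amp;&amp; minor == blocked_minor)
        return 0;
    return 1;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, a read of character device &lt;code&gt;1:3&lt;/code&gt; arrives as &lt;code&gt;(DEMO_ACC_READ &amp;lt;&amp;lt; 16) | DEMO_DEV_CHAR&lt;/code&gt; and is denied when the policy blocks &lt;code&gt;1:3&lt;/code&gt;.&lt;/p&gt;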

&lt;h3&gt;
  
  
  3. &lt;code&gt;BPF_PROG_TYPE_CGROUP_SYSCTL&lt;/code&gt; - Sysctl Access Control
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hook&lt;/th&gt;
&lt;th&gt;Section Name&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sysctl access&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cgroup/sysctl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filter /proc/sys reads/writes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Context&lt;/strong&gt;: &lt;code&gt;struct bpf_sysctl&lt;/code&gt; - use &lt;code&gt;bpf_sysctl_get_name()&lt;/code&gt; to get sysctl name&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Return semantics&lt;/strong&gt;: &lt;code&gt;return 0&lt;/code&gt; = reject (EPERM), &lt;code&gt;return 1&lt;/code&gt; = proceed&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Other cgroup Hooks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cgroup_skb/ingress&lt;/code&gt;, &lt;code&gt;cgroup_skb/egress&lt;/code&gt; - Packet-level filtering&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cgroup/getsockopt&lt;/code&gt;, &lt;code&gt;cgroup/setsockopt&lt;/code&gt; - Socket option filtering&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cgroup/sock_create&lt;/code&gt;, &lt;code&gt;cgroup/sock_release&lt;/code&gt; - Socket lifecycle&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sockops&lt;/code&gt; - TCP-level optimization (attached via &lt;code&gt;BPF_CGROUP_SOCK_OPS&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  This Tutorial: cgroup Policy Guard
&lt;/h2&gt;

&lt;p&gt;We implement a single eBPF object with three programs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Network (TCP)&lt;/strong&gt;: Block &lt;code&gt;connect()&lt;/code&gt; to a specified destination port&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Device&lt;/strong&gt;: Block access to a specified &lt;code&gt;major:minor&lt;/code&gt; device&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sysctl&lt;/strong&gt;: Block reading a specified sysctl (read-only, safer for testing)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Events are sent to userspace via ringbuf for observability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Shared Header: cgroup_guard.h
&lt;/h3&gt;

&lt;p&gt;This header defines data structures shared between kernel and userspace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause&lt;/span&gt;
&lt;span class="cp"&gt;#ifndef __CGROUP_GUARD_H
#define __CGROUP_GUARD_H
&lt;/span&gt;
&lt;span class="cp"&gt;#ifndef TASK_COMM_LEN
#define TASK_COMM_LEN 16
#endif
&lt;/span&gt;
&lt;span class="cp"&gt;#define SYSCTL_NAME_LEN 64
&lt;/span&gt;
&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;EVENT_CONNECT4&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;EVENT_DEVICE&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;EVENT_SYSCTL&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__u64&lt;/span&gt; &lt;span class="n"&gt;ts_ns&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TASK_COMM_LEN&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="k"&gt;union&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;daddr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="cm"&gt;/* IPv4, network order */&lt;/span&gt;
            &lt;span class="n"&gt;__u16&lt;/span&gt; &lt;span class="n"&gt;dport&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="cm"&gt;/* host order */&lt;/span&gt;
            &lt;span class="n"&gt;__u16&lt;/span&gt; &lt;span class="n"&gt;proto&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="cm"&gt;/* e.g. 6 for TCP */&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;connect4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;major&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;minor&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;access_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SYSCTL_NAME_LEN&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;sysctl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="cp"&gt;#endif &lt;/span&gt;&lt;span class="cm"&gt;/* __CGROUP_GUARD_H */&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;event&lt;/code&gt; structure uses a union to store type-specific data for different events, saving space while maintaining a unified event format.&lt;/p&gt;
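
&lt;p&gt;To show how a consumer reads such a tagged union, here is a small userspace sketch. It mirrors the header above with &lt;code&gt;stdint.h&lt;/code&gt; types so that it compiles on its own. The &lt;code&gt;format_event&lt;/code&gt; function is a hypothetical helper, not part of the tutorial's loader.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

#define TASK_COMM_LEN 16
#define SYSCTL_NAME_LEN 64

enum event_type { EVENT_CONNECT4 = 1, EVENT_DEVICE = 2, EVENT_SYSCTL = 3 };

/* Userspace mirror of struct event, with stdint types in place of __u32/__u64. */
struct event {
    uint64_t ts_ns;
    uint32_t pid;
    uint32_t type;
    char comm[TASK_COMM_LEN];
    union {
        struct { uint32_t daddr; uint16_t dport; uint16_t proto; } connect4;
        struct { uint32_t major; uint32_t minor; uint32_t access_type; } device;
        struct { uint32_t write; char name[SYSCTL_NAME_LEN]; } sysctl;
    };
};

/* Render one event as text; only the union member selected by type is read. */
int format_event(const struct event *e, char *buf, size_t len)
{
    switch (e-&gt;type) {
    case EVENT_CONNECT4:
        return snprintf(buf, len, "pid=%u connect4 dport=%u proto=%u",
                        (unsigned)e-&gt;pid, (unsigned)e-&gt;connect4.dport,
                        (unsigned)e-&gt;connect4.proto);
    case EVENT_DEVICE:
        return snprintf(buf, len, "pid=%u device %u:%u",
                        (unsigned)e-&gt;pid, (unsigned)e-&gt;device.major,
                        (unsigned)e-&gt;device.minor);
    case EVENT_SYSCTL:
        return snprintf(buf, len, "pid=%u sysctl %s write=%u",
                        (unsigned)e-&gt;pid, e-&gt;sysctl.name,
                        (unsigned)e-&gt;sysctl.write);
    default:
        return snprintf(buf, len, "pid=%u unknown type %u",
                        (unsigned)e-&gt;pid, (unsigned)e-&gt;type);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Switching on &lt;code&gt;type&lt;/code&gt; before touching the union is what keeps the shared layout safe: reading a member that was not written would yield bytes belonging to another event kind.&lt;/p&gt;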

&lt;h3&gt;
  
  
  eBPF Program: cgroup_guard.bpf.c
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause&lt;/span&gt;
&lt;span class="cm"&gt;/* cgroup_guard.bpf.c - cgroup eBPF policy guard
 *
 * This program demonstrates three types of cgroup eBPF hooks:
 * 1. cgroup/connect4 - TCP connection filtering
 * 2. cgroup/dev - Device access control
 * 3. cgroup/sysctl - Sysctl read/write control
 */&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"vmlinux.h"&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_helpers.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_endian.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"cgroup_guard.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;LICENSE&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"license"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Dual BSD/GPL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/* ===== Configurable options: set by userspace before load ===== */&lt;/span&gt;
&lt;span class="cp"&gt;#define IPPROTO_TCP 6
&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="n"&gt;__u16&lt;/span&gt; &lt;span class="n"&gt;blocked_tcp_dport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                   &lt;span class="cm"&gt;/* host order */&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;blocked_dev_major&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;blocked_dev_minor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;denied_sysctl_name&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SYSCTL_NAME_LEN&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt; &lt;span class="cm"&gt;/* NUL-terminated */&lt;/span&gt;

&lt;span class="cm"&gt;/* ===== ringbuf: send denied events to userspace ===== */&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BPF_MAP_TYPE_RINGBUF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_entries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="cm"&gt;/* 16MB */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="nf"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".maps"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;__always_inline&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;fill_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ts_ns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ktime_get_ns&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__u32&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;bpf_get_current_pid_tgid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;bpf_get_current_comm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

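&lt;span class="cm"&gt;/* Compare two NUL-terminated strings: return 1 if equal, 0 if not.
 * The loop bound is the compile-time constant SYSCTL_NAME_LEN so that it
 * can be fully unrolled; max_len is currently unused.
 * Note: b is volatile to handle const volatile rodata arrays correctly */&lt;/span&gt;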
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;__always_inline&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;str_eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;max_len&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="cp"&gt;#pragma unroll
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;SYSCTL_NAME_LEN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;ca&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;cb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ca&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ca&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* ===== 1) Network: block TCP connect4 to specified port =====
 * ctx: struct bpf_sock_addr
 * user_ip4/user_port: network byte order (need conversion)
 *
 * Return semantics:
 * - return 1: allow
 * - return 0: deny (userspace gets EPERM)
 */&lt;/span&gt;
&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cgroup/connect4"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;cg_connect4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_sock_addr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blocked_tcp_dport&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;protocol&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;IPPROTO_TCP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;__u16&lt;/span&gt; &lt;span class="n"&gt;dport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ntohs&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;__u16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;user_port&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dport&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;blocked_tcp_dport&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ringbuf_reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fill_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EVENT_CONNECT4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;connect4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;daddr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;user_ip4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="cm"&gt;/* network order */&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;connect4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dport&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="cm"&gt;/* host order */&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;connect4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;proto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;protocol&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;bpf_ringbuf_submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="cm"&gt;/* deny -&amp;gt; userspace gets EPERM on connect */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* ===== 2) Device: block access to specified major:minor =====
 * ctx: struct bpf_cgroup_dev_ctx { access_type, major, minor }
 *
 * Return semantics:
 * - return 0: deny (userspace gets EPERM)
 * - return non-zero: allow
 */&lt;/span&gt;
&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cgroup/dev"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;cg_dev&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_cgroup_dev_ctx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blocked_dev_major&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;blocked_dev_minor&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;major&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;blocked_dev_major&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;minor&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;blocked_dev_minor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ringbuf_reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fill_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EVENT_DEVICE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;major&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;major&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;minor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;minor&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;access_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;access_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;bpf_ringbuf_submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="cm"&gt;/* deny -&amp;gt; -EPERM */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* ===== 3) Sysctl: block reading specified sysctl =====
 * ctx: struct bpf_sysctl
 * Use bpf_sysctl_get_name() to get name
 *
 * Return semantics:
 * - return 0: reject
 * - return 1: proceed
 * If return 0, userspace read/write returns -1 with errno=EPERM
 */&lt;/span&gt;
&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cgroup/sysctl"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;cg_sysctl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_sysctl&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SYSCTL_NAME_LEN&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_sysctl_get_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;denied_sysctl_name&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Only deny reads, allow writes (safer for testing) */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;str_eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;denied_sysctl_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SYSCTL_NAME_LEN&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ringbuf_reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fill_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EVENT_SYSCTL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;sysctl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="cp"&gt;#pragma unroll
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;SYSCTL_NAME_LEN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;sysctl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;bpf_ringbuf_submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="cm"&gt;/* deny -&amp;gt; -EPERM */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Understanding the BPF Code
&lt;/h4&gt;

&lt;p&gt;The overall logic of this program is straightforward: three cgroup hooks handle network connections, device access, and sysctl access respectively. Each hook follows the same workflow: check whether the current operation matches the configured blocking rule; if it does, report an event through the ring buffer and return 0 (deny); otherwise return 1 (allow). Event reporting is best-effort: if &lt;code&gt;bpf_ringbuf_reserve()&lt;/code&gt; fails, the operation is still denied, just without a log entry.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;cg_connect4&lt;/code&gt; function uses &lt;code&gt;SEC("cgroup/connect4")&lt;/code&gt; to attach at IPv4 connection time. There's an important detail here: &lt;code&gt;ctx-&amp;gt;user_port&lt;/code&gt; is in network byte order (big-endian), while our configured port is in host byte order, so we must convert with &lt;code&gt;bpf_ntohs()&lt;/code&gt; before comparing. If the destination port matches our configured &lt;code&gt;blocked_tcp_dport&lt;/code&gt;, the program returns 0, and the userspace &lt;code&gt;connect()&lt;/code&gt; call fails with &lt;code&gt;EPERM&lt;/code&gt;.&lt;/p&gt;
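
&lt;p&gt;To make the byte-order point concrete, here is a minimal userspace sketch of the same comparison. The helper name &lt;code&gt;port_is_blocked&lt;/code&gt; is hypothetical and not part of the tutorial's code; it only assumes, as described above, that the hook sees the port in network byte order while the configured port is in host byte order.&lt;/p&gt;

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <stdint.h>

/* Hypothetical helper mirroring the comparison in cg_connect4.
 * user_port_net:     port as the hook sees it (network byte order)
 * blocked_port_host: configured port (host byte order), 0 = no rule
 * Returns 1 if the connection would be denied, 0 otherwise. */
int port_is_blocked(uint32_t user_port_net, uint16_t blocked_port_host)
{
    if (blocked_port_host == 0)
        return 0; /* no port rule configured: allow everything */

    /* Convert to host order before comparing. Comparing the raw
     * network-order value would give wrong results on little-endian
     * machines. */
    return ntohs((uint16_t)user_port_net) == blocked_port_host;
}
```

&lt;p&gt;On a little-endian machine, &lt;code&gt;htons(443)&lt;/code&gt; is &lt;code&gt;0xBB01&lt;/code&gt;, so comparing the raw value against 443 would never match and the rule would silently never fire.&lt;/p&gt;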

&lt;p&gt;The &lt;code&gt;cg_dev&lt;/code&gt; function handles device access. Its context &lt;code&gt;struct bpf_cgroup_dev_ctx&lt;/code&gt; contains three key fields: &lt;code&gt;major&lt;/code&gt; and &lt;code&gt;minor&lt;/code&gt; identify the device (e.g., &lt;code&gt;/dev/null&lt;/code&gt; is 1:3), and &lt;code&gt;access_type&lt;/code&gt; packs the device type (block or char) into its low 16 bits and the requested access (read/write/mknod) into its high 16 bits. We simply compare major:minor against the configured values and ignore &lt;code&gt;access_type&lt;/code&gt;, so every kind of access to the matching device is denied.&lt;/p&gt;
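
&lt;p&gt;The loader prints &lt;code&gt;access_type&lt;/code&gt; as a raw hex value, which is hard to read at a glance. The sketch below shows one way to decode it in userspace, assuming the kernel's encoding of device type in the low 16 bits and access flags in the high 16 bits. The helper &lt;code&gt;format_access_type&lt;/code&gt; and the local &lt;code&gt;DEVCG_*&lt;/code&gt; constants are illustrative names, not part of the tutorial's code; the constants mirror the &lt;code&gt;BPF_DEVCG_*&lt;/code&gt; values from &lt;code&gt;linux/bpf.h&lt;/code&gt;.&lt;/p&gt;

```c
#include <stdint.h>
#include <stdio.h>

/* Local copies of the BPF_DEVCG_* values from <linux/bpf.h>, redefined
 * here only so the sketch compiles without kernel headers. */
enum { DEVCG_ACC_MKNOD = 1, DEVCG_ACC_READ = 2, DEVCG_ACC_WRITE = 4 };
enum { DEVCG_DEV_BLOCK = 1, DEVCG_DEV_CHAR = 2 };

/* Hypothetical helper: render access_type as e.g. "char:rw" into buf.
 * Returns what snprintf() returns. */
int format_access_type(uint32_t access_type, char *buf, size_t len)
{
    uint32_t dev = access_type & 0xffff; /* low 16 bits: device type */
    uint32_t acc = access_type >> 16;    /* high 16 bits: access flags */
    const char *kind = dev == DEVCG_DEV_BLOCK ? "block"
                     : dev == DEVCG_DEV_CHAR  ? "char"
                                              : "unknown";

    return snprintf(buf, len, "%s:%s%s%s", kind,
                    (acc & DEVCG_ACC_READ)  ? "r" : "",
                    (acc & DEVCG_ACC_WRITE) ? "w" : "",
                    (acc & DEVCG_ACC_MKNOD) ? "m" : "");
}
```

&lt;p&gt;With this encoding, a logged value of &lt;code&gt;0x20002&lt;/code&gt; decodes to a read of a character device.&lt;/p&gt;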

&lt;p&gt;The &lt;code&gt;cg_sysctl&lt;/code&gt; function intercepts sysctl reads and writes under &lt;code&gt;/proc/sys/&lt;/code&gt;. It uses &lt;code&gt;bpf_sysctl_get_name()&lt;/code&gt; to get the sysctl name in path format, such as &lt;code&gt;kernel/hostname&lt;/code&gt; (slash-separated, not dot-separated). We only block reads and let writes through. This is safer for testing: the policy can be triggered with a plain read, without attempting to change system configuration.&lt;/p&gt;
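
&lt;p&gt;Because the hook compares against the slash-separated form, a name typed in the dotted style familiar from &lt;code&gt;sysctl(8)&lt;/code&gt; (for example &lt;code&gt;kernel.hostname&lt;/code&gt;) would never match. A loader could normalize such input before storing it. The following is only a sketch under that assumption; &lt;code&gt;sysctl_name_to_path&lt;/code&gt; is a hypothetical helper and does not appear in &lt;code&gt;cgroup_guard.c&lt;/code&gt;.&lt;/p&gt;

```c
#include <stddef.h>

/* Hypothetical helper: copy a dotted sysctl name into dst, replacing
 * '.' with '/'. Returns 0 on success, -1 if dst is too small.
 * Naive on purpose: it also rewrites dots that belong to a path
 * component itself (for example an interface name such as eth0.100). */
int sysctl_name_to_path(const char *dotted, char *dst, size_t dst_len)
{
    size_t i;

    for (i = 0; dotted[i] != '\0'; i++) {
        if (i + 1 >= dst_len)
            return -1; /* no room for this character plus the NUL */
        dst[i] = dotted[i] == '.' ? '/' : dotted[i];
    }
    if (dst_len == 0)
        return -1;     /* cannot even store the terminator */
    dst[i] = '\0';
    return 0;
}
```

&lt;p&gt;Rejecting input that does not fit, rather than truncating it, avoids silently installing a rule for a different, shorter sysctl name.&lt;/p&gt;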

&lt;p&gt;The configuration options at the top of the program are declared as &lt;code&gt;const volatile&lt;/code&gt;. This is the standard libbpf pattern for load-time configuration, common in CO-RE (Compile Once, Run Everywhere) tools: the values are defaults (0 or an empty string) at compile time, and userspace sets the actual values via &lt;code&gt;skel-&amp;gt;rodata-&amp;gt;&lt;/code&gt; after &lt;code&gt;open()&lt;/code&gt; and before &lt;code&gt;load()&lt;/code&gt;. Once the program is loaded the values are read-only, so a single compiled BPF object can run with different configurations without being rebuilt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Userspace Loader: cgroup_guard.c
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause&lt;/span&gt;
&lt;span class="cm"&gt;/* cgroup_guard.c - Userspace loader for cgroup eBPF policy guard */&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;errno.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;fcntl.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;getopt.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;signal.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;sys/resource.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;sys/stat.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;unistd.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;arpa/inet.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/libbpf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"cgroup_guard.skel.h"&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;"cgroup_guard.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="kt"&gt;sig_atomic_t&lt;/span&gt; &lt;span class="n"&gt;exiting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;sig_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;exiting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;libbpf_print_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;libbpf_print_level&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                           &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;va_list&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;LIBBPF_DEBUG&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;vfprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"Usage: %s [OPTIONS]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="s"&gt;"Options:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="s"&gt;"  -c, --cgroup PATH           cgroup v2 path (default: /sys/fs/cgroup/ebpf_demo)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="s"&gt;"  -p, --block-port PORT       block TCP connect() to this dst port (IPv4)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="s"&gt;"  -d, --deny-device MAJ:MIN   deny device access for (major:minor)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="s"&gt;"  -s, --deny-sysctl NAME      deny sysctl READ of this name&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
        &lt;span class="s"&gt;"  -h, --help                  show this help&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;handle_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;data_sz&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;data_sz&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;EVENT_CONNECT4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;INET_ADDRSTRLEN&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;in_addr&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;s_addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;connect4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;daddr&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="n"&gt;inet_ntop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AF_INET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"[DENY connect4] pid=%u comm=%s daddr=%s dport=%u proto=%u&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;connect4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;connect4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;proto&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;EVENT_DEVICE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"[DENY device]   pid=%u comm=%s major=%u minor=%u access_type=0x%x&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;major&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;minor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;access_type&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;EVENT_SYSCTL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"[DENY sysctl]   pid=%u comm=%s write=%u name=%s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;sysctl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;sysctl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;fflush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;cgroup_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/sys/fs/cgroup/ebpf_demo"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;block_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;dev_major&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dev_minor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;deny_sysctl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Parse command line arguments */&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;option&lt;/span&gt; &lt;span class="n"&gt;long_opts&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"cgroup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="n"&gt;required_argument&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sc"&gt;'c'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"block-port"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;required_argument&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sc"&gt;'p'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"deny-device"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required_argument&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sc"&gt;'d'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"deny-sysctl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required_argument&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sc"&gt;'s'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"help"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="n"&gt;no_argument&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sc"&gt;'h'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;opt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getopt_long&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"c:p:d:s:h"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;long_opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'c'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cgroup_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optarg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'p'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;atoi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optarg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
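        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'d'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="cm"&gt;/* parse "major:minor", e.g. 1:3 */&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sscanf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optarg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"%d:%d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;dev_major&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;dev_minor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;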
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'s'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;deny_sysctl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optarg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nl"&gt;default:&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;libbpf_set_print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;libbpf_print_fn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sig_handler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SIGTERM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sig_handler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Create cgroup directory if needed */&lt;/span&gt;
    &lt;span class="n"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cgroup_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mo"&gt;0755&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;cg_fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cgroup_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;O_RDONLY&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;O_DIRECTORY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cg_fd&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"open(%s) failed: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cgroup_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errno&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Open and configure BPF skeleton */&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;cgroup_guard_bpf&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cgroup_guard_bpf__open&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"cgroup_guard_bpf__open() failed&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cg_fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Write .rodata configuration (must be before load) */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block_port&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;block_port&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;65535&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;rodata&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;blocked_tcp_dport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__u16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;block_port&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dev_major&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;dev_minor&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;rodata&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;blocked_dev_major&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;dev_major&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;rodata&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;blocked_dev_minor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;dev_minor&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deny_sysctl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;snprintf&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;rodata&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;denied_sysctl_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="n"&gt;SYSCTL_NAME_LEN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deny_sysctl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Declare links before the first goto so cleanup never sees uninitialized pointers */&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_link&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;link_connect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;link_dev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;link_sysctl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Load BPF programs into kernel */&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cgroup_guard_bpf__load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"cgroup_guard_bpf__load() failed: %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Attach programs to cgroup */&lt;/span&gt;
    &lt;span class="n"&gt;link_connect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_program__attach_cgroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;progs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cg_connect4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cg_fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;link_dev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_program__attach_cgroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;progs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cg_dev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cg_fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;link_sysctl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_program__attach_cgroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;progs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cg_sysctl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cg_fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Setup ring buffer for events */&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;ring_buffer&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ring_buffer__new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_map__fd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;maps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                                              &lt;span class="n"&gt;handle_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Attached to cgroup: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cgroup_path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Config: block_port=%d, deny_device=%d:%d, deny_sysctl_read=%s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;block_port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dev_major&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dev_minor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deny_sysctl&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;deny_sysctl&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"(none)"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Press Ctrl-C to stop.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Main event loop */&lt;/span&gt;
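    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;exiting&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ring_buffer__poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="cm"&gt;/* ms */&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;EINTR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="cm"&gt;/* interrupted by a signal: exit cleanly */&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="cm"&gt;/* real polling error */&lt;/span&gt;
        &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="cm"&gt;/* a positive return is the number of records consumed */&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;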

    &lt;span class="n"&gt;ring_buffer__free&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nl"&gt;cleanup:&lt;/span&gt;
    &lt;span class="n"&gt;bpf_link__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;link_sysctl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;bpf_link__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;link_dev&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;bpf_link__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;link_connect&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;cgroup_guard_bpf__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cg_fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Understanding the Userspace Code
&lt;/h4&gt;

&lt;p&gt;The userspace loader's core job is to attach BPF programs to the specified cgroup, then continuously poll the ringbuf to print denied events.&lt;/p&gt;

&lt;p&gt;The program first parses command-line arguments with &lt;code&gt;getopt_long&lt;/code&gt; to obtain the cgroup path and the three policy settings (blocked port, denied device, denied sysctl). It then opens the cgroup directory with &lt;code&gt;open()&lt;/code&gt; and &lt;code&gt;O_RDONLY | O_DIRECTORY&lt;/code&gt; to get a file descriptor. This fd is the attach target, because cgroup eBPF programs are attached to cgroup directories.&lt;/p&gt;

&lt;p&gt;Next comes the standard skeleton workflow: &lt;code&gt;cgroup_guard_bpf__open()&lt;/code&gt; opens the BPF object, the loader writes the &lt;code&gt;.rodata&lt;/code&gt; configuration, and &lt;code&gt;cgroup_guard_bpf__load()&lt;/code&gt; loads everything into the kernel. The configuration must be set before load: once the object is loaded, &lt;code&gt;.rodata&lt;/code&gt; is read-only and can no longer be changed from userspace.&lt;/p&gt;

&lt;p&gt;Attaching uses &lt;code&gt;bpf_program__attach_cgroup(prog, cg_fd)&lt;/code&gt; to attach each BPF program to the cgroup. Here we attach three programs: &lt;code&gt;cg_connect4&lt;/code&gt;, &lt;code&gt;cg_dev&lt;/code&gt;, and &lt;code&gt;cg_sysctl&lt;/code&gt;. Once they are attached, the matching operations (IPv4 connects, device accesses, and sysctl reads and writes) of every process in this cgroup pass through these programs.&lt;/p&gt;

&lt;p&gt;Finally, the event loop: &lt;code&gt;ring_buffer__poll()&lt;/code&gt; polls the ring buffer and calls the &lt;code&gt;handle_event&lt;/code&gt; callback for each event that arrives, which prints it. This lets you see in real time which operations are being denied.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;src/cgroup
make
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Terminal A: Start the loader
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Block: TCP port 9090, /dev/null (1:3), reading kernel/hostname&lt;/span&gt;
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./cgroup_guard &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cgroup&lt;/span&gt; /sys/fs/cgroup/ebpf_demo &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--block-port&lt;/span&gt; 9090 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--deny-device&lt;/span&gt; 1:3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--deny-sysctl&lt;/span&gt; kernel/hostname
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attached to cgroup: /sys/fs/cgroup/ebpf_demo
Config: block_port=9090, deny_device=1:3, deny_sysctl_read=kernel/hostname
Press Ctrl-C to stop.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
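
&lt;p&gt;The &lt;code&gt;--deny-device&lt;/code&gt; value used above is the major:minor pair of &lt;code&gt;/dev/null&lt;/code&gt;. To target another device you need that device's pair. The following is a minimal sketch of one way to look it up, assuming GNU coreutils &lt;code&gt;stat&lt;/code&gt; is available and assuming the loader expects decimal numbers; &lt;code&gt;dev_numbers&lt;/code&gt; is a hypothetical helper name, not part of the tutorial code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: print the decimal major:minor pair of a device node.
# GNU stat reports %t (major) and %T (minor) in hexadecimal, so each value
# is prefixed with 0x and converted to decimal by printf.
dev_numbers() {
    printf '%d:%d\n' "0x$(stat -c %t "$1")" "0x$(stat -c %T "$1")"
}

dev_numbers /dev/null   # prints 1:3 on Linux
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The printed pair can then be passed to &lt;code&gt;--deny-device&lt;/code&gt; in the same form as the &lt;code&gt;1:3&lt;/code&gt; shown above.&lt;/p&gt;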



&lt;h3&gt;
  
  
  Terminal B: Start test servers (outside cgroup)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start two HTTP servers&lt;/span&gt;
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; http.server 8080 &lt;span class="nt"&gt;--bind&lt;/span&gt; 127.0.0.1 &amp;amp;
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; http.server 9090 &lt;span class="nt"&gt;--bind&lt;/span&gt; 127.0.0.1 &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Terminal C: Test from within the cgroup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'
echo $$ &amp;gt; /sys/fs/cgroup/ebpf_demo/cgroup.procs

echo "== TCP test =="
curl -s http://127.0.0.1:8080 &amp;gt;/dev/null &amp;amp;&amp;amp; echo "8080 OK"
curl -s http://127.0.0.1:9090 &amp;gt;/dev/null &amp;amp;&amp;amp; echo "9090 OK (unexpected)" || echo "9090 BLOCKED (expected)"

echo
echo "== Device test =="
cat /dev/null &amp;amp;&amp;amp; echo "/dev/null OK (unexpected)" || echo "/dev/null BLOCKED (expected)"

echo
echo "== Sysctl test =="
cat /proc/sys/kernel/hostname &amp;amp;&amp;amp; echo "sysctl read OK (unexpected)" || echo "sysctl read BLOCKED (expected)"
'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;8080 OK&lt;/code&gt; - Port 8080 is allowed&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;9090 BLOCKED (expected)&lt;/code&gt; - Port 9090 is blocked&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/dev/null BLOCKED (expected)&lt;/code&gt; - Device 1:3 is blocked&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sysctl read BLOCKED (expected)&lt;/code&gt; - Reading kernel/hostname is blocked&lt;/li&gt;
&lt;/ul&gt;
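
&lt;p&gt;All of these results depend on the test shell actually being inside &lt;code&gt;ebpf_demo&lt;/code&gt;. As a hedged sketch, membership can be checked by reading &lt;code&gt;/proc/self/cgroup&lt;/code&gt;, assuming a cgroup v2 (unified) hierarchy; &lt;code&gt;cgroup_v2_path&lt;/code&gt; is a hypothetical helper name introduced here for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: print the cgroup v2 path recorded in a cgroup file.
# /proc/PID/cgroup lists the v2 hierarchy as a line of the form "0::/path".
# With no argument it inspects the current shell.
cgroup_v2_path() {
    sed -n 's/^0:://p' "${1:-/proc/self/cgroup}"
}

# On a typical host this is expected to show /ebpf_demo once the shell has
# written its PID to cgroup.procs.
cgroup_v2_path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the printed path is something else, the policy is not being applied to that shell, and the "OK" and "BLOCKED" lines say nothing about the BPF programs.&lt;/p&gt;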

&lt;h3&gt;
  
  
  Terminal A output (events)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[DENY connect4] pid=12345 comm=curl daddr=127.0.0.1 dport=9090 proto=6
[DENY device]   pid=12346 comm=cat major=1 minor=3 access_type=0x...
[DENY sysctl]   pid=12347 comm=cat write=0 name=kernel/hostname
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  One-click Test
&lt;/h2&gt;

&lt;p&gt;We provide a test script that automatically compiles, starts servers, runs tests, and cleans up:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./test.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Verifying with bpftool
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;bpftool cgroup tree /sys/fs/cgroup/ebpf_demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  When to Use cgroup eBPF
&lt;/h2&gt;

&lt;p&gt;Choosing the right technology depends on your control granularity requirements.&lt;/p&gt;

&lt;p&gt;cgroup eBPF's control granularity is &lt;strong&gt;process groups&lt;/strong&gt;: put processes in a cgroup, attach a BPF program, and the policy applies to that group. This fits container scenarios well, since each container typically runs in its own cgroup, so you can set different network policies, device permissions, and sysctl access rules per container. When a process leaves the cgroup, the policy stops applying to it automatically, with no manual cleanup needed.&lt;/p&gt;

&lt;p&gt;XDP and tc's control granularity is &lt;strong&gt;network interfaces&lt;/strong&gt;. They handle all traffic passing through a specific NIC, regardless of which process it comes from. If you need high-performance packet processing, DDoS protection, or load balancing, XDP/tc are better choices. But a policy such as "only allow container A to access port 80, while container B can access any port" is awkward to express with XDP/tc, because the hook is tied to the interface rather than to the cgroup that sent the traffic.&lt;/p&gt;

&lt;p&gt;seccomp-BPF's control granularity is &lt;strong&gt;individual processes&lt;/strong&gt;. It filters system calls, for example preventing a process from calling &lt;code&gt;fork&lt;/code&gt;, &lt;code&gt;exec&lt;/code&gt;, or &lt;code&gt;socket&lt;/code&gt;. seccomp is lower-level and well suited to process sandboxing, but it decides on syscall numbers and raw argument values, so it cannot express higher-level semantics such as network destination addresses or device major:minor numbers.&lt;/p&gt;

&lt;p&gt;Traditional iptables/nftables rules are effectively &lt;strong&gt;global&lt;/strong&gt;. By default they apply to every process in the network namespace, so saying "this rule only affects container A" requires extra machinery such as a separate namespace or a cgroup match.&lt;/p&gt;

&lt;p&gt;In summary: if you need per-container/process-group policies, want to control network, devices, and sysctls together, and want policies to automatically follow process lifecycles, cgroup eBPF is the right choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;By binding policies to process groups, cgroup eBPF provides fine-grained control that traditional global policies can't achieve. This tutorial demonstrated three commonly used cgroup hooks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cgroup/connect4&lt;/code&gt;&lt;/strong&gt;: Filter destination ports at TCP connection time, blocking disallowed outbound connections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cgroup/dev&lt;/code&gt;&lt;/strong&gt;: Check major:minor at device access time, restricting reads/writes to specific devices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cgroup/sysctl&lt;/code&gt;&lt;/strong&gt;: Check names at sysctl read/write time, preventing sensitive configuration leaks or tampering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This "policy guard" pattern can be extended to production use cases: container network policies (similar to Kubernetes NetworkPolicy), device isolation (GPU/TPU exclusive access), security sandboxes (restricting system information access). With ringbuf event reporting, you can also implement policy auditing and alerting.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you want to learn more about eBPF, check out our tutorial repository at &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial&lt;/a&gt; or visit our website at &lt;a href="https://eunomia.dev/tutorials/" rel="noopener noreferrer"&gt;https://eunomia.dev/tutorials/&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kernel docs:&lt;/strong&gt; &lt;a href="https://docs.kernel.org/bpf/libbpf/program_types.html" rel="noopener noreferrer"&gt;libbpf program types&lt;/a&gt; - all cgroup-related section names&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;eBPF docs:&lt;/strong&gt; &lt;a href="https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_CGROUP_SOCK_ADDR/" rel="noopener noreferrer"&gt;CGROUP_SOCK_ADDR&lt;/a&gt; - socket address hooks explained&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;eBPF docs:&lt;/strong&gt; &lt;a href="https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_CGROUP_DEVICE/" rel="noopener noreferrer"&gt;CGROUP_DEVICE&lt;/a&gt; - device access control explained&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;eBPF docs:&lt;/strong&gt; &lt;a href="https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_CGROUP_SYSCTL/" rel="noopener noreferrer"&gt;CGROUP_SYSCTL&lt;/a&gt; - sysctl access control explained&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tutorial repository:&lt;/strong&gt; &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/cgroup" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/cgroup&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

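&lt;p&gt;Full source code is available in the tutorial repository. It requires cgroup v2, libbpf, and a kernel recent enough for all three hooks: cgroup BPF attachment arrived in Linux 4.10, &lt;code&gt;cgroup/dev&lt;/code&gt; in 4.15, &lt;code&gt;cgroup/connect4&lt;/code&gt; in 4.17, and &lt;code&gt;cgroup/sysctl&lt;/code&gt; in 5.2.&lt;/p&gt;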

</description>
      <category>ebpf</category>
      <category>cgroup</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>eBPF Tutorial by Example: BPF Dynamic Pointers for Variable-Length Data</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Tue, 17 Feb 2026 07:43:38 +0000</pubDate>
      <link>https://dev.to/yunwei37/ebpf-tutorial-by-example-bpf-dynamic-pointers-for-variable-length-data-cj2</link>
      <guid>https://dev.to/yunwei37/ebpf-tutorial-by-example-bpf-dynamic-pointers-for-variable-length-data-cj2</guid>
      <description>&lt;p&gt;Ever written an eBPF packet parser and struggled with those verbose &lt;code&gt;data_end&lt;/code&gt; bounds checks that the verifier still rejects? Or tried to send variable-length events through ring buffers only to find yourself locked into fixed-size structures? Traditional eBPF development forces you to prove memory safety statically at load time, which becomes painful when dealing with runtime-determined sizes like packet lengths or user-configurable snapshot lengths.&lt;/p&gt;

&lt;p&gt;This is what &lt;strong&gt;BPF dynptrs&lt;/strong&gt; (dynamic pointers) solve. Introduced gradually from Linux v5.19, dynptrs provide a verifier-friendly way to work with variable-length data by shifting some bounds checking from load-time static analysis to runtime validation. In this tutorial, we'll build a TC ingress program that uses &lt;strong&gt;skb dynptrs&lt;/strong&gt; to parse TCP packets safely and &lt;strong&gt;ringbuf dynptrs&lt;/strong&gt; to output variable-length events containing configurable payload snapshots.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The complete source code: &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/dynptr" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/dynptr&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction to BPF Dynamic Pointers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: When Static Verification Isn't Enough
&lt;/h3&gt;

&lt;p&gt;The eBPF verifier's core mission is proving memory safety at load time. Every pointer dereference must be bounded, every array access must be within limits. This works beautifully for simple cases, but becomes a struggle when sizes are determined at runtime.&lt;/p&gt;

&lt;p&gt;Consider parsing a packet where the IP header length comes from a 4-bit field, or reading user-configurable amounts of TCP payload. The classic approach requires extensive bounds checking with &lt;code&gt;data_end&lt;/code&gt; comparisons, and even correctly written code sometimes fails verification because the verifier cannot always carry the bounds it has proven through every path the compiler generates. When working with non-linear skb data (paged buffers), the situation gets worse since that data isn't directly accessible through &lt;code&gt;ctx-&amp;gt;data&lt;/code&gt; at all.&lt;/p&gt;
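&lt;p&gt;To make the classic pattern concrete, here is a minimal sketch in plain userspace C rather than BPF. It mimics the &lt;code&gt;data&lt;/code&gt; / &lt;code&gt;data_end&lt;/code&gt; style of parsing an IPv4 header whose length is only known after reading the IHL field. The function name &lt;code&gt;tcp_offset_classic&lt;/code&gt; and the constant &lt;code&gt;ETH_HLEN_MODEL&lt;/code&gt; are made up for illustration, and the sketch assumes an untagged Ethernet frame.&lt;/p&gt;

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_MODEL 14  /* assumed: Ethernet header without VLAN tags */

/*
 * Returns the offset of the TCP header within the frame, or -1 if any
 * header would run past data_end. Every read is preceded by an explicit
 * check, which is the pattern direct packet access requires.
 */
static int tcp_offset_classic(const uint8_t *data, const uint8_t *data_end)
{
    const uint8_t *ip = data + ETH_HLEN_MODEL;

    /* Need the fixed 20-byte IPv4 header before touching any of it. */
    if (ip + 20 > data_end)
        return -1;

    /* IHL is the low 4 bits of the first byte, counted in 32-bit words. */
    uint32_t ihl_bytes = (uint32_t)(ip[0] & 0x0f) * 4;
    if (ihl_bytes < 20)
        return -1;

    /* The real header length is only known now, so check again,
     * this time including the fixed 20-byte TCP header that follows. */
    if (ip + ihl_bytes + 20 > data_end)
        return -1;

    return (int)(ETH_HLEN_MODEL + ihl_bytes);
}
```

&lt;p&gt;Note how the second check depends on a value read from the packet. In a real BPF program this is the step where the verifier most often needs help, because the offset is no longer a constant.&lt;/p&gt;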

&lt;p&gt;Variable-length output presents similar challenges. The traditional &lt;code&gt;bpf_ringbuf_reserve()&lt;/code&gt; returns a raw pointer, but writing runtime-determined amounts of data to it makes the verifier uncomfortable because it cannot statically prove your writes stay within bounds.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Runtime-Checked Dynamic Pointers
&lt;/h3&gt;

&lt;p&gt;Dynptrs introduce an opaque handle type that carries metadata about the underlying memory region including its bounds and type. You cannot dereference a dynptr directly since the verifier will reject such attempts. Instead, you must use helper functions or kfuncs that perform the appropriate safety checks.&lt;/p&gt;

&lt;p&gt;The key insight is that &lt;strong&gt;some of these checks happen at runtime rather than at load time&lt;/strong&gt;. Functions like &lt;code&gt;bpf_dynptr_read()&lt;/code&gt; and &lt;code&gt;bpf_dynptr_write()&lt;/code&gt; validate bounds when they execute and return errors on failure. Functions like &lt;code&gt;bpf_dynptr_slice()&lt;/code&gt; return NULL when the requested region cannot be accessed safely. This lets you express logic that would be unprovable statically while maintaining safety guarantees.&lt;/p&gt;
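&lt;p&gt;The runtime contract can be illustrated with a small userspace model. This is a sketch only, not the kernel implementation: &lt;code&gt;struct model_dynptr&lt;/code&gt;, &lt;code&gt;model_check&lt;/code&gt;, &lt;code&gt;model_read&lt;/code&gt;, and &lt;code&gt;model_slice&lt;/code&gt; are hypothetical names, and the error code returned on failure is an assumption of the model. What it shows is the shape of the API: a read either succeeds or returns a negative error, and a slice either yields a pointer or NULL.&lt;/p&gt;

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the opaque dynptr: a region plus its size. */
struct model_dynptr {
    const uint8_t *data;
    uint32_t size;
};

/* Bounds rule: [offset, offset + len) must lie inside the region.
 * Written without computing offset + len, so it cannot overflow. */
static int model_check(const struct model_dynptr *p, uint32_t offset, uint32_t len)
{
    if (len > p->size || offset > p->size - len)
        return -E2BIG;  /* assumed error code for this model */
    return 0;
}

/* In the spirit of bpf_dynptr_read(): copy out, or return a negative error. */
static int model_read(void *dst, uint32_t len,
                      const struct model_dynptr *src, uint32_t offset)
{
    int err = model_check(src, offset, len);
    if (err)
        return err;
    memcpy(dst, src->data + offset, len);
    return 0;
}

/* In the spirit of bpf_dynptr_slice(): a pointer on success, NULL otherwise. */
static const void *model_slice(const struct model_dynptr *p,
                               uint32_t offset, uint32_t len)
{
    return model_check(p, offset, len) ? NULL : p->data + offset;
}
```

&lt;p&gt;The caller never has to prove that &lt;code&gt;offset&lt;/code&gt; and &lt;code&gt;len&lt;/code&gt; are in range ahead of time. It only has to handle the failure result, which is a much easier property to verify.&lt;/p&gt;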

&lt;p&gt;For the verifier, dynptrs are tracked specially. They have lifecycle rules (some must be released), type constraints (skb dynptrs behave differently than local dynptrs), and the verifier ensures you follow these rules. The runtime checks are the verifier's way of delegating what it cannot prove statically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dynptr API Overview
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Helpers vs Kfuncs
&lt;/h3&gt;

&lt;p&gt;The dynptr ecosystem spans two categories of functions. &lt;strong&gt;Helper functions&lt;/strong&gt; are part of the stable UAPI and generally maintain backward compatibility. &lt;strong&gt;Kfuncs&lt;/strong&gt; (kernel functions) are internal kernel exports to BPF with no ABI stability guarantees, meaning they may change between kernel versions.&lt;/p&gt;

&lt;p&gt;For dynptrs, the foundational read/write operations are helpers, while newer features like skb dynptrs and slicing are kfuncs. This means some dynptr functionality requires newer kernels and you should verify availability before relying on specific features.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating Dynptrs
&lt;/h3&gt;

&lt;p&gt;There are several ways to create dynptrs depending on your data source. The &lt;code&gt;bpf_dynptr_from_mem()&lt;/code&gt; helper creates a dynptr from map values or global variables, useful for working with configuration data or scratch buffers. The &lt;code&gt;bpf_dynptr_from_skb()&lt;/code&gt; kfunc creates a dynptr from a socket buffer, enabling safe access to packet data including non-linear (paged) regions. For XDP programs, &lt;code&gt;bpf_dynptr_from_xdp()&lt;/code&gt; provides similar functionality.&lt;/p&gt;

&lt;p&gt;Ring buffer operations use &lt;code&gt;bpf_ringbuf_reserve_dynptr()&lt;/code&gt; to allocate variable-length records. Unlike regular &lt;code&gt;bpf_ringbuf_reserve()&lt;/code&gt;, whose size argument must be a constant the verifier knows at load time, the dynptr variant lets you specify the size at runtime. This is crucial for variable-length event structures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reading and Writing
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;bpf_dynptr_read()&lt;/code&gt; helper copies data from a dynptr into a destination buffer. It takes an offset and length, performing runtime bounds checking and returning an error if the read would exceed the dynptr's bounds. This is the safe way to extract data when you need it in a local buffer.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;bpf_dynptr_write()&lt;/code&gt; helper does the reverse, copying data into a dynptr. For skb dynptrs, a write behaves much like &lt;code&gt;bpf_skb_store_bytes()&lt;/code&gt; and can invalidate previously obtained slices, so re-acquire any slice you still need after writing.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;bpf_dynptr_data()&lt;/code&gt; helper returns a direct pointer to data within the dynptr, with the verifier tracking the bounds statically. However, this does NOT work for skb or xdp dynptrs since their data may not be in a single contiguous region.&lt;/p&gt;

&lt;h3&gt;
  
  
  Slicing for Packet Parsing
&lt;/h3&gt;

&lt;p&gt;For skb and xdp dynptrs, &lt;code&gt;bpf_dynptr_slice()&lt;/code&gt; is the primary way to access data. You provide an offset, a length, and optionally a local buffer. The function returns a pointer to the requested data, which may be either a direct pointer into the packet or your provided buffer (if the data needed to be copied from non-linear regions).&lt;/p&gt;

&lt;p&gt;The critical rule is that &lt;strong&gt;you must NULL-check the return value&lt;/strong&gt;. A NULL return means the requested region cannot be accessed, either because it exceeds packet bounds or for other internal reasons. Once you have a valid slice pointer, you can dereference it safely within the requested bounds.&lt;/p&gt;
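&lt;p&gt;The "direct pointer or your buffer" behaviour is easier to see in a toy model. The sketch below is plain userspace C with hypothetical names (&lt;code&gt;struct model_pkt&lt;/code&gt;, &lt;code&gt;pkt_slice&lt;/code&gt;). It assumes a packet made of one linear head and one paged fragment, which is a simplification of a real skb.&lt;/p&gt;

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-part packet: a linear head plus one paged fragment. */
struct model_pkt {
    const uint8_t *head; uint32_t head_len;
    const uint8_t *frag; uint32_t frag_len;
};

/*
 * Returns a pointer to len bytes starting at offset, or NULL if the range
 * is outside the packet. A range entirely inside the linear head is
 * returned in place; anything else is gathered into buf.
 */
static const void *pkt_slice(const struct model_pkt *p, uint32_t offset,
                             void *buf, uint32_t len)
{
    uint32_t total = p->head_len + p->frag_len;

    if (len > total || offset > total - len)
        return NULL;                     /* out of bounds */

    if (offset + len <= p->head_len)
        return p->head + offset;         /* zero-copy path */

    if (!buf)
        return NULL;                     /* copy needed but no buffer given */

    uint8_t *out = buf;
    uint32_t from_head = offset < p->head_len ? p->head_len - offset : 0;
    uint32_t frag_off  = offset < p->head_len ? 0 : offset - p->head_len;

    if (from_head)
        memcpy(out, p->head + offset, from_head);
    memcpy(out + from_head, p->frag + frag_off, len - from_head);
    return buf;
}
```

&lt;p&gt;Because the caller cannot know in advance which of the two pointers comes back, it should always use the returned pointer rather than reading from its own buffer, and always check for NULL first.&lt;/p&gt;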

&lt;p&gt;There's also &lt;code&gt;bpf_dynptr_slice_rdwr()&lt;/code&gt; for obtaining writable slices, with availability depending on the program type and whether the underlying data supports writes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ring Buffer Lifecycle
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;bpf_ringbuf_reserve_dynptr()&lt;/code&gt; function has special lifecycle rules enforced by the verifier. Once you call it, you &lt;strong&gt;must&lt;/strong&gt; call either &lt;code&gt;bpf_ringbuf_submit_dynptr()&lt;/code&gt; or &lt;code&gt;bpf_ringbuf_discard_dynptr()&lt;/code&gt; on the dynptr, regardless of whether the reservation succeeded. This is not optional since the verifier tracks dynptr state and will reject programs that leak reserved dynptrs.&lt;/p&gt;

&lt;p&gt;This differs from regular ringbuf usage where a NULL return from &lt;code&gt;bpf_ringbuf_reserve()&lt;/code&gt; means nothing was allocated. With dynptrs, the reserve failure still requires explicit cleanup through discard. The verifier needs this guarantee to ensure proper resource management.&lt;/p&gt;
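&lt;p&gt;The lifecycle rule can be sketched as a small userspace model in plain C. All names here (&lt;code&gt;struct model_ringbuf&lt;/code&gt;, &lt;code&gt;model_reserve&lt;/code&gt;, &lt;code&gt;model_submit&lt;/code&gt;, &lt;code&gt;model_discard&lt;/code&gt;, &lt;code&gt;emit_event&lt;/code&gt;) are hypothetical, and the &lt;code&gt;open_dynptrs&lt;/code&gt; counter merely stands in for the reference the verifier tracks. The point is the control flow: the failure branch still releases the dynptr.&lt;/p&gt;

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring buffer model with simple accounting. */
struct model_ringbuf {
    uint32_t capacity;
    uint32_t used;
    uint32_t submitted;     /* records handed to the consumer */
    int      open_dynptrs;  /* reservations not yet submitted or discarded */
};

struct model_rb_dynptr {
    struct model_ringbuf *rb;
    uint32_t size;          /* 0 marks a failed (empty) reservation */
};

/* Always initializes *ptr, even on failure, so it must always be released. */
static int model_reserve(struct model_ringbuf *rb, uint32_t size,
                         struct model_rb_dynptr *ptr)
{
    ptr->rb = rb;
    rb->open_dynptrs++;
    if (size == 0 || size > rb->capacity - rb->used) {
        ptr->size = 0;
        return -1;
    }
    ptr->size = size;
    rb->used += size;
    return 0;
}

static void model_submit(struct model_rb_dynptr *ptr)
{
    ptr->rb->open_dynptrs--;
    if (ptr->size)
        ptr->rb->submitted++;
}

static void model_discard(struct model_rb_dynptr *ptr)
{
    ptr->rb->open_dynptrs--;
    ptr->rb->used -= ptr->size;  /* no-op for a failed reservation */
}

/* Record size is decided at runtime: fixed header plus variable snapshot. */
static int emit_event(struct model_ringbuf *rb, uint32_t hdr_len, uint32_t snap_len)
{
    struct model_rb_dynptr ptr;

    if (model_reserve(rb, hdr_len + snap_len, &ptr)) {
        model_discard(&ptr);  /* required even though nothing was reserved */
        return -1;
    }
    /* ... fill in the header and payload here ... */
    model_submit(&ptr);
    return 0;
}
```

&lt;p&gt;After every call to &lt;code&gt;emit_event&lt;/code&gt;, successful or not, &lt;code&gt;open_dynptrs&lt;/code&gt; is back to zero. That is the invariant the verifier enforces for real ringbuf dynptrs.&lt;/p&gt;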

&lt;h2&gt;
  
  
  Implementation: TC Ingress with Dynptr Parsing and Variable-Length Events
&lt;/h2&gt;

&lt;p&gt;Our demonstration program attaches to TC ingress and accomplishes three things. First, it creates an skb dynptr from incoming packets using &lt;code&gt;bpf_dynptr_from_skb()&lt;/code&gt;. Second, it parses Ethernet, IPv4, and TCP headers using &lt;code&gt;bpf_dynptr_slice()&lt;/code&gt; for safe bounds-checked access. Third, it outputs variable-length events through a ringbuf dynptr, including a configurable snapshot of TCP payload.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete BPF Program: dynptr_tc.bpf.c
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: GPL-2.0&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;vmlinux.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_helpers.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_endian.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"dynptr_tc.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cm"&gt;/* kfunc declarations for dynptr operations (v6.4+) */&lt;/span&gt;
&lt;span class="k"&gt;extern&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;bpf_dynptr_from_skb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;__sk_buff&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;__u64&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                               &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_dynptr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ptr__uninit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;__ksym&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;extern&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bpf_dynptr_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_dynptr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                              &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;buffer__opt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;buffer__sz&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;__ksym&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BPF_MAP_TYPE_RINGBUF&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_entries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="cm"&gt;/* 16MB */&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="nf"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".maps"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BPF_MAP_TYPE_ARRAY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_entries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;__u32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;dynptr_cfg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;cfg_map&lt;/span&gt; &lt;span class="nf"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".maps"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tc"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;dynptr_tc_ingress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;__sk_buff&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;dynptr_cfg&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_dynptr&lt;/span&gt; &lt;span class="n"&gt;skb_ptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Temporary buffers for slice (data may be copied here) */&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;ethhdr&lt;/span&gt; &lt;span class="n"&gt;eth_buf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;iphdr&lt;/span&gt;  &lt;span class="n"&gt;ip_buf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;tcphdr&lt;/span&gt; &lt;span class="n"&gt;tcp_buf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;ethhdr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;eth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;iphdr&lt;/span&gt;  &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;iph&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;tcphdr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;cfg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_map_lookup_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;cfg_map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__u32&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TC_ACT_OK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Create dynptr from skb */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_dynptr_from_skb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;skb_ptr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TC_ACT_OK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Parse Ethernet header using slice */&lt;/span&gt;
    &lt;span class="n"&gt;eth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_dynptr_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;skb_ptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;eth_buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eth_buf&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;eth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TC_ACT_OK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eth&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;h_proto&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;bpf_htons&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ETH_P_IP&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TC_ACT_OK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Parse IPv4 header */&lt;/span&gt;
    &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;ip_off&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;eth&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;iph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_dynptr_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;skb_ptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ip_off&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ip_buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip_buf&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;iph&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;iph&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;iph&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;protocol&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;IPPROTO_TCP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TC_ACT_OK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Parse TCP header */&lt;/span&gt;
    &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;tcp_off&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ip_off&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;__u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;iph&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ihl&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;tcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_dynptr_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;skb_ptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tcp_off&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;tcp_buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tcp_buf&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TC_ACT_OK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;__u16&lt;/span&gt; &lt;span class="n"&gt;dport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ntohs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tcp&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__u16&lt;/span&gt; &lt;span class="n"&gt;sport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ntohs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tcp&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__u8&lt;/span&gt; &lt;span class="n"&gt;drop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;blocked_port&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sport&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;blocked_port&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;dport&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;blocked_port&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="cm"&gt;/* Output variable-length event using ringbuf dynptr */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;enable_ringbuf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;__u8&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MAX_SNAPLEN&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

        &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;payload_off&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tcp_off&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;__u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;tcp&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;doff&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload_off&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;avail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;payload_off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;avail&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;avail&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MAX_SNAPLEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MAX_SNAPLEN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_dynptr_read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snap_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;skb_ptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload_off&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;event_hdr&lt;/span&gt; &lt;span class="n"&gt;hdr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ts_ns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ktime_get_ns&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ifindex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ifindex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pkt_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;saddr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iph&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;saddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;daddr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iph&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;daddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ntohs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tcp&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;snap_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;

        &lt;span class="cm"&gt;/* Reserve variable-length ringbuf record */&lt;/span&gt;
        &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_dynptr&lt;/span&gt; &lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;total_sz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hdr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;snap_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_ringbuf_reserve_dynptr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_sz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="cm"&gt;/* Must discard even on failure */&lt;/span&gt;
            &lt;span class="n"&gt;bpf_ringbuf_discard_dynptr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;drop&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;TC_ACT_SHOT&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TC_ACT_OK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;bpf_dynptr_write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;hdr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hdr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;bpf_dynptr_write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hdr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snap_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="n"&gt;bpf_ringbuf_submit_dynptr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;drop&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;TC_ACT_SHOT&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TC_ACT_OK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;_license&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"license"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"GPL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the BPF Code
&lt;/h3&gt;

&lt;p&gt;The program begins by declaring the kfuncs it needs. The &lt;code&gt;bpf_dynptr_from_skb()&lt;/code&gt; function creates a dynptr from the socket buffer, and &lt;code&gt;bpf_dynptr_slice()&lt;/code&gt; returns pointers to specific regions within it. The &lt;code&gt;__ksym&lt;/code&gt; attribute tells the loader these are kernel symbols to be resolved at load time.&lt;/p&gt;

&lt;p&gt;When parsing headers, notice how we provide local buffers (&lt;code&gt;eth_buf&lt;/code&gt;, &lt;code&gt;ip_buf&lt;/code&gt;, &lt;code&gt;tcp_buf&lt;/code&gt;) to each slice call. The slice function may return a pointer directly into packet data if it's linearly accessible, or it may copy data into our buffer and return a pointer to the buffer. Either way, we get a valid pointer we can dereference, or NULL on failure.&lt;/p&gt;

&lt;p&gt;The NULL check is not optional. A slice call fails when the requested offset plus length exceeds the packet bounds, or when the data cannot be accessed for another reason, and the verifier rejects any program that dereferences the returned pointer without first checking it for NULL.&lt;/p&gt;

&lt;p&gt;For ringbuf output, we first use &lt;code&gt;bpf_dynptr_read()&lt;/code&gt; to copy the TCP payload from the skb into a local buffer. The copy length is decided at runtime: the configured snapshot length is clamped to the bytes available after the payload offset and then to &lt;code&gt;MAX_SNAPLEN&lt;/code&gt;. If the payload offset is at or past the end of the packet, or if the read itself fails, we set &lt;code&gt;snap_len&lt;/code&gt; to zero and emit a header-only record.&lt;/p&gt;
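
&lt;p&gt;The clamping that precedes the read can be checked in isolation. The following is a minimal user-space sketch, not part of the tutorial sources: &lt;code&gt;clamp_snap_len&lt;/code&gt; is a hypothetical helper that mirrors the order of the checks in the BPF program, with the upper limit passed as a parameter instead of the &lt;code&gt;MAX_SNAPLEN&lt;/code&gt; macro.&lt;/p&gt;

```c
/* Hypothetical helper (not in the tutorial sources): returns how many
 * payload bytes may be copied, given the requested snapshot length,
 * the packet length, the payload offset, and an upper limit. */
static unsigned int clamp_snap_len(unsigned int requested,
                                   unsigned int pkt_len,
                                   unsigned int payload_off,
                                   unsigned int max_snaplen)
{
    /* No payload bytes exist at or past the end of the packet. */
    if (payload_off >= pkt_len)
        return 0;

    unsigned int avail = pkt_len - payload_off;

    /* Never copy more than the packet holds after the offset. */
    if (requested > avail)
        requested = avail;
    /* Never copy more than the local buffer can take. */
    if (requested > max_snaplen)
        requested = max_snaplen;
    return requested;
}
```

&lt;p&gt;With a 100-byte packet whose payload starts at offset 54, a request for 64 bytes is reduced to the 46 bytes that actually exist.&lt;/p&gt;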

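&lt;p&gt;The ringbuf reserve shows the variable-length allocation pattern. We compute the total size (header plus snapshot) and reserve exactly that amount with &lt;code&gt;bpf_ringbuf_reserve_dynptr()&lt;/code&gt;. After writing the header at offset 0 and the payload at offset &lt;code&gt;sizeof(hdr)&lt;/code&gt; using &lt;code&gt;bpf_dynptr_write()&lt;/code&gt;, we submit the record. Note the discard call when the reserve fails: the verifier requires every ringbuf dynptr to be released through submit or discard on every path, including the failure path.&lt;/p&gt;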
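
&lt;p&gt;The size arithmetic behind this record layout can be sketched in plain C. This is an illustration under assumptions, not code from the tutorial: &lt;code&gt;record_total_size&lt;/code&gt; and &lt;code&gt;record_is_complete&lt;/code&gt; are hypothetical helpers, and the header size is passed in as a parameter rather than taken from &lt;code&gt;sizeof(struct event_hdr)&lt;/code&gt;. The first helper shows what the producer reserves; the second is a check a consumer could make before touching the payload bytes.&lt;/p&gt;

```c
/* Hypothetical helpers (not in the tutorial sources) illustrating the
 * header-plus-payload record layout. */

/* Bytes the producer reserves: fixed header followed by snap_len payload bytes. */
static unsigned long record_total_size(unsigned long hdr_sz, unsigned long snap_len)
{
    return hdr_sz + snap_len;
}

/* Nonzero when a record of data_sz bytes holds the header and the whole payload. */
static int record_is_complete(unsigned long data_sz, unsigned long hdr_sz,
                              unsigned long snap_len)
{
    /* Compare before subtracting so the unsigned difference cannot wrap. */
    if (hdr_sz > data_sz)
        return 0;
    return data_sz - hdr_sz >= snap_len;
}
```

&lt;p&gt;The user-space callback applies an equivalent condition before it prints the payload.&lt;/p&gt;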

&lt;h3&gt;
  
  
  Complete User-Space Program: dynptr_tc.c
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;signal.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;arpa/inet.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;net/if.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/libbpf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"dynptr_tc.skel.h"&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;"dynptr_tc.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="kt"&gt;sig_atomic_t&lt;/span&gt; &lt;span class="n"&gt;exiting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;sig_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;signo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;exiting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;handle_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;data_sz&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;event_hdr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
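    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;saddr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;INET_ADDRSTRLEN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;daddr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;INET_ADDRSTRLEN&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="cm"&gt;/* Skip records too short to contain the fixed header */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_sz&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;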

    &lt;span class="n"&gt;inet_ntop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AF_INET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;saddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;saddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;saddr&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="n"&gt;inet_ntop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AF_INET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;daddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;daddr&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"if=%u %s:%u -&amp;gt; %s:%u len=%u drop=%u snap=%u"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ifindex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;saddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;sport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daddr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;dport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;pkt_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;data_sz&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;" payload=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="n"&gt;putchar&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;126&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="sc"&gt;'.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ifname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;dynptr_cfg&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blocked_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enable_ringbuf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="cm"&gt;/* Parse arguments */&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;strcmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ifname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;strcmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s"&gt;"-p"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blocked_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;atoi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;strcmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s"&gt;"-s"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;atoi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;strcmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s"&gt;"-n"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enable_ringbuf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ifname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Usage: %s -i &amp;lt;ifname&amp;gt; [-p port] [-s len] [-n]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;ifindex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;if_nametoindex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ifname&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ifindex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;perror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"if_nametoindex"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sig_handler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SIGTERM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sig_handler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;dynptr_tc_bpf&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynptr_tc_bpf__open_and_load&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to load BPF&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Configure */&lt;/span&gt;
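    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_map_update_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_map__fd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;maps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cfg_map&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__u32&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BPF_ANY&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to write config map&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;dynptr_tc_bpf__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;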

    &lt;span class="cm"&gt;/* Attach to TC ingress */&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_tc_hook&lt;/span&gt; &lt;span class="n"&gt;hook&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ifindex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ifindex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attach_point&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BPF_TC_INGRESS&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_tc_opts&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prog_fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_program__fd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;progs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dynptr_tc_ingress&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="cm"&gt;/* Initialize before the goto below so the cleanup path never sees an indeterminate pointer */&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;ring_buffer&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;bpf_tc_hook_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_tc_attach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"TC attach failed&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enable_ringbuf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;rb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ring_buffer__new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_map__fd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;maps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;handle_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Attached to %s. blocked_port=%u snap_len=%u&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ifname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blocked_port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;snap_len&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;exiting&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ring_buffer__poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;usleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;ring_buffer__free&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rb&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;bpf_tc_detach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;bpf_tc_hook_destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nl"&gt;cleanup:&lt;/span&gt;
    &lt;span class="n"&gt;dynptr_tc_bpf__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the User-Space Code
&lt;/h3&gt;

&lt;p&gt;The userspace program loads the BPF skeleton, configures it through the array map, and attaches to TC ingress. The ring buffer callback &lt;code&gt;handle_event()&lt;/code&gt; receives each variable-length event and prints it.&lt;/p&gt;

&lt;p&gt;Notice how we access the variable-length payload. The &lt;code&gt;struct event_hdr&lt;/code&gt; has a flexible array member &lt;code&gt;payload[]&lt;/code&gt; at the end. When an event arrives, &lt;code&gt;data_sz&lt;/code&gt; tells us the total size, and &lt;code&gt;e-&amp;gt;snap_len&lt;/code&gt; tells us specifically how much payload was included. We validate both before accessing the payload bytes.&lt;/p&gt;
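
&lt;p&gt;The two size checks described above can be sketched independently of the C code. The snippet below is only an illustration in Python: the two-field header layout (&lt;code&gt;pkt_len&lt;/code&gt; and &lt;code&gt;snap_len&lt;/code&gt; as 32-bit integers) and the function name &lt;code&gt;parse_event&lt;/code&gt; are assumptions made for the example, not the tutorial's actual &lt;code&gt;struct event_hdr&lt;/code&gt;, which carries more fields.&lt;/p&gt;

```python
import struct

# Hypothetical header for illustration: pkt_len (u32), snap_len (u32) in
# native byte order, followed by snap_len payload bytes.
HDR = struct.Struct("=II")


def parse_event(data):
    """Return (pkt_len, payload), or None if the record is malformed."""
    data_sz = len(data)
    # Check 1: the record must hold at least the fixed-size header.
    if HDR.size > data_sz:
        return None
    pkt_len, snap_len = HDR.unpack_from(data, 0)
    # Check 2: the claimed payload length must fit in the remaining bytes.
    if snap_len > data_sz - HDR.size:
        return None
    return pkt_len, data[HDR.size:HDR.size + snap_len]


good = HDR.pack(221, 4) + b"HTTP"
truncated = HDR.pack(221, 32) + b"HTTP"  # claims 32 payload bytes, carries 4
print(parse_event(good))       # (221, b'HTTP')
print(parse_event(truncated))  # None
```

&lt;p&gt;Rejecting the record when either check fails mirrors the order used in the text: total size first, then the payload length stored in the header.&lt;/p&gt;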

&lt;p&gt;The configuration map allows runtime control over blocking behavior and snapshot length without reloading the BPF program. This demonstrates the common pattern of using maps for user-to-kernel communication.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compilation and Execution
&lt;/h2&gt;

&lt;p&gt;Navigate to the dynptr directory and build:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;bpf-developer-tutorial/src/features/dynptr
make
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This compiles the BPF program with the repository's standard toolchain, generating the skeleton header and linking against libbpf.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a Test Environment
&lt;/h3&gt;

&lt;p&gt;To test properly, we need a network namespace so traffic actually traverses the veth pair rather than going through loopback. The included &lt;code&gt;test.sh&lt;/code&gt; script handles this automatically, but here's the manual setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create network namespace&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;ip netns add test_ns

&lt;span class="c"&gt;# Create veth pair with one end in the namespace&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;ip &lt;span class="nb"&gt;link &lt;/span&gt;add veth_host &lt;span class="nb"&gt;type &lt;/span&gt;veth peer name veth_ns
&lt;span class="nb"&gt;sudo &lt;/span&gt;ip &lt;span class="nb"&gt;link set &lt;/span&gt;veth_ns netns test_ns

&lt;span class="c"&gt;# Configure host side&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;ip addr add 10.200.0.1/24 dev veth_host
&lt;span class="nb"&gt;sudo &lt;/span&gt;ip &lt;span class="nb"&gt;link set &lt;/span&gt;veth_host up

&lt;span class="c"&gt;# Configure namespace side&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;ip netns &lt;span class="nb"&gt;exec &lt;/span&gt;test_ns ip addr add 10.200.0.2/24 dev veth_ns
&lt;span class="nb"&gt;sudo &lt;/span&gt;ip netns &lt;span class="nb"&gt;exec &lt;/span&gt;test_ns ip &lt;span class="nb"&gt;link set &lt;/span&gt;veth_ns up

&lt;span class="c"&gt;# Start HTTP server inside the namespace&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;ip netns &lt;span class="nb"&gt;exec &lt;/span&gt;test_ns python3 &lt;span class="nt"&gt;-m&lt;/span&gt; http.server 8080 &lt;span class="nt"&gt;--bind&lt;/span&gt; 10.200.0.2 &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running the Demo
&lt;/h3&gt;

&lt;p&gt;Start the dynptr TC program attached to the host side of the veth:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./dynptr_tc &lt;span class="nt"&gt;-i&lt;/span&gt; veth_host &lt;span class="nt"&gt;-p&lt;/span&gt; 0 &lt;span class="nt"&gt;-s&lt;/span&gt; 32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In another terminal, make a request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://10.200.0.2:8080/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see output showing captured packets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attached to TC ingress of veth_host (ifindex=X). Ctrl-C to exit.
blocked_port=0 snap_len=32 ringbuf=1
if=X 10.200.0.2:8080 -&amp;gt; 10.200.0.1:XXXXX len=221 drop=0 snap=32 payload="HTTP/1.0 200 OK..Server: SimpleH"
if=X 10.200.0.2:8080 -&amp;gt; 10.200.0.1:XXXXX len=742 drop=0 snap=32 payload="&amp;lt;!DOCTYPE HTML&amp;gt;.&amp;lt;html lang="en"&amp;gt;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output shows HTTP response packets from the server, with the payload field containing the beginning of the response data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing the Drop Policy
&lt;/h3&gt;

&lt;p&gt;Test blocking by specifying port 8080:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./dynptr_tc &lt;span class="nt"&gt;-i&lt;/span&gt; veth_host &lt;span class="nt"&gt;-p&lt;/span&gt; 8080 &lt;span class="nt"&gt;-s&lt;/span&gt; 32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In another terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--max-time&lt;/span&gt; 3 http://10.200.0.2:8080/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The curl request should time out because the response packets are dropped. The dynptr_tc output shows &lt;code&gt;drop=1&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if=X 10.200.0.2:8080 -&amp;gt; 10.200.0.1:XXXXX len=74 drop=1 snap=0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using the Test Script
&lt;/h3&gt;

&lt;p&gt;For convenience, run the included test script, which handles all setup automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./test.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates the namespace, runs both capture and blocking tests, and cleans up afterward.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use Dynptrs
&lt;/h2&gt;

&lt;p&gt;Dynptrs shine in several scenarios. &lt;strong&gt;Variable-length events&lt;/strong&gt; are the classic use case since ringbuf dynptrs let you allocate exactly the size you need at runtime, avoiding wasted space from oversized fixed structures or complex multi-record schemes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Packet parsing&lt;/strong&gt; benefits from dynptrs when dealing with non-linear skbs or complex protocol stacks where traditional bounds checking becomes unwieldy. The slice API provides a cleaner abstraction that handles both linear and paged data uniformly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Crypto and verification&lt;/strong&gt; operations like &lt;code&gt;bpf_crypto_encrypt()&lt;/code&gt;, &lt;code&gt;bpf_verify_pkcs7_signature()&lt;/code&gt;, and &lt;code&gt;bpf_get_file_xattr()&lt;/code&gt; all use dynptrs as buffer arguments, making dynptr familiarity essential for these advanced use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User ringbuf consumption&lt;/strong&gt; through &lt;code&gt;bpf_user_ringbuf_drain()&lt;/code&gt; delivers samples as dynptrs, enabling safe handling of userspace-provided data in BPF programs.&lt;/p&gt;

&lt;p&gt;For simple fixed-size operations where you know bounds at compile time, traditional approaches may be simpler. But as your BPF programs grow more sophisticated, dynptrs become increasingly valuable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;BPF dynptrs provide a verifier-friendly mechanism for working with variable-length and runtime-bounded data. Rather than proving memory safety entirely through static analysis, dynptrs shift some verification to runtime checks, enabling patterns that would otherwise be impossible or extremely awkward to express.&lt;/p&gt;

&lt;p&gt;Our example demonstrated the two primary dynptr patterns: using skb dynptrs with slices for clean packet parsing, and using ringbuf dynptrs for variable-length event output. The key takeaways are to always NULL-check slice returns, always submit or discard ringbuf dynptrs, and remember that skb dynptrs require kfuncs available from Linux v6.4.&lt;/p&gt;

&lt;p&gt;As eBPF capabilities continue to expand, dynptrs form an increasingly important part of the toolkit. Whether you're building packet processors, security monitors, or performance tools, understanding dynptrs will help you write cleaner, more capable BPF programs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you'd like to dive deeper into eBPF, check out our tutorial repository at &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial&lt;/a&gt; or visit our website at &lt;a href="https://eunomia.dev/tutorials/" rel="noopener noreferrer"&gt;https://eunomia.dev/tutorials/&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynptr Concept Documentation:&lt;/strong&gt; &lt;a href="https://docs.ebpf.io/linux/concepts/dynptrs/" rel="noopener noreferrer"&gt;https://docs.ebpf.io/linux/concepts/dynptrs/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bpf_ringbuf_reserve_dynptr Helper:&lt;/strong&gt; &lt;a href="https://docs.ebpf.io/linux/helper-function/bpf_ringbuf_reserve_dynptr/" rel="noopener noreferrer"&gt;https://docs.ebpf.io/linux/helper-function/bpf_ringbuf_reserve_dynptr/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bpf_dynptr_from_skb Kfunc:&lt;/strong&gt; &lt;a href="https://docs.ebpf.io/linux/kfuncs/bpf_dynptr_from_skb/" rel="noopener noreferrer"&gt;https://docs.ebpf.io/linux/kfuncs/bpf_dynptr_from_skb/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bpf_dynptr_slice Kfunc:&lt;/strong&gt; &lt;a href="https://docs.ebpf.io/linux/kfuncs/bpf_dynptr_slice/" rel="noopener noreferrer"&gt;https://docs.ebpf.io/linux/kfuncs/bpf_dynptr_slice/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kernel Kfuncs Documentation:&lt;/strong&gt; &lt;a href="https://docs.kernel.org/bpf/kfuncs.html" rel="noopener noreferrer"&gt;https://docs.kernel.org/bpf/kfuncs.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tutorial Repository:&lt;/strong&gt; &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This example requires Linux kernel 6.4 or newer for the skb dynptr kfuncs. The ringbuf dynptr helpers are available from Linux 5.19. Complete source code is available in the tutorial repository.&lt;/p&gt;

</description>
      <category>ebpf</category>
      <category>verifier</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>A Taxonomy of GPU Bugs: 19 Defect Classes for CUDA Verification</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Tue, 10 Feb 2026 07:53:16 +0000</pubDate>
      <link>https://dev.to/yunwei37/a-taxonomy-of-gpu-bugs-19-defect-classes-for-cuda-verification-169f</link>
      <guid>https://dev.to/yunwei37/a-taxonomy-of-gpu-bugs-19-defect-classes-for-cuda-verification-169f</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;GPU programming introduces a distinct class of correctness and performance challenges that differ fundamentally from traditional CPU-based systems. The SIMT (Single Instruction, Multiple Threads) execution model, hierarchical memory architecture, and massive parallelism create unique bug patterns that require specialized verification and detection techniques.&lt;/p&gt;

&lt;p&gt;Just as eBPF enables safe, verified extension code to run inside the Linux kernel, &lt;a href="https://github.com/eunomia-bpf/bpftime" rel="noopener noreferrer"&gt;bpftime gpu_ext&lt;/a&gt; (see the &lt;a href="https://arxiv.org/abs/2512.12615" rel="noopener noreferrer"&gt;arXiv paper&lt;/a&gt;; previously named &lt;a href="https://dl.acm.org/doi/10.1145/3723851.3726984" rel="noopener noreferrer"&gt;eGPU&lt;/a&gt;) brings eBPF to GPUs, allowing user-defined policy code (for observability, scheduling, or resource control) to be injected into GPU drivers and kernels with &lt;strong&gt;static verification guarantees&lt;/strong&gt;. Such a GPU extension framework must ensure that policy code cannot introduce crashes, hangs, data races, or unbounded overhead. A critical concern in modern GPU deployments is &lt;strong&gt;performance interference in multi-tenant environments&lt;/strong&gt;: contention for shared resources makes execution time unpredictable. "Making Powerful Enemies on NVIDIA GPUs" studies how adversarial kernels can amplify slowdowns, arguing that performance interference is a &lt;em&gt;system-level safety&lt;/em&gt; property when GPUs are shared. This motivates treating bounded overhead as a correctness property, not merely an optimization goal.&lt;/p&gt;

&lt;p&gt;To build a sound GPU extension verifier, we must first understand what can go wrong. This taxonomy identifies the defect classes a verifier must address, drawing lessons from eBPF's success: restrict the programming model, enforce bounded execution, and verify memory safety before loading. We synthesize findings from static verifiers (GPUVerify, GKLEE, ESBMC-GPU), dynamic detectors (Compute Sanitizer, Simulee, CuSan), and empirical bug studies (Wu et al., ScoRD, iGUARD) into 19 defect classes, organized primarily by impact type (Safety, Correctness, Performance) and GPU specificity (GPU-specific, GPU-amplified, CPU-shared). Each entry provides concrete examples, documents detection tools, and offers actionable verification strategies.&lt;/p&gt;




&lt;h2&gt;
  
  
  Taxonomy Overview
&lt;/h2&gt;

&lt;p&gt;Each bug class is categorized along four dimensions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impact Type:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Safety&lt;/strong&gt;: Program fails to complete safely (crash, hang, isolation failure, deadlock)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correctness&lt;/strong&gt;: Program completes but produces wrong results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Program works correctly but inefficiently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GPU Specificity:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU-specific&lt;/strong&gt;: Unique to GPU/SIMT execution model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU-amplified&lt;/strong&gt;: Exists on CPUs but much more severe on GPUs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU-shared&lt;/strong&gt;: Similar on both platforms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verification Scope (for GPU extension frameworks):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;E (Extension-local)&lt;/strong&gt;: Can be verified by examining only the extension/policy code, without inspecting the host kernel. This is the ideal case: like eBPF, the verifier can provide strong safety guarantees for &lt;em&gt;any&lt;/em&gt; kernel the extension attaches to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C (Combined)&lt;/strong&gt;: Requires joint analysis of extension + kernel, or a contract between them. These bugs arise from interactions between policy code and kernel state/behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;H (Host+Device/System)&lt;/strong&gt;: Involves host-side API ordering, driver state, or cross-boundary interactions that cannot be verified by device-side analysis alone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Assurance Type (Soundness/Completeness guarantees):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;By-construction&lt;/strong&gt;: Bug class is structurally impossible due to language/feature restrictions. &lt;em&gt;Soundness&lt;/em&gt;: perfect (the bug cannot exist). &lt;em&gt;Completeness&lt;/em&gt;: high for policy use cases (restrictions rarely limit legitimate policies).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static-sound&lt;/strong&gt;: If the verifier accepts, the property holds, but some safe programs are rejected. &lt;em&gt;Soundness&lt;/em&gt;: strong. &lt;em&gt;Completeness&lt;/em&gt;: low (conservative).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contract-based&lt;/strong&gt;: Requires declared preconditions validated at attach/launch time. &lt;em&gt;Soundness&lt;/em&gt;: conditional on contract correctness. &lt;em&gt;Completeness&lt;/em&gt;: depends on contract expressiveness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bounded-sound&lt;/strong&gt;: Sound within specified bounds (loop unrolling, context switches). &lt;em&gt;Soundness&lt;/em&gt;: within bounds. &lt;em&gt;Completeness&lt;/em&gt;: limited by bound coverage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic-only&lt;/strong&gt;: Detected at runtime; no static guarantee. &lt;em&gt;Soundness&lt;/em&gt;: for executed paths only. &lt;em&gt;Completeness&lt;/em&gt;: coverage-dependent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime-enforced&lt;/strong&gt;: Property enforced via instrumentation/interception. &lt;em&gt;Soundness&lt;/em&gt;: if enforcement is complete. &lt;em&gt;Completeness&lt;/em&gt;: N/A (enforcement, not verification).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why These Dimensions Matter for GPU Extension Verifiers
&lt;/h3&gt;

&lt;p&gt;A GPU extension framework (like &lt;a href="https://arxiv.org/abs/2512.12615" rel="noopener noreferrer"&gt;bpftime gpu_ext&lt;/a&gt;) aims to provide &lt;strong&gt;static verification guarantees&lt;/strong&gt; analogous to eBPF: policy code should be safe to attach to &lt;em&gt;any&lt;/em&gt; kernel without risking crashes, hangs, or unbounded overhead. The key insight is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Extension-local verification is the only path to strong, universal guarantees.&lt;/strong&gt; If a bug class can be eliminated by restricting the policy language or enforcing invariants on policy code alone, the verifier can guarantee safety without inspecting (potentially closed-source) kernels.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For &lt;strong&gt;Combined&lt;/strong&gt; bugs, the framework has two options: (1) restrict policy capabilities so the bug becomes Extension-local (e.g., forbid policies from writing kernel memory), or (2) require kernel-side contracts/annotations and validate at attach time.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;Host+Device&lt;/strong&gt; bugs, device-side verification is insufficient; these require host-side tooling (CuSan, TSan) or runtime enforcement in the driver/loader.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding Soundness vs. Completeness
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Assurance Type&lt;/strong&gt; dimension makes explicit what guarantees each verification approach provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Soundness&lt;/strong&gt; answers: "If the verifier accepts, does the property definitely hold?" A sound verifier never produces false negatives (misses real bugs).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Completeness&lt;/strong&gt; answers: "If the property holds, will the verifier accept?" A complete verifier never produces false positives (rejects safe programs).&lt;/li&gt;
&lt;/ul&gt;
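
&lt;p&gt;One way to make the two definitions concrete is to treat a verifier as the set of policies it accepts. The toy sketch below assumes we already know which policies are truly safe (something a real verifier cannot know in advance); the policy names and the two example verifiers are invented for illustration.&lt;/p&gt;

```python
# Toy universe of policies; the ground truth is assumed known for the example.
safe = {"p1", "p2", "p3"}
unsafe = {"p4", "p5"}


def is_sound(accepted):
    # Sound: everything accepted is safe (no real bug is missed).
    return accepted.issubset(safe)


def is_complete(accepted):
    # Complete: everything safe is accepted (no safe policy is rejected).
    return safe.issubset(accepted)


def missed_bugs(accepted):
    # False negatives: unsafe policies that the verifier let through.
    return accepted.intersection(unsafe)


conservative = {"p1"}                   # rejects the safe policies p2 and p3
permissive = {"p1", "p2", "p3", "p4"}   # accepts the unsafe policy p4

print(is_sound(conservative), is_complete(conservative))  # True False
print(is_sound(permissive), is_complete(permissive))      # False True
print(missed_bugs(permissive))                            # {'p4'}
```

&lt;p&gt;The conservative verifier corresponds to the Static-sound row of the assurance list: it never admits an unsafe policy, at the cost of rejecting some safe ones.&lt;/p&gt;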

&lt;p&gt;For safety-critical GPU extensions, we prioritize &lt;strong&gt;soundness over completeness&lt;/strong&gt;: it's acceptable to reject some safe policies if it means we never accept unsafe ones. The table below shows not just &lt;em&gt;what&lt;/em&gt; can be verified, but &lt;em&gt;how strong&lt;/em&gt; the guarantee is.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Bug Class&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;GPU Spec.&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Assurance Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Barrier Divergence&lt;/td&gt;
&lt;td&gt;Safety&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;Static-sound (enforce uniform barrier placement)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Invalid Warp Sync&lt;/td&gt;
&lt;td&gt;Safety&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;By-construction (ban warp sync)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Insufficient Atomic/Sync Scope&lt;/td&gt;
&lt;td&gt;Correctness&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;C→E&lt;/td&gt;
&lt;td&gt;Static-sound (isolate state + device-scope)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Warp-divergence Race&lt;/td&gt;
&lt;td&gt;Correctness&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;Static-sound (uniform side-effects)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Uncoalesced Memory Access&lt;/td&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;E/C&lt;/td&gt;
&lt;td&gt;Static-sound (restrict patterns)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Control-Flow Divergence&lt;/td&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;Static-sound (enforce uniformity)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Bank Conflicts&lt;/td&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;Static-heuristic (enforce conflict-free patterns)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Block-Size Dependence&lt;/td&gt;
&lt;td&gt;Correctness&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;E/C&lt;/td&gt;
&lt;td&gt;Contract-based (declare requirements)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Launch Config Assumptions&lt;/td&gt;
&lt;td&gt;Correctness&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;Contract-based (validate at attach)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Missing Volatile/Fence&lt;/td&gt;
&lt;td&gt;Correctness&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;By-construction (ban spin-wait)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Shared-Memory Data Races&lt;/td&gt;
&lt;td&gt;Correctness&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;Static-sound (restrict writes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Redundant Barriers&lt;/td&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;Static-heuristic (detect unnecessary barriers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Host ↔ Device Async Races&lt;/td&gt;
&lt;td&gt;Correctness&lt;/td&gt;
&lt;td&gt;GPU-specific&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;Dynamic-only (CuSan/TSan)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Atomic Contention&lt;/td&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;GPU-amplified&lt;/td&gt;
&lt;td&gt;C→E&lt;/td&gt;
&lt;td&gt;Static-sound (budgetize atomics)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Non-Barrier Deadlocks&lt;/td&gt;
&lt;td&gt;Safety&lt;/td&gt;
&lt;td&gt;GPU-amplified&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;By-construction (ban blocking)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Kernel Non-Termination&lt;/td&gt;
&lt;td&gt;Safety&lt;/td&gt;
&lt;td&gt;GPU-amplified&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;Static-sound (bound iterations)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;Global-Memory Data Races&lt;/td&gt;
&lt;td&gt;Correctness&lt;/td&gt;
&lt;td&gt;CPU-shared&lt;/td&gt;
&lt;td&gt;C→E&lt;/td&gt;
&lt;td&gt;Static-sound (isolate state)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Memory Safety&lt;/td&gt;
&lt;td&gt;Safety&lt;/td&gt;
&lt;td&gt;CPU-shared&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;Static-sound (restrict pointers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Arithmetic Errors&lt;/td&gt;
&lt;td&gt;Correctness&lt;/td&gt;
&lt;td&gt;CPU-shared&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;Static-sound (range analysis)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
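
&lt;p&gt;The Scope column can be tallied mechanically, which is a quick way to cross-check any counts quoted about the table. The sketch below simply transcribes that column into a Python dictionary; the label &lt;code&gt;C-&gt;E&lt;/code&gt; stands for the table's "C→E" entries, and classes marked E/C are counted separately from pure E.&lt;/p&gt;

```python
from collections import Counter

# Scope column of the table above, keyed by bug class number.
scope = {
    1: "E", 2: "E", 3: "C->E", 4: "E", 5: "E/C", 6: "E", 7: "E",
    8: "E/C", 9: "C", 10: "E", 11: "E", 12: "E", 13: "H", 14: "C->E",
    15: "E", 16: "E", 17: "C->E", 18: "E", 19: "E",
}

counts = Counter(scope.values())
for label, n in counts.most_common():
    print(label, n)

# Classes that are not purely Extension-local, grouped by scope label.
for label in ("C->E", "E/C", "C", "H"):
    print(label, sorted(k for k, v in scope.items() if v == label))
```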




&lt;h2&gt;
  
  
  Insights from a Taxonomy of GPU Defects
&lt;/h2&gt;

&lt;p&gt;We conducted a comprehensive study of GPU correctness defects by synthesizing findings from empirical bug analyses (&lt;a href="https://arxiv.org/pdf/1905.01833" rel="noopener noreferrer"&gt;Wu et al.&lt;/a&gt;, &lt;a href="https://akkamath.github.io/files/SOSP21_iGUARD.pdf" rel="noopener noreferrer"&gt;iGUARD&lt;/a&gt;), static verifiers (&lt;a href="https://nchong.github.io/papers/oopsla12.pdf" rel="noopener noreferrer"&gt;GPUVerify&lt;/a&gt;, &lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;GKLEE&lt;/a&gt;, &lt;a href="https://github.com/ssvlab/esbmc-gpu" rel="noopener noreferrer"&gt;ESBMC-GPU&lt;/a&gt;), and runtime detectors (&lt;a href="https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html" rel="noopener noreferrer"&gt;Compute Sanitizer&lt;/a&gt;, &lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;Simulee&lt;/a&gt;, &lt;a href="https://www.csa.iisc.ac.in/~arkapravab/papers/isca20_ScoRD.pdf" rel="noopener noreferrer"&gt;ScoRD&lt;/a&gt;). Our taxonomy identifies 19 distinct classes of GPU programming defects, uncovering fundamental insights into the unique correctness challenges posed by GPU architectures:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, we observe that &lt;em&gt;control-flow uniformity&lt;/em&gt; is a foundational correctness requirement for GPU kernels. Under the GPU's SIMT execution model, non-uniform execution across threads breaks implicit synchronization assumptions and triggers GPU-specific correctness violations, such as barrier divergence, warp synchronization errors, and subtle warp-divergence races. This insight elevates uniformity from a performance concern to a correctness property that GPU verification frameworks must explicitly enforce.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;, the GPU's scoped memory synchronization semantics create unique correctness hazards (e.g., block-scoped atomics, missing fences, volatile misuse) that are rarely encountered on CPU platforms. Our analysis emphasizes that the scopes of synchronization primitives must be explicit, conservative, and verifiable at the kernel level. This requirement is critical for correctness given the subtleties of the GPU memory model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third&lt;/strong&gt;, performance interference in GPUs, manifested as uncoalesced accesses, atomic contention, redundant barriers, and bank conflicts, must be viewed as a &lt;em&gt;safety and isolation&lt;/em&gt; concern rather than mere inefficiency. Our taxonomy reveals how adversarial workloads exploit GPU parallelism to amplify performance issues into denial-of-service attacks in multi-tenant environments. Consequently, bounded overhead must be explicitly enforced as a correctness property in GPU extension frameworks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finally&lt;/strong&gt;, our study highlights that liveness (deadlocks, infinite loops) and memory safety (out-of-bounds accesses, temporal violations) are system-level concerns uniquely amplified by GPU parallelism. Unlike traditional CPU environments, GPU kernel hangs or memory violations can trigger hardware-level recovery affecting all tenants. Thus, GPU liveness and memory safety must be explicitly recognized as first-class system-level correctness properties in verifier designs.&lt;/p&gt;

&lt;p&gt;Together, these insights not only characterize GPU correctness issues more precisely but also inform principled design requirements for GPU kernel extensibility and verification frameworks, moving beyond traditional CPU-centric correctness towards a GPU-aware definition of system correctness. We are applying these principles in &lt;a href="https://github.com/eunomia-bpf/bpftime" rel="noopener noreferrer"&gt;bpftime&lt;/a&gt;; more detail is available on &lt;a href="https://arxiv.org/abs/2512.12615" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Insights from Verification Scope and Assurance Analysis
&lt;/h2&gt;

&lt;p&gt;Beyond characterizing &lt;em&gt;what&lt;/em&gt; can go wrong, we analyze &lt;em&gt;whether and how&lt;/em&gt; each bug class can be addressed by a GPU extension verifier. By examining each defect through the lens of verification scope (Extension-local vs. Combined vs. Host+Device) and assurance type (soundness and completeness guarantees), we arrive at several key conclusions for GPU extension framework design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extension-local verification is sufficient for the majority of GPU bug classes.&lt;/strong&gt; Of the 19 defect classes identified, 14 can be fully addressed through Extension-local verification, examining only the policy code without inspecting the host kernel. Some of these (#2, #10, #15) can be eliminated &lt;em&gt;by construction&lt;/em&gt; through language restrictions: banning warp sync primitives, spin-wait patterns, and blocking constructs makes entire bug classes structurally impossible. Others (#1, #7, #12) use &lt;em&gt;static analysis&lt;/em&gt; to enforce safe usage patterns (uniform barrier placement, conflict-free shared-memory access, redundant barrier detection) rather than outright bans, preserving useful functionality while maintaining safety. Four additional classes (#3, #5, #14, #17) that initially appear to require Combined analysis can be &lt;em&gt;reduced to Extension-local&lt;/em&gt; through state isolation, restricting policies to write only policy-owned objects (maps, ringbuffers) rather than kernel data structures. This finding validates the eBPF design philosophy: by appropriately restricting extension capabilities, a verifier can provide strong safety guarantees for &lt;em&gt;any&lt;/em&gt; kernel, including closed-source ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Only three bug classes fundamentally resist Extension-local verification.&lt;/strong&gt; Block-size dependence (#8) and launch configuration assumptions (#9) depend on host-determined launch parameters invisible to the policy verifier; these require a contract-based approach where policies declare preconditions validated at attach time. Host↔device async races (#13) span the host API boundary entirely outside device-side verification scope; these can only be addressed through dynamic detection tools like CuSan. Importantly, these three classes represent a small, well-defined subset that can be handled through complementary mechanisms rather than requiring full Combined verification of kernel+extension.&lt;/p&gt;
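
&lt;p&gt;To make the contract-based approach concrete, here is a minimal host-side sketch of an attach-time check. It is an illustration only, not bpftime's actual API: the &lt;code&gt;LaunchContract&lt;/code&gt; struct, its fields, and &lt;code&gt;contract_holds&lt;/code&gt; are hypothetical names, and a real framework would likely cover more launch parameters (grid shape, dynamic shared memory, and so on).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;#include &lt;cuda_runtime.h&gt;

// Hypothetical precondition record that a policy declares at load time.
struct LaunchContract {
  unsigned min_threads_per_block;
  unsigned max_threads_per_block;
  bool     whole_warps_only;   // total threads per block must be a multiple of 32
};

// Hypothetical attach-time check: returns true only when the host-chosen
// block shape satisfies every precondition the policy declared.
inline bool contract_holds(const LaunchContract&amp; c, dim3 block) {
  unsigned threads = block.x * block.y * block.z;
  if (threads &lt; c.min_threads_per_block) return false;
  if (threads &gt; c.max_threads_per_block) return false;
  if (c.whole_warps_only &amp;&amp; threads % 32 != 0) return false;
  return true;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions, a policy declaring &lt;code&gt;{32, 1024, true}&lt;/code&gt; would be refused attachment to a kernel launched with 48 threads per block, because the launch violates the whole-warp precondition.&lt;/p&gt;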

&lt;p&gt;&lt;strong&gt;Soundness and completeness trade-offs are explicit and favorable for safety-critical extensions.&lt;/strong&gt; By-construction approaches (banning genuinely dangerous features like spin-wait and blocking primitives) achieve perfect soundness with high completeness for policy use cases. Static-sound approaches (uniform barrier placement, conflict-free access pattern enforcement, uniformity analysis, bounds checking, range analysis) provide strong soundness while preserving useful functionality, at the cost of conservatively rejecting some safe programs. For safety-critical GPU extensions, this trade-off is appropriate: it is better to reject a safe policy than to accept an unsafe one. The verifier's job is to guarantee safety for any kernel, not to accept every possible safe program.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A two-track verification pipeline emerges as the principled design.&lt;/strong&gt; The &lt;em&gt;production track&lt;/em&gt; provides hard guarantees for any kernel through Extension-local verification at load time, contract validation at attach time, and optional runtime enforcement for multi-tenant isolation. The &lt;em&gt;CI/offline track&lt;/em&gt; enhances coverage through Combined analysis tools (GPUVerify, ESBMC-GPU) when kernel source is available, dynamic sanitizers (Compute Sanitizer, iGUARD, Simulee) for regression testing, and host-side race detection (CuSan) for API ordering bugs. This separation acknowledges that Combined verification, while valuable for development and testing, cannot be a production requirement for systems targeting arbitrary kernels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance interference can be bounded but not eliminated.&lt;/strong&gt; While adversarial workloads can systematically amplify interference through shared GPU resources (as demonstrated by "Making Powerful Enemies on NVIDIA GPUs"), the verifier can still provide meaningful guarantees: bounding policy overhead per invocation through instruction/helper budgets, limiting atomic contention through warp-aggregation requirements, and enforcing coalesced access patterns. These guarantees bound the &lt;em&gt;policy's contribution&lt;/em&gt; to interference, even if system-wide slowdown bounds remain impossible to guarantee statically.&lt;/p&gt;
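
&lt;p&gt;The warp-aggregation requirement mentioned above can be sketched as follows. This is a minimal illustration rather than code from any cited system: the kernel name &lt;code&gt;count_flags&lt;/code&gt; and its parameters are hypothetical, and it assumes 1-D blocks whose size is a multiple of 32, so that all 32 lanes of every warp reach the ballot.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;// Counts nonzero flags with at most one atomic per warp instead of one per thread.
// Assumes blockDim.x is a multiple of 32 (hypothetical precondition).
__global__ void count_flags(const int* flags, int n, int* counter) {
  int tid  = blockIdx.x * blockDim.x + threadIdx.x;
  int lane = threadIdx.x &amp; 31;

  // No early return: out-of-range threads still vote, with hit == false.
  bool hit = (tid &lt; n) &amp;&amp; (flags[tid] != 0);

  // Every lane of the warp executes this ballot, so the full mask is valid.
  unsigned hits = __ballot_sync(0xffffffffu, hit);

  // A single lane publishes the warp's total.
  if (lane == 0 &amp;&amp; hits != 0) {
    atomicAdd(counter, __popc(hits));
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The design choice here is that contention on &lt;code&gt;counter&lt;/code&gt; scales with the number of warps rather than the number of threads, which is the kind of bound a verifier could enforce on the policy's own contribution.&lt;/p&gt;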

&lt;p&gt;In summary, the verification scope analysis reveals that the eBPF success pattern (restricting extension capabilities to what can be verified without inspecting the host) transfers effectively to GPUs. Through language restrictions, state isolation, and budgetization, a GPU extension verifier can provide strong, universal safety guarantees while relegating the few irreducibly Combined or Host+Device properties to contracts and dynamic detection.&lt;/p&gt;




&lt;h2&gt;
  
  
  Canonical bug list
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Barrier Divergence at Block Barriers (&lt;code&gt;__syncthreads&lt;/code&gt;) [Safety, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;A block-wide barrier requires &lt;em&gt;all&lt;/em&gt; threads in the block to reach it. If the barrier is placed under a condition that evaluates differently across threads, some threads wait forever → deadlock / kernel hang. This is treated as a first-class defect in GPU kernel verification (e.g., "barrier divergence" in GPUVerify), and is also one of the main CUDA synchronization bug types characterized/targeted by AuCS/Wu. Note that general control-flow divergence is a performance issue, but barrier divergence is the &lt;em&gt;specific, critical case&lt;/em&gt; where divergent control flow causes threads to reach a barrier non-uniformly, turning a performance issue into a &lt;strong&gt;liveness/correctness failure&lt;/strong&gt; (deadlock).&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;__syncthreads&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// divergent barrier =&amp;gt; UB / deadlock&lt;/span&gt;
  &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GPUVerify: checking divergence is a core goal ("divergence freedom").(&lt;a href="https://nchong.github.io/papers/oopsla12.pdf" rel="noopener noreferrer"&gt;Nathan Chong&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Simulee detects &lt;strong&gt;barrier divergence bugs&lt;/strong&gt; in real-world code.(&lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;zhangyuqun.github.io&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Wu et al.: explicitly defines barrier divergence and places it under improper synchronization.(&lt;a href="https://arxiv.org/pdf/1905.01833" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Tools like Compute Sanitizer &lt;code&gt;synccheck&lt;/code&gt; report "divergent thread(s) in block"; Oclgrind can also detect barrier divergence (OpenCL).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static check (GPUVerify-style):&lt;/strong&gt; prove that each barrier is reached by all threads in the relevant scope, often via uniformity reasoning.(&lt;a href="https://nchong.github.io/papers/oopsla12.pdf" rel="noopener noreferrer"&gt;Nathan Chong&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic check:&lt;/strong&gt; synccheck-style runtime validation, and Simulee-style bug finding.(&lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;zhangyuqun.github.io&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;Require &lt;strong&gt;warp-/block-uniform control flow&lt;/strong&gt; for any path reaching a barrier (GPUVerify-style uniform predicate analysis): the verifier statically proves that every &lt;code&gt;__syncthreads()&lt;/code&gt; is reached by all threads in the block, otherwise reject. This allows policies to use barriers for legitimate shared-memory coordination while preventing divergent barriers that cause deadlocks.&lt;/p&gt;
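
&lt;p&gt;As a minimal sketch of a barrier placement that a uniformity-based check could accept, the kernel below keeps the divergent work inside a per-thread expression and places the barrier on a path that every thread takes. The kernel name &lt;code&gt;k_uniform&lt;/code&gt;, the &lt;code&gt;tile&lt;/code&gt; buffer, and the 256-thread limit are illustrative assumptions; the sketch also assumes a 1-D launch and that &lt;code&gt;a&lt;/code&gt; holds one element per launched thread.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;__global__ void k_uniform(float* a) {
  __shared__ float tile[256];          // assumes blockDim.x &lt;= 256
  int t = threadIdx.x;

  // Per-thread values may differ; only the barrier must be uniform.
  tile[t] = (t &lt; 16) ? 1.0f : 0.0f;

  __syncthreads();                     // unconditional: reached by every thread in the block

  // Reading a neighbour's slot is safe only because of the barrier above.
  a[blockIdx.x * blockDim.x + t] = tile[(t + 1) % blockDim.x];
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;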

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E), Static-sound. Enforcing uniform barrier placement via static analysis prevents barrier divergence with strong soundness. Policies can use &lt;code&gt;__syncthreads()&lt;/code&gt; when the verifier can prove all threads in the block reach the barrier uniformly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The verifier statically analyzes control flow to ensure every &lt;code&gt;__syncthreads()&lt;/code&gt; call is reached by all threads in the block. Barriers under divergent conditions (e.g., &lt;code&gt;if (threadIdx.x &amp;lt; 16) __syncthreads()&lt;/code&gt;) are rejected. This allows safe barrier usage for shared-memory coordination while preventing GPU hangs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: For kernel-level analysis, GPUVerify proves divergence freedom via static verification; Compute Sanitizer &lt;code&gt;synccheck&lt;/code&gt; detects divergent barriers at runtime; Simulee finds barrier divergence bugs through evolutionary simulation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Some safe barrier placements under complex but provably uniform conditions may be conservatively rejected. The verifier guarantees that the &lt;em&gt;policy&lt;/em&gt; cannot introduce barrier divergence, but it cannot guarantee that the &lt;em&gt;kernel&lt;/em&gt; itself is free of this bug; kernel-level bugs require kernel-level tools.&lt;/p&gt;




&lt;h3&gt;
  
  
  2) Invalid Warp Synchronization (&lt;code&gt;__syncwarp&lt;/code&gt; mask, warp-level barriers) [Safety, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Warp-level sync requires correct participation masks. A common failure is calling &lt;code&gt;__syncwarp(mask)&lt;/code&gt; where not all lanes that reach the barrier are included in &lt;code&gt;mask&lt;/code&gt;, or where divergence causes only a subset to arrive.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;lane&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lane&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__syncwarp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0xffffffff&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// only 16 lanes arrive, but mask expects all 32&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lane&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Compute Sanitizer &lt;code&gt;synccheck&lt;/code&gt; explicitly reports "Invalid arguments" and "Divergent thread(s) in warp" classes for these hazards.(&lt;a href="https://docs.nersc.gov/tools/debug/compute-sanitizer/" rel="noopener noreferrer"&gt;NERSC Documentation&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;iGUARD discusses how newer CUDA features (e.g., independent thread scheduling + cooperative groups) create new race/sync hazards beyond the classic model.(&lt;a href="https://akkamath.github.io/files/SOSP21_iGUARD.pdf" rel="noopener noreferrer"&gt;Aditya K Kamath&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Runtime validation via &lt;code&gt;synccheck&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Static analysis to verify mask correctness at each &lt;code&gt;__syncwarp&lt;/code&gt; callsite.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;If policies can ever emit warp-level sync or cooperative-groups barriers, require a &lt;em&gt;verifiable&lt;/em&gt; mask discipline: e.g., only &lt;code&gt;__syncwarp(0xffffffff)&lt;/code&gt; (full mask) or masks proven to equal the active mask at the callsite. Otherwise, the simplest option is to &lt;strong&gt;ban warp sync primitives entirely&lt;/strong&gt; inside policies.&lt;/p&gt;
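
&lt;p&gt;One way such a mask discipline might look is sketched below: the mask is computed with &lt;code&gt;__ballot_sync&lt;/code&gt; at a point where all lanes are still converged, then passed to &lt;code&gt;__syncwarp&lt;/code&gt; inside the branch. The kernel name &lt;code&gt;k_masked&lt;/code&gt; is illustrative, and the sketch assumes full warps (a 1-D block whose size is a multiple of 32) with all 32 lanes reaching the ballot.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;__global__ void k_masked(int* out) {
  int  lane        = threadIdx.x &amp; 31;
  bool participate = lane &lt; 16;

  // Executed by all 32 lanes before the branch, so the full mask is valid here.
  // The result has one bit set for each lane that will enter the branch.
  unsigned mask = __ballot_sync(0xffffffffu, participate);

  if (participate) {
    out[threadIdx.x] = lane;
    __syncwarp(mask);   // mask names exactly the lanes that take this branch
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Compared with the bug example, the only change is that the mask is derived from the same predicate that guards the branch, so no lane named in the mask can fail to arrive.&lt;/p&gt;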

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E), By-construction. Banning &lt;code&gt;__syncwarp&lt;/code&gt;/CG barriers entirely (or requiring only full-mask sync at provably uniform points) makes invalid warp sync structurally impossible, providing perfect soundness with high completeness for policy use cases where warp-level sync is rarely needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: Policy code cannot introduce invalid warp synchronization because the verifier bans warp-level sync primitives. If allowed, only full-mask &lt;code&gt;__syncwarp(0xffffffff)&lt;/code&gt; at provably uniform points is permitted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: Compute Sanitizer &lt;code&gt;synccheck&lt;/code&gt; reports invalid sync arguments and divergent warps at runtime; iGUARD provides NVBit-based instrumentation for detecting sync hazards from modern CUDA features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: iGUARD notes that ITS (Independent Thread Scheduling) and CG create new hazards that even experienced developers misuse. This justifies conservative restrictions; banning these primitives in policy code is the only sound approach without complex ITS-aware analysis.&lt;/p&gt;




&lt;h3&gt;
  
  
  3) Insufficient Atomic/Sync Scope [Correctness, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;GPUs add &lt;em&gt;scope&lt;/em&gt; and memory-model subtleties that don't exist on CPUs. &lt;strong&gt;Scoped races&lt;/strong&gt; occur when synchronization or atomics are performed at an insufficient scope (e.g., using &lt;code&gt;atomicAdd_block&lt;/code&gt; when &lt;code&gt;atomicAdd&lt;/code&gt; with device scope is needed). This is a distinct GPU bug class because scope semantics are specific to GPU memory models such as CUDA's.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Scoped race: using block-scope atomic when device-scope is needed&lt;/span&gt;
&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;atomicAdd_block&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// only block-scope, may race across blocks&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;ScoRD introduces &lt;em&gt;scoped races&lt;/em&gt; due to insufficient scope and argues this is a distinct bug class.(&lt;a href="https://www.csa.iisc.ac.in/~arkapravab/papers/isca20_ScoRD.pdf" rel="noopener noreferrer"&gt;CSA - IISc Bangalore&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;iGUARD further targets races introduced by "scoped synchronization" and advanced CUDA features (independent thread scheduling, cooperative groups).(&lt;a href="https://akkamath.github.io/files/SOSP21_iGUARD.pdf" rel="noopener noreferrer"&gt;Aditya K Kamath&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope verification:&lt;/strong&gt; ensure atomics/sync use sufficient scope for the access pattern.&lt;/li&gt;
&lt;li&gt;Require explicit scope annotations and validate against access patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;Treat scope as part of the verifier contract: if policies do atomic/synchronizing operations, require the &lt;em&gt;strongest&lt;/em&gt; allowed scope (or forbid nontrivial scope usage). Practically: ban cross-block shared global updates unless they're done through a small set of "safe" helpers (e.g., per-SM/per-warp buffers → host aggregation). If policies use scoped atomics, require the scope to be explicit and conservative.&lt;/p&gt;
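
&lt;p&gt;The two options above can be sketched as follows. Kernel and parameter names are illustrative. The first kernel defaults to the device-scope &lt;code&gt;atomicAdd&lt;/code&gt;; the second keeps block scope but gives each block its own slot, assuming &lt;code&gt;per_block&lt;/code&gt; has &lt;code&gt;gridDim.x&lt;/code&gt; zero-initialized elements that the host sums afterwards. &lt;code&gt;atomicAdd_block&lt;/code&gt; requires compute capability 6.0 or later.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;// Conservative default: device scope is sufficient no matter which blocks share `counter`.
__global__ void k_device_scope(int* counter) {
  atomicAdd(counter, 1);
}

// Narrow scope made safe by isolation: each block updates only its own slot,
// so block scope covers every thread that can touch that address.
__global__ void k_per_block(int* per_block) {
  atomicAdd_block(&amp;per_block[blockIdx.x], 1);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;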

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Combined → Extension-local (C→E) via state isolation, Static-sound. If policies can touch kernel-shared global objects, scope correctness depends on kernel access patterns (Combined). However, this reduces to Extension-local by restricting policies to write only policy-owned state or requiring all atomics to use device-scope by default, providing strong soundness with medium completeness (policies needing block-scope atomics must use conservative device-scope).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: Two design choices enable Extension-local verification: (A) Policy only writes policy-owned state (maps, ringbuffers), never kernel globals: scope becomes irrelevant; (B) All policy atomics use device-scope by default: sufficient for any access pattern. Both approaches eliminate scope bugs without kernel inspection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: ScoRD introduces "scoped races" as a distinct bug class and provides detection (research prototype requiring hardware support); iGUARD targets races from scoped synchronization and advanced CUDA features via NVBit GPU-side runtime instrumentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: If policies must write kernel-shared objects with fine-grained scope optimization, Combined analysis or contracts are required. ScoRD and iGUARD emphasize that scope bugs are subtle and underdetected, so defaulting to device-scope is a sound engineering choice.&lt;/p&gt;




&lt;h3&gt;
  
  
  4) Warp-divergence Race [Correctness, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;A &lt;strong&gt;warp-divergence race&lt;/strong&gt; is a GPU-specific phenomenon where &lt;strong&gt;divergence changes which threads are effectively concurrent&lt;/strong&gt;, producing racy outcomes that don't map cleanly to CPU assumptions. SIMT execution order + reconvergence can create subtle concurrency patterns. This is one reason "CPU-style race reasoning" doesn't port directly to GPUs. While control-flow divergence is generally a performance issue (serialized execution paths), a warp-divergence race is a &lt;strong&gt;correctness&lt;/strong&gt; issue where divergence creates unexpected concurrency patterns leading to data races. The root cause is the same, but the failure modes differ: performance degradation vs. racy/undefined behavior.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;lane&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lane&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;// first half writes&lt;/span&gt;
  &lt;span class="k"&gt;else&lt;/span&gt;           &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;// second half writes&lt;/span&gt;
  &lt;span class="c1"&gt;// outcome depends on SIMT execution + reconvergence&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GKLEE explicitly lists "warp-divergence race" among discovered bug classes.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Simulee stresses CUDA-aware race definitions and discusses GPU-specific race interpretation constraints (e.g., avoiding false positives due to warp lockstep).(&lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;zhangyuqun.github.io&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Verifier rule:&lt;/strong&gt; treat "lane-divergent side effects" as forbidden unless proven safe.&lt;/li&gt;
&lt;li&gt;Require that any helper with side effects is guarded by a &lt;strong&gt;warp-uniform predicate&lt;/strong&gt; or executed only by a designated lane (e.g., lane0). Then the verifier only needs to prove &lt;strong&gt;uniformity&lt;/strong&gt; (or single-lane execution), not full SIMT interleavings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;Enforce warp-uniform control flow for policy side effects. If divergence is unavoidable, force "single-lane execution" patterns where only lane0 performs the side effect. This eliminates warp-divergence races by construction.&lt;/p&gt;
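
&lt;p&gt;A minimal sketch of the single-lane execution pattern follows. The kernel name &lt;code&gt;k_single_lane&lt;/code&gt; is illustrative, and the sketch assumes 1-D blocks whose size is a multiple of 32 and that &lt;code&gt;A&lt;/code&gt; has one slot per warp. Note that it also gives each warp its own slot: restricting the write to lane 0 removes the intra-warp race, but two warps writing the same address would still conflict.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;__global__ void k_single_lane(int* A) {
  int lane = threadIdx.x &amp; 31;
  int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;   // global warp index

  // Divergent computation is allowed because it has no side effect.
  int value = (lane &lt; 16) ? 1 : 2;

  // The side effect is performed by exactly one designated lane per warp.
  if (lane == 0) {
    A[warp] = value;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;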

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E), Static-sound. Warp-divergence races arise from SIMT execution semantics, but can be prevented by structural restrictions on policy code, providing strong soundness with medium completeness (legitimately safe lane-divergent writes are rejected).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The verifier enforces that all side-effecting operations are either (1) under warp-uniform predicates, or (2) executed only by lane0 (single-lane execution pattern). This eliminates warp-divergence races without analyzing the kernel. The verifier proves uniformity or single-lane execution statically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: GKLEE explicitly lists "warp-divergence race" among discovered bug classes and explores divergent execution paths via concolic/symbolic testing; Simulee uses CUDA-aware race definitions that account for warp lockstep behavior to avoid false positives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Policies with legitimately safe lane-divergent writes will be rejected. This trade-off is favorable: warp-divergence races are notoriously subtle (GKLEE found them in real SDK code), so eliminating them by construction is safer than complex SIMT interleaving analysis.&lt;/p&gt;




&lt;h3&gt;
  
  
  5) Uncoalesced / Non-Coalesceable Global Memory Access Patterns [Performance, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Warp memory coalescing is a GPU-specific performance contract. "Uncoalesced" accesses can cause large slowdowns (memory transactions split into many).&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;stride&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;stride&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;   &lt;span class="c1"&gt;// stride&amp;gt;1 =&amp;gt; likely uncoalesced&lt;/span&gt;
  &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;stride&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GPUDrano: "detects uncoalesced global memory accesses" and treats them as performance bugs.(&lt;a href="https://github.com/upenn-acg/gpudrano-static-analysis_v1.0" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, &lt;a href="https://www.cis.upenn.edu/~alur/Cav17.pdf" rel="noopener noreferrer"&gt;CAV17&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;GKLEE: reports "non-coalesced memory accesses" as performance bugs it finds.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;GPUCheck: detects "non-coalesceable memory accesses."(&lt;a href="https://webdocs.cs.ualberta.ca/~amaral/thesis/TaylorLloydMSc.pdf" rel="noopener noreferrer"&gt;WebDocs&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static analysis (GPUDrano/GPUCheck-style):&lt;/strong&gt; analyze address expressions in terms of lane-to-address stride; flag when stride exceeds coalescing thresholds.(&lt;a href="https://www.cis.upenn.edu/~alur/Cav17.pdf" rel="noopener noreferrer"&gt;CAV17&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;If you want "performance as correctness," this is a flagship rule: restrict policy memory ops to patterns provably coalesced (e.g., affine, lane-linear indexing with small stride), and/or require warp-level aggregation so only one lane performs global updates. Require map operations to use &lt;strong&gt;warp-uniform keys&lt;/strong&gt; or &lt;strong&gt;contiguous per-lane indices&lt;/strong&gt; (e.g., &lt;code&gt;base + lane_id&lt;/code&gt;), not random hashes. If policies must do random accesses, restrict them to &lt;strong&gt;lane0 only&lt;/strong&gt;, amortizing the uncoalesced behavior to 1 lane/warp.&lt;/p&gt;
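
&lt;p&gt;The two access shapes described above might look like the sketch below. All names are illustrative. The first kernel uses lane-linear indexing, so consecutive lanes touch consecutive elements. The second confines a data-dependent access to lane 0, assuming 1-D blocks whose size is a multiple of 32, one entry in &lt;code&gt;keys&lt;/code&gt; per warp, and &lt;code&gt;table_size &amp;gt; 0&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;// Lane-linear indexing: each warp reads and writes one contiguous run of floats.
__global__ void k_coalesced(float* a, int n) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid &lt; n) {
    a[tid] += 1.0f;
  }
}

// Random access restricted to lane 0: at most one scattered update per warp.
__global__ void k_lane0_random(float* table, const unsigned* keys, unsigned table_size) {
  int lane = threadIdx.x &amp; 31;
  int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;   // global warp index

  if (lane == 0) {
    unsigned idx = keys[warp] % table_size;   // data-dependent, not coalescible
    atomicAdd(&amp;table[idx], 1.0f);         // other warps may hit the same slot
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;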

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E) for policy-owned memory; Combined (C) for kernel arrays. Static-sound for policy memory: affine/lane-linear indexing guarantees coalescing with strong soundness but low completeness (random-access patterns rejected; kernel-array reads require Combined analysis).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: For policy-owned memory (maps, ringbuffers), restricting index expressions to affine/lane-linear forms (&lt;code&gt;base + lane_id&lt;/code&gt;) or lane0-only access provides bounded overhead guarantees. Warp-level aggregation (only lane0 performs global updates) amortizes uncoalesced behavior to 1 lane/warp. The verifier cannot guarantee coalescing for kernel-array reads without kernel knowledge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: GPUDrano statically detects uncoalesced global memory accesses and treats them as performance bugs; GPUCheck identifies non-coalesceable access patterns via thread-divergent expression analysis; GKLEE reports "non-coalesced memory accesses" as performance bugs via symbolic exploration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Whether an access is actually coalesced at runtime depends on hardware cache behavior and concurrent workloads, so static analysis provides structural guarantees, not tight performance bounds. "Is it really slow / how slow" is architecture-dependent; static tools give best-effort structural warnings rather than performance proofs.&lt;/p&gt;




&lt;h3&gt;
  
  
  6) Control-Flow Divergence (warp branch divergence) [Performance, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;SIMT divergence serializes paths within a warp, lowering "branch efficiency" and increasing worst-case overhead. This entry focuses on divergence as a &lt;strong&gt;performance&lt;/strong&gt; issue. However, divergence is also the root cause of more severe correctness bugs: barrier divergence (deadlock when barriers are in conditional code) and warp-divergence races (unexpected concurrency patterns leading to data races).&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;else&lt;/span&gt;                &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// divergence within warp&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GPUCheck explicitly targets "branch divergence" as a performance problem arising from thread-divergent expressions.(&lt;a href="https://webdocs.cs.ualberta.ca/~amaral/thesis/TaylorLloydMSc.pdf" rel="noopener noreferrer"&gt;WebDocs&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;GKLEE: "divergent warps" as performance bugs.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Wu et al.: "non-optimal implementation" includes performance loss causes like branch divergence.(&lt;a href="https://arxiv.org/pdf/1905.01833" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static taint + symbolic reasoning (GPUCheck-style):&lt;/strong&gt; identify conditions dependent on thread/lane id, and prove whether divergence is possible.(&lt;a href="https://webdocs.cs.ualberta.ca/~amaral/thesis/TaylorLloydMSc.pdf" rel="noopener noreferrer"&gt;WebDocs&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;Divergence is the &lt;em&gt;core reason&lt;/em&gt; you can treat performance as correctness. Enforce &lt;strong&gt;warp-uniform control flow&lt;/strong&gt; for policies (or at least for any code path that triggers side effects / heavy helpers). If you can't prove uniformity, force "single-lane execution" of policy side effects (others become no-ops) to prevent warp amplification. Put a hard cap on the number of helper calls on any path, to bound the "divergence amplification factor."&lt;/p&gt;

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E), Static-sound. Control-flow divergence is determined entirely by the policy's branch conditions and their dependence on thread IDs, providing strong soundness via taint analysis but low completeness (data-dependent branches that happen to be uniform at runtime are rejected).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The verifier tracks which values depend on &lt;code&gt;threadIdx&lt;/code&gt;/&lt;code&gt;laneId&lt;/code&gt; (taint analysis). Branches on tainted values are either forbidden or force single-lane execution for side effects (others become no-ops). This bounds the "warp amplification factor" and prevents SIMT-amplified performance degradation.&lt;/p&gt;
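&lt;p&gt;The taint tracking described above can be sketched in a few lines of host-side C++. This is a minimal sketch under strong assumptions: the &lt;code&gt;Instr&lt;/code&gt; format and register numbering are hypothetical, the code is straight-line only, and control dependence and memory are ignored. A real verifier would need a full dataflow analysis.&lt;/p&gt;

```cpp
// Hypothetical three-address instruction:
// regs[dst] = op(regs[src_a], regs[src_b]).
struct Instr {
    int dst;
    int src_a;
    int src_b;
};

// Forward taint propagation over straight-line code.
// tainted[r] is true when register r may hold different values in different
// lanes of a warp. The caller pre-marks the registers that hold
// threadIdx / laneId before calling this function.
void propagate_lane_taint(const Instr* code, int n, bool* tainted) {
    for (int i = 0; i != n; ++i) {
        // The result is lane-varying if either operand is lane-varying.
        tainted[code[i].dst] = tainted[code[i].src_a] || tainted[code[i].src_b];
    }
}

// A branch is accepted as warp-uniform only if its condition register
// carries no lane taint.
bool branch_is_warp_uniform(const bool* tainted, int cond_reg) {
    return !tainted[cond_reg];
}
```

&lt;p&gt;A branch whose condition fails &lt;code&gt;branch_is_warp_uniform&lt;/code&gt; would then be either rejected or rewritten so that only one lane performs the side effect.&lt;/p&gt;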

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: GPUCheck explicitly targets "branch divergence" as a performance problem via thread-divergent expression analysis; GKLEE reports "divergent warps" as performance bugs via symbolic exploration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Some safe data-dependent branches will be rejected. The gpu_ext design principle lists warp-uniform control flow as a load-time verification requirement: treating divergence as a correctness property (bounded overhead), not just optimization. For kernel-level divergence analysis, use GPUCheck or GKLEE.&lt;/p&gt;




&lt;h3&gt;
  
  
  7) Shared-Memory Bank Conflicts [Performance, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Bank conflicts are a shared-memory–specific performance pathology: accesses serialize when multiple lanes hit the same bank.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;__shared__&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;lane&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// stride-32 index: with 32 four-byte banks, every lane maps to bank 0 (illustrative)&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lane&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GKLEE explicitly lists "memory bank conflicts" among detected performance bugs.(&lt;a href="https://lipeng28.github.io/papers/ppopp12-gklee.pdf" rel="noopener noreferrer"&gt;Peng Li's Homepage&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static heuristic:&lt;/strong&gt; classify shared-memory index expressions by lane stride and bank mapping; warn if likely conflict.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;If policies use shared scratchpads (e.g., per-block staging), enforce a &lt;strong&gt;conflict-free access pattern&lt;/strong&gt; (e.g., contiguous per-lane indexing such as &lt;code&gt;base + threadIdx.x&lt;/code&gt;). A static heuristic can classify shared-memory index expressions by lane stride and bank mapping, rejecting or warning on patterns likely to cause conflicts. Shared memory should not be banned outright over this performance issue: it remains useful for legitimate policy scratchpads.&lt;/p&gt;

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E), Static-heuristic. Enforcing conflict-free access patterns on shared memory eliminates most bank conflicts while still allowing policies to use shared scratchpads for legitimate purposes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: Policies using shared memory are restricted to conflict-free index patterns (&lt;code&gt;base + threadIdx.x&lt;/code&gt; for contiguous access). The verifier statically checks shared-memory index expressions and rejects patterns with likely bank conflicts (e.g., stride-32 access). This preserves shared memory availability for per-block staging and aggregation.&lt;/p&gt;
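&lt;p&gt;One way to picture the static check is a small host-side C++ helper that computes the worst-case conflict degree for an index of the form &lt;code&gt;base + lane * stride&lt;/code&gt;. The helper name is hypothetical, and the model (32 lanes, 32 banks, 4-byte elements, non-negative base and stride) is an assumption for this sketch rather than a description of any particular GPU.&lt;/p&gt;

```cpp
// Worst-case number of lanes in one warp that hit the same shared-memory bank
// for the index pattern `base + lane * stride` over 4-byte elements.
// Assumes 32 lanes and 32 four-byte-wide banks. A result of 1 means the
// access is conflict-free.
int bank_conflict_degree(int base, int stride) {
    const int kLanes = 32;
    const int kBanks = 32;
    // All lanes read the same word: modeled here as a broadcast, not a conflict.
    if (stride == 0) return 1;
    int hits[kBanks] = {0};
    int worst = 0;
    for (int lane = 0; lane != kLanes; ++lane) {
        int bank = (base + lane * stride) % kBanks;
        ++hits[bank];
        if (hits[bank] > worst) worst = hits[bank];
    }
    return worst;
}
```

&lt;p&gt;In this model, stride 1 yields degree 1, stride 2 yields degree 2, and the stride-32 pattern from the bug example yields degree 32. An odd stride such as 33 is conflict-free again, which is why padding a row by one element is a common remedy.&lt;/p&gt;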

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: GKLEE explicitly lists "memory bank conflicts" among detected performance bugs via symbolic exploration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Some safe but complex index patterns may be conservatively rejected. Kernel-level bank conflict analysis requires GKLEE-style offline tools or profiling. Policies needing non-trivial shared-memory access patterns may need to demonstrate conflict-freedom through annotations or simplified indexing.&lt;/p&gt;




&lt;h3&gt;
  
  
  8) Block-Size Dependence [Correctness, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Block-size independence is essential for safe block-size tuning. Kernels that implicitly depend on specific &lt;code&gt;blockDim&lt;/code&gt; values can produce incorrect results or races when launched with different configurations. This is critical for auto-tuning and portability across GPU generations. This entry focuses on &lt;strong&gt;compile-time hardcoded assumptions&lt;/strong&gt; within the kernel code itself (e.g., fixed shared memory sizes, hardcoded reduction strides), distinct from runtime launch configuration assumptions about grid dimensions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;__shared__&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="n"&gt;__syncthreads&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="c1"&gt;// Hardcoded reduction assumes exactly 256 threads&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;  &lt;span class="c1"&gt;// reads uninitialized s[] entries if blockDim.x &amp;lt; 256&lt;/span&gt;
  &lt;span class="n"&gt;__syncthreads&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                         &lt;span class="c1"&gt;// if blockDim.x &amp;gt; 256: s[tid] store above is OOB, reduction incomplete&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="c1"&gt;// ... continues with warp-level reduction ...&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// Launched with blockDim.x != 256 =&amp;gt; wrong results or crash&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GPUDrano explicitly includes "block-size independence" analysis.(&lt;a href="https://github.com/upenn-acg/gpudrano-static-analysis_v1.0" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static analysis (GPUDrano):&lt;/strong&gt; analyze kernel code for implicit blockDim dependencies.&lt;/li&gt;
&lt;li&gt;Require explicit declaration of block-size assumptions in kernel metadata.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;Policies should not implicitly assume block shapes unless the verifier can guarantee them. If a policy depends on block-level structure, require declaring it (metadata) and validate at attach time. Add verifier rules that forbid hard-coded assumptions about blockDim unless explicitly declared.&lt;/p&gt;

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E) if the policy is block-agnostic; Combined (C) if it assumes a particular blockDim. Contract-based for blockDim-dependent policies: conditional soundness (sound if declared requirements match the actual launch config) with high completeness (policies can declare requirements; policies that declare none are assumed block-agnostic).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: Two approaches enable verification: (A) Block-agnostic design: policies use only lane-local or warp-level logic, avoiding &lt;code&gt;blockDim&lt;/code&gt; dependencies entirely, making them safe for any launch config; (B) Contract-based: policies declare block-size requirements in metadata, and the runtime validates at attach time. The verifier rejects policies with hardcoded block-size constants unless explicitly declared.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: GPUDrano explicitly includes "block-size independence" analysis for detecting implicit blockDim dependencies in kernel code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Policies with undeclared blockDim dependencies may fail silently with different launch configs. The contract approach shifts responsibility to policy authors to declare requirements correctly. Recommended design: make policy APIs block-agnostic (use relative indices, not absolute sizes).&lt;/p&gt;
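&lt;p&gt;To make "relative indices, not absolute sizes" concrete, here is a host-side C++ simulation of a tree reduction whose strides are derived from the block size instead of being hardcoded as in the bug example above. The function name is hypothetical, and the sketch simulates threads with a loop; it is meant to show the indexing scheme, not to serve as a tuned kernel.&lt;/p&gt;

```cpp
// Host-side simulation of a block-wide tree reduction over shared memory.
// `s` stands in for the __shared__ buffer and `block_dim` for blockDim.x.
// Each iteration of the inner loop plays the role of one thread, and the end
// of the inner loop plays the role of __syncthreads().
float block_reduce_sim(float* s, int block_dim) {
    int active = block_dim;                 // number of live partial sums
    while (active > 1) {
        int half = (active + 1) / 2;        // round up so odd sizes work
        // Threads 0 .. active-half-1 each fold in one element from the upper
        // part. Writes land below `half` and reads start at `half`, so the
        // threads of one level never touch each other's data.
        for (int tid = 0; tid != active - half; ++tid) {
            s[tid] += s[tid + half];
        }
        active = half;                      // barrier, then next level
    }
    return s[0];
}
```

&lt;p&gt;Because every stride is computed from &lt;code&gt;block_dim&lt;/code&gt;, the same logic handles 5, 256, or 1000 elements, including sizes that are not powers of two.&lt;/p&gt;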




&lt;h3&gt;
  
  
  9) Launch Config Assumptions [Correctness, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Many CUDA kernels assume certain launch configurations (e.g., single block, specific grid dimensions). Violating these assumptions leads to incorrect results or races that are hard to diagnose. This entry focuses on &lt;strong&gt;runtime launch configuration assumptions&lt;/strong&gt; (gridDim, number of blocks), distinct from compile-time hardcoded block-size dependencies within the kernel code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;__shared__&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;__syncthreads&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;stride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;stride&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;stride&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;stride&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;stride&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="n"&gt;__syncthreads&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;  &lt;span class="c1"&gt;// BUG: assumes gridDim.x == 1, writes final result directly&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;              &lt;span class="c1"&gt;// if gridDim.x &amp;gt; 1, multiple blocks race on *out&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// Called with &amp;lt;&amp;lt;&amp;lt;N/256, 256&amp;gt;&amp;gt;&amp;gt; where N &amp;gt;= 512 (two or more blocks) =&amp;gt; data race, wrong result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Wu et al.'s discussion of detected bugs includes developer responses that kernels "should not be called with more than one block" and suggests adding assertions like &lt;code&gt;assert(gridDim.x == 1)&lt;/code&gt;.(&lt;a href="https://arxiv.org/pdf/1905.01833" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Contract checking:&lt;/strong&gt; encode launch preconditions (gridDim, blockDim assumptions) and enforce them at runtime or statically.&lt;/li&gt;
&lt;li&gt;Add runtime assertions for grid/block dimension assumptions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;If policy code assumes a particular block/warp mapping (e.g., keys use &lt;code&gt;threadIdx.x&lt;/code&gt; directly), you can end up with correctness or performance regressions when kernels run under different launch configs. If a policy depends on warp- or block-level structure, require declaring it (metadata) and validate at attach time.&lt;/p&gt;

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Combined (C): launch configuration is determined by the host and is not visible to the policy verifier. Contract-based assurance: conditional soundness (sound only if contracts are correctly specified and validated) with completeness depending on contract expressiveness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: This bug class fundamentally requires contracts: Extension-local verification cannot see launch parameters. The policy declares preconditions (e.g., "requires gridDim.x == 1" or "requires blockDim.x &amp;gt;= 128"), and the runtime validates at attach/launch time. Policies without explicit requirements are assumed to work with any config.&lt;/p&gt;
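&lt;p&gt;A contract check of this kind could look roughly like the host-side C++ sketch below. The struct layouts, field names, and the convention that 0 means "no requirement" are all assumptions made for illustration; the source does not specify a metadata format.&lt;/p&gt;

```cpp
// Hypothetical contract metadata declared by a policy.
// A value of 0 means "no requirement" for that field.
struct LaunchContract {
    unsigned required_grid_x;   // e.g. 1 for "requires gridDim.x == 1"
    unsigned min_block_x;       // e.g. 128 for "requires blockDim.x >= 128"
};

// Launch shape as observed by the runtime at attach/launch time.
struct LaunchConfig {
    unsigned grid_x;
    unsigned block_x;
};

// Returns true when the observed launch satisfies every declared requirement.
bool contract_holds(LaunchContract c, LaunchConfig cfg) {
    if (c.required_grid_x != 0) {
        if (cfg.grid_x != c.required_grid_x) return false;
    }
    // A minimum of 0 is satisfied by any block size.
    if (c.min_block_x > cfg.block_x) return false;
    return true;
}
```

&lt;p&gt;A policy declaring &lt;code&gt;{1, 0}&lt;/code&gt; would attach to a single-block launch and be refused for a four-block launch, while a policy declaring &lt;code&gt;{0, 0}&lt;/code&gt; attaches to any configuration.&lt;/p&gt;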

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: Wu et al.'s empirical study found real bugs where developers noted kernels "should not be called with more than one block": they suggest adding runtime assertions like &lt;code&gt;assert(gridDim.x == 1)&lt;/code&gt;. Convert such requirements into contract metadata for policy verification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Contract-based verification shifts responsibility to policy authors to declare requirements correctly. This is one of the few bug classes where Combined verification is unavoidable, but contracts provide a clean interface without requiring complex joint analysis of kernel + policy.&lt;/p&gt;




&lt;h3&gt;
  
  
  10) Missing Volatile/Fence [Correctness, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;GPU code often relies on compiler and memory-model subtleties. GKLEE reports a real-world category: forgetting to mark a shared memory variable as &lt;code&gt;volatile&lt;/code&gt;, producing stale reads/writes due to compiler optimization or caching behavior. This is a GPU-flavored instance of memory visibility/ordering bugs that can be hard to reproduce.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__shared__&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// should sometimes be volatile / properly fenced&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;__syncthreads&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;         &lt;span class="c1"&gt;// may spin if compiler hoists load / visibility issues&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GKLEE explicitly lists "forgot volatile" as a discovered bug type.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Simulee and other tools' race detection can surface some of these issues when they manifest as data races.(&lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;zhangyuqun.github.io&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Symbolic exploration (GKLEE-style):&lt;/strong&gt; explore memory access orderings and detect stale read scenarios.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pattern-based linting:&lt;/strong&gt; flag spin-wait loops on shared memory without volatile or fence.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;Avoid exposing raw shared/global memory communication to policies; instead provide &lt;strong&gt;helpers with explicit semantics&lt;/strong&gt; (e.g., "atomic increment" or "write once" patterns), and verify policies don't implement ad-hoc synchronization loops. Forbid spin-waiting on shared memory in policy code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E), By-construction. Banning spin-wait loops and raw shared/global memory communication removes volatile/fence bugs from policy code, so the check is sound by construction, with high completeness (legitimate polling patterns are rare in policy code).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The verifier bans spin-wait loops (&lt;code&gt;while(flag == 0)&lt;/code&gt;), flag polling patterns, and raw shared/global memory communication. All inter-thread communication must go through atomic helpers with explicit semantics (e.g., "atomic increment" or "write once" patterns). This eliminates volatile/fence bugs by forbidding the patterns that cause them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: GKLEE explicitly lists "forgot volatile" as a discovered bug type via symbolic exploration. Simulee and other race detectors can surface these issues when they manifest as data races.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: ITS (Independent Thread Scheduling, introduced with Volta) removes the guarantee of warp-lockstep execution, making traditional volatile assumptions unreliable: code that worked on pre-Volta architectures may race on newer GPUs. The safest approach is to ban ad-hoc synchronization entirely rather than trying to verify memory model subtleties.&lt;/p&gt;




&lt;h3&gt;
  
  
  11) Shared-Memory Data Races (&lt;code&gt;__shared__&lt;/code&gt;) [Correctness, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Threads in a block access on-chip shared memory concurrently; missing/incorrect synchronization causes races. This is a classic CUDA bug class (AuCS/Wu).&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;__shared__&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// write-write race on s&lt;/span&gt;
  &lt;span class="n"&gt;__syncthreads&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GPUVerify explicitly targets &lt;strong&gt;data-race freedom&lt;/strong&gt; and defines intra-group / inter-group races.(&lt;a href="https://nchong.github.io/papers/oopsla12.pdf" rel="noopener noreferrer"&gt;Nathan Chong&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;GKLEE reports finding &lt;strong&gt;races&lt;/strong&gt; (and related deadlocks) via symbolic exploration.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Simulee detects &lt;strong&gt;data race bugs&lt;/strong&gt; in real projects and uses a CUDA-aware notion of race.(&lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;zhangyuqun.github.io&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Wu et al. classify &lt;strong&gt;data race&lt;/strong&gt; under "improper synchronization" as a CUDA-specific root cause.(&lt;a href="https://arxiv.org/pdf/1905.01833" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Compute Sanitizer &lt;code&gt;racecheck&lt;/code&gt; is a runtime shared-memory hazard detector.(&lt;a href="https://www.shinhwei.com/cuda-repair.pdf" rel="noopener noreferrer"&gt;Shinhwei&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static verifier route (GPUVerify-style):&lt;/strong&gt; enforce "race-free under SIMT" by proving that any two potentially concurrent lanes/threads cannot perform conflicting accesses without proper synchronization.(&lt;a href="https://nchong.github.io/papers/oopsla12.pdf" rel="noopener noreferrer"&gt;Nathan Chong&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic route (Simulee-style):&lt;/strong&gt; instrument / simulate memory accesses and flag conflicting pairs; good for bug-finding and regression tests.(&lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;zhangyuqun.github.io&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;If policies have any shared state, require &lt;strong&gt;warp-uniform side effects&lt;/strong&gt; or &lt;strong&gt;single-lane side effects&lt;/strong&gt; (e.g., lane0 updates) plus explicit atomics. A conservative verifier rule is: policy code cannot write shared memory except via restricted helpers that are race-safe (e.g., per-warp aggregation).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Option A – warp-/block-uniform single-writer rules&lt;/strong&gt; (e.g., "only lane 0 updates").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Option B – atomic-only helpers&lt;/strong&gt; for shared objects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Option C – per-thread/per-warp sharding&lt;/strong&gt; (each lane updates its own slot).&lt;/li&gt;
&lt;/ul&gt;
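
&lt;p&gt;The three options can be illustrated with a minimal sketch. The kernel and helper names are hypothetical, and the fixed array size assumes a block of at most 256 threads.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;// Hypothetical names, for illustration only.

// Option A: single writer. Only thread 0 of the block writes; a barrier publishes the value.
__global__ void single_writer(int* g) {
  __shared__ int s;
  if (threadIdx.x == 0) s = 1;
  __syncthreads();                 // reached by every thread in the block
  g[threadIdx.x] = s;
}

// Option B: atomic-only helper. Concurrent updates are allowed, but only through an atomic.
__device__ void shared_add(int* shared_obj, int v) {
  atomicAdd(shared_obj, v);
}

// Option C: sharding. Each thread owns one slot, so no two threads write the same address.
__global__ void sharded(int* g) {
  __shared__ int s[256];           // assumes blockDim.x is at most 256
  s[threadIdx.x] = threadIdx.x;
  g[threadIdx.x] = s[threadIdx.x]; // each thread reads back only its own slot
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;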

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E), Static-sound. Shared-memory races depend only on the policy's access patterns and synchronization, providing strong soundness via structural restrictions (per-lane sharding or lane0-only writes eliminate races by construction) with medium completeness (complex shared-memory algorithms rejected).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: Three structural restrictions are available, all Extension-local: (1) ban shared-memory writes entirely; (2) require per-lane sharding, where each lane writes its own slot so no conflicts are possible; (3) require lane0-only writes with atomic helpers. Each restriction makes races impossible by construction without requiring complex GPUVerify-style interleaving proofs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: GPUVerify explicitly targets data-race freedom as a core verification goal and defines intra-group/inter-group races; ESBMC-GPU checks data races via bounded model checking; Compute Sanitizer &lt;code&gt;racecheck&lt;/code&gt; is a runtime shared-memory hazard detector; Simulee detects data race bugs using CUDA-aware race definitions; Wu et al. classify data race under "improper synchronization" as a CUDA-specific root cause.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: GPUVerify-style proofs are possible but complex for arbitrary code; structural restrictions are simpler and equally sound for policy use cases. Policies needing complex shared-memory algorithms should use ringbuffers instead, avoiding shared memory entirely.&lt;/p&gt;




&lt;h3&gt;
  
  
  12) Redundant Barriers (unnecessary &lt;code&gt;__syncthreads&lt;/code&gt;) [Performance, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Redundant barriers are a performance pathology: if removing a barrier &lt;strong&gt;does not introduce a race&lt;/strong&gt;, the barrier was unnecessary overhead.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;__shared__&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;// no cross-thread dependence here&lt;/span&gt;
  &lt;span class="n"&gt;__syncthreads&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;      &lt;span class="c1"&gt;// redundant&lt;/span&gt;
  &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Wu et al.: defines "redundant barrier function."(&lt;a href="https://arxiv.org/pdf/1905.01833" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Simulee: detects redundant barrier bugs and reports numbers across projects.(&lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;zhangyuqun.github.io&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;AuCS: repairs synchronization bugs, including redundant barriers.(&lt;a href="https://www.shinhwei.com/cuda-repair.pdf" rel="noopener noreferrer"&gt;Shinhwei&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;GPURepair tooling also exists to insert/remove barriers to fix races and remove unnecessary ones.(&lt;a href="https://github.com/cs17resch01003/gpurepair" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static/dynamic dependence analysis:&lt;/strong&gt; determine whether the barrier orders any cross-thread read-after-write, write-after-read, or write-after-write dependence; if not, the barrier is removable (Simulee/AuCS angle).(&lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;zhangyuqun.github.io&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;Since barriers are allowed in policy code (with uniform placement enforced by #1), redundant barriers become a performance concern. Use static dependence analysis to detect barriers where no cross-thread data dependence exists between the preceding writes and subsequent reads. The verifier can warn about or reject redundant barriers to enforce &lt;strong&gt;bounded overhead&lt;/strong&gt; as a correctness property, ensuring policies do not introduce unnecessary synchronization cost.&lt;/p&gt;
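
&lt;p&gt;For contrast with the redundant case, a barrier is required whenever a thread reads a slot that another thread wrote. A minimal sketch follows; the kernel name &lt;code&gt;k_needed&lt;/code&gt; and the fixed block size of 256 are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;__global__ void k_needed(int* out) {
  __shared__ int s[256];           // assumes blockDim.x == 256
  int t = threadIdx.x;
  s[t] = t;
  __syncthreads();                 // required: the read below touches another thread's slot
  out[t] = s[(t + 1) % 256];       // cross-thread read-after-write
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A dependence analysis would keep this barrier because the slot read, &lt;code&gt;s[(t + 1) % 256]&lt;/code&gt;, was written by a different thread. When each thread reads back only its own slot &lt;code&gt;s[t]&lt;/code&gt;, no such dependence exists and the barrier can be flagged.&lt;/p&gt;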

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E), Static-heuristic. Static dependence analysis can identify barriers that protect no cross-thread memory dependence, flagging them as redundant. This provides good detection coverage for common patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The verifier performs dependence analysis on barrier sites: if a barrier protects no cross-thread read-after-write, write-after-read, or write-after-write dependence, it is flagged as redundant and rejected. Combined with the policy overhead budget, this ensures barriers are only used when structurally necessary for shared-memory coordination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: Simulee detects redundant barriers through evolutionary simulation; Wu et al. define "redundant barrier function" as a key synchronization bug type; GPURepair uses GPUVerify as an oracle to repair data races/barrier divergence and can remove unnecessary barriers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Some barriers may appear redundant in isolation but are necessary for correctness under specific scheduling scenarios. Conservative analysis may retain some unnecessary barriers; profiling tools can identify remaining optimization opportunities at the kernel level.&lt;/p&gt;




&lt;h3&gt;
  
  
  13) Host-Device Asynchronous Data Races (API ordering bugs) [Correctness, GPU-specific]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;CUDA exposes async kernel launches/memcpy/events; host code can race with device work if synchronization is missing. This is a major real-world bug source in heterogeneous programs and is &lt;em&gt;not&lt;/em&gt; covered by pure kernel-only verifiers.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;d_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;cudaMalloc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;d_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;h_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;cudaMallocHost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;h_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;  &lt;span class="c1"&gt;// pinned host memory, so the copy below is truly asynchronous&lt;/span&gt;
&lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// stream: a cudaStream_t created earlier&lt;/span&gt;
&lt;span class="n"&gt;cudaMemcpyAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;cudaMemcpyDeviceToHost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// missing cudaStreamSynchronize(stream) here&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;h_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;  &lt;span class="c1"&gt;// race: host reads while the kernel or copy may still be in flight&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;CuSan is an open-source detector for "data races between (asynchronous) CUDA calls and the host," using Clang/LLVM instrumentation plus ThreadSanitizer.(&lt;a href="https://github.com/tudasc/cusan" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic detection (CuSan-style):&lt;/strong&gt; instrument host-side CUDA API calls and detect ordering violations at runtime.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;If policies interact with host-visible buffers or involve asynchronous map copies, define a strict &lt;strong&gt;lifetime &amp;amp; ordering contract&lt;/strong&gt; (e.g., "policy writes are only consumed after a guaranteed sync point"). For testing, integrate CuSan into CI for host-side integration tests of the runtime/loader.&lt;/p&gt;
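
&lt;p&gt;One way to encode such an ordering contract on the host side is to expose policy buffers only through a helper that ends in a synchronization point. The sketch below is an assumption about how a loader might do this; &lt;code&gt;read_policy_buffer&lt;/code&gt; is a hypothetical name, and the helper assumes the destination is pinned host memory and that &lt;code&gt;stream&lt;/code&gt; is the stream the producing kernel was launched on.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;#include &amp;lt;cuda_runtime.h&amp;gt;

// Hypothetical host-side helper: copies a policy-written device buffer and returns
// only after the copy has completed, so the caller never reads in-flight data.
cudaError_t read_policy_buffer(int* h_dst, const int* d_src, size_t n,
                               cudaStream_t stream) {
  // Stream order places this copy after any kernel already queued on the same stream.
  cudaError_t err = cudaMemcpyAsync(h_dst, d_src, n * sizeof(int),
                                    cudaMemcpyDeviceToHost, stream);
  if (err != cudaSuccess) return err;
  return cudaStreamSynchronize(stream);  // the guaranteed sync point of the contract
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the synchronization lives inside the helper, callers cannot forget it, which is the property the contract is meant to guarantee.&lt;/p&gt;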

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Host+Device/System (H), Dynamic-only. These races involve host-side API calls (cudaMemcpy, kernel launch, synchronization) interacting with device execution. The policy verifier provides no soundness guarantees for this bug class because host API ordering is out of scope; completeness is N/A, as this is fundamentally a host-side problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The policy verifier cannot provide guarantees for this bug class. It can only ensure policy code doesn't introduce &lt;em&gt;additional&lt;/em&gt; async semantics (e.g., policy writes are only visible after guaranteed sync points). Define strict lifetime &amp;amp; ordering contracts for policy-accessible buffers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: CuSan is the primary tool: an open-source detector for "data races between (asynchronous) CUDA calls and the host," using Clang/LLVM instrumentation plus ThreadSanitizer. Integrate CuSan into CI for host-side integration tests of the runtime/loader.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Dynamic detection covers only executed paths, so its value depends on test coverage. For production, implement runtime checks in the loader/driver for obvious violations (e.g., policy accessing freed memory, missing sync before host read). This is the core tooling requirement for the H track.&lt;/p&gt;




&lt;h3&gt;
  
  
  14) Atomic Contention [Performance, GPU-amplified]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Heavy atomic contention is a classic "performance bug that behaves like a DoS" under massive parallelism. Even when correctness is preserved, contention on a single address can cause extreme slowdowns (orders of magnitude). With millions of threads, a single hot atomic can serialize execution and cause tail latency explosion.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// All threads atomically increment the same location =&amp;gt; extreme contention&lt;/span&gt;
  &lt;span class="n"&gt;atomicAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// Called with &amp;lt;&amp;lt;&amp;lt;1000, 1024&amp;gt;&amp;gt;&amp;gt; =&amp;gt; 1M threads contending on one address&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GPUAtomicContention: an open-source benchmark suite (2025) explicitly measuring atomic performance under contention and across different &lt;strong&gt;memory scopes&lt;/strong&gt; (block/device/system) and access patterns.(&lt;a href="https://github.com/KIT-OSGroup/GPUAtomicContention" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Budget-based verification:&lt;/strong&gt; limit atomic frequency per warp/block.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarking:&lt;/strong&gt; use atomic contention benchmarks to calibrate safe budgets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static analysis:&lt;/strong&gt; identify hot atomic targets and warn about contention risk.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;Treat "atomic frequency + contention risk" as a verifier-enforced budget: e.g., allow at most one global atomic per warp, or require warp-aggregated updates. For evaluation, you can reuse the open benchmark suite to calibrate "safe budgets" per GPU generation. Consider requiring warp-level reduction before global atomics to reduce contention by 32x.&lt;/p&gt;

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Combined → Extension-local (C→E) via budgetization, Static-sound. Contention severity depends on both policy behavior (atomic frequency) and kernel behavior (concurrent atomics to the same address). Treating atomics as a budget reduces this to an Extension-local property, providing strong soundness for the policy's contribution with medium completeness (high-throughput atomic patterns hit budget limits).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The verifier treats "atomic frequency + contention risk" as a budget: (1) limit to N global atomics per warp per invocation; (2) require warp-aggregation (one atomic per warp instead of per-lane), which reduces contention by up to 32x by construction; (3) forbid unbounded atomic loops. The budget provides bounded-overhead guarantees for the policy's contribution regardless of kernel behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: GPUAtomicContention is an open-source benchmark suite (2025) explicitly measuring atomic performance under contention across different memory scopes (block/device/system) and access patterns: use it to calibrate "safe budgets" per GPU generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Total system contention depends on concurrent workloads: the verifier bounds &lt;em&gt;the policy's contribution&lt;/em&gt;, not system-wide slowdown. "Making Powerful Enemies on NVIDIA GPUs" demonstrates that adversarial kernels can systematically amplify interference through shared resource contention, making tight system-wide bounds impossible to guarantee statically.&lt;/p&gt;




&lt;h3&gt;
  
  
  15) Non-Barrier Deadlocks [Safety, GPU-amplified]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Besides barrier divergence (which is specifically about &lt;code&gt;__syncthreads&lt;/code&gt; under divergent control flow), SIMT lockstep can create deadlocks in other patterns that are unusual on CPUs: spin-waiting, lock contention within a warp, and named-barrier misuse. Warp-specialized kernels often use &lt;strong&gt;named barriers&lt;/strong&gt; or structured synchronization patterns between warps/roles (producer/consumer). Bugs include: (a) spin deadlock due to missing signals, (b) unsafe barrier reuse ("recycling") across iterations, (c) races between producers/consumers.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example (spin deadlock)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Block 0 expects Block 1 to set flag, but no global sync exists&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;atomicAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// may spin forever&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="cm"&gt;/* forgot to set flag */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Bug example (named-barrier misuse, sketch)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Producer writes buffer then signals barrier B&lt;/span&gt;
&lt;span class="c1"&gt;// Consumer waits on B then reads buffer&lt;/span&gt;
&lt;span class="c1"&gt;// Bug: consumer waits on wrong barrier instance / reused incorrectly in loop&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;iGUARD notes that lockstep execution can deadlock if threads within a warp use distinct locks.(&lt;a href="https://akkamath.github.io/files/SOSP21_iGUARD.pdf" rel="noopener noreferrer"&gt;Aditya K Kamath&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;GKLEE reports finding deadlocks via symbolic exploration of GPU kernels.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;ESBMC-GPU models and checks deadlock too.(&lt;a href="https://github.com/ssvlab/esbmc-gpu" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;WEFT verifies &lt;strong&gt;deadlock freedom&lt;/strong&gt;, &lt;strong&gt;safe barrier recycling&lt;/strong&gt;, and &lt;strong&gt;race freedom&lt;/strong&gt; for producer-consumer synchronization (named barriers).(&lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;zhangyuqun.github.io&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Protocol verification (WEFT-style):&lt;/strong&gt; for specific synchronization patterns, prove deadlock freedom + race freedom + safe reuse. Model barrier instances across loop iterations and prove safe reuse.(&lt;a href="https://zhangyuqun.github.io/publications/ase2019.pdf" rel="noopener noreferrer"&gt;zhangyuqun.github.io&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Symbolic exploration (GKLEE-style):&lt;/strong&gt; explore possible interleavings and detect deadlock states.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;Ban blocking primitives in policy code (locks, spin loops, waiting on global conditions). Add a verifier rule: &lt;strong&gt;no unbounded loops / no "wait until" patterns&lt;/strong&gt;. If you absolutely need synchronization, force "single-lane, nonblocking" patterns and bounded retries. Policies must not interact with named barriers (no waits, no signals). This aligns with the availability story: policies must not create device stalls.&lt;/p&gt;
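
&lt;p&gt;A "single-lane, nonblocking" pattern with bounded retries might look like the sketch below. The name &lt;code&gt;try_claim&lt;/code&gt; and the retry budget &lt;code&gt;MAX_RETRIES&lt;/code&gt; are hypothetical; the slot convention (0 = free, 1 = taken) is an assumption for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;// Hypothetical retry budget and helper name, for illustration only.
#define MAX_RETRIES 4

// Tries to claim a slot (0 = free, 1 = taken) without ever blocking.
// Intended to be called from a single lane per warp.
__device__ bool try_claim(int* slot) {
  for (int i = 0; i &amp;lt; MAX_RETRIES; ++i) {
    if (atomicCAS(slot, 0, 1) == 0) return true;  // claimed the slot
  }
  return false;  // give up; the caller handles failure instead of waiting
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The loop has a fixed trip count, so the helper always returns after at most &lt;code&gt;MAX_RETRIES&lt;/code&gt; attempts and cannot stall the warp.&lt;/p&gt;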

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E), By-construction. Deadlock patterns (spin-wait, lock contention, named-barrier misuse) are structural properties of policy code; banning blocking primitives makes deadlocks structurally impossible with perfect soundness and high completeness (blocking patterns are rarely needed in policy code).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The verifier bans: (1) &lt;code&gt;while(condition)&lt;/code&gt; loops that could spin indefinitely; (2) lock primitives and mutex-like patterns; (3) named-barrier operations (waits, signals); (4) waiting on global conditions; (5) any construct that could block warp/block execution. If synchronization is needed, force "single-lane, nonblocking" patterns with bounded retries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: ESBMC-GPU models and checks deadlock via bounded model checking; WEFT verifies deadlock freedom, safe barrier recycling, and race freedom for producer-consumer synchronization with named barriers; GKLEE reports finding deadlocks via symbolic exploration. iGUARD notes that lockstep execution can deadlock if threads within a warp use distinct locks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Policies with legitimate bounded-retry patterns must be structured with explicit iteration counts to prove termination. As iGUARD notes, threads in the same warp can deadlock under lockstep execution when they contend on distinct locks, and ITS changes the scheduling assumptions such code relies on. Banning blocking primitives is the only sound approach that avoids complex ITS-aware analysis.&lt;/p&gt;




&lt;h3&gt;
  
  
  16) Kernel Non-Termination / Infinite Loops [Safety, GPU-amplified]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Infinite loops can hang GPU execution. In practice, non-termination is especially dangerous because GPU preemption/recovery can be coarse.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// infinite loop if flag never set&lt;/span&gt;
  &lt;span class="c1"&gt;// or: while (true) { /* missing break */ }&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;CL-Vis explicitly calls out infinite loops (together with barrier divergence) as GPU-specific bug types to detect/handle.(&lt;a href="https://cai.type.sk/content/2019/1/cl-vis-visualization-platform-for-understanding-and-checking-the-opencl-programs/4318.pdf" rel="noopener noreferrer"&gt;Computing and Informatics&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static bounds analysis:&lt;/strong&gt; prove loop termination or enforce compile-time bounded loops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime watchdog:&lt;/strong&gt; timeout-based detection (coarse but practical).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;This is where "bounded overhead = correctness" is easiest to justify: enforce a &lt;strong&gt;strict instruction/iteration bound&lt;/strong&gt; for policy code (like eBPF on CPU). If policies may contain loops, require compile-time bounded loops only, with conservative upper bounds.&lt;/p&gt;
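
&lt;p&gt;A minimal sketch of what a compile-time bounded policy loop could look like. The budget &lt;code&gt;kMaxPolicyIters&lt;/code&gt; and the hook &lt;code&gt;policy_count_hot&lt;/code&gt; are hypothetical names chosen for illustration, not part of any existing framework. The only point is that the trip count is clamped to a constant, so termination does not depend on the data:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;cuda_runtime.h&amp;gt;

// Hypothetical verifier-imposed iteration budget (assumption for this sketch).
constexpr int kMaxPolicyIters = 64;

// Hypothetical policy hook: inspects at most kMaxPolicyIters entries.
// The loop bound is clamped to a constant, so the loop provably terminates.
__device__ int policy_count_hot(const int* vals, int n, int threshold) {
  int limit = n;
  if (limit &amp;gt; kMaxPolicyIters) limit = kMaxPolicyIters;
  int hot = 0;
  for (int i = 0; i &amp;lt; limit; ++i) {
    if (vals[i] &amp;gt; threshold) ++hot;
  }
  return hot;
}

__global__ void k(const int* vals, int n, int threshold, int* out) {
  // A single thread runs the policy here to keep the example race-free.
  if (blockIdx.x == 0 &amp;amp;&amp;amp; threadIdx.x == 0) {
    *out = policy_count_hot(vals, n, threshold);
  }
}

int main() {
  const int n = 1000;
  int* vals;
  int* out;
  cudaMallocManaged(&amp;amp;vals, n * sizeof(int));
  cudaMallocManaged(&amp;amp;out, sizeof(int));
  for (int i = 0; i &amp;lt; n; ++i) vals[i] = i;
  *out = 0;
  k&amp;lt;&amp;lt;&amp;lt;1, 32&amp;gt;&amp;gt;&amp;gt;(vals, n, 10, out);
  cudaDeviceSynchronize();
  printf("hot entries within budget: %d\n", *out);
  cudaFree(vals);
  cudaFree(out);
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;By contrast, a loop whose exit depends on the data, such as &lt;code&gt;while (vals[i] != 0)&lt;/code&gt;, would be rejected under this rule even if it always terminates in practice.&lt;/p&gt;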

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E) for policy code; kernel non-termination is out of scope. Static-sound: bounded loops or an instruction budget guarantee policy termination. Soundness is strong but completeness is low, because loops with data-dependent bounds are rejected even when they always terminate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The eBPF approach works: (1) all loops must have compile-time bounded iteration counts; OR (2) ban loops entirely; OR (3) enforce a total instruction budget. The verifier proves termination by construction without analyzing the kernel. Policies may contain loops only if bounds can be statically determined.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: ESBMC-GPU can find non-termination paths within context bounds; CL-Vis explicitly calls out infinite loops (together with barrier divergence) as GPU-specific bug types to detect; runtime watchdogs provide coarse timeout-based detection (engineering stopgap, not completeness).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: The verifier guarantees &lt;em&gt;policy&lt;/em&gt; termination, not &lt;em&gt;kernel&lt;/em&gt; termination. If the kernel itself has infinite loops, the policy verifier cannot and should not try to detect this; that's a kernel bug requiring kernel-level tools. This is "bounded overhead = correctness" at its most justified.&lt;/p&gt;




&lt;h3&gt;
  
  
  17) Global-Memory Data Races [Correctness, CPU-shared]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Races on global memory are a fundamental correctness issue. Unlike shared memory (block-local), global memory is accessible by all threads across all blocks, making races harder to reason about. Many GPU race detectors historically focused on shared memory and ignored global-memory races.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// Multiple threads may write to same location without sync&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// race if multiple threads hit same index&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;ScoRD explicitly argues that many GPU race detectors focus on shared memory and ignore global-memory races.(&lt;a href="https://www.csa.iisc.ac.in/~arkapravab/papers/isca20_ScoRD.pdf" rel="noopener noreferrer"&gt;CSA - IISc Bangalore&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;iGUARD targets races in global memory introduced by advanced CUDA features.(&lt;a href="https://akkamath.github.io/files/SOSP21_iGUARD.pdf" rel="noopener noreferrer"&gt;Aditya K Kamath&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;GKLEE reports global memory races via symbolic exploration.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static verification:&lt;/strong&gt; extend race-freedom proofs to global memory accesses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic detection:&lt;/strong&gt; instrument global memory accesses and track conflicting pairs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;If policies can write to global memory (maps, counters, logs), require either: (1) warp-uniform single-writer rules, (2) atomic-only helpers, or (3) per-thread/per-warp sharding. Ban unprotected global writes from policies.&lt;/p&gt;
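
&lt;p&gt;To make two of these patterns concrete, here is a hedged sketch of the racy counter from the bug example rewritten with an atomic-only helper and with per-thread sharding. The helper name &lt;code&gt;policy_counter_add&lt;/code&gt; and the constant &lt;code&gt;kBins&lt;/code&gt; are assumptions made for this example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;cuda_runtime.h&amp;gt;

constexpr int kBins = 16;

// Pattern (2), atomic-only helper (hypothetical name): the only way
// policy code may update a shared counter.
__device__ void policy_counter_add(int* counters, int bin, int delta) {
  atomicAdd(&amp;amp;counters[bin], delta);
}

__global__ void k_atomic(int* g, int n) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  // Many threads map to the same bin, but every update is atomic.
  if (tid &amp;lt; n) policy_counter_add(g, tid % kBins, 1);
}

// Pattern (3), per-thread sharding: each thread owns exactly one slot,
// so no two threads ever write the same address.
__global__ void k_sharded(int* shards, int n) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid &amp;lt; n) shards[tid] += 1;
}

int main() {
  const int n = 1024;
  int* g;
  int* shards;
  cudaMallocManaged(&amp;amp;g, kBins * sizeof(int));
  cudaMallocManaged(&amp;amp;shards, n * sizeof(int));
  for (int i = 0; i &amp;lt; kBins; ++i) g[i] = 0;
  for (int i = 0; i &amp;lt; n; ++i) shards[i] = 0;
  k_atomic&amp;lt;&amp;lt;&amp;lt;4, 256&amp;gt;&amp;gt;&amp;gt;(g, n);
  k_sharded&amp;lt;&amp;lt;&amp;lt;4, 256&amp;gt;&amp;gt;&amp;gt;(shards, n);
  cudaDeviceSynchronize();
  // With atomics, each bin ends at n / kBins; the racy version may lose updates.
  printf("g[0] = %d, shards[0] = %d\n", g[0], shards[0]);
  cudaFree(g);
  cudaFree(shards);
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Sharding avoids atomic contention at the cost of memory and a later reduction step; the atomic helper keeps state compact but serializes conflicting updates.&lt;/p&gt;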

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Combined → Extension-local (C→E) via state isolation; Static-sound. If policies can write arbitrary kernel global memory, race analysis requires knowing the kernel's access patterns (Combined). Restricting policies to write only policy-owned objects reduces the problem to Extension-local. Soundness is strong under isolation; completeness is low for kernel-modifying policies, because direct kernel writes still require Combined analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: Restricting policies to write only policy-owned objects (maps, ringbuffers) enables Extension-local verification: (1) policy-owned objects use known-safe access patterns (atomics, per-warp sharding); (2) the verifier guarantees race-freedom for policy state without inspecting the kernel; (3) unprotected global writes from policies are banned. The three safe patterns are warp-uniform single-writer rules, atomic-only helpers, and per-thread/per-warp sharding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: ScoRD explicitly argues that many GPU race detectors focus on shared memory and ignore global-memory races, and provides detection with scope awareness; iGUARD targets races in global memory introduced by advanced CUDA features via NVBit instrumentation; GKLEE reports global memory races via symbolic exploration. Note: Compute Sanitizer &lt;code&gt;racecheck&lt;/code&gt; is primarily a shared-memory hazard detector; do not expect it to fully cover global races.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Policies needing to modify kernel data structures directly cannot be verified locally; this capability should be restricted or require explicit kernel-side contracts. ScoRD/iGUARD emphasize global-memory races are underdetected by existing tools; state isolation sidesteps this entirely for policy code.&lt;/p&gt;




&lt;h3&gt;
  
  
  18) Memory Safety (Out-of-Bounds / Misaligned / Use-After-Free / Use-After-Scope / Uninitialized) [Safety, CPU-shared]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Classic memory safety includes both &lt;strong&gt;spatial&lt;/strong&gt; (OOB, misaligned) and &lt;strong&gt;temporal&lt;/strong&gt; (UAF, UAS) violations. Temporal bugs exist on GPUs too: pointers can outlive allocations (host frees while kernel still uses, device-side stack frame returns, etc.).&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example (OOB)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// OOB write: index is never checked against n&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Bug example (Use-After-Scope)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__device__&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;bad&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// returns pointer to dead stack frame (UAS)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bad&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;          &lt;span class="c1"&gt;// UAS read&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Compute Sanitizer &lt;code&gt;memcheck&lt;/code&gt; precisely detects OOB/misaligned accesses (and can detect memory leaks).(&lt;a href="https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html" rel="noopener noreferrer"&gt;NVIDIA Docs&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Oclgrind reports invalid memory accesses in its simulator.(&lt;a href="https://github.com/jrprice/Oclgrind" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;ESBMC-GPU checks pointer safety and array bounds as part of its model checking.(&lt;a href="https://github.com/ssvlab/esbmc-gpu" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;GKLEE's evaluation includes out-of-bounds global memory accesses as error cases.(&lt;a href="https://lingming.cs.illinois.edu/publications/icse2020b.pdf" rel="noopener noreferrer"&gt;Lingming Zhang&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Wu et al.: "unauthorized memory access" appears in root-cause characterization.(&lt;a href="https://arxiv.org/pdf/1905.01833" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;cuCatch explicitly targets temporal violations using tagging mechanisms and discusses UAF/UAS detection.(&lt;a href="https://d1qx31qr3h6wln.cloudfront.net/publications/PLDI_2023_cuCatch_2.pdf" rel="noopener noreferrer"&gt;d1qx31qr3h6wln.cloudfront.net&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Guardian: PTX-level instrumentation + interception to fence illegal memory accesses under GPU sharing.(&lt;a href="https://arxiv.org/pdf/2401.09290" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bounds-check instrumentation (Guardian/cuCatch-style):&lt;/strong&gt; insert base+bounds checks (or partition fencing) around loads/stores, for example via PTX-level instrumentation and interception, to fence illegal accesses.(&lt;a href="https://arxiv.org/pdf/2401.09290" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal tagging + runtime checks (cuCatch-style):&lt;/strong&gt; tag allocations to track ownership, then validate the tag and access rights before each dereference.(&lt;a href="https://d1qx31qr3h6wln.cloudfront.net/publications/PLDI_2023_cuCatch_2.pdf" rel="noopener noreferrer"&gt;d1qx31qr3h6wln.cloudfront.net&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static verification (ESBMC-GPU):&lt;/strong&gt; model checking for pointer safety and array bounds.(&lt;a href="https://github.com/ssvlab/esbmc-gpu" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;This is the "classic verifier" portion: keep eBPF-like pointer tracking, bounds checks, and restricted helpers. The simplest rule for policies is to &lt;strong&gt;ban arbitrary pointer dereferences&lt;/strong&gt; and force all memory access through safe helpers (maps/ringbuffers). Ideally, policies cannot allocate or free memory: all policy-visible objects are managed by the extension runtime and remain valid throughout policy execution, which rules out UAF/UAS by construction. For regression testing, also run policy-enabled kernels under Compute Sanitizer &lt;code&gt;memcheck&lt;/code&gt; in CI.&lt;/p&gt;
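
&lt;p&gt;The helper-only rule can be sketched as follows. &lt;code&gt;PolicyMap&lt;/code&gt;, &lt;code&gt;policy_map_lookup&lt;/code&gt;, and &lt;code&gt;policy_map_add&lt;/code&gt; are hypothetical names invented for this example, and the sketch assumes the runtime (here, &lt;code&gt;main&lt;/code&gt;) owns the allocation for the whole kernel lifetime:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;cuda_runtime.h&amp;gt;

// Hypothetical runtime-owned map. The runtime allocates `slots` before
// launch and frees it after completion, so policy code never frees it.
struct PolicyMap {
  unsigned int* slots;
  unsigned int capacity;
};

// Hypothetical helpers: the only way policy code touches map memory.
// Both reject out-of-range keys instead of dereferencing.
__device__ bool policy_map_lookup(const PolicyMap&amp;amp; m, unsigned int key,
                                  unsigned int* value) {
  if (key &amp;gt;= m.capacity) return false;  // "not found" rather than OOB read
  *value = m.slots[key];
  return true;
}

__device__ bool policy_map_add(const PolicyMap&amp;amp; m, unsigned int key,
                               unsigned int delta) {
  if (key &amp;gt;= m.capacity) return false;  // rejected rather than OOB write
  atomicAdd(&amp;amp;m.slots[key], delta);
  return true;
}

__global__ void k(PolicyMap m) {
  unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
  // Same index expression as the OOB bug example; the helper bounds it.
  policy_map_add(m, tid + 1024, 1u);
}

int main() {
  PolicyMap m;
  m.capacity = 2048;
  cudaMallocManaged(&amp;amp;m.slots, m.capacity * sizeof(unsigned int));
  for (unsigned int i = 0; i &amp;lt; m.capacity; ++i) m.slots[i] = 0;
  // 2048 threads produce keys 1024..3071; keys at or above 2048 are rejected.
  k&amp;lt;&amp;lt;&amp;lt;8, 256&amp;gt;&amp;gt;&amp;gt;(m);
  cudaDeviceSynchronize();
  printf("slots[0] = %u, slots[1024] = %u\n", m.slots[0], m.slots[1024]);
  cudaFree(m.slots);
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the capacity check lives inside the helper, a verifier only needs to confirm that policy code never forms its own pointers into &lt;code&gt;slots&lt;/code&gt;.&lt;/p&gt;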

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E) for policy memory. Static-sound for spatial safety (helper-only access with tracked bounds); By-construction for temporal safety (runtime-managed objects, no policy malloc/free). Strong soundness with low completeness (raw pointer arithmetic rejected).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The eBPF approach: (1) ban arbitrary pointer dereferencing; (2) all memory access through verified helpers (map lookup, ringbuffer write); (3) verifier tracks pointer provenance and bounds; (4) policy-visible objects are runtime-managed (no policy malloc/free): UAF/UAS impossible by construction because objects remain valid for the policy's lifetime. This provides strong memory safety for policy code without analyzing the kernel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: Compute Sanitizer &lt;code&gt;memcheck&lt;/code&gt; precisely detects OOB/misaligned accesses and memory leaks; cuCatch explicitly targets temporal violations using tagged base&amp;amp;bounds mechanisms and discusses UAF/UAS detection (some deterministic, some probabilistic); ESBMC-GPU checks pointer safety and array bounds via bounded model checking; GKLEE's evaluation includes out-of-bounds global memory accesses as error cases; Wu et al. characterize "unauthorized memory access" in their root-cause analysis; Guardian provides PTX-level instrumentation + interception for multi-tenant memory isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Policy memory safety doesn't protect against &lt;em&gt;kernel&lt;/em&gt; bugs. For multi-tenant fault isolation in spatial sharing (streams/MPS), Guardian-style PTX instrumentation or hardware isolation is needed to prevent one tenant's OOB from crashing others: policy verification alone is insufficient for system-wide isolation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Multi-tenant implications
&lt;/h4&gt;

&lt;p&gt;In spatial sharing (streams/MPS), kernels share a GPU address space. An OOB access by one application can crash other co-running applications (fault isolation issue). Guardian's motivation explicitly calls out this problem and designs PTX-level fencing + interception as a fix.(&lt;a href="https://arxiv.org/pdf/2401.09290" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;) This directly supports the "availability is correctness" story: if policies run in privileged/shared contexts, you must prevent policy code from generating OOB accesses. Either: (a) only allow map helpers (no raw memory), or (b) instrument policy memory ops with bounds checks (Guardian-style PTX rewriting).&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example (multi-tenant OOB, conceptual)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Tenant A kernel writes OOB and corrupts Tenant B memory in same context.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Bug example (Uninitialized Memory)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// 'in' was cudaMalloc'd but never initialized or memset&lt;/span&gt;
  &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// reading uninitialized memory&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Uninitialized Memory: additional notes
&lt;/h4&gt;

&lt;p&gt;Accessing device global memory without initialization leads to nondeterministic behavior. This is a frequent source of heisenbugs because GPU concurrency amplifies nondeterminism. Compute Sanitizer &lt;code&gt;initcheck&lt;/code&gt; reports cases where device global memory is accessed without being initialized.(&lt;a href="https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html" rel="noopener noreferrer"&gt;NVIDIA Docs&lt;/a&gt;) For policies, require explicit initialization semantics (e.g., map lookup returns "not found" unless initialized; forbid reading uninitialized slots).&lt;/p&gt;
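
&lt;p&gt;As a small illustration of explicit initialization on the host side, the sketch below zeroes the input buffer with &lt;code&gt;cudaMemset&lt;/code&gt; before the kernel reads it. The buffer size and launch shape are arbitrary choices for the example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;cuda_runtime.h&amp;gt;

__global__ void k(float* out, const float* in, int n) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid &amp;lt; n) out[tid] = in[tid] * 2.0f;
}

int main() {
  const int n = 256;
  float* in;
  float* out;
  cudaMalloc(&amp;amp;in, n * sizeof(float));
  cudaMalloc(&amp;amp;out, n * sizeof(float));
  // cudaMalloc does not clear memory; initialize the input before any read.
  // All-zero bytes represent 0.0f.
  cudaMemset(in, 0, n * sizeof(float));
  k&amp;lt;&amp;lt;&amp;lt;1, n&amp;gt;&amp;gt;&amp;gt;(out, in, n);
  float host[n];
  cudaMemcpy(host, out, sizeof(host), cudaMemcpyDeviceToHost);
  printf("out[0] = %f\n", host[0]);
  cudaFree(in);
  cudaFree(out);
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Removing the &lt;code&gt;cudaMemset&lt;/code&gt; call and running the binary under &lt;code&gt;compute-sanitizer --tool initcheck&lt;/code&gt; is one way to see the kind of report described above.&lt;/p&gt;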




&lt;h3&gt;
  
  
  19) Arithmetic Errors (overflow, division by zero) [Correctness/Safety, CPU-shared]
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What it is / why it matters
&lt;/h4&gt;

&lt;p&gt;Arithmetic errors can corrupt keys and indices, and then cascade into memory-safety violations or performance problems.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bug example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;&lt;span class="k"&gt;__global__&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;divisor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blockIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;blockDim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;threadIdx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;divisor&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// div-by-zero if divisor == 0&lt;/span&gt;

  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// overflow for large tid&lt;/span&gt;
  &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                  &lt;span class="c1"&gt;// corrupted index =&amp;gt; OOB&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Seen in / checked by
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;ESBMC-GPU explicitly lists arithmetic overflow and division-by-zero among the properties it checks for CUDA programs (alongside races/deadlocks/bounds).(&lt;a href="https://github.com/ssvlab/esbmc-gpu" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Checking approach
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model checking (ESBMC-GPU):&lt;/strong&gt; static verification of arithmetic properties.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight runtime checks:&lt;/strong&gt; guard div/mod operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Verification strategy
&lt;/h4&gt;

&lt;p&gt;Optional but reviewer-friendly: add lightweight verifier checks for div-by-zero and dangerous shifts, and constrain pointer arithmetic (already typical in eBPF verifiers). For "perf correctness," overflow in index computations is a common hidden cause of random/uncoalesced patterns.&lt;/p&gt;
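
&lt;p&gt;A hedged sketch of what runtime guards for the bug example could look like. The helper &lt;code&gt;policy_safe_div&lt;/code&gt; is a hypothetical name, and returning 0 for a zero divisor is an arbitrary convention chosen for this sketch; the extra &lt;code&gt;marks&lt;/code&gt; array is also an assumption, added so the guarded index write does not alias &lt;code&gt;out&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cuda"&gt;&lt;code&gt;#include &amp;lt;climits&amp;gt;
#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;cuda_runtime.h&amp;gt;

// Hypothetical guarded division helper.
__device__ int policy_safe_div(int num, int den) {
  if (den == 0) return 0;  // guard: convention chosen for this sketch
  // INT_MIN / -1 is the one signed quotient that overflows; saturate it.
  if (num == INT_MIN &amp;amp;&amp;amp; den == -1) return INT_MAX;
  return num / den;
}

__global__ void k(int* out, int* marks, const int* in, int divisor, int n) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid &amp;gt;= n) return;
  out[tid] = policy_safe_div(in[tid], divisor);

  // Widen to 64 bits before multiplying so the product cannot wrap,
  // then range-check the result before using it as an index.
  long long idx = static_cast&amp;lt;long long&amp;gt;(tid) * 1000000LL;
  if (idx &amp;lt; n) marks[idx] = 1;
}

int main() {
  const int n = 256;
  int* out;
  int* marks;
  int* in;
  cudaMallocManaged(&amp;amp;out, n * sizeof(int));
  cudaMallocManaged(&amp;amp;marks, n * sizeof(int));
  cudaMallocManaged(&amp;amp;in, n * sizeof(int));
  for (int i = 0; i &amp;lt; n; ++i) { in[i] = i; out[i] = 0; marks[i] = 0; }
  k&amp;lt;&amp;lt;&amp;lt;1, n&amp;gt;&amp;gt;&amp;gt;(out, marks, in, 0, n);  // divisor 0 exercises the guard
  cudaDeviceSynchronize();
  printf("out[5] = %d, marks[0] = %d, marks[1] = %d\n", out[5], marks[0], marks[1]);
  cudaFree(out);
  cudaFree(marks);
  cudaFree(in);
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A static verifier could drop either guard whenever range analysis already proves the divisor is nonzero or the index is in bounds.&lt;/p&gt;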

&lt;h4&gt;
  
  
  Verification scope analysis
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scope &amp;amp; Assurance&lt;/strong&gt;: Extension-local (E), Static-sound. Arithmetic errors depend only on the policy's operations and input value ranges, providing strong soundness via range analysis with medium completeness (complex arithmetic may require explicit assertions).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production guarantee&lt;/strong&gt;: The verifier performs lightweight static checks: (1) division: require static proof that divisor ≠ 0, or insert runtime guards; (2) overflow: use saturating arithmetic, or prove bounds on operands; (3) dangerous shifts: validate shift amounts; (4) index arithmetic: track value ranges to catch OOB before memory access. This is already typical in eBPF verifiers and adds minimal overhead to policy verification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline/CI tools&lt;/strong&gt;: ESBMC-GPU explicitly lists arithmetic overflow and division-by-zero among the properties it checks for CUDA programs (alongside races/deadlocks/bounds) via bounded model checking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Residual gap&lt;/strong&gt;: Policies with complex arithmetic that happens to be safe may need explicit assertions or be conservatively rejected. Cascade risk: arithmetic errors often cascade into memory safety bugs (corrupted indices → OOB) or performance bugs (overflow in index computations causing random/uncoalesced patterns). The verifier should track value ranges through index computations proactively to catch these before they become downstream violations.&lt;/p&gt;




&lt;h3&gt;
  
  
  Summary: Improper Synchronization as a Root-Cause Category (Wu et al.'s Three-Way Taxonomy)
&lt;/h3&gt;

&lt;p&gt;Wu et al.'s empirical study explicitly groups CUDA-specific synchronization issues into three concrete bug types: &lt;strong&gt;data race&lt;/strong&gt;, &lt;strong&gt;barrier divergence&lt;/strong&gt;, and &lt;strong&gt;redundant barrier functions&lt;/strong&gt;. They also highlight that these often manifest as inferior performance and flaky tests. Simulee is used to find these categories in real projects.(&lt;a href="https://arxiv.org/pdf/1905.01833" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;This is exactly the "verification story" hook: a GPU extension verifier can claim that policy code cannot introduce these synchronization root causes because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;barriers are allowed only at provably uniform control-flow points,&lt;/li&gt;
&lt;li&gt;side effects are enforced to be warp-uniform,&lt;/li&gt;
&lt;li&gt;helper calls are bounded,&lt;/li&gt;
&lt;li&gt;and policies use a restricted memory model.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Summary: Verification Scope and Assurance Types
&lt;/h3&gt;

&lt;p&gt;Classifying each bug class by verification scope and by assurance type shows how much a GPU extension verifier can guarantee on its own, and where it needs contracts or host-side tools.&lt;/p&gt;

&lt;h4&gt;
  
  
  By Verification Scope
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Extension-local (E): 16 of 19 classes (12 directly, 4 via state isolation):&lt;/strong&gt;&lt;br&gt;
Bugs #1, #2, #4, #6, #7, #10, #11, #12, #15, #16, #18, #19 can be eliminated purely by restricting policy code, without inspecting the host kernel. Additionally, bugs #3, #5, #14, #17 can be &lt;strong&gt;reduced from Combined to Extension-local&lt;/strong&gt; through state isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Combined (C): 2 classes requiring contracts:&lt;/strong&gt;&lt;br&gt;
Bugs #8 (block-size dependence) and #9 (launch config assumptions) fundamentally depend on kernel launch parameters. These require contract-based validation at attach time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Host+Device (H): 1 class requiring host-side tools:&lt;/strong&gt;&lt;br&gt;
Bug #13 (host↔device async races) cannot be addressed by device-side verification. Requires CuSan/TSan and careful API design.&lt;/p&gt;

&lt;h4&gt;
  
  
  By Assurance Type
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Assurance Type&lt;/th&gt;
&lt;th&gt;Bug Classes&lt;/th&gt;
&lt;th&gt;Soundness&lt;/th&gt;
&lt;th&gt;Completeness&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;By-construction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#2, #10, #15&lt;/td&gt;
&lt;td&gt;Perfect&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Static-sound&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#1, #3, #4, #5, #6, #11, #14, #16, #17, #18, #19&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Low-Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Static-heuristic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#7, #12&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Contract-based&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#8, #9&lt;/td&gt;
&lt;td&gt;Conditional&lt;/td&gt;
&lt;td&gt;Depends on contracts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dynamic-only&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#13&lt;/td&gt;
&lt;td&gt;Executed paths only&lt;/td&gt;
&lt;td&gt;Coverage-dependent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  The Three-Stage Verification Pipeline
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Load-time static verifier (core, analogous to eBPF verifier)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The load-time verifier employs three tiers of analysis, ranging from outright bans on genuinely dangerous constructs to static analysis that preserves useful functionality:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Tier A — By-construction bans (3 classes, no legitimate policy use):&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ban warp sync primitives (#2) — mask correctness is unverifiable without ITS-aware analysis&lt;/li&gt;
&lt;li&gt;Ban spin-wait / polling loops (#10) — causes stale reads and ad-hoc synchronization&lt;/li&gt;
&lt;li&gt;Ban blocking primitives: locks, mutexes, named barriers (#15) — prevents non-barrier deadlocks&lt;/li&gt;
&lt;/ul&gt;
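
&lt;p&gt;A minimal Python sketch of what a Tier A pass could look like, assuming the verifier sees a policy as a flat list of operation names and that spin-wait loops have already been recognized upstream as a single op. The op names and the &lt;code&gt;BANNED&lt;/code&gt; table are hypothetical placeholders, not a real IR or an actual rule set:&lt;/p&gt;

```python
# Hypothetical op names for illustration; a real verifier would match IR or PTX opcodes.
BANNED = {
    "warp_sync": "#2 warp sync primitive",
    "spin_wait": "#10 spin-wait / polling loop",
    "lock_acquire": "#15 blocking primitive",
    "named_barrier": "#15 blocking primitive",
}


def tier_a_violations(ops):
    """Return (index, op, reason) for every banned op found in the policy."""
    return [(i, op, BANNED[op]) for i, op in enumerate(ops) if op in BANNED]


policy = ["load", "atomic_add", "spin_wait", "store"]
for index, op, reason in tier_a_violations(policy):
    print(f"reject: op {index} ({op}) is banned: {reason}")
```

&lt;p&gt;Because Tier A is a pure membership check, it needs no knowledge of the host kernel, which is what makes these three classes cheap to rule out by construction.&lt;/p&gt;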

&lt;p&gt;&lt;em&gt;Tier B — Static-sound analysis (11 classes, allow but verify safe usage):&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Verification capability&lt;/th&gt;
&lt;th&gt;Bug classes covered&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Uniform control-flow analysis&lt;/td&gt;
&lt;td&gt;#1 barrier divergence, #4 warp-divergence race, #6 control-flow divergence&lt;/td&gt;
&lt;td&gt;Prove barriers are at uniform points; side-effects on uniform paths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory access pattern analysis&lt;/td&gt;
&lt;td&gt;#5 uncoalesced access&lt;/td&gt;
&lt;td&gt;Check stride patterns; reject non-conforming index expressions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Race-freedom structural rules&lt;/td&gt;
&lt;td&gt;#11 shared-mem races, #17 global-mem races&lt;/td&gt;
&lt;td&gt;Per-lane sharding / lane0-only / atomic helpers + state isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scope enforcement&lt;/td&gt;
&lt;td&gt;#3 atomic scope&lt;/td&gt;
&lt;td&gt;Force device-scope for policy atomics + state isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pointer/memory safety&lt;/td&gt;
&lt;td&gt;#18 memory safety&lt;/td&gt;
&lt;td&gt;Restrict pointer operations, analogous to eBPF pointer verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Loop termination&lt;/td&gt;
&lt;td&gt;#16 non-termination&lt;/td&gt;
&lt;td&gt;Enforce bounded iteration counts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Range analysis&lt;/td&gt;
&lt;td&gt;#19 arithmetic errors&lt;/td&gt;
&lt;td&gt;Track value ranges to prevent overflow cascading into OOB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource budgets&lt;/td&gt;
&lt;td&gt;#14 atomic contention&lt;/td&gt;
&lt;td&gt;Limit atomic counts / enforce warp-aggregation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
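
&lt;p&gt;To illustrate the range-analysis row, here is a small interval-arithmetic sketch in Python. It assumes an index of the form &lt;code&gt;lane * stride + offset&lt;/code&gt; and uses unbounded Python integers, so it does not model fixed-width overflow, which a real analysis would have to do. The &lt;code&gt;Interval&lt;/code&gt; class and &lt;code&gt;index_is_safe&lt;/code&gt; helper are hypothetical names:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed integer range [lo, hi] tracked for one value."""
    lo: int
    hi: int

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The product range is bounded by the products of the endpoints.
        corners = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(min(corners), max(corners))

    def within(self, lo, hi):
        return self.lo >= lo and hi >= self.hi


def index_is_safe(lane, stride, offset, length):
    """Accept only if lane * stride + offset provably stays in [0, length - 1]."""
    index = lane * stride + offset
    return index.within(0, length - 1)


lane = Interval(0, 31)
print(index_is_safe(lane, Interval(4, 4), Interval(0, 3), 128))  # largest index is 127
print(index_is_safe(lane, Interval(4, 4), Interval(0, 4), 128))  # largest index is 128
```

&lt;p&gt;The check is conservative: it may reject a safe policy whose index it cannot bound, but it never accepts one whose computed range leaves the buffer.&lt;/p&gt;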

&lt;p&gt;&lt;em&gt;Tier C — Static-heuristic detection (2 classes, performance warnings/rejections):&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;#7 bank conflicts → check shared-memory index stride against bank mapping&lt;/li&gt;
&lt;li&gt;#12 redundant barriers → dependence analysis to determine if a barrier protects actual cross-thread dependencies&lt;/li&gt;
&lt;/ul&gt;
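
&lt;p&gt;The bank-conflict check can be sketched by simply counting, for one warp, how many distinct word addresses land in the same bank. The sketch below is Python and assumes 32 banks of 4-byte words and a 32-lane warp with a constant per-lane stride; real hardware parameters and index expressions may differ:&lt;/p&gt;

```python
from collections import Counter

NUM_BANKS = 32  # assumed: 32 banks, each one 4-byte word wide
WARP_SIZE = 32


def conflict_degree(stride_words):
    """Largest number of distinct word addresses in one warp that share a bank."""
    # Lanes reading the same address are a broadcast, so count addresses once.
    addresses = {lane * stride_words for lane in range(WARP_SIZE)}
    per_bank = Counter(addr % NUM_BANKS for addr in addresses)
    return max(per_bank.values())


for stride in (1, 2, 32, 33):
    print(stride, conflict_degree(stride))
```

&lt;p&gt;A heuristic pass could warn when the degree exceeds some threshold; the threshold itself is a policy decision and is not specified here.&lt;/p&gt;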

&lt;p&gt;&lt;strong&gt;Stage 2: Attach-time contract validation (2 classes)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;#8 block-size dependence → policy declares preconditions (e.g., &lt;code&gt;requires: blockDim.x &amp;gt;= 128&lt;/code&gt;), validated when attaching to a specific kernel&lt;/li&gt;
&lt;li&gt;#9 launch config assumptions → validate grid/block dimensions satisfy policy preconditions&lt;/li&gt;
&lt;/ul&gt;
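
&lt;p&gt;A sketch of attach-time contract validation in Python, assuming a policy declares its preconditions as simple fields. The &lt;code&gt;Contract&lt;/code&gt; and &lt;code&gt;LaunchConfig&lt;/code&gt; types and their field names are hypothetical and stand in for whatever declaration format a real system would use:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchConfig:
    block_dim_x: int
    grid_dim_x: int


@dataclass(frozen=True)
class Contract:
    # Hypothetical precondition fields a policy might declare.
    min_block_dim_x: int = 1
    block_dim_x_multiple_of: int = 1

    def violations(self, cfg):
        """List every precondition the launch configuration fails."""
        problems = []
        if not cfg.block_dim_x >= self.min_block_dim_x:
            problems.append(
                f"requires blockDim.x >= {self.min_block_dim_x}, got {cfg.block_dim_x}"
            )
        if cfg.block_dim_x % self.block_dim_x_multiple_of != 0:
            problems.append(
                f"requires blockDim.x to be a multiple of {self.block_dim_x_multiple_of}"
            )
        return problems


def attach(contract, cfg):
    """Refuse to attach the policy when any declared precondition is violated."""
    problems = contract.violations(cfg)
    if problems:
        raise ValueError("; ".join(problems))
    return True


contract = Contract(min_block_dim_x=128, block_dim_x_multiple_of=32)
print(attach(contract, LaunchConfig(block_dim_x=256, grid_dim_x=64)))
print(contract.violations(LaunchConfig(block_dim_x=96, grid_dim_x=64)))
```

&lt;p&gt;Checking at attach time rather than load time matters because the same verified policy may be attached to kernels with different launch configurations.&lt;/p&gt;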

&lt;p&gt;&lt;strong&gt;Stage 3: CI/Offline + Runtime (complementary coverage)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;#13 host↔device async races → CuSan/TSan dynamic detection, beyond device-side verification scope&lt;/li&gt;
&lt;li&gt;GPUVerify/ESBMC-GPU for kernel+extension combined analysis (when source is available)&lt;/li&gt;
&lt;li&gt;Compute Sanitizer suite for dynamic regression testing&lt;/li&gt;
&lt;li&gt;iGUARD/Simulee for advanced race detection&lt;/li&gt;
&lt;li&gt;Runtime overhead enforcement for multi-tenant isolation (Guardian-style)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The eBPF Lesson Applied to GPUs
&lt;/h4&gt;

&lt;p&gt;Just as eBPF succeeds by restricting extension capabilities to what can be verified without inspecting the kernel, a GPU extension verifier should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ban only what is genuinely dangerous and unnecessary&lt;/strong&gt; — warp sync, spin-wait, and blocking primitives have no legitimate use in policy code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use static analysis to allow useful features safely&lt;/strong&gt; — barriers, shared memory, and atomics are valuable; verify their safe usage rather than banning them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolate policy state&lt;/strong&gt; to reduce Combined bugs to Extension-local&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforce warp-uniformity&lt;/strong&gt; for side effects, bounding SIMT-amplified overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use budgets&lt;/strong&gt; for performance-affecting resources (atomics, memory ops)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Require contracts&lt;/strong&gt; only for unavoidably Combined properties (#8, #9)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key design principle is &lt;em&gt;not&lt;/em&gt; to ban everything that could go wrong, but to apply the right level of restriction for each risk: outright bans for constructs with no legitimate policy use, static verification for useful but dangerous features, and heuristic detection for performance concerns. This preserves policy expressiveness while maintaining soundness for safety-critical GPU extensions.&lt;/p&gt;




</description>
      <category>ebpf</category>
      <category>gpu</category>
      <category>verifier</category>
    </item>
    <item>
      <title>Architectures for Agent Systems: A Survey of Isolation, Integration, and Governance</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Tue, 03 Feb 2026 07:36:18 +0000</pubDate>
      <link>https://dev.to/yunwei37/architectures-for-agent-systems-a-survey-of-isolation-integration-and-governance-2185</link>
      <guid>https://dev.to/yunwei37/architectures-for-agent-systems-a-survey-of-isolation-integration-and-governance-2185</guid>
      <description>&lt;p&gt;Large Language Model (LLM) based agent systems – software that leverages LLMs to autonomously plan and execute multi-step tasks using external tools – are rapidly moving from proof-of-concept demos into enterprise deployment. These agents promise to automate coding, IT operations, data analysis, and more, but deploying them in production raises new challenges in security, reliability, and integration. Over the last half-year, the community has converged on key strategies: strong isolation for executing untrusted actions, standardized protocols for tool integration, and governance frameworks to align agent behavior with enterprise policies. This survey provides a systematic review of recent developments (roughly the latter half of 2025), including agent sandbox architectures, emerging standards like MCP, open-source projects, industry initiatives, and research advances. We focus on the pain points encountered when bringing agent systems to production and how the latest solutions address (or still fall short on) those needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Agent System Architecture in the Enterprise
&lt;/h2&gt;

&lt;p&gt;An enterprise-ready agent system typically consists of several layers: (i) an LLM-based reasoning core (the "agent" that decides which actions to take), (ii) an interface to invoke external tools or services (e.g. via APIs, command-line, databases), and (iii) an execution environment or runtime where the agent's tool actions (like running code or shell commands) actually occur. Surrounding these are components for memory/state storage, orchestration (especially if multiple agents work together), and monitoring &amp;amp; control (for safety and compliance). The overarching architectural challenge is that these systems are highly dynamic and open-ended: the agent may generate arbitrary code or tool requests at runtime, often based on unpredictable input. This requires a different approach to software architecture than traditional deterministic services.&lt;/p&gt;

&lt;p&gt;Isolation and Safety by Design. Unlike a bounded microservice, an AI agent might decide to execute unvetted code or make system-altering calls. A core architectural principle emerging in 2025 is to sandbox the agent's actions – running them in an isolated environment that protects the host system and network. For example, the open-source Agent Sandbox for Kubernetes was introduced as a new Kubernetes primitive to run AI agents safely. Instead of letting LLM-generated code run in a standard container (which could still abuse the host kernel or other pods), Agent Sandbox uses a hardened runtime (gVisor's user-space kernel, with optional Kata Containers lightweight VMs) to create a secure barrier between the agent's code and the cluster node's OS. This keeps potentially malicious or errant code from interfering with other applications or the host. The Sandbox is managed via a custom Kubernetes resource (CRD) called &lt;code&gt;Sandbox&lt;/code&gt;, which represents a single, stateful, long-lived pod with a stable identity and persistent storage. This design reflects a shift from treating agent workloads as ephemeral stateless functions to treating them as session-oriented services that may hold state over time. Indeed, the Agent Sandbox supports features like pausing and resuming the sandbox, automatically reviving it if a network reconnect is needed, and even memory sharing across sandboxes for efficiency. It also provides a templating and pool mechanism – &lt;code&gt;SandboxTemplate&lt;/code&gt; and &lt;code&gt;SandboxClaim&lt;/code&gt; – to manage pools of pre-warmed sandbox pods. Pre-warming is crucial because launching a fresh isolated sandbox can be slow; by keeping a pool of ready-to-go sandboxes, startup latency for a new agent session is dramatically reduced (Google reports sub-second startup latency, a ~90% improvement over cold-starting sandboxes). 
In Google's GKE, this is paired with a new Pod Snapshots feature that can checkpoint and restore running sandbox pods (even GPU workloads), cutting startup from minutes to seconds and avoiding idle resource waste. In short, the sandbox architecture is purpose-built for autonomous agents: it provides stronger isolation than ordinary containers, yet supports persistent state and fast elasticity to accommodate long-running, interactive agent tasks at scale.&lt;/p&gt;
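
&lt;p&gt;The pre-warming idea can be illustrated with a toy pool in Python. This is only a conceptual model under simple assumptions (sandboxes are represented by string ids, and refilling is called explicitly); the &lt;code&gt;WarmPool&lt;/code&gt; class is hypothetical and is not the Agent Sandbox controller or its API:&lt;/p&gt;

```python
from collections import deque
from itertools import count


class WarmPool:
    """Toy model: keep `size` sandboxes booted ahead of demand."""

    def __init__(self, size):
        self._ids = count(1)
        self.size = size
        self.ready = deque(self._boot() for _ in range(size))

    def _boot(self):
        # Stands in for the slow step of starting an isolated sandbox pod.
        return f"sandbox-{next(self._ids)}"

    def claim(self):
        """Hand out a pre-warmed sandbox if one is ready, else cold-start one."""
        if self.ready:
            return self.ready.popleft(), True
        return self._boot(), False

    def refill(self):
        """Top the pool back up; a controller would do this in the background."""
        while self.size > len(self.ready):
            self.ready.append(self._boot())


pool = WarmPool(size=2)
for _ in range(3):
    sandbox, was_warm = pool.claim()
    print(sandbox, "warm" if was_warm else "cold start")
pool.refill()
print(list(pool.ready))
```

&lt;p&gt;Only the third claim pays the cold-start cost, which is the latency the pool is meant to hide; the trade-off is the idle capacity held in &lt;code&gt;ready&lt;/code&gt;.&lt;/p&gt;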

&lt;p&gt;Stateful Singleton Runtimes. Traditional cloud apps often scale by running many stateless instances behind a load balancer, but agent use-cases (like an AI coding assistant or an autonomous scheduler) often manifest as a single specialized "worker" with memory (such as cached tools or context) that persists across many tool calls. The Kubernetes Agent Sandbox explicitly targets these singleton, stateful workloads – not just for AI agents but also things like CI/CD build agents or single-node databases that require stable identity and disk state. This reflects a broader industry recognition: agent applications need new runtime primitives that can maintain continuity of state and identity across a session (for example, so the agent can incrementally build on previous tool outputs, or maintain an authenticated session to a service). Recent designs propose durable execution for agents – the ability to pause an agent's process, snapshot its memory or file system, and later resume or even migrate it. The GKE Agent Sandbox + Pod Snapshot combo is an early real-world example of this, effectively treating an agent's environment as a checkpointable virtual machine. We anticipate emerging orchestration support where an agent can be hibernated when idle and quickly reawakened when needed, balancing responsiveness with efficient resource use.&lt;/p&gt;

&lt;p&gt;Tool Interface Layer. The other critical piece of architecture is how agents interface with external tools and data. Historically, each AI assistant platform invented its own plugin system or API schema (e.g. OpenAI's Plugins, LangChain's tool abstractions). This led to a fragmented ecosystem where tools had to be rewritten for each agent framework. Over 2025, a consensus has grown around the Model Context Protocol (MCP) as a standard interface between AI models (the clients) and tools or services (the servers). MCP was released by Anthropic in late 2024, and by 2025 it was being described as "the universal standard protocol for connecting AI models to tools, data, and applications". Conceptually, MCP defines a simple JSON-RPC-based client-server protocol by which an AI agent can discover available tools, invoke them with arguments, and receive results or observations. The tools can be anything: database queries, file system operations, web requests, code compilation – each exposed by an MCP server that the agent connects to. The power of a common protocol is that it transforms the integration problem from M×N (every model integrating with every tool) to M+N modularity. A tool developer can create an MCP server once, and any compliant agent (whether it's OpenAI's, Anthropic's, or an open-source project) can use it. This dramatically reduces duplicated effort and makes the system more maintainable. GitHub engineers describe MCP as creating a "USB-C for AI" – a universal port for tools. In practice, MCP connections can be local (via stdio pipes) or remote (HTTP with SSE streaming), and are typically stateful sessions, which aligns well with the idea of agent tools that maintain context (e.g. a database connection that stays open, or a browser that retains cookies).&lt;/p&gt;
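
&lt;p&gt;As a rough illustration of the discover-then-invoke flow, the Python sketch below builds the two JSON-RPC requests a client would send for tool discovery and tool invocation. It assumes the &lt;code&gt;tools/list&lt;/code&gt; and &lt;code&gt;tools/call&lt;/code&gt; method names from the MCP specification, omits the initialization handshake and the transport, and uses a made-up tool called &lt;code&gt;query_db&lt;/code&gt;:&lt;/p&gt;

```python
import json
from itertools import count

_request_ids = count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object with a fresh id."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Step 1: ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Step 2: invoke one tool by name; "query_db" and its arguments are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "query_db", "arguments": {"sql": "SELECT 1"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

&lt;p&gt;Because every server speaks the same two requests, an agent needs one client implementation rather than one adapter per tool, which is where the M+N saving comes from.&lt;/p&gt;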

&lt;p&gt;Orchestration and Multi-Agent Workflows. Many real tasks may be too complex for a single agent or might benefit from specialized agents collaborating. The architecture is therefore expanding to support multi-agent systems where agents communicate or coordinate. Some protocols, like Agent-to-Agent (A2A) messaging, are emerging to standardize inter-agent communication (for instance, Google's Agent2Agent protocol and Microsoft's adoption of A2A in their framework). In a multi-agent setup, you might have one agent that specializes in planning, another in executing code, another in validation, etc., passing context or subtasks among them. Orchestration frameworks now often support deterministic workflows (where the chain of sub-tasks is predefined, akin to a business process) alongside LLM-driven orchestration (where agents dynamically decide how to break down and assign tasks). For example, Microsoft's new open-source Agent Framework explicitly supports both Agent Orchestration (LLM-driven, creative, adaptive) and Workflow Orchestration (fixed logic, for reliable repeatability) within one runtime. This framework, released in late 2025, consolidates previous research prototypes (like Semantic Kernel's planner and AutoGen from MSR) into an enterprise-ready SDK. It emphasizes connectors to enterprise systems, open standards (MCP, A2A, OpenAPI), and built-in telemetry, approvals, and long-running durability to meet enterprise needs. The trend here is that agents are being treated as first-class components of software systems, with the same expectations for monitoring, security, and lifecycle management as microservices or human-in-the-loop workflows.&lt;/p&gt;

&lt;p&gt;Summary: The architecture of modern agent systems is coalescing around a modular, layered design. A secure sandboxed execution layer ensures that any generated code or commands run in isolation with controlled privileges. A standardized tool interface layer (MCP and similar protocols) decouples agent reasoning from the implementation of tools, enabling a rich ecosystem of reusable capabilities. On top of these, orchestration mechanisms allow composing multiple agents and tools into larger autonomous workflows, while providing hooks for humans and existing DevOps processes to supervise and intervene when needed. In the following sections, we delve deeper into three crucial aspects of enterprise agent systems: (a) the sandbox and runtime isolation mechanisms, (b) the emerging standards and ecosystems of tools/plugins, and (c) the security, governance, and observability considerations that are top-of-mind as organizations deploy these systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Isolated Execution Environments for Agents (Sandboxing)
&lt;/h2&gt;

&lt;p&gt;Running untrusted or machine-generated code has always been risky – the difference now is that with LLM agents the code is being generated and executed on the fly, without a human vetting each command. This opens the door to accidental failures or even malicious exploits if the agent is tricked or if its outputs are unsafe. As a result, sandboxing has become a foundational requirement for agent systems. Sandboxing in this context means confining the agent's actions (code execution, file system writes, network calls, etc.) to an environment where it can't harm other processes or breach data it shouldn't access.&lt;/p&gt;

&lt;h3&gt;
  
  
  Table 1: Research / OSS Projects (Papers, Benchmarks, Open-Source Runtimes)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Sandbox/Isolation Boundary&lt;/th&gt;
&lt;th&gt;Key Capabilities&lt;/th&gt;
&lt;th&gt;Reference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes SIGs: agent-sandbox&lt;/td&gt;
&lt;td&gt;OSS (K8s Primitives/Controller)&lt;/td&gt;
&lt;td&gt;Sandbox CRD in Kubernetes (with Template/Claim/WarmPool)&lt;/td&gt;
&lt;td&gt;Manage "isolated + stateful + singleton" workloads; standardized API for agent runtime&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/kubernetes-sigs/agent-sandbox" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AIO Sandbox (agent-infra/sandbox)&lt;/td&gt;
&lt;td&gt;OSS (All-in-One Environment)&lt;/td&gt;
&lt;td&gt;Single Docker container (integrated multi-tools)&lt;/td&gt;
&lt;td&gt;Browser/Shell/File/MCP/VSCode Server unified; unified workspace for agents &amp;amp; dev&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/agent-infra/sandbox" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alibaba OpenSandbox&lt;/td&gt;
&lt;td&gt;OSS (Universal Sandbox Platform)&lt;/td&gt;
&lt;td&gt;Unified protocol + multi-language SDK + sandbox runtime&lt;/td&gt;
&lt;td&gt;Universal sandbox foundation for command/file/code/browser/agent execution&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/alibaba/OpenSandbox" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E2B (e2b-dev/E2B)&lt;/td&gt;
&lt;td&gt;OSS (Cloud Sandbox Infrastructure)&lt;/td&gt;
&lt;td&gt;Cloud-isolated sandbox (SDK controlled)&lt;/td&gt;
&lt;td&gt;Run AI-generated code in cloud; Python/JS SDK; for agent code interpreter&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/e2b-dev/E2B" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E2B Desktop (e2b-dev/desktop)&lt;/td&gt;
&lt;td&gt;OSS (Virtual Desktop Sandbox)&lt;/td&gt;
&lt;td&gt;Isolated virtual desktop environment&lt;/td&gt;
&lt;td&gt;"Computer Use" agent: desktop GUI, customizable dependencies, per-sandbox isolation&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/e2b-dev/desktop" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM Sandbox (vndee/llm-sandbox)&lt;/td&gt;
&lt;td&gt;OSS (Lightweight Code Sandbox)&lt;/td&gt;
&lt;td&gt;Containerized isolation (configurable security policies)&lt;/td&gt;
&lt;td&gt;Run LLM-generated code; customizable security policies and isolated container environments&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/vndee/llm-sandbox" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SkyPilot Code Sandbox (alex000kim/…)&lt;/td&gt;
&lt;td&gt;OSS (Self-hosted Execution Service)&lt;/td&gt;
&lt;td&gt;SkyPilot deployment + Docker sandboxing&lt;/td&gt;
&lt;td&gt;Self-hosted, multi-language execution, token auth, MCP integration (for agent tools)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/alex000kim/skypilot-code-sandbox" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsandbox (zerocore-ai/microsandbox)&lt;/td&gt;
&lt;td&gt;OSS (microVM Execution Environment)&lt;/td&gt;
&lt;td&gt;Hardware-isolated microVM (fast startup)&lt;/td&gt;
&lt;td&gt;Run untrusted workloads via microVM; emphasis on isolation strength and startup speed&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/zerocore-ai/microsandbox" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ERA (BinSquare/ERA)&lt;/td&gt;
&lt;td&gt;OSS (Local microVM Sandbox)&lt;/td&gt;
&lt;td&gt;Local microVM ("microVM with container ease-of-use")&lt;/td&gt;
&lt;td&gt;Run untrusted/AI-generated code locally with hardware-level isolation&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/BinSquare/ERA" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SandboxAI (substratusai/sandboxai)&lt;/td&gt;
&lt;td&gt;OSS (Runtime)&lt;/td&gt;
&lt;td&gt;Isolated sandbox&lt;/td&gt;
&lt;td&gt;Secure execution runtime for AI-generated Python code and shell commands&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/substratusai/sandboxai" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python MCP Sandbox (JohanLi233/mcp-sandbox)&lt;/td&gt;
&lt;td&gt;OSS (MCP Server)&lt;/td&gt;
&lt;td&gt;Docker container isolation&lt;/td&gt;
&lt;td&gt;Expose "secure Python execution" as a tool to agent/LLM clients via MCP&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/JohanLi233/mcp-sandbox" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code Sandbox MCP (Automata-Labs-team/…)&lt;/td&gt;
&lt;td&gt;OSS (MCP Server)&lt;/td&gt;
&lt;td&gt;Docker container isolation&lt;/td&gt;
&lt;td&gt;MCP server: provide containerized secure code execution environment for AI applications&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/Automata-Labs-team/code-sandbox-mcp" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ToolSandbox (Apple)&lt;/td&gt;
&lt;td&gt;Research + OSS (Evaluation Benchmark)&lt;/td&gt;
&lt;td&gt;Evaluation sandbox with "stateful tool execution + user simulator"&lt;/td&gt;
&lt;td&gt;Evaluate LLM tool-use: state dependencies, multi-turn dialogue, dynamic evaluation; open-source&lt;/td&gt;
&lt;td&gt;&lt;a href="https://arxiv.org/abs/2408.04682" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ToolEmu&lt;/td&gt;
&lt;td&gt;Research (Risk Evaluation Framework)&lt;/td&gt;
&lt;td&gt;LM-emulated sandbox (simulate tool execution with LM)&lt;/td&gt;
&lt;td&gt;Use LM to simulate tool execution for scalable agent risk testing; includes automatic safety evaluator&lt;/td&gt;
&lt;td&gt;&lt;a href="https://openreview.net/forum?id=GEcwtMk1uA" rel="noopener noreferrer"&gt;OpenReview&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HAICOSYSTEM&lt;/td&gt;
&lt;td&gt;Research + OSS (Safety Evaluation Ecosystem)&lt;/td&gt;
&lt;td&gt;Modular interaction sandbox (human-agent-tool multi-turn simulation)&lt;/td&gt;
&lt;td&gt;Multi-domain scenario simulation and multi-dimensional risk evaluation (operational/content/social/legal); code platform&lt;/td&gt;
&lt;td&gt;&lt;a href="https://arxiv.org/abs/2409.16427" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EnterpriseBench&lt;/td&gt;
&lt;td&gt;Research (Enterprise Environment Evaluation Sandbox)&lt;/td&gt;
&lt;td&gt;"Evaluation environment" for enterprise tasks/tools/data&lt;/td&gt;
&lt;td&gt;Evaluate LLM agents in enterprise scenarios (task execution, tool dependencies, data retrieval)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managing Linux servers with LLM-based AI agents&lt;/td&gt;
&lt;td&gt;Research (Empirical Evaluation)&lt;/td&gt;
&lt;td&gt;Dockerized Linux sandbox&lt;/td&gt;
&lt;td&gt;Let agents execute server tasks in Dockerized Linux environment and evaluate performance&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.sciencedirect.com/science/article/pii/S266682702400046X" rel="noopener noreferrer"&gt;ScienceDirect&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Programming Language Sandbox for LLMs&lt;/td&gt;
&lt;td&gt;Research (Multi-language Execution Sandbox)&lt;/td&gt;
&lt;td&gt;Container-isolated sub-sandbox&lt;/td&gt;
&lt;td&gt;Multi-language compilation/execution isolation (sub-sandbox isolated from main environment)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://arxiv.org/html/2410.23074v1" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;awesome-sandbox (restyler/awesome-sandbox)&lt;/td&gt;
&lt;td&gt;OSS (Ecosystem Overview/List)&lt;/td&gt;
&lt;td&gt;N/A (aggregation)&lt;/td&gt;
&lt;td&gt;Systematic curated list &amp;amp; analysis of "code sandboxing solutions"; good entry point for long-tail coverage&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/restyler/awesome-sandbox" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Achieving exhaustive coverage is impractical (especially given the long tail of the MCP ecosystem), so this table covers mainstream/representative projects plus ecosystem indexes. The &lt;code&gt;awesome-sandbox&lt;/code&gt; list serves as an entry point for additional coverage.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Table 2: Commercial / Cloud Service Projects (Agent Sandbox / Code Sandbox / Runtime)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product/Service&lt;/th&gt;
&lt;th&gt;Vendor&lt;/th&gt;
&lt;th&gt;Isolation/Execution Model&lt;/th&gt;
&lt;th&gt;Key Capabilities&lt;/th&gt;
&lt;th&gt;Reference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Code Interpreter (Tools)&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Managed Python sandbox execution&lt;/td&gt;
&lt;td&gt;Model writes and runs Python; for data analysis/coding/math&lt;/td&gt;
&lt;td&gt;&lt;a href="https://platform.openai.com/docs/guides/tools-code-interpreter" rel="noopener noreferrer"&gt;OpenAI Platform&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code Interpreter (Assistants on Azure)&lt;/td&gt;
&lt;td&gt;Microsoft Azure OpenAI&lt;/td&gt;
&lt;td&gt;Managed Python sandbox execution&lt;/td&gt;
&lt;td&gt;Assistants API runs Python in sandbox environment (per Azure docs)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/code-interpreter" rel="noopener noreferrer"&gt;Microsoft Learn&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E2B (Managed Cloud)&lt;/td&gt;
&lt;td&gt;E2B&lt;/td&gt;
&lt;td&gt;Managed cloud sandbox (enterprise agent cloud)&lt;/td&gt;
&lt;td&gt;Sandbox as agent runtime; emphasis on concurrency and execution infrastructure&lt;/td&gt;
&lt;td&gt;&lt;a href="https://e2b.dev/" rel="noopener noreferrer"&gt;E2B&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daytona&lt;/td&gt;
&lt;td&gt;Daytona&lt;/td&gt;
&lt;td&gt;Managed/platform sandbox infrastructure&lt;/td&gt;
&lt;td&gt;"Stateful infra for AI agents"; ultra-fast creation and isolated execution&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.daytona.io/" rel="noopener noreferrer"&gt;Daytona&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent Sandbox&lt;/td&gt;
&lt;td&gt;Novita AI&lt;/td&gt;
&lt;td&gt;Managed agent runtime&lt;/td&gt;
&lt;td&gt;Low startup latency, high concurrency; code execution/network access/browser automation&lt;/td&gt;
&lt;td&gt;&lt;a href="https://novita.ai/sandbox" rel="noopener noreferrer"&gt;Novita AI&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sandboxes (Desktop / GUI)&lt;/td&gt;
&lt;td&gt;Bunnyshell&lt;/td&gt;
&lt;td&gt;Firecracker microVM virtual desktop&lt;/td&gt;
&lt;td&gt;For GUI/Computer Use: isolated desktop, VNC/noVNC, desktop automation API&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.bunnyshell.com/sandboxes/" rel="noopener noreferrer"&gt;Bunnyshell&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent Sandbox on GKE&lt;/td&gt;
&lt;td&gt;Google Cloud (GKE)&lt;/td&gt;
&lt;td&gt;Deploy/run Agent Sandbox controller on GKE&lt;/td&gt;
&lt;td&gt;Isolated execution of untrusted commands in cluster; official installation and usage guide&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/agent-sandbox" rel="noopener noreferrer"&gt;Google Cloud Documentation&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore "agent sandbox"&lt;/td&gt;
&lt;td&gt;AWS Bedrock AgentCore&lt;/td&gt;
&lt;td&gt;Console testing sandbox&lt;/td&gt;
&lt;td&gt;AWS docs: test agents in agent sandbox&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/develop-agents.html" rel="noopener noreferrer"&gt;AWS Documentation&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modal Sandboxes&lt;/td&gt;
&lt;td&gt;Modal&lt;/td&gt;
&lt;td&gt;Modal platform sandbox execution unit&lt;/td&gt;
&lt;td&gt;Official example: build code-executing agent with Modal Sandboxes + LangGraph&lt;/td&gt;
&lt;td&gt;&lt;a href="https://modal.com/docs/examples/agent" rel="noopener noreferrer"&gt;Modal&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vercel Sandbox&lt;/td&gt;
&lt;td&gt;Vercel&lt;/td&gt;
&lt;td&gt;Vercel managed execution environment (Sandbox product)&lt;/td&gt;
&lt;td&gt;For scalable execution (fluid compute/pay-per-active-CPU, etc.)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://vercel.com/sandbox" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker Sandboxes (Experimental)&lt;/td&gt;
&lt;td&gt;Docker&lt;/td&gt;
&lt;td&gt;Local containerized sandbox (for coding agents)&lt;/td&gt;
&lt;td&gt;Docker official: use local isolated environments to run coding agents, enforce boundaries&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.docker.com/blog/docker-sandboxes-a-new-approach-for-coding-agent-safety/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Agent Sandbox on Kubernetes. The Kubernetes-based Agent Sandbox, spearheaded by Google and open-sourced as a SIG project in late 2025, exemplifies state-of-the-art sandbox design. A sandbox instance is essentially an isolated pod launched per agent session and managed through Kubernetes APIs. Internally it relies on gVisor (a user-space kernel that intercepts syscalls) or, optionally, Kata Containers (lightweight VM isolation) to provide a robust security boundary. This means even if an agent's code tries to perform a malicious syscall or exploit a kernel bug, it's constrained within a sandbox kernel that has minimal privileges on the host. The sandbox also limits network access by default on GKE (only allowing what's necessary for the agent tools), reducing the risk of an agent scanning internal networks or exfiltrating data. At KubeCon NA 2025, Google showcased how they can schedule thousands of sandbox pods in parallel, thanks to the lightweight nature of gVisor, and how pre-warmed sandbox pools enable sub-second startup latencies even with the isolation. This addresses the performance concern that isolation often introduces: by carefully engineering snapshot/restore and pooling, the overhead can be kept low enough for interactive use.&lt;/p&gt;

&lt;p&gt;From an API standpoint, the Sandbox CRD provides features tailored to long-running agent processes: you can specify resource limits, attach persistent volumes for agent state, and use the Kubernetes scheduler to place sandboxes on appropriate nodes (e.g. nodes with GPUs if the agent needs them). It also has lifecycle controls like scheduled deletion (to clean up sandboxes after use) and the pause/resume support mentioned earlier. Collectively, these features line up with OWASP's top recommendation for mitigating agent risks: "system isolation, access segregation, permission management, command validation, and other safeguards". In fact, OWASP added an entry to its Top 10 for LLMs called "Agent Tool Interaction Manipulation" – the risk of an AI agent being induced to misuse its tools or perform unintended actions. The primary defense listed is to run the agent in a locked-down environment with fine-grained permission controls on what it can do. By confining an agent to a Kubernetes sandbox with only specific Kubernetes API access (or none at all beyond its tools) and no broad host access, even a compromised agent has a limited blast radius.&lt;/p&gt;
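
&lt;p&gt;The "permission management" and "command validation" parts of that recommendation can be sketched as an allowlist check that runs before a command ever reaches the sandbox. The Python below is a minimal illustration; the &lt;code&gt;ALLOWED&lt;/code&gt; policy and the &lt;code&gt;validate_command&lt;/code&gt; helper are hypothetical, and such a check complements the isolation boundary rather than replacing it:&lt;/p&gt;

```python
import shlex

# Hypothetical policy: program name -> allowed subcommands (None means any arguments).
ALLOWED = {
    "ls": None,
    "cat": None,
    "git": {"status", "diff", "log"},
}


def validate_command(command_line):
    """Return the parsed argv if the command passes the allowlist, else raise."""
    argv = shlex.split(command_line)
    if not argv:
        raise PermissionError("empty command")
    program, args = argv[0], argv[1:]
    if program not in ALLOWED:
        raise PermissionError(f"program not allowed: {program}")
    subcommands = ALLOWED[program]
    if subcommands is not None and (not args or args[0] not in subcommands):
        raise PermissionError(f"subcommand not allowed for {program}")
    return argv


for candidate in ("git status", "git push origin main", "ls; rm -rf /"):
    try:
        print("allow:", validate_command(candidate))
    except PermissionError as err:
        print("deny:", candidate, "->", err)
```

&lt;p&gt;The returned argv list is meant to be executed without a shell, so shell metacharacters in the agent's output are never interpreted.&lt;/p&gt;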

&lt;p&gt;Local Sandboxing Solutions. Not all organizations use Kubernetes or need cloud-scale multi-tenancy; for individual developers or on-prem deployment, there are lighter-weight sandbox solutions emerging. One notable project is ERA (by BinSquare), which provides a local sandbox for running AI-generated code with "microVM security guarantees plus containers ease of use". ERA uses technologies like krunvm (a microVM runner built on libkrun) under the hood, orchestrated in a way that feels like using Docker containers. The idea is to give developers a quick way to test AI-written scripts safely on their laptop or CI pipeline, without having to set up full Kubernetes. Similarly, some frameworks allow using WebAssembly (Wasm) sandboxes for certain tasks (since Wasm can restrict file and network access for code running within it). The InfoQ article on sandboxing mentions Lightning AI's LitSandbox and a library called container-use as alternatives, which likely explore isolating Python execution or providing wrapper APIs that simulate a sandbox. While these are not yet as standardized as the Kubernetes Agent Sandbox, they indicate a broad interest in making sandboxing accessible across environments.&lt;/p&gt;

&lt;p&gt;Integration with Agent Frameworks. Modern agent frameworks are starting to build in assumptions about sandboxing. For example, LangChain (one of the earliest agent libraries) historically would just execute Python code or bash commands directly on the host, which is obviously dangerous in production. By late 2025, we see frameworks like LangGraph 1.0 (the evolution of LangChain's agent module) emphasizing "durable and safe" execution, and CrewAI (another open-source agent framework) adding features for asynchronous tool execution and monitoring to potentially plug into sandboxed runtimes. Microsoft's Agent Framework integrates with their Azure Foundry services, which likely means an agent's code execution can be routed to a managed sandbox (e.g. an isolated Azure Function or container instance) – in their blog they highlight "enterprise-grade deployment from the beginning", including security and compliance hooks. We also see new tools like Aspire's AI agent isolation module (by Microsoft) which aims to allow developers to run multiple agent instances in parallel without conflict, hinting at port isolation and MCP proxy layers. All these efforts point to execution isolation becoming a default part of agent system design. It's no longer assumed that an agent's code runs in the same process as the host application or with full OS privileges – instead, agents run in a contained, observable slot, much like how web browsers run untrusted JavaScript in a sandboxed process.&lt;/p&gt;

&lt;p&gt;Transactional and Fault-Tolerant Execution. A sophisticated angle to sandboxing is making execution fault-tolerant. If an agent's action fails or does something unwanted, can we roll it back? One recent research prototype, Fault-Tolerant Sandboxing for AI Coding Agents, introduced a transactional file system wrapper for agent execution. It intercepts file system writes and system changes during an agent's tool use, and if the agent misbehaves or a policy violation is detected, the sandbox can roll back to a clean snapshot. In their experiments, 100% of unsafe actions were intercepted and rolled back, at a cost of ~14.5% performance overhead. However, they note a key limitation: this works for local state (files, processes) but not for external side effects. If the agent made a cloud API call that created resources or sent emails, a local rollback doesn't undo those. This is pushing the conversation toward distributed transaction semantics for agents – treating a sequence of tool API calls as a saga that might need compensating actions if aborted. While not solved yet, it's a recognized gap (researchers call for integrating compensating transactions for external tools to truly sandbox at the multi-system level). For now, sandboxing primarily ensures the agent's local environment can be reset to a safe state even if one step goes awry.&lt;/p&gt;
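
&lt;p&gt;To make the saga idea concrete, here is a minimal sketch, not the prototype's implementation. It assumes every external tool call can be paired with a compensating call; the names &lt;code&gt;run_saga&lt;/code&gt;, &lt;code&gt;create_vm&lt;/code&gt;, &lt;code&gt;delete_vm&lt;/code&gt;, and &lt;code&gt;send_email&lt;/code&gt; are hypothetical stand-ins for real tools.&lt;/p&gt;

```python
from typing import Callable, List, Tuple

# Each step pairs an action with a compensating action that undoes its external effect.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, action, _compensate = step
        try:
            action()
            log.append(f"done:{name}")
            done.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for prev_name, _action, compensate in reversed(done):
                compensate()
                log.append(f"compensated:{prev_name}")
            break
    return log


# Hypothetical stand-ins for external tool calls.
created: List[str] = []


def create_vm() -> None:
    created.append("vm-1")


def delete_vm() -> None:
    created.remove("vm-1")


def send_email() -> None:
    raise RuntimeError("mail service unavailable")


def noop() -> None:
    pass


saga_log = run_saga([
    ("create_vm", create_vm, delete_vm),
    ("send_email", send_email, noop),
])
print(saga_log)
```

&lt;p&gt;The sketch also shows where the approach runs out: an email that was actually delivered has no true inverse, so its compensating action can at best be a follow-up correction.&lt;/p&gt;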

&lt;p&gt;Human Takeover and Hybrid Sandboxes. An intriguing development in sandbox design is support for human-in-the-loop interventions not just via yes/no approval prompts, but via full manual control of the sandbox. The idea is that if an agent reaches a step where it is stuck or needs privileged action (like entering a password or solving a tricky problem), a human operator can seamlessly take over the agent's sandbox session, do what's needed, and then hand control back to the AI. The research prototype AgentBay embodies this concept: it provides a unified isolated session that the AI agent can control via API (e.g. issuing OS commands, browser actions) and that a human can remote into graphically at any moment. AgentBay implements a custom Adaptive Streaming Protocol (ASP) to make this possible with very low latency. Unlike traditional screen sharing (RDP/VNC), ASP dynamically switches between sending high-level commands and video frames, adjusting to network conditions and whether the AI or human is currently in charge. The result is a much smoother experience for the human supervisor, even on weaker networks. In tests, allowing a human to intervene in AgentBay's sandbox improved task success rates by over 48% on complex benchmarks, showing the value of fluid HITL (Human-In-The-Loop) control. This approach directly addresses enterprise needs for control: rather than the agent being a black-box automation that might get stuck, it becomes a cooperative automation that an analyst or engineer can jump into whenever needed, without compromising the isolation or requiring the task to be restarted. We foresee future enterprise agent platforms offering a "panic button" or agent assist mode that spawns a secure VNC/Browser session for an operator, all actions logged, then closes back to autonomous mode.&lt;/p&gt;

&lt;p&gt;In summary, sandboxing in agent systems has evolved into a multi-faceted capability: it's not only about securing the environment (with VMs, syscall filters, network restrictions), but also about managing the agent's lifecycle and state (persistent storage, snapshots, warm pools) and facilitating controlled handoffs (pause/resume and human takeover). The investments by major players – e.g. Google building Agent Sandbox as a Kubernetes SIG project – indicate that these sandboxing techniques will likely become standard infrastructure in cloud platforms. Just as Kubernetes gave us primitives for scalable microservices, we are now getting primitives for safe autonomous agent execution on the cloud and the edge.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Tool Ecosystem and Standardization: From Plugins to MCP
&lt;/h2&gt;

&lt;p&gt;In parallel with sandboxing the runtime, the industry has tackled the tool integration problem for agents. Early agent implementations often hard-coded a set of tools or required developers to write custom "plugin" adapters for each use case. This doesn't scale when enterprises might want agents to access dozens of internal APIs, databases, and third-party services. The last six months have seen a strong push toward standardizing how agents discover and use tools, yielding a more interoperable ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 Model Context Protocol (MCP) and the AAIF
&lt;/h3&gt;

&lt;p&gt;Model Context Protocol (MCP) has emerged as the de facto standard protocol in this space. As mentioned, MCP defines a client-server schema where the AI agent (client) can list what tools a server offers, call those tools with JSON arguments, and receive results. It also covers things like authentication handshakes (e.g. OAuth flows to let an agent "login" to use a tool on a user's behalf) and streaming responses (for tools that send incremental results). By late 2025, MCP's momentum was cemented by the formation of the Agentic AI Foundation (AAIF) under the Linux Foundation. In December 2025, the Linux Foundation announced AAIF with MCP as a founding contribution alongside OpenAI's AGENTS.md and Block's Goose. The goal is to provide a neutral, open governance home for these agent standards so that no single company controls them. The AAIF launch press release notes that MCP had already exploded in adoption: over 10,000 MCP servers published covering everything from dev tools to Fortune 500 internal integrations, and support built into major AI platforms including Claude, ChatGPT, GitHub Copilot, Google Gemini, VS Code, Cursor, and many others. This is remarkable considering MCP was only open-sourced in late 2024 – it resonated because it addressed an urgent pain point: without it, every AI vendor and every enterprise would be duplicating integrations. By rallying around MCP, the community effectively agreed on a "lingua franca" between agents and tools.&lt;/p&gt;
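
&lt;p&gt;As a rough illustration of the list-then-call flow, the sketch below builds the two JSON-RPC 2.0 requests a client would send, using the MCP method names &lt;code&gt;tools/list&lt;/code&gt; and &lt;code&gt;tools/call&lt;/code&gt;. It only constructs messages and does not open a connection; the tool name &lt;code&gt;query_db&lt;/code&gt; and its argument are made up for the example.&lt;/p&gt;

```python
import json
from itertools import count

_ids = count(1)


def jsonrpc_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request, the envelope MCP messages use."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    )


# Ask the server which tools it offers.
list_req = jsonrpc_request("tools/list", {})

# Call one tool with JSON arguments ("query_db" is a hypothetical tool).
call_req = jsonrpc_request(
    "tools/call",
    {"name": "query_db", "arguments": {"sql": "SELECT 1"}},
)

print(list_req)
print(call_req)
```

&lt;p&gt;Because every call has this shape, a runtime can log, inspect, or block a request before it reaches the tool.&lt;/p&gt;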

&lt;p&gt;From an enterprise perspective, MCP brings several benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interoperability: A tool (say a database query interface) can be implemented once as an MCP server and then used by different agents (Anthropic's, OpenAI's, self-hosted ones) without custom adapters. This has analogies to drivers or connectors in classical software – build it once, use anywhere.&lt;/li&gt;
&lt;li&gt;Security and Auditability: MCP messages are structured (JSON) and typically go through a client library in the agent runtime, where they can be logged and inspected. This makes it easier to audit what the agent asked a tool to do, as opposed to the agent running free-form shell commands that are hard to intercept. The protocol includes a capability advertisement step (the server tells what it can do), which can be checked against policies. It also often requires an auth handshake (e.g. OAuth) for the agent to gain access to the tool on behalf of a user, which means existing identity systems can mediate access.&lt;/li&gt;
&lt;li&gt;Modularity and Future-proofing: As InfoQ summarized, MCP shifts integration from a tangled web into a modular architecture, reducing the "plugin fatigue" problem and making it easier to add new tools or swap out models. It also levels the playing field – small open-source projects can publish MCP servers that become as easily usable as those from big vendors, fostering a community ecosystem of tools.&lt;/li&gt;
&lt;li&gt;Neutral Governance: With AAIF, companies like AWS, Google, Microsoft, Anthropic, and OpenAI are all at the same table (indeed all are listed as platinum members). This reduces the risk that MCP splinters into competing versions; it's likely to become analogous to HTML or SQL – a baseline standard that everyone implements, with maybe some extensions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's worth noting that MCP is evolving to cover more than just "traditional API calls." Recent extensions include Agent-to-Agent messaging (so an agent can expose itself as a tool to others via MCP) and binary data support (for image and file transfer). The AGENTS.md standard, also under AAIF, complements MCP by providing a way for software projects to declare to agents how to interact with them. AGENTS.md is essentially a README for AI agents, placed in a code repo to describe the project, its build/test tools, key contexts, and constraints. Over 60k open-source repos have adopted AGENTS.md to guide coding agents. By standardizing this, when an agent (like GitHub Copilot or Cursor) is working on a new codebase, it can automatically read AGENTS.md to understand the project's specific commands (e.g. how to run tests) rather than relying on general knowledge. This reduces errors and makes code-writing agents more reliable across different environments.&lt;/p&gt;

&lt;p&gt;MCP Tool Ecosystem. Many companies and open-source teams have published MCP servers for their systems. For instance, GitHub released an official GitHub MCP Server that exposes GitHub operations (issues, PRs, repo contents, etc.) via MCP. This allows an agent to perform GitHub actions (like creating an issue or commenting on a PR) in a safe way – the server enforces GitHub's API policies and scopes. Similarly, we have MCP servers for databases (SQL tools), cloud resources (AWS, Azure MCP servers), information lookups (Wikipedia, web search), and even OS-level tasks (there are MCP servers that wrap shell commands or Docker). A typical enterprise might run a suite of internal MCP servers: one for their ticketing system, one for their customer database, one for DevOps (Kubernetes control like the &lt;code&gt;mcp-server-kubernetes&lt;/code&gt; we saw). By doing so, they create a catalog of approved tools that their AI agents can use. Some companies are building MCP Gateways or registries to manage this catalog, which we'll discuss in the security section.&lt;/p&gt;

&lt;p&gt;Local-First and Offline Agents. While MCP often assumes a client (agent) connecting to a server over HTTP, it's flexible enough to work in "all local" scenarios too (using stdio pipes). The Goose framework (contributed by Block to AAIF) is described as a "local-first AI agent framework". Goose uses MCP for tool extensions – meaning you can run goose agents on your laptop, and they can spin up local MCP servers for local tools (say, accessing a local filesystem or application) without needing cloud connectivity. This is important for cases where data privacy requires everything to remain on-prem or on-device. It also means an enterprise could package up an agent + tool suite to run entirely in an isolated network (e.g. an AI agent that helps with internal network diagnostics, running in a secure enclave with no internet access, but with MCP hooking into internal systems). The push toward standardization via MCP doesn't imply centralization in the cloud – on the contrary, it can democratize who provides tools (open-source implementations, self-hosted services, etc.) as long as they speak the protocol.&lt;/p&gt;

&lt;p&gt;Beyond MCP: Other Standards. While MCP is currently the frontrunner, there are other noteworthy efforts. OpenAPI-based tool use: some agent frameworks allow importing any OpenAPI spec and will auto-generate an "agent tool" from it. For example, Microsoft's Agent Framework highlights that any REST API with an OpenAPI definition can be instantly turned into a tool, with the framework handling schema parsing and secure invocation. This is complementary to MCP: one could imagine MCP servers automatically exposing an OpenAPI, or vice versa. Another is the concept of capability description languages – OpenAI's Function Calling spec is one example, where the model is told function signatures and it outputs JSON for calls. Some researchers propose more formal schemas for tool affordances. At the moment, however, MCP seems to be converging those threads: it provides a structured way for an agent to query "what can I do?" and then invoke a function with arguments, which is essentially function calling over a channel. It's likely we'll see alignment or bridging between OpenAPI, JSON-RPC, and whatever else emerges, to avoid fragmenting this again.&lt;/p&gt;

&lt;p&gt;In essence, if sandboxing addresses the agent's "body," MCP addresses the agent's "arms and legs". It standardizes how the agent reaches out to interact with the world. This was a necessary step for agents to become truly useful in enterprise settings, because no single vendor can supply every integration. By lowering the integration barrier, companies can leverage a far broader set of tools. However, as we'll discuss next, giving an AI agent access to many tools also broadens the attack surface and governance burden – thus, standardization and security have to go hand in hand.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Security, Governance, and Trust in Agent Systems
&lt;/h2&gt;

&lt;p&gt;Deploying autonomous agents in an enterprise inherently raises the question: how do we trust them? Unlike a deterministic script, an AI agent can come up with unexpected actions, and it might be influenced by inputs (or adversaries) in ways we can't fully predict. Over the past months, a significant focus of both practitioners and researchers has been on closing the "trust gap" – ensuring that agents do what they're supposed to and nothing more, or at least that we can detect and mitigate when they misbehave. Several key themes have emerged: permission and policy models, supply chain security of tools, prompt injection defenses, auditing and observability, and fail-safe mechanisms. We'll examine each in turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 Prompt Injection and Confused Deputy Problems
&lt;/h3&gt;

&lt;p&gt;Prompt injection – where an external input is crafted to manipulate the agent's LLM into ignoring its instructions or performing unintended actions – has proven to be a very real threat. In the context of agent tools, prompt injection can become a "confused deputy" attack: the LLM is the deputy that has privileges (access to tools) and the attacker exploits it via crafted input (a prompt) to misuse those privileges. A simple example: an attacker might embed a malicious command in a user-provided email, which the agent then dutifully executes with its shell tool. Real incidents and proofs-of-concept have shown this is not just theoretical. The consensus in discussions (e.g. on Hacker News) is that prompt injection is analogous to XSS (cross-site scripting) in web apps – you cannot fully eliminate it just by sanitizing inputs, because the model's behavior with arbitrary text is hard to constrain. Thus, relying solely on prompt-based safeguards (like "don't execute if user says to do something bad") is brittle.&lt;/p&gt;

&lt;p&gt;The more robust approach is structural: limit what the agent can do even if it's tricked. This means enforcing policy at the tool invocation layer. For instance, if the agent tries to run a shell command, have a policy that disallows &lt;code&gt;rm -rf&lt;/code&gt; or network calls to sensitive endpoints. If it uses a database tool, ensure it cannot query tables it shouldn't. This is where sandboxing and permission models overlap. In a sandbox, you can intercept system calls – e.g. prevent file writes outside a certain directory, or limit network access to only whitelisted domains. With MCP, you can implement an allow-deny policy per tool – e.g. forbid a certain combination of API calls or detect if the arguments look suspicious (like a SQL query that's dumping all user data).&lt;/p&gt;
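
&lt;p&gt;A minimal sketch of such a check at the tool invocation layer might look like the following. It uses an allowlist rather than a denylist, and the allowed binaries and hosts are illustrative values, not taken from any real deployment.&lt;/p&gt;

```python
import re
from urllib.parse import urlparse

# Illustrative policy values (assumptions for this sketch).
ALLOWED_BINARIES = {"ls", "cat", "grep"}
ALLOWED_HOSTS = {"api.internal.example.com"}

# Only plain characters are accepted, so shell metacharacters never get through.
SAFE_CHARS = re.compile(r"[A-Za-z0-9_./ -]+")


def shell_allowed(command: str) -> bool:
    """Allow a command only if it is plain text and starts with an allowed binary."""
    if not SAFE_CHARS.fullmatch(command):
        return False
    argv = command.split()
    return bool(argv) and argv[0] in ALLOWED_BINARIES


def url_allowed(url: str) -> bool:
    """Allow only https requests to allowlisted hosts."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


print(shell_allowed("ls -la /tmp"))
print(shell_allowed("rm -rf /"))
print(url_allowed("https://evil.example.net/upload"))
```

&lt;p&gt;The check runs outside the model, so it holds even when the prompt has been manipulated. The trade-off is that an allowlist this strict will also reject some legitimate commands.&lt;/p&gt;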

&lt;p&gt;One concrete advance is the AgentBound research framework, which proposes attaching a declarative access control policy to MCP servers. Inspired by Android's app permissions, AgentBound allows a tool to declare what host resources it needs (files, network targets, etc.), and an admin can approve or limit those. At runtime, an enforcement engine monitors the agent's calls and blocks anything outside the allowed scope. Impressively, AgentBound's evaluation auto-generated policies for 296 popular MCP servers with about 80.9% accuracy from the code, and could block the majority of malicious actions with negligible overhead. This suggests that intelligent tooling can help manage the policy burden: we can analyze a tool's code to infer "this tool should only ever need to access X API or Y file", then use that as a sandbox rule.&lt;/p&gt;
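
&lt;p&gt;The general shape of a declared-permissions model can be sketched as below. This is not AgentBound's actual policy format; the &lt;code&gt;Manifest&lt;/code&gt; class and its fields are hypothetical and only show how a declaration can be checked at runtime.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Manifest:
    """Resources a tool declares up front (hypothetical format)."""
    read_paths: Tuple[str, ...] = ()
    network_hosts: FrozenSet[str] = field(default_factory=frozenset)


def _inside(target: PurePosixPath, root: PurePosixPath) -> bool:
    # Compare whole path components, so /srv/docs-private is not inside /srv/docs.
    return target.parts[: len(root.parts)] == root.parts


def may_read(manifest: Manifest, path: str) -> bool:
    """True if path lies inside one of the declared directories."""
    target = PurePosixPath(path)
    if ".." in target.parts:
        return False  # refuse traversal instead of trying to normalize it
    return any(_inside(target, PurePosixPath(root)) for root in manifest.read_paths)


def may_connect(manifest: Manifest, host: str) -> bool:
    """True if the host was declared by the tool and approved."""
    return host in manifest.network_hosts


docs_tool = Manifest(
    read_paths=("/srv/docs",),
    network_hosts=frozenset({"docs.example.com"}),
)
print(may_read(docs_tool, "/srv/docs/guide.md"))
print(may_read(docs_tool, "/etc/passwd"))
```

&lt;p&gt;Anything not declared is denied by default, which mirrors the Android-style model the paragraph describes.&lt;/p&gt;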

&lt;p&gt;Another line of defense is schema validation. Many tools expect inputs of a certain form (JSON with specific fields, numbers in ranges, etc.). If the agent's output deviates, it can indicate either a prompt injection or a model error. Rigorously validating the agent's action format before executing it can catch some attacks or mistakes. In fact, OWASP's recommendation of command validation falls here – e.g. if an agent tries to execute &lt;code&gt;sudo rm -rf /&lt;/code&gt;, the sandbox or tool wrapper should detect that and refuse.&lt;/p&gt;
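
&lt;p&gt;A small validator illustrates the idea. It is a sketch using only the standard library; the &lt;code&gt;TRANSFER_SCHEMA&lt;/code&gt; tool and its fields are hypothetical, and a production system would more likely rely on a full JSON Schema validator.&lt;/p&gt;

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema for a "transfer" tool: field name -> (type, required).
TRANSFER_SCHEMA: Dict[str, Tuple[type, bool]] = {
    "account_id": (str, True),
    "amount": (int, True),
    "memo": (str, False),
}


def validate_args(args: Dict[str, Any], schema: Dict[str, Tuple[type, bool]]) -> List[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    errors: List[str] = []
    for name in args:
        if name not in schema:
            errors.append(f"unexpected field: {name}")
    for name, (expected_type, required) in schema.items():
        if name not in args:
            if required:
                errors.append(f"missing field: {name}")
            continue
        value = args[name]
        # bool is a subclass of int in Python; no field here is boolean, so reject it.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(f"wrong type for {name}")
    return errors


print(validate_args({"account_id": "a1", "amount": 10}, TRANSFER_SCHEMA))
print(validate_args({"account_id": "a1", "amount": "10", "cmd": "x"}, TRANSFER_SCHEMA))
```

&lt;p&gt;Rejecting unexpected fields matters as much as type checks, since an injected instruction often shows up as an extra argument the tool never asked for.&lt;/p&gt;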

&lt;p&gt;It's widely acknowledged that prompt injection cannot be fully solved at the model level, so enterprise systems are layering these runtime controls. Some are even exploring two-model setups: one model generates a plan or interprets user input without any tools (and thus with no privileges), then a separate "execution model" with tools enabled but a much more constrained input (only the sanitized plan). This is analogous to separating policy decision and policy enforcement. However, this approach is in its infancy – researchers have noted it's tricky to ensure the two models stay in sync and that the first model doesn't inadvertently become a covert channel for bad instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Tool Supply Chain Security
&lt;/h3&gt;

&lt;p&gt;As the MCP tool ecosystem grows, a new class of security concerns appears: the tools themselves may have vulnerabilities or could be malicious. We've effectively extended our "attack surface" to any code that implements a tool API. In July 2025, security researchers disclosed critical flaws in some community-developed MCP servers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The MCP Server for Kubernetes (an MCP tool that allowed agents to run &lt;code&gt;kubectl&lt;/code&gt; commands on a cluster) had a command injection flaw. It constructed shell commands from user input without sanitization, so an attacker could embed &lt;code&gt;|&lt;/code&gt; or &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; to execute arbitrary commands on the host. Not only that, the advisory demonstrated a prompt injection chain: if an agent was asked to read a pod's logs (which contained malicious instructions), the agent might then call a vulnerable &lt;code&gt;kubectl&lt;/code&gt; tool with those instructions, leading to RCE (Remote Code Execution) on the MCP server host. This is a vivid example of how an innocuous high-level task (read logs) can cascade into a full compromise via weaknesses in the tool implementation. It underscores that agent security is only as strong as the weakest tool in its arsenal.&lt;/li&gt;
&lt;li&gt;Another advisory for mcp-package-docs (a tool for reading package documentation) had a similar shell injection issue. Essentially, many early tools naively used &lt;code&gt;exec()&lt;/code&gt; on strings, a practice long known to be dangerous in any software context.&lt;/li&gt;
&lt;li&gt;The AI coding assistant Cursor found an even more subtle exploit: an agent could be tricked into writing a malicious MCP server configuration to disk (effectively "installing" a new tool) which would then be loaded and executed, giving the attacker code execution on the system. In response, Cursor had to forbid agents from writing to certain config directories.&lt;/li&gt;
&lt;/ul&gt;
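
&lt;p&gt;The injection flaws above share one root cause: user-controlled text was spliced into a shell string. A hedged sketch of the usual fix is to validate inputs and build an argument vector instead. The helper &lt;code&gt;kubectl_logs_argv&lt;/code&gt; is hypothetical and is not code from any of the affected projects.&lt;/p&gt;

```python
import re
from typing import List

# Conservative pattern for Kubernetes-style names: lowercase alphanumerics, "-" and ".".
_NAME = re.compile(r"[a-z0-9]([a-z0-9.-]{0,251}[a-z0-9])?")


def kubectl_logs_argv(pod: str, namespace: str = "default") -> List[str]:
    """Build an argument vector for kubectl logs; never a shell string."""
    for value in (pod, namespace):
        if not _NAME.fullmatch(value):
            raise ValueError(f"invalid resource name: {value!r}")
    return ["kubectl", "logs", "--namespace", namespace, pod]


print(kubectl_logs_argv("web-0"))
```

&lt;p&gt;The resulting list can be passed to &lt;code&gt;subprocess.run(argv, check=True)&lt;/code&gt;, which does not invoke a shell by default, so characters such as &lt;code&gt;|&lt;/code&gt; or &lt;code&gt;;&lt;/code&gt; are never interpreted even if validation were to miss something.&lt;/p&gt;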

&lt;p&gt;These incidents highlight supply chain risk: when you install an MCP server from npm or PyPI, do you know it's safe? Could it have a dependency hijacked to steal data? Traditional supply chain best practices – code signing, vetting maintainers, vulnerability scanning – all apply here. But additionally, the dynamic nature of agent tool use requires new thinking. For example, an agent might fetch a tool definition (schema) from somewhere at runtime – that channel could be compromised (a malicious tool listing that lies about what it does). To address this, the community is discussing tool registries with verification. Imagine an "App Store" for MCP tools where each tool is reviewed, sandboxed, and cryptographically signed. The Linux Foundation AAIF might play a role in hosting a global registry, or there may be vendor-specific ones.&lt;/p&gt;

&lt;p&gt;Some researchers call for transparency logs and an "SBOM" (Software Bill of Materials) approach for agent tools. For instance, an enterprise might want a log of every tool version the agent ever used, so if one is later found malicious they can audit past agent runs. They also want assurance that the tool code running is exactly the code that was audited. This is akin to how modern browsers handle extensions: with strict signing and review processes.&lt;/p&gt;

&lt;p&gt;On the defense side, one idea is dynamic tool vetting – before an agent uses a new tool, run that tool in a test mode on known benign inputs to see if it behaves correctly, or run it in a shadow sandbox with instrumented monitoring to detect unexpected actions. This is analogous to how app stores do a review, but potentially automated and at runtime. For now, this is an open research problem; we haven't seen full implementations yet, but it's identified in literature as a needed control.&lt;/p&gt;

&lt;p&gt;In summary, securing the tool ecosystem requires both preventive measures (secure coding practices for tool developers, automated scans for dangerous patterns like &lt;code&gt;execSync&lt;/code&gt; on inputs) and mitigations (running tools with least privilege, e.g. a tool that only needs to read a database should not also have OS write access). The principle of least privilege should apply at every level: the agent only has access to certain tools, the tool only has access to certain system resources. Achieving this in practice means plumbing through the user's identity and intent: e.g., if an agent is acting on behalf of Alice, the database tool should run under Alice's credentials or a role with her permissions, not a superuser. This is an area where enterprise IAM (Identity and Access Management) integration is critical – mapping the human user's identity to the agent's allowed actions. Recent work is exploring how to tie enterprise SSO/OAuth tokens into agent sessions in a fine-grained way, so that an agent cannot escalate its privileges beyond what the user would normally have through regular apps.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.3 Monitoring, Auditing, and Policy Enforcement
&lt;/h3&gt;

&lt;p&gt;Observability is notoriously difficult for AI systems because of their nondeterminism and unstructured outputs. But for agents, observability is non-negotiable in enterprise settings. Operators need to be able to ask: "What sequence of steps did the agent take? Why did it take a certain action? What tool calls were made with what parameters? Did anything unusual happen?" To that end, agent platforms are incorporating extensive logging and tracing capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured Traces: There's a push to use standards like OpenTelemetry to trace agent execution like any microservice call graph. Each agent action (e.g. "called Tool X with params Y, got result Z") can be a span in a trace. This allows using existing APM (Application Performance Monitoring) tools to visualize agent workflows. Some commercial platforms now show a real-time step-by-step trace of the agent's reasoning and tool use (often known as an "Agent console" or debug pane).&lt;/li&gt;
&lt;li&gt;Semantic Logging: Beyond raw tool call logs, there's interest in capturing higher-level events. For example, flag if an agent's plan changed drastically mid-execution (could indicate it got confused or was manipulated), or if it requested an unusually large amount of data from a tool. Logging the content of prompts and responses is tricky (for privacy reasons), but logging the intents and outcomes is feasible. Additionally, cryptographic logging (hash chaining the logs) has been suggested so that forensic analysis can trust that logs weren't tampered with.&lt;/li&gt;
&lt;li&gt;Auditing for Compliance: In sectors like finance or healthcare, any automated system needs audit trails for compliance. If an agent made a change to a customer's record, we need to know who/what prompted that and that it was authorized. Solutions here include linking agent actions to a user session and storing that context (e.g. "Agent acted on behalf of Alice, in response to request R, at time T"). Some enterprises restrict certain tools to manual-confirmation mode where a human must approve the agent's action in a dashboard (common for things like executing a trade or sending an email). Ensuring the agent properly presents the action for approval (and doesn't hide the true intent) is an active UX/security challenge.&lt;/li&gt;
&lt;li&gt;Policy Engines: Enterprises are beginning to employ policy-as-code systems (like Open Policy Agent or custom rule engines) to govern agent behavior. For example, a policy might be: "Agents cannot call the production database tool with a query that has no LIMIT clause, unless the user is in an admin role." When an agent attempts such a query, the policy engine can intercept and either block it or route it for approval. This ties into MCP Gateway architectures, where instead of the agent connecting directly to tool servers, it connects to a Gateway proxy that mediates all calls. Microsoft's preview of an MCP Gateway shows features like session persistence (to keep agent-tool sessions sticky) and a central place to enforce auth, rate limiting, etc. We can foresee these gateways becoming very sophisticated, implementing org-wide guardrails (e.g. no agent can call external web APIs that are not in a vetted list, to prevent data exfiltration).&lt;/li&gt;
&lt;li&gt;Evaluation and Testing: An emerging practice is to treat agents like code and develop evaluation suites for them. Before deploying an agent update (new model version or new tool), run a battery of scenarios (some normal, some adversarial) to see how it behaves. In late 2025, multiple benchmarks for agent safety were released to facilitate this. MCP-SafetyBench is one such benchmark: it tests LLM agents on realistic multi-step tasks across five domains (web browsing, financial analysis, code repo management, navigation, and web search) while injecting 20 types of attacks (from prompt tampering to tool output manipulation). The sobering result: no current model is remotely immune to MCP-based attacks – even top-tier models had 30–48% of tasks compromised. They also found a negative correlation between task performance and security: models that are more capable at completing tasks also tend to be more exploitable, presumably because they more eagerly follow any instruction including malicious ones. This points to a fundamental safety-utility trade-off. Enterprises must calibrate how "aggressive" or autonomous they want the agent to be. Some are introducing adjustable risk settings – e.g. a slider from conservative (fewer tools, more confirmations) to aggressive (full autonomy, high risk). A metric called NRP (Normalized Risk-Performance) was proposed to quantify this balance. Ultimately, continuous evaluation will be key: as new attacks are discovered, they should be added to test suites to confirm that the agent (with all its tools and policies) can handle or resist them.&lt;/li&gt;
&lt;/ul&gt;
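
&lt;p&gt;The hash-chained logging mentioned under Semantic Logging can be sketched in a few lines. This is an illustrative design rather than a standard: each entry's hash covers the previous entry's hash, so editing any earlier record breaks verification. The event fields are made up.&lt;/p&gt;

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, event: Dict) -> str:
    payload = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append_entry(log: List[Dict], event: Dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": _digest(prev, event)})


def verify(log: List[Dict]) -> bool:
    """Recompute the chain from the start; any edited entry makes this False."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


audit_log: List[Dict] = []
append_entry(audit_log, {"agent": "a1", "tool": "jira.create_issue", "user": "alice"})
append_entry(audit_log, {"agent": "a1", "tool": "db.query", "user": "alice"})
print(verify(audit_log))
```

&lt;p&gt;A chain like this only detects tampering. To prevent an attacker from rewriting the whole log, the latest hash would also need to be stored somewhere the agent host cannot modify.&lt;/p&gt;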

&lt;h3&gt;
  
  
  4.4 Identity, Authentication, and Governance
&lt;/h3&gt;

&lt;p&gt;A less glamorous but absolutely crucial aspect is identity and access management (IAM) for agents. When an agent performs an action, whose authority is it under? In a multi-user environment (say an AI assistant in a company), the agent might have to act as different users at different times. Traditional OAuth wasn't designed for a scenario where an LLM is effectively a headless client acting interactively on behalf of a user. Over the past months, developers have hit practical snags integrating OAuth with MCP. For example, the OAuth Dynamic Client Registration used by MCP (so an agent can automatically register itself to use an API) sometimes fails with enterprise IdPs due to strict URL checks. Some IdPs don't allow dynamic clients at all. There are calls to allow static client credentials or out-of-band provisioning for agents in such cases. This is more of a standards gap than a research one – it's being worked through in the MCP working group.&lt;/p&gt;

&lt;p&gt;From an enterprise architecture view, many want the agent to integrate with existing SSO. That means when an employee invokes an agent, the agent should use that employee's OAuth token to access tools. This ensures all actions are attributable and within the user's permissions. This is straightforward for some tools (an MCP server can simply require a token from the agent) but complex for others (e.g. a shell tool on a server: how should that be scoped per user?). Some solutions involve impersonation tokens or scoped API keys: e.g. the agent might have a key that only allows certain operations and is tagged to the user.&lt;/p&gt;

&lt;p&gt;The concept of "least privilege" comes into sharp focus here: the agent should only have the minimum access needed for the task, and ideally only for the duration needed. Techniques like OAuth token exchange or short-lived credentials are recommended. If an agent is spun up to do a build job, give it a temporary token that expires when the job ends, so that even if the agent goes rogue, it cannot do damage later. One recent architecture paper emphasizes integrating enterprise identity with these agents so that all actions flow through the normal IAM checks and logs of the enterprise. That means, for instance, an agent using a Jira tool would appear in the Jira audit logs as "actions performed via AI agent on behalf of Bob". This transparency is needed for trust – people won't use the agent if it's a black box doing things in the shadows.&lt;/p&gt;
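
&lt;p&gt;To make the least-privilege idea concrete, here is a minimal sketch of a short-lived, scoped credential check. It is an in-memory illustration only and does not implement OAuth token exchange; the &lt;code&gt;AgentCredential&lt;/code&gt; type, the scope names, and the 15-minute lifetime are assumptions made for this example.&lt;/p&gt;

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCredential:
    """Hypothetical short-lived credential issued to one agent run."""
    on_behalf_of: str      # the human user the agent acts for
    scopes: frozenset      # operations the agent may perform
    expires_at: float      # Unix timestamp after which the token is dead


def issue_credential(user, scopes, ttl_seconds, now=None):
    """Mint a credential limited to the given scopes and lifetime."""
    now = time.time() if now is None else now
    return AgentCredential(user, frozenset(scopes), now + ttl_seconds)


def authorize(cred, operation, now=None):
    """Allow an operation only if it is in scope and the token is unexpired."""
    now = time.time() if now is None else now
    if now >= cred.expires_at:
        return False, "token expired"
    if operation not in cred.scopes:
        return False, f"scope '{operation}' not granted"
    return True, f"'{operation}' performed via AI agent on behalf of {cred.on_behalf_of}"


# A build job gets a 15-minute token that can only read the repo and run builds.
cred = issue_credential("bob", {"repo:read", "build:run"}, ttl_seconds=900, now=1000.0)
print(authorize(cred, "build:run", now=1200.0))   # in scope, still valid
print(authorize(cred, "repo:write", now=1200.0))  # outside the granted scopes
print(authorize(cred, "build:run", now=2000.0))   # after expiry
```

&lt;p&gt;The audit message returned on success mirrors the "on behalf of" attribution discussed above, so each allowed action stays traceable to a user.&lt;/p&gt;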

&lt;p&gt;Governance also extends to deciding which tasks to automate vs require human approval, what data agents are allowed to see, and how to prevent data leakage. Some enterprises restrict agents from accessing production data entirely, using them only on sanitized or test datasets until trust is built. Others put heavy monitoring on outputs (e.g. scanning everything the agent is about to output to a user for sensitive data). These are areas where data loss prevention (DLP) tools intersect with AI. A future vision is that an enterprise agent platform will integrate DLP classifiers that flag if an agent's response likely contains company confidential info, and either redact it or alert a human.&lt;/p&gt;
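
&lt;p&gt;As a rough illustration of output scanning, the sketch below redacts two kinds of sensitive strings before a response reaches the user. The patterns and the &lt;code&gt;sk-&lt;/code&gt; key format are assumptions chosen for the example; a production DLP classifier would cover far more categories and would not rely on regular expressions alone.&lt;/p&gt;

```python
import re

# Illustrative patterns only; a real DLP classifier would be far broader.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def scan_and_redact(text):
    """Return (redacted_text, findings), where findings lists matched categories."""
    findings = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            findings.append(label)
    return text, findings


reply = "Contact alice@example.com and use key sk-abcdef0123456789ABCD."
redacted, findings = scan_and_redact(reply)
print(redacted)
print(findings)
```

&lt;p&gt;A non-empty findings list could either trigger redaction, as here, or route the response to a human reviewer.&lt;/p&gt;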

&lt;p&gt;Finally, we must mention user trust and adoption: beyond technical measures, building trust in agents involves user education and incremental rollout. Many organizations start with "read-only" agents (they can suggest actions but not execute them) and then gradually allow more autonomy as confidence grows. By having robust logs and a clear override path, users are more likely to accept the agent's help. Trust is also enhanced by making the agent's reasoning visible (hence the popularity of chain-of-thought traces displayed to users) and by giving users easy ways to correct or stop the agent. In essence, transparency and control are the antidotes to the unpredictability of AI.&lt;/p&gt;

&lt;p&gt;The advancements in the last half-year – from sandbox isolation to protocol standardization and new benchmarks – all aim to shrink the trust gap. Yet, open challenges remain (discussed in the next section) before one can confidently say an autonomous agent is as well-understood and controlled as a traditional software microservice.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Open Challenges and Future Directions
&lt;/h2&gt;

&lt;p&gt;Despite rapid progress, enterprise agent systems still have unsolved research questions and practical gaps. We conclude by highlighting some of the most pressing ones, as identified by recent discussions and publications, which represent opportunities for future work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unified Cross-Layer Security Model: Today we have pieces – OAuth for identity, MCP scopes for tool access, sandbox for OS isolation – but they don't always speak the same language. There is no single policy that says, for example, "User X's agent can read from database Y but not write, and can run code but only use 2 CPU and no internet, and these conditions are cryptographically verified." A comprehensive model that ties user identity, agent capabilities, tool permissions, and sandbox OS permissions into one coherent framework is needed. Early proposals like AgentBound (inspired by mobile app permissions) are a start. In the future, we might see capability tokens that encode all these at once – the agent carries a token which the sandbox and tools all check, limiting what it can do in each context. Formal verification of such models (to prove an agent cannot do X) would greatly enhance trust.&lt;/li&gt;
&lt;li&gt;Rollback of External Side Effects: As noted, while we can roll back filesystem changes in a sandbox, we cannot yet roll back an email sent or a transaction made. Developing agent transaction protocols or sagas is an open challenge. One idea is to require critical tools to provide a compensation function – e.g. an MCP server for cloud VMs could have an "undo" for creating a VM (which would delete it). An agent planner could then use these to revert a series of actions if needed. This also ties into training the LLM or using a secondary verifier to decide when to roll back (e.g. if it notices that an outcome diverges from the expected state). Without solving this, enterprises will be hesitant to let agents perform irreversible operations autonomously.&lt;/li&gt;
&lt;li&gt;Advanced Threat Defenses: The taxonomy of potential attacks (context injection, tool poisoning, cross-tool data leaks, etc.) is growing. Defenses like context signing (cryptographically signing tool outputs or important prompts to prevent tampering) have been suggested but not widely implemented. The idea there is: an agent would only trust tool outputs that come with a signature or hash, so an attacker who intercepts or modifies the content (like a man-in-the-middle on an HTTP tool) would fail. Similarly, isolating tools from each other (so one tool can't directly influence another except through the agent's vetted reasoning) is a challenge – currently the agent's memory is the meeting point of all tool data, making it a melting pot where a malicious output in one tool can affect decisions involving another.&lt;/li&gt;
&lt;li&gt;Benchmarking and Standards for Evaluation: The community has started benchmarks like MCP-SafetyBench and MSB, but we need continuous evaluation pipelines. One option is an open leaderboard where agent developers can submit their agent (with a certain set of tools and policies) to be evaluated against a suite of scenarios, similar to how language models are benchmarked on GLUE or SuperGLUE for NLP. This could drive competition and improvement in safety. Evaluation should also include cost and latency metrics – an agent that is safe but takes hours or a large budget to complete a task isn't practical. Balancing efficiency with safety will likely lead to innovations like adaptive risk modes (the agent switches to a more cautious approach if it senses something sensitive, trading speed for safety dynamically).&lt;/li&gt;
&lt;li&gt;Human-Agent Interaction Paradigms: AgentBay's approach to HITL is one example of making agents more usable in the real world. There is still work to do on when and how an agent should ask for help. If it asks too often, it's not useful; if it asks too rarely, it might make an irrecoverable error. Finding that sweet spot (perhaps through reinforcement learning or feedback from users) is an ongoing area. Also, UI/UX research into how to present agent decisions to users in a clear way will be important (so users can confidently approve or deny actions). In enterprises, this might mean integrating agent controls into existing interfaces – e.g. showing an "AI agent suggestion" in a Jira ticket with a one-click approve.&lt;/li&gt;
&lt;li&gt;Cross-Organization Collaboration and Data Sharing: Enterprise agents often need to work across silos – e.g. an agent might coordinate between a supplier's system and the company's internal system. This raises questions of federated trust: how do you let an agent use two domains' tools in a secure way? This touches on things like standardizing how agents convey identity across org boundaries, and how audit logs are shared. The AAIF's home under the Linux Foundation hints at future inter-company standards to address this, since agents won't stop at the corporate firewall.&lt;/li&gt;
&lt;li&gt;Ethical and Compliance Considerations: Beyond security, enterprises must ensure agents comply with regulations and ethical norms. For example, if an agent interacts with personal data, privacy laws apply. How do we audit that an agent didn't retain or leak personal data beyond allowed purposes? Techniques like data tagging and tracking could be employed – marking certain outputs as containing sensitive info and preventing them from being used in contexts that aren't allowed. Ensuring AI explanations for decisions (especially if used in regulated domains) is another angle – if an agent makes a decision that affects a customer, one might need a rationale logged for compliance, which is tricky given the opaque reasoning of LLMs.&lt;/li&gt;
&lt;li&gt;Improving Model Robustness: Finally, at the heart is the LLM itself. There's ongoing research into fine-tuning models to be more resistant to manipulation (good for safety but often at odds with capability). Techniques like constitutional AI or adversarial training on tool-use scenarios might yield models that inherently refuse certain dangerous actions or at least flag uncertainty. Also, specialized models for parsing and validating the agent's outputs (e.g. a secondary model that checks if a proposed action seems safe/rational) could be integrated. OpenAI and others are exploring "moderator" models that look at the main model's outputs. In agents, a "policy model" might examine the plan and tool uses and raise red flags for anything that violates safe patterns learned during training.&lt;/li&gt;
&lt;/ul&gt;
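
&lt;p&gt;The compensation-function idea from the rollback item in the list above can be sketched as a small saga runner. This is a minimal illustration under stated assumptions: the in-memory VM set stands in for a real cloud API, and &lt;code&gt;run_saga&lt;/code&gt;, &lt;code&gt;create_vm&lt;/code&gt;, &lt;code&gt;delete_vm&lt;/code&gt;, and &lt;code&gt;attach_disk&lt;/code&gt; are hypothetical names, not part of MCP.&lt;/p&gt;

```python
class SagaError(Exception):
    """Raised after a failed step once earlier steps have been compensated."""


def run_saga(steps):
    """Run (name, action, compensate) steps; on failure, undo completed ones in reverse."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for undo in reversed(completed):
                undo()
            raise SagaError(f"step '{name}' failed: {exc}") from exc
        completed.append(compensate)


# Hypothetical in-memory stand-in for a cloud API exposed through a tool server.
vms = set()
log = []


def create_vm():
    vms.add("vm-1")
    log.append("create vm-1")


def delete_vm():
    # Compensation for create_vm: the "undo" deletes the VM again.
    vms.discard("vm-1")
    log.append("delete vm-1")


def attach_disk():
    raise RuntimeError("quota exceeded")


try:
    run_saga([
        ("create_vm", create_vm, delete_vm),
        ("attach_disk", attach_disk, lambda: None),
    ])
except SagaError as err:
    print(err)

print(vms, log)
```

&lt;p&gt;Note that compensation is not a true rollback: the VM existed for a moment, and actions such as a sent email may have no meaningful undo at all, which is why the problem remains open.&lt;/p&gt;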

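&lt;p&gt;The context-signing defense mentioned in the list above could look roughly like the following. This sketch assumes the tool server and the agent runtime share a secret key out of band and uses an HMAC tag; a real deployment might prefer asymmetric signatures, and the envelope layout here is invented for illustration.&lt;/p&gt;

```python
import hashlib
import hmac
import json

# Assumption: the tool server and the agent runtime share this key out of band.
SHARED_KEY = b"demo-key-not-for-production"


def sign_output(payload):
    """Tool side: serialize the payload deterministically and attach an HMAC tag."""
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return {"body": body.decode(), "tag": tag}


def verify_output(envelope):
    """Agent side: return the payload only if the tag matches, else None."""
    expected = hmac.new(SHARED_KEY, envelope["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["tag"]):
        return None
    return json.loads(envelope["body"])


original = {"tool": "http_get", "status": 200, "text": "weather: sunny"}
envelope = sign_output(original)
print(verify_output(envelope))

# A man-in-the-middle rewrites the content but cannot recompute the tag.
tampered = dict(envelope, body=envelope["body"].replace("sunny", "ignore prior instructions"))
print(verify_output(tampered))
```

&lt;p&gt;Signing only proves that the content was not altered in transit. It does not help when the tool itself is malicious, which is why tool isolation is listed as a separate challenge.&lt;/p&gt;
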
&lt;p&gt;Outlook: The next year will likely bring a maturation of the agent ecosystem akin to what 2010-2015 saw for cloud microservices – an explosion of tools and best practices to handle deployment, security, monitoring, and standardization. The formation of AAIF is a strong indicator that industry players see collaboration as the way forward; no one wants a fragmented, Wild West environment when so much is at stake (both in terms of safety and potential business value). We will probably see AgentOps teams emerge in organizations, analogous to MLOps, focused on managing and supervising fleets of agents. They'll use dashboards (like GitHub's Agent HQ mission control) to oversee agent activities across the enterprise. And just as DevOps developed guardrails and CI/CD for code, AgentOps will develop guardrails and continuous evaluation for autonomous AI behaviors.&lt;/p&gt;

&lt;p&gt;In conclusion, enterprise agent systems are transitioning from the lab to the real world, carrying with them both excitement (unprecedented automation capabilities) and caution (novel failure modes). Sandbox architectures and protocols like MCP have laid a foundation that makes these systems more modular, controllable, and interoperable than before. Yet, achieving a level of trust comparable to traditional software will require continued innovation in permission modeling, verification, and human oversight integration. The last half-year's progress has been remarkable – what was mostly sci-fi a year ago (multiple AIs collaborating on complex tasks with minimal human input) is now demonstrably feasible. The coming months will likely see pilots turn into production deployments in enterprises, each teaching new lessons. By actively sharing these lessons and converging on open standards and benchmarks, the community can accelerate the safe adoption of agentic AI. The end goal is an ecosystem where AI agents become reliable teammates – tirelessly automating drudgery and navigating complexity – while humans retain ultimate control and understanding of their behavior. The path to get there is challenging, but as this survey shows, the groundwork is rapidly being put in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.infoq.com/news/2025/12/agent-sandbox-kubernetes/" rel="noopener noreferrer"&gt;Open-Source Agent Sandbox Enables Secure Deployment of AI Agents on Kubernetes&lt;/a&gt; - InfoQ News on Agent Sandbox, gVisor/Kata isolation, CRD for stateful agents, OWASP Top 10 for AI Agents&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://techinformed.com/google-launches-agent-sandbox-for-secure-ai-agents-on-kubernetes/" rel="noopener noreferrer"&gt;Google launches Agent Sandbox for secure AI agents on Kubernetes&lt;/a&gt; - TechInformed on gVisor isolation, pre-warmed pools (90% faster startups), Pod Snapshots&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation" rel="noopener noreferrer"&gt;Linux Foundation Announces Formation of Agentic AI Foundation (AAIF)&lt;/a&gt; - MCP, Goose, AGENTS.md contributions; cross-industry support&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.infoq.com/articles/mcp-connector-for-building-smarter-modular-ai-agents/" rel="noopener noreferrer"&gt;MCP: The Universal Connector for Building Smarter, Modular AI Agents&lt;/a&gt; - InfoQ on MCP benefits (M×N to M+N integration, interoperability)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://devblogs.microsoft.com/foundry/introducing-microsoft-agent-framework-the-open-source-engine-for-agentic-ai-apps/" rel="noopener noreferrer"&gt;Introducing Microsoft Agent Framework&lt;/a&gt; - Microsoft Foundry Blog on open standards (MCP, A2A, OpenAPI) and enterprise readiness&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://devblogs.microsoft.com/foundry/whats-new-in-microsoft-foundry-oct-nov-2025/" rel="noopener noreferrer"&gt;What's new in Microsoft Foundry (Oct/Nov 2025)&lt;/a&gt; - Microsoft Agent Framework updates&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.infoworld.com/article/4080888/github-launches-agent-hq-to-bring-order-to-ai-powered-coding.html" rel="noopener noreferrer"&gt;GitHub launches Agent HQ for AI-powered coding&lt;/a&gt; - InfoWorld on managing multiple coding agents with governance, audit, and mission control&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/advisories/GHSA-gjv4-ghm7-q58q" rel="noopener noreferrer"&gt;CVE-2025-53355: mcp-server-kubernetes command injection vulnerability&lt;/a&gt; - GitHub Advisory on unsanitized execSync and prompt-injection exploit via pod logs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2510.21236" rel="noopener noreferrer"&gt;Securing AI Agent Execution (arXiv:2510.21236)&lt;/a&gt; - Bühler et al. 2025: AgentBound permission framework for MCP tools, auto-policy generation (~80% accuracy)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2512.04367" rel="noopener noreferrer"&gt;AgentBay: A Hybrid Interaction Sandbox (arXiv:2512.04367)&lt;/a&gt; - Piao et al. 2025: unified sandbox with AI API control + live human takeover (48% higher task success with HITL)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://openreview.net/pdf/d8cae2e9cc3facabfe822f031acdbe043046f70f.pdf" rel="noopener noreferrer"&gt;MCP-SafetyBench (OpenReview)&lt;/a&gt; - Lan et al. 2025: real MCP server benchmark, 30–48% attack success on tested LLMs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://medium.com/@adnanmasood/model-context-protocol-mcp-attacks-threats-taxonomy-and-defenses-for-tool-using-llms-de65fbffedd3" rel="noopener noreferrer"&gt;MCP Attacks: Threats, Taxonomy, and Defenses&lt;/a&gt; - Adnan Masood on threat taxonomy for tool-using LLMs&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>sandbox</category>
      <category>ai</category>
    </item>
    <item>
      <title>eBPF Tutorial: Extending Kernel Subsystems with BPF struct_ops</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Tue, 27 Jan 2026 07:20:44 +0000</pubDate>
      <link>https://dev.to/yunwei37/ebpf-tutorial-extending-kernel-subsystems-with-bpf-structops-2n3g</link>
      <guid>https://dev.to/yunwei37/ebpf-tutorial-extending-kernel-subsystems-with-bpf-structops-2n3g</guid>
      <description>&lt;p&gt;Have you ever wanted to extend kernel behavior—like adding a custom scheduler, network protocol, or security policy—but were put off by the complexity of writing and maintaining a full kernel module? What if you could define the logic directly in eBPF, with dynamic updates, safe execution, and programmable control, all without recompiling the kernel or risking system stability?&lt;/p&gt;

&lt;p&gt;This is the power of &lt;strong&gt;BPF struct_ops&lt;/strong&gt;. This advanced eBPF feature allows BPF programs to implement the callbacks of a kernel operations structure, effectively letting you "plug in" custom logic to extend kernel subsystems. It goes beyond simple tracing or filtering—you can now implement core kernel operations in BPF. For example, we use it to implement GPU scheduling and memory offloading extensions in GPU drivers (see &lt;a href="https://lpc.events/event/19/contributions/2168/" rel="noopener noreferrer"&gt;LPC 2024 talk&lt;/a&gt; and &lt;a href="https://github.com/eunomia-bpf/gpu_ext" rel="noopener noreferrer"&gt;gpu_ext project&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;In this tutorial, we will explore how to use &lt;code&gt;struct_ops&lt;/code&gt; to dynamically extend kernel subsystem behavior. We won't be using the common TCP congestion control example. Instead, we'll take a more fundamental approach that mirrors the extensibility seen with kfuncs. We will create a custom kernel module that defines a new, simple subsystem with a set of operations. This module will act as a placeholder, creating new attachment points for our BPF programs. Then, we will write a BPF program to implement the logic for these operations. This demonstrates a powerful pattern: using a minimal kernel module to expose a &lt;code&gt;struct_ops&lt;/code&gt; interface, and then using BPF to provide the full, complex implementation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The complete source code for this tutorial can be found here: &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/struct_ops" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/struct_ops&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction to BPF struct_ops: Programmable Kernel Subsystems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Challenge: Extending Kernel Behavior Safely and Dynamically
&lt;/h3&gt;

&lt;p&gt;Traditionally, adding new functionality to the Linux kernel, such as a new file system, a network protocol, or a scheduler algorithm, requires writing a kernel module. While powerful, kernel modules come with significant challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complexity:&lt;/strong&gt; Kernel development has a steep learning curve and requires a deep understanding of kernel internals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety:&lt;/strong&gt; A bug in a kernel module can easily crash the entire system. There are no sandboxing guarantees.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance:&lt;/strong&gt; Kernel modules must be maintained and recompiled for different kernel versions, creating a tight coupling with the kernel's internal APIs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;eBPF has traditionally addressed these issues for tracing, networking, and security by providing a safe, sandboxed environment. However, most eBPF programs are attached to existing hooks (like tracepoints, kprobes, or XDP) and react to events. They don't typically &lt;em&gt;implement&lt;/em&gt; the core logic of a kernel subsystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Implementing Kernel Operations with BPF
&lt;/h3&gt;

&lt;p&gt;BPF &lt;code&gt;struct_ops&lt;/code&gt; bridges this gap. It allows a BPF program to implement the functions within a &lt;code&gt;struct_ops&lt;/code&gt;—a common pattern in the kernel where a structure holds function pointers for a set of operations. Instead of these pointers pointing to functions compiled into the kernel or a module, they can point to BPF programs.&lt;/p&gt;

&lt;p&gt;This is a paradigm shift. It's no longer just about observing or filtering; it's about &lt;em&gt;implementing&lt;/em&gt;. Imagine a kernel subsystem that defines a set of operations like &lt;code&gt;open&lt;/code&gt;, &lt;code&gt;read&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;. With &lt;code&gt;struct_ops&lt;/code&gt;, you can write BPF programs that serve as the implementation for these very functions.&lt;/p&gt;

&lt;p&gt;This approach is similar in spirit to how &lt;strong&gt;kfuncs&lt;/strong&gt; allow developers to extend the capabilities of BPF. With kfuncs, we can add custom helper functions to the BPF runtime by defining them in a kernel module. With &lt;code&gt;struct_ops&lt;/code&gt;, we take this a step further: we define a whole new &lt;em&gt;set of attach points&lt;/em&gt; for BPF programs, effectively creating a custom, BPF-programmable subsystem within the kernel.&lt;/p&gt;

&lt;p&gt;The benefits are immense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Implementation&lt;/strong&gt;: You can load, update, and unload the BPF programs implementing the subsystem logic on the fly, without restarting the kernel or the application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety&lt;/strong&gt;: The BPF verifier ensures that the BPF programs are safe to run, preventing common pitfalls like infinite loops, out-of-bounds memory access, and system crashes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: The logic is in the BPF program, which can be developed and updated independently of the kernel module that defines the &lt;code&gt;struct_ops&lt;/code&gt; interface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Programmability&lt;/strong&gt;: Userspace applications can interact with and control the BPF programs, allowing for dynamic configuration and control of the kernel subsystem's behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this tutorial, we will walk through a practical example of this pattern. We'll start with a kernel module that defines a new &lt;code&gt;struct_ops&lt;/code&gt; type, and then we'll write a BPF program to implement its functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Kernel Module: Defining the Subsystem Interface
&lt;/h2&gt;

&lt;p&gt;The first step is to create a kernel module that defines our new BPF-programmable subsystem. This module doesn't need to contain much logic itself. Its primary role is to define a &lt;code&gt;struct_ops&lt;/code&gt; type and register it with the kernel, creating a new attachment point for BPF programs. It also provides a mechanism to trigger the operations, which in our case will be a simple proc file.&lt;/p&gt;

&lt;p&gt;This approach is powerful because it separates the interface definition (in the kernel module) from the implementation (in the BPF program). The kernel module is stable and minimal, while the complex, dynamic logic resides in the BPF program, which can be updated at any time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete Kernel Module: &lt;code&gt;module/hello.c&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Here is the complete source code for our kernel module. It defines a &lt;code&gt;struct_ops&lt;/code&gt; named &lt;code&gt;bpf_testmod_ops&lt;/code&gt; with three distinct operations that our BPF program will later implement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;linux/init.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;linux/module.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;linux/kernel.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;linux/bpf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;linux/btf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;linux/btf_ids.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;linux/proc_fs.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;linux/seq_file.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;linux/bpf_verifier.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cm"&gt;/* Define our custom struct_ops operations */&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;test_1&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;test_2&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;test_3&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="cm"&gt;/* Global instance that BPF programs will implement */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops&lt;/span&gt; &lt;span class="n"&gt;__rcu&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testmod_ops&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/* Proc file to trigger the struct_ops */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;proc_dir_entry&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;trigger_file&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/* CFI stub functions - required for struct_ops */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;bpf_testmod_ops__test_1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;bpf_testmod_ops__test_2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;bpf_testmod_ops__test_3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* CFI stubs structure */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops&lt;/span&gt; &lt;span class="n"&gt;__bpf_ops_bpf_testmod_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops__test_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops__test_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops__test_3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="cm"&gt;/* BTF and verifier callbacks */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;bpf_testmod_ops_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;btf&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;btf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* Initialize BTF if needed */&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;bool&lt;/span&gt; &lt;span class="nf"&gt;bpf_testmod_ops_is_valid_access&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;off&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;bpf_access_type&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_prog&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_insn_access_aux&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* Allow all accesses for now */&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* Allow specific BPF helpers to be used in struct_ops programs */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_func_proto&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="nf"&gt;bpf_testmod_ops_get_func_proto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;bpf_func_id&lt;/span&gt; &lt;span class="n"&gt;func_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_prog&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* Use base func proto which includes trace_printk and other basic helpers */&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bpf_base_func_proto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_verifier_ops&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_verifier_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_valid_access&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_is_valid_access&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_func_proto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_get_func_proto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;bpf_testmod_ops_init_member&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;btf_type&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;btf_member&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;member&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;kdata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;udata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* No special member initialization needed */&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* Registration function */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;bpf_testmod_ops_reg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;kdata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_link&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kdata&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Only one instance at a time */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmpxchg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;testmod_ops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;EEXIST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;pr_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bpf_testmod_ops registered&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* Unregistration function */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;bpf_testmod_ops_unreg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;kdata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_link&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kdata&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmpxchg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;testmod_ops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;pr_warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bpf_testmod_ops: unexpected unreg&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;pr_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bpf_testmod_ops unregistered&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* Struct ops definition */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_struct_ops&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_struct_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verifier_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_verifier_ops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_init&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init_member&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_init_member&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_reg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unreg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_unreg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cfi_stubs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;__bpf_ops_bpf_testmod_ops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"bpf_testmod_ops"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;owner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;THIS_MODULE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="cm"&gt;/* Proc file write handler to trigger struct_ops */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;ssize_t&lt;/span&gt; &lt;span class="nf"&gt;trigger_write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;__user&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loff_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;kbuf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kbuf&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kbuf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;copy_from_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kbuf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;EFAULT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;kbuf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;rcu_read_lock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rcu_dereference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;testmod_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;pr_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Calling struct_ops callbacks:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;test_1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;test_1&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="n"&gt;pr_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"test_1() returned: %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;test_2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;test_2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;pr_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"test_2(10, 20) returned: %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;test_3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;test_3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kbuf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;pr_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"test_3() called with buffer&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;pr_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"No struct_ops registered&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;rcu_read_unlock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;proc_ops&lt;/span&gt; &lt;span class="n"&gt;trigger_proc_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;proc_write&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trigger_write&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;__init&lt;/span&gt; &lt;span class="nf"&gt;testmod_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Register the struct_ops */&lt;/span&gt;
    &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;register_bpf_struct_ops&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_ops_struct_ops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;pr_err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to register struct_ops: %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Create proc file for triggering */&lt;/span&gt;
    &lt;span class="n"&gt;trigger_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;proc_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bpf_testmod_trigger"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mo"&gt;0222&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;trigger_proc_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;trigger_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="cm"&gt;/* Note: No unregister function available in this kernel version */&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ENOMEM&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;pr_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bpf_testmod loaded with struct_ops support&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;__exit&lt;/span&gt; &lt;span class="nf"&gt;testmod_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;proc_remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trigger_file&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="cm"&gt;/* Note: struct_ops unregister happens automatically on module unload */&lt;/span&gt;
    &lt;span class="n"&gt;pr_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bpf_testmod unloaded&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;module_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;testmod_init&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;module_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;testmod_exit&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;MODULE_LICENSE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GPL"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;MODULE_AUTHOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"eBPF Example"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;MODULE_DESCRIPTION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"BPF struct_ops test module"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;MODULE_VERSION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"1.0"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the Kernel Module Code
&lt;/h3&gt;

&lt;p&gt;This module may seem complex, but its structure is logical and serves a clear purpose: to safely expose a new programmable interface to the BPF subsystem. Let's break it down.&lt;/p&gt;

&lt;p&gt;First, we define the structure of our new operations. This is a simple C struct containing function pointers. This &lt;code&gt;struct bpf_testmod_ops&lt;/code&gt; is the interface that our BPF program will implement. Each function pointer defines a "slot" that a BPF program can fill.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;test_1&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;test_2&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;test_3&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
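
&lt;p&gt;To make the "slot" idea concrete, here is a minimal userspace sketch of the same pattern in plain C. It is not kernel code: &lt;code&gt;demo_test_2&lt;/code&gt; and &lt;code&gt;call_test_2&lt;/code&gt; are hypothetical names used only for illustration, and an ordinary C function stands in for the BPF program. The point it shows is that a caller must treat every slot as optional and check it before calling.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;

/* Same shape as the module's interface: one function pointer per slot. */
struct bpf_testmod_ops {
    int (*test_1)(void);
    int (*test_2)(int a, int b);
    int (*test_3)(const char *buf, int len);
};

/* Hypothetical plain-C implementation standing in for a BPF program. */
static int demo_test_2(int a, int b)
{
    return a + b;
}

/* Call the test_2 slot only if it is filled; return -1 for an empty slot. */
static int call_test_2(const struct bpf_testmod_ops *ops, int a, int b)
{
    if (ops == NULL || ops-&amp;gt;test_2 == NULL)
        return -1;
    return ops-&amp;gt;test_2(a, b);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A table initialized with &lt;code&gt;{ .test_2 = demo_test_2 }&lt;/code&gt; leaves &lt;code&gt;test_1&lt;/code&gt; and &lt;code&gt;test_3&lt;/code&gt; as NULL, which is why the module's trigger handler tests each pointer before invoking it.&lt;/p&gt;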



&lt;p&gt;Next, we have the core &lt;code&gt;bpf_struct_ops&lt;/code&gt; definition. This is a special kernel structure that describes our new &lt;code&gt;struct_ops&lt;/code&gt; type to the BPF system. It's the glue that connects our custom &lt;code&gt;bpf_testmod_ops&lt;/code&gt; to the BPF infrastructure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_struct_ops&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_struct_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verifier_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_verifier_ops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_init&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init_member&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_init_member&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_reg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unreg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_unreg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cfi_stubs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;__bpf_ops_bpf_testmod_ops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"bpf_testmod_ops"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;owner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;THIS_MODULE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure is filled with callbacks that the kernel will use to manage our &lt;code&gt;struct_ops&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.reg&lt;/code&gt; and &lt;code&gt;.unreg&lt;/code&gt;: These are the registration and unregistration callbacks. The kernel invokes &lt;code&gt;.reg&lt;/code&gt; when a BPF program tries to attach an implementation of &lt;code&gt;bpf_testmod_ops&lt;/code&gt;. Our implementation uses &lt;code&gt;cmpxchg&lt;/code&gt; on the global &lt;code&gt;testmod_ops&lt;/code&gt; pointer so that only one implementation can be registered at a time; a second attempt fails with &lt;code&gt;-EEXIST&lt;/code&gt;. &lt;code&gt;.unreg&lt;/code&gt; is called when the BPF program is detached and clears the pointer again.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.verifier_ops&lt;/code&gt;: This points to a structure of callbacks for the BPF verifier. It allows us to customize how the verifier treats BPF programs attached to this &lt;code&gt;struct_ops&lt;/code&gt;. For example, we can control which memory accesses and helper functions are allowed. In our case, &lt;code&gt;is_valid_access&lt;/code&gt; accepts every access, and &lt;code&gt;get_func_proto&lt;/code&gt; delegates to &lt;code&gt;bpf_base_func_proto&lt;/code&gt; to allow a basic set of helpers. That set includes &lt;code&gt;bpf_trace_printk&lt;/code&gt;, the helper behind the &lt;code&gt;bpf_printk&lt;/code&gt; macro, which is useful for debugging.&lt;/li&gt;
&lt;li&gt;
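&lt;code&gt;.init&lt;/code&gt; and &lt;code&gt;.init_member&lt;/code&gt;: These hook into BTF (BPF Type Format) handling. &lt;code&gt;.init&lt;/code&gt; runs when the &lt;code&gt;struct_ops&lt;/code&gt; type is registered and receives the BTF information describing our types. &lt;code&gt;.init_member&lt;/code&gt; is called for each member of the struct when an instance is set up, so a module can handle individual members specially; ours returns 0 because no special handling is needed.&lt;/li&gt;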
&lt;li&gt;
&lt;code&gt;.name&lt;/code&gt; and &lt;code&gt;.owner&lt;/code&gt;: These identify our &lt;code&gt;struct_ops&lt;/code&gt; and tie it to our module, ensuring proper reference counting so the module isn't unloaded while a BPF program is still attached.&lt;/li&gt;
&lt;/ul&gt;
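
&lt;p&gt;The single-slot registration logic in &lt;code&gt;.reg&lt;/code&gt; and &lt;code&gt;.unreg&lt;/code&gt; can be sketched in userspace with C11 atomics, where &lt;code&gt;atomic_compare_exchange_strong&lt;/code&gt; plays the role of the kernel's &lt;code&gt;cmpxchg&lt;/code&gt;. This is an illustrative analogue under stated assumptions, not the module's code: &lt;code&gt;struct ops&lt;/code&gt;, &lt;code&gt;active_ops&lt;/code&gt;, &lt;code&gt;ops_reg&lt;/code&gt;, and &lt;code&gt;ops_unreg&lt;/code&gt; are hypothetical names, and &lt;code&gt;ops_unreg&lt;/code&gt; returns an error code (the kernel callback returns &lt;code&gt;void&lt;/code&gt;) only so the outcome is easy to observe.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;stdatomic.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;

/* Hypothetical stand-in for struct bpf_testmod_ops. */
struct ops {
    int id;
};

/* The single registration slot; NULL means nothing is registered. */
static _Atomic(struct ops *) active_ops = NULL;

/* Claim the slot only if it is empty, like cmpxchg(&amp;amp;testmod_ops, NULL, ops). */
static int ops_reg(struct ops *ops)
{
    struct ops *expected = NULL;

    if (!atomic_compare_exchange_strong(&amp;amp;active_ops, &amp;amp;expected, ops))
        return -EEXIST;
    return 0;
}

/* Release the slot only if the caller is the current owner. */
static int ops_unreg(struct ops *ops)
{
    struct ops *expected = ops;

    if (!atomic_compare_exchange_strong(&amp;amp;active_ops, &amp;amp;expected, NULL))
        return -ENOENT;
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the comparison and the store happen as one atomic step, two concurrent registrations cannot both succeed: one claims the slot and the other sees a non-NULL value and gets &lt;code&gt;-EEXIST&lt;/code&gt;.&lt;/p&gt;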

&lt;p&gt;The module's &lt;code&gt;testmod_init&lt;/code&gt; function is where the magic starts. It calls &lt;code&gt;register_bpf_struct_ops&lt;/code&gt;, passing a pointer to our definition along with the struct type name, &lt;code&gt;bpf_testmod_ops&lt;/code&gt;. This makes the kernel aware of the new &lt;code&gt;bpf_testmod_ops&lt;/code&gt; type, and from this point on, BPF programs can target it.&lt;/p&gt;

&lt;p&gt;Finally, to make this demonstrable, the module creates a write-only file in the proc filesystem: &lt;code&gt;/proc/bpf_testmod_trigger&lt;/code&gt;. When a userspace program writes to this file, the &lt;code&gt;trigger_write&lt;/code&gt; function is called. It copies up to 63 bytes of the written data into a kernel buffer, then checks the module's global &lt;code&gt;testmod_ops&lt;/code&gt; pointer to see whether a BPF program has registered an implementation of &lt;code&gt;bpf_testmod_ops&lt;/code&gt;. If so, it calls each function pointer that is set (&lt;code&gt;test_1&lt;/code&gt;, &lt;code&gt;test_2&lt;/code&gt;, &lt;code&gt;test_3&lt;/code&gt;), which executes the code in our BPF program, and logs the results with &lt;code&gt;pr_info&lt;/code&gt;. This provides a simple way to invoke the BPF-implemented operations from userspace. The pointer is read inside an RCU read-side critical section (&lt;code&gt;rcu_read_lock&lt;/code&gt;, &lt;code&gt;rcu_dereference&lt;/code&gt;) so that it can be accessed safely even if it is being updated concurrently.&lt;/p&gt;
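
&lt;p&gt;From userspace, triggering the callbacks only requires a write to that proc file. The helper below is a minimal sketch using standard POSIX calls; the function name &lt;code&gt;trigger_struct_ops&lt;/code&gt; is hypothetical, and the path is a parameter so the helper can also be pointed at a regular file for a dry run on a machine where the module is not loaded.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;fcntl.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

/*
 * Write msg to the file at path (for example /proc/bpf_testmod_trigger).
 * Returns the number of bytes written, or -1 if the open or write fails.
 */
static ssize_t trigger_struct_ops(const char *path, const char *msg)
{
    ssize_t n;
    int fd = open(path, O_WRONLY);

    if (fd &amp;lt; 0)
        return -1;

    n = write(fd, msg, strlen(msg));
    close(fd);
    return n;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the module loaded, calling &lt;code&gt;trigger_struct_ops("/proc/bpf_testmod_trigger", "hello")&lt;/code&gt; as root should cause the module's &lt;code&gt;pr_info&lt;/code&gt; messages to appear in the kernel log (visible with &lt;code&gt;dmesg&lt;/code&gt;).&lt;/p&gt;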

&lt;h2&gt;
  
  
  The BPF Program: Implementing the Operations
&lt;/h2&gt;

&lt;p&gt;With the kernel module in place defining the &lt;em&gt;what&lt;/em&gt; (the &lt;code&gt;bpf_testmod_ops&lt;/code&gt; interface), we can now write a BPF program to define the &lt;em&gt;how&lt;/em&gt; (the actual implementation of those operations). This BPF program will contain the logic that executes when the &lt;code&gt;test_1&lt;/code&gt;, &lt;code&gt;test_2&lt;/code&gt;, and &lt;code&gt;test_3&lt;/code&gt; functions are called from the kernel.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete BPF Program: &lt;code&gt;struct_ops.bpf.c&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This program provides the concrete implementations for the function pointers in &lt;code&gt;bpf_testmod_ops&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cm"&gt;/* SPDX-License-Identifier: GPL-2.0 */&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;vmlinux.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_helpers.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_tracing.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;"module/bpf_testmod.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;_license&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"license"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"GPL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/* Implement the struct_ops callbacks */&lt;/span&gt;
&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"struct_ops/test_1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;BPF_PROG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_test_1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;bpf_printk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"BPF test_1 called!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"struct_ops/test_2"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;BPF_PROG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_test_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;bpf_printk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"BPF test_2 called: %d + %d = %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"struct_ops/test_3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;BPF_PROG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_test_3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;read_buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;read_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;bpf_printk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"BPF test_3 called with buffer length %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Safely read from kernel buffer using bpf_probe_read_kernel */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;read_len&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_probe_read_kernel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;read_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="cm"&gt;/* Successfully read buffer - print first few characters */&lt;/span&gt;
            &lt;span class="n"&gt;bpf_printk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Buffer content: '%c%c%c%c'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="n"&gt;read_buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;read_buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;read_buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;read_buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="n"&gt;bpf_printk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Full buffer: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;read_buf&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;bpf_printk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to read buffer, ret=%ld&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* Define the struct_ops map */&lt;/span&gt;
&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".struct_ops"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops&lt;/span&gt; &lt;span class="n"&gt;testmod_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_test_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_test_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_test_3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the BPF Code
&lt;/h3&gt;

&lt;p&gt;The BPF code is short because the &lt;code&gt;struct_ops&lt;/code&gt; abstraction handles the wiring between the kernel interface and the BPF programs.&lt;/p&gt;

&lt;p&gt;Each function in the BPF program corresponds to one of the operations defined in the kernel module's &lt;code&gt;bpf_testmod_ops&lt;/code&gt; struct. The magic lies in the &lt;code&gt;SEC&lt;/code&gt; annotations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SEC("struct_ops/test_1")&lt;/code&gt;: This tells the BPF loader that the &lt;code&gt;bpf_testmod_test_1&lt;/code&gt; program is an implementation for a &lt;code&gt;struct_ops&lt;/code&gt; operation. The &lt;code&gt;struct_ops&lt;/code&gt; prefix is the part that sets the program type. The name after the slash is not required to match the callback name, though keeping them aligned is a good convention; the program is bound to a slot by the assignment in the &lt;code&gt;struct_ops&lt;/code&gt; map.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implementations themselves are simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bpf_testmod_test_1&lt;/code&gt;: This function takes no arguments, prints a message to the kernel trace log using &lt;code&gt;bpf_printk&lt;/code&gt;, and returns the integer &lt;code&gt;42&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bpf_testmod_test_2&lt;/code&gt;: This function takes two integers, &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;, calculates their sum, prints the operation and result, and returns the sum.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bpf_testmod_test_3&lt;/code&gt;: This function demonstrates handling data from userspace. It receives a character buffer and its length, clamps the length so that at most 63 bytes are copied into the zero-initialized 64-byte local buffer (which keeps it NUL-terminated), and uses &lt;code&gt;bpf_probe_read_kernel&lt;/code&gt; to safely copy the data from the buffer passed by the kernel module onto the BPF stack. This is a crucial safety measure, as BPF programs cannot directly dereference arbitrary kernel memory pointers. After a successful read it prints the content; otherwise it prints the error code. In both cases it returns &lt;code&gt;len&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
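&lt;p&gt;The length clamping in &lt;code&gt;bpf_testmod_test_3&lt;/code&gt; can be tried out in plain userspace C. The following is a minimal sketch, not part of the tutorial's code: &lt;code&gt;bounded_copy&lt;/code&gt; and &lt;code&gt;READ_BUF_SIZE&lt;/code&gt; are hypothetical names, and &lt;code&gt;memcpy&lt;/code&gt; stands in for &lt;code&gt;bpf_probe_read_kernel&lt;/code&gt;. Unlike the BPF version, it rejects non-positive lengths up front, because comparing a negative &lt;code&gt;int&lt;/code&gt; against an unsigned &lt;code&gt;sizeof&lt;/code&gt; value does not catch them.&lt;/p&gt;

```c
#include <stddef.h>
#include <string.h>

#define READ_BUF_SIZE 64 /* same size as read_buf in test_3 */

/*
 * Userspace stand-in for the bounded read in test_3 (hypothetical helper).
 * Copies at most READ_BUF_SIZE - 1 bytes from src into dst, so the last
 * byte of the zeroed buffer always remains a NUL terminator.
 * Returns the number of bytes copied, or 0 if there is nothing to copy.
 */
static int bounded_copy(char *dst, const char *src, int len)
{
    int read_len;

    memset(dst, 0, READ_BUF_SIZE);
    if (!src || len <= 0)
        return 0;

    /* Clamp the caller-supplied length to the destination capacity. */
    read_len = len < READ_BUF_SIZE ? len : READ_BUF_SIZE - 1;

    /* memcpy stands in for bpf_probe_read_kernel(dst, read_len, src). */
    memcpy(dst, src, (size_t)read_len);
    return read_len;
}
```

&lt;p&gt;Calling &lt;code&gt;bounded_copy(out, msg, len)&lt;/code&gt; with a 64-byte &lt;code&gt;out&lt;/code&gt; buffer always leaves a printable C string in &lt;code&gt;out&lt;/code&gt;, which is the property the &lt;code&gt;%s&lt;/code&gt; format in the BPF program relies on.&lt;/p&gt;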

&lt;p&gt;The final piece is the &lt;code&gt;struct_ops&lt;/code&gt; map itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".struct_ops"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops&lt;/span&gt; &lt;span class="n"&gt;testmod_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_test_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_test_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_test_3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the most critical part for linking everything together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SEC(".struct_ops")&lt;/code&gt;: This special section identifies the following data structure as a &lt;code&gt;struct_ops&lt;/code&gt; map.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;struct bpf_testmod_ops testmod_ops&lt;/code&gt;: We declare a variable named &lt;code&gt;testmod_ops&lt;/code&gt; of the type &lt;code&gt;struct bpf_testmod_ops&lt;/code&gt;. The &lt;strong&gt;name of the struct type is important&lt;/strong&gt;. It must match the &lt;code&gt;name&lt;/code&gt; field in the &lt;code&gt;bpf_struct_ops&lt;/code&gt; definition within the kernel module (&lt;code&gt;.name = "bpf_testmod_ops"&lt;/code&gt;). This is how &lt;code&gt;libbpf&lt;/code&gt; knows which kernel &lt;code&gt;struct_ops&lt;/code&gt; this BPF program intends to implement. The variable name &lt;code&gt;testmod_ops&lt;/code&gt; becomes the name of the map, which is how the userspace loader refers to it (&lt;code&gt;skel-&amp;gt;maps.testmod_ops&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The structure is initialized by assigning the BPF programs (&lt;code&gt;bpf_testmod_test_1&lt;/code&gt;, etc.) to the corresponding function pointers. This maps our BPF functions to the "slots" in the &lt;code&gt;struct_ops&lt;/code&gt; interface.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the userspace loader attaches this &lt;code&gt;struct_ops&lt;/code&gt;, &lt;code&gt;libbpf&lt;/code&gt; and the kernel work together to find the &lt;code&gt;bpf_testmod_ops&lt;/code&gt; registered by our kernel module and link these BPF programs as its implementation.&lt;/p&gt;
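&lt;p&gt;The "slots" idea is ordinary C dispatch through a table of function pointers. The sketch below is a userspace analogy only: &lt;code&gt;struct demo_ops&lt;/code&gt;, &lt;code&gt;demo_attach&lt;/code&gt;, &lt;code&gt;demo_detach&lt;/code&gt;, and &lt;code&gt;demo_trigger&lt;/code&gt; are hypothetical names, the arguments passed to &lt;code&gt;test_2&lt;/code&gt; are arbitrary, and none of this is the kernel module's actual code. It assumes the three callbacks have the signatures used by the BPF programs above.&lt;/p&gt;

```c
#include <stdio.h>

/* Illustrative mirror of the interface; the real struct bpf_testmod_ops
 * is defined by the kernel module. */
struct demo_ops {
    int (*test_1)(void);
    int (*test_2)(int a, int b);
    int (*test_3)(const char *buf, int len);
};

static int demo_test_1(void) { return 42; }
static int demo_test_2(int a, int b) { return a + b; }
static int demo_test_3(const char *buf, int len) { (void)buf; return len; }

/* Filling the slots, analogous to the SEC(".struct_ops") initializer. */
static const struct demo_ops demo_impl = {
    .test_1 = demo_test_1,
    .test_2 = demo_test_2,
    .test_3 = demo_test_3,
};

/* Currently registered implementation; NULL until something is attached. */
static const struct demo_ops *active_ops;

static void demo_attach(const struct demo_ops *ops) { active_ops = ops; }
static void demo_detach(void) { active_ops = NULL; }

/* The owner of the interface calls through whatever is attached.
 * Returns -1 when no implementation is attached, 0 otherwise. */
static int demo_trigger(const char *msg, int len)
{
    if (!active_ops)
        return -1;
    if (active_ops->test_1)
        printf("test_1() = %d\n", active_ops->test_1());
    if (active_ops->test_2)
        printf("test_2(10, 20) = %d\n", active_ops->test_2(10, 20));
    if (active_ops->test_3)
        printf("test_3(msg, %d) = %d\n", len, active_ops->test_3(msg, len));
    return 0;
}
```

&lt;p&gt;In the real setup the table is filled with BPF programs instead of C functions, and the attach and detach steps go through &lt;code&gt;libbpf&lt;/code&gt; and the kernel rather than a pointer assignment.&lt;/p&gt;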

&lt;h2&gt;
  
  
  The Userspace Loader: Attaching and Triggering
&lt;/h2&gt;

&lt;p&gt;The final component is the userspace program. Its job is to load the BPF program, attach it to the &lt;code&gt;struct_ops&lt;/code&gt; defined by the kernel module, and then trigger the operations to demonstrate that everything is working.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete Userspace Program: &lt;code&gt;struct_ops.c&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;signal.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;unistd.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;fcntl.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/libbpf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"struct_ops.skel.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="n"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;exiting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;handle_signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;exiting&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;trigger_struct_ops&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/proc/bpf_testmod_trigger"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;O_WRONLY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;perror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"open /proc/bpf_testmod_trigger"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strlen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;perror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"write"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;struct_ops_bpf&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_link&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handle_signal&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SIGTERM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handle_signal&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Open BPF application */&lt;/span&gt;
    &lt;span class="n"&gt;skel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;struct_ops_bpf__open&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to open BPF skeleton&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Load BPF programs */&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;struct_ops_bpf__load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to load BPF skeleton: %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Register struct_ops */&lt;/span&gt;
    &lt;span class="n"&gt;link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_map__attach_struct_ops&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;maps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;testmod_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to attach struct_ops&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Successfully loaded and attached BPF struct_ops!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Triggering struct_ops callbacks...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Trigger the struct_ops by writing to proc file */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trigger_struct_ops&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello from userspace!"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to trigger struct_ops - is the kernel module loaded?&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Load it with: sudo insmod module/hello.ko&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Triggered struct_ops successfully! Check dmesg for output.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Press Ctrl-C to exit...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Main loop - trigger periodically */&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;exiting&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;exiting&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;trigger_struct_ops&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Periodic trigger"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Triggered struct_ops again...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Detaching struct_ops...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;bpf_link__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nl"&gt;cleanup:&lt;/span&gt;
    &lt;span class="n"&gt;struct_ops_bpf__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the Userspace Code
&lt;/h3&gt;

&lt;p&gt;The userspace code orchestrates the entire process.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Signal Handling&lt;/strong&gt;: It sets up a signal handler for &lt;code&gt;SIGINT&lt;/code&gt; and &lt;code&gt;SIGTERM&lt;/code&gt; to allow for a graceful exit. This is crucial for &lt;code&gt;struct_ops&lt;/code&gt; because we need to ensure the BPF program is detached properly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open and Load&lt;/strong&gt;: It uses the standard &lt;code&gt;libbpf&lt;/code&gt; skeleton API to open and load the BPF application (&lt;code&gt;struct_ops_bpf__open()&lt;/code&gt; and &lt;code&gt;struct_ops_bpf__load()&lt;/code&gt;). This loads the BPF programs and the &lt;code&gt;struct_ops&lt;/code&gt; map into the kernel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Attach &lt;code&gt;struct_ops&lt;/code&gt;&lt;/strong&gt;: The key step is the attachment:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_map__attach_struct_ops&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;maps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;testmod_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;This &lt;code&gt;libbpf&lt;/code&gt; function does the heavy lifting. It takes the &lt;code&gt;struct_ops&lt;/code&gt; map from our BPF skeleton (&lt;code&gt;skel-&amp;gt;maps.testmod_ops&lt;/code&gt;) and asks the kernel to link it to the corresponding &lt;code&gt;struct_ops&lt;/code&gt; definition (which it finds by the struct type name, "bpf_testmod_ops"). If successful, the &lt;code&gt;reg&lt;/code&gt; callback in our module is executed, and the function pointers seen by the module now point to our BPF programs. The function returns a &lt;code&gt;bpf_link&lt;/code&gt;, which represents the active attachment, or &lt;code&gt;NULL&lt;/code&gt; on failure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Triggering&lt;/strong&gt;: The &lt;code&gt;trigger_struct_ops&lt;/code&gt; function simply opens the &lt;code&gt;/proc/bpf_testmod_trigger&lt;/code&gt; file and writes a message to it. This action invokes the &lt;code&gt;trigger_write&lt;/code&gt; handler in our kernel module, which in turn calls the BPF-implemented operations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cleanup&lt;/strong&gt;: When the user presses Ctrl-C, the &lt;code&gt;exiting&lt;/code&gt; flag is set, the loop terminates, and &lt;code&gt;bpf_link__destroy(link)&lt;/code&gt; is called. This is the counterpart to the attach step. It detaches the BPF programs, causing the kernel to call the &lt;code&gt;unreg&lt;/code&gt; callback in our module. This cleans up the link and decrements the module's reference count, allowing it to be unloaded cleanly. If this step is skipped (e.g., by killing the process with &lt;code&gt;kill -9&lt;/code&gt;), the module will remain "in use" until the kernel's garbage collection cleans up the link, which can take time.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
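
&lt;p&gt;The triggering step described above can be sketched as a small helper. This is a minimal illustration rather than the tutorial's actual &lt;code&gt;trigger_struct_ops&lt;/code&gt; code: the name &lt;code&gt;write_trigger&lt;/code&gt; and the path parameter are assumptions made so the helper can be tried against any writable file.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;fcntl.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

/* Hypothetical helper: write msg to an existing file at path.
 * Returns 0 on success, -1 if the open or the write fails. */
static int write_trigger(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    int fd = open(path, O_WRONLY);

    if (fd &amp;lt; 0) {
        perror("open");
        return -1;
    }

    ssize_t n = write(fd, msg, len);
    close(fd);

    /* Treat a short write as a failure as well. */
    if (n &amp;lt; 0 || (size_t)n != len) {
        fprintf(stderr, "short or failed write to %s\n", path);
        return -1;
    }
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the module loaded, a call such as &lt;code&gt;write_trigger("/proc/bpf_testmod_trigger", "Hello from userspace!")&lt;/code&gt; would reach the module's &lt;code&gt;trigger_write&lt;/code&gt; handler.&lt;/p&gt;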

&lt;h2&gt;
  
  
  Compilation and Execution
&lt;/h2&gt;

&lt;p&gt;Now that we have all three components—the kernel module, the BPF program, and the userspace loader—let's compile and run the example to see &lt;code&gt;struct_ops&lt;/code&gt; in action.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Build the Kernel Module
&lt;/h3&gt;

&lt;p&gt;First, navigate to the &lt;code&gt;module&lt;/code&gt; directory and compile the kernel module. This requires having the kernel headers installed for your current kernel version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;module
make
&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will produce a &lt;code&gt;hello.ko&lt;/code&gt; file, which is our compiled kernel module.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Load the Kernel Module
&lt;/h3&gt;

&lt;p&gt;Load the module into the kernel using &lt;code&gt;insmod&lt;/code&gt;. This will register our &lt;code&gt;bpf_testmod_ops&lt;/code&gt; struct_ops type and create the &lt;code&gt;/proc/bpf_testmod_trigger&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;insmod module/hello.ko
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can verify that the module loaded successfully by checking the kernel log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dmesg | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a message like: &lt;code&gt;bpf_testmod loaded with struct_ops support&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Build and Run the eBPF Application
&lt;/h3&gt;

&lt;p&gt;Next, compile and run the userspace loader, which will also compile the BPF program.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./struct_ops
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upon running, the userspace application will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Load the BPF programs.&lt;/li&gt;
&lt;li&gt; Attach the BPF implementation to the &lt;code&gt;bpf_testmod_ops&lt;/code&gt; struct_ops.&lt;/li&gt;
&lt;li&gt; Write to &lt;code&gt;/proc/bpf_testmod_trigger&lt;/code&gt; to invoke the BPF functions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You should see output in your terminal like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Successfully loaded and attached BPF struct_ops!
Triggering struct_ops callbacks...
Triggered struct_ops successfully! Check dmesg for output.

Press Ctrl-C to exit...
Triggered struct_ops again...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Check the Kernel Log for BPF Output
&lt;/h3&gt;

&lt;p&gt;While the userspace program is running, open another terminal and watch the kernel log to see the output from our BPF programs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;dmesg &lt;span class="nt"&gt;-w&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



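&lt;p&gt;Every time the proc file is written to, the module logs each callback and its result, and the BPF programs print their own messages via &lt;code&gt;bpf_printk&lt;/code&gt;. (&lt;code&gt;bpf_printk&lt;/code&gt; writes to the kernel trace buffer, so if the BPF lines are missing from &lt;code&gt;dmesg&lt;/code&gt;, check &lt;code&gt;/sys/kernel/debug/tracing/trace_pipe&lt;/code&gt;.) The combined output looks like this:&lt;br&gt;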
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ ... ] bpf_testmod_ops registered
[ ... ] Calling struct_ops callbacks:
[ ... ] BPF test_1 called!
[ ... ] test_1() returned: 42
[ ... ] BPF test_2 called: 10 + 20 = 30
[ ... ] test_2(10, 20) returned: 30
[ ... ] BPF test_3 called with buffer length 21
[ ... ] Buffer content: 'Hell'
[ ... ] Full buffer: Hello from userspace!
[ ... ] test_3() called with buffer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This output confirms that the calls from the kernel module are being correctly dispatched to our BPF programs.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Clean Up
&lt;/h3&gt;

&lt;p&gt;When you are finished, press &lt;code&gt;Ctrl-C&lt;/code&gt; in the terminal running &lt;code&gt;./struct_ops&lt;/code&gt;. The program will gracefully detach the BPF link. Then, you can unload the kernel module.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;rmmod hello
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, clean up the build artifacts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make clean
&lt;span class="nb"&gt;cd &lt;/span&gt;module
make clean
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note on Unloading the Module&lt;/strong&gt;: Gracefully stopping the userspace program is important. It ensures &lt;code&gt;bpf_link__destroy()&lt;/code&gt; is called, which allows the kernel module's reference count to be decremented. If the userspace process is killed abruptly (e.g., with &lt;code&gt;kill -9&lt;/code&gt;), the kernel module may remain "in use," and &lt;code&gt;rmmod&lt;/code&gt; will fail until the BPF link is garbage collected by the kernel, which can take some time.&lt;/p&gt;
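
&lt;p&gt;The graceful-exit pattern this note relies on can be sketched as follows. This is an illustration under assumptions, not the tutorial's exact loader code: the &lt;code&gt;exiting&lt;/code&gt; flag matches the one described earlier, while &lt;code&gt;on_signal&lt;/code&gt;, &lt;code&gt;install_exit_handlers&lt;/code&gt;, and the extra &lt;code&gt;SIGTERM&lt;/code&gt; handling are hypothetical additions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;signal.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;

/* Set from the signal handler, polled by the main loop. */
static volatile sig_atomic_t exiting;

static void on_signal(int signo)
{
    (void)signo;
    exiting = 1;
}

/* Hypothetical helper: route SIGINT and SIGTERM to on_signal.
 * Returns 0 on success, -1 on failure. */
static int install_exit_handlers(void)
{
    struct sigaction sa = { .sa_handler = on_signal };

    sigemptyset(&amp;amp;sa.sa_mask);
    if (sigaction(SIGINT, &amp;amp;sa, NULL) &amp;lt; 0)
        return -1;
    return sigaction(SIGTERM, &amp;amp;sa, NULL);
}

/* In main(), after a successful attach (sketch only):
 *
 *     while (!exiting)
 *         sleep(1);
 *     bpf_link__destroy(link);
 */
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;SIGKILL&lt;/code&gt; cannot be caught, no handler of this kind runs for &lt;code&gt;kill -9&lt;/code&gt;, which is why that case skips &lt;code&gt;bpf_link__destroy()&lt;/code&gt;.&lt;/p&gt;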

&lt;h2&gt;
  
  
  Troubleshooting Common Issues
&lt;/h2&gt;

&lt;p&gt;When working with advanced features like &lt;code&gt;struct_ops&lt;/code&gt;, which involve kernel modules, BTF, and the BPF verifier, you may encounter some tricky issues. This section covers common problems and their solutions, based on the development process of this example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue 1: Failed to find BTF for &lt;code&gt;struct_ops&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; The userspace loader fails with an error like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;libbpf: failed to find BTF info for struct_ops/bpf_testmod_ops
Failed to attach struct_ops
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root Cause:&lt;/strong&gt; This error means the kernel module (&lt;code&gt;hello.ko&lt;/code&gt;) was compiled without the necessary BTF (BPF Type Format) information. The BPF system relies on BTF to understand the structure and types defined in the module, which is essential for linking the BPF program to the &lt;code&gt;struct_ops&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ensure &lt;code&gt;vmlinux&lt;/code&gt; with BTF is available:&lt;/strong&gt; The kernel build system needs access to the &lt;code&gt;vmlinux&lt;/code&gt; file corresponding to your running kernel to generate BTF for external modules. This file is often not available by default. You may need to copy it from &lt;code&gt;/sys/kernel/btf/vmlinux&lt;/code&gt; or build it from your kernel source. A common location for the build system to look is &lt;code&gt;/lib/modules/$(uname -r)/build/vmlinux&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ensure &lt;code&gt;pahole&lt;/code&gt; is up-to-date:&lt;/strong&gt; BTF generation depends on the &lt;code&gt;pahole&lt;/code&gt; tool (part of the &lt;code&gt;dwarves&lt;/code&gt; package). Older versions of &lt;code&gt;pahole&lt;/code&gt; may lack the features needed for modern BTF generation. Ensure you have &lt;code&gt;pahole&lt;/code&gt; v1.16 or newer. If your distribution's version is too old, you may need to compile it from source.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rebuild the module:&lt;/strong&gt; After ensuring the dependencies are met, rebuild the kernel module. The &lt;code&gt;Makefile&lt;/code&gt; for this example already includes the &lt;code&gt;-g&lt;/code&gt; flag, which instructs the compiler to generate debug information that &lt;code&gt;pahole&lt;/code&gt; uses to create BTF.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can verify that BTF information is present in your module with &lt;code&gt;readelf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;readelf &lt;span class="nt"&gt;-S&lt;/span&gt; module/hello.ko | &lt;span class="nb"&gt;grep&lt;/span&gt; .BTF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a &lt;code&gt;.BTF&lt;/code&gt; section, indicating that BTF data has been embedded. (&lt;code&gt;.BTF.ext&lt;/code&gt; is emitted for BPF object files, so it is normally absent from a kernel module.)&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue 2: Kernel Panic on Module Load
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; The system crashes (kernel panic) immediately after you run &lt;code&gt;sudo insmod module/hello.ko&lt;/code&gt;. The &lt;code&gt;dmesg&lt;/code&gt; log might show a &lt;code&gt;NULL pointer dereference&lt;/code&gt; inside &lt;code&gt;register_bpf_struct_ops&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root Cause:&lt;/strong&gt; The kernel's &lt;code&gt;struct_ops&lt;/code&gt; registration logic expects certain pointers in the &lt;code&gt;bpf_struct_ops&lt;/code&gt; structure to be non-NULL. In older kernel versions or certain configurations, if fields such as &lt;code&gt;.verifier_ops&lt;/code&gt;, &lt;code&gt;.init&lt;/code&gt;, or &lt;code&gt;.init_member&lt;/code&gt; are left unset, the kernel may dereference a NULL pointer, causing a panic. The kernel's code doesn't always perform defensive NULL checks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Always provide all required callbacks in your &lt;code&gt;bpf_struct_ops&lt;/code&gt; definition, even if they are just empty functions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In module/hello.c&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_verifier_ops&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_verifier_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_valid_access&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_is_valid_access&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_func_proto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_get_func_proto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_struct_ops&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_struct_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verifier_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bpf_testmod_verifier_ops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// REQUIRED&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_init&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;// REQUIRED&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init_member&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_init_member&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// REQUIRED&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_reg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unreg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_unreg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* ... */&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By explicitly defining these callbacks, you prevent the kernel from attempting to call a NULL function pointer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue 3: BPF Program Fails to Load with "Invalid Argument"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; The userspace loader fails with an error indicating that a BPF helper function is not allowed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;libbpf: prog 'bpf_testmod_test_1': BPF program load failed: Invalid argument
program of this type cannot use helper bpf_trace_printk#6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root Cause:&lt;/strong&gt; BPF programs of type &lt;code&gt;struct_ops&lt;/code&gt; run in a different kernel context than tracing programs (like kprobes or tracepoints). As a result, they are subject to a different, often more restrictive, set of allowed helper functions. The &lt;code&gt;bpf_trace_printk&lt;/code&gt; helper (which &lt;code&gt;bpf_printk&lt;/code&gt; is a macro for) is a tracing helper and is not allowed by default in &lt;code&gt;struct_ops&lt;/code&gt; programs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; While you can't use &lt;code&gt;bpf_printk&lt;/code&gt; by default, you can explicitly allow it for your &lt;code&gt;struct_ops&lt;/code&gt; type. This is done in the kernel module by implementing the &lt;code&gt;.get_func_proto&lt;/code&gt; callback in your &lt;code&gt;bpf_verifier_ops&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In module/hello.c&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_func_proto&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="nf"&gt;bpf_testmod_ops_get_func_proto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;bpf_func_id&lt;/span&gt; &lt;span class="n"&gt;func_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_prog&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* Use base func proto which includes trace_printk and other basic helpers */&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bpf_base_func_proto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_verifier_ops&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_verifier_ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_valid_access&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_is_valid_access&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_func_proto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_testmod_ops_get_func_proto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Add this line&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;bpf_base_func_proto&lt;/code&gt; function provides access to a set of common, basic helpers, including &lt;code&gt;bpf_trace_printk&lt;/code&gt;. By adding this to our verifier operations, we tell the BPF verifier that programs attached to &lt;code&gt;bpf_testmod_ops&lt;/code&gt; are permitted to use these helpers. This makes debugging with &lt;code&gt;bpf_printk&lt;/code&gt; possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we explored the powerful capabilities of BPF &lt;code&gt;struct_ops&lt;/code&gt; by moving beyond common examples. We demonstrated a robust pattern for extending the kernel: creating a minimal kernel module to define a new, BPF-programmable subsystem interface, and then providing the full, complex implementation in a safe, updatable BPF program. This approach combines the extensibility of kernel modules with the safety and flexibility of eBPF.&lt;/p&gt;

&lt;p&gt;We saw how the kernel module registers a &lt;code&gt;struct_ops&lt;/code&gt; type, how the BPF program implements the required functions, and how a userspace loader attaches this implementation and triggers its execution. This architecture opens the door to implementing a wide range of kernel-level features in BPF, from custom network protocols and security policies to new filesystem behaviors, all while maintaining system stability and avoiding the need to recompile the kernel.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you'd like to dive deeper into eBPF, check out our tutorial repository at &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial&lt;/a&gt; or visit our website at &lt;a href="https://eunomia.dev/tutorials/" rel="noopener noreferrer"&gt;https://eunomia.dev/tutorials/&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kernel Source for &lt;code&gt;struct_ops&lt;/code&gt;&lt;/strong&gt;: The implementation can be found in &lt;code&gt;kernel/bpf/bpf_struct_ops.c&lt;/code&gt; in the Linux source tree.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kernel Test Module for &lt;code&gt;struct_ops&lt;/code&gt;&lt;/strong&gt;: The official kernel self-test module provides a reference implementation: &lt;code&gt;tools/testing/selftests/bpf/test_kmods/bpf_testmod.c&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BPF Documentation&lt;/strong&gt;: The official BPF documentation in the kernel source: &lt;a href="https://www.kernel.org/doc/html/latest/bpf/" rel="noopener noreferrer"&gt;https://www.kernel.org/doc/html/latest/bpf/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ebpf</category>
      <category>structops</category>
      <category>kernel</category>
    </item>
    <item>
      <title>eBPF Tutorial: BPF Workqueues for Asynchronous Sleepable Tasks</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Tue, 20 Jan 2026 07:20:47 +0000</pubDate>
      <link>https://dev.to/yunwei37/ebpf-tutorial-bpf-workqueues-for-asynchronous-sleepable-tasks-58nb</link>
      <guid>https://dev.to/yunwei37/ebpf-tutorial-bpf-workqueues-for-asynchronous-sleepable-tasks-58nb</guid>
      <description>&lt;p&gt;Ever needed your eBPF program to sleep, allocate memory, or wait for device I/O? Traditional eBPF programs run in restricted contexts where blocking operations crash the system. But what if your HID device needs timing delays between injected key events, or your cleanup routine needs to sleep while freeing resources?&lt;/p&gt;

&lt;p&gt;This is what &lt;strong&gt;BPF Workqueues&lt;/strong&gt; enable. Created by Benjamin Tissoires at Red Hat in 2024 for HID-BPF device handling, workqueues let you schedule asynchronous work that runs in process context where sleeping and blocking operations are allowed. In this tutorial, we'll explore why workqueues were created, how they differ from timers, and build a complete example demonstrating async callback execution.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The complete source code: &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_wq" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_wq&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction to BPF Workqueues: Solving the Sleep Problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: When eBPF Can't Sleep
&lt;/h3&gt;

&lt;p&gt;Before BPF workqueues existed, developers had &lt;code&gt;bpf_timer&lt;/code&gt; for deferred execution. Timers work great for scheduling callbacks after a delay, perfect for updating counters or triggering periodic events. But there's a fundamental limitation that made timers unusable for certain critical use cases: &lt;strong&gt;bpf_timer runs in softirq (software interrupt) context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Softirq context has strict rules enforced by the kernel. You cannot sleep or wait for I/O - any attempt to do so will cause kernel panics or deadlocks. You cannot allocate memory using &lt;code&gt;kzalloc()&lt;/code&gt; with the &lt;code&gt;GFP_KERNEL&lt;/code&gt; flag because the allocation might need to sleep while waiting for pages. You cannot communicate with hardware devices that require waiting for responses. Essentially, you cannot perform any blocking operations that might cause the CPU to wait.&lt;/p&gt;

&lt;p&gt;This limitation became a real problem for Benjamin Tissoires at Red Hat when he was developing HID-BPF in 2023. HID devices (keyboards, mice, tablets, game controllers) frequently need operations that timers simply can't handle. Imagine implementing keyboard macro functionality where pressing F1 types "hello" - you need 10ms delays between each keystroke for the system to properly process events. Or consider a device with buggy firmware that needs re-initialization after system wake - you must send commands and wait for hardware responses. Timer callbacks in softirq context can't do any of this.&lt;/p&gt;

&lt;p&gt;As Benjamin Tissoires explained in his kernel patches: "I need something similar to bpf_timers, but not in soft IRQ context... the bpf_timer functionality would prevent me to kzalloc and wait for the device."&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Process Context Execution
&lt;/h3&gt;

&lt;p&gt;In early 2024, Benjamin proposed and developed &lt;strong&gt;bpf_wq&lt;/strong&gt; - essentially "bpf_timer but in process context instead of softirq." The kernel community merged it in April 2024, and it is available in Linux v6.10 and later. The key insight is simple but powerful: by running callbacks in process context (through the kernel's workqueue infrastructure), BPF programs gain access to the full range of kernel operations.&lt;/p&gt;

&lt;p&gt;Here's what changes with process context:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;bpf_timer (softirq)&lt;/th&gt;
&lt;th&gt;bpf_wq (process)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Can sleep?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No - will crash&lt;/td&gt;
&lt;td&gt;✅ Yes - safe to sleep&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory allocation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Limited flags only&lt;/td&gt;
&lt;td&gt;✅ Full &lt;code&gt;kzalloc()&lt;/code&gt; support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Device I/O&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Cannot wait&lt;/td&gt;
&lt;td&gt;✅ Can wait for responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Blocking operations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Prohibited&lt;/td&gt;
&lt;td&gt;✅ Fully supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very low (microseconds)&lt;/td&gt;
&lt;td&gt;Higher (milliseconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Time-critical fast path&lt;/td&gt;
&lt;td&gt;Sleepable slow path&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Workqueues enable the classic "fast path + slow path" pattern. Your eBPF program handles performance-critical operations immediately in the fast path, then schedules expensive cleanup or I/O operations to run asynchronously in the slow path. The fast path stays responsive while the slow path gets the capabilities it needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-World Applications
&lt;/h3&gt;

&lt;p&gt;The applications span multiple domains. &lt;strong&gt;HID device handling&lt;/strong&gt; was the original motivation - injecting keyboard macros with timing delays, fixing broken device firmware dynamically without kernel drivers, re-initializing devices after wake from sleep, transforming input events on the fly. All these require sleepable operations that only workqueues can provide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network packet processing&lt;/strong&gt; benefits from async cleanup patterns. Your XDP program enforces rate limits and drops packets in the fast path (non-blocking), while a workqueue cleans up stale tracking entries in the background. This prevents memory leaks without impacting packet processing performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security monitoring&lt;/strong&gt; can apply fast rules immediately, then use workqueues to query reputation databases or external threat intelligence services. The fast path makes instant decisions while the slow path updates policies based on complex analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource cleanup&lt;/strong&gt; defers expensive operations. Instead of blocking the main code path while freeing memory, closing connections, or compacting data structures, you schedule a workqueue to handle cleanup in the background.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation: Simple Workqueue Test
&lt;/h2&gt;

&lt;p&gt;Let's build a complete example that demonstrates the workqueue lifecycle. We'll create a program that triggers on the &lt;code&gt;unlink&lt;/code&gt; syscall, schedules async work, and verifies that both the main path and workqueue callback execute correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete BPF Program: wq_simple.bpf.c
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: GPL-2.0&lt;/span&gt;
&lt;span class="cm"&gt;/* Simple BPF workqueue example */&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;vmlinux.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_helpers.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;"bpf_experimental.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;LICENSE&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"license"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"GPL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/* Element with embedded workqueue */&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;elem&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_wq&lt;/span&gt; &lt;span class="n"&gt;work&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="cm"&gt;/* Array to store our element */&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BPF_MAP_TYPE_ARRAY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__uint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_entries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="nf"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".maps"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="cm"&gt;/* Result variables */&lt;/span&gt;
&lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;wq_executed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;main_executed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/* Workqueue callback - runs asynchronously in workqueue context */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;wq_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;elem&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="cm"&gt;/* This runs later in workqueue context */&lt;/span&gt;
    &lt;span class="n"&gt;wq_executed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="cm"&gt;/* Modify the value asynchronously */&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* Main program - schedules work */&lt;/span&gt;
&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fentry/do_unlinkat"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;test_workqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;elem&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_wq&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;main_executed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Initialize element in map */&lt;/span&gt;
    &lt;span class="n"&gt;bpf_map_update_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Get element from map */&lt;/span&gt;
    &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_map_lookup_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Initialize workqueue */&lt;/span&gt;
    &lt;span class="n"&gt;wq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;work&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_wq_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Set callback function */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_wq_set_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wq_callback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Schedule work to run asynchronously */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_wq_start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the BPF Code
&lt;/h3&gt;

&lt;p&gt;The program demonstrates the complete workqueue workflow from initialization through async execution. We start by defining a structure that embeds a workqueue. The &lt;code&gt;struct elem&lt;/code&gt; contains both application data (&lt;code&gt;value&lt;/code&gt;) and the workqueue handle (&lt;code&gt;struct bpf_wq work&lt;/code&gt;). This embedding pattern is critical - the workqueue infrastructure needs to know which map contains the workqueue structure, and embedding it in the map value establishes this relationship.&lt;/p&gt;

&lt;p&gt;Our map is an array with a single entry, which keeps the example small. In production code, you'd typically use hash maps to track multiple entities, each with its own embedded workqueue. The global variables &lt;code&gt;wq_executed&lt;/code&gt; and &lt;code&gt;main_executed&lt;/code&gt; serve as test instrumentation, letting userspace verify that both code paths ran.&lt;/p&gt;
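
&lt;p&gt;As a minimal sketch of that production layout, the array could be replaced by a hash map whose values each embed their own workqueue. The map name &lt;code&gt;entities&lt;/code&gt;, the &lt;code&gt;__u32&lt;/code&gt; key, and the capacity of 1024 are illustrative assumptions rather than part of the tutorial code, and &lt;code&gt;struct elem&lt;/code&gt; is assumed to mirror the structure described above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include "vmlinux.h"
#include &amp;lt;bpf/bpf_helpers.h&amp;gt;

/* Assumed to match the tutorial's element: data plus an embedded workqueue */
struct elem {
    int value;
    struct bpf_wq work; /* must live inside the map value */
};

/* Hypothetical map: one entry, and therefore one workqueue, per tracked entity */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);   /* assumed capacity */
    __type(key, __u32);          /* assumed key, for example a PID */
    __type(value, struct elem);
} entities SEC(".maps");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The init, set-callback, and start sequence would stay the same, except that each call would pass &lt;code&gt;&amp;amp;entities&lt;/code&gt; and the entry looked up for the entity being tracked.&lt;/p&gt;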

&lt;p&gt;The workqueue callback shows the signature that all workqueue callbacks must follow: &lt;code&gt;int callback(void *map, int *key, void *value)&lt;/code&gt;. The kernel invokes this function asynchronously in process context, passing the map containing the workqueue, the key of the entry, and a pointer to the value. This signature gives the callback full context about which element triggered it and access to the element's data. Our callback sets &lt;code&gt;wq_executed = 1&lt;/code&gt; to prove it ran, and modifies &lt;code&gt;val-&amp;gt;value = 42&lt;/code&gt; to demonstrate that async modifications persist in the map.&lt;/p&gt;

&lt;p&gt;The main program, attached to &lt;code&gt;fentry/do_unlinkat&lt;/code&gt;, runs whenever the kernel enters &lt;code&gt;do_unlinkat&lt;/code&gt;, which happens on the &lt;code&gt;unlink&lt;/code&gt; and &lt;code&gt;unlinkat&lt;/code&gt; syscalls. This gives us an easy way to activate the program: userspace just needs to delete a file. We set &lt;code&gt;main_executed = 1&lt;/code&gt; immediately to mark the synchronous path. Then we initialize an element and store it in the map using &lt;code&gt;bpf_map_update_elem()&lt;/code&gt;. This is necessary because the workqueue must be embedded in a map entry.&lt;/p&gt;

&lt;p&gt;The workqueue initialization follows a three-step sequence. First, &lt;code&gt;bpf_wq_init(wq, &amp;amp;array, 0)&lt;/code&gt; initializes the workqueue handle, passing the map that contains it. The verifier uses this information to validate that the workqueue and its container are properly related. Second, &lt;code&gt;bpf_wq_set_callback(wq, wq_callback, 0)&lt;/code&gt; registers our callback function. The verifier checks that the callback has the correct signature. Third, &lt;code&gt;bpf_wq_start(wq, 0)&lt;/code&gt; schedules the workqueue to execute asynchronously. This call returns immediately - the main program continues executing while the kernel queues the work for later execution in process context.&lt;/p&gt;

&lt;p&gt;The flags parameter in all three functions is reserved for future use and should be 0 in current kernels. The pattern allows future extensions without breaking API compatibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete User-Space Program: wq_simple.c
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: GPL-2.0&lt;/span&gt;
&lt;span class="cm"&gt;/* Userspace test for BPF workqueue */&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;unistd.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;fcntl.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;sys/resource.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/libbpf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;"wq_simple.skel.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;libbpf_print_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;libbpf_print_level&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;va_list&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;vfprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;wq_simple_bpf&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;libbpf_set_print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;libbpf_print_fn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Open and load BPF application */&lt;/span&gt;
    &lt;span class="n"&gt;skel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wq_simple_bpf__open_and_load&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to open and load BPF skeleton&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Attach the fentry program */&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wq_simple_bpf__attach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to attach BPF skeleton&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"BPF workqueue program attached. Triggering unlink syscall...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Create a temporary file to trigger do_unlinkat */&lt;/span&gt;
    &lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/tmp/wq_test_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;O_CREAT&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;O_WRONLY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mo"&gt;0644&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;unlink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/tmp/wq_test_file"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Give workqueue time to execute */&lt;/span&gt;
    &lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Check results */&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Results:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  main_executed = %u (expected: 1)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bss&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;main_executed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  wq_executed = %u (expected: 1)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bss&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;wq_executed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bss&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;main_executed&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bss&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;wq_executed&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;✓ Test PASSED!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;✗ Test FAILED!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nl"&gt;cleanup:&lt;/span&gt;
    &lt;span class="n"&gt;wq_simple_bpf__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the User-Space Code
&lt;/h3&gt;

&lt;p&gt;The userspace program orchestrates the test and verifies results. We use the skeleton API from libbpf, which embeds the already compiled BPF bytecode in a C structure, making loading trivial. The &lt;code&gt;wq_simple_bpf__open_and_load()&lt;/code&gt; call opens the embedded BPF object, creates all maps, and loads the programs into the kernel (where the verifier checks them) in one operation. No compilation happens at run time; the bytecode is produced when the project is built.&lt;/p&gt;

&lt;p&gt;After loading, &lt;code&gt;wq_simple_bpf__attach()&lt;/code&gt; attaches the fentry program to &lt;code&gt;do_unlinkat&lt;/code&gt;. From this point, any unlink syscall will trigger our BPF program. We deliberately trigger this by creating and immediately deleting a temporary file. The &lt;code&gt;open()&lt;/code&gt; creates &lt;code&gt;/tmp/wq_test_file&lt;/code&gt;, we close the fd, then &lt;code&gt;unlink()&lt;/code&gt; deletes it. This deletion enters the kernel's &lt;code&gt;do_unlinkat&lt;/code&gt; function, triggering our fentry probe.&lt;/p&gt;

&lt;p&gt;Here's the critical timing aspect: workqueue execution is asynchronous. Our main BPF program schedules the work and returns immediately. The kernel queues the callback for later execution by a kernel worker thread. This is why we &lt;code&gt;sleep(1)&lt;/code&gt; - giving the workqueue time to execute before we check results. In production code, you'd use more sophisticated synchronization, but for a simple test, sleep is sufficient.&lt;/p&gt;
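
&lt;p&gt;One simple step up from a fixed sleep is to poll the flag with a timeout. The helper below is a hedged sketch: the name &lt;code&gt;wait_for_flag&lt;/code&gt;, the 10 ms polling interval, and the timeout handling are assumptions for illustration, not part of the tutorial code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdbool.h&amp;gt;
#include &amp;lt;time.h&amp;gt;

/* Hypothetical helper: poll a flag every 10 ms until it becomes
 * non-zero or roughly timeout_ms milliseconds have passed. */
static bool wait_for_flag(const volatile unsigned int *flag, int timeout_ms)
{
    const struct timespec step = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

    for (int waited = 0; waited &amp;lt; timeout_ms; waited += 10) {
        if (*flag)
            return true;
        nanosleep(&amp;amp;step, NULL);
    }
    /* One last check after the final sleep */
    return *flag != 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions, the &lt;code&gt;sleep(1)&lt;/code&gt; call could become &lt;code&gt;wait_for_flag(&amp;amp;skel-&amp;gt;bss-&amp;gt;wq_executed, 1000)&lt;/code&gt;, which returns as soon as the callback has run instead of always waiting a full second.&lt;/p&gt;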

&lt;p&gt;After the sleep, we read global variables from the BPF program's &lt;code&gt;.bss&lt;/code&gt; section. The skeleton provides convenient access through &lt;code&gt;skel-&amp;gt;bss-&amp;gt;main_executed&lt;/code&gt; and &lt;code&gt;skel-&amp;gt;bss-&amp;gt;wq_executed&lt;/code&gt;. If both are 1, we know the synchronous path (fentry) and async path (workqueue callback) both executed successfully.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Workqueue APIs
&lt;/h2&gt;

&lt;p&gt;The workqueue API consists of three essential functions that manage the lifecycle. &lt;strong&gt;&lt;code&gt;bpf_wq_init(wq, map, flags)&lt;/code&gt;&lt;/strong&gt; initializes a workqueue handle, establishing the relationship between the workqueue and its containing map. The map parameter is crucial - it tells the verifier which map contains the value with the embedded &lt;code&gt;bpf_wq&lt;/code&gt; structure. The verifier uses this to ensure memory safety across async execution. Flags should be 0 in current kernels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;bpf_wq_set_callback(wq, callback_fn, flags)&lt;/code&gt;&lt;/strong&gt; registers the function to execute asynchronously. The callback must have the signature &lt;code&gt;int callback(void *map, int *key, void *value)&lt;/code&gt;. The verifier checks this signature at load time and will reject programs with mismatched signatures. This type safety prevents common async programming errors. Flags should be 0.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;bpf_wq_start(wq, flags)&lt;/code&gt;&lt;/strong&gt; schedules the workqueue to run. This returns immediately - your BPF program continues executing synchronously. The kernel queues the callback for execution by a worker thread in process context at some point in the future. The callback might run microseconds or milliseconds later depending on system load. Flags should be 0.&lt;/p&gt;

&lt;p&gt;The callback signature deserves attention. Like &lt;code&gt;bpf_timer&lt;/code&gt; callbacks, workqueue callbacks receive the map, a pointer to the key, and a pointer to the value of the entry that owns the workqueue. The &lt;code&gt;bpf_wq_set_callback&lt;/code&gt; kfunc declares the callback as &lt;code&gt;int callback(void *map, int *key, void *value)&lt;/code&gt;, so define yours with exactly this prototype; otherwise the program is rejected. The callback runs in process context, so it can perform sleepable operations that are not allowed in softirq context.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use Workqueues vs Timers
&lt;/h2&gt;

&lt;p&gt;Choose &lt;strong&gt;bpf_timer&lt;/strong&gt; when you need microsecond-precision timing, operations are fast and non-blocking, you're updating counters or simple state, or implementing periodic fast-path operations like statistics collection or packet pacing. Timers excel at time-critical tasks that must execute with minimal latency.&lt;/p&gt;

&lt;p&gt;Choose &lt;strong&gt;bpf_wq&lt;/strong&gt; when you need to sleep or wait, call kernel functions that may sleep or allocate memory (such as kfuncs that use &lt;code&gt;kzalloc()&lt;/code&gt; internally), perform device or network I/O, or defer cleanup operations that can happen later. Workqueues are perfect for the "fast path + slow path" pattern where critical operations happen immediately and expensive processing runs asynchronously. Examples include HID device I/O (keyboard macro injection with delays), async map cleanup (preventing memory leaks), security policy updates (querying external databases), and background processing (compression, encryption, aggregation).&lt;/p&gt;

&lt;p&gt;The fundamental trade-off is latency vs capability. Timers have lower latency but restricted capabilities. Workqueues have higher latency but full process context capabilities including sleeping and blocking I/O.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compilation and Execution
&lt;/h2&gt;

&lt;p&gt;Navigate to the bpf_wq directory and build:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;bpf-developer-tutorial/src/features/bpf_wq
make
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Makefile compiles the BPF program with the experimental workqueue features enabled and generates a skeleton header.&lt;/p&gt;

&lt;p&gt;Run the simple workqueue test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./wq_simple
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BPF workqueue program attached. Triggering unlink syscall...

Results:
  main_executed = 1 (expected: 1)
  wq_executed = 1 (expected: 1)

✓ Test PASSED!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The test verifies that both the synchronous fentry probe and the asynchronous workqueue callback executed successfully. If the workqueue callback didn't run, &lt;code&gt;wq_executed&lt;/code&gt; would be 0 and the test would fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Historical Timeline and Context
&lt;/h2&gt;

&lt;p&gt;Understanding how workqueues came to exist helps explain their design. In 2022, Benjamin Tissoires started work on HID-BPF, aiming to let users fix broken HID devices without kernel drivers. By 2023, he realized &lt;code&gt;bpf_timer&lt;/code&gt; limitations made HID device I/O impossible: you can't wait for hardware responses in softirq context. In early 2024, he proposed &lt;code&gt;bpf_wq&lt;/code&gt; as "bpf_timer in process context," collaborating with the BPF community on the design. The patches were merged in April 2024 and shipped in Linux v6.10. Since then, they've been used for HID quirks, rate limiting, async cleanup, and other sleepable operations.&lt;/p&gt;

&lt;p&gt;The key quote from Benjamin's patches captures the motivation perfectly: "I need something similar to bpf_timers, but not in soft IRQ context... the bpf_timer functionality would prevent me to kzalloc and wait for the device."&lt;/p&gt;

&lt;p&gt;This real-world need drove the design. Workqueues exist because device handling and resource management require sleepable, blocking operations that timers fundamentally cannot provide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary and Next Steps
&lt;/h2&gt;

&lt;p&gt;BPF workqueues solve a fundamental limitation of eBPF by enabling sleepable, blocking operations in process context. Created specifically to support HID device handling where timing delays and device I/O are essential, workqueues unlock powerful new capabilities for eBPF programs. They enable the "fast path + slow path" pattern where performance-critical operations execute immediately while expensive cleanup and I/O happen asynchronously without blocking.&lt;/p&gt;

&lt;p&gt;Our simple example demonstrates the core workqueue lifecycle: embedding a &lt;code&gt;bpf_wq&lt;/code&gt; in a map value, initializing and configuring it, scheduling async execution, and verifying from userspace that the callback actually ran. This same pattern scales to production use cases like network rate limiting with async cleanup, security monitoring with external service queries, and device handling with I/O operations.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you'd like to dive deeper into eBPF, check out our tutorial repository at &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial&lt;/a&gt; or visit our website at &lt;a href="https://eunomia.dev/tutorials/" rel="noopener noreferrer"&gt;https://eunomia.dev/tutorials/&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Original Kernel Patches:&lt;/strong&gt; Benjamin Tissoires' HID-BPF and bpf_wq patches (2023-2024)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux Kernel Source:&lt;/strong&gt; &lt;code&gt;kernel/bpf/helpers.c&lt;/code&gt; - workqueue implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tutorial Repository:&lt;/strong&gt; &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_wq" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_wq&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example adapted from Linux kernel BPF selftests with educational enhancements. Requires Linux kernel 6.10+ for workqueue support. Complete source code available in the tutorial repository.&lt;/p&gt;

</description>
      <category>ebpf</category>
      <category>workqueue</category>
      <category>kernel</category>
    </item>
    <item>
      <title>eBPF Tutorial: BPF Iterators for Kernel Data Export</title>
      <dc:creator>云微</dc:creator>
      <pubDate>Tue, 13 Jan 2026 07:18:49 +0000</pubDate>
      <link>https://dev.to/yunwei37/ebpf-tutorial-bpf-iterators-for-kernel-data-export-137f</link>
      <guid>https://dev.to/yunwei37/ebpf-tutorial-bpf-iterators-for-kernel-data-export-137f</guid>
      <description>&lt;p&gt;Ever tried monitoring hundreds of processes and ended up parsing thousands of &lt;code&gt;/proc&lt;/code&gt; files just to find the few you care about? Or needed custom formatted kernel data but didn't want to modify the kernel itself? Traditional &lt;code&gt;/proc&lt;/code&gt; filesystem access is slow, inflexible, and forces you to process tons of data in userspace even when you only need a small filtered subset.&lt;/p&gt;

&lt;p&gt;This is what &lt;strong&gt;BPF Iterators&lt;/strong&gt; solve. Introduced in Linux kernel 5.8, iterators let you traverse kernel data structures directly from BPF programs, apply filters in-kernel, and output exactly the data you need in any format you want. In this tutorial, we'll build a dual-mode iterator that shows kernel stack traces and open file descriptors for processes, with in-kernel filtering by process name - dramatically faster than parsing &lt;code&gt;/proc&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The complete source code: &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_iters" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_iters&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction to BPF Iterators: The /proc Replacement
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: /proc is Slow and Rigid
&lt;/h3&gt;

&lt;p&gt;Traditional Linux monitoring revolves around the &lt;code&gt;/proc&lt;/code&gt; filesystem. Need to see what processes are doing? Read &lt;code&gt;/proc/*/stack&lt;/code&gt;. Want open files? Parse &lt;code&gt;/proc/*/fd/*&lt;/code&gt;. This works, but it's painfully inefficient when you're monitoring systems at scale or need specific filtered views of kernel data.&lt;/p&gt;

&lt;p&gt;The performance problem is systemic. Every &lt;code&gt;/proc&lt;/code&gt; access requires syscalls, kernel mode transitions, text formatting, and a data copy to userspace, after which you parse that text back into structures. If you want stack traces for all "bash" processes among 1000 total processes, something still has to open and read a &lt;code&gt;/proc&lt;/code&gt; entry for each of the 1000 processes just to learn its name before reading the stack file of each match. That's thousands of syscalls and text parsing operations, most of them spent on processes you don't care about, just to find a handful of matches.&lt;/p&gt;

&lt;p&gt;Format inflexibility compounds the problem. The kernel chooses what data to show and how to format it. Want stack traces with custom annotations? Too bad, you get the kernel's fixed format. Need to aggregate data across processes? Parse everything in userspace. The &lt;code&gt;/proc&lt;/code&gt; interface is designed for human consumption, not programmatic filtering and analysis.&lt;/p&gt;

&lt;p&gt;Here's what traditional monitoring looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find stack traces for all bash processes&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;pid &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;pgrep bash&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== PID &lt;/span&gt;&lt;span class="nv"&gt;$pid&lt;/span&gt;&lt;span class="s2"&gt; ==="&lt;/span&gt;
  &lt;span class="nb"&gt;cat&lt;/span&gt; /proc/&lt;span class="nv"&gt;$pid&lt;/span&gt;/stack
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



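&lt;p&gt;This spawns &lt;code&gt;pgrep&lt;/code&gt; as a subprocess, which itself scans every &lt;code&gt;/proc/PID&lt;/code&gt; entry to match names, then starts a &lt;code&gt;cat&lt;/code&gt; process per matching PID to open and read its stack file (reading &lt;code&gt;/proc/PID/stack&lt;/code&gt; normally requires root privileges). All filtering and text handling happens in userspace. Simple to write, but costly at scale.&lt;/p&gt;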

&lt;h3&gt;
  
  
  The Solution: Programmable In-Kernel Iteration
&lt;/h3&gt;

&lt;p&gt;BPF iterators flip the model. Instead of pulling all data to userspace for processing, you push your processing logic into the kernel where the data lives. An iterator is a BPF program attached to a kernel data structure traversal that gets called for each element. The kernel walks tasks, files, or sockets, invokes your BPF program with each element's context, and your code decides what to output and how to format it.&lt;/p&gt;

&lt;p&gt;The architecture is elegant. You write a BPF program marked &lt;code&gt;SEC("iter/task")&lt;/code&gt; or &lt;code&gt;SEC("iter/task_file")&lt;/code&gt; that receives each task or file during iteration. Inside this program, you have direct access to kernel struct fields, can filter based on any criteria using normal C logic, and use &lt;code&gt;BPF_SEQ_PRINTF()&lt;/code&gt; to format output exactly as needed. The kernel handles the iteration mechanics while your code focuses purely on filtering and formatting.&lt;/p&gt;

&lt;p&gt;When userspace reads from the iterator file descriptor, the work happens entirely in the kernel. The kernel walks the task list and calls your BPF program for each task, passing the task_struct pointer. Your program checks whether the task name matches your filter; if not, it returns 0 immediately with no output. If it matches, your program extracts the stack trace and writes it to a seq_file. All of this happens in kernel context before any data crosses to userspace.&lt;/p&gt;

&lt;p&gt;The benefits follow directly from this model. &lt;strong&gt;In-kernel filtering&lt;/strong&gt; means only relevant data crosses the kernel boundary, eliminating wasted work. &lt;strong&gt;Custom formats&lt;/strong&gt; let you output binary, JSON, CSV, whatever your tools need. &lt;strong&gt;A single file descriptor&lt;/strong&gt; replaces thousands of individual &lt;code&gt;/proc&lt;/code&gt; file accesses. &lt;strong&gt;Minimal parsing&lt;/strong&gt; is needed because you already formatted the data the way your tools expect. &lt;strong&gt;Composability&lt;/strong&gt; comes for free: iterator output arrives through a normal file descriptor, so it works with standard Unix tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Iterator Types and Capabilities
&lt;/h3&gt;

&lt;p&gt;The kernel provides iterators for many subsystems. &lt;strong&gt;Task iterators&lt;/strong&gt; (&lt;code&gt;iter/task&lt;/code&gt;) walk all tasks, giving you access to process state, credentials, resource usage, and parent-child relationships. &lt;strong&gt;File iterators&lt;/strong&gt; (&lt;code&gt;iter/task_file&lt;/code&gt;) traverse open file descriptors, showing files, sockets, pipes, and other fd types. &lt;strong&gt;Network iterators&lt;/strong&gt; (&lt;code&gt;iter/tcp&lt;/code&gt;, &lt;code&gt;iter/udp&lt;/code&gt;) walk active network connections with full socket state. &lt;strong&gt;BPF object iterators&lt;/strong&gt; (&lt;code&gt;iter/bpf_map&lt;/code&gt;, &lt;code&gt;iter/bpf_prog&lt;/code&gt;) enumerate loaded BPF maps and programs for introspection.&lt;/p&gt;

&lt;p&gt;Our tutorial focuses on task and task_file iterators because they solve common monitoring needs and demonstrate core concepts applicable to all iterator types.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation: Dual-Mode Task Iterator
&lt;/h2&gt;

&lt;p&gt;Let's build a complete example demonstrating two iterator types in one tool. We'll create a program that can show either kernel stack traces or open file descriptors for processes, with optional filtering by process name.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete BPF Program: task_stack.bpf.c
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: GPL-2.0&lt;/span&gt;
&lt;span class="cm"&gt;/* Kernel task stack and file descriptor iterator */&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;vmlinux.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
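#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_helpers.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf_tracing.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;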
&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;_license&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"license"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"GPL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cp"&gt;#define MAX_STACK_TRACE_DEPTH   64
&lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MAX_STACK_TRACE_DEPTH&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
&lt;span class="cp"&gt;#define SIZE_OF_ULONG (sizeof(unsigned long))
&lt;/span&gt;
&lt;span class="cm"&gt;/* Filter: only show stacks for tasks with this name (empty = show all) */&lt;/span&gt;
&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;target_comm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;stacks_shown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;files_shown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/* Task stack iterator */&lt;/span&gt;
&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"iter/task"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;dump_task_stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_iter__task&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;seq_file&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;task_struct&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retlen&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="cm"&gt;/* End of iteration - print summary */&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stacks_shown&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;BPF_SEQ_PRINTF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== Summary: %u task stacks shown ===&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;stacks_shown&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Filter by task name if specified */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_comm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;target_comm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Get kernel stack trace for this task */&lt;/span&gt;
    &lt;span class="n"&gt;retlen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_get_task_stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;MAX_STACK_TRACE_DEPTH&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;SIZE_OF_ULONG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retlen&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;stacks_shown&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cm"&gt;/* Print task info and stack trace */&lt;/span&gt;
    &lt;span class="n"&gt;BPF_SEQ_PRINTF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"=== Task: %s (pid=%u, tgid=%u) ===&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;tgid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;BPF_SEQ_PRINTF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Stack depth: %u frames&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retlen&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;SIZE_OF_ULONG&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;MAX_STACK_TRACE_DEPTH&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retlen&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;SIZE_OF_ULONG&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;BPF_SEQ_PRINTF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"  [%2ld] %pB&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;BPF_SEQ_PRINTF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* Task file descriptor iterator */&lt;/span&gt;
&lt;span class="n"&gt;SEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"iter/task_file"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;dump_task_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_iter__task_file&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;seq_file&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;task_struct&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;files_shown&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;seq_num&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;BPF_SEQ_PRINTF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== Summary: %u file descriptors shown ===&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;files_shown&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Filter by task name if specified */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_comm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;target_comm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;seq_num&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;BPF_SEQ_PRINTF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"%-16s %8s %8s %6s %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="s"&gt;"COMM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"TGID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"PID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"FD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"FILE_OPS"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;files_shown&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;BPF_SEQ_PRINTF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"%-16s %8d %8d %6d 0x%lx&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;comm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;tgid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;f_op&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the BPF Code
&lt;/h3&gt;

&lt;p&gt;The program implements two separate iterators that apply the same filtering logic. The &lt;code&gt;SEC("iter/task")&lt;/code&gt; annotation registers &lt;code&gt;dump_task_stack&lt;/code&gt; as a task iterator, so the kernel calls this function once for each task in the system. The context structure &lt;code&gt;bpf_iter__task&lt;/code&gt; provides two fields: &lt;code&gt;meta&lt;/code&gt;, which holds iteration metadata and the seq_file for output, and &lt;code&gt;task&lt;/code&gt;, which points to the current task_struct. After the last task, the kernel calls the program one final time with &lt;code&gt;task&lt;/code&gt; set to NULL so you can print summaries.&lt;/p&gt;

&lt;p&gt;The task stack iterator shows in-kernel filtering in action. When &lt;code&gt;task&lt;/code&gt; is NULL, we've reached the end of iteration and can print summary statistics showing how many tasks matched our filter. For each task, we first apply filtering by comparing &lt;code&gt;task-&amp;gt;comm&lt;/code&gt; (the process name) against &lt;code&gt;target_comm&lt;/code&gt;. We can't use standard library functions like &lt;code&gt;strcmp()&lt;/code&gt; in BPF, so we manually loop through the 16-byte name buffers comparing byte by byte. The loop stops at the first differing byte and reports a match only when both names reach their terminating NUL at the same position, so the filter is an exact match rather than a prefix match. If the names don't match and filtering is enabled, we immediately return 0 with no output, and the task is skipped entirely in the kernel without crossing to userspace.&lt;/p&gt;
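
&lt;p&gt;The byte-by-byte comparison can be tried outside the kernel. The sketch below is a plain userspace C rendering of the same loop and is not part of the tutorial's source; &lt;code&gt;comm_matches&lt;/code&gt; and &lt;code&gt;COMM_LEN&lt;/code&gt; are names invented here for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;

#define COMM_LEN 16 /* same size as task-&amp;gt;comm in the BPF program */

/* Hypothetical userspace helper mirroring the filter loop:
 * returns 1 only when both names are identical up to and
 * including the terminating NUL, 0 otherwise. */
static int comm_matches(const char *comm, const char *target)
{
    for (size_t i = 0; i &amp;lt; COMM_LEN; i++) {
        if (comm[i] != target[i])
            return 0;   /* first differing byte: not a match */
        if (comm[i] == '\0')
            return 1;   /* both names ended at the same position */
    }
    return 0;           /* no NUL within COMM_LEN bytes */
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;comm_matches("bash", "bash")&lt;/code&gt; returns 1, while &lt;code&gt;comm_matches("bash-helper", "bash")&lt;/code&gt; returns 0 because the fifth byte differs.&lt;/p&gt;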

&lt;p&gt;Once a task passes filtering, we extract its kernel stack trace using &lt;code&gt;bpf_get_task_stack()&lt;/code&gt;. This BPF helper captures up to 64 stack frames into our &lt;code&gt;entries&lt;/code&gt; array and returns the number of bytes written, or a negative error, in which case we skip the task. The array is a global rather than a local variable because 64 eight-byte entries would fill the entire 512-byte BPF stack. We format the output using &lt;code&gt;BPF_SEQ_PRINTF()&lt;/code&gt;, which writes to the kernel's seq_file infrastructure. The special &lt;code&gt;%pB&lt;/code&gt; format specifier symbolizes kernel addresses, turning raw pointers into human-readable function names like &lt;code&gt;schedule+0x42/0x100&lt;/code&gt;. This makes stack traces immediately useful for debugging.&lt;/p&gt;
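
&lt;p&gt;The length arithmetic around &lt;code&gt;retlen&lt;/code&gt; is easy to misread, so here is a small userspace sketch of it. &lt;code&gt;printed_frames&lt;/code&gt; is a hypothetical helper written for this illustration; only the loop bound and the guard expression come from the BPF program. The loop always runs to the compile-time constant &lt;code&gt;MAX_STACK_TRACE_DEPTH&lt;/code&gt;, presumably so the verifier sees a fixed bound, and the guard decides which slots actually hold a frame.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;

#define MAX_STACK_TRACE_DEPTH 64
#define SIZE_OF_ULONG (sizeof(unsigned long))

/* Hypothetical helper: walks the same fixed-bound loop as the BPF
 * program and counts how many slots the guard would print. */
static size_t printed_frames(long retlen)
{
    size_t printed = 0;

    if (retlen &amp;lt; 0)
        return 0;   /* the helper failed, nothing to print */

    for (long i = 0; i &amp;lt; MAX_STACK_TRACE_DEPTH; i++) {
        /* Same guard as the BPF code: slot i is valid only if the
         * helper wrote bytes beyond the start of that slot. */
        if ((size_t)retlen &amp;gt; (size_t)i * SIZE_OF_ULONG)
            printed++;
    }
    return printed;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A return value of &lt;code&gt;3 * sizeof(unsigned long)&lt;/code&gt; bytes yields three printed frames (slots 0 to 2), which agrees with the &lt;code&gt;retlen / SIZE_OF_ULONG&lt;/code&gt; value shown in the "Stack depth" line.&lt;/p&gt;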

&lt;p&gt;The file descriptor iterator demonstrates a different iterator type. &lt;code&gt;SEC("iter/task_file")&lt;/code&gt; tells the kernel to call this function for every open file descriptor across all tasks. The context provides &lt;code&gt;task&lt;/code&gt;, &lt;code&gt;file&lt;/code&gt; (the kernel's struct file pointer), and &lt;code&gt;fd&lt;/code&gt; (the numeric file descriptor). We apply the same task name filtering, then format output as a table. Checking &lt;code&gt;ctx-&amp;gt;meta-&amp;gt;seq_num == 0&lt;/code&gt; prints the column headers on the first visited element. Note that &lt;code&gt;seq_num&lt;/code&gt; counts every element the program is called for, so with a name filter active the header appears only if the very first descriptor belongs to a matching task; testing &lt;code&gt;files_shown == 0&lt;/code&gt; instead would print it before the first matching row.&lt;/p&gt;
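
&lt;p&gt;The header check interacts with name filtering, because as far as we can tell from the kernel's iterator read path, &lt;code&gt;seq_num&lt;/code&gt; advances for every element the program returns 0 for, not only for those that produce output. The userspace simulation below sketches that reasoning under this assumption. It is not kernel code, and &lt;code&gt;struct fd_entry&lt;/code&gt;, &lt;code&gt;headers_with_seq_num&lt;/code&gt;, and &lt;code&gt;headers_with_counter&lt;/code&gt; are names invented for the illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;string.h&amp;gt;

/* Hypothetical stand-in for one element visited by iter/task_file. */
struct fd_entry {
    const char *comm;
    int fd;
};

/* Header check keyed on seq_num: fires only when the first visited
 * element also passes the filter. Returns the number of headers. */
static int headers_with_seq_num(const struct fd_entry *e, int n,
                                const char *filter)
{
    int headers = 0;

    for (int seq_num = 0; seq_num &amp;lt; n; seq_num++) {
        if (strcmp(e[seq_num].comm, filter) != 0)
            continue;       /* filtered out, no output */
        if (seq_num == 0)
            headers++;
    }
    return headers;
}

/* Alternative keyed on the number of rows already shown. */
static int headers_with_counter(const struct fd_entry *e, int n,
                                const char *filter)
{
    int headers = 0, files_shown = 0;

    for (int seq_num = 0; seq_num &amp;lt; n; seq_num++) {
        if (strcmp(e[seq_num].comm, filter) != 0)
            continue;
        if (files_shown == 0)
            headers++;      /* header goes before the first row */
        files_shown++;
    }
    return headers;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a made-up visit order of two &lt;code&gt;systemd&lt;/code&gt; descriptors followed by two &lt;code&gt;bash&lt;/code&gt; descriptors and a filter of &lt;code&gt;bash&lt;/code&gt;, the first function reports 0 headers and the second reports 1.&lt;/p&gt;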

&lt;p&gt;Notice how filtering happens before any expensive operations. We check the task name first, and only if it matches do we extract stack traces or format file information. This minimizes work in the kernel fast path - non-matching tasks are rejected with just a string comparison, no memory allocation, no formatting, no output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete User-Space Program: task_stack.c
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SPDX-License-Identifier: GPL-2.0&lt;/span&gt;
&lt;span class="cm"&gt;/* Userspace program for task stack and file iterator */&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;unistd.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/libbpf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bpf/bpf.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;"task_stack.skel.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;libbpf_print_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;libbpf_print_level&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;va_list&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;vfprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;run_iterator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_program&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_link&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;iter_fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="n"&gt;link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_program__attach_iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to attach %s iterator&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;iter_fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_iter_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bpf_link__fd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iter_fd&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to create %s iterator: %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iter_fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;bpf_link__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iter_fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iter_fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;bpf_link__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;task_stack_bpf&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;show_files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;libbpf_set_print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;libbpf_print_fn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Parse arguments */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;strcmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s"&gt;"--files"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;show_files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
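        &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="cm"&gt;/* keep the program name in argv[0] after the shift */&lt;/span&gt;
        &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;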
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Open BPF application */&lt;/span&gt;
    &lt;span class="n"&gt;skel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task_stack_bpf__open&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to open BPF skeleton&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Configure filter before loading */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;strncpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bss&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;target_comm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;bss&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;target_comm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Filtering for tasks matching: %s&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Usage: %s [--files] [comm]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  --files    Show open file descriptors instead of stacks&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  comm       Filter by process name&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cm"&gt;/* Load BPF program */&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task_stack_bpf__load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Failed to load BPF skeleton&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;show_files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"=== BPF Task File Descriptor Iterator ===&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;run_iterator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"task_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;progs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump_task_file&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"=== BPF Task Stack Iterator ===&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;run_iterator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;progs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump_task_stack&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nl"&gt;cleanup:&lt;/span&gt;
    &lt;span class="n"&gt;task_stack_bpf__destroy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the User-Space Code
&lt;/h3&gt;

&lt;p&gt;The userspace program showcases how simple iterator usage is once you understand the pattern. The &lt;code&gt;run_iterator()&lt;/code&gt; function encapsulates the three-step iterator lifecycle. First, &lt;code&gt;bpf_program__attach_iter()&lt;/code&gt; attaches the BPF program to the iterator infrastructure, registering it to be called during iteration. Second, &lt;code&gt;bpf_iter_create()&lt;/code&gt; creates a file descriptor representing an iterator instance. Third, simple &lt;code&gt;read()&lt;/code&gt; calls consume the iterator output.&lt;/p&gt;

&lt;p&gt;Here's what makes this powerful: when you read from the iterator fd, the kernel transparently starts walking tasks or files. For each element, it calls your BPF program passing the element's context. Your BPF code filters and formats output to a seq_file buffer. The kernel accumulates this output and returns it through the read() call. From userspace's perspective, it's just reading a file - all the iteration, filtering, and formatting complexity is hidden in the kernel.&lt;/p&gt;

&lt;p&gt;The main function handles mode selection and configuration. We parse command-line arguments to determine whether to show stacks or files, and what process name to filter for. Critically, we set &lt;code&gt;skel-&amp;gt;bss-&amp;gt;target_comm&lt;/code&gt; before loading the BPF program. This writes the filter string into the BPF program's global data section, making it visible to kernel code when the program runs. This is how we pass configuration from userspace to kernel without complex communication channels.&lt;/p&gt;
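
&lt;p&gt;Task names in the kernel are limited to 15 characters plus a terminator, so a longer filter string is silently truncated by &lt;code&gt;strncpy()&lt;/code&gt;. A minimal sketch of a copy helper that makes the truncation visible is shown below. The name &lt;code&gt;copy_comm_filter()&lt;/code&gt; is hypothetical, and the sketch assumes the destination is a &lt;code&gt;TASK_COMM_LEN&lt;/code&gt;-byte buffer such as &lt;code&gt;target_comm&lt;/code&gt;.&lt;/p&gt;

```c
/* Hypothetical helper (not part of the tutorial source).
 * Assumption: dst points to a buffer of TASK_COMM_LEN bytes. */
enum { TASK_COMM_LEN = 16 };

/* Copies name into dst, always NUL-terminating the result.
 * Returns 1 if name had to be truncated, 0 otherwise. */
static int copy_comm_filter(char *dst, const char *name)
{
    int i;

    /* Copy at most TASK_COMM_LEN - 1 characters. */
    for (i = 0; i != TASK_COMM_LEN - 1; i++) {
        if (name[i] == '\0')
            break;
        dst[i] = name[i];
    }
    dst[i] = '\0';

    /* Anything left in name did not fit. */
    return name[i] != '\0';
}
```

&lt;p&gt;The return value lets the caller warn the user that the filter was shortened, while the truncated string still lines up with the way the kernel shortens long task names.&lt;/p&gt;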

&lt;p&gt;After loading, we select which iterator to run based on the &lt;code&gt;--files&lt;/code&gt; flag. Both iterators use the same filtering logic, but produce different output - one shows stack traces, the other shows file descriptors. The shared filtering code demonstrates how BPF programs can implement reusable logic across different iterator types.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compilation and Execution
&lt;/h2&gt;

&lt;p&gt;Navigate to the bpf_iters directory and build:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;bpf-developer-tutorial/src/features/bpf_iters
make
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Makefile compiles the BPF program with BTF support and generates a skeleton header containing the compiled bytecode embedded in C structures. This skeleton API makes BPF program loading trivial.&lt;/p&gt;

&lt;p&gt;Show kernel stack traces for all systemd processes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./task_stack systemd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Filtering for tasks matching: systemd

=== BPF Task Stack Iterator ===

=== Task: systemd (pid=1, tgid=1) ===
Stack depth: 6 frames
  [ 0] ep_poll+0x447/0x460
  [ 1] do_epoll_wait+0xc3/0xe0
  [ 2] __x64_sys_epoll_wait+0x6d/0x110
  [ 3] x64_sys_call+0x19b1/0x2310
  [ 4] do_syscall_64+0x7e/0x170
  [ 5] entry_SYSCALL_64_after_hwframe+0x76/0x7e

=== Summary: 1 task stacks shown ===
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Show open file descriptors for bash processes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./task_stack &lt;span class="nt"&gt;--files&lt;/span&gt; bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Filtering for tasks matching: bash

=== BPF Task File Descriptor Iterator ===

COMM                 TGID      PID     FD FILE_OPS
bash                12345    12345      0 0xffffffff81e3c6e0
bash                12345    12345      1 0xffffffff81e3c6e0
bash                12345    12345      2 0xffffffff81e3c6e0
bash                12345    12345    255 0xffffffff82145dc0

=== Summary: 4 file descriptors shown ===
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run without filtering to see all tasks:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./task_stack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



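&lt;p&gt;This shows stacks for every task in the system. On a typical desktop, this might display hundreds of tasks. Parsing &lt;code&gt;/proc/*/stack&lt;/code&gt; for all processes costs an open, a read, and a close per task, plus text parsing in userspace. The iterator instead walks the task list once inside the kernel and returns only the formatted result, so it usually finishes noticeably faster.&lt;/p&gt;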

&lt;h2&gt;
  
  
  When to Use BPF Iterators vs /proc
&lt;/h2&gt;

&lt;p&gt;Choose &lt;strong&gt;BPF iterators&lt;/strong&gt; when you need filtered kernel data without userspace processing overhead, custom output formats that don't match &lt;code&gt;/proc&lt;/code&gt; text, performance-critical monitoring that runs frequently, or integration with BPF-based observability infrastructure. Iterators excel when you're monitoring many entities but only care about a subset, or when you need to aggregate and transform data in the kernel.&lt;/p&gt;

&lt;p&gt;Choose &lt;strong&gt;/proc&lt;/strong&gt; when you need simple one-off queries, are debugging or prototyping where development speed matters more than runtime performance, want maximum portability across kernel versions (iterators require Linux 5.8 or newer), or run in restricted environments where you can't load BPF programs.&lt;/p&gt;

&lt;p&gt;The fundamental trade-off is processing location. Iterators push filtering and formatting into the kernel for efficiency and flexibility, while &lt;code&gt;/proc&lt;/code&gt; keeps the kernel simple and does all processing in userspace. For production monitoring of complex systems, iterators usually win due to their performance benefits and programming flexibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary and Next Steps
&lt;/h2&gt;

&lt;p&gt;BPF iterators change how we export kernel data by enabling programmable, filtered iteration directly from BPF code. Instead of repeatedly reading and parsing &lt;code&gt;/proc&lt;/code&gt; files, you write a BPF program that iterates kernel structures in-kernel, applies filtering at the source, and formats output exactly as needed. This avoids most of the per-entry syscalls, mode transitions, and userspace parsing that a &lt;code&gt;/proc&lt;/code&gt; scan incurs, while leaving the output format entirely up to you.&lt;/p&gt;

&lt;p&gt;Our dual-mode iterator demonstrates both task and file iteration, showing how one BPF object can export multiple views of kernel data with shared filtering logic. The kernel handles the iteration mechanics while your BPF code focuses purely on filtering and formatting. Because an iterator instance is consumed through an ordinary file descriptor with plain &lt;code&gt;read()&lt;/code&gt; calls, its output is easy to feed into existing text-processing tools and monitoring pipelines.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you'd like to dive deeper into eBPF, check out our tutorial repository at &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial&lt;/a&gt; or visit our website at &lt;a href="https://eunomia.dev/tutorials/" rel="noopener noreferrer"&gt;https://eunomia.dev/tutorials/&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BPF Iterator Documentation:&lt;/strong&gt; &lt;a href="https://docs.kernel.org/bpf/bpf_iterators.html" rel="noopener noreferrer"&gt;https://docs.kernel.org/bpf/bpf_iterators.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kernel Iterator Selftests:&lt;/strong&gt; Linux kernel tree &lt;code&gt;tools/testing/selftests/bpf/*iter*.c&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tutorial Repository:&lt;/strong&gt; &lt;a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_iters" rel="noopener noreferrer"&gt;https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_iters&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;libbpf Iterator API:&lt;/strong&gt; &lt;a href="https://github.com/libbpf/libbpf" rel="noopener noreferrer"&gt;https://github.com/libbpf/libbpf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BPF Helpers Manual:&lt;/strong&gt; &lt;a href="https://man7.org/linux/man-pages/man7/bpf-helpers.7.html" rel="noopener noreferrer"&gt;https://man7.org/linux/man-pages/man7/bpf-helpers.7.html&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples adapted from Linux kernel BPF selftests with educational enhancements. Requires Linux kernel 5.8+ for iterator support, BTF enabled, and libbpf. Complete source code available in the tutorial repository.&lt;/p&gt;

</description>
      <category>ebpf</category>
      <category>iterator</category>
      <category>kernel</category>
    </item>
  </channel>
</rss>
