DEV Community

Yuzuki Ishiyama

My first time submitting patches to the Linux kernel

Prompted by my graduate school research, I submitted a patch to the Linux kernel. In this article, I explain the technical background and the details of my contribution.

NOTE: The original version was posted in Japanese; this version was translated by Gemini.

Commit: "bpf: add bpf_strncasecmp kfunc"

Technical Background

This section explains the technical background related to my contribution.

The Linux Kernel (OS)

In a nutshell, the role of an OS is "resource management." For example:

  • Computational Resources: Allocating CPU time fairly among multiple processes.
  • Storage Resources: Allocating RAM to processes / Saving files to HDD.
  • Network Resources: Sending data received from processes to the network.

By entrusting this resource management to the OS, we can manipulate files without worrying about whether the storage is an HDD or SSD, and utilize the Internet without being conscious of the underlying TCP/IP or Ethernet protocols.

In recent years, there have been increasing demands for additional OS features, such as security and monitoring in distributed systems. However, since stability and safety are required of the OS, adding new features to the kernel itself is a difficult and time-consuming task. This is where eBPF comes in, allowing features to be added in a plugin-like manner without modifying the OS core.

eBPF (extended Berkeley Packet Filter)

eBPF is a mechanism that safely extends the functionality of the Linux kernel. Although the name includes "Packet Filter" for historical reasons, modern eBPF does far more than filter packets.

Previously, there were two main ways to extend OS functionality:

  1. Patching the OS source code itself.
  2. Using kernel modules.

Method 1 is the direct approach, but the sheer size of the Linux codebase and the difficulty of debugging are significant issues. Method 2 is a plugin-style approach used for many years, but if there is a bug in the module itself, it can cause the entire OS to freeze. When this happens, the only recovery method is a forced reboot of the machine.

eBPF was proposed to solve this. It provides a sandbox environment inside the OS and runs plugin programs within it. This prevents worst-case scenarios like OS freezes, even if the program itself has issues. Securing this sandbox environment is the core technology of eBPF.

The Verifier

eBPF guarantees safety by verifying plugin programs beforehand. "Safety" here means:

  • No illegal memory access.
  • The program terminates within a bounded amount of time.

The eBPF Verifier uses static analysis to ensure that neither of these safety problems can occur. Specifically, it tracks the ranges of variable values and the function calls across all code paths. If the verifier detects an attempt to access memory outside the sandbox or a potential infinite loop, it refuses to load the program.
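As a rough illustration of the loop shape this kind of analysis favors, here is a minimal sketch in plain user-space C (it is not an eBPF program and is not taken from the kernel). The function name `find_byte_bounded` and the constant `MAX_SCAN` are hypothetical. The point is that the loop has a constant upper bound and checks the index against the buffer length before every access, so both termination and memory safety can be argued from the code alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant bound: the loop below can never run longer than this. */
#define MAX_SCAN 64

/*
 * Return the index of the first occurrence of c in buf[0..len), looking at
 * no more than MAX_SCAN bytes. Return -1 if c is not found in that window.
 */
static int find_byte_bounded(const char *buf, size_t len, char c)
{
    assert(buf != NULL);

    for (size_t i = 0; i < MAX_SCAN; i++) {
        if (i >= len)       /* explicit bounds check before touching buf[i] */
            return -1;
        if (buf[i] == c)
            return (int)i;
    }
    return -1;              /* gave up after MAX_SCAN bytes */
}
```

Calling `find_byte_bounded("abc", 3, 'c')` returns 2, while a byte that sits beyond the first `MAX_SCAN` positions is reported as not found. Giving up early is the price paid for a loop whose bound is obvious.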

While the verifier is powerful for ensuring safety, it imposes constraints on programming. For example, complex programs with multiple nested loops are often rejected because the verifier cannot track the variables. This restriction makes tasks such as string comparison, which loop over data whose length is not known in advance, very inconvenient to write.

To address this, there is a mechanism to move parts of the program outside the scope of the verifier.

KFunc

KFuncs are kernel functions that can be called from eBPF. The eBPF verifier checks the ranges and types of the function arguments but does not verify the body of the KFunc itself. Therefore, by defining string comparison functions as KFuncs, the looping logic runs as ordinary kernel code and is no longer subject to the verifier's constraints.

In the Linux kernel, commonly used string comparison functions are pre-defined as KFuncs. Examples include:

  • bpf_strcmp: Compares two strings.
  • bpf_strcasecmp: Compares two strings (case-insensitive).
  • bpf_strstr: Searches for a string within a string.

These functions resemble those defined in the C standard library <string.h>, but they are tuned for eBPF, having internal limits on string length and annotations so the verifier can check the arguments.
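To make the idea of an "internal limit on string length" concrete, here is a minimal user-space sketch of a case-insensitive comparison that refuses to scan past a fixed number of bytes. It is not the kernel implementation: the name `bounded_strcasecmp`, the limit `SKETCH_MAX_LEN`, and the error value `SKETCH_E_TOO_LONG` are hypothetical placeholders chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define SKETCH_MAX_LEN    256    /* hypothetical cap on bytes examined */
#define SKETCH_E_TOO_LONG (-7)   /* hypothetical "string too long" code */

/*
 * Compare s1 and s2 ignoring ASCII case.
 * Returns 0 if equal, -1 or 1 if they differ, and SKETCH_E_TOO_LONG if
 * no difference and no terminator is found within SKETCH_MAX_LEN bytes.
 */
static int bounded_strcasecmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);

    for (size_t i = 0; i < SKETCH_MAX_LEN; i++) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        int c1 = tolower((unsigned char)s1[i]);
        int c2 = tolower((unsigned char)s2[i]);

        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == '\0')     /* both strings ended at the same position */
            return 0;
    }
    return SKETCH_E_TOO_LONG;
}
```

Unlike the `<string.h>` functions, this version always stops after a fixed number of iterations, even when handed two very long identical strings. That is the property the length limit is there to provide.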

My Contribution

I added bpf_strncasecmp as a new KFunc. The behavior of bpf_strcasecmp and bpf_strncasecmp is almost the same, but the latter compares at most the first n characters of the given strings. Here are examples of how they work:

  • bpf_strcasecmp("hello", "HELLO"): The two strings are equal.
  • bpf_strcasecmp("hello", "HELLO WORLD"): The two strings are different.
  • bpf_strncasecmp("hello", "HELLO WORLD", 5): The two strings are equal.
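The same three cases can be tried in user space with the POSIX functions `strcasecmp` and `strncasecmp` from `<strings.h>`. This sketch uses them only as stand-ins for the KFuncs: the wrapper names `equal_ignoring_case` and `equal_prefix_ignoring_case` are hypothetical, and the code says nothing about the KFuncs' exact return values or error handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>   /* POSIX strcasecmp, strncasecmp */

/* True if a and b are equal when ASCII case is ignored. */
static bool equal_ignoring_case(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcasecmp(a, b) == 0;
}

/* True if the first n characters of a and b are equal, ignoring case. */
static bool equal_prefix_ignoring_case(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return strncasecmp(a, b, n) == 0;
}
```

With these helpers, `equal_ignoring_case("hello", "HELLO")` is true, `equal_ignoring_case("hello", "HELLO WORLD")` is false, and `equal_prefix_ignoring_case("hello", "HELLO WORLD", 5)` is true, matching the three cases listed above.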

This function is particularly useful when parsing HTTP headers in eBPF. HTTP header field names (keys) are case-insensitive, so "Content-Length" and "content-length" refer to the same header. When parsing HTTP headers line by line, case-insensitive prefix matching is necessary to search for specific keys.
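The prefix-matching idea can be sketched in user-space C as follows. This is an illustration of the technique, not code from my patch or from an eBPF program: `header_value` is a hypothetical helper, and it uses the POSIX `strncasecmp` in place of the KFunc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>    /* strlen */
#include <strings.h>   /* POSIX strncasecmp */

/*
 * If line starts with key (ignoring case) immediately followed by ':',
 * return a pointer to the value with leading spaces and tabs skipped.
 * Otherwise return NULL.
 */
static const char *header_value(const char *line, const char *key)
{
    assert(line != NULL && key != NULL);

    size_t klen = strlen(key);

    /* Case-insensitive comparison of only the first klen characters. */
    if (strncasecmp(line, key, klen) != 0)
        return NULL;
    if (line[klen] != ':')      /* key must be the whole field name */
        return NULL;

    const char *value = line + klen + 1;
    while (*value == ' ' || *value == '\t')
        value++;
    return value;
}
```

For example, `header_value("content-length: 42", "Content-Length")` returns a pointer to `"42"`, while a line for a different header returns `NULL`. Without the length argument, the comparison would also take the colon and the value into account and would never match.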

Reflections

This was my first time developing through a mailing list, and I was impressed by how well organized the development environment was. A script fetches the people to CC on each email, and CI runs automatically when specific prefixes are included in the email. The whole system is built so that many developers can maintain the project in a distributed manner. I also noticed that AI reviews using Claude had been introduced, which showed me that Linux kernel development is adopting the latest technologies.

One piece of feedback from a reviewer that stood out was regarding "consistency with other code." This is the policy that programs with similar behavior should adopt similar code structures. Prioritizing consistent code over code that might run slightly faster feels unique to the environment of the Linux kernel, where stability and safety are paramount.

Through this experience, I learned a lot about debugging the Linux kernel, how to send patches, and how to build maintainable systems.
