<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Leesoo Ahn</title>
    <description>The latest articles on DEV Community by Leesoo Ahn (@lsahn).</description>
    <link>https://dev.to/lsahn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1067290%2Fbece31b4-5db0-4142-8da6-ee93c84a1719.png</url>
      <title>DEV Community: Leesoo Ahn</title>
      <link>https://dev.to/lsahn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lsahn"/>
    <language>en</language>
    <item>
      <title>The Anatomy of Barriers</title>
      <dc:creator>Leesoo Ahn</dc:creator>
      <pubDate>Mon, 05 May 2025 10:08:31 +0000</pubDate>
      <link>https://dev.to/lsahn/the-anatomy-of-barriers-34n2</link>
      <guid>https://dev.to/lsahn/the-anatomy-of-barriers-34n2</guid>
      <description>&lt;p&gt;I've been working on migrating an Arm-based product to a different architecture. Throughout the process, I came across some lines of code with barriers. The challenge was that I couldn't fully understand or modify them properly without knowing exactly what those barriers were.&lt;/p&gt;

&lt;p&gt;In this post, we'll take a closer look at the two major types of barriers you often encounter: &lt;em&gt;compiler barriers&lt;/em&gt; and &lt;em&gt;memory barriers&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;center&gt; Barriers? &lt;/center&gt;
&lt;/h2&gt;

&lt;p&gt;A barrier (also known as a fence) is a mechanism for preventing memory operations from being reordered by compilers or CPUs. Modern processors and compilers often reorder instructions to improve performance, which can lead to unexpected behavior in concurrent or low-level programming. Barriers enforce ordering by ensuring that certain memory operations are completed before others begin.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;center&gt; Compiler Barriers &lt;/center&gt;
&lt;/h2&gt;

&lt;p&gt;A compiler barrier is an instruction or directive that tells the compiler:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not move memory operations across this point. Preserve the order of instructions as written.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is important to understand:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A compiler barrier only affects the compiler's optimization passes.&lt;/li&gt;
&lt;li&gt;It does not directly affect the CPU's execution or memory ordering.
In other words, a compiler barrier controls the compiler, &lt;em&gt;not the hardware&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Modern compilers optimize aggressively. They assume that reordering memory accesses is fine as long as the program's &lt;em&gt;observable behavior (the as-if rule)&lt;/em&gt; doesn't change, and that rule is judged from the point of view of a single thread.&lt;/p&gt;

&lt;p&gt;However, preserving the written order is a critical correctness requirement if you're writing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lock-free data structures&lt;/li&gt;
&lt;li&gt;Spinlocks or mutexes&lt;/li&gt;
&lt;li&gt;Hardware drivers&lt;/li&gt;
&lt;li&gt;IPC code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even a simple-looking optimization can break things:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;flag&lt;/code&gt; signals to another thread that &lt;code&gt;*ptr&lt;/code&gt; is ready, but the compiler reorders these two stores, your system may crash or behave unpredictably, because the other thread can see &lt;code&gt;flag&lt;/code&gt; set before &lt;code&gt;*ptr&lt;/code&gt; has been set to one.&lt;/p&gt;

&lt;p&gt;You need a compiler barrier in that case to guarantee the order you wrote is the order that gets emitted in the machine code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;asm&lt;/span&gt; &lt;span class="nf"&gt;volatile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="o"&gt;:::&lt;/span&gt; &lt;span class="s"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



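&lt;p&gt;The empty &lt;code&gt;asm&lt;/code&gt; statement with a &lt;code&gt;"memory"&lt;/code&gt; clobber tells the compiler not to move memory accesses across it. It emits no instruction, so it does not stop the CPU from reordering the two stores.&lt;/p&gt;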
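
&lt;p&gt;As a minimal sketch of this pattern, the barrier can be wrapped in a small helper. The names &lt;code&gt;compiler_barrier&lt;/code&gt;, &lt;code&gt;publish&lt;/code&gt;, and &lt;code&gt;consume&lt;/code&gt; are hypothetical, and the inline assembly syntax assumes GCC or Clang. The helper constrains only the compiler, so code that runs across cores on a weakly ordered CPU would still need a hardware barrier.&lt;/p&gt;

```c
/* Hypothetical shared state: data is the payload, flag announces it. */
static int data;
static int flag;

/* Compiler-only barrier (GCC/Clang syntax): emits no instruction, but
 * forbids the compiler from moving memory accesses across this point. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Store the payload first, then raise the flag, in program order. */
void publish(int value)
{
    data = value;
    compiler_barrier();
    flag = 1;
}

/* Return the payload if the flag is set, otherwise -1. */
int consume(void)
{
    int ready = flag;
    compiler_barrier();
    return ready ? data : -1;
}
```

&lt;p&gt;Calling &lt;code&gt;publish(42)&lt;/code&gt; and then &lt;code&gt;consume()&lt;/code&gt; on the same thread returns 42, and &lt;code&gt;consume()&lt;/code&gt; returns -1 if nothing has been published yet.&lt;/p&gt;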




&lt;h2&gt;
  
  
  &lt;center&gt;Memory Barriers on Arm&lt;/center&gt;
&lt;/h2&gt;

&lt;p&gt;When working with Arm processors, especially in multi-core or multi-threaded environments, memory consistency issues quickly become a real concern. Because Arm implements a &lt;em&gt;weakly ordered memory model&lt;/em&gt;, the order in which memory operations appear to execute is not always the order you wrote in your code.&lt;/p&gt;

&lt;p&gt;This can lead to subtle, hard-to-reproduce bugs unless you use &lt;em&gt;memory barriers&lt;/em&gt; properly.&lt;/p&gt;

&lt;p&gt;Modern CPUs such as Arm cores prioritize performance. Hence, they allow memory accesses (loads and stores) to be &lt;em&gt;reordered&lt;/em&gt;, &lt;em&gt;delayed&lt;/em&gt;, and &lt;em&gt;speculated&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;For instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A store you wrote earlier might become visible to another core after a later store.&lt;/li&gt;
&lt;li&gt;A load you wrote later might complete before an earlier store is visible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is usually harmless in single-threaded programs, but when multiple cores or devices are involved, it can break correctness.&lt;/p&gt;

&lt;p&gt;Arm defines three main types of memory barrier instructions: &lt;code&gt;dmb&lt;/code&gt;, &lt;code&gt;dsb&lt;/code&gt;, and &lt;code&gt;isb&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  DMB (Data Memory Barrier)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Ensures that memory accesses before the &lt;code&gt;dmb&lt;/code&gt; are globally observed before memory accesses after it.&lt;/li&gt;
&lt;li&gt;Only affects memory accesses - instructions can still be fetched and decoded out of order.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Message passing: ensuring that data is visible before a flag is set.&lt;/li&gt;
&lt;li&gt;Synchronizing shared variables across cores.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;str r5, [r1]    ; write data
dmb             ; make sure the data is globally visible
str r0, [r2]    ; signal that data is ready
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
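
&lt;p&gt;In C, the message-passing use case might look like the sketch below. It rests on assumptions: &lt;code&gt;__sync_synchronize()&lt;/code&gt; is a GCC/Clang builtin that issues a full memory barrier, which these compilers typically implement with a &lt;code&gt;dmb&lt;/code&gt; on Arm, and the names &lt;code&gt;producer&lt;/code&gt;, &lt;code&gt;consumer&lt;/code&gt;, &lt;code&gt;payload&lt;/code&gt;, and &lt;code&gt;ready&lt;/code&gt; are hypothetical.&lt;/p&gt;

```c
/* Hypothetical shared state between two cores. */
static int payload;
static volatile int ready;

/* Writer side: publish the data, then the flag. */
void producer(int value)
{
    payload = value;        /* write data */
    __sync_synchronize();   /* full barrier: data is ordered before the flag */
    ready = 1;              /* signal that data is ready */
}

/* Reader side: wait for the flag, then read the data. */
int consumer(void)
{
    while (!ready)
        ;                   /* spin until the producer signals */
    __sync_synchronize();   /* full barrier: do not read data ahead of the flag */
    return payload;
}
```

&lt;p&gt;The reader needs its own barrier as well; otherwise the load of &lt;code&gt;payload&lt;/code&gt; could be satisfied before the load of &lt;code&gt;ready&lt;/code&gt;.&lt;/p&gt;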



&lt;h4&gt;
  
  
  DSB (Data Synchronization Barrier)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Stronger than &lt;code&gt;dmb&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Ensures that all memory accesses and side effects before the &lt;code&gt;dsb&lt;/code&gt; are complete, and execution doesn't proceed until they are done.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before entering low-power states with &lt;code&gt;wfi&lt;/code&gt; or &lt;code&gt;wfe&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Before sending interrupts via memory-mapped registers.&lt;/li&gt;
&lt;li&gt;After cache or TLB maintenance operations.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;str r5, [r1]    ; update a shared buffer
dsb             ; ensure the update is complete
wfi             ; wait for interrupt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  ISB (Instruction Synchronization Barrier)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Flushes the CPU's pipeline.&lt;/li&gt;
&lt;li&gt;Ensures that all instructions following the ISB are fetched anew.&lt;/li&gt;
&lt;li&gt;Used after changing system control registers or modifying code at runtime.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After enabling or disabling the MMU or caches, or after changing other system registers.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcr p15, 0, r0, c1, c0, 0  ; update system control register
isb                        ; ensure the update takes effect immediately
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's summarize when to use each barrier:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Barrier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ensure memory write ordering across cores&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dmb&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complete all previous memory transactions before continuing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dsb&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flush the instruction pipeline after system configuration changes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;isb&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sending an interrupt (through a mailbox) after writing data&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dsb&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cleaning cache lines and invalidating TLBs&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;dsb&lt;/code&gt; + &lt;code&gt;isb&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
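
&lt;p&gt;For the mailbox row of the table, a sketch might look like the following. Everything in it is an assumption made for illustration: &lt;code&gt;hw_dsb&lt;/code&gt;, &lt;code&gt;ring_doorbell&lt;/code&gt;, &lt;code&gt;mailbox_peek&lt;/code&gt;, &lt;code&gt;mailbox_buf&lt;/code&gt;, and the doorbell register are hypothetical. The &lt;code&gt;dsb sy&lt;/code&gt; inline assembly is compiled only on AArch64, and other targets fall back to the GCC/Clang builtin &lt;code&gt;__sync_synchronize()&lt;/code&gt; so that the sketch still builds.&lt;/p&gt;

```c
#if defined(__aarch64__)
/* Wait until all earlier memory accesses have completed. */
#define hw_dsb() __asm__ __volatile__("dsb sy" ::: "memory")
#else
/* Fallback for other targets: a full barrier builtin (GCC/Clang). */
#define hw_dsb() __sync_synchronize()
#endif

/* Hypothetical buffer shared with the receiver of the interrupt. */
static unsigned int mailbox_buf[4];

/* Write the message, complete the write, then ring the doorbell.
 * doorbell points at a (hypothetical) memory-mapped register. */
void ring_doorbell(volatile unsigned int *doorbell, unsigned int msg)
{
    mailbox_buf[0] = msg;   /* write data */
    hw_dsb();               /* data must be complete before the interrupt */
    *doorbell = 1;          /* trigger the interrupt */
}

/* Read back the first word of the message. */
unsigned int mailbox_peek(void)
{
    return mailbox_buf[0];
}
```

&lt;p&gt;A &lt;code&gt;dmb&lt;/code&gt; would only order the two stores relative to each other. The &lt;code&gt;dsb&lt;/code&gt; is used here because the receiver reacts to the interrupt itself, so the data write has to be complete before the doorbell is rung.&lt;/p&gt;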




&lt;h2&gt;
  
  
  &lt;center&gt;Conclusion&lt;/center&gt;
&lt;/h2&gt;

&lt;p&gt;Compiler and memory barriers are essential tools for writing correct and reliable low-level code on a specific architecture. They might seem like &lt;em&gt;magic words&lt;/em&gt; at first, but once you understand their role, &lt;em&gt;controlling the visibility and ordering of memory accesses&lt;/em&gt;, they become a logical part of your system design.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Understanding barriers is a rite of passage for serious system programmers&lt;/em&gt;. And once you get it right, your systems will be faster, safer, and far less mysterious.&lt;/p&gt;

&lt;p&gt;Remember: &lt;em&gt;your code doesn't always execute in the order you wrote it&lt;/em&gt;.&lt;/p&gt;

</description>
      <category>arm</category>
      <category>memoryordering</category>
      <category>barriers</category>
      <category>performance</category>
    </item>
    <item>
      <title>The Power of Memory Map</title>
      <dc:creator>Leesoo Ahn</dc:creator>
      <pubDate>Sat, 02 Nov 2024 16:59:58 +0000</pubDate>
      <link>https://dev.to/lsahn/the-power-of-memory-map-8f6</link>
      <guid>https://dev.to/lsahn/the-power-of-memory-map-8f6</guid>
      <description>&lt;p&gt;Since early this year, I’ve been working on a BSP project. The biggest challenge was understanding physical memory layout, specifically why certain addresses are defined in the DTS and don’t fall within other expected ranges.&lt;/p&gt;

&lt;p&gt;To tackle this, I created a complete memory map of the chip&lt;a href="https://www.nxp.com/products/processors-and-microcontrollers/s32-automotive-platform/s32g-vehicle-network-processors/s32g3-processors-for-vehicle-networking:S32G3" rel="noopener noreferrer"&gt;1&lt;/a&gt;, which helped me gain a clear understanding, and use the resources of the reference board&lt;a href="https://www.nxp.com/design/design-center/development-boards-and-designs/s32g3-vehicle-networking-reference-design:S32G-VNP-RDB3" rel="noopener noreferrer"&gt;2&lt;/a&gt; to explain.&lt;/p&gt;




&lt;p&gt;Fortunately, NXP has made the kernel source and reference manuals for the S32G3 chipset publicly available. This allows us to practice designing memory map diagrams freely using these resources—big thanks to NXP!&lt;/p&gt;

&lt;p&gt;The image below shows the complete memory map, including kernel-reserved memory regions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bywo2mvuzj0uivqu226.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bywo2mvuzj0uivqu226.jpg" alt="s32g3-memory-map-diagram" width="775" height="1320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The S32G3 chip includes five categories of memory ranges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extended Address Map&lt;/li&gt;
&lt;li&gt;External DRAM&lt;/li&gt;
&lt;li&gt;Peripherals&lt;/li&gt;
&lt;li&gt;RAM&lt;/li&gt;
&lt;li&gt;QSPI Flash Memory&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Extended Address Map
&lt;/h3&gt;

&lt;p&gt;A 4GB DRAM can be mapped within a 32-bit address space. However, since the lower half of this range is allocated to peripherals, only up to 2GB is available for DRAM.&lt;/p&gt;

&lt;p&gt;To overcome this limitation, the system can extend the address space to 40-bit mode. This allows more than 2GB of DRAM to be mapped and provides additional address space for other devices, including the PCIe endpoint as shown in the diagram.&lt;/p&gt;
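
&lt;p&gt;The arithmetic behind these limits can be checked with a short sketch. The macro and function names are hypothetical, and the even split of the 32-bit space follows the description above rather than the exact layout in the reference manual.&lt;/p&gt;

```c
/* Sizes in bytes, written as constants so the arithmetic is easy to check. */
#define GIB            (1024ULL * 1024ULL * 1024ULL)
#define ADDR_SPACE_32  0x100000000ULL     /* 2^32 bytes */
#define ADDR_SPACE_40  0x10000000000ULL   /* 2^40 bytes */

/* Total addressable size of the 32-bit space, in GiB. */
unsigned long long space32_gib(void)
{
    return ADDR_SPACE_32 / GIB;
}

/* DRAM window left when half of the 32-bit space goes to peripherals. */
unsigned long long dram32_gib(void)
{
    return (ADDR_SPACE_32 / 2) / GIB;
}

/* Total addressable size of the 40-bit space, in GiB. */
unsigned long long space40_gib(void)
{
    return ADDR_SPACE_40 / GIB;
}
```

&lt;p&gt;The three functions return 4, 2, and 1024 respectively: a 32-bit space covers 4GB, half of it leaves 2GB for DRAM, and a 40-bit space covers 1TB.&lt;/p&gt;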

&lt;h3&gt;
  
  
  External DRAM
&lt;/h3&gt;

&lt;p&gt;This is the basic range where DRAM is mapped. It serves as the main memory used by the kernel, for tasks such as loading the kernel image during boot and memory management such as page allocation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Peripherals
&lt;/h3&gt;

&lt;p&gt;This range is where most peripherals are mapped to specific areas of the SoC, allowing access to their controllers.&lt;/p&gt;

&lt;h3&gt;
  
  
  RAM
&lt;/h3&gt;

&lt;p&gt;This RAM is integrated into the SoC because key components like PCIe, CPU, and GPU need ultra-high-speed communication. Its size is quite limited compared to DRAM, typically ranging from kilobytes to megabytes, due to the high cost of larger capacities. This type of memory is commonly used for cache and CPU registers, where ultra-high speed is essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  QSPI Flash Memory
&lt;/h3&gt;

&lt;p&gt;A QSPI-interfaced flash memory is used to store resources such as pre/boot loaders, kernel images, and additional binaries. This area is used by the M7 cores to store their firmware.&lt;/p&gt;

&lt;p&gt;However, the range that is actually accessible on the board ends at &lt;code&gt;0x03FF_FFFF&lt;/code&gt; (64MB), even though the total address space extends up to &lt;code&gt;0x1FFF_FFFF&lt;/code&gt; (512MB), because the NOR flash&lt;a href="https://community.nxp.com/pwmxy87654/attachments/pwmxy87654/S32G/5077/1/QSPI_Octal%20SPI%20Flash%20-%20MX25UW51245GAutomotiveVer11C2203048NIO.pdf" rel="noopener noreferrer"&gt;3&lt;/a&gt; is a 64MB device.&lt;/p&gt;
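
&lt;p&gt;As a quick check of those two sizes, the sketch below converts an end offset into megabytes. It assumes both values are offsets from the start of the QSPI window, and the macro and function names are hypothetical.&lt;/p&gt;

```c
#define MIB              (1024UL * 1024UL)
#define QSPI_WINDOW_LAST 0x1FFFFFFFUL   /* last offset of the address window */
#define QSPI_FLASH_LAST  0x03FFFFFFUL   /* last offset backed by the NOR flash */

/* Size in MiB of a region that starts at offset 0 and ends at last. */
unsigned long region_mib(unsigned long last)
{
    return (last + 1UL) / MIB;
}
```

&lt;p&gt;&lt;code&gt;region_mib(QSPI_WINDOW_LAST)&lt;/code&gt; gives 512 and &lt;code&gt;region_mib(QSPI_FLASH_LAST)&lt;/code&gt; gives 64, matching the figures quoted above.&lt;/p&gt;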




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We have explored the memory map of the chip. This can be challenging for BSP newcomers, but it’s essential knowledge. For instance, U-Boot, a bootloader, uses environment variables such as &lt;code&gt;loadaddr&lt;/code&gt; and &lt;code&gt;fdtaddr&lt;/code&gt; to load binaries into DRAM. In such cases, understanding the accessible memory range is crucial.&lt;/p&gt;

&lt;p&gt;I hope you found this post helpful and insightful!&lt;/p&gt;

</description>
      <category>bsp</category>
      <category>memory</category>
      <category>hardware</category>
      <category>soc</category>
    </item>
    <item>
      <title>A Yocto Cheatsheet</title>
      <dc:creator>Leesoo Ahn</dc:creator>
      <pubDate>Fri, 01 Nov 2024 03:48:37 +0000</pubDate>
      <link>https://dev.to/lsahn/a-yocto-cheatsheet-3jbf</link>
      <guid>https://dev.to/lsahn/a-yocto-cheatsheet-3jbf</guid>
      <description>&lt;h2&gt;
  
  
  Use external kernel source
&lt;/h2&gt;

&lt;p&gt;Add the following lines to the &lt;code&gt;local.conf&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INHERIT += "externalsrc"
EXTERNALSRC:pn-linux-raspberrypi = "/path/to/linux-kernel"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Add extra tasks in recipe file
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;do_reloc_what_you_want() {
    # specific jobs
}
addtask reloc_what_you_want before do_configure after do_prepare_recipe_sysroot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>yocto</category>
      <category>cheatsheet</category>
    </item>
    <item>
      <title>DebConf24, a conference trip</title>
      <dc:creator>Leesoo Ahn</dc:creator>
      <pubDate>Sat, 12 Oct 2024 04:40:47 +0000</pubDate>
      <link>https://dev.to/lsahn/debconf24-a-conference-trip-1apb</link>
      <guid>https://dev.to/lsahn/debconf24-a-conference-trip-1apb</guid>
      <description>&lt;p&gt;때는 날씨가 더워지기 시작했던 5월과 6월 사이의 어느 날... Debian 개발자들의 연례 행사인 DebConf24가 대한민국 부산에서 열린다는 소식에 땀을 닦지도 않은 채 등록을 시작했다. 국제 컨퍼런스가 한국에서 열린다고 하니 통 크게 놀아보고 싶어 발표까지 하기로 마음먹었다. 청중을 위한 발표이니 영어를 사용해야했지만 무섭지 않았다. 이번이 아니면 큰 무대에서 발표할 수 있는 기회가 얼마나 많을까 싶은 마음뿐이었다.&lt;/p&gt;

&lt;h2&gt;
  
  
  Register First, Worry Later
&lt;/h2&gt;

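&lt;p&gt;I was a very timid child. I am told I would burst into tears crying "Mom!" at the slightest thing, though I have no memory of it. After middle and high school, I started all sorts of things at university: I ran projects on my own and entered research competitions as an individual, handling both development and presentation, and built up experience that way.&lt;/p&gt;

&lt;p&gt;It has been a few years since I graduated, but when there is something I want to do, I still start first without much deliberation and figure out how to solve it afterwards.&lt;/p&gt;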

&lt;blockquote&gt;
&lt;p&gt;If I fail, what's the worst that can happen besides a little embarrassment? It won't kill me ~&lt;/p&gt;
&lt;/blockquote&gt;

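&lt;p&gt;So, with nothing but the thought &lt;strong&gt;"It will work out!"&lt;/strong&gt; in my head, I submitted a talk on AppArmor Namespaces + Linux Container, a topic I was researching and had not even tested yet.&lt;/p&gt;

&lt;p&gt;Time passed, and in early August I headed to Pukyong National University. After going through the 'prepare, present, regret' cycle for the past few years, I now take it in stride and did not get nervous.&lt;/p&gt;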

&lt;h2&gt;
  
  
  The Talk? Let's Think About It After Having Fun!
&lt;/h2&gt;

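&lt;p&gt;Long conferences like DebConf (about two weeks) sometimes include sessions such as movie screenings, music and wine parties, and a Day trip along the way. There were Gyeongju, Ulsan, and Busan courses, and I chose Gyeongju.&lt;/p&gt;

&lt;p&gt;Even though it was very hot that day, everyone eagerly picked out a hanbok that suited them.&lt;br&gt;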
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovih5mrqv52cvfx3bzmv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovih5mrqv52cvfx3bzmv.jpg" alt="222" width="460" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The hat matters too! They chose carefully. A king needs a hat that befits his rank!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37tn86ger0xinli35ome.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37tn86ger0xinli35ome.jpg" alt="111" width="460" height="626"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The scenery was so pretty that I had to take a picture!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4i0krjubxshc8hc5895.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4i0krjubxshc8hc5895.jpg" alt="333" width="460" height="259"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Nerve-Racking Talk Day
&lt;/h2&gt;

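&lt;p&gt;My initial plan was to present with slides only, without a demo. But a talk without a demo felt amateurish, so I hurriedly prepared one the day before. I built a QEMU-based test environment on WSL for the demo, but the tests did not work as expected, which kept me on edge. Fortunately, with the AppArmor manual and some code analysis, I got it finished safely.&lt;/p&gt;

&lt;p&gt;The talk was titled &lt;em&gt;Linux Containers with AppArmor Policy Namespaces&lt;/em&gt;. It covers a technique that lets the Host and a Container use different AppArmor security policies on a Linux system running LXC containers.&lt;/p&gt;

&lt;p&gt;LSM-based security modules such as AppArmor and SELinux run in the kernel, which means the same policy applies to every system-call regardless of whether it comes from the Host or a Container. Because AppArmor supports the Policy Namespace feature, the kernel can distinguish Host system-calls from Container system-calls and apply independent policies to each.&lt;/p&gt;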

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/ZhzSyhlJ2xQ"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Sessions I Attended
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;loong64 port BoF

&lt;ul&gt;
&lt;li&gt;This session was about porting the loong64 arch to the Debian distribution, and it was led by the maintainer. It was a valuable chance to see how an architecture maintainer works.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;What's new in eBPF and how you could use it today

&lt;ul&gt;
&lt;li&gt;This was a session on eBPF that briefly covered what the technology is, which tools exist, and how to use them.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Past, Present and Future of Networking in Debian

&lt;ul&gt;
&lt;li&gt;Led by the Netplan maintainer, this session discussed the current state of the project and its future plans.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Kernel Engineer BoF
&lt;/h2&gt;

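&lt;p&gt;With only a few days of the conference left, it felt like a shame to let it end this way, so I casually asked the Korean kernel developers, "How about we at least hold a Kernel BoF?", and everyone was positive about it. So I hurriedly contacted the Contents team and asked them to register a &lt;a href="https://debconf24.debconf.org/talks/176-kernel-engineer-bof/" rel="noopener noreferrer"&gt;Kernel Engineer BoF&lt;/a&gt; session, which we held on Friday, near the end of the conference.&lt;/p&gt;

&lt;p&gt;The session was organized so that each person could briefly talk about and discuss a kernel project or patch they were working on. I gave a review based on the &lt;a href="https://lore.kernel.org/all/20240726071023.4078055-1-lsahn@wewakecorp.com/" rel="noopener noreferrer"&gt;sparsemap_buf optimization&lt;/a&gt; I had been working on at the time. Since it was my first time running a BoF, the schedule was tight and there were many small mistakes, but a lot of people came anyway.&lt;/p&gt;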

&lt;p&gt;A commemorative photo after the BoF! (Portrait rights matter.)&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmc8xgoe38fbaxcss1on.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmc8xgoe38fbaxcss1on.jpg" alt="Kernel Engineer BoF" width="500" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Afterthoughts
&lt;/h2&gt;

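&lt;p&gt;I really regretted joining partway through. There were several sessions I wanted to attend that did not fit my schedule and were not Video-recorded either. Still, the atmosphere was quite different from conferences in Korea, which made it fresh and fun.&lt;/p&gt;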

&lt;ol&gt;
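&lt;li&gt;People joke and talk with each other in a relaxed setting, and it feels very much like a university project.&lt;/li&gt;
&lt;li&gt;Nobody looks down on or, conversely, looks up to others because they are a maintainer or a contributor. Everyone treats each other as fellow contributors and speaks their mind freely.&lt;/li&gt;
&lt;li&gt;This was my first time attending BoF sessions, and I wish there were more micro-conferences of this kind in Korea.&lt;/li&gt;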
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Sponsorship
&lt;/h2&gt;

&lt;p&gt;I attended DebConf24 with support from NIPA and Open-UP. I would like to take this opportunity to thank these organizations for their support.&lt;/p&gt;

</description>
      <category>debian</category>
      <category>techtalks</category>
      <category>linux</category>
      <category>apparmor</category>
    </item>
    <item>
      <title>Tracing the Arm64 Linux System Call Path</title>
      <dc:creator>Leesoo Ahn</dc:creator>
      <pubDate>Tue, 13 Aug 2024 13:23:12 +0000</pubDate>
      <link>https://dev.to/lsahn/tracing-the-arm64-linux-system-call-path-2ema</link>
      <guid>https://dev.to/lsahn/tracing-the-arm64-linux-system-call-path-2ema</guid>
      <description>&lt;p&gt;Arm64 system has two type of traps,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Synchronous&lt;/li&gt;
&lt;li&gt;Asynchronous&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and four exception levels, whose names start with &lt;strong&gt;el&lt;/strong&gt; (short for exception level):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;el0 (userspace)&lt;/li&gt;
&lt;li&gt;el1 (kernel)&lt;/li&gt;
&lt;li&gt;el2 (hypervisor)&lt;/li&gt;
&lt;li&gt;el3 (secure monitor)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A system call is one of many &lt;em&gt;Synchronous&lt;/em&gt; traps, while Arm's documentation describes &lt;em&gt;Asynchronous&lt;/em&gt; ones as hardware interrupts. The latter is off-topic for this article.&lt;/p&gt;

&lt;p&gt;A process runs in el0 and raises its hand whenever it needs a system resource. This is a system call, and it switches the CPU's exception level from el0 to el1. The kernel takes over the CPU and does the requested work on behalf of the process. Once it is done, it hands the CPU back to the process.&lt;/p&gt;




&lt;p&gt;The following code is one of the (real) system-call wrappers from musl, a well-known libc implementation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#define __asm_syscall(...) do { \
    __asm__ __volatile__ ( "svc 0" \
    : "=r"(x0) : __VA_ARGS__ : "memory", "cc"); \
    return x0; \
} while (0)
&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="nf"&gt;__syscall0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;register&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;x8&lt;/span&gt; &lt;span class="n"&gt;__asm__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"x8"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;register&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;x0&lt;/span&gt; &lt;span class="n"&gt;__asm__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"x0"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;__asm_syscall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x8&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Imagine that the process mentioned above is about to call &lt;code&gt;fork()&lt;/code&gt;. The API takes no arguments, and therefore it maps to &lt;code&gt;__syscall0(..)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The key part of this code is the &lt;code&gt;svc&lt;/code&gt; instruction (short for supervisor call), which switches from el0 to el1 while the &lt;code&gt;x8&lt;/code&gt; register holds the system-call number.&lt;/p&gt;




&lt;p&gt;When &lt;code&gt;svc&lt;/code&gt; is raised, the exception vector table, which describes what to do for each exception, calls &lt;code&gt;el0t_64_sync_handler&lt;/code&gt; in el1. The handler then jumps to &lt;code&gt;el0_svc(..)&lt;/code&gt; based on the &lt;code&gt;esr&lt;/code&gt; system register, which holds syndrome information used to recognize the exception class (also known as the exception reason).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;el0t_64_sync_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;esr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_sysreg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;esr_el1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ESR_ELx_EC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;esr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;ESR_ELx_EC_SVC64&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;el0_svc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From here on, a code diagram is easier to follow than words. (The code is based on v5.15.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;el0_svc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="n"&gt;do_el0_svc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="o"&gt;+-----+&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;V&lt;/span&gt;
&lt;span class="nf"&gt;do_el0_svc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="n"&gt;el0_svc_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
           &lt;span class="o"&gt;|&lt;/span&gt;       &lt;span class="n"&gt;__NR_syscalls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt;       &lt;span class="n"&gt;sys_call_table&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="o"&gt;+------+&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;V&lt;/span&gt;
&lt;span class="nf"&gt;el0_svc_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;scno&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;sc_nr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;syscall_fn_t&lt;/span&gt; &lt;span class="n"&gt;syscall_table&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="n"&gt;invoke_syscall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scno&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sc_nr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;syscall_table&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We're almost at our destination. &lt;code&gt;scno&lt;/code&gt; comes from the &lt;code&gt;x8&lt;/code&gt; register (again, it holds the system-call number), and &lt;code&gt;invoke_syscall(..)&lt;/code&gt; looks up the system-call function in &lt;code&gt;syscall_table&lt;/code&gt; using &lt;code&gt;scno&lt;/code&gt; as the index. Eventually, it carries out what was requested.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;invoke_syscall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;scno&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;sc_nr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;syscall_fn_t&lt;/span&gt; &lt;span class="n"&gt;syscall_table&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scno&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;sc_nr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;syscall_fn_t&lt;/span&gt; &lt;span class="n"&gt;syscall_fn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;syscall_fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;syscall_table&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;array_index_nospec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scno&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sc_nr&lt;/span&gt;&lt;span class="p"&gt;)];&lt;/span&gt;
        &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;__invoke_syscall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;syscall_fn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;                &lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;              &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;                    &lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="o"&gt;+----------------+&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;V&lt;/span&gt;
&lt;span class="nf"&gt;__invoke_syscall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;syscall_fn_t&lt;/span&gt; &lt;span class="n"&gt;syscall_fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;syscall_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
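
&lt;p&gt;The lookup above can be sketched in user space. This is only a minimal illustration, not kernel code: &lt;code&gt;struct pt_regs&lt;/code&gt; is reduced to a bare register array, and &lt;code&gt;sys_zero&lt;/code&gt;, &lt;code&gt;sys_add&lt;/code&gt;, &lt;code&gt;invoke&lt;/code&gt; and the &lt;code&gt;-1&lt;/code&gt; fallback are made-up names and values for the example. The kernel additionally clamps the index with &lt;code&gt;array_index_nospec(..)&lt;/code&gt;, which is not modelled here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

/* Simplified stand-in for the kernel's struct pt_regs (assumption):
 * only the general-purpose registers x0..x30 are modelled. */
struct pt_regs {
    unsigned long regs[31];
};

/* Every handler has the same shape: one pointer to the saved registers. */
typedef long (*syscall_fn_t)(const struct pt_regs *regs);

/* Two hypothetical handlers, used only for this illustration. */
static long sys_zero(const struct pt_regs *regs)
{
    (void)regs; /* takes no parameters, so regs is unused */
    return 0;
}

static long sys_add(const struct pt_regs *regs)
{
    return (long)(regs-&amp;gt;regs[0] + regs-&amp;gt;regs[1]);
}

#define NR_SYSCALLS 2
static const syscall_fn_t syscall_table[NR_SYSCALLS] = { sys_zero, sys_add };

/* Bounds-check the number, then call through the table. */
static long invoke(const struct pt_regs *regs, unsigned int scno)
{
    if (scno &amp;lt; NR_SYSCALLS)
        return syscall_table[scno](regs);
    return -1; /* placeholder for the out-of-range path */
}

int main(void)
{
    /* x0 = 40, x1 = 2, x8 = 1 (the "system-call number") */
    struct pt_regs regs = { .regs = { [0] = 40, [1] = 2, [8] = 1 } };

    printf("%ld\n", invoke(&amp;amp;regs, (unsigned int)regs.regs[8])); /* prints 42 */
    printf("%ld\n", invoke(&amp;amp;regs, 99));                         /* prints -1 */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because every entry in the table has the same one-argument signature, the dispatcher never needs to know how many parameters a particular system call takes.&lt;/p&gt;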






&lt;p&gt;You may wonder how this works: each system call takes a different number of parameters, yet &lt;code&gt;syscall_fn(..)&lt;/code&gt; takes only one, &lt;code&gt;regs&lt;/code&gt;. Let's look at two cases in code, one that takes no parameters and another that takes five.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;fork()&lt;/code&gt; takes no parameters, so the &lt;code&gt;struct pt_regs&lt;/code&gt; object passed to &lt;code&gt;syscall_fn&lt;/code&gt; is unused.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#define SYSCALL_DEFINE0(sname) \
    ...
&lt;/span&gt;    &lt;span class="n"&gt;asmlinkage&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;__arm64_sys_&lt;/span&gt;&lt;span class="err"&gt;##&lt;/span&gt;&lt;span class="n"&gt;sname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;__unused&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the other hand, &lt;code&gt;clone()&lt;/code&gt; takes five parameters, so the &lt;code&gt;struct pt_regs&lt;/code&gt; object is expanded into that many arguments by &lt;code&gt;SC_ARM64_REGS_TO_ARGS(..)&lt;/code&gt; and &lt;code&gt;__MAP(..)&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;SYSCALL_DEFINE5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clone_flags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newsp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="o"&gt;|&lt;/span&gt;        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;__user&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parent_tidptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="o"&gt;|&lt;/span&gt;        &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="o"&gt;|&lt;/span&gt;        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;__user&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;child_tidptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="o"&gt;+--------+&lt;/span&gt;
               &lt;span class="o"&gt;|&lt;/span&gt;
               &lt;span class="n"&gt;V&lt;/span&gt;
&lt;span class="cp"&gt;#define __SYSCALL_DEFINEx(x, name, ...) \
    ...
&lt;/span&gt;    &lt;span class="n"&gt;__arm64_sys&lt;/span&gt;&lt;span class="err"&gt;##&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pt_regs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;regs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;{&lt;/span&gt; \
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;__se_sys&lt;/span&gt;&lt;span class="err"&gt;##&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SC_ARM64_REGS_TO_ARGS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;__VA_ARGS__&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; \
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;           &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="o"&gt;+--------+&lt;/span&gt;
         &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="n"&gt;V&lt;/span&gt;
    &lt;span class="n"&gt;__se_sys&lt;/span&gt;&lt;span class="err"&gt;##&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__MAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;__SC_LONG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;__VA_ARGS__&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; \
    &lt;span class="p"&gt;{&lt;/span&gt; \
        &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;__do_sys&lt;/span&gt;&lt;span class="err"&gt;##&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__MAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;__SC_CAST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;__VA_ARGS__&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; \
        &lt;span class="p"&gt;...&lt;/span&gt;              &lt;span class="o"&gt;|&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;                  &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="o"&gt;+---------------+&lt;/span&gt;
         &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="n"&gt;V&lt;/span&gt;
    &lt;span class="n"&gt;__do_sys&lt;/span&gt;&lt;span class="err"&gt;##&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__MAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;__SC_DECL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;__VA_ARGS__&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
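
&lt;p&gt;Stripped of the macros, the idea is a one-argument wrapper that pulls values out of the saved registers, casts them, and calls a function with typed parameters. The sketch below shows that shape for three parameters. It is a hand-written illustration, not the macro expansion itself: &lt;code&gt;do_demo&lt;/code&gt; and &lt;code&gt;wrapper_demo&lt;/code&gt; are hypothetical names, and &lt;code&gt;struct pt_regs&lt;/code&gt; is again reduced to a bare register array.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

/* Simplified stand-in for the kernel's struct pt_regs (assumption). */
struct pt_regs {
    unsigned long regs[31];
};

/* The body with typed parameters, playing the role of __do_sys##name. */
static long do_demo(unsigned long flags, unsigned long newsp, int tls)
{
    return (long)flags + (long)newsp + tls;
}

/* The one-argument wrapper, playing the role of __arm64_sys##name:
 * it reads x0..x2 from regs and casts each value to the declared type. */
static long wrapper_demo(const struct pt_regs *regs)
{
    return do_demo(regs-&amp;gt;regs[0], regs-&amp;gt;regs[1], (int)regs-&amp;gt;regs[2]);
}

int main(void)
{
    /* x0 = 1, x1 = 2, x2 = 3 */
    struct pt_regs regs = { .regs = { 1, 2, 3 } };

    printf("%ld\n", wrapper_demo(&amp;amp;regs)); /* prints 6 */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The kernel generates this kind of wrapper for every system call, which is how handlers with different parameter counts can all sit behind the same &lt;code&gt;syscall_fn_t&lt;/code&gt; type.&lt;/p&gt;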






&lt;p&gt;We have walked through the system-call code from el0 to el1. It wasn't a long journey, but wasn't easy either. I hope this tiny map (I like using metaphors) guides you to where you want to be.&lt;/p&gt;

&lt;p&gt;happy hacking!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>esBPF: Stress-Testing compares Software-Offload with iptables</title>
      <dc:creator>Leesoo Ahn</dc:creator>
      <pubDate>Mon, 12 Aug 2024 15:30:08 +0000</pubDate>
      <link>https://dev.to/lsahn/esbpf-stress-testing-compares-software-offload-with-iptables-1igc</link>
      <guid>https://dev.to/lsahn/esbpf-stress-testing-compares-software-offload-with-iptables-1igc</guid>
      <description>&lt;p&gt;This article was written on Nov 29th, 2022.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://github.com/lsahn-gh/esbpf" rel="noopener noreferrer"&gt;esBPF&lt;/a&gt; project has been over one year and it began with the idea that Is it worth filtering ingress packets on Software-Offload layer instead of Network Stack? Software-Offload is similar to Hardware-Offload, but it works in ethernet driver. Now time to do Stress-testing since its prototype was released and the comparison object will be iptables.&lt;/p&gt;

&lt;p&gt;Before going further, let me define a few short terms to avoid typing the long ones over and over:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Long Term&lt;/th&gt;
&lt;th&gt;Short Term&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Raspberry Pi 3&lt;/td&gt;
&lt;td&gt;Rpi3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Host Machine&lt;/td&gt;
&lt;td&gt;Host&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Testbed
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Host&lt;/code&gt; and &lt;code&gt;Rpi3&lt;/code&gt; are connected to the same LAN through the AP shown below. The AP supports &lt;strong&gt;HW-offload&lt;/strong&gt; and runs in &lt;strong&gt;Bridge&lt;/strong&gt; mode, so that its own kernel does not get in the way of forwarding packets between them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    High-Performance AP
                      - HW-offload Supported
                      - Bridge Mode
                    +-----------------+
                    |   Wireless AP   |
                    +-----------------+
      100Mbps link    |             |     1Gbps link
           +----------+             +-----------+
           |                                    |
+-------------------+                 +-------------------+
| Raspberry Pi 3    |                 | Host Machine      |
| (192.168.219.103) |                 | (192.168.219.108) |
+-------------------+                 +-------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The stress test uses the &lt;em&gt;hping3&lt;/em&gt; program, which simply floods &lt;code&gt;Rpi3&lt;/code&gt; with ICMP packets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ hping3 --icmp --faster 192.168.219.103 -d 20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tuning Raspberry-Pi 3 for the testing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ubuntu 22.10 Kinetic Release - Kernel 5.19.0-1007 (Arm64)&lt;/li&gt;
&lt;li&gt;Enable &lt;code&gt;CONFIG_HOTPLUG_CPU&lt;/code&gt; to turn CPU cores on and off&lt;/li&gt;
&lt;li&gt;esBPF-based customized Ethernet driver, &lt;a href="https://github.com/memnoth/smsc95xx-esbpf" rel="noopener noreferrer"&gt;smsc95xx-esbpf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Turn off the &lt;code&gt;wlan0&lt;/code&gt; interface so it doesn't interfere with routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The board is booted with &lt;code&gt;maxcpus=2&lt;/code&gt; on the kernel command line, so that only 2 of the 4 cores are used and the full traffic load lands on a known number of cores. Hence we have 2 online and 2 offline cores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ubuntu@ubuntu:~$ lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0,1
  Off-line CPU(s) list:  2,3
Vendor ID:               ARM
  Model name:            Cortex-A53
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Briefing about smsc95xx-esbpf
&lt;/h2&gt;

&lt;p&gt;Once the driver has been loaded into the kernel, two important files appear under the directory &lt;code&gt;/proc/smsc95xx/esbpf&lt;/code&gt;. Each one is responsible for the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;rx_enable: turns &lt;code&gt;esbpf&lt;/code&gt; operations on or off.&lt;/li&gt;
&lt;li&gt;rx_hooks: receives a program of cBPF instructions that is written to it.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Stress-testing
&lt;/h2&gt;

&lt;p&gt;We are going to look at &lt;em&gt;mpstat&lt;/em&gt; values and compare &lt;em&gt;NET_RX&lt;/em&gt; in &lt;code&gt;/proc/softirqs&lt;/code&gt; before and after executing &lt;em&gt;hping3&lt;/em&gt;. Assume that the program runs for 60 seconds on &lt;code&gt;Host&lt;/code&gt; in each case.&lt;/p&gt;

&lt;p&gt;Here is the CPU usage of &lt;code&gt;Rpi3&lt;/code&gt; while idle. The &lt;em&gt;%idle&lt;/em&gt; column is almost the same in both test cases, &lt;strong&gt;iptables&lt;/strong&gt; and &lt;strong&gt;Software-Offload&lt;/strong&gt;, before massive traffic is generated on the LAN.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ mpstat -P ALL 3
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    0.00    0.00    0.17    0.00    0.00    0.17    0.00    0.00    0.00   99.66
  0    0.00    0.00    0.34    0.00    0.00    0.00    0.00    0.00    0.00   99.66
  1    0.00    0.00    0.00    0.00    0.00    0.34    0.00    0.00    0.00   99.66
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1. iptables
&lt;/h3&gt;

&lt;p&gt;In the first test, the following rule is appended to the &lt;em&gt;INPUT&lt;/em&gt; chain on &lt;code&gt;Rpi3&lt;/code&gt;. As a result, one of the CPUs is fully occupied by &lt;em&gt;softirq&lt;/em&gt; processing, which means it is &lt;em&gt;too busy&lt;/em&gt; to do other work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;iptables &lt;span class="nt"&gt;-A&lt;/span&gt; INPUT &lt;span class="nt"&gt;-p&lt;/span&gt; icmp &lt;span class="nt"&gt;-j&lt;/span&gt; DROP
&lt;span class="nv"&gt;$ &lt;/span&gt;iptables &lt;span class="nt"&gt;-nvL&lt;/span&gt;
Chain INPUT &lt;span class="o"&gt;(&lt;/span&gt;policy ACCEPT 0 packets, 0 bytes&lt;span class="o"&gt;)&lt;/span&gt;
 pkts bytes target     prot opt &lt;span class="k"&gt;in     &lt;/span&gt;out     &lt;span class="nb"&gt;source               &lt;/span&gt;destination         
    0     0 DROP       icmp &lt;span class="nt"&gt;--&lt;/span&gt;  &lt;span class="k"&gt;*&lt;/span&gt;      &lt;span class="k"&gt;*&lt;/span&gt;       0.0.0.0/0            0.0.0.0/0

&lt;span class="c"&gt;# NET_RX softirq count before massive traffic&lt;/span&gt;
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:        123         66          0          0

&lt;span class="c"&gt;# NET_RX softirq count after that&lt;/span&gt;
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:      15040      35021          0          0

&lt;span class="c"&gt;# mpstat&lt;/span&gt;
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    0.00    0.00    0.18    0.00    0.00   52.89    0.00    0.00    0.00   46.94
  0    0.00    0.00    0.37    0.00    0.00    0.74    0.00    0.00    0.00   98.89
  1    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. esBPF
&lt;/h3&gt;

&lt;p&gt;In the second test, the same type of packets is dropped in &lt;strong&gt;Software-Offload&lt;/strong&gt;, in other words, in the driver. Two tools are normally involved, &lt;em&gt;tcpdump&lt;/em&gt; and &lt;a href="https://github.com/memnoth/esbpf/tree/master/tools" rel="noopener noreferrer"&gt;&lt;em&gt;filter_icmp&lt;/em&gt;&lt;/a&gt;, but the latter already has hard-coded cBPF instructions, so &lt;em&gt;tcpdump&lt;/em&gt; isn't necessary at this point.&lt;/p&gt;

&lt;p&gt;The hard-coded part is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;sock_filter&lt;/span&gt; &lt;span class="n"&gt;insns&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="cm"&gt;/* tcpdump -dd -nn icmp */&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mh"&gt;0x28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x0000000c&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mh"&gt;0x15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x00000800&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mh"&gt;0x30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x00000017&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mh"&gt;0x15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x00000001&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mh"&gt;0x6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x00040000&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mh"&gt;0x6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0x00000000&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
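
&lt;p&gt;To see what these six instructions do, here is a tiny user-space interpreter that handles only the four opcodes they use. It is a sketch for reading the filter, not the esBPF or kernel implementation: &lt;code&gt;struct insn&lt;/code&gt; mirrors the field layout of &lt;code&gt;struct sock_filter&lt;/code&gt;, &lt;code&gt;run_filter&lt;/code&gt; is a hypothetical name, and the two test frames are hand-built with only the bytes the filter inspects. A nonzero return value means the frame matched (IPv4 carrying ICMP).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

/* Same fields as struct sock_filter, redeclared so no kernel headers are needed. */
struct insn {
    uint16_t code;
    uint8_t jt, jf;
    uint32_t k;
};

static const struct insn prog[] = {
    { 0x28, 0, 0, 0x0000000c }, /* ldh [12]     load EtherType               */
    { 0x15, 0, 3, 0x00000800 }, /* jeq #0x800   IPv4? otherwise skip 3       */
    { 0x30, 0, 0, 0x00000017 }, /* ldb [23]     load IP protocol field       */
    { 0x15, 0, 1, 0x00000001 }, /* jeq #0x1     ICMP? otherwise skip 1       */
    { 0x06, 0, 0, 0x00040000 }, /* ret #262144  match                        */
    { 0x06, 0, 0, 0x00000000 }, /* ret #0       no match                     */
};

static uint32_t run_filter(const struct insn *p, size_t n,
                           const uint8_t *pkt, size_t len)
{
    uint32_t a = 0; /* accumulator register */

    for (size_t pc = 0; pc &amp;lt; n; pc++) {
        const struct insn in = p[pc];

        switch (in.code) {
        case 0x28: /* 16-bit big-endian load at absolute offset k */
            if ((size_t)in.k + 2 &amp;gt; len)
                return 0;
            a = (uint32_t)pkt[in.k] * 256u + pkt[in.k + 1];
            break;
        case 0x30: /* 8-bit load at absolute offset k */
            if ((size_t)in.k + 1 &amp;gt; len)
                return 0;
            a = pkt[in.k];
            break;
        case 0x15: /* jump: skip jt instructions if a == k, else jf */
            pc += (a == in.k) ? in.jt : in.jf;
            break;
        case 0x06: /* return the constant k */
            return in.k;
        default:   /* opcode not covered by this sketch */
            return 0;
        }
    }
    return 0;
}

int main(void)
{
    uint8_t icmp[64] = { 0 }, tcp[64] = { 0 };
    size_t n = sizeof(prog) / sizeof(prog[0]);

    icmp[12] = 0x08; icmp[23] = 1; /* EtherType 0x0800, protocol 1 (ICMP) */
    tcp[12] = 0x08;  tcp[23] = 6;  /* EtherType 0x0800, protocol 6 (TCP)  */

    printf("icmp: %u\n", (unsigned)run_filter(prog, n, icmp, sizeof(icmp))); /* 262144 */
    printf("tcp:  %u\n", (unsigned)run_filter(prog, n, tcp, sizeof(tcp)));   /* 0 */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Offsets 12 and 23 are the EtherType field of the Ethernet header and the protocol field of the IPv4 header that follows it, which is why only two loads and two comparisons are needed.&lt;/p&gt;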



&lt;p&gt;The &lt;em&gt;filter_icmp&lt;/em&gt; program is run with the following commands, which write the above instructions to the &lt;em&gt;esBPF&lt;/em&gt; module and then enable it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; ./filter_icmp /proc/smsc95xx/esbpf/rx_hooks
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo echo &lt;/span&gt;1 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /proc/smsc95xx/esbpf/rx_enable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even though &lt;em&gt;hping3&lt;/em&gt; floods packets in the same way, &lt;em&gt;NET_RX&lt;/em&gt; rises far less than in the first case.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# NET_RX softirq count before massive traffic&lt;/span&gt;
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:        129         81          0          0

&lt;span class="c"&gt;# NET_RX softirq count after that&lt;/span&gt;
                    CPU0       CPU1       CPU2       CPU3
      NET_RX:        141         94          0          0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
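
&lt;p&gt;To put a number on the difference, the per-CPU counters from both tests can be subtracted and summed. The snippet below is just that arithmetic, using the values shown in the two listings; &lt;code&gt;net_rx_delta&lt;/code&gt; is a hypothetical helper name and only the two online CPUs are included.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

/* Sum of the per-CPU NET_RX increases between two /proc/softirqs snapshots. */
static unsigned long net_rx_delta(const unsigned long *before,
                                  const unsigned long *after, int ncpu)
{
    unsigned long sum = 0;

    for (int i = 0; i &amp;lt; ncpu; i++)
        sum += after[i] - before[i];
    return sum;
}

int main(void)
{
    /* CPU0 and CPU1 counters copied from the listings in this article. */
    const unsigned long ipt_before[] = { 123, 66 };
    const unsigned long ipt_after[]  = { 15040, 35021 };
    const unsigned long esb_before[] = { 129, 81 };
    const unsigned long esb_after[]  = { 141, 94 };

    printf("iptables: %lu\n", net_rx_delta(ipt_before, ipt_after, 2)); /* 49872 */
    printf("esBPF:    %lu\n", net_rx_delta(esb_before, esb_after, 2)); /* 25 */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That is 49872 additional &lt;em&gt;NET_RX&lt;/em&gt; softirqs with iptables against 25 with esBPF over the same flood.&lt;/p&gt;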



&lt;p&gt;Also, the average CPU usage by &lt;em&gt;softirq&lt;/em&gt; ranges from around &lt;strong&gt;8%&lt;/strong&gt; in the best case to nearly &lt;strong&gt;30%&lt;/strong&gt; in the worst case.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# mpstat in the best case&lt;/span&gt;
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    0.00    0.00    0.64    0.00    0.00    7.99    0.00    0.00    0.00   91.37
  0    0.00    0.00    0.65    0.00    0.00    6.54    0.00    0.00    0.00   92.81
  1    0.00    0.00    0.62    0.00    0.00    9.38    0.00    0.00    0.00   90.00

&lt;span class="c"&gt;# mpstat in the worst case&lt;/span&gt;
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all   18.31    0.00    4.58    0.96    0.00   27.47    0.00    0.00    0.00   48.67
  0   14.50    0.00    4.00    1.00    0.00   26.00    0.00    0.00    0.00   54.50
  1   21.86    0.00    5.12    0.93    0.00   28.84    0.00    0.00    0.00   43.26
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that you may sometimes see a few ICMP packets reaching the Network Stack even though &lt;em&gt;esBPF&lt;/em&gt; is enabled. No worries, they are just from the &lt;em&gt;lo&lt;/em&gt; interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;esBPF&lt;/em&gt; works in &lt;strong&gt;Software-Offload&lt;/strong&gt;, also known as the device driver layer, in contrast to &lt;em&gt;Netfilter&lt;/em&gt;, the framework behind &lt;em&gt;iptables&lt;/em&gt;, which works in the &lt;strong&gt;Network Stack&lt;/strong&gt;. It therefore drops all incoming packets that match the filters at the &lt;em&gt;tasklet&lt;/em&gt; level instead of in &lt;em&gt;NET_RX (part of Network Stack)&lt;/em&gt;, and as the &lt;em&gt;esBPF&lt;/em&gt; results show, the kernel doesn't have to do any extra work for them.&lt;/p&gt;

&lt;p&gt;The project could do better than packet filtering in the Network Stack in some cases, even though its worst case takes about four times as much CPU as its best case. Of course, that depends on how large or long the cBPF program loaded into &lt;em&gt;esBPF&lt;/em&gt; is.&lt;/p&gt;

&lt;p&gt;The project is still in progress; the goals are to make it more flexible, optimize it, and add a caching mechanism.&lt;/p&gt;

&lt;p&gt;This &lt;strong&gt;Stress-testing&lt;/strong&gt; showed me that the project is worth more effort and that I should keep working on it. It was also a great experience to take responsibility for the entire process, from design to testing.&lt;/p&gt;

&lt;p&gt;happy hacking!&lt;/p&gt;

</description>
      <category>bpf</category>
      <category>packetfilter</category>
      <category>network</category>
      <category>kernel</category>
    </item>
    <item>
      <title>AppArmor testsuite</title>
      <dc:creator>Leesoo Ahn</dc:creator>
      <pubDate>Sat, 11 May 2024 08:47:21 +0000</pubDate>
      <link>https://dev.to/lsahn/apparmor-testsuite-2421</link>
      <guid>https://dev.to/lsahn/apparmor-testsuite-2421</guid>
      <description>&lt;p&gt;유저레벨에서 개발되는 미들웨어 프로그램들은 대부분(?) unittest를 지원한다. 그러나 kernel은 이야기가 좀 달라지는데... 하드웨어 위에서 자원을 관리하는 프로그램이다 보니 unittest를 수행할 환경이 없다. 최근에 KUnit을 이용해서 함수/기능 단위로 테스트하긴 하지만 오늘은 유저 레벨에서 수행하는 테스트에 대해 이야기해보자.&lt;/p&gt;

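&lt;p&gt;When working on kernel-related tasks with clients, the word stability always comes up. In advanced-research projects in particular, we develop a prototype of a feature the kernel does not support and then consider whether to apply it to mass production. The client wants no critical issues to occur while the feature is running, and therefore asks for tests that can verify stability. The framework (?) introduced today is one way to verify the stability of AppArmor.&lt;/p&gt;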

&lt;p&gt;If you visit the AppArmor &lt;a href="https://gitlab.com/apparmor/apparmor" rel="noopener noreferrer"&gt;project site&lt;/a&gt;, you will find the repo that hosts the user-space tools, and within it &lt;code&gt;tests/regression/apparmor&lt;/code&gt; is the testsuite. &lt;em&gt;HOW TO INSTALL&lt;/em&gt; is off topic, so here we will simply run it and look at the results.&lt;/p&gt;

&lt;p&gt;After cloning the repo and moving into &lt;code&gt;tests/regression/apparmor&lt;/code&gt;, you will see a number of files, as shown below.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1xpjzigrmbmzhsqk19x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1xpjzigrmbmzhsqk19x.png" alt="ls-apparmor" width="800" height="674"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now run the &lt;code&gt;make tests&lt;/code&gt; command to execute the testsuite sequentially. (The dependencies required to run it have already been installed.)&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzy5ww8xy98z31kfx1zj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzy5ww8xy98z31kfx1zj.png" alt="run-testsuite" width="800" height="674"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since it is an automated framework, it runs on its own and prints a report like the one below, from which the user can check which subtests PASS and which FAIL.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhksfbpzz8lesslc7ooxo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhksfbpzz8lesslc7ooxo.png" alt="report" width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

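&lt;p&gt;In the end, by running the testsuite we judge whether a kernel with the advanced-research patches applied has any problem performing AppArmor functions, and decide whether to apply it to mass production. Of course, there are still issues that it cannot uncover, but the testsuite serves as a first step toward verifying that the feature works stably.&lt;/p&gt;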

</description>
      <category>apparmor</category>
      <category>testsuite</category>
      <category>kernel</category>
      <category>security</category>
    </item>
  </channel>
</rss>
