<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Raja Rathour</title>
    <description>The latest articles on DEV Community by Raja Rathour (@raja_rathour_27afcb168fe0).</description>
    <link>https://dev.to/raja_rathour_27afcb168fe0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3677943%2Fc27a7f1f-86c7-4ba8-a9a9-07d67efe22b6.png</url>
      <title>DEV Community: Raja Rathour</title>
      <link>https://dev.to/raja_rathour_27afcb168fe0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/raja_rathour_27afcb168fe0"/>
    <language>en</language>
    <item>
      <title>[Boost]</title>
      <dc:creator>Raja Rathour</dc:creator>
      <pubDate>Thu, 25 Dec 2025 08:28:49 +0000</pubDate>
      <link>https://dev.to/raja_rathour_27afcb168fe0/-35f9</link>
      <guid>https://dev.to/raja_rathour_27afcb168fe0/-35f9</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/raja_rathour_27afcb168fe0" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3677943%2Fc27a7f1f-86c7-4ba8-a9a9-07d67efe22b6.png" alt="raja_rathour_27afcb168fe0"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/raja_rathour_27afcb168fe0/how-i-optimized-ffmpeg-filters-with-slice-threading-my-first-contribution-41nc" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;How I Optimized FFmpeg Filters with Slice Threading (My First Contribution)&lt;/h2&gt;
      &lt;h3&gt;Raja Rathour ・ Dec 25&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#c&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#opensource&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ffmpeg&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#performance&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>c</category>
      <category>opensource</category>
      <category>ffmpeg</category>
      <category>performance</category>
    </item>
    <item>
      <title>How I Optimized FFmpeg Filters with Slice Threading (My First Contribution)</title>
      <dc:creator>Raja Rathour</dc:creator>
      <pubDate>Thu, 25 Dec 2025 08:26:06 +0000</pubDate>
      <link>https://dev.to/raja_rathour_27afcb168fe0/how-i-optimized-ffmpeg-filters-with-slice-threading-my-first-contribution-41nc</link>
      <guid>https://dev.to/raja_rathour_27afcb168fe0/how-i-optimized-ffmpeg-filters-with-slice-threading-my-first-contribution-41nc</guid>
      <description>&lt;p&gt;Contributing to &lt;strong&gt;FFmpeg&lt;/strong&gt; has always been a "final boss" level goal for me. As a Mathematics and Computing Engineering student aiming for Google Summer of Code (GSoC), I knew I needed to do more than just fix a typo. I wanted to touch the core performance of the software.&lt;/p&gt;

&lt;p&gt;My mission? &lt;strong&gt;Slice Threading.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;FFmpeg has hundreds of video filters, but many of them still run on a single CPU core. My task was to take &lt;code&gt;vf_alphamerge&lt;/code&gt; (which merges an alpha channel into a video) and &lt;code&gt;vf_blackframe&lt;/code&gt; (which detects black frames), and modernize them to run in parallel across all available cores.&lt;/p&gt;

&lt;p&gt;Here is the story of how I modified the code, broke the Windows build, and learned to love C90 strictness.&lt;/p&gt;

&lt;h2&gt;The Challenge: Why was it slow?&lt;/h2&gt;

&lt;p&gt;Video processing is computationally expensive. In the original code for &lt;code&gt;vf_alphamerge&lt;/code&gt;, the pixel processing happened in a simple, nested &lt;code&gt;for&lt;/code&gt; loop. This meant that even though my laptop has 16 logical cores, the filter was only using &lt;strong&gt;one&lt;/strong&gt; of them. The other 15 sat idle.&lt;/p&gt;

&lt;p&gt;Here is a simplified look at the "Before" code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The old, single-threaded way&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// ... heavy pixel copying logic ...&lt;/span&gt;
        &lt;span class="c1"&gt;// This runs sequentially, row by row.&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;The Solution: Slice Threading&lt;/h2&gt;

&lt;p&gt;To fix this, I needed to implement &lt;strong&gt;Slice Threading&lt;/strong&gt;. This technique splits the video frame into horizontal "slices" (strips) and hands each slice to a different thread to process simultaneously.&lt;/p&gt;

&lt;p&gt;I replaced the standard loop with a call to FFmpeg's slice-threading helper, &lt;code&gt;ff_filter_execute&lt;/code&gt;, which invokes a callback once per job. I moved the processing logic into a new function, &lt;code&gt;alphamerge_slice&lt;/code&gt;, and created a &lt;code&gt;ThreadData&lt;/code&gt; struct to pass the frame pointers each job needs.&lt;/p&gt;
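
&lt;p&gt;To make the slicing concrete, here is a minimal, self-contained sketch of how a slice callback can pick its rows from &lt;code&gt;jobnr&lt;/code&gt; and &lt;code&gt;nb_jobs&lt;/code&gt;. It is an illustration under assumptions, not the code from my patch: the &lt;code&gt;ThreadData&lt;/code&gt; fields are hypothetical, and &lt;code&gt;run_jobs&lt;/code&gt; is a sequential stand-in for the executor, which in FFmpeg hands each job to a worker thread.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Hypothetical per-frame data handed to every job (not FFmpeg's struct). */
typedef struct ThreadData {
    const uint8_t *alpha; /* source alpha plane */
    uint8_t *dst;         /* destination alpha plane */
    int width, height;
    int linesize;         /* bytes per row in both planes */
} ThreadData;

/* Processes only the rows that belong to job `jobnr` out of `nb_jobs`. */
static int alphamerge_slice(void *ctx, void *arg, int jobnr, int nb_jobs)
{
    ThreadData *td = arg;
    int slice_start = (td-&amp;gt;height * jobnr) / nb_jobs;
    int slice_end   = (td-&amp;gt;height * (jobnr + 1)) / nb_jobs;
    int y;

    (void)ctx; /* unused in this sketch */
    for (y = slice_start; y &amp;lt; slice_end; y++)
        memcpy(td-&amp;gt;dst + y * td-&amp;gt;linesize,
               td-&amp;gt;alpha + y * td-&amp;gt;linesize, td-&amp;gt;width);
    return 0;
}

/* Sequential stand-in for the executor; a real one runs jobs on threads. */
static void run_jobs(int (*func)(void *, void *, int, int),
                     void *arg, int nb_jobs)
{
    int i;

    for (i = 0; i &amp;lt; nb_jobs; i++)
        func(NULL, arg, i, nb_jobs);
}

int main(void)
{
    uint8_t alpha[6 * 4], dst[6 * 4];
    ThreadData td;
    int i;

    for (i = 0; i &amp;lt; 24; i++)
        alpha[i] = (uint8_t)i;
    memset(dst, 0, sizeof(dst));
    td.alpha = alpha;
    td.dst = dst;
    td.width = 4;
    td.height = 6;
    td.linesize = 4;

    run_jobs(alphamerge_slice, &amp;amp;td, 4);
    printf("%s\n", memcmp(alpha, dst, sizeof(dst)) == 0 ? "all rows copied" : "mismatch");
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With 6 rows and 4 jobs, the slices come out as row 0, rows 1 to 2, row 3, and rows 4 to 5, so every row is covered exactly once even when the height does not divide evenly.&lt;/p&gt;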

&lt;p&gt;It felt great. The code compiled on my Linux machine, the logic seemed sound, and I confidently sent my patch to the &lt;code&gt;ffmpeg-devel&lt;/code&gt; mailing list using &lt;code&gt;git send-email&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;The "Red 1" Error: A Lesson in C90&lt;/h2&gt;

&lt;p&gt;Then, I saw it. The dreaded &lt;strong&gt;Red "Fail"&lt;/strong&gt; on the Patchwork dashboard.&lt;/p&gt;

&lt;p&gt;My patch passed on Linux and macOS, but the &lt;strong&gt;Windows (MSVC)&lt;/strong&gt; build bot failed with a compilation error. I was confused. &lt;em&gt;It compiles fine for me! What is wrong?&lt;/em&gt;&lt;/p&gt;

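&lt;p&gt;The issue wasn't my logic; it was the C dialect I was writing in. FFmpeg is written in modern C, but its coding rules have long kept the &lt;strong&gt;C90 (ISO C)&lt;/strong&gt; convention of declaring variables before any statement in a block, because the code needs to build on everything from supercomputers to ancient embedded devices. The Microsoft Visual C++ compiler (MSVC) has historically been the least forgiving about this, since older versions only accepted C90-style declarations in C code.&lt;/p&gt;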

&lt;p&gt;&lt;strong&gt;The Mistake:&lt;/strong&gt; I had declared variables partway through loop bodies and &lt;code&gt;if&lt;/code&gt; blocks, after other statements. That is standard practice in C99 and modern C++, but illegal in C90.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ C99 Style (What I wrote initially)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;is_packed_rgb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Error! Declaration after statement&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I had to refactor the entire function, moving every variable declaration to the top of the function, ahead of any statement.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ C90 Style (What FFmpeg demands)&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;is_packed_rgb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It felt tedious, but it taught me a valuable lesson about writing portable, low-level code.&lt;/p&gt;
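
&lt;p&gt;One way to catch this class of error locally, assuming GCC is installed, is to compile a small file with &lt;code&gt;-std=c90 -pedantic-errors -Wdeclaration-after-statement&lt;/code&gt;. The sketch below is a hedged illustration rather than FFmpeg code (&lt;code&gt;count_dark&lt;/code&gt; is a hypothetical helper). It shows the placement those flags accept, including the detail that C90 only requires a declaration to come first in its own block, not first in the function.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;/* Build with: gcc -std=c90 -pedantic-errors -Wdeclaration-after-statement demo.c */
#include &amp;lt;stdio.h&amp;gt;

/* Counts pixels at or below `threshold` in one row (hypothetical helper). */
static int count_dark(const unsigned char *row, int width, int threshold)
{
    int x;         /* declarations come first in the block */
    int count = 0;

    for (x = 0; x &amp;lt; width; x++) {
        if (row[x] &amp;lt;= threshold) {
            int weight = 1; /* still valid C90: first item in a new block */
            count += weight;
        }
    }
    return count;
}

int main(void)
{
    unsigned char row[6] = { 0, 10, 200, 31, 32, 33 };

    printf("%d\n", count_dark(row, 6, 32)); /* prints 4 */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Moving &lt;code&gt;int weight = 1;&lt;/code&gt; below &lt;code&gt;count += ...&lt;/code&gt; in the same block is the kind of change those flags reject.&lt;/p&gt;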

&lt;h2&gt;Verification: Debugging with GDB&lt;/h2&gt;

&lt;p&gt;Fixing the compiler error was one thing, but how could I &lt;em&gt;prove&lt;/em&gt; it was actually multithreaded?&lt;/p&gt;

&lt;p&gt;I fired up &lt;strong&gt;GDB (GNU Debugger)&lt;/strong&gt;. This led to my favorite moment of the project. I set a breakpoint on my new slice function and inspected its &lt;code&gt;nb_jobs&lt;/code&gt; (number of jobs) argument.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(gdb) break alphamerge_slice
(gdb) run
...
[New Thread 0x7fffb2ff56c0 (LWP 25133)]
Thread 42 hit Breakpoint 1, alphamerge_slice (jobnr=2, nb_jobs=16)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;nb_jobs = 16&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Seeing that number confirmed that FFmpeg was splitting each frame into 16 jobs, matching my 16 logical cores, and running them on worker threads. I also learned a cool GDB trick: &lt;code&gt;set scheduler-locking on&lt;/code&gt;, which lets only the current thread run while you step. Without it, the other threads kept running and hitting the breakpoint, so my debugger kept jumping between them!&lt;/p&gt;
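
&lt;p&gt;To practice these GDB commands outside FFmpeg, a small pthread program is enough. The sketch below is an assumption-laden toy, not FFmpeg's thread pool: &lt;code&gt;Job&lt;/code&gt;, &lt;code&gt;slice_worker&lt;/code&gt;, and &lt;code&gt;run_slices&lt;/code&gt; are hypothetical names, and it starts one thread per job. After building with &lt;code&gt;-g&lt;/code&gt;, try &lt;code&gt;break slice_worker&lt;/code&gt;, &lt;code&gt;run&lt;/code&gt;, &lt;code&gt;info threads&lt;/code&gt;, and then &lt;code&gt;set scheduler-locking on&lt;/code&gt; before stepping with &lt;code&gt;next&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;/* Build with: gcc -g -O0 -pthread slices.c -o slices */
#include &amp;lt;pthread.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

#define NB_JOBS 4
#define HEIGHT  10

/* Hypothetical per-job record; not an FFmpeg structure. */
typedef struct Job {
    int jobnr;
    int nb_jobs;
    int rows_done;
} Job;

/* Thread entry point: a convenient breakpoint target in GDB. */
static void *slice_worker(void *arg)
{
    Job *job = arg;
    int start = (HEIGHT * job-&amp;gt;jobnr) / job-&amp;gt;nb_jobs;
    int end   = (HEIGHT * (job-&amp;gt;jobnr + 1)) / job-&amp;gt;nb_jobs;
    int y;

    for (y = start; y &amp;lt; end; y++)
        job-&amp;gt;rows_done++; /* stands in for per-row pixel work */
    return NULL;
}

/* Runs one thread per job; returns rows processed, or -1 on failure. */
static int run_slices(void)
{
    pthread_t threads[NB_JOBS];
    Job jobs[NB_JOBS];
    int i, created = 0, total = 0;

    for (i = 0; i &amp;lt; NB_JOBS; i++) {
        jobs[i].jobnr = i;
        jobs[i].nb_jobs = NB_JOBS;
        jobs[i].rows_done = 0;
        if (pthread_create(&amp;amp;threads[i], NULL, slice_worker, &amp;amp;jobs[i]) != 0)
            break;
        created++;
    }
    /* Join only the threads that were actually started. */
    for (i = 0; i &amp;lt; created; i++) {
        pthread_join(threads[i], NULL);
        total += jobs[i].rows_done;
    }
    return created == NB_JOBS ? total : -1;
}

int main(void)
{
    printf("rows processed: %d\n", run_slices());
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each thread writes only to its own &lt;code&gt;Job&lt;/code&gt;, so the program needs no locking, and the total is read only after every thread has been joined.&lt;/p&gt;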

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This contribution taught me that Open Source isn't just about writing algorithms. It's about:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Respecting Standards:&lt;/strong&gt; Legacy support matters (C90 vs C99).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tooling:&lt;/strong&gt; Mastering &lt;code&gt;git send-email&lt;/code&gt; and &lt;code&gt;GDB&lt;/code&gt; is a superpower.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence:&lt;/strong&gt; A "Fail" on the dashboard isn't a rejection; it's just a to-do list item.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;My patches for &lt;code&gt;vf_alphamerge&lt;/code&gt; and &lt;code&gt;vf_blackframe&lt;/code&gt; are now under review. If you are a student scared to contribute to open source, just dive in. The errors are scary, but solvable.&lt;/p&gt;

&lt;h3&gt;Links&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;My Patch Submissions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://patchwork.ffmpeg.org/project/ffmpeg/patch/20251223164441.123475-1-imraja729@gmail.com/" rel="noopener noreferrer"&gt;View on FFmpeg Patchwork for avfilter/vf_blackframe&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://patchwork.ffmpeg.org/project/ffmpeg/patch/20251223164219.122112-1-imraja729@gmail.com/" rel="noopener noreferrer"&gt;View on FFmpeg Patchwork for avfilter/vf_alphamerge&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tools Used:&lt;/strong&gt; GDB, Git, MSVC Compiler&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53p41t6tm80bsls2i8fh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53p41t6tm80bsls2i8fh.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>c</category>
      <category>opensource</category>
      <category>ffmpeg</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
