<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jayash Tripathy</title>
    <description>The latest articles on DEV Community by Jayash Tripathy (@jayash_tripathy_921c22d37).</description>
    <link>https://dev.to/jayash_tripathy_921c22d37</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1669189%2Feb896d3a-268e-4e98-ba50-2b4acc2681b2.png</url>
      <title>DEV Community: Jayash Tripathy</title>
      <link>https://dev.to/jayash_tripathy_921c22d37</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jayash_tripathy_921c22d37"/>
    <language>en</language>
    <item>
      <title>Distributed Media Inferencing with Kafka</title>
      <dc:creator>Jayash Tripathy</dc:creator>
      <pubDate>Thu, 06 Nov 2025 20:02:15 +0000</pubDate>
      <link>https://dev.to/jayash_tripathy_921c22d37/distributed-media-inferencing-with-kafka-48jg</link>
      <guid>https://dev.to/jayash_tripathy_921c22d37/distributed-media-inferencing-with-kafka-48jg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As Vision AI gains popularity, more organizations from government to the private sector are adopting it to solve real-world challenges: tracking assets, counting people, monitoring vehicles, and more. However, one of the biggest challenges we face while building these systems is Scaling (&lt;code&gt;Processing massive volumes of data in real-time while working within constrained computational resources&lt;/code&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before diving in, you should be familiar with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka basics:&lt;/strong&gt; Topics, partitions, producers, and consumers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python:&lt;/strong&gt; Threading and async programming concepts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video processing:&lt;/strong&gt; Basic understanding of frames and inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker:&lt;/strong&gt; Running containerized services&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Problem: Bottlenecks in Traditional Approaches
&lt;/h3&gt;

&lt;p&gt;Traditional media processing pipelines often rely on synchronous, monolithic architectures that are difficult to scale. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resource Contention&lt;/strong&gt;: CPU/GPU resources are tied up during both extraction and inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Fault Tolerance&lt;/strong&gt;: If one process fails, the entire pipeline stops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inflexible Scaling&lt;/strong&gt;: Can't independently scale frame extraction vs. inference workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enter Kafka: Decoupling and Scaling
&lt;/h3&gt;

&lt;p&gt;Kafka addresses these challenges by introducing an event-driven, decoupled architecture. Here's how:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous Decoupling&lt;/strong&gt;
Frame extraction and inference can run independently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal Scaling&lt;/strong&gt;
Easily scale extraction and inference workloads independently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fault Tolerance&lt;/strong&gt;
If one process fails, the other can continue running.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent Scaling&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;High video input? Add more producers and extraction workers.&lt;/li&gt;
&lt;li&gt;Slow inference? Add more inference workers.&lt;/li&gt;
&lt;li&gt;Need real-time results? Scale inference workers; extraction continues unaffected.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Enough talk - let's roll up our sleeves and dive into the Magic. 🎬
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Clone the repository
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/JayashTripathy/Distributed-Media-Inferencing-With-Kafka.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Distributed-Media-Inferencing-With-Kafka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Start Docker Compose
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Install dependencies
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;In this blog I will not overwhelm you with big and cumbersome &lt;br&gt;
codeblocks. I will focus on the core concepts and how to use them in a practical way. And anyways you can always checkout the code in the &lt;a href="https://github.com/JayashTripathy/Distributed-Media-Inferencing-With-Kafka" rel="noopener noreferrer"&gt;repository&lt;/a&gt;.&lt;/p&gt;


&lt;h3&gt;
  
  
  Architecture Overview
&lt;/h3&gt;

&lt;p&gt;Here's how everything fits together:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51kkcufevlgu1b2lbfio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51kkcufevlgu1b2lbfio.png" alt="Architecture Diagram" width="800" height="1130"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  The Flow:
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Videos are stored in &lt;code&gt;media/videos_to_process/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Producers extract frames and send metadata to Kafka (3 partitions)&lt;/li&gt;
&lt;li&gt;Kafka distributes frame metadata across partitions&lt;/li&gt;
&lt;li&gt;Consumers (3 threads) pull metadata, load frames, and run inference&lt;/li&gt;
&lt;li&gt;Annotated frames are saved to &lt;code&gt;media/annotated_frames/&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;So, where it all begin? &lt;strong&gt;Media streams&lt;/strong&gt; - the life blood of our pipeline, whether it's a video file, a live camera feed, or streaming content, we need a way to ingest it. For this example we'll work with video files stored in the &lt;code&gt;media/videos_to_process&lt;/code&gt; directory. You can add as many videos as you want to this directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;media/videos_to_process/
├── video1.mp4 &lt;span class="c"&gt;# Video 1&lt;/span&gt;
├── video2.mp4 &lt;span class="c"&gt;# Video 2&lt;/span&gt;
├── video3.mp4 &lt;span class="c"&gt;# Video 3&lt;/span&gt;
└── ... &lt;span class="c"&gt;# More videos&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Act 1: Building the Highway 🛣️ - Setup Kafka topics
&lt;/h3&gt;

&lt;p&gt;Before we start moving frames, we build the highway they will travel on.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python cli.py setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Behind this command, we're creating a Kafka topic with 3 partitions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;admin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AdminClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bootstrap.servers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;kafka_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bootstrap_servers&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="c1"&gt;# Partitions: 3 (enables parallel processing)
&lt;/span&gt;    &lt;span class="c1"&gt;# Replication: 1 (for local dev; production would use 3+)
&lt;/span&gt;    &lt;span class="n"&gt;admin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_topics&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;NewTopic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafka_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;frame_topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Understanding Partitions:&lt;/strong&gt; Think of a Kafka topic as a highway with multiple lanes. Each lane (partition) is an ordered, immutable sequence of messages. Multiple vehicles (consumers) can travel side-by-side, each in their own lane, without blocking each other.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frame_topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="n"&gt;Partition&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;frame1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="n"&gt;Partition&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;frame2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="err"&gt;└──&lt;/span&gt; &lt;span class="n"&gt;Partition&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;frame3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;More lanes 🛣️ = More vehicles 🚗 = More throughput 🚀&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;we start with 3, but you can scale up as your traffic grows.&lt;/p&gt;

&lt;p&gt;This topic is where frame metadata will flow, ready to be distributed to your inference workers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Act 2: The Frame Factory 🔨 - Producing Frames
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The Journey Begins
&lt;/h4&gt;

&lt;p&gt;For each video, the producer performs the following steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;videos_to_process&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;open_video&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
         &lt;span class="c1"&gt;# Step 1: Extract frame
&lt;/span&gt;         &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

         &lt;span class="c1"&gt;# Step 2: Store heavy frame locally (not in Kafka!)
&lt;/span&gt;         &lt;span class="n"&gt;frame_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;media/raw_frames/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;frame_no&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
         &lt;span class="nf"&gt;save_to_disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

         &lt;span class="c1"&gt;# Step 3: Send lightweight metadata to Kafka
&lt;/span&gt;         &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
               &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;video_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frame_no&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;frame_no&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;
         &lt;span class="n"&gt;kafka_producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frame_topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

         &lt;span class="c1"&gt;# Frame continues journey on Kafka highway...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The producer does three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extracts frames&lt;/strong&gt; at a configurable interval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stores raw frames locally&lt;/strong&gt; in the &lt;code&gt;media/raw_frames/&lt;/code&gt; directory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Publishes lightweight metadata&lt;/strong&gt; to Kafka — just a JSON message containing the video name and frame number.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code&gt;Note: In a production environment, you would typically use a more robust storage solution, such as S3 or a database.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here's the clever part: &lt;strong&gt;&lt;em&gt;we don't send the entire frame, just the metadata. This keeps the payload small and fast.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store frames on disk (they're heavy—JPEG images can be 100KB+ each)&lt;/li&gt;
&lt;li&gt;Send metadata through Kafka (lightweight—just a few bytes of JSON)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Kafka is optimized for small messages. Sending 100KB images would slow everything down and fill up your Kafka brokers quickly. By storing frames separately and only sending metadata, we keep Kafka fast and efficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now, let's launch a multi-threaded producers army&lt;/strong&gt; that extracts frames from the videos and sends them to Kafka using the following command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python cli.py produce
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Internally, the producer uses a ThreadPoolExecutor to process multiple videos simultaneously. This means if you have 5 videos, up to 10 threads can work on extracting and producing frames concurrently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;media_paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Each video gets its own worker thread
&lt;/span&gt;        &lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;produce_media_frames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;media_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;media_path&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;media_paths&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Act 3: The Inference Squad 🤖 - Consuming and Processing Frames
&lt;/h3&gt;

&lt;p&gt;Frames are queued in Kafka. Time for the inference workers to process them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python cli.py consume
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This launches a multi-threaded consumer army&lt;/strong&gt; that pulls messages from Kafka and runs YOLO11 inference.&lt;/p&gt;

&lt;p&gt;This is what is happening inside of the consumer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;consumer_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="n"&gt;consumer_thread&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Step 1: Pull message from Kafka highway
&lt;/span&gt;        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kafka_consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 2: Load actual frame from disk
&lt;/span&gt;        &lt;span class="n"&gt;frame_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;media/raw_frames/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;video_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;frame_no&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;frame_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_from_disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 3: Run YOLO11 inference
&lt;/span&gt;        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;YOLO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yolo11n.pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 4: Draw bounding boxes &amp;amp; annotations
&lt;/span&gt;        &lt;span class="n"&gt;annotated_frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;draw_detections&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 5: Save annotated frame
&lt;/span&gt;        &lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;media/annotated_frames/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;frame_no&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="nf"&gt;save_to_disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;annotated_frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 6: Commit to Kafka - "I'm done, next frame!"
&lt;/span&gt;        &lt;span class="n"&gt;kafka_consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Magic of Parallel Processing: Remember those 3 partitions we created in Act 1? Here's how they enable true parallelism:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Partition&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Consumer&lt;/span&gt; &lt;span class="n"&gt;Thread&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt; &lt;span class="n"&gt;Frames&lt;/span&gt;
&lt;span class="n"&gt;Partition&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Consumer&lt;/span&gt; &lt;span class="n"&gt;Thread&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt; &lt;span class="n"&gt;Frames&lt;/span&gt;  
&lt;span class="n"&gt;Partition&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Consumer&lt;/span&gt; &lt;span class="n"&gt;Thread&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt; &lt;span class="n"&gt;Frames&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Each thread processes frames from its assigned partition independently. This means:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;No waiting: Thread 1 doesn't block Thread 2 or Thread 3&lt;/li&gt;
&lt;li&gt;Full resource utilization: All CPU/GPU cores can be used simultaneously&lt;/li&gt;
&lt;li&gt;Linear scaling: Add more partitions + threads = more throughput&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Power of Consumer Groups
&lt;/h3&gt;

&lt;p&gt;Kafka's consumer groups automatically distribute work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Message distribution&lt;/strong&gt;: Each consumer gets a fair share of the messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing&lt;/strong&gt;: Even if some consumers are slower, the others pick up the slack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fault tolerance&lt;/strong&gt;: If a consumer fails, the group automatically rebalances&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;After inference, annotated frames are saved to &lt;code&gt;media/annotated_frames/&lt;/code&gt;, organized by video name. Each frame shows detected objects and their bounding boxes.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Journey Complete:
&lt;/h4&gt;

&lt;p&gt;Video → Frame Extraction → Kafka Topic → Multi-threaded Consumers → YOLO Inference → Annotated Frames&lt;/p&gt;

&lt;p&gt;The beauty of this architecture is everything runs asynchronously. Producers keep feeding Kafka, Kafka keeps distributing work, and consumers keep processing - all independently, all in parallel.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>dataengineering</category>
      <category>architecture</category>
      <category>python</category>
    </item>
  </channel>
</rss>
