<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ByteDance Open Source</title>
    <description>The latest articles on DEV Community by ByteDance Open Source (@bytedanceoss).</description>
    <link>https://dev.to/bytedanceoss</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F3590%2F2f2dc0f9-07d3-4c26-9aef-ea70400f4147.jpeg</url>
      <title>DEV Community: ByteDance Open Source</title>
      <link>https://dev.to/bytedanceoss</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bytedanceoss"/>
    <language>en</language>
    <item>
      <title>Inside AIO Sandbox (Part 1): Files &amp; Shell — The Foundations of Agent Execution</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Tue, 31 Mar 2026 07:36:56 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/inside-aio-sandbox-part-1-files-shell-the-foundations-of-agent-execution-4pe5</link>
      <guid>https://dev.to/bytedanceoss/inside-aio-sandbox-part-1-files-shell-the-foundations-of-agent-execution-4pe5</guid>
      <description>&lt;p&gt;&lt;em&gt;by AIO Sandbox Team&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Modern AI agents no longer just generate text: they are expected to &lt;strong&gt;write files, modify code, and execute commands&lt;/strong&gt;.&lt;br&gt;
But doing this directly on your local machine or production systems is risky and hard to control.&lt;br&gt;
This is where &lt;strong&gt;&lt;a href="https://github.com/agent-infra/sandbox" rel="noopener noreferrer"&gt;AIO Sandbox&lt;/a&gt;&lt;/strong&gt; comes in. It provides an &lt;strong&gt;isolated, programmable environment&lt;/strong&gt; where agents can safely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;create and manipulate files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;run shell commands&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;execute code&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;produce artifacts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;and many more...&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike a typical Docker container, which often requires manual configuration to chain tools together, AIO Sandbox integrates a browser, a shell, and a file system into a single environment designed for AI agents. This unified architecture keeps artifacts persistent and accessible across every stage of an AI-driven workflow.&lt;br&gt;
In this first post, we’ll focus on the two most fundamental capabilities:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🧩 &lt;strong&gt;Filesystem (state)&lt;/strong&gt; &lt;br&gt;
⚙️ &lt;strong&gt;Shell (execution)&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By the end, you’ll see how these combine into a complete runtime for agents.&lt;/p&gt;

&lt;h1&gt;
  
  
  🌐 Multi-language SDK Support
&lt;/h1&gt;

&lt;p&gt;While this tutorial uses Python, &lt;strong&gt;AIO Sandbox is not limited to Python developers&lt;/strong&gt;.&lt;br&gt;
The &lt;strong&gt;agent-sandbox SDK&lt;/strong&gt; also supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;TypeScript / JavaScript&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Go (Golang)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 This makes it easy to integrate AIO Sandbox into a wide range of agent frameworks, backend services, and developer stacks.&lt;/p&gt;

&lt;h1&gt;
  
  
  🛠️ Prerequisites
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Python 3.12+&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A running AIO Sandbox instance at &lt;code&gt;http://localhost:8080&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Docker command&lt;/strong&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;docker run --security-opt seccomp=unconfined --rm -it -p 8080:8080 ghcr.io/agent-infra/sandbox:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;Python SDK installed&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;pip install agent-sandbox
&lt;/code&gt;&lt;/pre&gt;

&lt;h1&gt;
  
  
  🧠 Mental Model
&lt;/h1&gt;

&lt;p&gt;Think of AIO Sandbox as a &lt;strong&gt;remote, disposable Linux machine&lt;/strong&gt; that your agent controls via APIs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Filesystem → where data and artifacts live&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shell → how actions are executed&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Simple Flow:&lt;br&gt;
Agent → API → Sandbox → Filesystem + Shell&lt;/p&gt;
&lt;/blockquote&gt;
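&lt;p&gt;To make this mental model concrete, here is a purely local analogy (not the AIO SDK): Python's &lt;code&gt;pathlib&lt;/code&gt; stands in for the sandbox filesystem and &lt;code&gt;subprocess&lt;/code&gt; for the sandbox shell.&lt;/p&gt;

```python
# Local stand-in for the sandbox mental model (not the AIO SDK):
# pathlib plays the "filesystem" role, subprocess the "shell" role.
import pathlib
import subprocess
import tempfile

workspace = pathlib.Path(tempfile.mkdtemp())

# Filesystem: where data and artifacts live
(workspace / "data.txt").write_text("hello agent\n")

# Shell: how actions are executed
result = subprocess.run(
    ["cat", "data.txt"], cwd=workspace, capture_output=True, text=True
)
print(result.stdout)      # the observation the agent reads back
print(result.returncode)  # 0 on success
```

&lt;p&gt;AIO Sandbox provides the same two roles behind HTTP APIs, so the agent's state and actions live on an isolated remote machine instead of your own.&lt;/p&gt;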

&lt;h1&gt;
  
  
  Autonomous Data Processing &amp;amp; Validation Agent
&lt;/h1&gt;

&lt;p&gt;Rather than presenting the APIs for these services in isolation, we will walk through a comprehensive end-to-end agent workflow running inside the sandbox.&lt;br&gt;
This example simulates an agent that does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create some data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read it back (inspect the contents)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write a script file (process.py)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;List files (see what exists)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the script&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read the output file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check if output looks correct&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Notice bad data and fix it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the script again&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read updated output&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find files created&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download final result&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What this use case demonstrates&lt;/strong&gt;&lt;br&gt;
A realistic agent loop enabled by the AIO Sandbox File &amp;amp; Shell primitives:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Read → Execute → Read → Validate → Fix → Re-run → Export&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This workflow makes each AIO Sandbox File &amp;amp; Shell primitive feel purposeful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;File primitives&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;write_file&lt;/code&gt; creates data and code&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;read_file&lt;/code&gt; lets the agent inspect inputs and outputs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;list_path&lt;/code&gt; gives workspace awareness&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;replace_in_file&lt;/code&gt; lets the agent repair bad input&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;search_in_file&lt;/code&gt; validates expected output&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;find_files&lt;/code&gt; discovers generated artifacts&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;download_file&lt;/code&gt; exports results out of the sandbox&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Shell primitive&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;exec_command&lt;/code&gt; runs the actual processing job
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
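&lt;p&gt;Before the full SDK script, here is the loop shape in isolation. This is a sketch, not the SDK: &lt;code&gt;run_job&lt;/code&gt;, &lt;code&gt;is_valid&lt;/code&gt;, and &lt;code&gt;fix_input&lt;/code&gt; are hypothetical stand-ins for the sandbox &lt;code&gt;exec_command&lt;/code&gt;, &lt;code&gt;search_in_file&lt;/code&gt;, and &lt;code&gt;replace_in_file&lt;/code&gt; calls.&lt;/p&gt;

```python
# A sketch of the agent loop shape; run_job / is_valid / fix_input are
# hypothetical stand-ins for the sandbox exec_command / search_in_file /
# replace_in_file calls.
def agent_loop(data, run_job, is_valid, fix_input, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        output = run_job(data)          # Execute
        if is_valid(output):            # Validate
            return output, attempt      # Export
        data = fix_input(data)          # Fix, then re-run
    raise RuntimeError("still invalid after retries")

# Demo with the article's data: parse lines, skipping non-numeric ones.
raw = "10\n20\nINVALID\n40\n50\n"
run_job = lambda d: [int(x) for x in d.split() if x.isdigit()]
is_valid = lambda nums: len(nums) == 5       # expect five valid rows
fix_input = lambda d: d.replace("INVALID", "30")

nums, attempts = agent_loop(raw, run_job, is_valid, fix_input)
print(sum(nums), attempts)  # 150 2
```

&lt;p&gt;The real workflow below follows exactly this shape, with the sandbox providing the execution and validation primitives.&lt;/p&gt;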

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_sandbox&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Sandbox&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Sandbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 1. Setup workspace
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;home_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_context&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;home_dir&lt;/span&gt;
&lt;span class="n"&gt;app_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;home_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/data_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;data_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/data.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;script_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/process.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;report_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/report.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sandbox home directory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;home_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;App directory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 2. Create raw input data (with an intentional error)
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;10
20
INVALID
40
50
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Created raw input data.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 3. Read and inspect input data
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;data_preview&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Raw input data:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_preview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 4. Write processing script
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;script_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;numbers = []

with open(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) as f:
    for line in f:
        try:
            numbers.append(int(line.strip()))
        except ValueError:
            print(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Skipping invalid line:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, line.strip())

total = sum(numbers)
avg = total / len(numbers)

report = f&lt;/span&gt;&lt;span class="se"&gt;\"\"\"&lt;/span&gt;&lt;span class="s"&gt;Report Summary
--------------
Valid Count: {len(numbers)}
Total: {total}
Average: {avg}
&lt;/span&gt;&lt;span class="se"&gt;\"\"\"&lt;/span&gt;&lt;span class="s"&gt;

with open(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) as f:
    f.write(report)

print(report)
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Created processing script.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 5. List workspace contents
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;workspace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;recursive&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Workspace contents:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# --------------------------------------------------
# 6. Execute the processing script
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec_command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cd &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &amp;amp;&amp;amp; python3 process.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;First execution output:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exit code:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 7. Read generated report
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;report_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Generated report:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 8. Validate report contents
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search_in_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;report_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;regex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Average: .*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Report validation result:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 9. Detect bad input and fix it
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;data_check&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INVALID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data_check&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Detected invalid data. Fixing input file...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace_in_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;old_str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INVALID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;new_str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Read input again after fix
&lt;/span&gt;&lt;span class="n"&gt;updated_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Updated input data:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;updated_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 10. Re-run the processing script
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec_command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cd &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &amp;amp;&amp;amp; python3 process.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Second execution output:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exit code:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 11. Read final report again
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;final_report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;report_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Final report:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 12. Find generated text artifacts
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;artifacts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Discovered artifacts:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 13. Download final report to local machine
# --------------------------------------------------
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_report.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;report_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Final report downloaded locally as final_report.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  🎯 Key Insight
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;AIO Sandbox gives agents a &lt;strong&gt;safe, programmable runtime&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Files → memory/state&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shell → actions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Together, they enable real-world workflows like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;code generation and execution&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;data processing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;automation pipelines&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tool orchestration&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
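&lt;p&gt;As a plain-Python illustration of that files-plus-shell loop (standard library only; this mirrors what an agent does inside the sandbox, not the AIO Sandbox SDK itself):&lt;/p&gt;

```python
import pathlib
import subprocess
import sys
import tempfile

# 1. Files -> memory/state: the agent persists generated code to disk.
workdir = pathlib.Path(tempfile.mkdtemp())
script = workdir / "task.py"
script.write_text("print(sum(range(10)))\n")

# 2. Shell -> actions: the agent executes what it wrote and reads the result.
result = subprocess.run(
    [sys.executable, str(script)], capture_output=True, text=True, check=True
)
print(result.stdout.strip())  # -> 45
```

&lt;p&gt;In the real workflow, the write goes through the sandbox's file API and the execution through its shell API, the same way the download snippet above retrieves the finished report.&lt;/p&gt;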

&lt;h1&gt;
  
  
  🚀 What’s Next
&lt;/h1&gt;

&lt;p&gt;Thanks for reading! Hope it was helpful! This is just the beginning. In upcoming posts, we’ll dive into topics such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;🌐 Browser automation (CDP-based)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🔌 MCP tool integration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;📓 Jupyter / notebook execution&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🤖 OpenClaw integration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🎯 Reinforcement learning inside sandbox&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  💬 Final Thoughts
&lt;/h1&gt;

&lt;p&gt;AIO Sandbox bridges the gap between “an LLM that generates text” and “an agent that can actually do things.”&lt;/p&gt;

&lt;p&gt;And it does so safely, reproducibly, and programmatically.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Introducing AIO Sandbox, All-in-One Sandbox Environment for AI Agents</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Fri, 27 Mar 2026 02:33:45 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/introducing-aio-sandbox-all-in-one-sandbox-environment-for-ai-agents-18k0</link>
      <guid>https://dev.to/bytedanceoss/introducing-aio-sandbox-all-in-one-sandbox-environment-for-ai-agents-18k0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rz25miqt9hute1a0rjg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rz25miqt9hute1a0rjg.png" alt="banner" width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unchecked AI autonomy is a ticking time bomb, and it's time to pull the plug on unfettered full-system access. We can no longer afford to hand AI agents the 'keys to the kingdom' without oversight: the 'wild west' of agents running with total system control is officially over.&lt;/p&gt;

&lt;p&gt;AIO Sandbox is an open-source project designed to solve these problems. It is everything your agent needs, out of the box. No more juggling multiple services. AIO Sandbox ships a complete, pre-wired environment in a single Docker container.&lt;/p&gt;

&lt;p&gt;The AIO (All-in-One) Sandbox is a containerized environment designed for both human developers and AI agents. Its architecture is built around a "Batteries-Included" philosophy, providing a full Linux desktop-like environment inside a single Docker container.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unified Environment:&lt;/strong&gt; One Docker container with shared filesystem. Files downloaded in the browser are instantly accessible in Terminal and VSCode.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Out of the Box:&lt;/strong&gt; Built‑in VNC browser, VS Code, Jupyter, file manager, and terminal—accessible directly via API/SDK.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent-Ready:&lt;/strong&gt; Pre-configured MCP server with Browser, File, Terminal, and Markdown tools, ready to use for AI agents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Developer Friendly:&lt;/strong&gt; Cloud-based VSCode with persistent terminals, intelligent port forwarding, and instant frontend/backend previews.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Secure Execution:&lt;/strong&gt; Isolated Python and Node.js sandboxes. Safe code execution without system risks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Production Ready:&lt;/strong&gt; Enterprise-grade Docker deployment. Lightweight, scalable.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Calling all AI agent developers!&lt;/strong&gt; How are you securing your builds? Try running your agent in AIO Sandbox and let's compare notes.&lt;/p&gt;

&lt;p&gt;AIO Sandbox is open-sourced under the Apache License 2.0. Contributions welcome.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/agent-infra/sandbox" rel="noopener noreferrer"&gt;https://github.com/agent-infra/sandbox&lt;/a&gt;&lt;br&gt;
Official website: &lt;a href="https://sandbox.agent-infra.com/" rel="noopener noreferrer"&gt;https://sandbox.agent-infra.com/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Kitex/Hertz Empowers LLMs: A Retrospective of Key Features on Its Third Anniversary</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Thu, 23 Jan 2025 08:24:37 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/kitexhertz-empowers-llms-a-retrospective-of-key-features-on-its-third-anniversary-3aip</link>
      <guid>https://dev.to/bytedanceoss/kitexhertz-empowers-llms-a-retrospective-of-key-features-on-its-third-anniversary-3aip</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;By Yang Rui from CloudWeGo Team&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/cloudwego" rel="noopener noreferrer"&gt;https://github.com/cloudwego&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It has been three years since CloudWeGo's open-source journey began. Adhering to the principle of &lt;strong&gt;Internal and External Consistency&lt;/strong&gt;, we have continuously iterated on our open-source repositories, releasing features that served ByteDance internally to the outside world. From 2023 to 2024, Kitex/Hertz focused on &lt;strong&gt;LLM back-end services&lt;/strong&gt;, &lt;strong&gt;user experience&lt;/strong&gt;, and &lt;strong&gt;performance&lt;/strong&gt;, aiding the rapid development of new business scenarios while continuously optimizing both. Meanwhile, Kitex/Hertz has been widely adopted by external enterprises and has attracted numerous external developers, steadily strengthening the CloudWeGo ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowzv71qkfxama8gquemg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowzv71qkfxama8gquemg.png" alt="keywords" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article summarizes the presentation "&lt;strong&gt;Kitex/Hertz Empowers LLMs: A Retrospective of Key Features on The Third Anniversary&lt;/strong&gt;". It introduces the significant features of Kitex/Hertz over the past year, aiming to assist enterprise users and community developers in better applying Kitex/Hertz to build their microservices systems in their projects.&lt;/p&gt;

&lt;h1&gt;
  
  
  Enhanced Streaming Capabilities to Support LLMs
&lt;/h1&gt;

&lt;p&gt;With the rapid development of LLMs and ByteDance's AI applications, &lt;strong&gt;streaming communication&lt;/strong&gt; has emerged as the primary communication mode for LLM application services. To better support business growth, we have optimized streaming communication in microservices in terms of stability, engineering practices, and performance over the past year.&lt;/p&gt;

&lt;h2&gt;
  
  
  Previous Streaming Capabilities of Kitex/Hertz
&lt;/h2&gt;

&lt;p&gt;Both Kitex and Hertz support streaming scenarios. Kitex supports gRPC, matching the official gRPC implementation in functionality while outperforming it. Hertz supports HTTP chunked transfer encoding and WebSocket. However, these capabilities were insufficient to support the rapid internal development of LLM applications at ByteDance, for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;More SSE Applications on Clients&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before multimodal models, LLM applications were mainly text-based dialogue scenarios, often using the SSE protocol to push server results to clients in real time. Text push scenarios are simple, requiring only a browser-friendly, straightforward protocol.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The Burden of Transitioning from Thrift to Protobuf&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although gRPC (Protobuf) is commonly used for streaming communication in RPC scenarios, and Kitex supports gRPC, ByteDance's server-side services primarily use Thrift IDLs. Developers are more familiar with Thrift, and few services use gRPC. As the demand for streaming grows, we need to reduce the cognitive burden of this transition, given those internal realities. Moreover, a proliferation of Protobuf-defined services would hinder unified IDL/interface management.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Lack of Engineering Practices&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compared to the one-send-one-receive PingPong model, streaming communication adds complexity in service governance and engineering practice. The industry has accumulated little engineering practice for streaming: streaming interfaces are easily misused, affecting service stability, and from an observability perspective there was no defined monitoring model for streams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming Capabilities – SSE/Thrift Streaming
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hertz SSE
&lt;/h3&gt;

&lt;p&gt;SSE (Server-Sent Events) is based on HTTP and supports unidirectional data push from server to client. Its advantages are simplicity, ease of use, and developer-friendliness, making it well suited for text transmission and sufficient for the basic communication needs of text-dialogue models. Compared to WebSocket, SSE is lighter: for text-based LLM applications, the server only needs to push data to the client, without the complexity of bidirectional communication. (In voice dialogue scenarios, WebSocket, which is also browser-friendly, is the better fit.) SSE can define different event types, letting the client process data according to event type; in LLM applications this distinguishes different kinds of response data (e.g., partial outputs, error messages, status updates).&lt;/p&gt;
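&lt;p&gt;On the wire, an SSE response is plain line-oriented text delivered with the text/event-stream content type; the optional event field is what lets a client dispatch on response kinds (the event names and payloads below are illustrative, not from any specific API):&lt;/p&gt;

```
event: partial
data: {"delta": "Hello, "}

event: partial
data: {"delta": "world"}

event: done
data: [DONE]
```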

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruydqzlq63w8efh2i335.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruydqzlq63w8efh2i335.png" alt="Hertz SSE" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, SSE is a poor fit for server-to-server communication: the server side has high computational and transmission performance requirements that an inefficient text protocol cannot meet; JSON is simple but unsuited to complex server-side interactions, where strongly typed RPC is preferred; and some cases require bidirectional streaming, which SSE cannot provide.&lt;/p&gt;

&lt;p&gt;Therefore, considering ByteDance's internal needs, we choose to support Thrift Streaming.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kitex Thrift Streaming
&lt;/h3&gt;

&lt;p&gt;Streaming communication is used not only in &lt;strong&gt;LLM&lt;/strong&gt; applications but also in other businesses. For example, &lt;strong&gt;Douyin Search&lt;/strong&gt; aims to improve performance by streaming RPC results: during the video packaging stage, it retrieves information for recalled video IDs, bundling the service calls (10 docs) into one request and returning the first completed package. In the &lt;strong&gt;Lark People&lt;/strong&gt; data export scenario, data is retrieved concurrently; if all of it is filled into an Excel sheet before returning, a large dataset can cause an OOM (Out of Memory) error and terminate the process. Enhancing streaming capabilities therefore supports both the rapid development of LLMs and the needs of other business scenarios.&lt;/p&gt;

&lt;p&gt;Although Kitex supports gRPC, we recommend Thrift internally. Supporting protocol diversity accommodates various needs, but a company is best served by a single established best practice: it minimizes the decision burden on developers, and the toolchain can offer deeper support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming Protocols&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Within ByteDance, traffic control for Streaming protocols mainly relies on Service Mesh. However, to quickly support implementation without relying on Service Mesh's support for new protocols, Kitex first supported Thrift Streaming based on gRPC (HTTP2). Since the official gRPC protocol specification supports extending content-type, the implementation is &lt;strong&gt;based on gRPC's RPC communication specification, changing Protobuf encoding to Thrift encoding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zxhb469s8blvt5okm90.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zxhb469s8blvt5okm90.png" alt="stream 1" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thrift over gRPC&lt;/strong&gt; began its Alpha at ByteDance in December 2023 and was officially released in Kitex &lt;strong&gt;v0.9.0&lt;/strong&gt; in March 2024. It is now widely used internally, with usage instructions available on the official website.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service Mesh Compatibility: Based on HTTP2 transmission, no separate support is required from Service Mesh.&lt;/li&gt;
&lt;li&gt;Low Support Cost: The decoding type is explicitly determined based on SubContentType (an extension supported by the gRPC protocol specification).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High Resource Consumption: Flow control and dynamic windows introduce additional overhead.&lt;/li&gt;
&lt;li&gt;Significant Latency Impact: Flow control can significantly degrade latency with heavier traffic or larger packets, requiring users to adjust WindowSize.&lt;/li&gt;
&lt;li&gt;Difficult Troubleshooting: Increased complexity also raises the difficulty of troubleshooting.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Thrift over gRPC can be quickly implemented. However, from the perspectives of performance and troubleshooting, we have developed a Streaming protocol (Streaming over TTHeader) to simplify streaming communication. It is currently under internal debugging and trials, with an expected release in November-December 2024. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to Define Streaming in Thrift&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Users familiar with Thrift know that native Apache Thrift does not support the definition of streaming interfaces. Adding new keywords would make other Thrift parsing tools, including IDE plugins, incompatible. Therefore, defining streaming types for Thrift's RPC methods through annotations ensures parsing compatibility:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;streaming.mode="bidirectional": Bidirectional Streaming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;streaming.mode="client": Client Streaming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;streaming.mode="server": Server Streaming&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
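&lt;p&gt;For illustration, an annotated Thrift method might look like the following (the service and method names are invented for this sketch; only the annotation values come from the list above):&lt;/p&gt;

```thrift
service ChatService {
    // Server streaming: one request in, a stream of responses back
    ChatResponse Generate(1: ChatRequest req) (streaming.mode="server"),

    // Bidirectional streaming: both sides send a stream of messages
    ChatResponse Chat(1: ChatRequest req) (streaming.mode="bidirectional"),
}
```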

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbaym2pgg0vqla8mt2e2q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbaym2pgg0vqla8mt2e2q.png" alt="streaming in Thrift" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both the currently supported Thrift Streaming over gRPC and the upcoming Thrift Streaming over TTHeader use this method to define streaming methods. The client-side will provide options to specify which Streaming protocol to use, while the server-side will support multiple protocols through protocol detection.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Generalized Streaming Invocation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If SSE is used for streaming communication on clients and Thrift Streaming is used on servers, how does the overall communication from clients to servers work?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmi5ysmos9ff8hpi3f1ha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmi5ysmos9ff8hpi3f1ha.png" alt="SSE-thrift streaming" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Taking the internal text dialogue model as an example, the traffic undergoes protocol conversion after passing through the API gateway, and the server uses the Server Streaming type to push data to the client.&lt;/p&gt;

&lt;p&gt;An important capability here is protocol conversion. Additionally, pressure testing and interface testing platforms need to dynamically construct data to test server services.&lt;/p&gt;

&lt;p&gt;Users of Kitex know that Kitex provides generalized invocation for Thrift protocols, primarily supporting such general services. Previously, internal microservices were mainly Thrift PingPong services. Kitex provided generalized invocation for Map, JSON, HTTP data types, as well as binary generalized invocation for traffic forwarding.&lt;/p&gt;

&lt;p&gt;Therefore, for streaming interfaces, Kitex has added support for generalized streaming invocation. Compared to PingPong generalized interfaces, generalized streaming requires separate interfaces for the three streaming types.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PingPong/Unary Generalized Invocation Interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvscacrfgcmhobiv9xt2o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvscacrfgcmhobiv9xt2o.png" alt="Image description" width="800" height="183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streaming Generalized Invocation Interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0xj5j9fvd4kqizdk5oc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0xj5j9fvd4kqizdk5oc.png" alt="Image description" width="650" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Currently, support for the mainstream &lt;strong&gt;JSON&lt;/strong&gt; data type is complete; other data types will be supported as business needs arise. (Since the Kitex Streaming v2 interface is yet to be released, and to avoid disrupting the user experience of generalized streaming, this support has not been officially announced, but the functionality is ready. See the generalized invocation section on the official website for the English documentation.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw9tprg1meh8ji0h15kjo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw9tprg1meh8ji0h15kjo.png" alt="Image description" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  User Experience of Streaming Capability
&lt;/h2&gt;

&lt;p&gt;We have covered the streaming basics and the streaming capabilities that Kitex/Hertz supported in the past, has newly added, and will soon release. But do developers who work on streaming interfaces (including with other frameworks such as official gRPC) know how to use them properly, and how to locate issues when they arise?&lt;/p&gt;

&lt;p&gt;Within ByteDance, as streaming services have evolved, we've noticed a significant increase in feedback issues. On one hand, compared to Thrift PingPong, our support at the basic capability level is still incomplete. On the other hand, developing streaming interfaces requires a deep understanding of proper usage; otherwise, misuse can easily lead to problems. &lt;/p&gt;

&lt;p&gt;Therefore, in 2024 we initiated a &lt;strong&gt;Streaming Optimization Project&lt;/strong&gt;, sorting through the various issues and optimizing them one by one. On the user-experience side, some issues trace back to the streaming interface definitions themselves; after comprehensive consideration, we decided to shed that historical burden and release a Streaming v2 interface. Below are some of the existing issues and ongoing optimizations. Since proper usage of streaming interfaces is difficult to enforce at the framework level alone, we will also release usage specifications and best practices to help users develop high-quality streaming interfaces. If you have better suggestions for streaming usage, we welcome your feedback!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblf4ongdwige20hskpzz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblf4ongdwige20hskpzz.png" alt="Image description" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Take streaming observability as an example. Previously, streaming interface monitoring was not defined separately and reused PingPong reporting, yielding only whole-stream information and &lt;strong&gt;lacking Recv/Send monitoring&lt;/strong&gt;. When supporting Thrift Streaming, we therefore added StreamSend &amp;amp; StreamRecv events, with the framework recording when each occurred and the size of the user-transmitted data. Enterprise users with custom Tracer reporting only need to implement the &lt;a href="https://github.com/cloudwego/kitex/blob/v0.9.1/pkg/rpcinfo/tracer.go#L31" rel="noopener noreferrer"&gt;rpcinfo.StreamEventReporter&lt;/a&gt; interface; Kitex calls it after each Recv and Send, providing access to that event's information. Below is the Trace information for Send/Recv within a Stream.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F595jxrg8bgivzi18p22l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F595jxrg8bgivzi18p22l.png" alt="Image description" width="800" height="559"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Review of New Features, User Experience/Performance Improvements
&lt;/h1&gt;

&lt;p&gt;While specialized support and optimization for streaming capabilities have been conducted over the past year, we have also provided other new features to meet user needs, enhance user experience, and continue to improve framework performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Features – Thrift/gRPC Multi-Services
&lt;/h2&gt;

&lt;p&gt;The official gRPC framework supports multiple services, but previous versions of Kitex did not, mainly to stay aligned with Thrift usage. &lt;strong&gt;Thrift's limitation is that supporting multiple services would introduce protocol-incompatible changes&lt;/strong&gt;, impacting users. Within ByteDance the TTHeader protocol is widely used, so we decided to transmit the IDL Service Name via TTHeader to let Thrift support multiple services.&lt;/p&gt;

&lt;p&gt;Kitex v0.9.0 officially supports &lt;strong&gt;registering multiple IDL Services within one Server&lt;/strong&gt;, including Thrift and Protobuf. Thrift provides true multi-service functionality at the protocol level based on TTHeader, while being compatible with the old CombineService.&lt;/p&gt;

&lt;p&gt;Here is a brief introduction to CombineService. Kitex previously provided a pseudo-multi-service feature, CombineService, to address excessively large IDLs (which lead to large generated code and slow compilation). It allows the server to split one IDL Service into several, but requires that the split Services have no methods with the same name (since the protocol did not support multiple services, method routing was impossible). Ultimately, Kitex merges the multiple IDL Services back into one, hence the name CombineService.&lt;/p&gt;

&lt;p&gt;With Kitex's new multi-service support, the server &lt;strong&gt;can not only register multiple IDL Services, but also provide both Thrift and Protobuf interfaces simultaneously&lt;/strong&gt;. For example, using Kitex-gRPC (Protobuf) but wanting to switch to Thrift Streaming while ensuring compatibility with old interface traffic, two types of IDL interfaces can be provided for transition.&lt;/p&gt;

&lt;p&gt;Below is an example of registering multiple services on the server:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnz3pzcdqh1p1lsbmd2ah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnz3pzcdqh1p1lsbmd2ah.png" alt="Image description" width="596" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  New Features – Mixed Retry
&lt;/h2&gt;

&lt;p&gt;Kitex previously provided two retry mechanisms: &lt;strong&gt;Failure Retry&lt;/strong&gt; and &lt;strong&gt;Backup Request&lt;/strong&gt;. Failure Retry improves success rates (enhancing service SLAs), but most failures are timeouts, so retrying increases latency; Backup Request reduces request latency, but terminates retrying as soon as any attempt returns a failure.&lt;/p&gt;

&lt;p&gt;In internal practice, businesses generally want &lt;strong&gt;both retries combined&lt;/strong&gt;, which would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Optimize the overall retry latency of Failure Retry&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improve the request success rate of Backup Request&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, Kitex v0.11.0 supports Mixed Retry, a hybrid that combines Failure Retry and Backup Request.&lt;/p&gt;

&lt;p&gt;To facilitate understanding the differences between the three retries, here is a scenario: assume the first request takes 1200ms, the second request takes 900ms, with RPCTimeout configured to 1000ms, MaxRetryTimes to 2, and BackupDelay to 200ms. &lt;/p&gt;

&lt;p&gt;Comparing the results of the three retries:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs214jl0zournhwoghpt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs214jl0zournhwoghpt7.png" alt="Image description" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Mixed Retry: &lt;strong&gt;Success&lt;/strong&gt;, cost &lt;strong&gt;1100ms &lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Failure Retry: &lt;strong&gt;Success&lt;/strong&gt;, cost &lt;strong&gt;1900ms &lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Backup Request: &lt;strong&gt;Failure&lt;/strong&gt;, cost 1000ms &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
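&lt;p&gt;The arithmetic behind this comparison can be checked directly (a simplified model of the scenario above, ignoring scheduling overhead):&lt;/p&gt;

```python
# Scenario from the text: attempt 1 takes 1200 ms, attempt 2 takes 900 ms,
# RPCTimeout = 1000 ms, BackupDelay = 200 ms.
ATTEMPT_2 = 900
RPC_TIMEOUT = 1000   # each in-flight attempt is cut off after 1000 ms
BACKUP_DELAY = 200   # the backup request fires 200 ms after the first

# Failure Retry: attempt 1 (1200 ms) times out at 1000 ms, then attempt 2 succeeds.
failure_retry_cost = RPC_TIMEOUT + ATTEMPT_2   # 1900 ms, success

# Backup Request: the backup fires at 200 ms and would finish at 1100 ms, but the
# first attempt's timeout at 1000 ms is a failed return, which terminates retries.
backup_request_cost = RPC_TIMEOUT              # 1000 ms, failure

# Mixed Retry: the failed first attempt does not cancel the in-flight backup,
# which completes at BackupDelay + 900 ms.
mixed_retry_cost = BACKUP_DELAY + ATTEMPT_2    # 1100 ms, success

print(mixed_retry_cost, failure_retry_cost, backup_request_cost)  # 1100 1900 1000
```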

&lt;h2&gt;
  
  
  User Experience - Frugal &amp;amp; FastCodec (Thrift)
&lt;/h2&gt;

&lt;p&gt;Both Frugal and FastCodec are high-performance Thrift serialization libraries provided by Kitex. Frugal's advantage over FastCodec is that it requires no code generation, which largely eliminates the problem of excessively large generated code.&lt;/p&gt;

&lt;p&gt;But two drawbacks remained:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Both Frugal and FastCodec decoding rely on packets with headers. For a Thrift Buffered packet, they fall back to the Apache codec, so users must know which protocol they will receive; otherwise, using Frugal cannot completely eliminate code generation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Frugal is JIT-based; x86 support was complete, but ARM had only a poorly performing fallback strategy.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To address the protocol-binding issue, the new version adds SkipDecode support. Test results show that SkipDecode + FastCodec still outperforms the Apache Thrift codec. &lt;/p&gt;

&lt;p&gt;For the Frugal ARM issue, new reflection-based support is provided, eliminating the need for separate support per architecture. Although it uses reflection, it bypasses reflection's internal type checks to achieve higher performance; test results are even slightly better than the JIT implementation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguosvokvr8wt1p8cpx55.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguosvokvr8wt1p8cpx55.png" alt="Image description" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  User Experience - Output Reduction and Generation Speed Optimization
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Large generated output, slow code generation, and slow compilation&lt;/strong&gt; are significant pain points for services with long iteration histories within ByteDance. Therefore, Kitex provides various optimizations to reduce output size and improve generation speed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;IDL Trimming&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A complex IDL with a long iteration history contains many obsolete struct definitions, and manually cleaning them up adds to the development burden. The trimming tool generates code only for the struct definitions required by the RPC methods, and users can also specify which methods to generate. According to pilot projects in large ByteDance repositories, &lt;strong&gt;generation time is halved, and output size is reduced by over 60%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Usage: &lt;em&gt;$ kitex -module xx -thrift trim_idl xxxx.thrift&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Example effect: In the example below, the trimming tool deleted 60,000 unused structs and 530,000 fields.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fel7386tqnleqsdnf9q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fel7386tqnleqsdnf9q.png" alt="Image description" width="800" height="73"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;no_fmt Speedup&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After code generation, the output is formatted by default to improve readability, but users rarely read generated code. Disabling the fmt option therefore improves generation speed.&lt;/p&gt;

&lt;p&gt;Usage: &lt;em&gt;$ kitex -module xx -thrift no_fmt xxxx.thrift&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Effect: The P90 generation time for a certain platform within ByteDance decreased from &lt;strong&gt;80s to 20s&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Removing Unnecessary Codes from Kitex&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kitex generates the full Apache Thrift code by default, but in practice only the codec part is used, and only in fallback scenarios; the rest of the generated code is not needed.&lt;/p&gt;

&lt;p&gt;Therefore, Kitex v0.10.0 removes the Thrift Processor by default and can remove all Apache Thrift code via a command-line parameter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kitex -module xxx -thrift no_default_serdes xxx.thrift
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Usage: &lt;em&gt;$ kitex -module xxx -thrift no_default_serdes xxx.thrift&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Effect: Output size is reduced by about &lt;strong&gt;50%+&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Frugal Slim Extreme Reduction&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Usage: &lt;em&gt;$ kitex -thrift frugal_tag,&lt;strong&gt;template=slim&lt;/strong&gt; -service p.s.m idl/api.thrift&lt;/em&gt;, which uses Frugal for Thrift serialization.&lt;/p&gt;

&lt;p&gt;Effect: Output size is reduced by about &lt;strong&gt;90%&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  User Experience - kitexcall
&lt;/h2&gt;

&lt;p&gt;Although RPC calls are simpler and more convenient than HTTP, they are not convenient to &lt;strong&gt;test&lt;/strong&gt;: tools must first generate code and then construct the request data. The testing platforms mentioned earlier use generalized invocation to construct request data without relying on generated code, but generalized invocation is not cheap to use either; users must first understand how the method is invoked and how its data is constructed.&lt;/p&gt;

&lt;p&gt;To &lt;strong&gt;improve testing convenience&lt;/strong&gt;, a standalone command-line tool, kitexcall, is provided on top of Kitex &lt;strong&gt;JSON generalized invocation&lt;/strong&gt;, allowing users to issue Thrift test requests with JSON data. (This feature was contributed by the community; thank you!)&lt;/p&gt;

&lt;p&gt;Usage: &lt;em&gt;$ kitexcall -idl-path echo.thrift -m echo -d '{"message": "hello"}' -e 127.0.0.1:8888&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Future optimization plans:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Graphical interface for more convenient testing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for gRPC testing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No need to specify IDL, using server reflection to obtain IDL information&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance Optimization – Thrift On-Demand Serialization
&lt;/h2&gt;

&lt;p&gt;As business iteration makes IDL definitions increasingly complex, upstream services in production may &lt;strong&gt;only need some of the fields&lt;/strong&gt; yet still have to serialize and transmit all of them, introducing additional performance overhead. To address this, Kitex supports on-demand serialization for Thrift.&lt;/p&gt;

&lt;p&gt;Borrowing from Protobuf, Kitex provides a Thrift FieldMask feature that lets users select which fields to encode, reducing serialization and transmission overhead.&lt;/p&gt;

&lt;p&gt;In the example below, only the Foo field is encoded and returned; the Bar field is ignored:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cep3j501jr6s8tk25t8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cep3j501jr6s8tk25t8.png" alt="Image description" width="668" height="156"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The user constructs the Bar data but masks only the Foo field, so the framework encodes only Foo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lb186aeiajf5als1885.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lb186aeiajf5als1885.png" alt="Image description" width="800" height="291"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Specifying the required fields from the peer side is also supported; for details, see the on-demand serialization documentation on the official website.&lt;/p&gt;
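&lt;p&gt;The effect of a field mask can be sketched in a few lines of Go. The helper below is hypothetical and works on an already-decoded message to show the concept; the real Kitex FieldMask applies the mask inside the Thrift codec so that skipped fields are never serialized at all:&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maskFields keeps only the masked top-level fields of a message, mimicking
// what a Thrift FieldMask does at the codec layer: fields outside the mask
// are simply skipped during encoding. Hypothetical helper, not the Kitex API.
func maskFields(msg map[string]interface{}, mask map[string]bool) map[string]interface{} {
	out := make(map[string]interface{}, len(mask))
	for k, v := range msg {
		if mask[k] {
			out[k] = v // field is in the mask: encode it
		}
		// field not in the mask: skipped, never serialized
	}
	return out
}

func main() {
	resp := map[string]interface{}{"Foo": "kept", "Bar": "dropped"}
	masked := maskFields(resp, map[string]bool{"Foo": true})
	b, _ := json.Marshal(masked)
	fmt.Println(string(b)) // {"Foo":"kept"}
}
```

&lt;p&gt;Doing this inside the codec rather than after decoding is what saves the serialization and transmission cost described above.&lt;/p&gt;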

&lt;h2&gt;
  
  
  Performance Optimization – Thrift Memory Allocation Optimization
&lt;/h2&gt;

&lt;p&gt;Kitex continuously monitors RPC performance. Under today's cost pressure, we are digging into deeper optimizations: the routine hot-path optimizations have all been done, and further gains require less conventional techniques. v0.10.0 ships new optimizations focused on memory allocation and GC.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Span Cache&lt;/strong&gt;: Optimizes String/Binary decoding costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-allocates memory, reducing mallocgc calls&lt;/li&gt;
&lt;li&gt;Reduces the actual number of generated objects -&amp;gt; lower GC costs&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Centralized memory allocation for container fields&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Similarly, switches from a separate allocation for each element to a single centralized allocation&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Span Cache trades memory for CPU: it reduces CPU cost but increases memory usage. To avoid impacting services with small memory specifications, it is disabled by default and must be enabled explicitly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbwlxb6hrpv12rm4ui1q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbwlxb6hrpv12rm4ui1q.png" alt="Image description" width="658" height="226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Effect: Under stress testing, throughput increases by about 10% and latency drops by about 30%.&lt;/p&gt;
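&lt;p&gt;The span idea can be sketched as follows: carve N small decoded fields out of one pre-allocated block, so they cost a single large allocation instead of N small ones and leave the GC fewer objects to track. This is a simplified sketch of the approach, not Kitex's actual implementation:&lt;/p&gt;

```go
package main

import "fmt"

// spanCache hands out chunks of one pre-allocated block, so decoding many
// small byte/string fields costs a single large allocation instead of one
// mallocgc call per field.
type spanCache struct {
	buf []byte // the pre-allocated span
	off int    // next free offset within buf
}

func newSpanCache(size int) *spanCache { return &spanCache{buf: make([]byte, size)} }

// copyBytes copies b into the span and returns a sub-slice aliasing the
// shared block; if the span is exhausted, it falls back to a fresh allocation.
func (s *spanCache) copyBytes(b []byte) []byte {
	if s.off+len(b) > len(s.buf) {
		return append([]byte(nil), b...) // fallback: individual allocation
	}
	dst := s.buf[s.off : s.off+len(b) : s.off+len(b)] // full-slice expr caps growth
	copy(dst, b)
	s.off += len(b)
	return dst
}

func main() {
	s := newSpanCache(1024)
	var fields [][]byte
	for _, f := range []string{"user_id", "device", "region"} {
		fields = append(fields, s.copyBytes([]byte(f)))
	}
	fmt.Println(len(fields), s.off) // 3 fields carved from one block, off=19
}
```

&lt;p&gt;This is also why the optimization costs memory: the whole span stays alive as long as any field carved from it is still referenced, which is acceptable for short-lived request objects but why it stays off by default.&lt;/p&gt;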

&lt;h2&gt;
  
  
  Memory Analysis Tool
&lt;/h2&gt;

&lt;p&gt;Objects received over RPC/HTTP are constructed, allocated, and populated by the framework before being returned to the user. If the user's code then holds onto these objects indefinitely, it can lead to memory leaks. While pprof heap can indicate where memory is allocated, it cannot show where references are held. So, how do we determine &lt;strong&gt;who is referencing a Go object&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;In fact, GC scans and marks objects, capturing reference relationships. By combining this with variable names and type information, we can analyze the referencing situation of objects. Leveraging Delve, we have developed the &lt;strong&gt;goref &lt;/strong&gt;object reference analysis tool, which was open-sourced in July (&lt;em&gt;github.com/cloudwego/goref&lt;/em&gt;). This addresses the limitation of Go's native tools in analyzing memory references, aiding Go developers &lt;strong&gt;in quickly identifying memory leaks&lt;/strong&gt; and enhancing the Go tooling ecosystem.&lt;/p&gt;

&lt;p&gt;For instance, the pprof heap profile in the following image shows that the currently live objects are primarily allocated within FastRead (Kitex's deserialization code). It is normal for decoding to allocate memory to construct data, so this flame graph offers limited help in troubleshooting: the allocation site is often not the source of the leak.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8qwfmihw4ds3mfjwar2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8qwfmihw4ds3mfjwar2.png" alt="Image description" width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, using the goref tool yields the following result: mockCache holds an RPC Resp, preventing memory from being released. The issue is immediately apparent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0yxyrlv8ga7fo8nsgmxn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0yxyrlv8ga7fo8nsgmxn.png" alt="Image description" width="800" height="155"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion and Outlook
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Enhancing Streaming Capabilities to Support LLMs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Streaming capabilities provided by Kitex/Hertz: gRPC, HTTP 1.1 Chunked, WebSocket, SSE, Thrift Streaming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SSE &amp;lt;-&amp;gt; Thrift Streaming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generalized streaming invocations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Streaming capability optimizations to enhance user experience and engineering practices&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Review of New Features, User Experience, and Performance Improvements&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;New Features: Thrift/gRPC multi-service support, Mixed Retry&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;User Experience: Frugal/FastCodec, streamlined outputs, generation speed optimizations, kitexcall&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance Optimization: Thrift on-demand serialization, memory allocation improvement&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Memory Analysis Tool: goref&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Outlook
&lt;/h2&gt;

&lt;p&gt;In the coming year, we will continue to enhance streaming capabilities and optimize the user experience. We will provide usage guidelines for streaming interfaces to help users better develop their streaming services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Release Kitex Streaming v2 interface to address historical issues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Release TTHeader Streaming for better performance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Engineering practices: graceful shutdown, retries, timeout control&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Release streaming-related specifications: error handling, interface usage guidelines&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Furthermore, we will consider strengthening the streaming ecosystem, such as enriching generalized streaming invocations and providing more gateway-friendly support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SSE &amp;lt;-&amp;gt; Thrift Streaming(HTTP2 and TTHeader Streaming)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;WebSocket &amp;lt;-&amp;gt; Thrift Streaming (HTTP2 and TTHeader Streaming)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Binary and Map generalized invocations for Streaming&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Special announcement: Kitex plans to gradually remove Apache Thrift-generated code in future versions. Due to incompatible changes in Apache Thrift v0.14, Kitex has been locked to Apache Thrift v0.13. To resolve this, Kitex will eliminate its dependency on Apache Thrift.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>llm</category>
      <category>backend</category>
      <category>performance</category>
      <category>application</category>
    </item>
    <item>
      <title>BMF: Frame extraction acceleration- video similarity search with Pinecone</title>
      <dc:creator>Josh Alphonse</dc:creator>
      <pubDate>Fri, 10 May 2024 17:42:44 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/bmf-frame-extraction-acceleration-video-similarity-search-with-pinecone-5e23</link>
      <guid>https://dev.to/bytedanceoss/bmf-frame-extraction-acceleration-video-similarity-search-with-pinecone-5e23</guid>
<description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;TL;DR: This is a tutorial on how to create a video similarity search with BMF and &lt;a href="https://www.pinecone.io/"&gt;Pinecone&lt;/a&gt; from scratch. View this project's code on &lt;a href="https://github.com/Joshalphonse/BMF-video-similarity-search/blob/main/Video_Extraction.ipynb"&gt;GitHub&lt;/a&gt; and test it out in a notebook like Colab.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So you might have seen in my last blog post that I showed you how to accelerate video frame extraction using GPUs and the Babit Multimedia Framework (BMF). In this blog we are going to improve upon our video frame extractor and create a video similarity search (reverse video search) utilizing RAG (Retrieval-Augmented Generation) concepts with &lt;a href="https://www.pinecone.io/"&gt;Pinecone&lt;/a&gt;, a vector database that will help us build knowledgeable AI. Pinecone is designed to perform vector searches effectively, and you'll see throughout this blog how we extract vectors from videos to make our search work like a charm. With Pinecone, you can quickly find the items in a dataset that are most similar to a query vector, making it handy for tasks like recommendation engines, similar-item search, or even detecting duplicate content. It's particularly well suited to machine learning applications where you deal with high-dimensional data and need fast, accurate similarity search.&lt;br&gt;
Reverse video search works like reverse image search but uses a video to find other videos that are alike. Essentially, you use a video to look for matching ones. While handling videos is generally more complex and the accuracy might not match other models, the use of AI for video tasks is growing, and reverse video search is really good at finding related videos and can make other video applications better.&lt;br&gt;
So why would you want to create a video similarity search app?&lt;/p&gt;
&lt;h2&gt;
  
  
  Here are some reasons:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Content Discovery: It enables users to find videos that are visually or contextually similar to what they're interested in, enhancing content discoverability on platforms like streaming services or stock footage libraries.&lt;/li&gt;
&lt;li&gt;Recommendation Systems: Enhances recommendation engines by suggesting content that is similar to a user's viewing history, thus improving user engagement and retention.&lt;/li&gt;
&lt;li&gt;Duplicate or Near-duplicate Detection: Helps in identifying copies or slight variations of the same video, which is useful for copyright enforcement or content management.&lt;/li&gt;
&lt;li&gt;Categorization and Tagging: Assists in automatically categorizing and tagging videos based on content, which can simplify content management and improve searchability.&lt;/li&gt;
&lt;li&gt;User-generated Content Moderation: Useful in moderating platforms where vector similarity can help identify potentially problematic content by comparing new uploads with known flagged videos.&lt;/li&gt;
&lt;li&gt;Video Analysis: In fields like surveillance, sports, or medical imaging, it can help in analyzing and identifying specific moments or objects in video sequences.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Oh yeah, and of course a similarity search like the one we'll build in this blog! I took inspiration from the &lt;a href="https://milvus.io/docs/video_similarity_search.md"&gt;Milvus reverse video search notebook&lt;/a&gt; and decided to recreate it using technologies I prefer.&lt;br&gt;
The Babit Multimedia Framework brings all the great things we know and love about FFmpeg and amplifies them with multi-language support and GPU acceleration.&lt;br&gt;
Now, you might be familiar with other frame extraction methods using OpenCV, FFmpeg, or GStreamer. These are all great options. However, I'm choosing BMF for a few reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-language support- BMF supports the use of Python, GO and C++&lt;/li&gt;
&lt;li&gt;Full compatibility with FFmpeg - BMF is fully compatible with FFmpeg’s processing capabilities and indicators, such as demuxing, decoding, filtering, encoding, and muxing. These capabilities can be configured to produce results consistent with FFmpeg’s pts, duration, bitrate, fps, and other indicators, which satisfies the need to quickly integrate FFmpeg capabilities into projects.&lt;/li&gt;
&lt;li&gt;Enhanced Support for NVIDIA GPUs to create enterprise ready GPU accelerated video pipelines

&lt;ul&gt;
&lt;li&gt;NVENC/NVDEC/GPU filters work out-of-box by inheriting abilities from FFmpeg.&lt;/li&gt;
&lt;li&gt;High performance frame processing is enabled by integration of CV-CUDA and customized CUDA kernels.&lt;/li&gt;
&lt;li&gt;AI inferencing can be easily integrated into video pipelines using TensorRT.&lt;/li&gt;
&lt;li&gt;Data moving between CPU and GPU can be done with a simple call.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Alright, that's more than just a few reasons, but you get the point! Now let's build a video similarity search.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2b5n3obm4ardgxbpuug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2b5n3obm4ardgxbpuug.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1z646055cxwmavitmsj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1z646055cxwmavitmsj.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Required Python packages to install:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.9-3.10&lt;/li&gt;
&lt;li&gt;pinecone-client&lt;/li&gt;
&lt;li&gt;BabitMF-GPU&lt;/li&gt;
&lt;li&gt;torch&lt;/li&gt;
&lt;li&gt;torchvision&amp;gt;=0.12.0&lt;/li&gt;
&lt;li&gt;python-dotenv&lt;/li&gt;
&lt;li&gt;av&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Grab a video. I had a short video stored in a GitHub repo, but you can use a video stored on your system or elsewhere; BMF can handle any video format (FFmpeg compatibility!).&lt;/p&gt;
&lt;h3&gt;
  
  
  Inserting the dataset into a Pinecone index
&lt;/h3&gt;

&lt;p&gt;Let's start by inserting videos from our dataset into our Pinecone index. We do this so that our vector database has knowledge of the videos we will be comparing against the end user's video; it's the necessary starting point for our application.&lt;br&gt;
First, I'm going to create an account on Pinecone and create my first index using Pinecone serverless. Pinecone is a fully managed vector database; you can use the CLI or the dashboard when you log in. Here's how to set it up: &lt;a href="https://docs.pinecone.io/guides/getting-started/quickstart"&gt;https://docs.pinecone.io/guides/getting-started/quickstart&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git clone https://github.com/Joshalphonse/Bmf-Huggingface.git&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Install BMF with GPU capabilities&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install -qU \
  pinecone-client \
  BabitMF-GPU \
  torch \
  torchvision&amp;gt;=0.12.0 \
  python-dotenv \
  av
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Download this video dataset or use your own.&lt;br&gt;
The data is organized as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;train: candidate videos, 20 classes, 10 videos per class (200 in total)&lt;/li&gt;
&lt;li&gt;test: query videos, same 20 classes as train data, 1 video per class (20 in total)&lt;/li&gt;
&lt;li&gt;reverse_video_search.csv: a csv file containing an id, path, and label for each video in train data
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;! curl -L https://github.com/towhee-io/examples/releases/download/data/reverse_video_search.zip -O
! unzip -q -o reverse_video_search.zip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Put the files in a dataframe and convert them to a list&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./reverse_video_search.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#put the files in the dataframe
&lt;/span&gt;&lt;span class="n"&gt;video_paths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;#convert df to python list
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_paths&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#check if the video paths
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure to import all of the necessary packages, then create environment variables to manage your configuration; they will make your life a lot easier.&lt;br&gt;
Afterwards, load the CSV file from the dataset folder. I'm limiting the list to 3 rows just to speed things up for demo purposes.&lt;br&gt;
We'll also load a pretrained ResNet model, because in the next steps we will use it to generate the vector embeddings.&lt;br&gt;
Lastly, this snippet configures a preprocessing pipeline for images using PyTorch's &lt;code&gt;transforms&lt;/code&gt; module, which is commonly used in deep learning to prepare data before feeding it into a neural network.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;av&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pinecone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServerlessSpec&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torchvision.transforms&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;transforms&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torchvision.models&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;

&lt;span class="n"&gt;PINECONE_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;PINECONE_ENVIRONMENT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_ENVIRONMENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;PINECONE_DATABASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_DATABASE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Replace 'your_pinecone_api_key' with your actual Pinecone API key or use environment variables like I am here
&lt;/span&gt;&lt;span class="n"&gt;pc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pinecone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PINECONE_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PINECONE_ENVIRONMENT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PINECONE_DATABASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#load the CSV file
&lt;/span&gt;&lt;span class="n"&gt;csv_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./reverse_video_search.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;video_paths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_paths&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#check if the video paths
&lt;/span&gt;
&lt;span class="c1"&gt;#load a pretrained ResNet model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resnet18&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pretrained&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;#remove the last fully connectected layer
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;children&lt;/span&gt;&lt;span class="p"&gt;())[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Define the preprocessing transforms
&lt;/span&gt;&lt;span class="n"&gt;preprocess&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Compose&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Resize&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.485&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.456&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.406&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.229&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.225&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With our dataset and files ready to go, we will iterate over each video path and generate an embedding. I'm also using the &lt;a href="https://pypi.org/project/av/"&gt;av&lt;/a&gt; package to handle the video file, so we can open it and extract its frames.&lt;br&gt;
We then iterate over the frames of the video, preprocessing each frame with the &lt;code&gt;preprocess&lt;/code&gt; transforms defined above and generating an embedding for it with the pretrained ResNet model. These frame embeddings are stored in a list.&lt;br&gt;
Once all the frame embeddings have been collected, we calculate their average to get a single embedding that represents the entire video.&lt;br&gt;
Now all we have to do is use the Pinecone package to upsert (insert or update) the average video embedding into a Pinecone index, under the namespace &lt;code&gt;video_embeddings&lt;/code&gt;. The video path is used as the unique identifier for the embedding.&lt;br&gt;
&lt;/p&gt;
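The averaging step can be sketched in isolation. Here is a minimal, self-contained example, using random 512-dimensional vectors as stand-ins for the real ResNet18 frame embeddings (ResNet18 with its final fully connected layer removed outputs 512 features per frame):

```python
import numpy as np

# Stand-ins for per-frame embeddings: 10 frames, 512 features each.
rng = np.random.default_rng(0)
frame_embeddings = [rng.standard_normal(512) for _ in range(10)]

# Stack the list into a (num_frames, 512) array, then average over the
# frame axis to get one 512-dimensional vector for the whole video.
embeddings = np.array(frame_embeddings)
avg_embedding = np.mean(embeddings, axis=0)

print(embeddings.shape)      # (10, 512)
print(avg_embedding.shape)   # (512,)
```

Mean pooling is a deliberately simple aggregation: it loses temporal ordering, but it gives every video a fixed-size vector that can be upserted and compared regardless of how many frames it had.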

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Iterate over each video path and generate embeddings
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;video_path&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;video_paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Open the video file
&lt;/span&gt;    &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;av&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Get the first video stream
&lt;/span&gt;    &lt;span class="n"&gt;video_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;streams&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Initialize variables for storing embeddings
&lt;/span&gt;    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Iterate over the video frames
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Convert the frame to a numpy array
&lt;/span&gt;        &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_ndarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rgb24&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Preprocess the frame
&lt;/span&gt;        &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;preprocess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Add batch dimension
&lt;/span&gt;
        &lt;span class="c1"&gt;# Generate embeddings using the ResNet model
&lt;/span&gt;        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;squeeze&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Append the embedding to the list
&lt;/span&gt;        &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Convert the list of embeddings to a numpy array
&lt;/span&gt;    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate the average embedding for the video
&lt;/span&gt;    &lt;span class="n"&gt;avg_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;avg_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Upsert the embedding to Pinecone
&lt;/span&gt;    &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avg_embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;video_embeddings&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Upserted embedding for video: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can use either the Pinecone CLI or the dashboard to view the data we just upserted into the index. Check out the picture below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9qwuejzpxumeu0qobgx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9qwuejzpxumeu0qobgx.png" alt="Image description" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Searching For A Similar Video
&lt;/h3&gt;

&lt;p&gt;Install ffmpeg and related libraries. For this demo, we don't have to do this step, because ffmpeg libraries are already installed in the Google Colab environment.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo apt install ffmpeg&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;List the ffmpeg libraries. Related libraries such as libavcodec and libavformat should be installed. The output should look like the screenshot below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3v4bfz0cot2ue25fmxpy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3v4bfz0cot2ue25fmxpy.png" alt="Image description" width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo apt install libdw1&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dpkg -l | grep -i ffmpeg&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ffmpeg -version&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Install the following package to show the BMF C++ logs in the Colab console; otherwise only Python logs are printed. This step is not necessary if you're not in a Colab or IPython notebook environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install wurlitzer
%load_ext wurlitzer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now import all of the dependencies listed below. The beginning of this process is the same as the data upsert above: use the Pinecone credentials we stored in a .env file and load the pretrained ResNet18 model.&lt;br&gt;
The difference here is that we are finally using BMF for frame extraction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torchvision.transforms&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;transforms&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torchvision.models&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pinecone&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;bmf&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;IPython&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;display&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;

&lt;span class="n"&gt;PINECONE_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;PINECONE_ENVIRONMENT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_ENVIRONMENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;PINECONE_DATABASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_DATABASE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Replace 'your_pinecone_api_key' with your actual Pinecone API key or use environment variables like I am here
&lt;/span&gt;&lt;span class="n"&gt;pc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pinecone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PINECONE_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PINECONE_ENVIRONMENT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PINECONE_DATABASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resnet18&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pretrained&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;#remove the last fully connectected layer
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;children&lt;/span&gt;&lt;span class="p"&gt;())[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;preprocess&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Compose&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Resize&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.485&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.456&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.406&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.229&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.225&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;input_video_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/content/linedancing.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./extracted-images/simple_%03d.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bmf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dump_graph&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input_video_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;fps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;bmf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;codec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Frame extraction completed successfully.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error during frame extraction: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we will load the extracted query frames and generate embeddings for our query video, which we will compare to the ones stored in our Pinecone index.&lt;/p&gt;

&lt;p&gt;Let me break it down for you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load the extracted query frames: I used the &lt;code&gt;glob&lt;/code&gt; module to find all the file paths of the extracted query frames, which are stored in the &lt;code&gt;query_frame_paths&lt;/code&gt; variable. These are individual frames extracted from the original video.&lt;/li&gt;
&lt;li&gt;Generate embeddings for each query frame: We then iterate over each query frame path, load the image using &lt;code&gt;cv2.imread&lt;/code&gt;, preprocess it with the &lt;code&gt;preprocess&lt;/code&gt; transforms defined above, and generate an embedding for the frame using the pre-trained model.&lt;/li&gt;
&lt;li&gt;Store the embeddings: The generated embeddings for each frame are stored in the &lt;code&gt;query_embeddings&lt;/code&gt; list.&lt;/li&gt;
&lt;li&gt;Calculate the average embedding: Once all the frame embeddings have been collected, we calculate their average to get a single embedding that represents the entire set of query frames. By averaging the query frame embeddings, we capture the overall visual content of the query, which is the key to how our similarity search works.
&lt;/li&gt;
&lt;/ul&gt;
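The glob step in the first bullet relies on turning the ffmpeg-style frame pattern into a wildcard. A small self-contained sketch of that conversion, using dummy files in a temporary directory (the file names here are hypothetical):

```python
import glob
import os
import tempfile

# The encoder writes frames with an ffmpeg-style pattern such as
# "simple_%03d.jpg"; replacing "%03d" with "*" turns it into a glob
# pattern that matches every numbered frame.
output_path = os.path.join(tempfile.mkdtemp(), "simple_%03d.jpg")

# Create a few dummy frame files to demonstrate the match.
for i in range(1, 4):
    open(output_path.replace("%03d", f"{i:03d}"), "w").close()

query_frame_paths = sorted(glob.glob(output_path.replace("%03d", "*")))
print([os.path.basename(p) for p in query_frame_paths])
# ['simple_001.jpg', 'simple_002.jpg', 'simple_003.jpg']
```

Note the `sorted(...)`: `glob.glob` returns paths in arbitrary order, so sorting keeps the frames in sequence. For averaging, the order doesn't matter, but it's a good habit when frame order could ever be relevant.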

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Load the extracted query frames and generate embeddings
&lt;/span&gt;&lt;span class="n"&gt;query_frame_paths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%03d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;query_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;frame_path&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_frame_paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cvtColor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COLOR_BGR2RGB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;preprocess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Add batch dimension
&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;squeeze&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;query_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;query_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;avg_query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lastly, let's perform our similarity search with Pinecone. &lt;br&gt;
The &lt;code&gt;query&lt;/code&gt; method from Pinecone will be used to search for the vectors most similar to the &lt;code&gt;avg_query_embedding&lt;/code&gt; we created. The &lt;code&gt;top_k&lt;/code&gt; parameter is set to 5, which means the code will retrieve the 5 closest matching vectors to the query (choose whatever number you'd like, depending on how many items were upserted into your database). The &lt;code&gt;include_metadata&lt;/code&gt; parameter is set to &lt;code&gt;True&lt;/code&gt;, which means we will retrieve the metadata (in this case, the video file paths) associated with the matching vectors.&lt;br&gt;
This step is really straightforward. Pinecone has great documentation and a really easy-to-use package.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Perform similarity search using Pinecone
&lt;/span&gt;&lt;span class="n"&gt;num_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;  &lt;span class="c1"&gt;# Number of similar videos to retrieve
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;avg_query_embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;include_metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;video_embeddings&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Print the most similar video paths
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;matches&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;video_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Similar video: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  And our result is....
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn322rf60h213wm1ujto6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn322rf60h213wm1ujto6.png" alt="Image description" width="800" height="254"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Bonus
&lt;/h3&gt;

&lt;p&gt;Since I'm using a notebook and I don't want to use up a ton of memory, I also converted all the videos to GIFs to make them easier to view. So here is some bonus code for ya!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;video_to_gif&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;gif_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.gif&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;frames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;cap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VideoCapture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isOpened&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cvtColor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COLOR_BGR2RGB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gif_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GIF&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;append_images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;save_all&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;gif_path&lt;/span&gt;

&lt;span class="c1"&gt;# Display the input video as a GIF
&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Query video &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &amp;lt;br/&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_video_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;query_gif&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;video_to_gif&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_video_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;html_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;img src=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt; &amp;lt;br/&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_gif&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;html_line&lt;/span&gt;
&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Top {} search results: &amp;lt;br/&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Display the similar videos as GIFs
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;matches&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="n"&gt;gif_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;video_to_gif&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;html_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;img src=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; style=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;display:inline;margin:1px&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gif_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;html_line&lt;/span&gt;

&lt;span class="n"&gt;display&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HTML&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  You can do it too
&lt;/h3&gt;

&lt;p&gt;What I've shown you is a niche use case for BMF; video frame extraction has plenty of applications beyond our example. The framework has a ton of features, especially for building video processing pipelines. Make sure you check out the &lt;a href="https://babitmf.github.io/docs/bmf/overview/"&gt;BMF documentation&lt;/a&gt; and try out some other example apps on the &lt;a href="https://babitmf.github.io/docs/bmf/quick_experience/"&gt;quick experience page&lt;/a&gt; for more. &lt;/p&gt;

</description>
    </item>
    <item>
      <title>KubeAdmiral: next-generation multi-cluster orchestration engine based on Kubernetes</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Mon, 15 Apr 2024 09:33:34 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/kubeadmiral-next-generation-multi-cluster-orchestration-engine-based-on-kubernetes-2d0b</link>
      <guid>https://dev.to/bytedanceoss/kubeadmiral-next-generation-multi-cluster-orchestration-engine-based-on-kubernetes-2d0b</guid>
      <description>&lt;p&gt;Project link: &lt;a href="https://github.com/kubewharf/kubeadmiral"&gt;https://github.com/kubewharf/kubeadmiral&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since its release in 2014, Kubernetes has become the de facto standard for cloud native orchestration and scheduling systems, delivering substantial value to infrastructure developers around the world. &lt;/p&gt;

&lt;p&gt;As an increasing number of corporations embrace cloud native technologies and migrate their workloads to Kubernetes, the scale of their clusters grows rapidly. &lt;/p&gt;

&lt;p&gt;The community edition of Kubernetes, capped at 5000 nodes per cluster, is no longer able to keep up with the scale requirements of large-scale enterprise applications. Moreover, many companies are adopting multi-cloud architectures to achieve cost reduction, increased resource and operational efficiency, geographical disaster recovery, and environment isolation. &lt;/p&gt;

&lt;p&gt;As a result, the demand for multi-cluster orchestration and scheduling tools is on the rise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Brief History of Kubernetes at ByteDance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dedicated Clusters
&lt;/h3&gt;

&lt;p&gt;In the early years of ByteDance’s cloud native adoption, each business line operated in separate dedicated clusters due to isolation concerns. However, this led to low resource elasticity and efficiency, observed in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each business line had to maintain independent resource buffers for scaling and upgrading.&lt;/li&gt;
&lt;li&gt;Applications were tightly coupled to specific clusters, and manual resource transfers were required to balance utilization as applications scaled.&lt;/li&gt;
&lt;li&gt;SRE teams had to deeply understand both the businesses and the clusters in order to manage resources efficiently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This resulted in inefficient resource management and a suboptimal overall deployment rate.&lt;/p&gt;

&lt;h3&gt;
  
  
  KubeFed v2
&lt;/h3&gt;

&lt;p&gt;To address these challenges, the technical infrastructure team at ByteDance started exploring cluster federation based on KubeFed v2 in 2019. The goal was to pool resources across business lines, reduce unnecessary buffers, and improve the efficiency of resource management. &lt;/p&gt;

&lt;p&gt;KubeFed v2 introduces the concept of host and member clusters. Users create federated workloads (e.g. FederatedDeployment) in the host cluster, and KubeFed schedules and dispatches workloads in the member clusters based on these federated workloads. Each federated workload contains three primary fields: &lt;strong&gt;Template&lt;/strong&gt; (specifying the workload to be dispatched to member clusters), &lt;strong&gt;Placement&lt;/strong&gt; (designating target member clusters), and &lt;strong&gt;Overrides&lt;/strong&gt; (indicating how the template should be varied in some clusters). For example, the following FederatedDeployment instructs KubeFed to create a Deployment in cluster1 and cluster2 with 2 and 3 replicas respectively.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: types.kubefed.k8s.io/v1beta1
kind: FederatedDeployment
metadata:
  name: test-deployment
spec:
  template:
    metadata:
      labels:
        app: nginx
    spec:
      replicas: 5
      # more Deployment fields...
  placement:
    clusters:
    - name: cluster1
    - name: cluster2
  overrides: 
  - clusterName: cluster1
    clusterOverrides:
    - path: /spec/replicas
      value: 2
  - clusterName: cluster2
    clusterOverrides:
    - path: /spec/replicas
      value: 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Deployment and ReplicaSet, KubeFed supports dividing the desired replicas across multiple clusters based on ReplicaSchedulingPreference (RSP). Users can configure the weights, minimum replicas, and maximum replicas for each cluster, and the RSP controller computes a valid replica distribution and updates the Placement and Overrides fields of FederatedDeployment or FederatedReplicaSet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7byfqe4yygyjdu1454s.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7byfqe4yygyjdu1454s.jpeg" alt="RSP Scheduling" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;RSP Scheduling (Image credit: &lt;a href="https://www.kubernetes.org.cn/5702.html"&gt;https://www.kubernetes.org.cn/5702.html&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
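&lt;p&gt;For reference, an RSP looks roughly like the following. The field values here are illustrative (see the KubeFed documentation for the full schema); this one asks KubeFed to split 9 total replicas across two clusters at a 2:1 weighting:&lt;/p&gt;

```yaml
apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: test-deployment
  namespace: test-ns
spec:
  targetKind: FederatedDeployment
  totalReplicas: 9
  clusters:
    cluster1:
      weight: 2   # receives roughly 6 of the 9 replicas
    cluster2:
      weight: 1   # receives roughly 3 of the 9 replicas
```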

&lt;p&gt;KubeFed laid the foundation of Kubernetes cluster federation at ByteDance. However, we soon found KubeFed unable to meet our production requirements. The primary pain points were:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Uneven resource utilization across clusters – KubeFed’s RSP only supports static cluster weights and lacks the ability to adapt to fluctuations in cluster resources dynamically.&lt;/li&gt;
&lt;li&gt;Service disruption after rescheduling – During rescheduling, replicas might be abruptly migrated between clusters, disrupting service availability.&lt;/li&gt;
&lt;li&gt;Limitations in scheduling semantics – KubeFed supports stateless, replica-based resources through RSP, but lacks support for more diverse resources such as stateful workloads and jobs. Moreover, extending the existing scheduling semantics is difficult.&lt;/li&gt;
&lt;li&gt;High onboarding cost – KubeFed requires the creation of federated objects and is incompatible with the native Kubernetes API. Users and downstream platforms need to completely overhaul their usage patterns.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  KubeAdmiral
&lt;/h2&gt;

&lt;p&gt;With the evolution of cloud native infrastructure at ByteDance, we raised our standards for efficiency, scalability, performance, and cost. Meanwhile, the size and number of our Kubernetes clusters continue to grow phenomenally along with the businesses. Additionally, workloads beyond stateless microservices, including stateful services, storage, offline and machine learning jobs, started embracing cloud native technologies. Against this backdrop, the limitations of KubeFed became increasingly difficult to manage. Therefore, at the end of 2021, we began our endeavor to develop the next generation cluster federation system, building upon KubeFed v2’s foundation. We named it KubeAdmiral to capture our aspiration for it to manage multiple clusters as effectively as a seasoned navy admiral commands a fleet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dlvg73bei6xvpersq6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dlvg73bei6xvpersq6.jpeg" alt="Timeline of Kubernetes at ByteDance" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Timeline of Kubernetes at ByteDance&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;KubeAdmiral offers enhanced multi-cluster orchestration and scheduling capabilities for various mainstream business scenarios. Today at ByteDance, KubeAdmiral manages more than 100,000 microservices with more than 10,000,000 pods running on dozens of federated Kubernetes clusters. It supports upwards of 30,000 upgrade and scaling operations daily, and maintains a stable deployment rate of 95-98% without the need for manual intervention.&lt;/p&gt;

&lt;h2&gt;
  
  
  KubeAdmiral Feature Highlight
&lt;/h2&gt;

&lt;p&gt;KubeAdmiral not only supports native Kubernetes resources and third-party custom resources, but also offers a rich and extensible scheduling framework. Moreover, it refines numerous aspects of scheduling and dispatching, backed by years of practical production experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjr2n36rj7xbnk50zryre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjr2n36rj7xbnk50zryre.png" alt="KubeAdmiral architecture diagram" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;KubeAdmiral architecture diagram&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rich Multi-Cluster Scheduling Capabilities&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The scheduler is a core component of KubeAdmiral responsible for computing the desired placement of workloads in member clusters. When scheduling replica-based workloads, it also computes the appropriate replicas for each cluster. Functioning as KubeAdmiral’s “brain”, its decisions directly impact critical aspects such as fault tolerance, resource efficiency, and stability.&lt;/p&gt;

&lt;p&gt;KubeFed provides the RSP scheduler for replica-based workloads, but its customizability and extensibility are very limited, and modifying its behavior requires code modification. Additionally, it lacks support for stateful services, job-like resources, etc., which require different sets of scheduling semantics.&lt;/p&gt;

&lt;p&gt;KubeAdmiral introduces more comprehensive scheduling semantics. It supports more flexible and fine-grained mechanisms to select clusters via labels, taints, etc., and to score clusters based on resource utilization, affinity, and so on. Beyond replica-based workloads, it also supports scheduling stateful workloads and job-like resources. Additionally, it offers convenient features such as automatic dependency scheduling (dependencies such as ConfigMaps automatically follow their Deployment to the corresponding member clusters). The scheduling behavior can be configured using a PropagationPolicy object, as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: core.kubeadmiral.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: mypolicy
  namespace: default
spec:
  # Many different ways to select clusters.
  placement:
  # Manually specify desired clusters and replica weights, if required.
  - cluster: cluster-01
    preferences:
      weight: 4
  - cluster: cluster-02
    preferences:
      weight: 3
  - cluster: cluster-03
    preferences:
      weight: 4
  # Filter clusters based on label selectors.
  clusterSelector:
    IPv6: "true"
  # Filter clusters based on affinity.
  clusterAffinity:
  - matchExpressions:
    - key: region
      operator: In
      values:
      - us-east
  # Filter clusters based on taints and tolerations.
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
  # Mode of scheduling - divide or duplicate.
  schedulingMode: Divide
  reschedulePolicy: 
    # Set to true to schedule only on creation and never reschedule afterwards.
    # Suitable for stateful workloads.
    disableRescheduling: false
    # When rescheduling should be triggered.
    # More triggers: reschedule more frequently - favor agility.
    # Fewer triggers: reschedule less frequently - favor stability.
    rescheduleWhen:
      policyContentChanged: true
      clusterLabelsChanged: false
    # Whether to rebalance replicas on reschedule.
    # Enabling rebalance results in optimal placement, but at the potential cost
    # of disrupting existing replicas.
    replicaRescheduling:
      avoidDisruption: true
  # Limit propagation to a single cluster.
  # Suitable for job-like workloads.
  maxClusters: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of writing Overrides manually, KubeAdmiral supports generating Overrides based on OverridePolicy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: core.kubeadmiral.io/v1alpha1
kind: OverridePolicy
metadata:
  name: example
  namespace: default
spec:
  # Flexible ways to select target clusters.
  overrideRules:
  - targetClusters:
      # Select clusters by name.
      clusters:
      - on-prem-1
      - edge-1
      # Select clusters by label.
      clusterSelector:
        region: us-east
        az: az1
      # Select clusters by affinity.
      clusterAffinity:
      - matchExpressions:
        - key: region
          operator: In
          values:
          - us-east
      # Change the container image in the target clusters using jsonpatch.
      overriders:
        jsonpatch:
        - path: "/spec/template/spec/containers/0/image"
          operator: replace
          value: "nginx:test"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Scheduler Extension&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Taking inspiration from kube-scheduler’s design, KubeAdmiral offers a flexible scheduling framework. It simplifies the scheduling process by dividing it into four distinct stages: &lt;strong&gt;Filter, Score, Select, and Replica.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Each stage is handled by individual plugins, creating a logical separation that promotes modularity. For instance, in the provided PropagationPolicy example above, most behaviors are implemented through built-in scheduling plugins. The beauty of this approach is that plugins can be easily added or removed, without any impact on the remaining plugins. This greatly simplifies the scheduler logic and reduces its overall complexity. Although the built-in plugins in KubeAdmiral offer versatile features that cater to common use cases, users have the flexibility to enhance the functionality by creating their own custom scheduling plugins for specific niche scenarios. This empowers users to seamlessly integrate with internal or existing systems. &lt;/p&gt;

&lt;p&gt;The KubeAdmiral scheduler interacts with external plugins via the HTTP protocol, enabling users to extend the scheduling logic with minimal effort and without having to modify the KubeAdmiral control plane. The plugin only needs to output the desired placement, and KubeAdmiral takes care of binding and enforcing those results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67gelcareqx1uxvi1pd5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67gelcareqx1uxvi1pd5.png" alt="Scheduler stages and plugins" width="800" height="343"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Scheduler stages and plugins&lt;/em&gt;&lt;/p&gt;
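&lt;p&gt;To make the idea concrete, a minimal external Filter plugin could be a small HTTP service. Note that the payload field names below (&lt;code&gt;candidates&lt;/code&gt;, &lt;code&gt;selected&lt;/code&gt;) are invented for illustration and do not reflect KubeAdmiral's actual webhook contract; consult the project documentation for the real schema.&lt;/p&gt;

```python
# Hypothetical sketch of an out-of-process Filter plugin.
# The request/response schema is illustrative, NOT KubeAdmiral's real contract.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def filter_clusters(candidates):
    """Keep only clusters labeled as GPU-capable (an example filtering rule)."""
    return [c["name"] for c in candidates
            if c.get("labels", {}).get("gpu") == "true"]

class FilterPlugin(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        resp = json.dumps({"selected": filter_clusters(body["candidates"])})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(resp.encode())

# To run as a standalone plugin process:
#   HTTPServer(("", 8080), FilterPlugin).serve_forever()
```

The control plane would POST candidate clusters to this endpoint and bind the workload only to the clusters returned in the response.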

&lt;ol start="3"&gt;
&lt;li&gt;Automatic Migration of Unschedulable Workloads&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For replica scheduling, KubeAdmiral calculates the number of replicas that each member cluster should receive and overrides the replicas field in the template before distributing the resources to the member clusters. After the resources are distributed to member clusters, the kube-scheduler in each member cluster assigns the corresponding pods to available nodes. Thus, a full scheduling chain is completed.&lt;/p&gt;

&lt;p&gt;Occasionally, there are cases where the kube-scheduler fails to find suitable nodes for pods due to reasons including node outages, resource shortages, and unmet node affinity requirements. If left unaddressed, the unschedulable pods will remain pending. KubeAdmiral resolves this by automatically migrating the unschedulable pods to other clusters, enabling better resource utilization overall.&lt;/p&gt;

&lt;p&gt;As an illustration, consider three clusters A, B, and C with an equal weight distribution for six replicas. After the initial scheduling by KubeAdmiral, each cluster receives two replicas. If the two replicas in cluster C fail to be scheduled by kube-scheduler after a while, KubeAdmiral automatically shifts them to clusters A and B, ensuring the desired availability of 6 replicas across all clusters.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cluster&lt;/th&gt;
&lt;th&gt;A&lt;/th&gt;
&lt;th&gt;B&lt;/th&gt;
&lt;th&gt;C&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Weight&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicas before scaling down&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Change&lt;/td&gt;
&lt;td&gt;-10&lt;/td&gt;
&lt;td&gt;-11&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicas after scaling down&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Dynamic Replica Distribution Based on Real-Time Resource Availability&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In a multi-cluster setup, the resource utilization of each cluster fluctuates as machines go online or offline. Relying solely on the static weight replica scheduling provided by KubeFed RSP can easily lead to skewed resource utilization. Clusters with a high deployment rate are prone to pod pending during upgrade, while clusters with a low deployment rate have idle resources that are wasted.&lt;/p&gt;

&lt;p&gt;As a solution to this, KubeAdmiral introduces dynamic weight scheduling based on real-time cluster resource utilization. It calculates the amount of available resources by collecting the total and allocated resources of each cluster, and uses it as the weight for replica scheduling. This ultimately achieves dynamic load balancing across all member clusters. In practice, we are able to maintain a stable deployment rate of 95-98% or above in all member clusters with this approach.&lt;/p&gt;
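&lt;p&gt;The idea can be sketched in a few lines of Python. This is a simplified model of dynamic-weight scheduling, not KubeAdmiral's actual implementation: each cluster's free capacity (total minus allocated, in abstract resource units) becomes its scheduling weight, and replicas are divided proportionally.&lt;/p&gt;

```python
# Simplified model of dynamic-weight replica scheduling (illustrative only).

def dynamic_weights(clusters):
    """Weight each cluster by its available capacity (total - allocated)."""
    return {name: max(total - allocated, 0)
            for name, (total, allocated) in clusters.items()}

def divide_replicas(replicas, weights):
    """Divide replicas proportionally to the weights, handing rounding
    leftovers to the clusters with the most spare capacity first."""
    total_weight = sum(weights.values())
    shares = {name: replicas * w // total_weight for name, w in weights.items()}
    leftover = replicas - sum(shares.values())
    for name in sorted(weights, key=weights.get, reverse=True):
        if leftover == 0:
            break
        shares[name] += 1
        leftover -= 1
    return shares

# Cluster B has twice the free capacity of A, so it receives twice the replicas.
clusters = {"A": (100, 60), "B": (100, 20)}  # (total, allocated)
print(divide_replicas(9, dynamic_weights(clusters)))  # → {'A': 3, 'B': 6}
```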

&lt;ol start="5"&gt;
&lt;li&gt;Refined Replica Rescheduling&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;KubeFed’s replica rescheduling algorithm usually results in less-than-ideal distributions for scaling operations. As an illustration, consider 30 replicas currently distributed to 3 member clusters A, B, and C with equal weights. If the workload is scaled down to 9 replicas, KubeFed has two possible behaviors depending on whether the user enables rebalance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If rebalance = false, KubeFed retains existing replicas, disregarding cluster weights.&lt;/li&gt;
&lt;li&gt;If rebalance = true, KubeFed disregards current distribution and rebalances replicas based on weights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As seen above, KubeFed is unable to devise a distribution that satisfies fault tolerance and load balancing requirements without compromising service availability. To address this, KubeAdmiral developed a refined replica rescheduling algorithm that guarantees service availability and produces distributions that are as close to the optimal distribution as possible. The gist of the algorithm is to distribute the increment or decrement in replicas, instead of the total replicas.&lt;/p&gt;

&lt;p&gt;Using the same scenario of scaling down from 30 replicas to 9 replicas above, the refined algorithm roughly proceeds as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Current distribution = [15, 15, 0]; total current replicas: 30&lt;/li&gt;
&lt;li&gt;Desired distribution = [3, 3, 3]; total desired replicas: 9&lt;/li&gt;
&lt;li&gt;Distance = desired – current = [-12, -12, 3]; total distance: -21&lt;/li&gt;
&lt;li&gt;For scaling down, remove any positive distance terms; distance = [-12, -12, 0]&lt;/li&gt;
&lt;li&gt;Distribute the total distance -21 using the distance vector [-12, -12, 0] as weights; adjustments = [-10, -11, 0]&lt;/li&gt;
&lt;li&gt;Final distribution = current + adjustments = [15, 15, 0] + [-10, -11, 0] = [5, 4, 0]&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cluster&lt;/th&gt;
&lt;th&gt;A&lt;/th&gt;
&lt;th&gt;B&lt;/th&gt;
&lt;th&gt;C&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Weight&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicas before scaling down&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Change&lt;/td&gt;
&lt;td&gt;-10&lt;/td&gt;
&lt;td&gt;-11&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicas after scaling down&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
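&lt;p&gt;The six steps above can be sketched as follows. This is a minimal reconstruction from the worked example, not KubeAdmiral's actual code; the function name and the tie-breaking rule for leftover removals are our assumptions, chosen so the output matches the adjustments [-10, -11, 0] in the table.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// rescheduleDown sketches the refined scale-down described above: it spreads
// the total decrement across clusters in proportion to how far each one sits
// above its desired share, instead of recomputing the whole distribution.
func rescheduleDown(current, desired []int) []int {
	n := len(current)
	dist := make([]int, n) // magnitude of required shrink per cluster
	var totalDist, totalDelta int
	for i := 0; i < n; i++ {
		d := desired[i] - current[i]
		totalDelta += d
		if d < 0 { // scaling down: ignore positive distance terms
			dist[i] = -d
			totalDist += -d
		}
	}
	decrement := -totalDelta // e.g. scaling 30 -> 9 replicas gives 21
	result := make([]int, n)
	copy(result, current)
	removed := 0
	for i := 0; i < n; i++ {
		take := decrement * dist[i] / totalDist // proportional floor share
		result[i] -= take
		removed += take
	}
	// Hand out the removals left over from flooring, one per cluster, in
	// order of largest remainder (later index wins ties -- an assumed rule
	// that reproduces the article's adjustments of [-10, -11, 0]).
	type rem struct{ idx, val int }
	var rems []rem
	for i := 0; i < n; i++ {
		if dist[i] > 0 {
			rems = append(rems, rem{i, decrement * dist[i] % totalDist})
		}
	}
	sort.Slice(rems, func(a, b int) bool {
		if rems[a].val != rems[b].val {
			return rems[a].val > rems[b].val
		}
		return rems[a].idx > rems[b].idx
	})
	for k := 0; removed < decrement; k++ {
		result[rems[k].idx]--
		removed++
	}
	return result
}

func main() {
	fmt.Println(rescheduleDown([]int{15, 15, 0}, []int{3, 3, 3})) // [5 4 0]
}
```

&lt;p&gt;Note that cluster C keeps zero replicas: because only the decrement is distributed, no replicas ever move to a cluster that had none, which is exactly how service availability is preserved during scale-down.&lt;/p&gt;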

&lt;ol&gt;
&lt;li&gt;Support for Native Kubernetes Resource API&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Unlike KubeFed, which requires users to use an incompatible “federated” API, KubeAdmiral caters to the usage habits of single-cluster Kubernetes users by providing support for native Kubernetes APIs. After the user creates a native resource (such as Deployment), KubeAdmiral’s federate-controller automatically converts it into an internal object for use by downstream KubeAdmiral controllers. This enables users to quickly transition from a single-cluster to a multi-cluster architecture with low onboarding cost.&lt;/p&gt;

&lt;p&gt;However, KubeAdmiral doesn’t stop there. In a single cluster, Kubernetes controllers update the status of resources to reflect their current state. For example, a Deployment’s status reflects its rollout progress and the number of replicas it currently has. Users or upper-layer systems often rely on such status. In a multi-cluster environment, the status is populated on the individual Deployments propagated to member clusters, so users must check the status of resources in each cluster individually, leading to a fragmented perspective and reduced operational efficiency.&lt;/p&gt;

&lt;p&gt;To solve this problem and seamlessly support native resources, KubeAdmiral introduces status aggregation. The KubeAdmiral status-aggregator collects and aggregates the status of individual resources from member clusters and writes it back to the native resources. This allows users to observe the global resource status at a glance.&lt;/p&gt;
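&lt;p&gt;Conceptually, the aggregation is a fold over per-cluster statuses. A minimal sketch, assuming a simplified two-field status (the real status-aggregator handles many more fields and conflict cases):&lt;/p&gt;

```go
package main

import "fmt"

// DeploymentStatus mirrors two of the fields a single-cluster controller
// populates on a Deployment.
type DeploymentStatus struct {
	Replicas      int32
	ReadyReplicas int32
}

// aggregate sums member-cluster statuses into one global view, roughly what
// a status aggregator would write back to the native resource.
func aggregate(statuses map[string]DeploymentStatus) DeploymentStatus {
	var out DeploymentStatus
	for _, s := range statuses {
		out.Replicas += s.Replicas
		out.ReadyReplicas += s.ReadyReplicas
	}
	return out
}

func main() {
	global := aggregate(map[string]DeploymentStatus{
		"cluster-a": {Replicas: 5, ReadyReplicas: 5},
		"cluster-b": {Replicas: 4, ReadyReplicas: 3},
	})
	fmt.Printf("%d/%d replicas ready\n", global.ReadyReplicas, global.Replicas) // 8/9 replicas ready
}
```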

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;KubeAdmiral has been incubating within ByteDance for a while, and has been an integral part of ByteDance’s internal PaaS platform TCE. Battle-tested by large-scale applications, it has accumulated many valuable practical experiences. To give back to the community, KubeAdmiral has officially been open-sourced on GitHub.&lt;/p&gt;

&lt;p&gt;Looking forward, we plan to continue working on KubeAdmiral, especially in the following areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continue to improve the orchestration and scheduling capabilities of stateful and job-like workloads, and develop advanced capabilities such as automatic migration and cost-based scheduling to embrace the new era of multi-cloud batch computing.&lt;/li&gt;
&lt;li&gt;Improve user experience and further reduce users’ cognitive burden, striving for a pleasant out-of-the-box experience.&lt;/li&gt;
&lt;li&gt;Improve observability, optimize logging and metrics, and enhance the scheduler’s explainability.&lt;/li&gt;
&lt;li&gt;Explore features such as one-click migration from a single cluster, further smoothing the onboarding experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multi-cluster orchestration and scheduling is not a simple topic. We hope our experience and solution could be useful to the community. We look forward to more friends joining the KubeAdmiral community, and welcome everyone to try KubeAdmiral and give us suggestions!&lt;/p&gt;

&lt;p&gt;GitHub repo: &lt;a href="https://github.com/kubewharf/kubeadmiral"&gt;https://github.com/kubewharf/kubeadmiral&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloudnative</category>
      <category>kubernetes</category>
      <category>cloud</category>
      <category>go</category>
    </item>
    <item>
      <title>Gödel Scheduler open-sourced: a unified scheduler for online and offline workloads</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Wed, 03 Apr 2024 09:16:53 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/godel-scheduler-open-sourced-a-unified-scheduler-for-online-and-offline-workloads-4a8i</link>
      <guid>https://dev.to/bytedanceoss/godel-scheduler-open-sourced-a-unified-scheduler-for-online-and-offline-workloads-4a8i</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Since its open-source release in 2014, Kubernetes has rapidly become the de facto standard for container orchestration. The infrastructure team at ByteDance adopted Kubernetes early on to build our private cloud platform. Over the years, ByteDance’s rapid growth across various business lines, including microservices, recommendation/advertising/search services, machine learning &amp;amp; big data, and storage, has led to a substantial increase in the demand for computing resources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ty5pkzccf8jyhtwlko3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ty5pkzccf8jyhtwlko3.png" alt="business lines" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Initially, ByteDance managed its online and offline workloads with separate resource pools, each dedicated to distinct business segments. To accommodate the surge in online business demands during significant holidays and major events, the infrastructure team usually needed to plan ahead by reallocating resources from offline to online pools to bolster the capacity for handling increased online activities. While this temporary fix satisfied immediate requirements, the inter-pool resource borrowing process proved to be time-consuming, operationally heavy, and inefficient. &lt;/p&gt;

&lt;p&gt;Furthermore, maintaining separate resource pools for online and offline workloads resulted in significant colocation costs, leaving little scope for enhancing resource utilization. &lt;/p&gt;

&lt;p&gt;Therefore, the infrastructure team sought to implement a unified system for scheduling and managing both online and offline workloads. This initiative aimed to facilitate resource pooling, enhance resource utilization and elasticity, optimize costs and user experiences, and alleviate operational burdens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practice of Unified Scheduling
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Enhancement beyond Kubernetes Default Scheduler:
&lt;/h3&gt;

&lt;p&gt;Since adopting Kubernetes extensively in 2018, ByteDance has continuously optimized various components of Kubernetes for functionality and performance. However, with the containerization of recommendation/advertising/search services in 2019, the native Kubernetes scheduler fell increasingly short of ByteDance’s business requirements in terms of both functionality and performance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;In terms of functionality&lt;/strong&gt;, more granular resource scheduling capabilities and flexible preemption strategies were required. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In terms of performance&lt;/strong&gt;, the native Kubernetes default scheduler could only achieve a scheduling throughput of around 10 pods per second in a cluster of 5000 nodes, often causing business upgrades to be bottlenecked, far from meeting the requirements. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, the team introduced a number of key optimizations to the Kubernetes default scheduler, including:&lt;/p&gt;

&lt;h4&gt;
  
  
  Functionality:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Extended the scheduling capabilities to support non-native resources, such as memory bandwidth and network bandwidth.&lt;/li&gt;
&lt;li&gt;Added support for micro-topology scheduling.&lt;/li&gt;
&lt;li&gt;Refactored preemption implementation by providing a pluggable preemption framework to support extending preemption capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Performance:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Optimized the data synchronization mechanism between Scheduler cache and Snapshot by refactoring data structure and further strengthening the concept of incremental update between snapshots.&lt;/li&gt;
&lt;li&gt;Cached scheduling results for homogenous scheduling units to reduce redundant calculations and improve efficiency.&lt;/li&gt;
&lt;li&gt;Optimized the preemption implementation by reorganizing preemption-related data structures, pruning the search space early, and reducing unnecessary computation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By implementing the aforementioned optimizations, we successfully enhanced our containerization capabilities to meet ByteDance’s rapidly expanding needs. This led to a remarkable 30-fold increase in scheduling throughput. That is, in a cluster comprising 10,000 nodes, we consistently achieved a scheduling rate of 300 pods per second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gödel Scheduler
&lt;/h2&gt;

&lt;p&gt;In 2020, ByteDance initiated a unified scheduling and resource management project for both online and offline business operations. The objective was to enhance overall resource utilization, improve operational efficiency, and reduce maintenance overheads. Initially, the plan involved managing both online and offline tasks through a single scheduling system. However, this approach presented challenges, primarily due to the intricate nature of offline scheduling, which differed markedly from online processes, especially in its demand for high throughput.&lt;/p&gt;

&lt;p&gt;The native Kubernetes scheduler, primarily designed for Pod-level scheduling, was somewhat limited in its support for more complex “Job” scheduling semantics and encountered performance limitations when dealing with these higher-level demands. To effectively address these unique requirements and to better align with the diverse operational needs of ByteDance, the decision was made to develop a bespoke, in-house distributed scheduler. This led to the creation of the Gödel Scheduler, specifically tailored to integrate with the Kubernetes system and to handle the demanding and varied scheduling needs of ByteDance’s expansive and evolving business landscape.&lt;/p&gt;

&lt;p&gt;The Gödel Scheduler is a distributed system crafted to consolidate the scheduling of both online and offline workloads. This scheduler is an enhancement of the Kubernetes (K8s) Scheduler, designed to augment scalability and improve scheduling quality. It is adept at fulfilling the diverse functional and performance demands of ByteDance’s online and offline operations. Key features of the Gödel Scheduler include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Optimistic Concurrency&lt;/strong&gt;: It incorporates optimistic concurrency concepts, moving the most time-consuming unit-to-node matching (filtering and scoring) to the scheduler component. This allows for concurrent execution and improves scheduling throughput in large-scale clusters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-Layer Scheduling Abstraction (Unit and Pod) and Framework&lt;/strong&gt;: This provides more flexible batch scheduling capabilities, better supporting offline operations while also improving scheduling throughput and system scalability. The extended framework handles special scenarios more effectively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rich Functionality and High Performance&lt;/strong&gt;: It meets the demands of various operations including online, offline (batch and stream), and training tasks, achieving true unified scheduling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility with the Kubernetes Ecosystem&lt;/strong&gt;: It can serve as an alternative to the K8s Scheduler, but due to performance and architectural optimizations, its framework interface is not entirely the same as the K8s Scheduler. However, its extensibility remains unaffected, and it can implement scheduling plugins like Kubernetes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is the architecture diagram of Gödel Scheduler.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklw95ffa3hhelotkwss5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklw95ffa3hhelotkwss5.png" alt="architecture diagram" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As outlined, the Gödel Scheduler consists of three primary components: the Dispatcher, the Scheduler, and the Binder. Key to its architecture is the Scheduler component, which is typically deployed in multiple shards to facilitate optimistic concurrency scheduling. This multi-shard deployment enhances its efficiency and scalability. On the other hand, the Dispatcher and the Binder are each deployed as single instances, a configuration that suits their specific roles and responsibilities within the Gödel Scheduler system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dispatcher
&lt;/h3&gt;

&lt;p&gt;The Dispatcher plays a pivotal role in managing application queuing, distribution, and node partitioning. It consists of several key components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sort Policy Manager&lt;/strong&gt;: This module handles the queuing of applications. Currently, it implements FIFO and DRF/FairShare queuing strategies, the latter still pending production use. Future enhancements will introduce more sophisticated queuing strategies, including those based on priority values.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dispatching Policy Manager&lt;/strong&gt;: Its primary function is to allocate applications across various Scheduler instances. At present, the LoadBalance strategy is employed as the default. Future updates aim to make this feature more versatile and plugin-based.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node Shuffler&lt;/strong&gt;: This component is tasked with organizing cluster nodes relative to the number of Scheduler instances. It assigns each node to a specific node partition, with each Scheduler instance overseeing one partition. During the scheduling process, a Scheduler first considers nodes within its partition before exploring nodes in other partitions. This arrangement is dynamically adjusted in response to changes in node availability or Scheduler count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partition Rules&lt;/strong&gt;: Currently, the system strives for an even distribution of nodes among Scheduler instances. Plans are underway to expand these partition strategies, enhancing their configurability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler Maintainer&lt;/strong&gt;: This element is responsible for monitoring the status of Scheduler instances. It tracks aspects like health status, workload, and the node count within each partition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reconciler&lt;/strong&gt;: Operating periodically, the Reconciler oversees the status of various elements like Pods, Nodes, Schedulers, and SchedulingUnits. It addresses any errors, discrepancies, or deficiencies, ensuring system integrity and performance. &lt;/li&gt;
&lt;/ol&gt;
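&lt;p&gt;To make the Node Shuffler's role concrete, here is a hypothetical round-robin sketch of partitioning nodes across scheduler shards. The function name and strategy are our illustration of the "even distribution" rule described above, not Gödel's actual implementation, which also rebalances as nodes and shards come and go.&lt;/p&gt;

```go
package main

import "fmt"

// partitionNodes spreads cluster nodes across scheduler instances so each
// shard owns roughly len(nodes)/shards nodes; a Scheduler first considers
// nodes in its own partition before looking elsewhere.
func partitionNodes(nodes []string, shards int) map[int][]string {
	parts := make(map[int][]string, shards)
	for i, n := range nodes {
		shard := i % shards // round-robin keeps partition sizes within one node of each other
		parts[shard] = append(parts[shard], n)
	}
	return parts
}

func main() {
	nodes := []string{"n1", "n2", "n3", "n4", "n5"}
	for shard, owned := range partitionNodes(nodes, 2) {
		fmt.Println("shard", shard, "owns", owned)
	}
}
```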

&lt;h3&gt;
  
  
  Scheduler
&lt;/h3&gt;

&lt;p&gt;The Scheduler plays a critical role in the decision-making process for scheduling and preempting applications, although it does not execute these decisions itself (that task is handled by the Binder). It operates on a two-tier framework: the Unit Scheduling Framework and the Pod Scheduling Framework. The entire scheduling procedure is segmented into three principal phases: Node Organizing, Unit Scheduling, and Unit Preempting.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Node Organizing&lt;/strong&gt;: This phase involves filtering and sorting nodes to streamline the scheduling process and enhance its quality. It consists of two types of plugins:

&lt;ul&gt;
&lt;li&gt;Locating Plugins: These plugins filter nodes based on specific application information.&lt;/li&gt;
&lt;li&gt;Grouping Plugins: These group nodes according to available resources or Job-level affinities.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unit Scheduling&lt;/strong&gt;: In this stage, nodes are matched and scored in alignment with application requests that have been filtered through the Node Organizing plugins. This process is analogous to the Kubernetes (K8s) Scheduler framework, encompassing:

&lt;ul&gt;
&lt;li&gt;Filtering Plugins: These plugins filter nodes based on the requirements of the application requests.&lt;/li&gt;
&lt;li&gt;Scoring Plugins: These assign scores to nodes that have been filtered in the previous step.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unit Preempting&lt;/strong&gt;: If suitable nodes are not found during the Unit Scheduling phase, the Scheduler progresses to the preemption phase. Here, it tries to free up resources by preempting running application instances to make room for new ones. This phase includes:

&lt;ul&gt;
&lt;li&gt;Victims Searching: Identifying potential victim applications that can be preempted.&lt;/li&gt;
&lt;li&gt;Candidates Sorting: Sorting both nodes and potential victims to identify the most appropriate choices for preemption.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Binder
&lt;/h3&gt;

&lt;p&gt;The Binder plays a crucial role in the final stages of the scheduling process, focusing on conflict detection, preemptive operations, and executing the binding of applications to resources. It comprises three main components: ConflictResolver, PreemptionOperator, and UnitBinder.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ConflictResolver&lt;/strong&gt;: This component is tasked with detecting concurrent conflicts in the scheduling process. It operates in two modes:

&lt;ul&gt;
&lt;li&gt;Cross Node Conflict Resolver: Checks for conflicts that might occur across different nodes.&lt;/li&gt;
&lt;li&gt;Single Node Conflict Resolver: Identifies conflicts within a single node.
If any conflict is detected, the application is immediately rejected and rescheduled.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PreemptionOperator&lt;/strong&gt;: In scenarios where no conflict exists but preemption is necessary, this operator takes charge. It executes the preemption by deleting the victims (applications or processes that need to be terminated to free up resources) and then awaits the final scheduling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UnitBinder&lt;/strong&gt;: This part of the Binder is responsible for the preparatory work required before binding, such as dynamically creating storage volumes, and then carries out the actual binding operation, linking applications to the designated resources.&lt;/li&gt;
&lt;/ol&gt;
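&lt;p&gt;Under optimistic concurrency, two scheduler shards can pick the same node at the same time, so the Binder must re-validate before binding. A minimal, hypothetical sketch of the single-node conflict check (names and the single-resource model are ours; Gödel's ConflictResolver covers far more cases):&lt;/p&gt;

```go
package main

import "fmt"

// resolveConflict re-validates a node's remaining capacity against the bind
// decisions that arrived concurrently. Requests that no longer fit are
// rejected and sent back for rescheduling.
func resolveConflict(nodeFreeMilliCPU int64, pendingBinds []int64) (accepted, rejected []int64) {
	for _, req := range pendingBinds {
		if req <= nodeFreeMilliCPU {
			nodeFreeMilliCPU -= req
			accepted = append(accepted, req)
		} else {
			// Rejected pods are immediately rescheduled, as described above.
			rejected = append(rejected, req)
		}
	}
	return accepted, rejected
}

func main() {
	// Node has 1000m free; two shards concurrently decided to bind 700m each.
	ok, back := resolveConflict(1000, []int64{700, 700})
	fmt.Println("bound:", ok, "rescheduled:", back) // bound: [700] rescheduled: [700]
}
```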

&lt;p&gt;Notably, the current version of the Binder integrates a PodGroup controller. This controller is responsible for managing the state and lifecycle of PodGroups. However, it’s important to note that in a future version we plan to remove this functionality from the Binder, transitioning it into an independent controller. &lt;/p&gt;

&lt;h2&gt;
  
  
  Experience
&lt;/h2&gt;

&lt;p&gt;Over the past two years, the Gödel Scheduler has been a cornerstone within ByteDance, offering a wealth of scheduling features and semantics. It has efficiently and reliably supported the operations of ByteDance’s diverse and complex business workloads.&lt;/p&gt;

&lt;p&gt;Building upon the foundation of architectural enhancements, ByteDance has implemented profound performance optimizations drawing from its experience with the Kubernetes native scheduler. Integrated with ByteDance’s internally refined Kubernetes system, the Gödel Scheduler now boasts an impressive throughput: 2,000+ pods/s in a single shard and 5,000+ pods/s across multiple shards. ByteDance’s ongoing efforts to expand single-cluster capacity have culminated in its largest production cluster exceeding 20,000 nodes and 1,000,000 pods.&lt;/p&gt;

&lt;p&gt;After years of thorough internal practice and enhancement within ByteDance, Gödel Scheduler has reached a state of relative stability. In 2023, SoCC, a premier cloud computing conference, accepted our paper on Gödel Scheduler, highlighting ByteDance’s unified approach to large-scale resource management and scheduling, and the R&amp;amp;D team was invited to present the work at the conference. For those interested, the Gödel Scheduler paper is available at &lt;a href="https://dl.acm.org/doi/10.1145/3620678.3624663"&gt;https://dl.acm.org/doi/10.1145/3620678.3624663&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With a commitment to contributing to the open-source community, the ByteDance team decided to open-source the Gödel Scheduler, offering a new scheduling solution that enhances cloud-native experiences for both online and offline services through its outstanding performance and comprehensive scheduling capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Work
&lt;/h2&gt;

&lt;p&gt;Looking ahead, ByteDance is committed to the continual development of the Gödel Scheduler, focusing on enriching its features and enhancing its scalability. A significant area of attention will be optimizing the scheduling throughput in specific challenging scenarios, such as those involving high rates of deployment and frequent preemptions. Through innovative rescheduling strategies, ByteDance aims to tackle the intricate balance between maintaining scheduling performance and enhancing its quality. The overarching goal is to not only preserve the current scheduling throughput but also to substantially elevate the quality of scheduling.&lt;/p&gt;

&lt;p&gt;Moreover, ByteDance places a high priority on ecosystem development. Efforts will be made to ensure Gödel Scheduler’s compatibility with leading systems and frameworks used in various business applications. This initiative will include integration with prominent big data and machine learning frameworks, accompanied by practical usage examples and comprehensive documentation.&lt;/p&gt;

&lt;p&gt;To keep the community engaged and informed, a detailed roadmap for the Gödel Scheduler will be methodically laid out and made available on the Gödel Scheduler Repository. This will provide an opportunity for interested parties to track progress, contribute, and become active participants in the project.&lt;/p&gt;

&lt;p&gt;While the Gödel Scheduler has undergone numerous iterations within ByteDance, been rigorously tested in various scenarios, and demonstrated its effectiveness, ByteDance acknowledges that there is still considerable potential for advancement in terms of generality and standardization. ByteDance warmly invites and encourages members of the community to join in the development of the Gödel Scheduler, believing that collaborative efforts will lead to even greater improvements and innovations.&lt;/p&gt;

&lt;p&gt;Gödel Scheduler Project Repository: &lt;a href="https://github.com/kubewharf/godel-scheduler"&gt;https://github.com/kubewharf/godel-scheduler&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>cloud</category>
      <category>showdev</category>
      <category>go</category>
    </item>
    <item>
      <title>Katalyst: A QoS-based resource management system for workload colocation on Kubernetes</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Mon, 01 Apr 2024 11:01:26 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/katalyst-a-qos-based-resource-management-system-for-workload-colocation-on-kubernetes-5g2j</link>
      <guid>https://dev.to/bytedanceoss/katalyst-a-qos-based-resource-management-system-for-workload-colocation-on-kubernetes-5g2j</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This Blog originally published on &lt;a href="https://gokatalyst.io/blog/2023/12/06/katalyst-a-qos-based-resource-management-system-for-workload-colocation-on-kubernetes/"&gt;Katalyst’s blog&lt;/a&gt; by Pengcheng Tang&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The resource usage of web applications tends to fluctuate with changes in the number of visitors, displaying noticeable tidal characteristics. To ensure stability, service providers often allocate resources for their applications according to resource usage during peak periods. These resources can easily be underutilized during off-peak hours.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d2j9gczww5g8h5bqyc7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d2j9gczww5g8h5bqyc7.png" alt="resource-provision" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If these idle resources can be reclaimed and temporarily allocated to lower-priority services and returned promptly to online services when needed, the overall resource utilization rate can be significantly improved.&lt;/p&gt;

&lt;h2&gt;
  
  
  ByteDance colocation practices
&lt;/h2&gt;

&lt;p&gt;ByteDance operates at a massive scale with diverse business types, encompassing various categories such as microservices, advertising, machine learning, big data, and storage. Typically, different business types have distinct resource management requirements at the infrastructure level. The conventional approach involves segmenting resource pools based on business lines or service types to meet customized demands.&lt;/p&gt;

&lt;p&gt;However, this method of resource pool segmentation can lead to resource silos, preventing flexible resource sharing and hindering the overall efficiency of resource utilization and cost optimization. It also adds to the operational burden of managing clusters.&lt;/p&gt;

&lt;p&gt;Furthermore, considering that different types of businesses have complementary SLO requirements and resource usage patterns, the infrastructure team aims to leverage these characteristics fully. They do so through scheduling and control mechanisms to optimize resource efficiency, achieve the convergence and consolidation of resource pools, and assist business teams in attaining lower resource costs and greater elasticity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqndbwv05dmn3bc43pr0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqndbwv05dmn3bc43pr0.png" alt="Types of workloads" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To achieve unified resource management, ByteDance began building a unified infrastructure based on Kubernetes in 2016. At the current stage, ByteDance has essentially completed the containerization of all microservices, advertising, and a significant portion of machine learning and big data businesses. Throughout this process, the infrastructure team has continued to explore resource optimization methods under a unified resource pool and gradually developed a resource pool deployment approach that combines ‘elastic scaling’ and ‘colocation.’&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Elastic Scaling&lt;/strong&gt;: This enables machine-level and NUMA-level resource time-sharing, combining business and system metrics to guide horizontal and vertical scaling strategies for application instances. This ultimately allows offline services to purchase more idle resources at a lower cost, and online services to purchase more peak-time resources at a higher cost through resource market-oriented operations, leading to overall efficiency improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Colocation&lt;/strong&gt;: It offers the ability to oversell resources, making full use of ‘sold but underutilized resources’ in the cluster to deploy more low-priority tasks. Simultaneously, we enhance resource isolation mechanisms across multiple dimensions such as CPU, memory, disk, and network at the system level. Minute-level control mechanisms, combined with intelligent load prediction algorithms, are adopted to ensure service stability according to their SLOs.&lt;/p&gt;

&lt;p&gt;This solution combines Kubernetes and Yarn systems for joint control. It runs control components of both Kubernetes and Yarn on the same machine, and coordinates the allocatable resources between the two systems through a central coordination component. On top of this joint control system, we achieve real-time resource estimation based on service resource profiles, ensuring more flexible and dynamic resource allocation while meeting various service SLA requirements.&lt;/p&gt;

&lt;p&gt;During the implementation of this colocation solution, the infrastructure team verified the feasibility of resource pooling, constructed the foundational capabilities for colocation, and improved the overall utilization of core clusters from 23% to 60%.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febxl6rwvfe1g4kb2oobb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febxl6rwvfe1g4kb2oobb.png" alt="Colocation Practices" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Katalyst: From Internal Validation to Open Source
&lt;/h2&gt;

&lt;p&gt;After undergoing extensive testing with businesses like Douyin and Toutiao, which have large-scale tidal traffic, ByteDance’s cloud-native colocation practices have matured. To help end users in the cloud-native community understand the principles behind large-scale colocation practices and improve their own overall resource efficiency, we have refactored and enhanced the resource management system in a Kubernetes-native manner and built “Katalyst”, which has now been officially open-sourced. The name “Katalyst” is derived from the word “catalyst,” and the “K” symbolizes its ability to provide enhanced automation for resource management for all workloads running within the Kubernetes ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is Katalyst
&lt;/h3&gt;

&lt;p&gt;Katalyst originated from ByteDance’s colocation practices and has been extended with broader resource management capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developed entirely within the context of hyperscale colocation practices, achieving true reuse of internal and external technology systems.&lt;/li&gt;
&lt;li&gt;Built on a plugin-based architecture, allowing users to customize modules such as scheduling and control strategies on top of the Katalyst framework.&lt;/li&gt;
&lt;li&gt;Provides one-click deployment templates and comprehensive operation manuals, reducing the understanding and deployment costs for end users.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Resource Abstraction
&lt;/h3&gt;

&lt;p&gt;The native Kubernetes Quality of Service (QoS) system does not meet the requirements of large-scale production environments, prompting Katalyst to build a QoS system of its own. Katalyst defines four QoS classes: dedicated_cores, shared_cores, reclaimed_cores, and system_cores. Users can assign a QoS class to each application according to its QoS requirements. Based on our practice at ByteDance, CPU is in most scenarios the dominant resource affecting application performance, and users tend to express QoS requirements in terms of CPU as well. So although a QoS requirement encompasses various kinds of resources (CPU, memory, disk I/O, network bandwidth, etc.), we named the classes after CPU cores. Each class is accompanied by various enhancement mechanisms (e.g., whether NUMA node binding is required, or whether network affinity and bandwidth restrictions are necessary), enabling differentiated resource allocation and control strategies.&lt;/p&gt;
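&lt;p&gt;The mapping from a workload's requirements to one of these four classes can be sketched as follows. This is an illustrative sketch only: the four class names come from Katalyst, but the function and its parameters are hypothetical, not part of Katalyst's API.&lt;/p&gt;

```python
# Illustrative only: how a platform team might map workload requirements to
# Katalyst's four QoS classes. The class names are Katalyst's; the function
# and its parameters are hypothetical.
def pick_qos_class(latency_sensitive: bool,
                   needs_exclusive_cpus: bool,
                   is_system_daemon: bool = False) -> str:
    if is_system_daemon:
        return "system_cores"      # core system components
    if latency_sensitive and needs_exclusive_cpus:
        return "dedicated_cores"   # latency-critical, e.g. NUMA-bound services
    if latency_sensitive:
        return "shared_cores"      # online services sharing a CPU pool
    return "reclaimed_cores"       # best-effort/offline jobs on reclaimed resources

# An online web service that tolerates sharing a CPU pool:
print(pick_qos_class(latency_sensitive=True, needs_exclusive_cpus=False))  # shared_cores
```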

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvtfio1fes2wo1k90isb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvtfio1fes2wo1k90isb.png" alt="QoS Classes" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Through this abstract resource model, Katalyst provides users with a unified resource entry point. Users can accurately express their specific needs by mapping business services to the appropriate QoS class and sales model based on business requirements. This ultimately allows users to obtain resources from a unified resource pool without needing to delve into the underlying details.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51molk23jbzzr9yxtzmp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51molk23jbzzr9yxtzmp.png" alt="Node View" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Overall Architecture
&lt;/h3&gt;

&lt;p&gt;In the early stages, the colocation architecture had several issues. Although the joint control of the Kubernetes and Yarn systems achieved colocation of online and offline businesses, the complexity of the system incurred high maintenance costs. The architecture also carried resource overhead from the agents running on each node: while the consumption on an individual node is not significant, the accumulated overhead in hyperscale clusters can be substantial. Moreover, running two control systems increased complexity, and an abnormality at any stage could lead to resource calculation errors. In Katalyst, we optimized and refactored the overall colocation architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F590i78slgt211o3p1ut2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F590i78slgt211o3p1ut2.jpg" alt="Overall Architecture" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the control layer, we consolidated the dual Kubernetes/Yarn control system into a single Kubernetes-based one. Specifically, we retained the API entry points for both Kubernetes and Yarn at the access layer, while unifying metadata management and resource control within Katalyst, which is Kubernetes-native.&lt;/p&gt;

&lt;p&gt;At the scheduling layer, Katalyst implemented a coordinated resource scheduling and control mechanism between “centralized scheduling” and “node resource management” based on unified metadata.&lt;/p&gt;

&lt;p&gt;On the node side: Katalyst extends Kubernetes with a module named QoS Resource Manager (QRM). This module enables plugin-based, topology-aware resource allocation at the node level and reports the topology to the control plane through custom CRDs, facilitating scheduling. At runtime, Katalyst continuously estimates resource allocations based on system metrics, service-level indicators, and the QoS requirements of the pods. The allocation result is then dynamically applied through the Container Runtime Interface (CRI) by the QRM reconcile loops. Both the resource estimation algorithm and the QRM implementation can be customized through plugins, making resource control strategies adaptable to different business scenarios.&lt;/p&gt;
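&lt;p&gt;The estimation step above can be sketched in heavily simplified form: reclaimable capacity is what remains of the node's allocatable resources after predicted online usage and a safety buffer. All names and the naive peak predictor below are illustrative assumptions, not Katalyst's actual algorithm.&lt;/p&gt;

```python
# Heavily simplified sketch of the idea behind runtime resource estimation:
# reclaimable = allocatable - predicted online usage - safety buffer.
# Names and the naive max() predictor are illustrative, not Katalyst's API.
def estimate_reclaimable_cpu(allocatable: float,
                             online_usage_samples: list[float],
                             buffer_ratio: float = 0.1) -> float:
    predicted_peak = max(online_usage_samples)  # naive peak predictor
    buffer = allocatable * buffer_ratio         # headroom for online bursts
    return max(0.0, allocatable - predicted_peak - buffer)

# A node with 64 allocatable cores whose online pods peaked at 30 cores:
print(round(estimate_reclaimable_cpu(64, [22.5, 30.0, 27.1]), 2))  # 27.6
```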

&lt;p&gt;On the scheduling side: Katalyst extends the scheduler with richer capabilities through the scheduler framework. During scheduling, it takes into account how applications of different QoS classes should be allocated and share resources when running in the same cluster. The scheduler also combines real-time data and service profiles to rebalance workloads dynamically across the entire cluster, reducing cluster vacancy rates and enhancing business stability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5mqo0fe70wmzmqui1yn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5mqo0fe70wmzmqui1yn.png" alt="QoS Resource Manager" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lastly, under a unified control system, we can fully leverage Kubernetes’ advantages in API design. By decoupling internal systems and generalizing control strategies through custom CRDs, we are able to iteratively improve the system through a plugin-based approach, achieving true convergence between internal and external systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;p&gt;Katalyst, as a resource management system, has colocation as one of its core business scenarios. In addition to abstracting the core concepts mentioned above, we have provided and planned various QoS capabilities for Katalyst:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-grained resource estimation strategies: Katalyst supports multiple resource estimation strategies including heuristics, unsupervised learning, and QoS-aware algorithms, improving resource utilization by accurately calculating and predicting the amount of resources that can be reclaimed from the nodes.&lt;/li&gt;
&lt;li&gt;Multi-dimensional resource isolation capabilities: Using technologies such as cgroup, RDT, iocost, tc, etc., Katalyst achieves effective isolation of various resources, including CPU, memory, disk, and network, in different colocation scenarios.&lt;/li&gt;
&lt;li&gt;Multi-level load eviction strategies: Katalyst supports multi-level eviction strategies based on various metrics, ensuring online business QoS while maximizing offline business QoS.&lt;/li&gt;
&lt;/ul&gt;
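&lt;p&gt;The multi-level eviction idea can be illustrated with a toy ordering policy: pods in lower-priority QoS classes are evicted first, and within a class, heavier consumers go first. This is a sketch of the concept only, not Katalyst's implementation; all names here are hypothetical.&lt;/p&gt;

```python
# Toy sketch of multi-level eviction ordering (not Katalyst's implementation):
# evict lower-priority QoS classes first; within a class, heaviest users first.
QOS_PRIORITY = {"reclaimed_cores": 0, "shared_cores": 1,
                "dedicated_cores": 2, "system_cores": 3}

def eviction_order(pods):
    # pods: list of (name, qos_class, cpu_usage) tuples
    return sorted(pods, key=lambda p: (QOS_PRIORITY[p[1]], -p[2]))

pods = [("web", "shared_cores", 4.0),
        ("batch-a", "reclaimed_cores", 8.0),
        ("batch-b", "reclaimed_cores", 2.0)]
print([name for name, _, _ in eviction_order(pods)])  # ['batch-a', 'batch-b', 'web']
```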

&lt;p&gt;Besides colocation, Katalyst also provides enhanced resource management capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recommendation and autoscaling: Katalyst provides enhanced VPA/HPA capabilities and advanced recommendation algorithms. This helps end users make more accurate estimations of pod resource requests/limits or replica counts, hence improving deployment rates and resource utilization.&lt;/li&gt;
&lt;li&gt;Tidal (exclusive-mode) colocation: While colocating online and offline applications on the same node yields greater resource-efficiency gains, it requires all the infrastructure intricacies (e.g., resource isolation, scheduling) to work smoothly, which makes the overall system complicated. Katalyst therefore also provides an exclusive-mode colocation in which resources are reclaimed at node granularity, so that a given node runs either online or offline applications at any one time, but not both. This allows users to improve resource efficiency at a lower operational cost.&lt;/li&gt;
&lt;li&gt;Node overcommitment: With node overcommitment, Katalyst allows the scheduler to schedule more pods onto a node without end users being aware. Meanwhile, Katalyst applies methods such as interference detection and mitigation and node resource prediction to guarantee the QoS requirements of higher-priority tasks.&lt;/li&gt;
&lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;
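&lt;p&gt;The overcommitment arithmetic can be sketched as follows: if pods on a node historically use only a fraction of what they request, the node can advertise proportionally more capacity, bounded by a safety cap. The ratio derivation and all names here are illustrative assumptions, not Katalyst's actual formula.&lt;/p&gt;

```python
# Illustrative sketch of node overcommitment: derive an overcommit ratio from
# how much of their requests pods actually used at peak, then advertise
# inflated capacity to the scheduler. Not Katalyst's actual formula.
def overcommit_ratio(peak_used_fraction: float, cap: float = 2.0) -> float:
    # If pods peaked at 80% of requests, we could advertise up to 1.25x;
    # the cap bounds the ratio for safety.
    return min(cap, 1.0 / max(peak_used_fraction, 1e-6))

def advertised_cpus(physical_cpus: float, peak_used_fraction: float) -> float:
    return physical_cpus * overcommit_ratio(peak_used_fraction)

print(advertised_cpus(64, 0.8))   # 80.0  (ratio 1.25)
print(advertised_cpus(64, 0.25))  # 128.0 (ratio capped at 2.0)
```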

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbg5mvdfedwbf8nbk32bz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbg5mvdfedwbf8nbk32bz.png" alt="Resource Efficiency Suite" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For detailed plans, please refer to the roadmap. We also recently gave a brief introduction to Katalyst at KubeCon China; see &lt;a href="https://www.bilibili.com/video/BV1bc411R7xQ/?spm_id_from=333.999.0.0&amp;amp;vd_source=c09f0713b2507369924e94f4fec6c133"&gt;our talk&lt;/a&gt; for more information.&lt;/p&gt;

&lt;p&gt;While colocation has undergone several iterations within ByteDance, a universal, standardized platform foundation must be refined through various scenarios. We look forward to your participation in the Katalyst community and sharing your scenarios and requirements for colocation, resource efficiency improvement and so forth.&lt;/p&gt;

&lt;p&gt;GitHub | &lt;a href="https://github.com/kubewharf/katalyst-core"&gt;https://github.com/kubewharf/katalyst-core&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>opensource</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Monorepos with Rspack Just Makes Things Easier</title>
      <dc:creator>Josh Alphonse</dc:creator>
      <pubDate>Tue, 06 Feb 2024 23:17:00 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/monorepos-with-rspack-just-makes-things-easier-45l3</link>
      <guid>https://dev.to/bytedanceoss/monorepos-with-rspack-just-makes-things-easier-45l3</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6dptumUk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://bytedance.us.larkoffice.com/space/api/box/stream/download/asynccode/%3Fcode%3DMGU2NzFjNzE3Y2MyZjlhZGE2ODQzN2Q2ZDZlYjgzZGVfcm5QbW1JeVBVblVDaG1jUWdSOFlNVW5aNTUyQjR0WWZfVG9rZW46VmtOYWJhaEJXb0dZd1h4YnlJM3VZU2NNc2tJXzE3MDcyNjEzNDY6MTcwNzI2NDk0Nl9WNA" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6dptumUk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://bytedance.us.larkoffice.com/space/api/box/stream/download/asynccode/%3Fcode%3DMGU2NzFjNzE3Y2MyZjlhZGE2ODQzN2Q2ZDZlYjgzZGVfcm5QbW1JeVBVblVDaG1jUWdSOFlNVW5aNTUyQjR0WWZfVG9rZW46VmtOYWJhaEJXb0dZd1h4YnlJM3VZU2NNc2tJXzE3MDcyNjEzNDY6MTcwNzI2NDk0Nl9WNA" alt="" width="800" height="306"&gt;&lt;/a&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ll4Z7JJo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://bytedance.us.larkoffice.com/space/api/box/stream/download/asynccode/%3Fcode%3DZTliY2FlZWI2MzA3YmI1MWRiZTc4ZDcwZWI2OTNkYzdfTkpxVktMM2pndzZwQWhDQjNwRTBrMk5PS1dhdGhnZE5fVG9rZW46TUNnamJ2dEhab2RDSFZ4U0VxNnV4YWdrc0ZiXzE3MDcyNjEzNDc6MTcwNzI2NDk0N19WNA" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ll4Z7JJo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://bytedance.us.larkoffice.com/space/api/box/stream/download/asynccode/%3Fcode%3DZTliY2FlZWI2MzA3YmI1MWRiZTc4ZDcwZWI2OTNkYzdfTkpxVktMM2pndzZwQWhDQjNwRTBrMk5PS1dhdGhnZE5fVG9rZW46TUNnamJ2dEhab2RDSFZ4U0VxNnV4YWdrc0ZiXzE3MDcyNjEzNDc6MTcwNzI2NDk0N19WNA" alt="" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;When it comes to web dev, it's hard to keep up with all the latest trends. I also can't come up with many examples of ecosystems that move faster than JavaScript's. It feels as if a new version of the iPhone is released every month. However, just because things are moving fast, doesn't mean you have to be left in the dust.&lt;/p&gt;

&lt;p&gt;Monorepos are one of the hottest additions to the scene, especially for web developers. One of the biggest challenges when building a full-stack app is managing complexity: the larger your application becomes, the more complex it gets, no matter how big your organization or team may be. You've probably experienced this kind of struggle if you're building a product with many different front-end codebases that share backends and interfaces.&lt;/p&gt;

&lt;p&gt;The way orgs are solving this issue now is with monorepos. Monorepos help mitigate the confusion that comes with managing multiple repositories for multiple distinct projects or products: a monorepo consolidates your projects into one repository. Contrary to the name, monorepos are technically not monolithic. A common misconception is that if you use a monorepo, every project in the repository has to be released on the same day. This actually isn't the case. Just because we develop our code in the same place doesn't mean we have to deploy everything at the same time, nor to the same place. Since your codebases can live in one repository, common tasks like code sharing and refactoring become much easier, which significantly lowers the cost of creating libs, microservices, and microfrontends. If you want true development flexibility, give this approach a try. This is why some of the biggest companies are adopting monorepos.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rspack
&lt;/h3&gt;

&lt;p&gt;So, where does &lt;a href="https://www.rspack.dev/"&gt;Rspack&lt;/a&gt; come into play here?&lt;/p&gt;

&lt;p&gt;As you may know, Rspack is a high-performance web bundler that offers interoperability with the webpack ecosystem, but also with build systems like &lt;a href="https://nx.dev/"&gt;Nx&lt;/a&gt; for building smart monorepos! Rspack brings several key benefits when used in conjunction with NX, primarily by enhancing the performance of the development and build processes. Here are a few ways Rspack can positively impact an NX-powered project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Speeding up builds:&lt;/strong&gt; Rspack is known for its quick bundling capabilities. In complex NX workspaces with multiple applications and libraries, faster bundling translates to quicker build times. This is especially beneficial in large-scale projects, where build times can significantly impact developer productivity and continuous integration/deployment pipelines.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Reducing setup time:&lt;/strong&gt; Rspack has a reputation for requiring less configuration out of the box than other bundlers, which simplifies the setup process. In an NX workspace, this means you can get applications up and running more quickly, and spend less time adjusting bundler configurations and more time developing features.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Efficient code splitting:&lt;/strong&gt; Rspack facilitates code splitting, so each page or feature can have its own bundle. This ensures that users only download the code necessary for the page they're visiting, improving load times and enhancing the user experience.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Providing Hot Module Replacement (HMR):&lt;/strong&gt; While NX manages the overall structure and dependencies in your monorepo, Rspack provides Hot Module Replacement for a smoother development experience. HMR applies changes to a running application without a full reload, preserving application state and speeding up development.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;Although monorepos can solve a bunch of problems for applications of any size, there is one feature in particular that I'm going to highlight in this blog post: the ability to manage aliases with Rspack and Nx.&lt;/p&gt;

&lt;p&gt;Defining aliases with Rspack in an NX monorepo provides multiple benefits and solves several issues commonly faced in large-scale projects. Let's take a look before we dive into the tutorial:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Simplified Imports&lt;/strong&gt;: With aliases, you can avoid long and complicated relative paths that are hard to read and maintain. Aliases provide a clear, concise way to import modules.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Without alias... How do you even know where your functinos are?! Half the time in guesssing.
import utility from '../../../../libs/utils/src/utility';

// With alias
import utility from '@utils/utility';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Easier&lt;/strong&gt; &lt;strong&gt;Refactoring&lt;/strong&gt;: When you decide to rearrange your project's folder structure, you won't need to update every import statement. Instead, you only update the alias paths in the rspack configuration.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Code Readability&lt;/strong&gt;: Aliases can help signify the intent or the origin of the imported module more clearly than relative paths.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Consistency Across Apps and Libs&lt;/strong&gt;: In a monorepo, you typically have multiple apps and libraries. Aliases ensure that every part of your project refers to shared libs in the same way.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Namespace&lt;/strong&gt; &lt;strong&gt;Clarity&lt;/strong&gt;: By using aliases that represent features or shared libraries (like &lt;code&gt;@feature-a&lt;/code&gt; or &lt;code&gt;@shared&lt;/code&gt;), you provide a clear namespace. This indicates that the import is from a shared source and not a local module.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Avoid File Path Errors&lt;/strong&gt;: As projects grow, relative paths become more prone to errors when files are moved or when the developer is unsure of the current file's depth in the directory structure.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Enhanced Autocompletion&lt;/strong&gt;: Many IDEs can provide improved autocompletion for imports when aliases are set up properly, making development faster and reducing typos.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Collaboration Enhancement&lt;/strong&gt;: When working with a team, aliases ensure that everyone is using the same paths to common resources, reducing the cognitive load of understanding where a file is located within the project hierarchy.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Integration with Build Tools&lt;/strong&gt;: Tools like rspack can understand these aliases and use them to resolve the actual bundle paths during the build process, ensuring the generated bundles reflect the same structure and readability of the source code.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Decoupling the Code&lt;/strong&gt;: Aliases can help in abstracting the actual file system paths, which means developers can think more about architecture and less about the file system.&lt;/li&gt;
&lt;/ol&gt;
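&lt;p&gt;Under the hood, alias resolution is just a prefix rewrite on import specifiers. The following sketch (in Python purely for illustration; Rspack implements this natively in Rust) shows the idea, including why the longest alias prefix must win:&lt;/p&gt;

```python
# Language-agnostic sketch of what a bundler's alias resolution does:
# rewrite an aliased prefix of an import specifier into a real path.
# Purely illustrative; this is not Rspack's actual resolver.
ALIASES = {"@utils": "libs/utils/src", "@shared": "libs/shared/src"}

def resolve(specifier: str) -> str:
    # try the longest alias first so e.g. "@utils-extra" never matches "@utils"
    for alias in sorted(ALIASES, key=len, reverse=True):
        if specifier == alias or specifier.startswith(alias + "/"):
            return ALIASES[alias] + specifier[len(alias):]
    return specifier  # not aliased: leave relative/package imports untouched

print(resolve("@utils/utility"))  # libs/utils/src/utility
print(resolve("./local-module"))  # ./local-module (unchanged)
```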




&lt;h3&gt;
  
  
  Tutorial
&lt;/h3&gt;

&lt;p&gt;Alright, so now that we have some base knowledge of what monorepos are, let's jump into a tutorial. This time around, we're going to set up a monorepo for an e-commerce store using NX and Rspack. We'll configure aliases to simplify module resolution, keeping our import statements clean and readable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Before we get started
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Prerequisites&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;  At the minimum, a basic understanding of Javascript/Typescript and react&lt;/li&gt;
&lt;li&gt;  Node.js, yarn, or pnpm&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://nx.dev/getting-started/installation#installing-nx-globally"&gt;Install Nx globally&lt;/a&gt; depending on your package manager. I'm using &lt;code&gt;npm add --global nx@latest&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  Create a project with &lt;code&gt;npx create-nx-workspace myrspackapp --preset=@nx/rspack&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Step 1: Creating an NX Workspace
&lt;/h4&gt;

&lt;p&gt;First, we need to create a new NX workspace by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx create-nx-workspace monrepo-example --preset=@nx/rspack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Follow the interactive prompts to finish setup. Once the setup is complete, navigate into your new workspace directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd monorepo-example
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2: Adding Applications and Libraries
&lt;/h4&gt;

&lt;p&gt;With NX, you can have multiple apps and libraries co-existing in a single monorepo. Since we installed Nx globally, we can use the nx command to generate our libraries.&lt;/p&gt;

&lt;p&gt;For this example, let's add an application and two libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nx generate @nrwl/react:application store-front
nx generate @nrwl/react:library ui-shared --directory=shared
nx generate @nrwl/react:library product-data --directory=shared
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will set up a React application (&lt;code&gt;store-front&lt;/code&gt;) and two shared libraries (&lt;code&gt;ui-elements&lt;/code&gt; and &lt;code&gt;product-services&lt;/code&gt;) under the &lt;code&gt;libs/shared/&lt;/code&gt; directory.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3: Configuring Aliases in rspack.config.js
&lt;/h4&gt;

&lt;p&gt;Create a &lt;code&gt;rspack.config.js&lt;/code&gt; file at the root of your workspace if it doesn't exist, and define your aliases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const path = require('path');

module.exports = (config) =&amp;gt; {
  config.resolve.alias = {
    ...config.resolve.alias,

    // For that beautiful UI:
    '@ui-elements': path.resolve(__dirname, 'libs/shared/ui-elements/src'),

    // For the business logic behind products:
    '@product-services': path.resolve(__dirname, 'libs/shared/product-services/src'),

    // Continue as your project grows...
  };
  return config;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 4: Using Aliases in Your Code
&lt;/h4&gt;

&lt;p&gt;Now that we have our aliases, we can use them within our applications and libraries. For example, in our &lt;code&gt;store-front&lt;/code&gt; app, we can now import from our shared libraries like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { CheckoutButton, StoreBanner } from '@ui-elements';
import { fetchAllProducts } from '@product-services';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 5: Building the Application
&lt;/h4&gt;

&lt;p&gt;To build your application with the new alias configurations, simply use the NX build command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nx build store-front
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;NX will invoke Rspack with your custom configuration including the aliases, and you should see a successful build output.&lt;/p&gt;

&lt;h4&gt;
  
  
  Conclusion
&lt;/h4&gt;

&lt;p&gt;By using aliases with Rspack in an NX monorepo, you're laying down a scalable, maintainable foundation that streamlines development workflows and helps manage complexity as your project grows. Keep an eye out for more content, and if you have any questions or want to join the community, find us on &lt;a href="https://github.com/web-infra-dev/rspack"&gt;GitHub&lt;/a&gt; and &lt;a href="https://discord.gg/4wXUpdrK2z"&gt;join us on our ByteDance Open Source Discord server!&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>BMF 📹 + Hugging Face🤗, The New Video Processing BFFs</title>
      <dc:creator>Josh Alphonse</dc:creator>
      <pubDate>Thu, 01 Feb 2024 01:27:15 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/bmf-hugging-face-the-new-video-processing-bffs-59m8</link>
      <guid>https://dev.to/bytedanceoss/bmf-hugging-face-the-new-video-processing-bffs-59m8</guid>
      <description>&lt;p&gt;&lt;em&gt;TL;DR&lt;/em&gt;&lt;em&gt;if you want to test this tutorial before we start, try it out&lt;/em&gt; &lt;em&gt;&lt;a href="https://colab.research.google.com/drive/1eQxiZc2vZeyOggMoFle_b0xnblupbiXd?usp=sharing"&gt;here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/"&gt;Hugging Face&lt;/a&gt; has created a major shift in the AI community. It fuels cutting-edge open source machine learning/AI models and datasets. The Hugging Face community is thriving with great ideas and innovations to the point where the possibilities seem endless.&lt;/p&gt;

&lt;p&gt;Hugging Face is revolutionizing Natural Language Processing (NLP) with state-of-the-art solutions for tasks like translation, summarization, sentiment analysis, and contextual understanding. Its arsenal of pre-trained models makes it a robust platform for diverse NLP tasks, streamlining the integration of machine learning functionalities. Hugging Face simplifies the training, evaluation, and deployment of models with a user-friendly interface. The more I used Hugging Face in my own personal projects, the more I felt inspired to combine it with &lt;a href="https://babitmf.github.io/"&gt;Babit Multimedia Framework (BMF)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're reading this and are not familiar with BMF, it's a cross-platform multimedia processing framework by ByteDance Open Source. Currently, BMF is used to process over 2 billion videos a day across multiple social media apps. Can this get complex? Yes, it sure can. However, in this article, I'll break it all down, so you know how to create unique experiences across any type of media platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why&lt;/strong&gt; &lt;strong&gt;BMF&lt;/strong&gt;&lt;strong&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;BMF stands out with its multilingual support, putting it ahead in the video processing game. BMF excels in various scenarios like video transcoding, editing, videography, and analysis. The integration of advanced technologies like Hugging Face with BMF is a game-changer for complex multimedia processing challenges.&lt;/p&gt;

&lt;p&gt;Before we get started with the tutorial, let me share with you some ideas I envision coming to life with BMF + Hugging Face:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Multimedia Content Analysis:&lt;/strong&gt; Leveraging Hugging Face's NLP models, BMF can delve deep into textual data associated with multimedia content, like subtitles or comments, for richer insights.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Accessibility:&lt;/strong&gt; NLP models can automatically generate video captions, enhancing accessibility for the hard-of-hearing or deaf community.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Content Categorization and Recommendation:&lt;/strong&gt; These models can sort multimedia content based on textual descriptions, paving the way for sophisticated recommendation systems.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Enhanced User Interaction:&lt;/strong&gt; Sentiment analysis on user comments can offer valuable insights into user engagement and feedback for content improvement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What now?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Open Source AI is creating the building blocks of the future. Generative AI impacts all industries, and this leads me to think about how generative AI can impact the future of broadcasting and video processing. I experimented with BMF and Hugging Face to create the building blocks for a broadcasting service that uses AI to create unique experiences for viewers. So, enough about the background, let's get it going!&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What we'll build&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Follow along as we build a video processing pipeline with BMF that uses the &lt;a href="https://huggingface.co/runwayml/stable-diffusion-v1-5"&gt;runwayml/stable-diffusion-v1-5&lt;/a&gt; model to generate an image and overlay it on top of an encoded video. If that didn't make sense, don't worry, here's a picture for reference:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--O_ajNTgr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://bytedance.us.larkoffice.com/space/api/box/stream/download/asynccode/%3Fcode%3DODQ1NDg1YWEyYjg0MjM1NjhkNWIzNTgzZTIyODkxNzFfZE5TNkJsNnRVWUtqakJPd2hnWlNhYmd1UG9LN0l3WmNfVG9rZW46UmE1aGJ4OFJYb0p5dkF4MXVTRHVqamZDc2tlXzE3MDY3NTA2MjQ6MTcwNjc1NDIyNF9WNA" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--O_ajNTgr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://bytedance.us.larkoffice.com/space/api/box/stream/download/asynccode/%3Fcode%3DODQ1NDg1YWEyYjg0MjM1NjhkNWIzNTgzZTIyODkxNzFfZE5TNkJsNnRVWUtqakJPd2hnWlNhYmd1UG9LN0l3WmNfVG9rZW46UmE1aGJ4OFJYb0p5dkF4MXVTRHVqamZDc2tlXzE3MDY3NTA2MjQ6MTcwNjc1NDIyNF9WNA" alt="" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So why is this significant? The image of the panda is AI-generated, and combined with BMF, we can feed it through a processing pipeline that places it on top of our video. Think about it! You could be building a video broadcasting service where, during live streams, you'd like to generate images from a simple prompt and quickly display them for your audience. Or you could be using BMF to edit your videos and want to add some AI-generated art. This tutorial is just one example: BMF combined with models created by the Hugging Face community opens up a whole new world of possibilities.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Let's Get Started&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  A GPU (I'm using a Google Colab A100 GPU. You can also use V100 or T4 GPUs, but they will just run a bit slower)&lt;/li&gt;
&lt;li&gt;  Install &lt;a href="https://babitmf.github.io/docs/bmf/getting_started_yourself/install/#pip"&gt;BMFGPU&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Python 3.9-3.10 (strictly required to work with BMF)&lt;/li&gt;
&lt;li&gt;  FFMPEG&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find all the BMF installation docs &lt;a href="https://babitmf.github.io/docs/bmf/getting_started_yourself/install/#ffmpeg"&gt;here&lt;/a&gt;. The docs will highlight more system requirements if you decide to run things locally.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Getting Started&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Begin by ensuring that essential toolkits like Hugging Face Transformers and BMF are installed in your Python environment. We'll use pip for installation in the steps below.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Initial Setup&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; First, we'll clone the following repository to get the video we want to process. (If you're coding along and want to use your own video, create your own repo and add a video file, preferably a short one, so you can clone it just like I did. You can also just save the video to the directory you're coding in.)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/Joshalphonse/Bmf-Huggingface.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt; Install BabitMF-GPU to accelerate your video processing pipeline with BMF's GPU capabilities
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install BabitMF-GPU
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt; Install the following dependencies
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install requests diffusers transformers torch accelerate scipy safetensors moviepy Pillow tqdm numpy modelscope==1.4.2 open_clip_torch pytorch-lightning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt; Install FFmpeg. The BMF framework uses FFmpeg's video decoders and encoders as built-in modules for video decoding and encoding, so a supported FFmpeg installation is required before using BMF. After installing, the next two commands verify that the package and binary are present.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install ffmpeg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dpkg -l | grep -i ffmpeg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ffmpeg -version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The package below is installed to show the BMF C++ logs in the Colab console; otherwise only Python logs are printed. This step isn't necessary if you're not in a Colab or IPython notebook environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install wurlitzer
%load_ext wurlitzer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt; Add the cloned repository's directory to Python's module search path. We'll need this path later on.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sys
sys.path.insert(0, '/content/Bmf-Huggingface')
print(sys.path)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Creating the Module&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now it's time for the fun part. We'll create a module to process the video. Here's the module I created, and I'll break it down for you below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import bmf
from bmf import bmf_sync, Packet
from bmf import SubGraph
from diffusers import StableDiffusionPipeline
import torch

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of a panda eating waffles"
image = pipe(prompt).images[0]

image.save("panda_photo.png")

class video_overlay(SubGraph):

    def create_graph(self, option=None):
        # create source stream
        self.inputs.append('source')
        source_stream = self.graph.input_stream('source')
        # create overlay stream
        overlay_streams = []
        for (i, _) in enumerate(option['overlays']):
            self.inputs.append('overlay_' + str(i))
            overlay_streams.append(self.graph.input_stream('overlay_' + str(i)))

        # pre-processing for source layer
        info = option['source']
        output_stream = (
            source_stream.scale(info['width'], info['height'])
                .trim(start=info['start'], duration=info['duration'])
                .setpts('PTS-STARTPTS')
        )

        # overlay processing
        for (i, overlay_stream) in enumerate(overlay_streams):
            overlay_info = option['overlays'][i]

            # overlay layer pre-processing
            p_overlay_stream = (
                overlay_stream.scale(overlay_info['width'], overlay_info['height'])
                    .loop(loop=overlay_info['loop'], size=10000)
                    .setpts('PTS+%f/TB' % (overlay_info['start']))
            )

            # calculate overlay parameter
            x = 'if(between(t,%f,%f),%s,NAN)' % (overlay_info['start'],
                                                 overlay_info['start'] + overlay_info['duration'],
                                                 str(overlay_info['pox_x']))
            y = 'if(between(t,%f,%f),%s,NAN)' % (overlay_info['start'],
                                                 overlay_info['start'] + overlay_info['duration'],
                                                 str(overlay_info['pox_y']))
            if overlay_info['loop'] == -1:
                repeat_last = 0
                shortest = 1
            else:
                repeat_last = overlay_info['repeat_last']
                shortest = 1

            # do overlay
            output_stream = (
                output_stream.overlay(p_overlay_stream, x=x, y=y,
                                      repeatlast=repeat_last)
            )

        # finish creating graph
        self.output_streams = self.finish_create_graph([output_stream])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Code Breakdown:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Importing Required Modules:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import bmf
from bmf import bmf_sync, Packet
from bmf import SubGraph
from diffusers import StableDiffusionPipeline
import torch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;bmf&lt;/code&gt; and its components are imported to harness the functionalities of the Babit Multimedia Framework for video processing tasks.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;SubGraph&lt;/code&gt; is a class in BMF, used to create a customizable processing node.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;StableDiffusionPipeline&lt;/code&gt; is imported from the &lt;code&gt;diffusers&lt;/code&gt; library, which allows the generation of images from text prompts.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;torch&lt;/code&gt; is the PyTorch library used for machine learning applications, which Stable Diffusion relies on.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Configuring the Stable Diffusion Model:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  The Stable Diffusion model is loaded with the specified &lt;code&gt;model_id&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  The &lt;code&gt;torch_dtype&lt;/code&gt; parameter ensures the model uses lower precision to reduce memory usage.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;.to("cuda")&lt;/code&gt; moves the model to GPU for faster computation if CUDA is available.&lt;/li&gt;
&lt;/ul&gt;
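&lt;p&gt;To see why half precision matters, here's a rough back-of-the-envelope estimate of weight memory. The parameter count below is an approximation for Stable Diffusion v1.5 (roughly 860M parameters), used purely for illustration:&lt;/p&gt;

```python
# Back-of-envelope memory estimate for model weights. PARAMS is an
# approximation for Stable Diffusion v1.5, not an exact figure.
PARAMS = 860_000_000

def weight_bytes(num_params, bytes_per_param):
    # Weight storage scales linearly with parameter count and dtype width.
    return num_params * bytes_per_param

fp32_gb = weight_bytes(PARAMS, 4) / 1e9  # float32: 4 bytes per parameter
fp16_gb = weight_bytes(PARAMS, 2) / 1e9  # float16: 2 bytes per parameter
print(round(fp32_gb, 2), round(fp16_gb, 2))  # 3.44 1.72
```

Halving the bytes per parameter halves the weight footprint, which is what makes the model fit comfortably on a single Colab GPU.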

&lt;h3&gt;
  
  
  Generating an Image Using Stable Diffusion:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompt = "a photo of a panda eating waffles"
image = pipe(prompt).images[0]
image.save("panda_photo.png")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  We then set a text prompt to generate an image of "a photo of a panda eating waffles".&lt;/li&gt;
&lt;li&gt;  The image is created and saved to "panda_photo.png".&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Defining a Custom BMF SubGraph for Video Overlay:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class video_overlay(SubGraph):
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;video_overlay&lt;/code&gt; class is derived from &lt;code&gt;SubGraph&lt;/code&gt;. This class will define a custom graph for video overlay operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Creating the Graph:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_graph(self, option=None):
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  The &lt;code&gt;create_graph&lt;/code&gt; method is where the actual graph (workflow) of the video and overlays is constructed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Processing Source and Overlay Streams:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;self.inputs.append('source')
source_stream = self.graph.input_stream('source')
overlay_streams = []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  Registers input streams for the source and prepares a list of overlay input streams.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Scaling and Trimming Source Video:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;info = option['source']
output_stream = (
    source_stream.scale(info['width'], info['height'])
        .trim(start=info['start'], duration=info['duration'])
        .setpts('PTS-STARTPTS')
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  The source video is scaled and trimmed according to the specified &lt;code&gt;option&lt;/code&gt;. Adjustments are made for the timeline placement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Scaling and Looping Overlay Streams:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;p_overlay_stream = (
    overlay_stream.scale(overlay_info['width'], overlay_info['height']).loop(loop=overlay_info['loop'], size=10000).setpts('PTS+%f/TB' % (overlay_info['start'])))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  Each overlay is scaled and looped as needed, providing a dynamic and flexible overlay process.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Overlaying on the Source Stream:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;output_stream = (
    output_stream.overlay(p_overlay_stream, x=x, y=y,
                          repeatlast=repeat_last))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  Overlays are added to the source stream at the calculated position and with the proper configuration. This allows multiple overlays to exist within the same timeframe without conflicts.&lt;/li&gt;
&lt;/ul&gt;
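&lt;p&gt;The &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; values used above are FFmpeg position expressions built with Python string formatting. Here is a minimal standalone sketch of how the module constructs them (the helper name and sample values are hypothetical):&lt;/p&gt;

```python
def overlay_position_expr(start, duration, pos):
    # FFmpeg expression: place the overlay at pos while t is inside
    # [start, start + duration], and hide it (NAN) otherwise.
    return 'if(between(t,%f,%f),%s,NAN)' % (start, start + duration, str(pos))

# Same shape as the module's x/y parameters: visible from t=0 to t=10 at pos 0.
x = overlay_position_expr(0, 10, 0)
print(x)  # if(between(t,0.000000,10.000000),0,NAN)
```

The `NAN` branch is what keeps the overlay hidden outside its display window without needing a separate enable flag.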

&lt;h4&gt;
  
  
  Finalizing the Graph:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;self.output_streams = self.finish_create_graph([output_stream])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  Final output streams are set, which concludes the creation of the graph. Now, after this, it's time for us to encode the video and display it how we want to.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Applying Hugging Face Model&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's add our image as an overlay to the video file, then break down each section of the code to explain how it works.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input_video_path = "/content/Bmf-Huggingface/black_and_white.mp4"
logo_path = "/content/panda_photo.png"
output_path = "./complex_edit.mp4"
dump_graph = 0

duration = 10

overlay_option = {
    "dump_graph": dump_graph,
    "source": {
        "start": 0,
        "duration": duration,
        "width": 1280,
        "height": 720
    },
    "overlays": [
        {
            "start": 0,
            "duration": duration,
            "width": 300,
            "height": 200,
            "pox_x": 0,
            "pox_y": 0,
            "loop": 0,
            "repeat_last": 1
        }
    ]
}

my_graph = bmf.graph({
    "dump_graph": dump_graph
})

logo_1 = my_graph.decode({'input_path': logo_path})['video']

video1 = my_graph.decode({'input_path': input_video_path})

overlay_streams = list()
overlay_streams.append(bmf.module([video1['video'], logo_1], 'video_overlay', overlay_option, entry='__main__.video_overlay')[0])

bmf.encode(
    overlay_streams[0],
    video1['audio'],
    {"output_path": output_path}
    ).run()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Let's break this down too
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Defining Paths and Options:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input_video_path = "/content/Bmf-Huggingface/black_and_white.mp4"
logo_path = "/content/panda_photo.png"
output_path = "./complex_edit.mp4"
dump_graph = 0
duration = 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;input_video_path&lt;/code&gt;: Specifies the file path to the input video.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;logo_path&lt;/code&gt;: File path to the image (logo) you want to overlay on the video.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;output_path&lt;/code&gt;: The file path where the edited video will be saved.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;dump_graph&lt;/code&gt;: A debugging tool in BMF that can be set to &lt;code&gt;1&lt;/code&gt; to visualize the graph but is set to &lt;code&gt;0&lt;/code&gt; here, meaning no graph will be dumped.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;duration&lt;/code&gt;: The duration in seconds for the overlay to be visible in the video.&lt;/li&gt;
&lt;/ul&gt;
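&lt;p&gt;Because the overlay timing lives in a plain dictionary, a small sanity check can catch mistakes (like an overlay that outlasts the trimmed clip) before the graph runs. This helper is a hypothetical addition for illustration, not part of BMF:&lt;/p&gt;

```python
def check_overlay_option(option):
    # Hypothetical sanity check (not part of BMF): every overlay's display
    # window must fit inside the trimmed source clip.
    src_end = option['source']['start'] + option['source']['duration']
    for ov in option['overlays']:
        if ov['start'] + ov['duration'] > src_end:
            raise ValueError('overlay outruns the source clip')
    return True

# Same shape as the tutorial's overlay_option.
option = {
    'source': {'start': 0, 'duration': 10, 'width': 1280, 'height': 720},
    'overlays': [{'start': 0, 'duration': 10, 'width': 300, 'height': 200,
                  'pox_x': 0, 'pox_y': 0, 'loop': 0, 'repeat_last': 1}],
}
print(check_overlay_option(option))  # True
```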

&lt;h3&gt;
  
  
  Overlay Configuration:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;overlay_option = {
    "dump_graph": dump_graph,
    "source": {
        "start": 0,
        "duration": duration,
        "width": 1280,
        "height": 720
    },
    "overlays": [
        {
            "start": 0,
            "duration": duration,
            "width": 300,
            "height": 200,
            "pox_x": 0,
            "pox_y": 0,
            "loop": 0,
            "repeat_last": 1
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;overlay_option&lt;/code&gt;: A dictionary that defines the settings for the source video and the overlay.&lt;/li&gt;
&lt;li&gt;  For the source, the width and height you want to scale the video to, and when the overlay should start and end are specified.&lt;/li&gt;
&lt;li&gt;  For the overlays, detailed options such as position, size, and behavior (like &lt;code&gt;loop&lt;/code&gt; and &lt;code&gt;repeat_last&lt;/code&gt;) are defined.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Creating a BMF Graph:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my_graph = bmf.graph({"dump_graph": dump_graph
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;my_graph&lt;/code&gt; is an instance of BMF graph which sets up the processing graph (pipeline), with &lt;code&gt;dump_graph&lt;/code&gt; passed as an option.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Decoding the Logo and Video Streams:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;logo_1 = my_graph.decode({'input_path': logo_path})['video']
video1 = my_graph.decode({'input_path': input_video_path})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  The video and logo are loaded and decoded to be processed. This decoding extracts the video streams to be used in subsequent steps.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Creating Overlay Streams:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;overlay_streams = list()
overlay_streams.append(bmf.module([video1['video'], logo_1], 'video_overlay', overlay_option, entry='__main__.video_overlay')[0])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  An empty list &lt;code&gt;overlay_streams&lt;/code&gt; is created to hold the video layers.&lt;/li&gt;
&lt;li&gt;  The &lt;code&gt;bmf.module&lt;/code&gt; function is used to create an overlay module, where the source video and logo are processed using the &lt;code&gt;video_overlay&lt;/code&gt; class defined previously with the corresponding options.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Encoding the Final Output:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bmf.encode(
    overlay_streams[0],
    video1['audio'],{"output_path": output_path}).run()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  The final video stream, with the overlay applied, and the original audio from the input video are encoded together into a new output file specified by &lt;code&gt;output_path&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  The &lt;code&gt;.run()&lt;/code&gt; method is called to execute the encoding process.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our final output should look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--idKWpg8V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://bytedance.us.larkoffice.com/space/api/box/stream/download/asynccode/%3Fcode%3DNTg4MjMwNjMwZWQ3NGJiYTFlNDJiOGM2YWEwMGJhNjZfYlY1Q0FlYUdLaGFZM1dPeUs2aEFxcUcyUWlHNElTSE5fVG9rZW46SXZka2JuQkhyb2lCMFR4aWNkUXVpaWtBc3RkXzE3MDY3NTA2MjQ6MTcwNjc1NDIyNF9WNA" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--idKWpg8V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://bytedance.us.larkoffice.com/space/api/box/stream/download/asynccode/%3Fcode%3DNTg4MjMwNjMwZWQ3NGJiYTFlNDJiOGM2YWEwMGJhNjZfYlY1Q0FlYUdLaGFZM1dPeUs2aEFxcUcyUWlHNElTSE5fVG9rZW46SXZka2JuQkhyb2lCMFR4aWNkUXVpaWtBc3RkXzE3MDY3NTA2MjQ6MTcwNjc1NDIyNF9WNA" alt="" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it! We've explored a practical example of using the Babit Multimedia Framework (BMF) for a video editing task that uses AI to create an image we can overlay on a video. Now you know how to set up a BMF graph, decode the input streams, create overlay modules, and finally encode the edited video with the overlay in place. In the future, I'll consider adding more AI models, like one to improve the resolution, or even a model that creates a video from text. Through the power of BMF and Hugging Face open source models, you can create complex video editing workflows with overlays that change dynamically over time, offering vast creative possibilities.&lt;/p&gt;

&lt;p&gt;Try it out on Colab and tell us what you think:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://colab.research.google.com/drive/1eQxiZc2vZeyOggMoFle_b0xnblupbiXd?usp=sharing"&gt;https://colab.research.google.com/drive/1eQxiZc2vZeyOggMoFle_b0xnblupbiXd?usp=sharing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.gg/4wXUpdrK2z"&gt;Join us on our ByteDance Open Source Discord Server!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>ffmpeg</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Leveraging GPU Acceleration in BMF for High-Performance Video Processing</title>
      <dc:creator>Josh Alphonse</dc:creator>
      <pubDate>Thu, 25 Jan 2024 01:26:05 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/leveraging-gpu-acceleration-in-bmf-for-high-performance-video-processing-35p8</link>
      <guid>https://dev.to/bytedanceoss/leveraging-gpu-acceleration-in-bmf-for-high-performance-video-processing-35p8</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;GPU acceleration has become a cornerstone of high-performance multimedia processing. The Babit Multimedia Framework (BMF) harnesses this power, offering exceptional speed and efficiency for video processing tasks. In this blog post, we'll explore how BMF utilizes GPU acceleration and provide practical examples to help you integrate this capability into your projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding GPU Acceleration in BMF
&lt;/h2&gt;

&lt;p&gt;BMF's architecture is designed to exploit the parallel processing capabilities of GPUs. This is crucial for tasks like video transcoding, real-time rendering, and applying complex filters or effects, where the computational intensity can be staggering.&lt;/p&gt;

&lt;p&gt;GPU acceleration is like a turbocharged engine in a sports car, propelling you forward at incredible speeds: it's all about doing more in less time. Imagine you're editing a video for your YouTube channel or streaming a live esports tournament; every millisecond counts. This is where BMF's GPU prowess shines, slicing through processing times like a hot knife through butter.&lt;/p&gt;

&lt;p&gt;BMF includes performance optimizations for heterogeneous CPU/GPU scenarios that many existing FFmpeg-based solutions lack, and it enriches the processing pipeline. Taking compression and super-resolution scenarios as examples, measurements show BMF's total throughput increasing by about 15%.&lt;/p&gt;

&lt;p&gt;BMF's GPU codecs are inherited from FFmpeg, using NVENC, NVDEC, and other dedicated GPU hardware to accelerate video encoding and decoding, and using FFmpeg's CUDA filters to accelerate image preprocessing, so there's no barrier for users already familiar with FFmpeg. At this stage, BMF supports GPU decoding, encoding, and one-to-many transcoding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Benefits of GPU Acceleration:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Speed: GPUs can process multiple operations simultaneously, drastically reducing processing time.&lt;/li&gt;
&lt;li&gt; Efficiency: Offloading intensive tasks to the GPU frees up the CPU for other operations, improving overall system performance.&lt;/li&gt;
&lt;li&gt; Scalability: As video resolutions and processing demands increase, GPUs can scale to meet these challenges.&lt;/li&gt;
&lt;/ol&gt;
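&lt;p&gt;A rough Amdahl's-law sketch illustrates the speed and efficiency points above: the overall gain from GPU offload is bounded by how much of the pipeline can actually be offloaded. The numbers below are illustrative, not BMF benchmarks:&lt;/p&gt;

```python
def amdahl_speedup(p, s):
    # Amdahl's law: if a fraction p of the work is accelerated by a factor
    # s, overall speedup is bounded by 1 / ((1 - p) + p / s).
    return 1.0 / ((1.0 - p) + p / s)

# e.g. GPU-accelerating 90% of the pipeline by 10x gives roughly 5.3x overall,
# which is why the serial (CPU-bound) portion becomes the bottleneck.
print(round(amdahl_speedup(0.9, 10), 2))  # 5.26
```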

&lt;h2&gt;
  
  
  Setting Up GPU Acceleration in BMF
&lt;/h2&gt;

&lt;p&gt;Before diving into coding, ensure your environment is set up to leverage GPU capabilities. This typically involves installing the necessary GPU drivers and libraries, like CUDA for NVIDIA GPUs. The BMF documentation provides detailed setup instructions. You can use tools like Colab or your own hardware. BMF can also run on Windows, macOS, and Linux.&lt;/p&gt;
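&lt;p&gt;As a quick, hedged environment check (this helper is hypothetical, not part of BMF), you can look for the NVIDIA driver CLI and the CUDA compiler on your PATH before running GPU pipelines:&lt;/p&gt;

```python
import shutil

def cuda_tooling_present():
    # Hypothetical environment check (not part of BMF): looks for the NVIDIA
    # driver CLI and the CUDA compiler on PATH. A False value only means the
    # tool isn't on PATH; it doesn't prove no GPU exists.
    return {
        'nvidia-smi': shutil.which('nvidia-smi') is not None,
        'nvcc': shutil.which('nvcc') is not None,
    }

status = cuda_tooling_present()
print(status)
```

If either tool is missing, follow the NVIDIA driver/CUDA installation steps in the BMF docs before proceeding.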

&lt;h3&gt;
  
  
  Code Example: Basic GPU-Accelerated Video Processing
&lt;/h3&gt;

&lt;p&gt;Let's start with a simple example of GPU-accelerated video processing in BMF. This example assumes you have BMF and all the necessary GPU libraries installed. If you haven't installed it yet, &lt;a href="https://babitmf.github.io/docs/bmf/getting_started_yourself/install/"&gt;click this link&lt;/a&gt; to install BMF for your system setup. You can also use tools like Colab. If you're using a GPU, just make sure you meet the hardware requirements.&lt;/p&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Python 3.9&lt;/li&gt;
&lt;li&gt;  Cmake&lt;/li&gt;
&lt;li&gt;  ffmpeg4&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Python, C++, or Go experience&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Python code
&lt;/h2&gt;


&lt;p&gt;In this example, BMF calls the GPU codec functions for video transcoding. BMF largely follows FFmpeg's parameters, and the key lines of code are where the true magic happens.&lt;/p&gt;

&lt;p&gt;First, &lt;a href="https://babitmf.github.io/docs/bmf/getting_started_yourself/create_a_graph/"&gt;create a BMF graph&lt;/a&gt; and a decode module, specify the hardware accelerator parameter as &lt;code&gt;cuda&lt;/code&gt;, and decoding will then run on the GPU.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;bmf&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_gpu_transcode&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Testing gpu transcoding......&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;input_video_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input.flv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;output_video_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bmf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input_video_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hwaccel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, apply CUDA filters to the decoded video stream. In BMF, CUDA filters can be chained serially; in this case we use &lt;code&gt;scale_cuda&lt;/code&gt; and &lt;code&gt;yadif_cuda&lt;/code&gt;. Then we pass the audio &amp;amp; video streams into an encode module, specifying the codec as &lt;code&gt;h264_nvenc&lt;/code&gt; and the pixel format as &lt;code&gt;cuda&lt;/code&gt;. Once the entire pipeline is built, call &lt;code&gt;run()&lt;/code&gt; to start execution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bmf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;ff_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scale_cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1280&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;720&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ff_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yadif_cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output_video_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;codec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;h264_nvenc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pix_fmt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Full Code&lt;/strong&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;bmf&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_gpu_transcode&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Testing gpu transcoding......&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;input_video_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input.flv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;output_video_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bmf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input_video_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hwaccel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bmf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;ff_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scale_cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1280&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;720&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ff_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yadif_cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output_video_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;codec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;h264_nvenc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pix_fmt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced GPU-Accelerated Video Processing
&lt;/h2&gt;

&lt;p&gt;For more complex scenarios, BMF allows fine-tuning of GPU settings and integration with other GPU-accelerated libraries.&lt;/p&gt;

&lt;p&gt;For image preprocessing, BMF integrates &lt;a href="https://developer.nvidia.com/cv-cuda"&gt;CV-CUDA&lt;/a&gt;, an operator library developed by NVIDIA specifically to accelerate computer vision workloads, so that the full computing power of &lt;a href="https://developer.nvidia.com/cuda-toolkit"&gt;CUDA&lt;/a&gt; can be put to use. At this stage it provides about 45 common high-performance operators. It offers rich API surfaces in C, C++, and Python, supports batched input of images of different sizes, enables zero-copy data exchange with deep learning frameworks, and ships with application examples for a variety of scenarios.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CUDA operators you can use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Blur&lt;/li&gt;
&lt;li&gt;  Crop&lt;/li&gt;
&lt;li&gt;  Flip&lt;/li&gt;
&lt;li&gt;  Gamma&lt;/li&gt;
&lt;li&gt;  Rotate&lt;/li&gt;
&lt;li&gt;  Scale
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_gpu_transcode&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;  &lt;span class="c1"&gt;# Start of function named 'test_gpu_transcode'
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Testing GPU transcoding...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# Print out a string "Testing GPU transcoding..." in the console
&lt;/span&gt;
    &lt;span class="c1"&gt;# Variables containing the paths of the input video and the path to save the output video (transcoded one)
&lt;/span&gt;    &lt;span class="n"&gt;input_video_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input.flv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;    &lt;span class="c1"&gt;# Path to the video file we want to transcode
&lt;/span&gt;    &lt;span class="n"&gt;output_video_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Path to save the output video
&lt;/span&gt;
    &lt;span class="c1"&gt;# Create a BMF graph to represent a series of processing operations
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bmf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Call the 'decode' function of the created BMF graph. Input is the video file pointed by 'input_video_path'.
&lt;/span&gt;    &lt;span class="c1"&gt;# Use hardware acceleration on the GPU to decode the video (hwaccel means hardware accelerator)
&lt;/span&gt;    &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input_video_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hwaccel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Use NVIDIA CUDA technology for hardware accelerated decoding
&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# Call the 'encode' function to encode the video and audio streams.
&lt;/span&gt;    &lt;span class="c1"&gt;# The input video stream is first processed by a GPU scale module to resize it to 1280x720 pixels.
&lt;/span&gt;    &lt;span class="c1"&gt;# The encoded video will be saved to the path pointed by 'output_video_path'.
&lt;/span&gt;    &lt;span class="c1"&gt;# Use NVIDIA NVENC technology for GPU accelerated encoding,
&lt;/span&gt;    &lt;span class="c1"&gt;# and 'pix_fmt' is set to 'cuda' to let the GPU to read in the processed frames directly from its own memory.
&lt;/span&gt;    &lt;span class="n"&gt;bmf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;module&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scale_gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1280x720&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;    &lt;span class="c1"&gt;# Scaling the video to dimension of 1280x720 pixels using GPU
&lt;/span&gt;        &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;    &lt;span class="c1"&gt;# Including the audio stream in the processed video
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output_video_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Path to save the output video
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;codec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;h264_nvenc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Use H.264 codec for video encoding with NVENC technology
&lt;/span&gt;                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pix_fmt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;    &lt;span class="c1"&gt;# The input video frames are in GPU memory
&lt;/span&gt;            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;    &lt;span class="c1"&gt;# Execute the graph operations
&lt;/span&gt;
&lt;span class="c1"&gt;# Now Call the above defined function
&lt;/span&gt;&lt;span class="nf"&gt;test_gpu_transcode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;    &lt;span class="c1"&gt;# Call the 'test_gpu_transcode' function to start the whole process
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example: Integrating AI Models for Video Enhancement
&lt;/h3&gt;

&lt;p&gt;BMF's flexibility enables the integration of AI models for tasks like super-resolution or frame interpolation. Here's an example of how you might integrate an AI model for super-resolution. &lt;a href="https://colab.research.google.com/github/BabitMF/bmf/blob/master/bmf/demo/video_enhance/bmf-enhance-demo.ipynb"&gt;Check out this example&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Sorcery with BMF and GPU Acceleration
&lt;/h2&gt;

&lt;p&gt;Let's walk through some real-world scenarios where GPU-accelerated BMF works its magic:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; The Live Sports Event: Picture a live sports broadcast. With BMF's GPU acceleration, you can stream high-definition, slow-motion replays almost instantaneously. It's like having the ability to freeze time and zoom in on that crucial game-winning goal.&lt;/li&gt;
&lt;li&gt; Hollywood films: In film editing, BMF with GPU acceleration is your special effects wizard. Render stunning visual effects in a fraction of the time, bringing dragons to life or creating epic space battles that look breathtakingly real.&lt;/li&gt;
&lt;li&gt; The Viral Video Sensation: For content creators, time is of the essence. GPU-accelerated BMF is like having a superpower to edit and render viral-worthy videos in record time, ensuring you hit the trends before they fade.&lt;/li&gt;
&lt;li&gt; The Gaming Livestream: In the gaming world, live streaming with real-time effects is key. With BMF's GPU acceleration, you can stream your gameplay with high-quality graphics and overlays, keeping your audience glued to their screens.&lt;/li&gt;
&lt;li&gt; The AI-Powered Masterpiece: Dive into the future with AI-enhanced video processing. From upscaling vintage film footage to crystal-clear quality to applying real-time face filters in a video chat, BMF's GPU acceleration makes it all possible, and at lightning speeds.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;GPU acceleration in BMF opens up a world of possibilities for high-performance video processing. By leveraging the power of GPUs, developers can achieve remarkable speed and efficiency in multimedia applications. The examples provided are just a starting point -- the real potential lies in how you apply these capabilities to your unique projects.&lt;/p&gt;

&lt;p&gt;Remember, the key to successful implementation is understanding your specific processing requirements and how best to utilize BMF's GPU acceleration features to meet those needs.&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>videoprocessing</category>
      <category>ffmpeg</category>
      <category>ai</category>
    </item>
    <item>
      <title>The fastest way to use code splitting</title>
      <dc:creator>Josh Alphonse</dc:creator>
      <pubDate>Thu, 25 Jan 2024 01:20:30 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/the-fastest-way-to-use-code-splitting-1b42</link>
      <guid>https://dev.to/bytedanceoss/the-fastest-way-to-use-code-splitting-1b42</guid>
      <description>&lt;h2&gt;
  
  
  Performance and Code Splitting with Rspack
&lt;/h2&gt;

&lt;p&gt;Performance can be a big deal, and every optimization comes with a trade-off. As applications grow in complexity, the need for efficient resource loading becomes increasingly vital. Enter code splitting: a technique that may not be new to you, but one that has revolutionized content delivery on the web. At the forefront of this revolution is Rspack, my web bundler of choice, which excels at optimizing and packaging web applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Code Splitting?
&lt;/h2&gt;

&lt;p&gt;Code splitting is the process of breaking down a JavaScript bundle into smaller chunks that can be loaded on demand. This is crucial for improving load times, particularly in large-scale applications. Instead of downloading the entire JavaScript bundle upfront, users only download the necessary code for their current page or feature, significantly reducing the initial load time.&lt;/p&gt;

&lt;p&gt;In the context of Rspack, code splitting can be implemented using dynamic imports. Dynamic imports enable you to load JavaScript modules dynamically at runtime, rather than including them in the main bundle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Splitting with Rspack
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Efficient Performance:&lt;/strong&gt; Rspack is a highly performant tool. Using techniques like code splitting and tree shaking, it ensures that your web application loads faster by shipping only the code that's needed. With HTTP/2 support, it can split your code into many pieces that load in parallel, drastically improving loading times.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Loaders and Plugins:&lt;/strong&gt; Rspack boasts a variety of loaders and plugins that make the development process smooth. Loaders preprocess files, allowing you to bundle any static resource, while plugins provide a wide range of solutions such as bundle optimization, environment variable injection, and HTML generation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Import Magic Comments:&lt;/strong&gt; Rspack takes advantage of Webpack’s import syntax to provide the developer with succinct control over the chunk names, which can be useful in debugging and provides a way to control caching via customized chunk names.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Webpack Compatibility:&lt;/strong&gt; Rspack aims to maintain a high level of compatibility with Webpack's plugin and loader ecosystem. Developers familiar with Webpack's configuration can easily set up and configure Rspack. Many of Webpack’s features, such as code splitting, dynamic imports, module federation, hot module replacement, among others, are supported by Rspack as well.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Implementing Code Splitting in Rspack
&lt;/h4&gt;

&lt;p&gt;There are three primary methods for code splitting in Rspack:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Entry Points:&lt;/strong&gt; Manually split code using the entry configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SplitChunksPlugin:&lt;/strong&gt; Use this plugin to deduplicate and split chunks, extracting shared modules into a new chunk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Imports:&lt;/strong&gt; Using the &lt;code&gt;import()&lt;/code&gt; syntax for dynamic imports to split code within modules.&lt;/li&gt;
&lt;/ol&gt;
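
&lt;p&gt;As a rough sketch of method 2, the SplitChunksPlugin is driven through &lt;code&gt;optimization.splitChunks&lt;/code&gt;, an API Rspack keeps webpack-compatible. Treat the exact option set as an assumption to verify against the documentation for your Rspack version; the &lt;code&gt;vendors&lt;/code&gt; group name here is just an example.&lt;/p&gt;

```javascript
// rspack.config.mjs (sketch): extract everything imported from
// node_modules into a shared "vendors" chunk, and let the bundler
// split both initial and async chunks.
export default {
  optimization: {
    splitChunks: {
      chunks: 'all',
      cacheGroups: {
        vendors: {
          test: /[\\/]node_modules[\\/]/,
          name: 'vendors',
        },
      },
    },
  },
};
```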

&lt;p&gt;Each method has its own configuration approach and use case, providing flexibility and control over how your assets are generated and managed. In this blog post, we are going to use the dynamic imports method.&lt;/p&gt;

&lt;p&gt;For detailed code examples and further explanation, refer to Rspack's official code splitting documentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting Up Rspack/Rsbuild
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;JavaScript/Framework of choice knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before diving into code splitting techniques, ensure you have Rspack installed. The quickest way is to scaffold a project with Rsbuild, which is built on Rspack and supports frameworks like Svelte, React, Vue, SolidJS, NestJS, and Modern.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm create rsbuild@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run this command, let it load and choose the framework you want to work with. For this example, I'll be using React. Rspack and React work well together, and they both have some built-in features we'll discuss later in this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exploring Code Splitting Techniques in Rspack
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Project Structure
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F871bwbt2dur5idykitkw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F871bwbt2dur5idykitkw.png" alt="Image description" width="558" height="810"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a breakdown of some of the files in our project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pages/&lt;/code&gt; : This directory contains different pages of your app, such as &lt;code&gt;Home.tsx&lt;/code&gt;, and &lt;code&gt;ProductList.jsx&lt;/code&gt;.

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;App.jsx&lt;/code&gt;: Where you define your primary routes and wrappers around your app.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;package.json&lt;/code&gt;: It contains metadata about the project, like the project name, version, dependencies, etc.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rsbuild.config.mjs&lt;/code&gt;: The configuration file for configuring rsbuild settings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Let's Begin
&lt;/h3&gt;

&lt;p&gt;Code splitting is a feature supported by Modern.js, and it works alongside Rspack by splitting code into different "chunks". This is a crucial optimization technique when bundling large applications, and it works a bit differently than in other frameworks.&lt;/p&gt;

&lt;h4&gt;
  
  
  Define Routes
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;App.jsx&lt;/code&gt; will define the routing for your application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Suspense&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lazy&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BrowserRouter&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Routes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Link&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react-router-dom&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;lazy&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./Components/Home&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ProductList&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;lazy&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./Components/ProductList&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Suspense&lt;/span&gt; &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;Loading...&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;nav&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;ul&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;li&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Link&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;Home&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Link&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;li&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;li&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Link&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"/products"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;Products&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Link&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;li&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;ul&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;nav&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Routes&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Route&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt; &lt;span class="na"&gt;element&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Home&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Route&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"/products"&lt;/span&gt; &lt;span class="na"&gt;element&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ProductList&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Routes&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Suspense&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;App&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;React's lazy function is combined with Suspense to dynamically import the component for each route. When the Route is rendered, React will automatically load the chunk containing the corresponding component.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Home Component&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Home.jsx&lt;/code&gt; component is imported with &lt;code&gt;React.lazy&lt;/code&gt; in &lt;code&gt;App.jsx&lt;/code&gt;, so it is automatically split into its own chunk by Rspack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Components/Home.js
import React from 'react';

const Home = () =&amp;gt; {
  return &amp;lt;h1&amp;gt;Welcome to Our Online Store!&amp;lt;/h1&amp;gt;;
};

export default Home;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;ProductList Component&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// /Components/ProductList
import React from 'react';
import { Link } from 'react-router-dom';

const products = [
  // Dummy products data
  { id: 1, name: 'Product 1' },
  { id: 2, name: 'Product 2' },
  { id: 3, name: 'Product 3' },
];

const ProductList = () =&amp;gt; {
  return (
    &amp;lt;div&amp;gt;
      &amp;lt;h1&amp;gt;Product List&amp;lt;/h1&amp;gt;
      &amp;lt;ul&amp;gt;
        {products.map((product) =&amp;gt; (
          &amp;lt;li key={product.id}&amp;gt;
            &amp;lt;Link to={`/product/${product.id}`}&amp;gt;{product.name}&amp;lt;/Link&amp;gt;
          &amp;lt;/li&amp;gt;
        ))}
      &amp;lt;/ul&amp;gt;
    &amp;lt;/div&amp;gt;
  );
};

export default ProductList;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;index.js&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your &lt;code&gt;index.js&lt;/code&gt; file is the starting point of your application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import React from 'react';
import ReactDOM from 'react-dom/client';
import App from './App';
import Home from './Components/Home';

console.log('index.js')

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(
  &amp;lt;React.StrictMode&amp;gt;
    &amp;lt;App /&amp;gt;
    &amp;lt;Home/&amp;gt;
  &amp;lt;/React.StrictMode&amp;gt;,
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;rsbuild.config.mjs&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { defineConfig } from '@rsbuild/core';
import { pluginReact } from '@rsbuild/plugin-react';

export default defineConfig({
  plugins: [pluginReact()],
  mode: 'development',
  entry: {
    index: './src/index.jsx',
  },
  output: {
    filename: '[name].bundle.js',
  },

});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this configuration, Rspack automatically splits each lazily imported page component (Home.jsx and ProductList.jsx) into its own chunk, which is loaded via dynamic import only when the corresponding route is navigated to.&lt;/p&gt;

&lt;p&gt;You can see this behavior in the Network tab of your browser's developer tools when you navigate between the different pages of your app. When you switch from &lt;code&gt;/&lt;/code&gt; to &lt;code&gt;/products&lt;/code&gt;, for example, you will notice the browser loading a new JavaScript file for that page.&lt;/p&gt;

&lt;p&gt;Code splitting in React with Rspack enhances application performance by loading only necessary code chunks on demand. This way of splitting code ensures that the user only downloads the necessary code for the current page rather than all the code at once, substantially improving the load time of your application. Code splitting is just one piece of the optimization puzzle, but it's a significant one.&lt;/p&gt;

&lt;p&gt;Be sure to &lt;a href="https://discord.gg/4wXUpdrK2z"&gt;join us on our ByteDance Open Source Discord Server!&lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
