<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: XIAOXU CHANG </title>
    <description>The latest articles on DEV Community by XIAOXU CHANG  (@xxisxuxu).</description>
    <link>https://dev.to/xxisxuxu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1101844%2F98d2ca8f-66d8-4275-bf8a-8dbb895ce332.jpg</url>
      <title>DEV Community: XIAOXU CHANG </title>
      <link>https://dev.to/xxisxuxu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xxisxuxu"/>
    <language>en</language>
    <item>
      <title>Inside AIO Sandbox (Part 1): Files &amp; Shell — The Foundations of Agent Execution</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Tue, 31 Mar 2026 07:36:56 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/inside-aio-sandbox-part-1-files-shell-the-foundations-of-agent-execution-4pe5</link>
      <guid>https://dev.to/bytedanceoss/inside-aio-sandbox-part-1-files-shell-the-foundations-of-agent-execution-4pe5</guid>
      <description>&lt;p&gt;&lt;em&gt;by AIO Sandbox Team&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Modern AI agents are no longer just generating text—they are expected to &lt;strong&gt;write files, modify code, and execute commands&lt;/strong&gt;.&lt;br&gt;
But doing this directly on your local machine or production systems is risky and hard to control.&lt;br&gt;
This is where &lt;strong&gt;&lt;a href="https://github.com/agent-infra/sandbox" rel="noopener noreferrer"&gt;AIO Sandbox&lt;/a&gt;&lt;/strong&gt; comes in. It provides an &lt;strong&gt;isolated, programmable environment&lt;/strong&gt; where agents can safely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;create and manipulate files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;run shell commands&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;execute code&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;produce artifacts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;and many more...&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike a typical Docker container, which often requires manual configuration to chain tools together, AIO Sandbox integrates a browser, a shell, and a file system into a single environment designed for AI agents. This unified architecture keeps artifacts persistent and accessible across every stage of an AI-driven workflow running inside the sandbox.&lt;br&gt;
In this first post, we’ll focus on the two most fundamental capabilities:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🧩 &lt;strong&gt;Filesystem (state)&lt;/strong&gt; &lt;br&gt;
⚙️ &lt;strong&gt;Shell (execution)&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By the end, you’ll see how these combine into a complete runtime for agents.&lt;/p&gt;

&lt;h1&gt;
  
  
  🌐 Multi-language SDK Support
&lt;/h1&gt;

&lt;p&gt;While this tutorial uses Python, &lt;strong&gt;AIO Sandbox is not limited to Python developers&lt;/strong&gt;.&lt;br&gt;
The &lt;strong&gt;agent-sandbox SDK&lt;/strong&gt; also supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;TypeScript / JavaScript&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Go (Golang)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 This makes it easy to integrate AIO Sandbox into a wide range of agent frameworks, backend services, and developer stacks.&lt;/p&gt;

&lt;h1&gt;
  
  
  🛠️ Prerequisites
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Python 3.12+&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A running AIO Sandbox instance at &lt;code&gt;http://localhost:8080&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Docker command&lt;/strong&gt;:&lt;br&gt;
&lt;code&gt;docker run --security-opt seccomp=unconfined --rm -it -p 8080:8080 ghcr.io/agent-infra/sandbox:latest&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Python SDK installed&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;pip install agent-sandbox&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  🧠 Mental Model
&lt;/h1&gt;

&lt;p&gt;Think of AIO Sandbox as a &lt;strong&gt;remote, disposable Linux machine&lt;/strong&gt; that your agent controls via APIs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Filesystem → where data and artifacts live&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shell → how actions are executed&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Simple Flow:&lt;br&gt;
Agent → API → Sandbox → Filesystem + Shell&lt;/p&gt;
&lt;/blockquote&gt;
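&lt;p&gt;To make this flow concrete, here is a minimal sketch (plain Python, no running sandbox required) of the call sequence an agent issues. The method names mirror the &lt;code&gt;agent-sandbox&lt;/code&gt; SDK used later in this post; the workspace path and the call-plan representation are hypothetical, for illustration only:&lt;/p&gt;

```python
# Illustrative only: the ordered API calls behind
# "Agent -> API -> Sandbox -> Filesystem + Shell".
# Method names mirror the agent-sandbox SDK; the workdir is a placeholder.
def plan_workflow(workdir: str) -> list[tuple[str, dict]]:
    """Return (api_call, arguments) pairs an agent would send to the sandbox."""
    return [
        # Filesystem: put state into the sandbox
        ("file.write_file", {"file": f"{workdir}/data.txt", "content": "10\n20\n"}),
        # Shell: act on that state
        ("shell.exec_command", {"command": f"cd {workdir} && python3 process.py"}),
        # Filesystem: read the resulting artifact back out
        ("file.read_file", {"file": f"{workdir}/report.txt"}),
    ]

if __name__ == "__main__":
    for call, args in plan_workflow("/home/user/data_agent"):
        print(call, args)
```

&lt;p&gt;Every step is either a filesystem call (state) or a shell call (execution); that two-primitive split is the whole mental model.&lt;/p&gt;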

&lt;h1&gt;
  
  
  Autonomous Data Processing &amp;amp; Validation Agent
&lt;/h1&gt;

&lt;p&gt;Rather than presenting each API in isolation, we will walk through a comprehensive end-to-end agent workflow running inside the sandbox.&lt;br&gt;
This example simulates an agent that executes the following workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create some data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read it (look at it)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write a script file (process.py)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;List files (see what exists)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the script&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read the output file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check if output looks correct&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Notice bad data and fix it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the script again&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read updated output&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find files created&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download final result&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What this use case demonstrates&lt;/strong&gt;&lt;br&gt;
A realistic agent loop, enabled by AIO Sandbox File &amp;amp; Shell primitives:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Read → Execute → Read → Validate → Fix → Re-run → Export&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This workflow makes each AIO Sandbox File &amp;amp; Shell primitive feel purposeful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;File primitives&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;write_file&lt;/code&gt; creates data and code&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;read_file&lt;/code&gt; lets the agent inspect inputs and outputs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;list_path&lt;/code&gt; gives workspace awareness&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;replace_in_file&lt;/code&gt; lets the agent repair bad input&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;search_in_file&lt;/code&gt; validates expected output&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;find_files&lt;/code&gt; discovers generated artifacts&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;download_file&lt;/code&gt; exports results out of the sandbox&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Shell primitive&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;exec_command&lt;/code&gt; runs the actual processing job
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_sandbox&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Sandbox&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Sandbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 1. Setup workspace
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;home_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_context&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;home_dir&lt;/span&gt;
&lt;span class="n"&gt;app_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;home_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/data_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;data_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/data.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;script_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/process.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;report_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/report.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sandbox home directory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;home_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;App directory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 2. Create raw input data (with an intentional error)
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;10
20
INVALID
40
50
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Created raw input data.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 3. Read and inspect input data
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;data_preview&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Raw input data:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_preview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 4. Write processing script
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;script_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;numbers = []

with open(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) as f:
    for line in f:
        try:
            numbers.append(int(line.strip()))
        except ValueError:
            print(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Skipping invalid line:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, line.strip())

total = sum(numbers)
avg = total / len(numbers)

report = f&lt;/span&gt;&lt;span class="se"&gt;\"\"\"&lt;/span&gt;&lt;span class="s"&gt;Report Summary
--------------
Valid Count: {len(numbers)}
Total: {total}
Average: {avg}
&lt;/span&gt;&lt;span class="se"&gt;\"\"\"&lt;/span&gt;&lt;span class="s"&gt;

with open(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) as f:
    f.write(report)

print(report)
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Created processing script.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 5. List workspace contents
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;workspace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;recursive&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Workspace contents:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;workspace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# --------------------------------------------------
# 6. Execute the processing script
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec_command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cd &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &amp;amp;&amp;amp; python3 process.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;First execution output:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exit code:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 7. Read generated report
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;report_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Generated report:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 8. Validate report contents
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search_in_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;report_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;regex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Average: .*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Report validation result:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 9. Detect bad input and fix it
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;data_check&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INVALID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data_check&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Detected invalid data. Fixing input file...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace_in_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;old_str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INVALID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;new_str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Read input again after fix
&lt;/span&gt;&lt;span class="n"&gt;updated_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Updated input data:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;updated_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 10. Re-run the processing script
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec_command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cd &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &amp;amp;&amp;amp; python3 process.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Second execution output:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exit code:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 11. Read final report again
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;final_report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;report_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Final report:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 12. Find generated text artifacts
# --------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;artifacts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;app_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Discovered artifacts:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifacts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --------------------------------------------------
# 13. Download final report to local machine
# --------------------------------------------------
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_report.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;report_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Final report downloaded locally as final_report.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
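&lt;p&gt;The &lt;code&gt;process.py&lt;/code&gt; logic the agent writes into the sandbox is ordinary Python, so you can sanity-check it locally before involving the sandbox at all. Here is a standalone sketch of the same parse-skip-summarize step (no SDK needed; the function name is ours, not part of any API):&lt;/p&gt;

```python
# Standalone version of the processing step from process.py:
# parse integers line by line, skip invalid lines, then summarize.
def summarize(lines: list[str]) -> dict:
    numbers = []
    for line in lines:
        try:
            numbers.append(int(line.strip()))
        except ValueError:
            print("Skipping invalid line:", line.strip())
    total = sum(numbers)
    return {"count": len(numbers), "total": total, "avg": total / len(numbers)}

if __name__ == "__main__":
    raw = ["10", "20", "INVALID", "40", "50"]
    print(summarize(raw))  # INVALID is skipped: count 4, total 120, avg 30.0
```

&lt;p&gt;Running it on the intentionally broken input reproduces the first sandbox execution: the &lt;code&gt;INVALID&lt;/code&gt; line is skipped and the report averages only the four valid values.&lt;/p&gt;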



&lt;h1&gt;
  
  
  🎯 Key Insight
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;AIO Sandbox gives agents a &lt;strong&gt;safe, programmable runtime&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Files → memory/state&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shell → actions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Together, they enable real-world workflows like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;code generation and execution&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;data processing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;automation pipelines&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tool orchestration&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  🚀 What’s Next
&lt;/h1&gt;

&lt;p&gt;Thanks for reading! Hope it was helpful! This is just the beginning. In upcoming posts, we’ll dive into topics such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;🌐 Browser automation (CDP-based)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🔌 MCP tool integration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;📓 Jupyter / notebook execution&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🤖 OpenClaw integration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🎯 Reinforcement learning inside sandbox&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  💬 Final Thoughts
&lt;/h1&gt;

&lt;p&gt;AIO Sandbox bridges the gap between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;“an LLM that generates text”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“an agent that can actually do things”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it does so safely, reproducibly, and programmatically.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Introducing AIO Sandbox, All-in-One Sandbox Environment for AI Agents</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Fri, 27 Mar 2026 02:33:45 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/introducing-aio-sandbox-all-in-one-sandbox-environment-for-ai-agents-18k0</link>
      <guid>https://dev.to/bytedanceoss/introducing-aio-sandbox-all-in-one-sandbox-environment-for-ai-agents-18k0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rz25miqt9hute1a0rjg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rz25miqt9hute1a0rjg.png" alt="banner" width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unchecked AI autonomy is a ticking time bomb; it’s time to pull the plug on unfettered, full-system access. We can no longer afford to hand AI agents the 'keys to the kingdom' without oversight. The 'wild west' of AI agents running with total system control is officially over.&lt;/p&gt;

&lt;p&gt;AIO Sandbox is an open-source project designed to solve these problems. It is everything your agent needs, out of the box. No more juggling multiple services. AIO Sandbox ships a complete, pre-wired environment in a single Docker container.&lt;/p&gt;

&lt;p&gt;The AIO (All-in-One) Sandbox is a containerized environment designed for both human developers and AI agents. Its architecture is built around a "Batteries-Included" philosophy, providing a full Linux desktop-like environment inside a single Docker container.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unified Environment:&lt;/strong&gt; One Docker container with shared filesystem. Files downloaded in the browser are instantly accessible in Terminal and VSCode.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Out of the Box:&lt;/strong&gt; Built‑in VNC browser, VS Code, Jupyter, file manager, and terminal—accessible directly via API/SDK.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent-Ready:&lt;/strong&gt; Pre-configured MCP Server with Browser, File, Terminal, and Markdown tools, ready to use for AI agents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Developer Friendly:&lt;/strong&gt; Cloud-based VSCode with persistent terminals, intelligent port forwarding, and instant frontend/backend previews.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Secure Execution:&lt;/strong&gt; Isolated Python and Node.js sandboxes. Safe code execution without system risks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Production Ready:&lt;/strong&gt; Enterprise-grade Docker deployment. Lightweight, scalable.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Calling all AI agent developers!&lt;/strong&gt; How are you securing your builds? Let’s try running your agent in AIO Sandbox and compare notes.&lt;/p&gt;

&lt;p&gt;AIO Sandbox is open-sourced under the Apache License 2.0. Contributions welcome.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/agent-infra/sandbox" rel="noopener noreferrer"&gt;https://github.com/agent-infra/sandbox&lt;/a&gt;&lt;br&gt;
Official website: &lt;a href="https://sandbox.agent-infra.com/" rel="noopener noreferrer"&gt;https://sandbox.agent-infra.com/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Kitex/Hertz Empowers LLMs: A Retrospective of Key Features on Its Third Anniversary</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Thu, 23 Jan 2025 08:24:37 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/kitexhertz-empowers-llms-a-retrospective-of-key-features-on-its-third-anniversary-3aip</link>
      <guid>https://dev.to/bytedanceoss/kitexhertz-empowers-llms-a-retrospective-of-key-features-on-its-third-anniversary-3aip</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;By Yang Rui from CloudWeGo Team&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/cloudwego" rel="noopener noreferrer"&gt;https://github.com/cloudwego&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It has been three years since CloudWeGo's open-source journey began. Adhering to the principle of &lt;strong&gt;Internal and External Consistency&lt;/strong&gt;, we have continuously iterated on our open-source repositories, releasing features that served ByteDance internally to the external world. From 2023 to 2024, Kitex/Hertz focused on &lt;strong&gt;LLM back-end services&lt;/strong&gt;, &lt;strong&gt;user experience&lt;/strong&gt;, and &lt;strong&gt;performance&lt;/strong&gt;, aiding the rapid development of new business scenarios while continuously optimizing user experience and performance. Meanwhile, Kitex/Hertz has been widely adopted by external enterprises and has attracted numerous external developers, steadily strengthening the CloudWeGo ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowzv71qkfxama8gquemg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowzv71qkfxama8gquemg.png" alt="keywords" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article summarizes the presentation "&lt;strong&gt;Kitex/Hertz Empowers LLMs: A Retrospective of Key Features on The Third Anniversary&lt;/strong&gt;". It introduces the significant features of Kitex/Hertz over the past year, aiming to assist enterprise users and community developers in better applying Kitex/Hertz to build their microservices systems in their projects.&lt;/p&gt;

&lt;h1&gt;
  
  
  Enhanced Streaming Capabilities to Support LLMs
&lt;/h1&gt;

&lt;p&gt;With the rapid development of LLMs and ByteDance's AI applications, &lt;strong&gt;streaming communication&lt;/strong&gt; has emerged as the primary communication mode for LLM application services. To better support business growth, we have optimized streaming communication in microservices in terms of stability, engineering practices, and performance over the past year.&lt;/p&gt;

&lt;h2&gt;
  
  
  Previous Streaming Capabilities of Kitex/Hertz
&lt;/h2&gt;

&lt;p&gt;Both Kitex and Hertz support streaming scenarios. Kitex supports gRPC with better performance than the official gRPC implementation while aligning with its functionality. Hertz supports HTTP Chunked Transfer Encoding and WebSocket. However, these capabilities were insufficient to support the rapid development of LLMs inside ByteDance, for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;More SSE Applications on Clients&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before multimodal models, LLM applications were mainly text-based dialogue scenarios, often using the SSE protocol to return server results to clients in real time. Text push scenarios are simpler, requiring only a browser-friendly, straightforward protocol.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The Burden of Transitioning from Thrift to Protobuf&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although gRPC (Protobuf) is commonly used for streaming communication in RPC scenarios, and Kitex also supports gRPC, ByteDance's server-side services primarily use Thrift IDLs. Developers are more familiar with Thrift, and few services use gRPC protocols. However, as the demand for streaming increases, we need to reduce the cognitive burden on developers during the transition, based on internal realities. Additionally, broadly introducing Protobuf-defined services is not conducive to unified IDL/interface management.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Lack of Engineering Practices&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compared to the PingPong model of one-send-one-receive, streaming communication adds complexity in service governance and engineering practices. The industry lacks accumulated engineering practices for streaming communication. Streaming interfaces can be easily misused, affecting service stability. From an observability perspective, there is no definition for streaming monitoring. &lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming Capabilities – SSE/Thrift Streaming
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hertz SSE
&lt;/h3&gt;

&lt;p&gt;SSE (Server-Sent Events) is based on the HTTP protocol and supports unidirectional data push from the server to the client. Its advantages include simplicity, ease of use, and developer-friendliness, making it suitable for text transmission and meeting the basic communication needs of text dialogue models. Compared to WebSocket, SSE is lighter: for text-based dialogue LLM applications, the server only needs to push data to the client without handling the complexity of bidirectional communication. (In voice dialogue scenarios, WebSocket, which is also browser-friendly, is more suitable.) SSE can define different event types, and the client can process data based on the event type. In LLM applications, this can be used to distinguish different kinds of response data (e.g., partial outputs, error messages, status updates).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruydqzlq63w8efh2i335.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruydqzlq63w8efh2i335.png" alt="Hertz SSE" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, SSE is not suitable for server-to-server communication, for several reasons: servers have high computational and transmission performance requirements, which an inefficient text protocol cannot satisfy; JSON is simple but ill-suited to complex server-side interaction scenarios, where strongly typed RPC is preferred; and some cases require bidirectional streaming communication.&lt;/p&gt;

&lt;p&gt;Therefore, considering ByteDance's internal needs, we choose to support Thrift Streaming.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kitex Thrift Streaming
&lt;/h3&gt;

&lt;p&gt;Streaming communication is used not only in &lt;strong&gt;LLM&lt;/strong&gt; applications but also in other business scenarios. For example, &lt;strong&gt;Douyin Search&lt;/strong&gt; aims to improve performance by streaming RPC results: during the video packaging stage, it retrieves information related to recalled video IDs, bundling results (10 docs) in one request and returning the first completed package. In the &lt;strong&gt;Lark People&lt;/strong&gt; data export scenario, data is retrieved concurrently; if all data were filled into an Excel sheet before returning, excessive data could lead to OOM (Out of Memory), terminating the process abnormally. Enhancing streaming capabilities therefore not only supports the rapid development of LLMs but also meets the development needs of other business scenarios.&lt;/p&gt;

&lt;p&gt;Although Kitex supports gRPC, we recommend using Thrift internally. Supporting diverse protocols can meet various needs, but it is best for a company to establish a single best practice: it minimizes the burden of choice on developers, and the toolchain can provide better support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming Protocols&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Within ByteDance, traffic control for Streaming protocols mainly relies on Service Mesh. However, to quickly support implementation without relying on Service Mesh's support for new protocols, Kitex first supported Thrift Streaming based on gRPC (HTTP2). Since the official gRPC protocol specification supports extending content-type, the implementation is &lt;strong&gt;based on gRPC's RPC communication specification, changing Protobuf encoding to Thrift encoding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zxhb469s8blvt5okm90.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zxhb469s8blvt5okm90.png" alt="stream 1" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thrift over gRPC&lt;/strong&gt; began its Alpha at ByteDance in December 2023 and was officially released in Kitex &lt;strong&gt;v0.9.0&lt;/strong&gt; in March 2024. It is now widely used internally, with usage instructions available on the official website.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service Mesh Compatibility: Based on HTTP2 transmission, no separate support is required from Service Mesh.&lt;/li&gt;
&lt;li&gt;Low Support Cost: The decoding type is explicitly determined based on SubContentType (an extension supported by the gRPC protocol specification).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High Resource Consumption: Flow control and dynamic windows introduce additional overhead.&lt;/li&gt;
&lt;li&gt;Significant Latency Impact: Flow control can significantly degrade latency with heavier traffic or larger packets, requiring users to adjust WindowSize.&lt;/li&gt;
&lt;li&gt;Difficult Troubleshooting: Increased complexity also raises the difficulty of troubleshooting.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
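&lt;p&gt;The SubContentType mechanism mentioned above boils down to inspecting the content-type suffix that the gRPC-over-HTTP/2 specification allows. A generic sketch of that routing decision (the &lt;code&gt;thrift&lt;/code&gt; subtype string here is an assumption for illustration, not necessarily the exact value Kitex sends):&lt;/p&gt;

```python
def grpc_subtype(content_type):
    """Extract the codec subtype from a gRPC content-type header.

    Per the gRPC-over-HTTP/2 spec, content-type is "application/grpc",
    optionally followed by "+subtype" (e.g. "application/grpc+proto").
    A peer can pick the right decoder by inspecting the subtype alone.
    """
    base, _, subtype = content_type.partition("+")
    if base != "application/grpc":
        raise ValueError(f"not a gRPC content-type: {content_type}")
    return subtype or "proto"  # the spec treats a bare value as proto

print(grpc_subtype("application/grpc+thrift"))  # thrift
print(grpc_subtype("application/grpc"))         # proto
```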

&lt;p&gt;Thrift over gRPC can be quickly implemented. However, from the perspectives of performance and troubleshooting, we have developed a Streaming protocol (Streaming over TTHeader) to simplify streaming communication. It is currently under internal debugging and trials, with an expected release in November-December 2024. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to Define Streaming in Thrift&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Users familiar with Thrift know that native Apache Thrift does not support the definition of streaming interfaces. Adding new keywords would make other Thrift parsing tools, including IDE plugins, incompatible. Therefore, defining streaming types for Thrift's RPC methods through annotations ensures parsing compatibility:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;streaming.mode="bidirectional": Bidirectional Streaming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;streaming.mode="client": Client Streaming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;streaming.mode="server": Server Streaming&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
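&lt;p&gt;The three modes correspond to three call shapes. As a rough illustration only (plain Python generators standing in for streams, not Kitex's generated code):&lt;/p&gt;

```python
from typing import Iterable, Iterator

# Server streaming: one request in, a stream of responses out.
def server_stream(req: str) -> Iterator[str]:
    for i in range(3):
        yield f"{req}:part{i}"

# Client streaming: a stream of requests in, one aggregated response out.
def client_stream(reqs: Iterable[str]) -> str:
    return "+".join(reqs)

# Bidirectional streaming: responses interleave with incoming requests.
def bidi_stream(reqs: Iterable[str]) -> Iterator[str]:
    for r in reqs:
        yield r.upper()

print(list(server_stream("q")))       # ['q:part0', 'q:part1', 'q:part2']
print(client_stream(["a", "b"]))      # a+b
print(list(bidi_stream(["x", "y"])))  # ['X', 'Y']
```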

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbaym2pgg0vqla8mt2e2q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbaym2pgg0vqla8mt2e2q.png" alt="streaming in Thrift" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both the currently supported Thrift Streaming over gRPC and the upcoming Thrift Streaming over TTHeader use this method to define streaming methods. The client-side will provide options to specify which Streaming protocol to use, while the server-side will support multiple protocols through protocol detection.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Generalized Streaming Invocation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If SSE is used for streaming communication on clients and Thrift Streaming is used on servers, how does the overall communication from clients to servers work?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmi5ysmos9ff8hpi3f1ha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmi5ysmos9ff8hpi3f1ha.png" alt="SSE-thrift streaming" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Taking the internal text dialogue model as an example, the traffic undergoes protocol conversion after passing through the API gateway, and the server uses the Server Streaming type to push data to the client.&lt;/p&gt;

&lt;p&gt;An important capability here is protocol conversion. Additionally, pressure testing and interface testing platforms need to dynamically construct data to test server services.&lt;/p&gt;

&lt;p&gt;Users of Kitex know that Kitex provides generalized invocation for Thrift protocols, primarily supporting such general services. Previously, internal microservices were mainly Thrift PingPong services. Kitex provided generalized invocation for Map, JSON, HTTP data types, as well as binary generalized invocation for traffic forwarding.&lt;/p&gt;

&lt;p&gt;Therefore, for streaming interfaces, Kitex has added support for generalized streaming invocation. Compared to PingPong generalized interfaces, generalized streaming requires separate interfaces for the three streaming types.&lt;/p&gt;
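&lt;p&gt;Conceptually, a JSON generalized server-streaming call routes by method name and transcodes JSON on both ends. A toy sketch with hypothetical names (not Kitex's actual generic API):&lt;/p&gt;

```python
import json

# Hypothetical backend handler; names and shapes are illustrative only.
def chat_stream(req):
    for i, word in enumerate(req["prompt"].split()):
        yield {"seq": i, "token": word}

METHODS = {"Chat": chat_stream}

def generic_server_streaming_call(method, json_request):
    """Generalized invocation: route by method name, decode the JSON
    request, and re-encode each streamed response as JSON."""
    handler = METHODS[method]
    for resp in handler(json.loads(json_request)):
        yield json.dumps(resp)

for frame in generic_server_streaming_call("Chat", '{"prompt": "hello world"}'):
    print(frame)
```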

&lt;ul&gt;
&lt;li&gt;PingPong/Unary Generalized Invocation Interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvscacrfgcmhobiv9xt2o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvscacrfgcmhobiv9xt2o.png" alt="Image description" width="800" height="183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streaming Generalized Invocation Interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0xj5j9fvd4kqizdk5oc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0xj5j9fvd4kqizdk5oc.png" alt="Image description" width="650" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Currently, support for the mainstream &lt;strong&gt;JSON &lt;/strong&gt;data type is complete, and other data types will be supported based on business needs in the future. (Since the Kitex Streaming v2 interface is yet to be released, and to avoid affecting the user experience of generalized streaming, this support has not been officially announced, but the functionality is ready. Users can visit the generalized invocation section on the official website for English documents.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw9tprg1meh8ji0h15kjo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw9tprg1meh8ji0h15kjo.png" alt="Image description" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  User Experience of Streaming Capability
&lt;/h2&gt;

&lt;p&gt;We have now covered the streaming capabilities that Kitex/Hertz supported in the past, supports today, and will release soon. But do developers who work on streaming interfaces, including those using other frameworks such as official gRPC, know how to use streaming interfaces properly, and how to locate issues when they arise?&lt;/p&gt;

&lt;p&gt;Within ByteDance, as streaming services have evolved, we've noticed a significant increase in feedback issues. On one hand, compared to Thrift PingPong, our support at the basic capability level is still incomplete. On the other hand, developing streaming interfaces requires a deep understanding of proper usage; otherwise, misuse can easily lead to problems. &lt;/p&gt;

&lt;p&gt;Therefore, in 2024, we initiated a &lt;strong&gt;Streaming Optimization Project&lt;/strong&gt;, sorted through various issues, and optimized them one by one. In terms of user experience, some issues are related to streaming interface definitions. After comprehensive consideration, we decided to shed the streaming burden and release the Streaming v2 interface. Below are some of the existing issues and ongoing optimizations. It's difficult to enforce proper usage of streaming interfaces solely from the framework level. Therefore, we will release usage specifications and best practices for streaming interfaces to help users develop high-quality streaming interfaces. If you have better suggestions for streaming usage, we welcome your feedback! &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblf4ongdwige20hskpzz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblf4ongdwige20hskpzz.png" alt="Image description" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Taking streaming observability as an example: previously, streaming interface monitoring was not defined separately and reused PingPong reporting, resulting in only overall stream-level information and &lt;strong&gt;lacking Recv/Send monitoring&lt;/strong&gt;. Therefore, when supporting Thrift Streaming, StreamSend &amp;amp; StreamRecv events were added, with the framework recording the time of occurrence and the size of user-transmitted data. Enterprise users with custom Tracer reporting only need to implement the &lt;a href="https://github.com/cloudwego/kitex/blob/v0.9.1/pkg/rpcinfo/tracer.go#L31" rel="noopener noreferrer"&gt;rpcinfo.StreamEventReporter&lt;/a&gt; interface; Kitex will call it after each Recv and Send execution, giving access to the event information for that Recv or Send. Below is the Trace information for Send/Recv within a Stream.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F595jxrg8bgivzi18p22l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F595jxrg8bgivzi18p22l.png" alt="Image description" width="800" height="559"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Review of New Features, User Experience/Performance Improvements
&lt;/h1&gt;

&lt;p&gt;While specialized support and optimization for streaming capabilities have been conducted over the past year, we have also provided other new features to meet user needs, enhance user experience, and continue to improve framework performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Features – Thrift/gRPC Multi-Services
&lt;/h2&gt;

&lt;p&gt;The official gRPC framework supports multiple services, but previous versions of Kitex did not, mainly to align with Thrift usage. &lt;strong&gt;Thrift's limitation arises because supporting multiple services would introduce protocol-incompatible changes&lt;/strong&gt;, impacting users. Within ByteDance, the TTHeader protocol is widely used, so we decided to transmit the IDL Service Name via TTHeader to solve the issue of Thrift not supporting multiple services.&lt;/p&gt;

&lt;p&gt;Kitex v0.9.0 officially supports &lt;strong&gt;registering multiple IDL Services within one Server&lt;/strong&gt;, including Thrift and Protobuf. Thrift provides true multi-service functionality at the protocol level based on TTHeader, while being compatible with the old CombineService.&lt;/p&gt;

&lt;p&gt;Here is a brief introduction to CombineService. Kitex previously provided a pseudo-multi-service feature, CombineService, to address the issue of excessively large IDLs (which lead to large generated code and slow compilation). It allows the server to split one IDL Service into multiple ones, but requires that those IDL Services have no methods with the same name (since the protocol does not support multiple services, method routing cannot be done). Ultimately, Kitex merges the multiple IDL Services into one Service, hence the name CombineService.&lt;/p&gt;

&lt;p&gt;With Kitex's new multi-service support, the server &lt;strong&gt;can not only register multiple IDL Services, but also provide both Thrift and Protobuf interfaces simultaneously&lt;/strong&gt;. For example, using Kitex-gRPC (Protobuf) but wanting to switch to Thrift Streaming while ensuring compatibility with old interface traffic, two types of IDL interfaces can be provided for transition.&lt;/p&gt;

&lt;p&gt;Below is an example of registering multiple services on the server:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnz3pzcdqh1p1lsbmd2ah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnz3pzcdqh1p1lsbmd2ah.png" alt="Image description" width="596" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  New Features – Mixed Retry
&lt;/h2&gt;

&lt;p&gt;Kitex previously provided two retry policies: &lt;strong&gt;Failure Retry&lt;/strong&gt; and &lt;strong&gt;Backup Request&lt;/strong&gt;. Failure Retry can improve success rates (enhancing service SLAs), but most retries are triggered by timeouts, which increases latency; Backup Request can reduce request latency, but a failed response terminates retries.&lt;/p&gt;

&lt;p&gt;In internal practice, businesses generally want to &lt;strong&gt;combine both retries&lt;/strong&gt;, which would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Optimize the overall retry latency of Failure Retry&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improve the request success rate of Backup Request&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, Kitex v0.11.0 supports Mixed Retry, a hybrid retry policy combining Failure Retry and Backup Request.&lt;/p&gt;

&lt;p&gt;To illustrate the differences between the three retry policies, consider a scenario: the first request takes 1200ms, a retried request takes 900ms, RPCTimeout is configured to 1000ms, MaxRetryTimes to 2, and BackupDelay to 200ms.&lt;/p&gt;

&lt;p&gt;Comparing the results of the three retries:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs214jl0zournhwoghpt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs214jl0zournhwoghpt7.png" alt="Image description" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Mixed Retry: &lt;strong&gt;Success&lt;/strong&gt;, cost &lt;strong&gt;1100ms &lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Failure Retry: &lt;strong&gt;Success&lt;/strong&gt;, cost &lt;strong&gt;1900ms &lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Backup Retry: &lt;strong&gt;Failure&lt;/strong&gt;, cost 1000ms &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
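&lt;p&gt;The timings above follow from simple arithmetic. A minimal sketch of this scenario's bookkeeping (illustrative only, not the framework's actual retry algorithm):&lt;/p&gt;

```python
RPC_TIMEOUT = 1000           # ms
BACKUP_DELAY = 200           # ms
FIRST_COST, RETRY_COST = 1200, 900  # latencies of the first and retried request

# Failure Retry: wait for the first attempt to time out, then retry.
failure_cost = min(FIRST_COST, RPC_TIMEOUT) + RETRY_COST  # 1000 + 900 = 1900
failure_ok = RPC_TIMEOUT >= RETRY_COST                    # the retry fits in the timeout

# Backup Request: fire a second request after BackupDelay, but any failed
# response ends the call. The first attempt's timeout (1000ms) arrives
# before the backup would finish (200 + 900 = 1100ms).
backup_finish = BACKUP_DELAY + RETRY_COST
first_finish = min(FIRST_COST, RPC_TIMEOUT)
backup_ok = first_finish > backup_finish            # backup must beat the first result
backup_cost = backup_finish if backup_ok else first_finish

# Mixed Retry: fire a backup after BackupDelay AND keep retrying on failure,
# so the first attempt's timeout does not end the call; the backup wins.
mixed_ok = True
mixed_cost = BACKUP_DELAY + RETRY_COST

print(f"Mixed Retry:   success={mixed_ok}, cost={mixed_cost}ms")
print(f"Failure Retry: success={failure_ok}, cost={failure_cost}ms")
print(f"Backup Request: success={backup_ok}, cost={backup_cost}ms")
```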

&lt;h2&gt;
  
  
  User Experience - Frugal &amp;amp; FastCodec (Thrift)
&lt;/h2&gt;

&lt;p&gt;Both Frugal and FastCodec (Thrift) are high-performance Thrift serialization tools provided by Kitex. Frugal's advantage over FastCodec is that it does not require code generation, significantly addressing the issue of excessively large outputs.&lt;/p&gt;

&lt;p&gt;But two cons remain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Both Frugal and FastCodec decoding must rely on packets with headers. If it's a Thrift Buffered packet, decoding falls back to the Apache codec. Users need to be clear about the protocol they receive; otherwise, using Frugal cannot completely eliminate code generation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Frugal is based on a JIT implementation. x86 support is complete, but ARM only had a fallback strategy with poor performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To address the protocol binding issue, the new version supports SkipDecode. Test results show that SkipDecode + FastCodec still outperforms the Apache Thrift codec.&lt;/p&gt;

&lt;p&gt;For the Frugal ARM issue, new reflection-based support is provided, eliminating the need for separate support per architecture. Although it uses reflection, it bypasses reflection's internal type checks to achieve higher performance; test results are slightly better than the JIT implementation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguosvokvr8wt1p8cpx55.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguosvokvr8wt1p8cpx55.png" alt="Image description" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  User Experience - Output Reduction and Generation Speed Optimization
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Large generated output, slow code generation, and slow compilation&lt;/strong&gt; are significant pain points for services with long iteration histories within ByteDance. Therefore, Kitex provides several optimizations to reduce output size and speed up code generation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;IDL Trimming&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A complex IDL with a long iteration history contains many obsolete struct definitions, and cleaning them up by hand adds to the development burden. The trimming tool generates code only for the struct definitions actually required by RPC methods, and users can also specify which methods to generate. In pilot projects on large ByteDance repositories, &lt;strong&gt;generation time is halved, and output size is reduced by over 60%&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Usage: &lt;em&gt;$ kitex -module xx -thrift trim_idl xxxx.thrift&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Example effect: In the example below, the trimming tool deleted 60,000 unused structs and 530,000 fields.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fel7386tqnleqsdnf9q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fel7386tqnleqsdnf9q.png" alt="Image description" width="800" height="73"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;no_fmt Speedup&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After code generation, the output is formatted by default to improve readability, but users rarely read generated code. Disabling the fmt option therefore speeds up generation. &lt;/p&gt;

&lt;p&gt;Usage: &lt;em&gt;$ kitex -module xx -thrift no_fmt xxxx.thrift&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Effect: The P90 generation time for a certain platform within ByteDance decreased from &lt;strong&gt;80s to 20s&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Removing Unnecessary Codes from Kitex&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kitex generates the full Apache Thrift code by default, but in practice only the Codec part is used, and only in fallback scenarios; the rest of the code is not needed. &lt;/p&gt;

&lt;p&gt;Therefore, Kitex v0.10.0 defaults to removing the Thrift Processor and can remove all Apache Thrift code via parameter specification.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Usage: &lt;em&gt;$ kitex -module xxx -thrift no_default_serdes xxx.thrift&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Effect: Output size is reduced by about &lt;strong&gt;50%+&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Frugal Slim Extreme Reduction&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Usage: &lt;em&gt;$ kitex -thrift frugal_tag,&lt;strong&gt;template=slim&lt;/strong&gt; -service p.s.m idl/api.thrift&lt;/em&gt;, which uses Frugal for Thrift serialization.&lt;/p&gt;

&lt;p&gt;Effect: Output size is reduced by about &lt;strong&gt;90%&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  User Experience - kitexcall
&lt;/h2&gt;

&lt;p&gt;Although RPC calls are simpler and more convenient than HTTP, they are not convenient to &lt;strong&gt;test&lt;/strong&gt;: tools must first generate code and then construct request data. The testing platforms mentioned earlier use generalized invocation to construct request data without generated code, but its cost is not low either, since users must first understand the method's usage and how to construct its data.&lt;/p&gt;

&lt;p&gt;To &lt;strong&gt;improve testing convenience&lt;/strong&gt;, a standalone command-line tool, kitexcall, is provided on top of Kitex &lt;strong&gt;JSON generalized invocation&lt;/strong&gt;, allowing users to issue Thrift test requests with JSON data. (This feature was contributed by the community; thank you!) &lt;/p&gt;

&lt;p&gt;Usage: &lt;em&gt;$ kitexcall -idl-path echo.thrift -m echo -d '{"message": "hello"}' -e 127.0.0.1:8888&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Future optimization plans:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Graphical interface for more convenient testing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for gRPC testing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No need to specify IDL, using server reflection to obtain IDL information&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance Optimization – Thrift On-Demand Serialization
&lt;/h2&gt;

&lt;p&gt;As business iterations make IDL definitions increasingly complex, upstream services in production may &lt;strong&gt;only need partial fields&lt;/strong&gt;, yet all fields must be serialized and transmitted, introducing additional performance overhead. With this in mind, Kitex supports on-demand serialization for Thrift.&lt;/p&gt;

&lt;p&gt;Referencing Protobuf, Kitex provides a Thrift FieldMask feature that lets users select which fields to encode, reducing serialization and transmission overhead. &lt;/p&gt;

&lt;p&gt;For example, below, only the Foo field is encoded and returned, ignoring the Bar field:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cep3j501jr6s8tk25t8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cep3j501jr6s8tk25t8.png" alt="Image description" width="668" height="156"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The user constructs Bar data, but because the FieldMask marks only the Foo field, the framework encodes only Foo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lb186aeiajf5als1885.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lb186aeiajf5als1885.png" alt="Image description" width="800" height="291"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It also supports letting the peer specify the required fields; for usage details, see the on-demand serialization documentation on the official website.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Optimization – Thrift Memory Allocation Optimization
&lt;/h2&gt;

&lt;p&gt;Kitex continuously monitors RPC performance. Under today's cost pressures, we are exploring deeper optimizations. The routine hot-path optimizations have all been done, so the remaining ones are less conventional. v0.10.0 ships new optimizations focused on memory allocation and GC.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Span Cache&lt;/strong&gt;: Optimizes String/Binary decoding costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-allocates memory, reducing mallocgc calls&lt;/li&gt;
&lt;li&gt;Reduces the actual number of generated objects -&amp;gt; lower GC costs&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Centralized memory allocation for container fields&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Similarly, switches from a separate allocation per element to a single centralized allocation&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Span Cache optimizes CPU but increases memory usage. To avoid impacting services with small memory specifications, it is disabled by default and must be enabled explicitly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbwlxb6hrpv12rm4ui1q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbwlxb6hrpv12rm4ui1q.png" alt="Image description" width="658" height="226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Effect: Under extreme testing, throughput is increased by about 10%, and latency is reduced by about 30%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory Analysis Tool
&lt;/h2&gt;

&lt;p&gt;The objects received over RPC/HTTP are constructed, allocated, and populated by the framework before being returned to the user. However, if user code holds onto these objects indefinitely, it can lead to memory leaks. While pprof heap can show where memory is allocated, it cannot show where it is referenced. So how do we determine &lt;strong&gt;who is referencing a Go object&lt;/strong&gt;? &lt;/p&gt;

&lt;p&gt;In fact, GC scans and marks objects, capturing reference relationships. By combining this with variable names and type information, we can analyze the referencing situation of objects. Leveraging Delve, we have developed the &lt;strong&gt;goref &lt;/strong&gt;object reference analysis tool, which was open-sourced in July (&lt;em&gt;github.com/cloudwego/goref&lt;/em&gt;). This addresses the limitation of Go's native tools in analyzing memory references, aiding Go developers &lt;strong&gt;in quickly identifying memory leaks&lt;/strong&gt; and enhancing the Go tooling ecosystem.&lt;/p&gt;

&lt;p&gt;For instance, the pprof Heap Profile in the following image shows that the currently referenced objects are primarily allocated within FastRead (Kitex's deserialization code). It is normal for decoding to allocate memory to construct data, but this flame graph offers limited help in troubleshooting, because the allocation site is usually not the source of the leak.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8qwfmihw4ds3mfjwar2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8qwfmihw4ds3mfjwar2.png" alt="Image description" width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, using the goref tool yields the following result: mockCache holds an RPC Resp, preventing memory from being released. The issue is immediately apparent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0yxyrlv8ga7fo8nsgmxn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0yxyrlv8ga7fo8nsgmxn.png" alt="Image description" width="800" height="155"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion and Outlook
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Enhancing Streaming Capabilities to Support LLMs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Streaming capabilities provided by Kitex/Hertz: gRPC, HTTP 1.1 Chunked, WebSocket, SSE, Thrift Streaming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SSE &amp;lt;-&amp;gt; Thrift Streaming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generalized streaming invocations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Streaming capability optimizations to enhance user experience and engineering practices&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Review of New Features, User Experience, and Performance Improvements&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;New Features: Thrift/gRPC multi-service support, Mixed Retry&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;User Experience: Frugal/FastCodec, streamlined outputs, generation speed optimizations, kitexcall&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance Optimization: Thrift on-demand serialization, memory allocation improvement&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Memory Analysis Tool: goref&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Outlook
&lt;/h2&gt;

&lt;p&gt;In the coming year, we will continue to enhance streaming capabilities and optimize the user experience. We will provide usage guidelines for streaming interfaces to help users better develop their streaming services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Release Kitex Streaming v2 interface to address historical issues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Release TTHeader Streaming for better performance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Engineering practices: graceful shutdown, retries, timeout control&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Release streaming-related specifications: error handling, interface usage guidelines&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Furthermore, we will consider strengthening the streaming ecosystem, such as enriching generalized streaming invocations and providing more gateway-friendly support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SSE &amp;lt;-&amp;gt; Thrift Streaming(HTTP2 and TTHeader Streaming)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;WebSocket &amp;lt;-&amp;gt; Thrift Streaming (HTTP2 and TTHeader Streaming)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Binary and Map generalized invocations for Streaming&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Special announcement: Kitex plans to gradually remove Apache Thrift-generated code in future versions. Due to incompatible changes in Apache Thrift v0.14, Kitex has been locked to Apache Thrift v0.13. To resolve this, Kitex will eliminate its dependency on Apache Thrift.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>llm</category>
      <category>backend</category>
      <category>performance</category>
      <category>application</category>
    </item>
    <item>
      <title>KubeAdmiral: next-generation multi-cluster orchestration engine based on Kubernetes</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Mon, 15 Apr 2024 09:33:34 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/kubeadmiral-next-generation-multi-cluster-orchestration-engine-based-on-kubernetes-2d0b</link>
      <guid>https://dev.to/bytedanceoss/kubeadmiral-next-generation-multi-cluster-orchestration-engine-based-on-kubernetes-2d0b</guid>
      <description>&lt;p&gt;Project link: &lt;a href="https://github.com/kubewharf/kubeadmiral"&gt;https://github.com/kubewharf/kubeadmiral&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since its release in 2014, Kubernetes has become the de facto standard for cloud native orchestration and scheduling systems, delivering substantial value to infrastructure developers around the world. &lt;/p&gt;

&lt;p&gt;As an increasing number of corporations embrace cloud native technologies and migrate their workloads to Kubernetes, the scale of their clusters grows rapidly. &lt;/p&gt;

&lt;p&gt;The community edition of Kubernetes, capped at 5000 nodes per cluster, is no longer able to keep up with the scale requirements of large-scale enterprise applications. Moreover, many companies are adopting multi-cloud architectures to achieve cost reduction, increased resource and operational efficiency, geographical disaster recovery, and environment isolation. &lt;/p&gt;

&lt;p&gt;As a result, the demand for multi-cluster orchestration and scheduling tools is on the rise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Brief History of Kubernetes at ByteDance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dedicated Clusters
&lt;/h3&gt;

&lt;p&gt;In the early years of ByteDance’s cloud native adoption, each business line operated in separate dedicated clusters due to isolation concerns. However, this led to low resource elasticity and efficiency, observed in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each business line had to maintain independent resource buffers for scaling and upgrading.&lt;/li&gt;
&lt;li&gt;Applications were tightly coupled to specific clusters, and manual resource transfers were required to balance utilization as applications scaled.&lt;/li&gt;
&lt;li&gt;SRE teams had to deeply understand both the businesses and the clusters in order to manage resources efficiently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Consequently, this resulted in inefficient resource management and suboptimal overall deployment rate.&lt;/p&gt;

&lt;h3&gt;
  
  
  KubeFed v2
&lt;/h3&gt;

&lt;p&gt;To address these challenges, the technical infrastructure team at ByteDance started exploring cluster federation based on KubeFed v2 in 2019. The goal was to pool resources across business lines, reduce unnecessary buffers, and improve the efficiency of resource management. &lt;/p&gt;

&lt;p&gt;KubeFed v2 introduces the concept of host and member clusters. Users create federated workloads (e.g. FederatedDeployment) in the host cluster, and KubeFed schedules and dispatches workloads in the member clusters based on these federated workloads. Each federated workload contains three primary fields: &lt;strong&gt;Template&lt;/strong&gt; (specifying the workload to be dispatched to member clusters), &lt;strong&gt;Placement&lt;/strong&gt; (designating target member clusters), and &lt;strong&gt;Overrides&lt;/strong&gt; (indicating how the template should be varied in some clusters). For example, the following FederatedDeployment instructs KubeFed to create a Deployment in cluster1 and cluster2 with 2 and 3 replicas respectively.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: types.kubefed.k8s.io/v1beta1
kind: FederatedDeployment
metadata:
  name: test-deployment
spec:
  template:
    metadata:
      labels:
        app: nginx
    spec:
      replicas: 5
      # more Deployment fields...
  placement:
    clusters:
    - name: cluster1
    - name: cluster2
  overrides: 
  - clusterName: cluster1
    clusterOverrides:
    - path: /spec/replicas
      value: 2
  - clusterName: cluster2
    clusterOverrides:
    - path: /spec/replicas
      value: 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Deployment and ReplicaSet, KubeFed supports dividing the desired replicas across multiple clusters based on ReplicaSchedulingPreference (RSP). Users can configure the weights, minimum replicas, and maximum replicas for each cluster, and the RSP controller computes a valid replica distribution and updates the Placement and Overrides fields of FederatedDeployment or FederatedReplicaSet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7byfqe4yygyjdu1454s.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7byfqe4yygyjdu1454s.jpeg" alt="RSP Scheduling" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;RSP Scheduling (Image credit: &lt;a href="https://www.kubernetes.org.cn/5702.html"&gt;https://www.kubernetes.org.cn/5702.html&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;KubeFed laid the foundation of Kubernetes cluster federation at ByteDance. However, we soon found KubeFed unable to meet our production requirements. The primary pain points were:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Uneven resource utilization across clusters – KubeFed’s RSP only supports static cluster weights and lacks the ability to adapt to fluctuations in cluster resources dynamically.&lt;/li&gt;
&lt;li&gt;Service disruption after rescheduling – During rescheduling, replicas might be abruptly migrated between clusters, disrupting service availability.&lt;/li&gt;
&lt;li&gt;Limitations in scheduling semantics – KubeFed supports stateless, replica-based resources through RSP, but lacks support for more diverse resources such as stateful workloads and jobs. Moreover, extending the existing scheduling semantics is difficult.&lt;/li&gt;
&lt;li&gt;High onboarding cost – KubeFed requires the creation of federated objects and is incompatible with the native Kubernetes API. Users and downstream platforms need to completely overhaul their usage patterns.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  KubeAdmiral
&lt;/h2&gt;

&lt;p&gt;With the evolution of cloud native infrastructure at ByteDance, we raised our standards for efficiency, scalability, performance, and cost. Meanwhile, the size and number of our Kubernetes clusters continue to grow phenomenally along with the businesses. Additionally, workloads beyond stateless microservices, including stateful services, storage, offline and machine learning jobs, started embracing cloud native technologies. Against this backdrop, the limitations of KubeFed became increasingly difficult to manage. Therefore, at the end of 2021, we began our endeavor to develop the next generation cluster federation system, building upon KubeFed v2’s foundation. We named it KubeAdmiral to capture our aspiration for it to manage multiple clusters as effectively as a seasoned navy admiral commands a fleet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dlvg73bei6xvpersq6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dlvg73bei6xvpersq6.jpeg" alt="Timeline of Kubernetes at ByteDance" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Timeline of Kubernetes at ByteDance&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;KubeAdmiral offers enhanced multi-cluster orchestration and scheduling capabilities for various mainstream business scenarios. Today at ByteDance, KubeAdmiral manages more than 100,000 microservices with more than 10,000,000 pods running on dozens of federated Kubernetes clusters. It supports upwards of 30,000 upgrade and scaling operations daily, and maintains a stable deployment rate of 95-98% without the need for manual intervention.&lt;/p&gt;

&lt;h2&gt;
  
  
  KubeAdmiral Feature Highlight
&lt;/h2&gt;

&lt;p&gt;KubeAdmiral not only supports native Kubernetes resources and third-party custom resources, but also offers a rich and extensible scheduling framework. Moreover, it refines numerous aspects of scheduling and dispatching, backed by years of practical production experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjr2n36rj7xbnk50zryre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjr2n36rj7xbnk50zryre.png" alt="KubeAdmiral architecture diagram" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;KubeAdmiral architecture diagram&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rich Multi-Cluster Scheduling Capabilities&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The scheduler is a core component of KubeAdmiral responsible for computing the desired placement of workloads in member clusters. When scheduling replica-based workloads, it also computes the appropriate replicas for each cluster. Functioning as KubeAdmiral’s “brain”, its decisions directly impact critical aspects such as fault tolerance, resource efficiency, and stability.&lt;/p&gt;

&lt;p&gt;KubeFed provides the RSP scheduler for replica-based workloads, but its customizability and extensibility are very limited, and modifying its behavior requires code modification. Additionally, it lacks support for stateful services, job-like resources, etc., which require different sets of scheduling semantics.&lt;/p&gt;

&lt;p&gt;KubeAdmiral introduces more comprehensive scheduling semantics. It supports more flexible and fine-grained mechanisms to select clusters via labels, taints, etc., and score clusters based on resource utilization, affinity, and so on. Beyond just replica-based workloads, it also supports scheduling stateful workloads and job-like resources. Additionally, it brings about convenient features such as automatic dependency scheduling (dependencies such as ConfigMaps can automatically follow their Deployment to corresponding member clusters). The scheduling behavior can be configured using a PropagationPolicy object, as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: core.kubeadmiral.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: mypolicy
  namespace: default
spec:
  # Many different ways to select clusters.
  placement:
  # Manually specify desired clusters and replica weights, if required.
  - cluster: cluster-01
    preferences:
      weight: 4
  - cluster: cluster-02
    preferences:
      weight: 3
  - cluster: cluster-03
    preferences:
      weight: 4
  # Filter clusters based on label selectors.
  clusterSelector:
    IPv6: "true"
  # Filter clusters based on affinity.
  clusterAffinity:
  - matchExpressions:
    - key: region
      operator: In
      values:
      - us-east
  # Filter clusters based on taints and tolerations.
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
  # Mode of scheduling - divide or duplicate.
  schedulingMode: Divide
  reschedulePolicy: 
    # Only schedule on creation and do not reschedule afterwards.
    # Suitable for stateful workloads.
    disableRescheduling: false
    # When rescheduling should be triggered.
    # More triggers: reschedule more frequently - favor agility.
    # Fewer triggers: reschedule less frequently - favor stability.
    rescheduleWhen:
      policyContentChanged: true
      clusterLabelsChanged: false
    # Whether to rebalance replicas on reschedule.
    # Enabling rebalance results in optimal placement, but at the potential cost
    # of disrupting existing replicas.
    replicaRescheduling:
      avoidDisruption: true
  # Limit propagation to a single cluster.
  # Suitable for job-like workloads.
  maxClusters: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of writing Overrides manually, KubeAdmiral supports generating Overrides based on OverridePolicy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: core.kubeadmiral.io/v1alpha1
kind: OverridePolicy
metadata:
  name: example
  namespace: default
spec:
  # Flexible ways to select target clusters.
  overrideRules:
  - targetClusters:
      # Select clusters by name.
      clusters:
      - on-prem-1
      - edge-1
      # Select clusters by label.
      clusterSelector:
        region: us-east
        az: az1
      # Select clusters by affinity.
      clusterAffinity:
      - matchExpressions:
        - key: region
          operator: In
          values:
          - us-east
      # Change the container image in the target clusters using jsonpatch.
      overriders:
        jsonpatch:
        - path: "/spec/template/spec/containers/0/image"
          operator: replace
          value: "nginx:test"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Scheduler Extension&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Taking inspiration from kube-scheduler’s design, KubeAdmiral offers a flexible scheduling framework. It simplifies the scheduling process by dividing it into four distinct stages: &lt;strong&gt;Filter, Score, Select, and Replica.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Each stage is handled by individual plugins, creating a logical separation that promotes modularity. For instance, in the provided PropagationPolicy example above, most behaviors are implemented through built-in scheduling plugins. The beauty of this approach is that plugins can be easily added or removed, without any impact on the remaining plugins. This greatly simplifies the scheduler logic and reduces its overall complexity. Although the built-in plugins in KubeAdmiral offer versatile features that cater to common use cases, users have the flexibility to enhance the functionality by creating their own custom scheduling plugins for specific niche scenarios. This empowers users to seamlessly integrate with internal or existing systems. &lt;/p&gt;

&lt;p&gt;The KubeAdmiral scheduler interacts with external plugins via the HTTP protocol, enabling users to extend the scheduling logic with minimal effort and without having to modify the KubeAdmiral control plane. The plugin only needs to output the desired placement, and KubeAdmiral takes care of binding and enforcing those results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67gelcareqx1uxvi1pd5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67gelcareqx1uxvi1pd5.png" alt="Scheduler stages and plugins" width="800" height="343"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Scheduler stages and plugins&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Automatic Migration of Unschedulable Workloads&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For replica scheduling, KubeAdmiral calculates the number of replicas that each member cluster should receive and overrides the replicas field in the template before distributing the resources to the member clusters. After the resources are distributed to member clusters, the kube-scheduler in each member cluster assigns the corresponding pods to available nodes. Thus, a full scheduling chain is completed.&lt;/p&gt;

&lt;p&gt;Occasionally, there are cases where the kube-scheduler fails to find suitable nodes for pods due to reasons including node outages, resource shortages, and unmet node affinity requirements. If left unaddressed, the unschedulable pods will remain pending. KubeAdmiral resolves this by automatically migrating the unschedulable pods to other clusters, enabling better resource utilization overall.&lt;/p&gt;

&lt;p&gt;As an illustration, consider three clusters A, B, and C with an equal weight distribution for six replicas. After the initial scheduling by KubeAdmiral, each cluster receives two replicas. If the two replicas in cluster C fail to be scheduled by kube-scheduler after a while, KubeAdmiral automatically shifts them to clusters A and B, ensuring the desired availability of 6 replicas across all clusters.&lt;/p&gt;


&lt;h3&gt;
  
  
  Dynamic Replica Distribution Based on Real-Time Resource Availability
&lt;/h3&gt;

&lt;p&gt;In a multi-cluster setup, the resource utilization of each cluster fluctuates as machines go online or offline. Relying solely on the static-weight replica scheduling provided by KubeFed RSP can easily lead to skewed resource utilization. Clusters with a high deployment rate are prone to pending pods during upgrades, while clusters with a low deployment rate have idle resources that go to waste.&lt;/p&gt;

&lt;p&gt;As a solution to this, KubeAdmiral introduces dynamic weight scheduling based on real-time cluster resource utilization. It calculates the amount of available resources by collecting the total and allocated resources of each cluster, and uses it as the weight for replica scheduling. This ultimately achieves dynamic load balancing across all member clusters. In practice, we are able to maintain a stable deployment rate of 95-98% or above in all member clusters with this approach.&lt;/p&gt;
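&lt;p&gt;To make the idea concrete, here is a minimal Go sketch of weight-by-available-resources scheduling. The function name, the proportional apportioning, and the round-robin remainder handling are our own illustrative assumptions, not KubeAdmiral’s actual implementation:&lt;/p&gt;

```go
package main

import "fmt"

// distributeByAvailable sketches dynamic weight scheduling: each cluster's
// weight is its free capacity (capacity minus allocated), and replicas are
// apportioned proportionally, with rounding leftovers handed out one by one.
// It assumes at least one cluster has free capacity.
func distributeByAvailable(replicas int, capacity, allocated []int) []int {
	n := len(capacity)
	weights := make([]int, n)
	weightSum := 0
	for i := range capacity {
		weights[i] = max(capacity[i]-allocated[i], 0) // free capacity as weight
		weightSum += weights[i]
	}
	out := make([]int, n)
	given := 0
	for i := range weights {
		out[i] = replicas * weights[i] / weightSum // truncating division
		given += out[i]
	}
	// Hand out the rounding remainder so the counts sum exactly to replicas.
	for i := 0; given != replicas; i = (i + 1) % n {
		if weights[i] != 0 {
			out[i]++
			given++
		}
	}
	return out
}

func main() {
	// Three clusters with 100 units of capacity each, unevenly allocated.
	fmt.Println(distributeByAvailable(9, []int{100, 100, 100}, []int{40, 70, 90}))
}
```

&lt;p&gt;The cluster with the most idle capacity receives the most replicas, so load converges toward balance as the weights are refreshed.&lt;/p&gt;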

&lt;h3&gt;
  
  
  Refined Replicas Rescheduling
&lt;/h3&gt;

&lt;p&gt;KubeFed’s replica rescheduling algorithm usually results in less-than-ideal distributions for scaling operations. As an illustration, consider 30 replicas currently distributed to 3 member clusters A, B, and C with equal weights. If the workload is scaled down to 9 replicas, KubeFed has 2 possible behaviors depending on whether the user enables rebalance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If rebalance = false, KubeFed retains existing replicas, disregarding cluster weights.&lt;/li&gt;
&lt;li&gt;If rebalance = true, KubeFed disregards current distribution and rebalances replicas based on weights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As seen above, KubeFed is unable to devise a distribution that satisfies fault tolerance and load balancing requirements without compromising service availability. To address this, KubeAdmiral developed a refined replica rescheduling algorithm that guarantees service availability and produces distributions that are as close to the optimal distribution as possible. The gist of the algorithm is to distribute the increment or decrement in replicas, instead of the total replicas.&lt;/p&gt;

&lt;p&gt;Using the same scenario of scaling down from 30 replicas to 9 replicas above, the refined algorithm roughly proceeds as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Current distribution = [15, 15, 0]; total current replicas: 30&lt;/li&gt;
&lt;li&gt;Desired distribution = [3, 3, 3]; total desired replicas: 9&lt;/li&gt;
&lt;li&gt;Distance = desired – current = [-12, -12, 3]; total distance: -21&lt;/li&gt;
&lt;li&gt;For scaling down, remove any positive distance terms; distance = [-12, -12, 0]&lt;/li&gt;
&lt;li&gt;Distribute the total distance -21 using the distance vector [-12, -12, 0] as weights; adjustments = [-10, -11, 0]&lt;/li&gt;
&lt;li&gt;Final distribution = current + adjustments = [15, 15, 0] + [-10, -11, 0] = [5, 4, 0]&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cluster&lt;/th&gt;
&lt;th&gt;A&lt;/th&gt;
&lt;th&gt;B&lt;/th&gt;
&lt;th&gt;C&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Weight&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicas before scaling down&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Change&lt;/td&gt;
&lt;td&gt;-10&lt;/td&gt;
&lt;td&gt;-11&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicas after scaling down&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
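&lt;p&gt;The six steps above can be sketched in a few lines of Go. This is a simplified illustration: the truncating division and the tie-breaking order for the leftover decrement are our own assumptions, chosen so the example reproduces the [5, 4, 0] distribution, and may differ from KubeAdmiral’s real algorithm:&lt;/p&gt;

```go
package main

import "fmt"

// rescheduleScaleDown sketches the refined scale-down algorithm: distribute
// the decrement in replicas rather than the total. The truncating division
// and the tie-breaking order for the leftover decrement are illustrative
// choices, not necessarily KubeAdmiral's exact behavior.
func rescheduleScaleDown(current, desired []int) []int {
	n := len(current)
	dist := make([]int, n)
	totalDist := 0
	for i := range current {
		dist[i] = desired[i] - current[i]
		totalDist += dist[i]
	}
	// Step 4: for scaling down, drop positive distance terms.
	weightSum := 0
	for i := range dist {
		dist[i] = min(dist[i], 0)
		weightSum += dist[i]
	}
	// Step 5: distribute the total distance using the distances as weights.
	adj := make([]int, n)
	distributed := 0
	for i := range dist {
		adj[i] = totalDist * dist[i] / weightSum // truncates toward zero
		distributed += adj[i]
	}
	// Hand out the leftover decrement one replica at a time.
	i := n - 1
	for distributed != totalDist {
		if dist[i] != 0 {
			adj[i]--
			distributed--
		}
		i = (i - 1 + n) % n
	}
	// Step 6: final distribution is current plus adjustments.
	out := make([]int, n)
	for j := range current {
		out[j] = current[j] + adj[j]
	}
	return out
}

func main() {
	// Steps 1-2: current [15, 15, 0], desired [3, 3, 3].
	fmt.Println(rescheduleScaleDown([]int{15, 15, 0}, []int{3, 3, 3})) // [5 4 0]
}
```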

&lt;h3&gt;
  
  
  Support for Native Kubernetes Resource API
&lt;/h3&gt;

&lt;p&gt;Unlike KubeFed, which requires users to use an incompatible “federated” API, KubeAdmiral caters to the usage habits of single-cluster Kubernetes users by providing support for native Kubernetes APIs. After the user creates a native resource (such as Deployment), KubeAdmiral’s federate-controller automatically converts it into an internal object for use by downstream KubeAdmiral controllers. This enables users to quickly transition from a single-cluster to a multi-cluster architecture with low onboarding cost.&lt;/p&gt;

&lt;p&gt;However, KubeAdmiral doesn’t stop there. In a single cluster, Kubernetes controllers update the status of resources to reflect their current state. For example, a Deployment’s status reflects its rollout progress and the number of replicas it currently has. Users or upper-layer systems often rely on such status. In a multi-cluster environment, the status is populated on the individual Deployments propagated to member clusters. Users must check the status of resources in each cluster individually, leading to a fragmented perspective and reduced operational efficiency.&lt;/p&gt;

&lt;p&gt;To solve this problem and seamlessly support native resources, KubeAdmiral introduces status aggregation. The KubeAdmiral status-aggregator collects and aggregates the status of individual resources from member clusters and writes it back to the native resources. This allows users to observe the global resource status at a glance.&lt;/p&gt;
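&lt;p&gt;A toy Go sketch of the aggregation idea: per-cluster statuses are summed into one global view. The struct and field names below are illustrative, not KubeAdmiral’s actual types:&lt;/p&gt;

```go
package main

import "fmt"

// deploymentStatus is a hypothetical, simplified stand-in for the status
// fields a Deployment reports in each member cluster.
type deploymentStatus struct {
	Replicas      int
	ReadyReplicas int
}

// aggregate sums member-cluster statuses into one global status, the kind
// of view the status-aggregator writes back to the native resource.
func aggregate(statuses map[string]deploymentStatus) deploymentStatus {
	var out deploymentStatus
	for _, s := range statuses {
		out.Replicas += s.Replicas
		out.ReadyReplicas += s.ReadyReplicas
	}
	return out
}

func main() {
	global := aggregate(map[string]deploymentStatus{
		"cluster-a": {Replicas: 5, ReadyReplicas: 5},
		"cluster-b": {Replicas: 4, ReadyReplicas: 3},
	})
	fmt.Println(global.Replicas, global.ReadyReplicas) // 9 8
}
```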

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;KubeAdmiral has been incubating within ByteDance for a while, and has been an integral part of ByteDance’s internal PaaS platform TCE. Battle-tested by large-scale applications, it has accumulated many valuable practical experiences. To give back to the community, KubeAdmiral has officially been open-sourced on GitHub.&lt;/p&gt;

&lt;p&gt;Looking forward, we plan to continue working on KubeAdmiral, especially in the following areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continue to improve the orchestration and scheduling capabilities of stateful and job-like workloads, and develop advanced capabilities such as automatic migration and cost-based scheduling to embrace the new era of multi-cloud batch computing.&lt;/li&gt;
&lt;li&gt;Improve user experience and further reduce users’ cognitive burden, striving for a pleasant out-of-the-box experience.&lt;/li&gt;
&lt;li&gt;Improve observability, optimize logging and metrics, and enhance the scheduler’s explainability.&lt;/li&gt;
&lt;li&gt;Explore features such as one-click migration from a single cluster, further smoothing the onboarding experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multi-cluster orchestration and scheduling is not a simple topic. We hope our experience and solutions are useful to the community. We look forward to more friends joining the KubeAdmiral community, and welcome everyone to try KubeAdmiral and give us suggestions!&lt;/p&gt;

&lt;p&gt;GitHub repo: &lt;a href="https://github.com/kubewharf/kubeadmiral"&gt;https://github.com/kubewharf/kubeadmiral&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloudnative</category>
      <category>kubernetes</category>
      <category>cloud</category>
      <category>go</category>
    </item>
    <item>
      <title>Gödel Scheduler open-sourced: a unified scheduler for online and offline workloads</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Wed, 03 Apr 2024 09:16:53 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/godel-scheduler-open-sourced-a-unified-scheduler-for-online-and-offline-workloads-4a8i</link>
      <guid>https://dev.to/bytedanceoss/godel-scheduler-open-sourced-a-unified-scheduler-for-online-and-offline-workloads-4a8i</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Since its open-source release in 2014, Kubernetes has rapidly become the de facto standard for container orchestration. The infrastructure team at ByteDance adopted Kubernetes early on to build our private cloud platform. Over the years, ByteDance’s rapid growth across various business lines, including microservices, recommendation/advertising/search services, machine learning &amp;amp; big data, and storage, has led to a substantial increase in the demand for computing resources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ty5pkzccf8jyhtwlko3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ty5pkzccf8jyhtwlko3.png" alt="business lines" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Initially, ByteDance managed its online and offline workloads with separate resource pools, each dedicated to distinct business segments. To accommodate the surge in online business demands during significant holidays and major events, the infrastructure team usually needed to plan ahead by reallocating resources from offline to online pools to bolster the capacity for handling increased online activities. While this temporary fix satisfied immediate requirements, the inter-pool resource borrowing process proved to be time-consuming, operationally heavy, and inefficient. &lt;/p&gt;

&lt;p&gt;Furthermore, maintaining separate resource pools for online and offline workloads resulted in significant colocation costs, leaving little scope for enhancing resource utilization. &lt;/p&gt;

&lt;p&gt;Therefore, the infrastructure team sought to implement a unified system for scheduling and managing both online and offline workloads. This initiative aimed to facilitate resource pooling, enhance resource utilization and elasticity, optimize costs and user experiences, and alleviate operational burdens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practice of Unified Scheduling
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Enhancement beyond Kubernetes Default Scheduler:
&lt;/h3&gt;

&lt;p&gt;Since adopting Kubernetes extensively in 2018, ByteDance has continuously optimized various components of Kubernetes for functionality and performance. However, with the containerization of recommendation/advertising/search services in 2019, the native Kubernetes scheduler fell increasingly short of ByteDance’s business requirements in terms of both functionality and performance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;In terms of functionality&lt;/strong&gt;, more granular resource scheduling capabilities and flexible preemption strategies were required. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In terms of performance&lt;/strong&gt;, the native Kubernetes default scheduler could only achieve a scheduling throughput of around 10 pods per second in a cluster of 5000 nodes, often causing business upgrades to be bottlenecked, far from meeting the requirements. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, the team introduced a number of key optimizations to the Kubernetes default scheduler, including:&lt;/p&gt;

&lt;h4&gt;
  
  
  Functionality:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Extended the scheduling capabilities to support non-native resources, such as memory bandwidth and network bandwidth.&lt;/li&gt;
&lt;li&gt;Added support for micro-topology scheduling.&lt;/li&gt;
&lt;li&gt;Refactored preemption implementation by providing a pluggable preemption framework to support extending preemption capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Performance:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Optimized the data synchronization mechanism between the Scheduler cache and Snapshot by refactoring data structures and strengthening incremental updates between snapshots.&lt;/li&gt;
&lt;li&gt;Cached scheduling results for homogeneous scheduling units to reduce redundant calculations and improve efficiency.&lt;/li&gt;
&lt;li&gt;Optimized the preemption implementation by reorganizing preemption-related data structures, applying pruning in a timely manner, and reducing unnecessary computation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By implementing the aforementioned optimizations, we successfully enhanced our containerization capabilities to meet ByteDance’s rapidly expanding needs. This led to a remarkable 30-fold increase in scheduling throughput. That is, in a cluster comprising 10,000 nodes, we consistently achieved a scheduling rate of 300 pods per second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gödel Scheduler
&lt;/h2&gt;

&lt;p&gt;In 2020, ByteDance initiated a unified scheduling and resource management project for both online and offline business operations. The objective was to enhance overall resource utilization, improve operational efficiency, and reduce maintenance overheads. Initially, the plan involved managing both online and offline tasks through a singular scheduling system. However, this approach presented challenges, primarily due to the intricate nature of offline scheduling, which differed markedly from online processes, especially in its demand for high throughput.&lt;/p&gt;

&lt;p&gt;The native Kubernetes scheduler, primarily designed for Pod-level scheduling, was somewhat limited in its support for more complex “Job” scheduling semantics and encountered performance limitations when dealing with these higher-level demands. To effectively address these unique requirements and to better align with the diverse operational needs of ByteDance, the decision was made to develop a bespoke, in-house distributed scheduler. This led to the creation of the Gödel Scheduler, specifically tailored to integrate with the Kubernetes system and to handle the demanding and varied scheduling needs of ByteDance’s expansive and evolving business landscape.&lt;/p&gt;

&lt;p&gt;The Gödel Scheduler is a distributed system crafted to consolidate the scheduling of both online and offline workloads. This scheduler is an enhancement of the Kubernetes (K8s) Scheduler, designed to augment scalability and improve scheduling quality. It is adept at fulfilling the diverse functional and performance demands of ByteDance’s online and offline operations. Key features of the Gödel Scheduler include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Optimistic Concurrency&lt;/strong&gt;: It incorporates optimistic concurrency concepts, moving the most time-consuming unit-to-node matching (filtering and scoring) to the scheduler component. This allows for concurrent execution and improves scheduling throughput in large-scale clusters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-Layer Scheduling Abstraction (Unit and Pod) and Framework&lt;/strong&gt;: This provides more flexible batch scheduling capabilities, better supporting offline operations while also improving scheduling throughput and system scalability. The extended framework handles special scenarios more effectively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rich Functionality and High Performance&lt;/strong&gt;: It meets the demands of various operations including online, offline (batch and stream), and training tasks, achieving true unified scheduling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility with the Kubernetes Ecosystem&lt;/strong&gt;: It can serve as an alternative to the K8s Scheduler, but due to performance and architectural optimizations, its framework interface is not entirely the same as the K8s Scheduler. However, its extensibility remains unaffected, and it can implement scheduling plugins like Kubernetes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is the architecture diagram of Gödel Scheduler.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklw95ffa3hhelotkwss5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklw95ffa3hhelotkwss5.png" alt="architecture diagram" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As outlined, the Gödel Scheduler consists of three primary components: the Dispatcher, the Scheduler, and the Binder. Key to its architecture is the Scheduler component, which is typically deployed in multiple shards to facilitate optimistic concurrency scheduling. This multi-shard deployment enhances its efficiency and scalability. On the other hand, the Dispatcher and the Binder are each deployed as single instances, a configuration that suits their specific roles and responsibilities within the Gödel Scheduler system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dispatcher
&lt;/h3&gt;

&lt;p&gt;The Dispatcher plays a pivotal role in managing application queuing, distribution, and node partitioning. It is comprised of several key components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sort Policy Manager&lt;/strong&gt;: This module handles the queuing of applications. Currently, it implements FIFO and DRF/FairShare queuing strategies, the latter still pending production use. Future enhancements will introduce more sophisticated queuing strategies, including those based on priority values.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dispatching Policy Manager&lt;/strong&gt;: Its primary function is to allocate applications across various Scheduler instances. At present, the LoadBalance strategy is employed as the default. Future updates aim to make this feature more versatile and plugin-based.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node Shuffler&lt;/strong&gt;: This component is tasked with organizing cluster nodes relative to the number of Scheduler instances. It assigns each node to a specific node partition, with each Scheduler instance overseeing one partition. During the scheduling process, a Scheduler first considers nodes within its partition before exploring nodes in other partitions. This arrangement is dynamically adjusted in response to changes in node availability or Scheduler count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partition Rules&lt;/strong&gt;: Currently, the system strives for an even distribution of nodes among Scheduler instances. Plans are underway to expand these partition strategies, enhancing their configurability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler Maintainer&lt;/strong&gt;: This element is responsible for monitoring the status of Scheduler instances. It tracks aspects like health status, workload, and the node count within each partition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reconciler&lt;/strong&gt;: Operating periodically, the Reconciler oversees the status of various elements like Pods, Nodes, Schedulers, and SchedulingUnits. It addresses any errors, discrepancies, or deficiencies, ensuring system integrity and performance. &lt;/li&gt;
&lt;/ol&gt;
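&lt;p&gt;The Node Shuffler’s even partitioning can be pictured with a toy Go sketch. The function name and the round-robin strategy are illustrative assumptions, not Gödel’s actual code:&lt;/p&gt;

```go
package main

import "fmt"

// partitionNodes assigns nodes round-robin so each Scheduler shard owns a
// near-equal partition; a real implementation would also rebalance as nodes
// or shards come and go.
func partitionNodes(nodes []string, shards int) [][]string {
	parts := make([][]string, shards)
	for i, node := range nodes {
		s := i % shards
		parts[s] = append(parts[s], node)
	}
	return parts
}

func main() {
	nodes := []string{"n1", "n2", "n3", "n4", "n5"}
	for s, p := range partitionNodes(nodes, 2) {
		fmt.Println("shard", s, "owns", p)
	}
}
```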

&lt;h3&gt;
  
  
  Scheduler
&lt;/h3&gt;

&lt;p&gt;The Scheduler plays a critical role in the decision-making process for scheduling and preempting applications, although it does not execute these decisions itself (that task is handled by the Binder). It operates on a two-tier framework: the Unit Scheduling Framework and the Pod Scheduling Framework. The entire scheduling procedure is segmented into three principal phases: Node Organizing, Unit Scheduling, and Unit Preempting.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Node Organizing&lt;/strong&gt;: This phase involves filtering and sorting nodes to streamline the scheduling process and enhance its quality. It consists of two types of plugins:

&lt;ul&gt;
&lt;li&gt;Locating Plugins: These filter nodes based on specific application information.&lt;/li&gt;
&lt;li&gt;Grouping Plugins: These group nodes according to available resources or Job-level affinities.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unit Scheduling&lt;/strong&gt;: In this stage, nodes are matched and scored in alignment with application requests that have been filtered through the Node Organizing plugins. This process is analogous to the Kubernetes (K8s) Scheduler framework, encompassing:

&lt;ul&gt;
&lt;li&gt;Filtering Plugins: These filter nodes based on the requirements of the application requests.&lt;/li&gt;
&lt;li&gt;Scoring Plugins: These assign scores to nodes that have been filtered in the previous step.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unit Preempting&lt;/strong&gt;: If suitable nodes are not found during the Unit Scheduling phase, the Scheduler progresses to the preemption phase. Here, it tries to free up resources by preempting running application instances to make room for new ones. This phase includes:

&lt;ul&gt;
&lt;li&gt;Victims Searching: Identifying potential victim applications that can be preempted.&lt;/li&gt;
&lt;li&gt;Candidates Sorting: Sorting both nodes and potential victims to identify the most appropriate choices for preemption.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
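&lt;p&gt;To illustrate the filter-then-score flow, here is a hypothetical Go sketch of a pluggable framework in the same spirit. The interfaces and plugins below are our own simplifications and do not mirror Gödel’s real framework API:&lt;/p&gt;

```go
package main

import "fmt"

// FilterPlugin and ScorePlugin mimic the two Unit Scheduling plugin kinds.
type FilterPlugin interface {
	Filter(node string) bool
}

type ScorePlugin interface {
	Score(node string) int
}

// schedule applies the filtering step, then keeps the highest-scoring node.
func schedule(nodes []string, f FilterPlugin, s ScorePlugin) (string, bool) {
	best, bestScore, found := "", 0, false
	for _, n := range nodes {
		if !f.Filter(n) {
			continue // node rejected by the filtering step
		}
		sc := s.Score(n)
		if !found {
			best, bestScore, found = n, sc, true
		} else if bestScore != max(sc, bestScore) {
			// sc beats the current best score.
			best, bestScore = n, sc
		}
	}
	return best, found
}

// minFreeCPU passes nodes whose free CPU covers the request.
type minFreeCPU struct {
	need int
	free map[string]int
}

func (p minFreeCPU) Filter(node string) bool {
	return p.free[node] == max(p.free[node], p.need)
}

// mostFree scores nodes by their free CPU, preferring emptier nodes.
type mostFree struct {
	free map[string]int
}

func (p mostFree) Score(node string) int { return p.free[node] }

func main() {
	free := map[string]int{"n1": 2, "n2": 8, "n3": 4}
	node, ok := schedule([]string{"n1", "n2", "n3"},
		minFreeCPU{need: 4, free: free}, mostFree{free: free})
	fmt.Println(node, ok) // n2 true
}
```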

&lt;h3&gt;
  
  
  Binder
&lt;/h3&gt;

&lt;p&gt;The Binder plays a crucial role in the final stages of the scheduling process, focusing on conflict detection, preemptive operations, and executing the binding of applications to resources. It comprises three main components: ConflictResolver, PreemptionOperator, and UnitBinder.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ConflictResolver&lt;/strong&gt;: This component is tasked with detecting concurrent conflicts in the scheduling process. It operates in two modes:

&lt;ul&gt;
&lt;li&gt;Cross Node Conflict Resolver: Checks for conflicts that might occur across different nodes.&lt;/li&gt;
&lt;li&gt;Single Node Conflict Resolver: Identifies conflicts within a single node.&lt;/li&gt;
&lt;/ul&gt;
If any conflict is detected, the application is immediately rejected and rescheduled.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PreemptionOperator&lt;/strong&gt;: In scenarios where no conflict exists but preemption is necessary, this operator takes charge. It executes the preemption by deleting the victims (applications or processes that need to be terminated to free up resources) and then awaits the final scheduling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UnitBinder&lt;/strong&gt;: This part of the Binder is responsible for the preparatory work required before binding, such as dynamically creating storage volumes, and then carries out the actual binding operation, linking applications to the designated resources.&lt;/li&gt;
&lt;/ol&gt;
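&lt;p&gt;A hypothetical Go sketch of the admission flow these three components describe: resolve conflicts, preempt victims if needed, then bind. All names, and the assumption that each victim frees one request’s worth of CPU, are illustrative, not Gödel’s real types:&lt;/p&gt;

```go
package main

import "fmt"

// decision is a toy stand-in for a scheduling decision handed to the Binder.
type decision struct {
	Pod     string
	Node    string
	Victims []string
}

// admit preempts victims first, then checks for a concurrent conflict on the
// target node before binding; on conflict the pod is rejected for rescheduling.
func admit(freeCPU map[string]int, need int, d decision) string {
	// Preemption: victims release their resources on the target node
	// (assumed here to be one request's worth each).
	for range d.Victims {
		freeCPU[d.Node] += need
	}
	// Conflict detection: another shard may have taken the capacity already.
	if freeCPU[d.Node] != max(freeCPU[d.Node], need) {
		return "rejected" // conflict: reschedule the pod
	}
	freeCPU[d.Node] -= need
	return "bound"
}

func main() {
	free := map[string]int{"n1": 0}
	fmt.Println(admit(free, 4, decision{Pod: "p1", Node: "n1", Victims: []string{"v1"}}))
}
```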

&lt;p&gt;Notably, the current version of the Binder integrates a PodGroup controller. This controller is responsible for managing the state and lifecycle of PodGroups. However, in a future version we plan to remove this functionality from the Binder, transitioning it into an independent controller.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experience
&lt;/h2&gt;

&lt;p&gt;Over the past two years, the Gödel Scheduler has been a cornerstone within ByteDance, offering a wealth of scheduling features and semantics. It has efficiently and reliably supported the operations of ByteDance’s diverse and complex business workloads.&lt;/p&gt;

&lt;p&gt;Building on these architectural enhancements, ByteDance has implemented deep performance optimizations drawing from its experience with the Kubernetes native scheduler. Integrated with ByteDance’s internally refined Kubernetes system, the Gödel Scheduler now sustains an impressive throughput: 2000+ pods/s in a single shard and 5000+ pods/s across multiple shards. ByteDance’s ongoing efforts to expand single-cluster capacity have culminated in its largest production cluster exceeding 20,000 nodes and 1,000,000 pods.&lt;/p&gt;

&lt;p&gt;After years of thorough internal practice and refinement within ByteDance, Gödel Scheduler has reached a state of relative stability. In 2023, SoCC, a top-tier cloud computing conference, accepted our paper on Gödel Scheduler, highlighting ByteDance’s unified approach to large-scale resource management and scheduling. The R&amp;amp;D team was also invited to present the work at the conference. For those interested, the Gödel Scheduler paper is available at &lt;a href="https://dl.acm.org/doi/10.1145/3620678.3624663"&gt;https://dl.acm.org/doi/10.1145/3620678.3624663&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With a commitment to contributing to the open-source community, the ByteDance team decided to open-source the Gödel Scheduler, offering a new scheduling solution that enhances cloud-native experiences for both online and offline services through its outstanding performance and comprehensive scheduling capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Work
&lt;/h2&gt;

&lt;p&gt;Looking ahead, ByteDance is committed to the continual development of the Gödel Scheduler, focusing on enriching its features and enhancing its scalability. A significant area of attention will be optimizing the scheduling throughput in specific challenging scenarios, such as those involving high rates of deployment and frequent preemptions. Through innovative rescheduling strategies, ByteDance aims to tackle the intricate balance between maintaining scheduling performance and enhancing its quality. The overarching goal is to not only preserve the current scheduling throughput but also to substantially elevate the quality of scheduling.&lt;/p&gt;

&lt;p&gt;Moreover, ByteDance places a high priority on ecosystem development. Efforts will be made to ensure Gödel Scheduler’s compatibility with leading systems and frameworks used in various business applications. This initiative will include integration with prominent big data and machine learning frameworks, accompanied by practical usage examples and comprehensive documentation.&lt;/p&gt;

&lt;p&gt;To keep the community engaged and informed, a detailed roadmap for the Gödel Scheduler will be methodically laid out and made available on the Gödel Scheduler Repository. This will provide an opportunity for interested parties to track progress, contribute, and become active participants in the project.&lt;/p&gt;

&lt;p&gt;While the Gödel Scheduler has undergone numerous iterations within ByteDance, been rigorously tested in various scenarios, and demonstrated its effectiveness, ByteDance acknowledges that there is still considerable potential for advancement in terms of generality and standardization. ByteDance warmly invites and encourages members of the community to join in the development of the Gödel Scheduler, believing that collaborative efforts will lead to even greater improvements and innovations.&lt;/p&gt;

&lt;p&gt;Gödel Scheduler Project Repository: &lt;a href="https://github.com/kubewharf/godel-scheduler"&gt;https://github.com/kubewharf/godel-scheduler&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>cloud</category>
      <category>showdev</category>
      <category>go</category>
    </item>
    <item>
      <title>Katalyst: A QoS-based resource management system for workload colocation on Kubernetes</title>
      <dc:creator>XIAOXU CHANG </dc:creator>
      <pubDate>Mon, 01 Apr 2024 11:01:26 +0000</pubDate>
      <link>https://dev.to/bytedanceoss/katalyst-a-qos-based-resource-management-system-for-workload-colocation-on-kubernetes-5g2j</link>
      <guid>https://dev.to/bytedanceoss/katalyst-a-qos-based-resource-management-system-for-workload-colocation-on-kubernetes-5g2j</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This Blog originally published on &lt;a href="https://gokatalyst.io/blog/2023/12/06/katalyst-a-qos-based-resource-management-system-for-workload-colocation-on-kubernetes/"&gt;Katalyst’s blog&lt;/a&gt; by Pengcheng Tang&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The resource usage of web applications tends to fluctuate with changes in the number of visitors, displaying noticeable tidal characteristics. To ensure stability, service providers often allocate resources for their applications according to resource usage during peak periods. These resources can easily be underutilized during off-peak hours.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d2j9gczww5g8h5bqyc7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d2j9gczww5g8h5bqyc7.png" alt="resource-provision" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If these idle resources can be reclaimed and temporarily allocated to lower-priority services and returned promptly to online services when needed, the overall resource utilization rate can be significantly improved.&lt;/p&gt;
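&lt;p&gt;In its simplest form, the reclaimable amount can be estimated as the allocation minus the predicted peak usage minus a safety buffer. The Go sketch below is a toy illustration of that idea, not Katalyst’s actual estimation logic:&lt;/p&gt;

```go
package main

import "fmt"

// reclaimable estimates how much of an allocation can be lent to low-priority
// workloads while staying above predicted peak usage plus a safety buffer.
// The inputs (predicted peak, buffer) are illustrative assumptions.
func reclaimable(allocated, predictedPeak, buffer int) int {
	return max(allocated-predictedPeak-buffer, 0)
}

func main() {
	// A service holding 100 cores, predicted to peak at 60, keeping 10 spare.
	fmt.Println(reclaimable(100, 60, 10)) // 30
}
```

&lt;p&gt;The reclaimed amount shrinks to zero as predicted usage approaches the allocation, which is how resources return to the online service when it needs them.&lt;/p&gt;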

&lt;h2&gt;
  
  
  ByteDance colocation practices
&lt;/h2&gt;

&lt;p&gt;ByteDance operates at a massive scale with diverse business types, encompassing various categories such as microservices, advertising, machine learning, big data, and storage. Typically, different business types have distinct resource management requirements at the infrastructure level. The conventional approach involves segmenting resource pools based on business lines or service types to meet customized demands.&lt;/p&gt;

&lt;p&gt;However, this method of resource pool segmentation can lead to resource silos, preventing flexible resource sharing and hindering the overall efficiency of resource utilization and cost optimization. It also adds to the operational burden of managing clusters.&lt;/p&gt;

&lt;p&gt;Furthermore, considering that different types of businesses have complementary SLO requirements and resource usage patterns, the infrastructure team aims to leverage these characteristics fully. They do so through scheduling and control mechanisms to optimize resource efficiency, achieve the convergence and consolidation of resource pools, and assist business teams in attaining lower resource costs and greater elasticity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqndbwv05dmn3bc43pr0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqndbwv05dmn3bc43pr0.png" alt="Types of workloads" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To achieve unified resource management, ByteDance began building a unified infrastructure based on Kubernetes in 2016. At the current stage, ByteDance has essentially completed the containerization of all microservices, advertising, and a significant portion of machine learning and big data businesses. Throughout this process, the infrastructure team has continued to explore resource optimization methods under a unified resource pool and gradually developed a resource pool deployment approach that combines ‘elastic scaling’ and ‘colocation.’&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Elastic Scaling&lt;/strong&gt;: This enables machine-level and NUMA-level resource time-sharing, combining business and system metrics to guide horizontal and vertical scaling strategies for application instances. This ultimately allows offline services to purchase more idle resources at a lower cost, and online services to purchase more peak-time resources at a higher cost through resource market-oriented operations, leading to overall efficiency improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Colocation&lt;/strong&gt;: It offers the ability to oversell resources, making full use of ‘sold but underutilized resources’ in the cluster to deploy more low-priority tasks. Simultaneously, we enhance resource isolation mechanisms across multiple dimensions such as CPU, memory, disk, and network at the system level. Minute-level control mechanisms, combined with intelligent load prediction algorithms, are adopted to ensure service stability according to their SLOs.&lt;/p&gt;
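&lt;p&gt;At its core, overselling is arithmetic on ‘sold but underutilized’ capacity. The sketch below (our illustration, not Katalyst code) computes how much CPU on one node could be re-offered to low-priority tasks:&lt;/p&gt;

```python
def reclaimable_cpu(node_allocatable, online_requests, predicted_online_usage,
                    safety_buffer=0.1):
    """CPU cores requested by online pods but predicted to sit idle,
    minus a safety buffer (a fraction of node capacity) to absorb bursts."""
    idle_sold = max(0.0, online_requests - predicted_online_usage)
    return max(0.0, idle_sold - safety_buffer * node_allocatable)
```

&lt;p&gt;On a 32-core node where online pods request 24 cores but are predicted to use only 10, roughly 10.8 cores could back low-priority tasks. Prediction quality and the isolation mechanisms described above are what make this safe.&lt;/p&gt;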

&lt;p&gt;This solution combines Kubernetes and Yarn systems for joint control. It runs control components of both Kubernetes and Yarn on the same machine, and coordinates the allocatable resources between the two systems through a central coordination component. On top of this joint control system, we achieve real-time resource estimation based on service resource profiles, ensuring more flexible and dynamic resource allocation while meeting various service SLA requirements.&lt;/p&gt;
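&lt;p&gt;A toy version of that coordination step might look as follows; the field names and the buffer policy are illustrative assumptions, not the actual coordination component:&lt;/p&gt;

```python
def split_allocatable(total_cores, predicted_online_peak, buffer_cores=2.0):
    """Reserve the predicted online peak (plus a buffer) for the Kubernetes
    side of a node; hand whatever remains to Yarn for offline tasks."""
    k8s_share = min(total_cores, predicted_online_peak + buffer_cores)
    yarn_share = max(0.0, total_cores - k8s_share)
    return {"kubernetes": k8s_share, "yarn": yarn_share}
```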

&lt;p&gt;During the implementation of this colocation solution, the infrastructure team verified the feasibility of resource pooling, constructed the foundational capabilities for colocation, and improved the overall utilization of core clusters from 23% to 60%.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febxl6rwvfe1g4kb2oobb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febxl6rwvfe1g4kb2oobb.png" alt="Colocation Practices" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Katalyst: From Internal Validation to Open Source
&lt;/h2&gt;

&lt;p&gt;After extensive testing with businesses like Douyin and Toutiao, which have large-scale tidal traffic, ByteDance’s cloud-native colocation practices have matured. To help end users in the cloud-native community understand the principles behind large-scale colocation and improve their own resource efficiency, we refactored and enhanced the resource management system in a Kubernetes-native manner and built “Katalyst”, which has now been officially open-sourced. The name “Katalyst” is derived from the word “catalyst,” and the ‘K’ symbolizes its ability to provide enhanced automation of resource management for all workloads running in the Kubernetes ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is Katalyst
&lt;/h3&gt;

&lt;p&gt;Katalyst originated from ByteDance’s colocation practices and has been extended and supplemented in terms of resource management capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developed entirely within the context of hyperscale colocation practices, achieving true reuse of internal and external technology systems.&lt;/li&gt;
&lt;li&gt;Built on a plugin-based architecture, allowing users to customize modules such as scheduling and control strategies on top of the Katalyst framework.&lt;/li&gt;
&lt;li&gt;Provides one-click deployment templates and comprehensive operation manuals, reducing understanding and deployment costs for end users.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Resource Abstraction
&lt;/h3&gt;

&lt;p&gt;The native Kubernetes Quality of Service (QoS) system does not meet the requirements of large-scale production environments, prompting Katalyst to build a QoS system of its own. Katalyst defines four QoS classes: dedicated_cores, shared_cores, reclaimed_cores, and system_cores. Users assign a QoS class to each application according to its QoS requirements. In ByteDance’s practice, CPU is, in most scenarios, the dominant resource affecting application performance, and users tend to express QoS requirements in terms of CPU as well; so although a QoS requirement encompasses many kinds of resources (e.g., CPU, memory, disk I/O, network bandwidth), the classes are named after CPU. Each class is accompanied by various enhancement mechanisms (e.g., whether NUMA-node binding is required, or whether network affinity and bandwidth restrictions are necessary), enabling differentiated resource allocation and control strategies.&lt;/p&gt;
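&lt;p&gt;In code form, the four classes and their differentiated treatment can be sketched like this. The class names come from Katalyst; the per-class policy fields are our illustration, not the real annotation schema:&lt;/p&gt;

```python
from enum import Enum

class QoSClass(Enum):
    DEDICATED_CORES = "dedicated_cores"  # latency-critical, may bind NUMA nodes
    SHARED_CORES = "shared_cores"        # ordinary online services, shared pool
    RECLAIMED_CORES = "reclaimed_cores"  # best-effort jobs on oversold capacity
    SYSTEM_CORES = "system_cores"        # node-level system daemons

# Hypothetical enhancement knobs per class, for illustration only.
POLICY = {
    QoSClass.DEDICATED_CORES: {"numa_binding": True, "evictable": False},
    QoSClass.SHARED_CORES: {"numa_binding": False, "evictable": False},
    QoSClass.RECLAIMED_CORES: {"numa_binding": False, "evictable": True},
    QoSClass.SYSTEM_CORES: {"numa_binding": False, "evictable": False},
}
```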

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvtfio1fes2wo1k90isb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvtfio1fes2wo1k90isb.png" alt="QoS Classes" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Through this abstract resource model, Katalyst provides users with a unified resource entry point. Users can accurately express their specific needs by mapping business services to the appropriate QoS class and sales model based on business requirements. This ultimately allows users to obtain resources from a unified resource pool without needing to delve into the underlying details.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51molk23jbzzr9yxtzmp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51molk23jbzzr9yxtzmp.png" alt="Node View" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Overall Architecture
&lt;/h3&gt;

&lt;p&gt;In its early stages, the colocation architecture had several issues. Although the joint control of the Kubernetes and Yarn systems achieved colocation of online and offline businesses, the complexity of the combined system incurred higher maintenance costs. The architecture also introduced resource overhead from the agents running on each node: while the consumption on an individual node is not significant, the accumulated overhead in hyperscale clusters can be substantial. Moreover, with two control systems, an abnormality at any stage could lead to resource calculation errors. In Katalyst, we optimized and refactored the overall colocation architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F590i78slgt211o3p1ut2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F590i78slgt211o3p1ut2.jpg" alt="Overall Architecture" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the control layer, we consolidated the joint system built on Kubernetes and Yarn into a single Kubernetes-based system. Specifically, we retained the API entry points for both Kubernetes and Yarn at the access layer, while unifying metadata management and resource control inside Katalyst, which is Kubernetes-native.&lt;/p&gt;

&lt;p&gt;At the scheduling layer, Katalyst implemented a coordinated resource scheduling and control mechanism between “centralized scheduling” and “node resource management” based on unified metadata.&lt;/p&gt;

&lt;p&gt;On the node side: Katalyst extends Kubernetes with a module named QoS Resource Manager (QRM). This module enables plugin-based, node-level topology-affinity allocation and reports the topology to the control plane through custom CRDs to facilitate scheduling. At runtime, Katalyst continuously estimates resource allocations from system metrics, service-level indicators, and the QoS requirements of the pods. The allocation results are then dynamically pushed to the Container Runtime Interface (CRI) through the QRM reconcile loops. Both the resource estimation algorithm and the QRM implementation can be customized through plugins, making resource control strategies adaptable to different business scenarios.&lt;/p&gt;
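&lt;p&gt;The reconcile idea can be sketched in a few lines. This is a simplification under assumed pod and metric shapes, not the QRM implementation:&lt;/p&gt;

```python
def reconcile(pods, usage_metrics, apply_to_runtime):
    """One reconcile pass: re-estimate each pod's CPU allocation from live
    usage and push only the changed values down to the container runtime."""
    changes = {}
    for pod in pods:
        usage = usage_metrics[pod["name"]]
        if pod["qos"] == "reclaimed_cores":
            target = usage * 1.2  # best-effort: track usage with 20% headroom
        else:
            target = max(pod["request"], usage)  # never drop below the request
        if target != pod["allocated"]:
            changes[pod["name"]] = target
            apply_to_runtime(pod["name"], target)
    return changes
```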

&lt;p&gt;On the scheduling side: Katalyst extends the scheduler with richer capabilities through the scheduler framework. During scheduling, it takes into account how applications of different QoS classes should share and coordinate resources when running in the same cluster. The scheduler also combines real-time data and service profiles to perform dynamic rebalancing across the entire cluster, reducing cluster vacancy rates and enhancing business stability.&lt;/p&gt;
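&lt;p&gt;As a minimal illustration of QoS-aware placement (our toy scoring, not an actual scheduler plugin), reclaimed_cores pods can be steered toward nodes with the most reclaimable capacity, while online pods prefer the least-utilized nodes:&lt;/p&gt;

```python
def rank_nodes(nodes, pod_qos):
    """Return candidate nodes best-first for the given QoS class."""
    if pod_qos == "reclaimed_cores":
        def score(n): return n["reclaimable_cores"]  # chase idle capacity
    else:
        def score(n): return -n["utilization"]       # spread online load
    return sorted(nodes, key=score, reverse=True)
```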

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5mqo0fe70wmzmqui1yn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5mqo0fe70wmzmqui1yn.png" alt="QoS Resource Manager" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lastly, under a unified control system, we can fully leverage Kubernetes’ advantages in API design. By decoupling internal systems and generalizing control strategies through custom CRDs, we are able to iteratively improve the system through a plugin-based approach, achieving true convergence between internal and external systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;p&gt;Katalyst, as a resource management system, has colocation as one of its core business scenarios. In addition to abstracting the core concepts mentioned above, we have provided and planned various QoS capabilities for Katalyst:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-grained resource estimation strategies: Katalyst supports multiple resource estimation strategies including heuristics, unsupervised learning, and QoS-aware algorithms, improving resource utilization by accurately calculating and predicting the amount of resources that can be reclaimed from the nodes.&lt;/li&gt;
&lt;li&gt;Multi-dimensional resource isolation capabilities: Using technologies such as cgroup, RDT, iocost, tc, etc., Katalyst achieves effective isolation of various resources, including CPU, memory, disk, and network, in different colocation scenarios.&lt;/li&gt;
&lt;li&gt;Multi-level load eviction strategies: Katalyst supports multi-level eviction strategies based on various metrics, ensuring online business QoS while maximizing offline business QoS.&lt;/li&gt;
&lt;/ul&gt;
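&lt;p&gt;The ordering logic behind such multi-level eviction can be sketched as follows: lowest QoS class first, largest consumer first within a class. The pressure model here is an assumption for illustration:&lt;/p&gt;

```python
def pick_evictions(pods, pressure):
    """Evict lowest-QoS pods first (largest consumers within a class) until
    the projected pressure ratio falls back to 1.0 or below."""
    rank = {"reclaimed_cores": 0, "shared_cores": 1, "dedicated_cores": 2}
    candidates = sorted(pods, key=lambda p: (rank[p["qos"]], -p["usage_share"]))
    evicted = []
    for pod in candidates:
        if pressure <= 1.0:
            break
        pressure -= pod["usage_share"]
        evicted.append(pod["name"])
    return evicted
```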

&lt;p&gt;Besides colocation, Katalyst also provides enhanced resource management capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recommendation and autoscaling: Katalyst provides enhanced VPA/HPA capabilities and advanced recommendation algorithms. These help end users estimate pod resource requests/limits and replica counts more accurately, thereby improving deployment rates and resource utilization.&lt;/li&gt;
&lt;li&gt;Tidal (exclusive-mode) colocation: While colocating online and offline applications on the same node yields the largest resource-efficiency gains, it requires all the infrastructure intricacies (e.g., resource isolation and scheduling) to work smoothly, which complicates the overall system. Katalyst therefore also provides an exclusive-mode colocation in which resources are reclaimed at node granularity, so that a node runs either online or offline applications at any given time. This lets users improve resource efficiency at a lower operational cost.&lt;/li&gt;
&lt;li&gt;Node overcommitment: With node overcommitment, Katalyst allows the scheduler to place more pods on a node without end users’ awareness. Meanwhile, Katalyst adopts methods such as interference detection and mitigation and node resource prediction algorithms to guarantee the QoS requirements of higher-priority tasks.&lt;/li&gt;
&lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;
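&lt;p&gt;The effect of node overcommitment can be illustrated with a toy rule. The watermark value and the fallback behavior are our assumptions, not Katalyst's algorithm:&lt;/p&gt;

```python
def advertised_cpu(capacity_cores, overcommit_ratio, utilization,
                   high_watermark=0.65):
    """Advertise capacity * ratio to the scheduler, but fall back to the real
    capacity once measured utilization crosses the watermark, standing in for
    interference detection and mitigation."""
    if utilization >= high_watermark:
        return float(capacity_cores)
    return capacity_cores * overcommit_ratio
```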

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbg5mvdfedwbf8nbk32bz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbg5mvdfedwbf8nbk32bz.png" alt="Resource Efficiency Suite" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For detailed plans, please refer to the roadmap. We also recently gave a brief introduction to Katalyst at KubeCon China; see &lt;a href="https://www.bilibili.com/video/BV1bc411R7xQ/?spm_id_from=333.999.0.0&amp;amp;vd_source=c09f0713b2507369924e94f4fec6c133"&gt;our talk&lt;/a&gt; for more information.&lt;/p&gt;

&lt;p&gt;While colocation has undergone several iterations within ByteDance, a universal, standardized platform foundation must be refined across many scenarios. We look forward to your participation in the Katalyst community and to hearing about your scenarios and requirements for colocation, resource-efficiency improvement, and more.&lt;/p&gt;

&lt;p&gt;GitHub | &lt;a href="https://github.com/kubewharf/katalyst-core"&gt;https://github.com/kubewharf/katalyst-core&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>opensource</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
