<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: DASWU</title>
    <description>The latest articles on DEV Community by DASWU (@daswu).</description>
    <link>https://dev.to/daswu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F973459%2F8329f19d-864e-41c4-bb30-a5a0d23e63ce.jpeg</url>
      <title>DEV Community: DASWU</title>
      <link>https://dev.to/daswu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/daswu"/>
    <language>en</language>
    <item>
      <title>Monitoring JuiceFS with Better Stack</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 26 Jun 2026 07:46:14 +0000</pubDate>
      <link>https://dev.to/daswu/monitoring-juicefs-with-better-stack-18ce</link>
      <guid>https://dev.to/daswu/monitoring-juicefs-with-better-stack-18ce</guid>
      <description>&lt;p&gt;After deployment, &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; feels like a local drive, but underneath it's a sophisticated distributed system. This perfectly reflects one of its core design principles: distributed systems are complex, but from a user's perspective, they should be simple to use.&lt;/p&gt;

&lt;p&gt;Even so, that simplicity on the surface doesn't negate the need for deep visibility. For any critical storage system, gaining real-time visibility into its operations is crucial to prevent subtle performance degradations from escalating into significant incidents.&lt;/p&gt;

&lt;p&gt;Fortunately, JuiceFS exposes a suite of monitoring metrics, including throughput, IOPS, latency, data size, and many more, in the widely adopted Prometheus format, making it ready for modern monitoring stacks. Traditionally, you would probably pair Prometheus with Grafana to collect these metrics and visualize them. This is indeed a powerful combination. However, deploying, managing, and maintaining these systems yourself adds operational overhead of its own. Ironically, you then need to monitor the monitoring stack too, and trust me, you would rather not create yet another monitoring stack just to monitor your Prometheus and Grafana combo.&lt;/p&gt;

&lt;p&gt;That's where &lt;a href="https://betterstack.com/" rel="noopener noreferrer"&gt;Better Stack&lt;/a&gt; comes in. It is a fully managed SaaS observability platform that combines user-friendly dashboards, tracing, logging, error tracking, incident management, automatic alerting, and even &lt;strong&gt;AI-powered SRE&lt;/strong&gt;, all for a predictable, cost-effective price. With Better Stack, you get the power of the best-in-class tools out of the box without the operational overhead.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Frffw4cxuurkbuwep0szt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Frffw4cxuurkbuwep0szt.png" alt=" " width="799" height="569"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post, we'll guide you through setting up a comprehensive monitoring system for JuiceFS using Better Stack, from metric ingestion to intelligent alerting, so you can ensure your file system remains healthy and performant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preparing the JuiceFS file system
&lt;/h2&gt;

&lt;p&gt;Before diving into setting up Better Stack for monitoring, you'll need an existing JuiceFS file system that is actively publishing metrics. &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS Community Edition&lt;/a&gt; and &lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;JuiceFS Enterprise Edition&lt;/a&gt; (our cloud service is based on JuiceFS Enterprise Edition) both expose real-time status metrics in Prometheus format, but they do it in slightly different ways.&lt;/p&gt;

&lt;p&gt;For JuiceFS Community Edition, once the file system is mounted, the JuiceFS client exposes metrics at &lt;code&gt;http://localhost:9567/metrics&lt;/code&gt; by default on the host where it is running. You can change the listen address and port with the &lt;code&gt;--metrics&lt;/code&gt; option if needed.&lt;/p&gt;

&lt;p&gt;On the other hand, for JuiceFS Enterprise Edition &amp;amp; Cloud Service, metrics are exposed through the console via dedicated API endpoints. You'll need to replace &lt;code&gt;VOLUME_NAME&lt;/code&gt; with your file system name and &lt;code&gt;API_TOKEN&lt;/code&gt; with your API token. In this case, both Prometheus and JSON formats are available for metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Prometheus: &lt;code&gt;https://juicefs.com/api/vol/VOLUME_NAME/metrics?token=API_TOKEN&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;JSON: &lt;code&gt;https://juicefs.com/api/volume/VOLUME_NAME/status?token=API_TOKEN&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
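
&lt;p&gt;Before wiring an endpoint into a monitoring platform, it can help to confirm that it really returns Prometheus text. The snippet below is a minimal sketch, not an official client: &lt;code&gt;parse_prometheus_text&lt;/code&gt;, &lt;code&gt;fetch_metrics&lt;/code&gt;, and the sample payload are hypothetical names and data made up for illustration, and real JuiceFS metric names and labels may differ.&lt;/p&gt;

```python
from urllib.request import urlopen


def parse_prometheus_text(body: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. The series
    key keeps its label set, for example 'metric{label="x"}'.
    """
    samples: dict[str, float] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value [timestamp]". Label values may
        # contain spaces, so split after the closing brace when one is present.
        if "}" in line:
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        samples[series] = float(fields[0])
    return samples


def fetch_metrics(url: str, timeout: float = 10.0) -> dict[str, float]:
    """Download a metrics endpoint and parse it (needs network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Hypothetical sample payload; real metric names and labels may differ.
SAMPLE = """\
# HELP example_used_space Total used space in bytes.
# TYPE example_used_space gauge
example_used_space{vol_name="demo"} 4096
example_uptime 12.5
"""

for series, value in parse_prometheus_text(SAMPLE).items():
    print(series, value)
```

&lt;p&gt;To try it against a real endpoint, pass one of the URLs above (or &lt;code&gt;http://localhost:9567/metrics&lt;/code&gt; for Community Edition) to &lt;code&gt;fetch_metrics&lt;/code&gt;. An empty result usually means the file system is not mounted.&lt;/p&gt;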

&lt;p&gt;A quick but important note: &lt;strong&gt;metrics are only generated when the file system is mounted&lt;/strong&gt;. So before proceeding, ensure your JuiceFS file system is properly mounted and accessible. In this guide, we will use the JuiceFS Cloud Service, as it's the simplest way to get started. If you haven't set up JuiceFS yet, please refer to the &lt;a href="https://juicefs.com/docs/cloud/getting_started" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for detailed instructions. Once you have created your first file system, the metrics URLs mentioned above are available under its &lt;strong&gt;Monitor&lt;/strong&gt; tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fo71ci0b6hafbe2bcnc5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fo71ci0b6hafbe2bcnc5h.png" alt=" " width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up a metrics source in Better Stack
&lt;/h2&gt;

&lt;p&gt;With your JuiceFS file system up and running (don't forget to mount the file system to a host machine) and publishing metrics, the next step is to configure Better Stack to start ingesting that data.&lt;/p&gt;

&lt;p&gt;First, if you haven't already, register for a Better Stack account. The process is seamless. Using a work email is recommended, and the platform provides clear guidance to help you set up your account and organization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6zmxybq3ao4kgx5o4ekp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6zmxybq3ao4kgx5o4ekp.png" alt=" " width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you're logged in, follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;In the left-hand navigation panel, head to &lt;strong&gt;Telemetry&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Under the &lt;strong&gt;Sources&lt;/strong&gt; section, click &lt;strong&gt;Connect source&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Give your telemetry data source a descriptive name, such as "jfs-better-stack" or "juicefs-production", to easily identify it later.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fezbvtp6qflgzoaw29wk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fezbvtp6qflgzoaw29wk6.png" alt=" " width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, you'll configure how Better Stack should collect your metrics. In the collector settings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Under &lt;strong&gt;Metrics&lt;/strong&gt;, choose the &lt;strong&gt;Prometheus scrape&lt;/strong&gt; option and click &lt;strong&gt;Connect source&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the &lt;strong&gt;URLs to scrape&lt;/strong&gt; section, input the JuiceFS metrics endpoint as described above.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fgqs367692qv9qv9ntm1m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fgqs367692qv9qv9ntm1m.png" alt=" " width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that if you are not using the JuiceFS Cloud Service and your JuiceFS endpoint is behind a firewall, you'll need to allow traffic from Better Stack's scrape servers. The list of IP addresses to add to the allowlist is available in their documentation and from &lt;a href="https://telemetry.betterstack.com/prometheus-scrape-ips.txt" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
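
&lt;p&gt;If you manage the firewall rules yourself, a small script can check whether a given source address is covered by the allowlist. This is only a sketch: it assumes the published list contains one IP address or CIDR range per line, which you should verify against the actual file, and the addresses in &lt;code&gt;SAMPLE_LIST&lt;/code&gt; are documentation placeholders, not Better Stack's real scrape servers.&lt;/p&gt;

```python
import ipaddress


def parse_allowlist(text: str) -> list:
    """Turn a text list (assumed: one IP or CIDR per line) into network objects."""
    networks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # strict=False lets a bare host address become a single-host network.
        networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def is_allowed(addr: str, networks: list) -> bool:
    """Return True when addr falls inside any allowlisted network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


# Placeholder addresses from the ranges reserved for documentation.
SAMPLE_LIST = """\
203.0.113.7
198.51.100.0/24
"""

networks = parse_allowlist(SAMPLE_LIST)
print(is_allowed("198.51.100.42", networks))
```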

&lt;p&gt;After saving the configuration, Better Stack will begin scraping the endpoint. Your JuiceFS metrics should be received within a few seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a dashboard with AI SRE
&lt;/h2&gt;

&lt;p&gt;With your JuiceFS metrics flowing into Better Stack, it's time to visualize them. You could build a dashboard manually, but Better Stack provides a smarter and more efficient way to do it by using &lt;strong&gt;AI SRE&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is AI SRE?
&lt;/h3&gt;

&lt;p&gt;AI SRE (Site Reliability Engineering) is Better Stack's chat-based site reliability assistant. It's an autonomous AI agent that can read your telemetry data, analyze incidents, build dashboards, and even write code to fix errors. Instead of waiting for humans to manually set up charts and queries, AI SRE can generate comprehensive dashboards for you based on a prompt.&lt;/p&gt;

&lt;p&gt;Note that AI SRE is a paid feature. If you're on the free plan, you can still create dashboards manually using the drag-and-drop chart builder.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a JuiceFS monitoring dashboard with a single prompt
&lt;/h3&gt;

&lt;p&gt;Once your metrics source is ready, follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;From the left panel, head to &lt;strong&gt;Telemetry&lt;/strong&gt; and then &lt;strong&gt;Metrics&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click &lt;strong&gt;Create dashboard&lt;/strong&gt; and select the &lt;strong&gt;Create with AI&lt;/strong&gt; option.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the prompt field, give AI SRE a clear description of what you need. For example: "Create me a dashboard to track ALL JuiceFS metrics, such as latency, data size, etc."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Also make sure to select the metrics &lt;strong&gt;Source&lt;/strong&gt; you created earlier (for example, "jfs-better-stack") so that AI SRE has the proper context and data to work with.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fzuivi0hjlk7hskc8kke6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fzuivi0hjlk7hskc8kke6.png" alt=" " width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Give the platform a few minutes to create the dashboard. AI SRE will analyze your JuiceFS metrics and automatically generate a complete set of charts and panels for important performance indicators such as throughput, IOPS, latency, and storage utilization. On my first attempt, it worked like a charm, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F9l3she7awbnbozjdnhab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F9l3she7awbnbozjdnhab.png" alt=" " width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AI SRE is a powerful feature that does so much more than create dashboards. It can analyze incidents, perform root cause analysis, suggest fixes, and even open pull requests. We've only scratched the surface in this post. This is your first step toward a smarter, AI-assisted observability workflow. After building your dashboard, you can further customize it by adding panels, editing queries, or setting alerts directly from the graphs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post, we walked through how to build a complete observability system for JuiceFS with Better Stack. We started by setting up the JuiceFS file system and locating its Prometheus-formatted metrics, then created a metrics source in Better Stack to ingest the data. Finally, we used AI SRE to generate a full dashboard from a single prompt.&lt;/p&gt;

&lt;p&gt;We hope this guide helps you gain better visibility into your JuiceFS deployment. If you have any questions or run into issues, we'd love to hear from you. Join the JuiceFS community on &lt;a href="https://github.com/juicedata/juicefs" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or &lt;a href="http://go.juicefs.com/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;. And don't forget to check out Better Stack's &lt;a href="https://betterstack.com/docs/getting-started/welcome/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; and their &lt;a href="https://www.youtube.com/@betterstack" rel="noopener noreferrer"&gt;amazing YouTube channel&lt;/a&gt; for practical insights about distributed file storage, observability, AI, and more.&lt;/p&gt;

</description>
      <category>opensource</category>
    </item>
    <item>
      <title>JuiceFS 1.4: Faster Metadata Operations with Batch Unlink, Batch Clone, and Redis Client-Side Caching</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Thu, 18 Jun 2026 08:30:56 +0000</pubDate>
      <link>https://dev.to/daswu/juicefs-14-faster-metadata-operations-with-batch-unlink-batch-clone-and-redis-client-side-4ao1</link>
      <guid>https://dev.to/daswu/juicefs-14-faster-metadata-operations-with-batch-unlink-batch-clone-and-redis-client-side-4ao1</guid>
      <description>&lt;p&gt;In large-scale file access scenarios such as AI training and dataset management, metadata often becomes the first performance bottleneck as file counts and concurrency grow. Whether you're deleting millions of small files, cloning large datasets, or traversing directories under heavy concurrency, metadata performance directly impacts application efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://juicefs.com/docs/community/introduction" rel="noopener noreferrer"&gt;JuiceFS Community Edition&lt;/a&gt; 1.4 introduces three major metadata optimizations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Batch unlink&lt;/strong&gt; for large-scale file deletion
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch clone&lt;/strong&gt; for metadata cloning
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis client-side caching&lt;/strong&gt; for hot metadata reads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These improvements reduce transaction commits, network round trips, and redundant metadata lookups. In tests on a flat directory containing 100,000 files, batch unlink improved performance by up to &lt;strong&gt;93×&lt;/strong&gt;, while batch clone achieved up to &lt;strong&gt;24×&lt;/strong&gt; speedup.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explain the motivation, design, and performance benefits behind these optimizations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deletion: From one‑by‑one to batched transactions
&lt;/h2&gt;

&lt;p&gt;Under &lt;a href="https://juicefs.com/docs/community/architecture" rel="noopener noreferrer"&gt;JuiceFS' metadata-data separation architecture&lt;/a&gt;, deleting a file involves much more than removing a directory entry. The system must also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update inode reference counts
&lt;/li&gt;
&lt;li&gt;Reclaim inode and space resources
&lt;/li&gt;
&lt;li&gt;Process trash entries
&lt;/li&gt;
&lt;li&gt;Update quota statistics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These operations must typically be completed within the same transaction.&lt;/p&gt;

&lt;p&gt;When a directory contains hundreds of thousands or even millions of files, the traditional file-by-file deletion approach used by &lt;code&gt;rm -rf&lt;/code&gt; quickly becomes a bottleneck. Each &lt;code&gt;unlink&lt;/code&gt; request goes through the &lt;a href="https://www.kernel.org/doc/html/next/filesystems/fuse.html" rel="noopener noreferrer"&gt;FUSE protocol&lt;/a&gt;, switches between kernel and user space, and triggers a separate metadata transaction.&lt;/p&gt;

&lt;p&gt;As the number of files grows, the overhead from system calls, context switches, network round trips, and transaction commits accumulates rapidly.&lt;/p&gt;

&lt;p&gt;To mitigate this issue, JuiceFS previously introduced the &lt;code&gt;juicefs rmr&lt;/code&gt; command. Unlike &lt;code&gt;rm -rf&lt;/code&gt;, &lt;code&gt;rmr&lt;/code&gt; bypasses the FUSE layer and sends deletion requests directly to the client. It also supports multi-threaded deletion (50 threads by default), significantly improving throughput.&lt;/p&gt;

&lt;p&gt;However, each file deletion still requires its own metadata transaction. Deleting 100,000 files still means executing 100,000 transactions.&lt;/p&gt;

&lt;p&gt;Batch unlink takes this optimization one step further by merging many independent deletion operations within the same directory into a single batch transaction, which cuts both the number of transaction commits and the network round trips.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core design
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The key is to turn many small transactions into fewer large ones. JuiceFS adds a batch unlink interface at the metadata engine layer. It allows the client to delete multiple non‑directory files under the same directory in one call.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When recursively clearing a directory, JuiceFS reduces deletion overhead in two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different subdirectories are handled concurrently with multi‑threaded deletion.
&lt;/li&gt;
&lt;li&gt;Inside each directory, normal files and symlinks are grouped into batches and sent to &lt;code&gt;BatchUnlink&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This merges many unlink operations into fewer batch transactions at the metadata level.&lt;br&gt;&lt;br&gt;
It's important to note that &lt;code&gt;BatchUnlink&lt;/code&gt; does not directly delete directories. Directory removal still follows the standard recursive workflow: empty the subdirectory first, and then delete the subdirectory itself. Therefore, &lt;code&gt;BatchUnlink&lt;/code&gt; only applies to regular files and symbolic links within the same directory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This restriction preserves correct recursive deletion semantics while avoiding consistency risks to the directory tree structure.&lt;/strong&gt;&lt;/p&gt;
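
&lt;p&gt;The split between directories and everything else can be sketched in a few lines. This is an illustration of the idea rather than JuiceFS code: &lt;code&gt;Entry&lt;/code&gt; and &lt;code&gt;plan_directory_removal&lt;/code&gt; are hypothetical names, and the batch size of 250 is borrowed from the cap this article mentions for Redis and assumed here for simplicity.&lt;/p&gt;

```python
from dataclasses import dataclass

BATCH_SIZE = 250  # assumption: reuse the Redis cap cited in this article


@dataclass(frozen=True)
class Entry:
    name: str
    is_dir: bool


def plan_directory_removal(entries: list, batch_size: int = BATCH_SIZE):
    """Split one directory's entries into two work lists.

    Returns (subdirs, batches): subdirectories still go through the normal
    recursive path, while files and symlinks are grouped into batches that
    could each be sent to a batch unlink call.
    """
    subdirs = [e.name for e in entries if e.is_dir]
    files = [e.name for e in entries if not e.is_dir]
    batches = [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
    return subdirs, batches


entries = [Entry(f"file-{i}", False) for i in range(600)]
entries += [Entry("train", True), Entry("val", True)]
subdirs, batches = plan_directory_removal(entries)
print(subdirs, [len(b) for b in batches])
```

&lt;p&gt;With this batch size, a flat directory of 100,000 files turns into 400 batch calls instead of 100,000 individual unlink transactions.&lt;/p&gt;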

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fn363fmtctinhgqmybcpp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fn363fmtctinhgqmybcpp.png" alt=" " width="800" height="613"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Implementation across metadata engines
&lt;/h3&gt;

&lt;p&gt;JuiceFS uses different batching strategies depending on the &lt;a href="https://juicefs.com/docs/community/databases_for_metadata/" rel="noopener noreferrer"&gt;metadata backend&lt;/a&gt; to minimize transaction commits and network round trips.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL backends (MySQL, PostgreSQL, etc.):&lt;/strong&gt; Previously, each file deletion required its own sequence of &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt;, and &lt;code&gt;UPDATE&lt;/code&gt; statements. With &lt;code&gt;BatchUnlink&lt;/code&gt;, the system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fetches all edge records for the target entries in a single batch query.
&lt;/li&gt;
&lt;li&gt;Retrieves the relevant inode attributes in a single locked batch query.
&lt;/li&gt;
&lt;li&gt;Executes edge deletions, inode state updates (decrementing nlink or marking for cleanup), and delfile entry insertions — all within one transaction.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of executing one transaction per file, the entire batch can now be completed in a single transaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redis backend:&lt;/strong&gt; &lt;strong&gt;The optimization uses Redis pipelines and transactions.&lt;/strong&gt; Where individual deletions previously required separate command round trips, &lt;code&gt;BatchUnlink&lt;/code&gt; collects all &lt;code&gt;HDEL&lt;/code&gt; (dentry removal), &lt;code&gt;ZADD&lt;/code&gt; (enqueue for cleanup), &lt;code&gt;SET&lt;/code&gt; (inode attribute update), and &lt;code&gt;INCRBY&lt;/code&gt; (counter update) commands for multiple files into a single pipeline, executed atomically within one &lt;code&gt;MULTI&lt;/code&gt;/&lt;code&gt;EXEC&lt;/code&gt; transaction. To avoid blocking Redis' single-threaded event loop for too long, batch size is capped at 250 entries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TiKV backend:&lt;/strong&gt; &lt;code&gt;BatchUnlink&lt;/code&gt; consolidates multiple deletions into a single transaction, using TiKV's batch write capability to reduce network round trips and transaction overhead. &lt;strong&gt;For distributed key-value backends, this kind of batching allows the backend's concurrent write capacity to be more fully utilized.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The figure below shows benchmark results on a flat directory of 100,000 files using &lt;code&gt;juicefs rmr --threads 16&lt;/code&gt;. &lt;code&gt;BatchUnlink&lt;/code&gt; delivers meaningful improvements across all metadata backends, with TiKV and Redis showing the largest gains.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fqh8iyz1hsdtugcauahpt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fqh8iyz1hsdtugcauahpt.png" alt=" " width="800" height="572"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Clone: From one‑by‑one copy to batched references
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://juicefs.com/docs/community/guide/clone/" rel="noopener noreferrer"&gt;&lt;code&gt;juicefs clone&lt;/code&gt;&lt;/a&gt; creates fast copies of files or directories for training dataset version management, experiment snapshots, and large-scale directory duplication. Its efficiency comes from the fact that cloning doesn't immediately copy the underlying data blocks. Instead, it creates new file records at the metadata layer and reuses the source file's existing block references. New data blocks are only allocated when the clone is actually written to. This avoids the time and storage overhead of a full copy.&lt;/p&gt;

&lt;p&gt;For large directory clones, the same problem as deletion arises: processing files one by one generates a large number of short transactions and network round trips. &lt;strong&gt;The core idea behind batch clone is to merge the clone operations for multiple files in the same directory into a single batch transaction.&lt;/strong&gt; When recursively cloning a directory, the system reads directory entries in batches as a stream. For each batch, all non-directory entries are collected and cloned together in one operation.&lt;/p&gt;

&lt;p&gt;One key implementation detail is &lt;strong&gt;inode pre-allocation&lt;/strong&gt;: before entering the transaction, the system uses &lt;code&gt;nextInode&lt;/code&gt; to pre-allocate target inodes for all entries to be cloned. This avoids lock contention from repeatedly requesting inodes inside the transaction. Once inside the transaction, the system batch-queries all source file attributes (with row locks), builds all the insertion data for target nodes, edges, chunks, symlinks, and xattrs, and then inserts everything in a single batch.&lt;/p&gt;
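
&lt;p&gt;To make the ordering concrete, here is a small in-memory sketch of the same flow: allocate inodes first, then build every new record, then write them together. &lt;code&gt;MetaStore&lt;/code&gt; and &lt;code&gt;batch_clone&lt;/code&gt; are hypothetical stand-ins written for this post, not the actual JuiceFS types, and symlinks and xattrs are left out for brevity.&lt;/p&gt;

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class MetaStore:
    """Tiny in-memory stand-in for a metadata engine (hypothetical)."""
    nodes: dict = field(default_factory=dict)   # inode -> attributes
    edges: dict = field(default_factory=dict)   # (parent, name) -> inode
    chunks: dict = field(default_factory=dict)  # inode -> block references
    _counter: itertools.count = field(
        default_factory=lambda: itertools.count(1000))

    def next_inode(self) -> int:
        return next(self._counter)


def batch_clone(store: MetaStore, src_parent: int, dst_parent: int,
                names: list) -> list:
    """Clone several files of one directory in one batch."""
    # Pre-allocate target inodes before the "transaction" starts, so the
    # allocator is not hit repeatedly while the transaction is open.
    new_inodes = [store.next_inode() for _ in names]

    # Inside the "transaction": read the sources and build all new records.
    new_nodes, new_edges, new_chunks = {}, {}, {}
    for name, dst in zip(names, new_inodes):
        src = store.edges[(src_parent, name)]
        new_nodes[dst] = dict(store.nodes[src])
        new_edges[(dst_parent, name)] = dst
        # Reuse the source block references; no data is copied.
        new_chunks[dst] = list(store.chunks.get(src, []))

    # Insert everything in one batch.
    store.nodes.update(new_nodes)
    store.edges.update(new_edges)
    store.chunks.update(new_chunks)
    return new_inodes


store = MetaStore()
for i, name in enumerate(["a", "b", "c"]):
    store.nodes[i + 1] = {"size": 10 * (i + 1)}
    store.edges[(10, name)] = i + 1
    store.chunks[i + 1] = [f"block-{i}"]
print(batch_clone(store, 10, 20, ["a", "b", "c"]))
```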

&lt;p&gt;Batch clone uses each backend's native batch write capabilities in a similar way to batch unlink. The per-backend implementation details won't be repeated here.&lt;/p&gt;

&lt;p&gt;The performance gains vary across backends depending on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transaction models
&lt;/li&gt;
&lt;li&gt;Network communication overhead
&lt;/li&gt;
&lt;li&gt;Batch insertion efficiency for metadata records such as nodes, edges, and chunk references&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results on a flat directory of 100,000 files are shown below. MySQL sees the largest improvement at approximately 24×, followed by Redis at approximately 5× and TiKV at approximately 2×.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F3ivuei7r7p7f27ej5trt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F3ivuei7r7p7f27ej5trt.png" alt=" " width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Redis client-side caching: Keeping hot metadata local
&lt;/h2&gt;

&lt;p&gt;In high-concurrency metadata workloads such as AI training dataset access and large-scale container startup, network round trips between JuiceFS clients and Redis often become a major performance bottleneck.&lt;/p&gt;

&lt;p&gt;Consider the following operation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;open&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"/mnt/jfs/dataset/images/cat.jpg"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before the file can be opened, the Linux Virtual File System (VFS) must resolve every component in the path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Look up &lt;code&gt;dataset&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;Look up &lt;code&gt;images&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;Look up &lt;code&gt;cat.jpg&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fwscnpxkx0h3mq7lbuhfb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fwscnpxkx0h3mq7lbuhfb.png" alt=" " width="799" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the &lt;code&gt;images&lt;/code&gt; directory contains hundreds of thousands of files and training jobs perform random access across the dataset, each lookup requires a &lt;code&gt;GET&lt;/code&gt; request to Redis.&lt;br&gt;&lt;br&gt;
Under heavy concurrency, this results in large numbers of network round trips and increased Redis CPU utilization. &lt;strong&gt;Even though a single Redis query takes only a few dozen microseconds, network latency pushes each lookup to hundreds of microseconds or even milliseconds. When thousands of training processes are accessing files simultaneously, this overhead becomes significant.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works: Redis 6.0 client-side caching
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://redis.io/docs/latest/develop/reference/client-side-caching/" rel="noopener noreferrer"&gt;Redis 6.0 introduced &lt;strong&gt;client-side caching&lt;/strong&gt;&lt;/a&gt;, which allows clients to cache hot keys locally and receive invalidation notifications whenever those keys are modified.&lt;/p&gt;

&lt;p&gt;Based on this capability, JuiceFS caches two categories of metadata in client memory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inode attribute cache.&lt;/strong&gt; Keyed by inode number, this stores the complete attribute data for a file, such as type, size, permissions, and timestamps. The caching is implemented transparently through hook mechanisms in the Redis driver layer. On query, it first checks the local cache; on hit, it returns immediately without any network request. On modification, it automatically invalidates the corresponding cache. Application logic requires no awareness of the cache.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Directory entry cache.&lt;/strong&gt; Keyed by "parent inode + path separator + filename," this caches the results of directory lookups. Unlike the inode attribute cache, the lookup logic for entry cache is embedded directly in the directory lookup path rather than being intercepted transparently at the driver layer. When entries for a directory are invalidated, all related cache entries under that directory are cleared using prefix matching. This allows path resolution and repeated access to hot entries in the same directory to be served from local memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Introducing client-side caching creates a consistency challenge in multi-mount scenarios.&lt;/strong&gt; When multiple clients share the same JuiceFS file system, an operation on one client — creating, deleting, renaming, or updating attributes of a file or directory — can invalidate cached inode attributes or directory entries on other clients. Without an effective invalidation mechanism, subsequent reads could hit stale metadata, causing the directory entries or file attributes seen by one client to diverge from the actual state in the backend.&lt;/p&gt;

&lt;p&gt;To address this, JuiceFS introduces a &lt;a href="https://redis.io/docs/latest/commands/client-tracking/" rel="noopener noreferrer"&gt;&lt;strong&gt;Tracking and Broadcast Invalidation&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;(BCAST)&lt;/strong&gt; model on top of Redis' client-side caching mechanism. After connecting to Redis, each client declares the metadata key prefixes it wants to track. When those keys are modified, Redis sends invalidation notifications to the relevant clients. On receiving a notification, the client clears the corresponding inode attribute cache or entry cache entries, so that subsequent accesses fetch fresh data from the metadata engine.&lt;/p&gt;

&lt;p&gt;In addition, at client initialization, JuiceFS warms up metadata for the root directory of the mount point. Since these files are typically the most frequently accessed, benchmarks show this warm-up significantly improves overall access performance.&lt;/p&gt;

&lt;p&gt;Through this mechanism, hot metadata can be reused locally. When the metadata changes, the related caches are evicted in time, reducing the risk of stale metadata.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use it
&lt;/h3&gt;

&lt;p&gt;Redis client‑side caching works best in read‑heavy, write‑light scenarios with repeated access to hot metadata. AI training dataset loading is a good example: the dataset is usually read‑only during training, and tasks repeatedly access the same directories and files, so inode attribute cache and entry cache hit often, reducing redundant lookups and remote metadata queries.&lt;/p&gt;

&lt;p&gt;The benefit is even more obvious when there is higher network latency between the client and the Redis metadata engine, such as in cross-availability-zone deployments.&lt;/p&gt;

&lt;p&gt;Redis 6.0 or later is required to use this feature. The default cache expiration time is 1 minute, which provides a safety net in case of network interruptions or connection anomalies where invalidation notifications may not arrive, preventing stale entries from persisting indefinitely. For workloads with stricter consistency requirements, the expiration time can be shortened or client-side caching can be disabled entirely to reduce the risk of reading stale metadata.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;These three optimizations each target a different path through the metadata layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Batch unlink&lt;/strong&gt; merges multiple independent unlink operations within the same directory into a single batch transaction.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch clone&lt;/strong&gt; merges multiple independent clone operations within the same directory into a single batch transaction.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis client-side caching&lt;/strong&gt; keeps hot metadata in client memory, bringing read latency from network-level down to memory-level, with broadcast invalidation to maintain consistency across multiple clients.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;BatchUnlink&lt;/code&gt; and &lt;code&gt;BatchClone&lt;/code&gt; are internal interfaces. Users do not call them directly. Just use the right commands: &lt;code&gt;juicefs rmr&lt;/code&gt; for deleting large directories, &lt;code&gt;juicefs clone&lt;/code&gt; for copying directories. The optimization will be applied automatically.&lt;/p&gt;

&lt;p&gt;One thing worth noting: both batch operations work by merging regular files within the same directory into a single batch transaction. Subdirectories are handled recursively by concurrent goroutines. The larger the directory, the greater the benefit.&lt;/p&gt;
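
&lt;p&gt;The grouping idea can be sketched as follows. This is an illustration only: the function &lt;code&gt;plan_batches&lt;/code&gt; and the &lt;code&gt;batch_size&lt;/code&gt; value are hypothetical, and the actual batch size and transaction logic inside JuiceFS are not described in this article.&lt;/p&gt;

```python
from collections import defaultdict
import posixpath


def plan_batches(file_paths, batch_size=3):
    """Group regular files by parent directory, then split each group into batches."""
    by_dir = defaultdict(list)
    for path in file_paths:
        by_dir[posixpath.dirname(path)].append(posixpath.basename(path))

    batches = []
    for directory in sorted(by_dir):
        names = by_dir[directory]
        # Each slice stands for one batch transaction on the metadata engine.
        for start in range(0, len(names), batch_size):
            batches.append((directory, names[start:start + batch_size]))
    return batches


paths = ["/data/a", "/data/b", "/data/c", "/data/d", "/data/sub/x"]
batches = plan_batches(paths)
for directory, names in batches:
    print(directory, names)
```

&lt;p&gt;Five per-file operations collapse into three batches here. With thousands of files in one directory, the reduction in round trips to the metadata engine grows accordingly.&lt;/p&gt;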


&lt;p&gt;All optimizations above are available in JuiceFS Community Edition 1.4. Upgrade the client to get the performance gains.  &lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="http://go.juicefs.com/discord" rel="noopener noreferrer"&gt;community on Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>How Gongjiyun Keeps Model Distribution Fast Enough for Cross-Cloud Elastic Inference</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 12 Jun 2026 03:25:32 +0000</pubDate>
      <link>https://dev.to/daswu/how-gongjiyun-keeps-model-distribution-fast-enough-for-cross-cloud-elastic-inference-2g8l</link>
      <guid>https://dev.to/daswu/how-gongjiyun-keeps-model-distribution-fast-enough-for-cross-cloud-elastic-inference-2g8l</guid>
      <description>&lt;p&gt;Founded in 2023 at Tsinghua University, &lt;a href="https://www.techinasia.com/companies/gongjiyun" rel="noopener noreferrer"&gt;Gongjiyun&lt;/a&gt; provides compute platforms and Model as a Service (MaaS) for artificial intelligence generated content (AIGC) enterprises and research institutions. We aim to alleviate the mismatch between elastic compute demand and supply. By aggregating idle IDC resources and edge resources, the platform offers containerized services, delivering rapidly schedulable compute for volatile workloads such as AI inference, video rendering, data processing, and data synthesis.&lt;/p&gt;

&lt;p&gt;In cross-cloud elastic inference scenarios, compute tasks can be scheduled to different regions, cloud environments, and clusters, but model files and application data are large and cannot be migrated as quickly as compute resources. In online inference especially, the model repository is read‑heavy and frequently accessed, so storage access performance directly affects service startup, elastic scaling, and request latency.&lt;/p&gt;

&lt;p&gt;To address this, we built an &lt;strong&gt;object storage acceleration&lt;/strong&gt; solution on top of &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt;, integrating users’ existing object storage into elastic inference clusters. Through a unified namespace, metadata import, FUSE mount, distributed cache, and data warm-up, it improves access efficiency for model repositories across clouds and clusters. In a case study with a leading text‑to‑image model community, the solution supports a tens‑of‑TB model repository, dynamic loading of checkpoints and low-rank adaptations (LoRAs), and elastic scaling of hundreds of GPUs at peak, while keeping additional latency within the customer’s acceptance range.&lt;/p&gt;

&lt;p&gt;In this post, we'll walk through why storage, not compute, is the real bottleneck in cross-cloud elastic inference, how we evaluated and chose JuiceFS, and the step-by-step optimizations that brought the additional latency on the elastic cluster from about 10 seconds down to under 2 seconds in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Elastic demand is widespread, but supply is hard to match
&lt;/h2&gt;

&lt;p&gt;As AI applications grow rapidly, compute demand continues to increase, but resource usage patterns differ across scenarios. &lt;strong&gt;Compared to training, which has stable resource needs, &lt;a href="https://www.ibm.com/think/topics/ai-inference" rel="noopener noreferrer"&gt;AI inference&lt;/a&gt;, data processing, and data synthesis are often more volatile&lt;/strong&gt;: office applications may see higher traffic during the day, entertainment apps during evenings or weekends, and project‑based data processing may consume large amounts of compute in short bursts then idle. For small teams or exploratory applications, elastic compute also helps them better evaluate the relationship between per‑request cost and application value.&lt;/p&gt;

&lt;p&gt;On the supply side, compute infrastructure is capital‑intensive. Resource providers are not incapable of offering elastic services, but they prefer long‑term dedicated leases to recover costs and reduce risk. As a result, low price, stability, and elasticity are difficult to achieve together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dedicated leases are low‑cost and stable but lack elasticity.&lt;/li&gt;
&lt;li&gt;Spot resources are cheap and elastic but uncertain.&lt;/li&gt;
&lt;li&gt;On‑demand resources are elastic and stable but expensive.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In China, this contradiction is further reflected by a market dominated by dedicated leases, with elastic supply accounting for a small share.&lt;/p&gt;

&lt;p&gt;We aim to resolve this mismatch between elastic demand and supply. &lt;strong&gt;By aggregating idle IDC and edge resources, the platform offers containerized services, providing rapidly schedulable compute for AI inference, video rendering, data processing, and data synthesis.&lt;/strong&gt; At lower resource costs, we help users quickly spin up tasks during peaks, schedule them across clusters, and handle elastic demand, while enabling resource providers to improve utilization and monetize idle capacity beyond dedicated leases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compute can be scheduled: How does storage keep up?
&lt;/h2&gt;

&lt;p&gt;As elastic compute platforms have matured, scheduling compute resources has become relatively straightforward. Container images can be synchronized across clusters via registries and distribution networks, tasks can be launched in different resource pools by schedulers, and traffic can be distributed via unified ingress and traffic management.&lt;/p&gt;

&lt;p&gt;But model and data files are typically large, making cross‑cloud, cross‑cluster migration costly and slow, unable to match the sub‑second startup and release of compute. Therefore, &lt;strong&gt;in cross‑cloud elastic inference architectures, the real limitation on system elasticity is often not compute scheduling, but the efficiency of data and model distribution&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Different application scenarios have different storage requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.ibm.com/think/topics/model-training" rel="noopener noreferrer"&gt;&lt;strong&gt;Model training&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;, development, and debugging:&lt;/strong&gt; These involve complex read‑write needs, including code repositories, model files, experiment results, and intermediate state. They also require high environment stability; users cannot tolerate state loss from frequent host switching. Thus, the platform typically provides long‑term stable compute resources and runtime environments, and storage needs can be met by existing stable storage systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data processing:&lt;/strong&gt; This can be split further. If a single processing job has high application value and can cover cross‑cloud network transfer costs, you can build a pipeline that continuously pulls data from S3 or other object storage, processes it in the compute cluster, and streams the results back. The system does not need large local storage. If the data scale is larger or per‑job value is low, local storage acts as a one‑time cache. Data flows through and does not need to be persisted.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What is truly more challenging is the online inference scenario&lt;/strong&gt;. Online inference services cannot tolerate downtime. However, the resources used by an elastic computing platform may come from idle resource pools, and these resources can be preempted. Once resources in a certain data center or cluster become unavailable, the platform must be able to migrate tasks to other providers or other clusters in time. This means that not only compute tasks but also model files and the related storage access capabilities must be migrated at the same time.&lt;/p&gt;

&lt;p&gt;Online inference has higher requirements for service continuity and cross-cluster migration capabilities, but its storage access pattern is also clearer. Compared to training, development, and debugging scenarios, inference workloads are typically read-heavy. The core needs focus on efficient model loading, reading model weights, and accessing the model repository. For large models and online applications, model loading speed directly affects service startup time, elastic scaling efficiency, and request response stability. Therefore, inference scenarios are not well served by simply adopting traditional read-write hybrid storage architectures. Instead, they are better suited to specialized optimizations around model distribution, read-only access, and cache acceleration.&lt;/p&gt;

&lt;p&gt;In addition, an elastic computing platform usually does not host a user's complete application system. The user's primary cloud account, application database, model management system, and even some fixed computing resources often already exist in other clouds or on premises. For the platform to integrate with the user's application, it must be compatible with the user's existing model repository and model management processes. It cannot require the user to fully migrate the entire system.&lt;/p&gt;

&lt;p&gt;Therefore, &lt;strong&gt;to support cross-cloud elastic inference, we need more than just compute scheduling capabilities. We need a cross-cloud high-performance storage and model distribution solution tailored for model inference scenarios&lt;/strong&gt;. This solution must support hosting a large model repository with high-performance reads, adapt to the user's existing model management system, and provide stable data access when resources are migrated across clouds and clusters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why JuiceFS: Unified cross-cloud access, strongly consistent metadata, and high-performance cache
&lt;/h2&gt;

&lt;p&gt;Facing cross-cloud elastic inference scenarios, the storage system needs to meet several conditions at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It must provide a unified access point across different clouds and clusters. It must support shared read-write access and unified metadata management.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It must be compatible with the user's existing &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; and model repository to avoid data migration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It needs low operational complexity and good read performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When evaluating storage options, we considered Ceph:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ceph is mature. It’s suitable for building unified storage within a single data center or a stable resource domain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;However, in cross-cloud elastic inference scenarios, Ceph requires high network stability and strong operational skills, and the overall integration cost is higher. So we did not choose it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also evaluated Alluxio. However, in a &lt;a href="https://en.wikipedia.org/wiki/Multicloud" rel="noopener noreferrer"&gt;multi-cloud&lt;/a&gt; environment, multiple clusters need to access the same underlying object storage data concurrently. The workload is not purely read-only; there are also occasional writes. This scenario requires strong data consistency. Therefore, Alluxio was not chosen for production.&lt;/p&gt;

&lt;p&gt;We finally chose JuiceFS mainly because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It uses object storage as the underlying data store.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It provides a unified namespace and consistent file system view through an independent metadata service. This allows multiple clusters to access the same model data as a file system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This architecture is suitable for cross-cloud and cross-cluster model distribution and shared reading.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It’s also compatible with the user's existing object storage and model repository, reducing data migration and application integration costs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The decision to further adopt &lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;JuiceFS Enterprise Edition&lt;/a&gt; was mainly due to its &lt;strong&gt;distributed caching capabilities and managed metadata service&lt;/strong&gt;. In this scenario, the value of JuiceFS is not just providing a file system interface. It combines object storage, unified namespace, metadata management, and cache acceleration into a storage access layer that is better suited for cross-cloud elastic inference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4bd67wtjwxd2zdb124p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4bd67wtjwxd2zdb124p.png" alt=" " width="800" height="594"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical: Object storage acceleration based on JuiceFS
&lt;/h2&gt;

&lt;p&gt;Based on JuiceFS, the platform encapsulates an object storage acceleration product. This product connects the user's existing object storage to the elastic inference cluster. It provides the storage as a high-performance file system for the application. The overall process is as follows.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create a file system.&lt;/strong&gt; The user provides object storage access credentials, for example, AK/SK for S3-compatible storage. The credential permissions can be configured as read-only or read-write based on application needs. The platform creates a corresponding JuiceFS file system based on that object storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Import metadata.&lt;/strong&gt; The platform uses the JuiceFS import feature to scan the metadata of files in object storage. Then, it imports that metadata into the JuiceFS metadata service. In this way, the model files originally stored by the user in object storage can be accessed as file system directories in JuiceFS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create a cache group.&lt;/strong&gt; Within each cluster that may host workloads, the platform sets up a JuiceFS cache group, which acts as a distributed cache for that cluster. Before running a task, the platform can warm up model files, caching hot data in the target cluster in advance. This reduces the time needed to pull data from remote object storage when the inference service starts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mount to application Pods.&lt;/strong&gt; When the user's application runs, the platform uses the FUSE client to mount the JuiceFS file system into the application Pod. For the application, model files appear as local file system paths. Therefore, the original model reading logic usually does not need modification.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enable node-local cache.&lt;/strong&gt; Besides the cluster-level cache group, the node where the FUSE client runs can also provide a local cache. This improves repeated read and model loading performance. It further reduces direct access to remote object storage.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This object storage acceleration product essentially productizes the JuiceFS metadata import, distributed cache, data warm-up, and FUSE mounting process. It allows the user's existing object storage to serve cross-cloud inference tasks in a way that feels closer to a local file system.&lt;/p&gt;
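
&lt;p&gt;The resulting read path can be sketched as a lookup through three tiers. This is a simplified model, not the JuiceFS implementation: the function &lt;code&gt;read_block&lt;/code&gt;, the dictionary-based tiers, and the policy of filling every tier on a miss are assumptions made for illustration.&lt;/p&gt;

```python
def read_block(key, node_cache, cache_group, object_store):
    """Return (data, source) from the nearest tier that holds the block."""
    if key in node_cache:
        return node_cache[key], "node-local cache"
    if key in cache_group:
        data = cache_group[key]
        node_cache[key] = data          # keep a copy close to the FUSE client
        return data, "cache group"
    data = object_store[key]            # slowest path: remote object storage
    cache_group[key] = data
    node_cache[key] = data
    return data, "object storage"


object_store = {"ckpt/base.safetensors": b"weights"}
cache_group = {}
node_cache = {}

first = read_block("ckpt/base.safetensors", node_cache, cache_group, object_store)
second = read_block("ckpt/base.safetensors", node_cache, cache_group, object_store)
print(first[1], second[1])
```

&lt;p&gt;Warm-up amounts to performing the first, slow read ahead of time, so that the inference service only ever sees the second kind of read.&lt;/p&gt;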

&lt;p&gt;In addition, the JuiceFS cache group is independent from the file system access point. This characteristic, on one hand, adds management complexity on the platform side, because the platform needs to manage the relationships among the file system, cache groups, mount points, and task scheduling. On the other hand, it provides a foundation for cache isolation, independent scheduling, and fine-grained management based on clusters, users, or application scenarios in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production case study: A leading text-to-image model community
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario, challenges, and acceptance criteria
&lt;/h3&gt;

&lt;p&gt;One of the most representative cases in this object storage acceleration solution involves a leading Chinese text-to-image model community hosting tens of terabytes of model data, including large checkpoint base models and a larger number of smaller LoRA models. In practice, inference jobs typically load a checkpoint first, then load one or more LoRA models to perform combined inference.&lt;/p&gt;

&lt;p&gt;The company already operated compute infrastructure at scale — several thousand GPUs — but its workload, serving creative design and production use cases, exhibited significant variability. &lt;strong&gt;Overall average utilization was below 50%, yet during morning and afternoon peak hours on weekdays, load could reach 140% of normal capacity, degrading the user experience&lt;/strong&gt;. The customer therefore needed a highly elastic compute supply.&lt;/p&gt;

&lt;p&gt;We provided a high-elasticity resource model: compute support at the scale of hundreds of GPUs was available only during weekday peak hours (10:00 AM–12:00 PM and 2:00–6:00 PM), with resources scaling to zero at all other times.&lt;/p&gt;

&lt;p&gt;This meant the platform needed to provision hundreds of GPUs within a window of minutes, while consuming zero resources outside peak hours. For the customer, this model delivers large-scale compute during peak periods while avoiding payment for idle capacity. For the platform, it enables more efficient utilization and monetization of idle compute resources.&lt;/p&gt;

&lt;p&gt;The technical challenges were significant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A model repository of this scale cannot simply be replicated to every elastic cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Inference services do not load all models once at startup. Model reads and switches happen continuously as user requests arrive, resulting in high access frequency. Therefore, the object storage acceleration solution needed to support not just large-scale model repository access, but stable read performance under continuous dynamic loading.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The customer's performance requirements were also strict. During acceptance testing, a portion of production traffic was routed to the elastic cluster. The requirement was that both the median and mean inference latency of the elastic cluster must stay within 2 seconds of the customer's own cluster. Given that individual inference jobs take on the order of tens of seconds, this requirement left virtually no room for additional latency introduced by the storage layer. In the first few rounds of testing, both median and mean inference latency on the elastic cluster exceeded the customer's own cluster by approximately 10 seconds — failing the acceptance criteria.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance optimization: Reducing additional latency on the elastic cluster
&lt;/h3&gt;

&lt;p&gt;Optimization began with the median. &lt;strong&gt;A high median indicates that a significant proportion of requests are experiencing performance degradation, not just a small number of outliers inflating the tail.&lt;/strong&gt; JuiceFS monitoring revealed that the cluster's cache hit rate was not reaching the expected level. In the current architecture, a cache miss requires a round trip over the public internet to the customer's object storage on Alibaba Cloud. This significantly increases model loading time and then affects inference request latency.&lt;/p&gt;

&lt;p&gt;To solve this, the platform used the isolation capability of the JuiceFS cache group. It assigned dedicated cache nodes to this customer, reserved enough cache space, and warmed up the core model data. After warming up, the access path for core models achieved a cache hit rate of nearly 100%. This effectively avoided the performance loss from fetching missed data back across the public network.&lt;/p&gt;

&lt;p&gt;The second factor affecting the median was metadata access latency. Because the platform uses a unified cross-cluster architecture, the metadata service is accessed over the public internet, for example, via JuiceFS Cloud Service or a deployment on a remote host, and this latency affects overall model read performance.&lt;/p&gt;

&lt;p&gt;The platform took two measures to address this issue:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enabling JuiceFS' open cache to keep metadata in local memory as much as possible.&lt;/strong&gt; Since this workload is predominantly read-only, caching is an effective way to reduce metadata access overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tuning the cluster's network rate-limiting policy.&lt;/strong&gt; While the platform cannot directly control network equipment in edge data centers, it can apply node-level rate limiting to prevent any single node from saturating the available bandwidth, improving overall network stability.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After these optimizations, cluster-wide performance improved meaningfully and the median metric gradually reached the customer's requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once the median met the target, the mean still showed a gap. This indicated that long-tail requests remained, with a small number of requests taking significantly longer than normal and pulling up the overall average.&lt;/strong&gt; Further analysis traced this to node-level local cache — specifically, the FUSE cache quota. With limited cache capacity, the elastic cluster experienced more frequent cache evictions than the customer's own cluster, causing some requests to reload model data from scratch and increasing mean inference latency. The platform addressed this by increasing the FUSE local cache quota in the production environment, reducing eviction frequency, improving tail latency, and ultimately bringing the mean metric within acceptance. The system passed validation and has been running stably since.&lt;/p&gt;
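
&lt;p&gt;A small calculation shows why the median can pass while the mean still fails. The latency samples below are made up for illustration and are not measurements from this case. The helper names &lt;code&gt;extra_latency&lt;/code&gt; and &lt;code&gt;passes_acceptance&lt;/code&gt; are hypothetical, and the 2-second limit follows the acceptance rule described earlier.&lt;/p&gt;

```python
from operator import le
from statistics import mean, median


def extra_latency(elastic, baseline):
    """Gap in seconds between the elastic cluster and the customer's own cluster."""
    return {
        "median": median(elastic) - median(baseline),
        "mean": mean(elastic) - mean(baseline),
    }


def passes_acceptance(elastic, baseline, limit=2.0):
    # Both the median gap and the mean gap must be at most `limit` seconds.
    return all(le(gap, limit) for gap in extra_latency(elastic, baseline).values())


baseline = [30, 31, 32, 33, 34]
with_long_tail = [31, 32, 33, 34, 60]   # one request reloads a model after eviction
after_tuning = [31, 32, 33, 34, 35]     # larger local cache, no reload

print(extra_latency(with_long_tail, baseline))
print(passes_acceptance(with_long_tail, baseline))
print(passes_acceptance(after_tuning, baseline))
```

&lt;p&gt;In the long-tail sample, a single slow request leaves the median gap at 1 second but pushes the mean gap to 6 seconds, which is why the mean needed a separate fix aimed at cache evictions.&lt;/p&gt;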

&lt;h3&gt;
  
  
  Multi-tenant cache management
&lt;/h3&gt;

&lt;p&gt;After the single-tenant case was validated, the solution entered multi-tenant operation. As different tenants began time-sharing the same elastic nodes, a new issue emerged: cache contention between tenants.&lt;/p&gt;

&lt;p&gt;In the elastic resource model, FUSE clients do not actively clear node cache on exit. This is a reasonable design in single-tenant scenarios, where cached data from previous jobs can be reused by subsequent jobs to improve hit rates. &lt;strong&gt;However, in multi-tenant scenarios, one tenant's data can occupy node cache for extended periods. This leaves insufficient cache capacity for the next tenant, who is then forced to fall back to object storage, causing a noticeable performance drop.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To address this, we deployed an independent daemon process on each node that performs a global cache garbage collection (GC) pass before the application FUSE client starts. The eviction strategy references the JuiceFS FUSE client implementation, using a 2-random policy to balance collection efficiency and performance overhead. Coordination across nodes is handled via Kubernetes distributed locks: only the client that acquires the lock executes GC, preventing multiple clients from running cache collection simultaneously and creating excessive network and I/O pressure.&lt;/p&gt;
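
&lt;p&gt;The 2-random idea can be sketched in a few lines: pick two cached blocks at random and evict the one that was accessed less recently, repeating until the cache fits. This is a minimal sketch, not the daemon's code. The function names, the in-memory dictionary, and the capacity numbers are assumptions, and the Kubernetes lock that decides which client runs the pass is left out.&lt;/p&gt;

```python
import random
from operator import gt


def total_size(cache):
    return sum(size for size, _ in cache.values())


def evict_two_random(cache, capacity, rng):
    """Evict blocks until the cache fits in `capacity`.

    `cache` maps a block key to (size, last_access_time).
    """
    evicted = []
    while cache and gt(total_size(cache), capacity):
        # Sample two candidates (or one, if only one block is left).
        candidates = rng.sample(sorted(cache), min(2, len(cache)))
        # Of the sampled blocks, drop the one with the older access time.
        victim = min(candidates, key=lambda key: cache[key][1])
        del cache[victim]
        evicted.append(victim)
    return evicted


cache = {"a": (40, 100), "b": (40, 200), "c": (40, 300)}
evicted = evict_two_random(cache, capacity=80, rng=random.Random(0))
print(evicted, total_size(cache))
```

&lt;p&gt;Compared with a strict least-recently-used order, this approach needs no global ordering of blocks, while the most recently used block in any sampled pair always survives that round.&lt;/p&gt;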

&lt;p&gt;This mechanism effectively mitigates the problem of historical jobs occupying cache resources in multi-tenant scenarios, allowing different tenants sharing elastic resources to maintain consistent cache performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;For elastic compute to reliably serve production traffic, compute scheduling alone is not enough. Model data and hot data must remain stably accessible across clouds and clusters.&lt;/p&gt;

&lt;p&gt;Built on JuiceFS, we’ve combined object storage, unified namespace, metadata management, distributed caching, and FUSE mounting into an object storage acceleration solution purpose-built for elastic inference. This is not simply about mounting object storage as a file system. It’s about building a data access layer around the access patterns of model inference: one that supports warm-up, caching, isolation, and management.&lt;/p&gt;

&lt;p&gt;This represents Gongjiyun's current progress in elastic compute and cross-cloud storage acceleration. As AI inference scenarios continue to evolve, model distribution, cache management, and multi-cluster data access will continue to surface new engineering challenges. We look forward to exchanging ideas with developers, AI application teams, and infrastructure practitioners, and to exploring more stable and efficient data access solutions for elastic compute environments.&lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="http://go.juicefs.com/discord" rel="noopener noreferrer"&gt;community on Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Reducing Data Storage Costs: A Deep Dive into JuiceFS 1.4 Tiered Storage</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 05 Jun 2026 07:50:42 +0000</pubDate>
      <link>https://dev.to/daswu/reducing-data-storage-costs-a-deep-dive-into-juicefs-14-tiered-storage-2c21</link>
      <guid>https://dev.to/daswu/reducing-data-storage-costs-a-deep-dive-into-juicefs-14-tiered-storage-2c21</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/juicedata/juicefs/releases/tag/v1.4.0-beta1" rel="noopener noreferrer"&gt;JuiceFS Community Edition 1.4&lt;/a&gt; introduces enhanced tiered storage capabilities, allowing users to set object storage classes at the file or directory level. This makes it possible to manage different storage tiers for data under a unified file system interface. In this article, we’ll discuss this feature’s application background, evolution, usage model, implementation, and future plans.&lt;/p&gt;

&lt;h2&gt;
  
  
  Application background
&lt;/h2&gt;

&lt;p&gt;In real‑world scenarios, different files have different access patterns and performance requirements. Some data is read or written frequently and demands low latency and high throughput. Other data is rarely accessed after being written, and the main concern is long‑term storage cost. Tiered storage addresses this by placing data in the appropriate storage layer based on access patterns, balancing performance and cost.&lt;br&gt;&lt;br&gt;
Typically, data can be classified into three categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hot data:&lt;/strong&gt; Frequently accessed, requires low latency and high throughput.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Warm (infrequent access) data:&lt;/strong&gt; Accessed occasionally, but still requires fast retrieval when needed.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cold (archival) data:&lt;/strong&gt; Primarily for long‑term retention, very low access frequency, can tolerate some restoration delay in exchange for lower cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;Object storage&lt;/a&gt; already offers tiering capabilities. For example, Amazon S3 provides S3 Standard for frequently accessed data, S3 Standard‑IA for infrequent but still millisecond‑accessible data, and Glacier / Deep Archive for long‑term archiving. These storage classes differ in access latency, minimum storage duration, and pricing.&lt;br&gt;&lt;br&gt;
The table below compares main S3 storage classes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage class&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;First byte latency&lt;/th&gt;
&lt;th&gt;Minimum storage duration fee&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;S3 Standard&lt;/td&gt;
&lt;td&gt;General-purpose storage for frequently accessed data&lt;/td&gt;
&lt;td&gt;Milliseconds&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 Standard-IA&lt;/td&gt;
&lt;td&gt;Infrequently accessed data requiring millisecond access&lt;/td&gt;
&lt;td&gt;Milliseconds&lt;/td&gt;
&lt;td&gt;30 days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 Glacier Deep Archive&lt;/td&gt;
&lt;td&gt;Archiving very rarely accessed data at very low cost&lt;/td&gt;
&lt;td&gt;Hours&lt;/td&gt;
&lt;td&gt;180 days&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
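
&lt;p&gt;To make the minimum storage duration column concrete, here is a minimal Python sketch of how early deletion affects the billed duration. It assumes the usual rule that an object removed before the minimum is still charged for the full minimum period. The function name &lt;code&gt;billable_days&lt;/code&gt; and the class identifiers are illustrative, and actual pricing should be checked against your provider.&lt;/p&gt;

```python
# Minimum storage durations in days, as listed in the table above.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "DEEP_ARCHIVE": 180,
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Return the number of days billed for an object kept for days_stored days.

    Assumption: deleting or rewriting an object before the minimum duration
    is billed as if it had been stored for the whole minimum.
    """
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


if __name__ == "__main__":
    # An object rewritten after 10 days in each class.
    for storage_class in MIN_STORAGE_DAYS:
        print(storage_class, billable_days(storage_class, 10))
```

&lt;p&gt;This is why frequently rewritten data is usually a poor match for infrequent‑access or archival classes.&lt;/p&gt;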

&lt;p&gt;For JuiceFS, which is built on top of object storage, the key is to translate these capabilities into file‑system‑level tiering: users set storage tiers for files, directories, or datasets, and JuiceFS maps them to the underlying object storage while handling writes, migrations, and restore operations.&lt;/p&gt;
&lt;h2&gt;
  
  
  Evolution of JuiceFS tiering capabilities
&lt;/h2&gt;

&lt;p&gt;The evolution of JuiceFS tiering has moved from being “passively unaware of object storage classes” to “actively managing storage tiers at file and directory granularity.”  &lt;/p&gt;

&lt;p&gt;Before v1.1, JuiceFS did not provide a way to configure storage classes. While users could manually change the storage class of objects at the object storage side, these changes were not recognized or managed by JuiceFS at the file system level. For standard and infrequent‑access classes that support direct access, normal read/write operations usually continued to work. However, if objects were moved to archival storage, access would fail because those objects cannot be read directly.  &lt;/p&gt;

&lt;p&gt;Starting with v1.1, &lt;a href="https://juicefs.com/docs/community/reference/how_to_set_up_object_storage/#storage-class" rel="noopener noreferrer"&gt;JuiceFS supports setting the object storage class via &lt;code&gt;--storage-class&lt;/code&gt;&lt;/a&gt;. For example, you can specify the default storage class for the file system at format time or override the storage class used for data written to a specific mount point during mount. This gave JuiceFS a basic ability to leverage object storage tiering. However, the configuration granularity remained coarse – primarily at the file system default or mount‑point level – and did not allow fine‑grained management per directory, per file, or per dataset.  &lt;/p&gt;

&lt;p&gt;Version 1.4 further advances tiering capabilities to the file and directory level. You can assign a storage tier to individual files or directories based on data temperature. When a directory is assigned a tier, newly created files and subdirectories under it automatically inherit that configuration. Compared to the previous default or mount‑point level settings, v1.4 is better suited for tiered management by project, directory, dataset, or data temperature.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to configure tiered storage
&lt;/h2&gt;

&lt;p&gt;The key to tiered storage in JuiceFS 1.4 is translating object storage classes into file‑system‑manageable tiers. The usage model consists of two steps:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Map tier IDs to object storage classes.
&lt;/li&gt;
&lt;li&gt;Assign files or directories to those tier IDs.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This allows users to organize tiering policies by file, directory, or dataset without specifying the underlying storage class on every write.  &lt;/p&gt;

&lt;p&gt;The figure below shows how tier IDs map to storage classes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xe9y2mcgxr9fpcxxk7a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xe9y2mcgxr9fpcxxk7a.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, map tier IDs 1–3 to different storage classes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;juicefs config redis://localhost &lt;span class="nt"&gt;--tier-id&lt;/span&gt; 1 &lt;span class="nt"&gt;--tier-sc&lt;/span&gt; STANDARD_IA &lt;span class="nt"&gt;-y&lt;/span&gt;  
juicefs config redis://localhost &lt;span class="nt"&gt;--tier-id&lt;/span&gt; 2 &lt;span class="nt"&gt;--tier-sc&lt;/span&gt; INTELLIGENT_TIERING &lt;span class="nt"&gt;-y&lt;/span&gt;  
juicefs config redis://localhost &lt;span class="nt"&gt;--tier-id&lt;/span&gt; 3 &lt;span class="nt"&gt;--tier-sc&lt;/span&gt; GLACIER_IR &lt;span class="nt"&gt;-y&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After mapping, set the storage tier for a file or directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;juicefs tier &lt;span class="nb"&gt;set &lt;/span&gt;redis://localhost &lt;span class="nt"&gt;--id&lt;/span&gt; 1 /path/to/file  
juicefs tier &lt;span class="nb"&gt;set &lt;/span&gt;redis://localhost &lt;span class="nt"&gt;--id&lt;/span&gt; 2 /path/to/dir  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Directory‑level settings have inheritance semantics. Once a directory is assigned a tier ID, newly created files and subdirectories will inherit that tier. To apply the tier to existing data under the directory, use &lt;code&gt;-r&lt;/code&gt; to recursively set the tier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;juicefs tier &lt;span class="nb"&gt;set &lt;/span&gt;redis://localhost &lt;span class="nt"&gt;--id&lt;/span&gt; 2 /path/to/dir &lt;span class="nt"&gt;-r&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwwap06itrd5ljrp90bs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwwap06itrd5ljrp90bs.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For archival storage classes such as Glacier, a restore request must be issued before reading:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;juicefs tier restore redis://localhost /path/to/dir &lt;span class="nt"&gt;-r&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;From an implementation perspective, the key to tiered storage in v1.4 is storing tier information in metadata and using the tier ID to decide the object storage behavior during writes, migrations, and reads.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvr3rq0u6pp3qjdiexydn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvr3rq0u6pp3qjdiexydn.png" alt=" " width="800" height="718"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Metadata design
&lt;/h3&gt;

&lt;p&gt;JuiceFS uses &lt;code&gt;tier-id&lt;/code&gt; on files and directories to indicate the storage tier. A value of &lt;code&gt;0&lt;/code&gt; means the default storage tier; values &lt;code&gt;1&lt;/code&gt; to &lt;code&gt;3&lt;/code&gt; correspond to user‑configured object storage classes.  &lt;/p&gt;

&lt;p&gt;Thus, the storage tier is no longer just an external state at the object storage side, but becomes part of the file system metadata that JuiceFS can understand and manage. When writing new data, migrating existing data, or checking file status, JuiceFS can determine the intended storage class based on this metadata.&lt;/p&gt;
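
&lt;p&gt;The lookup described above can be sketched in a few lines of Python. This is only an illustrative model, not the JuiceFS implementation: the function &lt;code&gt;storage_class_for&lt;/code&gt;, the &lt;code&gt;TIER_MAP&lt;/code&gt; dictionary, and the default class &lt;code&gt;STANDARD&lt;/code&gt; are assumptions, while the tier IDs and class names come from the configuration example earlier in this article.&lt;/p&gt;

```python
DEFAULT_TIER = 0  # tier-id 0 means "use the file system default"

# Hypothetical in-memory copy of the tier mapping configured with juicefs config.
TIER_MAP = {1: "STANDARD_IA", 2: "INTELLIGENT_TIERING", 3: "GLACIER_IR"}


def storage_class_for(tier_id: int, tier_map: dict, default_class: str = "STANDARD") -> str:
    """Resolve a tier-id stored in metadata to an object storage class."""
    if tier_id == DEFAULT_TIER:
        return default_class
    if tier_id not in (1, 2, 3):
        raise ValueError(f"tier-id must be between 0 and 3, got {tier_id}")
    if tier_id not in tier_map:
        raise KeyError(f"tier-id {tier_id} has no storage class mapped")
    return tier_map[tier_id]


if __name__ == "__main__":
    for tier_id in range(4):
        print(tier_id, storage_class_for(tier_id, TIER_MAP))
```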

&lt;h3&gt;
  
  
  Migrating existing data
&lt;/h3&gt;

&lt;p&gt;For existing data, changing the storage tier involves not only updating the metadata &lt;code&gt;tier-id&lt;/code&gt; but also changing the actual storage class of the underlying objects. When a directory is set recursively, JuiceFS processes all files and subdirectories under it and uses the object storage’s copy capability to migrate existing objects to the new storage class.  &lt;/p&gt;

&lt;p&gt;If only the mapping from a tier ID to a storage class is changed, the actual storage class of existing objects is not automatically updated. In that case, you must use &lt;code&gt;tier set --force&lt;/code&gt; to explicitly trigger the change. This will rewrite the objects with the new storage class.&lt;/p&gt;
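
&lt;p&gt;A small Python model may help show why remapping a tier ID leaves existing objects untouched. It is a sketch under stated assumptions: each file is represented as a hypothetical &lt;code&gt;(path, tier_id, actual_class)&lt;/code&gt; tuple, the default class is assumed to be &lt;code&gt;STANDARD&lt;/code&gt;, and the helper &lt;code&gt;objects_needing_rewrite&lt;/code&gt; is not part of JuiceFS.&lt;/p&gt;

```python
# Tier mapping from the configuration example earlier in this article.
OLD_MAP = {1: "STANDARD_IA", 2: "INTELLIGENT_TIERING", 3: "GLACIER_IR"}


def objects_needing_rewrite(files, tier_map, default_class="STANDARD"):
    """List paths whose stored class differs from the class their tier maps to.

    files: iterable of (path, tier_id, actual_class) tuples (hypothetical model).
    """
    stale = []
    for path, tier_id, actual_class in files:
        intended = default_class if tier_id == 0 else tier_map[tier_id]
        if actual_class != intended:
            stale.append(path)
    return stale


if __name__ == "__main__":
    files = [
        ("/logs/a.log", 1, "STANDARD_IA"),
        ("/logs/b.log", 1, "STANDARD_IA"),
        ("/models/m.bin", 0, "STANDARD"),
    ]
    # Remap tier 1: the tier-id metadata is unchanged, but the objects
    # written earlier still carry the old storage class.
    new_map = dict(OLD_MAP)
    new_map[1] = "GLACIER_IR"
    print(objects_needing_rewrite(files, OLD_MAP))
    print(objects_needing_rewrite(files, new_map))
```

&lt;p&gt;In this model, the paths reported after the remap are the ones a forced tier set would need to rewrite.&lt;/p&gt;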

&lt;h3&gt;
  
  
  Write path
&lt;/h3&gt;

&lt;p&gt;When a new file is written, JuiceFS determines the target storage class based on the file’s own &lt;code&gt;tier-id&lt;/code&gt; or, if not set, the inherited &lt;code&gt;tier-id&lt;/code&gt; from its parent directory. For directories that already have a storage tier assigned, new data can be written directly to the corresponding storage tier. This avoids the overhead of first writing to the default tier and then migrating later.&lt;/p&gt;
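
&lt;p&gt;The inheritance behavior can be illustrated with a toy namespace in Python. This is a simplified sketch, not JuiceFS code: the &lt;code&gt;Namespace&lt;/code&gt; class and its methods are hypothetical, and inheritance is modeled as copying the parent directory’s tier ID when an entry is created, which matches the rule that only newly created files pick up a directory’s tier unless the tier is set recursively.&lt;/p&gt;

```python
import posixpath


class Namespace:
    """Toy model of per-path tier IDs with inheritance at creation time."""

    def __init__(self):
        self.tier = {"/": 0}  # the root starts on the default tier

    def create(self, path: str) -> int:
        """Create a file or directory; it inherits its parent's tier ID."""
        parent = posixpath.dirname(path)
        self.tier[path] = self.tier[parent]
        return self.tier[path]

    def set_tier(self, path: str, tier_id: int, recursive: bool = False) -> None:
        """Assign a tier ID; with recursive=True also update existing children."""
        self.tier[path] = tier_id
        if recursive:
            prefix = path.rstrip("/") + "/"
            for existing in self.tier:
                if existing.startswith(prefix):
                    self.tier[existing] = tier_id


if __name__ == "__main__":
    ns = Namespace()
    ns.create("/data")
    ns.create("/data/old.bin")      # created before the directory had a tier
    ns.set_tier("/data", 2)
    ns.create("/data/new.bin")      # inherits tier 2 at creation
    print(ns.tier)
    ns.set_tier("/data", 2, recursive=True)
    print(ns.tier)
```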

&lt;h3&gt;
  
  
  Read path
&lt;/h3&gt;

&lt;p&gt;For storage classes that support immediate access (for example, Standard and Standard‑IA), reads are transparent to the application, and JuiceFS simply reads the data from object storage as usual.  &lt;/p&gt;

&lt;p&gt;For archival classes such as Glacier and Deep Archive, objects cannot be read directly. You must first issue a restore request using &lt;code&gt;juicefs tier restore&lt;/code&gt;. This sends a request to the object storage service. Whether and when the objects become readable depends on the cloud provider’s restore mechanism. After restoration completes, applications can retry the read.  &lt;/p&gt;
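
&lt;p&gt;The read‑path decision can be summarized in a short Python sketch. The set of classes and the function &lt;code&gt;read_plan&lt;/code&gt; are illustrative assumptions based on Amazon S3 class names; other providers use different names, and this is not how JuiceFS implements the check internally.&lt;/p&gt;

```python
# S3 classes whose objects must be restored before they can be read.
ARCHIVAL_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def read_plan(storage_class: str, restored: bool = False) -> str:
    """Return "read" if the object is directly readable, else "restore-first"."""
    if storage_class in ARCHIVAL_CLASSES and not restored:
        return "restore-first"
    return "read"


if __name__ == "__main__":
    for storage_class in ("STANDARD", "STANDARD_IA", "GLACIER", "DEEP_ARCHIVE"):
        print(storage_class, read_plan(storage_class))
    # After the provider finishes the restore, the read can be retried.
    print("GLACIER", read_plan("GLACIER", restored=True))
```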

&lt;p&gt;Therefore, archival storage is suitable for data that is accessed very infrequently and can tolerate restoration delay. It’s not appropriate for workloads that require online access at any time. When using archival tiers, you must consider storage cost, restoration time, and restoration costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future plans
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Reducing operational costs of archival storage
&lt;/h3&gt;

&lt;p&gt;Archival storage classes have low long‑term storage costs, but they often come with complex cost models covering writes, restores, early deletion, and lifecycle transitions. Writing data directly to archival storage may incur extra costs in scenarios with frequent changes or bulk migrations.  &lt;/p&gt;

&lt;p&gt;In the future, JuiceFS could integrate with object storage lifecycle management. Data could first be written to standard storage with specific object tags. Users could then use cloud‑vendor lifecycle rules to automatically and cost‑effectively transition data to infrequent‑access or archival tiers based on those tags. This would preserve JuiceFS’ file‑system‑level tiering capabilities while leveraging native batch transition mechanisms to reduce overhead.&lt;/p&gt;
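
&lt;p&gt;As a rough illustration of the tag‑based idea, the Python sketch below builds a lifecycle rule in the shape used by the Amazon S3 lifecycle configuration API. Everything specific here is an assumption: the tag key &lt;code&gt;juicefs-tier&lt;/code&gt;, the 30‑day delay, and the helper name are hypothetical, and JuiceFS does not currently tag objects this way.&lt;/p&gt;

```python
import json


def tag_based_transition_rule(tag_key: str, tag_value: str, days: int, storage_class: str) -> dict:
    """Build one S3-style lifecycle rule that transitions tagged objects."""
    return {
        "ID": f"transition-{tag_key}-{tag_value}",
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


if __name__ == "__main__":
    # Hypothetical tag written by the file system for data assigned to tier 3.
    config = {"Rules": [tag_based_transition_rule("juicefs-tier", "3", 30, "GLACIER")]}
    print(json.dumps(config, indent=2))
```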

&lt;h3&gt;
  
  
  Extending tiering to multi‑bucket, multi‑cloud
&lt;/h3&gt;

&lt;p&gt;Currently, tiered storage works on different storage classes within the same object storage backend. In the future, JuiceFS could extend “tier” to different buckets, different object storage services, or even different cloud providers. Tiering would no longer be limited to a single backend.  &lt;/p&gt;

&lt;p&gt;For example, hot data could be stored in a local high‑performance MinIO cluster backed by SSDs, while cold or archival data resides in low‑cost cloud archival buckets. Policies could then gradually move data from the hot tier to the cold tier. With such an architecture, JuiceFS could offer cross‑bucket, cross‑cloud, and cross‑media tiered data management under a unified file system namespace.  &lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the &lt;a href="http://go.juicefs.com/discord" rel="noopener noreferrer"&gt;community on Discord&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>JuiceFS at Xiaomi: Unified Storage for AI, Big Data, and Cloud‑Native Workloads</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Wed, 03 Jun 2026 08:29:53 +0000</pubDate>
      <link>https://dev.to/daswu/juicefs-at-xiaomi-unified-storage-for-ai-big-data-and-cloud-native-workloads-237d</link>
      <guid>https://dev.to/daswu/juicefs-at-xiaomi-unified-storage-for-ai-big-data-and-cloud-native-workloads-237d</guid>
      <description>&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Xiaomi" rel="noopener noreferrer"&gt;Xiaomi&lt;/a&gt; is one of the world's leading smartphone companies. Since 2021, its storage team has been building a file storage platform based on &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt;, initially providing file storage capabilities for cloud‑native and some application scenarios. After Xiaomi announced its comprehensive AI strategy in 2024, issues with the previous heterogeneous storage system became more evident in areas such as technology selection, data flow, and development/operations. Leveraging multi‑protocol access, elastic scalability, multi‑cloud adaptability, and high performance, the team decided to build a unified file storage foundation centered on JuiceFS to support big data, cloud‑native, and AI workloads.&lt;/p&gt;

&lt;p&gt;To achieve this goal, the platform further developed core capabilities, including a capacity layer, a performance layer, and a cache layer. These reduce the complexity of multi‑system access and data movement while balancing large‑scale storage with high‑performance access. &lt;strong&gt;Over the past two years, with the rapid growth of generative AI and autonomous driving, the platform has supported typical scenarios such as large‑model training, autonomous driving training, inference acceleration, and big‑data cloud migration. Today, the platform can handle hundreds of billions of files and EB‑scale storage, covering the entire AI storage chain from raw data and training data to model file distribution.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage architecture challenges under the AI strategy
&lt;/h2&gt;

&lt;p&gt;Before 2023, Xiaomi, like most companies, had built multiple storage systems for different application scenarios. In the &lt;a href="https://en.wikipedia.org/wiki/Big_data" rel="noopener noreferrer"&gt;big data&lt;/a&gt; area, the data platform was mainly based on HDFS; AI workloads, before the rise of large language models, relied primarily on high‑performance file storage services on the cloud, such as Parallel File System (PFS) and Network Attached Storage (NAS).&lt;/p&gt;

&lt;p&gt;During this period, we also began to introduce JuiceFS and built an internal self‑developed File Storage Service (FDS), using components like &lt;a href="https://juicefs.com/docs/csi/introduction/" rel="noopener noreferrer"&gt;JuiceFS CSI Driver&lt;/a&gt; to provide file storage for cloud‑native and some application scenarios. As application needs evolved, these storage systems grew independently. This led to a complex heterogeneous storage landscape.&lt;/p&gt;

&lt;p&gt;In 2024, after Xiaomi announced its comprehensive AI strategy, the shortcomings of the previous storage system became more pronounced in areas such as technology selection, access, data flow, and development/operations.&lt;/p&gt;

&lt;p&gt;These challenges included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High selection and access costs:&lt;/strong&gt; With many storage systems and inconsistent capabilities, application teams had to understand and adapt to each one, raising the barrier to entry.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low data flow efficiency:&lt;/strong&gt; The lack of a unified access method across systems led to frequent cross‑system data copying. This hurt development efficiency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scattered development and operations efforts:&lt;/strong&gt; Multiple systems were maintained and evolved independently, making it difficult to focus resources on the mission-critical infrastructure required for the AI strategy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To address these issues, we conducted in‑depth internal discussions and architectural adjustments in 2024, and began redesigning a unified storage architecture for AI, big data, and &lt;a href="https://en.wikipedia.org/wiki/Cloud-native_computing" rel="noopener noreferrer"&gt;cloud‑native&lt;/a&gt; scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a unified file foundation with JuiceFS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Selection rationale: Multi‑protocol support, elasticity, multi‑cloud, high performance
&lt;/h3&gt;

&lt;p&gt;JuiceFS is a distributed file system that natively supports multi‑protocol access, elastic scaling, and high‑performance reads/writes. This makes it a good fit for both AI and big data storage needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzx2nabiv9h1a1bqyey3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzx2nabiv9h1a1bqyey3y.png" alt=" " width="800" height="594"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the cloud-native field, we’ve been using JuiceFS since 2021, continuously conducting internal development and iterative optimization. At the same time, we maintain close collaboration with the JuiceFS open-source community to jointly drive technology evolution and real-world adoption.&lt;/p&gt;

&lt;p&gt;In AI scenarios, model training and inference rely heavily on &lt;a href="https://en.wikipedia.org/wiki/POSIX" rel="noopener noreferrer"&gt;POSIX&lt;/a&gt; semantics, which aligns naturally with JuiceFS capabilities. Meanwhile, in the big data area, we were already promoting HDFS replacement during cloud migration, a practice with many mature industry examples, so adapting the HDFS protocol was also feasible.&lt;/p&gt;

&lt;p&gt;Considering multi-protocol support, elastic scalability, &lt;a href="https://en.wikipedia.org/wiki/Multicloud" rel="noopener noreferrer"&gt;multi-cloud&lt;/a&gt; adaptability, and high-performance read/write, we ultimately chose JuiceFS as the core component of our unified file storage foundation. This solved the problems of complex data flow, high access costs, and scattered operations caused by using different file systems across multiple platforms and application units.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fol0aqgp4o608zo6j6812.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fol0aqgp4o608zo6j6812.png" alt=" " width="799" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Building storage layer capabilities
&lt;/h3&gt;

&lt;p&gt;Our core goal is to build a unified file storage layer on top of JuiceFS, providing large capacity, high performance, and standardized access interfaces to uniformly support the three core application scenarios: big data, cloud-native, and AI.&lt;/p&gt;

&lt;p&gt;On the client side, we fully leverage JuiceFS’ multi-protocol capabilities, offering access methods including POSIX, Hadoop SDK, Python SDK, and &lt;a href="https://juicefs.com/docs/community/guide/gateway/" rel="noopener noreferrer"&gt;S3 Gateway&lt;/a&gt;. They’re all already in use internally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvbf3fv47ocmgq2mp4xg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvbf3fv47ocmgq2mp4xg.png" alt=" " width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the data plane, the architecture consists of three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capacity layer:&lt;/strong&gt; Built on &lt;a href="https://aws.amazon.com/what-is/public-cloud/" rel="noopener noreferrer"&gt;public cloud&lt;/a&gt; object storage, designed for EB‑scale storage, supporting multi-cloud deployments across different strategic data centers and multiple cloud providers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance layer:&lt;/strong&gt; Built on Ceph and all‑flash nodes and tuned at scale, designed for AI training and other scenarios that require high throughput and low latency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache layer:&lt;/strong&gt; Given the “write once, read many, seldom modify” characteristic of AI training datasets, we developed a high‑performance distributed cache system based on NVMe and RDMA to reduce repeated read costs and improve training data access efficiency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the control plane, we made custom enhancements to the &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;Community Edition&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For metadata, we built a distributed metadata service based on the Raft protocol to integrate with internal infrastructure systems and support multi-system access, improving reliability and scalability.
&lt;/li&gt;
&lt;li&gt;For backend management, we built a unified management service responsible for data lifecycle management, tiered storage, garbage collection, and warm-up of hot data from the capacity layer to the performance or cache layers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Through these efforts, JuiceFS has gradually become the unified file storage foundation at Xiaomi, supporting both large‑scale capacity storage and high‑performance access for AI training. The architecture is now running in production and provides the high throughput required for large model training.&lt;/p&gt;

&lt;h2&gt;
  
  
  Our practices
&lt;/h2&gt;

&lt;p&gt;During the construction of the unified file storage foundation, JuiceFS has gradually covered Xiaomi’s mission-critical application scenarios, including big data, cloud-native, and AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;In terms of scale, the solution can support EB‑level storage and hundreds of billions of files.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;In terms of capability, the coordinated design of the capacity, performance, and cache layers balances large‑scale storage with high performance.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below we describe two typical scenarios: big data cloud migration and the &lt;a href="https://www.hpe.com/hk/en/what-is/ai-storage.html" rel="noopener noreferrer"&gt;AI storage&lt;/a&gt; pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Big data cloud migration and unified lakehouse storage
&lt;/h3&gt;

&lt;p&gt;In its early days, our big data system was mainly built on the Hadoop ecosystem, where HDFS used a previous‑generation coupled architecture. Over time, this architecture showed problems such as performance fluctuations, complex operations, and high total cost. In contrast, cloud storage offers significant advantages in elastic scaling, resource utilization, and cost control. Therefore, starting in 2021, we systematically began migrating big data to the cloud.&lt;/p&gt;

&lt;h4&gt;
  
  
  From cold data to the lakehouse layer
&lt;/h4&gt;

&lt;p&gt;Our big data cloud migration went through three stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cold data migration:&lt;/strong&gt; We first migrated cold data from HDFS to cloud storage, a process lasting over two years.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lakehouse layer migration:&lt;/strong&gt; We self‑developed a unified lakehouse file system, promoting the evolution from coupled to decoupled storage and compute.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified storage foundation based on JuiceFS:&lt;/strong&gt; After selecting JuiceFS, we migrated the entire lakehouse layer to JuiceFS.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Lakehouse construction can leverage Iceberg’s native support for &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; access (like OSS or S3). However, our application spans multiple regions globally using several cloud vendors. Adapting to each vendor individually would incur high access and maintenance costs.&lt;/p&gt;

&lt;p&gt;Thus, we chose JuiceFS to uniformly access different cloud storage. Upper‑layer services simply switch the backend storage address via the SDK to adapt to access in different cloud environments, greatly reducing multi‑cloud complexity.&lt;/p&gt;

&lt;p&gt;For data migration, our self‑developed data‑factory platform can transparently switch a table’s underlying storage to the new architecture and gradually migrate existing data to the cloud in the background, with little or no impact on applications. Moreover, JuiceFS supports multi-cloud and on‑premises deployment. If future cost or strategic considerations require switching to self‑built storage, data can be smoothly migrated back via JuiceFS. This preserves architectural flexibility.&lt;/p&gt;

&lt;h4&gt;
  
  
  Hot table cache acceleration for compute efficiency
&lt;/h4&gt;

&lt;p&gt;After data was in the cloud, we further analyzed access patterns of the lakehouse layer. For daily reporting and analysis tasks, computation is usually concentrated on day‑level or week‑level hot data, not requiring frequent full scans. Therefore, the performance focus for the lakehouse layer was not simply improving full‑scan throughput but rather increasing hot data access efficiency and task execution stability.&lt;/p&gt;

&lt;p&gt;Based on this, we built a hot table warm-up capability in cooperation with the lakehouse layer. The system identifies hot tables and their hot partitions based on daily access statistics, and preloads related data into the cache layer before task execution via a warm-up interface. For periodic reporting tasks that must be completed by 8 AM, hot data is warmed up before computation. This reduces remote reads and repeated access.&lt;/p&gt;
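
&lt;p&gt;The selection step of such a warm-up can be sketched as a simple greedy choice in Python. This is not Xiaomi’s implementation: the function &lt;code&gt;pick_hot_partitions&lt;/code&gt;, the partition names, and the idea of a fixed cache budget are all assumptions made for illustration.&lt;/p&gt;

```python
def pick_hot_partitions(access_counts: dict, sizes_gb: dict, budget_gb: float) -> list:
    """Choose partitions to preload, most-accessed first, within a cache budget.

    access_counts: partition name to number of reads in the last day.
    sizes_gb: partition name to size in GB.
    """
    ranked = sorted(access_counts, key=lambda part: access_counts[part], reverse=True)
    chosen = []
    used_gb = 0.0
    for part in ranked:
        size = sizes_gb[part]
        if used_gb + size > budget_gb:
            continue  # skip partitions that would overflow the budget
        chosen.append(part)
        used_gb += size
    return chosen


if __name__ == "__main__":
    counts = {"orders/dt=today": 50, "orders/dt=last_month": 5, "events/dt=today": 30}
    sizes = {"orders/dt=today": 40, "orders/dt=last_month": 10, "events/dt=today": 70}
    print(pick_hot_partitions(counts, sizes, budget_gb=60))
```

&lt;p&gt;A scheduler could run a selection like this ahead of the reporting window and pass the result to the warm-up interface.&lt;/p&gt;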

&lt;p&gt;In offline and online tests, hot table caching improved compute efficiency by about 10–20%, reducing both computation time and resource consumption. The cache has reached PB scale, with average throughput of around 200 GB/s. The cache layer also reduces cross‑cloud bandwidth pressure and cloud storage API call costs: a higher hot data hit rate means fewer repeated cross‑cloud reads, which lowers bandwidth consumption and access costs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgt4qwbic24x1w9zzr0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgt4qwbic24x1w9zzr0u.png" alt=" " width="799" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Benefits for big data
&lt;/h4&gt;

&lt;p&gt;Benefits for our big data application include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance:&lt;/strong&gt; After switching to JuiceFS, sequential read/write performance improved significantly, more than doubling in some scenarios. Overall task duration decreased by about 10–30%.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; By Xiaomi's internal cost metrics, the unified storage architecture has lowered storage costs by about 70% in China and about 90% in overseas regions. The overseas legacy solution, which ran HDFS with three replicas on cloud instances and EBS, had a high replication factor and thus higher costs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stability and operations:&lt;/strong&gt; Under the previous mixed architecture, many compute tasks easily consumed node resources, raising node load and affecting storage performance. With the decoupled storage‑compute architecture, compute tasks run on dedicated nodes, task durations are more stable, and scaling and management are more flexible.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI one‑stop storage
&lt;/h3&gt;

&lt;p&gt;AI storage consists of three stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Raw data stage:&lt;/strong&gt; Large volumes of raw data are stored and processed (for example, through ETL) into training datasets, which are then fed into high‑performance training environments.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training stage:&lt;/strong&gt; &lt;a href="https://www.ibm.com/think/topics/model-training" rel="noopener noreferrer"&gt;Training&lt;/a&gt; tasks require high throughput and low latency to reduce I/O wait time and increase GPU utilization. After training, model files are generated for subsequent inference.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference stage:&lt;/strong&gt; Model files must be quickly distributed to specific nodes for rapid startup of inference tasks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk61lk72s1to1iip0g5kl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk61lk72s1to1iip0g5kl.png" alt=" " width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Previously, data flowed among multiple systems, causing inconvenience for both application teams and internal operations. By adopting JuiceFS uniformly, we can meet diverse needs based on different storage tiers.&lt;/p&gt;

&lt;h4&gt;
  
  
  Requirements and solutions by stage
&lt;/h4&gt;

&lt;p&gt;AI one-stop storage needs to cover three stages: raw data, training data, and model files. The requirements for capacity, performance, cost, and distribution efficiency differ at each stage. The table below compares the application needs for each stage with previous and current solutions.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Application requirements&lt;/th&gt;
&lt;th&gt;Previous solution&lt;/th&gt;
&lt;th&gt;Current solution (JuiceFS)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Raw data&lt;/td&gt;
&lt;td&gt;Large capacity, low cost; support high‑concurrency data processing; scale to PB+&lt;/td&gt;
&lt;td&gt;Direct use of object storage; HDFS; other low‑cost storage&lt;/td&gt;
&lt;td&gt;Capacity‑oriented JuiceFS: backed by multi‑cloud object storage, shielding vendor differences; EB‑scale capacity, hundreds of billions of files; millions of concurrent tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training data&lt;/td&gt;
&lt;td&gt;High throughput, low latency; reduce I/O wait time; improve GPU utilization&lt;/td&gt;
&lt;td&gt;PFS, NAS (good performance but high cost)&lt;/td&gt;
&lt;td&gt;Performance‑oriented/cache‑oriented JuiceFS: TB/s throughput, low latency; async checkpoint to reduce I/O wait; cache acceleration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model files&lt;/td&gt;
&lt;td&gt;Fast distribution; efficient loading; quick inference startup&lt;/td&gt;
&lt;td&gt;P2P distribution; workflow distribution; PFS&lt;/td&gt;
&lt;td&gt;Cache‑accelerated JuiceFS: cache improves model loading; up to 16 GB/s sequential load per node; several times faster than local disk or FDS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  High‑performance cache acceleration: improving efficiency and cutting costs
&lt;/h4&gt;

&lt;p&gt;In AI training, datasets are typically written once, read many times, and rarely modified. This read-heavy, write-light access pattern is well suited to caching.&lt;/p&gt;

&lt;p&gt;Take our internal &lt;a href="https://en.wikipedia.org/wiki/Self-driving_car" rel="noopener noreferrer"&gt;autonomous driving&lt;/a&gt; training as an example. Once a dataset version matures, its data volume may continue to grow within the version cycle, but existing data is rarely modified. While the previous high‑performance file storage met training performance requirements, it delivered more performance than such repetitive reads needed, at unnecessary cost. Therefore, we began promoting a high‑performance cache acceleration solution based on JuiceFS.&lt;/p&gt;

&lt;p&gt;The cache solution offers several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Short I/O path:&lt;/strong&gt; Clients operate on files directly, greatly shortening the I/O path for fast responses.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance optimization:&lt;/strong&gt; Through RDMA and zero‑copy optimization, performance has significantly improved – throughput more than 20% higher than previous high‑performance storage, with ongoing optimization.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost reduction:&lt;/strong&gt; The previous PFS‑based storage used replication (though some used EC, replication was more common for stability). With the cache solution, single‑copy storage reduces costs by more than 60%.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource consolidation:&lt;/strong&gt; GPU training nodes typically have NVMe drives (about 10 TB each), which were previously used in a scattered way with low utilization. We now consolidate these NVMe resources into a unified cache pool to accelerate nearby GPU training and data processing tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Future plans
&lt;/h2&gt;

&lt;p&gt;Looking ahead, we’ll focus on three directions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continuously improve the stability, performance, and scalability of the unified file storage foundation.&lt;/strong&gt; As AI applications grow rapidly, training, inference, and data processing tasks demand higher throughput, lower latency, and greater reliability. We’ll continue optimizing the underlying architecture and critical paths to enhance service capabilities under large‑scale concurrent access.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strengthen lifecycle management for massive data.&lt;/strong&gt; Current data volumes continue to grow, but management across storage tiers, access frequencies, and retention periods can be further optimized. We’ll refine tiered storage, archiving, warm-up, and cleanup strategies based on data temperature, access patterns, and cost models, reducing unit storage cost and improving resource utilization.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhance data management and analysis capabilities.&lt;/strong&gt; On top of the unified file storage foundation, we’ll build data management capabilities for application users, helping them better understand data distribution, access behavior, and resource usage, supporting data management, cost optimization, and application decisions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We look forward to continued exchanges with industry peers to explore more technical practices. If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the &lt;a href="http://go.juicefs.com/discord" rel="noopener noreferrer"&gt;community on Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Quota Design in Distributed Architectures: Implementation and Use Cases in JuiceFS</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 08 May 2026 06:59:24 +0000</pubDate>
      <link>https://dev.to/daswu/quota-design-in-distributed-architectures-implementation-and-use-cases-in-juicefs-4519</link>
      <guid>https://dev.to/daswu/quota-design-in-distributed-architectures-implementation-and-use-cases-in-juicefs-4519</guid>
      <description>&lt;p&gt;In distributed storage environments, storage resources are typically shared across multiple users, projects, and applications. Without effective constraint mechanisms, abnormal writes or erroneous operations from a single tenant can quickly consume large amounts of space or inodes, impacting system stability and cost control. Quota management provides a way to establish predictable resource boundaries in shared environments.&lt;br&gt;&lt;br&gt;
In distributed systems, quota management is far more than just "setting a limit." The system must balance concurrent writes from multiple clients, asynchronous metadata updates, and overall throughput. At the same time, quota rules must be enforced at different levels of control. To address this, &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; provides multi-level quota capabilities covering the entire file system, directories, and users, supporting scenarios ranging from overall capacity control to individual and team-level constraints.&lt;br&gt;&lt;br&gt;
In this article, we’ll introduce the design and implementation of JuiceFS' quota mechanism, including its core data structures, synchronization model, and the validation and accounting logic in write and delete processes. We’ll also include typical use cases that highlight common issues around quota changes, space reclamation, and over-limit writes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quota types and resource dimensions supported by JuiceFS
&lt;/h2&gt;

&lt;p&gt;JuiceFS quotas support two resource dimensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Space: Used storage capacity. Usage is measured from the file system's perspective and aligned to block granularity. The pre‑write section below explains how incremental usage is estimated under 4 KiB alignment.
&lt;/li&gt;
&lt;li&gt;Inodes: Number of used inodes. For workloads with a large number of small files, inodes often become the constraint bottleneck earlier than space. Therefore, inode quotas must also be part of the management strategy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on these two resource dimensions, JuiceFS currently supports four types of quotas.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quota type&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Design goal&lt;/th&gt;
&lt;th&gt;Typical use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total file system quota&lt;/td&gt;
&lt;td&gt;Entire file system&lt;/td&gt;
&lt;td&gt;Prevents overall resource runaway&lt;/td&gt;
&lt;td&gt;Cost budget control, capacity limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subdirectory quota&lt;/td&gt;
&lt;td&gt;Directory subtree&lt;/td&gt;
&lt;td&gt;Blocks abnormal write behavior&lt;/td&gt;
&lt;td&gt;Prevents misoperations, small‑file storms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User quota&lt;/td&gt;
&lt;td&gt;Per user&lt;/td&gt;
&lt;td&gt;Isolates impact between different applications&lt;/td&gt;
&lt;td&gt;Multi‑tenant data management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User group quota&lt;/td&gt;
&lt;td&gt;Project or department&lt;/td&gt;
&lt;td&gt;Cost allocation and team limits&lt;/td&gt;
&lt;td&gt;Shared environment for AI projects&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;User quotas and user group quotas are expected to be released in &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS Community Edition&lt;/a&gt; 1.4.&lt;br&gt;&lt;br&gt;
In practice, a common and effective strategy combines the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total file system quota as a safety net.
&lt;/li&gt;
&lt;li&gt;Directory quotas to address individual abuse and small‑file storms.
&lt;/li&gt;
&lt;li&gt;User/group quotas for &lt;a href="https://en.wikipedia.org/wiki/Multitenancy" rel="noopener noreferrer"&gt;multi‑tenant&lt;/a&gt; management.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layered approach controls overall resource limits while preventing abnormal growth of a single entity from affecting other workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quota implementation mechanism
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Synchronization model and data structures
&lt;/h3&gt;

&lt;p&gt;The main challenge of implementing quotas is how to perform checking, accounting, and convergence at an acceptable cost under concurrent writes from multiple clients. JuiceFS clients run on various nodes and continuously issue resource‑changing operations such as creation, writing, truncation, and deletion. If every operation required a strongly consistent server‑side check and update, the write path would incur unacceptable overhead.&lt;br&gt;&lt;br&gt;
Therefore, the quota mechanism must satisfy two goals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Performance: Avoid a strongly consistent server‑side update on every write.
&lt;/li&gt;
&lt;li&gt;Consistency: Ensure that system usage eventually converges under concurrent writes from multiple clients and prevent over‑limit operations before they happen, as much as possible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Based on this trade‑off, JuiceFS adopts a synchronization model that works as "local accumulation, periodic flush, and periodic refresh."&lt;/strong&gt; Clients first accumulate resource deltas in local memory, with background tasks periodically persisting them to the metadata engine in batches. At the same time, each client periodically pulls the latest quota configuration and baseline usage from the server, gradually aligning its own global view. Clients do not communicate directly with each other; instead, the metadata engine serves as the central coordination point.&lt;br&gt;&lt;br&gt;
In other words, JuiceFS quotas do not pursue &lt;a href="https://en.wikipedia.org/wiki/Strong_consistency" rel="noopener noreferrer"&gt;strong consistency&lt;/a&gt; on each operation but achieve eventually consistent resource control through periodic synchronization.&lt;br&gt;&lt;br&gt;
In the current implementation, quota deltas are flushed every &lt;strong&gt;3 seconds&lt;/strong&gt; (&lt;code&gt;flushQuotas&lt;/code&gt;). Clients reload the latest quota configuration and baseline usage from the backend approximately every 12 seconds (via a refresh call triggered by the mount heartbeat). This means that under extreme conditions, the global views seen by different clients may diverge by up to about 12 seconds, but they will gradually converge in subsequent sync cycles.&lt;br&gt;&lt;br&gt;
Quota information is managed uniformly by the quota structure. It represents a single quota entity and can adapt to different types of managed objects such as directories, users, and user groups. Its core design decouples baseline usage from incremental usage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;UsedSpace&lt;/code&gt;/&lt;code&gt;UsedInodes&lt;/code&gt;: Represents the baseline usage already persisted in the backend.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;newSpace&lt;/code&gt;/&lt;code&gt;newInodes&lt;/code&gt;: Represents the locally accumulated deltas on this client that have not yet been flushed to the backend.&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;type Quota struct {
    MaxSpace, MaxInodes   int64 // Maximum space and inode limits
    UsedSpace, UsedInodes int64 // Used space and inodes
    newSpace, newInodes   int64 // Pending usage deltas to be synced
}
&lt;/code&gt;&lt;/pre&gt;
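
&lt;p&gt;To make the "local accumulation, periodic flush, and periodic refresh" model concrete, here is a minimal sketch built around the same baseline/delta split. It is not the JuiceFS implementation: &lt;code&gt;Backend&lt;/code&gt;, &lt;code&gt;ClientQuota&lt;/code&gt;, and their methods are hypothetical names, the metadata engine is replaced by an in-memory stub, and the 3-second and 12-second timers are replaced by explicit calls so the example stays deterministic.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// Backend is a hypothetical in-memory stand-in for the metadata engine.
type Backend struct {
	mu                    sync.Mutex
	usedSpace, usedInodes int64
}

// Apply persists one batch of deltas.
func (b *Backend) Apply(space, inodes int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.usedSpace += space
	b.usedInodes += inodes
}

// Load returns the authoritative baseline usage.
func (b *Backend) Load() (space, inodes int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.usedSpace, b.usedInodes
}

// ClientQuota is one client's view of a single quota entity.
type ClientQuota struct {
	mu                    sync.Mutex
	UsedSpace, UsedInodes int64 // baseline last loaded from the backend
	newSpace, newInodes   int64 // local deltas not yet flushed
}

// Update accumulates a delta in memory; it never contacts the backend.
func (q *ClientQuota) Update(space, inodes int64) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.newSpace += space
	q.newInodes += inodes
}

// Flush pushes the pending deltas to the backend in one batch and folds
// them into the local baseline.
func (q *ClientQuota) Flush(b *Backend) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.newSpace != 0 || q.newInodes != 0 {
		b.Apply(q.newSpace, q.newInodes)
		q.UsedSpace += q.newSpace
		q.UsedInodes += q.newInodes
		q.newSpace, q.newInodes = 0, 0
	}
}

// Refresh replaces the local baseline with the backend's current values;
// unflushed local deltas stay on top of it.
func (q *ClientQuota) Refresh(b *Backend) {
	space, inodes := b.Load()
	q.mu.Lock()
	defer q.mu.Unlock()
	q.UsedSpace, q.UsedInodes = space, inodes
}

// Used is the usage this client checks against: baseline plus local deltas.
func (q *ClientQuota) Used() (space, inodes int64) {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.UsedSpace + q.newSpace, q.UsedInodes + q.newInodes
}

func main() {
	backend := new(Backend)
	a, b := new(ClientQuota), new(ClientQuota)

	a.Update(8192, 1) // client A creates a file occupying two 4 KiB blocks
	s, n := b.Used()
	fmt.Println("B before sync:", s, n)

	a.Flush(backend)   // would run on the flush timer
	b.Refresh(backend) // would run on the refresh timer
	s, n = b.Used()
	fmt.Println("B after sync:", s, n)
}
```

&lt;p&gt;Until client A flushes and client B refreshes, B knows nothing about A's write. That window is the source of the temporary divergence between client views described above.&lt;/p&gt;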

&lt;p&gt;For inode accounting, hard links require special attention. Different quota types have different counting semantics for hard links. For directory quotas, counting is based on directory entries: when a hard link is created under a directory, both space and inode usage of that directory increase by 1, and they decrease accordingly when the hard link is removed. For user quotas and user group quotas, counting is deduplicated by the file object (inode). Even if a file has multiple hard links, it’s counted only once per &lt;a href="https://en.wikipedia.org/wiki/User_identifier" rel="noopener noreferrer"&gt;UID&lt;/a&gt;/&lt;a href="https://en.wikipedia.org/wiki/Group_identifier" rel="noopener noreferrer"&gt;GID&lt;/a&gt; dimension. Therefore, creating or deleting hard links does not change the usage for the associated user or user group.&lt;/p&gt;
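
&lt;p&gt;The two inode-counting rules for hard links can be illustrated with a small sketch. It assumes a flat list of directory entries and ignores both space accounting and the recursive nature of directory quotas; &lt;code&gt;Entry&lt;/code&gt;, &lt;code&gt;dirInodeUsage&lt;/code&gt;, and &lt;code&gt;userInodeUsage&lt;/code&gt; are hypothetical names, not JuiceFS APIs.&lt;/p&gt;

```go
package main

import "fmt"

// Entry is one directory entry pointing at an inode (illustrative type).
type Entry struct {
	Dir   string
	Name  string
	Inode uint64
	UID   uint32
}

// dirInodeUsage counts directory entries, so every hard link is counted
// in the directory that contains it.
func dirInodeUsage(entries []Entry) map[string]int64 {
	usage := make(map[string]int64)
	for _, e := range entries {
		usage[e.Dir]++
	}
	return usage
}

// userInodeUsage counts each inode once per UID, however many links it has.
func userInodeUsage(entries []Entry) map[uint32]int64 {
	type key struct {
		uid   uint32
		inode uint64
	}
	seen := make(map[key]bool)
	usage := make(map[uint32]int64)
	for _, e := range entries {
		k := key{e.UID, e.Inode}
		if !seen[k] {
			seen[k] = true
			usage[e.UID]++
		}
	}
	return usage
}

func main() {
	entries := []Entry{
		{Dir: "/data", Name: "model.bin", Inode: 100, UID: 1000},
		{Dir: "/backup", Name: "model-link.bin", Inode: 100, UID: 1000}, // hard link
	}
	fmt.Println("per directory:", dirInodeUsage(entries))
	fmt.Println("per user:     ", userInodeUsage(entries))
}
```

&lt;p&gt;Both directories are charged one inode for their entry, while the owning user is charged only once for the shared file object.&lt;/p&gt;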

&lt;h3&gt;
  
  
  Quota storage
&lt;/h3&gt;

&lt;p&gt;Regarding the quota storage mechanism, the total file system quota (the global "red line") has its capacity and inode limits directly persisted in the metadata engine. Clients load this configuration during mount and enforce hard limits, ensuring the underlying resources are not exceeded.&lt;br&gt;&lt;br&gt;
In contrast, checks and delta accumulation for directory, user, and user group quotas rely more on the client side. Clients maintain in‑memory indexing structures keyed by inode, UID, and GID, and periodically synchronize the corresponding quota information from the backend. This keeps lookup overhead low in high‑frequency I/O scenarios. It’s important to emphasise that the client in‑memory state is only a runtime cache and incremental view; the authoritative source for quota configuration and baseline usage remains the metadata backend.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quota checks
&lt;/h3&gt;

&lt;p&gt;A synchronization model and data structures alone are not sufficient; quota logic must also be embedded into the specific resource‑changing paths. A single write operation may not be a simple data append; it can simultaneously involve inode creation, block allocation, directory entry changes, and parent‑directory statistics updates. Under multi‑client concurrency, these changes collectively affect the same set of quota constraints. Therefore, only by placing checks and statistics updates directly into the operation paths (write, create, truncate, and delete) can we avoid over‑limit writes and statistical inaccuracies.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pre‑write: incremental estimation and multi‑dimensional quota check
&lt;/h4&gt;

&lt;p&gt;When a user initiates an operation that may change resource usage (such as write, create, and truncate), the client first estimates the expected resource delta, including both space and inode changes.&lt;br&gt;&lt;br&gt;
The space delta is estimated based on the actual allocation granularity of the underlying data blocks (for example, 4 KiB alignment), so the calculation must be block‑aligned. Inode deltas primarily occur in creation operations (such as creating a new file or directory).&lt;br&gt;&lt;br&gt;
After obtaining the resource delta for the operation, the client performs a quota check before actually writing. The check covers multiple dimensions: user and user group quotas, total file system quota, and directory quotas for the target directory tree. If any dimension would exceed its limit after this operation, the request is rejected with an error such as quota exceeded or out of space.&lt;br&gt;&lt;br&gt;
By placing the check in the write path before the resource change, the system can block risky operations before they happen, avoiding complex cleanup or rollback afterwards.&lt;/p&gt;
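
&lt;p&gt;The estimate-then-check flow can be sketched as follows. This is an illustration under stated assumptions rather than the client's actual code: &lt;code&gt;Limit&lt;/code&gt;, &lt;code&gt;over&lt;/code&gt;, &lt;code&gt;checkQuota&lt;/code&gt;, and &lt;code&gt;ErrQuotaExceeded&lt;/code&gt; are hypothetical names, a limit of 0 is taken to mean "unlimited", and the rounding helper does not model how empty files are charged.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

const blockSize = 4096 // 4 KiB allocation granularity, as in the text

// ErrQuotaExceeded is an illustrative sentinel error.
var ErrQuotaExceeded = errors.New("quota exceeded")

// align4K rounds a byte length up to the next 4 KiB boundary.
func align4K(length int64) int64 {
	return (length + blockSize - 1) / blockSize * blockSize
}

// Limit is one quota dimension: file system, directory, user, or group.
type Limit struct {
	Name                  string
	MaxSpace, MaxInodes   int64 // 0 means "no limit" in this sketch
	UsedSpace, UsedInodes int64 // baseline plus local deltas
}

// over reports whether adding delta to used would cross limit.
// Only growth is checked, so operations that free resources always pass.
func over(limit, used, delta int64) bool {
	switch {
	case limit == 0:
		return false
	case delta > 0:
		return used+delta > limit
	default:
		return false
	}
}

// checkQuota rejects the operation if any dimension would exceed its limit.
func checkQuota(limits []Limit, space, inodes int64) error {
	for _, l := range limits {
		if over(l.MaxSpace, l.UsedSpace, space) || over(l.MaxInodes, l.UsedInodes, inodes) {
			return fmt.Errorf("%s: %w", l.Name, ErrQuotaExceeded)
		}
	}
	return nil
}

func main() {
	const mib = 1024 * 1024
	limits := []Limit{
		{Name: "filesystem", MaxSpace: 1024 * mib, UsedSpace: 100 * mib},
		{Name: "dir /proj", MaxSpace: 10 * mib, UsedSpace: 10*mib - 4096},
		{Name: "uid 1000", MaxInodes: 2000, UsedInodes: 1500},
	}

	// Creating a new 5,000-byte file needs two 4 KiB blocks and one inode.
	space, inodes := align4K(5000), int64(1)
	fmt.Println("estimated delta:", space, "bytes,", inodes, "inode")
	fmt.Println("check result:", checkQuota(limits, space, inodes))
}
```

&lt;p&gt;Here the file system and user dimensions have room, but the directory has only one free block left, so the whole operation is rejected before any data is written.&lt;/p&gt;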

&lt;h4&gt;
  
  
  Post‑write: local delta accumulation and background batched sync
&lt;/h4&gt;

&lt;p&gt;After a successful write, the resource delta generated by the operation is incorporated into the corresponding usage statistics and gradually aligns with the global state according to the defined convergence mechanism. Specifically, three categories of statistics are affected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Global level: The overall file system usage increases (or decreases).
&lt;/li&gt;
&lt;li&gt;Directory level: The usage of the relevant directory subtree changes accordingly.
&lt;/li&gt;
&lt;li&gt;User / user group level: The usage of the corresponding subject also accumulates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These updates are first reflected in the client’s local accumulated deltas and are not immediately written back to the backend in a strongly consistent way. Later, background tasks flush them in batches, and periodic refresh operations gradually align them with other clients, achieving global convergence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage statistics (&lt;code&gt;stats&lt;/code&gt;): foundation for the quota system
&lt;/h2&gt;

&lt;p&gt;For quotas to work effectively, the system must be able to track current resource usage with low overhead. Whether for large directory trees or many users and user groups, if every check requires a real‑time full scan, the performance cost will be unacceptable. Therefore, an efficient and reliable usage statistics mechanism is a prerequisite for implementing quotas.&lt;/p&gt;

&lt;h3&gt;
  
  
  Directory statistics
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://juicefs.com/en/blog/engineering/design-juicefs-directory-quotas" rel="noopener noreferrer"&gt;Directory quotas&lt;/a&gt; constrain the total space and inode usage of an entire directory subtree, not the size of individual files. Consequently, they rely on directory‑level usage statistics.&lt;br&gt;&lt;br&gt;
It’s important to note that directory statistics (&lt;code&gt;DirStats&lt;/code&gt;) and quota statistics have different scopes: &lt;code&gt;DirStats&lt;/code&gt; only sums up the usage of immediate children (files and subdirectories) under a given directory – a single‑level statistic. In contrast, directory quotas recursively sum up the entire subtree. This design allows &lt;code&gt;DirStats&lt;/code&gt; to be maintained with lower overhead, while directory quotas provide a full subtree view.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The key to implementing such statistics is maintaining low overhead and high availability for large directory trees.&lt;/strong&gt; JuiceFS follows the same approach as the quota mechanism: high‑frequency local updates and batched background persistence. Clients maintain directory usage deltas in memory; when operations such as writes or deletions occur, the changes are first recorded locally and then periodically synced in batches to the metadata engine by background tasks.&lt;br&gt;&lt;br&gt;
In addition, the system does not load all directory statistics at mount time. For large directory trees, a full load would cause significant latency and memory overhead. Therefore, directory statistics adopt an on‑demand fetch strategy: only when precise usage is required (such as quota checks, usage summarisation, and administrative queries) does the system load the statistics of the corresponding directory from the backend.&lt;br&gt;&lt;br&gt;
When users query usage information via &lt;code&gt;df&lt;/code&gt; or an application calls &lt;code&gt;statfs&lt;/code&gt;, JuiceFS makes a trade‑off between performance and accuracy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It first uses locally cached used space and inodes for fast calculation.
&lt;/li&gt;
&lt;li&gt;If the local baseline is incomplete (for example, just after startup) or higher real‑time accuracy is needed, it fetches the latest global counters from the backend for calibration.
&lt;/li&gt;
&lt;li&gt;Finally, it adds locally accumulated (not yet synced) deltas to make the result more accurate for the current node’s write state.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After obtaining the used amounts, the client calculates &lt;code&gt;total&lt;/code&gt; and &lt;code&gt;avail&lt;/code&gt; based on whether a total capacity limit is configured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a limit is configured, total capacity equals that limit, and available capacity is "limit minus used."
&lt;/li&gt;
&lt;li&gt;If no limit is configured, it returns a dynamically estimated total capacity so that tools like &lt;code&gt;df&lt;/code&gt; can display normally.&lt;/li&gt;
&lt;/ul&gt;
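
&lt;p&gt;The calculation of total and available capacity can be sketched as below. The function name &lt;code&gt;statfs&lt;/code&gt; and its parameters are illustrative, and the rule used when no limit is configured (start at 1 PiB and double while usage is above 80%) is an assumption chosen for the example, not a documented guarantee.&lt;/p&gt;

```go
package main

import "fmt"

// defaultTotal is the capacity reported when no limit is set.
// The 1 PiB starting point is an assumption made for this sketch.
const defaultTotal int64 = 1024 * 1024 * 1024 * 1024 * 1024

// statfs derives total and available bytes from the baseline usage, the
// local unsynced delta, and an optional capacity limit (0 = no limit).
func statfs(limit, baseline, localDelta int64) (total, avail int64) {
	used := baseline + localDelta
	if limit > 0 {
		total = limit
	} else {
		// Assumed estimation rule: grow the reported total while the
		// file system would look more than 80% full.
		total = defaultTotal
		for used*10 > total*8 {
			total *= 2
		}
	}
	avail = total - used
	if used > total {
		avail = 0 // never report negative free space
	}
	return total, avail
}

func main() {
	const gib = 1024 * 1024 * 1024
	total, avail := statfs(100*gib, 30*gib, 10*gib)
	fmt.Println("with limit:   ", total/gib, "GiB total,", avail/gib, "GiB available")
	total, avail = statfs(0, 30*gib, 10*gib)
	fmt.Println("without limit:", total/gib, "GiB total,", avail/gib, "GiB available")
}
```

&lt;p&gt;Adding the local delta before computing the available space is what keeps the result accurate for writes issued by the current node but not yet flushed.&lt;/p&gt;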

&lt;p&gt;Moreover, when querying quotas from the root directory, the system displays the maximum space and inode limits, allowing administrators to see the global resource boundaries.&lt;br&gt;&lt;br&gt;
In addition, JuiceFS will support real‑time updates of directory statistics for the trash in version 1.4. When files are deleted (moved to the trash), restored from the trash, or permanently cleaned up, the system updates the trash directory’s statistics immediately. This enables administrators to accurately track space usage of the trash.&lt;/p&gt;

&lt;h3&gt;
  
  
  User and user group statistics
&lt;/h3&gt;

&lt;p&gt;User and user group statistics are collected only after the corresponding quota feature is enabled. Before enabling, the &lt;code&gt;updateUserGroupStat&lt;/code&gt; call in the core path returns directly without generating any statistics. After enabling, clients maintain usage data in an in‑memory map keyed by &lt;code&gt;uid&lt;/code&gt; and &lt;code&gt;gid&lt;/code&gt; and update the relevant statistics on all paths that may cause usage changes.&lt;br&gt;&lt;br&gt;
Note that when a quota is set for a user or user group for the first time via &lt;code&gt;juicefs quota set --uid&lt;/code&gt; or &lt;code&gt;juicefs quota set --gid&lt;/code&gt;, the system immediately performs a full scan of existing files to initialize the baseline usage. After this initialization, subsequent writes and deletions become incremental updates, and no further full scan is required.&lt;/p&gt;
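
&lt;p&gt;A minimal sketch of this "no-op until enabled, scan once, then incremental" behavior is shown below. &lt;code&gt;UserStats&lt;/code&gt;, &lt;code&gt;FileInfo&lt;/code&gt;, and their methods are hypothetical; the sketch tracks only UIDs and initializes all users in one scan, which simplifies what the text describes.&lt;/p&gt;

```go
package main

import "fmt"

// FileInfo is a minimal record of an existing file (illustrative type).
type FileInfo struct {
	UID   uint32
	Space int64 // already block-aligned
}

// Usage holds the two tracked resource dimensions.
type Usage struct{ Space, Inodes int64 }

// UserStats tracks per-UID usage only once the feature is enabled.
type UserStats struct {
	enabled bool
	byUID   map[uint32]Usage
}

// Update is called on every usage-changing path; it returns immediately
// while the feature is disabled.
func (s *UserStats) Update(uid uint32, space, inodes int64) {
	if !s.enabled {
		return
	}
	u := s.byUID[uid]
	u.Space += space
	u.Inodes += inodes
	s.byUID[uid] = u
}

// Enable performs the one-time full scan that initializes the baseline.
func (s *UserStats) Enable(existing []FileInfo) {
	s.byUID = make(map[uint32]Usage)
	for _, f := range existing {
		u := s.byUID[f.UID]
		u.Space += f.Space
		u.Inodes++
		s.byUID[f.UID] = u
	}
	s.enabled = true
}

// Get returns the tracked usage for a UID (zero value if unknown).
func (s *UserStats) Get(uid uint32) Usage {
	return s.byUID[uid]
}

func main() {
	stats := new(UserStats)
	stats.Update(1000, 4096, 1) // ignored: quotas are not enabled yet
	fmt.Println("before enabling:", stats.Get(1000))

	stats.Enable([]FileInfo{{UID: 1000, Space: 8192}, {UID: 1000, Space: 4096}})
	stats.Update(1000, 4096, 1) // incremental from now on
	fmt.Println("after enabling: ", stats.Get(1000))
}
```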

&lt;h2&gt;
  
  
  Common scenarios
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. A file has been deleted, why hasn’t the total file system quota decreased? Why hasn’t the object storage billing changed?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This is usually not a statistics error, but a result of file system semantics combined with the statistical model.&lt;br&gt;&lt;br&gt;
For example, after enabling the trash in JuiceFS, a deletion operation does not immediately free space. The file is first moved to the trash for possible recovery. Therefore, files in the trash are still counted in the total file system quota and user / user group quotas, but are no longer counted in the original directory quota.&lt;br&gt;&lt;br&gt;
Another common reason is the time lag between file system statistics and &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; billing. JuiceFS quota statistics use a local accumulation + periodic background sync model, so it’s possible that different clients or different statistical interfaces have not yet converged in a short time. At the same time, object storage may not have completed garbage collection or lifecycle cleanup. Therefore, temporarily seeing inconsistency between file system usage, quota statistics, and object storage billing is generally expected. This is not considered a system anomaly as long as they gradually converge over time.&lt;br&gt;&lt;br&gt;
In addition, note that quota and &lt;code&gt;statfs&lt;/code&gt; show the file system perspective of space usage and availability, while object storage billing is based on the underlying object storage model – affected by factors such as chunking, merging, delayed reclamation, and lifecycle rules. The two are not required to be the same.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;2. Quota is full, but appending to an existing file did not report an error immediately.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This is often related to the asynchronous commit path in some JuiceFS writes. From the application’s perspective, the write system call may return success early, while the actual data commit and corresponding quota check happen later. Thus, appending may appear to "succeed," but the data may not be fully persisted; if the later commit stage determines that the quota would be exceeded, the write may still fail.&lt;br&gt;&lt;br&gt;
In other words, a successful return from write does not guarantee that the data has been finally committed. In scenarios involving quota limits, a safer approach is to check the return status of close, verify the final file size, and handle any errors accordingly.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;3. Quota is not yet full, but file creation fails.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This is usually caused by temporary view divergence under the eventually consistent statistical model.&lt;br&gt;&lt;br&gt;
Example: a volume has a total quota of 2,000 inodes, and there are currently 1,999 files. One more file should be creatable. However, in extreme concurrency or unusual refresh timing, the client’s local cache may diverge briefly from the backend baseline count. This may cause the in‑memory used inode count to be temporarily too high, thus rejecting a legitimate creation request.&lt;br&gt;&lt;br&gt;
This type of problem is inherent to the local accumulation + periodic sync convergence model. The model avoids the high overhead of strongly consistent backend updates on every operation, but in extreme cases the system may produce short‑term false positives. Typically, such false positives disappear with the next sync cycle, and retries can mitigate the issue.&lt;br&gt;&lt;br&gt;
This also illustrates that, in a distributed environment, quotas are best understood as an efficient, near‑real‑time constraint mechanism, not a fully synchronous, strongly consistent judgement for every concurrent operation.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;4. After a write exceeds the quota, why does the "failed" file remain in the directory?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This behavior is not unique to JuiceFS; it is common in file systems that follow &lt;a href="https://ja.wikipedia.org/wiki/POSIX" rel="noopener noreferrer"&gt;POSIX&lt;/a&gt; semantics.&lt;br&gt;&lt;br&gt;
For example, a user sets a 1 GiB quota on a directory and then tries to write a 2 GiB file using &lt;code&gt;dd&lt;/code&gt;. The file system accepts the first 1 GiB of valid writes; only when a subsequent write would exceed the quota does it return “Disk quota exceeded.” Consequently, a "partial file" of about 1 GiB is left behind. This does not indicate abnormal behavior. It simply means the first part of the data was written successfully, while the remainder failed due to the quota.&lt;br&gt;&lt;br&gt;
The file system's responsibility is to report the error, not to decide whether to delete the successfully written data. Whether to clean up such an incomplete file is left to the application. This follows standard POSIX semantics: the file system returns the error, and the application handles subsequent cleanup and recovery.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In a distributed file system, quotas are not a simple "counter" feature, but a system design that must balance performance, consistency, and management granularity. Through pre‑write checks, local accumulation, and periodic background synchronization, JuiceFS minimizes overhead on the write path while allowing various usage statistics to gradually converge under an eventual consistency model. Based on this mechanism, quota control covers not only total file system capacity, but also multiple levels such as directories, users, and user groups, thereby meeting the needs of typical scenarios including multi‑tenant isolation, individual constraints, and team‑level resource management.&lt;br&gt;&lt;br&gt;
If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the &lt;a href="http://go.juicefs.com/discord" rel="noopener noreferrer"&gt;community on Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
    </item>
    <item>
      <title>JuiceFS Performance Optimization for AI Scenarios</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Wed, 15 Apr 2026 09:37:48 +0000</pubDate>
      <link>https://dev.to/daswu/juicefs-performance-optimization-for-ai-scenarios-4big</link>
      <guid>https://dev.to/daswu/juicefs-performance-optimization-for-ai-scenarios-4big</guid>
      <description>&lt;p&gt;The scale of computing power for &lt;a href="https://en.wikipedia.org/wiki/Large_language_model" rel="noopener noreferrer"&gt;large language model&lt;/a&gt; (LLM) training continues to expand. While GPU performance keeps improving, data access bottlenecks are becoming increasingly prominent in overall system performance. Local storage offers excellent performance but has limited scalability. Object storage excels in cost and scalability but suffers from insufficient throughput in massive small‑file and high‑concurrency scenarios. Teams often struggle to choose between them.  &lt;/p&gt;

&lt;p&gt;Therefore, &lt;a href="https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems" rel="noopener noreferrer"&gt;distributed file systems&lt;/a&gt; have become a key solution to balance high performance and scalability. &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; has been widely deployed in AI scenarios across multiple industries. Its distributed architecture delivers high performance, strong scalability, and low cost simultaneously for large‑scale data access.  &lt;/p&gt;

&lt;p&gt;In this article, we’ll introduce JuiceFS’ architecture from a performance perspective and analyze core performance bottlenecks and optimization methods under different access patterns. We’ll also link to reference material on key points, helping you understand JuiceFS’ performance mechanisms and master common tuning strategies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance foundations from the JuiceFS architecture
&lt;/h3&gt;

&lt;p&gt;JuiceFS comes in &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;Community Edition&lt;/a&gt; and &lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;Enterprise Edition&lt;/a&gt;. Both share the same architecture: metadata and data are separated. The client adopts a rich‑client design, handling core logic including some metadata operations, and provides both metadata and data caching. These modules work together for efficient data location and access. The underlying data is stored in object storage, with local caches further improving access performance. For external interfaces, JuiceFS supports multiple access methods – FUSE is the most common, and it also provides various SDKs and an S3 gateway.  &lt;/p&gt;

&lt;p&gt;JuiceFS Community Edition is designed as a general‑purpose file system. Users can choose different metadata engines based on their needs. For small‑scale deployments, Redis delivers lightweight, low‑latency metadata management. For large‑scale file scenarios, &lt;a href="https://tikv.org/" rel="noopener noreferrer"&gt;TiKV&lt;/a&gt; provides good horizontal scalability.  &lt;/p&gt;

&lt;p&gt;JuiceFS Enterprise Edition targets complex, high‑performance scenarios. It differs from Community Edition in two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It uses a self‑developed multi‑zone metadata engine built on Raft that runs as an in‑memory cluster, offering low latency and strong horizontal scalability. It supports up to 500 billion files. Operations that require multiple key-value requests in the Community Edition often need only one or two in the Enterprise Edition, and complex logic can be processed inside the metadata cluster.
&lt;/li&gt;
&lt;li&gt;The Enterprise Edition supports distributed cache sharing: clients in the same group can access each other’s local caches via consistent hashing. This improves cache hit rates and access efficiency. In multi‑node, high‑concurrency scenarios, the cache space scales horizontally, and most required data can be warmed up before job execution. This accelerates AI training and inference while boosting performance and stability. See &lt;a href="https://juicefs.com/en/blog/release-notes/juicefs-enterprise-5-3-rdma-support" rel="noopener noreferrer"&gt;JuiceFS Enterprise 5.3: 500B+ Files per File System &amp;amp; RDMA Support&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskcn0ta20btddjuqoha8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskcn0ta20btddjuqoha8.png" alt=" " width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Data chunking
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://juicefs.com/docs/community/internals/io_processing" rel="noopener noreferrer"&gt;JuiceFS splits data into chunks&lt;/a&gt; and stores them in object storage. This design is key to its performance, affecting data read efficiency, cache hit rate, and throughput under high concurrency.  &lt;/p&gt;

&lt;p&gt;JuiceFS breaks a file into multiple chunks. Inside each chunk, the system maintains a management structure called a slice to track writes and updates. When data is written, new data does not overwrite existing slices; instead, a new slice is appended on top of the chunk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptuj6vju98c61nflrsi5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptuj6vju98c61nflrsi5.png" alt=" " width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ideally, each chunk ends up containing only one slice. Each slice consists of several 4 MB blocks, which are the smallest unit stored in object storage. By default, the caching system also manages data at the block level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F951ywbamssmfa9r0xsl5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F951ywbamssmfa9r0xsl5.png" alt=" " width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As shown in the diagram on the upper right, file updates use an append‑only write pattern: existing slices are shown in red, and new data is appended as a new slice. During reads, the system combines the slices to form the current view. When fragmentation becomes excessive, a compaction process merges slices to optimize access performance. For more details on data chunking, refer to &lt;a href="https://juicefs.com/en/blog/engineering/design-metadata-data-storage" rel="noopener noreferrer"&gt;Code-Level Analysis: Design Principles of JuiceFS Metadata and Data Storage&lt;/a&gt;.&lt;/p&gt;
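&lt;p&gt;The chunk and block layout described above can be illustrated with a minimal sketch. It assumes the idealized case where each chunk holds a single slice starting at the chunk boundary, and it assumes a 64 MiB chunk size, which this article does not state. The function name &lt;code&gt;locate&lt;/code&gt; is hypothetical and is not part of any JuiceFS API.&lt;/p&gt;

```python
# Illustrative sketch only, not JuiceFS code.
CHUNK_SIZE = 64 * 1024 * 1024  # assumption: 64 MiB chunks (not stated in the article)
BLOCK_SIZE = 4 * 1024 * 1024   # 4 MB blocks, as described in the article


def locate(offset):
    """Map a byte offset in a file to (chunk index, block index in chunk, offset in block).

    Assumes one slice per chunk that starts at the chunk boundary.
    """
    chunk_index, offset_in_chunk = divmod(offset, CHUNK_SIZE)
    block_index, offset_in_block = divmod(offset_in_chunk, BLOCK_SIZE)
    return chunk_index, block_index, offset_in_block


# Byte 200 MiB falls in the fourth chunk (index 3), third block (index 2).
print(locate(200 * 1024 * 1024))  # (3, 2, 0)
```

&lt;p&gt;Under these assumptions, a small read touches exactly one block, which is why block-level caching and prefetching map naturally onto this layout.&lt;/p&gt;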

&lt;h3&gt;
  
  
  Caching
&lt;/h3&gt;

&lt;p&gt;JuiceFS’ performance advantage over direct object storage access comes largely from its caching mechanism. The JuiceFS client comes with a high‑performance local cache module. Key configuration options include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cache-dir&lt;/code&gt;: specifies the cache directory.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cache-size&lt;/code&gt;: sets the maximum cache space.
&lt;/li&gt;
&lt;li&gt;Prefetch: a parameter in the cache module that controls prefetching. When a request hits a block, a background thread fetches the entire block.
&lt;/li&gt;
&lt;li&gt;Write‑back related settings: improve write IOPS by writing data blocks to the local cache first, then uploading them to object storage asynchronously.&lt;/li&gt;
&lt;/ul&gt;
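&lt;p&gt;As a rough illustration of how these options fit together, the sketch below assembles a Community Edition &lt;code&gt;juicefs mount&lt;/code&gt; command line. The helper name, metadata URL, mount point, and values are all hypothetical, and option units and defaults can differ between versions, so check &lt;code&gt;juicefs mount --help&lt;/code&gt; before relying on it.&lt;/p&gt;

```python
def build_mount_command(meta_url, mount_point, cache_dir, cache_size_mib,
                        prefetch=1, writeback=False):
    """Assemble a `juicefs mount` argument list with cache-related options.

    Hypothetical helper for illustration; it only builds the list and runs nothing.
    """
    cmd = [
        "juicefs", "mount",
        "--cache-dir", cache_dir,             # where cached blocks are stored
        "--cache-size", str(cache_size_mib),  # assumption: value given in MiB
        "--prefetch", str(prefetch),          # number of blocks prefetched in parallel
    ]
    if writeback:
        cmd.append("--writeback")             # stage writes locally, upload in background
    cmd += [meta_url, mount_point]
    return cmd


# Placeholder values, not a recommendation.
print(" ".join(build_mount_command(
    "redis://127.0.0.1:6379/1", "/mnt/jfs", "/var/jfsCache", 102400, writeback=True)))
```

&lt;p&gt;Write‑back trades durability for write latency: data that has not yet been uploaded exists only in the local cache, so it is usually enabled only when that risk is acceptable.&lt;/p&gt;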

&lt;p&gt;JuiceFS Enterprise Edition also provides advanced configurations. For example, a &lt;a href="https://juicefs.com/docs/cloud/guide/cache/" rel="noopener noreferrer"&gt;cache group&lt;/a&gt; can be used to designate a set of clients whose local caches form a distributed cache group, enabling cache sharing. In addition, the no sharing option (related to cache groups) allows a client to read data only from a specified cache group without serving its own cache to others. This creates a two‑level cache:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first level is the local cache.
&lt;/li&gt;
&lt;li&gt;The second level is the cache on other nodes in the group.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another performance‑boosting mechanism is the memory buffer (read buffer), which provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I/O request merging: multiple consecutive I/O requests can be merged in memory. For example, three I/O requests issued by the system may be reduced to just one after being processed by the memory buffer.
&lt;/li&gt;
&lt;li&gt;Adaptive read‑ahead: in large‑file sequential read scenarios, adaptive read‑ahead increases request concurrency by prefetching data. This fully utilizes cache and object storage resources and improves overall I/O performance.&lt;/li&gt;
&lt;/ul&gt;
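&lt;p&gt;To make the idea of I/O request merging concrete, here is a minimal sketch that coalesces adjacent or overlapping read ranges. It illustrates the general technique only and is not JuiceFS’ actual buffer implementation; the function name and the request representation are assumptions.&lt;/p&gt;

```python
def merge_requests(requests):
    """Merge (offset, length) read requests that touch or overlap.

    Returns a sorted list of (offset, length) tuples covering the same bytes.
    """
    merged = []  # list of [start, end] ranges, end exclusive
    for offset, length in sorted(requests):
        end = offset + length
        # True when this request starts at or before the end of the previous range.
        if merged and max(offset, merged[-1][1]) == merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]


KIB = 1024
# Three consecutive 128 KiB reads collapse into a single 384 KiB request.
print(merge_requests([(0, 128 * KIB), (128 * KIB, 128 * KIB), (256 * KIB, 128 * KIB)]))
```

&lt;p&gt;Fewer, larger requests reduce per‑request overhead, which matters most when each request carries a fixed latency cost, as with object storage.&lt;/p&gt;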

&lt;p&gt;The Enterprise Edition also offers advanced read‑ahead settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;max read ahead&lt;/code&gt;: sets the maximum read‑ahead range.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;initial read ahead&lt;/code&gt;: sets the initial read‑ahead window size (default unit is 4 MB blocks).
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;read ahead ratio&lt;/code&gt;: a configuration added last year that controls the read‑ahead ratio for large‑file random reads, reducing bandwidth waste caused by read amplification. Overly aggressive read‑ahead can negatively impact random read performance; read ahead ratio helps mitigate this. In AI scenarios, when large‑file sequential or random reads cause bandwidth or IOPS bottlenecks, adjusting these parameters can optimize overall performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  JuiceFS benchmark I/O tests and bottleneck analysis
&lt;/h2&gt;

&lt;p&gt;Before diving into performance tuning for common &lt;a href="https://en.wikipedia.org/wiki/Artificial_intelligence" rel="noopener noreferrer"&gt;AI&lt;/a&gt; scenarios, let’s first examine JuiceFS’ I/O behavior under ideal conditions through sequential and random read benchmarks. This helps us understand throughput and latency under different access patterns, providing a reference for the read/write patterns of subsequent AI/ML workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sequential read performance
&lt;/h3&gt;

&lt;p&gt;In JuiceFS, sequential read performance is typically bandwidth‑bound. In cold read scenarios, performance is mainly limited by object storage bandwidth; in distributed cache scenarios, network bandwidth can become the bottleneck. For example, a node with a 40 Gbps NIC can deliver at most about 5 GB/s, and usable bandwidth is typically lower. In addition, the user‑kernel transition overhead in the FUSE layer limits single‑thread throughput. Tests showed single‑thread sequential read bandwidth of around 3.5 GB/s. To break this limit, multi‑threaded or higher‑concurrency strategies are needed to fully utilize storage and network resources.  &lt;/p&gt;

&lt;p&gt;The table below shows test results of JuiceFS sequential read performance:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threads&lt;/th&gt;
&lt;th&gt;Bandwidth (GB/s)&lt;/th&gt;
&lt;th&gt;Bandwidth per thread (GB/s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3.5&lt;/td&gt;
&lt;td&gt;3.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;6.3&lt;/td&gt;
&lt;td&gt;3.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;9.5&lt;/td&gt;
&lt;td&gt;3.16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;9.7&lt;/td&gt;
&lt;td&gt;2.43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;14.0&lt;/td&gt;
&lt;td&gt;2.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;17.0&lt;/td&gt;
&lt;td&gt;2.13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;18.6&lt;/td&gt;
&lt;td&gt;1.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;1.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In the performance test, single‑thread sequential read bandwidth was about 3.5 GB/s. As the number of threads increased, total throughput gradually approached the network bandwidth limit. To help users evaluate the performance ceiling of their own environment, JuiceFS provides the &lt;code&gt;objbench&lt;/code&gt; subcommand for testing object storage bandwidth.  &lt;/p&gt;
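&lt;p&gt;One way to read the table is to compare per‑thread bandwidth against the single‑thread result. The sketch below does that arithmetic using the figures from the table; the variable and function names are illustrative.&lt;/p&gt;

```python
# Total sequential read bandwidth in GB/s by thread count, copied from the table.
SEQ_READ_GB_PER_S = {1: 3.5, 2: 6.3, 3: 9.5, 4: 9.7, 6: 14.0, 8: 17.0, 10: 18.6, 15: 21.0}


def scaling_efficiency(results):
    """Return per-thread bandwidth as a fraction of the single-thread bandwidth."""
    baseline = results[1]
    return {threads: (total / threads) / baseline for threads, total in results.items()}


for threads, efficiency in scaling_efficiency(SEQ_READ_GB_PER_S).items():
    print(f"{threads:2d} threads: {efficiency:.0%} of single-thread bandwidth per thread")
```

&lt;p&gt;Efficiency that falls as threads are added suggests that a shared resource, such as network or object storage bandwidth, is becoming the limit rather than the per‑thread FUSE path.&lt;/p&gt;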

&lt;p&gt;In real workloads, caching is more common than direct object storage access. In such cases, increasing the buffer size raises the number of background prefetch requests, thereby improving concurrency and overall throughput. For example, after increasing the buffer size to 400 MB (corresponding to 100 background prefetch requests of 4 MB each), concurrency improved significantly and overall throughput increased.&lt;/p&gt;

&lt;h3&gt;
  
  
  Random read performance
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Low‑concurrency random reads
&lt;/h4&gt;

&lt;p&gt;In low‑concurrency, non‑asynchronous access scenarios, each request must wait for the previous one to complete before being issued. As a result, latency has a significant impact on overall performance. I/O latency can come from many sources, including metadata query latency, object storage access latency, and local or distributed cache read latency. When analyzing random read performance, we must closely examine these latency factors.  &lt;/p&gt;

&lt;p&gt;In a 4 KB cold random read scenario, if the IOPS is only 8 and object storage latency is about 125 ms, the effective concurrency is roughly 1 (8 requests/s × 0.125 s per request ≈ 1 request in flight).  &lt;/p&gt;

&lt;p&gt;This indicates a near‑single‑concurrent, serial‑blocked state. In such cases, the optimization focus should be on shortening the access path and reducing per‑request latency rather than increasing concurrency – for example, by warming up data into the local cache. After data warm-up, the random read path switches from object storage to local cache, and IOPS can increase to about 12,000, approaching the I/O level of a local disk.&lt;/p&gt;
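&lt;p&gt;The concurrency estimate above is an application of Little’s law (requests in flight = throughput × latency). The sketch below captures that arithmetic; the function names are illustrative, and the implied cache latency is only a back‑of‑the‑envelope figure that assumes concurrency stays at 1 after warm‑up.&lt;/p&gt;

```python
def estimated_concurrency(iops, latency_ms):
    """Little's law: average requests in flight = throughput x latency."""
    return iops * latency_ms / 1000.0


def implied_latency_ms(iops, concurrency=1):
    """Per-request latency implied by an observed IOPS at a given concurrency."""
    return concurrency * 1000.0 / iops


# Cold read: 8 IOPS at about 125 ms per request means about one request in flight.
print(estimated_concurrency(8, 125))         # 1.0
# Warm cache: 12,000 IOPS at concurrency 1 implies roughly 0.083 ms per request.
print(round(implied_latency_ms(12000), 3))   # 0.083
```

&lt;p&gt;If the estimate comes out near 1, adding storage bandwidth will not help; only lower latency or more parallel requests will raise IOPS.&lt;/p&gt;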

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3x0a4xxc1y0hefv01xkx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3x0a4xxc1y0hefv01xkx.png" alt="Using the juicefs stats command to view performance" width="800" height="242"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvj2mcvdbdf4u6yp7ouaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvj2mcvdbdf4u6yp7ouaq.png" alt="Performance after data warm-up" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  High‑concurrency random reads
&lt;/h4&gt;

&lt;p&gt;High‑concurrency random reads typically occur in scenarios with high thread counts or asynchronous I/O. The main performance bottleneck is often IOPS limits – including metadata IOPS, object storage IOPS, and cache IOPS. JuiceFS allows you to observe these metrics and pinpoint the bottleneck. Client machine resources (CPU, memory) can also affect performance, but such bottlenecks are easy to monitor.  &lt;/p&gt;

&lt;p&gt;In a cold read scenario using &lt;a href="https://github.com/anlongfei/libaio" rel="noopener noreferrer"&gt;Libaio&lt;/a&gt; for random reads, the IOPS ceiling on the object storage side is around 7,000. When caching is enabled and data is warmed up, the access path shifts from object storage to the cache layer, and IOPS can further increase to over 20,000. This shows that the bottleneck for high‑concurrency random reads shifts as the access path changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwh9cvzud9mbbyqtudloi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwh9cvzud9mbbyqtudloi.png" alt=" " width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrrmq1is9u4ubskc7tkq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrrmq1is9u4ubskc7tkq.png" alt=" " width="800" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a deeper dive into JuiceFS’ complete data access path, refer to &lt;a href="https://juicefs.com/en/blog/engineering/optimize-read-performance" rel="noopener noreferrer"&gt;Optimizing JuiceFS Read Performance: Readahead, Prefetch, and Cache&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  I/O characteristics and performance tuning for common AI scenarios
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Large‑file sequential reads
&lt;/h3&gt;

&lt;p&gt;A typical large‑file sequential read scenario is model loading, such as loading PyTorch .pt files saved via pickle serialization. In this process, performance is limited by two factors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.python.org/3/library/pickle.html" rel="noopener noreferrer"&gt;Pickle&lt;/a&gt; deserialization efficiency determines data processing speed.
&lt;/li&gt;
&lt;li&gt;Data reading is usually single‑threaded and limited by FUSE bandwidth and CPU performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To increase throughput, you can raise concurrency through multi‑threaded or sharded loading, fully utilizing I/O capacity. For large‑file sequential reads, the best performance is achieved when the entire dataset can be cached locally. If only on‑demand reading is required, the implementation is simple.&lt;br&gt;&lt;br&gt;
For more details on optimizing large‑file sequential reads, see &lt;a href="https://juicefs.com/en/blog/solutions/idle-resources-elastic-high-throughput-storage-cache-pool" rel="noopener noreferrer"&gt;How JuiceFS Transformed Idle Resources into a 70 GB/s Cache Pool&lt;/a&gt;.&lt;/p&gt;
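&lt;p&gt;As a minimal sketch of the multi‑threaded, sharded loading idea mentioned above, the following reads one large file with several threads, each fetching disjoint byte ranges. The function name, thread count, and shard size are assumptions chosen for illustration, and &lt;code&gt;os.pread&lt;/code&gt; is only available on Unix‑like systems.&lt;/p&gt;

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_parallel(path, num_threads=8, shard_size=64 * 1024 * 1024):
    """Read a whole file using several threads, one byte range per task."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        def read_shard(offset):
            # os.pread takes an explicit offset, so threads never share a file position.
            remaining = min(shard_size, size - offset)
            parts = []
            while remaining:
                data = os.pread(fd, remaining, offset)
                if not data:
                    break  # file shrank while reading
                parts.append(data)
                offset += len(data)
                remaining -= len(data)
            return b"".join(parts)

        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            # pool.map preserves input order, so shards are joined in file order.
            return b"".join(pool.map(read_shard, range(0, size, shard_size)))
    finally:
        os.close(fd)
```

&lt;p&gt;This keeps several requests in flight at once instead of one, which is the point of raising concurrency. Note that it holds the whole file in memory, so for very large models a streaming or memory‑mapped variant may be more appropriate.&lt;/p&gt;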

&lt;h3&gt;
  
  
  Massive small files
&lt;/h3&gt;

&lt;p&gt;In computer vision and multimodal tasks, training datasets often consist of many individual files, for example, single images, video frames, or text annotations. Such massive small‑file scenarios place heavy pressure on metadata services.  &lt;/p&gt;

&lt;p&gt;In massive small-file scenarios, metadata performance is critical. Each file carries only a small amount of data, so metadata operations make up a large share of the total work. In addition, directory metadata access becomes inefficient when a single directory holds a huge number of small files.&lt;br&gt;&lt;br&gt;
For read‑only workloads, enabling client metadata caching and extending the cache lifetime can improve performance. &lt;/p&gt;

&lt;p&gt;Moreover, the data read layer experiences higher IOPS pressure: small files derive little benefit from read‑ahead, so requests are more fragmented and latency tends to be higher. Common optimizations include increasing local cache capacity; for the Enterprise Edition, you can also scale out the distributed cache cluster horizontally.  &lt;/p&gt;

&lt;p&gt;For performance tuning in this scenario, see &lt;a href="https://juicefs.com/en/blog/user-stories/multi-cloud-store-massive-small-files" rel="noopener noreferrer"&gt;How D-Robotics Manages Massive Small Files in a Multi-Cloud Environment with JuiceFS&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large‑file random reads
&lt;/h3&gt;

&lt;p&gt;This scenario is common in AI training, for example, when randomly accessing datasets in TFRecord, HDF5, or LMDB format by sample. Take data loading as an example: if the dataset is accessed randomly and each read size equals the sample size (for example, 1 MB to 4 MB images or short videos), read‑ahead can waste bandwidth. Such scenarios can often break through IOPS bottlenecks by increasing concurrency.  &lt;/p&gt;

&lt;p&gt;Recommended measures include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increase the number of data‑loading &lt;code&gt;reader&lt;/code&gt; threads.
&lt;/li&gt;
&lt;li&gt;Use asynchronous I/O to raise concurrency and saturate IOPS.
&lt;/li&gt;
&lt;li&gt;Improve the caching system, for example, preload (warm up) data into the cache to boost underlying IOPS.
&lt;/li&gt;
&lt;li&gt;Adjust the &lt;code&gt;read ahead ratio&lt;/code&gt; parameter (for example, set it to &lt;code&gt;0.5&lt;/code&gt;) to reduce bandwidth waste from read‑ahead. For instance, a 4 MB sequential read would previously prefetch 4 MB; after adjustment, only 2 MB is prefetched.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we’ve analyzed JuiceFS’ architecture from a performance perspective, covered benchmark I/O tests, and discussed tuning methods for typical AI scenarios. This provides an introductory reference for system performance. JuiceFS has been deployed in many production environments, and its distributed architecture offers a feasible balance between performance and cost.  &lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and our &lt;a href="http://go.juicefs.com/discord" rel="noopener noreferrer"&gt;community on Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>How D-Robotics Manages Massive Small Files in a Multi-Cloud Environment with JuiceFS</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 06 Mar 2026 06:44:16 +0000</pubDate>
      <link>https://dev.to/daswu/how-d-robotics-manages-massive-small-files-in-a-multi-cloud-environment-with-juicefs-5a3f</link>
      <guid>https://dev.to/daswu/how-d-robotics-manages-massive-small-files-in-a-multi-cloud-environment-with-juicefs-5a3f</guid>
      <description>&lt;p&gt;&lt;a href="https://en.d-robotics.cc/" rel="noopener noreferrer"&gt;D-Robotics&lt;/a&gt;, founded in 2024 and spun off from &lt;a href="https://en.wikipedia.org/wiki/Horizon_Robotics" rel="noopener noreferrer"&gt;Horizon Robotics&lt;/a&gt;' robotics division, specializes in the research and development of foundational computing platforms for consumer-grade robots. In 2025, we released an &lt;a href="https://www.nvidia.com/en-us/glossary/embodied-ai/" rel="noopener noreferrer"&gt;embodied AI&lt;/a&gt; foundation model.  &lt;/p&gt;

&lt;p&gt;In robot data management, training, and inference, the sheer volume of data is immense. Using object storage presents challenges such as handling small files and managing multi-cloud data. After trying some solutions and replacing private MinIO with SSD storage, we still faced difficulties in addressing these challenges. Ultimately, we selected &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; as our core storage solution.  &lt;/p&gt;

&lt;p&gt;JuiceFS' inherent adaptability for cross-cloud operations efficiently supports data sharing needs in multi-cloud environments. In training scenarios, JuiceFS' cache mechanism, specifically designed for small file data, effectively replaces traditional caching solutions while striking a good balance between cost and efficiency, fully meeting storage performance requirements. Currently, we manage tens of millions of files.  &lt;/p&gt;

&lt;p&gt;In this article, we’ll share our application characteristics, storage pain points, solution selection, implementation practices, and production tuning experiences. We hope our experience offers useful insights for those facing similar challenges in the industry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage pain points in the robotics industry
&lt;/h2&gt;

&lt;p&gt;The cloud platform serves as our core technical hub, undertaking key application functions such as simulation environment setup, data generation and &lt;a href="https://www.ibm.com/think/topics/model-training" rel="noopener noreferrer"&gt;model training&lt;/a&gt;, model lightweighting and deployment, and visual verification. The data types involved in the platform are diverse, mainly including sensor image data, LiDAR point cloud data, model weights and configuration data, motor operational data, and map construction data.  &lt;/p&gt;

&lt;p&gt;While &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; meets basic storage needs for massive data, its performance limitations become particularly obvious when handling the massive small files frequently encountered in robotics applications. Our storage system faced four challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metadata performance bottleneck with massive small files:&lt;/strong&gt; Robot model training involves tens of millions to billions of sensor images, LiDAR data, and model files. Traditional object storage (like standard S3) exhibits significant metadata operation bottlenecks at this scale. The fixed API latency for routine operations like listing files or retrieving attributes is typically 10–30 ms. This directly constrains queries per second (QPS) performance during training and inference and impacts overall R&amp;amp;D efficiency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inefficient &lt;a href="https://en.wikipedia.org/wiki/Multicloud" rel="noopener noreferrer"&gt;multi-cloud&lt;/a&gt; collaboration and data flow:&lt;/strong&gt; As robotics companies increasingly adopt multi-cloud architectures for their R&amp;amp;D and production applications, ensuring efficient data synchronization and sharing across different cloud platforms and geographical regions has become a common challenge for the industry. Traditional storage solutions typically suffer from low cross-cloud data transfer efficiency and are often deeply integrated with a single cloud provider. This leads to technical lock-in and makes it difficult to achieve flexible cross-cloud deployment and data collaboration.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The impossible trinity of performance, cost, and operations:&lt;/strong&gt; High-performance parallel file systems offer high throughput and low latency but typically rely on all-flash arrays or dedicated hardware. This leads to high hardware investment and ongoing operational costs, plus complex deployment. Low-cost object storage offers good elasticity but struggles to support the high-throughput I/O demands of GPU clusters in AI training scenarios. A common industry workaround is using a high-speed file system as a cache synchronized with S3. However, the extra data synchronization steps significantly reduce usability and fail to achieve efficient storage-compute synergy.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Difficulty in dataset version management:&lt;/strong&gt; The rapid iteration cycle of robot models requires efficient and granular management of multiple dataset versions. Using physical copies for version control multiplies underlying storage consumption with every version, significantly increasing costs. Moreover, the difficulty of retrieving, reusing, and maintaining multi-version data also increases substantially.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Storage selection: JuiceFS vs. MinIO/S3 vs. PFS
&lt;/h2&gt;

&lt;p&gt;To address these storage challenges, we established a clear evaluation framework for storage selection. A comprehensive comparative test was conducted on mainstream storage solutions across seven core dimensions: storage architecture, protocol compatibility, metadata performance, scalability, multi-cloud adaptability, cost efficiency, and operational complexity.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Comparison basis&lt;/th&gt;
&lt;th&gt;JuiceFS&lt;/th&gt;
&lt;th&gt;MinIO / Public Cloud S3&lt;/th&gt;
&lt;th&gt;CephFS / Public Cloud FS (CPFS)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage architecture&lt;/td&gt;
&lt;td&gt;Separation of metadata and data&lt;/td&gt;
&lt;td&gt;Unified object storage&lt;/td&gt;
&lt;td&gt;Metadata and data typically coupled, often with kernel-space parallel design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protocol support&lt;/td&gt;
&lt;td&gt;Full compatibility: POSIX, HDFS, S3 API, Kubernetes CSI&lt;/td&gt;
&lt;td&gt;Primarily S3 API, with weak POSIX compatibility&lt;/td&gt;
&lt;td&gt;POSIX-oriented; HDFS or S3 compatibility often requires plugins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metadata performance&lt;/td&gt;
&lt;td&gt;Very high: sub-millisecond latency, supports hundreds of billions of files per volume&lt;/td&gt;
&lt;td&gt;Lower: high metadata overhead for massive small files; API call overhead about 10–30 ms&lt;/td&gt;
&lt;td&gt;Medium to high: performance bottlenecks and complexity challenges at ultra-large scale (100M+ files)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;High: horizontal scaling, supports tens to hundreds of billions of files per volume&lt;/td&gt;
&lt;td&gt;High: near-infinite storage capacity, but small-file management efficiency degrades with scale&lt;/td&gt;
&lt;td&gt;Moderate: scaling limited by metadata nodes; operational complexity grows exponentially with scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-cloud adaptability&lt;/td&gt;
&lt;td&gt;Native support&lt;/td&gt;
&lt;td&gt;Relies on sync tools; cross-cloud data flow inefficient; global unified view difficult&lt;/td&gt;
&lt;td&gt;Limited: often tightly bound to specific hardware or cloud provider; cross-cloud deployment is complex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost efficiency&lt;/td&gt;
&lt;td&gt;High performance-to-cost ratio&lt;/td&gt;
&lt;td&gt;Low (storage only): cheap storage, but low GPU utilization in high-throughput scenarios like AI training&lt;/td&gt;
&lt;td&gt;High: often requires all-flash architecture or dedicated hardware; high operational labor cost&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Based on the comparison results above, JuiceFS demonstrates significant advantages in core performance, scalability, multi-cloud adaptability, and cost efficiency. This makes it the preferred choice for our unified storage solution.&lt;br&gt;&lt;br&gt;
Furthermore, JuiceFS has been widely adopted in the &lt;a href="https://juicefs.com/en/blog?tag=AI%20storage" rel="noopener noreferrer"&gt;autonomous driving&lt;/a&gt; industry. Leading companies such as Horizon Robotics have leveraged JuiceFS to manage data at the exabyte scale. This demonstrates its maturity and effectiveness in large-scale production environments.&lt;br&gt;&lt;br&gt;
For our specific application scenarios, JuiceFS' core technical advantages are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decoupled architecture:&lt;/strong&gt; JuiceFS adopts a metadata-data separation architecture, persisting data in cost-effective object storage (like S3 or OSS) while storing metadata in databases like Redis or TiKV. This decoupled design enables elastic storage scaling and reduces dependence on any single cloud provider.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunking and caching mechanisms:&lt;/strong&gt; JuiceFS &lt;a href="https://juicefs.com/docs/community/architecture#how-juicefs-store-files" rel="noopener noreferrer"&gt;uses chunks, slices, and blocks&lt;/a&gt; to significantly improve small file read efficiency and enhance concurrent read/write performance. In addition, multi-level caching (memory, local SSD, distributed cache) reduces access latency for hot data. This meets the demands of high-throughput training workloads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud-native adaptability:&lt;/strong&gt; By providing a &lt;a href="https://juicefs.com/docs/csi/introduction" rel="noopener noreferrer"&gt;CSI Driver&lt;/a&gt;, JuiceFS delivers persistent storage decoupled from compute nodes in Kubernetes environments, supporting stateless container deployment and cross-cloud migration. It enables data sharing, enhances application high availability and flexibility, and adapts to various Kubernetes deployment methods.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full-stack support for AI training:&lt;/strong&gt; JuiceFS fully supports POSIX, HDFS, and S3 API, and is compatible with mainstream AI frameworks such as PyTorch and TensorFlow. It can be integrated without code modifications, lowering the technical barrier for adoption.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-cloud support:&lt;/strong&gt; Its cross-cloud capabilities and high-performance metadata engine ensure efficient data flow, perfectly aligning with our strategy of "computing power on demand."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;From a cost perspective, JuiceFS does not offer a significant cost advantage in the early stages of small-scale deployment. However, when data volume reaches the petabyte level—especially at the 10 PB or 100 PB scale—and is compared against all-flash storage solutions, its cost-efficient architecture built on object storage becomes fully evident.&lt;/strong&gt; In addition, JuiceFS requires minimal operational overhead. Currently, we need only one engineer to manage the entire cloud platform and storage system, a fraction of the personnel required by traditional solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Community Edition to Enterprise Edition: addressing larger-scale scenarios
&lt;/h2&gt;

&lt;p&gt;As our application continued to expand, we encountered limitations when using Redis as the &lt;a href="https://juicefs.com/docs/community/databases_for_metadata/" rel="noopener noreferrer"&gt;metadata engine&lt;/a&gt;—specifically, physical memory capacity constrained data scalability. When the number of files approached the hundred-million level, metadata query latency increased significantly. This impacted the concurrency efficiency of training tasks. After using the clone feature, the metadata volume grew substantially. In addition, in cross-cloud scenarios, we faced higher demands for metadata synchronization and mirror file system capabilities. We also required more granular capacity controls and permission management at the directory level.  &lt;/p&gt;

&lt;p&gt;Considering these requirements—along with our desire to leverage local SSDs on GPU nodes to build a distributed cache layer for improved performance—we decided to deploy &lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;JuiceFS Enterprise Edition&lt;/a&gt; in parallel, migrating core scenarios such as ultra-large-scale directory management and multi-node collaborative training to this version. Through this scenario-based approach, we’ve effectively enhanced the adaptability of our overall storage system and established a solid foundation for future application growth. Below are the key features of the Enterprise Edition that we’ve applied in real-world scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  High-performance metadata engine: solving the bottleneck of large-scale directory retrieval
&lt;/h3&gt;

&lt;p&gt;For high-frequency operations such as traversing directories with hundreds of millions of files and deep pagination queries, we previously encountered the "the deeper you query, the slower it gets" problem with traditional storage solutions. When the number of files in a single directory exceeded 10 million and the pagination offset surpassed 100,000 entries, response latency would spike from hundreds of milliseconds to several seconds. This severely impacted data filtering efficiency.  &lt;/p&gt;

&lt;p&gt;After switching to JuiceFS Enterprise Edition, its native tree-structured metadata storage architecture played a key role. Unlike flat key-value storage, which keeps file metadata in no particular order, this tree structure allows direct navigation to directory levels, reducing the scope of metadata scans. In our actual tests, deep pagination queries (with an offset of 500,000 entries) in a directory containing 120 million files saw latency drop from 3.8 seconds to just 210 milliseconds. This fully met the retrieval needs of large-scale datasets. In addition, this engine supports storing hundreds of billions of files per volume, and we’ve already used it to stably manage three petabyte-scale training datasets, in line with our expected application growth.&lt;/p&gt;
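&lt;p&gt;To make the difference in scan scope concrete, here is a minimal Python sketch. It is not JuiceFS code: the function names and the in-memory dictionaries are hypothetical stand-ins, used only to count how many entries a directory listing touches in a flat, unordered store versus a per-directory child index.&lt;/p&gt;

```python
from collections import defaultdict


def build_flat(paths):
    # Flat store (assumption): one unordered mapping from full path to metadata.
    return {p: {"size": 0} for p in paths}


def build_tree(paths):
    # Tree-style index (assumption): each directory maps to its direct children.
    children = defaultdict(list)
    for p in paths:
        parent, _, name = p.rpartition("/")
        children[parent].append(name)
    return children


def list_flat(store, directory):
    # Every key must be inspected to find the children of one directory.
    prefix = directory + "/"
    scanned = 0
    names = []
    for path in store:
        scanned += 1
        rest = path[len(prefix):]
        if path.startswith(prefix) and "/" not in rest:
            names.append(rest)
    return sorted(names), scanned


def list_tree(children, directory):
    # Only the entries of the requested directory are touched.
    names = children.get(directory, [])
    return sorted(names), len(names)


paths = ["/a/f%d" % i for i in range(3)] + ["/b/g%d" % i for i in range(5)]
flat_names, flat_scanned = list_flat(build_flat(paths), "/a")
tree_names, tree_scanned = list_tree(build_tree(paths), "/a")
print(flat_scanned, tree_scanned)
```

&lt;p&gt;Both listings return the same names, but the flat store inspects every key in the volume while the tree index inspects only the children of the requested directory. A real metadata engine involves far more than this, so treat the sketch as an illustration of the idea only.&lt;/p&gt;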

&lt;h3&gt;
  
  
  Enterprise-grade distributed cache: improving data sharing efficiency in multi-node, multi-GPU training
&lt;/h3&gt;

&lt;p&gt;In multi-node, multi-GPU training scenarios, we previously faced challenges such as low cache hit rates and cross-node bandwidth congestion. The open-source version only supports local caching on each node. This means that when multiple nodes pull the same dataset simultaneously, each node must access object storage independently. This resulted in single-node bandwidth utilization exceeding 90%, with average training job startup delays of up to 20 minutes.  &lt;/p&gt;

&lt;p&gt;With JuiceFS Enterprise Edition's &lt;a href="https://juicefs.com/docs/cloud/guide/distributed-cache/" rel="noopener noreferrer"&gt;distributed caching&lt;/a&gt; feature, we set up a distributed cache across a 12-node training cluster using just three commands. The dataset only needs to be pulled from object storage once and is cached in a pool built from local SSDs across the nodes. As a result, &lt;strong&gt;the cache hit rate for multi-node collaborative training increased from 45% to 92%, cross-node bandwidth utilization dropped to below 15%, and training job startup time was reduced to under three minutes&lt;/strong&gt;. This significantly improved compute utilization.&lt;/p&gt;
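&lt;p&gt;The effect on object storage traffic can be sketched with a toy model. The code below is an illustration under stated assumptions, not the Enterprise Edition's implementation: the hash-modulo block placement, the function names, and the block IDs are all hypothetical. It counts object storage fetches when every node caches independently versus when all nodes share one cache pool.&lt;/p&gt;

```python
import hashlib


def owner(block_id, nodes):
    # Assumed placement rule: hash the block ID and pick one node to own it.
    digest = hashlib.sha256(block_id.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def fetches_local_only(blocks, nodes):
    # Each node has a private cache, so each node pulls every block once.
    return len(blocks) * len(nodes)


def fetches_shared_pool(blocks, nodes):
    # Nodes share a pool: a block is pulled from object storage only the
    # first time any node asks for it, then served by its owner node.
    pool = {n: set() for n in nodes}
    fetched = 0
    for _reader in nodes:
        for b in blocks:
            o = owner(b, nodes)
            if b not in pool[o]:
                pool[o].add(b)
                fetched += 1
    return fetched


nodes = ["node-%d" % i for i in range(12)]
blocks = ["block-%d" % i for i in range(100)]
print(fetches_local_only(blocks, nodes), fetches_shared_pool(blocks, nodes))
```

&lt;p&gt;In this model, 12 nodes reading the same 100 blocks cause 1,200 object storage fetches with private caches and 100 with a shared pool. That matches the "pulled from object storage once" behavior described above.&lt;/p&gt;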

&lt;h3&gt;
  
  
  Enhanced cross-cloud collaboration: building a low-operational-cost cross-cloud data foundation
&lt;/h3&gt;

&lt;p&gt;Since our R&amp;amp;D environments are distributed across two cloud environments, we previously encountered challenges with &lt;strong&gt;slow cross-cloud data synchronization and high operational costs&lt;/strong&gt;. Using traditional synchronization tools to maintain data consistency between the two clouds required configuring eight scheduled tasks, with an average synchronization delay of four hours, and dedicated personnel needed to investigate sync failures weekly.  &lt;/p&gt;

&lt;p&gt;By using the JuiceFS sync tool combined with our internal AI operations tools, we achieved automated configuration of synchronization policies. The system automatically adjusts sync priorities based on data heat levels, keeping cross-cloud data latency within 10 minutes. In addition, tasks such as failure retries and log alerts for synchronization are fully automated, eliminating the need for dedicated monitoring. &lt;strong&gt;This has reduced operational overhead by 70%&lt;/strong&gt;, and we now stably support multiple training projects across two cloud platforms sharing the same dataset. Going forward, we plan to use the Enterprise Edition's mirror file system feature to further enhance cross-cloud data collaboration.&lt;/p&gt;

&lt;h2&gt;
  
  
  JuiceFS optimization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Client cache and write performance tuning
&lt;/h3&gt;

&lt;p&gt;Caching strategies must be configured with Kubernetes resource limits in mind. For example, using memory as the local cache path without proper configuration may lead to abnormal memory growth in the Mount Pod, and insufficient resource quota reservations may cause checkpoint loss or file handle write exceptions during long-running training tasks.  &lt;/p&gt;

&lt;p&gt;Regarding write performance tuning, enabling writeback mode can improve small file write throughput to some extent. However, given production requirements for data consistency, we still use synchronous write-through mode to reduce data risks in extreme crash scenarios. We recommend enabling writeback mode only in scenarios with lower data reliability requirements, such as temporary computing or offline data cleaning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment and network topology optimization
&lt;/h3&gt;

&lt;p&gt;For more stable performance, we strongly recommend deploying the metadata engine and compute nodes within the same region. In actual operations, we observed that cross-region deployment could increase metadata operation latency by several times, up to tenfold. This significantly impacted I/O-intensive operations such as data decompression. Deploying metadata services and GPU computing resources within the same region helps maintain performance while controlling network transmission costs, improving overall resource utilization efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data warm-up and cache optimization
&lt;/h3&gt;

&lt;p&gt;In a 10-gigabit network environment, fully utilizing JuiceFS' data &lt;a href="https://juicefs.com/docs/cloud/reference/command_reference/#warmup" rel="noopener noreferrer"&gt;warm-up&lt;/a&gt; and adjusting data block sizes to fit the application scenario makes better use of network bandwidth and improves read throughput. Combined with the distributed cache architecture, this can effectively enhance data sharing efficiency in multi-node concurrent scenarios and improve cache hit rates during high-concurrency reads, which in turn improves the overall performance of large-scale AI training tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resource quotas and high availability guarantee
&lt;/h3&gt;

&lt;p&gt;In enterprise-level multi-role operations and storage responsibility separation scenarios, to avoid operational risks caused by inconsistent configurations, it’s recommended to finely control resource quotas for &lt;a href="https://juicefs.com/docs/csi/introduction/" rel="noopener noreferrer"&gt;JuiceFS CSI Driver&lt;/a&gt; in Kubernetes environments. By appropriately setting CPU and memory request/limit for Mount Pods, Pod restarts or node anomalies caused by resource preemption can be reduced. In practice, resource reservation ratios can be dynamically adjusted based on cluster load.  &lt;/p&gt;

&lt;p&gt;In addition, for scenarios with high application continuity requirements, the automatic mount point recovery feature for Mount Pods can be enabled to achieve automated fault recovery for storage services, further ensuring underlying storage stability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-tenancy
&lt;/h3&gt;

&lt;p&gt;We provide independent &lt;a href="https://en.wikipedia.org/wiki/File_system" rel="noopener noreferrer"&gt;file systems&lt;/a&gt; and storage buckets for large enterprise customers, while achieving isolation for small and medium-sized enterprises and end users through subdirectory-level isolation and permission control.  &lt;/p&gt;

&lt;p&gt;Large enterprises can flexibly scale throughput and capacity, avoiding performance bottlenecks associated with shared storage buckets. For small and medium-sized enterprises and end users, we ensure data security and independence through subdirectory isolation and permission control, while enabling accurate metering and billing.  &lt;/p&gt;

&lt;p&gt;This architecture ensures tenant isolation while flexibly allocating resources, improving system management efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Version management
&lt;/h3&gt;

&lt;p&gt;Using the &lt;code&gt;juicefs clone&lt;/code&gt; command, copies of original datasets can be quickly created and modified independently without affecting the source data. The clone operation copies only file metadata, and only subsequent changes consume additional data storage, saving underlying storage space. This feature supports managing multiple versions, which makes rollback and recovery easier and helps ensure data security and version control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;JuiceFS' characteristics in metadata performance, scalability, cross-cloud adaptability, and comprehensive cost efficiency have made it our choice for building a unified storage layer. Currently, we adopt both JuiceFS Community Edition and Enterprise Edition to accommodate different storage requirements across various application scenarios. &lt;/p&gt;

&lt;p&gt;In the future, we plan to further implement JuiceFS in the embodied intelligence field, addressing specific storage needs in this scenario. These include high-throughput processing of time-series data, precise multi-modal data alignment, edge-cloud collaborative storage, and integrated management of simulation and real-world data.  &lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>JuiceFS Enterprise 5.3: 500B+ Files per File System &amp; RDMA Support</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Wed, 04 Feb 2026 09:18:32 +0000</pubDate>
      <link>https://dev.to/daswu/juicefs-enterprise-53-500b-files-per-file-system-rdma-support-51g9</link>
      <guid>https://dev.to/daswu/juicefs-enterprise-53-500b-files-per-file-system-rdma-support-51g9</guid>
      <description>&lt;p&gt;&lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;JuiceFS Enterprise Edition&lt;/a&gt; 5.3 has recently been released, achieving a milestone breakthrough by &lt;strong&gt;supporting over 500 billion files in a single file system&lt;/strong&gt;. This upgrade includes several key optimizations to the metadata multi-zone architecture and introduces remote direct memory access (RDMA) technology for the first time to enhance distributed caching efficiency. In addition, version 5.3 enhances write support for mirrors and provides data caching for objects imported across buckets. It aims to support high-performance requirements and multi-cloud application scenarios.  &lt;/p&gt;

&lt;p&gt;JuiceFS Enterprise Edition is designed for high-performance scenarios. Since 2019, it has been applied in machine learning and has become one of the core infrastructures in the AI industry. Its customers include large language model (LLM) companies such as &lt;a href="https://juicefs.com/en/blog/user-stories/minimax-foundation-model-ai-storage" rel="noopener noreferrer"&gt;MiniMax&lt;/a&gt; and &lt;a href="https://juicefs.com/en/blog/user-stories/artificial-intelligence-storage-large-language-model-multimodal" rel="noopener noreferrer"&gt;StepFun&lt;/a&gt;; AI infrastructure and applications like &lt;a href="https://fal.ai/" rel="noopener noreferrer"&gt;fal&lt;/a&gt; and &lt;a href="https://www.heygen.com/" rel="noopener noreferrer"&gt;HeyGen&lt;/a&gt;; autonomous driving companies like &lt;a href="https://www.momenta.cn/" rel="noopener noreferrer"&gt;Momenta&lt;/a&gt; and &lt;a href="https://en.horizon.auto/" rel="noopener noreferrer"&gt;Horizon Robotics&lt;/a&gt;; and numerous leading technology enterprises across various industries leveraging AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Single file system supports 500 billion+ files
&lt;/h2&gt;

&lt;p&gt;The multi-zone architecture is one of JuiceFS' key technologies for handling hundreds of billions of files, ensuring high scalability and concurrent processing capabilities. &lt;strong&gt;To meet the growing demands of scenarios like &lt;a href="https://en.wikipedia.org/wiki/Self-driving_car" rel="noopener noreferrer"&gt;autonomous driving&lt;/a&gt;, version 5.3 introduces in-depth optimizations to the multi-zone architecture, increasing the zone limit to 1,024 and enabling a single file system to store and access at least 500 billion files&lt;/strong&gt; (each zone can store 500 million files, with a maximum of 2 billion).  &lt;/p&gt;
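&lt;p&gt;The capacity figures follow directly from the zone limits quoted above. As a quick check, using only the numbers stated in this article (the variable names are ours):&lt;/p&gt;

```python
# Figures taken from the text: 1,024 zones, 500 million files per zone
# as the typical load, 2 billion files per zone as the stated maximum.
ZONES = 1024
TYPICAL_FILES_PER_ZONE = 500_000_000
MAX_FILES_PER_ZONE = 2_000_000_000

typical_total = ZONES * TYPICAL_FILES_PER_ZONE
max_total = ZONES * MAX_FILES_PER_ZONE

print(typical_total)  # 512000000000, about 512 billion files
print(max_total)      # 2048000000000, about 2 trillion files
```

&lt;p&gt;At the typical per-zone load, 1,024 zones hold about 512 billion files, which is where the "500 billion+" figure comes from.&lt;/p&gt;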

&lt;p&gt;The figure below shows JuiceFS Enterprise Edition architecture, with a single zone in the lower left corner:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdoz2btx939gfoc0rr5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdoz2btx939gfoc0rr5x.png" alt=" " width="784" height="702"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reaching this scale poses rapidly growing challenges in system performance, data consistency, and stability. It is backed by a series of complex underlying optimizations and R&amp;amp;D efforts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-zone hotspot balancing: automated monitoring and hotspot migration, with manual ops tools
&lt;/h3&gt;

&lt;p&gt;In distributed systems, hotspots are a common challenge. Especially when data is distributed across multiple zones, some zones may experience higher loads than others. This leads to imbalance that impacts system performance.  &lt;/p&gt;

&lt;p&gt;When the number of zones reaches hundreds, hotspot issues become more prevalent. Particularly with smaller datasets and larger numbers of files, read/write hotspots exacerbate latency fluctuations.  &lt;/p&gt;

&lt;p&gt;We introduced an automated hotspot migration mechanism to move frequently accessed files to other zones, distributing the load and reducing pressure on specific zones. However, in practice, relying solely on automated migration cannot fully resolve all issues. In certain special or extreme scenarios, automated tools may not respond promptly. &lt;strong&gt;Therefore, alongside automated monitoring and migration, we added manual operational tools, allowing administrators to intervene in complex scenarios, perform manual analysis, and implement optimization solutions.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Large-scale migration: improved migration speed, small-batch concurrent migration
&lt;/h3&gt;

&lt;p&gt;Early on, migrating data out of zones with excessive hotspots was a simple operation. However, as the system scale expanded, migration efficiency gradually decreased. &lt;strong&gt;To address this, we introduced a small-batch concurrent migration strategy&lt;/strong&gt;, breaking down high-access directories into smaller chunks and migrating them in parallel to multiple lower-load zones. This quickly scatters hotspots and restores normal application access.&lt;/p&gt;
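&lt;p&gt;The planning side of such a strategy can be sketched as follows. This is a simplified illustration, not the actual JuiceFS scheduler: the function name, the load metric (entry count), and the greedy "least-loaded zone first" rule are assumptions made for the example.&lt;/p&gt;

```python
import heapq


def plan_migration(entries, zone_loads, batch_size):
    # Min-heap of (current load, zone) so the least-loaded zone is on top.
    heap = [(load, zone) for zone, load in zone_loads.items()]
    heapq.heapify(heap)
    plan = []
    for start in range(0, len(entries), batch_size):
        batch = entries[start:start + batch_size]
        load, zone = heapq.heappop(heap)
        plan.append((zone, batch))
        # Account for the batch so later batches go to other zones.
        heapq.heappush(heap, (load + len(batch), zone))
    return plan


hot_entries = list(range(10))
plan = plan_migration(hot_entries, {"z1": 0, "z2": 5, "z3": 0}, batch_size=4)
for zone, batch in plan:
    print(zone, batch)
```

&lt;p&gt;Each planned batch targets a different low-load zone where possible, so the batches can then be executed concurrently instead of as one large sequential move.&lt;/p&gt;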

&lt;h3&gt;
  
  
  Enhanced reliability self-checks: automatic repair and cleanup of intermediate migration states
&lt;/h3&gt;

&lt;p&gt;In large-scale clusters, the probability of distributed transaction failures increases significantly, especially during extensive migration processes. To address this, &lt;strong&gt;we enhanced reliability detection mechanisms, adding periodic background checks to scan cross-zone file states, particularly focusing on intermediate state issues, and automatically performing repairs and cleanup&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Previously, the system encountered issues with residual intermediate state data. While these did not affect operations in the short term, over time they could lead to errors. Through enhanced self-check mechanisms, we ensure the background periodically scans and promptly handles intermediate state issues, improving system stability and reliability.  &lt;/p&gt;

&lt;p&gt;Beyond the three key optimizations above, we also made multiple improvements to the console to better adapt to managing more zones. We optimized concurrent processing, operational tasks, and query displays, enhancing overall performance and user experience. Specifically, we refined UI design to better showcase system states in large-scale zone environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance stress test for hundreds of billions of files
&lt;/h3&gt;

&lt;p&gt;We conducted large-scale tests using a custom &lt;a href="https://github.com/llnl/mdtest" rel="noopener noreferrer"&gt;mdtest&lt;/a&gt; tool on Google Cloud, deploying 60 nodes, each with over 1 TB of memory. In terms of software configuration, we increased the number of zones to 1,024. The deployment method was similar to previous setups, but to reduce memory consumption, we deployed only one service process, with two others as cold backups.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4q88quw1c8m67prty9o3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4q88quw1c8m67prty9o3.png" alt=" " width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;JuiceFS Enterprise Edition 5.3 test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test duration: Approximately 20 hours
&lt;/li&gt;
&lt;li&gt;Total files written: About 400 billion files
&lt;/li&gt;
&lt;li&gt;Write speed: 5 million files per second
&lt;/li&gt;
&lt;li&gt;Memory usage: About 35% to 40%
&lt;/li&gt;
&lt;li&gt;Disk usage: 40% to 50%, primarily for metadata persistence, with good utilization&lt;/li&gt;
&lt;/ul&gt;
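&lt;p&gt;As a rough consistency check on these rounded figures (the variable names are ours, and the inputs are the approximate values from the list above):&lt;/p&gt;

```python
# Approximate figures from the test summary.
total_files = 400_000_000_000   # about 400 billion files
duration_hours = 20             # approximately 20 hours
stated_rate = 5_000_000         # 5 million files per second

seconds = duration_hours * 3600
average_rate = total_files / seconds
files_at_stated_rate = stated_rate * seconds

print(round(average_rate))      # implied average files per second
print(files_at_stated_rate)     # files written at exactly 5M/s for 20 hours
```

&lt;p&gt;The implied average is roughly 5.6 million files per second, and a flat 5 million per second over 20 hours gives 360 billion files. Both are in line with the rounded numbers reported.&lt;/p&gt;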

&lt;p&gt;Based on our experience, if using a configuration with one service process, one hot backup, and one cold backup, memory usage increases by 20% to 30%.  &lt;/p&gt;

&lt;p&gt;Due to limited cloud resources, this test only wrote up to 400 billion files. During stress testing, the system performed stably, with hardware resources still remaining. We’ll continue to attempt larger-scale tests in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Support for RDMA: increased bandwidth cap, reduced CPU usage
&lt;/h2&gt;

&lt;p&gt;This new version introduces support for &lt;a href="https://en.wikipedia.org/wiki/Remote_direct_memory_access" rel="noopener noreferrer"&gt;RDMA&lt;/a&gt; technology for the first time. Its basic architecture is shown in the diagram below. RDMA allows direct access to remote node memory, bypassing the operating system's network protocol stack. This significantly improves data transfer efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhkkiori1vvxfvaekmoi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhkkiori1vvxfvaekmoi.png" alt=" " width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main advantages of RDMA include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low latency:&lt;/strong&gt; By enabling direct memory-to-memory transfers and bypassing the OS network protocol layers, it reduces CPU interrupts and context switches. This lowers latency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High throughput:&lt;/strong&gt; RDMA uses hardware for direct data transfer, better utilizing the bandwidth of network interface cards (NICs).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced CPU usage:&lt;/strong&gt; In RDMA, data copying is almost entirely handled by the NIC, with the CPU only processing control messages. This frees up CPU resources.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In JuiceFS, network request messages between clients and metadata services are small, and existing TCP configurations already meet the needs. However, in &lt;a href="https://en.wikipedia.org/wiki/Distributed_cache" rel="noopener noreferrer"&gt;distributed caching&lt;/a&gt;, file data is transferred between clients and cache nodes. Using RDMA can effectively improve transfer efficiency and reduce CPU consumption.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkr7vblmcuoa5eo2f5hv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkr7vblmcuoa5eo2f5hv.png" alt=" " width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We conducted 1 MB random read tests using 160 Gbps NICs, comparing versions 5.1, 5.2 (using TCP networking) with version 5.3 (RDMA), and observed CPU usage.  &lt;/p&gt;

&lt;p&gt;Tests showed that RDMA effectively reduces CPU usage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In version 5.2, CPU usage was nearly 50%.
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;In version 5.3, with RDMA optimization, CPU usage dropped to about one-third. Client and cache node CPU usage decreased to 8 cores and 5 cores respectively, with bandwidth reaching 20 GiB/s.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In previous tests, we found that while TCP ran stably on 200G NICs, fully saturating bandwidth was challenging, typically achieving only 85%-90% utilization. &lt;strong&gt;For customers requiring higher bandwidth (such as 400G NICs), TCP could not meet demands. However, RDMA can more easily reach hardware bandwidth limits, providing better transfer efficiency.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;If customers have RDMA-capable hardware and high bandwidth requirements (for example, NICs greater than 100G) and wish to reduce CPU usage, RDMA is a technology worth trying. Currently, our RDMA feature is in public testing and has not yet been widely deployed in production environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enhanced write support for mirrors
&lt;/h2&gt;

&lt;p&gt;Initially, &lt;a href="https://juicefs.com/docs/cloud/guide/mirror/" rel="noopener noreferrer"&gt;mirror&lt;/a&gt; clusters were primarily used for read-only mirroring in enterprise products. As users requested capabilities like writing temporary files (such as training data) in mirrors, we provided write support for mirrors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd567qgcebpr8pma2xgg5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd567qgcebpr8pma2xgg5.png" alt=" " width="590" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mirror client implements a read-write separation mechanism. When reading data, the client prioritizes fetching from the mirror cluster to reduce latency. When writing data, it still writes to the source cluster to ensure data consistency. By recording and comparing metadata version numbers, we ensure strong consistency between the mirror client and source cluster client views of the data.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To improve availability, version 5.3 introduces a fallback mechanism. When the mirror becomes unavailable, client read requests automatically fall back to the source cluster.&lt;/strong&gt; This ensures application continuity and avoids interruptions caused by mirror cluster failures. We also optimized deployments in multi-mirror environments. Previously, the mirror end required two hot backup nodes to ensure high availability. Now, with the improved fallback feature, deploying a single mirror node can achieve similar effects. This reduces costs without sacrificing continuity, which is especially beneficial for users requiring multiple mirrors.  &lt;/p&gt;

&lt;p&gt;Through this improvement, we not only reduced hardware costs but also found a balance between high availability and low cost. For users deploying mirrors in multiple locations, reducing metadata replicas further lowers overall costs.&lt;/p&gt;
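&lt;p&gt;The read/write separation and read fallback described in this section can be sketched as below. This is a toy model under stated assumptions: the &lt;code&gt;Cluster&lt;/code&gt; and &lt;code&gt;MirrorClient&lt;/code&gt; classes are hypothetical, replication is simulated by copying a dictionary, and the metadata version comparison is omitted.&lt;/p&gt;

```python
class Unavailable(Exception):
    """Raised when a cluster cannot serve requests."""


class Cluster:
    # Hypothetical stand-in for a metadata/data cluster.
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def read(self, key):
        if not self.up:
            raise Unavailable(self.name)
        return self.data[key]

    def write(self, key, value):
        if not self.up:
            raise Unavailable(self.name)
        self.data[key] = value


class MirrorClient:
    def __init__(self, source, mirror):
        self.source = source
        self.mirror = mirror

    def write(self, key, value):
        # Writes always go to the source cluster.
        self.source.write(key, value)

    def read(self, key):
        # Reads prefer the mirror and fall back to the source if it is down.
        try:
            return self.mirror.read(key), "mirror"
        except Unavailable:
            return self.source.read(key), "source"


source = Cluster("source")
mirror = Cluster("mirror")
client = MirrorClient(source, mirror)

client.write("ckpt", "v1")
mirror.data.update(source.data)   # simulated replication to the mirror
first = client.read("ckpt")

mirror.up = False                 # simulate a mirror outage
second = client.read("ckpt")
print(first, second)
```

&lt;p&gt;The application sees the same value before and after the simulated outage. Only the serving cluster changes, which is why a single mirror node can be sufficient when fallback is available.&lt;/p&gt;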

&lt;h2&gt;
  
  
  Simplified operations &amp;amp; increased flexibility: providing cross-bucket data cache for imported objects
&lt;/h2&gt;

&lt;p&gt;In JuiceFS, users can use the &lt;code&gt;import&lt;/code&gt; command to bring existing files from &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; under unified management. This is convenient for users already storing large amounts of data (for example, tens of petabytes). However, in previous versions, this feature only supported caching for objects within the same data bucket. This meant imported objects had to reside in the same bucket as the existing file system data, which was restrictive in practice.  &lt;/p&gt;

&lt;p&gt;In version 5.3, we improved this feature. &lt;strong&gt;Users can now provide caching capability for any imported objects, regardless of whether they come from the same data bucket.&lt;/strong&gt; This allows users more flexibility in managing objects across different data buckets, avoiding strict bucket restrictions and enhancing data management freedom.  &lt;/p&gt;

&lt;p&gt;In addition, previously, if users had data distributed across multiple buckets and wanted to provide caching for that data, they needed to create a new file system for each bucket. In version 5.3, users only need to create one file system (volume) to uniformly manage data from multiple buckets and provide caching for all buckets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other important optimizations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Trace feature
&lt;/h3&gt;

&lt;p&gt;We added a trace feature based on the tracing support provided by the Go language itself. With it, advanced users can perform tracing and performance analysis, gaining more information to help quickly locate issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trash recovery
&lt;/h3&gt;

&lt;p&gt;In previous versions, especially with multiple zones, sometimes the paths recorded in the trash were incomplete. This led to anomalies during recovery, where files were not restored to the expected locations. To address this, in version 5.3, when deleting files, we record the original file path, ensuring more reliable recovery capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python SDK improvements
&lt;/h3&gt;

&lt;p&gt;In earlier versions, we released the &lt;a href="https://juicefs.com/docs/cloud/deployment/python-sdk/" rel="noopener noreferrer"&gt;Python SDK&lt;/a&gt;, providing basic read/write functionalities for Python users to interface with our system. In version 5.3, we not only strengthened basic read/write functions but also added support for operational subcommands. For example, users can directly call commands like &lt;code&gt;juicefs info&lt;/code&gt; or &lt;code&gt;warmup&lt;/code&gt; via the SDK without relying on external system commands. This simplifies coding efforts and avoids potential performance bottlenecks from frequently calling external commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Windows client
&lt;/h3&gt;

&lt;p&gt;We previously launched a beta version of the Windows client and have received some user feedback. After improvements, the current version shows significant enhancements in mount reliability, performance, and compatibility with Linux systems. In the future, we plan to further refine the Windows client, providing an experience closer to Linux for users reliant on Windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Compared to expensive dedicated hardware, JuiceFS helps users balance performance and cost when addressing data growth by flexibly utilizing cloud or existing customer storage resources. In version 5.3, by optimizing the metadata zone architecture, a single file system can support over 500 billion files. The first-time introduction of RDMA technology significantly improves distributed caching bandwidth and data access efficiency, reduces CPU usage, and further optimizes system performance. In addition, we enhanced features like write support for mirrors and caching, improving the performance and operational efficiency of large-scale clusters and optimizing user experience.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;Cloud service&lt;/a&gt; users can now directly experience JuiceFS Enterprise Edition 5.3 online, while on-premises deployment users can obtain upgrade support through official channels. We’ll continue to focus on high-performance storage solutions, partnering with enterprises to tackle challenges brought by continuous data growth.  &lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>architecture</category>
    </item>
    <item>
      <title>From GlusterFS to JuiceFS: Lightillusions Achieved 2.5x Faster 3D AIGC Data Processing</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 09 Jan 2026 07:43:46 +0000</pubDate>
      <link>https://dev.to/daswu/from-glusterfs-to-juicefs-lightillusions-achieved-25x-faster-3d-aigc-data-processing-53l0</link>
      <guid>https://dev.to/daswu/from-glusterfs-to-juicefs-lightillusions-achieved-25x-faster-3d-aigc-data-processing-53l0</guid>
      <description>&lt;p&gt;&lt;a href="https://www.lightillusions.com/" rel="noopener noreferrer"&gt;Lightillusions&lt;/a&gt; is a company specializing in spatial intelligence technology, integrating 3D vision, computer graphics, and generative models to build innovative 3D foundation models. Our company is led by Ping Tan, a professor at the Hong Kong University of Science and Technology (HKUST) and Director of the HKUST-BYD Joint Laboratory.  &lt;/p&gt;

&lt;p&gt;Unlike 2D models, a single 3D model can be several gigabytes in size, especially complex models like point clouds. When our data volume reached petabyte scales, management and storage became significant challenges. &lt;strong&gt;After trying solutions like NFS and GlusterFS, we chose &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt;, an open-source high-performance distributed file system, to build a unified storage platform.&lt;/strong&gt; This platform now serves multiple scenarios, supports cross-platform access including Windows and Linux, &lt;strong&gt;manages hundreds of millions of files, improves data processing speed by 200%–250%&lt;/strong&gt;, enables efficient storage scaling, and greatly simplifies operations and maintenance. This allows us to focus more on core research.  &lt;/p&gt;

&lt;p&gt;In this article, we’ll break down the unique storage demands of 3D AIGC, share why we selected JuiceFS over CephFS, and walk through the architecture of our JuiceFS-based storage platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage requirements for 3D AIGC
&lt;/h2&gt;

&lt;p&gt;Our research focuses on perception and generation. In the 3D domain, tasks are more complex than in image and text processing. This places higher demands on our AI models, algorithms, and infrastructure.  &lt;/p&gt;

&lt;p&gt;We illustrate the complexity of 3D data processing through a typical pipeline. On the left side of the diagram below is a 3D model containing texture (top-left) and geometry (bottom-right) information. First, we generate &lt;a href="https://en.wikipedia.org/wiki/Rendering_(computer_graphics)" rel="noopener noreferrer"&gt;rendered&lt;/a&gt; images. Each model has text labels describing its content, geometric features, and texture features, which are tightly coupled with the model. In addition, we process geometry data, such as sampled points and necessary numerical values obtained from data preprocessing, like signed distance fields (SDFs). It's important to note that 3D model file formats are highly diverse, and image formats are also different.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72kjddg1jm5kft7qca6u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72kjddg1jm5kft7qca6u.png" alt=" " width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our work spans language models, image/video models, and 3D models. As data volume grows, so does the storage burden. The main characteristics of data usage in these scenarios are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Language models: Data typically consists of a vast number of small files. Although individual text files are small, the total file count can reach millions or even tens of millions as data volume increases. This makes the management of such a large number of files a primary storage challenge.
&lt;/li&gt;
&lt;li&gt;Image and video data: High-resolution images and long videos are usually large. A single image can range from hundreds of kilobytes to several megabytes, while video files can reach gigabytes. During preprocessing (such as data augmentation, resolution adjustment, and frame extraction), data volume increases significantly. In video processing especially, each video is typically decomposed into a large number of image frames, and managing these massive file collections adds considerable complexity.
&lt;/li&gt;
&lt;li&gt;3D models: Individual models, especially complex ones like point clouds, can be several gigabytes in size. &lt;strong&gt;3D data preprocessing is more complex than that of other data types, involving steps like texture mapping and geometry reconstruction, which consume substantial computational resources and can increase data volume.&lt;/strong&gt; Furthermore, 3D models often consist of multiple files, leading to a large total file count. As data grows, managing these files becomes increasingly difficult.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on the storage characteristics discussed above, when we chose a storage platform solution, we expected it to meet the following requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Diverse data formats and cross-node sharing:&lt;/strong&gt; Different models use different data formats, and 3D models in particular bring format complexity and cross-platform compatibility issues. The storage system must support multiple formats and effectively manage data sharing across nodes and platforms.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handling data models of different sizes:&lt;/strong&gt; Whether it's small files for language models, large-scale image/video data, or large files for 3D models, the storage system must be highly scalable to meet rapidly growing storage demands and handle the storage and access of large-size data efficiently.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Challenges of &lt;a href="https://www.virtana.com/glossary/what-is-cross-cloud/" rel="noopener noreferrer"&gt;cross-cloud&lt;/a&gt; and cluster storage:&lt;/strong&gt; As data volume increases, especially with petabyte-level storage needs for 3D models, cross-cloud and cluster storage issues become more prominent. The storage system must support seamless cross-region, cross-cloud data access and efficient cluster management.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy scaling:&lt;/strong&gt; The need for scaling is constant, whether for language, image/video, or 3D models, and is particularly high for 3D model storage and processing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple operations and maintenance:&lt;/strong&gt; The storage system should provide easy-to-use management interfaces and tools. Especially for 3D model management, operational requirements are higher, making automated management and fault tolerance essential.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Storage solutions: from NFS, GlusterFS, CephFS to JuiceFS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Initial solution: NFS mount
&lt;/h3&gt;

&lt;p&gt;Initially, we tried the simplest solution: mounting over &lt;a href="https://en.wikipedia.org/wiki/Network_File_System" rel="noopener noreferrer"&gt;NFS&lt;/a&gt;. However, in practice, we found that the training cluster and the rendering cluster each required their own separate mounts. Maintaining this setup was very cumbersome, especially when adding new data, because we had to configure mount points separately for each new dataset. &lt;strong&gt;When the data volume reached about 1 million objects, we could no longer sustain this approach and abandoned it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhbhaa1lo6wmjr6ar2vv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhbhaa1lo6wmjr6ar2vv.png" alt=" " width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Mid-term solution: GlusterFS
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_glusterfs" rel="noopener noreferrer"&gt;GlusterFS&lt;/a&gt; was an easy-to-start-with choice, offering simple installation and configuration, acceptable performance, and no need for multiple mount points—just add new nodes.  &lt;/p&gt;

&lt;p&gt;While GlusterFS greatly reduced our workload in the early stages, we also discovered issues with its ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Many GlusterFS execution scripts and features required writing custom scheduled tasks. Adding new storage in particular came with additional requirements, such as needing to increase nodes by specific multiples.
&lt;/li&gt;
&lt;li&gt;Support for operations like cloning and data synchronization was weak. This led us to frequently consult documentation.
&lt;/li&gt;
&lt;li&gt;Many operations were unstable. For example, when using tools like fio for speed testing, results were not always reliable.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A more serious problem was that GlusterFS performance would drastically decline when the number of small files reached a certain scale.&lt;/strong&gt; For example, one model might generate 100 images. With 10 million models, that would produce 1 billion images. GlusterFS struggled severely with addressing in later stages, especially with an excessive number of small files. This led to significant performance drops and even system crashes.&lt;/li&gt;
&lt;/ul&gt;
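&lt;p&gt;The file-count estimate in the last item can be checked with a few lines of Python. This is only an illustrative sketch: the function name &lt;code&gt;projected_file_count&lt;/code&gt; and the &lt;code&gt;extra_files_per_model&lt;/code&gt; parameter are hypothetical, and the inputs are the round numbers from the example, not measured values.&lt;/p&gt;

```python
def projected_file_count(models, images_per_model, extra_files_per_model=0):
    """Estimate the total number of small files produced by a set of 3D models."""
    return models * (images_per_model + extra_files_per_model)


# Round numbers from the example: 10 million models, 100 rendered images each.
total = projected_file_count(10_000_000, 100)
print(f"{total:,} files")  # prints "1,000,000,000 files"
```

&lt;p&gt;Any additional per-model files (labels, sampled points, and so on) would push the total higher, which is what the optional parameter is meant to illustrate.&lt;/p&gt;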

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5xtcifcfmk1v710gau4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5xtcifcfmk1v710gau4.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Final selection: CephFS vs. JuiceFS
&lt;/h3&gt;

&lt;p&gt;As storage demands grew, we decided to use a more sustainable solution. After evaluating various options, we compared &lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_cephfs/" rel="noopener noreferrer"&gt;CephFS and JuiceFS&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;Although Ceph is widely used, through our own practice and reviewing documentation, we found Ceph's operational and management costs to be very high. Especially for a small team like ours, handling such complex operational tasks proved particularly difficult.  &lt;/p&gt;

&lt;p&gt;JuiceFS had two native features that strongly aligned with our needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The client data cache.&lt;/strong&gt; For our model training clusters, which are typically equipped with high-performance NVMe storage, fully utilizing client caching could significantly accelerate model training and reduce pressure on the JuiceFS storage backend.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JuiceFS' S3 compatibility was crucial for us.&lt;/strong&gt; As we had developed some visualization platforms based on storage for data annotation, organization, and statistics, S3 compatibility allowed us to rapidly develop web interfaces supporting visualization, data statistics, and other features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The table below &lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_cephfs/" rel="noopener noreferrer"&gt;compares basic features of CephFS and JuiceFS&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0p0scbwio14i307kjru0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0p0scbwio14i307kjru0.png" alt=" " width="800" height="1276"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage platform practice based on JuiceFS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Metadata engine selection and topology
&lt;/h3&gt;

&lt;p&gt;JuiceFS employs a metadata-data separation architecture with several metadata engine options. We first quickly validated the &lt;a href="https://juicefs.com/docs/community/redis_best_practices/" rel="noopener noreferrer"&gt;Redis storage solution&lt;/a&gt;, which is well-documented by the JuiceFS team. Redis' advantage lies in its lightweight nature; configuration typically takes only a day or half a day, and data migration is smooth. &lt;strong&gt;However, when the number of small files exceeded 100 million, Redis' speed and performance significantly declined&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv98vc7tp14qrp4eqhh3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv98vc7tp14qrp4eqhh3u.png" alt=" " width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As mentioned earlier, each model might render 100 images. With other miscellaneous files, the number of small files increased dramatically. While we could mitigate the issue by packing small files, performing modifications or visualization on packed data greatly increased complexity. Therefore, we preferred to retain the original small image files for subsequent processing.&lt;/p&gt;

&lt;p&gt;As the file count grew and soon exceeded Redis' capacity, we decided to migrate the storage system to a combination of &lt;a href="https://tikv.org/" rel="noopener noreferrer"&gt;TiKV&lt;/a&gt; and Kubernetes (K8s). &lt;strong&gt;The TiKV-K8s setup provided us with a more highly available metadata storage solution&lt;/strong&gt;. Furthermore, through benchmarking, we found that although TiKV's performance was slightly lower, the gap was not significant, and its support for small files was better than Redis'. We also consulted JuiceFS engineers and learned that Redis has poor scalability in cluster mode. Therefore, we switched to TiKV.&lt;/p&gt;

&lt;p&gt;The table below shows read/write performance test results for different metadata engines:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2b4gz1knnp88iyszdisr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2b4gz1knnp88iyszdisr.png" alt=" " width="800" height="632"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Latest architecture: JuiceFS+TiKV+SeaweedFS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;We use JuiceFS to manage the object storage layer. We built the metadata storage system with TiKV and K8s, and we use SeaweedFS for object storage.&lt;/strong&gt; This allows us to quickly scale storage capacity and provides fast access for both small and large files. In addition, our object storage is distributed across multiple platforms, including local storage and platforms like R2 and Amazon S3. Through JuiceFS, we were able to integrate these different storage systems and provide a unified interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5jci4fq8dn2gc5l6fqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5jci4fq8dn2gc5l6fqk.png" alt=" " width="800" height="291"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To better manage system resources, we built a resource monitoring platform on K8s. The current system consists of about 60 Linux nodes and several Windows nodes handling rendering and data processing tasks. We monitored read stability, and the results show that even with multiple heterogeneous servers performing simultaneous read operations, the overall system I/O performance remains stable and fully utilizes the available bandwidth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problems we encountered
&lt;/h3&gt;

&lt;p&gt;During the optimization of the storage solution, we initially tried an &lt;a href="https://en.wikipedia.org/wiki/Erasure_code" rel="noopener noreferrer"&gt;erasure code&lt;/a&gt; (EC) storage scheme aimed at reducing storage requirements and improving efficiency. However, in large-scale data migration, EC storage computation was slow, and its performance was unsatisfactory in high-throughput and frequent data change scenarios. The bottlenecks were especially noticeable when EC was combined with SeaweedFS. Based on these issues, we decided to abandon EC storage and switch to a replication-based storage scheme.  &lt;/p&gt;

&lt;p&gt;We set up independent servers and configured scheduled tasks for large-volume metadata backups. In TiKV, we implemented a redundant replica mechanism, adopting a multi-replica scheme to ensure data integrity. For object storage, we used dual-replica encoding to further enhance data reliability. Although replica storage effectively ensures data redundancy and high availability, storage costs remain high due to processing petabyte-scale data and massive incremental data. In the future, we may consider further optimizing the storage scheme to reduce costs.  &lt;/p&gt;

&lt;p&gt;In addition, we found that using all-flash servers with JuiceFS did not bring significant performance improvements. The bottleneck mainly appeared in network bandwidth and latency. Therefore, we plan to consider using InfiniBand to connect storage servers and training servers to maximize resource utilization efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;When using GlusterFS, we could process at most 200,000 models per day. &lt;strong&gt;After switching to JuiceFS, the processing capacity increased significantly. Our daily data processing capacity has grown by 2.5 times. Small file throughput also improved notably. The system remained stable even when storage utilization reached 70%.&lt;/strong&gt; Furthermore, scaling became very convenient, whereas the previous architecture involved troublesome scaling processes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87p9qc0cqdskbjhm0hfi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87p9qc0cqdskbjhm0hfi.png" alt=" " width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, let's summarize the advantages JuiceFS has demonstrated in 3D generation tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small file performance:&lt;/strong&gt; Small file handling is a critical point, and JuiceFS provides an excellent solution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform features:&lt;/strong&gt; Cross-platform support is very important. We found that some data can only be opened in Windows software, so we need to process the same data on both Windows and Linux systems and perform read/write operations on the same mount point. This requirement makes cross-platform features particularly crucial, and JuiceFS' design addresses this well.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low operational cost:&lt;/strong&gt; JuiceFS' operational cost is extremely low. After configuration, only simple testing and node management (for example, discarding certain nodes and monitoring robustness) are needed. We spent about half a year migrating data and have not encountered major issues so far.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local cache mechanism:&lt;/strong&gt; Previously, to use local cache, we needed to manually implement local caching logic in our code. JuiceFS provides a very convenient local caching mechanism, optimizing performance for training scenarios by setting mount parameters.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low migration cost:&lt;/strong&gt; Especially when migrating small files, we found using JuiceFS for metadata and object storage migration to be convenient, saving us a lot of time and effort. In contrast, migrating with other storage systems was very painful.&lt;/li&gt;
&lt;/ul&gt;
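&lt;p&gt;To illustrate the kind of hand-written caching logic mentioned in the local cache item above, here is a minimal read-through cache sketch in Python. It is not our production code and not part of JuiceFS: the function name &lt;code&gt;cached_path&lt;/code&gt;, the file names, and the temporary directories standing in for shared storage and local NVMe are all hypothetical.&lt;/p&gt;

```python
import shutil
import tempfile
from pathlib import Path


def cached_path(remote_file, cache_dir):
    """Return a local copy of remote_file, copying it only on first access."""
    remote_file = Path(remote_file)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Simplification: files are keyed by name only, so two remote files
    # with the same name in different directories would collide.
    local_file = cache_dir / remote_file.name
    if not local_file.exists():
        shutil.copy2(remote_file, local_file)
    return local_file


# Demo: temporary directories stand in for shared storage and local NVMe.
with tempfile.TemporaryDirectory() as shared, tempfile.TemporaryDirectory() as nvme:
    source = Path(shared) / "model_0001.obj"
    source.write_text("v 0 0 0\n")
    first = cached_path(source, nvme)   # first access copies the file
    second = cached_path(source, nvme)  # second access reuses the local copy
    demo_ok = first == second and first.read_text() == "v 0 0 0\n"
```

&lt;p&gt;Even this small sketch has no eviction, size limit, or invalidation when the source file changes. Handling those cases in every training script is the maintenance burden that a mount-level cache removes.&lt;/p&gt;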

&lt;p&gt;In summary, JuiceFS performs excellently in large-scale data processing, providing an efficient and stable storage solution. It not only simplifies storage management and scaling but also significantly improves system performance. This allows us to focus more on advancing core tasks. In addition, the JuiceFS tools are very convenient. For example, we used the &lt;code&gt;sync&lt;/code&gt; tool for small file migration with extremely high efficiency. Without additional performance optimization, we successfully migrated 500 TB of data, including a massive number of small data and image files. It was done in less than 5 days, exceeding our expectations.  &lt;/p&gt;
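&lt;p&gt;As a rough sanity check on the migration figures above, the average transfer rate can be estimated as follows. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 1,000 GB) and a full 5 days of continuous transfer; the function name is hypothetical. Because the migration took less than 5 days, the real average was at least this high.&lt;/p&gt;

```python
def average_rate_gb_per_s(total_tb, days):
    """Average transfer rate in GB/s, using decimal units (1 TB = 1000 GB)."""
    seconds = days * 24 * 60 * 60
    return total_tb * 1000 / seconds


rate = average_rate_gb_per_s(500, 5)
print(f"at least {rate:.2f} GB/s on average")  # prints "at least 1.16 GB/s on average"
```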

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>opensource</category>
      <category>performance</category>
    </item>
    <item>
      <title>AI Data Storage: Challenges, Capabilities, and Comparative Analysis</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 19 Dec 2025 07:42:58 +0000</pubDate>
      <link>https://dev.to/daswu/ai-data-storage-challenges-capabilities-and-comparative-analysis-46n3</link>
      <guid>https://dev.to/daswu/ai-data-storage-challenges-capabilities-and-comparative-analysis-46n3</guid>
      <description>&lt;p&gt;&lt;em&gt;Note: This article was first published on &lt;a href="https://dzone.com/articles/ai-data-storage-challenges-capabilities-comparison" rel="noopener noreferrer"&gt;DZone&lt;/a&gt; and featured on its homepage.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The explosion in the popularity of ChatGPT has once again ignited a surge of excitement in the &lt;a href="https://en.wikipedia.org/wiki/Artificial_intelligence" rel="noopener noreferrer"&gt;AI&lt;/a&gt; world. Over the past five years, AI has advanced rapidly and has found applications in a wide range of industries. As a storage company, we’ve had a front-row seat to this expansion, watching more and more AI startups and established players emerge across fields like autonomous driving, protein structure prediction, and quantitative investment.&lt;br&gt;&lt;br&gt;
AI scenarios have introduced new challenges to the field of data storage. Existing storage solutions are often inadequate to fully meet these demands.&lt;/p&gt;

&lt;p&gt;In this article, we’ll dive deep into the storage challenges in AI scenarios, critical storage capabilities, and a comparative analysis of Amazon S3, Alluxio, Amazon EFS, Azure, GCP Filestore, Lustre, Amazon FSx for Lustre, GPFS, BeeGFS, and JuiceFS Cloud Service. I hope this post will help you make informed choices in AI and data storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage challenges for AI
&lt;/h2&gt;

&lt;p&gt;AI scenarios have brought new data patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/High-throughput" rel="noopener noreferrer"&gt;&lt;strong&gt;High throughput&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;data access challenges:&lt;/strong&gt; In AI scenarios, the growing use of GPUs by enterprises has outpaced the I/O capabilities of underlying storage systems. Enterprises require storage solutions that can provide high-throughput data access to fully leverage the computing power of GPUs. For instance, in smart manufacturing, where high-precision cameras capture images for defect detection models, the training dataset may consist of only 10,000 to 20,000 high-resolution images. Each image is several gigabytes in size, resulting in a total dataset size of 10 TB. If the storage system lacks the required throughput, it becomes a bottleneck during GPU training.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Managing storage for billions of files:&lt;/strong&gt; AI scenarios need storage solutions that can handle and provide quick access to datasets with billions of files. For example, in autonomous driving, the training dataset consists of small images, each several hundred kilobytes in size. A single training set comprises tens of millions of such images, and each image is treated as an individual file. The total training data amounts to billions or even 10 billion files. This creates a major challenge in effectively managing large numbers of small files.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalable throughput for hot data:&lt;/strong&gt; In areas like &lt;a href="https://en.wikipedia.org/wiki/Quantitative_analysis_(finance)" rel="noopener noreferrer"&gt;quantitative investing&lt;/a&gt;, financial market data is smaller compared to computer vision datasets. However, this data must be shared among many research teams, leading to hotspots where disk throughput is fully used but still cannot satisfy the application's needs. This shows that we need storage solutions that can handle a lot of hot data quickly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The basic computing environment has also changed a lot.&lt;br&gt;&lt;br&gt;
These days, with cloud computing and Kubernetes getting so popular, more and more AI companies are setting up their data pipelines on &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt;-based platforms. Algorithm engineers request resources on the platform, write code in Notebook to debug algorithms, use workflow engines like Argo and Airflow to plan data processing workflows, use Fluid to manage datasets, and use BentoML to deploy models into apps. &lt;a href="https://en.wikipedia.org/wiki/Cloud-native_computing" rel="noopener noreferrer"&gt;&lt;strong&gt;Cloud-native&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;technologies have become a standard consideration when building storage platforms.&lt;/strong&gt; As cloud computing matures, AI businesses are increasingly relying on large-scale distributed clusters. With a significant increase in the number of nodes in these clusters, &lt;strong&gt;storage systems face new challenges related to handling concurrent access from tens of thousands of Pods within Kubernetes clusters.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;IT professionals managing the underlying infrastructure face significant changes brought about by the evolving business scenarios and computing environments. Existing hardware-software coupled storage solutions often suffer from several pain points, such as no elasticity, no distributed high availability, and constraints on cluster scalability. &lt;a href="https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems" rel="noopener noreferrer"&gt;Distributed file systems&lt;/a&gt; like GlusterFS, CephFS, and those designed for HPC such as Lustre, BeeGFS, and GPFS, are typically designed for physical machines and bare-metal disks. While they can deploy large-capacity clusters, they cannot provide elastic capacity and flexible throughput, especially when dealing with storage demands on the order of tens of billions of files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key capabilities for AI data storage
&lt;/h2&gt;

&lt;p&gt;Considering these challenges, we’ll outline essential storage capabilities critical for AI scenarios, helping enterprises make informed decisions when selecting storage products.&lt;/p&gt;

&lt;h3&gt;
  
  
  POSIX compatibility and data consistency
&lt;/h3&gt;

&lt;p&gt;In the AI/ML domain, &lt;a href="https://en.wikipedia.org/wiki/POSIX" rel="noopener noreferrer"&gt;POSIX&lt;/a&gt; is the most common API for data access. Previous-generation distributed file systems, except HDFS, are also POSIX-compatible, but products on the cloud in recent years have not been consistent in their POSIX support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compatibility:&lt;/strong&gt; Users should not solely rely on the description "POSIX-compatible product" to assess compatibility. You can use pjdfstest and the Linux Test Project (LTP) framework for testing. We’ve done a &lt;a href="https://juicefs.com/en/blog/engineering/posix-compatibility-comparison-among-four-file-system-on-the-cloud" rel="noopener noreferrer"&gt;POSIX compatibility test of the cloud file system&lt;/a&gt; for your reference.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Strong data consistency guarantee:&lt;/strong&gt; This is fundamental to ensuring computational correctness. Storage systems have various consistency implementations, with object storage systems often adopting eventual consistency, while file systems typically adhere to strong consistency. Careful consideration is needed when selecting a storage system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User mode or kernel mode:&lt;/strong&gt; Early developers favored kernel mode due to its potential for optimized I/O operations. However, in recent years, we’ve witnessed a growing number of developers "escaping" from kernel mode for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kernel mode usage ties the file system client to specific kernel versions. GPU and high-performance network card drivers often require compatibility with specific kernel versions. This combination of factors places a significant burden on kernel version selection and maintenance.&lt;/li&gt;
&lt;li&gt;Exceptions in kernel-mode clients can potentially freeze the host operating system. This is highly unfavorable for Kubernetes platforms.&lt;/li&gt;
&lt;li&gt;The user-mode FUSE library has undergone continuous iterations, resulting in significant performance improvements. It has been well-supported among &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; customers for various business needs, such as autonomous driving perception model training and quantitative investment strategy training. This demonstrates that in AI scenarios, the user-mode FUSE library is no longer a performance bottleneck.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo95b0veaaifez30jbrw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo95b0veaaifez30jbrw.png" alt=" " width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Linear scalability of throughput
&lt;/h3&gt;

&lt;p&gt;Different file systems employ different principles for scaling throughput. Previous-generation distributed storage systems like GlusterFS, CephFS, the HPC-oriented Lustre, BeeGFS, and GPFS primarily use all-flash solutions to build their clusters. &lt;strong&gt;In these systems, peak throughput equals the total performance of the disks in the cluster. To increase cluster throughput, users must scale the cluster by adding more disks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;However, when users have imbalanced needs for capacity and throughput, &lt;strong&gt;traditional file systems require scaling the entire cluster, leading to capacity wastage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example, a 500 TB cluster built from 8 TB hard drives with 2 replicas needs 126 drives, each delivering 150 MB/s. The theoretical maximum throughput of the cluster is about 18.9 GB/s (126 × 150 MB/s). If the application demands 60 GB/s throughput, there are two options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Switch to 2 TB HDDs (150 MB/s each), which requires 504 drives&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Switch to 8 TB SATA SSDs (500 MB/s each) while keeping 126 drives&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first option quadruples the number of drives, which requires a corresponding increase in the number of cluster nodes. The second option, upgrading from HDDs to SSDs, also increases cost significantly.&lt;/p&gt;
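
&lt;p&gt;To make the arithmetic above concrete, here is a minimal Python sketch that computes usable capacity and theoretical peak throughput for the three configurations. The function name &lt;code&gt;cluster_profile&lt;/code&gt; is hypothetical, and the sketch assumes decimal units (1 GB = 1000 MB) and that throughput scales perfectly with drive count, which real clusters only approximate.&lt;/p&gt;

```python
def cluster_profile(drive_count, drive_tb, drive_mbs, replicas=2):
    """Return (usable capacity in TB, theoretical peak throughput in GB/s).

    Assumes every drive contributes its full bandwidth and that
    1 GB = 1000 MB; both are simplifications for illustration.
    """
    usable_tb = drive_count * drive_tb / replicas
    peak_gbs = drive_count * drive_mbs / 1000
    return usable_tb, peak_gbs


# Drive counts, sizes, and per-drive throughput are the figures quoted in the text.
configs = {
    "baseline: 126 x 8 TB HDD": (126, 8, 150),
    "option 1: 504 x 2 TB HDD": (504, 2, 150),
    "option 2: 126 x 8 TB SSD": (126, 8, 500),
}

for name, (count, size_tb, mbs) in configs.items():
    usable, peak = cluster_profile(count, size_tb, mbs)
    print(f"{name}: {usable:.0f} TB usable, {peak:.1f} GB/s peak")
```

&lt;p&gt;Under these assumptions, both options clear the 60 GB/s target on paper while usable capacity stays the same, which illustrates how throughput and capacity are coupled in such clusters.&lt;/p&gt;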

&lt;p&gt;As you can see, it’s difficult to balance capacity, performance, and cost. Capacity planning across these three dimensions is challenging because we cannot predict how the business will actually develop and change.&lt;/p&gt;

&lt;p&gt;Therefore, &lt;strong&gt;decoupling storage capacity from performance scaling would be a more effective approach for businesses to address these challenges. When we designed JuiceFS, we considered this requirement&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In addition, handling hot data is a common problem in AI scenarios. JuiceFS employs a cache grouping mechanism to automatically distribute hot data to different cache groups. This means that JuiceFS automatically creates multiple copies of hot data during computation to achieve higher disk throughput, and these cache spaces are automatically reclaimed after computation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing massive amounts of files
&lt;/h3&gt;

&lt;p&gt;Efficiently managing a very large number of files, such as 10 billion, places three demands on the storage system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Elastic scalability:&lt;/strong&gt; JuiceFS users typically grow from tens of millions of files to hundreds of millions and then to billions. Growth on this scale cannot be handled by simply adding a few machines. Storage clusters need to add nodes to achieve &lt;a href="https://www.virtana.com/glossary/what-is-horizontal-scaling/" rel="noopener noreferrer"&gt;horizontal scaling&lt;/a&gt;, enabling them to support business growth effectively.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data distribution during horizontal scaling:&lt;/strong&gt; During system scaling, data distribution rules based on directory name prefixes may lead to uneven data distribution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scaling complexity:&lt;/strong&gt; As the number of files increases, the ease of system scaling, stability, and the availability of tools for managing storage clusters become vital considerations. Some systems become more fragile as file numbers reach billions. Ease of management and high stability are crucial for business growth.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Concurrent load capacity and feature support in Kubernetes environments
&lt;/h3&gt;

&lt;p&gt;Some storage system specifications state a maximum limit for concurrent access, so users need to conduct stress testing based on their own workloads. As the number of clients grows, &lt;a href="https://en.wikipedia.org/wiki/Quality_of_service" rel="noopener noreferrer"&gt;quality of service&lt;/a&gt; (QoS) management is required, including traffic control for each client and temporary read/write blocking policies.&lt;/p&gt;

&lt;p&gt;We must also consider the design and supported features of the CSI driver in Kubernetes. Examples include how the mount process is deployed and whether the driver supports &lt;code&gt;ReadWriteMany&lt;/code&gt;, &lt;code&gt;subPath&lt;/code&gt; mounting, quotas, and hot updates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost analysis
&lt;/h3&gt;

&lt;p&gt;Cost is multifaceted. It covers hardware and software procurement as well as operational and maintenance expenses, which are often overlooked. As AI businesses scale, data volume grows significantly. Storage systems must scale in both capacity and throughput and be easy to adjust.&lt;/p&gt;

&lt;p&gt;In the past, the procurement and scaling of systems like Ceph, Lustre, and BeeGFS in data centers involved lengthy planning cycles. It took months for hardware to arrive, be configured, and become operational. Time costs, though frequently ignored, were often the largest expense. &lt;strong&gt;Storage systems that enable elastic capacity and performance adjustments equate to faster time-to-market.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another frequently underestimated cost is efficiency. In AI workflows, the data pipeline is extensive, involving multiple interactions with the storage system. Each step, from data collection, cleaning and conversion, labeling, feature extraction, training, and backtesting to production deployment, is affected by the storage system's efficiency.&lt;/p&gt;

&lt;p&gt;However, businesses typically utilize only a fraction (often less than 20%) of the entire dataset actively. This subset of hot data demands high performance, while warm or cold data may be infrequently accessed or not accessed at all. &lt;strong&gt;It’s difficult to satisfy both requirements in systems like Ceph, Lustre, and BeeGFS.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consequently, many teams adopt multiple storage systems to cater to diverse needs. A common strategy is to employ an &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; system for archival purposes to achieve large capacity and low costs. Object storage is not typically known for high performance, yet it often ends up handling data ingestion, preprocessing, and cleansing in the data pipeline. While this may not be the most efficient method for data preprocessing, it's often the pragmatic choice due to the sheer volume of data. Engineers then have to wait a substantial amount of time for the data to be transferred to the file storage system used for model training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Therefore, in addition to hardware and software costs of storage systems, total cost considerations should account for time costs invested in cluster operations (including procurement and supply chain management) and time spent managing data across multiple storage systems.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Storage system comparison&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here's a comparative analysis of the storage products mentioned earlier for your reference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;POSIX compatibility&lt;/th&gt;
&lt;th&gt;Elastic capacity&lt;/th&gt;
&lt;th&gt;Maximum supported file count&lt;/th&gt;
&lt;th&gt;Performance&lt;/th&gt;
&lt;th&gt;Cost (USD)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Amazon S3&lt;/td&gt;
&lt;td&gt;Partially compatible through S3FS&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Hundreds of billions&lt;/td&gt;
&lt;td&gt;Medium to Low&lt;/td&gt;
&lt;td&gt;About $0.02/GB/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Alluxio&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.alluxio.io/os/user/stable/en/api/POSIX-API.html#assumptions-and-limitations" rel="noopener noreferrer"&gt;Partial compatibility&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.alluxio.io/blog/store-1-billion-files-in-alluxio-20/" rel="noopener noreferrer"&gt;1 billion&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Depends on cache capacity&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud file storage service&lt;/td&gt;
&lt;td&gt;Amazon EFS&lt;/td&gt;
&lt;td&gt;NFSv4.1 compatible&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.aws.amazon.com/efs/latest/ug/performance.html" rel="noopener noreferrer"&gt;Depends on the data size. Throughput up to 3 GB/s, maximum 500 MB/s per client&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://aws.amazon.com/efs/pricing/?nc1=h_ls" rel="noopener noreferrer"&gt;$0.043~0.30/GB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Azure Files&lt;/td&gt;
&lt;td&gt;SMB &amp;amp; NFS for Premium&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/storage/files/storage-files-scale-targets" rel="noopener noreferrer"&gt;100 million&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Performance scales with data capacity. See &lt;a href="https://learn.microsoft.com/en-us/azure/storage/files/storage-files-scale-targets" rel="noopener noreferrer"&gt;details&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/storage/files/" rel="noopener noreferrer"&gt;$0.16/GiB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;GCP Filestore&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloud.google.com/architecture/filers-on-compute-engine#summary_of_file_server_options" rel="noopener noreferrer"&gt;NFSv3 compatible&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloud.google.com/filestore?hl=en#section-12" rel="noopener noreferrer"&gt;Maxmium 63.9 TB&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloud.google.com/filestore/docs/limits" rel="noopener noreferrer"&gt;Up to 67,108,864 files per 1 TiB capacity&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Performance scales with data capacity. See &lt;a href="https://cloud.google.com/filestore/docs/performance" rel="noopener noreferrer"&gt;details&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloud.google.com/filestore/pricing?hl=zh-cn" rel="noopener noreferrer"&gt;$0.36/GiB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lustre&lt;/td&gt;
&lt;td&gt;Lustre&lt;/td&gt;
&lt;td&gt;Compatible&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Depends on cluster disk count and performance&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Amazon FSx for Lustre&lt;/td&gt;
&lt;td&gt;Compatible&lt;/td&gt;
&lt;td&gt;Manual scaling, 1,200 GiB increments&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.amazonaws.cn/en/?nc1=h_ls" rel="noopener noreferrer"&gt;Multiple performance types of 50 MB~200 MB/s per 1 TB capacity&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://aws.amazon.com/fsx/lustre/pricing/" rel="noopener noreferrer"&gt;$0.073~0.6/GB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPFS&lt;/td&gt;
&lt;td&gt;GPFS&lt;/td&gt;
&lt;td&gt;Compatible&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;10 billion&lt;/td&gt;
&lt;td&gt;Depends on cluster disk count and performance&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;BeeGFS&lt;/td&gt;
&lt;td&gt;Compatible&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Billions&lt;/td&gt;
&lt;td&gt;Depends on cluster disk count and performance&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;JuiceFS Cloud Service&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Compatible&lt;/td&gt;
&lt;td&gt;Elastic capacity, no maximum limit&lt;/td&gt;
&lt;td&gt;10 billion&lt;/td&gt;
&lt;td&gt;Depends on cache capacity&lt;/td&gt;
&lt;td&gt;JuiceFS $0.02/GiB/month + AWS S3 $0.023/GiB/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Over the last decade, &lt;a href="https://en.wikipedia.org/wiki/Cloud_computing" rel="noopener noreferrer"&gt;cloud computing&lt;/a&gt; has rapidly evolved. Previous-generation storage systems designed for data centers couldn't harness the advantages brought by the cloud, notably elasticity. Object storage, a newcomer, offers unparalleled scalability, availability, and cost-efficiency. Still, it exhibits limitations in AI scenarios.&lt;/p&gt;

&lt;p&gt;File storage, on the other hand, presents invaluable benefits for AI and other computational use cases. Leveraging the cloud and its infrastructure efficiently to design the next-generation file storage system is a new challenge, and this is precisely what JuiceFS has been doing over the past five years.&lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>JuiceFS+MinIO: Ariste AI Achieved 3x Faster I/O and Cut Storage Costs by 40%+</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Mon, 15 Dec 2025 03:18:17 +0000</pubDate>
      <link>https://dev.to/daswu/juicefsminio-ariste-ai-achieved-3x-faster-io-and-cut-storage-costs-by-40-1016</link>
      <guid>https://dev.to/daswu/juicefsminio-ariste-ai-achieved-3x-faster-io-and-cut-storage-costs-by-40-1016</guid>
      <description>&lt;p&gt;&lt;a href="https://ariste.ai/" rel="noopener noreferrer"&gt;Ariste AI&lt;/a&gt; is a company specializing in AI-driven trading, with businesses covering proprietary trading, asset management, high-frequency market making, and other fields. In &lt;a href="https://en.wikipedia.org/wiki/Quantitative_analysis_(finance)" rel="noopener noreferrer"&gt;quantitative trading&lt;/a&gt; research, data read speed and storage efficiency often determine the speed of research iteration.  &lt;/p&gt;

&lt;p&gt;While building our quantitative research infrastructure, we faced market and factor data exceeding 500 TB in total. We went through four stages, starting with local disks and eventually choosing the &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; file system on top of &lt;a href="https://en.wikipedia.org/wiki/MinIO" rel="noopener noreferrer"&gt;MinIO&lt;/a&gt; object storage. Through caching mechanisms and a layered architecture, we achieved fast access to high-frequency data and centralized management. &lt;strong&gt;This practice validates the feasibility of the integrated solution of cache acceleration + elastic object storage + POSIX compatibility in quantitative scenarios.&lt;/strong&gt; We hope our experience can provide some reference for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage challenges in quantitative investment: balancing scale, speed, and collaboration
&lt;/h2&gt;

&lt;p&gt;The quantitative investment process consists of four layers in sequence: the data layer, the factor and signal layer, the strategy and position layer, and the execution and trading layer. Together they form a closed loop from data acquisition to trade execution.&lt;br&gt;&lt;br&gt;
Throughout this process, the storage system faced multiple challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data scale and growth rate:&lt;/strong&gt; Quantitative research requires processing a large volume of data, covering historical market data, news data, and self-calculated factor data. Currently, the total volume of this data is close to 500 TB. Furthermore, our company adds hundreds of gigabytes of new market data daily. Traditional local disks clearly cannot meet storage demands on this scale.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-frequency access and low-latency requirements:&lt;/strong&gt; High-frequency data access relies on low-latency data reads. The data read rate directly determines research efficiency. Faster data reads allow the research process to advance rapidly; conversely, slower reads lead to inefficient research.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-team parallelism and data management:&lt;/strong&gt; During quantitative research, multiple teams often conduct different experiments simultaneously. To ensure the independence and data security of each team's research work, secure isolation is necessary to avoid data confusion and leakage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To address the data storage needs of the entire quantitative process and build a future-proof storage system, we wanted to achieve high performance, easy scalability, and management capability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High performance:&lt;/strong&gt; Single-node read/write bandwidth should exceed 500 MB/s, and access latency should be low enough that users perceive no difference from a local disk.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy scalability:&lt;/strong&gt; The solution should support on-demand horizontal scaling of storage and computing resources, enabling smooth elastic scaling without application modification.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Management capability:&lt;/strong&gt; The solution should provide one-stop management capabilities for fine-grained permission control, operation auditing, and data lifecycle policies.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Evolution of the storage architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stage 1: Local disk
&lt;/h3&gt;

&lt;p&gt;In the initial phase of the project, we adopted the QuantraByte research framework. It had a built-in &lt;a href="https://en.wikipedia.org/wiki/Exchange-traded_fund" rel="noopener noreferrer"&gt;exchange-traded fund&lt;/a&gt; (ETF) module allowing data to be stored directly on local disks. This resulted in fast data read speeds. Researchers could directly run the data they needed, and the iteration process was quick.&lt;br&gt;&lt;br&gt;
However, this stage had some issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resource waste from repeated downloads:&lt;/strong&gt; Multiple researchers downloading the same data led to redundant efforts.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insufficient storage capacity:&lt;/strong&gt; Research servers had limited storage capacity, only about 15 TB. This could not meet growing data storage needs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaboration difficulties:&lt;/strong&gt; Reusing other researchers' results was inconvenient.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stage 2: MinIO centralized management
&lt;/h3&gt;

&lt;p&gt;To solve the problems of the first stage, we introduced MinIO for centralized management. All stored data was centralized on MinIO, with a split-out module handling all data ingestion. Specific factor data was also stored in MinIO. This enabled unified downloads of public data. Permission isolation facilitated multi-team data sharing and improved storage space utilization.  &lt;/p&gt;

&lt;p&gt;However, new bottlenecks emerged in this stage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High latency for high-frequency random reads:&lt;/strong&gt; I/O latency during high-frequency random access slowed down data reads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow reads/writes due to lack of cache:&lt;/strong&gt; Since the MinIO community edition lacked caching, reading and writing high-frequency public data was slow.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stage 3: Introducing JuiceFS for cache acceleration
&lt;/h3&gt;

&lt;p&gt;To address the above bottlenecks, after thorough research, we finally introduced JuiceFS’ &lt;a href="https://juicefs.com/docs/community/guide/cache" rel="noopener noreferrer"&gt;cache&lt;/a&gt; acceleration solution. This solution involved mounting via client-side local RAID5 storage. &lt;strong&gt;With an efficient caching mechanism, it improved read/write performance by about three times. This significantly enhanced the access experience for high-frequency shared data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5qmagek7lw7rkacjfry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5qmagek7lw7rkacjfry.png" alt=" " width="800" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As application data volume surpassed 300 TB, the scaling limitations of local storage became apparent. Since data was stored locally, scaling required reconfiguring storage devices. Scaling under a RAID5 architecture was slow and risky, making it difficult to meet the needs of continuous application growth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 4: JuiceFS + MinIO cluster
&lt;/h3&gt;

&lt;p&gt;To solve the scaling challenge, we ultimately adopted the JuiceFS + MinIO cluster architecture. This solution offers the following advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sustained high performance:&lt;/strong&gt; JuiceFS provides good caching capability, fully meeting the performance demands of high-frequency data access scenarios.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy cluster scaling:&lt;/strong&gt; Based on the clustered solution, we quickly achieved horizontal scaling. Simply by adding disks of the same type, we can flexibly increase storage capacity. This greatly enhances system scalability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq08kj4ctgkxho7l6jz98.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq08kj4ctgkxho7l6jz98.png" alt=" " width="800" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Through this four-stage evolution, we validated the feasibility of the integrated solution combining cache acceleration, elastic &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt;, and &lt;a href="https://en.wikipedia.org/wiki/POSIX" rel="noopener noreferrer"&gt;POSIX&lt;/a&gt; compatibility in quantitative scenarios.&lt;/strong&gt; This solution can provide the industry with a replicable, implementable best practice template, achieving an excellent balance between performance, cost, and management.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance and cost benefits
&lt;/h2&gt;

&lt;p&gt;By adopting the combined storage architecture of JuiceFS and MinIO, our system bandwidth and resource utilization efficiency greatly improved. Now they fully meet the storage performance requirements of the research application. After introducing the JuiceFS cache layer, backtesting task execution efficiency increased dramatically. &lt;strong&gt;The time required for backtesting 100 million entries of tick data was reduced from hours to tens of minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foigxqmdzzmq1oxltgeex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foigxqmdzzmq1oxltgeex.png" alt=" " width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We implemented a tiered storage strategy for managing the data lifecycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hot data (0-90 days): This tier handles data that is accessed frequently. To ensure maximum performance, it’s automatically cached on local SSDs.
&lt;/li&gt;
&lt;li&gt;Warm data (90-365 days): Data with medium access frequency resides here. It’s stored on MinIO's standard object storage drives, striking an optimal balance between cost and performance.
&lt;/li&gt;
&lt;li&gt;Cold data (&amp;gt;365 days): This tier is for rarely accessed, archival data. It’s automatically migrated to a low-frequency access storage layer, which is compatible with the S3 Glacier strategy. &lt;/li&gt;
&lt;/ul&gt;
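
&lt;p&gt;The age-based tiering rule above can be sketched in a few lines of Python. This is only an illustration, not our production policy engine: the function name &lt;code&gt;storage_tier&lt;/code&gt; is hypothetical, and because the hot and warm ranges both mention day 90, the sketch assumes that day 90 still counts as hot and day 365 still counts as warm.&lt;/p&gt;

```python
from bisect import bisect_left

# Upper bounds (in days) of the hot and warm tiers, taken from the list above.
TIER_BOUNDS_DAYS = [90, 365]
TIER_NAMES = ["hot", "warm", "cold"]


def storage_tier(age_days):
    """Map a data age in days to a tier name.

    bisect_left returns 0 for ages up to and including 90,
    1 for ages up to and including 365, and 2 for anything older.
    Treating the boundary days this way is an assumption.
    """
    return TIER_NAMES[bisect_left(TIER_BOUNDS_DAYS, age_days)]


for age in (7, 90, 200, 365, 400):
    print(age, storage_tier(age))
```

&lt;p&gt;Keeping the bounds in one list means a change to the retention windows touches a single line.&lt;/p&gt;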

&lt;p&gt;Based on this tiered storage strategy, we achieved a smooth transition from higher to lower storage unit costs. &lt;strong&gt;Our overall storage costs were reduced by over 40%.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Multi-tenant management
&lt;/h3&gt;

&lt;p&gt;Regarding data isolation and permission management, we’ve established a comprehensive management system:&lt;br&gt;&lt;br&gt;
Logical isolation is achieved through namespaces, using path planning like &lt;code&gt;/factor/A&lt;/code&gt; and &lt;code&gt;/factor/B&lt;/code&gt; to ensure clear data boundaries for each application. For permission control, we implemented fine-grained management across three dimensions: user, team, and project. They seamlessly integrate with the POSIX &lt;a href="https://en.wikipedia.org/wiki/Access-control_list" rel="noopener noreferrer"&gt;ACL&lt;/a&gt; permission system.  &lt;/p&gt;

&lt;p&gt;We’ve also established a complete audit log system. It enables real-time tracking of access behaviors and historical change backtracking. This fully meets compliance requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability and automated operations
&lt;/h3&gt;

&lt;p&gt;We built a complete monitoring system around four critical metrics: cache hit rate, I/O throughput, I/O latency, and write retry rate. The system automatically triggers alerts when metrics are abnormal.  &lt;/p&gt;

&lt;p&gt;We implemented closed-loop operations management based on Grafana to continuously monitor node health and storage capacity. Before each scaling operation, we ran simulated stress tests to verify system capacity and ensure no application impact. The overall operations system is designed to be automated, predictable, and capable of rollback.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data update design in the backtesting system
&lt;/h3&gt;

&lt;p&gt;In our &lt;a href="https://en.wikipedia.org/wiki/Backtesting" rel="noopener noreferrer"&gt;backtesting&lt;/a&gt; system design, we adopted an architecture based on directed acyclic graphs (DAGs) to improve computational efficiency and maintainability. This framework centers on computational nodes and dependency relationships, abstracting data processing, feature calculation, signal generation, and other steps into nodes, all managed uniformly through a dependency graph. The system has a built-in version control mechanism. When data versions are updated, the dependency graph automatically identifies affected nodes, precisely locating parts that need recalculation, thereby enabling efficient incremental updates and result traceability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjqfdkkymz5sg19svxpf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjqfdkkymz5sg19svxpf.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Plans for the future
&lt;/h2&gt;

&lt;p&gt;Going forward, we’ll continue to optimize the storage architecture in three areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metadata high availability upgrade:&lt;/strong&gt; We plan to migrate metadata storage from Redis to TiKV or PostgreSQL to build a cross-data-center high-availability architecture. This can significantly improve our system disaster recovery and rapid recovery capabilities.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/learn/what-is-hybrid-cloud" rel="noopener noreferrer"&gt;&lt;strong&gt;Hybrid cloud&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;tiered storage:&lt;/strong&gt; By integrating with public cloud S3 and Glacier storage services, we aim to build an intelligent hot/cold tiering system to achieve unlimited storage elasticity while optimizing costs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified management for the research data lake:&lt;/strong&gt; We’ll build a unified research data lake platform, integrating core services such as schema registration, automatic data cleansing, and unified catalog management. We hope to comprehensively improve the discoverability and management efficiency of data assets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
