<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: DASWU</title>
    <description>The latest articles on DEV Community by DASWU (@daswu).</description>
    <link>https://dev.to/daswu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F973459%2F8329f19d-864e-41c4-bb30-a5a0d23e63ce.jpeg</url>
      <title>DEV Community: DASWU</title>
      <link>https://dev.to/daswu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/daswu"/>
    <language>en</language>
    <item>
      <title>How D-Robotics Manages Massive Small Files in a Multi-Cloud Environment with JuiceFS</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 06 Mar 2026 06:44:16 +0000</pubDate>
      <link>https://dev.to/daswu/how-d-robotics-manages-massive-small-files-in-a-multi-cloud-environment-with-juicefs-5a3f</link>
      <guid>https://dev.to/daswu/how-d-robotics-manages-massive-small-files-in-a-multi-cloud-environment-with-juicefs-5a3f</guid>
      <description>&lt;p&gt;&lt;a href="https://en.d-robotics.cc/" rel="noopener noreferrer"&gt;D-Robotics&lt;/a&gt;, founded in 2024 and spun off from &lt;a href="https://en.wikipedia.org/wiki/Horizon_Robotics" rel="noopener noreferrer"&gt;Horizon Robotics&lt;/a&gt;' robotics division, specializes in the research and development of foundational computing platforms for consumer-grade robots. In 2025, we released an &lt;a href="https://www.nvidia.com/en-us/glossary/embodied-ai/" rel="noopener noreferrer"&gt;embodied AI&lt;/a&gt; foundation model.  &lt;/p&gt;

&lt;p&gt;In robot data management, training, and inference, the sheer volume of data is immense. Using object storage presents challenges such as handling small files and managing multi-cloud data. After trying several approaches, including replacing our private MinIO deployment with SSD-backed storage, we still struggled to address these challenges. Ultimately, we selected &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; as our core storage solution.  &lt;/p&gt;

&lt;p&gt;JuiceFS' inherent adaptability for cross-cloud operations efficiently supports data sharing needs in multi-cloud environments. In training scenarios, JuiceFS' cache mechanism, designed with small files in mind, effectively replaces traditional caching solutions while striking a practical balance between cost and efficiency, fully meeting our storage performance requirements. Currently, we manage tens of millions of files.  &lt;/p&gt;

&lt;p&gt;In this article, we’ll share our application characteristics, storage pain points, solution selection, implementation practices, and production tuning experiences. We hope our experience offers useful insights for those facing similar challenges in the industry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage pain points in the robotics industry
&lt;/h2&gt;

&lt;p&gt;The cloud platform serves as our core technical hub, undertaking key application functions such as simulation environment setup, data generation and &lt;a href="https://www.ibm.com/think/topics/model-training" rel="noopener noreferrer"&gt;model training&lt;/a&gt;, model lightweighting and deployment, and visual verification. The data types involved in the platform are diverse, mainly including sensor image data, LiDAR point cloud data, model weights and configuration data, motor operational data, and map construction data.  &lt;/p&gt;

&lt;p&gt;While &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; meets basic storage needs for massive data, its performance limitations become particularly obvious when handling the massive small files frequently encountered in robotics applications. Our storage system faced four challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metadata performance bottleneck with massive small files:&lt;/strong&gt; Robot model training involves tens of millions to billions of sensor images, LiDAR data, and model files. Traditional object storage (like standard S3) exhibits significant metadata operation bottlenecks at this scale. The fixed API latency for routine operations like listing files or retrieving attributes is typically 10–30 ms. This directly constrains queries per second (QPS) performance during training and inference and impacts overall R&amp;amp;D efficiency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inefficient &lt;a href="https://en.wikipedia.org/wiki/Multicloud" rel="noopener noreferrer"&gt;multi-cloud&lt;/a&gt; collaboration and data flow:&lt;/strong&gt; As robotics companies increasingly adopt multi-cloud architectures for their R&amp;amp;D and production applications, ensuring efficient data synchronization and sharing across different cloud platforms and geographical regions has become a common challenge for the industry. Traditional storage solutions typically suffer from low cross-cloud data transfer efficiency and are often deeply integrated with a single cloud provider. This leads to technical lock-in and makes it difficult to achieve flexible cross-cloud deployment and data collaboration.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The impossible trinity of performance, cost, and operations:&lt;/strong&gt; High-performance parallel file systems offer high throughput and low latency but typically rely on all-flash arrays or dedicated hardware. This leads to high hardware investment and ongoing operational costs, plus complex deployment. Low-cost object storage offers good elasticity but struggles to support the high-throughput I/O demands of GPU clusters in AI training scenarios. A common industry workaround is using a high-speed file system as a cache synchronized with S3. However, the extra data synchronization steps significantly reduce usability and fail to achieve efficient storage-compute synergy.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Difficulty in dataset version management:&lt;/strong&gt; The rapid iteration cycle of robot models requires efficient and granular management of multiple dataset versions. Using physical copies for version control directly leads to exponentially higher underlying storage consumption, significantly increasing costs. Moreover, the difficulty of retrieving, reusing, and maintaining multi-version data also increases substantially.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Storage selection: JuiceFS vs. MinIO/S3 vs. PFS
&lt;/h2&gt;

&lt;p&gt;To address these storage challenges, we established a clear evaluation framework for storage selection. A comprehensive comparative test was conducted on mainstream storage solutions across seven core dimensions: storage architecture, protocol compatibility, metadata performance, scalability, multi-cloud adaptability, cost efficiency, and operational complexity.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Comparison basis&lt;/th&gt;
&lt;th&gt;JuiceFS&lt;/th&gt;
&lt;th&gt;MinIO / Public Cloud S3&lt;/th&gt;
&lt;th&gt;CephFS / Public Cloud FS (CPFS)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage architecture&lt;/td&gt;
&lt;td&gt;Separation of metadata and data&lt;/td&gt;
&lt;td&gt;Unified object storage&lt;/td&gt;
&lt;td&gt;Metadata and data typically coupled, often with kernel-space parallel design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protocol support&lt;/td&gt;
&lt;td&gt;Full compatibility: POSIX, HDFS, S3 API, Kubernetes CSI&lt;/td&gt;
&lt;td&gt;Primarily S3 API, with weak POSIX compatibility&lt;/td&gt;
&lt;td&gt;POSIX-oriented; HDFS or S3 compatibility often requires plugins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metadata performance&lt;/td&gt;
&lt;td&gt;Very high: sub-millisecond latency, supports hundreds of billions of files per volume&lt;/td&gt;
&lt;td&gt;Lower: high metadata overhead for massive small files; API call overhead about 10–30 ms&lt;/td&gt;
&lt;td&gt;Medium to high: performance bottlenecks and complexity challenges at ultra-large scale (100M+ files)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;High: horizontal scaling, supports tens to hundreds of billions of files per volume&lt;/td&gt;
&lt;td&gt;High: near-infinite storage capacity, but small-file management efficiency degrades with scale&lt;/td&gt;
&lt;td&gt;Moderate: scaling limited by metadata nodes; operational complexity grows exponentially with scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-cloud adaptability&lt;/td&gt;
&lt;td&gt;Native support&lt;/td&gt;
&lt;td&gt;Relies on sync tools; cross-cloud data flow inefficient; global unified view difficult&lt;/td&gt;
&lt;td&gt;Limited: often tightly bound to specific hardware or cloud provider; cross-cloud deployment is complex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost efficiency&lt;/td&gt;
&lt;td&gt;High performance-to-cost ratio&lt;/td&gt;
&lt;td&gt;Mixed: storage itself is cheap, but low GPU utilization in high-throughput scenarios like AI training raises the effective cost&lt;/td&gt;
&lt;td&gt;Low: often requires all-flash arrays or dedicated hardware, plus high operational labor costs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Based on the comparison results above, JuiceFS demonstrates significant advantages in core performance, scalability, multi-cloud adaptability, and cost efficiency. This makes it the preferred choice for our unified storage solution.&lt;/p&gt;

&lt;p&gt;Furthermore, JuiceFS has been widely adopted in the &lt;a href="https://juicefs.com/en/blog?tag=AI%20storage" rel="noopener noreferrer"&gt;autonomous driving&lt;/a&gt; industry. Leading companies such as Horizon Robotics have leveraged JuiceFS to manage data at the exabyte scale. This demonstrates its maturity and effectiveness in large-scale production environments.&lt;/p&gt;

&lt;p&gt;For our specific application scenarios, JuiceFS offers the following core technical advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decoupled architecture:&lt;/strong&gt; JuiceFS adopts a metadata-data separation architecture, persisting data in cost-effective object storage (like S3 or OSS) while storing metadata in databases like Redis or TiKV. This decoupled design enables elastic storage scaling and reduces dependence on any single cloud provider.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunking and caching mechanisms:&lt;/strong&gt; JuiceFS &lt;a href="https://juicefs.com/docs/community/architecture#how-juicefs-store-files" rel="noopener noreferrer"&gt;uses chunks, slices, and blocks&lt;/a&gt; to significantly improve small file read efficiency and enhance concurrent read/write performance. In addition, multi-level caching (memory, local SSD, distributed cache) reduces access latency for hot data. This meets the demands of high-throughput training workloads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud-native adaptability:&lt;/strong&gt; By providing a &lt;a href="https://juicefs.com/docs/csi/introduction" rel="noopener noreferrer"&gt;CSI Driver&lt;/a&gt;, JuiceFS delivers persistent storage decoupled from compute nodes in Kubernetes environments, supporting stateless container deployment and cross-cloud migration. It enables data sharing, enhances application high availability and flexibility, and adapts to various Kubernetes deployment methods.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full-stack support for AI training:&lt;/strong&gt; JuiceFS fully supports POSIX, HDFS, and S3 API, and is compatible with mainstream AI frameworks such as PyTorch and TensorFlow. It can be integrated without code modifications, lowering the technical barrier for adoption.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-cloud support:&lt;/strong&gt; Its cross-cloud capabilities and high-performance metadata engine ensure efficient data flow, perfectly aligning with our strategy of "computing power on demand."&lt;/li&gt;
&lt;/ul&gt;
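
&lt;p&gt;As a concrete illustration of the decoupled architecture and protocol support, the following Community Edition sketch creates and mounts a volume. The bucket URL, Redis address, volume name, and mount point are placeholders, not our production values:&lt;/p&gt;

```shell
# Create a volume whose data lives in object storage (S3 here) and
# whose metadata lives in a Redis database.
juicefs format \
    --storage s3 \
    --bucket https://mybucket.s3.amazonaws.com \
    redis://redis-host:6379/1 \
    robodata

# Mount it as a POSIX file system; training frameworks then read
# data as ordinary local paths under /mnt/jfs.
juicefs mount redis://redis-host:6379/1 /mnt/jfs
```

&lt;p&gt;Once mounted, PyTorch or TensorFlow data loaders see ordinary file paths, which is why no code modifications are needed for adoption.&lt;/p&gt;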

&lt;p&gt;&lt;strong&gt;From a cost perspective, JuiceFS does not offer a significant cost advantage in the early stages of small-scale deployment. However, when data volume reaches the petabyte level—especially at the 10 PB or 100 PB scale—and is compared against all-flash storage solutions, its cost-efficient architecture built on object storage becomes fully evident.&lt;/strong&gt; In addition, JuiceFS requires minimal operational overhead. Currently, we need only one engineer to manage the entire cloud platform and storage system, a fraction of the personnel required by traditional solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Community Edition to Enterprise Edition: addressing larger-scale scenarios
&lt;/h2&gt;

&lt;p&gt;As our application continued to expand, we encountered limitations when using Redis as the &lt;a href="https://juicefs.com/docs/community/databases_for_metadata/" rel="noopener noreferrer"&gt;metadata engine&lt;/a&gt;: physical memory capacity constrained data scalability. When the number of files approached one hundred million, metadata query latency increased significantly, which hurt the concurrency efficiency of training tasks. After we used the clone feature, the metadata volume grew substantially. In addition, cross-cloud scenarios placed higher demands on metadata synchronization and mirror file system capabilities, and we required more granular capacity controls and permission management at the directory level.  &lt;/p&gt;

&lt;p&gt;Considering these requirements—along with our desire to leverage local SSDs on GPU nodes to build a distributed cache layer for improved performance—we decided to deploy &lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;JuiceFS Enterprise Edition&lt;/a&gt; in parallel, migrating core scenarios such as ultra-large-scale directory management and multi-node collaborative training to this version. Through this scenario-based approach, we’ve effectively enhanced the adaptability of our overall storage system and established a solid foundation for future application growth. Below are the key features of the Enterprise Edition that we’ve applied in real-world scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  High-performance metadata engine: solving the bottleneck of large-scale directory retrieval
&lt;/h3&gt;

&lt;p&gt;For high-frequency operations such as traversing directories with hundreds of millions of files and deep pagination queries, we previously encountered the "slower as you query" problem with traditional storage solutions. When the number of files in a single directory exceeded 10 million, and the pagination offset surpassed 100,000 entries, response latency would spike from hundreds of milliseconds to several seconds. This severely impacted data filtering efficiency.  &lt;/p&gt;

&lt;p&gt;After switching to JuiceFS Enterprise Edition, its native tree-structured metadata storage architecture played a key role. Unlike flat key-value metadata storage, which keeps file metadata in no particular order, the tree structure allows direct navigation to the target directory level, narrowing the scope of metadata scans. In our actual tests, deep pagination queries (with an offset of 500,000 entries) in a directory containing 120 million files saw latency drop from 3.8 seconds to just 210 milliseconds. This fully met the retrieval needs of large-scale datasets. In addition, this engine supports storing hundreds of billions of files per volume, and we’ve already used it to manage three petabyte-scale training datasets stably, aligning with our application growth expectations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enterprise-grade distributed cache: improving data sharing efficiency in multi-node, multi-GPU training
&lt;/h3&gt;

&lt;p&gt;In multi-node, multi-GPU training scenarios, we previously faced challenges such as low cache hit rates and cross-node bandwidth congestion. The open-source version only supports local caching on each node. This means that when multiple nodes pull the same dataset simultaneously, each node must access object storage independently. This resulted in single-node bandwidth utilization exceeding 90%, with average training job startup delays of up to 20 minutes.  &lt;/p&gt;

&lt;p&gt;With JuiceFS Enterprise Edition's &lt;a href="https://juicefs.com/docs/cloud/guide/distributed-cache/" rel="noopener noreferrer"&gt;distributed caching&lt;/a&gt; feature, we set up a distributed cache across a 12-node training cluster using just three commands. The dataset only needs to be pulled from object storage once and is cached in a pool built from local SSDs across the nodes. As a result, &lt;strong&gt;the cache hit rate for multi-node collaborative training increased from 45% to 92%, cross-node bandwidth utilization dropped to below 15%, and training job startup time was reduced to under three minutes&lt;/strong&gt;. This significantly improved compute utilization.&lt;/p&gt;
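
&lt;p&gt;The exact setup depends on the deployment, but in the Enterprise Edition a distributed cache group is declared at mount time. A minimal sketch, assuming the &lt;code&gt;--cache-group&lt;/code&gt; mount option; the volume name, mount point, group name, and cache directory are illustrative:&lt;/p&gt;

```shell
# Run the same mount on each of the training nodes. Nodes sharing a
# cache group pool their local SSD cache directories, so a dataset
# pulled from object storage by one node can be served to the others.
juicefs mount myvolume /mnt/jfs \
    --cache-group training-cluster \
    --cache-dir /ssd/jfscache
```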

&lt;h3&gt;
  
  
  Enhanced cross-cloud collaboration: building a low-operational-cost cross-cloud data foundation
&lt;/h3&gt;

&lt;p&gt;Since our R&amp;amp;D environments are distributed across two cloud environments, we previously encountered challenges with &lt;strong&gt;slow cross-cloud data synchronization and high operational costs&lt;/strong&gt;. Using traditional synchronization tools to maintain data consistency between the two clouds required configuring eight scheduled tasks, with an average synchronization delay of four hours, and dedicated personnel needed to investigate sync failures weekly.  &lt;/p&gt;

&lt;p&gt;By using the JuiceFS sync tool combined with our internal AI operations tools, we achieved automated configuration of synchronization policies. The system automatically adjusts sync priorities based on data heat levels, keeping cross-cloud data latency within 10 minutes. In addition, tasks such as failure retries and log alerts for synchronization are fully automated, eliminating the need for dedicated monitoring. &lt;strong&gt;This has reduced operational overhead by 70%&lt;/strong&gt;, and we now stably support multiple training projects across two cloud platforms sharing the same dataset. Going forward, we plan to use the Enterprise Edition's mirror file system feature to further enhance cross-cloud data collaboration.&lt;/p&gt;
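
&lt;p&gt;Our synchronization policies are driven by internal tooling, but the underlying transfers use the standard &lt;code&gt;juicefs sync&lt;/code&gt; command. A sketch with placeholder buckets and credentials:&lt;/p&gt;

```shell
# One-way synchronization of a dataset prefix between buckets on two
# clouds. --update copies only objects whose source copy is newer;
# --threads raises concurrency, which helps with many small files.
juicefs sync --update --threads 32 \
    s3://ACCESS_KEY:SECRET_KEY@bucket-a.s3.amazonaws.com/dataset/ \
    oss://ACCESS_KEY:SECRET_KEY@bucket-b.oss-cn-beijing.aliyuncs.com/dataset/
```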

&lt;h2&gt;
  
  
  JuiceFS optimization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Client cache and write performance tuning
&lt;/h3&gt;

&lt;p&gt;Caching strategies must be compatible with Kubernetes resource limits. For example, using memory as the local cache path with an improper configuration may lead to abnormal memory growth in the Mount Pod, and insufficient resource reservations may cause checkpoint loss or file-handle write exceptions during long-running training tasks.  &lt;/p&gt;

&lt;p&gt;Regarding write performance tuning, enabling writeback mode can improve small file write throughput to some extent. However, considering production environment requirements for data consistency, we still adopt write-through synchronous mode to reduce data risks in extreme crash scenarios. It’s recommended to cautiously enable writeback mode only in scenarios with lower data reliability requirements, such as temporary computing or offline data cleaning, based on actual needs.&lt;/p&gt;
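
&lt;p&gt;Writeback is a per-mount decision, so both modes can coexist on the same volume. A sketch with a placeholder metadata URL and mount point:&lt;/p&gt;

```shell
# Default mount: synchronous (write-through) uploads, safest for
# training checkpoints and other data that must survive a crash.
juicefs mount redis://redis-host:6379/1 /mnt/jfs

# Separate mount with writeback enabled: writes land in the local
# cache first and upload asynchronously. Use only for scratch data
# that tolerates loss, such as temporary computing or offline cleaning.
juicefs mount redis://redis-host:6379/1 /mnt/jfs-scratch --writeback
```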

&lt;h3&gt;
  
  
  Deployment and network topology optimization
&lt;/h3&gt;

&lt;p&gt;For more stable performance, we strongly recommend deploying the metadata engine and compute nodes within the same region. In actual operations, we observed that cross-region deployment could increase metadata operation latency severalfold to tenfold, which significantly impacted I/O-intensive operations such as data decompression. Deploying metadata services and GPU computing resources within the same region maintains performance while controlling network transmission costs, improving overall resource utilization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data warm-up and cache optimization
&lt;/h3&gt;

&lt;p&gt;In a 10-gigabit network environment, using JuiceFS' data &lt;a href="https://juicefs.com/docs/cloud/reference/command_reference/#warmup" rel="noopener noreferrer"&gt;warm-up&lt;/a&gt; and tuning data block sizes to the application makes fuller use of network bandwidth and improves read throughput. Combined with the distributed cache architecture, this enhances data-sharing efficiency in multi-node concurrent scenarios and raises cache hit rates during high-concurrency reads, optimizing the overall performance of large-scale AI training tasks.&lt;/p&gt;
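
&lt;p&gt;A warm-up sketch; the dataset path and thread count are illustrative:&lt;/p&gt;

```shell
# Prefetch a dataset into the cache before the training job starts,
# so the first epoch does not pay object-storage latency.
# -p sets the number of concurrent download threads.
juicefs warmup -p 8 /mnt/jfs/datasets/lidar-2025q3
```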

&lt;h3&gt;
  
  
  Resource quotas and high availability guarantee
&lt;/h3&gt;

&lt;p&gt;In enterprise-level multi-role operations and storage responsibility separation scenarios, to avoid operational risks caused by inconsistent configurations, it’s recommended to finely control resource quotas for &lt;a href="https://juicefs.com/docs/csi/introduction/" rel="noopener noreferrer"&gt;JuiceFS CSI Driver&lt;/a&gt; in Kubernetes environments. By appropriately setting CPU and memory request/limit for Mount Pods, Pod restarts or node anomalies caused by resource preemption can be reduced. In practice, resource reservation ratios can be dynamically adjusted based on cluster load.  &lt;/p&gt;
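
&lt;p&gt;One way to express these quotas is through StorageClass parameters. A sketch assuming the &lt;code&gt;juicefs/mount-*&lt;/code&gt; parameter keys described in the CSI Driver documentation; the secret names, namespace, and resource values are illustrative and should be checked against your CSI Driver version:&lt;/p&gt;

```shell
# Declare Mount Pod CPU/memory requests and limits in the
# StorageClass, so every volume provisioned from it gets
# consistent resource reservations.
kubectl apply -f - &lt;&lt;'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-sc
provisioner: csi.juicefs.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
  juicefs/mount-cpu-request: "1"
  juicefs/mount-cpu-limit: "4"
  juicefs/mount-memory-request: "1Gi"
  juicefs/mount-memory-limit: "8Gi"
EOF
```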

&lt;p&gt;In addition, for scenarios with high application continuity requirements, the automatic mount point recovery feature for Mount Pods can be enabled to achieve automated fault recovery for storage services, further ensuring underlying storage stability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-tenancy
&lt;/h3&gt;

&lt;p&gt;We provide independent &lt;a href="https://en.wikipedia.org/wiki/File_system" rel="noopener noreferrer"&gt;file systems&lt;/a&gt; and storage buckets for large enterprise customers, while achieving isolation for small and medium-sized enterprises and end users through subdirectory-level directory isolation and permission control.  &lt;/p&gt;

&lt;p&gt;Large enterprises can flexibly scale throughput and capacity, avoiding performance bottlenecks associated with shared storage buckets. For small and medium-sized enterprises and end users, we ensure data security and independence through subdirectory isolation and permission control, while enabling accurate metering and billing.  &lt;/p&gt;

&lt;p&gt;This architecture ensures tenant isolation while flexibly allocating resources, improving system management efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Version management
&lt;/h3&gt;

&lt;p&gt;Using the &lt;code&gt;juicefs clone&lt;/code&gt; command, copies of original datasets can be quickly created and modified independently without affecting the source data. The clone operation copies only file metadata; the underlying data blocks are shared, and only subsequent changes consume additional storage space. This feature supports managing multiple dataset versions, facilitating rollback and recovery while ensuring data security and version control.&lt;/p&gt;
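
&lt;p&gt;A sketch of the versioning workflow; the paths are illustrative:&lt;/p&gt;

```shell
# Create a metadata-only copy of a dataset for the next training
# round. Both paths must be inside the same JuiceFS mount; the clone
# shares data blocks with the source, so it completes quickly and
# only later modifications consume extra space.
juicefs clone /mnt/jfs/datasets/v1 /mnt/jfs/datasets/v2
```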

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;JuiceFS' characteristics in metadata performance, scalability, cross-cloud adaptability, and comprehensive cost efficiency have made it our choice for building a unified storage layer. Currently, we adopt both JuiceFS Community Edition and Enterprise Edition to accommodate different storage requirements across various application scenarios. &lt;/p&gt;

&lt;p&gt;In the future, we plan to further implement JuiceFS in the embodied intelligence field, addressing specific storage needs in this scenario. These include high-throughput processing of time-series data, precise multi-modal data alignment, edge-cloud collaborative storage, and integrated management of simulation and real-world data.  &lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join the &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>JuiceFS Enterprise 5.3: 500B+ Files per File System &amp; RDMA Support</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Wed, 04 Feb 2026 09:18:32 +0000</pubDate>
      <link>https://dev.to/daswu/juicefs-enterprise-53-500b-files-per-file-system-rdma-support-51g9</link>
      <guid>https://dev.to/daswu/juicefs-enterprise-53-500b-files-per-file-system-rdma-support-51g9</guid>
      <description>&lt;p&gt;&lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;JuiceFS Enterprise Edition&lt;/a&gt; 5.3 has recently been released, achieving a milestone breakthrough by &lt;strong&gt;supporting over 500 billion files in a single file system&lt;/strong&gt;. This upgrade includes several key optimizations to the metadata multi-zone architecture and introduces remote direct memory access (RDMA) technology for the first time to enhance distributed caching efficiency. In addition, version 5.3 enhances write support for mirrors and provides data caching for objects imported across buckets. It aims to support high-performance requirements and multi-cloud application scenarios.  &lt;/p&gt;

&lt;p&gt;JuiceFS Enterprise Edition is designed for high-performance scenarios. Since 2019, it has been applied in machine learning and has become one of the core infrastructures in the AI industry. Its customers include large language model (LLM) companies such as &lt;a href="https://juicefs.com/en/blog/user-stories/minimax-foundation-model-ai-storage" rel="noopener noreferrer"&gt;MiniMax&lt;/a&gt; and &lt;a href="https://juicefs.com/en/blog/user-stories/artificial-intelligence-storage-large-language-model-multimodal" rel="noopener noreferrer"&gt;StepFun&lt;/a&gt;; AI infrastructure and applications like &lt;a href="https://fal.ai/" rel="noopener noreferrer"&gt;fal&lt;/a&gt; and &lt;a href="https://www.heygen.com/" rel="noopener noreferrer"&gt;HeyGen&lt;/a&gt;; autonomous driving companies like &lt;a href="https://www.momenta.cn/" rel="noopener noreferrer"&gt;Momenta&lt;/a&gt; and &lt;a href="https://en.horizon.auto/" rel="noopener noreferrer"&gt;Horizon Robotics&lt;/a&gt;; and numerous leading technology enterprises across various industries leveraging AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Single file system supports 500 billion+ files
&lt;/h2&gt;

&lt;p&gt;The multi-zone architecture is one of JuiceFS' key technologies for handling hundreds of billions of files, ensuring high scalability and concurrent processing capabilities. &lt;strong&gt;To meet the growing demands of scenarios like &lt;a href="https://en.wikipedia.org/wiki/Self-driving_car" rel="noopener noreferrer"&gt;autonomous driving&lt;/a&gt;, version 5.3 introduces in-depth optimizations to the multi-zone architecture, increasing the zone limit to 1,024 and enabling a single file system to store and access at least 500 billion files&lt;/strong&gt; (each zone can store 500 million files, with a maximum of 2 billion).  &lt;/p&gt;

&lt;p&gt;The figure below shows JuiceFS Enterprise Edition architecture, with a single zone in the lower left corner:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdoz2btx939gfoc0rr5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdoz2btx939gfoc0rr5x.png" alt=" " width="784" height="702"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Achieving this breakthrough meant overcoming exponentially increasing challenges in system performance, data consistency, and stability, backed by a series of complex underlying optimizations and R&amp;amp;D efforts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-zone hotspot balancing: automated monitoring and hotspot migration, with manual ops tools
&lt;/h3&gt;

&lt;p&gt;In distributed systems, hotspots are a common challenge. Especially when data is distributed across multiple zones, some zones may experience higher loads than others. This leads to imbalance that impacts system performance.  &lt;/p&gt;

&lt;p&gt;When the number of zones reaches hundreds, hotspot issues become more prevalent. Particularly with smaller datasets and larger numbers of files, read/write hotspots exacerbate latency fluctuations.  &lt;/p&gt;

&lt;p&gt;We introduced an automated hotspot migration mechanism to move frequently accessed files to other zones, distributing the load and reducing pressure on specific zones. However, in practice, relying solely on automated migration cannot fully resolve all issues. In certain special or extreme scenarios, automated tools may not respond promptly. &lt;strong&gt;Therefore, alongside automated monitoring and migration, we added manual operational tools, allowing administrators to intervene in complex scenarios, perform manual analysis, and implement optimization solutions.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Large-scale migration: improved migration speed, small-batch concurrent migration
&lt;/h3&gt;

&lt;p&gt;Facing zones with excessive hotspots, early migration operations were simple. However, as the system scale expanded, migration efficiency gradually decreased. &lt;strong&gt;To address this, we introduced a small-batch concurrent migration strategy&lt;/strong&gt;, breaking down high-access directories into smaller chunks and migrating them in parallel to multiple lower-load zones. This quickly scatters hotspots and restores normal application access.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enhanced reliability self-checks: automatic repair and cleanup of intermediate migration states
&lt;/h3&gt;

&lt;p&gt;In large-scale clusters, the probability of distributed transaction failures increases significantly, especially during extensive migration processes. To address this, &lt;strong&gt;we enhanced reliability detection mechanisms, adding periodic background checks to scan cross-zone file states, particularly focusing on intermediate state issues, and automatically performing repairs and cleanup&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Previously, the system encountered issues with residual intermediate state data. While these did not affect operations in the short term, over time they could lead to errors. Through enhanced self-check mechanisms, we ensure the background periodically scans and promptly handles intermediate state issues, improving system stability and reliability.  &lt;/p&gt;

&lt;p&gt;Beyond the three key optimizations above, we also made multiple improvements to the console to better adapt to managing more zones. We optimized concurrent processing, operational tasks, and query displays, enhancing overall performance and user experience. Specifically, we refined UI design to better showcase system states in large-scale zone environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance stress test for hundreds of billions of files
&lt;/h3&gt;

&lt;p&gt;We conducted large-scale tests using a custom &lt;a href="https://github.com/llnl/mdtest" rel="noopener noreferrer"&gt;mdtest&lt;/a&gt; tool on Google Cloud, deploying 60 nodes, each with over 1 TB of memory. In terms of software configuration, we increased the number of zones to 1,024. The deployment method was similar to previous setups, but to reduce memory consumption, we deployed only one service process, with two others as cold backups.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4q88quw1c8m67prty9o3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4q88quw1c8m67prty9o3.png" alt=" " width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;JuiceFS Enterprise Edition 5.3 test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test duration: Approximately 20 hours
&lt;/li&gt;
&lt;li&gt;Total files written: About 400 billion files
&lt;/li&gt;
&lt;li&gt;Write speed: 5 million files per second
&lt;/li&gt;
&lt;li&gt;Memory usage: About 35% to 40%
&lt;/li&gt;
&lt;li&gt;Disk usage: 40% to 50%, primarily for metadata persistence, with good utilization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on our experience, if using a configuration with one service process, one hot backup, and one cold backup, memory usage increases by 20% to 30%.  &lt;/p&gt;

&lt;p&gt;Due to limited cloud resources, this test only wrote up to 400 billion files. During stress testing, the system performed stably, with hardware resources still remaining. We’ll continue to attempt larger-scale tests in the future.&lt;/p&gt;
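
&lt;p&gt;As a quick sanity check, the quoted totals and the quoted rate are consistent with each other (both figures in the list above are rounded):&lt;/p&gt;

```python
# Consistency check of the stress-test figures quoted above.
total_files = 400e9              # ~400 billion files written
duration_s = 20 * 3600           # ~20 hours
implied_rate = total_files / duration_s   # roughly 5.6 million files/s,
                                          # matching the "5 million" figure
```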

&lt;h2&gt;
  
  
  Support for RDMA: increased bandwidth cap, reduced CPU usage
&lt;/h2&gt;

&lt;p&gt;This new version introduces support for &lt;a href="https://en.wikipedia.org/wiki/Remote_direct_memory_access" rel="noopener noreferrer"&gt;RDMA&lt;/a&gt; technology for the first time. Its basic architecture is shown in the diagram below. RDMA allows direct access to remote node memory, bypassing the operating system's network protocol stack. This significantly improves data transfer efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhkkiori1vvxfvaekmoi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhkkiori1vvxfvaekmoi.png" alt=" " width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main advantages of RDMA include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low latency:&lt;/strong&gt; By enabling direct memory-to-memory transfers and bypassing the OS network protocol layers, it reduces CPU interrupts and context switches. This lowers latency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High throughput:&lt;/strong&gt; RDMA uses hardware for direct data transfer, better utilizing the bandwidth of network interface cards (NICs).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced CPU usage:&lt;/strong&gt; In RDMA, data copying is handled almost entirely by the NIC, with the CPU processing only control messages. This frees CPU resources for application work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In JuiceFS, network request messages between clients and metadata services are small, and existing TCP configurations already meet the needs. However, in &lt;a href="https://en.wikipedia.org/wiki/Distributed_cache" rel="noopener noreferrer"&gt;distributed caching&lt;/a&gt;, file data is transferred between clients and cache nodes. Using RDMA can effectively improve transfer efficiency and reduce CPU consumption.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkr7vblmcuoa5eo2f5hv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkr7vblmcuoa5eo2f5hv.png" alt=" " width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We conducted 1 MB random read tests using 160 Gbps NICs, comparing versions 5.1 and 5.2 (TCP networking) against version 5.3 (RDMA), and observed CPU usage.  &lt;/p&gt;

&lt;p&gt;Tests showed that RDMA effectively reduces CPU usage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In version 5.2, CPU usage was nearly 50%.
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;In version 5.3, with RDMA optimization, CPU usage dropped to about one-third of its previous level. Client and cache node CPU usage decreased to 8 cores and 5 cores respectively, with bandwidth reaching 20 GiB/s.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In previous tests, we found that while TCP ran stably on 200G NICs, fully saturating bandwidth was challenging, typically achieving only 85%-90% utilization. &lt;strong&gt;For customers requiring higher bandwidth (such as 400G NICs), TCP could not meet demands. However, RDMA can more easily reach hardware bandwidth limits, providing better transfer efficiency.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;If customers have RDMA-capable hardware and high bandwidth requirements (for example, NICs greater than 100G) and wish to reduce CPU usage, RDMA is a technology worth trying. Currently, our RDMA feature is in public testing and has not yet been widely deployed in production environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enhanced write support for mirrors
&lt;/h2&gt;

&lt;p&gt;Initially, &lt;a href="https://juicefs.com/docs/cloud/guide/mirror/" rel="noopener noreferrer"&gt;mirror&lt;/a&gt; clusters were primarily used for read-only mirroring in enterprise products. As users requested capabilities like writing temporary files (such as training data) in mirrors, we provided write support for mirrors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd567qgcebpr8pma2xgg5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd567qgcebpr8pma2xgg5.png" alt=" " width="590" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mirror client implements a read-write separation mechanism. When reading data, the client prioritizes fetching from the mirror cluster to reduce latency. When writing data, it still writes to the source cluster to ensure data consistency. By recording and comparing metadata version numbers, we ensure strong consistency between the mirror client and source cluster client views of the data.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To improve availability, version 5.3 introduces a fallback mechanism. When the mirror becomes unavailable, client read requests automatically fall back to the source cluster.&lt;/strong&gt; This ensures application continuity and avoids interruptions caused by mirror cluster failures. We also optimized deployments in multi-mirror environments. Previously, the mirror end required two hot backup nodes to ensure high availability. Now, with the improved fallback feature, a single mirror node achieves a similar effect and reduces costs, which is especially beneficial for users requiring multiple mirrors.  &lt;/p&gt;
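
&lt;p&gt;The read-write separation and fallback logic can be sketched as follows. This is an illustrative toy, not the real client; the class names, fields, and version-number handling are invented for the example:&lt;/p&gt;

```python
# Sketch: a mirror-aware client reads from the mirror cluster first and
# falls back to the source cluster when the mirror is down or its metadata
# version lags; all writes go to the source cluster.
class Cluster:
    def __init__(self, name):
        self.name, self.data, self.version, self.up = name, {}, 0, True

    def read(self, key):
        if not self.up:
            raise ConnectionError(self.name + " unavailable")
        return self.data[key]

class MirrorClient:
    def __init__(self, source, mirror):
        self.source, self.mirror = source, mirror

    def write(self, key, value):
        # Writes always go to the source cluster for consistency.
        self.source.data[key] = value
        self.source.version += 1

    def read(self, key):
        # Prefer the mirror when it is up and has caught up with the source.
        if self.mirror.up and self.mirror.version >= self.source.version:
            return self.mirror.read(key)
        return self.source.read(key)   # automatic fallback

source, mirror = Cluster("source"), Cluster("mirror")
client = MirrorClient(source, mirror)
client.write("/f", b"v1")
mirror.data, mirror.version = dict(source.data), source.version  # replication
mirror.up = False                 # simulate a mirror outage
value = client.read("/f")         # served by the source via fallback
```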

&lt;p&gt;Through this improvement, we not only reduced hardware costs but also found a balance between high availability and low cost. For users deploying mirrors in multiple locations, reducing metadata replicas further lowers overall costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simplified operations &amp;amp; increased flexibility: providing cross-bucket data cache for imported objects
&lt;/h2&gt;

&lt;p&gt;In JuiceFS, users can use the &lt;code&gt;import&lt;/code&gt; command to bring existing files from &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; under unified management. This is convenient for users already storing large amounts of data (for example, tens of petabytes). However, in previous versions, this feature only supported caching for objects within the same data bucket, meaning imported objects had to reside in the same bucket as the existing file system data. This was a significant practical constraint.  &lt;/p&gt;

&lt;p&gt;In version 5.3, we improved this feature. &lt;strong&gt;Users can now provide caching capability for any imported objects, regardless of whether they come from the same data bucket.&lt;/strong&gt; This allows users more flexibility in managing objects across different data buckets, avoiding strict bucket restrictions and enhancing data management freedom.  &lt;/p&gt;

&lt;p&gt;In addition, previously, if users had data distributed across multiple buckets and wanted to provide caching for that data, they needed to create a new file system for each bucket. In version 5.3, users only need to create one file system (volume) to uniformly manage data from multiple buckets and provide caching for all buckets.&lt;/p&gt;
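
&lt;p&gt;Conceptually, the change amounts to keying the cache by bucket as well as object, so a single volume can cache imported objects from any bucket. A toy sketch, with invented names and an in-memory dict standing in for object storage:&lt;/p&gt;

```python
# Illustrative sketch of cross-bucket caching: the cache key includes the
# bucket, so one cache layer serves objects imported from any bucket.
class CrossBucketCache:
    def __init__(self, fetch):
        self.fetch = fetch       # callable fetching (bucket, key) from storage
        self.cache = {}
        self.hits = self.misses = 0

    def get(self, bucket, key):
        ck = (bucket, key)
        if ck in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[ck] = self.fetch(bucket, key)
        return self.cache[ck]

backend = {("bucket-a", "x"): b"1", ("bucket-b", "x"): b"2"}
cache = CrossBucketCache(lambda b, k: backend[(b, k)])
first = cache.get("bucket-b", "x")    # miss: fetched from its own bucket
second = cache.get("bucket-b", "x")   # hit: served from the shared cache
```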

&lt;h2&gt;
  
  
  Other important optimizations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Trace feature
&lt;/h3&gt;

&lt;p&gt;We added a trace feature based on the tracing facility built into the Go runtime. With it, advanced users can perform tracing and performance analysis, gaining more information to quickly locate issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trash recovery
&lt;/h3&gt;

&lt;p&gt;In previous versions, especially with multiple zones, sometimes the paths recorded in the trash were incomplete. This led to anomalies during recovery, where files were not restored to the expected locations. To address this, in version 5.3, when deleting files, we record the original file path, ensuring more reliable recovery capabilities.&lt;/p&gt;
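
&lt;p&gt;The fix boils down to recording the full original path at deletion time, so restore can put the file back exactly where it came from. A minimal sketch of the idea (not the actual implementation):&lt;/p&gt;

```python
# Illustrative sketch: trash entries keep the full original path, so
# recovery restores files to the expected location even across zones.
class Trash:
    def __init__(self):
        self.entries = {}    # trash id to (original path, content)
        self.next_id = 0

    def delete(self, fs, path):
        self.next_id += 1
        self.entries[self.next_id] = (path, fs.pop(path))
        return self.next_id

    def restore(self, fs, trash_id):
        path, content = self.entries.pop(trash_id)
        fs[path] = content   # restored to the recorded original location
        return path

fs = {"/zone2/dir/file.txt": b"data"}
trash = Trash()
tid = trash.delete(fs, "/zone2/dir/file.txt")
restored_path = trash.restore(fs, tid)
```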

&lt;h3&gt;
  
  
  Python SDK improvements
&lt;/h3&gt;

&lt;p&gt;In earlier versions, we released the &lt;a href="https://juicefs.com/docs/cloud/deployment/python-sdk/" rel="noopener noreferrer"&gt;Python SDK&lt;/a&gt;, providing basic read/write functionalities for Python users to interface with our system. In version 5.3, we not only strengthened basic read/write functions but also added support for operational subcommands. For example, users can directly call commands like &lt;code&gt;juicefs info&lt;/code&gt; or &lt;code&gt;warmup&lt;/code&gt; via the SDK without relying on external system commands. This simplifies coding efforts and avoids potential performance bottlenecks from frequently calling external commands.&lt;/p&gt;
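
&lt;p&gt;The benefit is the general pattern of in-process calls over shelling out. The sketch below illustrates that pattern with stand-in names, not the real SDK API; &lt;code&gt;InProcessClient&lt;/code&gt; and &lt;code&gt;run_command&lt;/code&gt; are invented for the example, and &lt;code&gt;echo&lt;/code&gt; stands in for the real CLI:&lt;/p&gt;

```python
import subprocess

# Generic pattern behind the SDK improvement (hypothetical names): calling
# an operational command in-process avoids the fork/exec cost of spawning
# an external binary on every call.
class InProcessClient:
    """Stand-in for an SDK that exposes operational subcommands in-process."""
    def run_command(self, name, *args):
        return {"command": name, "args": args, "ok": True}

def info_via_subprocess(path):
    # The old approach: spawn an external process per call, which is
    # expensive in tight loops. 'echo' stands in for the real CLI here.
    out = subprocess.run(["echo", "info", path], capture_output=True, text=True)
    return out.stdout.strip()

client = InProcessClient()
result = client.run_command("info", "/jfs/data")   # no process spawned
legacy = info_via_subprocess("/jfs/data")          # one fork/exec per call
```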

&lt;h3&gt;
  
  
  The Windows client
&lt;/h3&gt;

&lt;p&gt;We previously launched a beta version of the Windows client and have received user feedback. After improvements, the current version shows significant gains in mount reliability, performance, and behavioral consistency with the Linux client. In the future, we plan to further refine the Windows client, providing users who rely on Windows an experience closer to Linux.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Compared to expensive dedicated hardware, JuiceFS helps users balance performance and cost in the face of data growth by flexibly utilizing cloud or existing customer storage resources. In version 5.3, by optimizing the metadata zone architecture, a single file system can support over 500 billion files. The newly introduced RDMA support significantly improves distributed caching bandwidth and data access efficiency while reducing CPU usage. In addition, we enhanced features like write support for mirrors and cross-bucket caching, improving the performance and operational efficiency of large-scale clusters and the overall user experience.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;Cloud service&lt;/a&gt; users can now directly experience JuiceFS Enterprise Edition 5.3 online, while on-premises deployment users can obtain upgrade support through official channels. We’ll continue to focus on high-performance storage solutions, partnering with enterprises to tackle challenges brought by continuous data growth.  &lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and our &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>architecture</category>
    </item>
    <item>
      <title>From GlusterFS to JuiceFS: Lightillusions Achieved 2.5x Faster 3D AIGC Data Processing</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 09 Jan 2026 07:43:46 +0000</pubDate>
      <link>https://dev.to/daswu/from-glusterfs-to-juicefs-lightillusions-achieved-25x-faster-3d-aigc-data-processing-53l0</link>
      <guid>https://dev.to/daswu/from-glusterfs-to-juicefs-lightillusions-achieved-25x-faster-3d-aigc-data-processing-53l0</guid>
      <description>&lt;p&gt;&lt;a href="https://www.lightillusions.com/" rel="noopener noreferrer"&gt;Lightillusions&lt;/a&gt; is a company specializing in spatial intelligence technology, integrating 3D vision, computer graphics, and generative models to build innovative 3D foundation models. Our company is led by Ping Tan, a professor at the Hong Kong University of Science and Technology (HKUST) and Director of the HKUST-BYD Joint Laboratory.  &lt;/p&gt;

&lt;p&gt;Unlike 2D models, a single 3D model can be several gigabytes in size, especially complex models like point clouds. When our data volume reached petabyte scales, management and storage became significant challenges. &lt;strong&gt;After trying solutions like NFS and GlusterFS, we chose &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt;, an open-source high-performance distributed file system, to build a unified storage platform.&lt;/strong&gt; This platform now serves multiple scenarios, supports cross-platform access including Windows and Linux, &lt;strong&gt;manages hundreds of millions of files, improves data processing speed by 200%–250%&lt;/strong&gt;, enables efficient storage scaling, and greatly simplifies operations and maintenance. This allows us to focus more on core research.  &lt;/p&gt;

&lt;p&gt;In this article, we’ll break down the unique storage demands of 3D AIGC, share why we selected JuiceFS over CephFS, and walk through the architecture of our JuiceFS-based storage platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage requirements for 3D AIGC
&lt;/h2&gt;

&lt;p&gt;Our research focuses on perception and generation. In the 3D domain, task complexity is different from image and text processing. This placed higher demands on our AI models, algorithms, and infrastructure.  &lt;/p&gt;

&lt;p&gt;We illustrate the complexity of 3D data processing through a typical pipeline. On the left side of the diagram below is a 3D model containing texture (top-left) and geometry (bottom-right) information. First, we generate &lt;a href="https://en.wikipedia.org/wiki/Rendering_(computer_graphics)" rel="noopener noreferrer"&gt;rendered&lt;/a&gt; images. Each model has text labels describing its content, geometric features, and texture features, which are tightly coupled with the model. In addition, we process geometry data, such as sampled points and numerical values obtained during preprocessing, like signed distance fields (SDFs). Note that 3D model file formats are highly diverse, and image formats vary as well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72kjddg1jm5kft7qca6u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72kjddg1jm5kft7qca6u.png" alt=" " width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our work spans language models, image/video models, and 3D models. As data volume grows, so does the storage burden. The main characteristics of data usage in these scenarios are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Language models: Data typically consists of a vast number of small files. Although individual text files are small, the total file count can reach millions or even tens of millions as data volume increases. This makes the management of such a large number of files a primary storage challenge.
&lt;/li&gt;
&lt;li&gt;Image and video data: High-resolution images and long videos are usually large. A single image can range from hundreds of kilobytes to several megabytes, while video files can reach gigabytes. During preprocessing—such as data augmentation, resolution adjustment, and frame extraction—data volume increases significantly. Especially in video processing, where each video is typically decomposed into a large number of image frames, managing these massive file collections adds considerable complexity.
&lt;/li&gt;
&lt;li&gt;3D models: Individual models, especially complex ones like point clouds, can be several gigabytes in size. &lt;strong&gt;3D data preprocessing is more complex than other data types, involving steps like texture mapping and geometry reconstruction, which consume great computational resources and can increase data volume.&lt;/strong&gt; Furthermore, 3D models often consist of multiple files, leading to a large total file count. As data grows, managing these files becomes increasingly difficult.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on the storage characteristics discussed above, when we chose a storage platform solution, we expected it to meet the following requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Diverse data formats and cross-node sharing:&lt;/strong&gt; Different models use different data formats, especially the complexity and cross-platform compatibility issues of 3D models. The storage system must support multiple formats and effectively manage data sharing across nodes and platforms.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handling data models of different sizes:&lt;/strong&gt; Whether it's small files for language models, large-scale image/video data, or large files for 3D models, the storage system must be highly scalable to meet rapidly growing storage demands and handle the storage and access of large-size data efficiently.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Challenges of &lt;a href="https://www.virtana.com/glossary/what-is-cross-cloud/" rel="noopener noreferrer"&gt;cross-cloud&lt;/a&gt; and cluster storage:&lt;/strong&gt; As data volume increases, especially with petabyte-level storage needs for 3D models, cross-cloud and cluster storage issues become more prominent. The storage system must support seamless cross-region, cross-cloud data access and efficient cluster management.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy scaling:&lt;/strong&gt; The need for scaling is constant, whether for language, image/video, or 3D models, and is particularly high for 3D model storage and processing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple operations and maintenance:&lt;/strong&gt; The storage system should provide easy-to-use management interfaces and tools. Especially for 3D model management, operational requirements are higher, making automated management and fault tolerance essential.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Storage solutions: from NFS, GlusterFS, CephFS to JuiceFS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Initial solution: NFS mount
&lt;/h3&gt;

&lt;p&gt;Initially, we tried the simplest solution: using &lt;a href="https://en.wikipedia.org/wiki/Network_File_System" rel="noopener noreferrer"&gt;NFS&lt;/a&gt; for mounting. However, in practice, we found that the training cluster and rendering cluster each required independent mount operations. Maintaining this setup was very cumbersome, especially when adding new data, because we needed to configure mount points separately for each new dataset. &lt;strong&gt;When the data volume reached about 1 million objects, we could no longer sustain this approach and abandoned it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhbhaa1lo6wmjr6ar2vv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhbhaa1lo6wmjr6ar2vv.png" alt=" " width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Mid-term solution: GlusterFS
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_glusterfs" rel="noopener noreferrer"&gt;GlusterFS&lt;/a&gt; was an easy-to-start-with choice, offering simple installation and configuration, acceptable performance, and no need for multiple mount points—just add new nodes.  &lt;/p&gt;

&lt;p&gt;While GlusterFS greatly reduced our workload in the early stages, we also discovered issues with its ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Many GlusterFS execution scripts and features required writing custom scheduled tasks. Adding new storage in particular imposed extra requirements, such as growing the cluster in specific node multiples.
&lt;/li&gt;
&lt;li&gt;Support for operations like cloning and data synchronization was weak. This led us to frequently consult documentation.
&lt;/li&gt;
&lt;li&gt;Many operations were unstable. For example, when using tools like fio for speed testing, results were not always reliable.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A more serious problem was that GlusterFS performance would drastically decline when the number of small files reached a certain scale.&lt;/strong&gt; For example, one model might generate 100 images. With 10 million models, that would produce 1 billion images. GlusterFS struggled severely with addressing in later stages, especially with an excessive number of small files. This led to significant performance drops and even system crashes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5xtcifcfmk1v710gau4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5xtcifcfmk1v710gau4.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Final selection: CephFS vs. JuiceFS
&lt;/h3&gt;

&lt;p&gt;As storage demands grew, we decided to use a more sustainable solution. After evaluating various options, we compared &lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_cephfs/" rel="noopener noreferrer"&gt;CephFS and JuiceFS&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;Although Ceph is widely used, through our own practice and reviewing documentation, we found Ceph's operational and management costs to be very high. Especially for a small team like ours, handling such complex operational tasks proved particularly difficult.  &lt;/p&gt;

&lt;p&gt;JuiceFS had two native features that strongly aligned with our needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The client data cache.&lt;/strong&gt; For our model training clusters, which are typically equipped with high-performance NVMe storage, fully utilizing client caching could significantly accelerate model training and reduce pressure on the JuiceFS storage backend.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JuiceFS' S3 compatibility was crucial for us.&lt;/strong&gt; As we had developed some visualization platforms based on storage for data annotation, organization, and statistics, S3 compatibility allowed us to rapidly develop web interfaces supporting visualization, data statistics, and other features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The table below &lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_cephfs/" rel="noopener noreferrer"&gt;compares basic features of CephFS and JuiceFS&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0p0scbwio14i307kjru0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0p0scbwio14i307kjru0.png" alt=" " width="800" height="1276"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage platform practice based on JuiceFS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Metadata engine selection and topology
&lt;/h3&gt;

&lt;p&gt;JuiceFS employs a metadata-data separation architecture with several metadata engine options. We first quickly validated the &lt;a href="https://juicefs.com/docs/community/redis_best_practices/" rel="noopener noreferrer"&gt;Redis storage solution&lt;/a&gt;, which is well-documented by the JuiceFS team. Redis' advantage lies in its lightweight nature; configuration typically takes only a day or half a day, and data migration is smooth. &lt;strong&gt;However, when the number of small files exceeded 100 million, Redis' speed and performance significantly declined&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv98vc7tp14qrp4eqhh3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv98vc7tp14qrp4eqhh3u.png" alt=" " width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As mentioned earlier, each model might render 100 images. With other miscellaneous files added, the number of small files increased dramatically. While we could mitigate the issue by packing small files, performing modifications or visualization on packed data would greatly increase complexity. Therefore, we preferred to retain the original small image files for subsequent processing.&lt;/p&gt;

&lt;p&gt;As the file count grew and soon exceeded Redis' capacity, we decided to migrate the storage system to a combination of &lt;a href="https://tikv.org/" rel="noopener noreferrer"&gt;TiKV&lt;/a&gt; and Kubernetes (K8s). &lt;strong&gt;The TiKV-K8s setup provided us with a more highly available metadata storage solution&lt;/strong&gt;. Furthermore, through benchmarking, we found that although TiKV's performance was slightly lower, the gap was not significant, and its support for small files was better than Redis'. We also consulted JuiceFS engineers and learned that Redis has poor scalability in cluster mode. Therefore, we switched to TiKV.&lt;/p&gt;

&lt;p&gt;The table below shows read/write performance test results for different metadata engines:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2b4gz1knnp88iyszdisr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2b4gz1knnp88iyszdisr.png" alt=" " width="800" height="632"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Latest architecture: JuiceFS+TiKV+SeaweedFS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;We use JuiceFS to manage the object storage layer. For the metadata storage system, we built it with TiKV and K8s. For object storage, we used SeaweedFS.&lt;/strong&gt; This allows us to quickly scale storage capacity and provides fast access for both small and large files. In addition, our object storage is distributed across multiple platforms, including local storage and platforms like R2 and Amazon S3. Through JuiceFS, we were able to integrate these different storage systems and provide a unified interface.&lt;/p&gt;
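
&lt;p&gt;The unified-interface idea can be sketched as a thin router over several backends. This is an illustrative toy, not how JuiceFS itself is implemented; the backend names and prefix-routing scheme are invented for the example:&lt;/p&gt;

```python
# Illustrative sketch: present multiple object-storage backends (local, R2,
# S3, etc.) behind one read interface, routed by path prefix.
class UnifiedStorage:
    def __init__(self):
        self.backends = {}

    def register(self, prefix, backend):
        self.backends[prefix] = backend

    def read(self, path):
        # Route by path prefix to the backend that owns the data.
        prefix, _, key = path.partition("/")
        return self.backends[prefix][key]

store = UnifiedStorage()
store.register("local", {"model.ply": b"local bytes"})
store.register("s3", {"render.png": b"s3 bytes"})
a = store.read("local/model.ply")   # served from the local backend
b = store.read("s3/render.png")     # served from the S3-like backend
```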

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5jci4fq8dn2gc5l6fqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5jci4fq8dn2gc5l6fqk.png" alt=" " width="800" height="291"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To better manage system resources, we built a resource monitoring platform on K8s. The current system consists of about 60 Linux nodes and several Windows nodes handling rendering and data processing tasks. We monitored read stability; the results show that even with multiple heterogeneous servers performing simultaneous read operations, overall system I/O performance remains stable and fully utilizes the available bandwidth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problems we encountered
&lt;/h3&gt;

&lt;p&gt;During the optimization of the storage solution, we initially tried an &lt;a href="https://en.wikipedia.org/wiki/Erasure_code" rel="noopener noreferrer"&gt;erasure code&lt;/a&gt; (EC) storage scheme, aiming to reduce storage requirements and improve efficiency. However, in large-scale data migration, EC computation was slow, and performance was unsatisfactory in high-throughput scenarios with frequent data changes. Bottlenecks appeared especially in combination with SeaweedFS. Based on these issues, we abandoned EC storage and switched to a replication-based storage scheme.  &lt;/p&gt;

&lt;p&gt;We set up independent servers and configured scheduled tasks for large-volume metadata backups. In TiKV, we implemented a redundant replica mechanism, adopting a multi-replica scheme to ensure data integrity. For object storage, we used dual replicas to further enhance data reliability. Although replica storage effectively ensures redundancy and high availability, storage costs remain high given petabyte-scale data and massive incremental growth. In the future, we may further optimize the storage scheme to reduce costs.  &lt;/p&gt;
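
&lt;p&gt;The dual-replica scheme boils down to writing each object twice and reading from whichever copy survives. A toy sketch of that behavior (not our actual implementation; dicts stand in for the replica stores):&lt;/p&gt;

```python
# Illustrative sketch of dual-replica storage: every object is written to
# two replicas, and a read succeeds as long as one replica is intact.
def put(replicas, key, value):
    for r in replicas:
        r[key] = value

def get(replicas, key):
    for r in replicas:
        if key in r:
            return r[key]
    raise KeyError(key)

replica_a, replica_b = {}, {}
put([replica_a, replica_b], "obj-1", b"payload")
del replica_a["obj-1"]               # simulate losing one replica
value = get([replica_a, replica_b], "obj-1")   # still readable
```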

&lt;p&gt;In addition, we found that using all-flash servers with JuiceFS did not bring significant performance improvements. The bottleneck mainly appeared in network bandwidth and latency. Therefore, we plan to consider using InfiniBand to connect storage servers and training servers to maximize resource utilization efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;When using GlusterFS, we could process at most 200,000 models per day. &lt;strong&gt;After switching to JuiceFS, our daily data processing capacity grew by 2.5 times, and small file throughput improved notably. The system remained stable even when storage utilization reached 70%.&lt;/strong&gt; Furthermore, scaling became very convenient, whereas scaling the previous architecture was troublesome.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87p9qc0cqdskbjhm0hfi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87p9qc0cqdskbjhm0hfi.png" alt=" " width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, let's summarize the advantages JuiceFS has demonstrated in 3D generation tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small file performance:&lt;/strong&gt; Small file handling is a critical point, and JuiceFS provides an excellent solution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform features:&lt;/strong&gt; Cross-platform support is very important. We found that some data can only be opened in Windows software, so we need to process the same data on both Windows and Linux systems and perform read/write operations on the same mount point. This requirement makes cross-platform features particularly crucial, and JuiceFS' design addresses this well.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low operational cost:&lt;/strong&gt; JuiceFS' operational cost is extremely low. After configuration, only simple testing and node management (for example, discarding certain nodes and monitoring robustness) are needed. We spent about half a year migrating data and have not encountered major issues so far.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local cache mechanism:&lt;/strong&gt; Previously, to use local cache, we needed to manually implement local caching logic in our code. JuiceFS provides a very convenient local caching mechanism, optimizing performance for training scenarios by setting mount parameters.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low migration cost:&lt;/strong&gt; Especially when migrating small files, we found using JuiceFS for metadata and object storage migration to be convenient, saving us a lot of time and effort. In contrast, migrating with other storage systems was very painful.&lt;/li&gt;
&lt;/ul&gt;
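
&lt;p&gt;The cache-related mount parameters mentioned above can be sketched as follows. This is a minimal example for the JuiceFS Community Edition; the Redis metadata URL and the paths are placeholders:&lt;/p&gt;

```shell
# Mount a JuiceFS volume with local disk caching tuned for training workloads.
# --cache-dir:         local directory (ideally on fast disk) used as the cache store
# --cache-size:        cache capacity in MiB (102400 MiB = 100 GiB)
# --free-space-ratio:  stop filling the cache when free disk space drops below 20%
juicefs mount -d \
  --cache-dir /data/jfscache \
  --cache-size 102400 \
  --free-space-ratio 0.2 \
  redis://meta-host:6379/1 /mnt/jfs
```

&lt;p&gt;Repeated reads of the same training data are then served from the local cache instead of object storage.&lt;/p&gt;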

&lt;p&gt;In summary, JuiceFS performs excellently in large-scale data processing, providing an efficient and stable storage solution. It not only simplifies storage management and scaling but also significantly improves system performance, allowing us to focus more on advancing core tasks. In addition, the JuiceFS tools are very convenient. For example, we used the &lt;code&gt;sync&lt;/code&gt; tool for small file migration with extremely high efficiency: without additional performance optimization, we migrated 500 TB of data, including a massive number of small data files and images, in less than 5 days, exceeding our expectations.  &lt;/p&gt;
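
&lt;p&gt;As a reference, a migration like the one described can be driven with the &lt;code&gt;sync&lt;/code&gt; tool roughly as follows; the bucket, endpoint, and mount point are placeholders:&lt;/p&gt;

```shell
# Copy objects from an S3 bucket into a mounted JuiceFS volume.
# --threads raises copy concurrency (default 10), which matters most
# when the workload is dominated by large numbers of small files.
juicefs sync --threads 50 \
  s3://mybucket.s3.us-east-1.amazonaws.com/dataset/ \
  /mnt/jfs/dataset/
```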

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>opensource</category>
      <category>performance</category>
    </item>
    <item>
      <title>AI Data Storage: Challenges, Capabilities, and Comparative Analysis</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 19 Dec 2025 07:42:58 +0000</pubDate>
      <link>https://dev.to/daswu/ai-data-storage-challenges-capabilities-and-comparative-analysis-46n3</link>
      <guid>https://dev.to/daswu/ai-data-storage-challenges-capabilities-and-comparative-analysis-46n3</guid>
      <description>&lt;p&gt;&lt;em&gt;Note: This article was first published on &lt;a href="https://dzone.com/articles/ai-data-storage-challenges-capabilities-comparison" rel="noopener noreferrer"&gt;DZone&lt;/a&gt; and featured on its homepage.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The explosion in the popularity of ChatGPT has once again ignited a surge of excitement in the &lt;a href="https://en.wikipedia.org/wiki/Artificial_intelligence" rel="noopener noreferrer"&gt;AI&lt;/a&gt; world. Over the past five years, AI has advanced rapidly and has found applications in a wide range of industries. As a storage company, we’ve had a front-row seat to this expansion, watching more and more AI startups and established players emerge across fields like autonomous driving, protein structure prediction, and quantitative investment.&lt;br&gt;&lt;br&gt;
AI scenarios have introduced new challenges to the field of data storage. Existing storage solutions are often inadequate to fully meet these demands.&lt;/p&gt;

&lt;p&gt;In this article, we’ll take a deep dive into the storage challenges in AI scenarios, critical storage capabilities, and a comparative analysis of Amazon S3, Alluxio, Amazon EFS, Azure, GCP Filestore, Lustre, Amazon FSx for Lustre, GPFS, BeeGFS, and JuiceFS Cloud Service. I hope this post will help you make informed choices in AI and data storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage challenges for AI
&lt;/h2&gt;

&lt;p&gt;AI scenarios have brought new data patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/High-throughput" rel="noopener noreferrer"&gt;&lt;strong&gt;High throughput&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;data access challenges:&lt;/strong&gt; In AI scenarios, the growing use of GPUs by enterprises has outpaced the I/O capabilities of underlying storage systems. Enterprises require storage solutions that can provide high-throughput data access to fully leverage the computing power of GPUs. For instance, in smart manufacturing, where high-precision cameras capture images for defect detection models, the training dataset may consist of only 10,000 to 20,000 high-resolution images. Each image has several gigabytes in size, resulting in a total dataset size of 10 TB. If the storage system lacks the required throughput, it becomes a bottleneck during GPU training.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Managing storage for billions of files:&lt;/strong&gt; AI scenarios need storage solutions that can handle and provide quick access to datasets with billions of files. For example, in autonomous driving, a single training set comprises tens of millions of small images, each several hundred kilobytes in size, and each image is treated as an individual file. The total training data amounts to billions or even 10 billion files. This creates a major challenge in effectively managing large numbers of small files.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalable throughput for hot data:&lt;/strong&gt; In areas like &lt;a href="https://en.wikipedia.org/wiki/Quantitative_analysis_(finance)" rel="noopener noreferrer"&gt;quantitative investing&lt;/a&gt;, financial market data is smaller than computer vision datasets. However, this data must be shared among many research teams, leading to hotspots where disk throughput is fully saturated yet still cannot satisfy the application's needs. This calls for storage solutions whose throughput for hot data can scale quickly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The basic computing environment has also changed a lot.&lt;br&gt;&lt;br&gt;
These days, with cloud computing and Kubernetes getting so popular, more and more AI companies are setting up their data pipelines on &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt;-based platforms. Algorithm engineers request resources on the platform, write code in notebooks to debug algorithms, use workflow engines like Argo and Airflow to plan data processing workflows, use Fluid to manage datasets, and use BentoML to deploy models into apps. &lt;a href="https://en.wikipedia.org/wiki/Cloud-native_computing" rel="noopener noreferrer"&gt;&lt;strong&gt;Cloud-native&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;technologies have become a standard consideration when building storage platforms.&lt;/strong&gt; As cloud computing matures, AI businesses are increasingly relying on large-scale distributed clusters. With a significant increase in the number of nodes in these clusters, &lt;strong&gt;storage systems face new challenges related to handling concurrent access from tens of thousands of Pods within Kubernetes clusters.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;IT professionals managing the underlying infrastructure face significant changes brought about by the evolving business scenarios and computing environments. Existing hardware-software coupled storage solutions often suffer from several pain points, such as no elasticity, no distributed high availability, and constraints on cluster scalability. &lt;a href="https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems" rel="noopener noreferrer"&gt;Distributed file systems&lt;/a&gt; like GlusterFS, CephFS, and those designed for HPC such as Lustre, BeeGFS, and GPFS, are typically designed for physical machines and bare-metal disks. While they can deploy large capacity clusters, they cannot provide elastic capacity and flexible throughput, especially when dealing with storage demands in the order of tens of billions of files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key capabilities for AI data storage
&lt;/h2&gt;

&lt;p&gt;Considering these challenges, we’ll outline essential storage capabilities critical for AI scenarios, helping enterprises make informed decisions when selecting storage products.&lt;/p&gt;

&lt;h3&gt;
  
  
  POSIX compatibility and data consistency
&lt;/h3&gt;

&lt;p&gt;In the AI/ML domain, &lt;a href="https://en.wikipedia.org/wiki/POSIX" rel="noopener noreferrer"&gt;POSIX&lt;/a&gt; is the most common API for data access. Previous-generation distributed file systems, except HDFS, are also POSIX-compatible, but products on the cloud in recent years have not been consistent in their POSIX support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compatibility:&lt;/strong&gt; Users should not solely rely on the description "POSIX-compatible product" to assess compatibility. You can use pjdfstest and the Linux Test Project (LTP) framework for testing. We’ve done a &lt;a href="https://juicefs.com/en/blog/engineering/posix-compatibility-comparison-among-four-file-system-on-the-cloud" rel="noopener noreferrer"&gt;POSIX compatibility test of the cloud file system&lt;/a&gt; for your reference.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Strong data consistency guarantee:&lt;/strong&gt; This is fundamental to ensuring computational correctness. Storage systems have various consistency implementations, with object storage systems often adopting eventual consistency, while file systems typically adhere to strong consistency. Careful consideration is needed when selecting a storage system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User mode or kernel mode:&lt;/strong&gt; Early developers favored kernel mode due to its potential for optimized I/O operations. However, in recent years, we’ve witnessed a growing number of developers "escaping" from kernel mode for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kernel mode usage ties the file system client to specific kernel versions. GPU and high-performance network card drivers often require compatibility with specific kernel versions. This combination of factors places a significant burden on kernel version selection and maintenance.&lt;/li&gt;
&lt;li&gt;Exceptions of kernel mode clients can potentially freeze the host operating system. This is highly unfavorable for Kubernetes platforms.&lt;/li&gt;
&lt;li&gt;The user-mode FUSE library has undergone continuous iterations, resulting in significant performance improvements. It has been well-supported among &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; customers for various business needs, such as autonomous driving perception model training and quantitative investment strategy training. This demonstrates that in AI scenarios, the user-mode FUSE library is no longer a performance bottleneck.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
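
&lt;p&gt;As a concrete starting point, a pjdfstest run against a mounted file system looks roughly like this (the paths are placeholders, and the suite must be executed from a directory on the file system under test):&lt;/p&gt;

```shell
# Build pjdfstest, then run the full suite with prove from the target file system.
git clone https://github.com/pjd/pjdfstest.git
cd pjdfstest
autoreconf -ifs
./configure
make pjdfstest
cd /mnt/fs-under-test
prove -rv /path/to/pjdfstest/tests
```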

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo95b0veaaifez30jbrw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo95b0veaaifez30jbrw.png" alt=" " width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Linear scalability of throughput
&lt;/h3&gt;

&lt;p&gt;Different file systems employ different principles for scaling throughput. Previous-generation distributed storage systems like GlusterFS, CephFS, the HPC-oriented Lustre, BeeGFS, and GPFS primarily use all-flash solutions to build their clusters. &lt;strong&gt;In these systems, peak throughput equals the total performance of the disks in the cluster. To increase cluster throughput, users must scale the cluster by adding more disks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;However, when users have imbalanced needs for capacity and throughput, &lt;strong&gt;traditional file systems require scaling the entire cluster, leading to capacity wastage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example, a 500 TB capacity cluster using 8 TB hard drives with 2 replicas needs 126 drives, each with a throughput of 150 MB/s. The theoretical maximum throughput of the cluster is about 19 GB/s (126 × 150 MB/s = 18.9 GB/s). If the application demands 60 GB/s throughput, there are two options: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Switching to 2 TB HDDs (with 150 MB/s throughput) and requiring 504 drives&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Switching to 8 TB SATA SSDs (with 500 MB/s throughput) while maintaining 126 drives&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first solution increases the number of drives by four times, necessitating a corresponding increase in the number of cluster nodes. The second solution, upgrading to SSDs from HDDs, also results in a significant cost increase.&lt;/p&gt;
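
&lt;p&gt;The baseline in this sizing exercise (126 drives at 150 MB/s each) can be checked quickly:&lt;/p&gt;

```shell
# Theoretical peak cluster throughput: drive count times per-drive throughput.
drives=126
per_drive_mbps=150
peak_mbps=$(( drives * per_drive_mbps ))
echo "peak throughput: ${peak_mbps} MB/s"   # 18900 MB/s, i.e. roughly 19 GB/s
```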

&lt;p&gt;As you can see, it’s difficult to balance capacity, performance, and cost. Capacity planning based on these three perspectives becomes a challenge, because we cannot predict how the real business will develop and change.&lt;/p&gt;

&lt;p&gt;Therefore, &lt;strong&gt;decoupling storage capacity from performance scaling would be a more effective approach for businesses to address these challenges. When we designed JuiceFS, we considered this requirement&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In addition, handling hot data is a common problem in AI scenarios. JuiceFS employs a cache grouping mechanism to automatically distribute hot data to different cache groups. This means that JuiceFS automatically creates multiple copies of hot data during computation to achieve higher disk throughput, and these cache spaces are automatically reclaimed after computation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing massive amounts of files
&lt;/h3&gt;

&lt;p&gt;Efficiently managing a large number of files, such as 10 billion, places three demands on the storage system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Elastic scalability:&lt;/strong&gt; In real JuiceFS user scenarios, file counts expand from tens of millions to hundreds of millions, and then to billions. This growth cannot be absorbed by adding just a few machines; storage clusters need to add nodes to achieve &lt;a href="https://www.virtana.com/glossary/what-is-horizontal-scaling/" rel="noopener noreferrer"&gt;horizontal scaling&lt;/a&gt;, enabling them to support business growth effectively.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data distribution during horizontal scaling:&lt;/strong&gt; During system scaling, data distribution rules based on directory name prefixes may lead to uneven data distribution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scaling complexity:&lt;/strong&gt; As the number of files increases, the ease of system scaling, stability, and the availability of tools for managing storage clusters become vital considerations. Some systems become more fragile as file numbers reach billions. Ease of management and high stability are crucial for business growth.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Concurrent load capacity and feature support in Kubernetes environments
&lt;/h3&gt;

&lt;p&gt;Some storage system specifications state a maximum limit for concurrent access, so users need to conduct stress testing based on their own business. When there are many clients, &lt;a href="https://en.wikipedia.org/wiki/Quality_of_service" rel="noopener noreferrer"&gt;quality of service&lt;/a&gt; (QoS) management is required, including traffic control for each client and temporary read/write blocking policies.&lt;/p&gt;

&lt;p&gt;We must also consider the design and supported features of the CSI driver in Kubernetes: for example, how the mount process is deployed, and whether it supports &lt;code&gt;ReadWriteMany&lt;/code&gt;, &lt;code&gt;subPath&lt;/code&gt; mounting, quotas, and hot updates.&lt;/p&gt;
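
&lt;p&gt;For illustration, a &lt;code&gt;PersistentVolumeClaim&lt;/code&gt; against a CSI driver that supports shared access might look like the following config fragment; the storage class name juicefs-sc is an assumption based on a typical JuiceFS CSI driver setup:&lt;/p&gt;

```yaml
# ReadWriteMany lets many Pods across nodes mount the same volume concurrently.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany            # shared read/write across nodes
  resources:
    requests:
      storage: 100Gi
  storageClassName: juicefs-sc   # hypothetical JuiceFS storage class
```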

&lt;h3&gt;
  
  
  Cost analysis
&lt;/h3&gt;

&lt;p&gt;Cost analysis is multifaceted: hardware and software procurement costs are often overshadowed by operational and maintenance expenses. As AI businesses scale, data volume grows significantly. Storage systems must exhibit both capacity and throughput scalability, offering ease of adjustment.&lt;/p&gt;

&lt;p&gt;In the past, the procurement and scaling of systems like Ceph, Lustre, and BeeGFS in data centers involved lengthy planning cycles. It took months for hardware to arrive, be configured, and become operational. Time costs, though often ignored, were frequently the largest expenditure. &lt;strong&gt;Storage systems that enable elastic capacity and performance adjustments equate to faster time-to-market.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another frequently underestimated cost is efficiency. In AI workflows, the data pipeline is extensive, involving multiple interactions with the storage system. Each step, from data collection, cleansing and conversion, labeling, feature extraction, training, and backtesting to production deployment, is affected by the storage system's efficiency. &lt;/p&gt;

&lt;p&gt;However, businesses typically utilize only a fraction (often less than 20%) of the entire dataset actively. This subset of hot data demands high performance, while warm or cold data may be infrequently accessed or not accessed at all. &lt;strong&gt;It’s difficult to satisfy both requirements in systems like Ceph, Lustre, and BeeGFS.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consequently, many teams adopt multiple storage systems to cater to diverse needs. A common strategy is to employ an &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; system for archival purposes to achieve large capacity and low costs. However, object storage is not typically known for high performance, yet it often ends up handling data ingestion, preprocessing, and cleansing in the data pipeline. While this may not be the most efficient method for data preprocessing, it's often the pragmatic choice due to the sheer volume of data. Engineers then have to wait for a substantial period to transfer the data to the file storage system used for model training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Therefore, in addition to hardware and software costs of storage systems, total cost considerations should account for time costs invested in cluster operations (including procurement and supply chain management) and time spent managing data across multiple storage systems.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage system comparison
&lt;/h2&gt;

&lt;p&gt;Here's a comparative analysis of the storage products mentioned earlier for your reference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;POSIX compatibility&lt;/th&gt;
&lt;th&gt;Elastic capacity&lt;/th&gt;
&lt;th&gt;Maximum supported file count&lt;/th&gt;
&lt;th&gt;Performance&lt;/th&gt;
&lt;th&gt;Cost (USD)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Amazon S3&lt;/td&gt;
&lt;td&gt;Partially compatible through S3FS&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Hundreds of billions&lt;/td&gt;
&lt;td&gt;Medium to Low&lt;/td&gt;
&lt;td&gt;About $0.02/GB/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Alluxio&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.alluxio.io/os/user/stable/en/api/POSIX-API.html#assumptions-and-limitations" rel="noopener noreferrer"&gt;Partial compatibility&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.alluxio.io/blog/store-1-billion-files-in-alluxio-20/" rel="noopener noreferrer"&gt;1 billion&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Depends on cache capacity&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud file storage service&lt;/td&gt;
&lt;td&gt;Amazon EFS&lt;/td&gt;
&lt;td&gt;NFSv4.1 compatible&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.aws.amazon.com/efs/latest/ug/performance.html" rel="noopener noreferrer"&gt;Depends on the data size. Throughput up to 3 GB/s, maximum 500 MB/s per client&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://aws.amazon.com/efs/pricing/?nc1=h_ls" rel="noopener noreferrer"&gt;$0.043~0.30/GB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Azure&lt;/td&gt;
&lt;td&gt;SMB &amp;amp; NFS for Premium&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/storage/files/storage-files-scale-targets" rel="noopener noreferrer"&gt;100 million&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Performance scales with data capacity. See &lt;a href="https://learn.microsoft.com/en-us/azure/storage/files/storage-files-scale-targets" rel="noopener noreferrer"&gt;details&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/storage/files/" rel="noopener noreferrer"&gt;$0.16/GiB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;GCP Filestore&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloud.google.com/architecture/filers-on-compute-engine#summary_of_file_server_options" rel="noopener noreferrer"&gt;NFSv3 compatible&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloud.google.com/filestore?hl=en#section-12" rel="noopener noreferrer"&gt;Maxmium 63.9 TB&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloud.google.com/filestore/docs/limits" rel="noopener noreferrer"&gt;Up to 67,108,864 files per 1 TiB capacity&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Performance scales with data capacity. See &lt;a href="https://cloud.google.com/filestore/docs/performance" rel="noopener noreferrer"&gt;details&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloud.google.com/filestore/pricing?hl=zh-cn" rel="noopener noreferrer"&gt;$0.36/GiB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lustre&lt;/td&gt;
&lt;td&gt;Lustre&lt;/td&gt;
&lt;td&gt;Compatible&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Depends on cluster disk count and performance&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Amazon FSx for Lustre&lt;/td&gt;
&lt;td&gt;Compatible&lt;/td&gt;
&lt;td&gt;Manual scaling, 1,200 GiB increments&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.amazonaws.cn/en/?nc1=h_ls" rel="noopener noreferrer"&gt;Multiple performance types of 50 MB~200 MB/s per 1 TB capacity&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://aws.amazon.com/fsx/lustre/pricing/" rel="noopener noreferrer"&gt;$0.073~0.6/GB/month&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPFS&lt;/td&gt;
&lt;td&gt;GPFS&lt;/td&gt;
&lt;td&gt;Compatible&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;10 billion&lt;/td&gt;
&lt;td&gt;Depends on cluster disk count and performance&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;BeeGFS&lt;/td&gt;
&lt;td&gt;Compatible&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Billions&lt;/td&gt;
&lt;td&gt;Depends on cluster disk count and performance&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;JuiceFS Cloud Service&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Compatible&lt;/td&gt;
&lt;td&gt;Elastic capacity, no maximum limit&lt;/td&gt;
&lt;td&gt;10 billion&lt;/td&gt;
&lt;td&gt;Depends on cache capacity&lt;/td&gt;
&lt;td&gt;JuiceFS $0.02/GiB/month + AWS S3 $0.023/GiB/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Over the last decade, &lt;a href="https://en.wikipedia.org/wiki/Cloud_computing" rel="noopener noreferrer"&gt;cloud computing&lt;/a&gt; has rapidly evolved. Previous-generation storage systems designed for data centers couldn't harness the advantages brought by the cloud, notably elasticity. Object storage, a newcomer, offers unparalleled scalability, availability, and cost-efficiency. Still, it exhibits limitations in AI scenarios.&lt;/p&gt;

&lt;p&gt;File storage, on the other hand, presents invaluable benefits for AI and other computational use cases. Leveraging the cloud and its infrastructure efficiently to design the next-generation file storage system is a new challenge, and this is precisely what JuiceFS has been doing over the past five years.&lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>JuiceFS+MinIO: Ariste AI Achieved 3x Faster I/O and Cut Storage Costs by 40%+</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Mon, 15 Dec 2025 03:18:17 +0000</pubDate>
      <link>https://dev.to/daswu/juicefsminio-ariste-ai-achieved-3x-faster-io-and-cut-storage-costs-by-40-1016</link>
      <guid>https://dev.to/daswu/juicefsminio-ariste-ai-achieved-3x-faster-io-and-cut-storage-costs-by-40-1016</guid>
      <description>&lt;p&gt;&lt;a href="https://ariste.ai/" rel="noopener noreferrer"&gt;Ariste AI&lt;/a&gt; is a company specializing in AI-driven trading, with businesses covering proprietary trading, asset management, high-frequency market making, and other fields. In &lt;a href="https://en.wikipedia.org/wiki/Quantitative_analysis_(finance)" rel="noopener noreferrer"&gt;quantitative trading&lt;/a&gt; research, data read speed and storage efficiency often determine the speed of research iteration.  &lt;/p&gt;

&lt;p&gt;In the process of building quantitative research infrastructure, facing market and factor data with a total scale exceeding 500 TB, we went through four stages—from local disks to eventually choosing the &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; file system on top of &lt;a href="https://en.wikipedia.org/wiki/MinIO" rel="noopener noreferrer"&gt;MinIO&lt;/a&gt; object storage. Through caching mechanisms and a layered architecture, we achieved fast access to high-frequency data and centralized management. &lt;strong&gt;This practice validates the feasibility of the integrated solution of cache acceleration + elastic object storage + POSIX compatibility in quantitative scenarios.&lt;/strong&gt; We hope our experience can provide some reference for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage challenges in quantitative investment: balancing scale, speed, and collaboration
&lt;/h2&gt;

&lt;p&gt;The quantitative investment process sequentially includes the data layer, factor and signal layer, strategy and position layer, and execution and trading layer. They form a closed loop from data acquisition to trade execution.&lt;br&gt;&lt;br&gt;
Throughout this process, the storage system faced multiple challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data scale and growth rate:&lt;/strong&gt; Quantitative research requires processing a large total volume of data, covering historical market data, news data, and self-calculated factor data. Currently, the total volume of this data is close to 500 TB. Furthermore, our company adds hundreds of gigabytes of new market data daily. Using traditional disks for storage would clearly be unable to meet such massive data storage demands.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-frequency access and low-latency requirements:&lt;/strong&gt; High-frequency data access relies on low-latency data reads. The data read rate directly determines research efficiency. Faster data reads allow the research process to advance rapidly; conversely, slower reads lead to inefficient research.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-team parallelism and data management:&lt;/strong&gt; During quantitative research, multiple teams often conduct different experiments simultaneously. To ensure the independence and data security of each team's research work, secure isolation is necessary to avoid data confusion and leakage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To address the data storage needs of the entire quantitative process and build a future-proof storage system, we wanted to achieve high performance, easy scalability, and management capability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High performance:&lt;/strong&gt; Single-node read/write bandwidth should exceed 500 MB/s, and access latency should be low enough to feel indistinguishable from a local disk.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy scalability:&lt;/strong&gt; The solution should support on-demand horizontal scaling of storage and computing resources, enabling smooth elastic scaling without application modification.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Management capability:&lt;/strong&gt; The solution should provide one-stop management capabilities for fine-grained permission control, operation auditing, and data lifecycle policies.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Evolution of the storage architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stage 1: Local disk
&lt;/h3&gt;

&lt;p&gt;In the initial phase of the project, we adopted the QuantraByte research framework. It had a built-in &lt;a href="https://en.wikipedia.org/wiki/Exchange-traded_fund" rel="noopener noreferrer"&gt;exchange-traded fund&lt;/a&gt; (ETF) module allowing data to be stored directly on local disks. This resulted in fast data read speeds. Researchers could directly run the data they needed, and the iteration process was quick.&lt;br&gt;&lt;br&gt;
However, this stage had some issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resource waste from repeated downloads:&lt;/strong&gt; Multiple researchers downloading the same data led to redundant efforts.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insufficient storage capacity:&lt;/strong&gt; Research servers had limited storage capacity, only about 15 TB. This could not meet growing data storage needs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaboration difficulties:&lt;/strong&gt; The process was not convenient when needing to reuse others' research results.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stage 2: MinIO centralized management
&lt;/h3&gt;

&lt;p&gt;To solve the problems of the first stage, we introduced MinIO for centralized management. All stored data was centralized on MinIO, with a split-out module handling all data ingestion. Specific factor data was also stored in MinIO. This enabled unified downloads of public data. Permission isolation facilitated multi-team data sharing and improved storage space utilization.  &lt;/p&gt;

&lt;p&gt;However, new bottlenecks emerged in this stage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High latency for high-frequency random reads:&lt;/strong&gt; High-latency I/O operations during high-frequency data access impacted data read speeds.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow reads/writes due to lack of cache:&lt;/strong&gt; Since the MinIO community edition lacked caching, reading and writing high-frequency public data was slow.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stage 3: Introducing JuiceFS for cache acceleration
&lt;/h3&gt;

&lt;p&gt;To address the above bottlenecks, after thorough research, we finally introduced JuiceFS’ &lt;a href="https://juicefs.com/docs/community/guide/cache" rel="noopener noreferrer"&gt;cache&lt;/a&gt; acceleration solution. In this solution, clients mounted JuiceFS with local RAID 5 storage serving as the cache layer. &lt;strong&gt;With an efficient caching mechanism, it improved read/write performance by about three times. This significantly enhanced the access experience for high-frequency shared data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5qmagek7lw7rkacjfry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5qmagek7lw7rkacjfry.png" alt=" " width="800" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As application data volume surpassed 300 TB, the scaling limitations of local storage became apparent. Since data was stored locally, scaling required reconfiguring storage devices. Scaling under a RAID5 architecture was slow and risky, making it difficult to meet the needs of continuous application growth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 4: JuiceFS + MinIO cluster
&lt;/h3&gt;

&lt;p&gt;To solve the scaling challenge, we ultimately adopted the JuiceFS + MinIO cluster architecture. This solution offers the following advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sustained high performance:&lt;/strong&gt; JuiceFS provides good caching capability, fully meeting the performance demands of high-frequency data access scenarios.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy cluster scaling:&lt;/strong&gt; Based on the clustered solution, we quickly achieved horizontal scaling. Simply by adding disks of the same type, we can flexibly increase storage capacity. This greatly enhances system scalability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq08kj4ctgkxho7l6jz98.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq08kj4ctgkxho7l6jz98.png" alt=" " width="800" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Through this four-stage evolution, we validated the feasibility of the integrated solution combining cache acceleration, elastic &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt;, and &lt;a href="https://en.wikipedia.org/wiki/POSIX" rel="noopener noreferrer"&gt;POSIX&lt;/a&gt; compatibility in quantitative scenarios.&lt;/strong&gt; This solution can provide the industry with a replicable, implementable best practice template, achieving an excellent balance between performance, cost, and management.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance and cost benefits
&lt;/h2&gt;

&lt;p&gt;By adopting the combined storage architecture of JuiceFS and MinIO, our system bandwidth and resource utilization efficiency greatly improved. Now they fully meet the storage performance requirements of the research application. After introducing the JuiceFS cache layer, backtesting task execution efficiency increased dramatically. &lt;strong&gt;The time required for backtesting 100 million entries of tick data was reduced from hours to tens of minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foigxqmdzzmq1oxltgeex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foigxqmdzzmq1oxltgeex.png" alt=" " width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We implemented a tiered storage strategy for managing the data lifecycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hot data (0-90 days): This tier handles data that is accessed frequently. To ensure maximum performance, it’s automatically cached on local SSDs.
&lt;/li&gt;
&lt;li&gt;Warm data (90-365 days): Data with medium access frequency resides here. It’s stored on MinIO's standard object storage drives, striking an optimal balance between cost and performance.
&lt;/li&gt;
&lt;li&gt;Cold data (&amp;gt;365 days): This tier is for rarely accessed, archival data. It’s automatically migrated to a low-frequency access storage layer, which is compatible with the S3 Glacier strategy. &lt;/li&gt;
&lt;/ul&gt;
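&lt;p&gt;The age thresholds above can be expressed as a simple tiering rule. The following Go sketch is purely illustrative; the tier names and cutoffs mirror the list above and are not actual JuiceFS or MinIO configuration values:&lt;/p&gt;

```go
package main

import "fmt"

// tierFor maps a file's age in days to a storage tier, mirroring the
// 0-90 / 90-365 / >365 day policy described above. The tier names are
// illustrative, not real configuration values.
func tierFor(ageDays int) string {
	switch {
	case ageDays <= 90:
		return "hot" // cached on local SSD
	case ageDays <= 365:
		return "warm" // standard MinIO object storage
	default:
		return "cold" // low-frequency, Glacier-style tier
	}
}

func main() {
	for _, d := range []int{30, 200, 400} {
		fmt.Printf("%d days -> %s\n", d, tierFor(d))
	}
}
```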

&lt;p&gt;Based on this tiered storage strategy, we achieved a smooth transition from higher to lower storage unit costs. &lt;strong&gt;Our overall storage costs were reduced by over 40%.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Multi-tenant management
&lt;/h3&gt;

&lt;p&gt;Regarding data isolation and permission management, we’ve established a comprehensive management system:&lt;br&gt;&lt;br&gt;
Logical isolation is achieved through namespaces, using path planning like &lt;code&gt;/factor/A&lt;/code&gt; and &lt;code&gt;/factor/B&lt;/code&gt; to ensure clear data boundaries for each application. For permission control, we implemented fine-grained management across three dimensions: user, team, and project. These controls integrate seamlessly with the POSIX &lt;a href="https://en.wikipedia.org/wiki/Access-control_list" rel="noopener noreferrer"&gt;ACL&lt;/a&gt; permission system.  &lt;/p&gt;

&lt;p&gt;We’ve also established a complete audit log system. It enables real-time tracking of access behaviors and historical change backtracking. This fully meets compliance requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability and automated operations
&lt;/h3&gt;

&lt;p&gt;We built a complete monitoring system around four critical metrics: cache hit rate, I/O throughput, I/O latency, and write retry rate. The system automatically triggers alerts when metrics are abnormal.  &lt;/p&gt;

&lt;p&gt;We implemented closed-loop operations management based on Grafana to continuously monitor node health and storage capacity. Before each scaling operation, we did simulated stress tests to verify system capacity and ensure no application impact. The overall operations system achieves high-standard goals of automation, predictability, and rollback capability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data update design in the backtesting system
&lt;/h3&gt;

&lt;p&gt;In our &lt;a href="https://en.wikipedia.org/wiki/Backtesting" rel="noopener noreferrer"&gt;backtesting&lt;/a&gt; system design, we adopted an architecture based on directed acyclic graphs (DAGs) to improve computational efficiency and maintainability. This framework centers on computational nodes and dependency relationships, abstracting data processing, feature calculation, signal generation, and other steps into nodes, all managed uniformly through a dependency graph. The system has a built-in version control mechanism. When data versions are updated, the dependency graph automatically identifies affected nodes, precisely locating parts that need recalculation, thereby enabling efficient incremental updates and result traceability.&lt;/p&gt;
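&lt;p&gt;The invalidation behavior described above can be sketched as a breadth-first walk over the dependency graph. The node names and graph shape below are hypothetical, assumed only for illustration:&lt;/p&gt;

```go
package main

import "fmt"

// A minimal sketch of DAG-based invalidation: when one node's data
// version changes, every downstream node is marked for recomputation.
type Graph struct {
	downstream map[string][]string // edges: node -> nodes that depend on it
}

// Invalidate returns the set of nodes that must be recomputed after
// `changed` gets a new data version (including `changed` itself).
func (g *Graph) Invalidate(changed string) map[string]bool {
	dirty := map[string]bool{changed: true}
	queue := []string{changed}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		for _, d := range g.downstream[n] {
			if !dirty[d] {
				dirty[d] = true
				queue = append(queue, d)
			}
		}
	}
	return dirty
}

func main() {
	g := &Graph{downstream: map[string][]string{
		"raw_ticks": {"features"},
		"features":  {"signals"},
		"signals":   {"backtest"},
	}}
	// A new version of "features" dirties features, signals, and backtest,
	// but leaves raw_ticks untouched.
	fmt.Println(len(g.Invalidate("features")))
}
```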

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjqfdkkymz5sg19svxpf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjqfdkkymz5sg19svxpf.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Plans for the future
&lt;/h2&gt;

&lt;p&gt;In future planning, we’ll continue to optimize the storage architecture in the following three aspects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metadata high availability upgrade:&lt;/strong&gt; We plan to migrate metadata storage from Redis to TiKV or PostgreSQL to build a cross-data-center high-availability architecture. This can significantly improve our system disaster recovery and rapid recovery capabilities.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/learn/what-is-hybrid-cloud" rel="noopener noreferrer"&gt;&lt;strong&gt;Hybrid cloud&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;tiered storage:&lt;/strong&gt; By integrating with public cloud S3 and Glacier storage services, we aim to build an intelligent hot/cold tiering system to achieve unlimited storage elasticity while optimizing costs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified management for the research data lake:&lt;/strong&gt; We’ll build a unified research data lake platform, integrating core services such as schema registration, automatic data cleansing, and unified catalog management. We hope to comprehensively improve the discoverability and management efficiency of data assets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have any questions for this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Design and Performance Optimization of juice sync for Enterprise Data Synchronization</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Wed, 10 Dec 2025 03:01:54 +0000</pubDate>
      <link>https://dev.to/daswu/design-and-performance-optimization-of-juice-sync-for-enterprise-data-synchronization-176g</link>
      <guid>https://dev.to/daswu/design-and-performance-optimization-of-juice-sync-for-enterprise-data-synchronization-176g</guid>
      <description>&lt;p&gt;&lt;a href="https://juicefs.com/docs/community/command_reference/#sync" rel="noopener noreferrer"&gt;&lt;code&gt;juicefs sync&lt;/code&gt;&lt;/a&gt; is a powerful data synchronization tool that supports data transfer between object storage, &lt;a href="https://juicefs.com/docs/community/introduction" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt;, and local file systems. It can maximize the efficiency of large-scale full migrations, achieving transfer speeds close to the upper limit of the network bandwidth. This makes petabyte-level data migration feasible within days or weeks (the actual duration depends on bandwidth and cloud throttling policies). Beyond this, it also supports accessing remote directories via SSH, HDFS, and WebDAV, while offering advanced features such as incremental synchronization, pattern matching (similar to rsync), and distributed synchronization.  &lt;/p&gt;

&lt;p&gt;In this article, we’ll focus on the core execution principles of &lt;code&gt;sync&lt;/code&gt;, introducing its architecture and task scheduling mechanisms in both standalone and cluster modes. We’ll also provide a detailed explanation of how performance is improved through concurrency optimizations. In addition, the article will cover &lt;code&gt;sync&lt;/code&gt;’s multi-dimensional filtering capabilities, flexible path-based pattern matching, and data consistency verification mechanisms. We aim to help users better understand and utilize this tool for efficient and reliable data synchronization.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;The core execution principle of &lt;code&gt;sync&lt;/code&gt; is consistent between standalone mode and cluster mode. The main differences lie in the deployment approach and the task distribution mechanism. Therefore, this section uses standalone mode as an example to explain the principles. We’ll explain the differences in cluster mode later.  &lt;/p&gt;

&lt;p&gt;As shown in the figure below, we can abstract the architecture of &lt;code&gt;sync&lt;/code&gt; as a typical producer–consumer model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The producer traverses all keys in the source and target object storage, merges the ordered results from both ends, determines whether objects need to be synchronized, deleted, or validated based on this, and writes the generated tasks into a task queue.
&lt;/li&gt;
&lt;li&gt;The consumer continuously retrieves tasks from the queue and performs the actual operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In standalone mode, the producer and consumer run within the same process and collaborate concurrently through &lt;a href="https://go.dev/tour/concurrency" rel="noopener noreferrer"&gt;goroutines&lt;/a&gt; (lightweight threads).&lt;/p&gt;
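&lt;p&gt;That producer–consumer structure can be sketched in a few lines of Go. The task strings below are placeholders, not &lt;code&gt;sync&lt;/code&gt;'s real task representation:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// A minimal producer-consumer sketch of the standalone-mode architecture:
// one producer goroutine fills a task channel, several consumer goroutines
// drain it and do the work.
func runPipeline(numTasks, numWorkers int) int {
	tasks := make(chan string, 8)

	// Producer: in juicefs sync this is the merge of source/target listings.
	go func() {
		for i := 0; i < numTasks; i++ {
			tasks <- fmt.Sprintf("sync object-%d", i)
		}
		close(tasks)
	}()

	// Consumers: stand-ins for the actual copy/delete/verify operations.
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range tasks {
				mu.Lock()
				done++ // stand-in for executing the task
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println("completed:", runPipeline(5, 3)) // completed: 5
}
```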

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8n21zdzcb08w6rsjesgm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8n21zdzcb08w6rsjesgm.png" alt=" " width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the producer stage, the reason for simultaneously traversing the target end is that &lt;code&gt;sync&lt;/code&gt; synchronizes data from the source and also supports the deletion of redundant objects on the target according to configuration. Based on whether each key exists on the target and how it differs from the source, the producer determines what type of task to generate. Currently, it supports five types of tasks: sync, delete from source, delete from target, copy permissions, and verify.&lt;br&gt;&lt;br&gt;
Next, we’ll further elaborate on the specific mechanism of task generation at the producer stage.  &lt;/p&gt;

&lt;p&gt;To improve task generation efficiency, &lt;code&gt;sync&lt;/code&gt; requires that the traversal (&lt;code&gt;list&lt;/code&gt;) interface of the object storage must return results in an ordered manner, that is, sorted by key in ascending order (such as lexicographically). Relying on this ordering, the producer can perform linear scans on both the source and target object storage simultaneously and compare the current key values one by one to determine whether a synchronization task needs to be generated.  &lt;/p&gt;

&lt;p&gt;Taking the merge process shown in the figure as an example, &lt;code&gt;sync&lt;/code&gt; traverses the source and target synchronously. It reads the key at the current position from each side, for example, &lt;code&gt;file1&lt;/code&gt; from the source and &lt;code&gt;file2&lt;/code&gt; from the target:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkn4aoligfj0xhxt74pb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkn4aoligfj0xhxt74pb.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;juicefs sync&lt;/code&gt; key comparison workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;src key&lt;/code&gt; &amp;lt; &lt;code&gt;dst key&lt;/code&gt; (for example, &lt;code&gt;file1&lt;/code&gt; &amp;lt; &lt;code&gt;file2&lt;/code&gt;), a synchronization task is generated. Then, the source pointer moves forward to continue comparing with the current target key.
&lt;/li&gt;
&lt;li&gt;If the keys at both ends are the same (for example, both are &lt;code&gt;file2&lt;/code&gt;), it means the object exists on both sides. Whether to skip or perform further verification can be judged based on policy. Both pointers then move forward simultaneously.
&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;src key&lt;/code&gt; &amp;gt; &lt;code&gt;dst key&lt;/code&gt; (for example, &lt;code&gt;file4&lt;/code&gt; &amp;gt; &lt;code&gt;file3&lt;/code&gt;), it indicates an extra object exists on the target. This possibly requires a deletion task. Then, the target pointer moves forward.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The comparison logic continues until all keys from both source and target have been traversed, thereby constructing a complete task queue. The primary advantage of this strategy is that it requires only a linear scan to assess the state of all objects and quickly generate tasks.&lt;/p&gt;
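&lt;p&gt;The merge described above can be sketched as a classic two-pointer pass over two sorted listings. This is a simplified illustration; the real producer also applies configured policies (for example, deletion and verification options) when classifying keys:&lt;/p&gt;

```go
package main

import "fmt"

// mergeScan performs a single linear pass over two sorted key listings
// and classifies every key as "copy" (source only), "check" (both), or
// "extra" (target only), mirroring the comparison rules above.
func mergeScan(src, dst []string) (copyKeys, checkKeys, extraKeys []string) {
	i, j := 0, 0
	for i < len(src) && j < len(dst) {
		switch {
		case src[i] < dst[j]: // source key missing on target
			copyKeys = append(copyKeys, src[i])
			i++
		case src[i] == dst[j]: // exists on both sides
			checkKeys = append(checkKeys, src[i])
			i++
			j++
		default: // extra object on the target
			extraKeys = append(extraKeys, dst[j])
			j++
		}
	}
	copyKeys = append(copyKeys, src[i:]...)   // remaining source keys
	extraKeys = append(extraKeys, dst[j:]...) // remaining target keys
	return
}

func main() {
	c, k, e := mergeScan(
		[]string{"file1", "file2", "file4"},
		[]string{"file2", "file3"},
	)
	fmt.Println(c, k, e) // [file1 file4] [file2] [file3]
}
```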

&lt;h2&gt;
  
  
  Performance optimization
&lt;/h2&gt;

&lt;p&gt;After understanding what the producer and consumer do, let’s examine where performance bottlenecks mainly occur and how to optimize them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Producer optimization
&lt;/h3&gt;

&lt;p&gt;During producer execution, list operations on &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; can become a performance bottleneck. Unlike segmented downloads, the &lt;code&gt;list&lt;/code&gt; interfaces of most object storage systems cannot return results for a specified key range, so listing a single directory cannot be split into multiple intervals executed in parallel. Therefore, &lt;code&gt;sync&lt;/code&gt;'s optimization strategy is: execute list operations sequentially within a single directory, but execute list operations concurrently across multiple directories, and then aggregate the results from each directory into a complete key set.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwf86vuecosfn4qny8fvj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwf86vuecosfn4qny8fvj.png" alt=" " width="800" height="692"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In terms of the traversal method:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first step is a breadth-first traversal with limited depth on the directory tree to find all subdirectories. This achieves directory-level splitting.
&lt;/li&gt;
&lt;li&gt;The second step involves a full traversal of each subdirectory, listing all objects under that prefix. This step requires that the list of objects be ordered.&lt;/li&gt;
&lt;/ul&gt;
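&lt;p&gt;The second step can be sketched as a fan-out over subdirectories. &lt;code&gt;listDir&lt;/code&gt; below is a hypothetical stand-in for a real object-storage list call:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// A sketch of the two-step listing strategy: subdirectories found by the
// shallow scan are listed concurrently, and the per-directory results
// (each already ordered) are aggregated into one sorted key set.
func listAll(dirs []string, listDir func(string) []string) []string {
	var mu sync.Mutex
	var wg sync.WaitGroup
	var all []string
	for _, d := range dirs {
		wg.Add(1)
		go func(dir string) {
			defer wg.Done()
			keys := listDir(dir) // sequential within one directory
			mu.Lock()
			all = append(all, keys...)
			mu.Unlock()
		}(d)
	}
	wg.Wait()
	sort.Strings(all) // restore global key order after concurrent listing
	return all
}

func main() {
	fake := func(dir string) []string {
		return []string{dir + "/a", dir + "/b"}
	}
	fmt.Println(listAll([]string{"d2", "d1"}, fake)) // [d1/a d1/b d2/a d2/b]
}
```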

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7254tawy0myfazd4crx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7254tawy0myfazd4crx.png" alt=" " width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Consumer optimization
&lt;/h3&gt;

&lt;p&gt;The entire transfer process is an end-to-end pipeline: the consumer reads data from the source and writes it directly to the target, without passing through the local disk. Therefore, any bottleneck in either reading from the source or writing to the target can become a bottleneck in the synchronization flow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzr5q44j9isrkvm13i6o8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzr5q44j9isrkvm13i6o8.png" alt=" " width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During the upload process, &lt;code&gt;sync&lt;/code&gt; supports concurrent execution of multiple tasks, handled by multiple goroutines. From a task perspective, the system possesses a degree of parallel transmission capability. &lt;strong&gt;The optimization focus is on enhancing read/write concurrency for large files.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Therefore, &lt;code&gt;sync&lt;/code&gt; utilizes the object storage's multipart upload capability. It splits a large file into multiple parts, uploads them in parallel via multiple goroutines, and finally completes the merge via a &lt;code&gt;combine&lt;/code&gt; interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzqh87nxxfqosfikzl4j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzqh87nxxfqosfikzl4j.png" alt=" " width="800" height="695"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For extremely large files, we can further split the parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload several intermediate objects.
&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;UploadPartCopy&lt;/code&gt; interface to perform splitting and merging of these intermediate objects directly within the object storage service.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This two-level splitting + two-level merging for the same file significantly improves the concurrency and efficiency of large file transfers.  &lt;/p&gt;
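&lt;p&gt;The part-splitting step can be sketched as computing byte ranges for one object; each range would then be uploaded by its own goroutine and merged server-side. The sizes here are illustrative, not the actual defaults used by &lt;code&gt;sync&lt;/code&gt;:&lt;/p&gt;

```go
package main

import "fmt"

// part describes one byte range of a multipart upload.
type part struct{ off, size int64 }

// splitParts divides an object of `total` bytes into fixed-size parts;
// the final part may be shorter.
func splitParts(total, partSize int64) []part {
	var parts []part
	for off := int64(0); off < total; off += partSize {
		sz := partSize
		if off+sz > total {
			sz = total - off // last part may be short
		}
		parts = append(parts, part{off, sz})
	}
	return parts
}

func main() {
	// A 25 MiB object with 10 MiB parts yields three parts,
	// the last one 5 MiB.
	p := splitParts(25<<20, 10<<20)
	fmt.Println(len(p), p[len(p)-1].size>>20) // 3 5
}
```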

&lt;p&gt;Compared to the write path, the read-side concurrency logic is simpler: split the object into chunks and read each chunk concurrently. These concurrently read chunks form a &lt;a href="https://juicefs.com/docs/community/guide/cache/#readahead-prefetch" rel="noopener noreferrer"&gt;readahead&lt;/a&gt; window. This means data blocks ahead of the current read position (offset) are pre-downloaded and cached as much as possible. The readahead mechanism helps reduce wait times when source read performance is insufficient or network jitter occurs, preventing the read side from becoming the performance bottleneck in the transfer pipeline.&lt;/p&gt;
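&lt;p&gt;The readahead window can be sketched as computing which chunks ahead of the current read offset to prefetch. The chunk size and window depth below are assumptions for illustration, not &lt;code&gt;sync&lt;/code&gt;'s actual defaults:&lt;/p&gt;

```go
package main

import "fmt"

// readaheadChunks returns the indexes of the fixed-size chunks ahead of
// the current offset that should be pre-downloaded and cached.
func readaheadChunks(offset, chunkSize int64, window int) []int64 {
	current := offset / chunkSize
	next := make([]int64, 0, window)
	for i := int64(1); i <= int64(window); i++ {
		next = append(next, current+i) // chunk indexes to prefetch
	}
	return next
}

func main() {
	// Reading at offset 9 MiB with 4 MiB chunks: we are in chunk 2,
	// so with a window of 2 we prefetch chunks 3 and 4.
	fmt.Println(readaheadChunks(9<<20, 4<<20, 2)) // [3 4]
}
```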

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0r4o27mij8l8y3jb8mxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0r4o27mij8l8y3jb8mxp.png" alt=" " width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Cluster mode
&lt;/h2&gt;

&lt;p&gt;For larger data scales where single-node throughput is insufficient, &lt;code&gt;sync&lt;/code&gt; also supports cluster mode. By horizontally scaling workers across multiple nodes to execute tasks in parallel, overall synchronization efficiency can be further enhanced.&lt;br&gt;&lt;br&gt;
Cluster mode does not introduce a new synchronization algorithm. The changes relative to standalone mode are mainly reflected in the physical deployment method and task distribution channels.&lt;/p&gt;

&lt;h3&gt;
  
  
  Physical deployment differences
&lt;/h3&gt;

&lt;p&gt;In standalone mode, the producer and consumer run within the same process, cooperating via different goroutines. In cluster mode, consumers can be independently deployed to multiple nodes as worker nodes, while the producer serves as the manager node. This enables horizontal scaling.  &lt;/p&gt;

&lt;p&gt;Before enabling cluster mode, SSH password-free login needs to be configured between the manager and each worker node. After starting cluster mode, the manager distributes the &lt;code&gt;juicefs&lt;/code&gt; binary to the &lt;code&gt;/tmp&lt;/code&gt; directory of each worker node and starts the worker process remotely via SSH to join the synchronization task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xgzxa6lj7ucmyx34yky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xgzxa6lj7ucmyx34yky.png" alt=" " width="531" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Task distribution mechanism differences
&lt;/h3&gt;

&lt;p&gt;In standalone mode, the producer and consumer pass tasks via Go's &lt;code&gt;chan&lt;/code&gt; within the same process. In cluster mode, task distribution switches to an HTTP-based approach—the manager provides a task allocation API, and each worker periodically polls the manager to pull tasks for execution. Except for the communication method, the rest of the flow is consistent with standalone mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  Task filtering
&lt;/h2&gt;

&lt;p&gt;In real-world synchronization scenarios, users typically only want to perform synchronization or cleanup operations on objects meeting specific criteria. &lt;code&gt;sync&lt;/code&gt; supports filtering objects from multiple dimensions during the task generation stage and decides whether to generate a task and its type based on the filtering results. Overall, filtering and adjustment mainly include the following categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Based on time:&lt;/strong&gt; Filter keys according to the file's &lt;code&gt;mtime&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Based on size:&lt;/strong&gt; Filter keys according to the file's size.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Based on key:&lt;/strong&gt; Filter keys through pattern matching rules for paths/names.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Based on result:&lt;/strong&gt; Filter or adjust task types by combining the key's existence status or differences on the target.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8r43aiqkhfvq9ssu0hqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8r43aiqkhfvq9ssu0hqg.png" alt=" " width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Among these, filtering based on key is the most commonly used and flexible. &lt;code&gt;sync&lt;/code&gt; uses the &lt;code&gt;--exclude&lt;/code&gt; and &lt;code&gt;--include&lt;/code&gt; parameters for path pattern matching to precisely control the synchronization scope.  &lt;/p&gt;

&lt;p&gt;Currently, two matching modes are supported: the layer-by-layer filtering mode and the full-path filtering mode. The layer-by-layer mode is compatible with rsync's rules but complex to use. For details, see this &lt;a href="https://juicefs.com/docs/community/guide/sync/#layer-by-layer-filtering-mode" rel="noopener noreferrer"&gt;document&lt;/a&gt;. The following will focus on the more intuitive and easier-to-use full-path filtering mode.  &lt;/p&gt;

&lt;p&gt;Since v1.2.0, the &lt;code&gt;sync&lt;/code&gt; command added the &lt;code&gt;--match-full-path&lt;/code&gt; option to enable the full-path filtering mode. In this mode, &lt;code&gt;sync&lt;/code&gt; will match the full path of the object to be processed sequentially against multiple configured rules. Once a rule is matched, the processing result for that object (synchronize or exclude) is immediately determined, and subsequent rules are no longer evaluated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43c8l5h7nv3fpyd36eqj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43c8l5h7nv3fpyd36eqj.png" alt=" " width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, consider a file with the &lt;code&gt;a1/b1/c1.txt&lt;/code&gt; path and three matching patterns: &lt;code&gt;--include 'a*.txt'&lt;/code&gt;, &lt;code&gt;--include 'c1.txt'&lt;/code&gt;, and &lt;code&gt;--exclude 'c*.txt'&lt;/code&gt;. In full-path filtering mode, the string &lt;code&gt;a1/b1/c1.txt&lt;/code&gt; is directly evaluated against the patterns sequentially:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Evaluate &lt;code&gt;a1/b1/c1.txt&lt;/code&gt; against &lt;code&gt;--include 'a*.txt'&lt;/code&gt;. The result is no match because &lt;code&gt;*&lt;/code&gt; cannot match the &lt;code&gt;/&lt;/code&gt; character. See &lt;a href="https://juicefs.com/docs/community/guide/sync/#matching-rules" rel="noopener noreferrer"&gt;Matching rules&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Evaluate &lt;code&gt;a1/b1/c1.txt&lt;/code&gt; against &lt;code&gt;--include 'c1.txt'&lt;/code&gt;. According to the matching rules, this will match successfully. Although the subsequent &lt;code&gt;--exclude 'c*.txt'&lt;/code&gt; would also match according to the rules, based on the logic of the full-path filtering mode, once a pattern is matched, subsequent patterns are no longer attempted. Therefore, the final match result is "synchronize".&lt;/li&gt;
&lt;/ol&gt;
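&lt;p&gt;The first-match-wins evaluation can be sketched as follows. This is an approximation for illustration, not &lt;code&gt;sync&lt;/code&gt;'s exact matcher: here &lt;code&gt;*&lt;/code&gt; does not cross &lt;code&gt;/&lt;/code&gt;, &lt;code&gt;**&lt;/code&gt; does, and a slash-free pattern is matched against the file name:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a pattern with its direction (--include or --exclude).
type rule struct {
	pattern string
	include bool
}

// toRegexp compiles a glob-like pattern: `**` matches anything,
// `*` and `?` stop at `/`, everything else is literal.
func toRegexp(pat string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(pat); i++ {
		switch {
		case strings.HasPrefix(pat[i:], "**"):
			b.WriteString(".*")
			i++ // skip the second star
		case pat[i] == '*':
			b.WriteString("[^/]*")
		case pat[i] == '?':
			b.WriteString("[^/]")
		default:
			b.WriteString(regexp.QuoteMeta(string(pat[i])))
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

// matched reports whether path should be synchronized: rules are tried
// in order and the first matching rule decides; with no match, the
// default is to synchronize.
func matched(path string, rules []rule) bool {
	base := path[strings.LastIndex(path, "/")+1:]
	for _, r := range rules {
		target := path
		if !strings.Contains(r.pattern, "/") {
			target = base // slash-free patterns match the file name
		}
		if toRegexp(r.pattern).MatchString(target) {
			return r.include // first match wins
		}
	}
	return true
}

func main() {
	rules := []rule{
		{"a*.txt", true},
		{"c1.txt", true},
		{"c*.txt", false},
	}
	// c1.txt matches the second rule first, so the exclude never fires.
	fmt.Println(matched("a1/b1/c1.txt", rules)) // true
}
```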

&lt;p&gt;More examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--exclude '/foo**'&lt;/code&gt; excludes all files or directories whose root directory name is &lt;code&gt;foo&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--exclude '**foo/**'&lt;/code&gt; excludes all directories ending with &lt;code&gt;foo&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--include '*/'&lt;/code&gt;, &lt;code&gt;--include '*.c'&lt;/code&gt;, and &lt;code&gt;--exclude '*'&lt;/code&gt; includes only all directories and files with the &lt;code&gt;.c&lt;/code&gt; suffix; all other files and directories are excluded.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--include 'foo/bar.c'&lt;/code&gt; and &lt;code&gt;--exclude '*'&lt;/code&gt; includes only the &lt;code&gt;foo&lt;/code&gt; directory and the &lt;code&gt;foo/bar.c&lt;/code&gt; file.&lt;/li&gt;
&lt;/ul&gt;
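&lt;p&gt;The filtering behavior above can be sketched in Python. This is an illustrative model only, not the actual &lt;code&gt;juicefs&lt;/code&gt; implementation; &lt;code&gt;glob_to_re&lt;/code&gt;, &lt;code&gt;matches&lt;/code&gt;, and &lt;code&gt;decide&lt;/code&gt; are hypothetical helpers that approximate the documented rules: &lt;code&gt;*&lt;/code&gt; and &lt;code&gt;?&lt;/code&gt; stop at &lt;code&gt;/&lt;/code&gt;, &lt;code&gt;**&lt;/code&gt; does not, a leading &lt;code&gt;/&lt;/code&gt; anchors the pattern at the root, and in full-path filtering mode the first matching pattern wins.&lt;/p&gt;

```python
import re

# Illustrative model of full-path filtering (hypothetical helpers,
# not the real juicefs sync implementation).
def glob_to_re(pattern):
    """Translate a pattern: '*' and '?' must not cross '/', '**' may."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*"); i += 2
        elif pattern[i] == "*":
            out.append("[^/]*"); i += 1
        elif pattern[i] == "?":
            out.append("[^/]"); i += 1
        else:
            out.append(re.escape(pattern[i])); i += 1
    return re.compile("".join(out) + "$")

def matches(path, pattern):
    if pattern.startswith("/"):  # a leading '/' anchors at the root
        return bool(glob_to_re(pattern[1:]).match(path))
    # Otherwise the pattern may match the full path or any path suffix
    # that starts at a directory boundary.
    suffixes = [path] + [path[i + 1:] for i, c in enumerate(path) if c == "/"]
    return any(glob_to_re(pattern).match(s) for s in suffixes)

def decide(path, rules):
    """Full-path filtering mode: the first matching pattern wins."""
    for action, pattern in rules:
        if matches(path, pattern):
            return action
    return "include"  # objects matching no pattern are synchronized

rules = [("include", "a*.txt"),  # no match: '*' cannot cross '/'
         ("include", "c1.txt"),  # matches the path suffix -> synchronize
         ("exclude", "c*.txt")]  # never evaluated: first match wins
print(decide("a1/b1/c1.txt", rules))  # -> include
```

&lt;p&gt;Running &lt;code&gt;decide&lt;/code&gt; on &lt;code&gt;a1/b1/c2.txt&lt;/code&gt; with the same rules returns "exclude", since only the third pattern matches it.&lt;/p&gt;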

&lt;h2&gt;
  
  
  Data integrity verification
&lt;/h2&gt;

&lt;p&gt;Beyond high-concurrency transfers, correctness of the synchronization result is crucial. To mitigate risks from network jitter, concurrent writes during synchronization, and differences in consistency semantics across storage backends, &lt;code&gt;juicefs sync&lt;/code&gt; provides a multi-level data integrity verification mechanism. This allows users to balance reliability and resource overhead based on application requirements.  &lt;/p&gt;

&lt;p&gt;Specifically, &lt;code&gt;sync&lt;/code&gt; supports different verification levels when executing sync tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stricter verification provides stronger guarantees for data consistency between source and target but introduces higher read amplification and CPU consumption.
&lt;/li&gt;
&lt;li&gt;More relaxed verification has less impact on performance but correspondingly increases tolerance for potential inconsistencies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users can choose suitable verification parameters based on data importance and performance costs.&lt;br&gt;&lt;br&gt;
Taking the highest-level verification &lt;code&gt;check-all&lt;/code&gt; as an example, its process has three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pre-transfer comparison:&lt;/strong&gt; When determining whether an object needs synchronization, &lt;code&gt;sync&lt;/code&gt; reads and compares the data content from the source and target. If the content differs, a synchronization task is generated and executed.
In extreme scenarios, both objects may need to be read to the last byte before their equality can be decided. In that case, both source and target undergo one full read before the transfer, resulting in approximately 1x additional read traffic.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transfer process verification:&lt;/strong&gt; During synchronization, source data is read entirely again and written to the target. This results in approximately 1x read traffic on the source and 1x write traffic on the target. Simultaneously, &lt;code&gt;sync&lt;/code&gt; calculates a checksum while reading the source data, obtaining the object's verification value after the read is complete.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post-transfer verification:&lt;/strong&gt; To verify the correctness of network transmission and target-side writing, &lt;code&gt;sync&lt;/code&gt; reads the target object once more, calculates its checksum, and compares it with the checksum obtained during the transfer. If they are inconsistent, a data error has occurred, and an error is reported.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The table below compares &lt;code&gt;sync&lt;/code&gt; verification modes and their traffic consumption.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4t5elo2qhbsqdkgrh95p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4t5elo2qhbsqdkgrh95p.png" alt=" " width="800" height="651"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;check-new&lt;/code&gt; can be seen as a subset of &lt;code&gt;check-all&lt;/code&gt;: it performs the content verification above only on new objects or objects that require transmission, striking a balance between higher reliability and lower overhead.  &lt;/p&gt;

&lt;p&gt;&lt;code&gt;check-change&lt;/code&gt; adopts a weaker verification approach: it does not compare content directly but infers whether an object has changed from metadata such as &lt;code&gt;mtime&lt;/code&gt; and size. Since this strategy does not verify the actual content, metadata may remain unchanged while the content differs. Therefore, this mode offers the lowest verification level with the smallest performance overhead.&lt;/p&gt;
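&lt;p&gt;A minimal sketch of this metadata-only heuristic, with local &lt;code&gt;stat&lt;/code&gt; results standing in for object metadata (&lt;code&gt;needs_sync&lt;/code&gt; is a hypothetical helper, not the actual implementation):&lt;/p&gt;

```python
import os

def needs_sync(src, dst):
    """check-change-style heuristic: compare only size and mtime.

    The content is never read, so an object whose bytes changed while
    its metadata stayed the same would be wrongly treated as unchanged.
    """
    try:
        s, d = os.stat(src), os.stat(dst)
    except FileNotFoundError:
        return True  # one side is missing, so a transfer is needed
    return (s.st_size, s.st_mtime_ns) != (d.st_size, d.st_mtime_ns)
```

&lt;p&gt;Because the decision is made from two integers per object rather than a full read, this is the cheapest mode, and also the one with the weakest guarantee.&lt;/p&gt;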

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This article systematically introduced the architectural design, performance optimization, cluster mode, and task filtering mechanisms of &lt;code&gt;juicefs sync&lt;/code&gt;. We aim to help users gain a deeper understanding of its implementation principles and usage methods. We’ll continue to optimize and evolve &lt;code&gt;sync&lt;/code&gt;.  &lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
    </item>
    <item>
      <title>NAS vs. Object Storage vs. JuiceFS: Storage Selection of Billion-Dollar Quantitative Firms</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 28 Nov 2025 06:09:28 +0000</pubDate>
      <link>https://dev.to/daswu/nas-vs-object-storage-vs-juicefs-storage-selection-of-billion-dollar-quantitative-firms-1k37</link>
      <guid>https://dev.to/daswu/nas-vs-object-storage-vs-juicefs-storage-selection-of-billion-dollar-quantitative-firms-1k37</guid>
<description>&lt;p&gt;In the &lt;a href="https://en.wikipedia.org/wiki/Quantitative_analysis_(finance)" rel="noopener noreferrer"&gt;quantitative investment&lt;/a&gt; field, the performance and scalability of the storage system directly determine how efficiently research and computational tasks can run. &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt;, an open-source high-performance distributed file system, has become the storage backbone for multiple top-tier, billion-dollar quantitative investment firms. It delivers high-performance, cost-effective, and elastically scalable storage infrastructure for their core operations—including backtesting and model training.  &lt;/p&gt;

&lt;p&gt;This article shares the critical storage challenges faced by quantitative firms and how JuiceFS addresses them. We’ll explore typical case studies focusing on cost optimization, metadata performance improvement, and seamless cloud migration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data types and storage challenges in the quantitative industry
&lt;/h2&gt;

&lt;p&gt;First, let's understand the data usage patterns in quantitative research. The programming languages and data processing tools used are diverse, including Python, R, Java, MATLAB, and Conda. The data types involved are very rich, such as CSV, Parquet, TXT, Excel, and HDF. This requires the storage platform to be flexible enough to support various data formats efficiently.  &lt;/p&gt;

&lt;p&gt;The table below shows common data types in quantitative investing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data type&lt;/th&gt;
&lt;th&gt;Data characteristics&lt;/th&gt;
&lt;th&gt;Data format&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Order book&lt;/td&gt;
&lt;td&gt;GB~TB per day&lt;/td&gt;
&lt;td&gt;Parquet, NPY&lt;/td&gt;
&lt;td&gt;High-frequency backtesting, market making&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tick data&lt;/td&gt;
&lt;td&gt;GB~TB per day&lt;/td&gt;
&lt;td&gt;Parquet, NPY&lt;/td&gt;
&lt;td&gt;High-frequency backtesting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;K-line / bar data&lt;/td&gt;
&lt;td&gt;MB~GB per year&lt;/td&gt;
&lt;td&gt;Parquet&lt;/td&gt;
&lt;td&gt;Low-frequency backtesting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Factor library&lt;/td&gt;
&lt;td&gt;Tens of GB~TB per year&lt;/td&gt;
&lt;td&gt;Parquet, Feather&lt;/td&gt;
&lt;td&gt;Trading reference&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In quantitative investing, data usage exhibits a distinct read-intensive, write-light pattern. Researchers frequently read data from unified data sources, while data writing and updating occur infrequently. Various types of market data and factor data are typically stored centrally and shared for read access by all researchers. Both factor research and trading strategy development involve repeated access and utilization of foundational data. Consequently, storage systems must deliver high-performance read capabilities to support rapid data access and analytical requirements.  &lt;/p&gt;

&lt;p&gt;This data usage pattern is also fully reflected in the critical task of quantitative research: &lt;a href="https://en.wikipedia.org/wiki/Backtesting" rel="noopener noreferrer"&gt;backtesting&lt;/a&gt;. In a typical backtesting workflow, researchers repeatedly read historical market data to construct factors, generate trading signals, and validate strategy performance. Stable factors with strong predictive power are incorporated into factor libraries, serving as critical references for investment decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyux3rqudcl4orzij776s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyux3rqudcl4orzij776s.png" alt=" " width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Consequently, storage systems face a series of challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The ability to handle diverse programming languages and data processing tools.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The requirement for high performance despite low capacity needs.&lt;/strong&gt; Although storage capacity requirements typically range from tens to hundreds of terabytes, the performance demands on the storage system are extremely high, particularly regarding throughput and IOPS. Since quantitative tasks involve extensive &lt;a href="https://en.wikipedia.org/wiki/Machine_learning" rel="noopener noreferrer"&gt;machine learning&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Deep_learning" rel="noopener noreferrer"&gt;deep learning&lt;/a&gt; computations, the storage system must provide efficient sequential read throughput and fast random read IOPS to facilitate rapid data processing and analysis.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptability to cloud environments.&lt;/strong&gt; As more quantitative teams adopt public or &lt;a href="https://azure.microsoft.com/en-us/resources/cloud-computing-dictionary/what-is-hybrid-cloud-computing" rel="noopener noreferrer"&gt;hybrid clouds&lt;/a&gt; to build their technology platforms, storage systems need to possess high availability, elastic scalability, and flexible resource management capabilities. This meets dynamic storage demands and allows for quick adjustment of storage resources based on computational needs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration with Kubernetes, supporting CSI drivers.&lt;/strong&gt; This is because Kubernetes has become the mainstream solution for resource orchestration in quantitative research. This integration enhances platform flexibility and scalability and enables more efficient resource management for storage systems in cloud environments.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data security and &lt;a href="https://en.wikipedia.org/wiki/Access_control" rel="noopener noreferrer"&gt;access control&lt;/a&gt; requirements.&lt;/strong&gt; Protecting sensitive information like factor data and trading strategies is a top priority. The storage system must provide strong access control mechanisms to ensure data privacy and security. As more data moves to the cloud, features like encryption, auditing, and compliance management become essential. These measures guarantee that sensitive data remains secure in cloud environments.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Analysis of main storage solutions: NAS vs. object storage vs. JuiceFS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  NAS
&lt;/h3&gt;

&lt;p&gt;Many quantitative teams initially choose &lt;a href="https://en.wikipedia.org/wiki/Network-attached_storage" rel="noopener noreferrer"&gt;network-attached storage&lt;/a&gt; (NAS) for its out-of-the-box simplicity, typically using NFS or Samba protocols. This works well for data scales from tens to hundreds of terabytes in on-premises data centers. It’s easy to deploy and manage with controllable costs. However, several issues often emerge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; High-performance storage in on-premises data centers often requires all-flash NAS systems, which are expensive. Hybrid flash systems (mixing SSD and HDD) often lack the performance for high-concurrency access. This becomes a bottleneck when multiple researchers read data simultaneously.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud NAS performance tied to capacity:&lt;/strong&gt; In public cloud environments, the throughput of many NAS or parallel file system products is directly tied to purchased capacity. If a team needs 100 TB of storage but requires high throughput for backtesting (for example, 50+ GB/s reads or millions of IOPS), they might be forced to purchase PB-level capacity. This leads to waste.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol-level bottlenecks:&lt;/strong&gt; NAS protocols rely on kernel clients. Under high load, data and metadata competing for the same channel (prior to NFS v4.2) can cause congestion, performance drops, and latency spikes, especially during peak research or backtesting periods.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3j6uipssuqfifansjny.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3j6uipssuqfifansjny.png" alt=" " width="331" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;While NAS offers simplicity and good compatibility, its cost, scalability, and high-concurrency performance are significant limitations.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Object storage
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;Object storage&lt;/a&gt; is a low-cost alternative. While it has higher single-request latency, it handles large sequential reads and writes effectively. Many teams adapt their code for sequential access, but this approach has disadvantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Object storage is accessed through the S3 API rather than POSIX.&lt;/strong&gt; For applications already using POSIX mount points, migrating to object storage requires code modifications, which is intrusive to the existing system. Furthermore, object storage natively lacks efficient metadata management. Due to its flat namespace, I/O operations respond slowly, especially operations like file renaming: renaming a file essentially requires creating a copy under the new name and deleting the original, which performs poorly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Files on object storage are immutable.&lt;/strong&gt; This means that if data needs to be modified (for example, adding a column to a Parquet file), it can only be achieved by creating a new file, rather than modifying the original file in place. This immutability can cause significant inconvenience in certain scenarios, such as when dealing with database files.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequent read operations on object storage incur extra data transfer fees.&lt;/strong&gt; For the high-frequency read/write demands typical in quantitative finance, this can lead to substantial costs. Moreover, although object storage performs well for large sequential throughput, its performance is entirely dependent on the storage system itself. Under high-concurrency access patterns, it may still fail to meet the low-latency and real-time requirements crucial for quantitative research.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g57db3zif74twp3y6it.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g57db3zif74twp3y6it.png" alt=" " width="238" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  JuiceFS
&lt;/h3&gt;

&lt;p&gt;JuiceFS is a POSIX-compatible distributed file system built on object storage. It manages both structured and unstructured data, providing a unified, high-performance storage platform for quantitative research. With &lt;a href="https://juicefs.com/docs/csi/introduction/" rel="noopener noreferrer"&gt;CSI Driver&lt;/a&gt; support, it integrates deeply with Kubernetes and offers cloud-native flexibility for deployment across public, private, or hybrid cloud environments.&lt;/p&gt;

&lt;h4&gt;
  
  
  Architecture design
&lt;/h4&gt;

&lt;p&gt;JuiceFS employs a separated &lt;a href="https://juicefs.com/docs/community/architecture" rel="noopener noreferrer"&gt;architecture&lt;/a&gt; for data and metadata management:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is split into chunks and stored in object storage.
&lt;/li&gt;
&lt;li&gt;Metadata is handled by a dedicated engine and cached in memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design delivers significant performance improvements. File metadata operations—including &lt;code&gt;rename&lt;/code&gt;, &lt;code&gt;lookup&lt;/code&gt;, and &lt;code&gt;getattr&lt;/code&gt;—avoid frequent access to underlying storage. The system achieves high throughput, low latency, and high IOPS performance.&lt;/p&gt;
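&lt;p&gt;The data/metadata split can be illustrated with a toy in-memory model. The dicts stand in for the object store and the metadata engine, and the block size and key scheme are simplified illustrations, not JuiceFS' actual on-disk format:&lt;/p&gt;

```python
import itertools

BLOCK_SIZE = 4 * 1024 * 1024  # illustrative block size

object_store = {}  # stand-in for S3/OSS: block key -> bytes
metadata = {}      # stand-in for the metadata engine: path -> [block keys]
_block_ids = itertools.count()

def write_file(path, data):
    """Split file data into blocks and record the mapping as metadata."""
    keys = []
    for i in range(0, len(data), BLOCK_SIZE):
        key = "chunk-%d" % next(_block_ids)
        object_store[key] = data[i:i + BLOCK_SIZE]
        keys.append(key)
    metadata[path] = keys

def rename(old, new):
    # A pure metadata operation: no block in object storage is copied
    # or deleted, because blocks are keyed by id rather than by path.
    metadata[new] = metadata.pop(old)

def read_file(path):
    return b"".join(object_store[k] for k in metadata[path])
```

&lt;p&gt;This is why &lt;code&gt;rename&lt;/code&gt; here rewrites a single metadata entry, whereas the same operation performed directly on object storage must copy and delete every object under the old name.&lt;/p&gt;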

&lt;h4&gt;
  
  
  Security and access control
&lt;/h4&gt;

&lt;p&gt;JuiceFS provides comprehensive security features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token-based access control
&lt;/li&gt;
&lt;li&gt;POSIX ACL support
&lt;/li&gt;
&lt;li&gt;Data encryption during transmission and at rest
&lt;/li&gt;
&lt;li&gt;Data chunking in object storage and cache for enhanced security and compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Product editions
&lt;/h4&gt;

&lt;p&gt;JuiceFS offers two editions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Community Edition: Users select their preferred metadata engine for scenario flexibility.
&lt;/li&gt;
&lt;li&gt;Enterprise Edition: Includes high-performance metadata service and &lt;a href="https://juicefs.com/docs/cloud/guide/distributed-cache" rel="noopener noreferrer"&gt;distributed caching&lt;/a&gt; for demanding workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw8xpvhveuu08opupjfb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw8xpvhveuu08opupjfb.png" alt=" " width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Deployment process
&lt;/h4&gt;

&lt;p&gt;Users deploy JuiceFS through a simple process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download and install the JuiceFS binary.
&lt;/li&gt;
&lt;li&gt;Mount using the &lt;code&gt;juicefs mount&lt;/code&gt; command.
&lt;/li&gt;
&lt;li&gt;Access via POSIX mount points.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This approach maintains compatibility with existing applications, working like a local file system.&lt;/p&gt;

&lt;h4&gt;
  
  
  Advantages for quantitative research
&lt;/h4&gt;

&lt;p&gt;This architecture delivers significant advantages for the quantitative industry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High throughput and IOPS support:&lt;/strong&gt; Data usage patterns in quantitative research are typically read-heavy and write-light. Especially for tasks like backtesting and factor research, frequent reading of large datasets is required. JuiceFS' chunk-based storage and efficient multi-level &lt;a href="https://juicefs.com/docs/cloud/guide/distributed-cache/" rel="noopener noreferrer"&gt;data caching&lt;/a&gt; provide high throughput and IOPS support. This meets the demanding data access requirements of quantitative research.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic scaling and cost reduction:&lt;/strong&gt; Quantitative research often faces dynamic storage demands. JuiceFS inherently provides elastic scaling, allowing it to dynamically adjust storage and cache resources based on requirements. By using object storage as the underlying architecture combined with distributed caching, JuiceFS significantly reduces storage costs and avoids the high expense of all-flash storage.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data security:&lt;/strong&gt; In quantitative investing and research, factor data and trading strategies are core assets. JuiceFS effectively ensures data security through data chunking and encryption mechanisms. Particularly for institutions hesitant about moving to the cloud, this chunking and encryption strategy significantly reduces the risk of data leakage: even if partial data is intercepted, attackers cannot reconstruct the original content without the metadata. This ensures the security and compliance of sensitive information.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud environment compatibility:&lt;/strong&gt; With the increasing adoption of cloud computing and &lt;a href="https://aws.amazon.com/what-is/public-cloud/" rel="noopener noreferrer"&gt;public clouds&lt;/a&gt;, quantitative research is gradually migrating to cloud environments. JuiceFS supports cross-cloud synchronization and can operate efficiently in multi-cloud or hybrid cloud environments. This provides greater flexibility and scalability for quantitative research.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For on-premises quantitative firms, JuiceFS is also a viable and efficient choice. Traditional on-premises storage often uses hybrid flash or hard disk drives. Their performance often cannot meet the high-concurrency demands of quantitative research. By fully utilizing the local cache disks of application nodes and an efficient caching mechanism, JuiceFS can achieve performance close to all-flash at a low cost, significantly reducing hardware investment. Its operational simplicity and proven stability ensure reliable performance with light maintenance, as confirmed by customer feedback.&lt;/p&gt;

&lt;h2&gt;
  
  
  Case studies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Enhancing object storage performance and reducing costs
&lt;/h3&gt;

&lt;p&gt;To further reduce storage costs, one of our customers migrated data to object storage. In their backtesting operations, they initially used a self-developed interface to directly access Amazon S3 object storage via the S3 API. While this solution could handle their primary workload of writing large data blocks and performing sequential reads—even without caching—it had several notable drawbacks.  &lt;/p&gt;

&lt;p&gt;Using the S3 API was intrusive to the application code: if API issues arose, debugging and maintaining the application became difficult. In addition, this interface implemented no caching and only handled protocol conversion, which imposed performance limitations. For these reasons, the application team didn't want to rely on this model over the long term.  &lt;/p&gt;

&lt;p&gt;To improve this situation, the company decided to migrate to &lt;a href="https://juicefs.com/en/product/enterprise-edition/" rel="noopener noreferrer"&gt;JuiceFS Enterprise Edition&lt;/a&gt;. After migration, the application code no longer depended on the S3 API. JuiceFS, as a rich client, was directly mounted on the application nodes. This approach required almost no changes to the application code, allowing developers to continue compiling and running code in the original way. Meanwhile, the NVMe drives in the application nodes were used as a distributed cache. This effectively optimized data access efficiency.&lt;/p&gt;

&lt;p&gt;Using JuiceFS brought significant performance improvements. This enabled the client to achieve high-performance read/write operations on low-cost object storage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local cache avoided repeated access to remote storage. This significantly reduced read latency.
&lt;/li&gt;
&lt;li&gt;JuiceFS’ chunking and &lt;a href="https://juicefs.com/en/blog/engineering/optimize-read-performance" rel="noopener noreferrer"&gt;readahead&lt;/a&gt; mechanisms fully utilized object storage bandwidth. They enabled even single-file reads to reach system performance limits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, the migrated system showed noticeable improvements in both performance and maintenance convenience.&lt;br&gt;&lt;br&gt;
This company’s neural network training tasks also benefited from JuiceFS optimizations. During training, the system read approximately 100,000 files (each about 24 MB) and repeatedly read them after random shuffling. Without caching, each training epoch required reloading all files, generating massive network traffic. After implementing JuiceFS, data from the first read was automatically cached locally. This allowed subsequent training to fetch data directly from the cache pool. Bandwidth usage dropped dramatically and overall training efficiency improved significantly. This fully met the company’s storage requirements for training.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extremely high metadata demands
&lt;/h3&gt;

&lt;p&gt;Another customer primarily uses JuiceFS for backtesting operations. They purchase market data from external sources, typically in CSV format, and store it in directories. Unlike other customers who aggregate data, this company stores raw data directly in folders. This creates a heavier data access burden. Historical market data for each stock is stored separately, with each file representing one year of data for a single stock. These files are distributed across different directories by month and year. &lt;strong&gt;Each time market data for a specific date needs to be read, the system must randomly read and process about 6,000 files and then arrange these files in sequence. To support backtesting, this data is used to launch hundreds of virtual machines, running 4,000 to 6,000 backtesting tasks.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;This process of randomly reading large numbers of files placed enormous pressure on the storage system, particularly in high-concurrency read/write scenarios where metadata access efficiency severely impacted overall performance. The company initially used JuiceFS Community Edition combined with cloud-based Redis service to cache some data. While Redis provided good performance, its QPS limit of 500,000 couldn't meet the demands of the company’s high-concurrency backtesting tasks. When backtesting tasks reached 12,000, Redis performance hit a bottleneck. This caused the entire system to slow down significantly and prevented completion of backtesting tasks.  &lt;/p&gt;

&lt;p&gt;To solve this problem, the company tried JuiceFS Enterprise Edition. The metadata service in the Enterprise Edition showed significant performance improvements, particularly in handling large volumes of metadata requests with higher concurrency support. The Enterprise Edition's metadata service adopts a directory-tree based management approach and provides more efficient request processing. Customers no longer need to request multiple operations through key-value storage; instead, they simply inform the metadata system to read files through simple requests. &lt;strong&gt;The per-request processing efficiency of the Enterprise Edition metadata service is approximately 6 times that of the Community Edition. This means that for the same application operations, the required QPS is only one-sixth of the Community Edition's requirement. This results in higher overall execution efficiency.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;In addition, JuiceFS Enterprise Edition provides a more powerful metadata client caching mechanism: numerous read-only requests (such as &lt;code&gt;getattr&lt;/code&gt; and &lt;code&gt;lookup&lt;/code&gt;) can directly hit the cache, significantly reducing the access pressure on the metadata service.  &lt;/p&gt;

&lt;p&gt;As shown in the figure below, the system handled about 500,000 QPS of read-only metadata requests, with only about 40,000 actually reaching the metadata server. Combined with the efficient processing capability of the Enterprise Edition metadata service, the overall system maintained stable operation in high-concurrency scenarios, with CPU utilization at only about 40%, enabling the company to stably support up to 12,000 backtesting tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bkoiow4uz2s8ajwjncp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bkoiow4uz2s8ajwjncp.png" alt=" " width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From this case, we can see that JuiceFS Enterprise Edition effectively addresses the challenges of high-concurrency read/write operations, high IOPS, and massive metadata request volumes in the quantitative industry. Even without optimizing for specific application scenarios, JuiceFS efficiently handles these intensive operational demands. It meets the quantitative industry's stringent requirements for data access speed and responsiveness.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overcoming cloud NAS IOPS bottlenecks
&lt;/h3&gt;

&lt;p&gt;In this case, our customer faced cloud storage performance issues when migrating their application to the cloud. They initially used cloud NAS to support their storage needs but quickly discovered that as the application expanded, the cloud NAS performance couldn't meet IOPS requirements. While using cloud NAS, the company’s 120 nodes already saturated the IOPS limits of standard cloud NAS. Even using high-speed cloud NAS (providing about 200,000 IOPS) still couldn't fully meet performance demands.  &lt;/p&gt;

&lt;p&gt;To address this bottleneck, the company decided to switch to JuiceFS. Through JuiceFS' distributed caching, the company could aggregate more IOPS performance. Theoretically, 5 cache nodes could provide 200,000 IOPS each. With local hard drive support, they could ultimately deliver over 1 million IOPS. This approach allowed the company to flexibly combine cache resources and scale storage performance on demand.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In extreme scenarios, the company achieved 7.5 GB/s throughput and nearly 3 million IOPS. This was sufficient to handle large-scale random read/write scenarios.&lt;/strong&gt; Specifically, the company could increase throughput by adding network bandwidth and improve IOPS performance by increasing the number of cache nodes. Each cache node can provide about 100,000 to 200,000 IOPS. The company can also expand capacity by mounting additional hard drives or utilizing memory as needed. This allows the company to freely combine and adjust cache pool capabilities according to actual application requirements, ensuring system performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7xbubr3cd3dea5ghtkj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7xbubr3cd3dea5ghtkj.png" alt=" " width="800" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Based on the multiple cases discussed above, we can summarize several distinctive characteristics of quantitative research regarding storage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quantitative applications typically have small data volumes but extremely high performance requirements. JuiceFS significantly improves access efficiency through data and metadata separation and caching mechanisms. Even without caching, it can fully utilize bandwidth through &lt;a href="https://juicefs.com/en/blog/engineering/optimize-read-performance" rel="noopener noreferrer"&gt;readahead&lt;/a&gt; and data chunking upload mechanisms. This ensures high-performance data processing capabilities.
&lt;/li&gt;
&lt;li&gt;JuiceFS' data chunking and encryption mechanisms enhance cloud storage security. Data is split into chunks and encrypted before storage. Even if partial data is stolen, attackers cannot reconstruct the files without metadata. This design is particularly suitable for protecting sensitive data like quantitative factors and trading strategies.
&lt;/li&gt;
&lt;li&gt;As quantitative applications gradually migrate to cloud environments, JuiceFS provides a solution that balances performance and cost. It achieves high-performance access and unified management on top of object storage. This meets diverse computing needs in public and hybrid cloud environments and builds scalable, reliable data infrastructure for quantitative institutions. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
    </item>
    <item>
      <title>Deep Dive into the JuiceFS Garbage Collection Mechanism</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 07 Nov 2025 07:12:45 +0000</pubDate>
      <link>https://dev.to/daswu/deep-dive-into-the-juicefs-garbage-collection-mechanism-n2n</link>
      <guid>https://dev.to/daswu/deep-dive-into-the-juicefs-garbage-collection-mechanism-n2n</guid>
      <description>&lt;p&gt;When using &lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt;, you might have these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why isn't the object storage space freed up immediately after deleting files?
&lt;/li&gt;
&lt;li&gt;How can I efficiently clean up a large number of files accumulating in the trash?
&lt;/li&gt;
&lt;li&gt;Why is the deletion operation so slow or performance degraded when deleting files in batches within a short period of time?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow behind JuiceFS' &lt;a href="https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)" rel="noopener noreferrer"&gt;garbage collection&lt;/a&gt; (GC) is quite intricate. It’s hard to intuitively grasp when file states change and resources are truly released.&lt;br&gt;&lt;br&gt;
To pull back the curtain, this article will systematically break down the key processes and underlying logic of GC, explain the roots of common issues, and offer practical advice for efficient cleanup and system maintenance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background knowledge: JuiceFS file storage structure
&lt;/h2&gt;

&lt;p&gt;JuiceFS handles file deletion through an asynchronous mechanism. It responds to user delete requests instantly but postpones the actual cleanup for later, when the system is not busy. This approach smooths out workload spikes, reduces system load, and enhances system stability. Furthermore, this phased cleanup strategy offers greater flexibility for potential data recovery.  &lt;/p&gt;

&lt;p&gt;This efficient design is rooted in &lt;a href="https://juicefs.com/docs/community/architecture/" rel="noopener noreferrer"&gt;JuiceFS' underlying architecture&lt;/a&gt;. It employs a separation of metadata and data storage, and manages file data in chunks. When you delete a file, the system doesn't immediately remove the actual data. Instead, it only adds a deletion marker in the metadata layer. The real data reclamation happens later in an asynchronous process.  &lt;/p&gt;

&lt;p&gt;To fully grasp this asynchronous deletion logic, we need to understand how JuiceFS organizes data internally. The system manages file data through chunks, slices, and blocks. This structure not only dictates file performance during storage and access but also directly influences how data is located, referenced, and cleaned up during deletion.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chunks: Each file consists of one or more chunks, each with a maximum size of 64 MB. Regardless of file size, all read and write operations locate the corresponding chunk based on offset. This boosts retrieval and access efficiency.
&lt;/li&gt;
&lt;li&gt;Slices: This is the unit of writing within a chunk. Each continuous write operation generates one slice. A slice must reside entirely within a single chunk, so its size cannot exceed 64 MB. The process of writing a file typically produces multiple slices.
&lt;/li&gt;
&lt;li&gt;Blocks: To improve write efficiency to object storage, each slice is further split into multiple blocks (default maximum size is 4 MB) before being written. Multi-threaded concurrent writing is used to improve throughput.&lt;/li&gt;
&lt;/ul&gt;
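&lt;p&gt;A minimal sketch of how a file offset maps onto this layout (the sizes are the defaults named above; the helper &lt;code&gt;locate&lt;/code&gt; is illustrative, not actual JuiceFS code):&lt;/p&gt;

```python
CHUNK_SIZE = 64 << 20  # 64 MB maximum chunk size
BLOCK_SIZE = 4 << 20   # 4 MB default block size

def locate(offset: int):
    """Map a file offset to (chunk index, offset within chunk, block index)."""
    chunk_idx, chunk_off = divmod(offset, CHUNK_SIZE)
    block_idx = chunk_off // BLOCK_SIZE
    return chunk_idx, chunk_off, block_idx

# A read at the 100 MB mark lands in chunk 1, 36 MB into it, in block 9.
print(locate(100 << 20))  # (1, 37748736, 9)
```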

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrfhlvlwwok4r1eaw7we.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrfhlvlwwok4r1eaw7we.png" alt=" " width="719" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  JuiceFS GC process
&lt;/h2&gt;

&lt;p&gt;Now that we've covered the background, let's walk through the main file deletion workflow. JuiceFS' deletion mechanism can be divided into three key processes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trash management
&lt;/li&gt;
&lt;li&gt;File deletion processing
&lt;/li&gt;
&lt;li&gt;Underlying slice cleanup&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Trash management
&lt;/h3&gt;

&lt;p&gt;The file deletion journey begins with a user's delete request. When an upper-layer interface initiates a file deletion, the system doesn't remove the file immediately. Instead, it moves the file to the &lt;a href="https://juicefs.com/docs/community/security/trash/" rel="noopener noreferrer"&gt;trash&lt;/a&gt; (enabled by default). In this step, only the pointer from the file's parent directory changes; the file's other metadata and its actual content remain unchanged.&lt;br&gt;&lt;br&gt;
Once a file stays in the trash beyond its configured retention period, a background cleanup task or a manual operation converts it into a "pending deletion" state, pushing it into the subsequent cleanup pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi0t3jpeua0rbk160lxu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi0t3jpeua0rbk160lxu.png" alt=" " width="800" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To better understand and manage files in the trash, let's look at its directory structure and operation.&lt;br&gt;&lt;br&gt;
The trash resides in the &lt;code&gt;.trash&lt;/code&gt; directory under the mount point. Deleted files are saved into subdirectories named based on the deletion time (accurate to the hour), for example, &lt;code&gt;.trash/2024-01-15-14/&lt;/code&gt;. To avoid file name conflicts, files moved to the trash are automatically renamed following this pattern: &lt;code&gt;&amp;lt;ParentDirID&amp;gt;_&amp;lt;FileID&amp;gt;_&amp;lt;OriginalFileName&amp;gt;&lt;/code&gt;.&lt;br&gt;&lt;br&gt;
The system also provides a flexible recovery mechanism for files in the trash. See &lt;a href="https://juicefs.com/docs/community/security/trash/" rel="noopener noreferrer"&gt;Trash&lt;/a&gt; for details. Files in the trash aren't kept forever. JuiceFS cleans them up in two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic cleanup:&lt;/strong&gt; A background task (&lt;code&gt;cleanupTrash&lt;/code&gt;) runs hourly. It gradually cleans files based on the set retention period, marking them for deletion.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual cleanup:&lt;/strong&gt; You can use the OS &lt;code&gt;rm&lt;/code&gt; command or the &lt;code&gt;juicefs rmr&lt;/code&gt; tool to directly delete files or directories from the trash. Manual cleanup typically reclaims space faster than waiting for the background task, but the process is asynchronous. The actual deletion of objects from storage is handled by subsequent tasks, which will be covered later.&lt;/li&gt;
&lt;/ul&gt;
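&lt;p&gt;The trash naming scheme described above can be sketched as follows (an illustrative helper, not actual JuiceFS code):&lt;/p&gt;

```python
from datetime import datetime

def trash_path(parent_dir_id: int, file_id: int, name: str, deleted_at: datetime) -> str:
    """Build a trash entry path: hourly subdirectory plus collision-free file name."""
    hour_dir = deleted_at.strftime("%Y-%m-%d-%H")   # one subdirectory per hour
    entry = f"{parent_dir_id}_{file_id}_{name}"     # <ParentDirID>_<FileID>_<OriginalFileName>
    return f".trash/{hour_dir}/{entry}"

print(trash_path(42, 1001, "report.csv", datetime(2024, 1, 15, 14, 30)))
# .trash/2024-01-15-14/42_1001_report.csv
```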

&lt;p&gt;Furthermore, trash cleanup performance is constrained by the processing capacity of a single client. When dealing with massive file deletions, consider mounting JuiceFS on multiple clients and performing manual cleanup simultaneously to significantly boost the overall efficiency of large-scale deletion operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  File deletion
&lt;/h3&gt;

&lt;p&gt;In this stage, the system first checks if the file is still in use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the file hasn't been closed (for example, it's still held open by a process), it's placed in a &lt;code&gt;sustained&lt;/code&gt; state, waiting for all references to be released before re-entering the cleanup workflow.
&lt;/li&gt;
&lt;li&gt;If the file is closed, the system marks it as a deletable file (internal state &lt;code&gt;delfile&lt;/code&gt;), indicating it's ready for the deletion queue.&lt;/li&gt;
&lt;/ul&gt;
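&lt;p&gt;The state check above reduces to a simple decision (the state names follow the text; the helper itself is illustrative):&lt;/p&gt;

```python
def on_delete(open_refs: int) -> str:
    """Pick a deleted file's internal state based on remaining open references."""
    if open_refs > 0:
        return "sustained"  # still held open; wait for all references to be released
    return "delfile"        # closed; ready for the deletion queue

print(on_delete(2), on_delete(0))  # sustained delfile
```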

&lt;p&gt;Next, the system attempts to add these deletable files (&lt;code&gt;delfile&lt;/code&gt;) to the file deletion queue. This queue acts as a buffer for files awaiting cleanup. To prevent impacting system performance, the queue has a limited capacity. If the queue is full, deletion requests for regular files are temporarily skipped, letting the background cleanup task handle them later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cxd145aqcep1yr5g22j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cxd145aqcep1yr5g22j.png" alt=" " width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For files in the trash, if a user performs a manual delete operation while the deletion queue is full, that operation will block synchronously until space becomes available in the queue. This ensures manual deletions are queued within the current process and won't be skipped.&lt;br&gt;&lt;br&gt;
To guarantee all deletable files are eventually processed, JuiceFS provides a background task named &lt;code&gt;cleanupDeletedFiles&lt;/code&gt; that runs hourly. This task scans for files still marked as &lt;code&gt;delfile&lt;/code&gt; and adds them into the deletion queue in batches, ensuring the cleanup pipeline keeps moving.&lt;br&gt;&lt;br&gt;
Once a file enters the deletion queue, the system scans all its associated chunks. For each chunk, it decrements the reference count for every contained slice and then cleans up the chunk metadata. Slices whose reference count drops to zero are marked as pending cleanup, entering the slice cleanup stage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Slice cleanup and space reclamation
&lt;/h3&gt;

&lt;p&gt;In the previous stage, the reference counts for all slices associated with the deleted file were updated. Now, the system checks the current reference count for these slices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If it's still greater than 0, the slice is still being used by another file and won’t be cleaned up.
&lt;/li&gt;
&lt;li&gt;If the reference count has reached 0, it means the slice is completely obsolete, and the system adds it to the slice deletion queue for the final cleanup step.&lt;/li&gt;
&lt;/ul&gt;
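&lt;p&gt;The reference-count check can be modeled like this (an illustrative sketch, not actual JuiceFS code):&lt;/p&gt;

```python
def release_slices(refcount: dict, slice_ids: list) -> list:
    """Decrement each slice's reference count; return the slices that dropped
    to zero and should enter the slice deletion queue."""
    pending = []
    for sid in slice_ids:
        refcount[sid] -= 1
        if refcount[sid] == 0:
            pending.append(sid)  # no file references this slice any more
    return pending

refs = {"s1": 1, "s2": 2}                  # s2 is still shared with another file
print(release_slices(refs, ["s1", "s2"]))  # ['s1']
```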

&lt;p&gt;A scheduled background task, &lt;code&gt;cleanupSlices&lt;/code&gt;, periodically scans this queue and performs batch deletion. Users can also trigger a manual GC to actively remove these invalid slices. Slices entering this workflow are precisely located down to their physical blocks in the &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt;. The actual delete operations are performed, finally freeing up the storage space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr74uxg21v6iufi915loz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr74uxg21v6iufi915loz.png" alt=" " width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's important to note that the slice deletion queue itself is also subject to concurrency and capacity limits. You can use the &lt;code&gt;--max-deletes&lt;/code&gt; parameter to adjust the number of concurrent threads. This controls how many delete operations happen simultaneously and thus balances resource usage and cleanup speed.&lt;/p&gt;
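&lt;p&gt;Conceptually, &lt;code&gt;--max-deletes&lt;/code&gt; bounds how many object deletions run in parallel. A minimal model of that throttling (the helper and callback names are hypothetical):&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def delete_objects(keys, delete_one, max_deletes=2):
    """Run deletions with at most `max_deletes` in flight, trading cleanup
    speed against resource usage."""
    with ThreadPoolExecutor(max_workers=max_deletes) as pool:
        list(pool.map(delete_one, keys))

deleted = []
delete_objects(["chunk-a", "chunk-b", "chunk-c"], deleted.append, max_deletes=2)
print(sorted(deleted))  # ['chunk-a', 'chunk-b', 'chunk-c']
```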

&lt;h2&gt;
  
  
  Compaction
&lt;/h2&gt;

&lt;p&gt;The previous sections covered the complete file deletion and reclamation process. However, JuiceFS deals with another type of "hidden garbage" – redundant or useless file fragments. Because files are stored in fixed-size chunks, and each chunk contains multiple slices, frequently overwriting or performing random writes to the same file can leave the slices within certain chunks scattered and non-contiguous. This is known as fragmentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4abbxk27aidaakdw2x6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4abbxk27aidaakdw2x6.png" alt=" " width="536" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While fragmentation doesn't prevent files from being read correctly, it negatively affects performance and resource usage in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read amplification:&lt;/strong&gt; Reading a single chunk may require assembling data from many dispersed slices, which increases I/O operation complexity.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Increased resource usage:&lt;/strong&gt; More fragments mean more metadata and potentially more object storage space is consumed. This strains the system and reduces overall performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To combat fragmentation, JuiceFS provides a &lt;a href="https://juicefs.com/docs/cloud/guide/background-job/#compaction" rel="noopener noreferrer"&gt;compaction&lt;/a&gt; feature. It identifies and merges redundant or scattered slices within chunks, reconstructing them into larger, more contiguous data blocks. This significantly reduces the number of fragments and improves storage efficiency. JuiceFS supports two triggering mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Automatic triggering:&lt;/strong&gt; During file read/write operations, the system monitors the number of slices in each chunk. When the slice count for a chunk reaches a predefined threshold, the system automatically starts an asynchronous compaction task. This process involves: &lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Applying for a new slice ID at the metadata layer to build the new, contiguous data.  
2. Reading the data from the old, scattered slices from object storage and writing it into the new, contiguous block. 
3. Decrementing the reference count for the old slices. If the trash feature is enabled, these old slices are marked for delayed processing, handed over to background tasks for reference count updates and eventual cleanup.  
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Manual triggering:&lt;/strong&gt; Users can use the &lt;code&gt;juicefs compact&lt;/code&gt; command to proactively scan specified paths and perform concurrent defragmentation. The &lt;code&gt;--threads&lt;/code&gt; parameter controls the number of concurrent threads, allowing you to adjust based on system load to balance performance and resource usage. Check the &lt;a href="https://juicefs.com/docs/community/command_reference/" rel="noopener noreferrer"&gt;command document&lt;/a&gt; for detailed usage.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Compact one or multiple paths; concurrency can be specified. &lt;/span&gt;
juicefs compact /mnt/jfs/pathA /mnt/jfs/pathB &lt;span class="nt"&gt;--threads&lt;/span&gt; 16 
&lt;span class="c"&gt;# Shorthand &lt;/span&gt;
juicefs compact /mnt/jfs/path &lt;span class="nt"&gt;-p&lt;/span&gt; 16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
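&lt;p&gt;At its core, compaction replays a chunk's slices in write order into one contiguous buffer. A simplified model (byte strings stand in for object storage blocks; not actual JuiceFS code):&lt;/p&gt;

```python
def compact_chunk(slices):
    """Merge (offset, data) slices, applied in write order, into one
    contiguous buffer -- later writes overwrite earlier ones."""
    size = max(off + len(data) for off, data in slices)
    buf = bytearray(size)
    for off, data in slices:
        buf[off:off + len(data)] = data
    return bytes(buf)

# Three scattered writes collapse into a single contiguous slice.
print(compact_chunk([(0, b"aaaa"), (2, b"bb"), (4, b"cc")]))  # b'aabbcc'
```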



&lt;p&gt;To help users monitor the status of compaction handling, JuiceFS provides metrics viewable via the &lt;code&gt;juicefs status&lt;/code&gt; command. For details, see the next section.&lt;/p&gt;

&lt;h2&gt;
  
  
  GC tools principles and usage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The GC tool
&lt;/h3&gt;

&lt;p&gt;The GC tool in JuiceFS is an auxiliary utility for manual cleanup, supporting modes like check-only (dry-run) and scan-and-clean. Its core principle involves scanning and comparing metadata with the actual contents of the object storage. It identifies and processes objects that are no longer referenced or managed by the metadata, ensuring data consistency and reclaiming underlying space occupied by invalid data.&lt;br&gt;&lt;br&gt;
In certain corner cases or due to operational errors, data segments might exist in the object storage that are detached from JuiceFS' metadata management, creating what's known as "object storage leaks." While rare, the GC tool can scan for and remove these. Beyond that, GC can also clean up invalid records within the metadata, such as files or slice information that wasn't fully removed during the standard cleanup process, and it can accelerate various stages of the GC process, such as manual compaction or manual cleanup of &lt;code&gt;delfile&lt;/code&gt; files.&lt;br&gt;&lt;br&gt;
When running, the GC command displays a progress bar. It allows you to monitor the task's status in real time. The status items shown in the progress bar correspond directly to the concepts illustrated in the earlier figures, helping you better understand the GC workflow.&lt;br&gt;&lt;br&gt;
Some users have expressed confusion about the meaning of these status items, especially in contexts like resource billing. Correlating the figures with these status messages can greatly enhance comprehension of the entire GC mechanism's operation.&lt;br&gt;&lt;br&gt;
GC metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pending deleted files/data: Number of files / amount of data awaiting deletion.
&lt;/li&gt;
&lt;li&gt;Cleaned pending files/data: Number of files / amount of data deleted by this GC run.
&lt;/li&gt;
&lt;li&gt;Listed slices: Total number of slices in the file system.
&lt;/li&gt;
&lt;li&gt;Trash slices: Slices within the trash.
&lt;/li&gt;
&lt;li&gt;Cleaned trash slices/data: Number of trash slices / amount of data cleaned by this GC run.
&lt;/li&gt;
&lt;li&gt;Scanned objects: Total objects in object storage.
&lt;/li&gt;
&lt;li&gt;Valid objects/data: Valid objects / amount of data in object storage.
&lt;/li&gt;
&lt;li&gt;Compacted objects/data: Objects / amount of data after being processed by compaction.
&lt;/li&gt;
&lt;li&gt;Leaked objects/data: Leaked objects / amount of data.
&lt;/li&gt;
&lt;li&gt;Skipped objects/data: Objects / amount of data skipped during this GC run.&lt;/li&gt;
&lt;/ul&gt;
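&lt;p&gt;The leak detection at the heart of the GC tool is essentially a set difference between what the bucket contains and what the metadata references (the object keys below are illustrative):&lt;/p&gt;

```python
def find_leaked(listed: set, referenced: set) -> set:
    """Objects present in object storage but unknown to metadata are leaks."""
    return listed - referenced

bucket = {"chunks/0/0/1", "chunks/0/0/2", "chunks/0/0/3"}  # listed from storage
meta = {"chunks/0/0/1", "chunks/0/0/3"}                    # referenced by metadata
print(sorted(find_leaked(bucket, meta)))  # ['chunks/0/0/2']
```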

&lt;h3&gt;
  
  
  Status metrics
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;juicefs status&lt;/code&gt; displays various &lt;a href="https://juicefs.com/docs/community/administration/status_check_and_maintenance/#status" rel="noopener noreferrer"&gt;status information&lt;/a&gt;. Metrics relevant to GC are listed below. These metrics correspond to different stages of the cleanup process and serve as crucial evidence for judging the system's cleanup progress and task status. If you notice certain fragments continuously increasing, or the background processing can't keep up, consider using manual cleanup methods to accelerate the process to prevent system performance degradation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trash files: Files in the trash.
&lt;/li&gt;
&lt;li&gt;Pending deleted files: Files marked for deletion.
&lt;/li&gt;
&lt;li&gt;To be sliced: Fragmentation tasks pending compaction.
&lt;/li&gt;
&lt;li&gt;Trash slices: Hidden slices in the trash (resulting from compaction), potentially usable for recovery.
&lt;/li&gt;
&lt;li&gt;Pending deleted slices: Slices with zero reference count, waiting for cleanup.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Operational suggestions
&lt;/h3&gt;

&lt;p&gt;Finally, here are some fundamental yet practical operational suggestions to help manage JuiceFS system resources more efficiently and reduce operational risks in daily use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set an appropriate trash retention period based on your specific application needs, balancing data recoverability against storage cost. For systems with frequent file turnover or high storage cost sensitivity, consider shortening the retention period to minimize space waste.
&lt;/li&gt;
&lt;li&gt;It's recommended that operators periodically use tools like &lt;code&gt;juicefs status&lt;/code&gt; to monitor key cleanup metrics, such as pending deleted files, delayed slice count, and trash space usage. This helps identify potential cleanup lag or resource buildup early.
&lt;/li&gt;
&lt;li&gt;If automatic background tasks are disabled for your system, schedule regular execution of GC or related cleanup commands to remove invalid residues from metadata and object storage. This prevents long-term performance issues and resource waste.
&lt;/li&gt;
&lt;li&gt;Schedule all cleanup tasks to run during off-peak application hours whenever possible to minimize interference with normal read/write operations and ensure overall system stability. By adopting this approach, users can establish a more predictable rhythm for managing the deletion and reclamation lifecycle, achieving ongoing optimization of resource utilization efficiency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions/" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the community on &lt;a href="https://go.juicefs.com/slack/" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
    </item>
    <item>
      <title>Building AI Inference with JuiceFS: Supporting Multi-Modal Complex I/O, Cross-Cloud, and Multi-Tenancy</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 24 Oct 2025 07:40:50 +0000</pubDate>
      <link>https://dev.to/daswu/building-ai-inference-with-juicefs-supporting-multi-modal-complex-io-cross-cloud-and-44mk</link>
      <guid>https://dev.to/daswu/building-ai-inference-with-juicefs-supporting-multi-modal-complex-io-cross-cloud-and-44mk</guid>
      <description>&lt;p&gt;The demand for inference services is rapidly growing in the market and has become a key application of widespread interest across various industries. With the continuous enhancement of model inference capabilities through techniques like &lt;a href="https://en.wikipedia.org/wiki/Chain_of_thought" rel="noopener noreferrer"&gt;chain-of-thought&lt;/a&gt; and slow thinking, their application scope keeps expanding, holding significant importance for enterprises deploying AI infrastructure.&lt;/p&gt;

&lt;p&gt;Inference services impose higher requirements on storage systems. &lt;strong&gt;There is a need not only for high-performance, low-latency data access capabilities but also for features like elastic scaling, resource isolation, and multi-cloud deployment&lt;/strong&gt;. Choosing the right storage solution has become a crucial issue enterprises must address when building AI infrastructure. &lt;a href="https://s.juicefs.com/docs/community/introduction" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; has been widely adopted in various inference service scenarios, serving different types of customers including foundational model platforms, multi-modal applications, and enterprise-grade AI services. Based on in-depth analysis of these practical cases, we’ve summarized a series of representative storage requirements and challenges.&lt;/p&gt;

&lt;p&gt;This article is divided into two sections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first section focuses on the storage access patterns in inference services, combining JuiceFS application practices to outline key requirements and common problems in inference scenarios.&lt;/li&gt;
&lt;li&gt;The second section compares the advantages and limitations of several mainstream storage solutions, helping users make more informed selection decisions when building inference systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Storage requirements of inference services
&lt;/h2&gt;

&lt;p&gt;JuiceFS is already widely used in inference scenarios. Through collaboration with customers, we’ve identified several typical storage requirements for &lt;a href="https://www.ibm.com/think/topics/ai-inference" rel="noopener noreferrer"&gt;inference&lt;/a&gt; workloads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More complex multi-modal I/O patterns&lt;/strong&gt;: Inference is rapidly evolving towards multi-modal directions, involving extensive small file management, mmap-style random reads, and other access patterns. These scenarios are extremely sensitive to I/O throughput and latency, directly impacting model response speed and user experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud-native deployment needs&lt;/strong&gt;: Inference traffic typically has distinct peaks and troughs, requiring the underlying storage to quickly respond to sudden requests and coordinate with cloud-native scheduling systems to achieve scaling within minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear trend towards &lt;a href="https://en.wikipedia.org/wiki/Multicloud" rel="noopener noreferrer"&gt;multi-cloud&lt;/a&gt;/multi-region architectures&lt;/strong&gt;: To enhance service availability and disaster recovery capabilities, more customers deploy inference services across multiple clouds or regions. The storage system needs to possess strong cross-region consistency and cost-control capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Emphasis on both resource sharing and isolation&lt;/strong&gt;: Inference tasks prioritize resource reuse, especially in clusters shared by multiple tenants or application lines. Balancing effective isolation and data security with high resource utilization becomes fundamental for stable system operation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Model loading performance challenges
&lt;/h3&gt;

&lt;p&gt;We can analyze file access requirements through the various stages of an inference service. The main stages of an inference service include:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17kyvkd54o74j2bgra2p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17kyvkd54o74j2bgra2p.png" alt=" " width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As mentioned, file access requirements are different at each stage. I/O bottlenecks are primarily concentrated during the &lt;a href="https://en.wikipedia.org/wiki/Cold_start_(computing)" rel="noopener noreferrer"&gt;cold start&lt;/a&gt; phase (model loading) and runtime model switching, especially concerning how to efficiently read model files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution 1: Packaging model files into container images&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In small-scale inference scenarios, many users package model files, code dependencies, and extension packs into container images and start them via Kubernetes. While this method is simple, it exposes several issues as inference scale increases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Oversized image files&lt;/strong&gt;: As model files grow larger, container images also become massive, leading to timeouts during image pulls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow extraction&lt;/strong&gt;: Container images use layered storage; unpacking each layer is a serial operation. With large models, this unpacking process becomes slow and inefficient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource waste&lt;/strong&gt;: If multiple inference services need the same model, sharing the model efficiently is difficult. This leads to frequent download and unpack operations, wasting significant resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequent model switching&lt;/strong&gt;: This approach is even less efficient in scenarios requiring frequent model changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Solution 2: Storing model files in object storage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each inference instance fetches model files from &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; upon startup. While this avoids image pull timeouts, it still faces challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth competition&lt;/strong&gt;: In large-scale clusters, when multiple inference instances pull models simultaneously, bandwidth contention can occur. This prolongs startup times and affects user experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inefficiency&lt;/strong&gt;: Many model file formats (like Safetensors) use mmap for lazy loading. However, object storage doesn't support mmap. This forces the system to download the file locally before reading. This wastes local storage space and reduces read efficiency.&lt;/li&gt;
&lt;/ul&gt;
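&lt;p&gt;The mmap point above can be illustrated with a small sketch: a memory-mapped file lets a reader touch only the byte range it needs, with pages faulted in on demand. This is generic Python, not JuiceFS- or Safetensors-specific code; the file layout is invented for illustration.&lt;/p&gt;

```python
import mmap
import os
import tempfile

# Create a stand-in "model file": 1 MiB of zeros with a small
# tensor payload at a known offset, loosely mimicking how a
# Safetensors file stores tensors at fixed offsets.
payload = b"tensor-bytes"
offset = 512 * 1024
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(1024 * 1024)
    f.seek(offset)
    f.write(payload)
    path = f.name

# mmap lets us read just the slice we need; the kernel faults in
# only the pages covering that range instead of copying the whole
# file into memory first.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        tensor = mm[offset:offset + len(payload)]

os.unlink(path)
print(tensor == payload)  # True
```

&lt;p&gt;On a POSIX file system such as a JuiceFS mount, this slice read only pulls in the pages covering the requested range; over plain object storage there is no mapping to fault, so the whole object must be downloaded first.&lt;/p&gt;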

&lt;p&gt;&lt;strong&gt;Solution 3: Using a high-performance file system like JuiceFS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Compared to object storage and container images, storing model files in a high-performance file system is a more effective solution. JuiceFS, as an implementation of this solution, offers the following advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On-demand configuration&lt;/strong&gt;: Unlike traditional high-performance file systems, JuiceFS allows performance and capacity to be configured flexibly and independently, avoiding the tight coupling between the two.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-cloud data distribution&lt;/strong&gt;: Traditional cloud file systems often lack support for efficient data distribution across cloud environments. JuiceFS can distribute data efficiently across multiple clouds, enhancing flexibility. We’ll explain later how JuiceFS is used in multi-cloud architectures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The figure below shows the improvement in the model file loading process after introducing JuiceFS. When building the container image, we separate the model files from the image, storing them in JuiceFS, while the container image only contains the application code and the basic runtime environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtwq55wy7eljcvikdcaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtwq55wy7eljcvikdcaj.png" alt=" " width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach brings significant benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model files are decoupled from the image cold start process. This avoids the need to unpack large model layers locally.&lt;/li&gt;
&lt;li&gt;Compared to the object storage solution, JuiceFS does not require downloading the complete model file to the local node during cold start, nor loading the entire model file (such as a Safetensors file) into memory. Inference instances can complete cold starts quickly. This on-demand loading mode is particularly critical in elastic scaling scenarios, ensuring new instances start rapidly and begin serving requests promptly. Although the first inference may experience slight latency due to on-demand model loading, subsequent inference requests are significantly faster.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beyond these optimizations, JuiceFS further enhances inference service performance through its powerful caching capabilities. In addition to the local cache functionality provided in the Community Edition, &lt;a href="https://s.juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;JuiceFS Enterprise Edition&lt;/a&gt; offers distributed cache functionality. This allows the construction of large-scale cache pools that centrally store frequently used model files, significantly boosting read performance.&lt;/p&gt;

&lt;p&gt;Especially in scenarios involving large-scale instance startups or scaling, such as applications like Stable Diffusion that require frequent model switching, the distributed cache cluster can substantially reduce model loading times. This markedly improves the front-end user experience. The figure below shows the &lt;a href="https://juicefs.com/docs/cloud/guide/distributed-cache/" rel="noopener noreferrer"&gt;JuiceFS distributed cache&lt;/a&gt; architecture. After deploying the inference service, we mount JuiceFS to the inference nodes. When a node reads a model file, it first attempts to retrieve the data from the cache cluster.&lt;/p&gt;

&lt;p&gt;Since the cache cluster consists of high-speed media and communication occurs over high-speed internal networks, data retrieval performance is excellent. On a cache miss, the responsible cache node fetches the data from object storage, returns it to the requesting node, and fills the cache. We can also use the &lt;code&gt;warmup&lt;/code&gt; command to pre-fill the cache, loading the required model files into the cache layer before nodes start. When inference instances start, they all read the model files directly from the cache cluster. This eliminates the need for each instance to pull files individually from object storage and thus avoids bandwidth contention on the object storage during high-concurrency pulls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno31b2llrq67qtccoggt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno31b2llrq67qtccoggt.png" alt=" " width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;
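&lt;p&gt;The read path just described is a classic cache-aside pattern. The following sketch illustrates the flow with plain Python dictionaries standing in for the cache cluster and object storage; it is an illustration of the pattern, not actual JuiceFS code, and the &lt;code&gt;warmup&lt;/code&gt; helper only mimics what the real command does:&lt;/p&gt;

```python
# Toy stand-ins: dicts in place of the cache cluster and object storage.
object_storage = {"models/llm.safetensors": b"model-weights"}
cache_cluster = {}

def read_block(key):
    """Cache-aside read: try the cache first; on a miss, fetch from
    object storage, fill the cache, then return the data."""
    if key in cache_cluster:
        return cache_cluster[key], "hit"
    data = object_storage[key]
    cache_cluster[key] = data  # fill on miss
    return data, "miss"

def warmup(keys):
    """Pre-fill the cache (analogous to the `warmup` command) so that
    later reads from many instances all hit the cache."""
    for key in keys:
        read_block(key)

warmup(["models/llm.safetensors"])
data, status = read_block("models/llm.safetensors")
print(status)  # "hit": the warmed key is served from the cache
```

&lt;p&gt;In the real system the miss branch is the expensive path that crosses into object storage, which is exactly what warming the cache before a large-scale startup avoids.&lt;/p&gt;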

&lt;h3&gt;
  
  
  Multi-region model file synchronization and consistency management
&lt;/h3&gt;

&lt;p&gt;An increasing number of enterprises are adopting multi-cloud architectures to deploy inference services, flexibly scheduling resources across different cloud providers to optimize cost structures. As shown in the figure below, in a typical inference service deployment case, a customer built infrastructure spanning multiple cloud service providers and data centers to support high-concurrency, low-latency inference requests. A multi-cloud architecture not only effectively mitigates the risk of application interruption from &lt;a href="https://en.wikipedia.org/wiki/Single_point_of_failure" rel="noopener noreferrer"&gt;single points of failure&lt;/a&gt; but also supports dynamically selecting the most suitable compute nodes based on factors like regional network conditions, resource pricing, and computing power, thus achieving optimal cost control while ensuring performance.&lt;/p&gt;

&lt;p&gt;To ensure consistency of inference models and configuration files across different regions, customers typically use object storage in a primary site as a unified model repository, centrally managing model versions and configurations. During the inference task scheduling process, the system can select suitable computing resources based on availability, while model files need to be deployed as close as possible to the computing resources or within high-throughput network environments to improve cold start efficiency and service response speed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0o3148juxxltzs99by4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0o3148juxxltzs99by4.png" alt=" " width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In traditional practice, model files often required manual migration to the scheduled computing cluster. As models iterate frequently, this leads to cumbersome operations and potential data inconsistency issues. Furthermore, inference clusters are typically large-scale. During cold start, concurrent model pulling by nodes can cause bandwidth contention. This creates performance bottlenecks, especially in multi-cloud architectures where model distribution and loading become a significant challenge.&lt;/p&gt;

&lt;p&gt;JuiceFS addresses this by using a &lt;a href="https://juicefs.com/docs/cloud/guide/mirror/" rel="noopener noreferrer"&gt;mirror file system&lt;/a&gt; to achieve real-time metadata synchronization from the primary site to the mirror sites. Clients local to each site see a unified file system view. For data files, it offers two schemes: full synchronization, or caching and loading on demand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous replication&lt;/strong&gt;: Object storage data is fully replicated to multiple sites, and each site prioritizes reading from its local object storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-demand caching&lt;/strong&gt;: Only a distributed cache layer is deployed at the mirror sites, without persistent data storage. Required data is warmed up into the cache on demand, improving local read performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjk9macv8ten85ejnd5p0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjk9macv8ten85ejnd5p0.png" alt=" " width="800" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In large-scale inference services, considering cost-effectiveness, users often choose the second scheme: deploying a distributed cache locally within each site. A pipeline task is added before node startup and scaling that warms the required model files, in a single pass, into the cache cluster of the site where the computing resources are allocated. This stage essentially performs one high-concurrency pull of the model files from object storage.&lt;/p&gt;

&lt;p&gt;The benefit is that when a large number of inference services perform cold start simultaneously, they only need to concurrently read the model files from the local domain's cache cluster, without repeatedly pulling from the object storage over the dedicated line. This solves the problem of dedicated line bandwidth bottlenecks caused by high-concurrency repeated pulls from object storage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6kmdaf3c02tnnc63u3f4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6kmdaf3c02tnnc63u3f4.png" alt=" " width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Model version iterations can also be handled through warm-up scripts, pre-loading new versions into the cache while removing expired versions. The entire data distribution and cache warming process is fully handled by the JuiceFS mirror file system. This is transparent to the user, avoiding a significant amount of manual data copying.&lt;/p&gt;

&lt;p&gt;The benefits brought by the &lt;a href="https://juicefs.com/docs/cloud/guide/mirror/" rel="noopener noreferrer"&gt;JuiceFS mirror file system&lt;/a&gt; include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost reduction&lt;/strong&gt;: Each mirror site only needs to deploy an appropriately sized cache cluster, avoiding the need for excessive capacity configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency improvement&lt;/strong&gt;: The distributed cache uses local high-speed media and the region's internal network, providing excellent throughput performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ease of use&lt;/strong&gt;: No manual intervention in data copying. The unified data view automatically handles data distribution for the user. Even on a cache miss, data is automatically fetched from the object storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability and stability&lt;/strong&gt;: Resolves bandwidth competition issues during concurrent model file pulls by multiple nodes, enhancing system performance and user experience.&lt;/li&gt;
&lt;/ul&gt;
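&lt;p&gt;The version-rotation warm-up mentioned earlier can be sketched as a small script. The cache and object storage are again plain dictionaries; &lt;code&gt;rotate_model_version&lt;/code&gt; is a hypothetical helper for illustration, not a JuiceFS API:&lt;/p&gt;

```python
def rotate_model_version(cache, storage, old_prefix, new_prefix):
    """Warm the new model version into the cache and drop the
    expired version, as an illustrative pre-scaling pipeline step."""
    # Warm up: load every object under the new version's prefix.
    for key, data in storage.items():
        if key.startswith(new_prefix):
            cache[key] = data
    # Evict: remove cached objects belonging to the old version.
    for key in [k for k in cache if k.startswith(old_prefix)]:
        del cache[key]

storage = {
    "models/v1/weights": b"old",
    "models/v2/weights": b"new",
}
cache = {"models/v1/weights": b"old"}  # v1 was warmed earlier
rotate_model_version(cache, storage, "models/v1/", "models/v2/")
print(sorted(cache))  # ['models/v2/weights']
```

&lt;p&gt;Running such a step before scaling out means new instances only ever see the current version in the cache, and the dedicated line is crossed once per version rather than once per instance.&lt;/p&gt;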

&lt;h3&gt;
  
  
  Cloud-native support and cluster management
&lt;/h3&gt;

&lt;p&gt;Inference services typically have bursty, intermittent load characteristics, so they demand high resource elasticity. As inference services expand, both the number of compute clusters and the number of nodes per cluster increase, often requiring scheduling across clusters and regions. This demands a storage system that can easily be deployed in multi-cluster environments.&lt;/p&gt;

&lt;p&gt;Furthermore, in elastic environments, the file system needs to ensure transparency to the application during resource expansion, upgrades, or failure recovery, without affecting the normal operation of applications. This means the file system must possess capabilities like automatic failure recovery, resource isolation, and seamless upgrades to ensure application continuity and stability.&lt;/p&gt;

&lt;p&gt;JuiceFS provides a range of capabilities to support inference services in cloud-native environments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JuiceFS offers a standard Kubernetes &lt;a href="https://juicefs.com/docs/csi/introduction/" rel="noopener noreferrer"&gt;CSI Driver&lt;/a&gt;. Users can conveniently access the JuiceFS file system via PVCs in Kubernetes environments, with support for both static and dynamic provisioning.
One of its advantages is multi-environment compatibility: to match the resource elasticity needs of inference services, JuiceFS offers multiple operational modes suitable for both standard and serverless Kubernetes environments. JuiceFS is simple to deploy regardless of the elastic workload type an inference service uses.&lt;/li&gt;
&lt;li&gt;In terms of resource optimization, JuiceFS supports multiple applications sharing the same PVC and mounted volume, and allows fine-grained definition of resources. This optimizes resource utilization while isolating tenants, so that tenants using different PVs do not interfere with each other.&lt;/li&gt;
&lt;li&gt;For stability optimization, JuiceFS supports automatic recovery of mount points and smooth upgrades of &lt;a href="https://juicefs.com/en/blog/usage-tips/mount-pod-smooth-upgrade" rel="noopener noreferrer"&gt;Mount Pods&lt;/a&gt;. When a Mount Pod fails and needs restarting, or when an upgrade of the CSI Driver or Mount Pod image is required, the application Pod does not need to restart and can continue running normally.&lt;/li&gt;
&lt;li&gt;Regarding automated management, JuiceFS provides various Operators, including features for cache group management, warm-up, and data synchronization. Users can conveniently manage these resources via CRDs, enabling automated operations.&lt;/li&gt;
&lt;li&gt;For visualized monitoring, JuiceFS offers a complete visual management platform, supporting monitoring for both Mount Pods and application Pods. Users can also perform operations like smooth upgrades on the platform, all manageable through a unified interface.&lt;/li&gt;
&lt;li&gt;Many users integrate the Karmada multi-cluster management framework with the JuiceFS CSI Driver to achieve unified management of resource scheduling and storage configuration across clusters. Thanks to the simplicity of the JuiceFS architecture, the only background service that needs independent deployment is the metadata service; all other features are implemented within the client, which is the only component deployed on compute nodes. In complex multi-Kubernetes-cluster scenarios, users can use a multi-cluster control plane like Karmada to distribute JuiceFS-related CRDs to different clusters according to policies. This enables simple storage volume deployment and efficient operations.&lt;/li&gt;
&lt;/ul&gt;
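&lt;p&gt;As a concrete illustration of the CSI Driver usage above, static provisioning typically pairs a PersistentVolume that references the &lt;code&gt;csi.juicefs.com&lt;/code&gt; driver with a PVC that binds to it. The names and secret below are placeholders; this is a sketch based on common JuiceFS CSI Driver conventions and should be adapted to your deployment:&lt;/p&gt;

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: juicefs-pv            # placeholder name
spec:
  capacity:
    storage: 10Gi             # required by Kubernetes; not enforced by JuiceFS
  accessModes:
    - ReadWriteMany           # one volume shared by many inference pods
  csi:
    driver: csi.juicefs.com
    volumeHandle: juicefs-pv  # must be unique per volume
    nodePublishSecretRef:
      name: juicefs-secret    # placeholder secret holding metadata URL/credentials
      namespace: default
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: juicefs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""        # bind to the static PV above
  volumeName: juicefs-pv
  resources:
    requests:
      storage: 10Gi
```

&lt;p&gt;For dynamic provisioning, a StorageClass referencing the same driver replaces the hand-written PV, and each PVC then receives its own subdirectory on the file system.&lt;/p&gt;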

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys8i9qmz6g8w51q8lvew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys8i9qmz6g8w51q8lvew.png" alt=" " width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2For5suddcotzqjz1gm32w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2For5suddcotzqjz1gm32w.png" alt=" " width="800" height="505"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqja6h3k3scozwchyjqp7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqja6h3k3scozwchyjqp7.png" alt=" " width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-tenancy and data isolation
&lt;/h3&gt;

&lt;p&gt;In AI inference service scenarios, the demand for multi-tenant data isolation is particularly pressing. On one hand, for AI foundational platforms and AI applications serving external users, requirements for data security and isolation are high. On the other hand, large enterprises also need data isolation and &lt;a href="https://en.wikipedia.org/wiki/Access_control" rel="noopener noreferrer"&gt;access control&lt;/a&gt; between different internal departments and project teams. Overall, requirements in multi-tenant environments mainly include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data isolation &amp;amp; access control&lt;/strong&gt;: Data needs to be isolated between different tenants, with user-specific access permission management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://juicefs.com/docs/community/guide/quota/" rel="noopener noreferrer"&gt;Storage quota management&lt;/a&gt;&lt;/strong&gt;: Storage capacity needs to be allocated and managed for different tenants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource isolation and performance management&lt;/strong&gt;: Different tenants may require independent resource isolation, with managed and optimized performance (QoS).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To address these needs, JuiceFS provides three data isolation schemes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheme 1: Isolation via independent file systems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This method allocates an independent JuiceFS file system to each tenant for strong data isolation. If further resource and performance isolation is required, the metadata services and the associated object storage buckets can also be completely separated. This scheme generally suits tenant environments in AI foundational platforms serving business customers (toB): such tenants typically run at large scale and have high service-quality demands. The scheme fully ensures that a tenant's data access performance is unaffected by load from other tenants, and because data is physically isolated, it provides the highest level of security assurance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs35nvopla7ogitasc9k8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs35nvopla7ogitasc9k8.png" alt=" " width="800" height="722"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheme 2: Subdirectory isolation within the same file system&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This scheme allocates different subdirectories on the same file system to different tenants. Subdirectories are invisible to each other. Access control for mounting subdirectories can be enforced using JuiceFS' tokens, strictly ensuring data isolation between subdirectories. This scheme is more commonly used in tenant environments like AI foundational platforms serving consumers (toC), or for data isolation needs between different project teams within an enterprise.&lt;/p&gt;

&lt;p&gt;In Kubernetes environments, the JuiceFS CSI Driver by default allocates a different subdirectory on the file system for each dynamically provisioned PVC, and different PVCs do not share a JuiceFS client by default. This achieves data isolation between applications while allowing separate resource control and parameter tuning for each application's client. In non-Kubernetes environments, data security can likewise be ensured by using tokens to control client mounting permissions for subdirectories. In this scheme, the chunk data in object storage is not isolated, but JuiceFS chunks cannot be read directly as original file content from object storage, so there is no risk of data content leakage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69j2p2xuzf8u4xfbh06s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69j2p2xuzf8u4xfbh06s.png" alt=" " width="800" height="1265"&gt;&lt;/a&gt;&lt;/p&gt;
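&lt;p&gt;Conceptually, the token-based subdirectory isolation in this scheme reduces to a prefix check at mount time. The sketch below illustrates the idea with a hypothetical token table and &lt;code&gt;can_mount&lt;/code&gt; helper; it is not the actual JuiceFS token implementation:&lt;/p&gt;

```python
import posixpath

# Hypothetical token table: each tenant's token is only allowed to
# mount paths under its own subdirectory.
TOKEN_SCOPES = {
    "token-tenant-a": "/tenants/a",
    "token-tenant-b": "/tenants/b",
}

def can_mount(token, subdir):
    """Allow a mount only if the requested subdirectory falls within
    the subtree the token is scoped to."""
    scope = TOKEN_SCOPES.get(token)
    if scope is None:
        return False
    # Normalize to defeat tricks like "/tenants/a/../b".
    normalized = posixpath.normpath(subdir)
    return normalized == scope or normalized.startswith(scope + "/")

print(can_mount("token-tenant-a", "/tenants/a/models"))  # True
print(can_mount("token-tenant-a", "/tenants/b/models"))  # False
print(can_mount("token-tenant-a", "/tenants/a/../b"))    # False
```

&lt;p&gt;Because each tenant's client can only ever mount its own subtree, the other tenants' directories are simply invisible to it.&lt;/p&gt;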

&lt;p&gt;&lt;strong&gt;Scheme 3: Fine-grained access control based on POSIX ACL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Based on &lt;a href="https://www.redhat.com/en/blog/linux-access-control-lists" rel="noopener noreferrer"&gt;Linux ACL&lt;/a&gt;, JuiceFS Enterprise Edition extends group permission control capabilities. Each file can have permissions set separately for different groups. Combined with the UID/GID mapping feature, consistent ACL permission management can be achieved even across different nodes, as long as the usernames and group names are consistent. This scheme is suitable for small-scale users requiring fine-grained access control over individual directories or files. It has high configuration complexity and limited use cases.&lt;/p&gt;

&lt;p&gt;Furthermore, JuiceFS Enterprise Edition provides two key features supporting multi-tenant management:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client access tokens: Used to authorize client access to the file system. It can control access IP ranges, read/write permissions, subdirectory mount permissions, and permissions for clients to perform background tasks. It can also limit object storage access traffic, implementing quota or rate limiting management.&lt;/li&gt;
&lt;li&gt;Quota management: Allows allocation of storage capacity and file count quotas for each subdirectory. Combined with the subdirectory data isolation scheme, this enables precise control over storage resource usage for each tenant.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparison of different storage solutions for inference services
&lt;/h2&gt;

&lt;p&gt;In practical applications for inference services, different types of storage solutions have significant differences in performance, stability, scalability, and cost. The following section analyzes and compares several typical categories to help enterprises evaluate their suitability more comprehensively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Object storage&lt;/strong&gt; is a low-cost solution suitable for scenarios involving sequential reads of large files. However, its performance is poor when handling small files, model files (like Safetensors format), and &lt;a href="https://www.tiledb.com/multimodal-data" rel="noopener noreferrer"&gt;multi-modal data&lt;/a&gt; frequently used in inference tasks. When multiple inference nodes read data in parallel, the bandwidth of object storage can easily be exhausted. This leads to performance bottlenecks. While it supports eventual consistency and nearly unlimited storage capacity, it lacks optimization for high concurrency and small file access and does not support automated data distribution across heterogeneous multi-cloud environments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-performance file systems&lt;/strong&gt; offer higher performance, especially suitable for scenarios with high demands on computing resources and performance. They fully support &lt;a href="https://en.wikipedia.org/wiki/POSIX" rel="noopener noreferrer"&gt;POSIX&lt;/a&gt;-compliant frameworks and provide strong consistency. However, they often tightly couple performance and capacity, requiring large storage capacities to achieve optimal performance. While generally stable, because the client runs in kernel space, a single client failure can potentially bring down the entire node. Operational costs are also high.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pure cache systems + object storage&lt;/strong&gt; is a compromise: a &lt;a href="https://en.wikipedia.org/wiki/Distributed_cache" rel="noopener noreferrer"&gt;distributed cache&lt;/a&gt; is configured in front of object storage to provide high-throughput, low-latency reads and writes. However, its performance in scenarios involving massive small files and high-frequency metadata requests needs further validation, and this type of solution struggles to guarantee strong data consistency in multi-cloud/multi-region architectures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JuiceFS + object storage&lt;/strong&gt; provides a multi-tier caching architecture, allowing flexible configuration of local and distributed cache. It can deliver high throughput and low-latency data read/write performance, particularly suitable for the high-concurrency, low-latency demands of inference services. It not only supports efficient metadata operations but also fully utilizes Ethernet bandwidth without relying on expensive proprietary hardware, ensuring excellent performance among similar systems. JuiceFS also possesses strong consistency and supports automated data distribution in multi-cloud environments, guaranteeing consistency across regions and clusters.&lt;/li&gt;
&lt;li&gt;From a scalability perspective, JuiceFS supports up to 100 billion files and TB/s level throughput, configures cache on-demand, and scales massively. Object storage is suitable for scenarios with virtually unlimited file count and capacity, but its bandwidth is limited by the cloud provider's offerings. The scalability of high-performance file systems is often constrained by the tight coupling of performance and capacity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The table below is a comparison summary of different storage solutions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tlkgcle9ldqlqcenuys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tlkgcle9ldqlqcenuys.png" alt=" " width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Compared to training tasks, inference services impose different and more stringent requirements on infrastructure. They especially emphasize low-latency response, high-concurrency access, resource isolation, and flexible scalability. Therefore, selecting an appropriate storage solution has become a key decision point for enterprises deploying AI inference systems.&lt;/p&gt;

&lt;p&gt;This article has outlined the core storage requirements and typical practical challenges in inference scenarios, covering extensive random access during model cold starts, high-concurrency reads of small files, multi-modal data processing, resource sharing among multiple tenants, and complex scenarios like cross-region multi-cloud deployment. Based on JuiceFS' practical implementation experience with numerous inference users, we’ve proposed a series of optimization strategies to address these challenges.&lt;/p&gt;

&lt;p&gt;The article also compared current mainstream storage solutions—including object storage, high-performance file systems, pure cache systems, and the architecture combining JuiceFS with object storage—helping users make more targeted selection judgments from multiple dimensions such as performance, stability, scalability, and cost.&lt;/p&gt;

&lt;p&gt;With its elastically scalable architecture, excellent I/O performance, and strong cloud-native compatibility, JuiceFS has become a stable choice for inference services in many enterprises. We hope this article can provide practical selection reference and architectural insights for more enterprises building or expanding their inference capabilities, assisting them in moving forward robustly in the increasingly complex landscape of AI applications.&lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and our &lt;a href="https://go.juicefs.com/slack" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Zelos Tech Manages Hundreds of Millions of Files for Autonomous Driving with JuiceFS</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 10 Oct 2025 07:00:48 +0000</pubDate>
      <link>https://dev.to/daswu/zelos-tech-manages-hundreds-of-millions-of-files-for-autonomous-driving-with-juicefs-2al5</link>
      <guid>https://dev.to/daswu/zelos-tech-manages-hundreds-of-millions-of-files-for-autonomous-driving-with-juicefs-2al5</guid>
      <description>&lt;p&gt;&lt;a href="https://www.crunchbase.com/organization/zelos-c72b" rel="noopener noreferrer"&gt;Zelos Tech&lt;/a&gt; is a high-tech company focused on &lt;a href="https://en.wikipedia.org/wiki/Self-driving_car" rel="noopener noreferrer"&gt;autonomous driving&lt;/a&gt; and unmanned delivery technologies, possessing independent R&amp;amp;D and large-scale deployment capabilities for autonomous driving systems and &lt;a href="https://en.wikipedia.org/wiki/Artificial_intelligence" rel="noopener noreferrer"&gt;AI&lt;/a&gt; chips. Our core products have been widely deployed in 200+ cities across China, with a vehicle sales market share exceeding 90%. We maintain a leading position in the Chinese L4 urban autonomous delivery sector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;As our application expanded rapidly, our data volume grew from terabytes to petabytes. The original Ceph-based storage solution faced high costs and maintenance pressures, while performance bottlenecks gradually emerged in handling small files, metadata, high concurrency, and latency. This impacted simulation and training efficiency&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Furthermore, accelerated algorithm iteration and cross-regional application deployment increased the frequency of data transfer and resource scheduling needs across &lt;a href="https://en.wikipedia.org/wiki/Multicloud" rel="noopener noreferrer"&gt;multi-cloud&lt;/a&gt; environments. Challenges included dispersed data, high migration costs, and complex scheduling. Limited community support and slow response times for some storage tools increased the operational complexity.&lt;/p&gt;

&lt;p&gt;Facing these challenges, we needed to build a &lt;a href="https://en.wikipedia.org/wiki/Cloud-native_computing" rel="noopener noreferrer"&gt;cloud-native&lt;/a&gt; storage architecture with high cost-effectiveness, strong scalability, and easy maintainability. After evaluating solutions like Alluxio and CephFS, we chose JuiceFS as our unified storage infrastructure. &lt;strong&gt;Currently, we’ve fully migrated critical data—including production, simulation, training, and inference—to JuiceFS, establishing an efficient, flexible, and unified storage foundation that comprehensively supports massive data processing requirements for various autonomous driving scenarios&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this article, I’ll share the storage challenges we faced at Zelos Tech in autonomous driving R&amp;amp;D, why we chose JuiceFS over Alluxio and CephFS, and how we use it effectively in multi-cloud environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Autonomous driving training process and storage challenges
&lt;/h2&gt;

&lt;p&gt;We’re currently dedicated to the R&amp;amp;D of L4 autonomous driving technology, primarily focusing on urban intelligent delivery logistics. The training process for autonomous driving models generates vast amounts of data and involves complex processing steps. The basic steps are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data collection and upload: Data is collected from vehicles after calibration work and subsequently uploaded.&lt;/li&gt;
&lt;li&gt;Algorithm processing: The algorithm team extracts relevant data for model training or algorithm improvement and then passes the results to the simulation stage for scoring.&lt;/li&gt;
&lt;li&gt;Simulation verification and modification: If simulation fails, the process returns to the algorithm team for modification. If successful, it proceeds to the simulated environment verification stage.&lt;/li&gt;
&lt;li&gt;Test vehicle validation: After passing simulated environment verification, tests are conducted on test vehicles. If the test fails, the process returns to the modification stage. If successful, the model is released via &lt;a href="https://en.wikipedia.org/wiki/Over-the-air_update" rel="noopener noreferrer"&gt;over-the-air&lt;/a&gt; (OTA) technology.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Throughout this process, data volume increases dramatically: each vehicle uploads 10+ gigabytes of data daily, and as the fleet grew, the total data volume reached petabytes. Efficiently extracting and utilizing this massive data, especially during &lt;a href="https://www.ibm.com/think/topics/model-training" rel="noopener noreferrer"&gt;model training&lt;/a&gt;, places extremely high demands on the storage system's performance, scalability, and stability.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To meet the data requirements of the entire autonomous driving R&amp;amp;D pipeline, we needed to establish a storage platform with the following characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-performance I/O&lt;/strong&gt;: Capable of supporting high-concurrency reads and low-latency access for massive data during training and simulation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic scalability&lt;/strong&gt;: Able to flexibly scale to petabyte-level or higher storage needs as the vehicle scale grows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-cloud compatibility&lt;/strong&gt;: Supporting unified access across multiple clouds and self-built environments, ensuring data transfer and consistency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ease of operation&lt;/strong&gt;: Providing simplified management and monitoring capabilities to reduce operational complexity and ensure long-term stability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-effectiveness&lt;/strong&gt;: Controlling overall storage costs while guaranteeing performance and stability, maximizing resource utilization.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Storage selection: JuiceFS vs. Alluxio vs. CephFS
&lt;/h2&gt;

&lt;p&gt;We tried various storage solutions, including &lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_alluxio/" rel="noopener noreferrer"&gt;Alluxio&lt;/a&gt;, JuiceFS, and &lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_cephfs/" rel="noopener noreferrer"&gt;CephFS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Alluxio manages metadata through master nodes and has a steep learning curve. It requires deploying separate master and worker clusters, which increases operational complexity. We also encountered issues like hangs and I/O exceptions when using the community edition.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkf1nf7c26x2r8n4xkmid.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkf1nf7c26x2r8n4xkmid.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Regarding CephFS, its metadata is stored in its own Ceph Metadata Server (MDS), while data is stored in RADOS. In contrast, JuiceFS supports various backend storage options (like S3 and OSS) and can use external databases (like Redis and TiKV) for metadata management. This results in a more flexible architecture.&lt;/p&gt;

&lt;p&gt;Furthermore, deploying and tuning CephFS is extremely complex, requiring deep involvement from specialized teams. When building our own Ceph cluster, we found that expanding OSDs and data rebalancing were time-consuming, small file write performance was poor, and tuning was difficult due to the complex architecture. Therefore, we abandoned this solution.&lt;/p&gt;

&lt;p&gt;The table below compares key features of JuiceFS and CephFS. For a detailed comparison, see &lt;a href="https://juicefs.com/docs/community/comparison/juicefs_vs_cephfs/" rel="noopener noreferrer"&gt;this document&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwitjocls2c7rzkq9kiya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwitjocls2c7rzkq9kiya.png" alt=" " width="800" height="847"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;JuiceFS enables seamless integration of various &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt; systems into local environments and supports multi-host concurrent reads/writes across platforms and regions. By separating data and metadata, it splits file data into chunks and stores them in object storage, while metadata is stored in databases like Redis, MySQL, PostgreSQL, TiKV, or SQLite. Users can flexibly choose based on actual scenarios and performance needs, greatly simplifying operational work.&lt;/p&gt;

&lt;p&gt;In comparison, JuiceFS excels in multiple aspects—particularly in high-concurrency reads of small files, where its performance meets our requirements. Thus, we fully migrated to JuiceFS about a year ago and have widely adopted it in our &lt;a href="https://en.wikipedia.org/wiki/Multicloud" rel="noopener noreferrer"&gt;multi-cloud&lt;/a&gt; architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  JuiceFS in multi-cloud environments
&lt;/h2&gt;

&lt;p&gt;JuiceFS adopts a storage &lt;a href="https://juicefs.com/docs/community/architecture/" rel="noopener noreferrer"&gt;architecture&lt;/a&gt; that decouples metadata and data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The metadata layer supports multiple database engines, such as Redis, MySQL, PostgreSQL, TiKV, and SQLite, allowing users to choose based on application scale and reliability needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The data layer uses mainstream object storage for seamless integration with diverse storage systems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4cc9c0uf5w0rqwqf6wk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4cc9c0uf5w0rqwqf6wk.png" alt=" " width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Currently, our system is deployed across multiple cloud platforms like AWS, with JuiceFS serving as the core storage component. In different environments, we flexibly pair backend storage with metadata engines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In self-built data centers, we use &lt;a href="https://www.min.io/" rel="noopener noreferrer"&gt;MinIO&lt;/a&gt; as object storage with Redis for metadata management.&lt;/li&gt;
&lt;li&gt;In public clouds, we combine OSS with Redis.&lt;/li&gt;
&lt;/ul&gt;
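The object storage and metadata engine pairing is fixed when a JuiceFS volume is created. A rough sketch of the self-built data center combination described above (host names, bucket, volume name, and credentials are placeholders, not our actual configuration):

```shell
# Create a JuiceFS volume backed by MinIO for data and Redis for metadata.
# All endpoints and credentials below are placeholders.
juicefs format \
  --storage minio \
  --bucket http://minio-host:9000/jfs-data \
  --access-key ACCESS_KEY \
  --secret-key SECRET_KEY \
  redis://redis-host:6379/1 \
  myjfs

# Mount the volume in the background. In a public cloud, the same commands
# apply with --storage oss and an OSS bucket URL instead.
juicefs mount -d redis://redis-host:6379/1 /mnt/jfs
```

Because the metadata URL is all a client needs at mount time, the same volume can be mounted identically from any host that can reach Redis and the object store.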

&lt;p&gt;This architecture enhances flexibility and has remained stable over a year of operation, fully meeting application needs with excellent availability and user experience.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; clusters, we deploy JuiceFS using its CSI driver, ensuring strong compatibility with Kubernetes. We use the official Helm charts to create and manage JuiceFS storage volumes, configuring charts tailored to different applications to connect with backend Redis and OSS storage.&lt;/p&gt;
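Assuming the official JuiceFS Helm chart repository, the driver installation looks roughly like this (the release name and values file are placeholders for our per-application configuration):

```shell
# Install the JuiceFS CSI driver from the official Helm chart.
helm repo add juicefs https://juicedata.github.io/charts/
helm repo update

# values.yaml would carry application-specific settings, such as secrets
# referencing the backend Redis and OSS credentials and cache paths.
helm install juicefs-csi-driver juicefs/juicefs-csi-driver \
  --namespace kube-system \
  -f ./values.yaml
```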

&lt;p&gt;On the local node, we allocate NVMe SSDs for JuiceFS caching, mounted under the &lt;code&gt;/data&lt;/code&gt; directory. As a cache layer, this significantly improves read performance. Once data is cached, subsequent reads for the same file are served directly from the local NVMe disk, achieving high read/write efficiency.&lt;/p&gt;
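On a plain (non-CSI) mount, the cache location and size are configured per mount point. A minimal sketch, assuming the NVMe disk is mounted under /data; the path and cache size are illustrative:

```shell
# Mount with a local NVMe cache directory. --cache-size is specified in MiB
# in the JuiceFS community edition; 1048576 MiB (1 TiB) here is illustrative.
juicefs mount -d \
  --cache-dir /data/jfscache \
  --cache-size 1048576 \
  redis://redis-host:6379/1 \
  /mnt/jfs
```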

&lt;h2&gt;
  
  
  JuiceFS in the training platform
&lt;/h2&gt;

&lt;p&gt;Our training platform architecture consists of multiple layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The infrastructure layer includes storage, network, compute resources, and several data service instances and hardware devices.&lt;/li&gt;
&lt;li&gt;The upper layer is the containerization layer, built on a Kubernetes cluster, supporting various services. The platform provides deep GPU computing support, multiple development language environments, and mainstream deep learning frameworks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the &lt;a href="https://en.wikipedia.org/wiki/Deep_learning" rel="noopener noreferrer"&gt;deep learning&lt;/a&gt; platform, users can submit training jobs via Notebook or a VR interface. After submission, the system schedules and allocates resources through the Training Operator. For storage, we pre-provisioned storage resources based on Persistent Volume Claims (PVCs) and used JuiceFS to automatically associate and supply the underlying storage.&lt;/p&gt;

&lt;p&gt;We integrated JuiceFS into the Kubeflow &lt;a href="https://en.wikipedia.org/wiki/Machine_learning" rel="noopener noreferrer"&gt;machine learning&lt;/a&gt; platform for model training tasks. When creating a training job in the Notebook environment, the system automatically associates with the &lt;code&gt;StorageClass&lt;/code&gt; provided by the backend JuiceFS, enabling dynamic storage allocation and management. Simultaneously, a monitoring system deployed in the cluster observes storage performance in real time. Current monitoring shows read throughput of about 200 MB/s with a low volume of write requests, consistent with the read-heavy I/O pattern typical of our training and inference scenarios.&lt;/p&gt;
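The dynamic provisioning path can be sketched as a StorageClass pointing at the JuiceFS CSI driver, plus a PVC that training jobs reference. The parameter keys follow the JuiceFS CSI driver convention; the secret, class, and claim names are placeholders:

```shell
# StorageClass wired to the JuiceFS CSI driver, plus a PVC a training job
# can mount. Secret names, namespaces, and sizes are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-sc
provisioner: csi.juicefs.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: juicefs-sc
  resources:
    requests:
      storage: 10Ti
EOF
```

The referenced secret holds the metadata URL and object storage credentials, so each application's chart only needs to select the right StorageClass.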

&lt;p&gt;During performance tuning, we drew on experience shared by JuiceFS community users and compared Redis and &lt;a href="https://tikv.org/" rel="noopener noreferrer"&gt;TiKV&lt;/a&gt; as metadata engines. Test results showed that TiKV significantly outperformed Redis in read-intensive scenarios. Therefore, we plan to gradually migrate the metadata engine for some training clusters to TiKV to further improve read efficiency.&lt;/p&gt;

&lt;p&gt;Currently, one of our storage buckets holds approximately 700 TB of data, comprising 600 million files. This includes a vast number of small files, typical for data organization in AI training tasks. In practice, JuiceFS has demonstrated excellent and stable performance under high-concurrency small file read/write operations, with no anomalies, fully meeting production demands.&lt;/p&gt;

&lt;p&gt;In simulation scenarios, the data scale has reached petabyte levels, with storage buckets primarily containing large files. The underlying storage resources rely on a cloud object storage solution, mainly used for centralized storage of simulation data. This storage solution has also proven stable in practice, supporting the continuous operation of large-scale simulation tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data synchronization across multi-cloud environments with JuiceFS
&lt;/h3&gt;

&lt;p&gt;To achieve data synchronization across multi-cloud environments, we deployed multiple dedicated lines between different cloud service providers and established cross-cloud network connectivity beforehand. For training data requiring consistency across clouds, we developed a synchronization tool that integrates the underlying &lt;code&gt;juicefs sync&lt;/code&gt; &lt;a href="https://juicefs.com/docs/community/guide/sync/" rel="noopener noreferrer"&gt;command&lt;/a&gt;. It efficiently syncs the same dataset to multiple cloud environments. Although cross-cloud mounting is supported, we do not recommend this method due to its high dependence on network stability. The core challenge lies in network reliability; synchronization processes are susceptible to network fluctuations and must be used cautiously.&lt;/p&gt;
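Under the hood, such a tool boils down to invocations like the following (bucket names, endpoints, and credentials are placeholders; `--update` skips objects that already match at the destination, and `--threads` controls concurrency):

```shell
# Sync a training dataset prefix from MinIO in the self-built data center
# to an OSS bucket in the public cloud. All endpoints/credentials are fake.
juicefs sync --update --threads 32 \
  minio://ACCESS_KEY:SECRET_KEY@minio-host:9000/datasets/train/ \
  oss://ACCESS_KEY:SECRET_KEY@jfs-data.oss-cn-region.aliyuncs.com/datasets/train/
```

Since the command is incremental, re-running it after a network interruption resumes where the previous pass left off, which is what makes it more robust than a long-lived cross-cloud mount.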

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In key segments such as production, simulation, training, and inference, JuiceFS leverages flexible metadata engine choices, diverse object storage backends, and strong compatibility with Kubernetes and Kubeflow to effectively support high-concurrency small-file access, cross-cloud data flow, and performance scaling. JuiceFS operates stably at large data scales, which significantly reduces operational complexity and overall costs while its scalability meets our current application needs.&lt;/p&gt;

&lt;p&gt;Looking forward, with the gradual adoption of metadata engines like TiKV and continuous optimization of cross-cloud synchronization mechanisms, JuiceFS' overall performance and adaptability still have room for improvement. It will continue to provide sustained support for Zelos' massive data processing needs in autonomous driving research and development.&lt;/p&gt;

&lt;p&gt;If you have any questions about this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and the &lt;a href="https://go.juicefs.com/slack" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>MLPerf Storage v2.0: JuiceFS Leads in Bandwidth Utilization and Scalability for AI Training</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Fri, 26 Sep 2025 06:48:18 +0000</pubDate>
      <link>https://dev.to/daswu/mlperf-storage-v20-juicefs-leads-in-bandwidth-utilization-and-scalability-for-ai-training-1021</link>
      <guid>https://dev.to/daswu/mlperf-storage-v20-juicefs-leads-in-bandwidth-utilization-and-scalability-for-ai-training-1021</guid>
      <description>&lt;p&gt;On August 5, the global authoritative AI engineering consortium &lt;a href="https://mlcommons.org/" rel="noopener noreferrer"&gt;MLCommons&lt;/a&gt; released the latest MLPerf® Storage v2.0 benchmark results. This evaluation attracted numerous vendors, including those from categories such as cloud, shared file, fabric-attached block, and direct-attached block storage.&lt;/p&gt;

&lt;p&gt;Due to differences in hardware configurations, node scales, and application scenarios among vendors, direct cross-vendor comparisons have limitations. Therefore, this article will focus on the shared file system category, analyzing their performance under the same testing standards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://juicefs.com/docs/community/introduction/" rel="noopener noreferrer"&gt;JuiceFS&lt;/a&gt; is a high-performance distributed file system supporting both cloud and on-premises deployments. Across multiple AI training workloads, JuiceFS achieved excellent results, particularly leading in bandwidth utilization and scalability. In this article, we’ll analyze the specific test results and further introduce the key features underpinning this performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  MLPerf Storage v2.0 and its workloads
&lt;/h2&gt;

&lt;p&gt;MLPerf is a universal AI benchmark suite launched by MLCommons. MLPerf Storage simulates real AI workload access to storage systems through multiple clients. It replicates storage loads in large-scale distributed training clusters to comprehensively evaluate the practical performance of storage systems in &lt;a href="https://www.ibm.com/think/topics/model-training" rel="noopener noreferrer"&gt;AI training&lt;/a&gt; tasks.&lt;/p&gt;

&lt;p&gt;The latest 2.0 version includes three types of training workloads, covering the most representative I/O patterns in &lt;a href="https://en.wikipedia.org/wiki/Deep_learning" rel="noopener noreferrer"&gt;deep learning&lt;/a&gt; training.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3D U-Net (medical image segmentation): This workload involves sequential and concurrent reads of large-volume 3D medical images. Each sample averages about 146 MB and is stored as an independent file. &lt;strong&gt;This task primarily tests the storage system's throughput in large-file sequential read scenarios and its ability to maintain stable responsiveness under concurrent multi-node access&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;ResNet-50 (image classification): This workload is characterized by highly concurrent random reads of small samples. Each sample averages only 150 KB, with data packaged and stored in large files using TFRecord format. &lt;strong&gt;This data organization leads to significant random I/O and frequent metadata access during training. It places extremely high demands on the storage system's IOPS. It’s a crucial test for measuring concurrent performance in small-file scenarios&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;CosmoFlow (cosmology prediction): This workload emphasizes concurrent small-file access and bandwidth scalability across nodes. Each sample averages 2 MB, typically stored as individual files in TFRecord format. &lt;strong&gt;As it involves distributed reading of massive numbers of small files, the system requires sufficient aggregate throughput and must demonstrate stability in metadata handling and tail latency control. Otherwise, as nodes increase, latency fluctuations will significantly amplify and slow down overall training speed&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bu8k0dg8fe88spt8opv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bu8k0dg8fe88spt8opv.png" alt=" " width="800" height="645"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition, the 2.0 version introduces a new checkpointing workload to simulate checkpoint saving and recovery in large model training. This workload primarily involves concurrent sequential writes of large files. In the JuiceFS architecture, checkpoint data is written through JuiceFS to &lt;a href="https://en.wikipedia.org/wiki/Object_storage" rel="noopener noreferrer"&gt;object storage&lt;/a&gt;. Its performance bottleneck depends on the bandwidth limit of the object storage serving as the persistent data layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance comparison: product categories, elastic scalability, and resource utilization
&lt;/h2&gt;

&lt;p&gt;Numerous vendors participated in this MLPerf Storage v2.0 test, involving various types like block storage and shared file systems. &lt;strong&gt;However, due to significant architectural and application scenario differences, coupled with different hardware configurations and node scales used by vendors, a direct horizontal comparison has limited value&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we’ll focus on results within the shared file system category. This category can be further subdivided:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ethernet-based systems&lt;/strong&gt;: This group includes Alluxio, JuiceFS, and Oracle. These cloud systems rely on Ethernet environments to provide distributed storage capabilities. Some vendors, like Nutanix, used &lt;a href="https://en.wikipedia.org/wiki/RDMA_over_Converged_Ethernet" rel="noopener noreferrer"&gt;RoCE&lt;/a&gt;-based Ethernet solutions, typically with higher-bandwidth NICs per instance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/InfiniBand" rel="noopener noreferrer"&gt;InfiniBand&lt;/a&gt;-based storage solutions&lt;/strong&gt;: Vendors like DDN, &lt;a href="https://en.wikipedia.org/wiki/Hewlett-Packard" rel="noopener noreferrer"&gt;Hewlett Packard&lt;/a&gt;, and Ubix offer complete storage appliances based on InfiniBand networks. These systems feature very high hardware specifications and overall cost, delivering extremely high bandwidth and performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ws658272ik9lzrawg1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ws658272ik9lzrawg1j.png" alt=" " width="770" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before we explain the results, let's clarify the evaluation criteria.&lt;/p&gt;

&lt;p&gt;MLPerf Storage documentation requires submitted results to meet a GPU utilization threshold while maximizing the number of GPUs (scale). The thresholds are 90% for 3D U-Net and ResNet-50, and 70% for CosmoFlow. &lt;strong&gt;Given these thresholds are met, the key metric that truly differentiates performance is the maximum number of GPUs a storage system can support. This scale is essentially determined by the system's maximum aggregate bandwidth&lt;/strong&gt;. A system supporting more GPUs indicates stronger scalability and stability in large-scale training scenarios, especially crucial in workloads like CosmoFlow which involve vast numbers of small files and are highly latency-sensitive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comparison should consider resource utilization&lt;/strong&gt;. For software vendors, the key is whether the storage software can fully leverage the potential of the underlying hardware. As the network bandwidth is often the bottleneck for storage systems, &lt;strong&gt;we use NIC bandwidth utilization as a reference metric: higher utilization indicates greater software efficiency and implies higher performance and cost-effectiveness for a given hardware configuration&lt;/strong&gt;.&lt;/p&gt;
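As a back-of-the-envelope illustration of this metric (the bandwidth figure, node count, and NIC speed below are made up for the example and do not correspond to any vendor's submission):

```shell
# NIC bandwidth utilization = aggregate read bandwidth / aggregate NIC capacity.
# Note that GiB/s is binary (2^30 bytes) while NIC speeds are decimal Gb/s.
awk 'BEGIN {
  read_gib_s = 96             # hypothetical aggregate read bandwidth, GiB/s
  nodes = 10; nic_gbps = 100  # hypothetical cluster: 10 nodes x 100 Gb/s NICs
  read_gbps = read_gib_s * 8 * 1.073741824   # GiB/s -> Gb/s
  printf "%.1f%%\n", 100 * read_gbps / (nodes * nic_gbps)
}'
```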

&lt;h2&gt;
  
  
  JuiceFS test results
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://miccai-sb.github.io/materials/Hoang2019b.pdf" rel="noopener noreferrer"&gt;3D U-Net&lt;/a&gt; workload: JuiceFS achieved a data read bandwidth of up to 108 GiB/s, supporting a training scale of 40 H100 GPUs across 10 nodes. The network bandwidth utilization reached 86.6%, and the GPU utilization was 92.7%.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/pdf/1808.04728" rel="noopener noreferrer"&gt;CosmoFlow&lt;/a&gt; workload: JuiceFS supported a training scale of 100 H100 GPUs across 10 nodes, with a GPU utilization of 75%. This workload places extreme demands on storage latency stability and has lower network bandwidth requirements. The performance bottleneck is not bandwidth. Handling massive concurrent small-file access means that I/O latency and stability determine overall scalability and limit GPU utilization.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.roboflow.com/what-is-resnet-50/" rel="noopener noreferrer"&gt;ResNet-50&lt;/a&gt; workload: JuiceFS achieved a data read bandwidth of 90 GiB/s, with a network bandwidth utilization of 72% and an overall GPU utilization of 95%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note: The number of stable elastic instances of the chosen GCP instance type available in a single zone was limited, so our submitted maximum scale was 10 nodes. This does not represent the maximum capacity of the JuiceFS system, which can provide higher aggregate bandwidth and support larger training scales with more nodes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Background: GPU scale limited by storage network bandwidth capacity
&lt;/h3&gt;

&lt;p&gt;JuiceFS has been participating in MLPerf tests for three years. In the first year, based on the 0.5 version, we conducted full tests using V100 GPUs for simulation.&lt;/p&gt;

&lt;p&gt;Because the per-GPU bandwidth requirement of V100 is low, the GPU utilization remained near 99% for a long interval. It slowly decreased to about 98% as the training GPU scale increased. The inflection point occurred only when the total bandwidth approached the NIC bandwidth limit; after that, utilization dropped rapidly as GPUs were added. For detailed analysis, see &lt;a href="https://juicefs.com/en/blog/engineering/ai-gpu-utilization-mlperf-benchmark" rel="noopener noreferrer"&gt;98% GPU Utilization Achieved in 1k GPU-Scale AI Training Using Distributed Cache&lt;/a&gt;. According to the data in the figure below, at 90% GPU utilization, about 110 V100 GPUs could be supported. This scale limit is primarily constrained by the storage network's bandwidth capacity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1np48z5cje9xi839asf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1np48z5cje9xi839asf.png" alt=" " width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3D U-Net: JuiceFS leads peer systems with highest bandwidth and utilization
&lt;/h3&gt;

&lt;p&gt;The 3D U-Net training workload involves large-file sequential reads, demanding high read bandwidth from the storage system.&lt;/p&gt;

&lt;p&gt;The figure below clearly shows that among all storage systems based on conventional Ethernet, &lt;strong&gt;JuiceFS performed best: achieving 108 GiB/s data read bandwidth at a scale of 10 nodes (40 H100 GPUs), with a network bandwidth utilization of 86.6%, both the highest among comparable systems&lt;/strong&gt;. This indicates JuiceFS not only provides higher total bandwidth but also utilizes network and hardware resources more efficiently, offering significant cost-performance advantages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mykyq4ujqo00wfhj3cf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mykyq4ujqo00wfhj3cf.png" alt=" " width="800" height="557"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following section shows results from participants using InfiniBand networks and RDMA over Converged Ethernet (RoCE). These vendors' hardware specifications are generally very high, with total network bandwidth ranging from about 400 GiB/s to over 1,500 GiB/s. Consequently, they can provide extremely high aggregate bandwidth for training workloads. It’s worth noting that JuiceFS can achieve similar bandwidth levels by elastically scaling the number of distributed cache nodes. In recent tests, an aggregate read bandwidth of 1.2 TB/s was achieved with a distributed cache cluster of 100 GCP nodes with 100 Gbps NICs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8jlkhons16q31b5z2d3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8jlkhons16q31b5z2d3.png" alt=" " width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From a utilization perspective, as per-NIC bandwidth increases, achieving high bandwidth utilization becomes increasingly challenging. Achieving utilization close to 80% is very difficult with 400 Gb/s or higher NIC configurations.&lt;/p&gt;

&lt;h3&gt;
  
  
  CosmoFlow: JuiceFS supports hundred-GPU scale, leading in scalability
&lt;/h3&gt;

&lt;p&gt;The CosmoFlow training workload requires reading vast numbers of small files. This places extreme demands on the storage system's metadata performance and read latency. The test requires GPU utilization to be above 70%. Due to the high latency sensitivity, as the number of H100s increases, the read latency variance in multi-node distributed training increases significantly, causing GPU utilization to drop rapidly and making horizontal scaling difficult. Compared to 3D U-Net, fewer total results were submitted for CosmoFlow. This reflects the greater optimization challenge this workload presents.&lt;/p&gt;

&lt;p&gt;The chart's x-axis represents the total number of H100 GPUs supported. &lt;strong&gt;JuiceFS continues to lead among comparable systems. Running across 10 client nodes simultaneously, it supported the CosmoFlow training task with 100 H100 GPUs while maintaining GPU utilization above the required threshold&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6h05w7d5lqf7yqh00ct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6h05w7d5lqf7yqh00ct.png" alt=" " width="800" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Systems based on InfiniBand networks performed particularly well in this workload. This benefits from the end-to-end extremely low and highly stable latency systematically provided by InfiniBand networks. Despite their higher cost, the performance advantage of InfiniBand networks is significant for latency-sensitive tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgv9ngc1xc2y53nm62le9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgv9ngc1xc2y53nm62le9.png" alt=" " width="800" height="688"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ResNet-50: JuiceFS supports the most H100s, achieving highest bandwidth utilization
&lt;/h3&gt;

&lt;p&gt;The ResNet-50 training workload involves highly concurrent random reads within large files, demanding extremely high IOPS from the storage system and requiring GPU utilization to stay above 90%.&lt;/p&gt;

&lt;p&gt;In this test, &lt;strong&gt;JuiceFS supported the largest configuration among comparable systems, 500 H100 GPUs, and achieved the highest network bandwidth utilization (72%) among all Ethernet-based solutions, far exceeding the roughly 40% achieved by other vendors&lt;/strong&gt;. At the same time, GPU utilization remained at 95%. This result demonstrates that under high-concurrency random I/O scenarios, JuiceFS can fully leverage its software optimizations to efficiently utilize hardware resources, exhibiting leading performance and efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkbbey2wmwojyb22213c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkbbey2wmwojyb22213c.png" alt=" " width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  JuiceFS system configuration in the MLPerf test
&lt;/h2&gt;

&lt;p&gt;The JuiceFS system topology in this test consisted of three main layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The client layer: The top layer consisted of 10 client nodes, all running on the same GCP instance type, with the same bandwidth configurations per node. This ensures client-side balance.&lt;/li&gt;
&lt;li&gt;The cache cluster layer: The middle layer was the cache cluster, comprising 10 cache nodes. These nodes used cloud disks combined with local memory as cache to accelerate data access.&lt;/li&gt;
&lt;li&gt;The cold data storage layer: Data was ultimately stored in &lt;a href="https://cloud.google.com/storage" rel="noopener noreferrer"&gt;Google Cloud Storage&lt;/a&gt; (GCS).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before training started, cold data was warmed up from GCS to the cache cluster. During training, clients read data from the cache cluster. This approach avoids accessing high-latency object storage during training, providing stable high bandwidth and low-latency access to meet the sustained data demands of large-scale AI training.&lt;/p&gt;

&lt;p&gt;The MLPerf Storage v2.0 rules allowed for an additional "warm-up test" round. Thus, even systems without an active warm-up feature could load data into cache before the official test, ensuring fairness. In addition, all participants could benefit from performance gains offered by page cache. This is critical for latency-sensitive workloads like CosmoFlow.&lt;/p&gt;

&lt;p&gt;In the current configuration, 10 clients and the cache cluster could support a training scale of about 40 H100 GPUs. If the training scale increases further, for example, doubling the number of GPUs, simply scaling the cache cluster nodes proportionally can match the new bandwidth requirements. This architecture demonstrates high flexibility and elasticity, adaptable to larger-scale training tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjualdpg9ixqk0osacmh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjualdpg9ixqk0osacmh.png" alt=" " width="501" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The strong performance of JuiceFS in this test relies firstly on its high-performance metadata engine&lt;/strong&gt;. This engine delivers very high IOPS and extremely low latency, with integrated metadata caching on the client, significantly reducing access latency. This capability is particularly critical for the CosmoFlow workload, which involves massive concurrent small-file access and is highly latency-sensitive. Stable low latency ensures high GPU utilization and overall scalability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Another key advantage is the distributed cache&lt;/strong&gt;. A test shows that the &lt;a href="https://juicefs.com/en/blog/engineering/terabyte-aggregate-bandwidth-distributed-cache-network" rel="noopener noreferrer"&gt;JuiceFS cache cluster can provide up to 1.2 TB/s of aggregate bandwidth&lt;/a&gt; (based on 100 GCP 100 Gbps nodes, with the scale limited by instance availability in the selected GCP zone) and reduce access latency to sub-millisecond levels. The cache cluster supports elastic scaling, delivering extremely high bandwidth and allowing IOPS and bandwidth to be expanded according to task needs. In the ResNet-50 workload, JuiceFS' high IOPS performance was decisive. Relying on this IOPS advantage, JuiceFS stood out in the Ethernet-based comparison, achieving 72% bandwidth utilization while maintaining 95% GPU utilization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this article, we analyzed the results of the MLPerf Storage v2.0 test. Before evaluating a storage product's GPU utilization, users must understand the product type, including its architecture, hardware resources, and cost. &lt;strong&gt;Assuming GPU utilization thresholds are met, the key difference among storage systems is the maximum number of GPUs they can support. A system supporting more GPUs indicates better scalability and stability in large-scale training scenarios. Furthermore, resource utilization must be considered—whether the storage software can fully leverage the potential of the underlying hardware&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;JuiceFS demonstrated stable low latency and high resource utilization across different AI training workloads in these tests. As a fully &lt;a href="https://en.wikipedia.org/wiki/User_space_and_kernel_space" rel="noopener noreferrer"&gt;user-space&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/Cloud-native_computing" rel="noopener noreferrer"&gt;cloud-native&lt;/a&gt; distributed file system, it utilizes a distributed cache to achieve high throughput and low latency, while leveraging object storage for cost advantages. It presents a viable option for large-scale AI training that does not rely on expensive proprietary hardware.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Achieving TB-Level Aggregate Bandwidth: How JuiceFS Optimized Distributed Cache Network</title>
      <dc:creator>DASWU</dc:creator>
      <pubDate>Mon, 22 Sep 2025 09:54:50 +0000</pubDate>
      <link>https://dev.to/daswu/achieving-tb-level-aggregate-bandwidth-how-juicefs-optimized-distributed-cache-network-207j</link>
      <guid>https://dev.to/daswu/achieving-tb-level-aggregate-bandwidth-how-juicefs-optimized-distributed-cache-network-207j</guid>
<description>&lt;p&gt;With the explosive growth of data volume and model scale, scenarios where multiple clients frequently access the same data have become increasingly common. Distributed caching aggregates the local caches of multiple nodes to form a large-capacity cache pool. This improves cache hit rates, enhances read bandwidth and IOPS, reduces read latency, and meets high-performance demands.&lt;/p&gt;

&lt;p&gt;However, data exchange between nodes heavily relies on network performance. Insufficient bandwidth can limit data transfer speeds and increase latency; high network latency can affect cache responsiveness and reduce system efficiency; meanwhile, the CPU resources consumed by network data processing can also become a bottleneck, constraining overall performance.&lt;/p&gt;

&lt;p&gt;To address these issues, the recently released &lt;a href="https://juicefs.com/docs/cloud/" rel="noopener noreferrer"&gt;JuiceFS Enterprise Edition&lt;/a&gt; 5.2 introduces multiple optimizations for network transmission between cache nodes. This effectively resolves some performance-intensive aspects and improves network interface card (NIC) bandwidth utilization. &lt;strong&gt;After optimization, client CPU overhead was reduced by 50%+, and cache node CPU overhead dropped to one-third of pre-optimization levels. The aggregate read bandwidth reached 1.2 TB/s, nearly saturating the TCP/IP network bandwidth (based on a distributed cache cluster of 100 GCP 100 Gbps nodes)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0tpf5goz8qdfzpxsqsd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0tpf5goz8qdfzpxsqsd.png" alt=" " width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the most widely supported network standard, the TCP/IP protocol can be used almost everywhere out-of-the-box. With built-in congestion control and retransmission mechanisms, it achieves sufficiently high performance and stability in complex networks. This makes it the preferred choice for &lt;a href="https://en.wikipedia.org/wiki/Distributed_cache" rel="noopener noreferrer"&gt;distributed caching&lt;/a&gt;. Currently, JuiceFS can fully saturate 100 Gb NIC bandwidth under TCP/IP. In the future, we’ll also introduce RDMA to further unleash the potential of 200 Gb and 400 Gb NICs.&lt;/p&gt;

&lt;p&gt;In this article, we’ll share the specific implementations of these TCP/IP network optimizations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Golang network performance optimization
&lt;/h2&gt;

&lt;p&gt;In large-scale environments, such as when thousands of clients access data from over 100 distributed cache nodes, the number of network connections to cache nodes increases significantly. This leads to substantial Golang scheduling overhead and reduced network bandwidth utilization.&lt;/p&gt;

&lt;p&gt;We addressed two key points—connection reuse and &lt;code&gt;epoll&lt;/code&gt; event triggering—by introducing &lt;a href="https://en.wikipedia.org/wiki/Multiplexing" rel="noopener noreferrer"&gt;multiplexing&lt;/a&gt; mechanisms and event trigger threshold optimizations. This effectively alleviated system pressure and improved data processing performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multiplexing
&lt;/h3&gt;

&lt;p&gt;If a connection can only handle one request at a time, the system’s overall concurrency capability will be limited by the number of connections itself. To improve concurrency, a large number of connections must be established. Therefore, we introduced connection multiplexing. It allows multiple request packets to be sent simultaneously over the same connection to the peer end. Through multiplexing, the throughput capacity of a single connection can be effectively improved. This reduces the need for numerous connections, thereby enhancing concurrent performance while lowering resource consumption and network burden.&lt;/p&gt;
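
&lt;p&gt;As an illustration only (JuiceFS' wire format is not public), a multiplexed protocol typically tags each packet with a request ID so that many requests and responses can interleave on one connection and be matched up on arrival. A minimal Go sketch of such framing, with a hypothetical layout of a 4-byte ID, a 4-byte length, and the payload:&lt;/p&gt;

```go
package main

import "encoding/binary"

// encodeFrame builds one frame: 4-byte request ID, 4-byte payload
// length, then the payload. Responses carry the same ID, so they can
// be matched to in-flight requests sharing one connection.
func encodeFrame(id uint32, payload []byte) []byte {
	frame := make([]byte, 8+len(payload))
	binary.BigEndian.PutUint32(frame[0:4], id)
	binary.BigEndian.PutUint32(frame[4:8], uint32(len(payload)))
	copy(frame[8:], payload)
	return frame
}

// decodeFrame parses one frame and returns the unconsumed remainder,
// so several frames sent back-to-back over the same connection can be
// peeled off a single read buffer one by one.
func decodeFrame(buf []byte) (id uint32, payload []byte, rest []byte) {
	id = binary.BigEndian.Uint32(buf[0:4])
	n := binary.BigEndian.Uint32(buf[4:8])
	payload = buf[8 : 8+n]
	rest = buf[8+n:]
	return id, payload, rest
}
```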

&lt;p&gt;Since single TCP connections have performance bottlenecks, we further introduced the ability to dynamically adjust the number of connections based on real-time traffic within the multiplexing architecture to achieve the best balance between performance and resources. This improved overall throughput and network efficiency. The system automatically increases or decreases the number of connections based on current network traffic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When user request volume is high, total traffic continues to rise, and existing connections can no longer meet bandwidth requirements, the system automatically increases the number of connections to improve bandwidth utilization efficiency.&lt;/li&gt;
&lt;li&gt;When requests become inactive and total bandwidth decreases, the system automatically reduces the number of connections to avoid resource waste and packet fragmentation issues.&lt;/li&gt;
&lt;/ul&gt;
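
&lt;p&gt;The scaling decision can be sketched as a simple function of observed traffic. The function and its parameters below are hypothetical, not JuiceFS internals; they only illustrate sizing the pool from a per-connection bandwidth ceiling with a floor and a cap:&lt;/p&gt;

```go
package main

import "math"

// desiredConns picks a connection count for the observed traffic,
// assuming each TCP connection tops out around perConnGbps.
// minConns avoids tearing down the last connection; maxConns bounds
// resource usage. All names and thresholds are illustrative.
func desiredConns(currentGbps, perConnGbps float64, minConns, maxConns int) int {
	n := int(math.Ceil(currentGbps / perConnGbps))
	if minConns > n {
		n = minConns
	}
	if n > maxConns {
		n = maxConns
	}
	return n
}
```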

&lt;p&gt;Another advantage of multiplexing is support for small packet merging. Through efficient connection management, multiple small packets’ data streams can be transmitted together over the same physical connection. This reduces network transmissions along with the overhead from system calls and &lt;a href="https://en.wikipedia.org/wiki/User_space_and_kernel_space" rel="noopener noreferrer"&gt;kernel-user&lt;/a&gt; space switching.&lt;/p&gt;

&lt;p&gt;Specifically, on the sender side, the sending thread continuously retrieves multiple requests from the sending channel until the channel is empty or the cumulative data to be sent reaches 4 KiB before sending them together. This merges multiple small packets into a larger data block, reducing system calls and network transmissions. On the receiver side, data from the NIC is read in bulk into a ring buffer, from which upper-layer services extract segments one by one and parse out complete application-layer data packets.&lt;/p&gt;
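
&lt;p&gt;The sender-side merging described above can be sketched as follows. For simplicity the sketch drains a slice rather than a real Go channel, and the names are illustrative, but the stopping rule is the same: keep accumulating until the queue is empty or the batch reaches 4 KiB:&lt;/p&gt;

```go
package main

const flushThreshold = 4 * 1024 // merge small packets up to 4 KiB

// nextBatch drains pending requests until the queue is empty or the
// accumulated payload reaches the threshold, mirroring how the sender
// merges several small packets into one larger write. It returns the
// merged batch and the requests left for the next round.
func nextBatch(pending [][]byte) (batch []byte, rest [][]byte) {
	for i, req := range pending {
		batch = append(batch, req...)
		if len(batch) >= flushThreshold {
			return batch, pending[i+1:]
		}
	}
	return batch, nil // queue drained before reaching the threshold
}
```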

&lt;h3&gt;
  
  
  Receive watermark setting (&lt;code&gt;SO_RCVLOWAT&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;In Golang’s network framework, &lt;code&gt;epoll&lt;/code&gt;’s edge-triggered mode is used by default to handle sockets. An &lt;code&gt;epoll&lt;/code&gt; event is generated when the socket state changes (for example, from no data to having data).&lt;/p&gt;

&lt;p&gt;To reduce the additional overhead caused by frequent events triggered by small amounts of data, we set &lt;code&gt;SO_RCVLOWAT&lt;/code&gt; (socket receive low watermark) to control the kernel to trigger &lt;code&gt;epoll&lt;/code&gt; events only when the data volume in the socket receive buffer reaches a specified number of bytes. This reduces the number of system calls triggered by frequent events and lowers network I/O overhead.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;conn, _ := net.Dial("tcp", "example.com:80")
f, _ := conn.(*net.TCPConn).File() // error handling elided
defer f.Close()
syscall.SetsockoptInt(int(f.Fd()), syscall.SOL_SOCKET, syscall.SO_RCVLOWAT, 512*1024)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After testing, in high-concurrency connection scenarios, &lt;code&gt;epoll&lt;/code&gt; events dropped by 90% to about 40,000 per second. Cache performance became more stable, with CPU overhead maintained at about 1 core per GB bandwidth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero-copy optimization: reducing CPU and memory consumption
&lt;/h3&gt;

&lt;p&gt;In Linux network communication, &lt;a href="https://en.wikipedia.org/wiki/Zero-copy" rel="noopener noreferrer"&gt;zero-copy&lt;/a&gt; is a technique that reduces or eliminates unnecessary data copying between kernel space and user space to lower CPU and memory consumption, thereby improving data transfer efficiency. It’s particularly suitable for large-scale data transfer scenarios. Classic products such as nginx, Kafka, and haproxy use zero-copy technology to optimize performance.&lt;/p&gt;

&lt;p&gt;Common zero-copy technologies include the &lt;code&gt;mmap&lt;/code&gt; system call, as well as mechanisms such as &lt;code&gt;sendfile&lt;/code&gt;, &lt;code&gt;splice&lt;/code&gt;, &lt;code&gt;tee&lt;/code&gt;, &lt;code&gt;vmsplice&lt;/code&gt;, and &lt;a href="https://lwn.net/Articles/752188/" rel="noopener noreferrer"&gt;&lt;code&gt;MSG_ZEROCOPY&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sendfile&lt;/code&gt; is a system call provided by Linux to read file data directly from disk into the kernel buffer and transmit it to the socket buffer via DMA without passing through user space.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;splice&lt;/code&gt; allows direct data movement in kernel space, supporting data transfer between any two file descriptors (with at least one end being a pipe). Using a pipe as an intermediary, data is transferred from the input descriptor (e.g., a file) to the output descriptor (e.g., a socket), while the kernel only manipulates page pointers, avoiding actual data copying.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compared to &lt;code&gt;sendfile&lt;/code&gt;, &lt;code&gt;splice&lt;/code&gt; is more flexible, supporting non-file transmission scenarios (such as forwarding between sockets), and is more suitable for high-concurrency network environments, such as proxy servers. &lt;code&gt;sendfile&lt;/code&gt; can reduce the number of system calls but blocks the current thread, affecting concurrent performance. We used &lt;code&gt;splice&lt;/code&gt; and &lt;code&gt;sendfile&lt;/code&gt; in different scenarios to optimize data transmission flows for the best results.&lt;/p&gt;

&lt;p&gt;Taking &lt;code&gt;splice&lt;/code&gt; as an example, when a client requests file data, it sends a request to the cache node via distributed caching. We used &lt;code&gt;splice&lt;/code&gt; zero-copy technology to optimize the data flow on the sender side. The specific data transmission path is as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sjssgtjburuqt5d3ihk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sjssgtjburuqt5d3ihk.png" alt=" " width="800" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When cache data needs to be sent to the client, the process uses the &lt;code&gt;splice&lt;/code&gt; interface to read data directly from the file into the kernel buffer (page cache). The data is then transmitted via &lt;code&gt;splice&lt;/code&gt; to a pre-created pipe, and the data in the pipe is further transmitted via &lt;code&gt;splice&lt;/code&gt; to the socket buffer. Since the data remains in kernel space throughout the process, only two DMA copies occur without any CPU copying, achieving zero-copy cache service.&lt;/p&gt;

&lt;p&gt;Although there is also optimization potential on the receiver side, due to the high hardware and runtime environment requirements for zero-copy implementation and its lack of universality, we haven’t introduced related solutions in the current architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimizing the CRC check process
&lt;/h2&gt;

&lt;p&gt;In previous versions, reading a data block from distributed caching required two &lt;a href="https://en.wikipedia.org/wiki/Cyclic_redundancy_check" rel="noopener noreferrer"&gt;cyclic redundancy checks&lt;/a&gt; (CRCs):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disk loading phase: When loading data from disk into memory, to check if data on the disk has experienced bit rot.&lt;/li&gt;
&lt;li&gt;Network transmission phase: The sender recalculates the CRC and writes it into the packet header. The receiver recalculates the CRC after receiving the data and compares it with the value in the packet header to ensure the data has not been tampered with or corrupted during transmission.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To reduce unnecessary CPU overhead, the new version optimizes the CRC check process during network transmission. The sender no longer recalculates the CRC but instead uses the CRC value saved on the disk, merging it into a total CRC written into the packet header. The receiver performs only one CRC after receiving the data.&lt;/p&gt;

&lt;p&gt;To support random reads, cached data blocks have their CRC values calculated segmentally, every 32 KB. However, network transmission requires a CRC value covering all transmitted data. Therefore, we merge the segmented CRC values into an overall CRC value using a lookup table method whose overhead is almost negligible.&lt;br&gt;
This approach eliminates one CRC calculation while ensuring data consistency and integrity, effectively lowering the sender’s CPU consumption and improving overall performance.&lt;/p&gt;
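
&lt;p&gt;The merge step can be reproduced with the classic zlib-style crc32_combine, which derives the CRC of a concatenation from the two segment CRCs and the length of the second segment, without touching the data again. This sketch uses the GF(2) matrix form rather than JuiceFS' table-based variant, but it demonstrates the same property:&lt;/p&gt;

```go
package main

import "hash/crc32"

// checksum is crc32.ChecksumIEEE, standing in for the per-segment
// CRC the cache stores alongside each 32 KB segment.
func checksum(data []byte) uint32 {
	return crc32.ChecksumIEEE(data)
}

// gf2MatrixTimes multiplies a 32x32 GF(2) matrix by a vector.
func gf2MatrixTimes(mat []uint32, vec uint32) uint32 {
	var sum uint32
	for i := 0; vec != 0; i++ {
		if vec%2 == 1 {
			sum ^= mat[i]
		}
		vec >>= 1
	}
	return sum
}

// gf2MatrixSquare squares a GF(2) matrix: square = mat * mat.
func gf2MatrixSquare(square, mat []uint32) {
	for n := 0; n != 32; n++ {
		square[n] = gf2MatrixTimes(mat, mat[n])
	}
}

// crc32Combine merges crc1 (of stream A) and crc2 (of stream B, len2
// bytes) into the CRC of the concatenated stream, following zlib's
// crc32_combine: it applies the "append len2 zero bytes" operator to
// crc1 by repeated matrix squaring, then XORs in crc2.
func crc32Combine(crc1, crc2 uint32, len2 int64) uint32 {
	if len2 == 0 {
		return crc1
	}
	even := make([]uint32, 32) // operator for an even power of zero bits
	odd := make([]uint32, 32)  // operator for an odd power of zero bits

	odd[0] = 0xEDB88320 // CRC-32 polynomial, reflected form
	row := uint32(1)
	for n := 1; n != 32; n++ {
		odd[n] = row
		row += row
	}
	gf2MatrixSquare(even, odd) // operator for 2 zero bits
	gf2MatrixSquare(odd, even) // operator for 4 zero bits

	for {
		gf2MatrixSquare(even, odd)
		if len2%2 == 1 {
			crc1 = gf2MatrixTimes(even, crc1)
		}
		len2 >>= 1
		if len2 == 0 {
			break
		}
		gf2MatrixSquare(odd, even)
		if len2%2 == 1 {
			crc1 = gf2MatrixTimes(odd, crc1)
		}
		len2 >>= 1
		if len2 == 0 {
			break
		}
	}
	return crc1 ^ crc2
}
```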

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In scenarios such as large-scale &lt;a href="https://www.ibm.com/think/topics/model-training" rel="noopener noreferrer"&gt;model training&lt;/a&gt; and &lt;a href="https://www.ibm.com/think/topics/ai-inference" rel="noopener noreferrer"&gt;inference&lt;/a&gt;, data scale is growing at an unprecedented rate. As a critical component connecting computing and storage, distributed caching distributes hot data across multiple nodes, significantly improving system access performance and scalability. This effectively reduces the pressure on backend storage, and enhances the stability and efficiency of the overall system.&lt;/p&gt;

&lt;p&gt;However, when facing continuously growing data and client access pressure, building high-performance large-scale distributed caching still faces many challenges. Therefore, &lt;a href="https://juicefs.com/docs/cloud/guide/distributed-cache/" rel="noopener noreferrer"&gt;JuiceFS’ distributed caching system&lt;/a&gt;, implemented based on Golang, has undergone in-depth optimizations around Golang’s I/O mechanisms in practice and introduced key technologies such as zero-copy. This significantly reduces CPU overhead, improves network bandwidth utilization, and makes the system more stable and efficient under high-load scenarios.&lt;/p&gt;

&lt;p&gt;We hope that some of the practical experiences in this article can provide references for developers facing similar issues.&lt;br&gt;
If you have any questions or feedback for this article, feel free to join &lt;a href="https://github.com/juicedata/juicefs/discussions" rel="noopener noreferrer"&gt;JuiceFS discussions on GitHub&lt;/a&gt; and &lt;a href="https://go.juicefs.com/slack" rel="noopener noreferrer"&gt;community on Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
