Why JuiceFS?
Tens of billions of files
As the parameter sizes of large language models and other foundation models grow, their training datasets expand significantly. JuiceFS can manage up to tens of billions of files in a single volume. This capability has been proven in multiple enterprises’ production environments, making it ideal for large-scale AI datasets.
High aggregate throughput
With flexible cache configurations, JuiceFS delivers aggregate throughput that scales with the number and capacity of cache nodes. By leveraging multi-level cache strategies, priority-based eviction policies, and capacity weighting, JuiceFS maximizes existing hardware resources and eliminates the need for additional dedicated hardware investments.
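As a rough sketch of what a cache configuration looks like with the open-source JuiceFS client, the mount command accepts `--cache-dir` and `--cache-size` options; the metadata URL, mount point, and sizes below are illustrative, not a recommended setup:

```shell
# Mount a JuiceFS volume with a local NVMe disk as the block cache.
# --cache-dir:  directory on a fast local disk used for cached blocks
# --cache-size: cache capacity in MiB (here roughly 1 TiB)
juicefs mount redis://meta-server:6379/1 /mnt/jfs \
    --cache-dir /nvme/jfs-cache \
    --cache-size 1024000
```

In the Enterprise Edition, dedicated cache groups extend the same idea across multiple nodes so clients share a distributed cache pool.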
Efficient large file writes
Checkpoint saving in large language model training involves extensive large file writing. JuiceFS uses a block storage design, combined with enhanced concurrency for object storage access and write caching, to optimize sequential write throughput for large files. This effectively reduces GPU idle time.
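A hedged sketch of the write-caching behavior described above, using the open-source client: the `--writeback` mount option commits writes to the local cache first and uploads blocks to object storage asynchronously, and `--max-uploads` controls upload concurrency (the metadata URL and values are illustrative):

```shell
# Enable client write caching so large sequential checkpoint writes
# land on local disk first and are uploaded in the background.
# --writeback:   asynchronous upload to object storage
# --max-uploads: number of concurrent upload connections
juicefs mount redis://meta-server:6379/1 /mnt/jfs \
    --writeback \
    --max-uploads 50
```

Note that writeback mode trades immediate durability in object storage for lower write latency, which is an acceptable trade-off for checkpoints that can be regenerated.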
Cloud-native design
Designed specifically for cloud environments, JuiceFS can be deployed on global public clouds and seamlessly integrates into existing cloud infrastructures. This meets diverse platform and regional requirements.
Multi-cloud file systems
When GPU resources are distributed across regions, ensuring on-demand remote data access and working around bandwidth limitations is critical. JuiceFS' mirror file system provides consistent, localized data access worldwide. Because replicating data costs less than repeatedly transferring it across regions, mirroring reduces cross-region access expenses and optimizes data distribution.
Cost-effective architecture
JuiceFS' architecture separates performance from capacity: it leverages cloud-based, highly available, elastic, reliable, and cost-effective object storage for capacity, and uses NVMe SSDs near compute nodes as a cache to ensure high-performance access. This transparent cache mechanism offers a seamless, efficient experience.
Feature Overview
MiniMax Built a Cost-Effective, High-Performance AI Platform with JuiceFS
MiniMax, a leading general AI technology company, adopts a hybrid cloud strategy to balance flexibility and cost efficiency. With GPU resources deployed across both IDC and cloud environments, JuiceFS provides a unified data access experience. MiniMax selected JuiceFS Enterprise Edition as the storage solution for its AI platform to ensure high-performance data access for various scenarios, including data cleaning, model training, and inference. [Learn more]
Zhihu Improved Checkpoint Storage Stability for LLM Training with JuiceFS in a Multi-Cloud Architecture
Zhihu is China's top Q&A platform with 100 million+ monthly active users. It distributes GPU resources across a multi-cloud environment for LLM training, a setup that requires a cross-cloud file system to reduce redundant data copies. In addition, Zhihu's cluster ran various tasks that generated 100+ GB of checkpoint data, and writing these checkpoints often caused significant system latency. To address these challenges, Zhihu adopted JuiceFS Enterprise Edition, ensuring stable storage for LLM training in its multi-cloud architecture. [Learn more]
Trusted by Innovators in Gen-AI