JuiceFS for Generative AI Storage

Why JuiceFS?

Tens of billions of files

As the parameter sizes of large language models and other foundation models grow, their training datasets expand significantly. JuiceFS can manage up to tens of billions of files in a single volume. This capability has been proven in multiple enterprises’ production environments, making it ideal for large-scale AI datasets.

High aggregate throughput

With flexible cache configurations, JuiceFS provides virtually unlimited aggregate throughput. By leveraging multi-level cache strategies, priority-based eviction policies, and capacity weighting, JuiceFS maximizes existing hardware resources and eliminates the need for additional dedicated hardware investments.
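
For illustration, here is a minimal mount sketch using the JuiceFS Community Edition client, with the local block cache pointed at an NVMe disk. The metadata URL, mount point, and cache size are placeholder assumptions, not recommendations.

    import subprocess

    # Hedged sketch: mount a JuiceFS volume and place its local cache on
    # an NVMe disk. The metadata URL, paths, and size below are assumed.
    subprocess.run(
        [
            "juicefs", "mount",
            "redis://meta-host:6379/1",           # metadata engine URL (assumed)
            "/jfs",                               # mount point (assumed)
            "--cache-dir", "/mnt/nvme/jfscache",  # keep the block cache on NVMe
            "--cache-size", "512000",             # cache capacity in MiB (assumed)
        ],
        check=True,
    )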

Efficient large file writes

Checkpoint saving in large language model training involves writing large files at high volume. JuiceFS combines a block-based storage design with highly concurrent object storage access and write caching to optimize sequential write throughput for large files, effectively reducing GPU idle time.
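
As a hedged illustration, the sketch below saves a PyTorch training checkpoint to a JuiceFS mount point, assumed here to be /jfs; the directory layout and file naming are arbitrary assumptions.

    import os
    import torch

    def save_checkpoint(model, optimizer, step, root="/jfs/checkpoints"):
        """Write a training checkpoint to a JuiceFS mount (assumed at /jfs)."""
        os.makedirs(root, exist_ok=True)
        state = {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "step": step,
        }
        # One large sequential write; the client splits it into blocks and
        # uploads them to object storage with high concurrency.
        torch.save(state, os.path.join(root, f"step_{step:07d}.pt"))

Because the mount behaves like a local directory, the training loop needs no storage-specific code.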

Cloud-native design

Designed specifically for cloud environments, JuiceFS can be deployed on global public clouds and seamlessly integrates into existing cloud infrastructures. This meets diverse platform and regional requirements.

Multi-cloud file systems

When GPU resources are distributed across regions, ensuring on-demand remote data access and addressing bandwidth limitations is critical. JuiceFS' mirror file system ensures consistent, localized data access worldwide. Because replicating data to a nearby mirror costs less than repeatedly pulling it over cross-region bandwidth, mirroring reduces cross-region access expenses and optimizes data distribution.

Cost-effective architecture

JuiceFS' architecture separates performance and capacity: it leverages cloud-based, highly available, elastic, reliable, and cost-effective object storage for capacity; it uses NVMe SSDs near compute nodes as cache to ensure high-performance access. This transparent cache mechanism offers you a seamless, efficient experience.

Feature Overview

Distributed cache

Multiple clients share the same cache data to enhance performance.

In-house metadata

JuiceFS' metadata engine is horizontally scalable. It efficiently manages storage for tens of billions of files within a single namespace.

Mirror file systems

Create one or more complete mirrors of the file system, with content kept consistent across all mirrors.

Superior performance

Check fio performance test results, including sequential and concurrent reads/writes for both large and small files.

POSIX compatibility

Use JuiceFS like a local file system; it integrates seamlessly with existing applications without disrupting their operation.
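
As a minimal sketch, the snippet below performs ordinary file operations on a JuiceFS mount, assumed here to be at /jfs; nothing in the code is JuiceFS-specific, which is exactly what POSIX compatibility means in practice.

    import os

    # Standard file APIs work unchanged on the mount point (assumed /jfs).
    os.makedirs("/jfs/datasets", exist_ok=True)

    with open("/jfs/datasets/sample.txt", "w") as f:
        f.write("hello from juicefs\n")

    with open("/jfs/datasets/sample.txt") as f:
        print(f.read())

    os.rename("/jfs/datasets/sample.txt", "/jfs/datasets/sample.bak")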

JuiceFS CSI Driver

The CSI Driver implements the interface between container orchestration systems and JuiceFS, allowing JuiceFS to provide persistent volumes for Pods in Kubernetes.
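
For illustration, the hedged sketch below requests a JuiceFS-backed PersistentVolumeClaim with the official Kubernetes Python client. The StorageClass name juicefs-sc and the requested size are assumptions that depend on how the CSI Driver was installed; in practice such claims are usually declared as YAML manifests.

    from kubernetes import client, config

    # Hypothetical sketch: claim a JuiceFS-backed volume for Pods. Assumes a
    # StorageClass named "juicefs-sc" was created with the JuiceFS CSI Driver.
    config.load_kube_config()

    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="juicefs-pvc"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteMany"],  # JuiceFS volumes can be shared
            storage_class_name="juicefs-sc",
            resources=client.V1ResourceRequirements(
                requests={"storage": "100Gi"},
            ),
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc
    )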

MiniMax Built a Cost-Effective, High-Performance AI Platform with JuiceFS

MiniMax, a leading general AI technology company, adopts a hybrid cloud strategy to balance flexibility and cost efficiency. With GPU resources deployed across both IDC and cloud environments, JuiceFS provides a unified data access experience. MiniMax selected JuiceFS Enterprise Edition as the storage solution for its AI platform to ensure high-performance data access for various scenarios, including data cleaning, model training, and inference. [Learn more]

Zhihu Improved Checkpoint Storage Stability for LLM Training with JuiceFS in the Multi-Cloud Architecture

Zhihu is China's top Q&A platform with 100 million+ monthly active users. It distributed its GPU resources across a multi-cloud environment for LLM training, a setup that required a cross-cloud file system to reduce redundant data copies. In addition, Zhihu's cluster ran a variety of tasks that generated 100+ GB of checkpoint data, and writing these checkpoints often caused significant system latency. To address these challenges, Zhihu adopted JuiceFS Enterprise Edition, ensuring stable storage for LLM training in its multi-cloud architecture. [Learn more]

Trusted by Innovators in Gen-AI


Related Resources

If you're interested, please contact us.