Data replication
JuiceFS supports cross-cloud and cross-region data replication: data is copied asynchronously to an object storage service in another region or at another cloud service provider.
For convenience, we refer to the object storage region specified when creating the file system as the "primary region". To enable data replication, you need to specify a "target region" for the file system. The target region can belong to the same cloud service provider as the primary region or to a different one.
Featured scenarios
This feature applies to the following scenarios:
- Cross-region data sharing: Read and write the same file system in two regions. Performance in the target region depends on the actual network conditions. For read-only scenarios, consider using a mirror file system for the best performance.
- Object storage service disaster recovery: If the object storage service in the primary region fails, you can manually switch to the object storage in the target region (via the --bucket option) and restore service in a short period of time.
- Seamless object storage migration: If you need to change the underlying object storage service of a file system, you can use this feature to write to two buckets at the same time, achieving a seamless migration.
For now, data replication is one-to-one; one-to-many is not supported. If you do need to replicate data into multiple regions, use the juicefs sync command instead, or create multiple mirror regions to achieve one-to-many replication.
Prerequisites
The target region must provide an object storage service, and the primary region and the target region must be able to access each other's object storage service.
How it works
Taking writes in the primary region and reads in the target region as an example, data replication works as follows.
Clients in both regions connect to the same metadata service, with both read and write access:
- For writing, data is written to the object storage of the current region first. Once the write succeeds, the data is asynchronously copied to the remote object storage.
- For reading, data is read from the object storage of the current region first. If it does not exist there (not yet synchronized), it is read from the object storage in the remote region. Performance will suffer under poor network conditions; tune the cache configuration for your use case and see if it helps.
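The write and read paths described above can be illustrated with a small toy model. This is only a sketch, not JuiceFS code: the `Region` and `ReplicatedVolume` classes are hypothetical, plain dictionaries stand in for object storage buckets, and the background copy is triggered by hand instead of running asynchronously.

```python
from collections import deque


class Region:
    """Toy model of one region: a dict stands in for its object storage bucket."""

    def __init__(self, name):
        self.name = name
        self.bucket = {}


class ReplicatedVolume:
    """Hypothetical model of a file system replicated between two regions."""

    def __init__(self, primary, target):
        self.primary = primary
        self.target = target
        self.pending = deque()  # (source, destination, key) copies not yet performed

    def _other(self, region):
        return self.target if region is self.primary else self.primary

    def write(self, region, key, data):
        # Write to the current region first, then queue the copy to the remote one.
        region.bucket[key] = data
        self.pending.append((region, self._other(region), key))

    def read(self, region, key):
        """Return (data, name of the region that served the read)."""
        if key in region.bucket:
            return region.bucket[key], region.name
        # Not yet synchronized: fall back to the remote region.
        other = self._other(region)
        return other.bucket[key], other.name

    def run_background_copy(self):
        # Stand-in for the asynchronous replication tasks.
        while self.pending:
            src, dst, key = self.pending.popleft()
            dst.bucket[key] = src.bucket[key]


primary, target = Region("primary"), Region("target")
vol = ReplicatedVolume(primary, target)
vol.write(primary, "block-1", b"hello")
print(vol.read(target, "block-1"))  # served remotely, from "primary"
vol.run_background_copy()
print(vol.read(target, "block-1"))  # now served locally, from "target"
```

In this model, a read in the target region is served from the primary region until the background copy has run, which is why target-region read performance depends on the network link between the two regions.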
Enable data replication
Open the JuiceFS Console, navigate to the volume settings page, and click "Enable replication". Select a target cloud service and region, save the settings, then re-mount the file system for the change to take effect.
With replication enabled, the mirror metadata service watches the Raft changelog for all data modifications and dispatches data synchronization tasks to the clients, which run them as background tasks: each client pulls data from the source object storage and uploads it to the target object storage. If synchronization is too slow, mount more clients to increase concurrency. Apart from this real-time, incremental synchronization, clients also periodically (weekly by default) carry out a full, bidirectional synchronization.
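To see why adding clients raises synchronization throughput, consider the following simplified sketch. It is an illustration under stated assumptions, not the actual dispatch logic: the `replicate` function is hypothetical, threads stand in for mounted clients, a shared queue stands in for the tasks dispatched by the metadata service, and dictionaries stand in for buckets.

```python
import queue
import threading


def replicate(changed_keys, src_bucket, dst_bucket, num_clients):
    """Copy every changed key from src_bucket to dst_bucket using num_clients workers."""
    tasks = queue.Queue()
    for key in changed_keys:
        tasks.put(key)

    def client_worker():
        # Each "client" keeps taking tasks until none are left.
        while True:
            try:
                key = tasks.get_nowait()
            except queue.Empty:
                return
            # Pull from the source bucket, upload to the target bucket.
            dst_bucket[key] = src_bucket[key]

    workers = [threading.Thread(target=client_worker) for _ in range(num_clients)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


source = {f"block-{i}": i for i in range(100)}
destination = {}
replicate(list(source), source, destination, num_clients=4)
print(destination == source)
```

Because every worker draws from the same queue, the pending copies are shared among however many clients are running, so more clients means more copies in flight at once.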
When mounting the file system, use --access-key2 and --secret-key2 for the target object storage. When mounting in the target region, add the --flip parameter to "flip" the relationship between the two regions, i.e. use the target region as the primary region. In this mode, the client first writes to the target region (--bucket2), then asynchronously copies data to the primary region (--bucket) in the background.
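The effect of --flip on the write order can be summarized in a few lines. The `replication_order` function and the bucket names below are hypothetical and exist only to illustrate the role swap; they are not part of the JuiceFS client.

```python
def replication_order(bucket, bucket2, flip=False):
    """Return (write_first, copy_to) for a client.

    bucket  -- bucket in the primary region (the --bucket option)
    bucket2 -- bucket in the target region (the --bucket2 option)
    flip    -- True when the client is mounted with --flip
    """
    if flip:
        # Flipped: the target region is treated as the primary region.
        return bucket2, bucket
    return bucket, bucket2


# Hypothetical bucket names, for illustration only.
print(replication_order("primary-bucket", "target-bucket"))
print(replication_order("primary-bucket", "target-bucket", flip=True))
```

A client in the primary region therefore writes to --bucket and replicates to --bucket2, while a client mounted with --flip in the target region does the opposite, so each client writes to the bucket closest to it.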
Data consistency
Because both regions use the same metadata service, they see exactly the same file system state. As for object storage, all data blocks are immutable objects, so there is no need to worry about data consistency.
Billing notes
Data replication is available to all users for free. When enabled, no additional metadata is generated, so replication has no impact on JuiceFS billing. You only need to pay attention to your object storage service provider's billing.