Data replication

JuiceFS supports cross-cloud and cross-region data replication: data is copied asynchronously to an object storage service in another region or with another cloud service provider.

For convenience, we refer to the object storage region specified when creating the file system as the "primary region". To enable data replication, you need to specify a "target region" for the file system. The target region can belong to the same cloud service provider as the primary region or to a different one.

This feature applies to the following scenarios:

  • Cross-region data sharing: Read and write the same file system in two regions. Performance in the target region depends on the actual network conditions. For read-only scenarios, consider using a mirror file system for the best performance.
  • Object storage service disaster recovery: If the object storage service in the primary region fails, you can manually switch to the object storage in the target region (via the --bucket option) and restore service in a short period of time.
  • Seamless object storage migration: If you need to change the underlying object storage service of a file system, you can use this feature to write to two buckets at the same time and migrate without interruption.

For now, data replication is one-to-one; one-to-many replication is not supported. If you need to replicate data to multiple regions, use juicefs sync instead, or create multiple mirror regions to achieve one-to-many replication.

Prerequisites

The target region must provide an object storage service, and the primary region and the target region must be able to access each other's object storage service.

How it works

Taking writes in the primary region and reads in the target region as an example, data replication works as shown below:

[Figure: replication]

Clients in both regions access the same metadata service, and both can read and write:

  • For writing, data is written to the object storage of the current region first. After the write succeeds, the data is asynchronously copied to the remote object storage.
  • For reading, data is read from the object storage of the current region first. If the data does not exist there (not yet synchronized), it is read from the object storage in the remote region. Performance suffers under poor network conditions; tune the cache configuration for your use case and see if it helps.
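The read and write preferences above can be sketched with a small in-memory model. This is only an illustration of the described behavior, not JuiceFS code: the `Region` and `ReplicatedStore` classes, the `pending` queue, and the explicit `replicate_pending()` call (which stands in for the asynchronous background copy) are all assumptions made for the example.

```python
class Region:
    """Toy stand-in for one region's object storage (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # object key -> immutable bytes


class ReplicatedStore:
    """A client's view: one local region and one remote region."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self.pending = []  # keys written locally but not yet copied

    def write(self, key, data):
        # Writes land in the current region's object storage first.
        self.local.objects[key] = data
        self.pending.append(key)

    def replicate_pending(self):
        # Stand-in for the asynchronous copy to the remote region.
        copied = 0
        while self.pending:
            key = self.pending.pop(0)
            self.remote.objects[key] = self.local.objects[key]
            copied += 1
        return copied

    def read(self, key):
        # Prefer the current region; fall back to the remote region
        # when the object has not been synchronized yet.
        if key in self.local.objects:
            return self.local.objects[key], self.local.name
        if key in self.remote.objects:
            return self.remote.objects[key], self.remote.name
        raise KeyError(key)
```

In this model, a reader in the target region is served from the primary region until `replicate_pending()` has run on the writer, after which the same read is served locally.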

Enable data replication

Open the JuiceFS Console, navigate to the volume settings page, and click "Enable replication". Select a target cloud service and region, save the settings, then re-mount the file system for the change to take effect.

With replication enabled, the mirror metadata service watches for all data modifications in the Raft changelog and dispatches data synchronization tasks to the clients, where they run as background tasks. Each client pulls data from the source object storage and uploads it to the target object storage. If synchronization is too slow, mount more clients to increase concurrency. Apart from this real-time, incremental synchronization, clients periodically (weekly by default) carry out a full, bidirectional synchronization.
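The two synchronization modes can be modeled roughly as follows. This is a sketch under stated assumptions: the source does not describe how tasks are scheduled, so the round-robin assignment in `dispatch` is a guess, and the function names and the use of plain dicts as object stores are hypothetical.

```python
from itertools import cycle


def dispatch(changelog, clients):
    """Assign each changed object key to a client (round-robin is an assumption)."""
    tasks = {client: [] for client in clients}
    for key, client in zip(changelog, cycle(clients)):
        tasks[client].append(key)
    return tasks


def run_tasks(keys, source, target):
    """Copy the given objects from source to target when target lacks them."""
    copied = 0
    for key in keys:
        if key in source and key not in target:
            target[key] = source[key]
            copied += 1
    return copied


def full_bidirectional_sync(a, b):
    """Periodic full pass: copy whatever is missing on either side."""
    to_b = run_tasks(list(a), a, b)
    to_a = run_tasks(list(b), b, a)
    return to_b, to_a
```

Because each client only handles its own share of keys, adding clients shortens every task list, which matches the advice to mount more clients when synchronization falls behind.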

When mounting the file system, pass the credentials for the target object storage with --access-key2 and --secret-key2. When mounting in the target region, also add the --flip parameter to "flip" the relationship between the two regions, that is, to use the target region as the primary region. In this mode, the client first writes to the target region (--bucket2), then asynchronously copies the data to the primary region (--bucket) in the background:

[Figure: replication-flip]
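The effect of --flip can be summarized in a few lines. The sketch below is illustrative only: `replication_roles` and `mount_args` are hypothetical helpers, the bucket names, volume name, and mount point are placeholders, and the exact argument order of the mount command is an assumption. Only the flag names come from the text above.

```python
def replication_roles(bucket, bucket2, flip=False):
    """Return (written first, copied to later) for a client.

    Without --flip the client writes to --bucket and replicates to
    --bucket2; with --flip the two roles are swapped.
    """
    if flip:
        return bucket2, bucket
    return bucket, bucket2


def mount_args(volume, mount_point, access_key2, secret_key2, flip=False):
    """Build an argument list for mounting with replication credentials.

    The positional layout (volume, then mount point) is an assumption.
    """
    args = [
        "juicefs", "mount", volume, mount_point,
        "--access-key2", access_key2,
        "--secret-key2", secret_key2,
    ]
    if flip:
        args.append("--flip")
    return args
```

For example, `replication_roles("primary-bucket", "target-bucket", flip=True)` returns `("target-bucket", "primary-bucket")`, which is the order a client mounted in the target region would use.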

Data consistency

Because both regions use the same metadata service, they see exactly the same file system state. As for object storage, all data blocks are immutable objects, so a block that exists in both regions is identical in both, and there is no need to worry about data consistency.

Billing notes

Data replication is available to all users for free. When enabled, no additional metadata is generated, so replication has no impact on JuiceFS billing. You only need to pay attention to your object storage service provider's billing.