Configuration
The Hadoop SDK is one of many ways to use JuiceFS, so most configuration items have the same meaning as in the JuiceFS Client. See the Command Reference to learn more.
Core configurations
Item | Default Value | Description |
---|---|---|
fs.jfs.impl | com.juicefs.JuiceFileSystem | Specify the implementation of the storage type jfs:// . |
fs.AbstractFileSystem.jfs.impl | com.juicefs.JuiceFS | Specify the implementation of AbstractFileSystem for MapReduce. |
juicefs.token | | Credential for the JuiceFS volume; obtain it from the settings page of the JuiceFS web console. |
juicefs.bucket | | Optional. Name or endpoint of the bucket, overriding the value configured in the JuiceFS web console. |
juicefs.accesskey | | Access Key for the object store (omit if the client node can access object storage without credentials). |
juicefs.secretkey | | Secret Key for the object store (omit if the client node can access object storage without credentials). |
juicefs.console-url | | JuiceFS Web Console URL (e.g. http://x.x.x.x:8080 ), for on-prem environments only. |
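
As a minimal sketch, the core items above might be set in core-site.xml as follows. The class names are the defaults listed in the table; the token value is a placeholder, not a real credential.

```xml
<!-- core-site.xml fragment: illustrative sketch only -->
<property>
  <name>fs.jfs.impl</name>
  <value>com.juicefs.JuiceFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.jfs.impl</name>
  <value>com.juicefs.JuiceFS</value>
</property>
<property>
  <name>juicefs.token</name>
  <!-- placeholder: copy the real token from the JuiceFS web console -->
  <value>YOUR_VOLUME_TOKEN</value>
</property>
```

The access key and secret key are left out here on the assumption that the client node can reach object storage without credentials; add juicefs.accesskey and juicefs.secretkey otherwise.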
Data replication configurations
Read Data replication to learn more.
Item | Default Value | Description |
---|---|---|
juicefs.bucket2 | | Optional. Name or endpoint of the secondary bucket for Data replication, overriding the value configured in the JuiceFS web console. |
juicefs.accesskey2 | | Access Key for the secondary object store (omit if the client node can access object storage without credentials). |
juicefs.secretkey2 | | Secret Key for the secondary object store (omit if the client node can access object storage without credentials). |
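
A sketch of how the replication items could look in core-site.xml, assuming the secondary bucket requires its own credentials. All three values are placeholders.

```xml
<!-- core-site.xml fragment: illustrative sketch, all values are placeholders -->
<property>
  <name>juicefs.bucket2</name>
  <value>YOUR_SECONDARY_BUCKET</value>
</property>
<property>
  <name>juicefs.accesskey2</name>
  <value>YOUR_SECONDARY_ACCESS_KEY</value>
</property>
<property>
  <name>juicefs.secretkey2</name>
  <value>YOUR_SECONDARY_SECRET_KEY</value>
</property>
```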
Cache configurations
Read Cache to learn more.
Item | Default Value | Description |
---|---|---|
juicefs.cache-dir | memory | Local cache directory; defaults to process memory. Multiple directories can be specified, separated by : , or matched with the wildcard * . When using local directories, create them in advance with 0777 permission so that components can share cache data. Same meaning as --cache-dir . |
juicefs.cache-size | 100 | Cache capacity in MiB. The default is small because the Hadoop SDK uses memory as the default cache location. Same meaning as --cache-size . |
juicefs.cache-replica | 1 | Number of nodes that a block can be scheduled on. Hadoop applications support data locality scheduling by checking the BlockLocation attribute of data blocks, so a higher replica count allows blocks to be placed on more nodes, increasing compute task concurrency. Block size is controlled by juicefs.block.size . |
juicefs.cache-group | | Cache group name for distributed cache. Nodes within the same group share cache. Disabled by default. Recommended for applications like Spark where perfect data locality isn't available. |
juicefs.no-sharing | false | When inside a cache group, only fetch cache data from other nodes and never share the node's own cache. Use this option on ephemeral mount points (such as Kubernetes Pods). |
juicefs.cache-full-block | true | Cache full-sized data blocks. Disable this when you frequently access the same set of small files, or when disk throughput is lower than object storage throughput. This option is the opposite of --cache-partial-only . |
juicefs.memory-size | 300 | Maximum memory for the read/write buffer in MiB, same meaning as --buffer-size . |
juicefs.auto-create-cache-dir | true | Whether to create cache directories automatically. When set to false, non-existent cache directories will be ignored, effectively disabling cache. |
juicefs.free-space | 0.2 | Minimum free space ratio. When free disk space falls below this ratio (20% by default), cache is cleared to free disk space. Same meaning as --free-space-ratio . |
juicefs.metacache | true | Enable metadata cache. |
juicefs.discover-nodes-url | | Specify the node discovery API; the node list is refreshed every 10 minutes. The node list also serves as a whitelist for the cache group: only nodes in this list can join. Use this to prevent clients outside the computing cluster from joining the cache group and degrading distributed cache performance (see cache group troubleshooting for more). |
juicefs.hflush-delay | 0 | Delay hflush operations (in ms) so that data writes are consolidated, resulting in fewer object storage PUT requests and higher overall throughput. Typically used to improve HBase WAL performance. |
juicefs.write-group-cache | false | Build distributed cache for newly written blocks. Same meaning as --fill-group-cache . |
juicefs.cache-priority | 0 | Added in v5.0.14. The priority of cached blocks. Available values are 0, 1, 2, and 3; the larger the number, the higher the priority. When cache is evicted, data with lower priority is evicted first. |
juicefs.entry-cache | 0.0 | File entry cache timeout in seconds. |
juicefs.dir-entry-cache | 0.0 | Directory entry cache timeout in seconds. |
juicefs.attr-cache | 0.0 | File attribute cache timeout in seconds. |
juicefs.block.size | dfs.blocksize or 128MB | Logical block size for Hadoop SDK, controls task data sizes for applications like Spark. |
juicefs.cache-group-size | 4 * juicefs.block.size | The JuiceFS Client performs readahead and prefetch, so for files smaller than this size, the client tries to schedule all of a file's data blocks onto a single node to maximize cache utilization. |
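
The following sketch shows one way the cache items might be combined for a cluster with local disks and a distributed cache group. The directory paths, the cache size, and the group name `spark-cluster` are hypothetical values chosen for illustration.

```xml
<!-- core-site.xml fragment: illustrative sketch, paths and names are hypothetical -->
<property>
  <name>juicefs.cache-dir</name>
  <!-- two local directories separated by ":"; create them in advance with 0777 -->
  <value>/data1/jfscache:/data2/jfscache</value>
</property>
<property>
  <name>juicefs.cache-size</name>
  <!-- in MiB: 102400 MiB = 100 GiB -->
  <value>102400</value>
</property>
<property>
  <name>juicefs.cache-group</name>
  <!-- nodes using the same group name share cache -->
  <value>spark-cluster</value>
</property>
```

Raising juicefs.cache-size well above the 100 MiB default only makes sense here because the cache is moved from process memory to disk.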
Object storage configurations
Item | Default Value | Description |
---|---|---|
juicefs.bucket | | Specify the bucket name of the object store. |
juicefs.prefetch | 1 | Prefetch N blocks in parallel, same meaning as --prefetch . |
juicefs.max-uploads | 50 | Maximum number of concurrent object uploads. |
juicefs.upload-limit | 0 | Upload speed limit for a single process, in bytes/s. |
juicefs.max-downloads | 50 | Maximum number of concurrent object downloads. |
juicefs.download-limit | 0 | Download speed limit for a single process, in bytes/s. |
juicefs.get-timeout | 5 | Maximum number of seconds to download an object. |
juicefs.put-timeout | 60 | Maximum number of seconds to upload an object. |
juicefs.max-readahead | 0 | Maximum memory in MiB for readahead (see the relevant sections in Cache to learn about readahead). The default, 0 , means the actual maximum readahead is 20% of juicefs.memory-size . Set this to a small integer (like 1 ) to reduce read amplification. |
juicefs.external | false | Use an external domain to access the object store. |
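
As an illustration of the object storage items, the sketch below reduces read amplification and caps upload bandwidth. The specific numbers are example values, not recommendations.

```xml
<!-- core-site.xml fragment: illustrative sketch, values are examples only -->
<property>
  <name>juicefs.max-readahead</name>
  <!-- 1 MiB of readahead instead of the default 20% of juicefs.memory-size -->
  <value>1</value>
</property>
<property>
  <name>juicefs.upload-limit</name>
  <!-- bytes/s per process: 10485760 = 10 MiB/s -->
  <value>10485760</value>
</property>
<property>
  <name>juicefs.put-timeout</name>
  <!-- seconds; raised from the default 60 on the assumption that capped uploads take longer -->
  <value>120</value>
</property>
```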
Security configurations
Item | Default Value | Description |
---|---|---|
juicefs.server-principal | | After enabling Kerberos, specify the principal of the JuiceFS metadata service; refer to "Using Kerberos". |
Other configurations
Item | Default Value | Description |
---|---|---|
juicefs.access-log | | File path for the file system access log (e.g. /tmp/juicefs.access.log ). All Hadoop components that use JuiceFS need read and write permission on it. The log file rotates at 300 MiB, and the last 7 files are retained. |
juicefs.debug | false | Enable DEBUG level log. |
juicefs.superuser | hdfs | Specify the superuser name, to tell JuiceFS Hadoop SDK which user is superuser. |
juicefs.supergroup | hdfs | Specify the supergroup name; all users within this group are considered superusers. |
juicefs.rsaPrivKeyPath | | File path of the RSA private key for data encryption. |
juicefs.rsaPassphrase | | Passphrase of the RSA key for data encryption. |
juicefs.file.checksum | false | Enable checksum for copying data via Hadoop DistCp |
juicefs.grouping | | Specify the location of the group file that configures user groups and user mapping information, e.g. jfs://myjfs/etc/group . The file format is: <groupname>:<username1>,<username2> |
juicefs.conf-dir | | Specify the directory for file system config; you can find it on the mount machine under /root/.juicefs . Name format {VOLUME}.conf . |
juicefs.umask | umask of the client node (usually 022 on Linux) | Specify the umask of the file system, which adjusts the default permissions of newly created directories and files. For example, with juicefs.umask set to 000 , the default permission is 777 for new directories and 666 for new files. |
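
A short sketch of some of the items above in core-site.xml. The group file path and log path are the examples given in the table; the group and user names in the comment are hypothetical and only illustrate the file format.

```xml
<!-- core-site.xml fragment: illustrative sketch -->
<property>
  <name>juicefs.access-log</name>
  <value>/tmp/juicefs.access.log</value>
</property>
<property>
  <name>juicefs.umask</name>
  <!-- 000: new directories default to 777, new files to 666 -->
  <value>000</value>
</property>
<property>
  <name>juicefs.grouping</name>
  <!--
    The referenced file holds one group per line in the form
    <groupname>:<username1>,<username2>, for example (hypothetical names):
      admins:alice,bob
      analysts:carol
  -->
  <value>jfs://myjfs/etc/group</value>
</property>
```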
Configure multiple JuiceFS file systems
When using multiple JuiceFS volumes, any of the items above can be set for a single file system by placing the file system name ( VOL_NAME ) in the middle of the configuration item name, for example:
core-site.xml
<property>
<name>juicefs.{VOL_NAME}.debug</name>
<value>false</value>
</property>
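
To make the naming pattern concrete, here is a sketch with two volumes that use separate tokens and cache groups. The volume names `vol-a` and `vol-b`, the cache group names, and the token placeholders are all hypothetical.

```xml
<!-- core-site.xml fragment: illustrative sketch, volume names are hypothetical -->
<property>
  <name>juicefs.vol-a.token</name>
  <value>TOKEN_FOR_VOL_A</value>
</property>
<property>
  <name>juicefs.vol-a.cache-group</name>
  <value>group-a</value>
</property>
<property>
  <name>juicefs.vol-b.token</name>
  <value>TOKEN_FOR_VOL_B</value>
</property>
<property>
  <name>juicefs.vol-b.cache-group</name>
  <value>group-b</value>
</property>
```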