ZIL (ZFS intent log) zil.c
背景
ZIL或称SLOG, 被用于提升ZFS系统的离散fsync性能.
类似数据库的redo log或wal.
注意
1. 每个dataset对应一个zil, 也就是说一个zpool有多个zfs的话, 如果有log设备, 那么在log设备中实际上包含了多个ZIL entry.
数据写入ZIL后(fsync), 即使服务器异常, 也可以用于恢复文件系统.
2. 并不是每一笔FSYNC都会用到ZIL, 只有小于2*zil_slog_limit 的commit操作才会用到. 如果你的zil 块设备够强的话, 可以调大伙调到UINT64_MAX, 那么就不检测了, 所有的commit都用上zil设备. 一般不需要调整
/*
* The zfs intent log (ZIL) saves transaction records of system calls
* that change the file system in memory with enough information
* to be able to replay them. These are stored in memory until
* either the DMU transaction group (txg) commits them to the stable pool
* and they can be discarded, or they are flushed to the stable log
* (also in the pool) due to a fsync, O_DSYNC or other synchronous
* requirement. In the event of a panic or power fail then those log
* records (transactions) are replayed.
*
* There is one ZIL per file system. Its on-disk (pool) format consists
* of 3 parts:
*
* - ZIL header
* - ZIL blocks
* - ZIL records
*
* A log record holds a system call transaction. Log blocks can
* hold many log records and the blocks are chained together.
* Each ZIL block contains a block pointer (blkptr_t) to the next
* ZIL block in the chain. The ZIL header points to the first
* block in the chain. Note there is not a fixed place in the pool
* to hold blocks. They are dynamically allocated and freed as
* needed from the blocks available. Figure X shows the ZIL structure:
*/
可调参数,
/*
* This global ZIL switch affects all pools
*/
int zil_replay_disable = 0; /* disable intent logging replay */
/*
* Tunable parameter for debugging or performance analysis. Setting
* zfs_nocacheflush will cause corruption on power loss if a volatile
* out-of-order write cache is enabled.
*/
int zfs_nocacheflush = 0;
/*
* Define a limited set of intent log block sizes.
* These must be a multiple of 4KB. Note only the amount used (again
* aligned to 4KB) actually gets written. However, we can't always just
* allocate SPA_MAXBLOCKSIZE as the slog space could be exhausted.
*/
uint64_t zil_block_buckets[] = {
4096, /* non TX_WRITE */
8192+4096, /* data base */
32*1024 + 4096, /* NFS writes */
UINT64_MAX
};
/*
* Use the slog as long as the current commit size is less than the
* limit or the total list size is less than 2X the limit. Limit
* checking is disabled by setting zil_slog_limit to UINT64_MAX.
*/
unsigned long zil_slog_limit = 1024 * 1024;
#define USE_SLOG(zilog) (((zilog)->zl_cur_used < zil_slog_limit) || \
((zilog)->zl_itx_list_sz < (zil_slog_limit << 1)))
#if defined(_KERNEL) && defined(HAVE_SPL)
module_param(zil_replay_disable, int, 0644);
MODULE_PARM_DESC(zil_replay_disable, "Disable intent logging replay");
module_param(zfs_nocacheflush, int, 0644);
MODULE_PARM_DESC(zfs_nocacheflush, "Disable cache flushes");
module_param(zil_slog_limit, ulong, 0644);
MODULE_PARM_DESC(zil_slog_limit, "Max commit bytes to separate log device");
#endif