ZIL (ZFS intent log) zil.c

2 minute read

背景

ZIL或称SLOG, 被用于提升ZFS系统的离散fsync性能.

类似数据库的redo log或wal.

注意

1. 每个dataset对应一个zil, 也就是说一个zpool有多个zfs的话, 如果有log设备, 那么在log设备中实际上包含了多个ZIL entry.

数据写入ZIL后(fsync), 即使服务器异常, 也可以用于恢复文件系统.

2. 并不是每一笔FSYNC都会用到ZIL, 只有小于2*zil_slog_limit 的commit操作才会用到. 如果你的zil 块设备够强的话, 可以调大伙调到UINT64_MAX, 那么就不检测了, 所有的commit都用上zil设备. 一般不需要调整

/*  
 * The zfs intent log (ZIL) saves transaction records of system calls  
 * that change the file system in memory with enough information  
 * to be able to replay them. These are stored in memory until  
 * either the DMU transaction group (txg) commits them to the stable pool  
 * and they can be discarded, or they are flushed to the stable log  
 * (also in the pool) due to a fsync, O_DSYNC or other synchronous  
 * requirement. In the event of a panic or power fail then those log  
 * records (transactions) are replayed.  
 *  
 * There is one ZIL per file system. Its on-disk (pool) format consists  
 * of 3 parts:  
 *  
 *      - ZIL header  
 *      - ZIL blocks  
 *      - ZIL records  
 *  
 * A log record holds a system call transaction. Log blocks can  
 * hold many log records and the blocks are chained together.  
 * Each ZIL block contains a block pointer (blkptr_t) to the next  
 * ZIL block in the chain. The ZIL header points to the first  
 * block in the chain. Note there is not a fixed place in the pool  
 * to hold blocks. They are dynamically allocated and freed as  
 * needed from the blocks available. Figure X shows the ZIL structure:  
 */  

可调参数,

/*  
 * This global ZIL switch affects all pools  
 */  
int zil_replay_disable = 0;    /* disable intent logging replay */  
  
/*  
 * Tunable parameter for debugging or performance analysis.  Setting  
 * zfs_nocacheflush will cause corruption on power loss if a volatile  
 * out-of-order write cache is enabled.  
 */  
int zfs_nocacheflush = 0;  
  
  
/*  
 * Define a limited set of intent log block sizes.  
 * These must be a multiple of 4KB. Note only the amount used (again  
 * aligned to 4KB) actually gets written. However, we can't always just  
 * allocate SPA_MAXBLOCKSIZE as the slog space could be exhausted.  
 */  
uint64_t zil_block_buckets[] = {  
    4096,               /* non TX_WRITE */  
    8192+4096,          /* data base */  
    32*1024 + 4096,     /* NFS writes */  
    UINT64_MAX  
};  
  
/*  
 * Use the slog as long as the current commit size is less than the  
 * limit or the total list size is less than 2X the limit.  Limit  
 * checking is disabled by setting zil_slog_limit to UINT64_MAX.  
 */  
unsigned long zil_slog_limit = 1024 * 1024;  
#define USE_SLOG(zilog) (((zilog)->zl_cur_used < zil_slog_limit) || \  
        ((zilog)->zl_itx_list_sz < (zil_slog_limit << 1)))  
  
  
#if defined(_KERNEL) && defined(HAVE_SPL)  
module_param(zil_replay_disable, int, 0644);  
MODULE_PARM_DESC(zil_replay_disable, "Disable intent logging replay");  
  
module_param(zfs_nocacheflush, int, 0644);  
MODULE_PARM_DESC(zfs_nocacheflush, "Disable cache flushes");  
  
module_param(zil_slog_limit, ulong, 0644);  
MODULE_PARM_DESC(zil_slog_limit, "Max commit bytes to separate log device");  
#endif  

Flag Counter

digoal’s 大量PostgreSQL文章入口