zfs 快照增量大小 vs PostgreSQL产生的XLOG大小
背景
zfs快照增量 和 oracle的rman incremental backup极其类似。是其他不具备oracle rman 数据库级增量备份的数据库产品的福音,例如PostgreSQL(注意,使用pg_rman , pg_probackup都可以支持块级增量了)。
下面我们来测试一下zfs快照增量的空间占用情况?
理论上,快照增量的极限是当前zfs文件系统的大小,也就是在打完快照后,快照对应的ZFS上的每个块都被改变了。
所以文件系统越大,同时更新面越广,快照就可能越大。
这种情况什么时候会发生呢?
比如插入数据库的每条记录,将来都会被变更一次,这样的应用场景,快照是会很大的。
测试CASE, TPC-B。
环境
PostgreSQL 9.5rc1
$PGDATA目录
zp1/data01 4.1T 1.1T 3.1T 25% /data01
pg_xlog目录
挂载在zfs以外的某个目录。
参数
关闭full page write(因为$PGDATA所在的zfs是cow的,不需要开启FPW。)
数据量, 70亿,包括索引超过1TB。
pgbench -i -s 70000
这样的测试数据量,覆盖的范围是1TB,而且测试包含了4条更新,1条插入,更新范围是1TB的范围,所以单个快照最大的新增的空间是可能达到1TB的大小的。
测试1
测试开始前,创建检查点,记录XLOG位置。
postgres=# checkpoint;
CHECKPOINT
postgres=# select pg_current_xlog_insert_location();
pg_current_xlog_insert_location
---------------------------------
592/9D563C70
(1 row)
创建快照
#zfs snapshot zp1/data01@2016010401
压测600秒
pgbench -M prepared -n -r -P 1 -c 48 -j 48 -T 600
transaction type: TPC-B (sort of)
scaling factor: 70000
query mode: prepared
number of clients: 48
number of threads: 48
duration: 600 s
number of transactions actually processed: 2866811
latency average: 10.044 ms
latency stddev: 16.135 ms
tps = 4777.542920 (including connections establishing)
tps = 4777.781687 (excluding connections establishing)
statement latencies in milliseconds:
0.004060 \set nbranches 1 * :scale
0.001105 \set ntellers 10 * :scale
0.000861 \set naccounts 100000 * :scale
0.001687 \setrandom aid 1 :naccounts
0.001017 \setrandom bid 1 :nbranches
0.000961 \setrandom tid 1 :ntellers
0.001000 \setrandom delta -5000 5000
0.052541 BEGIN;
9.318532 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.138619 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.169681 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.147535 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.115753 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.079338 END;
创建检查点,记录XLOG位置
postgres=# checkpoint;
CHECKPOINT
postgres=# select pg_current_xlog_insert_location();
pg_current_xlog_insert_location
---------------------------------
592/EC6AEB38
(1 row)
计算XLOG产生量
postgres=# select pg_size_pretty(pg_xlog_location_diff('592/EC6AEB38', '592/9D563C70'));
pg_size_pretty
----------------
1265 MB
(1 row)
创建快照
#zfs snapshot zp1/data01@2016010402
计算快照增量,约120GB,比XLOG大很多。后面会分析原因。
#zfs send -n -P -v -i zp1/data01@2016010401 zp1/data01@2016010402
incremental 2016010401 zp1/data01@2016010402 124825285344
size 124825285344
测试2
记录XLOG位置
postgres=# select pg_current_xlog_insert_location();
pg_current_xlog_insert_location
---------------------------------
592/ED000098
(1 row)
压测600秒
pgbench -M prepared -n -r -P 1 -c 48 -j 48 -T 600
transaction type: TPC-B (sort of)
scaling factor: 70000
query mode: prepared
number of clients: 48
number of threads: 48
duration: 600 s
number of transactions actually processed: 1837930
latency average: 15.667 ms
latency stddev: 34.785 ms
tps = 3062.922412 (including connections establishing)
tps = 3063.082480 (excluding connections establishing)
statement latencies in milliseconds:
0.004004 \set nbranches 1 * :scale
0.001107 \set ntellers 10 * :scale
0.000849 \set naccounts 100000 * :scale
0.001673 \setrandom aid 1 :naccounts
0.000973 \setrandom bid 1 :nbranches
0.000928 \setrandom tid 1 :ntellers
0.000969 \setrandom delta -5000 5000
0.052813 BEGIN;
14.959468 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.131955 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.156640 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.140038 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.125661 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.079363 END;
创建检查点,记录XLOG位置
postgres=# checkpoint;
CHECKPOINT
postgres=# select pg_current_xlog_insert_location();
pg_current_xlog_insert_location
---------------------------------
593/1FAEC728
(1 row)
产生XLOG量
postgres=# select pg_size_pretty(pg_xlog_location_diff('593/1FAEC728', '592/ED000098'));
pg_size_pretty
----------------
811 MB
(1 row)
创建快照
#zfs snapshot zp1/data01@2016010403
计算快照增量,约80GB
#zfs send -n -P -v -i zp1/data01@2016010402 zp1/data01@2016010403
incremental 2016010402 zp1/data01@2016010403 83633656888
size 83633656888
递归增量,总共约200GB
#zfs send -n -P -v -I zp1/data01@2016010401 zp1/data01@2016010403
incremental 2016010401 zp1/data01@2016010402 124825285344
incremental 2016010402 zp1/data01@2016010403 83633656888
size 208458942232
每个快照占用的空间
#zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
zp1/data01@2016010401 121G - 1.01T -
zp1/data01@2016010402 20.9G - 1.01T -
zp1/data01@2016010403 200K - 1.01T -
#df -h
Filesystem Size Used Avail Use% Mounted on
zp1/data01 4.9T 1.1T 3.9T 21% /data01
删掉这三个快照,能回收140G空间.
#zfs destroy zp1/data01@2016010401
#zfs destroy zp1/data01@2016010402
#zfs destroy zp1/data01@2016010403
#df -h
Filesystem Size Used Avail Use% Mounted on
zp1/data01 5.1T 1.1T 4.1T 20% /data01
由于ZFS的快照是COW的,我们前面的测试涉及的块变更范围是1TB,两次600秒的压测,数据块变更的范围是80GB和120GB。
如果将活跃数据降低,快照也会变小,如下:
把活跃数据降低到1GB,再次测试。
测试3
pgbench -i -s 100 digoal
#zfs snapshot zp1/data01@2016010403
这次测试,TPS达到了4.57w/s,但是影响的块在1GB的范围内快照却小了很多。
pgbench -M prepared -n -r -P 1 -c 48 -j 48 -T 600 digoal
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: prepared
number of clients: 48
number of threads: 48
duration: 600 s
number of transactions actually processed: 27425009
latency average: 1.048 ms
latency stddev: 0.530 ms
tps = 45704.065177 (including connections establishing)
tps = 45706.628863 (excluding connections establishing)
statement latencies in milliseconds:
0.003538 \set nbranches 1 * :scale
0.001089 \set ntellers 10 * :scale
0.000896 \set naccounts 100000 * :scale
0.001683 \setrandom aid 1 :naccounts
0.001225 \setrandom bid 1 :nbranches
0.001526 \setrandom tid 1 :ntellers
0.002240 \setrandom delta -5000 5000
0.084777 BEGIN;
0.217070 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.118606 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.154885 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.204343 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.117456 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.129509 END;
#zfs snapshot zp1/data01@2016010404
#zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
zp1/data01@2016010401 131G - 1.01T -
zp1/data01@2016010402 352K - 1.01T -
zp1/data01@2016010403 1.47G - 1.01T -
zp1/data01@2016010404 15.6M - 1.01T -
快照增量只有3.1GB。
#zfs send -n -P -v -i zp1/data01@2016010403 zp1/data01@2016010404
incremental 2016010403 zp1/data01@2016010404 3245256520
size 3245256520
postgres=# \l+
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------+----------+----------+---------+-------+-----------------------+---------+------------+--------------------------------------------
digoal | postgres | UTF8 | C | C | | 2971 MB | pg_default |
测试4
将full page write打开,再对比xlog和zfs快照增量的大小。
#zfs snapshot zp1/data01@2016010405
postgres=# select pg_current_xlog_insert_location();
pg_current_xlog_insert_location
---------------------------------
596/830000D0
(1 row)
pgbench -M prepared -n -r -P 1 -c 48 -j 48 -T 600
transaction type: TPC-B (sort of)
scaling factor: 70000
query mode: prepared
number of clients: 48
number of threads: 48
duration: 600 s
number of transactions actually processed: 1772299
latency average: 16.247 ms
latency stddev: 31.695 ms
tps = 2953.504271 (including connections establishing)
tps = 2953.662892 (excluding connections establishing)
statement latencies in milliseconds:
0.004457 \set nbranches 1 * :scale
0.001219 \set ntellers 10 * :scale
0.000958 \set naccounts 100000 * :scale
0.001777 \setrandom aid 1 :naccounts
0.001084 \setrandom bid 1 :nbranches
0.001013 \setrandom tid 1 :ntellers
0.001071 \setrandom delta -5000 5000
0.062108 BEGIN;
14.041890 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.153864 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.870157 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.478328 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.362079 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.255290 END;
postgres=# checkpoint;
CHECKPOINT
postgres=# select pg_current_xlog_insert_location();
pg_current_xlog_insert_location
---------------------------------
5A7/3DF47450
(1 row)
postgres=# select pg_size_pretty(pg_xlog_location_diff('5A7/3DF47450', '596/830000D0'));
pg_size_pretty
----------------
67 GB
(1 row)
#zfs snapshot zp1/data01@2016010406
#zfs send -n -P -v -i zp1/data01@2016010405 zp1/data01@2016010406
incremental 2016010405 zp1/data01@2016010406 80259589256
size 80259589256
postgres=# select pg_size_pretty(80259589256);
pg_size_pretty
----------------
75 GB
(1 row)
#zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
zp1/data01@2016010401 131G - 1.01T -
zp1/data01@2016010402 352K - 1.01T -
zp1/data01@2016010403 1.47G - 1.01T -
zp1/data01@2016010404 2.16G - 1.01T -
zp1/data01@2016010405 11.0G - 1.01T -
zp1/data01@2016010406 424K - 1.01T -
#zfs destroy zp1/data01@2016010401
#zfs destroy zp1/data01@2016010402
#zfs destroy zp1/data01@2016010403
#zfs destroy zp1/data01@2016010404
#zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
zp1/data01@2016010405 80.0G - 1.01T -
zp1/data01@2016010406 424K - 1.01T -
测试5
开启full page write,并且活跃数据缩小到1GB测试:
pgbench -M prepared -n -r -P 1 -c 48 -j 48 -T 600 digoal
postgres=# select pg_size_pretty(pg_xlog_location_diff('5AA/7ED591C0', '5A7/3DF47450'));
pg_size_pretty
----------------
13 GB
(1 row)
#zfs snapshot zp1/data01@2016010407
#zfs send -n -P -v -i zp1/data01@2016010406 zp1/data01@2016010407
incremental 2016010406 zp1/data01@2016010407 3605632832
size 3605632832
postgres=# select pg_size_pretty(3605632832);
pg_size_pretty
----------------
3439 MB
(1 row)
#zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
zp1/data01@2016010405 80.0G - 1.01T -
zp1/data01@2016010406 840K - 1.01T -
zp1/data01@2016010407 248K - 1.01T -
小结
1. zfs快照是块级别的,所以一定比xlog大,并且本例xlog关闭了fpw,所以进一步缩小了XLOG的产生量。
(当开启XLOG的full page write时,XLOG的量和ZFS快照增量就非常接近了,对于小数据库,快照比XLOG小很多)
2. zfs快照占用的空间,和数据块的变更有个,当数据块发生任意修改时,这个数据块就会占用快照空间。
对于PG来说,数据块的变动是非常多的,例如:
tuple hint bit 可能会在查询时更新,
vacuum 回收垃圾时,
vacuum freeze时,
update 数据时,
索引变更时。
以上操作都会引起对应数据块的更新,从而导致快照变大。
注意每个块不管更新多少次,在一个快照中只占用一个块的空间。
新增的块不会占用快照的空间,只有老的块发生变更时才占用快照空间。
3. 如果使用zfs的快照作为PostgreSQL的备份,需要注意什么?
监控快照的空间占用情况。
及时删除不需要的老的快照释放空间。
控制创建快照的频率。
关闭数据库的FULL PAGE WRITE,减少产生的日志量。