ZFS fsync IOPS performance in FreeBSD
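Background
In my previous article I discussed ZFS performance tuning.
That article noted that on Linux (CentOS 6.5 x64) ZFS performed poorly on fsync calls, far behind ext4. So I installed FreeBSD 10 x64 on the same host and tested fsync performance on the same hardware.
For installing PostgreSQL, see: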
http://blog.163.com/digoal@126/blog/static/163877040201451181344545/
http://blog.163.com/digoal@126/blog/static/163877040201451282121414/
First, list the block devices. This test uses twelve 4 TB SATA disks.
# gpart list -a
Geom name: mfid[1-12]
Create the zpool:
# zpool create zp1 mfid1 mfid2 mfid3 mfid4 mfid5 mfid6 mfid7 mfid8 mfid9 mfid10 mfid11 mfid12
# zpool get all zp1
NAME PROPERTY VALUE SOURCE
zp1 size 43.5T -
zp1 capacity 0% -
zp1 altroot - default
zp1 health ONLINE -
zp1 guid 8490038421326880416 default
zp1 version - default
zp1 bootfs - default
zp1 delegation on default
zp1 autoreplace off default
zp1 cachefile - default
zp1 failmode wait default
zp1 listsnapshots off default
zp1 autoexpand off default
zp1 dedupditto 0 default
zp1 dedupratio 1.00x -
zp1 free 43.5T -
zp1 allocated 285K -
zp1 readonly off -
zp1 comment - default
zp1 expandsize 0 -
zp1 freeing 0 default
zp1 feature@async_destroy enabled local
zp1 feature@empty_bpobj active local
zp1 feature@lz4_compress enabled local
zp1 feature@multi_vdev_crash_dump enabled local
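The reported pool size follows from the layout: `zpool create` with a plain list of devices (no `mirror` or `raidz` keyword) builds a stripe without redundancy, so the per-disk capacities simply add up. A quick arithmetic check in Python, using the 3.62T per-disk figure that `zpool iostat -v` prints later in this article; the variable names are only illustrative:

```python
# Sanity-check the pool size reported by `zpool get all zp1`.
DISKS = 12            # mfid1 .. mfid12
PER_DISK_TIB = 3.62   # per-disk free space as printed by `zpool iostat -v`

# A "4 TB" disk is 4 * 10**12 bytes, while ZFS prints binary units (TiB).
raw_tib = 4 * 10**12 / 2**40

# In a plain stripe the usable size is the sum of the member disks.
pool_tib = DISKS * PER_DISK_TIB

print(f"one 4 TB disk in TiB : {raw_tib:.2f}")
print(f"12-disk stripe in TiB: {pool_tib:.2f}")
```

The result lands slightly below the 43.5T shown above, which is consistent with the per-disk value being a rounded display figure.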
Create the ZFS dataset:
# zfs create -o mountpoint=/data01 -o atime=off zp1/data01
# zfs get all zp1/data01
NAME PROPERTY VALUE SOURCE
zp1/data01 type filesystem -
zp1/data01 creation Thu Jun 26 23:52 2014 -
zp1/data01 used 32K -
zp1/data01 available 42.8T -
zp1/data01 referenced 32K -
zp1/data01 compressratio 1.00x -
zp1/data01 mounted yes -
zp1/data01 quota none default
zp1/data01 reservation none default
zp1/data01 recordsize 128K default
zp1/data01 mountpoint /data01 local
zp1/data01 sharenfs off default
zp1/data01 checksum on default
zp1/data01 compression off default
zp1/data01 atime off local
zp1/data01 devices on default
zp1/data01 exec on default
zp1/data01 setuid on default
zp1/data01 readonly off default
zp1/data01 jailed off default
zp1/data01 snapdir hidden default
zp1/data01 aclmode discard default
zp1/data01 aclinherit restricted default
zp1/data01 canmount on default
zp1/data01 xattr off temporary
zp1/data01 copies 1 default
zp1/data01 version 5 -
zp1/data01 utf8only off -
zp1/data01 normalization none -
zp1/data01 casesensitivity sensitive -
zp1/data01 vscan off default
zp1/data01 nbmand off default
zp1/data01 sharesmb off default
zp1/data01 refquota none default
zp1/data01 refreservation none default
zp1/data01 primarycache all default
zp1/data01 secondarycache all default
zp1/data01 usedbysnapshots 0 -
zp1/data01 usedbydataset 32K -
zp1/data01 usedbychildren 0 -
zp1/data01 usedbyrefreservation 0 -
zp1/data01 logbias latency default
zp1/data01 dedup off default
zp1/data01 mlslabel -
zp1/data01 sync disabled local
zp1/data01 refcompressratio 1.00x -
zp1/data01 written 32K -
zp1/data01 logicalused 16K -
zp1/data01 logicalreferenced 16K -
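In a listing this long, the properties that matter are the ones whose SOURCE is not `default`. The following is a minimal sketch of a parser for the four-column output above; the function name `parse_zfs_get` and the shortened sample are assumptions made for illustration. It treats the last field as the source, because values such as the creation date contain spaces.

```python
def parse_zfs_get(text):
    """Parse `zfs get all` output into {property: (value, source)}."""
    props = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a property line.
        if len(fields) < 3 or fields[:2] == ["NAME", "PROPERTY"]:
            continue
        prop, source = fields[1], fields[-1]
        value = " ".join(fields[2:-1])   # value may contain spaces, or be empty
        props[prop] = (value, source)
    return props


# A few lines copied from the listing above.
SAMPLE = """\
NAME PROPERTY VALUE SOURCE
zp1/data01 creation Thu Jun 26 23:52 2014 -
zp1/data01 recordsize 128K default
zp1/data01 mountpoint /data01 local
zp1/data01 atime off local
zp1/data01 xattr off temporary
zp1/data01 logbias latency default
zp1/data01 sync disabled local
"""

props = parse_zfs_get(SAMPLE)
changed = {p: v for p, (v, src) in props.items() if src in ("local", "temporary")}
print(changed)
```

Note that `sync` shows up as `disabled` with source `local` in this listing. On a live system, `zfs get -s local all zp1/data01` should give a similar filtered view without any parsing.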
Test fsync. Compared with Linux this is a large improvement, roughly reaching the limit of the underlying block devices.
# /opt/pgsql9.3.4/bin/pg_test_fsync -f /data01/1
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync n/a
fdatasync n/a
fsync 6676.001 ops/sec 150 usecs/op
fsync_writethrough n/a
open_sync 6087.783 ops/sec 164 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync n/a
fdatasync n/a
fsync 4750.841 ops/sec 210 usecs/op
fsync_writethrough n/a
open_sync 3065.099 ops/sec 326 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
1 * 16kB open_sync write 4965.249 ops/sec 201 usecs/op
2 * 8kB open_sync writes 3039.074 ops/sec 329 usecs/op
4 * 4kB open_sync writes 1598.735 ops/sec 625 usecs/op
8 * 2kB open_sync writes 1326.517 ops/sec 754 usecs/op
16 * 1kB open_sync writes 620.992 ops/sec 1610 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
write, fsync, close 5422.742 ops/sec 184 usecs/op
write, close, fsync 5552.278 ops/sec 180 usecs/op
Non-Sync'ed 8kB writes:
write 67460.621 ops/sec 15 usecs/op
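To make the `fsync` row concrete, here is a rough sketch of the kind of loop behind it: write one 8 kB block, call `fsync()`, and count how many rounds complete in a fixed time. This is not pg_test_fsync's actual implementation; the function name `fsync_ops_per_sec` and its parameters are hypothetical, and the absolute numbers will differ because of Python overhead.

```python
import os
import tempfile
import time


def fsync_ops_per_sec(directory, seconds=1.0, block_size=8192):
    """Overwrite one block and fsync it in a loop; return (ops/sec, usecs/op)."""
    buf = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        ops = 0
        start = time.perf_counter()
        deadline = start + seconds
        while time.perf_counter() < deadline:
            os.lseek(fd, 0, os.SEEK_SET)   # rewrite the same block each round
            os.write(fd, buf)
            os.fsync(fd)                   # block until the write is reported durable
            ops += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return ops / elapsed, elapsed / ops * 1e6


if __name__ == "__main__":
    # Pass the dataset mountpoint (for example /data01) to test that filesystem.
    rate, usecs = fsync_ops_per_sec(tempfile.gettempdir(), seconds=1.0)
    print(f"fsync {rate:12.3f} ops/sec {usecs:8.0f} usecs/op")
```

The two columns are reciprocals of each other: 6676.001 ops/sec corresponds to 1,000,000 / 6676.001, or about 150 usecs/op, matching the output above.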
# zpool iostat -v 1
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
zp1 747M 43.5T 0 7.31K 0 39.1M
mfid1 62.8M 3.62T 0 638 0 3.27M
mfid2 61.9M 3.62T 0 615 0 3.23M
mfid3 62.8M 3.62T 0 615 0 3.23M
mfid4 62.0M 3.62T 0 615 0 3.23M
mfid5 62.9M 3.62T 0 616 0 3.24M
mfid6 62.0M 3.62T 0 616 0 3.24M
mfid7 62.9M 3.62T 0 620 0 3.24M
mfid8 61.6M 3.62T 0 620 0 3.24M
mfid9 62.2M 3.62T 0 619 0 3.23M
mfid10 61.8M 3.62T 0 615 0 3.23M
mfid11 62.2M 3.62T 0 648 0 3.41M
mfid12 62.1M 3.62T 0 650 0 3.29M
---------- ----- ----- ----- ----- ----- -----
zroot 2.69G 273G 0 0 0 0
mfid0p3 2.69G 273G 0 0 0 0
---------- ----- ----- ----- ----- ----- -----
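The pool line in this sample can be cross-checked against the per-disk lines. A small sketch, assuming `K` and `M` in `zpool iostat` are binary units (1024-based); the lists are copied from the sample above:

```python
# Per-disk write operations/s and write bandwidth (MiB/s), mfid1 .. mfid12.
write_ops = [638, 615, 615, 615, 616, 616, 620, 620, 619, 615, 648, 650]
write_mib = [3.27, 3.23, 3.23, 3.23, 3.24, 3.24,
             3.24, 3.24, 3.23, 3.23, 3.41, 3.29]

total_ops = sum(write_ops)
total_mib = sum(write_mib)

# Average size of one write as seen at the pool level.
avg_kib_per_op = total_mib * 1024 / total_ops

print(f"total write ops/s : {total_ops} (~{total_ops / 1024:.2f}K)")
print(f"total write MiB/s : {total_mib:.1f}")
print(f"average write size: {avg_kib_per_op:.1f} KiB")
```

The sums reproduce the 7.31K operations and 39.1M bandwidth on the `zp1` line, and the writes are spread almost evenly over the twelve disks.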
# iostat -x 1
extended device statistics
device r/s w/s kr/s kw/s qlen svc_t %b
mfid0 0.0 0.0 0.0 0.0 0 0.0 0
mfid1 0.0 416.6 0.0 7468.5 0 0.1 2
mfid2 0.0 416.6 0.0 7468.5 0 0.0 2
mfid3 0.0 429.6 0.0 7480.0 0 0.1 2
mfid4 0.0 433.6 0.0 7484.0 0 0.1 3
mfid5 0.0 433.6 0.0 7495.9 0 0.1 2
mfid6 0.0 421.6 0.0 7484.5 0 0.1 3
mfid7 0.0 417.6 0.0 7488.5 0 0.1 3
mfid8 0.0 438.6 0.0 7638.3 0 0.1 2
mfid9 0.0 437.6 0.0 7510.4 0 0.1 2
mfid10 0.0 428.6 0.0 7494.4 0 0.1 4
mfid11 0.0 416.6 0.0 7468.5 0 0.1 2
mfid12 0.0 416.6 0.0 7468.5 0 0.1 2
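The `iostat -x` columns can be combined into an average request size per device (kw/s divided by w/s). A minimal sketch over a few rows copied from the sample above; `parse_iostat` is a hypothetical helper and assumes exactly the eight columns shown in the header:

```python
# Columns: device r/s w/s kr/s kw/s qlen svc_t %b
IOSTAT_SAMPLE = """\
mfid0 0.0 0.0 0.0 0.0 0 0.0 0
mfid1 0.0 416.6 0.0 7468.5 0 0.1 2
mfid4 0.0 433.6 0.0 7484.0 0 0.1 3
mfid8 0.0 438.6 0.0 7638.3 0 0.1 2
"""


def parse_iostat(text):
    """Turn `iostat -x` device rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        dev, r_s, w_s, kr_s, kw_s, qlen, svc_t, busy = line.split()
        rows.append({
            "device": dev,
            "w_s": float(w_s),     # writes per second
            "kw_s": float(kw_s),   # kilobytes written per second
            "busy": int(busy),     # %b column
        })
    return rows


rows = parse_iostat(IOSTAT_SAMPLE)
for row in rows:
    if row["w_s"] > 0:             # skip idle devices such as mfid0
        kb_per_write = row["kw_s"] / row["w_s"]
        print(f'{row["device"]:7s} {kb_per_write:5.1f} kB/write  {row["busy"]}% busy')
```

This sample and the `zpool iostat` sample are separate one-second snapshots, so their per-request sizes are not expected to agree exactly.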
With sync disabled, the results on FreeBSD are about the same as on Linux.
# zfs set sync=disabled zp1/data01
# /opt/pgsql9.3.4/bin/pg_test_fsync -f /data01/1
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync n/a
fdatasync n/a
fsync 115687.300 ops/sec 9 usecs/op
fsync_writethrough n/a
open_sync 126789.698 ops/sec 8 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync n/a
fdatasync n/a
fsync 65027.801 ops/sec 15 usecs/op
fsync_writethrough n/a
open_sync 60239.232 ops/sec 17 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
1 * 16kB open_sync write 115246.114 ops/sec 9 usecs/op
2 * 8kB open_sync writes 63999.355 ops/sec 16 usecs/op
4 * 4kB open_sync writes 33661.426 ops/sec 30 usecs/op
8 * 2kB open_sync writes 18960.527 ops/sec 53 usecs/op
16 * 1kB open_sync writes 8251.087 ops/sec 121 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
write, fsync, close 47380.701 ops/sec 21 usecs/op
write, close, fsync 50214.128 ops/sec 20 usecs/op
Non-Sync'ed 8kB writes:
write 78263.057 ops/sec 13 usecs/op
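The two runs can be compared row by row. The sketch below parses a handful of result lines copied from the two outputs above and prints the speedup after `zfs set sync=disabled`; the regular expression and the names `parse_results`, `FIRST_RUN`, and `SYNC_DISABLED_RUN` are assumptions made for this illustration. Keep in mind that with `sync=disabled` ZFS acknowledges synchronous writes before they reach stable storage, so the second set of numbers is not a durable-commit rate.

```python
import re

# "<label> <ops> ops/sec <usecs> usecs/op"
LINE_RE = re.compile(
    r"^\s*(?P<label>.+?)\s+(?P<ops>\d+\.\d+) ops/sec\s+(?P<usecs>\d+) usecs/op\s*$"
)


def parse_results(text):
    """Map each result label to its ops/sec value."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results[m.group("label")] = float(m.group("ops"))
    return results


# Rows from the "one 8kB write" section plus two others, first run.
FIRST_RUN = """\
fsync 6676.001 ops/sec 150 usecs/op
open_sync 6087.783 ops/sec 164 usecs/op
16 * 1kB open_sync writes 620.992 ops/sec 1610 usecs/op
write 67460.621 ops/sec 15 usecs/op
"""

# The same rows from the run after `zfs set sync=disabled`.
SYNC_DISABLED_RUN = """\
fsync 115687.300 ops/sec 9 usecs/op
open_sync 126789.698 ops/sec 8 usecs/op
16 * 1kB open_sync writes 8251.087 ops/sec 121 usecs/op
write 78263.057 ops/sec 13 usecs/op
"""

before = parse_results(FIRST_RUN)
after = parse_results(SYNC_DISABLED_RUN)
speedup = {label: after[label] / before[label] for label in before}

for label, ratio in speedup.items():
    print(f"{label:28s} {ratio:5.1f}x")
```

The synced rows speed up by more than an order of magnitude, while the non-synced `write` row barely moves, which is what one would expect if the first run was paying for real flushes.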
References
1. http://blog.163.com/digoal@126/blog/static/1638770402014526992910/
2. http://blog.163.com/digoal@126/blog/static/163877040201451181344545/
3. http://blog.163.com/digoal@126/blog/static/163877040201451282121414/