ZFS fsync IOPS performance in FreeBSD

5 minute read

背景

我在上一篇文章讲了一下ZFS的性能优化.

文章提到在Linux (CentOS 6.5 x64)中, ZFS的fsync调用性能不佳的问题, 完全不如ext4, 于是在同一台主机, 我安装了FreeBSD 10 x64. 使用同样的硬件测试一下fsync的性能.

PostgreSQL的安装参考

http://blog.163.com/digoal@126/blog/static/163877040201451181344545/

http://blog.163.com/digoal@126/blog/static/163877040201451282121414/

首先查看块设备, 这里使用12块4 TB的SATA盘.

# gpart list -a  
Geom name: mfid[1-12]  

创建zpool,

# zpool create zp1 mfid1 mfid2 mfid3 mfid4 mfid5 mfid6 mfid7 mfid8 mfid9 mfid10 mfid11 mfid12  
# zpool get all zp1  
NAME  PROPERTY                       VALUE                          SOURCE  
zp1   size                           43.5T                          -  
zp1   capacity                       0%                             -  
zp1   altroot                        -                              default  
zp1   health                         ONLINE                         -  
zp1   guid                           8490038421326880416            default  
zp1   version                        -                              default  
zp1   bootfs                         -                              default  
zp1   delegation                     on                             default  
zp1   autoreplace                    off                            default  
zp1   cachefile                      -                              default  
zp1   failmode                       wait                           default  
zp1   listsnapshots                  off                            default  
zp1   autoexpand                     off                            default  
zp1   dedupditto                     0                              default  
zp1   dedupratio                     1.00x                          -  
zp1   free                           43.5T                          -  
zp1   allocated                      285K                           -  
zp1   readonly                       off                            -  
zp1   comment                        -                              default  
zp1   expandsize                     0                              -  
zp1   freeing                        0                              default  
zp1   feature@async_destroy          enabled                        local  
zp1   feature@empty_bpobj            active                         local  
zp1   feature@lz4_compress           enabled                        local  
zp1   feature@multi_vdev_crash_dump  enabled                        local  

创建zfs

# zfs create -o mountpoint=/data01 -o atime=off zp1/data01  
# zfs get all zp1/data01  
NAME        PROPERTY              VALUE                  SOURCE  
zp1/data01  type                  filesystem             -  
zp1/data01  creation              Thu Jun 26 23:52 2014  -  
zp1/data01  used                  32K                    -  
zp1/data01  available             42.8T                  -  
zp1/data01  referenced            32K                    -  
zp1/data01  compressratio         1.00x                  -  
zp1/data01  mounted               yes                    -  
zp1/data01  quota                 none                   default  
zp1/data01  reservation           none                   default  
zp1/data01  recordsize            128K                   default  
zp1/data01  mountpoint            /data01                local  
zp1/data01  sharenfs              off                    default  
zp1/data01  checksum              on                     default  
zp1/data01  compression           off                    default  
zp1/data01  atime                 off                    local  
zp1/data01  devices               on                     default  
zp1/data01  exec                  on                     default  
zp1/data01  setuid                on                     default  
zp1/data01  readonly              off                    default  
zp1/data01  jailed                off                    default  
zp1/data01  snapdir               hidden                 default  
zp1/data01  aclmode               discard                default  
zp1/data01  aclinherit            restricted             default  
zp1/data01  canmount              on                     default  
zp1/data01  xattr                 off                    temporary  
zp1/data01  copies                1                      default  
zp1/data01  version               5                      -  
zp1/data01  utf8only              off                    -  
zp1/data01  normalization         none                   -  
zp1/data01  casesensitivity       sensitive              -  
zp1/data01  vscan                 off                    default  
zp1/data01  nbmand                off                    default  
zp1/data01  sharesmb              off                    default  
zp1/data01  refquota              none                   default  
zp1/data01  refreservation        none                   default  
zp1/data01  primarycache          all                    default  
zp1/data01  secondarycache        all                    default  
zp1/data01  usedbysnapshots       0                      -  
zp1/data01  usedbydataset         32K                    -  
zp1/data01  usedbychildren        0                      -  
zp1/data01  usedbyrefreservation  0                      -  
zp1/data01  logbias               latency                default  
zp1/data01  dedup                 off                    default  
zp1/data01  mlslabel                                     -  
zp1/data01  sync                  disabled               local  
zp1/data01  refcompressratio      1.00x                  -  
zp1/data01  written               32K                    -  
zp1/data01  logicalused           16K                    -  
zp1/data01  logicalreferenced     16K                    -  

测试fsync, 相比Linux有很大的提升, 基本达到了块设备的瓶颈.

# /opt/pgsql9.3.4/bin/pg_test_fsync -f /data01/1  
5 seconds per test  
O_DIRECT supported on this platform for open_datasync and open_sync.  
  
Compare file sync methods using one 8kB write:  
(in wal_sync_method preference order, except fdatasync  
is Linux's default)  
        open_datasync                                 n/a  
        fdatasync                                     n/a  
        fsync                            6676.001 ops/sec     150 usecs/op  
        fsync_writethrough                            n/a  
        open_sync                        6087.783 ops/sec     164 usecs/op  
  
Compare file sync methods using two 8kB writes:  
(in wal_sync_method preference order, except fdatasync  
is Linux's default)  
        open_datasync                                 n/a  
        fdatasync                                     n/a  
        fsync                            4750.841 ops/sec     210 usecs/op  
        fsync_writethrough                            n/a  
        open_sync                        3065.099 ops/sec     326 usecs/op  
  
Compare open_sync with different write sizes:  
(This is designed to compare the cost of writing 16kB  
in different write open_sync sizes.)  
         1 * 16kB open_sync write        4965.249 ops/sec     201 usecs/op  
         2 *  8kB open_sync writes       3039.074 ops/sec     329 usecs/op  
         4 *  4kB open_sync writes       1598.735 ops/sec     625 usecs/op  
         8 *  2kB open_sync writes       1326.517 ops/sec     754 usecs/op  
        16 *  1kB open_sync writes        620.992 ops/sec    1610 usecs/op  
  
Test if fsync on non-write file descriptor is honored:  
(If the times are similar, fsync() can sync data written  
on a different descriptor.)  
        write, fsync, close              5422.742 ops/sec     184 usecs/op  
        write, close, fsync              5552.278 ops/sec     180 usecs/op  
  
Non-Sync'ed 8kB writes:  
        write                           67460.621 ops/sec      15 usecs/op  
  
# zpool iostat -v 1  
               capacity     operations    bandwidth  
pool        alloc   free   read  write   read  write  
----------  -----  -----  -----  -----  -----  -----  
zp1          747M  43.5T      0  7.31K      0  39.1M  
  mfid1     62.8M  3.62T      0    638      0  3.27M  
  mfid2     61.9M  3.62T      0    615      0  3.23M  
  mfid3     62.8M  3.62T      0    615      0  3.23M  
  mfid4     62.0M  3.62T      0    615      0  3.23M  
  mfid5     62.9M  3.62T      0    616      0  3.24M  
  mfid6     62.0M  3.62T      0    616      0  3.24M  
  mfid7     62.9M  3.62T      0    620      0  3.24M  
  mfid8     61.6M  3.62T      0    620      0  3.24M  
  mfid9     62.2M  3.62T      0    619      0  3.23M  
  mfid10    61.8M  3.62T      0    615      0  3.23M  
  mfid11    62.2M  3.62T      0    648      0  3.41M  
  mfid12    62.1M  3.62T      0    650      0  3.29M  
----------  -----  -----  -----  -----  -----  -----  
zroot       2.69G   273G      0      0      0      0  
  mfid0p3   2.69G   273G      0      0      0      0  
----------  -----  -----  -----  -----  -----  -----  
  
# iostat -x 1  
                        extended device statistics    
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b    
mfid0      0.0   0.0     0.0     0.0    0   0.0   0   
mfid1      0.0 416.6     0.0  7468.5    0   0.1   2   
mfid2      0.0 416.6     0.0  7468.5    0   0.0   2   
mfid3      0.0 429.6     0.0  7480.0    0   0.1   2   
mfid4      0.0 433.6     0.0  7484.0    0   0.1   3   
mfid5      0.0 433.6     0.0  7495.9    0   0.1   2   
mfid6      0.0 421.6     0.0  7484.5    0   0.1   3   
mfid7      0.0 417.6     0.0  7488.5    0   0.1   3   
mfid8      0.0 438.6     0.0  7638.3    0   0.1   2   
mfid9      0.0 437.6     0.0  7510.4    0   0.1   2   
mfid10     0.0 428.6     0.0  7494.4    0   0.1   4   
mfid11     0.0 416.6     0.0  7468.5    0   0.1   2   
mfid12     0.0 416.6     0.0  7468.5    0   0.1   2  

disable sync的情形, FreeBSD和Linux下差不多.

# zfs set sync=disabled zp1/data01  
# /opt/pgsql9.3.4/bin/pg_test_fsync -f /data01/1  
5 seconds per test  
O_DIRECT supported on this platform for open_datasync and open_sync.  
  
Compare file sync methods using one 8kB write:  
(in wal_sync_method preference order, except fdatasync  
is Linux's default)  
        open_datasync                                 n/a  
        fdatasync                                     n/a  
        fsync                           115687.300 ops/sec       9 usecs/op  
        fsync_writethrough                            n/a  
        open_sync                       126789.698 ops/sec       8 usecs/op  
  
Compare file sync methods using two 8kB writes:  
(in wal_sync_method preference order, except fdatasync  
is Linux's default)  
        open_datasync                                 n/a  
        fdatasync                                     n/a  
        fsync                           65027.801 ops/sec      15 usecs/op  
        fsync_writethrough                            n/a  
        open_sync                       60239.232 ops/sec      17 usecs/op  
  
Compare open_sync with different write sizes:  
(This is designed to compare the cost of writing 16kB  
in different write open_sync sizes.)  
         1 * 16kB open_sync write       115246.114 ops/sec       9 usecs/op  
         2 *  8kB open_sync writes      63999.355 ops/sec      16 usecs/op  
         4 *  4kB open_sync writes      33661.426 ops/sec      30 usecs/op  
         8 *  2kB open_sync writes      18960.527 ops/sec      53 usecs/op  
        16 *  1kB open_sync writes       8251.087 ops/sec     121 usecs/op  
  
Test if fsync on non-write file descriptor is honored:  
(If the times are similar, fsync() can sync data written  
on a different descriptor.)  
        write, fsync, close             47380.701 ops/sec      21 usecs/op  
        write, close, fsync             50214.128 ops/sec      20 usecs/op  
  
Non-Sync'ed 8kB writes:  
        write                           78263.057 ops/sec      13 usecs/op  

参考

1. http://blog.163.com/digoal@126/blog/static/1638770402014526992910/

2. http://blog.163.com/digoal@126/blog/static/163877040201451181344545/

3. http://blog.163.com/digoal@126/blog/static/163877040201451282121414/

Flag Counter

digoal’s 大量PostgreSQL文章入口