ZFS 12SATA JBOD vs MSA 2312FC 24SAS


Background

Today I pitted two hosts against each other to compare ZFS performance against a SAN array.

ZFS host

Lenovo Reno/Raleigh  
  
8-core Intel(R) Xeon(R) CPU E5-2407 0 @ 2.20GHz  
24GB RAM  
12 x 2TB SATA drives: 2 in a RAID1 (for the OS), the other 10 in a zpool (9-disk raidz + 1 hot spare, with a partition on the RAID1 volume as the log device)  

Non-default filesystem settings: atime=off, compression=lz4; compression ratio about 3.16.
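
For reference, a minimal sketch of how a pool and dataset with these settings could be built (device names are the ones that appear in the zpool status output later in this post; the dataset name zp1/pgdata is illustrative, not necessarily what was used here):

# 9-disk raidz, one hot spare, and a log partition on the RAID1 system volume (sdk4)  
zpool create zp1 raidz sda sdb sdc sdd sde sdf sdg sdh sdi spare sdj log sdk4  
zfs create zp1/pgdata  
zfs set atime=off zp1/pgdata  
zfs set compression=lz4 zp1/pgdata  
# check the achieved compression ratio  
zfs get compressratio zp1/pgdata  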

SAN host

DELL R610  
16-core Intel(R) Xeon(R) CPU E5630 @ 2.53GHz  
32GB RAM  
  
Storage: two MSA2312FC arrays, each with 12 x 300GB SAS disks: 10 in a RAID 10 plus 2 hot spares.  

Configuration of one of the arrays:

Controllers  
-----------  
Controller ID: A  
Serial Number: 3CL947R707  
Hardware Version: 56  
CPLD Version: 8  
Disks: 12  
Vdisks: 1  
Cache Memory Size (MB): 1024  
Host Ports: 2  
Disk Channels: 2  
Disk Bus Type: SAS  
Status: Running  
Failed Over: No  
Fail Over Reason: Not applicable  
  
# show disks  
Location Serial Number         Vendor   Rev  How Used   Type   Size      
  Rate(Gb/s)  SP Status       
-----------------------------------------------------------------------  
1.1      3QP2EN7V00009006CJT4  SEAGATE  0004 VDISK VRSC SAS    300.0GB   
  3.0            OK           
1.2      3QP232CM00009952PTMA  SEAGATE  0004 VDISK VRSC SAS    300.0GB   
  3.0            OK           
1.3      3QP2GKLZ00009008V1VA  SEAGATE  0004 VDISK VRSC SAS    300.0GB   
  3.0            OK           
1.4      3QP2G2LL00009008WAYU  SEAGATE  0004 VDISK VRSC SAS    300.0GB   
  3.0            OK           
1.5      3QP2EN0700009007DAPA  SEAGATE  0004 VDISK VRSC SAS    300.0GB   
  3.0            OK           
1.6      3QP2G6AE00009008V39E  SEAGATE  0004 VDISK VRSC SAS    300.0GB   
  3.0            OK           
1.7      6SJ4ZX1S0000N239DF6Q  SEAGATE  0008 GLOBAL SP  SAS    300.0GB   
  3.0            OK           
1.8      6SJ4ZTLT0000N239FM3P  SEAGATE  0008 VDISK VRSC SAS    300.0GB   
  3.0            OK           
1.9      6SJ4ZY6H0000N2407XKP  SEAGATE  0008 VDISK VRSC SAS    300.0GB   
  3.0            OK           
1.10     3QP2FVR100009008Z16H  SEAGATE  0004 VDISK VRSC SAS    300.0GB   
  3.0            OK           
1.11     3QP2DZEX00009008WBN9  SEAGATE  0004 VDISK VRSC SAS    300.0GB   
  3.0            OK           
1.12     3QP2CXWS00009008WBLN  SEAGATE  0004 GLOBAL SP  SAS    300.0GB   
  3.0            OK           
-----------------------------------------------------------------------  
  
Name Size     Free    Own Pref   RAID   Disks Spr Chk  Status Jobs        
  Serial Number                      
------------------------------------------------------------------------  
vd01 1498.4GB 100.4GB A   A      RAID10 10    0   16k  FTOL   VRSC 66%    
  00c0ffda61090000144de15100000000  
  
# show cache  
System Cache Parameters  
-----------------------  
Operation Mode: Active-Active ULP  
  
  Controller A Cache Parameters  
  -----------------------------  
  Write Back Status: Enabled  
  CompactFlash Status: Installed  
  Cache Flush: Enabled  
  
  Controller B Cache Parameters  
  -----------------------------  
  Write Back Status: Enabled  
  CompactFlash Status: Installed  
  Cache Flush: Enabled  

Filesystem on the SAN host

[root@db- ~]# lvs  
  LV   VG       Attr   LSize    Origin Snap%  Move Log Copy%  Convert  
  lv01 vgdata01 -wi-ao  300.00G                                        
  lv02 vgdata01 -wi-ao  100.00G                                        
  lv03 vgdata01 -wi-ao    1.17T                                        
  lv04 vgdata01 -wi-ao 1001.99G                                        
[root@db- ~]# pvs  
  PV                            VG       Fmt  Attr PSize PFree  
  /dev/mpath/d09_msa1_vd01vol01 vgdata01 lvm2 a--  1.27T    0   
  /dev/mpath/d09_msa2_vd01vol01 vgdata01 lvm2 a--  1.27T    0   

Mounted as ext4 with noatime,nodiratime.
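
For reference, a sketch of the mount with these options (the mount point /data01 and the choice of logical volume are illustrative; the actual mount points are not shown above):

mount -t ext4 -o noatime,nodiratime /dev/vgdata01/lv03 /data01  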

The test workload is PostgreSQL 9.2.8.

Only read speed has been tested so far, because the ZFS host is a streaming-replication standby of the SAN host (the database configuration is identical on both).
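
The standby side of that is the standard PostgreSQL 9.2 setup; a minimal recovery.conf sketch (host and user names are hypothetical):

# recovery.conf on the ZFS standby (hot_standby = on in postgresql.conf so it accepts read-only queries)  
standby_mode = 'on'  
primary_conninfo = 'host=primary_host port=5432 user=repl'  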

COUNT of an 18GB table on ZFS

digoal=> select count(*) from tbl;  
  count     
----------  
 48391818  
(1 row)  
Time: 9998.065 ms  

The same query on the SAN host

digoal=> select count(*) from tbl;  
  count     
----------  
 48391818  
(1 row)  
Time: 64707.770 ms  

After LZ4 compression on ZFS, this test table shrinks by a factor of about 2.5. The table's data files (located via pg_relation_filepath) are listed below, comparing apparent size (ll -h) with on-disk usage (du -sh):

pg_relation_filepath                 
-------------------------------------------------  
 pg_tblspc/16384/PG_9.2_201204301/70815/10088356  
  
> ll -h pg_tblspc/16384/PG_9.2_201204301/70815/10088356*  
-rw------- 1 postgres postgres 1.0G Jun 19 00:59 pg_tblspc/16384/PG_9.2_201204301/70815/10088356  
-rw------- 1 postgres postgres 1.0G Jun 19 04:24 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.1  
-rw------- 1 postgres postgres 1.0G Jun 19 01:21 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.10  
-rw------- 1 postgres postgres 1.0G Jun 19 05:15 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.11  
-rw------- 1 postgres postgres 1.0G Jun 19 04:52 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.12  
-rw------- 1 postgres postgres 1.0G Jun 19 01:50 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.13  
-rw------- 1 postgres postgres 1.0G Jun 19 04:22 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.14  
-rw------- 1 postgres postgres 1.0G Jun 19 03:32 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.15  
-rw------- 1 postgres postgres 1.0G Jun 19 02:05 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.16  
-rw------- 1 postgres postgres 575M Jun 19 04:26 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.17  
-rw------- 1 postgres postgres 1.0G Jun 19 04:30 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.2  
-rw------- 1 postgres postgres 1.0G Jun 19 01:27 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.3  
-rw------- 1 postgres postgres 1.0G Jun 19 03:24 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.4  
-rw------- 1 postgres postgres 1.0G Jun 19 00:52 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.5  
-rw------- 1 postgres postgres 1.0G Jun 19 03:39 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.6  
-rw------- 1 postgres postgres 1.0G Jun 19 04:53 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.7  
-rw------- 1 postgres postgres 1.0G Jun 19 00:49 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.8  
-rw------- 1 postgres postgres 1.0G Jun 19 05:14 pg_tblspc/16384/PG_9.2_201204301/70815/10088356.9  
-rw------- 1 postgres postgres 4.5M Jun 19 03:38 pg_tblspc/16384/PG_9.2_201204301/70815/10088356_fsm  
-rw------- 1 postgres postgres 288K Jun 19 05:12 pg_tblspc/16384/PG_9.2_201204301/70815/10088356_vm  
du -sh pg_tblspc/16384/PG_9.2_201204301/70815/10088356*  
415M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356  
405M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.1  
427M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.10  
428M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.11  
425M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.12  
425M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.13  
427M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.14  
427M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.15  
428M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.16  
237M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.17  
403M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.2  
413M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.3  
427M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.4  
432M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.5  
423M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.6  
425M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.7  
433M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.8  
428M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356.9  
3.5M    pg_tblspc/16384/PG_9.2_201204301/70815/10088356_fsm  
36K     pg_tblspc/16384/PG_9.2_201204301/70815/10088356_vm  
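
A quick way to double-check the ratio from the listing above (a sketch; the path is the one shown above): summing the apparent sizes gives roughly 17.6GB, while the on-disk total is roughly 7.3GB, i.e. about 2.4x.

ls -l pg_tblspc/16384/PG_9.2_201204301/70815/10088356* | awk '{s+=$5} END {printf "%.1f GB logical\n", s/1024^3}'  
du -ck pg_tblspc/16384/PG_9.2_201204301/70815/10088356* | awk 'END {printf "%.1f GB on disk\n", $1/1024^2}'  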

Supplementary write-speed test

Test model:

postgres=# create table test (id int primary key, info text, crt_time timestamp);  
CREATE TABLE  
postgres=# create or replace function f(v_id int) returns void as   
$$  
declare  
begin  
  update test set info=md5(now()::text),crt_time=now() where id=v_id;  
  if not found then  
    insert into test values (v_id, md5(now()::text), now());           
  end if;  
  return;  
  exception when others then  
    return;  
end;  
$$ language plpgsql strict;  
CREATE FUNCTION  
  
$ vi test.sql  
\setrandom vid 1 5000000  
select f(:vid);  

Results

ZFS host result

pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -T 30  
transaction type: Custom query  
scaling factor: 1  
query mode: prepared  
number of clients: 8  
number of threads: 4  
duration: 30 s  
number of transactions actually processed: 1529642  
tps = 50987.733547 (including connections establishing)  
tps = 50998.421896 (excluding connections establishing)  
statement latencies in milliseconds:  
        0.002064        \setrandom vid 1 5000000  
        0.153280        select f(:vid);  
postgres=# select count(*) from test;  
  count    
---------  
 1317641  
(1 row)  

SAN host result

pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -T 30  
transaction type: Custom query  
scaling factor: 1  
query mode: prepared  
number of clients: 8  
number of threads: 4  
duration: 30 s  
number of transactions actually processed: 717486  
tps = 23915.516813 (including connections establishing)  
tps = 23921.744263 (excluding connections establishing)  
statement latencies in milliseconds:  
        0.003088        \setrandom vid 1 5000000  
        0.328250        select f(:vid);  
postgres=# select count(*) from test;  
 count    
--------  
 668395  
(1 row)  

Other notes

1. pg_test_fsync results with and without a SLOG

With SLOG

pg_test_fsync  
O_DIRECT supported on this platform for open_datasync and open_sync.  
  
Compare file sync methods using one 8kB write:  
(in wal_sync_method preference order, except fdatasync  
is Linux's default)  
        open_datasync                                n/a*  
        fdatasync                         303.897 ops/sec    3291 usecs/op  
        fsync                             329.612 ops/sec    3034 usecs/op  
        fsync_writethrough                            n/a  
        open_sync                                    n/a*  
* This file system and its mount options do not support direct  
I/O, e.g. ext4 in journaled mode.  
  
Compare file sync methods using two 8kB writes:  
(in wal_sync_method preference order, except fdatasync  
is Linux's default)  
        open_datasync                                n/a*  
        fdatasync                         328.331 ops/sec    3046 usecs/op  
        fsync                             326.671 ops/sec    3061 usecs/op  
        fsync_writethrough                            n/a  
        open_sync                                    n/a*  
* This file system and its mount options do not support direct  
I/O, e.g. ext4 in journaled mode.  
  
Compare open_sync with different write sizes:  
(This is designed to compare the cost of writing 16kB  
in different write open_sync sizes.)  
         1 * 16kB open_sync write                    n/a*  
         2 *  8kB open_sync writes                   n/a*  
         4 *  4kB open_sync writes                   n/a*  
         8 *  2kB open_sync writes                   n/a*  
        16 *  1kB open_sync writes                   n/a*  
  
Test if fsync on non-write file descriptor is honored:  
(If the times are similar, fsync() can sync data written  
on a different descriptor.)  
        write, fsync, close               324.818 ops/sec    3079 usecs/op  
        write, close, fsync               325.872 ops/sec    3069 usecs/op  
  
Non-Sync'ed 8kB writes:  
        write                           78023.363 ops/sec      13 usecs/op  

Without SLOG

pg_test_fsync   
5 seconds per test  
O_DIRECT supported on this platform for open_datasync and open_sync.  
  
Compare file sync methods using one 8kB write:  
(in wal_sync_method preference order, except fdatasync  
is Linux's default)  
        open_datasync                                n/a*  
        fdatasync                         325.150 ops/sec    3076 usecs/op  
        fsync                             320.737 ops/sec    3118 usecs/op  
        fsync_writethrough                            n/a  
        open_sync                                    n/a*  
* This file system and its mount options do not support direct  
I/O, e.g. ext4 in journaled mode.  
  
Compare file sync methods using two 8kB writes:  
(in wal_sync_method preference order, except fdatasync  
is Linux's default)  
        open_datasync                                n/a*  
        fdatasync                         313.791 ops/sec    3187 usecs/op  
        fsync                             313.884 ops/sec    3186 usecs/op  
        fsync_writethrough                            n/a  
        open_sync                                    n/a*  
* This file system and its mount options do not support direct  
I/O, e.g. ext4 in journaled mode.  
  
Compare open_sync with different write sizes:  
(This is designed to compare the cost of writing 16kB  
in different write open_sync sizes.)  
         1 * 16kB open_sync write                    n/a*  
         2 *  8kB open_sync writes                   n/a*  
         4 *  4kB open_sync writes                   n/a*  
         8 *  2kB open_sync writes                   n/a*  
        16 *  1kB open_sync writes                   n/a*  
  
Test if fsync on non-write file descriptor is honored:  
(If the times are similar, fsync() can sync data written  
on a different descriptor.)  
        write, fsync, close               328.620 ops/sec    3043 usecs/op  
        write, close, fsync               328.271 ops/sec    3046 usecs/op  
  
Non-Sync'ed 8kB writes:  
        write                           71741.498 ops/sec      14 usecs/op  

iostat shows that with a SLOG, pg_test_fsync pushes all of its load onto the SLOG block device, while without one the load falls on the vdev block devices; since the pool here is a raidz, that means on all of the disks.
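
This is easy to observe (a sketch): run pg_test_fsync in one session and watch per-device statistics in another; the device names are the ones from this host's zpool status.

iostat -xk 2  
# with the SLOG: w/s and %util concentrate on the log partition (sdk4)  
# without the SLOG: writes are spread across the raidz members sda ... sdi  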

If the SLOG were on an SSD, pg_test_fsync would do much better. For example, using /dev/shm to simulate an SSD:

# cd /dev/shm  
# dd if=/dev/zero of=./test.img bs=1k count=2048000  
# zpool add zp1 log /dev/shm/test.img   
# zpool status  
  pool: zp1  
 state: ONLINE  
  scan: none requested  
config:  
  
        NAME                 STATE     READ WRITE CKSUM  
        zp1                  ONLINE       0     0     0  
          raidz1-0           ONLINE       0     0     0  
            sda              ONLINE       0     0     0  
            sdb              ONLINE       0     0     0  
            sdc              ONLINE       0     0     0  
            sdd              ONLINE       0     0     0  
            sde              ONLINE       0     0     0  
            sdf              ONLINE       0     0     0  
            sdg              ONLINE       0     0     0  
            sdh              ONLINE       0     0     0  
            sdi              ONLINE       0     0     0  
        logs  
          /dev/shm/test.img  ONLINE       0     0     0  
        spares  
          sdj                AVAIL   

With memory backing the SLOG, fsync performance clearly improves.

pg_test_fsync   
5 seconds per test  
O_DIRECT supported on this platform for open_datasync and open_sync.  
  
Compare file sync methods using one 8kB write:  
(in wal_sync_method preference order, except fdatasync  
is Linux's default)  
        open_datasync                                n/a*  
        fdatasync                        6695.657 ops/sec     149 usecs/op  
        fsync                            8079.750 ops/sec     124 usecs/op  
        fsync_writethrough                            n/a  
        open_sync                                    n/a*  
* This file system and its mount options do not support direct  
I/O, e.g. ext4 in journaled mode.  
  
Compare file sync methods using two 8kB writes:  
(in wal_sync_method preference order, except fdatasync  
is Linux's default)  
        open_datasync                                n/a*  
        fdatasync                        6247.616 ops/sec     160 usecs/op  
        fsync                            3140.959 ops/sec     318 usecs/op  
        fsync_writethrough                            n/a  
        open_sync                                    n/a*  
* This file system and its mount options do not support direct  
I/O, e.g. ext4 in journaled mode.  
  
Compare open_sync with different write sizes:  
(This is designed to compare the cost of writing 16kB  
in different write open_sync sizes.)  
         1 * 16kB open_sync write                    n/a*  
         2 *  8kB open_sync writes                   n/a*  
         4 *  4kB open_sync writes                   n/a*  
         8 *  2kB open_sync writes                   n/a*  
        16 *  1kB open_sync writes                   n/a*  
  
Test if fsync on non-write file descriptor is honored:  
(If the times are similar, fsync() can sync data written  
on a different descriptor.)  
        write, fsync, close              6330.570 ops/sec     158 usecs/op  
        write, close, fsync              6989.741 ops/sec     143 usecs/op  
  
Non-Sync'ed 8kB writes:  
        write                           77800.273 ops/sec      13 usecs/op  

One thing worth noting: if synchronous_commit is turned off in PostgreSQL, a SLOG on an ordinary disk is actually good enough.

The write-speed test in this post is a good demonstration of this.
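
For reference, the relevant postgresql.conf setting (a sketch of the setting itself, not a statement of how this host was configured):

# commits return before the WAL is flushed to disk; the flush happens shortly afterwards in the background  
synchronous_commit = off  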

2. The block devices used to build a zpool should preferably be referenced by-id, because device names can change under Linux: /dev/sda may come back as /dev/sdb after a reboot.

For the SLOG this must not happen; it can lead to data corruption.

Listing the by-id names:

# ll /dev/disk/by-id/*  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064b0a6dc -> ../../sdd  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064b0a6dc-part1 -> ../../sdd1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064b0a6dc-part9 -> ../../sdd9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064b563d5 -> ../../sda  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064b563d5-part1 -> ../../sda1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064b563d5-part9 -> ../../sda9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bbc776 -> ../../sde  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bbc776-part1 -> ../../sde1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bbc776-part9 -> ../../sde9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bbf23b -> ../../sdh  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bbf23b-part1 -> ../../sdh1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bbf23b-part9 -> ../../sdh9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bbfc66 -> ../../sdf  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bbfc66-part1 -> ../../sdf1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bbfc66-part9 -> ../../sdf9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bc046a -> ../../sdj  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bc046a-part1 -> ../../sdj1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bc046a-part9 -> ../../sdj9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bf56da -> ../../sdc  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bf56da-part1 -> ../../sdc1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bf56da-part9 -> ../../sdc9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bf65dd -> ../../sdb  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bf65dd-part1 -> ../../sdb1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064bf65dd-part9 -> ../../sdb9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064c02880 -> ../../sdi  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064c02880-part1 -> ../../sdi1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064c02880-part9 -> ../../sdi9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064c04f5a -> ../../sdg  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064c04f5a-part1 -> ../../sdg1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-35000c50064c04f5a-part9 -> ../../sdg9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/scsi-3600605b0079e70801b0e33ff07ebffa3 -> ../../sdk  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-3600605b0079e70801b0e33ff07ebffa3-part1 -> ../../sdk1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-3600605b0079e70801b0e33ff07ebffa3-part2 -> ../../sdk2  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/scsi-3600605b0079e70801b0e33ff07ebffa3-part3 -> ../../sdk3  
lrwxrwxrwx 1 root root 10 Jun 19 12:43 /dev/disk/by-id/scsi-3600605b0079e70801b0e33ff07ebffa3-part4 -> ../../sdk4  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064b0a6dc -> ../../sdd  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064b0a6dc-part1 -> ../../sdd1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064b0a6dc-part9 -> ../../sdd9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064b563d5 -> ../../sda  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064b563d5-part1 -> ../../sda1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064b563d5-part9 -> ../../sda9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bbc776 -> ../../sde  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bbc776-part1 -> ../../sde1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bbc776-part9 -> ../../sde9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bbf23b -> ../../sdh  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bbf23b-part1 -> ../../sdh1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bbf23b-part9 -> ../../sdh9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bbfc66 -> ../../sdf  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bbfc66-part1 -> ../../sdf1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bbfc66-part9 -> ../../sdf9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bc046a -> ../../sdj  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bc046a-part1 -> ../../sdj1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bc046a-part9 -> ../../sdj9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bf56da -> ../../sdc  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bf56da-part1 -> ../../sdc1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bf56da-part9 -> ../../sdc9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bf65dd -> ../../sdb  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bf65dd-part1 -> ../../sdb1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064bf65dd-part9 -> ../../sdb9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064c02880 -> ../../sdi  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064c02880-part1 -> ../../sdi1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064c02880-part9 -> ../../sdi9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064c04f5a -> ../../sdg  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064c04f5a-part1 -> ../../sdg1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x5000c50064c04f5a-part9 -> ../../sdg9  
lrwxrwxrwx 1 root root  9 Jun 19  2014 /dev/disk/by-id/wwn-0x600605b0079e70801b0e33ff07ebffa3 -> ../../sdk  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x600605b0079e70801b0e33ff07ebffa3-part1 -> ../../sdk1  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x600605b0079e70801b0e33ff07ebffa3-part2 -> ../../sdk2  
lrwxrwxrwx 1 root root 10 Jun 19  2014 /dev/disk/by-id/wwn-0x600605b0079e70801b0e33ff07ebffa3-part3 -> ../../sdk3  
lrwxrwxrwx 1 root root 10 Jun 19 12:43 /dev/disk/by-id/wwn-0x600605b0079e70801b0e33ff07ebffa3-part4 -> ../../sdk4  

If /dev/sd* names have already been used, the device can be removed and re-added by-id:

# zpool remove zp1 /dev/sdk4  
# zpool add zp1 log /dev/disk/by-id/scsi-3600605b0079e70801b0e33ff07ebffa3-part4  
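
Afterwards it is worth confirming that the pool now references the persistent name:

# zpool status zp1  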

The SLOG generally does not need to be large; a few GB is plenty. The L2ARC, on the other hand, is a case of the bigger the better.
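
An L2ARC device is added the same way as a log device, just with the cache keyword (a sketch; the SSD name below is hypothetical):

# zpool add zp1 cache /dev/disk/by-id/ata-YOUR_SSD_SERIAL  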

Summary

The test coverage here is narrow, but it still shows a few things.

1. Thanks to the SLOG, ZFS write performance exceeded that of a SAN configured like this one, so ZFS is quite suitable for database use.

2. The read test here did not exceed memory, so it is not really conclusive. Once the data no longer fits in memory, the query on the 18GB table takes about 70 seconds. Adding an SSD as L2ARC would improve read performance further.

3. With ZFS compression the on-disk footprint shrinks, but the latency and CPU overhead of compression and decompression also have to be taken into account.

4. The SLOG matters a lot and should ideally be mirrored; if the underlying device is already RAID-protected, it does not have to be (see the sketch below). Do not copy the memory-backed SLOG used here under any circumstances; it was only standing in for an SSD.
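
A sketch of a mirrored SLOG built from two dedicated devices (device names hypothetical):

# zpool add zp1 log mirror /dev/disk/by-id/ata-SSD_A_SERIAL /dev/disk/by-id/ata-SSD_B_SERIAL  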


