PostgreSQL 11 1000亿 tpcb、1000W tpcc 性能测试 - on 阿里云ECS + ESSD (含quorum based 0丢失多副本配置与性能测试)

40 minute read

背景

https://help.aliyun.com/knowledge_detail/64950.html

阿里云ESSD提供了单盘32TB容量,100万IOPS,4GB/s读写吞吐的能力,单台ECS可以挂载16块ESSD盘,组成512 TB的大容量存储。非常适合数据库这类IO密集应用。

PostgreSQL 作为一款优秀的企业级开源数据库产品,阿里云ESSD的加入,可以带给用户什么样的体验呢?

《PostgreSQL 11 100亿 tpcb 性能 on ECS》

《PostgreSQL 11 tpcc 测试(103万tpmC on ECS) - use sysbench-tpcc by Percona-Lab》

《(TPC-H测试 SF=10,SF=200) PostgreSQL 11 vs 10 vs Deepgreen》

ESSD云盘部署

parted -s /dev/vdb mklabel gpt          
parted -s /dev/vdb mkpart primary 1MiB 100%         
mkfs.ext4 /dev/vdb1 -m 0 -O extent,uninit_bg -E lazy_itable_init=1 -T largefile -L data01          
mkdir /data01            
          
vi /etc/fstab            
            
LABEL=data01 /data01     ext4        defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback    0 0          
          
          
mount -a      

fsync RT测试

1 ECS本地SSD

使用PostgreSQL提供的pg_test_fsync进行测试

5 seconds per test    
O_DIRECT supported on this platform for open_datasync and open_sync.    
    
Compare file sync methods using one 8kB write:    
(in wal_sync_method preference order, except fdatasync is Linux's default)    
        open_datasync                     46355.824 ops/sec      22 usecs/op    
        fdatasync                         39213.835 ops/sec      26 usecs/op    
        fsync                             35912.478 ops/sec      28 usecs/op    
        fsync_writethrough                              n/a    
        open_sync                         42426.737 ops/sec      24 usecs/op    
    
Compare file sync methods using two 8kB writes:    
(in wal_sync_method preference order, except fdatasync is Linux's default)    
        open_datasync                     17109.945 ops/sec      58 usecs/op    
        fdatasync                         26316.089 ops/sec      38 usecs/op    
        fsync                             24202.679 ops/sec      41 usecs/op    
        fsync_writethrough                              n/a    
        open_sync                         15760.721 ops/sec      63 usecs/op    
    
Compare open_sync with different write sizes:    
(This is designed to compare the cost of writing 16kB in different write    
open_sync sizes.)    
         1 * 16kB open_sync write         29108.820 ops/sec      34 usecs/op    
         2 *  8kB open_sync writes        15674.805 ops/sec      64 usecs/op    
         4 *  4kB open_sync writes         9942.061 ops/sec     101 usecs/op    
         8 *  2kB open_sync writes         5637.484 ops/sec     177 usecs/op    
        16 *  1kB open_sync writes         3076.057 ops/sec     325 usecs/op    
    
Test if fsync on non-write file descriptor is honored:    
(If the times are similar, fsync() can sync data written on a different    
descriptor.)    
        write, fsync, close               32581.863 ops/sec      31 usecs/op    
        write, close, fsync               32512.798 ops/sec      31 usecs/op    
    
Non-sync'ed 8kB writes:    
        write                            350232.219 ops/sec       3 usecs/op    

2 ESSD

5 seconds per test    
O_DIRECT supported on this platform for open_datasync and open_sync.    
    
Compare file sync methods using one 8kB write:    
(in wal_sync_method preference order, except fdatasync is Linux's default)    
        open_datasync                      8395.592 ops/sec     119 usecs/op    
        fdatasync                          7722.692 ops/sec     129 usecs/op    
        fsync                              5619.389 ops/sec     178 usecs/op    
        fsync_writethrough                              n/a    
        open_sync                          5685.669 ops/sec     176 usecs/op    
    
Compare file sync methods using two 8kB writes:    
(in wal_sync_method preference order, except fdatasync is Linux's default)    
        open_datasync                      3858.783 ops/sec     259 usecs/op    
        fdatasync                          5396.356 ops/sec     185 usecs/op    
        fsync                              4214.546 ops/sec     237 usecs/op    
        fsync_writethrough                              n/a    
        open_sync                          3025.366 ops/sec     331 usecs/op    
    
Compare open_sync with different write sizes:    
(This is designed to compare the cost of writing 16kB in different write    
open_sync sizes.)    
         1 * 16kB open_sync write          4506.749 ops/sec     222 usecs/op    
         2 *  8kB open_sync writes         3099.963 ops/sec     323 usecs/op    
         4 *  4kB open_sync writes         1763.684 ops/sec     567 usecs/op    
         8 *  2kB open_sync writes          429.923 ops/sec    2326 usecs/op    
        16 *  1kB open_sync writes          198.005 ops/sec    5050 usecs/op    
    
Test if fsync on non-write file descriptor is honored:    
(If the times are similar, fsync() can sync data written on a different    
descriptor.)    
        write, fsync, close                5393.927 ops/sec     185 usecs/op    
        write, close, fsync                5470.240 ops/sec     183 usecs/op    
    
Non-sync'ed 8kB writes:    
        write                            385505.858 ops/sec       3 usecs/op    

fio 专业IO测试

测试项

vi test    
    
[global]        
thread        
numjobs=64        
ramp_time=6        
size=10g        
exitall        
time_based        
runtime=180        
group_reporting        
randrepeat=0        
norandommap        
bs=8k        
rwmixwrite=35        
        
[rw-rand-libaio-mysql-ext4]        
stonewall        
direct=1        
iodepth=16        
iodepth_batch=8        
iodepth_low=8        
iodepth_batch_complete=8        
rw=randrw        
ioengine=libaio        
filename=/data01/ext4        
        
[rw-seq-libaio-mysql-ext4]        
stonewall        
direct=1        
iodepth=16        
iodepth_batch=8        
iodepth_low=8        
iodepth_batch_complete=8        
rw=rw        
ioengine=libaio        
filename=/data01/ext4        
    
[rw-rand-sync-pgsql-ext4]        
stonewall        
direct=0        
rw=randrw        
ioengine=sync        
filename=/data01/ext4        
        
[rw-seq-sync-pgsql-ext4]        
stonewall        
direct=0        
rw=rw        
ioengine=sync        
filename=/data01/ext4    
fio test --output ./cfq-raw.log    

1 ECS本地SSD

rw-rand-libaio-mysql-ext4: (g=0): rw=randrw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=16    
...    
rw-seq-libaio-mysql-ext4: (g=1): rw=rw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=16    
...    
rw-rand-sync-pgsql-ext4: (g=2): rw=randrw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=sync, iodepth=1    
...    
rw-seq-sync-pgsql-ext4: (g=3): rw=rw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=sync, iodepth=1    
...    
fio-3.1    
Starting 256 threads    
    
rw-rand-libaio-mysql-ext4: (groupid=0, jobs=64): err= 0: pid=27005: Tue Sep 18 15:18:42 2018    
   read: IOPS=146k, BW=1140MiB/s (1196MB/s)(200GiB/180004msec)    
    slat (usec): min=22, max=114828, avg=2266.17, stdev=2520.66    
    clat (nsec): min=1759, max=115061k, avg=2316672.58, stdev=2575502.75    
     lat (usec): min=70, max=164337, avg=4582.93, stdev=3639.95    
    clat percentiles (usec):    
     |  1.00th=[  603],  5.00th=[ 1352], 10.00th=[ 1549], 20.00th=[ 1713],    
     | 30.00th=[ 1811], 40.00th=[ 1876], 50.00th=[ 1942], 60.00th=[ 2008],    
     | 70.00th=[ 2057], 80.00th=[ 2114], 90.00th=[ 2245], 95.00th=[ 2376],    
     | 99.00th=[16581], 99.50th=[19792], 99.90th=[27919], 99.95th=[32900],    
     | 99.99th=[49021]    
   bw (  KiB/s): min= 6000, max=42120, per=1.54%, avg=17968.83, stdev=2125.54, samples=22976    
   iops        : min=  750, max= 5265, avg=2245.75, stdev=265.70, samples=22976    
  write: IOPS=78.6k, BW=614MiB/s (644MB/s)(108GiB/180004msec)    
    slat (usec): min=24, max=114826, avg=2266.44, stdev=2515.63    
    clat (nsec): min=1056, max=114853k, avg=2225812.28, stdev=2478372.79    
     lat (usec): min=43, max=164125, avg=4492.34, stdev=3574.60    
    clat percentiles (usec):    
     |  1.00th=[   19],  5.00th=[ 1123], 10.00th=[ 1500], 20.00th=[ 1696],    
     | 30.00th=[ 1795], 40.00th=[ 1876], 50.00th=[ 1942], 60.00th=[ 1991],    
     | 70.00th=[ 2057], 80.00th=[ 2114], 90.00th=[ 2212], 95.00th=[ 2311],    
     | 99.00th=[16057], 99.50th=[19268], 99.90th=[27132], 99.95th=[31327],    
     | 99.99th=[47449]    
   bw (  KiB/s): min= 3235, max=22265, per=1.54%, avg=9680.55, stdev=1180.58, samples=22976    
   iops        : min=  404, max= 2783, avg=1209.72, stdev=147.58, samples=22976    
  lat (usec)   : 2=0.01%, 4=0.06%, 10=0.04%, 20=0.33%, 50=0.37%    
  lat (usec)   : 100=0.16%, 250=0.21%, 500=0.37%, 750=0.59%, 1000=0.84%    
  lat (msec)   : 2=56.76%, 4=36.91%, 10=0.82%, 20=2.08%, 50=0.44%    
  lat (msec)   : 100=0.01%, 250=0.01%    
  cpu          : usr=0.50%, sys=82.17%, ctx=8285760, majf=0, minf=0    
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.1%, 16=103.4%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=26268555,14152416,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=16    
rw-seq-libaio-mysql-ext4: (groupid=1, jobs=64): err= 0: pid=27075: Tue Sep 18 15:18:42 2018    
   read: IOPS=147k, BW=1150MiB/s (1206MB/s)(202GiB/180004msec)    
    slat (usec): min=22, max=97402, avg=2245.77, stdev=2561.30    
    clat (nsec): min=1597, max=97414k, avg=2299719.84, stdev=2622137.64    
     lat (usec): min=37, max=121819, avg=4545.57, stdev=3688.31    
    clat percentiles (usec):    
     |  1.00th=[  529],  5.00th=[ 1287], 10.00th=[ 1516], 20.00th=[ 1680],    
     | 30.00th=[ 1778], 40.00th=[ 1844], 50.00th=[ 1909], 60.00th=[ 1975],    
     | 70.00th=[ 2040], 80.00th=[ 2114], 90.00th=[ 2212], 95.00th=[ 2376],    
     | 99.00th=[16712], 99.50th=[19792], 99.90th=[28181], 99.95th=[32375],    
     | 99.99th=[47973]    
   bw (  KiB/s): min= 6822, max=34080, per=1.53%, avg=18044.54, stdev=2237.76, samples=22979    
   iops        : min=  852, max= 4260, avg=2255.24, stdev=279.74, samples=22979    
  write: IOPS=79.3k, BW=620MiB/s (650MB/s)(109GiB/180004msec)    
    slat (usec): min=24, max=97396, avg=2247.37, stdev=2563.13    
    clat (nsec): min=965, max=97414k, avg=2198816.49, stdev=2518690.40    
     lat (usec): min=40, max=121817, avg=4446.27, stdev=3625.49    
    clat percentiles (usec):    
     |  1.00th=[   17],  5.00th=[  996], 10.00th=[ 1434], 20.00th=[ 1647],    
     | 30.00th=[ 1762], 40.00th=[ 1844], 50.00th=[ 1909], 60.00th=[ 1958],    
     | 70.00th=[ 2024], 80.00th=[ 2089], 90.00th=[ 2180], 95.00th=[ 2278],    
     | 99.00th=[16188], 99.50th=[19268], 99.90th=[27132], 99.95th=[31327],    
     | 99.99th=[46400]    
   bw (  KiB/s): min= 3073, max=18400, per=1.53%, avg=9720.67, stdev=1243.91, samples=22979    
   iops        : min=  384, max= 2300, avg=1214.75, stdev=155.52, samples=22979    
  lat (nsec)   : 1000=0.01%    
  lat (usec)   : 2=0.01%, 4=0.09%, 10=0.06%, 20=0.50%, 50=0.28%    
  lat (usec)   : 100=0.13%, 250=0.20%, 500=0.54%, 750=0.71%, 1000=0.98%    
  lat (msec)   : 2=61.58%, 4=31.54%, 10=0.78%, 20=2.16%, 50=0.46%    
  lat (msec)   : 100=0.01%    
  cpu          : usr=0.50%, sys=81.14%, ctx=8462673, majf=0, minf=0    
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.1%, 16=103.3%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=26505435,14277757,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=16    
rw-rand-sync-pgsql-ext4: (groupid=2, jobs=64): err= 0: pid=27150: Tue Sep 18 15:18:42 2018    
   read: IOPS=228k, BW=1785MiB/s (1872MB/s)(314GiB/180002msec)    
    clat (nsec): min=945, max=14282k, avg=5002.95, stdev=48977.41    
     lat (nsec): min=1247, max=14282k, avg=5374.56, stdev=49037.52    
    clat percentiles (usec):    
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    4],    
     | 30.00th=[    4], 40.00th=[    4], 50.00th=[    4], 60.00th=[    4],    
     | 70.00th=[    4], 80.00th=[    4], 90.00th=[    5], 95.00th=[    5],    
     | 99.00th=[   12], 99.50th=[   90], 99.90th=[  212], 99.95th=[  586],    
     | 99.99th=[ 2180]    
   bw (  KiB/s): min= 4824, max=77689, per=1.55%, avg=28302.49, stdev=3542.33, samples=22979    
   iops        : min=  603, max= 9711, avg=3537.39, stdev=442.79, samples=22979    
  write: IOPS=123k, BW=961MiB/s (1008MB/s)(169GiB/180002msec)    
    clat (usec): min=2, max=216155, avg=504.28, stdev=1314.99    
     lat (usec): min=2, max=216155, avg=504.72, stdev=1315.00    
    clat percentiles (usec):    
     |  1.00th=[    8],  5.00th=[   77], 10.00th=[  285], 20.00th=[  392],    
     | 30.00th=[  408], 40.00th=[  420], 50.00th=[  469], 60.00th=[  519],    
     | 70.00th=[  545], 80.00th=[  562], 90.00th=[  586], 95.00th=[  603],    
     | 99.00th=[  685], 99.50th=[ 2089], 99.90th=[15270], 99.95th=[23462],    
     | 99.99th=[46924]    
   bw (  KiB/s): min= 2661, max=42215, per=1.55%, avg=15240.76, stdev=1812.18, samples=22979    
   iops        : min=  332, max= 5276, avg=1904.70, stdev=226.51, samples=22979    
  lat (nsec)   : 1000=0.01%    
  lat (usec)   : 2=0.01%, 4=56.34%, 10=8.49%, 20=1.04%, 50=0.35%    
  lat (usec)   : 100=0.43%, 250=1.42%, 500=16.05%, 750=15.58%, 1000=0.04%    
  lat (msec)   : 2=0.07%, 4=0.05%, 10=0.07%, 20=0.04%, 50=0.02%    
  lat (msec)   : 100=0.01%, 250=0.01%    
  cpu          : usr=0.93%, sys=89.47%, ctx=5631403, majf=0, minf=0    
  IO depths    : 1=102.3%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=41127965,22146325,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=1    
rw-seq-sync-pgsql-ext4: (groupid=3, jobs=64): err= 0: pid=27221: Tue Sep 18 15:18:42 2018    
   read: IOPS=357k, BW=2786MiB/s (2922MB/s)(490GiB/180001msec)    
    clat (nsec): min=1006, max=20336k, avg=2745.46, stdev=22291.35    
     lat (nsec): min=1307, max=20336k, avg=3110.59, stdev=22301.63    
    clat percentiles (nsec):    
     |  1.00th=[ 1592],  5.00th=[ 1800], 10.00th=[ 1944], 20.00th=[ 2160],    
     | 30.00th=[ 2352], 40.00th=[ 2480], 50.00th=[ 2608], 60.00th=[ 2736],    
     | 70.00th=[ 2864], 80.00th=[ 3024], 90.00th=[ 3216], 95.00th=[ 3440],    
     | 99.00th=[ 4016], 99.50th=[ 8896], 99.90th=[15552], 99.95th=[17280],    
     | 99.99th=[21376]    
   bw (  KiB/s): min=21099, max=151871, per=1.56%, avg=44588.93, stdev=5219.41, samples=22983    
   iops        : min= 2637, max=18983, avg=5573.29, stdev=652.43, samples=22983    
  write: IOPS=192k, BW=1500MiB/s (1573MB/s)(264GiB/180001msec)    
    clat (usec): min=2, max=97210, avg=322.71, stdev=828.04    
     lat (usec): min=2, max=97210, avg=323.13, stdev=828.04    
    clat percentiles (usec):    
     |  1.00th=[    6],  5.00th=[   97], 10.00th=[  182], 20.00th=[  269],    
     | 30.00th=[  297], 40.00th=[  306], 50.00th=[  310], 60.00th=[  314],    
     | 70.00th=[  322], 80.00th=[  326], 90.00th=[  334], 95.00th=[  343],    
     | 99.00th=[  400], 99.50th=[  930], 99.90th=[12911], 99.95th=[19792],    
     | 99.99th=[32113]    
   bw (  KiB/s): min=11433, max=81619, per=1.56%, avg=24008.70, stdev=2707.40, samples=22983    
   iops        : min= 1429, max=10202, avg=3000.77, stdev=338.43, samples=22983    
  lat (usec)   : 2=7.89%, 4=56.48%, 10=1.25%, 20=0.37%, 50=0.25%    
  lat (usec)   : 100=0.57%, 250=4.14%, 500=28.80%, 750=0.06%, 1000=0.03%    
  lat (msec)   : 2=0.05%, 4=0.03%, 10=0.04%, 20=0.03%, 50=0.02%    
  lat (msec)   : 100=0.01%    
  cpu          : usr=1.34%, sys=90.37%, ctx=8645391, majf=0, minf=0    
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=64196338,34566565,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=1    
    
Run status group 0 (all jobs):    
   READ: bw=1140MiB/s (1196MB/s), 1140MiB/s-1140MiB/s (1196MB/s-1196MB/s), io=200GiB (215GB), run=180004-180004msec    
  WRITE: bw=614MiB/s (644MB/s), 614MiB/s-614MiB/s (644MB/s-644MB/s), io=108GiB (116GB), run=180004-180004msec    
    
Run status group 1 (all jobs):    
   READ: bw=1150MiB/s (1206MB/s), 1150MiB/s-1150MiB/s (1206MB/s-1206MB/s), io=202GiB (217GB), run=180004-180004msec    
  WRITE: bw=620MiB/s (650MB/s), 620MiB/s-620MiB/s (650MB/s-650MB/s), io=109GiB (117GB), run=180004-180004msec    
    
Run status group 2 (all jobs):    
   READ: bw=1785MiB/s (1872MB/s), 1785MiB/s-1785MiB/s (1872MB/s-1872MB/s), io=314GiB (337GB), run=180002-180002msec    
  WRITE: bw=961MiB/s (1008MB/s), 961MiB/s-961MiB/s (1008MB/s-1008MB/s), io=169GiB (181GB), run=180002-180002msec    
    
Run status group 3 (all jobs):    
   READ: bw=2786MiB/s (2922MB/s), 2786MiB/s-2786MiB/s (2922MB/s-2922MB/s), io=490GiB (526GB), run=180001-180001msec    
  WRITE: bw=1500MiB/s (1573MB/s), 1500MiB/s-1500MiB/s (1573MB/s-1573MB/s), io=264GiB (283GB), run=180001-180001msec    
    
Disk stats (read/write):    
    dm-0: ios=56484741/52000097, merge=0/0, ticks=55721482/45621907, in_queue=102865767, util=67.65%, aggrios=6930165/6249265, aggrmerge=130427/250746, aggrticks=6648860/4061887, aggrin_queue=10712820, aggrutil=66.53%    
  vdb: ios=6928564/6249448, merge=130186/250641, ticks=6629128/3928950, in_queue=10559055, util=65.93%    
  vdc: ios=6930486/6248991, merge=130413/250950, ticks=6643790/3962244, in_queue=10605847, util=66.02%    
  vdd: ios=6928089/6250855, merge=130732/250764, ticks=6472207/4009640, in_queue=10493342, util=66.18%    
  vde: ios=6929909/6250351, merge=130382/250303, ticks=6661137/4040922, in_queue=10701353, util=66.20%    
  vdf: ios=6932429/6245563, merge=130328/251129, ticks=6681549/4088606, in_queue=10769793, util=66.29%    
  vdg: ios=6930521/6249106, merge=130467/250956, ticks=6697543/4113859, in_queue=10811547, util=66.35%    
  vdh: ios=6930993/6249837, merge=130681/250844, ticks=6698661/4151500, in_queue=10851206, util=66.41%    
  vdi: ios=6930333/6249975, merge=130228/250384, ticks=6706869/4199379, in_queue=10910423, util=66.53%    

2 ESSD

rw-rand-libaio-mysql-ext4: (g=0): rw=randrw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=16    
...    
rw-seq-libaio-mysql-ext4: (g=1): rw=rw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=16    
...    
rw-rand-sync-pgsql-ext4: (g=2): rw=randrw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=sync, iodepth=1    
...    
rw-seq-sync-pgsql-ext4: (g=3): rw=rw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=sync, iodepth=1    
...    
fio-3.1    
Starting 256 threads    
    
rw-rand-libaio-mysql-ext4: (groupid=0, jobs=64): err= 0: pid=21221: Tue Sep 18 15:19:03 2018    
   read: IOPS=112k, BW=879MiB/s (922MB/s)(154GiB/180008msec)    
    slat (usec): min=16, max=119403, avg=1880.80, stdev=2119.90    
    clat (nsec): min=940, max=512641k, avg=4098774.34, stdev=3181162.93    
     lat (usec): min=158, max=514427, avg=5979.64, stdev=3696.51    
    clat percentiles (usec):    
     |  1.00th=[    3],  5.00th=[  486], 10.00th=[  922], 20.00th=[ 1729],    
     | 30.00th=[ 2474], 40.00th=[ 2933], 50.00th=[ 3425], 60.00th=[ 4080],    
     | 70.00th=[ 5014], 80.00th=[ 6128], 90.00th=[ 7832], 95.00th=[ 9503],    
     | 99.00th=[14091], 99.50th=[16450], 99.90th=[23200], 99.95th=[29230],    
     | 99.99th=[61604]    
   bw (  KiB/s): min= 8615, max=31056, per=1.57%, avg=14172.78, stdev=990.53, samples=23040    
   iops        : min= 1076, max= 3882, avg=1771.17, stdev=123.81, samples=23040    
  write: IOPS=60.6k, BW=473MiB/s (496MB/s)(83.2GiB/180008msec)    
    slat (usec): min=17, max=119400, avg=1881.51, stdev=2122.86    
    clat (nsec): min=1019, max=510353k, avg=3892737.90, stdev=3154201.30    
     lat (usec): min=118, max=512897, avg=5774.32, stdev=3688.63    
    clat percentiles (usec):    
     |  1.00th=[    3],  5.00th=[  388], 10.00th=[  799], 20.00th=[ 1500],    
     | 30.00th=[ 2278], 40.00th=[ 2802], 50.00th=[ 3261], 60.00th=[ 3851],    
     | 70.00th=[ 4752], 80.00th=[ 5866], 90.00th=[ 7504], 95.00th=[ 9241],    
     | 99.00th=[13829], 99.50th=[16188], 99.90th=[22676], 99.95th=[28181],    
     | 99.99th=[58459]    
   bw (  KiB/s): min= 4137, max=17712, per=1.57%, avg=7634.48, stdev=590.18, samples=23040    
   iops        : min=  517, max= 2214, avg=953.86, stdev=73.79, samples=23040    
  lat (nsec)   : 1000=0.01%    
  lat (usec)   : 2=0.71%, 4=1.40%, 10=0.17%, 20=0.04%, 50=0.11%    
  lat (usec)   : 100=0.20%, 250=0.80%, 500=2.06%, 750=2.89%, 1000=3.31%    
  lat (msec)   : 2=12.56%, 4=35.84%, 10=36.00%, 20=3.73%, 50=0.17%    
  lat (msec)   : 100=0.01%, 250=0.01%, 750=0.01%    
  cpu          : usr=0.31%, sys=49.41%, ctx=17838909, majf=0, minf=0    
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.1%, 16=103.3%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=20248219,10907423,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=16    
rw-seq-libaio-mysql-ext4: (groupid=1, jobs=64): err= 0: pid=21285: Tue Sep 18 15:19:03 2018    
   read: IOPS=133k, BW=1037MiB/s (1087MB/s)(182GiB/180005msec)    
    slat (usec): min=10, max=40443, avg=861.52, stdev=1493.93    
    clat (nsec): min=953, max=508067k, avg=4239522.11, stdev=2801559.16    
     lat (usec): min=188, max=510200, avg=5101.11, stdev=3015.85    
    clat percentiles (usec):    
     |  1.00th=[  420],  5.00th=[  914], 10.00th=[ 1319], 20.00th=[ 2024],    
     | 30.00th=[ 2671], 40.00th=[ 3261], 50.00th=[ 3851], 60.00th=[ 4490],    
     | 70.00th=[ 5211], 80.00th=[ 6194], 90.00th=[ 7635], 95.00th=[ 8979],    
     | 99.00th=[11731], 99.50th=[12911], 99.90th=[15795], 99.95th=[17171],    
     | 99.99th=[21890]    
   bw (  KiB/s): min=  602, max=22672, per=1.53%, avg=16220.02, stdev=2446.89, samples=23020    
   iops        : min=   75, max= 2834, avg=2027.07, stdev=305.86, samples=23020    
  write: IOPS=71.4k, BW=558MiB/s (585MB/s)(98.1GiB/180005msec)    
    slat (usec): min=12, max=40442, avg=871.28, stdev=1501.73    
    clat (nsec): min=913, max=509492k, avg=3964521.95, stdev=2769947.35    
     lat (usec): min=129, max=511537, avg=4835.88, stdev=2992.88    
    clat percentiles (usec):    
     |  1.00th=[  223],  5.00th=[  791], 10.00th=[ 1172], 20.00th=[ 1827],    
     | 30.00th=[ 2409], 40.00th=[ 2999], 50.00th=[ 3589], 60.00th=[ 4228],    
     | 70.00th=[ 4883], 80.00th=[ 5866], 90.00th=[ 7242], 95.00th=[ 8455],    
     | 99.00th=[11207], 99.50th=[12387], 99.90th=[15008], 99.95th=[16319],    
     | 99.99th=[20317]    
   bw (  KiB/s): min=  309, max=12471, per=1.53%, avg=8733.75, stdev=1342.51, samples=23020    
   iops        : min=   38, max= 1558, avg=1091.29, stdev=167.81, samples=23020    
  lat (nsec)   : 1000=0.01%    
  lat (usec)   : 2=0.31%, 4=0.36%, 10=0.02%, 20=0.01%, 50=0.01%    
  lat (usec)   : 100=0.01%, 250=0.08%, 500=0.82%, 750=2.09%, 1000=2.82%    
  lat (msec)   : 2=14.32%, 4=33.12%, 10=43.55%, 20=2.47%, 50=0.01%    
  lat (msec)   : 100=0.01%, 250=0.01%, 750=0.01%    
  cpu          : usr=0.41%, sys=9.90%, ctx=21948046, majf=0, minf=0    
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.1%, 16=102.1%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=23881029,12859729,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=16    
rw-rand-sync-pgsql-ext4: (groupid=2, jobs=64): err= 0: pid=21355: Tue Sep 18 15:19:03 2018    
   read: IOPS=304k, BW=2374MiB/s (2489MB/s)(417GiB/180001msec)    
    clat (nsec): min=659, max=20344k, avg=4937.11, stdev=53635.32    
     lat (nsec): min=830, max=20344k, avg=5147.67, stdev=53639.04    
    clat percentiles (nsec):    
     |  1.00th=[   1928],  5.00th=[   2096], 10.00th=[   2192],    
     | 20.00th=[   2352], 30.00th=[   2448], 40.00th=[   2544],    
     | 50.00th=[   2640], 60.00th=[   2736], 70.00th=[   2864],    
     | 80.00th=[   2992], 90.00th=[   3248], 95.00th=[   3504],    
     | 99.00th=[   8640], 99.50th=[  16512], 99.90th=[ 585728],    
     | 99.95th=[1122304], 99.99th=[2375680]    
   bw (  KiB/s): min= 8661, max=312564, per=1.57%, avg=38145.72, stdev=5837.53, samples=23040    
   iops        : min= 1082, max=39070, avg=4767.84, stdev=729.68, samples=23040    
  write: IOPS=164k, BW=1278MiB/s (1340MB/s)(225GiB/180001msec)    
    clat (usec): min=2, max=517110, avg=377.98, stdev=1501.97    
     lat (usec): min=2, max=517110, avg=378.25, stdev=1501.97    
    clat percentiles (usec):    
     |  1.00th=[    6],  5.00th=[   99], 10.00th=[  265], 20.00th=[  330],    
     | 30.00th=[  343], 40.00th=[  351], 50.00th=[  355], 60.00th=[  363],    
     | 70.00th=[  371], 80.00th=[  379], 90.00th=[  396], 95.00th=[  416],    
     | 99.00th=[  545], 99.50th=[ 1045], 99.90th=[11469], 99.95th=[16581],    
     | 99.99th=[28967]    
   bw (  KiB/s): min= 4709, max=168193, per=1.57%, avg=20539.14, stdev=3068.53, samples=23040    
   iops        : min=  588, max=21024, avg=2567.01, stdev=383.55, samples=23040    
  lat (nsec)   : 750=0.01%, 1000=0.01%    
  lat (usec)   : 2=1.33%, 4=62.24%, 10=1.91%, 20=0.50%, 50=0.18%    
  lat (usec)   : 100=0.32%, 250=1.63%, 500=31.34%, 750=0.27%, 1000=0.06%    
  lat (msec)   : 2=0.07%, 4=0.04%, 10=0.06%, 20=0.03%, 50=0.01%    
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%    
  cpu          : usr=0.53%, sys=89.73%, ctx=7630423, majf=0, minf=0    
  IO depths    : 1=102.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=54689028,29447045,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=1    
rw-seq-sync-pgsql-ext4: (groupid=3, jobs=64): err= 0: pid=21420: Tue Sep 18 15:19:03 2018    
   read: IOPS=400k, BW=3128MiB/s (3280MB/s)(550GiB/180002msec)    
    clat (nsec): min=655, max=77256k, avg=1908.86, stdev=16100.37    
     lat (nsec): min=818, max=77256k, avg=2119.98, stdev=16102.12    
    clat percentiles (nsec):    
     |  1.00th=[  988],  5.00th=[ 1144], 10.00th=[ 1256], 20.00th=[ 1416],    
     | 30.00th=[ 1576], 40.00th=[ 1704], 50.00th=[ 1816], 60.00th=[ 1928],    
     | 70.00th=[ 2040], 80.00th=[ 2192], 90.00th=[ 2384], 95.00th=[ 2576],    
     | 99.00th=[ 3184], 99.50th=[ 5472], 99.90th=[13504], 99.95th=[15424],    
     | 99.99th=[18304]    
   bw (  KiB/s): min=27335, max=205560, per=1.57%, avg=50224.99, stdev=4880.87, samples=23040    
   iops        : min= 3416, max=25695, avg=6277.74, stdev=610.11, samples=23040    
  write: IOPS=216k, BW=1684MiB/s (1766MB/s)(296GiB/180002msec)    
    clat (nsec): min=1980, max=344807k, avg=289811.56, stdev=750064.44    
     lat (usec): min=2, max=344807, avg=290.08, stdev=750.06    
    clat percentiles (usec):    
     |  1.00th=[    5],  5.00th=[  106], 10.00th=[  194], 20.00th=[  258],    
     | 30.00th=[  273], 40.00th=[  277], 50.00th=[  285], 60.00th=[  289],    
     | 70.00th=[  293], 80.00th=[  297], 90.00th=[  306], 95.00th=[  310],    
     | 99.00th=[  351], 99.50th=[  799], 99.90th=[ 8356], 99.95th=[12387],    
     | 99.99th=[24249]    
   bw (  KiB/s): min=13581, max=112633, per=1.57%, avg=27046.57, stdev=2507.74, samples=23040    
   iops        : min= 1697, max=14079, avg=3380.43, stdev=313.47, samples=23040    
  lat (nsec)   : 750=0.01%, 1000=0.74%    
  lat (usec)   : 2=42.27%, 4=21.80%, 10=1.17%, 20=0.26%, 50=0.13%    
  lat (usec)   : 100=0.33%, 250=4.51%, 500=28.59%, 750=0.03%, 1000=0.02%    
  lat (msec)   : 2=0.05%, 4=0.04%, 10=0.05%, 20=0.02%, 50=0.01%    
  lat (msec)   : 100=0.01%, 500=0.01%    
  cpu          : usr=0.63%, sys=92.37%, ctx=9426487, majf=0, minf=0    
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=72069547,38810440,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=1    
    
Run status group 0 (all jobs):    
   READ: bw=879MiB/s (922MB/s), 879MiB/s-879MiB/s (922MB/s-922MB/s), io=154GiB (166GB), run=180008-180008msec    
  WRITE: bw=473MiB/s (496MB/s), 473MiB/s-473MiB/s (496MB/s-496MB/s), io=83.2GiB (89.4GB), run=180008-180008msec    
    
Run status group 1 (all jobs):    
   READ: bw=1037MiB/s (1087MB/s), 1037MiB/s-1037MiB/s (1087MB/s-1087MB/s), io=182GiB (196GB), run=180005-180005msec    
  WRITE: bw=558MiB/s (585MB/s), 558MiB/s-558MiB/s (585MB/s-585MB/s), io=98.1GiB (105GB), run=180005-180005msec    
    
Run status group 2 (all jobs):    
   READ: bw=2374MiB/s (2489MB/s), 2374MiB/s-2374MiB/s (2489MB/s-2489MB/s), io=417GiB (448GB), run=180001-180001msec    
  WRITE: bw=1278MiB/s (1340MB/s), 1278MiB/s-1278MiB/s (1340MB/s-1340MB/s), io=225GiB (241GB), run=180001-180001msec    
    
Run status group 3 (all jobs):    
   READ: bw=3128MiB/s (3280MB/s), 3128MiB/s-3128MiB/s (3280MB/s-3280MB/s), io=550GiB (590GB), run=180002-180002msec    
  WRITE: bw=1684MiB/s (1766MB/s), 1684MiB/s-1684MiB/s (1766MB/s-1766MB/s), io=296GiB (318GB), run=180002-180002msec    
    
Disk stats (read/write):    
  vdb: ios=32558186/26432830, merge=13683969/3967228, ticks=103112470/71513817, in_queue=174719639, util=60.48%    

PostgreSQL 11 测试

1、参数

listen_addresses = '0.0.0.0'      
port = 1921      
max_connections = 2000      
superuser_reserved_connections = 3      
unix_socket_directories = '., /var/run/postgresql, /tmp'      
tcp_keepalives_idle = 60      
tcp_keepalives_interval = 10      
tcp_keepalives_count = 10      
shared_buffers = 64GB      
max_prepared_transactions = 2000      
work_mem = 8MB      
maintenance_work_mem = 2GB      
dynamic_shared_memory_type = posix      
vacuum_cost_delay = 0      
bgwriter_delay = 10ms      
bgwriter_lru_maxpages = 1000      
bgwriter_lru_multiplier = 10.0      
effective_io_concurrency = 0      
max_worker_processes = 128      
max_parallel_maintenance_workers = 64      
max_parallel_workers_per_gather = 0      
parallel_leader_participation = on      
max_parallel_workers = 64      
wal_level = minimal        
synchronous_commit = off      
wal_writer_delay = 10ms      
checkpoint_timeout = 35min      
max_wal_size = 128GB      
min_wal_size = 32GB      
checkpoint_completion_target = 0.1      
max_wal_senders = 0      
effective_cache_size = 400GB      
log_destination = 'csvlog'      
logging_collector = on      
log_directory = 'log'      
log_filename = 'postgresql-%a.log'      
log_truncate_on_rotation = on      
log_rotation_age = 1d      
log_rotation_size = 0      
log_checkpoints = on       
log_connections = on      
log_disconnections = on      
log_error_verbosity = verbose       
log_line_prefix = '%m [%p] '      
log_timezone = 'PRC'      
log_autovacuum_min_duration = 0      
autovacuum_max_workers = 16      
autovacuum_freeze_max_age = 1200000000      
autovacuum_multixact_freeze_max_age = 1400000000      
autovacuum_vacuum_cost_delay = 0ms      
vacuum_freeze_table_age = 1150000000      
vacuum_multixact_freeze_table_age = 1150000000      
datestyle = 'iso, mdy'      
timezone = 'PRC'      
lc_messages = 'C'      
lc_monetary = 'C'      
lc_numeric = 'C'      
lc_time = 'C'      
default_text_search_config = 'pg_catalog.english'      
jit = off      
cpu_tuple_cost=0.00018884145574257426        
cpu_index_tuple_cost = 0.00433497085216479990        
cpu_operator_cost = 0.00216748542608239995        
seq_page_cost=0.014329        
random_page_cost = 0.016     
parallel_tuple_cost = 0      
parallel_setup_cost = 0     
min_parallel_table_scan_size = 0    
min_parallel_index_scan_size = 0    

1000W tpcc 测试

16072 * 60 = 96.4万 tpmC    

详细结果

......    
[ 2993s ] thds: 64 tps: 15107.81 qps: 431171.53 (r/w/o: 196624.50/204331.41/30215.62) lat (ms,95%): 10.65 err/s 51.00 reconn/s: 0.00    
[ 2994s ] thds: 64 tps: 15454.20 qps: 434439.71 (r/w/o: 198195.61/205335.70/30908.41) lat (ms,95%): 10.46 err/s 77.00 reconn/s: 0.00    
[ 2995s ] thds: 64 tps: 15480.57 qps: 438798.81 (r/w/o: 200298.44/207538.23/30962.14) lat (ms,95%): 10.46 err/s 72.00 reconn/s: 0.00    
[ 2996s ] thds: 64 tps: 15341.97 qps: 434496.22 (r/w/o: 198027.64/205784.63/30683.94) lat (ms,95%): 10.65 err/s 73.00 reconn/s: 0.00    
[ 2997s ] thds: 64 tps: 15208.54 qps: 433973.96 (r/w/o: 197975.05/205581.82/30417.08) lat (ms,95%): 10.65 err/s 75.01 reconn/s: 0.00    
[ 2998s ] thds: 64 tps: 15300.14 qps: 431763.76 (r/w/o: 196862.95/204300.53/30600.28) lat (ms,95%): 10.65 err/s 84.00 reconn/s: 0.00    
[ 2999s ] thds: 64 tps: 15108.49 qps: 426253.35 (r/w/o: 194171.57/201864.80/30216.98) lat (ms,95%): 10.65 err/s 56.99 reconn/s: 0.00    
[ 3000s ] thds: 64 tps: 15046.89 qps: 428187.32 (r/w/o: 195463.56/202631.98/30091.78) lat (ms,95%): 10.84 err/s 70.00 reconn/s: 0.00    
SQL statistics:    
    queries performed:    
        read:                            625427855    
        write:                           649118720    
        other:                           96478628    
        total:                           1371025203    
    transactions:                        48223282 (16072.47 per sec.)    
    queries:                             1371025203 (456952.75 per sec.)    
    ignored errors:                      210005 (69.99 per sec.)    
    reconnects:                          0      (0.00 per sec.)    
    
General statistics:    
    total time:                          3000.3636s    
    total number of events:              48223282    
    
Latency (ms):    
         min:                                    0.28    
         avg:                                    3.98    
         max:                                  912.95    
         95th percentile:                        9.91    
         sum:                            191859179.61    
    
Threads fairness:    
    events (avg/stddev):           753488.7812/3072.19    
    execution time (avg/stddev):   2997.7997/0.05    

1000亿 tpcb 测试

1、生成1000亿数据

nohup pgbench -i -s 1000000 -I dtg >./pgbench_ins.log 2>&1 &    
    
    
99998900000 of 100000000000 tuples (99%) done (elapsed 93180.83 s, remaining 1.03 s)    
99999000000 of 100000000000 tuples (99%) done (elapsed 93181.05 s, remaining 0.93 s)    
99999100000 of 100000000000 tuples (99%) done (elapsed 93181.13 s, remaining 0.84 s)    
99999200000 of 100000000000 tuples (99%) done (elapsed 93181.21 s, remaining 0.75 s)    
99999300000 of 100000000000 tuples (99%) done (elapsed 93181.30 s, remaining 0.65 s)    
99999400000 of 100000000000 tuples (99%) done (elapsed 93182.01 s, remaining 0.56 s)    
99999500000 of 100000000000 tuples (99%) done (elapsed 93182.09 s, remaining 0.47 s)    
99999600000 of 100000000000 tuples (99%) done (elapsed 93182.17 s, remaining 0.37 s)    
99999700000 of 100000000000 tuples (99%) done (elapsed 93182.25 s, remaining 0.28 s)    
99999800000 of 100000000000 tuples (99%) done (elapsed 93182.33 s, remaining 0.19 s)    
99999900000 of 100000000000 tuples (99%) done (elapsed 93182.42 s, remaining 0.09 s)    
100000000000 of 100000000000 tuples (100%) done (elapsed 93182.50 s, remaining 0.00 s)    
done.    

生成1000亿数据耗时: 93182 秒。 (约25小时 52分钟。)

2、给1000亿的单表创建索引(64 parallel)

postgres=# analyze pgbench_accounts ;    
ANALYZE    
postgres=# alter table pgbench_accounts set (parallel_workers =64);    
ALTER TABLE    
nohup pgbench -i -s 1000000 -I p > ./pk.log 2>&1 &    

1000亿单表创建索引耗时: 10小时 50分钟。

1000亿单表、索引容量大小

1000亿单表:12.5 TB。

1000亿单表索引: 2 TB。

postgres=# \di+ pgbench*  
                                      List of relations  
 Schema |         Name          | Type  |  Owner   |      Table       |  Size   | Description   
--------+-----------------------+-------+----------+------------------+---------+-------------  
 public | pgbench_accounts_pkey | index | postgres | pgbench_accounts | 2092 GB |   
 public | pgbench_branches_pkey | index | postgres | pgbench_branches | 21 MB   |   
 public | pgbench_tellers_pkey  | index | postgres | pgbench_tellers  | 214 MB  |   
(3 rows)  
  
postgres=# \dt+ pgbench*  
                          List of relations  
 Schema |       Name       | Type  |  Owner   |  Size   | Description   
--------+------------------+-------+----------+---------+-------------  
 public | pgbench_accounts | table | postgres | 12 TB   |   
 public | pgbench_branches | table | postgres | 35 MB   |   
 public | pgbench_history  | table | postgres | 0 bytes |   
 public | pgbench_tellers  | table | postgres | 422 MB  |   
(4 rows)  

索引深度

1、1000亿行,INT8类型索引,深度为4(不包括ROOT PAGE)。

postgres=# select * from bt_metap('pgbench_accounts_pkey');  
 magic  | version |   root   | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples   
--------+---------+----------+-------+----------+-----------+-------------+-------------------------  
 340322 |       3 | 23149704 |     4 | 23149704 |         4 |           0 |                      -1  
(1 row)  

2、索引查询,索引需要搜索5个BLOCK。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from pgbench_accounts where aid=10000000;  
                                                                   QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------  
 Index Scan using pgbench_accounts_pkey on public.pgbench_accounts  (cost=0.62..0.66 rows=1 width=101) (actual time=0.020..0.021 rows=1 loops=1)  
   Output: aid, bid, abalance, filler  
   Index Cond: (pgbench_accounts.aid = 10000000)  
   Buffers: shared hit=6  -- 5个index block, 1个heap block  
 Planning Time: 0.049 ms  
 Execution Time: 0.033 ms  
(6 rows)  

《深入浅出PostgreSQL B-Tree索引结构》

tpcb 1000亿 性能测试

使用高斯分布,生成测试数据。

《生成泊松、高斯、指数、随机分布数据 - PostgreSQL 9.5 new feature - pgbench improve, gaussian (standard normal) & exponential distribution》

只读

vi test.sql  
  
\set aid random_gaussian(1, :range, 10.0)  
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

读写

vi rw.sql  
  
\set aid random_gaussian(1, :range, 10.0)  
\set bid random(1, 1 * :scale)  
\set tid random(1, 10 * :scale)  
\set delta random(-5000, 5000)  
BEGIN;  
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
END;  

tpcb 1000亿 只读测试

1、活跃数据10亿

QPS: 998818

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 360 -D range=1000000000  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 359606534  
latency average = 0.064 ms  
latency stddev = 0.046 ms  
tps = 998777.462686 (including connections establishing)  
tps = 998818.121681 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set aid random_gaussian(1, :range, 10.0)  
         0.062  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

2、活跃数据100亿

QPS: 597877

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 360 -D range=10000000000  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 215257932  
latency average = 0.107 ms  
latency stddev = 0.526 ms  
tps = 597861.125133 (including connections establishing)  
tps = 597877.469245 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set aid random_gaussian(1, :range, 10.0)  
         0.105  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

前期IO

Total DISK READ :       2.32 G/s | Total DISK WRITE :       0.00 B/s  
Actual DISK READ:       2.32 G/s | Actual DISK WRITE:       0.00 B/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
46798 be/4 postgres   31.72 M/s    0.00 B/s  0.00 % 72.60 % postgres: postgres postgres [local] BIND    
46774 be/4 postgres   37.42 M/s    0.00 B/s  0.00 % 71.91 % postgres: postgres postgres [local] SELECT  
46792 be/4 postgres   35.54 M/s    0.00 B/s  0.00 % 71.89 % postgres: postgres postgres [local] idle    
46708 be/4 postgres   35.08 M/s    0.00 B/s  0.00 % 71.59 % postgres: postgres postgres [local] SELECT  
46730 be/4 postgres   46.84 M/s    0.00 B/s  0.00 % 70.99 % postgres: postgres postgres [local] SELECT  
46704 be/4 postgres   34.51 M/s    0.00 B/s  0.00 % 70.84 % postgres: postgres postgres [local] SELECT  
46716 be/4 postgres   46.05 M/s    0.00 B/s  0.00 % 70.84 % postgres: postgres postgres [local] SELECT  
46788 be/4 postgres   33.83 M/s    0.00 B/s  0.00 % 70.84 % postgres: postgres postgres [local] SELECT  
46807 be/4 postgres   33.78 M/s    0.00 B/s  0.00 % 70.41 % postgres: postgres postgres [local] SELECT  
46815 be/4 postgres   35.21 M/s    0.00 B/s  0.00 % 70.33 % postgres: postgres postgres [local] SELECT  
46812 be/4 postgres   45.95 M/s    0.00 B/s  0.00 % 70.18 % postgres: postgres postgres [local] SELECT  
46752 be/4 postgres   34.21 M/s    0.00 B/s  0.00 % 70.09 % postgres: postgres postgres [local] SELECT  

加热后IO

... ...    
  
Total DISK READ :     527.32 M/s | Total DISK WRITE :       0.00 B/s  
Actual DISK READ:     527.24 M/s | Actual DISK WRITE:      30.77 K/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
47108 be/4 postgres   11.77 M/s    0.00 B/s  0.00 %  4.71 % postgres: postgres postgres [local] SELECT  
47025 be/4 postgres   10.55 M/s    0.00 B/s  0.00 %  4.39 % postgres: postgres postgres [local] SELECT  
47115 be/4 postgres    9.28 M/s    0.00 B/s  0.00 %  4.30 % postgres: postgres postgres [local] SELECT  
47061 be/4 postgres   13.27 M/s    0.00 B/s  0.00 %  4.23 % postgres: postgres postgres [local] SELECT  
47082 be/4 postgres   10.49 M/s    0.00 B/s  0.00 %  4.21 % postgres: postgres postgres [local] SELECT  
47111 be/4 postgres    6.54 M/s    0.00 B/s  0.00 %  4.18 % postgres: postgres postgres [local] idle    
47071 be/4 postgres    6.46 M/s    0.00 B/s  0.00 %  4.15 % postgres: postgres postgres [local] idle    
47018 be/4 postgres    9.13 M/s    0.00 B/s  0.00 %  4.11 % postgres: postgres postgres [local] idle    
47087 be/4 postgres    5.77 M/s    0.00 B/s  0.00 %  4.09 % postgres: postgres postgres [local] idle    
47105 be/4 postgres    8.89 M/s    0.00 B/s  0.00 %  4.08 % postgres: postgres postgres [local] BINDCT  
47069 be/4 postgres    8.46 M/s    0.00 B/s  0.00 %  4.05 % postgres: postgres postgres [local] SELECT  
47106 be/4 postgres    8.89 M/s    0.00 B/s  0.00 %  3.91 % postgres: postgres postgres [local] idle    
47053 be/4 postgres    6.48 M/s    0.00 B/s  0.00 %  3.91 % postgres: postgres postgres [local] SELECT  
47028 be/4 postgres    9.71 M/s    0.00 B/s  0.00 %  3.83 % postgres: postgres postgres [local] idle    
47112 be/4 postgres    6.72 M/s    0.00 B/s  0.00 %  3.82 % postgres: postgres postgres [local] SELECT  
47039 be/4 postgres    7.63 M/s    0.00 B/s  0.00 %  3.81 % postgres: postgres postgres [local] BIND    

3、活跃数据500亿

QPS: 66678

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 360 -D range=50000000000  
  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 3600 s  
number of transactions actually processed: 240046184  
latency average = 0.960 ms  
latency stddev = 1.660 ms  
tps = 66678.433880 (including connections establishing)  
tps = 66678.672147 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.001  \set aid random_gaussian(1, :range, 10.0)  
         0.958  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

IO

Total DISK READ :       2.45 G/s | Total DISK WRITE :       0.00 B/s  
Actual DISK READ:       2.45 G/s | Actual DISK WRITE:       0.00 B/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
47230 be/4 postgres   39.87 M/s    0.00 B/s  0.00 % 87.93 % postgres: postgres postgres [local] SELECT  
47218 be/4 postgres   32.12 M/s    0.00 B/s  0.00 % 87.85 % postgres: postgres postgres [local] SELECT  
47196 be/4 postgres   38.54 M/s    0.00 B/s  0.00 % 87.77 % postgres: postgres postgres [local] SELECT  
47250 be/4 postgres   32.52 M/s    0.00 B/s  0.00 % 87.73 % postgres: postgres postgres [local] SELECT  
47210 be/4 postgres   35.25 M/s    0.00 B/s  0.00 % 87.64 % postgres: postgres postgres [local] SELECT  
47173 be/4 postgres   35.29 M/s    0.00 B/s  0.00 % 87.63 % postgres: postgres postgres [local] SELECT  
47220 be/4 postgres   36.14 M/s    0.00 B/s  0.00 % 87.63 % postgres: postgres postgres [local] SELECT  
47243 be/4 postgres   44.79 M/s    0.00 B/s  0.00 % 87.61 % postgres: postgres postgres [local] SELECT  
47149 be/4 postgres   48.33 M/s    0.00 B/s  0.00 % 87.55 % postgres: postgres postgres [local] SELECT  
47245 be/4 postgres   44.83 M/s    0.00 B/s  0.00 % 87.54 % postgres: postgres postgres [local] SELECT  
47254 be/4 postgres   29.74 M/s    0.00 B/s  0.00 % 87.53 % postgres: postgres postgres [local] SELECT  
47253 be/4 postgres   41.24 M/s    0.00 B/s  0.00 % 87.50 % postgres: postgres postgres [local] SELECT  
47162 be/4 postgres   30.31 M/s    0.00 B/s  0.00 % 87.50 % postgres: postgres postgres [local] SELECT  
47229 be/4 postgres   29.40 M/s    0.00 B/s  0.00 % 87.50 % postgres: postgres postgres [local] SELECT  
47234 be/4 postgres   37.08 M/s    0.00 B/s  0.00 % 87.50 % postgres: postgres postgres [local] SELECT  
47242 be/4 postgres   40.28 M/s    0.00 B/s  0.00 % 87.46 % postgres: postgres postgres [local] SELECT  
47186 be/4 postgres   36.05 M/s    0.00 B/s  0.00 % 87.44 % postgres: postgres postgres [local] SELECT  
47165 be/4 postgres   33.66 M/s    0.00 B/s  0.00 % 87.43 % postgres: postgres postgres [local] SELECT  

4、活跃数据1000亿

QPS: 67295

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 360 -D range=100000000000  
  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 3600 s  
number of transactions actually processed: 242265704  
latency average = 0.951 ms  
latency stddev = 2.313 ms  
tps = 67295.523254 (including connections establishing)  
tps = 67295.778158 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set aid random_gaussian(1, :range, 10.0)  
         0.949  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

IO

Total DISK READ :       2.24 G/s | Total DISK WRITE :       0.00 B/s  
Actual DISK READ:       2.24 G/s | Actual DISK WRITE:      54.79 K/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
47932 be/4 postgres   33.54 M/s    0.00 B/s  0.00 % 88.36 % postgres: postgres postgres [local] SELECT  
48010 be/4 postgres   33.41 M/s    0.00 B/s  0.00 % 88.27 % postgres: postgres postgres [local] SELECT  
48021 be/4 postgres   34.10 M/s    0.00 B/s  0.00 % 88.21 % postgres: postgres postgres [local] SELECT  
48049 be/4 postgres   32.14 M/s    0.00 B/s  0.00 % 88.20 % postgres: postgres postgres [local] SELECT  
48048 be/4 postgres   33.34 M/s    0.00 B/s  0.00 % 88.18 % postgres: postgres postgres [local] SELECT  
47988 be/4 postgres   31.79 M/s    0.00 B/s  0.00 % 88.11 % postgres: postgres postgres [local] SELECT  
48007 be/4 postgres   26.25 M/s    0.00 B/s  0.00 % 88.07 % postgres: postgres postgres [local] SELECT  
48013 be/4 postgres   35.37 M/s    0.00 B/s  0.00 % 88.07 % postgres: postgres postgres [local] SELECT  
47949 be/4 postgres   36.25 M/s    0.00 B/s  0.00 % 88.04 % postgres: postgres postgres [local] SELECT  
47979 be/4 postgres   44.90 M/s    0.00 B/s  0.00 % 88.02 % postgres: postgres postgres [local] SELECT  
48047 be/4 postgres   39.64 M/s    0.00 B/s  0.00 % 87.97 % postgres: postgres postgres [local] SELECT  
48038 be/4 postgres   39.24 M/s    0.00 B/s  0.00 % 87.93 % postgres: postgres postgres [local] SELECT  
48034 be/4 postgres   38.02 M/s    0.00 B/s  0.00 % 87.89 % postgres: postgres postgres [local] SELECT  
48019 be/4 postgres   35.99 M/s    0.00 B/s  0.00 % 87.88 % postgres: postgres postgres [local] SELECT  
48046 be/4 postgres   32.00 M/s    0.00 B/s  0.00 % 87.88 % postgres: postgres postgres [local] SELECT  

tpcb 1000亿 读写测试

1、活跃数据10亿

TPS: 95119

QPS: 475595

pgbench -M prepared -n -r -P 1 -f ./rw.sql -c 64 -j 64 -T 360 -D scale=1000000 -D range=1000000000   
  
transaction type: ./rw.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 34244287  
latency average = 0.673 ms  
latency stddev = 0.394 ms  
tps = 95116.186279 (including connections establishing)  
tps = 95119.886927 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.003  \set aid random_gaussian(1, :range, 10.0)  
         0.001  \set bid random(1, 1 * :scale)  
         0.001  \set tid random(1, 10 * :scale)  
         0.001  \set delta random(-5000, 5000)  
         0.046  BEGIN;  
         0.133  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.077  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         0.104  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         0.088  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         0.074  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         0.146  END;  

2、活跃数据100亿

TPS: 85278

QPS: 426390

pgbench -M prepared -n -r -P 1 -f ./rw.sql -c 64 -j 64 -T 360 -D scale=1000000 -D range=10000000000  
  
transaction type: ./rw.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 30702466  
latency average = 0.750 ms  
latency stddev = 1.518 ms  
tps = 85275.759706 (including connections establishing)  
tps = 85278.402619 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.003  \set aid random_gaussian(1, :range, 10.0)  
         0.001  \set bid random(1, 1 * :scale)  
         0.001  \set tid random(1, 10 * :scale)  
         0.001  \set delta random(-5000, 5000)  
         0.047  BEGIN;  
         0.193  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.082  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         0.108  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         0.093  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         0.078  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         0.144  END;  

IO

Total DISK READ :     124.77 M/s | Total DISK WRITE :     846.78 M/s  
Actual DISK READ:     124.01 M/s | Actual DISK WRITE:     820.10 M/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
31430 be/4 postgres   27.03 K/s   44.03 M/s  0.00 % 22.13 % postgres: walwriter  
49767 be/4 postgres 1629.40 K/s    8.68 M/s  0.00 %  3.03 % postgres: postgres postgres [local] UPDATE  
49771 be/4 postgres    2.73 M/s    8.63 M/s  0.00 %  2.34 % postgres: postgres postgres [local] UPDATE               
49742 be/4 postgres    2.77 M/s    8.26 M/s  0.00 %  2.31 % postgres: postgres postgres [local] UPDATE               
49787 be/4 postgres 1343.68 K/s    9.27 M/s  0.00 %  2.29 % postgres: postgres postgres [local] UPDATE               
49785 be/4 postgres    3.05 M/s    8.64 M/s  0.00 %  2.27 % postgres: postgres postgres [local] UPDATE               
49776 be/4 postgres 1783.85 K/s   11.64 M/s  0.00 %  2.05 % postgres: postgres postgres [local] UPDATE               
49774 be/4 postgres    4.22 M/s    8.13 M/s  0.00 %  2.05 % postgres: postgres postgres [local] UPDATE  
49775 be/4 postgres  671.84 K/s    8.79 M/s  0.00 %  2.04 % postgres: postgres postgres [local] UPDATE               
49786 be/4 postgres 1220.12 K/s    8.15 M/s  0.00 %  2.04 % postgres: postgres postgres [local] UPDATE               
49772 be/4 postgres 1003.90 K/s    8.77 M/s  0.00 %  2.02 % postgres: postgres postgres [local] UPDATE               
49697 be/4 postgres    2.56 M/s    8.69 M/s  0.00 %  2.01 % postgres: postgres postgres [local] UPDATE  
49803 be/4 postgres  733.62 K/s    8.22 M/s  0.00 %  2.00 % postgres: postgres postgres [local] UPDATE  
49806 be/4 postgres    2.84 M/s   10.23 M/s  0.00 %  1.99 % postgres: postgres postgres [local] UPDATE               
49804 be/4 postgres 1783.85 K/s    8.60 M/s  0.00 %  1.98 % postgres: postgres postgres [local] UPDATE               
49766 be/4 postgres  478.78 K/s   14.62 M/s  0.00 %  1.97 % postgres: postgres postgres [local] UPDATE               
49770 be/4 postgres    2.29 M/s    8.82 M/s  0.00 %  1.96 % postgres: postgres postgres [local] UPDATE  
49715 be/4 postgres    3.20 M/s    8.37 M/s  0.00 %  1.96 % postgres: postgres postgres [local] UPDATE               
49810 be/4 postgres    3.35 M/s    9.68 M/s  0.00 %  1.93 % postgres: postgres postgres [local] UPDATE  
49780 be/4 postgres    2.56 M/s    8.22 M/s  0.00 %  1.92 % postgres: postgres postgres [local] UPDATE               
49784 be/4 postgres    3.39 M/s    8.35 M/s  0.00 %  1.92 % postgres: postgres postgres [local] UPDATE               
49734 be/4 postgres    2.31 M/s    8.66 M/s  0.00 %  1.91 % postgres: postgres postgres [local] UPDATE               
49800 be/4 postgres 2023.24 K/s    8.55 M/s  0.00 %  1.90 % postgres: postgres postgres [local] UPDATE  
49794 be/4 postgres 1629.40 K/s    8.88 M/s  0.00 %  1.85 % postgres: postgres postgres [local] UPDATE               
49765 be/4 postgres    3.24 M/s    8.67 M/s  0.00 %  1.80 % postgres: postgres postgres [local] UPDATE               
49724 be/4 postgres 1173.79 K/s   10.08 M/s  0.00 %  1.79 % postgres: postgres postgres [local] UPDATE               
49728 be/4 postgres 1413.18 K/s    8.63 M/s  0.00 %  1.78 % postgres: postgres postgres [local] UPDATE  
49781 be/4 postgres 1436.34 K/s    7.93 M/s  0.00 %  1.76 % postgres: postgres postgres [local] UPDATE               
49790 be/4 postgres 1096.56 K/s    8.60 M/s  0.00 %  1.76 % postgres: postgres postgres [local] UPDATE               

3、活跃数据500亿

TPS: 38301

QPS: 191505

pgbench -M prepared -n -r -P 1 -f ./rw.sql -c 64 -j 64 -T 360 -D scale=1000000 -D range=50000000000   
  
transaction type: ./rw.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 13790704  
latency average = 1.671 ms  
latency stddev = 2.620 ms  
tps = 38299.935890 (including connections establishing)  
tps = 38301.102322 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set aid random_gaussian(1, :range, 10.0)  
         0.001  \set bid random(1, 1 * :scale)  
         0.001  \set tid random(1, 10 * :scale)  
         0.001  \set delta random(-5000, 5000)  
         0.031  BEGIN;  
         1.274  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.068  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         0.103  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         0.076  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         0.058  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         0.056  END;  

IO

Total DISK READ :    1508.55 M/s | Total DISK WRITE :     618.92 M/s  
Actual DISK READ:    1507.72 M/s | Actual DISK WRITE:     450.33 M/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
49510 be/4 postgres   21.45 M/s    5.40 M/s  0.00 % 61.29 % postgres: postgres postgres [local] UPDATE               
49507 be/4 postgres   26.85 M/s    5.61 M/s  0.00 % 60.79 % postgres: postgres postgres [local] UPDATE               
49456 be/4 postgres   28.59 M/s    5.67 M/s  0.00 % 60.42 % postgres: postgres postgres [local] UPDATE               
49436 be/4 postgres   23.76 M/s    5.23 M/s  0.00 % 60.31 % postgres: postgres postgres [local] UPDATE               
49516 be/4 postgres   21.82 M/s    5.40 M/s  0.00 % 59.84 % postgres: postgres postgres [local] UPDATE               
49414 be/4 postgres   20.77 M/s    5.44 M/s  0.00 % 59.84 % postgres: postgres postgres [local] UPDATE               
49503 be/4 postgres   20.13 M/s    6.15 M/s  0.00 % 59.81 % postgres: postgres postgres [local] UPDATE               
49410 be/4 postgres   29.23 M/s    5.52 M/s  0.00 % 59.73 % postgres: postgres postgres [local] UPDATE               
49427 be/4 postgres   18.61 M/s    5.18 M/s  0.00 % 59.71 % postgres: postgres postgres [local] idle in transaction  
49501 be/4 postgres   17.22 M/s    5.60 M/s  0.00 % 59.70 % postgres: postgres postgres [local] UPDATE               
49493 be/4 postgres   24.60 M/s    7.22 M/s  0.00 % 59.66 % postgres: postgres postgres [local] UPDATE               
49512 be/4 postgres   23.08 M/s    5.53 M/s  0.00 % 59.65 % postgres: postgres postgres [local] UPDATE               
49509 be/4 postgres   24.04 M/s    5.64 M/s  0.00 % 59.55 % postgres: postgres postgres [local] UPDATE               
49490 be/4 postgres   17.89 M/s    5.62 M/s  0.00 % 59.55 % postgres: postgres postgres [local] UPDATE             

4、活跃数据1000亿

TPS: 35189

QPS: 175945

pgbench -M prepared -n -r -P 1 -f ./rw.sql -c 64 -j 64 -T 360 -D scale=1000000 -D range=100000000000   
  
  
transaction type: ./rw.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 12670591  
latency average = 1.818 ms  
latency stddev = 3.928 ms  
tps = 35188.224787 (including connections establishing)  
tps = 35189.625697 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.003  \set aid random_gaussian(1, :range, 10.0)  
         0.001  \set bid random(1, 1 * :scale)  
         0.001  \set tid random(1, 10 * :scale)  
         0.001  \set delta random(-5000, 5000)  
         0.032  BEGIN;  
         1.392  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.072  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         0.112  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         0.086  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         0.061  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         0.059  END;  

IO

Total DISK READ :    1824.52 M/s | Total DISK WRITE :     241.53 M/s  
Actual DISK READ:    1821.44 M/s | Actual DISK WRITE:     237.31 M/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
48918 be/4 postgres   27.19 M/s    0.00 B/s  0.00 % 67.37 % postgres: postgres postgres [local] UPDATE  
48893 be/4 postgres   32.32 M/s    0.00 B/s  0.00 % 67.29 % postgres: postgres postgres [local] UPDATE               
48914 be/4 postgres   23.47 M/s    0.00 B/s  0.00 % 67.29 % postgres: postgres postgres [local] UPDATE  
48889 be/4 postgres   33.45 M/s    0.00 B/s  0.00 % 67.23 % postgres: postgres postgres [local] UPDATE               
48904 be/4 postgres   34.80 M/s    0.00 B/s  0.00 % 67.21 % postgres: postgres postgres [local] UPDATE  
48861 be/4 postgres   30.88 M/s    7.81 K/s  0.00 % 67.06 % postgres: postgres postgres [local] UPDATE  
48910 be/4 postgres   27.16 M/s  328.11 K/s  0.00 % 67.02 % postgres: postgres postgres [local] UPDATE               
48821 be/4 postgres   27.54 M/s    7.81 K/s  0.00 % 67.01 % postgres: postgres postgres [local] UPDATE  
48825 be/4 postgres   35.88 M/s    0.00 B/s  0.00 % 66.89 % postgres: postgres postgres [local] UPDATE               
48930 be/4 postgres   31.68 M/s    7.81 K/s  0.00 % 66.82 % postgres: postgres postgres [local] UPDATE  
48867 be/4 postgres   26.99 M/s    7.81 K/s  0.00 % 66.81 % postgres: postgres postgres [local] idle in transaction  
48929 be/4 postgres   25.61 M/s    7.81 K/s  0.00 % 66.77 % postgres: postgres postgres [local] UPDATE  
48894 be/4 postgres   24.08 M/s    0.00 B/s  0.00 % 66.67 % postgres: postgres postgres [local] UPDATE               
48921 be/4 postgres   32.90 M/s  640.60 K/s  0.00 % 66.66 % postgres: postgres postgres [local] UPDATE  
48925 be/4 postgres   27.30 M/s    0.00 B/s  0.00 % 66.63 % postgres: postgres postgres [local] UPDATE  
48829 be/4 postgres   24.85 M/s    0.00 B/s  0.00 % 66.63 % postgres: postgres postgres [local] idle                 
48901 be/4 postgres   29.57 M/s    0.00 B/s  0.00 % 66.62 % postgres: postgres postgres [local] UPDATE  
48927 be/4 postgres   24.33 M/s    7.81 K/s  0.00 % 66.59 % postgres: postgres postgres [local] UPDATE  
48933 be/4 postgres   27.85 M/s    7.81 K/s  0.00 % 66.57 % postgres: postgres postgres [local] BINDTE  
48890 be/4 postgres   27.79 M/s    0.00 B/s  0.00 % 66.56 % postgres: postgres postgres [local] UPDATE  
48931 be/4 postgres   30.29 M/s    0.00 B/s  0.00 % 66.55 % postgres: postgres postgres [local] UPDATE  

其他测试

同步多副本环境

《PostgreSQL 一主多从(多副本,强同步)简明手册 - 配置、压测、监控、切换、防脑裂、修复、0丢失 - 珍藏级》

其中一个备库使用zfs存储,开启lz4压缩,提供闪回,备份能力。

创建备库,单个备库的创建速度约500MB/s,15TB的库,需要9个半小时创建完成。

[root@pg11 ~]# dstat
You did not select any stats, using -cdngy by default.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  4   7  85   4   0   0| 289M  164M|   0     0 |   0     0 |  41k   80k
  0   1  99   0   0   0| 508M    0 |1218k 1007M|   0     0 |  40k 1559 
  0   1  99   0   0   0| 500M    0 |1233k 1004M|   0     0 |  41k 1673 
  0   1  99   0   0   0| 492M    0 |1206k  994M|   0     0 |  40k 1576 
  0   1  99   0   0   0| 508M    0 |1245k 1015M|   0     0 |  41k 1601 
  0   1  99   0   0   0| 516M    0 |1257k 1021M|   0     0 |  42k 1576 
  0   2  98   0   0   0| 520M    0 |1300k 1044M|   0     0 |  44k 1891 

[root@pg11 ~]# top -c -u postgres

top - 15:09:33 up 2 days,  4:48,  2 users,  load average: 0.41, 3.98, 22.70
Tasks: 516 total,   2 running, 514 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.8 sy,  0.0 ni, 99.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 52819500+total, 29681988 free,  3079916 used, 49543308+buff/cache
KiB Swap:        0 total,        0 free,        0 used. 52057548+avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                  
50168 postgres  20   0 66.436g  14264  12248 R  35.1  0.0   0:37.57 postgres: walsender postgres 172.17.20.28(48412) sending backup "pg_basebackup base backup"                                                                              
50176 postgres  20   0 66.436g  20416  18400 S  24.2  0.0   0:18.35 postgres: walsender postgres 172.17.20.29(65032) sending backup "pg_basebackup base backup"                                                                              
50154 postgres  20   0 66.433g 1.979g 1.978g S   0.0  0.4   0:01.41 /usr/pgsql-11/bin/postgres                                                                                                                                               
50155 postgres  20   0  245148   2036    596 S   0.0  0.0   0:00.00 postgres: logger                                                                                                                                                         
50157 postgres  20   0 66.434g 531212 529720 S   0.0  0.1   0:00.43 postgres: checkpointer                                                                                                                                                   
50158 postgres  20   0 66.434g 530592 529120 S   0.0  0.1   0:00.54 postgres: background writer                                                                                                                                              
50159 postgres  20   0 66.433g 526884 525420 S   0.0  0.1   0:00.28 postgres: walwriter                                                                                                                                                      
50160 postgres  20   0 66.436g   3224   1392 S   0.0  0.0   0:00.00 postgres: autovacuum launcher                                                                                                                                            
50161 postgres  20   0  247404   2252    692 S   0.0  0.0   0:00.00 postgres: stats collector                                                                                                                                                
50162 postgres  20   0 66.436g   2844   1096 S   0.0  0.0   0:00.00 postgres: logical replication launcher                                                                                                                                   
50169 postgres  20   0 66.436g   4124   1944 S   0.0  0.0   0:00.01 postgres: walsender postgres 172.17.20.28(48414) streaming 101/44000140                                                                                                  
50177 postgres  20   0 66.436g   3796   1872 S   0.0  0.0   0:00.00 postgres: walsender postgres 172.17.20.29(65034) streaming 101/44000140                                                                                                  

配置步骤

1、配置pg_hba.conf

host replication all xxx.xxx.xxx.xxx/32 md5

2、创建replication角色用户

create role repxxx login replication encrypted password 'xxx';

3、pg_basebackup拉取数据,注意开启SLOT,否则对于很大的数据库,可能数据备份结束后,WAL已经在主库被清除了。(有WAL归档的情况下,可以不开启SLOT。开启SLOT后,未被备库拉取的WAL不会被清除。)

export PGPASSWORD=xxx

nohup pg_basebackup -F p -D $PGDATA -h 172.17.20.30 -p 1921 -U repxxx -X stream -C -S standby1 >/dev/null 2>&1 &
  
  
nohup pg_basebackup -F p -D $PGDATA -h 172.17.20.30 -p 1921 -U repxxx -X stream -C -S standby2 >/dev/null 2>&1 &

4、配置recovery.conf

recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=172.17.20.30 port=1921 user=repxxx password=xxx'          # e.g. 'host=localhost port=5432'
primary_slot_name = 'standby1'


recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=172.17.20.30 port=1921 user=repxxx password=xxx'          # e.g. 'host=localhost port=5432'
primary_slot_name = 'standby2'

5、启动备库

6、主库配置多副本参数

synchronous_standby_names = 'ANY 1 (*)'
synchronous_commit = remote_write 


postgres=# show synchronous_commit ;
 synchronous_commit 
--------------------
 remote_write
(1 row)

postgres=# show synchronous_standby_names ;
 synchronous_standby_names 
---------------------------
 ANY 1 (*)
(1 row)

7、tpcb 1000亿(活跃10亿) rw 测试, QPS 5.34万。

transaction type: ./rw.sql
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
duration: 360 s
number of transactions actually processed: 19239053
latency average = 1.198 ms
latency stddev = 1.208 ms
tps = 53432.922774 (including connections establishing)
tps = 53435.051257 (excluding connections establishing)
statement latencies in milliseconds:
         0.003  \set aid random_gaussian(1, :range, 10.0)
         0.001  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.001  \set delta random(-5000, 5000)
         0.043  BEGIN;
         0.122  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.068  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.089  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.080  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.064  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         0.726  END;

8、tpcc 1000W , tps : 11000 , tpmC : 66W.

SQL statistics:
    queries performed:
        read:                            513698640
        write:                           533163795
        other:                           79242254
        total:                           1126104689
    transactions:                        39605095 (11000.27 per sec.)
    queries:                             1126104689 (312774.25 per sec.)
    ignored errors:                      172646 (47.95 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          3600.3729s
    total number of events:              39605095

Latency (ms):
         min:                                    0.29
         avg:                                    5.81
         max:                                 1388.75
         95th percentile:                       11.87
         sum:                            230276426.25

Threads fairness:
    events (avg/stddev):           618829.6094/1712.80
    execution time (avg/stddev):   3598.0692/0.06

flashback 闪回

《PostgreSQL 最佳实践 - 块级增量备份(ZFS篇)验证 - recovery test script for zfs snapshot clone + postgresql stream replication + archive》

主备切换

《PostgreSQL 一主多从(多副本,强同步)简明手册 - 配置、压测、监控、切换、防脑裂、修复、0丢失 - 珍藏级》

修复主库, pg_rewind

《PostgreSQL primary-standby failback tools : pg_rewind》

《PostgreSQL 9.5 new feature - pg_rewind fast sync Split Brain Primary & Standby》

《PostgreSQL 9.5 add pg_rewind for Fast align for PostgreSQL unaligned primary & standby》

小结

1、8K fsync IO RT

1 ecs本地ssd

22 us    

2 essd

119 us    

2、8K directIO

1 ecs本地ssd

离散读

IOPS=146k, BW=1140MiB/s (1196MB/s)(200GiB/180004msec)    

离散写

IOPS=78.6k, BW=614MiB/s (644MB/s)(108GiB/180004msec)    

顺序读

IOPS=147k, BW=1150MiB/s (1206MB/s)(202GiB/180004msec)    

顺序写

IOPS=79.3k, BW=620MiB/s (650MB/s)(109GiB/180004m    

2 essd

离散读

IOPS=112k, BW=879MiB/s (922MB/s)(154GiB/180008msec)    

离散写

IOPS=60.6k, BW=473MiB/s (496MB/s)(83.2GiB/180008msec)    

顺序读

IOPS=133k, BW=1037MiB/s (1087MB/s)(182GiB/180005msec)    

顺序写

IOPS=71.4k, BW=558MiB/s (585MB/s)(98.1GiB/180005msec)    

3、tpcc 1000W (ESSD)

96.4万 tpmC

4、tpcb 1000亿 只读 (ESSD)

1、活跃数据10亿

QPS: 998818

2、活跃数据100亿

QPS: 597877

3、活跃数据500亿

QPS: 66678

4、活跃数据1000亿

QPS: 67295

5、tpcb 1000亿 读写 (ESSD)

1、活跃数据10亿

TPS: 95119

QPS: 475595

2、活跃数据100亿

TPS: 85278

QPS: 426390

3、活跃数据500亿

TPS: 38301

QPS: 191505

4、活跃数据1000亿

TPS: 35189

QPS: 175945

性能小结

环境:阿里云 ECS + 32T ESSD

表SIZE: 12.5 TB 写入耗时 25h52min

索引SIZE: 2 TB 创建耗时 10h50min

索引深度: 5级

单表数据量 TEST CASE QPS TPS
10 * 100W tpcc 1000W - 96.4万 tpmC
10 * 100W tpcc 1000W(同步多副本) - 66万 tpmC
1000亿 tpcb 活跃数据10亿 只读 998818 998818
1000亿 tpcb 活跃数据100亿 只读 597877 597877
1000亿 tpcb 活跃数据500亿 只读 66678 66678
1000亿 tpcb 活跃数据1000亿 只读 67295 67295
1000亿 tpcb 活跃数据10亿 读写 475595 95119
1000亿 tpcb 活跃数据10亿 读写(同步多副本) 267160 53432
1000亿 tpcb 活跃数据100亿 读写 426390 85278
1000亿 tpcb 活跃数据500亿 读写 191505 38301
1000亿 tpcb 活跃数据1000亿 读写 175945 35189

阿里云ESSD的引入,结合PostgreSQL企业级开源数据库(良好的性能、可管理海量数据、功能对齐Oracle,不仅ESSD层面提供多副本,同时数据库层面也支持通过quorum based replication多副本提供金融级的可靠性,使用ZFS秒级快照,闪回等企业特性),给企业大容量关系数据库上云提供了便利。

对于PG企业用户,可以选择阿里云PG企业版PPAS,拥有以上所有特性的同时,提供ORACLE兼容性。

参考

《fio测试IO性能》

ECS 本地SSD云盘(8*1.8TB条带)测试:

《PostgreSQL 100亿 tpcb 性能 on ECS》

《PostgreSQL 11 tpcc 测试(103万tpmC on ECS) - use sysbench-tpcc by Percona-Lab》

《(TPC-H测试 SF=10,SF=200) PostgreSQL 11 vs 10 vs Deepgreen》

《PostgreSQL 10 on ECS 实施 流复制备库镜像+自动快照备份+自动备份验证+自动清理备份与归档 - 珍藏级》

https://help.aliyun.com/knowledge_detail/64950.html

Flag Counter

digoal’s 大量PostgreSQL文章入口