PostgreSQL 设计优化case - 大宽表任意字段组合查询索引如何选择(btree, gin, rum) - (含单个索引列数超过32列的方法)

20 minute read

背景

大宽表,任意字段组合查询,透视。是实时分析系统中的常见需求:

1、实时写入。

2、实时任意字段组合查询、透视。

PostgreSQL中,有多种方法支持这种场景:

《PostgreSQL 9种索引的原理和应用场景》

1、方法1,每个字段加一个索引(普通类型用btree, 多值类型(数组,json,全文检索,枚举等类型)用gin, 空间、范围类型用gist,时序类型用brin)。

2、方法2,使用单个复合索引,把所有字段放到GIN倒排索引接口中。

3、方法3,使用单个复合索引,把所有字段放到RUM索引接口中。

那么这三种方法,应该选哪种更好呢?

或者说选哪种,可以在写入、查询性能上获得更好的平衡。

让单个复合索引支持超过32个字段

注意,PG默认情况下,仅支持一个索引中,最多放32个字段。

通过修改 src/include/pg_config_manual.h ,重新编译,可以支持更多的字段。但是注意,支持的字段越多,索引的头信息会越大。

vi src/include/pg_config_manual.h  
  
/*  
 * Maximum number of columns in an index.  There is little point in making  
 * this anything but a multiple of 32, because the main cost is associated  
 * with index tuple header size (see access/itup.h).  
 *  
 * Changing this requires an initdb.  
 */  
// #define INDEX_MAX_KEYS   32  
#define INDEX_MAX_KEYS   128  
src/include/access/itup.h  
  
/*  
 * Index tuple header structure  
 *  
 * All index tuples start with IndexTupleData.  If the HasNulls bit is set,  
 * this is followed by an IndexAttributeBitMapData.  The index attribute  
 * values follow, beginning at a MAXALIGN boundary.  
 *  
 * Note that the space allocated for the bitmap does not vary with the number  
 * of attributes; that is because we don't have room to store the number of  
 * attributes in the header.  Given the MAXALIGN constraint there's no space  
 * savings to be had anyway, for usual values of INDEX_MAX_KEYS.  
 */  

如果字段允许NULL,则有一个数据结构IndexAttributeBitMapData用来表示哪个字段的值是NULL(与TUPLE的头信息类似,也有NULL BITMAP的表示)。

因此如果字段有NULL,则索引条目的的头信息中,会多出若干BIT,是固定大小的。

typedef struct IndexAttributeBitMapData  
{  
        bits8           bits[(INDEX_MAX_KEYS + 8 - 1) / 8];  
}   

INDEX_MAX_KEYS = 32 时,4字节。

对比每列独立索引、GIN单个全字段复合索引、RUM单个全字段复合索引。

安装测试PostgreSQL软件

1、编译软件

wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2  
tar -jxvf postgresql-snapshot.tar.bz2  
cd postgresql-12devel  
  
参考上面的章节修改源码 src/include/pg_config_manual.h ,支持单个索引放128个字段。  
  
export USE_NAMED_POSIX_SEMAPHORES=1  
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql12  
LIBS=-lpthread CFLAGS="-O3" make world -j 128  
LIBS=-lpthread CFLAGS="-O3" make install-world  

安装rum插件,略。

https://github.com/postgrespro/rum

2、设置环境变量

vi env12.sh  
export PS1="$USER@`/bin/hostname -s`-> "    
export PGPORT=8000  
export PGDATA=/data01/pg/pg_root$PGPORT    
export LANG=en_US.utf8    
export PGHOME=/home/digoal/pgsql12  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH    
export DATE=`date +"%Y%m%d%H%M"`    
export PATH=$PGHOME/bin:$PATH:.    
export MANPATH=$PGHOME/share/man:$MANPATH    
export PGHOST=$PGDATA    
export PGUSER=postgres    
export PGDATABASE=postgres    
alias rm='rm -i'    
alias ll='ls -lh'    
unalias vi  
  
  
. ./env12.sh  

3、初始化数据库集群

initdb -D $PGDATA -U postgres -X /data02/pg/pg_wal_8000 -E SQL_ASCII --locale=C  

4、修改数据库配置

vi $PGDATA/postgresql.conf  
  
listen_addresses = '0.0.0.0'  
port = 8000  
max_connections = 1000  
unix_socket_directories = '/tmp,.'  
shared_buffers = 32GB  
maintenance_work_mem = 1GB  
dynamic_shared_memory_type = posix  
vacuum_cost_delay = 0  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 1000  
bgwriter_lru_multiplier = 10.0  
effective_io_concurrency = 0  
max_worker_processes = 128  
wal_level = replica  
synchronous_commit = off  
wal_buffers = 16MB  
wal_writer_delay = 10ms  
checkpoint_timeout = 35min  
max_wal_size = 64GB  
min_wal_size = 16GB  
checkpoint_completion_target = 0.1  
random_page_cost = 1.1  
log_destination = 'csvlog'  
logging_collector = on  
log_truncate_on_rotation = on  
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_error_verbosity = verbose    
log_timezone = 'PRC'  
log_autovacuum_min_duration = 0    
autovacuum_freeze_max_age = 1200000000  
autovacuum_multixact_freeze_max_age = 1400000000  
autovacuum_vacuum_cost_delay = 0ms  
vacuum_freeze_table_age = 1150000000  
vacuum_multixact_freeze_table_age = 1150000000  
datestyle = 'iso, mdy'  
timezone = 'PRC'  
lc_messages = 'C'  
lc_monetary = 'C'  
lc_numeric = 'C'  
lc_time = 'C'  
default_text_search_config = 'pg_catalog.english'  
jit = off  

5、启动数据库

pg_ctl start  

构建测试表

1、分别创建a b c 3张表,结构一样,都是128个字段。按要求创建索引

每列一个BTREE索引

do language plpgsql $$   
declare  
  sql text := 'create table a (';  
begin  
  for i in 1..128 loop  
    sql := sql || 'c' || i || ' int default random()*2000000000,';  
  end loop;  
  sql := rtrim(sql,',');  
  sql := sql || ')';  
  execute sql;  
  for i in 1..128 loop  
    execute 'create index idx_a_'||i||' on a (c'||i||')';  
  end loop;  
end;  
$$;  

所有列放到一个GIN索引中

(注意GIN索引有FASTUPDATE参数,开启的时候,写入性能很好,但是会延迟合并PENDING TUPLES,写入快了,查询就可能慢(如果在PENDING TUPLES合并慢,并且很大时,越慢),后面会有例子)

create extension btree_gin;  
  
do language plpgsql $$   
declare  
  sql text := 'create table b (';  
begin  
  for i in 1..128 loop  
    sql := sql || 'c' || i || ' int default random()*2000000000,';  
  end loop;  
  sql := rtrim(sql,',');  
  sql := sql || ')';  
  execute sql;  
    
  sql := 'create index idx_b_1 on b using gin (';  
  for i in 1..128 loop  
    sql := sql || 'c' || i || ',';  
  end loop;  
  sql := rtrim(sql,',');  
  sql := sql || ')';  
  execute sql;  
end;  
$$;  

所有列放到一个RUM索引中

create extension rum;  
  
do language plpgsql $$   
declare  
  sql text := 'create table c (';  
begin  
  for i in 1..128 loop  
    sql := sql || 'c' || i || ' int default random()*2000000000,';  
  end loop;  
  sql := rtrim(sql,',');  
  sql := sql || ')';  
  execute sql;  
    
  sql := 'create index idx_c_1 on c using rum (';  
  for i in 1..128 loop  
    sql := sql || 'c' || i || ',';  
  end loop;  
  sql := rtrim(sql,',');  
  sql := sql || ')';  
  execute sql;  
end;  
$$;  

性能测试

1、创建三个性能测试脚本,分别用于测试三张表的写入性能

vi test_btree.sql  
\set c1 random(1,2000000000)  
insert into a (c1) values (:c1);  
  
  
vi test_gin.sql  
\set c1 random(1,2000000000)  
insert into b (c1) values (:c1);  
  
  
vi test_rum.sql  
\set c1 random(1,2000000000)  
insert into c (c1) values (:c1);  

2、独立128个btree索引的写入性能。

pgbench -M prepared -n -r -P 1 -f ./test_btree.sql -c 28 -j 28 -T 120  
  
transaction type: ./test_btree.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 28  
number of threads: 28  
duration: 120 s  
number of transactions actually processed: 1906387  
latency average = 1.762 ms  
latency stddev = 0.693 ms  
tps = 15886.268820 (including connections establishing)  
tps = 15887.521828 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set c1 random(1,2000000000)  
         1.761  insert into a (c1) values (:c1);  

1.59万行/s

Samples: 1M of event 'cpu-clock', Event count (approx.): 216077120900   
Overhead  Shared Object        Symbol                                   
  12.07%  postgres             [.] _bt_compare                          
   9.85%  postgres             [.] hash_search_with_hash_value          
   6.44%  [kernel]             [k] _raw_spin_unlock_irqrestore          
   5.33%  postgres             [.] LWLockAcquire                        
   4.17%  [kernel]             [k] __do_softirq                         
   3.90%  [kernel]             [k] run_timer_softirq                    
   3.50%  postgres             [.] PinBuffer                            
   3.20%  postgres             [.] _bt_relandgetbuf                     
   2.24%  libc-2.17.so         [.] __memset_sse2                        
   2.14%  postgres             [.] _bt_moveright                        
   2.07%  postgres             [.] LWLockRelease                        
   1.88%  libc-2.17.so         [.] __memmove_ssse3_back                 
   1.59%  [kernel]             [k] finish_task_switch                   
   1.25%  postgres             [.] LWLockReleaseClearVar                
   1.06%  postgres             [.] MarkBufferDirty                      
   0.99%  libc-2.17.so         [.] __memcpy_ssse3_back   

3、一个GIN复合索引(开启FASTUPDATE)的写入性能

pgbench -M prepared -n -r -P 1 -f ./test_gin.sql -c 28 -j 28 -T 120  
  
transaction type: ./test_gin.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 28  
number of threads: 28  
duration: 120 s  
number of transactions actually processed: 4791575  
latency average = 0.701 ms  
latency stddev = 32.171 ms  
tps = 39929.190873 (including connections establishing)  
tps = 39932.416884 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.001  \set c1 random(1,2000000000)  
         0.700  insert into b (c1) values (:c1);  

3.99万行/s

Samples: 698K of event 'cpu-clock', Event count (approx.): 119873063495   
Overhead  Shared Object        Symbol                                     
   9.70%  libc-2.17.so         [.] __memset_sse2                          
   7.79%  [kernel]             [k] _raw_spin_unlock_irqrestore            
   3.93%  postgres             [.] ExecInitExprRec                        
   3.13%  postgres             [.] AllocSetAlloc                          
   2.69%  postgres             [.] ExecInitFunc                           
   2.12%  postgres             [.] TupleDescInitEntry                     
   2.04%  [kernel]             [k] finish_task_switch                     
   1.64%  postgres             [.] hash_search_with_hash_value            
   1.57%  postgres             [.] SearchCatCache1                        
   1.56%  libc-2.17.so         [.] bsearch                                
   1.54%  libc-2.17.so         [.] __memcpy_ssse3_back                    
   1.40%  postgres             [.] expression_tree_walker                 
   1.33%  postgres             [.] palloc0                                
   1.29%  [kernel]             [k] run_timer_softirq                      
   1.27%  postgres             [.] dispatch_compare_ptr                   
   1.25%  postgres             [.] fmgr_info                              
   0.96%  postgres             [.] LWLockAcquire                   

4、一个RUM复合索引的写入性能

pgbench -M prepared -n -r -P 1 -f ./test_rum.sql -c 28 -j 28 -T 120  
  
  
transaction type: ./test_rum.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 28  
number of threads: 28  
duration: 120 s  
number of transactions actually processed: 453539  
latency average = 7.408 ms  
latency stddev = 11.713 ms  
tps = 3779.393984 (including connections establishing)  
tps = 3779.643084 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set c1 random(1,2000000000)  
         7.406  insert into c (c1) values (:c1);  

0.378万行/s

Samples: 260K of event 'cpu-clock', Event count (approx.): 56451610586   
Overhead  Shared Object       Symbol                                     
  18.58%  postgres            [.] computeRegionDelta                     
   5.74%  postgres            [.] GenericXLogRegisterBuffer              
   5.68%  [kernel]            [k] _raw_spin_unlock_irqrestore            
   5.25%  rum.so              [.] rumtuple_get_attrnum                   
   4.11%  [kernel]            [k] __do_softirq                           
   3.51%  postgres            [.] hash_search_with_hash_value            
   3.44%  rum.so              [.] rumFindLeafPage                        
   3.38%  libc-2.17.so        [.] __memcpy_ssse3_back                    
   3.10%  libc-2.17.so        [.] __memset_sse2                          
   2.77%  [kernel]            [k] run_timer_softirq                      
   2.70%  postgres            [.] LWLockAcquire                          
   2.07%  postgres            [.] PinBuffer                              
   1.71%  rum.so              [.] entryLocateLeafEntry                   
   1.49%  postgres            [.] LWLockReleaseClearVar                  
   1.36%  [kernel]            [k] copy_user_enhanced_fast_string         
   1.07%  postgres            [.] LWLockRelease                          
   0.97%  rum.so              [.] entryLocateEntry                       
   0.89%  postgres            [.] UnpinBuffer.constprop.6  

写入性能小结

很显然,性能最好的是GIN(开启FASTUPDATE)时的性能,其次是独立索引,最后是RUM索引。

查询性能

1、都测试了120秒,写入量如下

postgres=# \dt+  
                    List of relations  
 Schema | Name | Type  |  Owner   |  Size   | Description   
--------+------+-------+----------+---------+-------------  
 public | a    | table | postgres | 993 MB  |   
 public | b    | table | postgres | 2497 MB |   
 public | c    | table | postgres | 237 MB  |   
(2 rows)  
  
  
  
postgres=# select count(*) from a;  
  count    
---------  
 1906387  
(1 row)  
  
postgres=# select count(*) from b;  
  count    
---------  
 4791575  
(1 row)  
  
  
postgres=# select count(*) from c;  
 count    
--------  
 453539  
(1 row)  

2、索引大小,很显然GIN索引很大,超过了所有BTREE索引加起来的大小,并且还没有完全合并所有的pending tuples。

postgres=# \di+  
                          List of relations  
 Schema |   Name    | Type  |  Owner   | Table | Size  | Description   
--------+-----------+-------+----------+-------+-------+-------------  
 public | idx_a_1   | index | postgres | a     | 53 MB |   
 public | idx_a_10  | index | postgres | a     | 53 MB |   
 public | idx_a_100 | index | postgres | a     | 54 MB |   
 public | idx_a_101 | index | postgres | a     | 54 MB |   
 public | idx_a_102 | index | postgres | a     | 53 MB |   
 public | idx_a_103 | index | postgres | a     | 54 MB |   
 public | idx_a_104 | index | postgres | a     | 54 MB |   
 public | idx_a_105 | index | postgres | a     | 53 MB |   
 public | idx_a_106 | index | postgres | a     | 53 MB |   
 public | idx_a_107 | index | postgres | a     | 54 MB |   
 public | idx_a_108 | index | postgres | a     | 54 MB |   
 public | idx_a_109 | index | postgres | a     | 54 MB |   
 public | idx_a_11  | index | postgres | a     | 54 MB |   
 public | idx_a_110 | index | postgres | a     | 53 MB |   
 public | idx_a_111 | index | postgres | a     | 53 MB |   
 public | idx_a_112 | index | postgres | a     | 54 MB |   
 public | idx_a_113 | index | postgres | a     | 54 MB |   
 public | idx_a_114 | index | postgres | a     | 53 MB |   
 public | idx_a_115 | index | postgres | a     | 53 MB |   
 public | idx_a_116 | index | postgres | a     | 54 MB |   
 public | idx_a_117 | index | postgres | a     | 53 MB |   
 public | idx_a_118 | index | postgres | a     | 54 MB |   
 public | idx_a_119 | index | postgres | a     | 54 MB |   
 public | idx_a_12  | index | postgres | a     | 54 MB |   
 public | idx_a_120 | index | postgres | a     | 54 MB |   
 public | idx_a_121 | index | postgres | a     | 53 MB |   
 public | idx_a_122 | index | postgres | a     | 53 MB |   
 public | idx_a_123 | index | postgres | a     | 53 MB |   
 public | idx_a_124 | index | postgres | a     | 54 MB |   
 public | idx_a_125 | index | postgres | a     | 54 MB |   
 public | idx_a_126 | index | postgres | a     | 53 MB |   
 public | idx_a_127 | index | postgres | a     | 53 MB |   
 public | idx_a_128 | index | postgres | a     | 54 MB |   
 public | idx_a_13  | index | postgres | a     | 54 MB |   
 public | idx_a_14  | index | postgres | a     | 54 MB |   
 public | idx_a_15  | index | postgres | a     | 54 MB |   
 public | idx_a_16  | index | postgres | a     | 53 MB |   
 public | idx_a_17  | index | postgres | a     | 53 MB |   
 public | idx_a_18  | index | postgres | a     | 53 MB |   
 public | idx_a_19  | index | postgres | a     | 53 MB |   
 public | idx_a_2   | index | postgres | a     | 53 MB |   
 public | idx_a_20  | index | postgres | a     | 53 MB |   
 public | idx_a_21  | index | postgres | a     | 53 MB |   
 public | idx_a_22  | index | postgres | a     | 53 MB |   
 public | idx_a_23  | index | postgres | a     | 53 MB |   
 public | idx_a_24  | index | postgres | a     | 54 MB |   
 public | idx_a_25  | index | postgres | a     | 53 MB |   
 public | idx_a_26  | index | postgres | a     | 54 MB |   
 public | idx_a_27  | index | postgres | a     | 54 MB |   
 public | idx_a_28  | index | postgres | a     | 53 MB |   
 public | idx_a_29  | index | postgres | a     | 54 MB |   
 public | idx_a_3   | index | postgres | a     | 54 MB |   
 public | idx_a_30  | index | postgres | a     | 54 MB |   
 public | idx_a_31  | index | postgres | a     | 53 MB |   
 public | idx_a_32  | index | postgres | a     | 53 MB |   
 public | idx_a_33  | index | postgres | a     | 53 MB |   
 public | idx_a_34  | index | postgres | a     | 54 MB |   
 public | idx_a_35  | index | postgres | a     | 53 MB |   
 public | idx_a_36  | index | postgres | a     | 53 MB |   
 public | idx_a_37  | index | postgres | a     | 53 MB |   
 public | idx_a_38  | index | postgres | a     | 54 MB |   
 public | idx_a_39  | index | postgres | a     | 54 MB |   
 public | idx_a_4   | index | postgres | a     | 54 MB |   
 public | idx_a_40  | index | postgres | a     | 53 MB |   
 public | idx_a_41  | index | postgres | a     | 54 MB |   
 public | idx_a_42  | index | postgres | a     | 53 MB |   
 public | idx_a_43  | index | postgres | a     | 53 MB |   
 public | idx_a_44  | index | postgres | a     | 53 MB |   
 public | idx_a_45  | index | postgres | a     | 53 MB |   
 public | idx_a_46  | index | postgres | a     | 54 MB |   
 public | idx_a_47  | index | postgres | a     | 54 MB |   
 public | idx_a_48  | index | postgres | a     | 54 MB |   
 public | idx_a_49  | index | postgres | a     | 53 MB |   
 public | idx_a_5   | index | postgres | a     | 54 MB |   
 public | idx_a_50  | index | postgres | a     | 54 MB |   
 public | idx_a_51  | index | postgres | a     | 53 MB |   
 public | idx_a_52  | index | postgres | a     | 54 MB |   
 public | idx_a_53  | index | postgres | a     | 54 MB |   
 public | idx_a_54  | index | postgres | a     | 53 MB |   
 public | idx_a_55  | index | postgres | a     | 54 MB |   
 public | idx_a_56  | index | postgres | a     | 54 MB |   
 public | idx_a_57  | index | postgres | a     | 53 MB |   
 public | idx_a_58  | index | postgres | a     | 53 MB |   
 public | idx_a_59  | index | postgres | a     | 53 MB |   
 public | idx_a_6   | index | postgres | a     | 53 MB |   
 public | idx_a_60  | index | postgres | a     | 54 MB |   
 public | idx_a_61  | index | postgres | a     | 53 MB |   
 public | idx_a_62  | index | postgres | a     | 54 MB |   
 public | idx_a_63  | index | postgres | a     | 54 MB |   
 public | idx_a_64  | index | postgres | a     | 54 MB |   
 public | idx_a_65  | index | postgres | a     | 53 MB |   
 public | idx_a_66  | index | postgres | a     | 54 MB |   
 public | idx_a_67  | index | postgres | a     | 53 MB |   
 public | idx_a_68  | index | postgres | a     | 54 MB |   
 public | idx_a_69  | index | postgres | a     | 54 MB |   
 public | idx_a_7   | index | postgres | a     | 54 MB |   
 public | idx_a_70  | index | postgres | a     | 53 MB |   
 public | idx_a_71  | index | postgres | a     | 54 MB |   
 public | idx_a_72  | index | postgres | a     | 54 MB |   
 public | idx_a_73  | index | postgres | a     | 54 MB |   
 public | idx_a_74  | index | postgres | a     | 54 MB |   
 public | idx_a_75  | index | postgres | a     | 54 MB |   
 public | idx_a_76  | index | postgres | a     | 53 MB |   
 public | idx_a_77  | index | postgres | a     | 54 MB |   
 public | idx_a_78  | index | postgres | a     | 53 MB |   
 public | idx_a_79  | index | postgres | a     | 53 MB |   
 public | idx_a_8   | index | postgres | a     | 53 MB |   
 public | idx_a_80  | index | postgres | a     | 53 MB |   
 public | idx_a_81  | index | postgres | a     | 53 MB |   
 public | idx_a_82  | index | postgres | a     | 54 MB |   
 public | idx_a_83  | index | postgres | a     | 53 MB |   
 public | idx_a_84  | index | postgres | a     | 53 MB |   
 public | idx_a_85  | index | postgres | a     | 54 MB |   
 public | idx_a_86  | index | postgres | a     | 53 MB |   
 public | idx_a_87  | index | postgres | a     | 54 MB |   
 public | idx_a_88  | index | postgres | a     | 53 MB |   
 public | idx_a_89  | index | postgres | a     | 53 MB |   
 public | idx_a_9   | index | postgres | a     | 54 MB |   
 public | idx_a_90  | index | postgres | a     | 54 MB |   
 public | idx_a_91  | index | postgres | a     | 54 MB |   
 public | idx_a_92  | index | postgres | a     | 54 MB |   
 public | idx_a_93  | index | postgres | a     | 53 MB |   
 public | idx_a_94  | index | postgres | a     | 53 MB |   
 public | idx_a_95  | index | postgres | a     | 54 MB |   
 public | idx_a_96  | index | postgres | a     | 53 MB |   
 public | idx_a_97  | index | postgres | a     | 53 MB |   
 public | idx_a_98  | index | postgres | a     | 53 MB |   
 public | idx_a_99  | index | postgres | a     | 54 MB |   
 public | idx_b_1   | index | postgres | b     | 17 GB |   
(129 rows)  
  
  
postgres=# \di+  
                          List of relations  
 Schema |  Name   | Type  |  Owner   | Table |  Size   | Description   
--------+---------+-------+----------+-------+---------+-------------  
 public | idx_c_1 | index | postgres | c     | 2250 MB |   
(1 row)  

3、多列查询CASE 1,一个条件选择性好,一个条件选择性差

postgres=# explain analyze select count(*) from a where c1<100000 and c2<10000000;  
                                                    QUERY PLAN                                                       
-------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=104.57..104.58 rows=1 width=8) (actual time=0.332..0.332 rows=1 loops=1)  
   ->  Index Scan using idx_a_1 on a  (cost=0.43..104.56 rows=1 width=0) (actual time=0.327..0.327 rows=0 loops=1)  
         Index Cond: (c1 < 100000)  
         Filter: (c2 < 10000000)  
         Rows Removed by Filter: 92  
 Planning Time: 0.977 ms  
 Execution Time: 0.399 ms  
(7 rows)  
  
postgres=# explain analyze select count(*) from b where c1<100000 and c2<10000000;  
                                                             QUERY PLAN                                                               
------------------------------------------------------------------------------------------------------------------------------------  
 Finalize Aggregate  (cost=351003.54..351003.55 rows=1 width=8) (actual time=752.842..752.842 rows=1 loops=1)  
   ->  Gather  (cost=351003.32..351003.53 rows=2 width=8) (actual time=752.829..752.834 rows=3 loops=1)  
         Workers Planned: 2  
         Workers Launched: 2  
         ->  Partial Aggregate  (cost=350003.32..350003.33 rows=1 width=8) (actual time=546.163..546.164 rows=1 loops=3)  
               ->  Parallel Seq Scan on b  (cost=0.00..349448.64 rows=221871 width=0) (actual time=499.082..546.152 rows=0 loops=3)  
                     Filter: ((c1 < 100000) AND (c2 < 10000000))  
                     Rows Removed by Filter: 1597191  
 Planning Time: 0.669 ms  
 Execution Time: 792.562 ms  
(10 rows)  
  
  
postgres=# explain analyze select count(*) from c where c1<100000 and c2<10000000;  
                                                    QUERY PLAN                                                       
-------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=18.71..18.72 rows=1 width=8) (actual time=0.776..0.776 rows=1 loops=1)  
   ->  Index Scan using idx_c_1 on c  (cost=16.50..18.71 rows=1 width=0) (actual time=0.773..0.773 rows=0 loops=1)  
         Index Cond: ((c1 < 100000) AND (c2 < 10000000))  
 Planning Time: 0.077 ms  
 Execution Time: 0.815 ms  
(5 rows)  

显然,性能最好的是独立索引,因为选择性好的一个就好了,只是少量的FILTER

为什么GIN性能不行呢?理论上应该很好呀。

https://www.postgresql.org/docs/devel/static/pageinspect.html#id-1.11.7.31.7

postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_b_1', 0));  
 pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version   
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------  
       230891 |      1842877 |           5600 |         1587579 |          4205010 |             2 |             1 |            0 |         0 |       2  
(1 row)  

原因,开启了FASTUPDATE的情况下,写入超快,但是合并PENDING TUPLES只有一个AUTOVACUUM WORKER,所以合并没有写入快,导致了pending tuples越来越大。如上,有几百万条还没有合并。

强制合并:

vacuum analyze b;  

耗时需要很久很久,所以实际上开启gin索引的fastupdate,如果写入长时间持续太猛了,并不好。因为合并跟不上,查询性能跟不上。

修改GIN索引的fastupdate,关闭,重测

postgres=#   alter index idx_b_1 set (fastupdate =off);  
ALTER INDEX  

重新测试

transaction type: ./test_gin.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 28  
number of threads: 28  
duration: 120 s  
number of transactions actually processed: 1638833  
latency average = 1.985 ms  
latency stddev = 0.572 ms  
tps = 13656.742727 (including connections establishing)  
tps = 13657.785073 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set c1 random(1,2000000000)  
         1.983  insert into b (c1) values (:c1);  

写入速度从3.99万行/s下降到了1.36万行/s

  
Samples: 963K of event 'cpu-clock', Event count (approx.): 191435286728   
Overhead  Shared Object       Symbol                                      
   9.26%  postgres            [.] PinBuffer                               
   8.74%  postgres            [.] LWLockAcquire                           
   7.85%  postgres            [.] gintuple_get_attrnum                    
   6.01%  postgres            [.] UnpinBuffer.constprop.6                 
   5.48%  postgres            [.] hash_search_with_hash_value             
   5.46%  postgres            [.] ginFindLeafPage                         
   4.85%  postgres            [.] LWLockRelease                           
   3.89%  [kernel]            [k] _raw_spin_unlock_irqrestore             
   3.19%  [kernel]            [k] __do_softirq                            
   3.03%  [kernel]            [k] run_timer_softirq                       
   2.89%  postgres            [.] ReadBuffer_common                       
   2.51%  postgres            [.] entryLocateLeafEntry                    
   1.93%  postgres            [.] entryLocateEntry                        
   1.67%  libc-2.17.so        [.] __memset_sse2                           
   1.48%  postgres            [.] gintuple_get_key                        
   1.29%  postgres            [.] LWLockReleaseClearVar                   
   1.21%  libc-2.17.so        [.] __memmove_ssse3_back                    
   1.07%  libc-2.17.so        [.] __memcpy_ssse3_back                     
   0.89%  postgres            [.] MarkBufferDirty                         
   0.89%  postgres            [.] AllocSetAlloc                           
   0.76%  [kernel]            [k] finish_task_switch                      
   0.72%  postgres            [.] XLogInsertRecord            

写入性能下降,比128列独立索引还要慢,可以看到现在没有pending tuples了。

postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_b_1', 0));  
 pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version   
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------  
   4294967295 |   4294967295 |              0 |               0 |                0 |             2 |             1 |            0 |         0 |       2  
(1 row)  

多列查询CASE 1,一个条件选择性好,一个条件选择性差.

现在GIN索引的查询性能还好,但是还是不如128独立列索引。

postgres=# explain analyze select count(*) from b where c1<100000 and c2<10000000;  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=23.13..23.14 rows=1 width=8) (actual time=3.062..3.062 rows=1 loops=1)  
   ->  Bitmap Heap Scan on b  (cost=22.01..23.12 rows=1 width=0) (actual time=3.060..3.060 rows=0 loops=1)  
         Recheck Cond: ((c1 < 100000) AND (c2 < 10000000))  
         ->  Bitmap Index Scan on idx_b_1  (cost=0.00..22.01 rows=1 width=0) (actual time=3.058..3.058 rows=0 loops=1)  
               Index Cond: ((c1 < 100000) AND (c2 < 10000000))  
 Planning Time: 0.053 ms  
 Execution Time: 3.187 ms  
(7 rows)  

4、多列查询CASE 2,两个条件都是大范围,合并后结果集较小

postgres=# explain analyze select count(*) from c where c1<100000000 and c2<100000000;  
                                                          QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=1290.52..1290.53 rows=1 width=8) (actual time=20.199..20.200 rows=1 loops=1)  
   ->  Bitmap Heap Scan on c  (cost=29.59..1287.60 rows=1170 width=0) (actual time=18.299..20.091 rows=1078 loops=1)  
         Recheck Cond: ((c1 < 100000000) AND (c2 < 100000000))  
         Heap Blocks: exact=1061  
         ->  Bitmap Index Scan on idx_c_1  (cost=0.00..29.30 rows=1170 width=0) (actual time=18.179..18.179 rows=1078 loops=1)  
               Index Cond: ((c1 < 100000000) AND (c2 < 100000000))  
 Planning Time: 0.183 ms  
 Execution Time: 20.474 ms  
(8 rows)  
  
  
postgres=# explain analyze select count(*) from b where c1<100000000 and c2<100000000;  
                                                          QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=4567.42..4567.43 rows=1 width=8) (actual time=105.917..105.918 rows=1 loops=1)  
   ->  Bitmap Heap Scan on b  (cost=65.90..4556.98 rows=4176 width=0) (actual time=81.593..105.503 rows=4098 loops=1)  
         Recheck Cond: ((c1 < 100000000) AND (c2 < 100000000))  
         Heap Blocks: exact=4020  
         ->  Bitmap Index Scan on idx_b_1  (cost=0.00..64.86 rows=4176 width=0) (actual time=81.067..81.067 rows=4098 loops=1)  
               Index Cond: ((c1 < 100000000) AND (c2 < 100000000))  
 Planning Time: 10.918 ms  
 Execution Time: 107.486 ms  
(8 rows)  
  
postgres=# explain analyze select count(*) from a where c1<100000000 and c2<100000000;  
                                                               QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=7574.38..7574.39 rows=1 width=8) (actual time=181.375..181.375 rows=1 loops=1)  
   ->  Bitmap Heap Scan on a  (cost=2231.28..7561.98 rows=4959 width=0) (actual time=62.215..180.837 rows=4860 loops=1)  
         Recheck Cond: ((c1 < 100000000) AND (c2 < 100000000))  
         Rows Removed by Index Recheck: 266961  
         Heap Blocks: exact=3166 lossy=17904  
         ->  BitmapAnd  (cost=2231.28..2231.28 rows=4959 width=0) (actual time=61.461..61.461 rows=0 loops=1)  
               ->  Bitmap Index Scan on idx_a_1  (cost=0.00..1090.57 rows=95192 width=0) (actual time=20.603..20.603 rows=95198 loops=1)  
                     Index Cond: (c1 < 100000000)  
               ->  Bitmap Index Scan on idx_a_2  (cost=0.00..1137.98 rows=99314 width=0) (actual time=37.628..37.628 rows=95665 loops=1)  
                     Index Cond: (c2 < 100000000)  
 Planning Time: 3.797 ms  
 Execution Time: 183.723 ms  
(12 rows)  

现在,性能最好的是GIN索引,其次是RUM和独立列索引(性能差不多)。

5、多列查询CASE 3,单个字段大范围,结果集大

postgres=# explain analyze select count(*) from c where c1<100000000 ;  
                                                          QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=18671.14..18671.15 rows=1 width=8) (actual time=25.041..25.041 rows=1 loops=1)  
   ->  Index Scan using idx_c_1 on c  (cost=8.80..18614.25 rows=22757 width=0) (actual time=10.142..22.987 rows=22238 loops=1)  
         Index Cond: (c1 < 100000000)  
 Planning Time: 0.099 ms  
 Execution Time: 25.276 ms  
(5 rows)  
  
  
postgres=# explain analyze select count(*) from b where c1<100000000 ;  
                                                            QUERY PLAN                                                              
----------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=63372.32..63372.33 rows=1 width=8) (actual time=139.236..139.236 rows=1 loops=1)  
   ->  Bitmap Heap Scan on b  (cost=684.51..63166.06 rows=82505 width=0) (actual time=57.762..131.941 rows=81741 loops=1)  
         Recheck Cond: (c1 < 100000000)  
         Heap Blocks: exact=58458  
         ->  Bitmap Index Scan on idx_b_1  (cost=0.00..663.89 rows=82505 width=0) (actual time=48.058..48.058 rows=81741 loops=1)  
               Index Cond: (c1 < 100000000)  
 Planning Time: 0.123 ms  
 Execution Time: 141.904 ms  
(8 rows)  
  
postgres=# explain analyze select count(*) from a where c1<100000000 ;  
                                                             QUERY PLAN                                                               
------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=78459.78..78459.79 rows=1 width=8) (actual time=109.022..109.023 rows=1 loops=1)  
   ->  Index Only Scan using idx_a_1 on a  (cost=0.43..78221.80 rows=95192 width=0) (actual time=0.028..100.343 rows=95198 loops=1)  
         Index Cond: (c1 < 100000000)  
         Heap Fetches: 95198  
 Planning Time: 0.614 ms  
 Execution Time: 109.053 ms  
(6 rows)  

现在,性能最好的是RUM和独立列索引(性能差不多),其次是GIN索引。

小结

索引模式 写入速度 索引大小 单列查询性能(选择性好) 单列查询性能(选择性不好) 多列查询性能(单列选择性都不好,但是合并后选择性很好) 多列查询性能(有些列选择性很好)
128列单个gin(fastupdate=on)索引 3.99万行/s 最大 不好(pending tuples很大时) 不好(选择性不好本身就不适合用索引过滤) 不好(pending tuples很大时) 不好(pending tuples很大时)
128列单个gin(fastupdate=off)索引 1.36万行/s 最大 不好(选择性不好本身就不适合用索引过滤) 一般
128列单个rum索引 0.378万行/s 较小 很好 不好(选择性不好本身就不适合用索引过滤) 一般
128列128个btree索引 1.59万行/s 较小 很好 不好(选择性不好本身就不适合用索引过滤) 一般

所以,建议使用每列独立的索引,来支撑任意列的组合查询,可以获得写入、读取较好的均衡。

选择性不好的列,建议不要使用索引,选择性通过pg_stats查看

analyze c;

postgres=# select schemaname,tablename,attname,n_distinct from pg_stats where tablename='c';
 schemaname | tablename | attname | n_distinct 
------------+-----------+---------+------------
 public     | c         | c1      |         -1
 public     | c         | c2      |  -0.999025
 public     | c         | c3      |         -1
...

《PostgreSQL SQL自动优化案例 - 极简,自动推荐索引》

《自动选择正确索引访问接口(btree,hash,gin,gist,sp-gist,brin,bitmap…)的方法》

为什么写入性能会是这样的结果?

同步模式BUILD索引,索引越多,写入的影响越大,本文对单表的测试可以看出,128个索引,写入只能达到1.59万行/s。如果硬件资源足够的话,可以通过使用分区表(多个表)来解决写入的瓶颈问题。

异步模式BUILD索引,(GIN开启fastupdate),写入吞吐可以提高,但是如果长时间处于高吞吐的写入下,由于一张表只有一个VACUUM WORKER进程合并pending tuples,所以可能导致PENDING TUPLES会越来越多,导致查询性能下降。解决办法是分区表,这样每个分区可以有1个vacuum worker参与合并,降低延迟。

为什么查询性能在几种情况下会是这样的结果?

选择性好的情况下,三种索引,性能都很好。

多列查询,如果所有条件的选择性都不好,但是最后所有条件的合并选择性如果很好时,GIN内置的BITMAP过滤,使得性能在RUM,GIN,BTREE多列三种情况下达到最好。

多列查询,如果所有条件的选择性都不好,并且最后所有条件合并后的选择性也不好时,所有索引的性能都一般。

Flag Counter

digoal’s 大量PostgreSQL文章入口