PostgreSQL 11 preview - Deploying and Trying Out JIT Support (with_llvm)

Background

PostgreSQL 11's JIT currently supports compiling tuple deforming (converting on-disk tuples into the in-memory tuple format) and expression evaluation (expressions in SELECT, WHERE, and similar clauses; operator evaluation; UDFs; and so on).

It speeds up computation over very large data volumes (when expression evaluation or tuple deforming has become the bottleneck).

It is particularly well suited to OLAP workloads that run complex computations over large amounts of data.

《Analytics Acceleration Engine Black Tech - LLVM, Column Store, Multi-core Parallelism, and Operator Reuse Join Forces - Opening PostgreSQL's Treasure Chest》

Deploying PostgreSQL 11 with LLVM

Reference

《PostgreSQL 10 + PostGIS + Sharding(pg_pathman) + MySQL(fdw foreign tables) on ECS Deployment Guide (for new users)》

The following uses CentOS 7.x x64 as the example platform.

1. Install EPEL

wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm    
    
rpm -ivh epel-release-latest-7.noarch.rpm     

2. Install build dependencies: cmake3 and python

yum install -y cmake3  
  
yum install -y python  

3. Download and unpack the LLVM source

wget http://releases.llvm.org/6.0.0/llvm-6.0.0.src.tar.xz  
  
tar -xvf llvm-6.0.0.src.tar.xz  

4. Download and unpack the Clang source, then move it into the LLVM source tree

wget http://releases.llvm.org/6.0.0/cfe-6.0.0.src.tar.xz  
  
tar -xvf cfe-6.0.0.src.tar.xz   
mv cfe-6.0.0.src clang  
mv clang llvm-6.0.0.src/tools  

5. Build and install LLVM and Clang, the compiler toolchain that JIT support requires

cd llvm-6.0.0.src  
mkdir mybuilddir  
cd mybuilddir  
alias cmake=cmake3  
export CXXFLAGS=-O2  
cmake ..  
cmake --build .  
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/llvm6.0.0 -P cmake_install.cmake  

6. Build PostgreSQL with LLVM

wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2  
  
tar -jxvf postgresql-snapshot.tar.bz2   
  
cd postgresql-11devel  
CLANG=/usr/local/llvm6.0.0/bin/clang LLVM_CONFIG=/usr/local/llvm6.0.0/bin/llvm-config ./configure --prefix=/home/digoal/pgsql11_llvm --with-llvm --with-blocksize=32 --with-segsize=2  
make world -j 128  
make install-world  

7. Directory layout of a PostgreSQL build with LLVM

Bitcode (the binary serialization of LLVM IR)

digoal@iZbp13nu0s9j3x3op4zpd4Z-> pwd  
/home/digoal/pgsql11_llvm/lib  
digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll llvmjit*  
-rwxr-xr-x 1 root root 709M Apr  4 06:58 llvmjit.so  
-rw-r--r-- 1 root root 7.8K Apr  4 06:58 llvmjit_types.bc  
  
digoal@iZbp13nu0s9j3x3op4zpd4Z-> pwd  
/home/digoal/pgsql11_llvm/lib/bitcode  
digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll  
total 1.8M  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 adminpack  
-rw-r--r--  1 root root 1.4K Apr  4 06:58 adminpack.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 amcheck  
-rw-r--r--  1 root root 2.3K Apr  4 06:58 amcheck.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 auth_delay  
-rw-r--r--  1 root root  376 Apr  4 06:58 auth_delay.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 auto_explain  
-rw-r--r--  1 root root 1.7K Apr  4 06:58 auto_explain.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 autoinc  
-rw-r--r--  1 root root  528 Apr  4 06:58 autoinc.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 bloom  
-rw-r--r--  1 root root 1.8K Apr  4 06:58 bloom.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 btree_gin  
-rw-r--r--  1 root root 5.8K Apr  4 06:58 btree_gin.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 btree_gist  
-rw-r--r--  1 root root  20K Apr  4 06:58 btree_gist.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 citext  
-rw-r--r--  1 root root 1.3K Apr  4 06:58 citext.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 cube  
-rw-r--r--  1 root root 6.4K Apr  4 06:58 cube.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 dblink  
-rw-r--r--  1 root root 5.6K Apr  4 06:58 dblink.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 dict_int  
-rw-r--r--  1 root root  480 Apr  4 06:58 dict_int.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 dict_xsyn  
-rw-r--r--  1 root root  656 Apr  4 06:58 dict_xsyn.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 earthdistance  
-rw-r--r--  1 root root  280 Apr  4 06:58 earthdistance.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 file_fdw  
-rw-r--r--  1 root root 1.8K Apr  4 06:58 file_fdw.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 fuzzystrmatch  
-rw-r--r--  1 root root 5.7K Apr  4 06:58 fuzzystrmatch.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 hstore  
-rw-r--r--  1 root root 6.9K Apr  4 06:58 hstore.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 insert_username  
-rw-r--r--  1 root root  544 Apr  4 06:58 insert_username.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 _int  
-rw-r--r--  1 root root 6.9K Apr  4 06:58 _int.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 isn  
-rw-r--r--  1 root root  64K Apr  4 06:58 isn.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 lo  
-rw-r--r--  1 root root  432 Apr  4 06:58 lo.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 ltree  
-rw-r--r--  1 root root 7.4K Apr  4 06:58 ltree.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 moddatetime  
-rw-r--r--  1 root root  592 Apr  4 06:58 moddatetime.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 pageinspect  
-rw-r--r--  1 root root 6.3K Apr  4 06:58 pageinspect.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 passwordcheck  
-rw-r--r--  1 root root  436 Apr  4 06:58 passwordcheck.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 pg_buffercache  
-rw-r--r--  1 root root  660 Apr  4 06:58 pg_buffercache.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 pgcrypto  
-rw-r--r--  1 root root  23K Apr  4 06:58 pgcrypto.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 pg_freespacemap  
-rw-r--r--  1 root root  372 Apr  4 06:58 pg_freespacemap.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 pg_prewarm  
-rw-r--r--  1 root root 2.6K Apr  4 06:58 pg_prewarm.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 pgrowlocks  
-rw-r--r--  1 root root  968 Apr  4 06:58 pgrowlocks.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 pg_stat_statements  
-rw-r--r--  1 root root 3.1K Apr  4 06:58 pg_stat_statements.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 pgstattuple  
-rw-r--r--  1 root root 3.3K Apr  4 06:58 pgstattuple.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 pg_trgm  
-rw-r--r--  1 root root 4.1K Apr  4 06:58 pg_trgm.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 pg_visibility  
-rw-r--r--  1 root root 1.1K Apr  4 06:58 pg_visibility.index.bc  
drwxr-xr-x 25 root root 4.0K Apr  4 06:58 postgres  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 postgres_fdw  
-rw-r--r--  1 root root  12K Apr  4 06:58 postgres_fdw.index.bc  
-rw-r--r--  1 root root 1.3M Apr  4 06:58 postgres.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 refint  
-rw-r--r--  1 root root 1.9K Apr  4 06:58 refint.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 seg  
-rw-r--r--  1 root root 5.2K Apr  4 06:58 seg.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 tablefunc  
-rw-r--r--  1 root root 2.1K Apr  4 06:58 tablefunc.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 tcn  
-rw-r--r--  1 root root  584 Apr  4 06:58 tcn.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 test_decoding  
-rw-r--r--  1 root root 1.5K Apr  4 06:58 test_decoding.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 timetravel  
-rw-r--r--  1 root root 1.2K Apr  4 06:58 timetravel.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 tsm_system_rows  
-rw-r--r--  1 root root  524 Apr  4 06:58 tsm_system_rows.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 tsm_system_time  
-rw-r--r--  1 root root  524 Apr  4 06:58 tsm_system_time.index.bc  
drwxr-xr-x  2 root root 4.0K Apr  4 06:58 unaccent  
-rw-r--r--  1 root root  844 Apr  4 06:58 unaccent.index.bc  
  
digoal@iZbp13nu0s9j3x3op4zpd4Z-> cd unaccent  
digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll  
total 16K  
-rw-r--r-- 1 root root 15K Apr  4 06:58 unaccent.bc  

7.1. Inspecting bitcode contents with llvm-bcanalyzer

cd /home/digoal/pgsql11_llvm/lib/bitcode/postgres/access/heap

/usr/local/llvm6.0.0/bin/llvm-bcanalyzer ./heapam.bc 

Summary of ./heapam.bc:
         Total size: 1459456b/182432.00B/45608W
        Stream type: LLVM IR
  # Toplevel Blocks: 4

Per-block Summary:
  Block ID #0 (BLOCKINFO_BLOCK):
      Num Instances: 1
         Total Size: 672b/84.00B/21W
    Percent of file: 0.0460%
      Num SubBlocks: 0
        Num Abbrevs: 16
        Num Records: 3
    Percent Abbrevs: 0.0000%

        Record Histogram:
                  Count    # Bits     b/Rec   % Abv  Record Kind
                      3        60      20.0          SETBID

  Block ID #8 (MODULE_BLOCK):
      Num Instances: 1
         Total Size: 44789b/5598.62B/1399W
    Percent of file: 3.0689%
      Num SubBlocks: 79
        Num Abbrevs: 3
        Num Records: 317
    Percent Abbrevs: 1.5773%

        Record Histogram:
                  Count    # Bits     b/Rec   % Abv  Record Kind
                    222     31488     141.8          FUNCTION
                     89     11128     125.0    3.37  GLOBALVAR
                      1       207                    HASH
                      1        57            100.00  SOURCE_FILENAME
                      1        35            100.00  VSTOFFSET
                      1       465                    DATALAYOUT
                      1       303                    TRIPLE
                      1        21                    VERSION
......

8. Set environment variables

su - digoal  
vi env_pg11_llvm.sh  
  
export PS1="$USER@`/bin/hostname -s`-> "    
export PGPORT=4000  
export PGDATA=/data01/pg/pg_root$PGPORT    
export LANG=en_US.utf8    
export PGHOME=/home/digoal/pgsql11_llvm  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH    
export DATE=`date +"%Y%m%d%H%M"`    
export PATH=$PGHOME/bin:$PATH:.    
export MANPATH=$PGHOME/share/man:$MANPATH    
export PGHOST=$PGDATA    
export PGUSER=postgres    
export PGDATABASE=postgres    
alias rm='rm -i'    
alias ll='ls -lh'    
unalias vi   
  
. ./env_pg11_llvm.sh  

9. Initialize the database

initdb -D $PGDATA -U postgres -E UTF8 --locale=en_US.UTF8 -X /data02/pg/pg_wal_$PGPORT  

10. Configure postgresql.conf

listen_addresses = '0.0.0.0'  
port = 4000  
max_connections = 3000  
unix_socket_directories = '/tmp,.'  
unix_socket_permissions = 0700  
tcp_keepalives_idle = 60  
tcp_keepalives_interval = 10  
tcp_keepalives_count = 10  
shared_buffers = 128GB  
huge_pages = on  
work_mem = 32MB  
maintenance_work_mem = 1GB  
dynamic_shared_memory_type = posix  
vacuum_cost_delay = 0  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 500  
bgwriter_lru_multiplier = 5.0  
effective_io_concurrency = 0  
max_worker_processes = 128  
max_parallel_maintenance_workers = 2  
max_parallel_workers_per_gather = 32  
parallel_leader_participation = on  
max_parallel_workers = 128  
wal_level = minimal  
synchronous_commit = off  
wal_buffers = 128MB  
wal_writer_delay = 10ms  
wal_writer_flush_after = 8MB  
checkpoint_timeout = 35min  
max_wal_size = 256GB  
min_wal_size = 64GB  
max_wal_senders = 0  
enable_bitmapscan = on  
enable_hashagg = on  
enable_hashjoin = on  
enable_indexscan = on  
enable_indexonlyscan = on  
enable_material = on  
enable_mergejoin = on  
enable_nestloop = on  
enable_parallel_append = on  
enable_seqscan = on  
enable_sort = on  
enable_tidscan = on  
enable_partitionwise_join = on  
enable_partitionwise_aggregate = on  
enable_parallel_hash = on  
random_page_cost = 1.1  
effective_cache_size = 400GB  
log_destination = 'csvlog'  
logging_collector = on  
log_truncate_on_rotation = on  
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_error_verbosity = verbose    
log_timezone = 'PRC'  
log_autovacuum_min_duration = 0  
autovacuum_vacuum_cost_delay = 0  
datestyle = 'iso, mdy'  
timezone = 'PRC'  
lc_messages = 'en_US.UTF8'  
lc_monetary = 'en_US.UTF8'  
lc_numeric = 'en_US.UTF8'  
lc_time = 'en_US.UTF8'  
default_text_search_config = 'pg_catalog.english'  
  
# JIT-related parameters  
jit_above_cost = 100000  
jit_optimize_above_cost = 500000  
jit_inline_above_cost = 500000  
dynamic_library_path = '$libdir'  
jit = on  
jit_provider = 'llvmjit'  
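
After starting the server with this configuration, a quick sanity check from psql confirms that the llvmjit provider actually loaded (pg_jit_available() returns false when the shared library cannot be loaded):

-- run in any session
show jit;
select pg_jit_available();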

How complete is JIT support?

Expression evaluation and tuple deforming are JIT-compiled so far.

Currently PostgreSQL’s JIT implementation has support for accelerating expression evaluation and tuple deforming. Several other operations could be accelerated in the future.

Expression evaluation is used to evaluate WHERE clauses, target lists, aggregates and projections. It can be accelerated by generating code specific to each case.

Tuple deforming is the process of transforming an on-disk tuple (see Section 68.6) into its in-memory representation. It can be accelerated by creating a function specific to the table layout and the number of columns to be extracted.

What kind of SQL benefits from JIT

1. A huge volume of data needs to be processed.

2. Each row involves many operators in its computation.

3. The referenced columns sit late in the tuple, so all the earlier columns must be deformed first (see the sketch after the quoted README excerpt below).

(quoted from src/backend/jit/README)

Currently expression evaluation and tuple deforming are JITed. Those
were chosen because they commonly are major CPU bottlenecks in
analytics queries, but are by no means the only potentially beneficial cases.

For JITing to be beneficial a piece of code first and foremost has to
be a CPU bottleneck. But also importantly, JITing can only be
beneficial if overhead can be removed by doing so. E.g. in the tuple
deforming case the knowledge about the number of columns and their
types can remove a significant number of branches, and in the
expression evaluation case a lot of indirect jumps/calls can be
removed. If neither of these is the case, JITing is a waste of
resources.

Future avenues for JITing are tuple sorting, COPY parsing/output
generation, and later compiling larger parts of queries.
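
A minimal sketch of point 3 (table and data here are hypothetical): even though the filter references only the last column, the executor must deform its way past every earlier column of each on-disk tuple, which is exactly the branch-heavy work that JITed tuple deforming specializes away.

create table t_wide (c1 text, c2 text, c3 text, c4 int8);
insert into t_wide select 'a', 'b', 'c', i from generate_series(1, 10) i;
-- evaluating the filter on c4 requires skipping over c1..c3 in every tuple
select count(*) from t_wide where c4 > 5;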

How the PG planner decides whether to use JIT

Control parameters

1. jit (boolean)

Determines whether JIT may be used by PostgreSQL, if available (see Chapter 32). The default is on.

The default is on, i.e., JIT is enabled.

2. jit_above_cost (floating point)

Sets the planner’s cutoff above which JIT compilation is used as part of query execution (see Chapter 32). Performing JIT costs time but can accelerate query execution. The default is 100000.

Default 100000: when the planner sees a plan whose cost exceeds this value, JIT compilation is used for the query.

3. jit_optimize_above_cost (floating point)

Sets the planner’s cutoff above which JIT compiled programs (see jit_above_cost) are optimized. Optimization initially takes time, but can improve execution speed. It is not meaningful to set this to a lower value than jit_above_cost. The default is 500000.

Default 500000: when the plan cost exceeds this value, the JIT-compiled code is additionally run through LLVM's optimizer.

4. jit_inline_above_cost (floating point)

Sets the planner’s cutoff above which JIT compiled programs (see jit_above_cost) attempt to inline functions and operators. Inlining initially takes time, but can improve execution speed. It is unlikely to be beneficial to set jit_inline_above_cost below jit_optimize_above_cost. The default is 500000.

Default 500000: when the plan cost exceeds this value, function and operator bodies (currently only functions written in C or internal) are inlined into the JIT-generated code. A sketch of lowering these thresholds for experimentation follows below.
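
Because all three thresholds are plain plan costs, JIT behavior can be observed even on small tables by lowering them in a session. A minimal sketch (it assumes the test table created in the performance section below; any table works):

-- force JIT compilation, optimization and inlining regardless of plan cost
set jit = on;
set jit_above_cost = 0;
set jit_optimize_above_cost = 0;
set jit_inline_above_cost = 0;
-- the JIT block at the end of the plan reports Functions / Inlining / Optimization
explain (analyze) select sum(c33) from test;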

5. jit_provider (string)

Determines which JIT provider (see JIT Extensibility) is used. The built-in default is llvmjit.

If set to a non-existent library JIT will not be available, but no error will be raised. This allows JIT support to be installed separately from the main PostgreSQL package. This parameter can only be set at server start.

To allow JIT to work with more than one compiler, PostgreSQL's JIT support is modular: jit_provider selects which provider library to load, provided that library implements the provider interface.

https://www.postgresql.org/docs/devel/static/jit-extensibility.html#JIT-PLUGGABLE

PG currently defaults to LLVM, first because LLVM's permissive license is compatible with PostgreSQL's open-source license, and second because LLVM is backed by strong corporate sponsors such as Apple.

(quoted from src/backend/jit/README)

PostgreSQL, by default, uses LLVM to perform JIT. LLVM was chosen
because it is developed by several large corporations and therefore
unlikely to be discontinued, because it has a license compatible with
PostgreSQL, and because its IR can be generated from C using the Clang
compiler.

The controlling code

src/backend/jit/jit.c

/* GUCs */  
bool            jit_enabled = true;  
char       *jit_provider = "llvmjit";  
bool            jit_debugging_support = false;  
bool            jit_dump_bitcode = false;  
bool            jit_expressions = true;  
bool            jit_profiling_support = false;  
bool            jit_tuple_deforming = true;  
double          jit_above_cost = 100000;  
double          jit_inline_above_cost = 500000;  
double          jit_optimize_above_cost = 500000;  

src/include/jit/jit.h

/* Flags determining what kind of JIT operations to perform */  
#define PGJIT_NONE     0  
#define PGJIT_PERFORM  1 << 0  
#define PGJIT_OPT3     1 << 1  
#define PGJIT_INLINE   1 << 2  
#define PGJIT_EXPR         1 << 3  
#define PGJIT_DEFORM   1 << 4  

src/backend/optimizer/plan/planner.c

        result->jitFlags = PGJIT_NONE;  
        if (jit_enabled && jit_above_cost >= 0 &&  
                top_plan->total_cost > jit_above_cost)  
        {  
                result->jitFlags |= PGJIT_PERFORM;  
  
                /*  
                 * Decide how much effort should be put into generating better code.  
                 */  
                if (jit_optimize_above_cost >= 0 &&  
                        top_plan->total_cost > jit_optimize_above_cost)  
                        result->jitFlags |= PGJIT_OPT3;  
                if (jit_inline_above_cost >= 0 &&  
                        top_plan->total_cost > jit_inline_above_cost)  
                        result->jitFlags |= PGJIT_INLINE;  
  
                /*  
                 * Decide which operations should be JITed.  
                 */  
                if (jit_expressions)  
                        result->jitFlags |= PGJIT_EXPR;  
                if (jit_tuple_deforming)  
                        result->jitFlags |= PGJIT_DEFORM;  
        }  
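
Reading the flag logic with the defaults in mind: a plan with total_cost = 300000 exceeds jit_above_cost (100000) but not jit_optimize_above_cost or jit_inline_above_cost (both 500000), so it gets PGJIT_PERFORM | PGJIT_EXPR | PGJIT_DEFORM but neither PGJIT_OPT3 nor PGJIT_INLINE; EXPLAIN would then report Inlining: false and Optimization: false while still JIT-compiling expressions and tuple deforming.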

Developer parameters

1. jit_debugging_support (boolean)

If LLVM has the required functionality, register generated functions with GDB. This makes debugging easier. The default setting is off, and can only be set at server start.

To debug JIT-generated code with GDB, enable jit_debugging_support.

2. jit_dump_bitcode (boolean)

Writes the generated LLVM IR out to the filesystem, inside data_directory. This is only useful for working on the internals of the JIT implementation. The default setting is off, and it can only be changed by a superuser.

Whether to dump the intermediate representation (LLVM IR) that JIT generates; the files are written into the $PGDATA directory.
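
A sketch of how to use it (superuser required; the .bc files are named per backend and query, so no fixed file name is shown here):

-- dump the LLVM IR of subsequently JITed queries into $PGDATA
set jit_dump_bitcode = on;
set jit_above_cost = 0;
select sum(c33) from test;
-- the *.bc files written under $PGDATA can then be inspected with
-- llvm-dis or the llvm-bcanalyzer shown in section 7.1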

3. jit_expressions (boolean)

Determines whether expressions are JIT compiled, subject to costing decisions (see Section 32.2). The default is on.

Whether expressions are JIT-compiled. On by default; when off, expressions are never JIT-compiled.

4. jit_profiling_support (boolean)

If LLVM has the required functionality, emit required data to allow perf to profile functions generated by JIT. This writes out files to $HOME/.debug/jit/; the user is responsible for performing cleanup when desired. The default setting is off, and can only be set at server start.

To profile JIT-generated code with perf, enable jit_profiling_support. The emitted data is kept under $HOME/.debug/jit/; remember to clean that directory up yourself.

5. jit_tuple_deforming (boolean)

Determines whether tuple deforming is JIT compiled, subject to costing decisions (see Section 32.2). The default is on.

Whether tuple deforming is JIT-compiled. On by default; when off, tuple deforming (converting the on-disk tuple format into the in-memory tuple format) is never JIT-compiled.
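
Both jit_expressions and jit_tuple_deforming can be changed per session, which makes it easy to attribute a speedup to one stage or the other. A sketch:

set jit = on;
set jit_above_cost = 0;
-- JIT expressions only
set jit_tuple_deforming = off;
explain (analyze) select sum(c33) from test;
-- JIT tuple deforming only
set jit_tuple_deforming = on;
set jit_expressions = off;
explain (analyze) select sum(c33) from test;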

Is generated JIT code cached?

Not yet.

(quoted from src/backend/jit/README)

Currently it is not yet possible to cache generated functions, even
though that'd be desirable from a performance point of view. The
problem is that the generated functions commonly contain pointers into
per-execution memory. The expression evaluation machinery needs to
be redesigned a bit to avoid that. Basically all per-execution memory
needs to be referenced as an offset to one block of memory stored in
an ExprState, rather than absolute pointers into memory.

Once that is addressed, adding an LRU cache that's keyed by the
generated LLVM IR will allow to use optimized functions even for
faster queries.

A longer term project is to move expression compilation to the planner
stage, allowing e.g. to tie compiled expressions to prepared
statements.

An even more advanced approach would be to use JIT with few
optimizations initially, and build an optimized version in the
background. But that's even further off.

Which compilers are currently supported

Currently, LLVM.

(quoted from src/backend/jit/README)

PostgreSQL, by default, uses LLVM to perform JIT. LLVM was chosen
because it is developed by several large corporations and therefore
unlikely to be discontinued, because it has a license compatible with
PostgreSQL, and because its IR can be generated from C using the Clang
compiler.

It obviously is undesirable to maintain a second implementation of
commonly used functions, just for inlining purposes. Instead we take
advantage of the fact that the Clang compiler can emit LLVM IR.

Support for additional compilers can be added by implementing a new JIT provider.

(quoted from src/backend/jit/README)

Shared Library Separation
-------------------------

To avoid the main PostgreSQL binary directly depending on LLVM, which
would prevent LLVM support being independently installed by OS package
managers, the LLVM dependent code is located in a shared library that
is loaded on-demand.

An additional benefit of doing so is that it is relatively easy to
evaluate JIT compilation that does not use LLVM, by changing out the
shared library used to provide JIT compilation.

To achieve this, code intending to perform JIT (e.g. expression evaluation)
calls an LLVM independent wrapper located in jit.c to do so. If the
shared library providing JIT support can be loaded (i.e. PostgreSQL was
compiled with LLVM support and the shared library is installed), the task
of JIT compiling an expression gets handed off to the shared library. This
obviously requires that the function in jit.c is allowed to fail in case
no JIT provider can be loaded.

Which shared library is loaded is determined by the jit_provider GUC,
defaulting to "llvmjit".

Cloistering code performing JIT into a shared library unfortunately
also means that code doing JIT compilation for various parts of code
has to be located separately from the code doing so without
JIT. E.g. the JIT version of execExprInterp.c is located in jit/llvm/
rather than executor/.

Where the built-in type definitions for JIT come from

(quoted from src/backend/jit/README)

Instead there is one small file (llvmjit_types.c) which references each of
the types required for JITing. That file is translated to bitcode at
compile time, and loaded when LLVM is initialized in a backend.

Where the prebuilt JIT bitcode lives

$pkglibdir/bitcode/postgres/  
  
  
(quoted from src/backend/jit/README)

Similarly extensions can install code into
  $pkglibdir/bitcode/[extension]/
accompanied by
  $pkglibdir/bitcode/[extension].index.bc

Performance comparison

1. Create the table (65 columns: 33 int8, 32 text)

do language plpgsql $$   
declare  
  sql text;  
begin  
  sql := 'create table test(id int8,';  
  -- 32 text columns  
  for i in 1..32 loop  
    sql := sql||'c'||i||' text default md5(random()::text),';  
  end loop;  
  -- 32 int8 columns  
  for i in 33..64 loop  
    sql := sql||'c'||i||' int8 default random()*10,';  
  end loop;  
  sql := rtrim(sql, ',');  
  sql := sql||')';  
  execute sql;  
end;  
$$;  

2. Load 100 million rows of test data

for((i=1;i<=100;i++)) do nohup psql -c "insert into test select generate_series(1,1000000);" >/dev/null 2>&1 & done  
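
Once all the background inserts have finished, the row count should equal 100 jobs × 1,000,000 rows each:

select count(*) from test;  -- expected: 100000000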

The table occupies 127 GB:

postgres=# \dt+  
                    List of relations  
 Schema | Name | Type  |  Owner   |  Size  | Description   
--------+------+-------+----------+--------+-------------  
 public | test | table | postgres | 127 GB |   
(1 row)  

Settings to force a parallel degree of 24

set max_parallel_workers_per_gather =24;  
set parallel_setup_cost =0;  
set parallel_tuple_cost =0;  
set min_parallel_table_scan_size =0;  
set min_parallel_index_scan_size =0;  
alter table test set (parallel_workers =24);  

3. Test: computation with many expressions

explain select c33,c64,count(*),avg(c33),min(c34),max(c35),stddev(c36),sum(c37),stddev(greatest(c37,c38)),  
        sum(  
  case c38   
  when 0 then c39  
  when 1 then c40  
  when 2 then c41  
  when 3 then c42  
  when 4 then c43  
  when 5 then c44  
  when 6 then c45  
  when 7 then c46  
  when 8 then c47  
  when 9 then c48  
  when 10 then c49  
  when 11 then c50  
  when 12 then c51  
  when 13 then c52  
  when 14 then c53  
  when 15 then c54  
  when 16 then c55  
  else c56  
  end   
)  from test   
where   
c64^2>0   
and sqrt(c64)>0   
and hashtext(c1)>hashtext(c2)  
and   
pow(sqrt(abs(hashint8(case c38   
  when 0 then c39  
  when 1 then c40  
  when 2 then c41  
  when 3 then c42  
  when 4 then c43  
  when 5 then c44  
  when 6 then c45  
  when 7 then c46  
  when 8 then c47  
  when 9 then c48  
  when 10 then c49  
  when 11 then c50  
  when 12 then c51  
  when 13 then c52  
  when 14 then c53  
  when 15 then c54  
  when 16 then c55  
  else c56  
  end))),2) >0  
group by c33,c64;  
  
                                        QUERY PLAN                                           
-------------------------------------------------------------------------------------------  
 Finalize GroupAggregate  (cost=4545694.04..4549612.47 rows=121 width=200)  
   Group Key: c33, c64  
   ->  Gather Merge  (cost=4545694.04..4549500.85 rows=2904 width=200)  
         Workers Planned: 24  
         ->  Partial GroupAggregate  (cost=4545693.46..4549425.58 rows=121 width=200)  
               Group Key: c33, c64  
               ->  Sort  (cost=4545693.46..4545822.06 rows=51440 width=200)  
                     Sort Key: c33, c64  
                     ->  Parallel Seq Scan on test  (cost=0.00..4541668.12 rows=51440 width=200)  
                           Filter: ((((c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((c64)::double precision) > '0'::double precision) AND (hashtext(c1) > hashtext(c2)) AND (pow(sqrt((abs(hashint8(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)))::double precision), '2'::double precision) > '0'::double precision))  
 JIT:  
   Functions: 18  
   Inlining: true  
   Optimization: true  
(14 rows)  

3.1. With JIT enabled

set jit=on;  

Parallel

set max_parallel_workers_per_gather =24;  
  
 Finalize GroupAggregate  (cost=4545694.04..4549612.47 rows=121 width=200) (actual time=16228.136..18083.962 rows=110 loops=1)  
   Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)  
   Group Key: test.c33, test.c64  
   Buffers: shared hit=204165, temp read=14929 written=14950  
   ->  Gather Merge  (cost=4545694.04..4549500.85 rows=2904 width=200) (actual time=16219.163..18075.339 rows=2750 loops=1)  
         Output: c33, c64, (PARTIAL count(*)), (PARTIAL avg(c33)), (PARTIAL min(c34)), (PARTIAL max(c35)), (PARTIAL stddev(c36)), (PARTIAL sum(c37)), (PARTIAL stddev(GREATEST(c37, c38))), (PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END))  
         Workers Planned: 24  
         Workers Launched: 24  
         Buffers: shared hit=204165, temp read=14929 written=14950  
         ->  Partial GroupAggregate  (cost=4545693.46..4549425.58 rows=121 width=200) (actual time=16055.979..17551.930 rows=110 loops=25)  
               Output: c33, c64, PARTIAL count(*), PARTIAL avg(c33), PARTIAL min(c34), PARTIAL max(c35), PARTIAL stddev(c36), PARTIAL sum(c37), PARTIAL stddev(GREATEST(c37, c38)), PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)  
               Group Key: test.c33, test.c64  
               Buffers: shared hit=4167196, temp read=304952 written=305369  
               Worker 0: actual time=16063.931..17380.570 rows=110 loops=1  
                 Buffers: shared hit=146725, temp read=10743 written=10758  
               Worker 1: actual time=16157.211..17453.030 rows=110 loops=1  
                 Buffers: shared hit=144252, temp read=10570 written=10584  
               Worker 2: actual time=15937.390..17766.085 rows=110 loops=1  
                 Buffers: shared hit=203275, temp read=14873 written=14894  
               Worker 3: actual time=16087.670..17463.100 rows=110 loops=1  
                 Buffers: shared hit=153917, temp read=11273 written=11288  
               Worker 4: actual time=16055.948..17369.731 rows=110 loops=1  
                 Buffers: shared hit=146569, temp read=10723 written=10738  
               Worker 5: actual time=16142.715..17518.040 rows=110 loops=1  
                 Buffers: shared hit=148380, temp read=10854 written=10869  
               Worker 6: actual time=16191.173..17582.148 rows=110 loops=1  
                 Buffers: shared hit=154115, temp read=11273 written=11288  
               Worker 7: actual time=16125.974..17411.918 rows=110 loops=1  
                 Buffers: shared hit=144344, temp read=10570 written=10584  
               Worker 8: actual time=16162.576..17452.647 rows=110 loops=1  
                 Buffers: shared hit=144356, temp read=10570 written=10584  
               Worker 9: actual time=16028.487..17333.188 rows=110 loops=1  
                 Buffers: shared hit=146326, temp read=10703 written=10718  
               Worker 10: actual time=15909.637..17710.282 rows=110 loops=1  
                 Buffers: shared hit=202502, temp read=14819 written=14840  
               Worker 11: actual time=15991.109..17201.509 rows=110 loops=1  
                 Buffers: shared hit=136312, temp read=9973 written=9987  
               Worker 12: actual time=15948.994..17758.986 rows=110 loops=1  
                 Buffers: shared hit=202082, temp read=14788 written=14808  
               Worker 13: actual time=15923.985..17734.805 rows=110 loops=1  
                 Buffers: shared hit=202136, temp read=14788 written=14808  
               Worker 14: actual time=16185.559..17477.490 rows=110 loops=1  
                 Buffers: shared hit=144407, temp read=10570 written=10584  
               Worker 15: actual time=15957.127..17879.999 rows=110 loops=1  
                 Buffers: shared hit=214616, temp read=15702 written=15724  
               Worker 16: actual time=16150.016..17429.737 rows=110 loops=1  
                 Buffers: shared hit=138941, temp read=10168 written=10182  
               Worker 17: actual time=16036.894..17849.093 rows=110 loops=1  
                 Buffers: shared hit=202016, temp read=14788 written=14808  
               Worker 18: actual time=16129.668..17369.734 rows=110 loops=1  
                 Buffers: shared hit=138300, temp read=10126 written=10140  
               Worker 19: actual time=15948.382..17832.470 rows=110 loops=1  
                 Buffers: shared hit=211842, temp read=15491 written=15512  
               Worker 20: actual time=16153.969..17352.508 rows=110 loops=1  
                 Buffers: shared hit=134834, temp read=9867 written=9880  
               Worker 21: actual time=15916.013..17716.694 rows=110 loops=1  
                 Buffers: shared hit=202104, temp read=14788 written=14808  
               Worker 22: actual time=16164.524..17462.568 rows=110 loops=1  
                 Buffers: shared hit=144423, temp read=10570 written=10584  
               Worker 23: actual time=16046.586..17455.273 rows=110 loops=1  
                 Buffers: shared hit=156257, temp read=11433 written=11449  
               ->  Sort  (cost=4545693.46..4545822.06 rows=51440 width=200) (actual time=16048.787..16592.827 rows=1900091 loops=25)  
                     Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56  
                     Sort Key: test.c33, test.c64  
                     Sort Method: external merge  Disk: 477728kB  
                     Worker 0:  Sort Method: external merge  Disk: 343776kB  
                     Worker 1:  Sort Method: external merge  Disk: 338240kB  
                     Worker 2:  Sort Method: external merge  Disk: 475936kB  
                     Worker 3:  Sort Method: external merge  Disk: 360736kB  
                     Worker 4:  Sort Method: external merge  Disk: 343136kB  
                     Worker 5:  Sort Method: external merge  Disk: 347328kB  
                     Worker 6:  Sort Method: external merge  Disk: 360736kB  
                     Worker 7:  Sort Method: external merge  Disk: 338240kB  
                     Worker 8:  Sort Method: external merge  Disk: 338240kB  
                     Worker 9:  Sort Method: external merge  Disk: 342496kB  
                     Worker 10:  Sort Method: external merge  Disk: 474208kB  
                     Worker 11:  Sort Method: external merge  Disk: 319136kB  
                     Worker 12:  Sort Method: external merge  Disk: 473216kB  
                     Worker 13:  Sort Method: external merge  Disk: 473216kB  
                     Worker 14:  Sort Method: external merge  Disk: 338240kB  
                     Worker 15:  Sort Method: external merge  Disk: 502464kB  
                     Worker 16:  Sort Method: external merge  Disk: 325376kB  
                     Worker 17:  Sort Method: external merge  Disk: 473216kB  
                     Worker 18:  Sort Method: external merge  Disk: 324032kB  
                     Worker 19:  Sort Method: external merge  Disk: 495712kB  
                     Worker 20:  Sort Method: external merge  Disk: 315744kB  
                     Worker 21:  Sort Method: external merge  Disk: 473216kB  
                     Worker 22:  Sort Method: external merge  Disk: 338240kB  
                     Worker 23:  Sort Method: external merge  Disk: 365856kB  
                     Buffers: shared hit=4167196, temp read=304952 written=305369  
                     Worker 0: actual time=16057.644..16539.626 rows=1673456 loops=1  
                       Buffers: shared hit=146725, temp read=10743 written=10758  
                     Worker 1: actual time=16150.861..16625.428 rows=1646431 loops=1  
                       Buffers: shared hit=144252, temp read=10570 written=10584  
                     Worker 2: actual time=15928.651..16592.993 rows=2316826 loops=1  
                       Buffers: shared hit=203275, temp read=14873 written=14894  
                     Worker 3: actual time=16081.087..16581.918 rows=1755939 loops=1  
                       Buffers: shared hit=153917, temp read=11273 written=11288  
                     Worker 4: actual time=16049.644..16528.522 rows=1670397 loops=1  
                       Buffers: shared hit=146569, temp read=10723 written=10738  
                     Worker 5: actual time=16135.668..16623.247 rows=1690829 loops=1  
                       Buffers: shared hit=148380, temp read=10854 written=10869  
                     Worker 6: actual time=16184.539..16693.836 rows=1755942 loops=1  
                       Buffers: shared hit=154115, temp read=11273 written=11288  
                     Worker 7: actual time=16119.729..16583.823 rows=1646433 loops=1  
                       Buffers: shared hit=144344, temp read=10570 written=10584  
                     Worker 8: actual time=16156.343..16626.634 rows=1646432 loops=1  
                       Buffers: shared hit=144356, temp read=10570 written=10584  
                     Worker 9: actual time=16022.254..16496.534 rows=1667285 loops=1  
                       Buffers: shared hit=146326, temp read=10703 written=10718  
                     Worker 10: actual time=15901.059..16553.594 rows=2308407 loops=1  
                       Buffers: shared hit=202502, temp read=14819 written=14840  
                     Worker 11: actual time=15985.290..16423.172 rows=1553527 loops=1  
                       Buffers: shared hit=136312, temp read=9973 written=9987  
                     Worker 12: actual time=15940.471..16603.025 rows=2303469 loops=1  
                       Buffers: shared hit=202082, temp read=14788 written=14808  
                     Worker 13: actual time=15915.230..16578.124 rows=2303471 loops=1  
                       Buffers: shared hit=202136, temp read=14788 written=14808  
                     Worker 14: actual time=16179.341..16650.442 rows=1646430 loops=1  
                       Buffers: shared hit=144407, temp read=10570 written=10584  
                     Worker 15: actual time=15948.014..16649.356 rows=2445950 loops=1  
                       Buffers: shared hit=214616, temp read=15702 written=15724  
                     Worker 16: actual time=16144.032..16604.106 rows=1583954 loops=1  
                       Buffers: shared hit=138941, temp read=10168 written=10182  
                     Worker 17: actual time=16028.217..16692.633 rows=2303463 loops=1  
                       Buffers: shared hit=202016, temp read=14788 written=14808  
                     Worker 18: actual time=16123.611..16575.587 rows=1577397 loops=1  
                       Buffers: shared hit=138300, temp read=10126 written=10140  
                     Worker 19: actual time=15939.379..16625.915 rows=2412964 loops=1  
                       Buffers: shared hit=211842, temp read=15491 written=15512  
                     Worker 20: actual time=16148.053..16579.552 rows=1536937 loops=1  
                       Buffers: shared hit=134834, temp read=9867 written=9880  
                     Worker 21: actual time=15907.527..16565.235 rows=2303464 loops=1  
                       Buffers: shared hit=202104, temp read=14788 written=14808  
                     Worker 22: actual time=16158.176..16633.146 rows=1646438 loops=1  
                       Buffers: shared hit=144423, temp read=10570 written=10584  
                     Worker 23: actual time=16039.773..16559.599 rows=1780966 loops=1  
                       Buffers: shared hit=156257, temp read=11433 written=11449  
                     ->  Parallel Seq Scan on public.test  (cost=0.00..4541668.12 rows=51440 width=200) (actual time=1032.131..5845.529 rows=1900091 loops=25)  
                           Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56  
                           Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))  
                           Rows Removed by Filter: 2099909  
                           Buffers: shared hit=4166668  
                           Worker 0: actual time=1020.341..5255.168 rows=1673456 loops=1  
                             Buffers: shared hit=146703  
                           Worker 1: actual time=1027.407..5221.878 rows=1646431 loops=1  
                             Buffers: shared hit=144230  
                           Worker 2: actual time=1153.318..6915.975 rows=2316826 loops=1  
                             Buffers: shared hit=203253  
                           Worker 3: actual time=1020.220..5472.580 rows=1755939 loops=1  
                             Buffers: shared hit=153895  
                           Worker 4: actual time=1032.901..5292.200 rows=1670397 loops=1  
                             Buffers: shared hit=146547  
                           Worker 5: actual time=1017.394..5301.776 rows=1690829 loops=1  
                             Buffers: shared hit=148358  
                           Worker 6: actual time=1030.128..5511.666 rows=1755942 loops=1  
                             Buffers: shared hit=154093  
                           Worker 7: actual time=1030.999..5254.166 rows=1646433 loops=1  
                             Buffers: shared hit=144322  
                           Worker 8: actual time=1029.738..5242.343 rows=1646432 loops=1  
                             Buffers: shared hit=144334  
                           Worker 9: actual time=1043.564..5337.423 rows=1667285 loops=1  
                             Buffers: shared hit=146304  
                           Worker 10: actual time=1003.084..6826.056 rows=2308407 loops=1  
                             Buffers: shared hit=202480  
                           Worker 11: actual time=1041.330..5138.905 rows=1553527 loops=1  
                             Buffers: shared hit=136290  
                           Worker 12: actual time=1006.781..6753.862 rows=2303469 loops=1  
                             Buffers: shared hit=202060  
                           Worker 13: actual time=1004.776..6749.535 rows=2303471 loops=1  
                             Buffers: shared hit=202114  
                           Worker 14: actual time=1030.832..5281.034 rows=1646430 loops=1  
                             Buffers: shared hit=144385  
                           Worker 15: actual time=1003.335..7089.916 rows=2445950 loops=1  
                             Buffers: shared hit=214594  
                           Worker 16: actual time=1015.769..5129.373 rows=1583954 loops=1  
                             Buffers: shared hit=138919  
                           Worker 17: actual time=1003.077..6892.321 rows=2303463 loops=1  
                             Buffers: shared hit=201994  
                           Worker 18: actual time=1016.105..5052.138 rows=1577397 loops=1  
                             Buffers: shared hit=138278  
                           Worker 19: actual time=1006.413..6976.141 rows=2412964 loops=1  
                             Buffers: shared hit=211820  
                           Worker 20: actual time=1013.958..5004.057 rows=1536937 loops=1  
                             Buffers: shared hit=134812  
                           Worker 21: actual time=1009.717..6748.666 rows=2303464 loops=1  
                             Buffers: shared hit=202082  
                           Worker 22: actual time=1013.944..5248.687 rows=1646438 loops=1  
                             Buffers: shared hit=144401  
                           Worker 23: actual time=1009.779..5471.091 rows=1780966 loops=1  
                             Buffers: shared hit=156235  
 Planning Time: 0.233 ms  
 JIT:  
   Functions: 18  
   Generation Time: 7.829 ms  
   Inlining: true  
   Inlining Time: 19.924 ms  
   Optimization: true  
   Optimization Time: 778.549 ms  
   Emission Time: 404.543 ms  
 Execution Time: 18161.029 ms  
(201 rows)  

Non-parallel

set max_parallel_workers_per_gather =0;  
  
 HashAggregate  (cost=13253090.64..13253093.36 rows=121 width=200) (actual time=142124.876..142125.769 rows=110 loops=1)  
   Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)  
   Group Key: test.c33, test.c64  
   Buffers: shared hit=4166668  
   ->  Seq Scan on public.test  (cost=0.00..13166670.88 rows=1234568 width=200) (actual time=1127.028..114279.915 rows=47502283 loops=1)  
         Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64  
         Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))  
         Rows Removed by Filter: 52497717  
         Buffers: shared hit=4166668  
 Planning Time: 0.668 ms  
 JIT:  
   Functions: 9  
   Generation Time: 6.997 ms  
   Inlining: true  
   Inlining Time: 16.558 ms  
   Optimization: true  
   Optimization Time: 739.139 ms  
   Emission Time: 358.289 ms  
 Execution Time: 142132.967 ms  
(19 rows)  

3.2. With JIT disabled

set jit=off;  

Parallel

set max_parallel_workers_per_gather =24;  
  
 Finalize GroupAggregate  (cost=4545694.04..4549612.47 rows=121 width=200) (actual time=12229.706..14319.958 rows=110 loops=1)  
   Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)  
   Group Key: test.c33, test.c64  
   Buffers: shared hit=193072, temp read=14121 written=14141  
   ->  Gather Merge  (cost=4545694.04..4549500.85 rows=2904 width=200) (actual time=12218.740..14311.155 rows=2750 loops=1)  
         Output: c33, c64, (PARTIAL count(*)), (PARTIAL avg(c33)), (PARTIAL min(c34)), (PARTIAL max(c35)), (PARTIAL stddev(c36)), (PARTIAL sum(c37)), (PARTIAL stddev(GREATEST(c37, c38))), (PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END))  
         Workers Planned: 24  
         Workers Launched: 24  
         Buffers: shared hit=193072, temp read=14121 written=14141  
         ->  Partial GroupAggregate  (cost=4545693.46..4549425.58 rows=121 width=200) (actual time=12085.108..13888.218 rows=110 loops=25)  
               Output: c33, c64, PARTIAL count(*), PARTIAL avg(c33), PARTIAL min(c34), PARTIAL max(c35), PARTIAL stddev(c36), PARTIAL sum(c37), PARTIAL stddev(GREATEST(c37, c38)), PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)  
               Group Key: test.c33, test.c64  
               Buffers: shared hit=4167196, temp read=304952 written=305367  
               Worker 0: actual time=12000.792..13585.035 rows=110 loops=1  
                 Buffers: shared hit=147042, temp read=10761 written=10776  
               Worker 1: actual time=12031.831..13687.539 rows=110 loops=1  
                 Buffers: shared hit=153988, temp read=11273 written=11288  
               Worker 2: actual time=12146.820..13812.121 rows=110 loops=1  
                 Buffers: shared hit=154021, temp read=11273 written=11288  
               Worker 3: actual time=12145.395..13804.585 rows=110 loops=1  
                 Buffers: shared hit=154017, temp read=11273 written=11288  
               Worker 4: actual time=12091.839..14217.464 rows=110 loops=1  
                 Buffers: shared hit=197518, temp read=14457 written=14477  
               Worker 5: actual time=12169.045..13787.539 rows=110 loops=1  
                 Buffers: shared hit=150711, temp read=11032 written=11047  
               Worker 6: actual time=12141.908..13807.836 rows=110 loops=1  
                 Buffers: shared hit=154079, temp read=11273 written=11288  
               Worker 7: actual time=12110.195..13777.285 rows=110 loops=1  
                 Buffers: shared hit=153838, temp read=11273 written=11288  
               Worker 8: actual time=12137.576..13814.893 rows=110 loops=1  
                 Buffers: shared hit=154004, temp read=11273 written=11288  
               Worker 9: actual time=12196.750..13862.298 rows=110 loops=1  
                 Buffers: shared hit=154129, temp read=11273 written=11288  
               Worker 10: actual time=12092.448..14117.280 rows=110 loops=1  
                 Buffers: shared hit=188201, temp read=13770 written=13789  
               Worker 11: actual time=12068.763..14188.286 rows=110 loops=1  
                 Buffers: shared hit=197254, temp read=14421 written=14441  
               Worker 12: actual time=12190.847..13923.117 rows=110 loops=1  
                 Buffers: shared hit=153959, temp read=11273 written=11288  
               Worker 13: actual time=12093.689..14117.677 rows=110 loops=1  
                 Buffers: shared hit=188177, temp read=13771 written=13790  
               Worker 14: actual time=12141.815..13701.354 rows=110 loops=1  
                 Buffers: shared hit=144501, temp read=10570 written=10584  
               Worker 15: actual time=12012.075..14027.824 rows=110 loops=1  
                 Buffers: shared hit=186377, temp read=13640 written=13659  
               Worker 16: actual time=12092.582..14131.998 rows=110 loops=1  
                 Buffers: shared hit=188052, temp read=13759 written=13778  
               Worker 17: actual time=11906.057..13579.607 rows=110 loops=1  
                 Buffers: shared hit=154068, temp read=11273 written=11288  
               Worker 18: actual time=12179.217..13853.580 rows=110 loops=1  
                 Buffers: shared hit=154191, temp read=11273 written=11288  
               Worker 19: actual time=11944.620..14048.474 rows=110 loops=1  
                 Buffers: shared hit=194987, temp read=14285 written=14305  
               Worker 20: actual time=12152.533..13818.214 rows=110 loops=1  
                 Buffers: shared hit=153932, temp read=11273 written=11288  
               Worker 21: actual time=12196.138..13864.195 rows=110 loops=1  
                 Buffers: shared hit=154158, temp read=11273 written=11288  
               Worker 22: actual time=12091.831..14229.393 rows=110 loops=1  
                 Buffers: shared hit=197454, temp read=14452 written=14472  
               Worker 23: actual time=11910.497..13476.474 rows=110 loops=1  
                 Buffers: shared hit=145466, temp read=10637 written=10652  
               ->  Sort  (cost=4545693.46..4545822.06 rows=51440 width=200) (actual time=12076.303..12606.046 rows=1900091 loops=25)  
                     Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56  
                     Sort Key: test.c33, test.c64  
                     Sort Method: external merge  Disk: 451872kB  
                     Worker 0:  Sort Method: external merge  Disk: 344352kB  
                     Worker 1:  Sort Method: external merge  Disk: 360736kB  
                     Worker 2:  Sort Method: external merge  Disk: 360736kB  
                     Worker 3:  Sort Method: external merge  Disk: 360736kB  
                     Worker 4:  Sort Method: external merge  Disk: 462624kB  
                     Worker 5:  Sort Method: external merge  Disk: 353024kB  
                     Worker 6:  Sort Method: external merge  Disk: 360736kB  
                     Worker 7:  Sort Method: external merge  Disk: 360736kB  
                     Worker 8:  Sort Method: external merge  Disk: 360736kB  
                     Worker 9:  Sort Method: external merge  Disk: 360736kB  
                     Worker 10:  Sort Method: external merge  Disk: 440640kB  
                     Worker 11:  Sort Method: external merge  Disk: 461472kB  
                     Worker 12:  Sort Method: external merge  Disk: 360736kB  
                     Worker 13:  Sort Method: external merge  Disk: 440672kB  
                     Worker 14:  Sort Method: external merge  Disk: 338240kB  
                     Worker 15:  Sort Method: external merge  Disk: 436480kB  
                     Worker 16:  Sort Method: external merge  Disk: 440288kB  
                     Worker 17:  Sort Method: external merge  Disk: 360736kB  
                     Worker 18:  Sort Method: external merge  Disk: 360736kB  
                     Worker 19:  Sort Method: external merge  Disk: 457120kB  
                     Worker 20:  Sort Method: external merge  Disk: 360736kB  
                     Worker 21:  Sort Method: external merge  Disk: 360736kB  
                     Worker 22:  Sort Method: external merge  Disk: 462464kB  
                     Worker 23:  Sort Method: external merge  Disk: 340384kB  
                     Buffers: shared hit=4167196, temp read=304952 written=305367  
                     Worker 0: actual time=11993.075..12463.819 rows=1676370 loops=1  
                       Buffers: shared hit=147042, temp read=10761 written=10776  
                     Worker 1: actual time=12023.856..12510.021 rows=1755946 loops=1  
                       Buffers: shared hit=153988, temp read=11273 written=11288  
                     Worker 2: actual time=12138.628..12631.082 rows=1755945 loops=1  
                       Buffers: shared hit=154021, temp read=11273 written=11288  
                     Worker 3: actual time=12137.214..12626.779 rows=1755935 loops=1  
                       Buffers: shared hit=154017, temp read=11273 written=11288  
                     Worker 4: actual time=12081.480..12703.075 rows=2251992 loops=1  
                       Buffers: shared hit=197518, temp read=14457 written=14477  
                     Worker 5: actual time=12161.027..12638.282 rows=1718516 loops=1  
                       Buffers: shared hit=150711, temp read=11032 written=11047  
                     Worker 6: actual time=12133.786..12626.911 rows=1755948 loops=1  
                       Buffers: shared hit=154079, temp read=11273 written=11288  
                     Worker 7: actual time=12102.069..12591.735 rows=1755947 loops=1  
                       Buffers: shared hit=153838, temp read=11273 written=11288  
                     Worker 8: actual time=12129.413..12622.455 rows=1755936 loops=1  
                       Buffers: shared hit=154004, temp read=11273 written=11288  
                     Worker 9: actual time=12188.648..12679.617 rows=1755941 loops=1  
                       Buffers: shared hit=154129, temp read=11273 written=11288  
                     Worker 10: actual time=12082.492..12676.479 rows=2145118 loops=1  
                       Buffers: shared hit=188201, temp read=13770 written=13789  
                     Worker 11: actual time=12058.276..12683.336 rows=2246436 loops=1  
                       Buffers: shared hit=197254, temp read=14421 written=14441  
                     Worker 12: actual time=12182.379..12683.611 rows=1755944 loops=1  
                       Buffers: shared hit=153959, temp read=11273 written=11288  
                     Worker 13: actual time=12083.860..12673.566 rows=2145124 loops=1  
                       Buffers: shared hit=188177, temp read=13771 written=13790  
                     Worker 14: actual time=12134.030..12593.859 rows=1646436 loops=1  
                       Buffers: shared hit=144501, temp read=10570 written=10584  
                     Worker 15: actual time=12002.363..12594.854 rows=2124740 loops=1  
                       Buffers: shared hit=186377, temp read=13640 written=13659  
                     Worker 16: actual time=12082.700..12682.126 rows=2143293 loops=1  
                       Buffers: shared hit=188052, temp read=13759 written=13778  
                     Worker 17: actual time=11897.939..12391.175 rows=1755948 loops=1  
                       Buffers: shared hit=154068, temp read=11273 written=11288  
                     Worker 18: actual time=12171.070..12661.251 rows=1755944 loops=1  
                       Buffers: shared hit=154191, temp read=11273 written=11288  
                     Worker 19: actual time=11934.165..12549.351 rows=2225151 loops=1  
                       Buffers: shared hit=194987, temp read=14285 written=14305  
                     Worker 20: actual time=12144.452..12632.096 rows=1755935 loops=1  
                       Buffers: shared hit=153932, temp read=11273 written=11288  
                     Worker 21: actual time=12188.108..12679.023 rows=1755934 loops=1  
                       Buffers: shared hit=154158, temp read=11273 written=11288  
                     Worker 22: actual time=12081.380..12710.648 rows=2251317 loops=1  
                       Buffers: shared hit=197454, temp read=14452 written=14472  
                     Worker 23: actual time=11902.854..12362.911 rows=1656889 loops=1  
                       Buffers: shared hit=145466, temp read=10637 written=10652  
                     ->  Parallel Seq Scan on public.test  (cost=0.00..4541668.12 rows=51440 width=200) (actual time=0.026..4627.345 rows=1900091 loops=25)  
                           Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56  
                           Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))  
                           Rows Removed by Filter: 2099909  
                           Buffers: shared hit=4166668  
                           Worker 0: actual time=0.024..4041.154 rows=1676370 loops=1  
                             Buffers: shared hit=147020  
                           Worker 1: actual time=0.026..4232.052 rows=1755946 loops=1  
                             Buffers: shared hit=153966  
                           Worker 2: actual time=0.025..4275.863 rows=1755945 loops=1  
                             Buffers: shared hit=153999  
                           Worker 3: actual time=0.024..4268.778 rows=1755935 loops=1  
                             Buffers: shared hit=153995  
                           Worker 4: actual time=0.025..5517.396 rows=2251992 loops=1  
                             Buffers: shared hit=197496  
                           Worker 5: actual time=0.024..4173.081 rows=1718516 loops=1  
                             Buffers: shared hit=150689  
                           Worker 6: actual time=0.025..4315.820 rows=1755948 loops=1  
                             Buffers: shared hit=154057  
                           Worker 7: actual time=0.028..4271.339 rows=1755947 loops=1  
                             Buffers: shared hit=153816  
                           Worker 8: actual time=0.025..4270.376 rows=1755936 loops=1  
                             Buffers: shared hit=153982  
                           Worker 9: actual time=0.032..4301.026 rows=1755941 loops=1  
                             Buffers: shared hit=154107  
                           Worker 10: actual time=0.025..5193.043 rows=2145118 loops=1  
                             Buffers: shared hit=188179  
                           Worker 11: actual time=0.024..5393.193 rows=2246436 loops=1  
                             Buffers: shared hit=197232  
                           Worker 12: actual time=0.026..4288.686 rows=1755944 loops=1  
                             Buffers: shared hit=153937  
                           Worker 13: actual time=0.026..5177.502 rows=2145124 loops=1  
                             Buffers: shared hit=188155  
                           Worker 14: actual time=0.031..4098.000 rows=1646436 loops=1  
                             Buffers: shared hit=144479  
                           Worker 15: actual time=0.023..5155.199 rows=2124740 loops=1  
                             Buffers: shared hit=186355  
                           Worker 16: actual time=0.024..5205.816 rows=2143293 loops=1  
                             Buffers: shared hit=188030  
                           Worker 17: actual time=0.024..4334.653 rows=1755948 loops=1  
                             Buffers: shared hit=154046  
                           Worker 18: actual time=0.030..4290.928 rows=1755944 loops=1  
                             Buffers: shared hit=154169  
                           Worker 19: actual time=0.025..5447.905 rows=2225151 loops=1  
                             Buffers: shared hit=194965  
                           Worker 20: actual time=0.034..4296.168 rows=1755935 loops=1  
                             Buffers: shared hit=153910  
                           Worker 21: actual time=0.025..4273.822 rows=1755934 loops=1  
                             Buffers: shared hit=154136  
                           Worker 22: actual time=0.026..5437.385 rows=2251317 loops=1  
                             Buffers: shared hit=197432  
                           Worker 23: actual time=0.027..4039.679 rows=1656889 loops=1  
                             Buffers: shared hit=145444  
 Planning Time: 0.230 ms  
 Execution Time: 14383.790 ms  
(193 rows)  

Non-parallel

set max_parallel_workers_per_gather =0;  
  
 HashAggregate  (cost=13253090.64..13253093.36 rows=121 width=200) (actual time=135995.544..135996.426 rows=110 loops=1)  
   Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)  
   Group Key: test.c33, test.c64  
   Buffers: shared hit=4166668  
   ->  Seq Scan on public.test  (cost=0.00..13166670.88 rows=1234568 width=200) (actual time=0.026..101059.780 rows=47502283 loops=1)  
         Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64  
         Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))  
         Rows Removed by Filter: 52497717  
         Buffers: shared hit=4166668  
 Planning Time: 0.202 ms  
 Execution Time: 135996.540 ms  
(11 rows)  

4、Performance comparison with a small number of expressions

explain select count(*) from test;  
  
                                        QUERY PLAN                                           
-------------------------------------------------------------------------------------------  
 Finalize Aggregate  (cost=4218751.42..4218751.43 rows=1 width=8)  
   ->  Gather  (cost=4218751.35..4218751.36 rows=24 width=8)  
         Workers Planned: 24  
         ->  Partial Aggregate  (cost=4218751.35..4218751.36 rows=1 width=8)  
               ->  Parallel Seq Scan on test  (cost=0.00..4208334.68 rows=4166668 width=0)  
 JIT:  
   Functions: 5  
   Inlining: true  
   Optimization: true  
(9 rows)  
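Whether JIT fires at all, and whether it additionally inlines and optimizes, is decided by comparing the total plan cost against three thresholds (the jit_above_cost, jit_inline_above_cost and jit_optimize_above_cost parameters linked in the references). A minimal sketch of checking and tuning them; the SET value below is only illustrative, not a recommendation:

show jit_above_cost;           -- default 100000: plans cheaper than this are not JIT-compiled  
show jit_inline_above_cost;    -- default 500000: above this, operators/UDFs are inlined  
show jit_optimize_above_cost;  -- default 500000: above this, expensive LLVM optimization runs  
set jit_above_cost = 50000;    -- hypothetical: also JIT somewhat cheaper plans  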

4.1、Enable JIT

set jit=on;  
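
A quick sanity check that the server was really built --with-llvm is the jit_provider parameter (see the GUC-JIT-PROVIDER link in the references), which names the JIT provider library:

show jit_provider;  -- reports llvmjit for a --with-llvm build  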

Parallel

set max_parallel_workers_per_gather =24;  
  
 Finalize Aggregate  (cost=4218751.42..4218751.43 rows=1 width=8) (actual time=1195.630..1195.630 rows=1 loops=1)  
   Output: count(*)  
   Buffers: shared hit=171976  
   ->  Gather  (cost=4218751.35..4218751.36 rows=24 width=8) (actual time=1195.599..1195.616 rows=25 loops=1)  
         Output: (PARTIAL count(*))  
         Workers Planned: 24  
         Workers Launched: 24  
         Buffers: shared hit=171976  
         ->  Partial Aggregate  (cost=4218751.35..4218751.36 rows=1 width=8) (actual time=1172.856..1172.856 rows=1 loops=25)  
               Output: PARTIAL count(*)  
               Buffers: shared hit=4166668  
               Worker 0: actual time=1171.634..1171.634 rows=1 loops=1  
                 Buffers: shared hit=166655  
               Worker 1: actual time=1171.631..1171.631 rows=1 loops=1  
                 Buffers: shared hit=166878  
               Worker 2: actual time=1171.533..1171.533 rows=1 loops=1  
                 Buffers: shared hit=166588  
               Worker 3: actual time=1171.634..1171.634 rows=1 loops=1  
                 Buffers: shared hit=166659  
               Worker 4: actual time=1172.296..1172.296 rows=1 loops=1  
                 Buffers: shared hit=165127  
               Worker 5: actual time=1171.637..1171.637 rows=1 loops=1  
                 Buffers: shared hit=166191  
               Worker 6: actual time=1172.351..1172.351 rows=1 loops=1  
                 Buffers: shared hit=166251  
               Worker 7: actual time=1171.634..1171.634 rows=1 loops=1  
                 Buffers: shared hit=166945  
               Worker 8: actual time=1172.317..1172.317 rows=1 loops=1  
                 Buffers: shared hit=166515  
               Worker 9: actual time=1171.632..1171.632 rows=1 loops=1  
                 Buffers: shared hit=166900  
               Worker 10: actual time=1172.379..1172.379 rows=1 loops=1  
                 Buffers: shared hit=167330  
               Worker 11: actual time=1171.555..1171.555 rows=1 loops=1  
                 Buffers: shared hit=165973  
               Worker 12: actual time=1172.369..1172.369 rows=1 loops=1  
                 Buffers: shared hit=167326  
               Worker 13: actual time=1172.361..1172.361 rows=1 loops=1  
                 Buffers: shared hit=167240  
               Worker 14: actual time=1171.603..1171.603 rows=1 loops=1  
                 Buffers: shared hit=165948  
               Worker 15: actual time=1172.308..1172.308 rows=1 loops=1  
                 Buffers: shared hit=167271  
               Worker 16: actual time=1171.627..1171.627 rows=1 loops=1  
                 Buffers: shared hit=166118  
               Worker 17: actual time=1172.363..1172.363 rows=1 loops=1  
                 Buffers: shared hit=166540  
               Worker 18: actual time=1171.627..1171.628 rows=1 loops=1  
                 Buffers: shared hit=161783  
               Worker 19: actual time=1172.367..1172.367 rows=1 loops=1  
                 Buffers: shared hit=167241  
               Worker 20: actual time=1171.622..1171.622 rows=1 loops=1  
                 Buffers: shared hit=166895  
               Worker 21: actual time=1172.310..1172.311 rows=1 loops=1  
                 Buffers: shared hit=166291  
               Worker 22: actual time=1171.634..1171.634 rows=1 loops=1  
                 Buffers: shared hit=166817  
               Worker 23: actual time=1172.357..1172.357 rows=1 loops=1  
                 Buffers: shared hit=167210  
               ->  Parallel Seq Scan on public.test  (cost=0.00..4208334.68 rows=4166668 width=0) (actual time=0.005..756.021 rows=4000000 loops=25)  
                     Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64  
                     Buffers: shared hit=4166668  
                     Worker 0: actual time=0.005..752.685 rows=3999720 loops=1  
                       Buffers: shared hit=166655  
                     Worker 1: actual time=0.005..753.934 rows=4005072 loops=1  
                       Buffers: shared hit=166878  
                     Worker 2: actual time=0.006..752.288 rows=3998112 loops=1  
                       Buffers: shared hit=166588  
                     Worker 3: actual time=0.005..749.831 rows=3999816 loops=1  
                       Buffers: shared hit=166659  
                     Worker 4: actual time=0.005..754.798 rows=3963048 loops=1  
                       Buffers: shared hit=165127  
                     Worker 5: actual time=0.006..753.153 rows=3988584 loops=1  
                       Buffers: shared hit=166191  
                     Worker 6: actual time=0.005..755.385 rows=3990024 loops=1  
                       Buffers: shared hit=166251  
                     Worker 7: actual time=0.005..754.636 rows=4006667 loops=1  
                       Buffers: shared hit=166945  
                     Worker 8: actual time=0.005..757.278 rows=3996360 loops=1  
                       Buffers: shared hit=166515  
                     Worker 9: actual time=0.005..752.936 rows=4005600 loops=1  
                       Buffers: shared hit=166900  
                     Worker 10: actual time=0.006..755.266 rows=4015920 loops=1  
                       Buffers: shared hit=167330  
                     Worker 11: actual time=0.005..755.747 rows=3983352 loops=1  
                       Buffers: shared hit=165973  
                     Worker 12: actual time=0.005..755.743 rows=4015824 loops=1  
                       Buffers: shared hit=167326  
                     Worker 13: actual time=0.005..756.052 rows=4013742 loops=1  
                       Buffers: shared hit=167240  
                     Worker 14: actual time=0.006..755.470 rows=3982752 loops=1  
                       Buffers: shared hit=165948  
                     Worker 15: actual time=0.005..755.769 rows=4014504 loops=1  
                       Buffers: shared hit=167271  
                     Worker 16: actual time=0.006..755.859 rows=3986832 loops=1  
                       Buffers: shared hit=166118  
                     Worker 17: actual time=0.005..757.579 rows=3996959 loops=1  
                       Buffers: shared hit=166540  
                     Worker 18: actual time=0.005..756.139 rows=3882792 loops=1  
                       Buffers: shared hit=161783  
                     Worker 19: actual time=0.005..755.706 rows=4013784 loops=1  
                       Buffers: shared hit=167241  
                     Worker 20: actual time=0.005..753.655 rows=4005480 loops=1  
                       Buffers: shared hit=166895  
                     Worker 21: actual time=0.005..757.511 rows=3990984 loops=1  
                       Buffers: shared hit=166291  
                     Worker 22: actual time=0.005..753.577 rows=4003608 loops=1  
                       Buffers: shared hit=166817  
                     Worker 23: actual time=0.005..755.443 rows=4013040 loops=1  
                       Buffers: shared hit=167210  
 Planning Time: 0.192 ms  
 JIT:  
   Functions: 5  
   Generation Time: 1.082 ms  
   Inlining: true  
   Inlining Time: 2.907 ms  
   Optimization: true  
   Optimization Time: 23.873 ms  
   Emission Time: 12.577 ms  
 Execution Time: 1211.337 ms  
(120 rows)  

Non-parallel

set max_parallel_workers_per_gather =0;  
  
 Aggregate  (cost=5416668.40..5416668.41 rows=1 width=8) (actual time=26222.597..26222.598 rows=1 loops=1)  
   Output: count(*)  
   Buffers: shared hit=4166668  
   ->  Seq Scan on public.test  (cost=0.00..5166668.32 rows=100000032 width=0) (actual time=0.016..17338.548 rows=100000000 loops=1)  
         Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64  
         Buffers: shared hit=4166668  
 Planning Time: 0.056 ms  
 JIT:  
   Functions: 2  
   Generation Time: 0.457 ms  
   Inlining: true  
   Inlining Time: 1.355 ms  
   Optimization: true  
   Optimization Time: 8.269 ms  
   Emission Time: 5.354 ms  
 Execution Time: 26223.178 ms  
(16 rows)  

4.2、Disable JIT

set jit=off;  

Parallel

set max_parallel_workers_per_gather =24;  
  
 Finalize Aggregate  (cost=4218751.42..4218751.43 rows=1 width=8) (actual time=1166.951..1166.951 rows=1 loops=1)  
   Output: count(*)  
   Buffers: shared hit=170684  
   ->  Gather  (cost=4218751.35..4218751.36 rows=24 width=8) (actual time=1166.923..1166.939 rows=25 loops=1)  
         Output: (PARTIAL count(*))  
         Workers Planned: 24  
         Workers Launched: 24  
         Buffers: shared hit=170684  
         ->  Partial Aggregate  (cost=4218751.35..4218751.36 rows=1 width=8) (actual time=1146.387..1146.388 rows=1 loops=25)  
               Output: PARTIAL count(*)  
               Buffers: shared hit=4166668  
               Worker 0: actual time=1145.584..1145.584 rows=1 loops=1  
                 Buffers: shared hit=167586  
               Worker 1: actual time=1145.605..1145.605 rows=1 loops=1  
                 Buffers: shared hit=167235  
               Worker 2: actual time=1145.517..1145.517 rows=1 loops=1  
                 Buffers: shared hit=164055  
               Worker 3: actual time=1145.537..1145.537 rows=1 loops=1  
                 Buffers: shared hit=166048  
               Worker 4: actual time=1145.625..1145.626 rows=1 loops=1  
                 Buffers: shared hit=165348  
               Worker 5: actual time=1145.621..1145.621 rows=1 loops=1  
                 Buffers: shared hit=167426  
               Worker 6: actual time=1145.746..1145.746 rows=1 loops=1  
                 Buffers: shared hit=166945  
               Worker 7: actual time=1145.707..1145.707 rows=1 loops=1  
                 Buffers: shared hit=166628  
               Worker 8: actual time=1143.795..1143.796 rows=1 loops=1  
                 Buffers: shared hit=163445  
               Worker 9: actual time=1145.555..1145.556 rows=1 loops=1  
                 Buffers: shared hit=166960  
               Worker 10: actual time=1145.605..1145.605 rows=1 loops=1  
                 Buffers: shared hit=166142  
               Worker 11: actual time=1145.633..1145.633 rows=1 loops=1  
                 Buffers: shared hit=166090  
               Worker 12: actual time=1145.553..1145.553 rows=1 loops=1  
                 Buffers: shared hit=168115  
               Worker 13: actual time=1145.819..1145.819 rows=1 loops=1  
                 Buffers: shared hit=167973  
               Worker 14: actual time=1145.615..1145.615 rows=1 loops=1  
                 Buffers: shared hit=166419  
               Worker 15: actual time=1145.558..1145.558 rows=1 loops=1  
                 Buffers: shared hit=168061  
               Worker 16: actual time=1145.527..1145.527 rows=1 loops=1  
                 Buffers: shared hit=166438  
               Worker 17: actual time=1146.062..1146.062 rows=1 loops=1  
                 Buffers: shared hit=166993  
               Worker 18: actual time=1145.629..1145.629 rows=1 loops=1  
                 Buffers: shared hit=162480  
               Worker 19: actual time=1146.053..1146.054 rows=1 loops=1  
                 Buffers: shared hit=167989  
               Worker 20: actual time=1144.864..1144.864 rows=1 loops=1  
                 Buffers: shared hit=164403  
               Worker 21: actual time=1145.754..1145.754 rows=1 loops=1  
                 Buffers: shared hit=167957  
               Worker 22: actual time=1145.625..1145.625 rows=1 loops=1  
                 Buffers: shared hit=167457  
               Worker 23: actual time=1146.064..1146.064 rows=1 loops=1  
                 Buffers: shared hit=167791  
               ->  Parallel Seq Scan on public.test  (cost=0.00..4208334.68 rows=4166668 width=0) (actual time=0.005..756.396 rows=4000000 loops=25)  
                     Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64  
                     Buffers: shared hit=4166668  
                     Worker 0: actual time=0.005..756.202 rows=4022064 loops=1  
                       Buffers: shared hit=167586  
                     Worker 1: actual time=0.005..755.320 rows=4013627 loops=1  
                       Buffers: shared hit=167235  
                     Worker 2: actual time=0.005..744.106 rows=3937320 loops=1  
                       Buffers: shared hit=164055  
                     Worker 3: actual time=0.005..751.125 rows=3985152 loops=1  
                       Buffers: shared hit=166048  
                     Worker 4: actual time=0.005..753.618 rows=3968352 loops=1  
                       Buffers: shared hit=165348  
                     Worker 5: actual time=0.005..756.612 rows=4018224 loops=1  
                       Buffers: shared hit=167426  
                     Worker 6: actual time=0.004..758.326 rows=4006680 loops=1  
                       Buffers: shared hit=166945  
                     Worker 7: actual time=0.005..759.780 rows=3999071 loops=1  
                       Buffers: shared hit=166628  
                     Worker 8: actual time=0.006..756.676 rows=3922680 loops=1  
                       Buffers: shared hit=163445  
                     Worker 9: actual time=0.005..751.844 rows=4007040 loops=1  
                       Buffers: shared hit=166960  
                     Worker 10: actual time=0.005..749.207 rows=3987408 loops=1  
                       Buffers: shared hit=166142  
                     Worker 11: actual time=0.006..757.084 rows=3986160 loops=1  
                       Buffers: shared hit=166090  
                     Worker 12: actual time=0.004..756.443 rows=4034760 loops=1  
                       Buffers: shared hit=168115  
                     Worker 13: actual time=0.005..756.966 rows=4031352 loops=1  
                       Buffers: shared hit=167973  
                     Worker 14: actual time=0.005..759.423 rows=3994056 loops=1  
                       Buffers: shared hit=166419  
                     Worker 15: actual time=0.004..756.996 rows=4033464 loops=1  
                       Buffers: shared hit=168061  
                     Worker 16: actual time=0.006..759.134 rows=3994512 loops=1  
                       Buffers: shared hit=166438  
                     Worker 17: actual time=0.006..758.454 rows=4007814 loops=1  
                       Buffers: shared hit=166993  
                     Worker 18: actual time=0.005..758.647 rows=3899520 loops=1  
                       Buffers: shared hit=162480  
                     Worker 19: actual time=0.005..756.724 rows=4031736 loops=1  
                       Buffers: shared hit=167989  
                     Worker 20: actual time=0.006..755.985 rows=3945672 loops=1  
                       Buffers: shared hit=164403  
                     Worker 21: actual time=0.004..755.998 rows=4030968 loops=1  
                       Buffers: shared hit=167957  
                     Worker 22: actual time=0.006..757.390 rows=4018968 loops=1  
                       Buffers: shared hit=167457  
                     Worker 23: actual time=0.006..756.996 rows=4026984 loops=1  
                       Buffers: shared hit=167791  
 Planning Time: 0.070 ms  
 Execution Time: 1172.569 ms  
(112 rows)  

Non-parallel

set max_parallel_workers_per_gather =0;  
  
 Aggregate  (cost=5416668.40..5416668.41 rows=1 width=8) (actual time=26976.475..26976.476 rows=1 loops=1)  
   Output: count(*)  
   Buffers: shared hit=4166668  
   ->  Seq Scan on public.test  (cost=0.00..5166668.32 rows=100000032 width=0) (actual time=0.011..17349.593 rows=100000000 loops=1)  
         Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64  
         Buffers: shared hit=4166668  
 Planning Time: 0.053 ms  
 Execution Time: 26976.515 ms  
(8 rows)  

Summary

In this trivial count(*) test, JIT makes essentially no difference (parallel: 1211 ms with JIT vs 1173 ms without; non-parallel: 26223 ms vs 26977 ms): with so few expressions there is little for the compiler to accelerate. Separately, each parallel worker currently generates its own copy of the bitcode; once a bitcode cache is implemented, a single copy should be shareable across all workers.

Enabling dumps of the bitcode (bc) files

set jit_dump_bitcode =on;

Note that the .bc files are not cleaned up automatically, so remember to remove them yourself (see the cleanup sketch after the listing below).
cd $PGDATA

-rw------- 1 digoal digoal 126K Apr  4 12:59 34026.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34021.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34019.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34024.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34023.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34017.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34015.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34013.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34020.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34016.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34033.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34031.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34018.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34014.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34036.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34035.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34030.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34028.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34032.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34029.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34022.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34025.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr  4 12:59 34027.0.optimized.bc
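
These dump files accumulate in $PGDATA, so periodic cleanup is needed. A minimal sketch, assuming the dumps sit directly under $PGDATA and follow the pid.n[.optimized].bc naming shown above (adjust the age threshold to taste):

# remove JIT bitcode dumps older than one day
find "$PGDATA" -maxdepth 1 -name '*.bc' -mtime +1 -delete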

The contents of the bitcode can be inspected with llvm-bcanalyzer:

https://www.llvm.org/docs/CommandGuide/llvm-bcanalyzer.html

/usr/local/llvm6.0.0/bin/llvm-bcanalyzer ./40818.7.bc

Summary of ./40818.7.bc:
         Total size: 1069440b/133680.00B/33420W
        Stream type: LLVM IR
  # Toplevel Blocks: 4

Per-block Summary:
  Block ID #0 (BLOCKINFO_BLOCK):
      Num Instances: 1
         Total Size: 672b/84.00B/21W
    Percent of file: 0.0628%
      Num SubBlocks: 0
        Num Abbrevs: 16
        Num Records: 3
    Percent Abbrevs: 0.0000%

        Record Histogram:
                  Count    # Bits     b/Rec   % Abv  Record Kind
                      3        60      20.0          SETBID

  Block ID #8 (MODULE_BLOCK):
      Num Instances: 1
         Total Size: 18530b/2316.25B/579W
    Percent of file: 1.7327%
      Num SubBlocks: 67
        Num Abbrevs: 3
        Num Records: 129
    Percent Abbrevs: 2.3256%

        Record Histogram:
                  Count    # Bits     b/Rec   % Abv  Record Kind
                     86     11904     138.4          FUNCTION
                     38      4813     126.7    2.63  GLOBALVAR
                      1        21            100.00  SOURCE_FILENAME
                      1        35            100.00  VSTOFFSET
                      1       465                    DATALAYOUT
                      1       303                    TRIPLE
                      1        21                    VERSION
........
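
Besides llvm-bcanalyzer, the same LLVM installation ships llvm-dis, which disassembles a .bc file into human-readable LLVM IR, showing exactly what code was generated for the expressions:

/usr/local/llvm6.0.0/bin/llvm-dis ./40818.7.bc    # writes the IR to ./40818.7.ll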

References

http://blog.jobbole.com/113684/

https://www.postgresql.org/docs/devel/static/jit-extensibility.html

https://www.postgresql.org/docs/devel/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

https://www.postgresql.org/docs/devel/static/runtime-config-client.html#GUC-JIT-PROVIDER

https://www.postgresql.org/docs/devel/static/runtime-config-query.html#GUC-JIT-ABOVE-COST

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/jit/README;h=6271677163ac8acf9767370f3fff39b45f575dc2;hb=refs/heads/master

Using LLVM For Program Transformation
