PostgreSQL 11 preview - with_llvm JIT support: deployment and trial run
Background
PostgreSQL 11 JIT currently supports just-in-time compilation of tuple deforming (converting the on-disk tuple format into the in-memory tuple format) and of expression evaluation (expressions in SELECT lists, WHERE clauses and similar contexts, operator evaluation, UDFs, and so on).
It speeds up computation over very large data sets, provided expression evaluation or tuple deforming is already the bottleneck.
It is particularly well suited to the complex computation over large data volumes typical of OLAP workloads.
《分析加速引擎黑科技 - LLVM、列存、多核并行、算子复用 大联姻 - 一起来开启PostgreSQL的百宝箱》
Deploying PostgreSQL 11 with LLVM
Reference
《PostgreSQL 10 + PostGIS + Sharding(pg_pathman) + MySQL(fdw外部表) on ECS 部署指南(适合新用户)》
The example below uses CentOS 7.x x64.
1. Install EPEL
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
2. Install cmake3 and Python
yum install -y cmake3
yum install -y python
3. Download the LLVM source
wget http://releases.llvm.org/6.0.0/llvm-6.0.0.src.tar.xz
tar -xvf llvm-6.0.0.src.tar.xz
4. Download the Clang source
wget http://releases.llvm.org/6.0.0/cfe-6.0.0.src.tar.xz
tar -xvf cfe-6.0.0.src.tar.xz
mv cfe-6.0.0.src clang
mv clang llvm-6.0.0.src/tools
5. Build and install LLVM and Clang (the compiler toolchain required for JIT)
cd llvm-6.0.0.src
mkdir mybuilddir
cd mybuilddir
alias cmake=cmake3
export CXXFLAGS=-O2
cmake ..
cmake --build .
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/llvm6.0.0 -P cmake_install.cmake
6. Build PostgreSQL with LLVM
wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar -jxvf postgresql-snapshot.tar.bz2
cd postgresql-11devel
CLANG=/usr/local/llvm6.0.0/bin/clang LLVM_CONFIG=/usr/local/llvm6.0.0/bin/llvm-config ./configure --prefix=/home/digoal/pgsql11_llvm --with-llvm --with-blocksize=32 --with-segsize=2
make world -j 128
make install-world
7. Directory layout of PostgreSQL built with LLVM
Bitcode (LLVM IR serialized in binary form)
digoal@iZbp13nu0s9j3x3op4zpd4Z-> pwd
/home/digoal/pgsql11_llvm/lib
digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll llvmjit*
-rwxr-xr-x 1 root root 709M Apr 4 06:58 llvmjit.so
-rw-r--r-- 1 root root 7.8K Apr 4 06:58 llvmjit_types.bc
digoal@iZbp13nu0s9j3x3op4zpd4Z-> pwd
/home/digoal/pgsql11_llvm/lib/bitcode
digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll
total 1.8M
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 adminpack
-rw-r--r-- 1 root root 1.4K Apr 4 06:58 adminpack.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 amcheck
-rw-r--r-- 1 root root 2.3K Apr 4 06:58 amcheck.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 auth_delay
-rw-r--r-- 1 root root 376 Apr 4 06:58 auth_delay.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 auto_explain
-rw-r--r-- 1 root root 1.7K Apr 4 06:58 auto_explain.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 autoinc
-rw-r--r-- 1 root root 528 Apr 4 06:58 autoinc.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 bloom
-rw-r--r-- 1 root root 1.8K Apr 4 06:58 bloom.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 btree_gin
-rw-r--r-- 1 root root 5.8K Apr 4 06:58 btree_gin.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 btree_gist
-rw-r--r-- 1 root root 20K Apr 4 06:58 btree_gist.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 citext
-rw-r--r-- 1 root root 1.3K Apr 4 06:58 citext.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 cube
-rw-r--r-- 1 root root 6.4K Apr 4 06:58 cube.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 dblink
-rw-r--r-- 1 root root 5.6K Apr 4 06:58 dblink.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 dict_int
-rw-r--r-- 1 root root 480 Apr 4 06:58 dict_int.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 dict_xsyn
-rw-r--r-- 1 root root 656 Apr 4 06:58 dict_xsyn.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 earthdistance
-rw-r--r-- 1 root root 280 Apr 4 06:58 earthdistance.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 file_fdw
-rw-r--r-- 1 root root 1.8K Apr 4 06:58 file_fdw.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 fuzzystrmatch
-rw-r--r-- 1 root root 5.7K Apr 4 06:58 fuzzystrmatch.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 hstore
-rw-r--r-- 1 root root 6.9K Apr 4 06:58 hstore.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 insert_username
-rw-r--r-- 1 root root 544 Apr 4 06:58 insert_username.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 _int
-rw-r--r-- 1 root root 6.9K Apr 4 06:58 _int.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 isn
-rw-r--r-- 1 root root 64K Apr 4 06:58 isn.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 lo
-rw-r--r-- 1 root root 432 Apr 4 06:58 lo.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 ltree
-rw-r--r-- 1 root root 7.4K Apr 4 06:58 ltree.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 moddatetime
-rw-r--r-- 1 root root 592 Apr 4 06:58 moddatetime.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pageinspect
-rw-r--r-- 1 root root 6.3K Apr 4 06:58 pageinspect.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 passwordcheck
-rw-r--r-- 1 root root 436 Apr 4 06:58 passwordcheck.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_buffercache
-rw-r--r-- 1 root root 660 Apr 4 06:58 pg_buffercache.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pgcrypto
-rw-r--r-- 1 root root 23K Apr 4 06:58 pgcrypto.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_freespacemap
-rw-r--r-- 1 root root 372 Apr 4 06:58 pg_freespacemap.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_prewarm
-rw-r--r-- 1 root root 2.6K Apr 4 06:58 pg_prewarm.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pgrowlocks
-rw-r--r-- 1 root root 968 Apr 4 06:58 pgrowlocks.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_stat_statements
-rw-r--r-- 1 root root 3.1K Apr 4 06:58 pg_stat_statements.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pgstattuple
-rw-r--r-- 1 root root 3.3K Apr 4 06:58 pgstattuple.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_trgm
-rw-r--r-- 1 root root 4.1K Apr 4 06:58 pg_trgm.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 pg_visibility
-rw-r--r-- 1 root root 1.1K Apr 4 06:58 pg_visibility.index.bc
drwxr-xr-x 25 root root 4.0K Apr 4 06:58 postgres
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 postgres_fdw
-rw-r--r-- 1 root root 12K Apr 4 06:58 postgres_fdw.index.bc
-rw-r--r-- 1 root root 1.3M Apr 4 06:58 postgres.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 refint
-rw-r--r-- 1 root root 1.9K Apr 4 06:58 refint.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 seg
-rw-r--r-- 1 root root 5.2K Apr 4 06:58 seg.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tablefunc
-rw-r--r-- 1 root root 2.1K Apr 4 06:58 tablefunc.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tcn
-rw-r--r-- 1 root root 584 Apr 4 06:58 tcn.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 test_decoding
-rw-r--r-- 1 root root 1.5K Apr 4 06:58 test_decoding.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 timetravel
-rw-r--r-- 1 root root 1.2K Apr 4 06:58 timetravel.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tsm_system_rows
-rw-r--r-- 1 root root 524 Apr 4 06:58 tsm_system_rows.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 tsm_system_time
-rw-r--r-- 1 root root 524 Apr 4 06:58 tsm_system_time.index.bc
drwxr-xr-x 2 root root 4.0K Apr 4 06:58 unaccent
-rw-r--r-- 1 root root 844 Apr 4 06:58 unaccent.index.bc
digoal@iZbp13nu0s9j3x3op4zpd4Z-> cd unaccent
digoal@iZbp13nu0s9j3x3op4zpd4Z-> ll
total 16K
-rw-r--r-- 1 root root 15K Apr 4 06:58 unaccent.bc
7.1 Inspecting bitcode contents with llvm-bcanalyzer
cd /home/digoal/pgsql11_llvm/lib/bitcode/postgres/access/heap
/usr/local/llvm6.0.0/bin/llvm-bcanalyzer ./heapam.bc
Summary of ./heapam.bc:
Total size: 1459456b/182432.00B/45608W
Stream type: LLVM IR
# Toplevel Blocks: 4
Per-block Summary:
Block ID #0 (BLOCKINFO_BLOCK):
Num Instances: 1
Total Size: 672b/84.00B/21W
Percent of file: 0.0460%
Num SubBlocks: 0
Num Abbrevs: 16
Num Records: 3
Percent Abbrevs: 0.0000%
Record Histogram:
Count # Bits b/Rec % Abv Record Kind
3 60 20.0 SETBID
Block ID #8 (MODULE_BLOCK):
Num Instances: 1
Total Size: 44789b/5598.62B/1399W
Percent of file: 3.0689%
Num SubBlocks: 79
Num Abbrevs: 3
Num Records: 317
Percent Abbrevs: 1.5773%
Record Histogram:
Count # Bits b/Rec % Abv Record Kind
222 31488 141.8 FUNCTION
89 11128 125.0 3.37 GLOBALVAR
1 207 HASH
1 57 100.00 SOURCE_FILENAME
1 35 100.00 VSTOFFSET
1 465 DATALAYOUT
1 303 TRIPLE
1 21 VERSION
......
8. Set environment variables
su - digoal
vi env_pg11_llvm.sh
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=4000
export PGDATA=/data01/pg/pg_root$PGPORT
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql11_llvm
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
. ./env_pg11_llvm.sh
9. Initialize the database cluster
initdb -D $PGDATA -U postgres -E UTF8 --locale=en_US.UTF8 -X /data02/pg/pg_wal_$PGPORT
10. Configure postgresql.conf
listen_addresses = '0.0.0.0'
port = 4000
max_connections = 3000
unix_socket_directories = '/tmp,.'
unix_socket_permissions = 0700
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 128GB
huge_pages = on
work_mem = 32MB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 500
bgwriter_lru_multiplier = 5.0
effective_io_concurrency = 0
max_worker_processes = 128
max_parallel_maintenance_workers = 2
max_parallel_workers_per_gather = 32
parallel_leader_participation = on
max_parallel_workers = 128
wal_level = minimal
synchronous_commit = off
wal_buffers = 128MB
wal_writer_delay = 10ms
wal_writer_flush_after = 8MB
checkpoint_timeout = 35min
max_wal_size = 256GB
min_wal_size = 64GB
max_wal_senders = 0
enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on
enable_indexonlyscan = on
enable_material = on
enable_mergejoin = on
enable_nestloop = on
enable_parallel_append = on
enable_seqscan = on
enable_sort = on
enable_tidscan = on
enable_partitionwise_join = on
enable_partitionwise_aggregate = on
enable_parallel_hash = on
random_page_cost = 1.1
effective_cache_size = 400GB
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
log_autovacuum_min_duration = 0
autovacuum_vacuum_cost_delay = 0
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'en_US.UTF8'
lc_monetary = 'en_US.UTF8'
lc_numeric = 'en_US.UTF8'
lc_time = 'en_US.UTF8'
default_text_search_config = 'pg_catalog.english'
# JIT-related parameters
jit_above_cost = 100000
jit_optimize_above_cost = 500000
jit_inline_above_cost = 500000
dynamic_library_path = '$libdir'
jit = on
jit_provider = 'llvmjit'
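After restarting with this configuration, a quick sanity check from psql confirms that the server was built with JIT support and that JIT kicks in once a plan's cost crosses the thresholds. This is only a minimal sketch; dropping jit_above_cost to 0 merely forces JIT onto a trivial query for demonstration and is not a recommended production setting.

-- confirm the llvmjit provider can be loaded in this session
select pg_jit_available();

-- current JIT-related settings
show jit;
show jit_provider;

-- force JIT for a cheap query and look for the "JIT:" block at the end of the plan
set jit = on;
set jit_above_cost = 0;
explain (analyze) select sum(random()) from generate_series(1, 100000);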
How much does JIT currently cover?
Currently, expression evaluation and tuple deforming can be JIT compiled.
Currently PostgreSQL’s JIT implementation has support for accelerating expression evaluation and tuple deforming. Several other operations could be accelerated in the future.
Expression evaluation is used to evaluate WHERE clauses, target lists, aggregates and projections. It can be accelerated by generating code specific to each case.
Tuple deforming is the process of transforming an on-disk tuple (see Section 68.6) into its in-memory representation. It can be accelerated by creating a function specific to the table layout and the number of columns to be extracted.
What kind of SQL benefits from JIT
1. A very large amount of data needs to be processed.
2. Each row involves many operators in its computation.
3. The queried columns sit late in the tuple, so the columns in front of them must be deformed first (see the sketch right after this list).
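For instance, using the 65-column test table built in the performance-comparison section below: aggregating c64 (the last column) forces all preceding columns to be deformed for every row, while aggregating id (the first column) requires almost no deforming work, so JIT-compiled tuple deforming helps the former far more. A small sketch:

-- c64 is the last of 65 columns: every row must be deformed past all earlier columns
select sum(c64) from test;

-- id is the first column: little deforming work per row, so less to gain from JIT
select sum(id) from test;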
249 Currently expression evaluation and tuple deforming are JITed. Those
250 were chosen because they commonly are major CPU bottlenecks in
251 analytics queries, but are by no means the only potentially beneficial cases.
252
253 For JITing to be beneficial a piece of code first and foremost has to
254 be a CPU bottleneck. But also importantly, JITing can only be
255 beneficial if overhead can be removed by doing so. E.g. in the tuple
256 deforming case the knowledge about the number of columns and their
257 types can remove a significant number of branches, and in the
258 expression evaluation case a lot of indirect jumps/calls can be
259 removed. If neither of these is the case, JITing is a waste of
260 resources.
261
262 Future avenues for JITing are tuple sorting, COPY parsing/output
263 generation, and later compiling larger parts of queries.
How the PostgreSQL optimizer decides whether to use JIT
Control parameters
1、jit (boolean)
Determines whether JIT may be used by PostgreSQL, if available (see Chapter 32). The default is on.
The default is on, i.e. JIT is enabled.
2、jit_above_cost (floating point)
Sets the planner’s cutoff above which JIT compilation is used as part of query execution (see Chapter 32). Performing JIT costs time but can accelerate query execution. The default is 100000.
The default is 100000; when the planner estimates a query cost above this value, JIT compilation is used.
3、jit_optimize_above_cost (floating point)
Sets the planner’s cutoff above which JIT compiled programs (see jit_above_cost) are optimized. Optimization initially takes time, but can improve execution speed. It is not meaningful to set this to a lower value than jit_above_cost. The default is 500000.
The default is 500000; when the planner estimates a query cost above this value, the JIT-compiled code is additionally run through LLVM's expensive optimization passes.
4、jit_inline_above_cost (floating point)
Sets the planner’s cutoff above which JIT compiled programs (see jit_above_cost) attempt to inline functions and operators. Inlining initially takes time, but can improve execution speed. It is unlikely to be beneficial to set jit_inline_above_cost below jit_optimize_above_cost. The default is 500000.
The default is 500000; when the planner estimates a query cost above this value, functions and operators (currently only C-language and internal functions) are inlined into the JIT-compiled code. A short example of how these three cost thresholds interact follows this parameter list.
https://www.postgresql.org/docs/devel/static/jit-extensibility.html#JIT-PLUGGABLE
5、jit_provider (string)
Determines which JIT provider (see JIT Extensibility) is used. The built-in default is llvmjit.
If set to a non-existent library JIT will not be available, but no error will be raised. This allows JIT support to be installed separately from the main PostgreSQL package. This parameter can only be set at server start.
To allow JIT to support more than one compiler, compiler support in PostgreSQL is designed to be modular: jit_provider selects which provider library is loaded, and adding another compiler requires implementing the corresponding provider interface.
https://www.postgresql.org/docs/devel/static/jit-extensibility.html#JIT-PLUGGABLE
PostgreSQL currently defaults to the LLVM compiler, firstly because LLVM's permissive license does not conflict with PostgreSQL's open-source license, and secondly because LLVM is backed by several strong companies (Apple, for example) and is therefore unlikely to be discontinued.
52 PostgreSQL, by default, uses LLVM to perform JIT. LLVM was chosen
53 because it is developed by several large corporations and therefore
54 unlikely to be discontinued, because it has a license compatible with
55 PostgreSQL, and because its IR can be generated from C using the Clang
56 compiler.
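To see how the three cost thresholds interact, pick a query whose plan cost falls between 0 and 500000 and lower the settings step by step; the JIT block of the plan changes accordingly. A sketch (the exact plan cost depends on your configuration):

set jit = on;

-- only jit_above_cost is crossed: JIT is used, but without inlining or the
-- expensive optimization pass
set jit_above_cost = 0;
set jit_optimize_above_cost = 500000;
set jit_inline_above_cost = 500000;
explain select sum(random()) from generate_series(1, 100000);
-- JIT: ... Inlining: false, Optimization: false

-- lower the other two thresholds as well: the same plan is now inlined and optimized
set jit_optimize_above_cost = 0;
set jit_inline_above_cost = 0;
explain select sum(random()) from generate_series(1, 100000);
-- JIT: ... Inlining: true, Optimization: true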
Relevant source code
src/backend/jit/jit.c
/* GUCs */
bool jit_enabled = true;
char *jit_provider = "llvmjit";
bool jit_debugging_support = false;
bool jit_dump_bitcode = false;
bool jit_expressions = true;
bool jit_profiling_support = false;
bool jit_tuple_deforming = true;
double jit_above_cost = 100000;
double jit_inline_above_cost = 500000;
double jit_optimize_above_cost = 500000;
src/include/jit/jit.h
/* Flags determining what kind of JIT operations to perform */
#define PGJIT_NONE 0
#define PGJIT_PERFORM 1 << 0
#define PGJIT_OPT3 1 << 1
#define PGJIT_INLINE 1 << 2
#define PGJIT_EXPR 1 << 3
#define PGJIT_DEFORM 1 << 4
src/backend/optimizer/plan/planner.c
    result->jitFlags = PGJIT_NONE;
    if (jit_enabled && jit_above_cost >= 0 &&
        top_plan->total_cost > jit_above_cost)
    {
        result->jitFlags |= PGJIT_PERFORM;

        /*
         * Decide how much effort should be put into generating better code.
         */
        if (jit_optimize_above_cost >= 0 &&
            top_plan->total_cost > jit_optimize_above_cost)
            result->jitFlags |= PGJIT_OPT3;
        if (jit_inline_above_cost >= 0 &&
            top_plan->total_cost > jit_inline_above_cost)
            result->jitFlags |= PGJIT_INLINE;

        /*
         * Decide which operations should be JITed.
         */
        if (jit_expressions)
            result->jitFlags |= PGJIT_EXPR;
        if (jit_tuple_deforming)
            result->jitFlags |= PGJIT_DEFORM;
    }
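As a worked example against this logic: the parallel plan in the performance section below has a total cost of about 4549612, which exceeds both the default jit_above_cost (100000) and the default jit_optimize_above_cost / jit_inline_above_cost (500000), and jit_expressions / jit_tuple_deforming are on by default, so jitFlags ends up as PGJIT_PERFORM | PGJIT_OPT3 | PGJIT_INLINE | PGJIT_EXPR | PGJIT_DEFORM. That is why that plan reports both Inlining: true and Optimization: true.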
Developer parameters
1、jit_debugging_support (boolean)
If LLVM has the required functionality, register generated functions with GDB. This makes debugging easier. The default setting is off, and can only be set at server start.
To debug the code generated by JIT with GDB, jit_debugging_support needs to be turned on.
2、jit_dump_bitcode (boolean)
Writes the generated LLVM IR out to the filesystem, inside data_directory. This is only useful for working on the internals of the JIT implementation. The default setting is off, and it can only be changed by a superuser.
Controls whether the intermediate code (LLVM IR) generated by JIT is dumped to files; they are written into the $PGDATA directory (see the example after this parameter list).
3、jit_expressions (boolean)
Determines whether expressions are JIT compiled, subject to costing decisions (see Section 32.2). The default is on.
Controls whether expressions are JIT compiled. On by default; when turned off, expressions are not JIT compiled.
4、jit_profiling_support (boolean)
If LLVM has the required functionality, emit required data to allow perf to profile functions generated by JIT. This writes out files to $HOME/.debug/jit/; the user is responsible for performing cleanup when desired. The default setting is off, and can only be set at server start.
To profile JIT-generated code with perf, the jit_profiling_support parameter needs to be turned on. The output is kept in $HOME/.debug/jit/; you are responsible for cleaning up that directory yourself.
5、jit_tuple_deforming (boolean)
Determines whether tuple deforming is JIT compiled, subject to costing decisions (see Section 32.2). The default is on.
Controls whether tuple deforming is JIT compiled. On by default; when turned off, tuple deforming (converting the on-disk tuple format into the in-memory tuple format) is not JIT compiled.
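As an example of using these developer parameters together (a sketch; jit_dump_bitcode can only be changed by a superuser), the IR generated for a single query can be dumped and then inspected with llvm-bcanalyzer as in section 7.1:

-- superuser: dump the LLVM IR generated for the following queries
set jit = on;
set jit_above_cost = 0;
set jit_dump_bitcode = on;

select sum(c64) from test where c33 > 5;

-- the generated IR is written to files inside the data directory ($PGDATA);
-- turn the parameter back off when finished
set jit_dump_bitcode = off;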
Is JIT-generated code cached?
Not yet.
225 Currently it is not yet possible to cache generated functions, even
226 though that'd be desirable from a performance point of view. The
227 problem is that the generated functions commonly contain pointers into
228 per-execution memory. The expression evaluation machinery needs to
229 be redesigned a bit to avoid that. Basically all per-execution memory
230 needs to be referenced as an offset to one block of memory stored in
231 an ExprState, rather than absolute pointers into memory.
232
233 Once that is addressed, adding an LRU cache that's keyed by the
234 generated LLVM IR will allow to use optimized functions even for
235 faster queries.
236
237 A longer term project is to move expression compilation to the planner
238 stage, allowing e.g. to tie compiled expressions to prepared
239 statements.
240
241 An even more advanced approach would be to use JIT with few
242 optimizations initially, and build an optimized version in the
243 background. But that's even further off.
244
Which compilers are currently supported?
Currently LLVM is supported.
52 PostgreSQL, by default, uses LLVM to perform JIT. LLVM was chosen
53 because it is developed by several large corporations and therefore
54 unlikely to be discontinued, because it has a license compatible with
55 PostgreSQL, and because its IR can be generated from C using the Clang
56 compiler.
199 It obviously is undesirable to maintain a second implementation of
200 commonly used functions, just for inlining purposes. Instead we take
201 advantage of the fact that the Clang compiler can emit LLVM IR.
Support for more compilers can be added by implementing a JIT provider.
59 Shared Library Separation
60 -------------------------
61
62 To avoid the main PostgreSQL binary directly depending on LLVM, which
63 would prevent LLVM support being independently installed by OS package
64 managers, the LLVM dependent code is located in a shared library that
65 is loaded on-demand.
66
67 An additional benefit of doing so is that it is relatively easy to
68 evaluate JIT compilation that does not use LLVM, by changing out the
69 shared library used to provide JIT compilation.
70
71 To achieve this, code intending to perform JIT (e.g. expression evaluation)
72 calls an LLVM independent wrapper located in jit.c to do so. If the
73 shared library providing JIT support can be loaded (i.e. PostgreSQL was
74 compiled with LLVM support and the shared library is installed), the task
75 of JIT compiling an expression gets handed off to the shared library. This
76 obviously requires that the function in jit.c is allowed to fail in case
77 no JIT provider can be loaded.
78
79 Which shared library is loaded is determined by the jit_provider GUC,
80 defaulting to "llvmjit".
81
82 Cloistering code performing JIT into a shared library unfortunately
83 also means that code doing JIT compilation for various parts of code
84 has to be located separately from the code doing so without
85 JIT. E.g. the JIT version of execExprInterp.c is located in jit/llvm/
86 rather than executor/.
Where are the built-in type definitions used by JIT?
177 Instead there is one small file (llvmjit_types.c) which references each of
178 the types required for JITing. That file is translated to bitcode at
179 compile time, and loaded when LLVM is initialized in a backend.
Where is the pre-generated JIT bitcode?
$pkglibdir/bitcode/postgres/
211 Similarly extensions can install code into
212 $pkglibdir/bitcode/[extension]/
213 accompanied by
214 $pkglibdir/bitcode/[extension].index.bc
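On a running instance, $pkglibdir can be looked up from SQL via the pg_config system view (readable by superusers by default); it corresponds to the lib directory shown in section 7, under which bitcode/postgres and bitcode/[extension] live. A small sketch:

-- where the bitcode directories shown above are installed
select setting from pg_config where name = 'PKGLIBDIR';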
Performance comparison
1. Create a test table (65 columns: 33 int8 and 32 text)
do language plpgsql $$
declare
  sql text;
begin
  sql := 'create table test(id int8,';
  -- 32 text columns
  for i in 1..32 loop
    sql := sql||'c'||i||' text default md5(random()::text),';
  end loop;
  -- 32 int8 columns
  for i in 33..64 loop
    sql := sql||'c'||i||' int8 default random()*10,';
  end loop;
  sql := rtrim(sql, ',');
  sql := sql||')';
  execute sql;
end;
$$;
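A quick check that the table has the expected shape (a small sketch using information_schema):

-- 65 columns in total: id plus c1..c64
select count(*) as total_cols,
       count(*) filter (where data_type = 'bigint') as int8_cols,
       count(*) filter (where data_type = 'text')   as text_cols
from information_schema.columns
where table_schema = 'public' and table_name = 'test';
-- expected: 65 / 33 / 32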
2. Load 100 million rows of test data
for((i=1;i<=100;i++)) do nohup psql -c "insert into test select generate_series(1,1000000);" >/dev/null 2>&1 & done
The table occupies 127 GB:
postgres=# \dt+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+----------+--------+-------------
public | test | table | postgres | 127 GB |
(1 row)
Force a parallel degree of 24
set max_parallel_workers_per_gather =24;
set parallel_setup_cost =0;
set parallel_tuple_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
alter table test set (parallel_workers =24);
3. Test with many expressions
explain select c33,c64,count(*),avg(c33),min(c34),max(c35),stddev(c36),sum(c37),stddev(greatest(c37,c38)),
sum(
case c38
when 0 then c39
when 1 then c40
when 2 then c41
when 3 then c42
when 4 then c43
when 5 then c44
when 6 then c45
when 7 then c46
when 8 then c47
when 9 then c48
when 10 then c49
when 11 then c50
when 12 then c51
when 13 then c52
when 14 then c53
when 15 then c54
when 16 then c55
else c56
end
) from test
where
c64^2>0
and sqrt(c64)>0
and hashtext(c1)>hashtext(c2)
and
pow(sqrt(abs(hashint8(case c38
when 0 then c39
when 1 then c40
when 2 then c41
when 3 then c42
when 4 then c43
when 5 then c44
when 6 then c45
when 7 then c46
when 8 then c47
when 9 then c48
when 10 then c49
when 11 then c50
when 12 then c51
when 13 then c52
when 14 then c53
when 15 then c54
when 16 then c55
else c56
end))),2) >0
group by c33,c64;
QUERY PLAN
-------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=4545694.04..4549612.47 rows=121 width=200)
Group Key: c33, c64
-> Gather Merge (cost=4545694.04..4549500.85 rows=2904 width=200)
Workers Planned: 24
-> Partial GroupAggregate (cost=4545693.46..4549425.58 rows=121 width=200)
Group Key: c33, c64
-> Sort (cost=4545693.46..4545822.06 rows=51440 width=200)
Sort Key: c33, c64
-> Parallel Seq Scan on test (cost=0.00..4541668.12 rows=51440 width=200)
Filter: ((((c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((c64)::double precision) > '0'::double precision) AND (hashtext(c1) > hashtext(c2)) AND (pow(sqrt((abs(hashint8(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)))::double precision), '2'::double precision) > '0'::double precision))
JIT:
Functions: 18
Inlining: true
Optimization: true
(14 rows)
3.1 With JIT enabled
set jit=on;
Parallel
set max_parallel_workers_per_gather =24;
Finalize GroupAggregate (cost=4545694.04..4549612.47 rows=121 width=200) (actual time=16228.136..18083.962 rows=110 loops=1)
Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
Group Key: test.c33, test.c64
Buffers: shared hit=204165, temp read=14929 written=14950
-> Gather Merge (cost=4545694.04..4549500.85 rows=2904 width=200) (actual time=16219.163..18075.339 rows=2750 loops=1)
Output: c33, c64, (PARTIAL count(*)), (PARTIAL avg(c33)), (PARTIAL min(c34)), (PARTIAL max(c35)), (PARTIAL stddev(c36)), (PARTIAL sum(c37)), (PARTIAL stddev(GREATEST(c37, c38))), (PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END))
Workers Planned: 24
Workers Launched: 24
Buffers: shared hit=204165, temp read=14929 written=14950
-> Partial GroupAggregate (cost=4545693.46..4549425.58 rows=121 width=200) (actual time=16055.979..17551.930 rows=110 loops=25)
Output: c33, c64, PARTIAL count(*), PARTIAL avg(c33), PARTIAL min(c34), PARTIAL max(c35), PARTIAL stddev(c36), PARTIAL sum(c37), PARTIAL stddev(GREATEST(c37, c38)), PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
Group Key: test.c33, test.c64
Buffers: shared hit=4167196, temp read=304952 written=305369
Worker 0: actual time=16063.931..17380.570 rows=110 loops=1
Buffers: shared hit=146725, temp read=10743 written=10758
Worker 1: actual time=16157.211..17453.030 rows=110 loops=1
Buffers: shared hit=144252, temp read=10570 written=10584
Worker 2: actual time=15937.390..17766.085 rows=110 loops=1
Buffers: shared hit=203275, temp read=14873 written=14894
Worker 3: actual time=16087.670..17463.100 rows=110 loops=1
Buffers: shared hit=153917, temp read=11273 written=11288
Worker 4: actual time=16055.948..17369.731 rows=110 loops=1
Buffers: shared hit=146569, temp read=10723 written=10738
Worker 5: actual time=16142.715..17518.040 rows=110 loops=1
Buffers: shared hit=148380, temp read=10854 written=10869
Worker 6: actual time=16191.173..17582.148 rows=110 loops=1
Buffers: shared hit=154115, temp read=11273 written=11288
Worker 7: actual time=16125.974..17411.918 rows=110 loops=1
Buffers: shared hit=144344, temp read=10570 written=10584
Worker 8: actual time=16162.576..17452.647 rows=110 loops=1
Buffers: shared hit=144356, temp read=10570 written=10584
Worker 9: actual time=16028.487..17333.188 rows=110 loops=1
Buffers: shared hit=146326, temp read=10703 written=10718
Worker 10: actual time=15909.637..17710.282 rows=110 loops=1
Buffers: shared hit=202502, temp read=14819 written=14840
Worker 11: actual time=15991.109..17201.509 rows=110 loops=1
Buffers: shared hit=136312, temp read=9973 written=9987
Worker 12: actual time=15948.994..17758.986 rows=110 loops=1
Buffers: shared hit=202082, temp read=14788 written=14808
Worker 13: actual time=15923.985..17734.805 rows=110 loops=1
Buffers: shared hit=202136, temp read=14788 written=14808
Worker 14: actual time=16185.559..17477.490 rows=110 loops=1
Buffers: shared hit=144407, temp read=10570 written=10584
Worker 15: actual time=15957.127..17879.999 rows=110 loops=1
Buffers: shared hit=214616, temp read=15702 written=15724
Worker 16: actual time=16150.016..17429.737 rows=110 loops=1
Buffers: shared hit=138941, temp read=10168 written=10182
Worker 17: actual time=16036.894..17849.093 rows=110 loops=1
Buffers: shared hit=202016, temp read=14788 written=14808
Worker 18: actual time=16129.668..17369.734 rows=110 loops=1
Buffers: shared hit=138300, temp read=10126 written=10140
Worker 19: actual time=15948.382..17832.470 rows=110 loops=1
Buffers: shared hit=211842, temp read=15491 written=15512
Worker 20: actual time=16153.969..17352.508 rows=110 loops=1
Buffers: shared hit=134834, temp read=9867 written=9880
Worker 21: actual time=15916.013..17716.694 rows=110 loops=1
Buffers: shared hit=202104, temp read=14788 written=14808
Worker 22: actual time=16164.524..17462.568 rows=110 loops=1
Buffers: shared hit=144423, temp read=10570 written=10584
Worker 23: actual time=16046.586..17455.273 rows=110 loops=1
Buffers: shared hit=156257, temp read=11433 written=11449
-> Sort (cost=4545693.46..4545822.06 rows=51440 width=200) (actual time=16048.787..16592.827 rows=1900091 loops=25)
Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
Sort Key: test.c33, test.c64
Sort Method: external merge Disk: 477728kB
Worker 0: Sort Method: external merge Disk: 343776kB
Worker 1: Sort Method: external merge Disk: 338240kB
Worker 2: Sort Method: external merge Disk: 475936kB
Worker 3: Sort Method: external merge Disk: 360736kB
Worker 4: Sort Method: external merge Disk: 343136kB
Worker 5: Sort Method: external merge Disk: 347328kB
Worker 6: Sort Method: external merge Disk: 360736kB
Worker 7: Sort Method: external merge Disk: 338240kB
Worker 8: Sort Method: external merge Disk: 338240kB
Worker 9: Sort Method: external merge Disk: 342496kB
Worker 10: Sort Method: external merge Disk: 474208kB
Worker 11: Sort Method: external merge Disk: 319136kB
Worker 12: Sort Method: external merge Disk: 473216kB
Worker 13: Sort Method: external merge Disk: 473216kB
Worker 14: Sort Method: external merge Disk: 338240kB
Worker 15: Sort Method: external merge Disk: 502464kB
Worker 16: Sort Method: external merge Disk: 325376kB
Worker 17: Sort Method: external merge Disk: 473216kB
Worker 18: Sort Method: external merge Disk: 324032kB
Worker 19: Sort Method: external merge Disk: 495712kB
Worker 20: Sort Method: external merge Disk: 315744kB
Worker 21: Sort Method: external merge Disk: 473216kB
Worker 22: Sort Method: external merge Disk: 338240kB
Worker 23: Sort Method: external merge Disk: 365856kB
Buffers: shared hit=4167196, temp read=304952 written=305369
Worker 0: actual time=16057.644..16539.626 rows=1673456 loops=1
Buffers: shared hit=146725, temp read=10743 written=10758
Worker 1: actual time=16150.861..16625.428 rows=1646431 loops=1
Buffers: shared hit=144252, temp read=10570 written=10584
Worker 2: actual time=15928.651..16592.993 rows=2316826 loops=1
Buffers: shared hit=203275, temp read=14873 written=14894
Worker 3: actual time=16081.087..16581.918 rows=1755939 loops=1
Buffers: shared hit=153917, temp read=11273 written=11288
Worker 4: actual time=16049.644..16528.522 rows=1670397 loops=1
Buffers: shared hit=146569, temp read=10723 written=10738
Worker 5: actual time=16135.668..16623.247 rows=1690829 loops=1
Buffers: shared hit=148380, temp read=10854 written=10869
Worker 6: actual time=16184.539..16693.836 rows=1755942 loops=1
Buffers: shared hit=154115, temp read=11273 written=11288
Worker 7: actual time=16119.729..16583.823 rows=1646433 loops=1
Buffers: shared hit=144344, temp read=10570 written=10584
Worker 8: actual time=16156.343..16626.634 rows=1646432 loops=1
Buffers: shared hit=144356, temp read=10570 written=10584
Worker 9: actual time=16022.254..16496.534 rows=1667285 loops=1
Buffers: shared hit=146326, temp read=10703 written=10718
Worker 10: actual time=15901.059..16553.594 rows=2308407 loops=1
Buffers: shared hit=202502, temp read=14819 written=14840
Worker 11: actual time=15985.290..16423.172 rows=1553527 loops=1
Buffers: shared hit=136312, temp read=9973 written=9987
Worker 12: actual time=15940.471..16603.025 rows=2303469 loops=1
Buffers: shared hit=202082, temp read=14788 written=14808
Worker 13: actual time=15915.230..16578.124 rows=2303471 loops=1
Buffers: shared hit=202136, temp read=14788 written=14808
Worker 14: actual time=16179.341..16650.442 rows=1646430 loops=1
Buffers: shared hit=144407, temp read=10570 written=10584
Worker 15: actual time=15948.014..16649.356 rows=2445950 loops=1
Buffers: shared hit=214616, temp read=15702 written=15724
Worker 16: actual time=16144.032..16604.106 rows=1583954 loops=1
Buffers: shared hit=138941, temp read=10168 written=10182
Worker 17: actual time=16028.217..16692.633 rows=2303463 loops=1
Buffers: shared hit=202016, temp read=14788 written=14808
Worker 18: actual time=16123.611..16575.587 rows=1577397 loops=1
Buffers: shared hit=138300, temp read=10126 written=10140
Worker 19: actual time=15939.379..16625.915 rows=2412964 loops=1
Buffers: shared hit=211842, temp read=15491 written=15512
Worker 20: actual time=16148.053..16579.552 rows=1536937 loops=1
Buffers: shared hit=134834, temp read=9867 written=9880
Worker 21: actual time=15907.527..16565.235 rows=2303464 loops=1
Buffers: shared hit=202104, temp read=14788 written=14808
Worker 22: actual time=16158.176..16633.146 rows=1646438 loops=1
Buffers: shared hit=144423, temp read=10570 written=10584
Worker 23: actual time=16039.773..16559.599 rows=1780966 loops=1
Buffers: shared hit=156257, temp read=11433 written=11449
-> Parallel Seq Scan on public.test (cost=0.00..4541668.12 rows=51440 width=200) (actual time=1032.131..5845.529 rows=1900091 loops=25)
Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
Rows Removed by Filter: 2099909
Buffers: shared hit=4166668
Worker 0: actual time=1020.341..5255.168 rows=1673456 loops=1
Buffers: shared hit=146703
Worker 1: actual time=1027.407..5221.878 rows=1646431 loops=1
Buffers: shared hit=144230
Worker 2: actual time=1153.318..6915.975 rows=2316826 loops=1
Buffers: shared hit=203253
Worker 3: actual time=1020.220..5472.580 rows=1755939 loops=1
Buffers: shared hit=153895
Worker 4: actual time=1032.901..5292.200 rows=1670397 loops=1
Buffers: shared hit=146547
Worker 5: actual time=1017.394..5301.776 rows=1690829 loops=1
Buffers: shared hit=148358
Worker 6: actual time=1030.128..5511.666 rows=1755942 loops=1
Buffers: shared hit=154093
Worker 7: actual time=1030.999..5254.166 rows=1646433 loops=1
Buffers: shared hit=144322
Worker 8: actual time=1029.738..5242.343 rows=1646432 loops=1
Buffers: shared hit=144334
Worker 9: actual time=1043.564..5337.423 rows=1667285 loops=1
Buffers: shared hit=146304
Worker 10: actual time=1003.084..6826.056 rows=2308407 loops=1
Buffers: shared hit=202480
Worker 11: actual time=1041.330..5138.905 rows=1553527 loops=1
Buffers: shared hit=136290
Worker 12: actual time=1006.781..6753.862 rows=2303469 loops=1
Buffers: shared hit=202060
Worker 13: actual time=1004.776..6749.535 rows=2303471 loops=1
Buffers: shared hit=202114
Worker 14: actual time=1030.832..5281.034 rows=1646430 loops=1
Buffers: shared hit=144385
Worker 15: actual time=1003.335..7089.916 rows=2445950 loops=1
Buffers: shared hit=214594
Worker 16: actual time=1015.769..5129.373 rows=1583954 loops=1
Buffers: shared hit=138919
Worker 17: actual time=1003.077..6892.321 rows=2303463 loops=1
Buffers: shared hit=201994
Worker 18: actual time=1016.105..5052.138 rows=1577397 loops=1
Buffers: shared hit=138278
Worker 19: actual time=1006.413..6976.141 rows=2412964 loops=1
Buffers: shared hit=211820
Worker 20: actual time=1013.958..5004.057 rows=1536937 loops=1
Buffers: shared hit=134812
Worker 21: actual time=1009.717..6748.666 rows=2303464 loops=1
Buffers: shared hit=202082
Worker 22: actual time=1013.944..5248.687 rows=1646438 loops=1
Buffers: shared hit=144401
Worker 23: actual time=1009.779..5471.091 rows=1780966 loops=1
Buffers: shared hit=156235
Planning Time: 0.233 ms
JIT:
Functions: 18
Generation Time: 7.829 ms
Inlining: true
Inlining Time: 19.924 ms
Optimization: true
Optimization Time: 778.549 ms
Emission Time: 404.543 ms
Execution Time: 18161.029 ms
(201 rows)
Non-parallel
set max_parallel_workers_per_gather =0;
HashAggregate (cost=13253090.64..13253093.36 rows=121 width=200) (actual time=142124.876..142125.769 rows=110 loops=1)
Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
Group Key: test.c33, test.c64
Buffers: shared hit=4166668
-> Seq Scan on public.test (cost=0.00..13166670.88 rows=1234568 width=200) (actual time=1127.028..114279.915 rows=47502283 loops=1)
Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
Rows Removed by Filter: 52497717
Buffers: shared hit=4166668
Planning Time: 0.668 ms
JIT:
Functions: 9
Generation Time: 6.997 ms
Inlining: true
Inlining Time: 16.558 ms
Optimization: true
Optimization Time: 739.139 ms
Emission Time: 358.289 ms
Execution Time: 142132.967 ms
(19 rows)
3.2 With JIT disabled
set jit=off;
Parallel
set max_parallel_workers_per_gather =24;
Finalize GroupAggregate (cost=4545694.04..4549612.47 rows=121 width=200) (actual time=12229.706..14319.958 rows=110 loops=1)
Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
Group Key: test.c33, test.c64
Buffers: shared hit=193072, temp read=14121 written=14141
-> Gather Merge (cost=4545694.04..4549500.85 rows=2904 width=200) (actual time=12218.740..14311.155 rows=2750 loops=1)
Output: c33, c64, (PARTIAL count(*)), (PARTIAL avg(c33)), (PARTIAL min(c34)), (PARTIAL max(c35)), (PARTIAL stddev(c36)), (PARTIAL sum(c37)), (PARTIAL stddev(GREATEST(c37, c38))), (PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END))
Workers Planned: 24
Workers Launched: 24
Buffers: shared hit=193072, temp read=14121 written=14141
-> Partial GroupAggregate (cost=4545693.46..4549425.58 rows=121 width=200) (actual time=12085.108..13888.218 rows=110 loops=25)
Output: c33, c64, PARTIAL count(*), PARTIAL avg(c33), PARTIAL min(c34), PARTIAL max(c35), PARTIAL stddev(c36), PARTIAL sum(c37), PARTIAL stddev(GREATEST(c37, c38)), PARTIAL sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
Group Key: test.c33, test.c64
Buffers: shared hit=4167196, temp read=304952 written=305367
Worker 0: actual time=12000.792..13585.035 rows=110 loops=1
Buffers: shared hit=147042, temp read=10761 written=10776
Worker 1: actual time=12031.831..13687.539 rows=110 loops=1
Buffers: shared hit=153988, temp read=11273 written=11288
Worker 2: actual time=12146.820..13812.121 rows=110 loops=1
Buffers: shared hit=154021, temp read=11273 written=11288
Worker 3: actual time=12145.395..13804.585 rows=110 loops=1
Buffers: shared hit=154017, temp read=11273 written=11288
Worker 4: actual time=12091.839..14217.464 rows=110 loops=1
Buffers: shared hit=197518, temp read=14457 written=14477
Worker 5: actual time=12169.045..13787.539 rows=110 loops=1
Buffers: shared hit=150711, temp read=11032 written=11047
Worker 6: actual time=12141.908..13807.836 rows=110 loops=1
Buffers: shared hit=154079, temp read=11273 written=11288
Worker 7: actual time=12110.195..13777.285 rows=110 loops=1
Buffers: shared hit=153838, temp read=11273 written=11288
Worker 8: actual time=12137.576..13814.893 rows=110 loops=1
Buffers: shared hit=154004, temp read=11273 written=11288
Worker 9: actual time=12196.750..13862.298 rows=110 loops=1
Buffers: shared hit=154129, temp read=11273 written=11288
Worker 10: actual time=12092.448..14117.280 rows=110 loops=1
Buffers: shared hit=188201, temp read=13770 written=13789
Worker 11: actual time=12068.763..14188.286 rows=110 loops=1
Buffers: shared hit=197254, temp read=14421 written=14441
Worker 12: actual time=12190.847..13923.117 rows=110 loops=1
Buffers: shared hit=153959, temp read=11273 written=11288
Worker 13: actual time=12093.689..14117.677 rows=110 loops=1
Buffers: shared hit=188177, temp read=13771 written=13790
Worker 14: actual time=12141.815..13701.354 rows=110 loops=1
Buffers: shared hit=144501, temp read=10570 written=10584
Worker 15: actual time=12012.075..14027.824 rows=110 loops=1
Buffers: shared hit=186377, temp read=13640 written=13659
Worker 16: actual time=12092.582..14131.998 rows=110 loops=1
Buffers: shared hit=188052, temp read=13759 written=13778
Worker 17: actual time=11906.057..13579.607 rows=110 loops=1
Buffers: shared hit=154068, temp read=11273 written=11288
Worker 18: actual time=12179.217..13853.580 rows=110 loops=1
Buffers: shared hit=154191, temp read=11273 written=11288
Worker 19: actual time=11944.620..14048.474 rows=110 loops=1
Buffers: shared hit=194987, temp read=14285 written=14305
Worker 20: actual time=12152.533..13818.214 rows=110 loops=1
Buffers: shared hit=153932, temp read=11273 written=11288
Worker 21: actual time=12196.138..13864.195 rows=110 loops=1
Buffers: shared hit=154158, temp read=11273 written=11288
Worker 22: actual time=12091.831..14229.393 rows=110 loops=1
Buffers: shared hit=197454, temp read=14452 written=14472
Worker 23: actual time=11910.497..13476.474 rows=110 loops=1
Buffers: shared hit=145466, temp read=10637 written=10652
-> Sort (cost=4545693.46..4545822.06 rows=51440 width=200) (actual time=12076.303..12606.046 rows=1900091 loops=25)
Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
Sort Key: test.c33, test.c64
Sort Method: external merge Disk: 451872kB
Worker 0: Sort Method: external merge Disk: 344352kB
Worker 1: Sort Method: external merge Disk: 360736kB
Worker 2: Sort Method: external merge Disk: 360736kB
Worker 3: Sort Method: external merge Disk: 360736kB
Worker 4: Sort Method: external merge Disk: 462624kB
Worker 5: Sort Method: external merge Disk: 353024kB
Worker 6: Sort Method: external merge Disk: 360736kB
Worker 7: Sort Method: external merge Disk: 360736kB
Worker 8: Sort Method: external merge Disk: 360736kB
Worker 9: Sort Method: external merge Disk: 360736kB
Worker 10: Sort Method: external merge Disk: 440640kB
Worker 11: Sort Method: external merge Disk: 461472kB
Worker 12: Sort Method: external merge Disk: 360736kB
Worker 13: Sort Method: external merge Disk: 440672kB
Worker 14: Sort Method: external merge Disk: 338240kB
Worker 15: Sort Method: external merge Disk: 436480kB
Worker 16: Sort Method: external merge Disk: 440288kB
Worker 17: Sort Method: external merge Disk: 360736kB
Worker 18: Sort Method: external merge Disk: 360736kB
Worker 19: Sort Method: external merge Disk: 457120kB
Worker 20: Sort Method: external merge Disk: 360736kB
Worker 21: Sort Method: external merge Disk: 360736kB
Worker 22: Sort Method: external merge Disk: 462464kB
Worker 23: Sort Method: external merge Disk: 340384kB
Buffers: shared hit=4167196, temp read=304952 written=305367
Worker 0: actual time=11993.075..12463.819 rows=1676370 loops=1
Buffers: shared hit=147042, temp read=10761 written=10776
Worker 1: actual time=12023.856..12510.021 rows=1755946 loops=1
Buffers: shared hit=153988, temp read=11273 written=11288
Worker 2: actual time=12138.628..12631.082 rows=1755945 loops=1
Buffers: shared hit=154021, temp read=11273 written=11288
Worker 3: actual time=12137.214..12626.779 rows=1755935 loops=1
Buffers: shared hit=154017, temp read=11273 written=11288
Worker 4: actual time=12081.480..12703.075 rows=2251992 loops=1
Buffers: shared hit=197518, temp read=14457 written=14477
Worker 5: actual time=12161.027..12638.282 rows=1718516 loops=1
Buffers: shared hit=150711, temp read=11032 written=11047
Worker 6: actual time=12133.786..12626.911 rows=1755948 loops=1
Buffers: shared hit=154079, temp read=11273 written=11288
Worker 7: actual time=12102.069..12591.735 rows=1755947 loops=1
Buffers: shared hit=153838, temp read=11273 written=11288
Worker 8: actual time=12129.413..12622.455 rows=1755936 loops=1
Buffers: shared hit=154004, temp read=11273 written=11288
Worker 9: actual time=12188.648..12679.617 rows=1755941 loops=1
Buffers: shared hit=154129, temp read=11273 written=11288
Worker 10: actual time=12082.492..12676.479 rows=2145118 loops=1
Buffers: shared hit=188201, temp read=13770 written=13789
Worker 11: actual time=12058.276..12683.336 rows=2246436 loops=1
Buffers: shared hit=197254, temp read=14421 written=14441
Worker 12: actual time=12182.379..12683.611 rows=1755944 loops=1
Buffers: shared hit=153959, temp read=11273 written=11288
Worker 13: actual time=12083.860..12673.566 rows=2145124 loops=1
Buffers: shared hit=188177, temp read=13771 written=13790
Worker 14: actual time=12134.030..12593.859 rows=1646436 loops=1
Buffers: shared hit=144501, temp read=10570 written=10584
Worker 15: actual time=12002.363..12594.854 rows=2124740 loops=1
Buffers: shared hit=186377, temp read=13640 written=13659
Worker 16: actual time=12082.700..12682.126 rows=2143293 loops=1
Buffers: shared hit=188052, temp read=13759 written=13778
Worker 17: actual time=11897.939..12391.175 rows=1755948 loops=1
Buffers: shared hit=154068, temp read=11273 written=11288
Worker 18: actual time=12171.070..12661.251 rows=1755944 loops=1
Buffers: shared hit=154191, temp read=11273 written=11288
Worker 19: actual time=11934.165..12549.351 rows=2225151 loops=1
Buffers: shared hit=194987, temp read=14285 written=14305
Worker 20: actual time=12144.452..12632.096 rows=1755935 loops=1
Buffers: shared hit=153932, temp read=11273 written=11288
Worker 21: actual time=12188.108..12679.023 rows=1755934 loops=1
Buffers: shared hit=154158, temp read=11273 written=11288
Worker 22: actual time=12081.380..12710.648 rows=2251317 loops=1
Buffers: shared hit=197454, temp read=14452 written=14472
Worker 23: actual time=11902.854..12362.911 rows=1656889 loops=1
Buffers: shared hit=145466, temp read=10637 written=10652
-> Parallel Seq Scan on public.test (cost=0.00..4541668.12 rows=51440 width=200) (actual time=0.026..4627.345 rows=1900091 loops=25)
Output: c33, c64, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56
Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
Rows Removed by Filter: 2099909
Buffers: shared hit=4166668
Worker 0: actual time=0.024..4041.154 rows=1676370 loops=1
Buffers: shared hit=147020
Worker 1: actual time=0.026..4232.052 rows=1755946 loops=1
Buffers: shared hit=153966
Worker 2: actual time=0.025..4275.863 rows=1755945 loops=1
Buffers: shared hit=153999
Worker 3: actual time=0.024..4268.778 rows=1755935 loops=1
Buffers: shared hit=153995
Worker 4: actual time=0.025..5517.396 rows=2251992 loops=1
Buffers: shared hit=197496
Worker 5: actual time=0.024..4173.081 rows=1718516 loops=1
Buffers: shared hit=150689
Worker 6: actual time=0.025..4315.820 rows=1755948 loops=1
Buffers: shared hit=154057
Worker 7: actual time=0.028..4271.339 rows=1755947 loops=1
Buffers: shared hit=153816
Worker 8: actual time=0.025..4270.376 rows=1755936 loops=1
Buffers: shared hit=153982
Worker 9: actual time=0.032..4301.026 rows=1755941 loops=1
Buffers: shared hit=154107
Worker 10: actual time=0.025..5193.043 rows=2145118 loops=1
Buffers: shared hit=188179
Worker 11: actual time=0.024..5393.193 rows=2246436 loops=1
Buffers: shared hit=197232
Worker 12: actual time=0.026..4288.686 rows=1755944 loops=1
Buffers: shared hit=153937
Worker 13: actual time=0.026..5177.502 rows=2145124 loops=1
Buffers: shared hit=188155
Worker 14: actual time=0.031..4098.000 rows=1646436 loops=1
Buffers: shared hit=144479
Worker 15: actual time=0.023..5155.199 rows=2124740 loops=1
Buffers: shared hit=186355
Worker 16: actual time=0.024..5205.816 rows=2143293 loops=1
Buffers: shared hit=188030
Worker 17: actual time=0.024..4334.653 rows=1755948 loops=1
Buffers: shared hit=154046
Worker 18: actual time=0.030..4290.928 rows=1755944 loops=1
Buffers: shared hit=154169
Worker 19: actual time=0.025..5447.905 rows=2225151 loops=1
Buffers: shared hit=194965
Worker 20: actual time=0.034..4296.168 rows=1755935 loops=1
Buffers: shared hit=153910
Worker 21: actual time=0.025..4273.822 rows=1755934 loops=1
Buffers: shared hit=154136
Worker 22: actual time=0.026..5437.385 rows=2251317 loops=1
Buffers: shared hit=197432
Worker 23: actual time=0.027..4039.679 rows=1656889 loops=1
Buffers: shared hit=145444
Planning Time: 0.230 ms
Execution Time: 14383.790 ms
(193 rows)
Non-parallel
set max_parallel_workers_per_gather =0;
HashAggregate (cost=13253090.64..13253093.36 rows=121 width=200) (actual time=135995.544..135996.426 rows=110 loops=1)
Output: c33, c64, count(*), avg(c33), min(c34), max(c35), stddev(c36), sum(c37), stddev(GREATEST(c37, c38)), sum(CASE c38 WHEN 0 THEN c39 WHEN 1 THEN c40 WHEN 2 THEN c41 WHEN 3 THEN c42 WHEN 4 THEN c43 WHEN 5 THEN c44 WHEN 6 THEN c45 WHEN 7 THEN c46 WHEN 8 THEN c47 WHEN 9 THEN c48 WHEN 10 THEN c49 WHEN 11 THEN c50 WHEN 12 THEN c51 WHEN 13 THEN c52 WHEN 14 THEN c53 WHEN 15 THEN c54 WHEN 16 THEN c55 ELSE c56 END)
Group Key: test.c33, test.c64
Buffers: shared hit=4166668
-> Seq Scan on public.test (cost=0.00..13166670.88 rows=1234568 width=200) (actual time=0.026..101059.780 rows=47502283 loops=1)
Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
Filter: ((((test.c64)::double precision ^ '2'::double precision) > '0'::double precision) AND (sqrt((test.c64)::double precision) > '0'::double precision) AND (hashtext(test.c1) > hashtext(test.c2)) AND (pow(sqrt((abs(hashint8(CASE test.c38 WHEN 0 THEN test.c39 WHEN 1 THEN test.c40 WHEN 2 THEN test.c41 WHEN 3 THEN test.c42 WHEN 4 THEN test.c43 WHEN 5 THEN test.c44 WHEN 6 THEN test.c45 WHEN 7 THEN test.c46 WHEN 8 THEN test.c47 WHEN 9 THEN test.c48 WHEN 10 THEN test.c49 WHEN 11 THEN test.c50 WHEN 12 THEN test.c51 WHEN 13 THEN test.c52 WHEN 14 THEN test.c53 WHEN 15 THEN test.c54 WHEN 16 THEN test.c55 ELSE test.c56 END)))::double precision), '2'::double precision) > '0'::double precision))
Rows Removed by Filter: 52497717
Buffers: shared hit=4166668
Planning Time: 0.202 ms
Execution Time: 135996.540 ms
(11 rows)
4. Performance comparison with few expressions
explain select count(*) from test;
QUERY PLAN
-------------------------------------------------------------------------------------------
Finalize Aggregate (cost=4218751.42..4218751.43 rows=1 width=8)
-> Gather (cost=4218751.35..4218751.36 rows=24 width=8)
Workers Planned: 24
-> Partial Aggregate (cost=4218751.35..4218751.36 rows=1 width=8)
-> Parallel Seq Scan on test (cost=0.00..4208334.68 rows=4166668 width=0)
JIT:
Functions: 5
Inlining: true
Optimization: true
(9 rows)
4.1 With JIT enabled
set jit=on;
Parallel
set max_parallel_workers_per_gather =24;
Finalize Aggregate (cost=4218751.42..4218751.43 rows=1 width=8) (actual time=1195.630..1195.630 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=171976
-> Gather (cost=4218751.35..4218751.36 rows=24 width=8) (actual time=1195.599..1195.616 rows=25 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 24
Workers Launched: 24
Buffers: shared hit=171976
-> Partial Aggregate (cost=4218751.35..4218751.36 rows=1 width=8) (actual time=1172.856..1172.856 rows=1 loops=25)
Output: PARTIAL count(*)
Buffers: shared hit=4166668
Worker 0: actual time=1171.634..1171.634 rows=1 loops=1
Buffers: shared hit=166655
Worker 1: actual time=1171.631..1171.631 rows=1 loops=1
Buffers: shared hit=166878
Worker 2: actual time=1171.533..1171.533 rows=1 loops=1
Buffers: shared hit=166588
Worker 3: actual time=1171.634..1171.634 rows=1 loops=1
Buffers: shared hit=166659
Worker 4: actual time=1172.296..1172.296 rows=1 loops=1
Buffers: shared hit=165127
Worker 5: actual time=1171.637..1171.637 rows=1 loops=1
Buffers: shared hit=166191
Worker 6: actual time=1172.351..1172.351 rows=1 loops=1
Buffers: shared hit=166251
Worker 7: actual time=1171.634..1171.634 rows=1 loops=1
Buffers: shared hit=166945
Worker 8: actual time=1172.317..1172.317 rows=1 loops=1
Buffers: shared hit=166515
Worker 9: actual time=1171.632..1171.632 rows=1 loops=1
Buffers: shared hit=166900
Worker 10: actual time=1172.379..1172.379 rows=1 loops=1
Buffers: shared hit=167330
Worker 11: actual time=1171.555..1171.555 rows=1 loops=1
Buffers: shared hit=165973
Worker 12: actual time=1172.369..1172.369 rows=1 loops=1
Buffers: shared hit=167326
Worker 13: actual time=1172.361..1172.361 rows=1 loops=1
Buffers: shared hit=167240
Worker 14: actual time=1171.603..1171.603 rows=1 loops=1
Buffers: shared hit=165948
Worker 15: actual time=1172.308..1172.308 rows=1 loops=1
Buffers: shared hit=167271
Worker 16: actual time=1171.627..1171.627 rows=1 loops=1
Buffers: shared hit=166118
Worker 17: actual time=1172.363..1172.363 rows=1 loops=1
Buffers: shared hit=166540
Worker 18: actual time=1171.627..1171.628 rows=1 loops=1
Buffers: shared hit=161783
Worker 19: actual time=1172.367..1172.367 rows=1 loops=1
Buffers: shared hit=167241
Worker 20: actual time=1171.622..1171.622 rows=1 loops=1
Buffers: shared hit=166895
Worker 21: actual time=1172.310..1172.311 rows=1 loops=1
Buffers: shared hit=166291
Worker 22: actual time=1171.634..1171.634 rows=1 loops=1
Buffers: shared hit=166817
Worker 23: actual time=1172.357..1172.357 rows=1 loops=1
Buffers: shared hit=167210
-> Parallel Seq Scan on public.test (cost=0.00..4208334.68 rows=4166668 width=0) (actual time=0.005..756.021 rows=4000000 loops=25)
Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
Buffers: shared hit=4166668
Worker 0: actual time=0.005..752.685 rows=3999720 loops=1
Buffers: shared hit=166655
Worker 1: actual time=0.005..753.934 rows=4005072 loops=1
Buffers: shared hit=166878
Worker 2: actual time=0.006..752.288 rows=3998112 loops=1
Buffers: shared hit=166588
Worker 3: actual time=0.005..749.831 rows=3999816 loops=1
Buffers: shared hit=166659
Worker 4: actual time=0.005..754.798 rows=3963048 loops=1
Buffers: shared hit=165127
Worker 5: actual time=0.006..753.153 rows=3988584 loops=1
Buffers: shared hit=166191
Worker 6: actual time=0.005..755.385 rows=3990024 loops=1
Buffers: shared hit=166251
Worker 7: actual time=0.005..754.636 rows=4006667 loops=1
Buffers: shared hit=166945
Worker 8: actual time=0.005..757.278 rows=3996360 loops=1
Buffers: shared hit=166515
Worker 9: actual time=0.005..752.936 rows=4005600 loops=1
Buffers: shared hit=166900
Worker 10: actual time=0.006..755.266 rows=4015920 loops=1
Buffers: shared hit=167330
Worker 11: actual time=0.005..755.747 rows=3983352 loops=1
Buffers: shared hit=165973
Worker 12: actual time=0.005..755.743 rows=4015824 loops=1
Buffers: shared hit=167326
Worker 13: actual time=0.005..756.052 rows=4013742 loops=1
Buffers: shared hit=167240
Worker 14: actual time=0.006..755.470 rows=3982752 loops=1
Buffers: shared hit=165948
Worker 15: actual time=0.005..755.769 rows=4014504 loops=1
Buffers: shared hit=167271
Worker 16: actual time=0.006..755.859 rows=3986832 loops=1
Buffers: shared hit=166118
Worker 17: actual time=0.005..757.579 rows=3996959 loops=1
Buffers: shared hit=166540
Worker 18: actual time=0.005..756.139 rows=3882792 loops=1
Buffers: shared hit=161783
Worker 19: actual time=0.005..755.706 rows=4013784 loops=1
Buffers: shared hit=167241
Worker 20: actual time=0.005..753.655 rows=4005480 loops=1
Buffers: shared hit=166895
Worker 21: actual time=0.005..757.511 rows=3990984 loops=1
Buffers: shared hit=166291
Worker 22: actual time=0.005..753.577 rows=4003608 loops=1
Buffers: shared hit=166817
Worker 23: actual time=0.005..755.443 rows=4013040 loops=1
Buffers: shared hit=167210
Planning Time: 0.192 ms
JIT:
Functions: 5
Generation Time: 1.082 ms
Inlining: true
Inlining Time: 2.907 ms
Optimization: true
Optimization Time: 23.873 ms
Emission Time: 12.577 ms
Execution Time: 1211.337 ms
(120 rows)
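For reference, every plan in this section comes from the same counting query against the 100-million-row, 64-column test table. A hedged reconstruction of the statement (the exact invocation is given earlier in this article) looks like this:

explain (analyze,verbose,timing,costs,buffers) select count(*) from test;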
Non-parallel
set max_parallel_workers_per_gather =0;
Aggregate (cost=5416668.40..5416668.41 rows=1 width=8) (actual time=26222.597..26222.598 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=4166668
-> Seq Scan on public.test (cost=0.00..5166668.32 rows=100000032 width=0) (actual time=0.016..17338.548 rows=100000000 loops=1)
Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
Buffers: shared hit=4166668
Planning Time: 0.056 ms
JIT:
Functions: 2
Generation Time: 0.457 ms
Inlining: true
Inlining Time: 1.355 ms
Optimization: true
Optimization Time: 8.269 ms
Emission Time: 5.354 ms
Execution Time: 26223.178 ms
(16 rows)
4.2、Disable JIT
set jit=off;
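Besides switching JIT off entirely, PostgreSQL 11 lets you keep it enabled and gate it by plan cost. A minimal sketch of the relevant GUCs (the values shown are the documented defaults and only illustrative here):

show jit;                             -- confirm the current session setting
set jit_above_cost = 100000;          -- JIT-compile only plans more expensive than this
set jit_inline_above_cost = 500000;   -- additionally inline operators/functions above this cost
set jit_optimize_above_cost = 500000; -- additionally apply expensive LLVM optimizations above this cost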
Parallel
set max_parallel_workers_per_gather =24;
Finalize Aggregate (cost=4218751.42..4218751.43 rows=1 width=8) (actual time=1166.951..1166.951 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=170684
-> Gather (cost=4218751.35..4218751.36 rows=24 width=8) (actual time=1166.923..1166.939 rows=25 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 24
Workers Launched: 24
Buffers: shared hit=170684
-> Partial Aggregate (cost=4218751.35..4218751.36 rows=1 width=8) (actual time=1146.387..1146.388 rows=1 loops=25)
Output: PARTIAL count(*)
Buffers: shared hit=4166668
Worker 0: actual time=1145.584..1145.584 rows=1 loops=1
Buffers: shared hit=167586
Worker 1: actual time=1145.605..1145.605 rows=1 loops=1
Buffers: shared hit=167235
Worker 2: actual time=1145.517..1145.517 rows=1 loops=1
Buffers: shared hit=164055
Worker 3: actual time=1145.537..1145.537 rows=1 loops=1
Buffers: shared hit=166048
Worker 4: actual time=1145.625..1145.626 rows=1 loops=1
Buffers: shared hit=165348
Worker 5: actual time=1145.621..1145.621 rows=1 loops=1
Buffers: shared hit=167426
Worker 6: actual time=1145.746..1145.746 rows=1 loops=1
Buffers: shared hit=166945
Worker 7: actual time=1145.707..1145.707 rows=1 loops=1
Buffers: shared hit=166628
Worker 8: actual time=1143.795..1143.796 rows=1 loops=1
Buffers: shared hit=163445
Worker 9: actual time=1145.555..1145.556 rows=1 loops=1
Buffers: shared hit=166960
Worker 10: actual time=1145.605..1145.605 rows=1 loops=1
Buffers: shared hit=166142
Worker 11: actual time=1145.633..1145.633 rows=1 loops=1
Buffers: shared hit=166090
Worker 12: actual time=1145.553..1145.553 rows=1 loops=1
Buffers: shared hit=168115
Worker 13: actual time=1145.819..1145.819 rows=1 loops=1
Buffers: shared hit=167973
Worker 14: actual time=1145.615..1145.615 rows=1 loops=1
Buffers: shared hit=166419
Worker 15: actual time=1145.558..1145.558 rows=1 loops=1
Buffers: shared hit=168061
Worker 16: actual time=1145.527..1145.527 rows=1 loops=1
Buffers: shared hit=166438
Worker 17: actual time=1146.062..1146.062 rows=1 loops=1
Buffers: shared hit=166993
Worker 18: actual time=1145.629..1145.629 rows=1 loops=1
Buffers: shared hit=162480
Worker 19: actual time=1146.053..1146.054 rows=1 loops=1
Buffers: shared hit=167989
Worker 20: actual time=1144.864..1144.864 rows=1 loops=1
Buffers: shared hit=164403
Worker 21: actual time=1145.754..1145.754 rows=1 loops=1
Buffers: shared hit=167957
Worker 22: actual time=1145.625..1145.625 rows=1 loops=1
Buffers: shared hit=167457
Worker 23: actual time=1146.064..1146.064 rows=1 loops=1
Buffers: shared hit=167791
-> Parallel Seq Scan on public.test (cost=0.00..4208334.68 rows=4166668 width=0) (actual time=0.005..756.396 rows=4000000 loops=25)
Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
Buffers: shared hit=4166668
Worker 0: actual time=0.005..756.202 rows=4022064 loops=1
Buffers: shared hit=167586
Worker 1: actual time=0.005..755.320 rows=4013627 loops=1
Buffers: shared hit=167235
Worker 2: actual time=0.005..744.106 rows=3937320 loops=1
Buffers: shared hit=164055
Worker 3: actual time=0.005..751.125 rows=3985152 loops=1
Buffers: shared hit=166048
Worker 4: actual time=0.005..753.618 rows=3968352 loops=1
Buffers: shared hit=165348
Worker 5: actual time=0.005..756.612 rows=4018224 loops=1
Buffers: shared hit=167426
Worker 6: actual time=0.004..758.326 rows=4006680 loops=1
Buffers: shared hit=166945
Worker 7: actual time=0.005..759.780 rows=3999071 loops=1
Buffers: shared hit=166628
Worker 8: actual time=0.006..756.676 rows=3922680 loops=1
Buffers: shared hit=163445
Worker 9: actual time=0.005..751.844 rows=4007040 loops=1
Buffers: shared hit=166960
Worker 10: actual time=0.005..749.207 rows=3987408 loops=1
Buffers: shared hit=166142
Worker 11: actual time=0.006..757.084 rows=3986160 loops=1
Buffers: shared hit=166090
Worker 12: actual time=0.004..756.443 rows=4034760 loops=1
Buffers: shared hit=168115
Worker 13: actual time=0.005..756.966 rows=4031352 loops=1
Buffers: shared hit=167973
Worker 14: actual time=0.005..759.423 rows=3994056 loops=1
Buffers: shared hit=166419
Worker 15: actual time=0.004..756.996 rows=4033464 loops=1
Buffers: shared hit=168061
Worker 16: actual time=0.006..759.134 rows=3994512 loops=1
Buffers: shared hit=166438
Worker 17: actual time=0.006..758.454 rows=4007814 loops=1
Buffers: shared hit=166993
Worker 18: actual time=0.005..758.647 rows=3899520 loops=1
Buffers: shared hit=162480
Worker 19: actual time=0.005..756.724 rows=4031736 loops=1
Buffers: shared hit=167989
Worker 20: actual time=0.006..755.985 rows=3945672 loops=1
Buffers: shared hit=164403
Worker 21: actual time=0.004..755.998 rows=4030968 loops=1
Buffers: shared hit=167957
Worker 22: actual time=0.006..757.390 rows=4018968 loops=1
Buffers: shared hit=167457
Worker 23: actual time=0.006..756.996 rows=4026984 loops=1
Buffers: shared hit=167791
Planning Time: 0.070 ms
Execution Time: 1172.569 ms
(112 rows)
Non-parallel
set max_parallel_workers_per_gather =0;
Aggregate (cost=5416668.40..5416668.41 rows=1 width=8) (actual time=26976.475..26976.476 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=4166668
-> Seq Scan on public.test (cost=0.00..5166668.32 rows=100000032 width=0) (actual time=0.011..17349.593 rows=100000000 loops=1)
Output: id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64
Buffers: shared hit=4166668
Planning Time: 0.053 ms
Execution Time: 26976.515 ms
(8 rows)
Summary
As observed here, each parallel worker currently generates its own copy of the bitcode; once a bitcode cache is implemented, a single copy should be shareable across workers.
Enable dumping of the bitcode (bc)
set jit_dump_bitcode =on;
Note: the .bc files are not removed automatically, so clean them up yourself (see the cleanup sketch after the listing below).
cd $PGDATA
-rw------- 1 digoal digoal 126K Apr 4 12:59 34026.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34021.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34019.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34024.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34023.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34017.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34015.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34013.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34020.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34016.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34033.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34031.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34018.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34014.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34036.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34035.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34030.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34028.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34032.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34029.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34022.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34025.0.optimized.bc
-rw------- 1 digoal digoal 126K Apr 4 12:59 34027.0.optimized.bc
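The leading number in each file name appears to be the PID of the backend or parallel worker that generated the module, which matches the observation above that every worker emits its own bitcode. Because these dumps accumulate in $PGDATA, something like the following can be scheduled to clean them up (a minimal sketch; the one-hour age threshold is only illustrative):

# remove dumped bitcode files older than one hour from the data directory
find "$PGDATA" -maxdepth 1 -name '*.bc' -mmin +60 -delete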
The contents of a bitcode file can be inspected with llvm-bcanalyzer:
https://www.llvm.org/docs/CommandGuide/llvm-bcanalyzer.html
/usr/local/llvm6.0.0/bin/llvm-bcanalyzer ./40818.7.bc
Summary of ./40818.7.bc:
Total size: 1069440b/133680.00B/33420W
Stream type: LLVM IR
# Toplevel Blocks: 4
Per-block Summary:
Block ID #0 (BLOCKINFO_BLOCK):
Num Instances: 1
Total Size: 672b/84.00B/21W
Percent of file: 0.0628%
Num SubBlocks: 0
Num Abbrevs: 16
Num Records: 3
Percent Abbrevs: 0.0000%
Record Histogram:
Count # Bits b/Rec % Abv Record Kind
3 60 20.0 SETBID
Block ID #8 (MODULE_BLOCK):
Num Instances: 1
Total Size: 18530b/2316.25B/579W
Percent of file: 1.7327%
Num SubBlocks: 67
Num Abbrevs: 3
Num Records: 129
Percent Abbrevs: 2.3256%
Record Histogram:
Count # Bits b/Rec % Abv Record Kind
86 11904 138.4 FUNCTION
38 4813 126.7 2.63 GLOBALVAR
1 21 100.00 SOURCE_FILENAME
1 35 100.00 VSTOFFSET
1 465 DATALAYOUT
1 303 TRIPLE
1 21 VERSION
........
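Beyond the block-level statistics above, a dumped module can also be disassembled into human-readable LLVM assembly with llvm-dis from the same LLVM build (a sketch; the file name is simply the one used in the llvm-bcanalyzer example):

/usr/local/llvm6.0.0/bin/llvm-dis ./40818.7.bc -o ./40818.7.ll
less ./40818.7.ll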
References
http://blog.jobbole.com/113684/
https://www.postgresql.org/docs/devel/static/jit-extensibility.html
https://www.postgresql.org/docs/devel/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
https://www.postgresql.org/docs/devel/static/runtime-config-client.html#GUC-JIT-PROVIDER
https://www.postgresql.org/docs/devel/static/runtime-config-query.html#GUC-JIT-ABOVE-COST
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/jit/README;h=6271677163ac8acf9767370f3fff39b45f575dc2;hb=refs/heads/master
Using LLVM For Program Transformation