PostgreSQL 11 preview - parallel hash (含hash JOIN , hash agg等) 性能极大提升
背景
两张较大的表做JOIN,并且经过WHERE条件筛选出来的量依旧很大时,通常PostgreSQL会选择HASH JOIN,提高JOIN的性能。
PostgreSQL 11增加了一个重量级特性:
并行哈希JOIN。
https://commitfest.postgresql.org/16/871/
拥有了这个特性,大表JOIN,或者过滤结果依旧很大的表JOIN,性能有了很大的提升。
例子1 - 1000万 JOIN 1000万
1、部署PostgreSQL 11,极简命令如下
wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar -jxvf postgresql-snapshot.tar.bz2
cd postgresql-11devel
./configure --prefix=/home/digoal/pg11
make world -j 128
make install-world
2、一些参数
port = 9999
max_connections = 1000
unix_socket_directories = '.'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 32GB
dynamic_shared_memory_type = posix
bgwriter_delay = 10ms
effective_io_concurrency = 0
max_worker_processes = 128
max_parallel_workers_per_gather = 32
parallel_leader_participation = on
max_parallel_workers = 128
synchronous_commit = off
wal_buffers = 128MB
wal_writer_delay = 10ms
max_wal_size = 64GB
min_wal_size = 16GB
checkpoint_completion_target = 0.1
random_page_cost = 1.0
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = on
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'PRC'
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
3、测试,新建测试表。
create table tbl(id int, c1 int);
4、写入1000万数据。
insert into tbl select generate_series(1,10000000), random()*99;
5、自关联,统计。
alter table tbl set (parallel_workers =16);
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_index_scan_size =0;
set min_parallel_table_scan_size =0;
set work_mem='64MB';
执行计划如下,可以明显的看到使用了并行HASH JOIN
postgres=# explain select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=114753.97..114766.97 rows=100 width=12)
Group Key: t1.c1
-> Sort (cost=114753.97..114757.97 rows=1600 width=12)
Sort Key: t1.c1
-> Gather (cost=114667.82..114668.82 rows=1600 width=12)
Workers Planned: 16
-> Partial HashAggregate (cost=114667.82..114668.82 rows=100 width=12)
Group Key: t1.c1
-- 并行HASH join
-> Parallel Hash Join (cost=58310.47..111542.83 rows=624999 width=4)
Hash Cond: (t1.id = t2.id)
-> Parallel Seq Scan on tbl t1 (cost=0.00..50497.99 rows=624999 width=8)
-> Parallel Hash (cost=50497.99..50497.99 rows=624999 width=4)
-> Parallel Seq Scan on tbl t2 (cost=0.00..50497.99 rows=624999 width=4)
(13 rows)
执行耗时1.08秒。
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;
c1 | count
----+--------
0 | 50362
1 | 101199
2 | 101405
3 | 100999
......
96 | 100938
97 | 100857
98 | 101143
99 | 50964
(100 rows)
Time: 1083.172 ms (00:01.083)
对比PostgreSQL 10
PostgreSQL 10未使用并行hash join。
postgres=# explain select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=421591.59..421604.59 rows=100 width=12)
Group Key: t1.c1
-> Sort (cost=421591.59..421595.59 rows=1600 width=12)
Sort Key: t1.c1
-> Gather (cost=421505.44..421506.44 rows=1600 width=12)
Workers Planned: 16
-> Partial HashAggregate (cost=421505.44..421506.44 rows=100 width=12)
Group Key: t1.c1
-- 普通HASH JOIN
-> Hash Join (cost=308310.48..418380.44 rows=624999 width=4)
Hash Cond: (t1.id = t2.id)
-> Parallel Seq Scan on tbl t1 (cost=0.00..50497.99 rows=624999 width=8)
-> Hash (cost=144247.77..144247.77 rows=9999977 width=4)
-> Seq Scan on tbl t2 (cost=0.00..144247.77 rows=9999977 width=4)
(13 rows)
PostgreSQL 10耗时,5.39秒。
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;
c1 | count
----+--------
0 | 50362
1 | 101199
2 | 101405
3 | 100999
......
96 | 100938
97 | 100857
98 | 101143
99 | 50964
(100 rows)
Time: 5388.591 ms (00:05.389)
附录,完整执行计划
1、PostgreSQL 11
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=114753.97..114766.97 rows=100 width=12) (actual time=1156.061..1156.476 rows=100 loops=1)
Output: t1.c1, count(*)
Group Key: t1.c1
Buffers: shared hit=5537
-> Sort (cost=114753.97..114757.97 rows=1600 width=12) (actual time=1156.050..1156.215 rows=1700 loops=1)
Output: t1.c1, (PARTIAL count(*))
Sort Key: t1.c1
Sort Method: quicksort Memory: 128kB
Buffers: shared hit=5537
-> Gather (cost=114667.82..114668.82 rows=1600 width=12) (actual time=1148.423..1155.699 rows=1700 loops=1)
Output: t1.c1, (PARTIAL count(*))
Workers Planned: 16
Workers Launched: 16
Buffers: shared hit=5537
-> Partial HashAggregate (cost=114667.82..114668.82 rows=100 width=12) (actual time=1122.889..1122.905 rows=100 loops=17)
Output: t1.c1, PARTIAL count(*)
Group Key: t1.c1
Buffers: shared hit=88544
Worker 0: actual time=1120.803..1120.819 rows=100 loops=1
Buffers: shared hit=5267
Worker 1: actual time=1120.839..1120.855 rows=100 loops=1
Buffers: shared hit=5161
Worker 2: actual time=1128.183..1128.206 rows=100 loops=1
Buffers: shared hit=5127
Worker 3: actual time=1120.921..1120.937 rows=100 loops=1
Buffers: shared hit=5235
Worker 4: actual time=1120.903..1120.919 rows=100 loops=1
Buffers: shared hit=5128
Worker 5: actual time=1120.841..1120.857 rows=100 loops=1
Buffers: shared hit=5093
Worker 6: actual time=1120.843..1120.860 rows=100 loops=1
Buffers: shared hit=5175
Worker 7: actual time=1120.838..1120.854 rows=100 loops=1
Buffers: shared hit=5214
Worker 8: actual time=1120.897..1120.913 rows=100 loops=1
Buffers: shared hit=5138
Worker 9: actual time=1120.780..1120.797 rows=100 loops=1
Buffers: shared hit=5225
Worker 10: actual time=1120.862..1120.878 rows=100 loops=1
Buffers: shared hit=5160
Worker 11: actual time=1121.021..1121.037 rows=100 loops=1
Buffers: shared hit=5188
Worker 12: actual time=1120.787..1120.803 rows=100 loops=1
Buffers: shared hit=5247
Worker 13: actual time=1120.927..1120.943 rows=100 loops=1
Buffers: shared hit=5244
Worker 14: actual time=1120.854..1120.870 rows=100 loops=1
Buffers: shared hit=5166
Worker 15: actual time=1121.003..1121.019 rows=100 loops=1
Buffers: shared hit=5239
-> Parallel Hash Join (cost=58310.47..111542.83 rows=624999 width=4) (actual time=485.581..1006.858 rows=588235 loops=17)
Output: t1.c1
Hash Cond: (t1.id = t2.id)
Buffers: shared hit=88544
Worker 0: actual time=483.955..1002.826 rows=595058 loops=1
Buffers: shared hit=5267
Worker 1: actual time=483.904..1006.660 rows=583532 loops=1
Buffers: shared hit=5161
Worker 2: actual time=483.905..1013.057 rows=582628 loops=1
Buffers: shared hit=5127
Worker 3: actual time=483.992..1003.798 rows=594784 loops=1
Buffers: shared hit=5235
Worker 4: actual time=484.062..1004.845 rows=581724 loops=1
Buffers: shared hit=5128
Worker 5: actual time=483.918..1006.055 rows=576074 loops=1
Buffers: shared hit=5093
Worker 6: actual time=484.048..1005.659 rows=586470 loops=1
Buffers: shared hit=5175
Worker 7: actual time=483.994..1001.889 rows=592346 loops=1
Buffers: shared hit=5214
Worker 8: actual time=484.006..1003.867 rows=583306 loops=1
Buffers: shared hit=5138
Worker 9: actual time=483.960..1003.685 rows=593250 loops=1
Buffers: shared hit=5225
Worker 10: actual time=483.950..1002.386 rows=584210 loops=1
Buffers: shared hit=5160
Worker 11: actual time=484.106..1004.267 rows=588278 loops=1
Buffers: shared hit=5188
Worker 12: actual time=483.980..1005.969 rows=598900 loops=1
Buffers: shared hit=5247
Worker 13: actual time=484.041..1005.781 rows=595962 loops=1
Buffers: shared hit=5244
Worker 14: actual time=484.000..1007.576 rows=585114 loops=1
Buffers: shared hit=5166
Worker 15: actual time=484.106..1006.748 rows=590312 loops=1
Buffers: shared hit=5239
-> Parallel Seq Scan on public.tbl t1 (cost=0.00..50497.99 rows=624999 width=8) (actual time=0.012..78.204 rows=588235 loops=17)
Output: t1.id, t1.c1
Buffers: shared hit=44248
Worker 0: actual time=0.014..78.932 rows=595058 loops=1
Buffers: shared hit=2633
Worker 1: actual time=0.012..77.867 rows=583532 loops=1
Buffers: shared hit=2582
Worker 2: actual time=0.013..78.231 rows=582628 loops=1
Buffers: shared hit=2578
Worker 3: actual time=0.010..79.102 rows=594784 loops=1
Buffers: shared hit=2632
Worker 4: actual time=0.012..77.634 rows=581724 loops=1
Buffers: shared hit=2574
Worker 5: actual time=0.012..77.716 rows=576074 loops=1
Buffers: shared hit=2549
Worker 6: actual time=0.014..78.201 rows=586470 loops=1
Buffers: shared hit=2595
Worker 7: actual time=0.013..78.874 rows=592346 loops=1
Buffers: shared hit=2621
Worker 8: actual time=0.012..77.747 rows=583306 loops=1
Buffers: shared hit=2581
Worker 9: actual time=0.014..78.816 rows=593250 loops=1
Buffers: shared hit=2625
Worker 10: actual time=0.014..77.932 rows=584210 loops=1
Buffers: shared hit=2585
Worker 11: actual time=0.012..78.139 rows=588278 loops=1
Buffers: shared hit=2603
Worker 12: actual time=0.012..79.524 rows=598900 loops=1
Buffers: shared hit=2650
Worker 13: actual time=0.010..78.885 rows=595962 loops=1
Buffers: shared hit=2637
Worker 14: actual time=0.013..78.367 rows=585114 loops=1
Buffers: shared hit=2589
Worker 15: actual time=0.012..77.940 rows=590312 loops=1
Buffers: shared hit=2612
-> Parallel Hash (cost=50497.99..50497.99 rows=624999 width=4) (actual time=479.784..479.784 rows=588235 loops=17)
Output: t2.id
Buckets: 16777216 Batches: 1 Memory Usage: 522496kB
Buffers: shared hit=44248
Worker 0: actual time=483.815..483.815 rows=594606 loops=1
Buffers: shared hit=2631
Worker 1: actual time=483.772..483.772 rows=582176 loops=1
Buffers: shared hit=2576
Worker 2: actual time=483.774..483.774 rows=575396 loops=1
Buffers: shared hit=2546
Worker 3: actual time=483.861..483.861 rows=587600 loops=1
Buffers: shared hit=2600
Worker 4: actual time=483.931..483.931 rows=576526 loops=1
Buffers: shared hit=2551
Worker 5: actual time=483.788..483.788 rows=574266 loops=1
Buffers: shared hit=2541
Worker 6: actual time=483.918..483.918 rows=582402 loops=1
Buffers: shared hit=2577
Worker 7: actual time=483.853..483.853 rows=585340 loops=1
Buffers: shared hit=2590
Worker 8: actual time=483.880..483.880 rows=577204 loops=1
Buffers: shared hit=2554
Worker 9: actual time=483.820..483.820 rows=586922 loops=1
Buffers: shared hit=2597
Worker 10: actual time=483.824..483.824 rows=581272 loops=1
Buffers: shared hit=2572
Worker 11: actual time=483.971..483.971 rows=583484 loops=1
Buffers: shared hit=2582
Worker 12: actual time=483.842..483.842 rows=586244 loops=1
Buffers: shared hit=2594
Worker 13: actual time=483.913..483.913 rows=588504 loops=1
Buffers: shared hit=2604
Worker 14: actual time=483.860..483.860 rows=581724 loops=1
Buffers: shared hit=2574
Worker 15: actual time=483.980..483.980 rows=593024 loops=1
Buffers: shared hit=2624
-> Parallel Seq Scan on public.tbl t2 (cost=0.00..50497.99 rows=624999 width=4) (actual time=0.027..94.879 rows=588235 loops=17)
Output: t2.id
Buffers: shared hit=44248
Worker 0: actual time=0.036..95.714 rows=594606 loops=1
Buffers: shared hit=2631
Worker 1: actual time=0.034..94.385 rows=582176 loops=1
Buffers: shared hit=2576
Worker 2: actual time=0.023..93.751 rows=575396 loops=1
Buffers: shared hit=2546
Worker 3: actual time=0.037..95.241 rows=587600 loops=1
Buffers: shared hit=2600
Worker 4: actual time=0.020..93.895 rows=576526 loops=1
Buffers: shared hit=2551
Worker 5: actual time=0.022..94.588 rows=574266 loops=1
Buffers: shared hit=2541
Worker 6: actual time=0.021..94.733 rows=582402 loops=1
Buffers: shared hit=2577
Worker 7: actual time=0.026..95.584 rows=585340 loops=1
Buffers: shared hit=2590
Worker 8: actual time=0.021..93.899 rows=577204 loops=1
Buffers: shared hit=2554
Worker 9: actual time=0.026..95.514 rows=586922 loops=1
Buffers: shared hit=2597
Worker 10: actual time=0.021..94.602 rows=581272 loops=1
Buffers: shared hit=2572
Worker 11: actual time=0.023..94.955 rows=583484 loops=1
Buffers: shared hit=2582
Worker 12: actual time=0.050..94.690 rows=586244 loops=1
Buffers: shared hit=2594
Worker 13: actual time=0.023..94.627 rows=588504 loops=1
Buffers: shared hit=2604
Worker 14: actual time=0.027..94.133 rows=581724 loops=1
Buffers: shared hit=2574
Worker 15: actual time=0.024..95.267 rows=593024 loops=1
Buffers: shared hit=2624
Planning time: 0.129 ms
Execution time: 1349.215 ms
(194 rows)
2、PostgreSQL 10
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=421591.59..421604.59 rows=100 width=12) (actual time=6937.115..6937.548 rows=100 loops=1)
Output: t1.c1, count(*)
Group Key: t1.c1
Buffers: shared hit=48624, temp read=30679 written=30649
-> Sort (cost=421591.59..421595.59 rows=1600 width=12) (actual time=6937.106..6937.273 rows=1700 loops=1)
Output: t1.c1, (PARTIAL count(*))
Sort Key: t1.c1
Sort Method: quicksort Memory: 128kB
Buffers: shared hit=48624, temp read=30679 written=30649
-> Gather (cost=421505.44..421506.44 rows=1600 width=12) (actual time=6936.361..6936.756 rows=1700 loops=1)
Output: t1.c1, (PARTIAL count(*))
Workers Planned: 16
Workers Launched: 16
Buffers: shared hit=48624, temp read=30679 written=30649
-> Partial HashAggregate (cost=421505.44..421506.44 rows=100 width=12) (actual time=6762.210..6762.227 rows=100 loops=17)
Output: t1.c1, PARTIAL count(*)
Group Key: t1.c1
Buffers: shared hit=796992, temp read=499722 written=499212
Worker 0: actual time=6731.758..6731.774 rows=100 loops=1
Buffers: shared hit=46402, temp read=29044 written=29014
Worker 1: actual time=6660.380..6660.397 rows=100 loops=1
Buffers: shared hit=46578, temp read=29175 written=29145
Worker 2: actual time=6864.011..6864.028 rows=100 loops=1
Buffers: shared hit=47288, temp read=29688 written=29658
Worker 3: actual time=6769.795..6769.811 rows=100 loops=1
Buffers: shared hit=47194, temp read=29620 written=29590
Worker 4: actual time=6764.356..6764.373 rows=100 loops=1
Buffers: shared hit=46599, temp read=29191 written=29161
Worker 5: actual time=6825.759..6825.775 rows=100 loops=1
Buffers: shared hit=46993, temp read=29475 written=29445
Worker 6: actual time=6822.713..6822.730 rows=100 loops=1
Buffers: shared hit=47278, temp read=29680 written=29650
Worker 7: actual time=6867.773..6867.795 rows=100 loops=1
Buffers: shared hit=47599, temp read=29913 written=29883
Worker 8: actual time=6607.908..6607.925 rows=100 loops=1
Buffers: shared hit=46069, temp read=28806 written=28776
Worker 9: actual time=6868.725..6868.741 rows=100 loops=1
Buffers: shared hit=47600, temp read=29913 written=29883
Worker 10: actual time=6911.439..6911.456 rows=100 loops=1
Buffers: shared hit=48020, temp read=30218 written=30188
Worker 11: actual time=6721.336..6721.352 rows=100 loops=1
Buffers: shared hit=46264, temp read=28946 written=28916
Worker 12: actual time=6734.470..6734.486 rows=100 loops=1
Buffers: shared hit=47042, temp read=29510 written=29480
Worker 13: actual time=6829.711..6829.733 rows=100 loops=1
Buffers: shared hit=47249, temp read=29661 written=29631
Worker 14: actual time=6630.409..6630.426 rows=100 loops=1
Buffers: shared hit=45881, temp read=28670 written=28640
Worker 15: actual time=6411.387..6411.404 rows=100 loops=1
Buffers: shared hit=44312, temp read=27533 written=27503
-> Hash Join (cost=308310.48..418380.44 rows=624999 width=4) (actual time=4246.049..6635.659 rows=588235 loops=17)
Output: t1.c1
Hash Cond: (t1.id = t2.id)
Buffers: shared hit=796992, temp read=499722 written=499212
Worker 0: actual time=4267.925..6622.956 rows=479346 loops=1
Buffers: shared hit=46402, temp read=29044 written=29014
Worker 1: actual time=4276.855..6545.040 rows=519122 loops=1
Buffers: shared hit=46578, temp read=29175 written=29145
Worker 2: actual time=4207.921..6714.533 rows=679582 loops=1
Buffers: shared hit=47288, temp read=29688 written=29658
Worker 3: actual time=4209.163..6630.422 rows=658338 loops=1
Buffers: shared hit=47194, temp read=29620 written=29590
Worker 4: actual time=4269.171..6652.047 rows=523868 loops=1
Buffers: shared hit=46599, temp read=29191 written=29161
Worker 5: actual time=4229.457..6694.605 rows=612912 loops=1
Buffers: shared hit=46993, temp read=29475 written=29445
Worker 6: actual time=4209.138..6677.693 rows=677322 loops=1
Buffers: shared hit=47278, temp read=29680 written=29650
Worker 7: actual time=4172.545..6706.718 rows=749868 loops=1
Buffers: shared hit=47599, temp read=29913 written=29883
Worker 8: actual time=4324.320..6521.704 rows=404040 loops=1
Buffers: shared hit=46069, temp read=28806 written=28776
Worker 9: actual time=4173.581..6708.671 rows=750094 loops=1
Buffers: shared hit=47600, temp read=29913 written=29883
Worker 10: actual time=4131.316..6730.818 rows=845014 loops=1
Buffers: shared hit=48020, temp read=30218 written=30188
Worker 11: actual time=4312.563..6626.083 rows=448158 loops=1
Buffers: shared hit=46264, temp read=28946 written=28916
Worker 12: actual time=4237.928..6601.519 rows=623986 loops=1
Buffers: shared hit=47042, temp read=29510 written=29480
Worker 13: actual time=4219.162..6685.480 rows=670768 loops=1
Buffers: shared hit=47249, temp read=29661 written=29631
Worker 14: actual time=4351.151..6551.854 rows=361600 loops=1
Buffers: shared hit=45881, temp read=28670 written=28640
Worker 15: actual time=4503.065..6409.684 rows=7006 loops=1
Buffers: shared hit=44312, temp read=27533 written=27503
-> Parallel Seq Scan on public.tbl t1 (cost=0.00..50497.99 rows=624999 width=8) (actual time=0.020..99.393 rows=588235 loops=17)
Output: t1.c1, t1.id
Buffers: shared hit=44248
Worker 0: actual time=0.020..86.999 rows=479346 loops=1
Buffers: shared hit=2121
Worker 1: actual time=0.024..83.786 rows=519122 loops=1
Buffers: shared hit=2297
Worker 2: actual time=0.021..110.111 rows=679582 loops=1
Buffers: shared hit=3007
Worker 3: actual time=0.020..115.294 rows=658338 loops=1
Buffers: shared hit=2913
Worker 4: actual time=0.019..91.982 rows=523868 loops=1
Buffers: shared hit=2318
Worker 5: actual time=0.019..107.672 rows=612912 loops=1
Buffers: shared hit=2712
Worker 6: actual time=0.018..110.009 rows=677322 loops=1
Buffers: shared hit=2997
Worker 7: actual time=0.022..131.508 rows=749868 loops=1
Buffers: shared hit=3318
Worker 8: actual time=0.020..70.879 rows=404040 loops=1
Buffers: shared hit=1788
Worker 9: actual time=0.020..131.739 rows=750094 loops=1
Buffers: shared hit=3319
Worker 10: actual time=0.017..148.482 rows=845014 loops=1
Buffers: shared hit=3739
Worker 11: actual time=0.017..72.478 rows=448158 loops=1
Buffers: shared hit=1983
Worker 12: actual time=0.019..100.650 rows=623986 loops=1
Buffers: shared hit=2761
Worker 13: actual time=0.022..108.408 rows=670768 loops=1
Buffers: shared hit=2968
Worker 14: actual time=0.021..58.355 rows=361600 loops=1
Buffers: shared hit=1600
Worker 15: actual time=0.019..1.240 rows=7006 loops=1
Buffers: shared hit=31
-> Hash (cost=144247.77..144247.77 rows=9999977 width=4) (actual time=4234.182..4234.182 rows=10000000 loops=17)
Output: t2.id
Buckets: 2097152 Batches: 16 Memory Usage: 38355kB
Buffers: shared hit=752216, temp written=466786
Worker 0: actual time=4255.940..4255.940 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 1: actual time=4264.988..4264.988 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 2: actual time=4195.758..4195.758 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 3: actual time=4196.835..4196.835 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 4: actual time=4256.876..4256.876 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 5: actual time=4217.069..4217.069 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 6: actual time=4196.597..4196.597 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 7: actual time=4160.107..4160.107 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 8: actual time=4311.684..4311.684 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 9: actual time=4160.753..4160.753 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 10: actual time=4118.749..4118.749 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 11: actual time=4300.207..4300.207 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 12: actual time=4225.272..4225.272 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 13: actual time=4206.682..4206.682 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 14: actual time=4338.706..4338.706 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
Worker 15: actual time=4490.527..4490.527 rows=10000000 loops=1
Buffers: shared hit=44248, temp written=27458
-> Seq Scan on public.tbl t2 (cost=0.00..144247.77 rows=9999977 width=4) (actual time=0.023..1766.936 rows=10000000 loops=17)
Output: t2.id
Buffers: shared hit=752216
Worker 0: actual time=0.020..1788.417 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 1: actual time=0.021..1787.249 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 2: actual time=0.020..1782.212 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 3: actual time=0.022..1758.902 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 4: actual time=0.020..1781.158 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 5: actual time=0.021..1749.199 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 6: actual time=0.021..1751.445 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 7: actual time=0.021..1741.847 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 8: actual time=0.022..1814.993 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 9: actual time=0.021..1743.618 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 10: actual time=0.022..1725.305 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 11: actual time=0.021..1774.372 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 12: actual time=0.035..1735.574 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 13: actual time=0.023..1747.472 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 14: actual time=0.022..1803.754 rows=10000000 loops=1
Buffers: shared hit=44248
Worker 15: actual time=0.046..1912.041 rows=10000000 loops=1
Buffers: shared hit=44248
Planning time: 0.137 ms
Execution time: 6938.022 ms
(194 rows)
例子2 - 1亿 JOIN 1亿
1、PostgreSQL 11,1亿 JOIN 1亿,双表过滤1000万。
过滤条件用到了索引,索引扫描也支持并行扫描。
insert into tbl select id, random()*99 from generate_series(1,100000000) t(id);
postgres=# create index idx_tbl1 on tbl using brin(id);
CREATE INDEX
执行计划
postgres=# explain select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=919498.43..919984.64 rows=100 width=12)
Group Key: t1.c1
-> Gather Merge (cost=919498.43..919975.64 rows=1600 width=12)
Workers Planned: 16
-> Partial GroupAggregate (cost=919498.08..919938.59 rows=100 width=12)
Group Key: t1.c1
-> Sort (cost=919498.08..919644.58 rows=58601 width=4)
Sort Key: t1.c1
-> Parallel Hash Join (cost=462502.50..914857.28 rows=58601 width=4)
Hash Cond: (t1.id = t2.id)
-> Parallel Bitmap Heap Scan on tbl t1 (cost=2444.48..452493.17 rows=605189 width=8)
Recheck Cond: (id <= 10000000)
-> Bitmap Index Scan on idx_tbl1 (cost=0.00..23.72 rows=9690483 width=0)
Index Cond: (id <= 10000000)
-> Parallel Hash (cost=452493.17..452493.17 rows=605189 width=4)
-> Parallel Bitmap Heap Scan on tbl t2 (cost=2444.48..452493.17 rows=605189 width=4)
Recheck Cond: (id <= 10000000)
-> Bitmap Index Scan on idx_tbl1 (cost=0.00..23.72 rows=9690483 width=0)
Index Cond: (id <= 10000000)
(19 rows)
响应时间1.24秒
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;
c1 | count
----+--------
0 | 50298
1 | 101409
2 | 101151
3 | 100965
4 | 101308
5 | 100968
......
94 | 100989
95 | 100940
96 | 100897
97 | 101426
98 | 101426
99 | 50585
(100 rows)
Time: 1244.262 ms (00:01.244)
2、PostgreSQL 10
postgres=# explain select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=1249990.64..1250516.25 rows=100 width=12)
Group Key: t1.c1
-> Gather Merge (cost=1249990.64..1250507.25 rows=1600 width=12)
Workers Planned: 16
-> Partial GroupAggregate (cost=1249990.29..1250470.20 rows=100 width=12)
Group Key: t1.c1
-> Sort (cost=1249990.29..1250149.93 rows=63855 width=4)
Sort Key: t1.c1
-> Hash Join (cost=739971.53..1244893.86 rows=63855 width=4)
Hash Cond: (t1.id = t2.id)
-> Parallel Bitmap Heap Scan on tbl t1 (cost=2553.36..452941.04 rows=631740 width=8)
Recheck Cond: (id <= 10000000)
-> Bitmap Index Scan on idx_tbl1 (cost=0.00..26.40 rows=10124385 width=0)
Index Cond: (id <= 10000000)
-> Hash (cost=571586.17..571586.17 rows=10107840 width=4)
-> Bitmap Heap Scan on tbl t2 (cost=2553.36..571586.17 rows=10107840 width=4)
Recheck Cond: (id <= 10000000)
-> Bitmap Index Scan on idx_tbl1 (cost=0.00..26.40 rows=10124385 width=0)
Index Cond: (id <= 10000000)
(19 rows)
耗时6.35秒
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;
c1 | count
----+--------
0 | 50298
1 | 101409
2 | 101151
3 | 100965
4 | 101308
5 | 100968
......
95 | 100760
96 | 101514
97 | 100543
98 | 100944
99 | 50756
(100 rows)
Time: 6354.000 ms (00:06.354)
1亿 JOIN 1亿,不过滤任何记录
1、PostgreSQL 11,耗时10.7秒
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id) group by t1.c1;
c1 | count
----+---------
0 | 504590
1 | 1010766
2 | 1010562
3 | 1009673
4 | 1009991
......
95 | 1008846
96 | 1010079
97 | 1011009
98 | 1009981
99 | 505627
(100 rows)
Time: 10742.472 ms (00:10.742)
2、PostgreSQL 10,耗时58.3秒
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id) group by t1.c1;
c1 | count
----+---------
0 | 504177
1 | 1009780
2 | 1009475
3 | 1010739
4 | 1010680
......
96 | 1010552
97 | 1009568
98 | 1010606
99 | 505210
(100 rows)
Time: 58297.043 ms (00:58.297)
1亿 JOIN 1亿,单表过滤1000万
1、PostgreSQL 11,耗时2秒
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000) group by t1.c1;
c1 | count
----+--------
0 | 50520
1 | 101802
2 | 101210
......
94 | 100722
95 | 101527
96 | 100719
97 | 100881
98 | 101241
99 | 50460
(100 rows)
Time: 2008.152 ms (00:02.008)
2、PostgreSQL 10,耗时8.5秒
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000) group by t1.c1;
c1 | count
----+--------
0 | 50441
1 | 100789
2 | 101040
3 | 101655
4 | 100412
......
93 | 101191
94 | 100570
95 | 101345
96 | 101246
97 | 101158
98 | 100746
99 | 50610
(100 rows)
Time: 8544.481 ms (00:08.544)
10亿 JOIN 10亿,单表过滤1000万
select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000) group by t1.c1;
1、PostgreSQL 11,耗时10秒
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000) group by t1.c1;
c1 | count
----+--------
0 | 50680
1 | 101203
2 | 100880
3 | 100337
4 | 101399
......
94 | 100852
95 | 100929
96 | 100848
97 | 100921
98 | 100962
99 | 50781
(100 rows)
Time: 10196.189 ms (00:10.196)
2、PostgreSQL 10,耗时37秒
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000) group by t1.c1;
c1 | count
----+--------
0 | 50830
1 | 100537
2 | 100821
3 | 100901
4 | 100392
5 | 101244
......
92 | 100611
93 | 100328
94 | 101158
95 | 101322
96 | 100827
97 | 101342
98 | 101285
99 | 50821
(100 rows)
Time: 37153.008 ms (00:37.153)
10亿 JOIN 10亿,双表过滤1000万
select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;
1、PostgreSQL 11,耗时0.99秒
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;
c1 | count
----+--------
0 | 50680
1 | 101203
2 | 100880
3 | 100337
4 | 101399
......
94 | 100852
95 | 100929
96 | 100848
97 | 100921
98 | 100962
99 | 50781
(100 rows)
Time: 993.254 ms
2、PostgreSQL 10,耗时12秒
postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;
c1 | count
----+--------
0 | 50830
1 | 100537
2 | 100821
3 | 100901
......
95 | 101322
96 | 100827
97 | 101342
98 | 101285
99 | 50821
(100 rows)
Time: 12342.835 ms (00:12.343)
hash agg测试(仅PostgreSQL 11测试)
测试10亿的单表
1、HASH AGG 并行,耗时 11.2 秒
set enable_gathermerge =off;
set work_mem='1GB';
postgres=# explain (analyze,verbose) select mod(id,64),count(*) from tbl group by 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=164145929.33..184145930.61 rows=1000000064 width=12) (actual time=10617.537..10618.961 rows=64 loops=1)
Output: (mod(id, 64)), count(*)
Group Key: (mod(tbl.id, 64))
-> Sort (cost=164145929.33..166645929.49 rows=1000000064 width=12) (actual time=10617.496..10618.060 rows=4160 loops=1)
Output: (mod(id, 64)), (PARTIAL count(*))
Sort Key: (mod(tbl.id, 64))
Sort Method: quicksort Memory: 388kB
-> Gather (cost=4698216.52..4893529.03 rows=1000000064 width=12) (actual time=9952.181..10615.292 rows=4160 loops=1)
Output: (mod(id, 64)), (PARTIAL count(*))
Workers Planned: 64
Workers Launched: 64
-> Partial HashAggregate (cost=4698216.52..4893529.03 rows=15625001 width=12) (actual time=8635.257..9278.101 rows=64 loops=65)
Output: (mod(id, 64)), PARTIAL count(*)
Group Key: mod(tbl.id, 64)
Worker 0: actual time=8585.110..9220.770 rows=64 loops=1
Worker 1: actual time=8363.083..9009.730 rows=64 loops=1
Worker 2: actual time=8506.100..9154.658 rows=64 loops=1
...........
Worker 62: actual time=8722.494..9391.633 rows=64 loops=1
Worker 63: actual time=8715.185..9347.808 rows=64 loops=1
-> Parallel Seq Scan on public.tbl (cost=0.00..4620091.51 rows=15625001 width=4) (actual time=0.050..4269.941 rows=15384615 loops=65)
Output: mod(id, 64)
Worker 0: actual time=0.067..4164.559 rows=15373650 loops=1
Worker 1: actual time=0.029..4154.181 rows=14842324 loops=1
Worker 2: actual time=0.028..4221.663 rows=15133186 loops=1
.............
Worker 61: actual time=0.082..4300.683 rows=15392860 loops=1
Worker 62: actual time=0.083..4341.929 rows=15397606 loops=1
Worker 63: actual time=0.052..4369.521 rows=15158724 loops=1
Planning time: 0.080 ms
Execution time: 11235.180 ms
(146 rows)
2、HASH AGG 串行,耗时 296 秒
postgres=# set work_mem='128GB';
SET
postgres=# set max_parallel_workers_per_gather=0;
SET
postgres=# explain (analyze,verbose) select mod(id,64),count(*) from tbl group by 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=21924780.12..34424780.92 rows=1000000064 width=12) (actual time=270483.965..278182.935 rows=64 loops=1)
Output: (mod(id, 64)), count(*)
Group Key: mod(tbl.id, 64)
-> Seq Scan on public.tbl (cost=0.00..16924779.80 rows=1000000064 width=4) (actual time=0.026..128690.659 rows=1000000000 loops=1)
Output: mod(id, 64)
Planning time: 0.057 ms
Execution time: 295974.163 ms
(7 rows)
小结
PostgreSQL 10与11 - hash join性能对比:
CASE | PostgreSQL 10 | PostgreSQL 11 |
---|---|---|
1千万 JOIN 1千万 | 5.39秒 | 1.08秒 |
1亿 JOIN 1亿(双表过滤1千万) | 6.35秒 | 1.24秒 |
1亿 JOIN 1亿(单表过滤1千万) | 8.5秒 | 2秒 |
1亿 JOIN 1亿 | 58.3秒 | 10.7秒 |
10亿 JOIN 10亿(双表过滤1千万) | 12秒 | 1秒 |
10亿 JOIN 10亿(单表过滤1千万) | 37秒 | 10秒 |
10亿 hash agg(附加测试) | 串行 296秒 | 并行 11.2秒 |
PostgreSQL 11, 性能主要取决于参与hash JOIN的行数(通常来说一张大表,可能会有一些条件过滤掉一些数据,再参与JOIN)。
PostgreSQL 11 并行hash join的一个特点,可以把所有CPU核全部用完,例如这里有56核的机器,并行度开到56核时,10亿JOIN10亿,仅花费0.99秒。
PostgreSQL 11 并行hash,不仅仅能用于加速hash join,其他的hash操作(例如hash agg)都适用,提升性能的原理主要是:
PG 9.6虽然支持HASH并行,但是HASH TABLE是每个Worker process不共享的,所以需要多份HASH TABLE。
PG 11,采用了shared hash table,所以只有一份HASH TABLE。
HASH TABLE越大,PG 11体现出来的性能越好。
换句话说,PG 11,不管多大的表JOIN,HASH性能都是杠杠的。
参考
https://commitfest.postgresql.org/16/871/
《PostgreSQL dblink异步调用实现 并行hash分片JOIN - 含数据交、并、差 提速案例》
https://www.postgresql.org/message-id/flat/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com#CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
https://commitfest.postgresql.org/16/871/