PostgreSQL 11 preview - parallel hash (含hash JOIN , hash agg等) 性能极大提升

18 minute read

背景

两张较大的表做JOIN,并且经过WHERE条件筛选出来的量依旧很大时,通常PostgreSQL会选择HASH JOIN,提高JOIN的性能。

PostgreSQL 11增加了一个重量级特性:

并行哈希JOIN。

https://commitfest.postgresql.org/16/871/

拥有了这个特性,大表JOIN,或者过滤结果依旧很大的表JOIN,性能有了很大的提升。

例子1 - 1000万 JOIN 1000万

1、部署PostgreSQL 11,极简命令如下

wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2  
  
tar -jxvf postgresql-snapshot.tar.bz2  
  
cd postgresql-11devel  
./configure --prefix=/home/digoal/pg11  
make world -j 128  
make install-world  

2、一些参数

port = 9999     
max_connections = 1000  
unix_socket_directories = '.'  
tcp_keepalives_idle = 60  
tcp_keepalives_interval = 10  
tcp_keepalives_count = 10  
shared_buffers = 32GB  
dynamic_shared_memory_type = posix  
bgwriter_delay = 10ms  
effective_io_concurrency = 0  
max_worker_processes = 128  
max_parallel_workers_per_gather = 32  
parallel_leader_participation = on  
max_parallel_workers = 128  
synchronous_commit = off  
wal_buffers = 128MB  
wal_writer_delay = 10ms  
max_wal_size = 64GB  
min_wal_size = 16GB  
checkpoint_completion_target = 0.1  
random_page_cost = 1.0  
log_destination = 'csvlog'  
logging_collector = on  
log_directory = 'log'  
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'  
log_file_mode = 0600  
log_truncate_on_rotation = on  
log_min_duration_statement = 0  
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_error_verbosity = verbose             
log_lock_waits = on  
log_statement = 'ddl'  
log_timezone = 'PRC'  
datestyle = 'iso, mdy'  
timezone = 'PRC'  
lc_messages = 'C'  
lc_monetary = 'C'  
lc_numeric = 'C'  
lc_time = 'C'  
default_text_search_config = 'pg_catalog.english'  

3、测试,新建测试表。

create table tbl(id int, c1 int);  

4、写入1000万数据。

insert into tbl select generate_series(1,10000000), random()*99;  

5、自关联,统计。

alter table tbl set (parallel_workers =16);  
set parallel_tuple_cost =0;  
set parallel_setup_cost =0;  
set min_parallel_index_scan_size =0;  
set min_parallel_table_scan_size =0;  
set work_mem='64MB';  

执行计划如下,可以明显的看到使用了并行HASH JOIN

postgres=# explain select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;  
                                                 QUERY PLAN                                                   
------------------------------------------------------------------------------------------------------------  
 Finalize GroupAggregate  (cost=114753.97..114766.97 rows=100 width=12)  
   Group Key: t1.c1  
   ->  Sort  (cost=114753.97..114757.97 rows=1600 width=12)  
         Sort Key: t1.c1  
         ->  Gather  (cost=114667.82..114668.82 rows=1600 width=12)  
               Workers Planned: 16  
	       ->  Partial HashAggregate  (cost=114667.82..114668.82 rows=100 width=12)  
                     Group Key: t1.c1  
                     -- 并行HASH join  
		     ->  Parallel Hash Join  (cost=58310.47..111542.83 rows=624999 width=4)  
                           Hash Cond: (t1.id = t2.id)  
                           ->  Parallel Seq Scan on tbl t1  (cost=0.00..50497.99 rows=624999 width=8)  
                           ->  Parallel Hash  (cost=50497.99..50497.99 rows=624999 width=4)  
                                 ->  Parallel Seq Scan on tbl t2  (cost=0.00..50497.99 rows=624999 width=4)  
(13 rows)  

执行耗时1.08秒。

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;  
 c1 | count    
----+--------  
  0 |  50362  
  1 | 101199  
  2 | 101405  
  3 | 100999  
......  
 96 | 100938  
 97 | 100857  
 98 | 101143  
 99 |  50964  
(100 rows)  
  
Time: 1083.172 ms (00:01.083)  

对比PostgreSQL 10

PostgreSQL 10未使用并行hash join。

postgres=# explain select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;  
                                              QUERY PLAN                                                
------------------------------------------------------------------------------------------------------  
 Finalize GroupAggregate  (cost=421591.59..421604.59 rows=100 width=12)  
   Group Key: t1.c1  
   ->  Sort  (cost=421591.59..421595.59 rows=1600 width=12)  
         Sort Key: t1.c1  
         ->  Gather  (cost=421505.44..421506.44 rows=1600 width=12)  
               Workers Planned: 16  
               ->  Partial HashAggregate  (cost=421505.44..421506.44 rows=100 width=12)  
                     Group Key: t1.c1  
                     -- 普通HASH JOIN  
		     ->  Hash Join  (cost=308310.48..418380.44 rows=624999 width=4)  
                           Hash Cond: (t1.id = t2.id)  
                           ->  Parallel Seq Scan on tbl t1  (cost=0.00..50497.99 rows=624999 width=8)  
                           ->  Hash  (cost=144247.77..144247.77 rows=9999977 width=4)  
                                 ->  Seq Scan on tbl t2  (cost=0.00..144247.77 rows=9999977 width=4)  
(13 rows)  

PostgreSQL 10耗时,5.39秒。

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;  
 c1 | count    
----+--------  
  0 |  50362  
  1 | 101199  
  2 | 101405  
  3 | 100999  
......  
 96 | 100938  
 97 | 100857  
 98 | 101143  
 99 |  50964  
(100 rows)  
  
Time: 5388.591 ms (00:05.389)  

附录,完整执行计划

1、PostgreSQL 11

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;  
                                                                             QUERY PLAN                                                                               
--------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Finalize GroupAggregate  (cost=114753.97..114766.97 rows=100 width=12) (actual time=1156.061..1156.476 rows=100 loops=1)  
   Output: t1.c1, count(*)  
   Group Key: t1.c1  
   Buffers: shared hit=5537  
   ->  Sort  (cost=114753.97..114757.97 rows=1600 width=12) (actual time=1156.050..1156.215 rows=1700 loops=1)  
         Output: t1.c1, (PARTIAL count(*))  
         Sort Key: t1.c1  
         Sort Method: quicksort  Memory: 128kB  
         Buffers: shared hit=5537  
         ->  Gather  (cost=114667.82..114668.82 rows=1600 width=12) (actual time=1148.423..1155.699 rows=1700 loops=1)  
               Output: t1.c1, (PARTIAL count(*))  
               Workers Planned: 16  
               Workers Launched: 16  
               Buffers: shared hit=5537  
               ->  Partial HashAggregate  (cost=114667.82..114668.82 rows=100 width=12) (actual time=1122.889..1122.905 rows=100 loops=17)  
                     Output: t1.c1, PARTIAL count(*)  
                     Group Key: t1.c1  
                     Buffers: shared hit=88544  
                     Worker 0: actual time=1120.803..1120.819 rows=100 loops=1  
                       Buffers: shared hit=5267  
                     Worker 1: actual time=1120.839..1120.855 rows=100 loops=1  
                       Buffers: shared hit=5161  
                     Worker 2: actual time=1128.183..1128.206 rows=100 loops=1  
                       Buffers: shared hit=5127  
                     Worker 3: actual time=1120.921..1120.937 rows=100 loops=1  
                       Buffers: shared hit=5235  
                     Worker 4: actual time=1120.903..1120.919 rows=100 loops=1  
                       Buffers: shared hit=5128  
                     Worker 5: actual time=1120.841..1120.857 rows=100 loops=1  
                       Buffers: shared hit=5093  
                     Worker 6: actual time=1120.843..1120.860 rows=100 loops=1  
                       Buffers: shared hit=5175  
                     Worker 7: actual time=1120.838..1120.854 rows=100 loops=1  
                       Buffers: shared hit=5214  
                     Worker 8: actual time=1120.897..1120.913 rows=100 loops=1  
                       Buffers: shared hit=5138  
                     Worker 9: actual time=1120.780..1120.797 rows=100 loops=1  
                       Buffers: shared hit=5225  
                     Worker 10: actual time=1120.862..1120.878 rows=100 loops=1  
                       Buffers: shared hit=5160  
                     Worker 11: actual time=1121.021..1121.037 rows=100 loops=1  
                       Buffers: shared hit=5188  
                     Worker 12: actual time=1120.787..1120.803 rows=100 loops=1  
                       Buffers: shared hit=5247  
                     Worker 13: actual time=1120.927..1120.943 rows=100 loops=1  
                       Buffers: shared hit=5244  
                     Worker 14: actual time=1120.854..1120.870 rows=100 loops=1  
                       Buffers: shared hit=5166  
                     Worker 15: actual time=1121.003..1121.019 rows=100 loops=1  
                       Buffers: shared hit=5239  
                     ->  Parallel Hash Join  (cost=58310.47..111542.83 rows=624999 width=4) (actual time=485.581..1006.858 rows=588235 loops=17)  
                           Output: t1.c1  
                           Hash Cond: (t1.id = t2.id)  
                           Buffers: shared hit=88544  
                           Worker 0: actual time=483.955..1002.826 rows=595058 loops=1  
                             Buffers: shared hit=5267  
                           Worker 1: actual time=483.904..1006.660 rows=583532 loops=1  
                             Buffers: shared hit=5161  
                           Worker 2: actual time=483.905..1013.057 rows=582628 loops=1  
                             Buffers: shared hit=5127  
                           Worker 3: actual time=483.992..1003.798 rows=594784 loops=1  
                             Buffers: shared hit=5235  
                           Worker 4: actual time=484.062..1004.845 rows=581724 loops=1  
                             Buffers: shared hit=5128  
                           Worker 5: actual time=483.918..1006.055 rows=576074 loops=1  
                             Buffers: shared hit=5093  
                           Worker 6: actual time=484.048..1005.659 rows=586470 loops=1  
                             Buffers: shared hit=5175  
                           Worker 7: actual time=483.994..1001.889 rows=592346 loops=1  
                             Buffers: shared hit=5214  
                           Worker 8: actual time=484.006..1003.867 rows=583306 loops=1  
                             Buffers: shared hit=5138  
                           Worker 9: actual time=483.960..1003.685 rows=593250 loops=1  
                             Buffers: shared hit=5225  
                           Worker 10: actual time=483.950..1002.386 rows=584210 loops=1  
                             Buffers: shared hit=5160  
                           Worker 11: actual time=484.106..1004.267 rows=588278 loops=1  
                             Buffers: shared hit=5188  
                           Worker 12: actual time=483.980..1005.969 rows=598900 loops=1  
                             Buffers: shared hit=5247  
                           Worker 13: actual time=484.041..1005.781 rows=595962 loops=1  
                             Buffers: shared hit=5244  
                           Worker 14: actual time=484.000..1007.576 rows=585114 loops=1  
                             Buffers: shared hit=5166  
                           Worker 15: actual time=484.106..1006.748 rows=590312 loops=1  
                             Buffers: shared hit=5239  
                           ->  Parallel Seq Scan on public.tbl t1  (cost=0.00..50497.99 rows=624999 width=8) (actual time=0.012..78.204 rows=588235 loops=17)  
                                 Output: t1.id, t1.c1  
                                 Buffers: shared hit=44248  
                                 Worker 0: actual time=0.014..78.932 rows=595058 loops=1  
                                   Buffers: shared hit=2633  
                                 Worker 1: actual time=0.012..77.867 rows=583532 loops=1  
                                   Buffers: shared hit=2582  
                                 Worker 2: actual time=0.013..78.231 rows=582628 loops=1  
                                   Buffers: shared hit=2578  
                                 Worker 3: actual time=0.010..79.102 rows=594784 loops=1  
                                   Buffers: shared hit=2632  
                                 Worker 4: actual time=0.012..77.634 rows=581724 loops=1  
                                   Buffers: shared hit=2574  
                                 Worker 5: actual time=0.012..77.716 rows=576074 loops=1  
                                   Buffers: shared hit=2549  
                                 Worker 6: actual time=0.014..78.201 rows=586470 loops=1  
                                   Buffers: shared hit=2595  
                                 Worker 7: actual time=0.013..78.874 rows=592346 loops=1  
                                   Buffers: shared hit=2621  
                                 Worker 8: actual time=0.012..77.747 rows=583306 loops=1  
                                   Buffers: shared hit=2581  
                                 Worker 9: actual time=0.014..78.816 rows=593250 loops=1  
                                   Buffers: shared hit=2625  
                                 Worker 10: actual time=0.014..77.932 rows=584210 loops=1  
                                   Buffers: shared hit=2585  
                                 Worker 11: actual time=0.012..78.139 rows=588278 loops=1  
                                   Buffers: shared hit=2603  
                                 Worker 12: actual time=0.012..79.524 rows=598900 loops=1  
                                   Buffers: shared hit=2650  
                                 Worker 13: actual time=0.010..78.885 rows=595962 loops=1  
                                   Buffers: shared hit=2637  
                                 Worker 14: actual time=0.013..78.367 rows=585114 loops=1  
                                   Buffers: shared hit=2589  
                                 Worker 15: actual time=0.012..77.940 rows=590312 loops=1  
                                   Buffers: shared hit=2612  
                           ->  Parallel Hash  (cost=50497.99..50497.99 rows=624999 width=4) (actual time=479.784..479.784 rows=588235 loops=17)  
                                 Output: t2.id  
                                 Buckets: 16777216  Batches: 1  Memory Usage: 522496kB  
                                 Buffers: shared hit=44248  
                                 Worker 0: actual time=483.815..483.815 rows=594606 loops=1  
                                   Buffers: shared hit=2631  
                                 Worker 1: actual time=483.772..483.772 rows=582176 loops=1  
                                   Buffers: shared hit=2576  
                                 Worker 2: actual time=483.774..483.774 rows=575396 loops=1  
                                   Buffers: shared hit=2546  
                                 Worker 3: actual time=483.861..483.861 rows=587600 loops=1  
                                   Buffers: shared hit=2600  
                                 Worker 4: actual time=483.931..483.931 rows=576526 loops=1  
                                   Buffers: shared hit=2551  
                                 Worker 5: actual time=483.788..483.788 rows=574266 loops=1  
                                   Buffers: shared hit=2541  
                                 Worker 6: actual time=483.918..483.918 rows=582402 loops=1  
                                   Buffers: shared hit=2577  
                                 Worker 7: actual time=483.853..483.853 rows=585340 loops=1  
                                   Buffers: shared hit=2590  
                                 Worker 8: actual time=483.880..483.880 rows=577204 loops=1  
                                   Buffers: shared hit=2554  
                                 Worker 9: actual time=483.820..483.820 rows=586922 loops=1  
                                   Buffers: shared hit=2597  
                                 Worker 10: actual time=483.824..483.824 rows=581272 loops=1  
                                   Buffers: shared hit=2572  
                                 Worker 11: actual time=483.971..483.971 rows=583484 loops=1  
                                   Buffers: shared hit=2582  
                                 Worker 12: actual time=483.842..483.842 rows=586244 loops=1  
                                   Buffers: shared hit=2594  
                                 Worker 13: actual time=483.913..483.913 rows=588504 loops=1  
                                   Buffers: shared hit=2604  
                                 Worker 14: actual time=483.860..483.860 rows=581724 loops=1  
                                   Buffers: shared hit=2574  
                                 Worker 15: actual time=483.980..483.980 rows=593024 loops=1  
                                   Buffers: shared hit=2624  
                                 ->  Parallel Seq Scan on public.tbl t2  (cost=0.00..50497.99 rows=624999 width=4) (actual time=0.027..94.879 rows=588235 loops=17)  
                                       Output: t2.id  
                                       Buffers: shared hit=44248  
                                       Worker 0: actual time=0.036..95.714 rows=594606 loops=1  
                                         Buffers: shared hit=2631  
                                       Worker 1: actual time=0.034..94.385 rows=582176 loops=1  
                                         Buffers: shared hit=2576  
                                       Worker 2: actual time=0.023..93.751 rows=575396 loops=1  
                                         Buffers: shared hit=2546  
                                       Worker 3: actual time=0.037..95.241 rows=587600 loops=1  
                                         Buffers: shared hit=2600  
                                       Worker 4: actual time=0.020..93.895 rows=576526 loops=1  
                                         Buffers: shared hit=2551  
                                       Worker 5: actual time=0.022..94.588 rows=574266 loops=1  
                                         Buffers: shared hit=2541  
                                       Worker 6: actual time=0.021..94.733 rows=582402 loops=1  
                                         Buffers: shared hit=2577  
                                       Worker 7: actual time=0.026..95.584 rows=585340 loops=1  
                                         Buffers: shared hit=2590  
                                       Worker 8: actual time=0.021..93.899 rows=577204 loops=1  
                                         Buffers: shared hit=2554  
                                       Worker 9: actual time=0.026..95.514 rows=586922 loops=1  
                                         Buffers: shared hit=2597  
                                       Worker 10: actual time=0.021..94.602 rows=581272 loops=1  
                                         Buffers: shared hit=2572  
                                       Worker 11: actual time=0.023..94.955 rows=583484 loops=1  
                                         Buffers: shared hit=2582  
                                       Worker 12: actual time=0.050..94.690 rows=586244 loops=1  
                                         Buffers: shared hit=2594  
                                       Worker 13: actual time=0.023..94.627 rows=588504 loops=1  
                                         Buffers: shared hit=2604  
                                       Worker 14: actual time=0.027..94.133 rows=581724 loops=1  
                                         Buffers: shared hit=2574  
                                       Worker 15: actual time=0.024..95.267 rows=593024 loops=1  
                                         Buffers: shared hit=2624  
 Planning time: 0.129 ms  
 Execution time: 1349.215 ms  
(194 rows)  

2、PostgreSQL 10

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.c1,count(*) from tbl t1 join tbl t2 using (id) group by t1.c1;  
                                                                           QUERY PLAN                                                                              
-----------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Finalize GroupAggregate  (cost=421591.59..421604.59 rows=100 width=12) (actual time=6937.115..6937.548 rows=100 loops=1)  
   Output: t1.c1, count(*)  
   Group Key: t1.c1  
   Buffers: shared hit=48624, temp read=30679 written=30649  
   ->  Sort  (cost=421591.59..421595.59 rows=1600 width=12) (actual time=6937.106..6937.273 rows=1700 loops=1)  
         Output: t1.c1, (PARTIAL count(*))  
         Sort Key: t1.c1  
         Sort Method: quicksort  Memory: 128kB  
         Buffers: shared hit=48624, temp read=30679 written=30649  
         ->  Gather  (cost=421505.44..421506.44 rows=1600 width=12) (actual time=6936.361..6936.756 rows=1700 loops=1)  
               Output: t1.c1, (PARTIAL count(*))  
               Workers Planned: 16  
               Workers Launched: 16  
               Buffers: shared hit=48624, temp read=30679 written=30649  
               ->  Partial HashAggregate  (cost=421505.44..421506.44 rows=100 width=12) (actual time=6762.210..6762.227 rows=100 loops=17)  
                     Output: t1.c1, PARTIAL count(*)  
                     Group Key: t1.c1  
                     Buffers: shared hit=796992, temp read=499722 written=499212  
                     Worker 0: actual time=6731.758..6731.774 rows=100 loops=1  
                       Buffers: shared hit=46402, temp read=29044 written=29014  
                     Worker 1: actual time=6660.380..6660.397 rows=100 loops=1  
                       Buffers: shared hit=46578, temp read=29175 written=29145  
                     Worker 2: actual time=6864.011..6864.028 rows=100 loops=1  
                       Buffers: shared hit=47288, temp read=29688 written=29658  
                     Worker 3: actual time=6769.795..6769.811 rows=100 loops=1  
                       Buffers: shared hit=47194, temp read=29620 written=29590  
                     Worker 4: actual time=6764.356..6764.373 rows=100 loops=1  
                       Buffers: shared hit=46599, temp read=29191 written=29161  
                     Worker 5: actual time=6825.759..6825.775 rows=100 loops=1  
                       Buffers: shared hit=46993, temp read=29475 written=29445  
                     Worker 6: actual time=6822.713..6822.730 rows=100 loops=1  
                       Buffers: shared hit=47278, temp read=29680 written=29650  
                     Worker 7: actual time=6867.773..6867.795 rows=100 loops=1  
                       Buffers: shared hit=47599, temp read=29913 written=29883  
                     Worker 8: actual time=6607.908..6607.925 rows=100 loops=1  
                       Buffers: shared hit=46069, temp read=28806 written=28776  
                     Worker 9: actual time=6868.725..6868.741 rows=100 loops=1  
                       Buffers: shared hit=47600, temp read=29913 written=29883  
                     Worker 10: actual time=6911.439..6911.456 rows=100 loops=1  
                       Buffers: shared hit=48020, temp read=30218 written=30188  
                     Worker 11: actual time=6721.336..6721.352 rows=100 loops=1  
                       Buffers: shared hit=46264, temp read=28946 written=28916  
                     Worker 12: actual time=6734.470..6734.486 rows=100 loops=1  
                       Buffers: shared hit=47042, temp read=29510 written=29480  
                     Worker 13: actual time=6829.711..6829.733 rows=100 loops=1  
                       Buffers: shared hit=47249, temp read=29661 written=29631  
                     Worker 14: actual time=6630.409..6630.426 rows=100 loops=1  
                       Buffers: shared hit=45881, temp read=28670 written=28640  
                     Worker 15: actual time=6411.387..6411.404 rows=100 loops=1  
                       Buffers: shared hit=44312, temp read=27533 written=27503  
                     ->  Hash Join  (cost=308310.48..418380.44 rows=624999 width=4) (actual time=4246.049..6635.659 rows=588235 loops=17)  
                           Output: t1.c1  
                           Hash Cond: (t1.id = t2.id)  
                           Buffers: shared hit=796992, temp read=499722 written=499212  
                           Worker 0: actual time=4267.925..6622.956 rows=479346 loops=1  
                             Buffers: shared hit=46402, temp read=29044 written=29014  
                           Worker 1: actual time=4276.855..6545.040 rows=519122 loops=1  
                             Buffers: shared hit=46578, temp read=29175 written=29145  
                           Worker 2: actual time=4207.921..6714.533 rows=679582 loops=1  
                             Buffers: shared hit=47288, temp read=29688 written=29658  
                           Worker 3: actual time=4209.163..6630.422 rows=658338 loops=1  
                             Buffers: shared hit=47194, temp read=29620 written=29590  
                           Worker 4: actual time=4269.171..6652.047 rows=523868 loops=1  
                             Buffers: shared hit=46599, temp read=29191 written=29161  
                           Worker 5: actual time=4229.457..6694.605 rows=612912 loops=1  
                             Buffers: shared hit=46993, temp read=29475 written=29445  
                           Worker 6: actual time=4209.138..6677.693 rows=677322 loops=1  
                             Buffers: shared hit=47278, temp read=29680 written=29650  
                           Worker 7: actual time=4172.545..6706.718 rows=749868 loops=1  
                             Buffers: shared hit=47599, temp read=29913 written=29883  
                           Worker 8: actual time=4324.320..6521.704 rows=404040 loops=1  
                             Buffers: shared hit=46069, temp read=28806 written=28776  
                           Worker 9: actual time=4173.581..6708.671 rows=750094 loops=1  
                             Buffers: shared hit=47600, temp read=29913 written=29883  
                           Worker 10: actual time=4131.316..6730.818 rows=845014 loops=1  
                             Buffers: shared hit=48020, temp read=30218 written=30188  
                           Worker 11: actual time=4312.563..6626.083 rows=448158 loops=1  
                             Buffers: shared hit=46264, temp read=28946 written=28916  
                           Worker 12: actual time=4237.928..6601.519 rows=623986 loops=1  
                             Buffers: shared hit=47042, temp read=29510 written=29480  
                           Worker 13: actual time=4219.162..6685.480 rows=670768 loops=1  
                             Buffers: shared hit=47249, temp read=29661 written=29631  
                           Worker 14: actual time=4351.151..6551.854 rows=361600 loops=1  
                             Buffers: shared hit=45881, temp read=28670 written=28640  
                           Worker 15: actual time=4503.065..6409.684 rows=7006 loops=1  
                             Buffers: shared hit=44312, temp read=27533 written=27503  
                           ->  Parallel Seq Scan on public.tbl t1  (cost=0.00..50497.99 rows=624999 width=8) (actual time=0.020..99.393 rows=588235 loops=17)  
                                 Output: t1.c1, t1.id  
                                 Buffers: shared hit=44248  
                                 Worker 0: actual time=0.020..86.999 rows=479346 loops=1  
                                   Buffers: shared hit=2121  
                                 Worker 1: actual time=0.024..83.786 rows=519122 loops=1  
                                   Buffers: shared hit=2297  
                                 Worker 2: actual time=0.021..110.111 rows=679582 loops=1  
                                   Buffers: shared hit=3007  
                                 Worker 3: actual time=0.020..115.294 rows=658338 loops=1  
                                   Buffers: shared hit=2913  
                                 Worker 4: actual time=0.019..91.982 rows=523868 loops=1  
                                   Buffers: shared hit=2318  
                                 Worker 5: actual time=0.019..107.672 rows=612912 loops=1  
                                   Buffers: shared hit=2712  
                                 Worker 6: actual time=0.018..110.009 rows=677322 loops=1  
                                   Buffers: shared hit=2997  
                                 Worker 7: actual time=0.022..131.508 rows=749868 loops=1  
                                   Buffers: shared hit=3318  
                                 Worker 8: actual time=0.020..70.879 rows=404040 loops=1  
                                   Buffers: shared hit=1788  
                                 Worker 9: actual time=0.020..131.739 rows=750094 loops=1  
                                   Buffers: shared hit=3319  
                                 Worker 10: actual time=0.017..148.482 rows=845014 loops=1  
                                   Buffers: shared hit=3739  
                                 Worker 11: actual time=0.017..72.478 rows=448158 loops=1  
                                   Buffers: shared hit=1983  
                                 Worker 12: actual time=0.019..100.650 rows=623986 loops=1  
                                   Buffers: shared hit=2761  
                                 Worker 13: actual time=0.022..108.408 rows=670768 loops=1  
                                   Buffers: shared hit=2968  
                                 Worker 14: actual time=0.021..58.355 rows=361600 loops=1  
                                   Buffers: shared hit=1600  
                                 Worker 15: actual time=0.019..1.240 rows=7006 loops=1  
                                   Buffers: shared hit=31  
                           ->  Hash  (cost=144247.77..144247.77 rows=9999977 width=4) (actual time=4234.182..4234.182 rows=10000000 loops=17)  
                                 Output: t2.id  
                                 Buckets: 2097152  Batches: 16  Memory Usage: 38355kB  
                                 Buffers: shared hit=752216, temp written=466786  
                                 Worker 0: actual time=4255.940..4255.940 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 1: actual time=4264.988..4264.988 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 2: actual time=4195.758..4195.758 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 3: actual time=4196.835..4196.835 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 4: actual time=4256.876..4256.876 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 5: actual time=4217.069..4217.069 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 6: actual time=4196.597..4196.597 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 7: actual time=4160.107..4160.107 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 8: actual time=4311.684..4311.684 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 9: actual time=4160.753..4160.753 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 10: actual time=4118.749..4118.749 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 11: actual time=4300.207..4300.207 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 12: actual time=4225.272..4225.272 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 13: actual time=4206.682..4206.682 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 14: actual time=4338.706..4338.706 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 Worker 15: actual time=4490.527..4490.527 rows=10000000 loops=1  
                                   Buffers: shared hit=44248, temp written=27458  
                                 ->  Seq Scan on public.tbl t2  (cost=0.00..144247.77 rows=9999977 width=4) (actual time=0.023..1766.936 rows=10000000 loops=17)  
                                       Output: t2.id  
                                       Buffers: shared hit=752216  
                                       Worker 0: actual time=0.020..1788.417 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 1: actual time=0.021..1787.249 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 2: actual time=0.020..1782.212 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 3: actual time=0.022..1758.902 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 4: actual time=0.020..1781.158 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 5: actual time=0.021..1749.199 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 6: actual time=0.021..1751.445 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 7: actual time=0.021..1741.847 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 8: actual time=0.022..1814.993 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 9: actual time=0.021..1743.618 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 10: actual time=0.022..1725.305 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 11: actual time=0.021..1774.372 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 12: actual time=0.035..1735.574 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 13: actual time=0.023..1747.472 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 14: actual time=0.022..1803.754 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
                                       Worker 15: actual time=0.046..1912.041 rows=10000000 loops=1  
                                         Buffers: shared hit=44248  
 Planning time: 0.137 ms  
 Execution time: 6938.022 ms  
(194 rows)  

例子2 - 1亿 JOIN 1亿

1、PostgreSQL 11,1亿 JOIN 1亿,双表过滤1000万。

过滤条件用到了索引,索引扫描也支持并行扫描。

insert into tbl select id, random()*99 from generate_series(1,100000000) t(id);  
  
postgres=# create index idx_tbl1 on tbl using brin(id);  
CREATE INDEX  

执行计划

postgres=# explain select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;  
                                                       QUERY PLAN                                                         
------------------------------------------------------------------------------------------------------------------------  
 Finalize GroupAggregate  (cost=919498.43..919984.64 rows=100 width=12)  
   Group Key: t1.c1  
   ->  Gather Merge  (cost=919498.43..919975.64 rows=1600 width=12)  
         Workers Planned: 16  
         ->  Partial GroupAggregate  (cost=919498.08..919938.59 rows=100 width=12)  
               Group Key: t1.c1  
               ->  Sort  (cost=919498.08..919644.58 rows=58601 width=4)  
                     Sort Key: t1.c1  
                     ->  Parallel Hash Join  (cost=462502.50..914857.28 rows=58601 width=4)  
                           Hash Cond: (t1.id = t2.id)  
                           ->  Parallel Bitmap Heap Scan on tbl t1  (cost=2444.48..452493.17 rows=605189 width=8)  
                                 Recheck Cond: (id <= 10000000)  
                                 ->  Bitmap Index Scan on idx_tbl1  (cost=0.00..23.72 rows=9690483 width=0)  
                                       Index Cond: (id <= 10000000)  
                           ->  Parallel Hash  (cost=452493.17..452493.17 rows=605189 width=4)  
                                 ->  Parallel Bitmap Heap Scan on tbl t2  (cost=2444.48..452493.17 rows=605189 width=4)  
                                       Recheck Cond: (id <= 10000000)  
                                       ->  Bitmap Index Scan on idx_tbl1  (cost=0.00..23.72 rows=9690483 width=0)  
                                             Index Cond: (id <= 10000000)  
(19 rows)  

响应时间1.24秒

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;  
 c1 | count    
----+--------  
  0 |  50298  
  1 | 101409  
  2 | 101151  
  3 | 100965  
  4 | 101308  
  5 | 100968  
......  
 94 | 100989  
 95 | 100940  
 96 | 100897  
 97 | 101426  
 98 | 101426  
 99 |  50585  
(100 rows)  
  
Time: 1244.262 ms (00:01.244)  

2、PostgreSQL 10

postgres=# explain select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;  
                                                    QUERY PLAN                                                       
-------------------------------------------------------------------------------------------------------------------  
 Finalize GroupAggregate  (cost=1249990.64..1250516.25 rows=100 width=12)  
   Group Key: t1.c1  
   ->  Gather Merge  (cost=1249990.64..1250507.25 rows=1600 width=12)  
         Workers Planned: 16  
         ->  Partial GroupAggregate  (cost=1249990.29..1250470.20 rows=100 width=12)  
               Group Key: t1.c1  
               ->  Sort  (cost=1249990.29..1250149.93 rows=63855 width=4)  
                     Sort Key: t1.c1  
                     ->  Hash Join  (cost=739971.53..1244893.86 rows=63855 width=4)  
                           Hash Cond: (t1.id = t2.id)  
                           ->  Parallel Bitmap Heap Scan on tbl t1  (cost=2553.36..452941.04 rows=631740 width=8)  
                                 Recheck Cond: (id <= 10000000)  
                                 ->  Bitmap Index Scan on idx_tbl1  (cost=0.00..26.40 rows=10124385 width=0)  
                                       Index Cond: (id <= 10000000)  
                           ->  Hash  (cost=571586.17..571586.17 rows=10107840 width=4)  
                                 ->  Bitmap Heap Scan on tbl t2  (cost=2553.36..571586.17 rows=10107840 width=4)  
                                       Recheck Cond: (id <= 10000000)  
                                       ->  Bitmap Index Scan on idx_tbl1  (cost=0.00..26.40 rows=10124385 width=0)  
                                             Index Cond: (id <= 10000000)  
(19 rows)  

耗时6.35秒

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;  
 c1 | count    
----+--------  
  0 |  50298  
  1 | 101409  
  2 | 101151  
  3 | 100965  
  4 | 101308  
  5 | 100968  
......  
 95 | 100760  
 96 | 101514  
 97 | 100543  
 98 | 100944  
 99 |  50756  
(100 rows)  
  
Time: 6354.000 ms (00:06.354)  

1亿 JOIN 1亿,不过滤任何记录

1、PostgreSQL 11,耗时10.7秒

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id) group by t1.c1;  
 c1 |  count    
----+---------  
  0 |  504590  
  1 | 1010766  
  2 | 1010562  
  3 | 1009673  
  4 | 1009991  
......  
 95 | 1008846  
 96 | 1010079  
 97 | 1011009  
 98 | 1009981  
 99 |  505627  
(100 rows)  
  
Time: 10742.472 ms (00:10.742)  

2、PostgreSQL 10,耗时58.3秒

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id) group by t1.c1;  
 c1 |  count    
----+---------  
  0 |  504177  
  1 | 1009780  
  2 | 1009475  
  3 | 1010739  
  4 | 1010680  
......  
 96 | 1010552  
 97 | 1009568  
 98 | 1010606  
 99 |  505210  
(100 rows)  
  
Time: 58297.043 ms (00:58.297)  

1亿 JOIN 1亿,单表过滤1000万

1、PostgreSQL 11,耗时2秒

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000) group by t1.c1;
 c1 | count  
----+--------
  0 |  50520
  1 | 101802
  2 | 101210
......
 94 | 100722
 95 | 101527
 96 | 100719
 97 | 100881
 98 | 101241
 99 |  50460
(100 rows)

Time: 2008.152 ms (00:02.008)

2、PostgreSQL 10,耗时8.5秒

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000) group by t1.c1;
 c1 | count  
----+--------
  0 |  50441
  1 | 100789
  2 | 101040
  3 | 101655
  4 | 100412
......
 93 | 101191
 94 | 100570
 95 | 101345
 96 | 101246
 97 | 101158
 98 | 100746
 99 |  50610
(100 rows)

Time: 8544.481 ms (00:08.544)

10亿 JOIN 10亿,单表过滤1000万

select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000) group by t1.c1;

1、PostgreSQL 11,耗时10秒

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000) group by t1.c1;
 c1 | count  
----+--------
  0 |  50680
  1 | 101203
  2 | 100880
  3 | 100337
  4 | 101399
......
 94 | 100852
 95 | 100929
 96 | 100848
 97 | 100921
 98 | 100962
 99 |  50781
(100 rows)

Time: 10196.189 ms (00:10.196)

2、PostgreSQL 10,耗时37秒

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000) group by t1.c1;
 c1 | count  
----+--------
  0 |  50830
  1 | 100537
  2 | 100821
  3 | 100901
  4 | 100392
  5 | 101244
......
 92 | 100611
 93 | 100328
 94 | 101158
 95 | 101322
 96 | 100827
 97 | 101342
 98 | 101285
 99 |  50821
(100 rows)

Time: 37153.008 ms (00:37.153)

10亿 JOIN 10亿,双表过滤1000万

select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;

1、PostgreSQL 11,耗时0.99秒

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;
 c1 | count  
----+--------
  0 |  50680
  1 | 101203
  2 | 100880
  3 | 100337
  4 | 101399
......
 94 | 100852
 95 | 100929
 96 | 100848
 97 | 100921
 98 | 100962
 99 |  50781
(100 rows)

Time: 993.254 ms

2、PostgreSQL 10,耗时12秒

postgres=# select t1.c1,count(*) from tbl t1 join tbl t2 on (t1.id=t2.id and t1.id<=10000000 and t2.id<=10000000) group by t1.c1;
 c1 | count  
----+--------
  0 |  50830
  1 | 100537
  2 | 100821
  3 | 100901
......
 95 | 101322
 96 | 100827
 97 | 101342
 98 | 101285
 99 |  50821
(100 rows)

Time: 12342.835 ms (00:12.343)

hash agg测试(仅PostgreSQL 11测试)

测试10亿的单表

1、HASH AGG 并行,耗时 11.2 秒

set enable_gathermerge =off;

set work_mem='1GB';
  
postgres=# explain (analyze,verbose) select mod(id,64),count(*) from tbl group by 1;
                                                                         QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=164145929.33..184145930.61 rows=1000000064 width=12) (actual time=10617.537..10618.961 rows=64 loops=1)
   Output: (mod(id, 64)), count(*)
   Group Key: (mod(tbl.id, 64))
   ->  Sort  (cost=164145929.33..166645929.49 rows=1000000064 width=12) (actual time=10617.496..10618.060 rows=4160 loops=1)
         Output: (mod(id, 64)), (PARTIAL count(*))
         Sort Key: (mod(tbl.id, 64))
         Sort Method: quicksort  Memory: 388kB
         ->  Gather  (cost=4698216.52..4893529.03 rows=1000000064 width=12) (actual time=9952.181..10615.292 rows=4160 loops=1)
               Output: (mod(id, 64)), (PARTIAL count(*))
               Workers Planned: 64
               Workers Launched: 64
               ->  Partial HashAggregate  (cost=4698216.52..4893529.03 rows=15625001 width=12) (actual time=8635.257..9278.101 rows=64 loops=65)
                     Output: (mod(id, 64)), PARTIAL count(*)
                     Group Key: mod(tbl.id, 64)
                     Worker 0: actual time=8585.110..9220.770 rows=64 loops=1
                     Worker 1: actual time=8363.083..9009.730 rows=64 loops=1
                     Worker 2: actual time=8506.100..9154.658 rows=64 loops=1
...........
                     Worker 62: actual time=8722.494..9391.633 rows=64 loops=1
                     Worker 63: actual time=8715.185..9347.808 rows=64 loops=1
                     ->  Parallel Seq Scan on public.tbl  (cost=0.00..4620091.51 rows=15625001 width=4) (actual time=0.050..4269.941 rows=15384615 loops=65)
                           Output: mod(id, 64)
                           Worker 0: actual time=0.067..4164.559 rows=15373650 loops=1
                           Worker 1: actual time=0.029..4154.181 rows=14842324 loops=1
                           Worker 2: actual time=0.028..4221.663 rows=15133186 loops=1
.............
                           Worker 61: actual time=0.082..4300.683 rows=15392860 loops=1
                           Worker 62: actual time=0.083..4341.929 rows=15397606 loops=1
                           Worker 63: actual time=0.052..4369.521 rows=15158724 loops=1
 Planning time: 0.080 ms
 Execution time: 11235.180 ms
(146 rows)

2、HASH AGG 串行,耗时 296 秒

postgres=# set work_mem='128GB';
SET

postgres=# set max_parallel_workers_per_gather=0;
SET

postgres=# explain (analyze,verbose) select mod(id,64),count(*) from tbl group by 1;
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=21924780.12..34424780.92 rows=1000000064 width=12) (actual time=270483.965..278182.935 rows=64 loops=1)
   Output: (mod(id, 64)), count(*)
   Group Key: mod(tbl.id, 64)
   ->  Seq Scan on public.tbl  (cost=0.00..16924779.80 rows=1000000064 width=4) (actual time=0.026..128690.659 rows=1000000000 loops=1)
         Output: mod(id, 64)
 Planning time: 0.057 ms
 Execution time: 295974.163 ms
(7 rows)

小结

PostgreSQL 10与11 - hash join性能对比:

CASE PostgreSQL 10 PostgreSQL 11
1千万 JOIN 1千万 5.39秒 1.08秒
1亿 JOIN 1亿(双表过滤1千万) 6.35秒 1.24秒
1亿 JOIN 1亿(单表过滤1千万) 8.5秒 2秒
1亿 JOIN 1亿 58.3秒 10.7秒
10亿 JOIN 10亿(双表过滤1千万) 12秒 1秒
10亿 JOIN 10亿(单表过滤1千万) 37秒 10秒
10亿 hash agg(附加测试) 串行 296秒 并行 11.2秒

PostgreSQL 11, 性能主要取决于参与hash JOIN的行数(通常来说一张大表,可能会有一些条件过滤掉一些数据,再参与JOIN)。

PostgreSQL 11 并行hash join的一个特点,可以把所有CPU核全部用完,例如这里有56核的机器,并行度开到56核时,10亿JOIN10亿,仅花费0.99秒。

PostgreSQL 11 并行hash,不仅仅能用于加速hash join,其他的hash操作(例如hash agg)都适用,提升性能的原理主要是:

PG 9.6虽然支持HASH并行,但是HASH TABLE是每个Worker process不共享的,所以需要多份HASH TABLE。

PG 11,采用了shared hash table,所以只有一份HASH TABLE。

HASH TABLE越大,PG 11体现出来的性能越好。

换句话说,PG 11,不管多大的表JOIN,HASH性能都是杠杠的。

参考

https://commitfest.postgresql.org/16/871/

《PostgreSQL dblink异步调用实现 并行hash分片JOIN - 含数据交、并、差 提速案例》

https://www.postgresql.org/message-id/flat/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com#CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com

https://commitfest.postgresql.org/16/871/

Flag Counter

digoal’s 大量PostgreSQL文章入口