Oracle 并行计算 JOIN HINT

2 minute read

背景

Oracle的并行查询在处理JOIN时,可以通过调整优化器的HINT,指定优化器在处理JOIN时表的分布与关联策略。

例如,两张表JOIN时,如果是HASH或者MERGE JOIN,可以使用HASH分片,然后每个并行的计算单元处理一个数据分片,达到提高效率的目的。

又或者可以将某个较小的表进行broadcast,然后另一张表使用随机的分片策略,每个计算单元可以独立的运算。

这些算法与MPP如Greenplum的JOIN策略比较类似。

PostgreSQL 9.6也支持CPU的并行计算了,学习一下Oracle多年的并行计算是非常有必要的。

Oracle 并行计算 JOIN HINT

https://docs.oracle.com/cd/B12037_01/server.101/b10752/hintsref.htm

PQ_DISTRIBUTE

The PQ_DISTRIBUTE hint improves the performance of parallel join operations.

Do this by specifying how rows of joined tables should be distributed among producer and consumer query servers.

Using this hint overrides decisions the optimizer would normally make.

Use the EXPLAIN PLAN statement to identify the distribution chosen by the optimizer.

The optimizer ignores the distribution hint, if both tables are serial.

pq_distribute_hint::=

/*+ PQ_DISTRIBUTE ( [@queryblock] tablespec outer_distribution inner_distribution ) */  

where:

  • outer_distribution is the distribution for the outer table.
  • inner_distribution is the distribution for the inner table.

For a description of the queryblock syntax, see “Specifying a Query Block in a Hint”.

For a description of the tablespec syntax, see “Specifying Global Table Hints”.

There are six combinations for table distribution.

Only a subset of distribution method combinations for the joined tables is valid, as explained in Table 17-1.

Table 17-1 Distribution Hint Combinations

Distribution Interpretation
Hash, Hash Maps the rows of each table to consumer query servers, using a hash function on the join keys. When mapping is complete, each query server performs the join between a pair of resulting partitions. This hint is recommended when the tables are comparable in size and the join operation is implemented by hash-join or sort merge join.
Broadcast, None All rows of the outer table are broadcast to each query server. The inner table rows are randomly partitioned. This hint is recommended when the outer table is very small compared to the inner table. As a general rule, use the Broadcast/None hint when inner table size * number of query servers > outer table size.
None, Broadcast All rows of the inner table are broadcast to each consumer query server. The outer table rows are randomly partitioned. This hint is recommended when the inner table is very small compared to the outer table. As a general rule, use the None/Broadcast hint when inner table size * number of query servers < outer table size.
Partition, None Maps the rows of the outer table, using the partitioning of the inner table. The inner table must be partitioned on the join keys(当内表的JOIN字段正好是它的partition 字段,并且分区个数正好等于或接近并行度时建议使用。外表将依据内表的分区mapping rows。). This hint is recommended when the number of partitions of the outer table is equal to or nearly equal to a multiple of the number of query servers; for example, 14 partitions and 15 query servers. Note: The optimizer ignores this hint if the inner table is not partitioned or not equijoined on the partitioning key.
None, Partition Maps the rows of the inner table using the partitioning of the outer table. The outer table must be partitioned on the join keys. This hint is recommended when the number of partitions of the outer table is equal to or nearly equal to a multiple of the number of query servers; for example, 14 partitions and 15 query servers. Note: The optimizer ignores this hint if the outer table is not partitioned or not equijoined on the partitioning key.
None, None Each query server performs the join operation between a pair of matching partitions, one from each table. Both tables must be equipartitioned on the join keys(当内表和外表都在JOIN字段上进行了分区时,则内表和外表直接在匹配的分区上进行JOIN).

For example: Given two tables, r and s, that are joined using a hash-join, the following query contains a hint to use hash distribution:

SELECT /*+ORDERED PQ_DISTRIBUTE(s HASH, HASH) USE_HASH (s)*/ column_list  
  FROM r,s  
  WHERE r.c=s.c;  

To broadcast the outer table r, the query is:

SELECT /*+ORDERED PQ_DISTRIBUTE(s BROADCAST, NONE) USE_HASH (s) */ column_list  
  FROM r,s  
  WHERE r.c=s.c;  

Flag Counter

digoal’s 大量PostgreSQL文章入口