PostgreSQL 10.0 preview 功能增强 - 分区表(hash,range,list)

3 minute read

背景

PostgreSQL 10.0将支持range,list分区表，同时hash分区处于POC阶段（同时还有一些需要改进的地方，例如优化器部分）。

如果你使用的是10.0以前的版本，可以使用pg_pathman插件实现分区,pg_pathman已经非常的完美。

PostgreSQL支持伪表作为分区，例如外部表，物化视图。伪表作为分区有很多可以适合的使用场景，例如将外部表作为分区，则可以实现sharding场景。

分区表用法

https://www.postgresql.org/docs/devel/static/sql-createtable.html

《PostgreSQL 10.0 内置分区表》

hash poc 如下

Hi all,  
  
Now we have a declarative partitioning, but hash partitioning is not  
implemented yet. Attached is a POC patch to add the hash partitioning  
feature. I know we will need more discussions about the syntax and other  
specifications before going ahead the project, but I think this runnable  
code might help to discuss what and how we implement this.  
  
* Description  
  
In this patch, the hash partitioning implementation is basically based  
on the list partitioning mechanism. However, partition bounds cannot be  
specified explicitly, but this is used internally as hash partition  
index, which is calculated when a partition is created or attached.  
  
The tentative syntax to create a partitioned table is as bellow;  
  
 CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;  
  
The number of partitions is specified by PARTITIONS, which is currently  
constant and cannot be changed, but I think this is needed to be changed in  
some manner. A hash function is specified by USING. Maybe, specifying hash  
function may be ommitted, and in this case, a default hash function  
corresponding to key type will be used.  
  
A partition table can be create as bellow;  
  
 CREATE TABLE h1 PARTITION OF h;  
 CREATE TABLE h2 PARTITION OF h;  
 CREATE TABLE h3 PARTITION OF h;  
  
FOR VALUES clause cannot be used, and the partition bound is  
calclulated automatically as partition index of single integer value.  
  
When trying create partitions more than the number specified  
by PARTITIONS, it gets an error.  
  
postgres=# create table h4 partition of h;  
ERROR:  cannot create hash partition more than 3 for h  
  
An inserted record is stored in a partition whose index equals  
abs(hashfunc(key)) % <number_of_partitions>. In the above  
example, this is abs(hashint4(i))%3.  
  
postgres=# insert into h (select generate_series(0,20));  
INSERT 0 21  
  
postgres=# select *,tableoid::regclass from h;  
 i  | tableoid   
----+----------  
  0 | h1  
  1 | h1  
  2 | h1  
  4 | h1  
  8 | h1  
 10 | h1  
 11 | h1  
 14 | h1  
 15 | h1  
 17 | h1  
 20 | h1  
  5 | h2  
 12 | h2  
 13 | h2  
 16 | h2  
 19 | h2  
  3 | h3  
  6 | h3  
  7 | h3  
  9 | h3  
 18 | h3  
(21 rows)  
  
* Todo / discussions  
  
In this patch, we cannot change the number of partitions specified  
by PARTITIONS. I we can change this, the partitioning rule  
(<partition index> = abs(hashfunc(key)) % <number_of_partitions>)  
is also changed and then we need reallocatiing records between  
partitions.  
  
In this patch, user can specify a hash function USING. However,  
we migth need default hash functions which are useful and  
proper for hash partitioning.   
  
Currently, even when we issue SELECT query with a condition,  
postgres looks into all partitions regardless of each partition's  
constraint, because this is complicated such like "abs(hashint4(i))%3 = 0".  
  
postgres=# explain select * from h where i = 10;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..125.62 rows=40 width=4)  
   ->  Seq Scan on h  (cost=0.00..0.00 rows=1 width=4)  
         Filter: (i = 10)  
   ->  Seq Scan on h1  (cost=0.00..41.88 rows=13 width=4)  
         Filter: (i = 10)  
   ->  Seq Scan on h2  (cost=0.00..41.88 rows=13 width=4)  
         Filter: (i = 10)  
   ->  Seq Scan on h3  (cost=0.00..41.88 rows=13 width=4)  
         Filter: (i = 10)  
(9 rows)  
  
However, if we modify a condition into a same expression  
as the partitions constraint, postgres can exclude unrelated  
table from search targets. So, we might avoid the problem  
by converting the qual properly before calling predicate_refuted_by().  
  
postgres=# explain select * from h where abs(hashint4(i))%3 = abs(hashint4(10))%3;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..61.00 rows=14 width=4)  
   ->  Seq Scan on h  (cost=0.00..0.00 rows=1 width=4)  
         Filter: ((abs(hashint4(i)) % 3) = 2)  
   ->  Seq Scan on h3  (cost=0.00..61.00 rows=13 width=4)  
         Filter: ((abs(hashint4(i)) % 3) = 2)  
(5 rows)  
  
Best regards,  
Yugo Nagata  
  
--   
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>  

这个patch的讨论，详见邮件组，本文末尾URL。

PostgreSQL社区的作风非常严谨，一个patch可能在邮件组中讨论几个月甚至几年，根据大家的意见反复的修正，patch合并到master已经非常成熟，所以PostgreSQL的稳定性也是远近闻名的。

参考

https://commitfest.postgresql.org/13/1059/

https://www.postgresql.org/message-id/flat/20170228233313.fc14d8b6.nagata@sraoss.co.jp#20170228233313.fc14d8b6.nagata@sraoss.co.jp

digoal’s 大量PostgreSQL文章入口

Twitter Facebook Google+ LinkedIn

Digoal.zhou

PostgreSQL 10.0 preview 功能增强 - 分区表(hash,range,list)

背景

参考

digoal’s 大量PostgreSQL文章入口

You May Also Enjoy

PostgreSQL(PPAS 兼容Oracle) 从零开始入门手册 - 珍藏版

PostgreSQL pipelinedb 流计算插件 - IoT应用 - 实时轨迹聚合

PostgreSQL plpgsql 存储过程、函数 - 状态、异常变量打印、异常捕获… - GET [STACKED] DIAGNOSTICS

PostgreSQL datediff 日期间隔（单位转换）兼容SQL用法