PostgreSQL 10.0 preview 功能增强 - 内置分区表
背景
PostgreSQL 和它的LOGO大象一样,给人非常强大的安全感。
就拿它的Feature来说,一个大的feature要打磨很多年才能正式的合并到master分支。
比如并行计算的特性,从9.4就开始准备,加入了work process和dynamic shared memory的功能,奠定了多进程并行执行的基础。
一致到9.6,经历了3年的开发,大量的测试,终于RELEASE了。
今天要说一说它的分区表,经历了几年的酝酿,终于在10.0加入到master了。(PG的企业版本EDB很早就支持分区表了)
如果你现在需要这个功能,可以使用postgrespro的插件, pg_pathman,否则就等10.0吧,明年9月份左右release。
《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》
PostgreSQL 10.0 分区表的设计
PostgreSQL的分区表依旧使用了继承的特性,好消息是, 终于不需要手工写规则了。
在使用方面,我们需要先创建主表,然后创建分区(即子表)。老顽固啊,就是要让你记住PG是有继承的概念的吗?
目前支持range、list分区(10.0 release时应该会支持更多的方法),分区列的类型必须支持btree索引接口(几乎涵盖所有类型, 后面会说到检查方法)。
语法
1. 创建主表
[ PARTITION BY { RANGE | LIST } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ]
2. 创建分区
PARTITION OF parent_table [ (
{ column_name [ column_constraint [ ... ] ]
| table_constraint }
[, ... ]
) ] FOR VALUES partition_bound_spec
and partition_bound_spec is:
{ IN ( expression [, ...] ) -- list分区
|
FROM ( { expression | UNBOUNDED } [, ...] ) TO ( { expression | UNBOUNDED } [, ...] ) } -- range分区, unbounded表示无限小或无限大
语法解释
1. 创建主表
partition by 指定分区表的类型range或list
指定分区列,或表达式作为分区键。
range分区表键:支持指定多列、或多表达式,支持混合(键,非表达式中的列,会自动添加not null的约束)
list分区表键:支持单个列、或单个表达式
分区键必须有对应的btree索引方法的ops(可以查看系统表得到)
select typname from pg_type where oid in (select opcintype from pg_opclass);
主表不会有任何数据,数据会根据分区规则进入对应的分区表
如果插入数据时,分区键的值没有匹配的分区,会报错
不支持全局的unique, primary key, exclude, foreign key约束,只能在对应的分区建立这些约束
PARTITION BY { RANGE | LIST } ( { column_name | ( expression ) } [ opclass ] [, ...] )
The optional PARTITION BY clause specifies a strategy of partitioning the table.
The table thus created is called a partitioned table. The parenthesized list of columns or expressions forms the partition key for the table.
When using range partitioning, the partition key can include multiple columns or expressions, but for list partitioning, the partition key must consist of a single column or expression.
If no btree operator class is specified when creating a partitioned table, the default btree operator class for the datatype will be used. If there is none, an error will be reported.
A partitioned table is divided into sub-tables (called partitions), which are created using separate CREATE TABLE commands.
The partitioned table is itself empty. A data row inserted into the table is routed to a partition based on the value of columns or expressions in the partition key.
If no existing partition matches the values in the new row, an error will be reported.
Partitioned tables do not support UNIQUE, PRIMARY KEY, EXCLUDE, or FOREIGN KEY constraints;
however, you can define these constraints on individual partitions.
When using range partitioning, a NOT NULL constraint is added to each non-expression column in the partition key.
2. 创建分区
主表创建好之后,需要再创建分区(为了体现继承么?就不能一次干掉?)
创建分区表需要指定它的主表是哪个?
指定分区键的属性(范围,LIST值),范围,LIST不能有重叠(这个很好理解,否则重叠部分的数据到底插到哪个分区去呢?)
分区表和主表的 列数量,定义 必须完全一致,(包括OID也必须一致,要么都有,要么都没有)
可以为分区表的列单独增加Default值,或约束。
用户还可以对分区表增加表级约束
如果新增的分区表check约束,名字与主表的约束名一致,则约束内容必须与主表一致
当用户往主表插入数据库时,记录被自动路由到对应的分区,如果没有合适的分区,则报错
如果更新数据,并且更新后的KEY导致数据需要移动到另一分区,则会报错,(意思是分区键可以更新,但是不支持更新后的数据移出到别的分区表)
修改主表的字段名,字段类型时,会自动同时修改所有的分区
TRUNCATE 主表时,会清除所有继承表分区的记录(如果有多级分区,则会一直清除到所有的直接和间接继承的分区)
如果要清除单个分区,请对分区进行操作
如果要删除分区表,可以使用DROP TABLE的DDL语句,注意这个操作会对主表也加access exclusive lock。
PARTITION OF parent_table FOR VALUES partition_bound_spec
partition_bound_spec is:
{ IN ( expression [, ...] ) |
FROM ( { expression | UNBOUNDED } [, ...] ) TO ( { expression | UNBOUNDED } [, ...] ) }
Creates the table as partition of the specified parent table.
The partition bound specification must correspond to the partitioning method and partition key of the parent table, and must not overlap with any existing partition of that parent.
A partition cannot have columns other than those inherited from the parent. That includes the oid column, which can be specified using the WITH (OIDS) clause.
Defaults and constraints can optionally be specified for each of the inherited columns.
One can also specify table constraints in addition to those inherited from the parent.
If a check constraint with the name matching one of the parent's constraint is specified, it is merged with the latter, provided the specified condition is same.
Rows inserted into a partitioned table will be automatically routed to the correct partition. If no suitable partition exists, an error will occur.
Also, if updating a row in a given partition causes it to move to another partition due to the new partition key, an error will occur.
A partition must have the same column names and types as the table of which it is a partition.
Therefore, modifications to the column names or types of the partitioned table will automatically propagate to all children, as will operations such as TRUNCATE which normally affect a table and all of its inheritance children.
It is also possible to TRUNCATE a partition individually, just as for an inheritance child.
Note that dropping a partition with DROP TABLE requires taking an ACCESS EXCLUSIVE lock on the parent table.
例子
测试版本编译
wget https://git.postgresql.org/gitweb/?p=postgresql.git;a=snapshot;h=55caaaeba877eac1feb6481fb413fa04ae9046ac;sf=tgz
tar -zxvf postgresql-55caaae.tar.gz
cd postgresql-55caaae
./configure --prefix=/home/digoal/pgsql10.0
make world -j 32
make install-world
export PGPORT=5299
export PGDATA=/disk1/pgdata/pg_root10
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql10.0
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
initdb -D $PGDATA -U postgres -E SQL_ASCII --locale=C
vi $PGDATA/postgresql.conf
cat postgresql.conf|grep "^[a-z]"|awk -F "#" '{print $1}'
listen_addresses = '0.0.0.0'
port = 5299
max_connections = 500
superuser_reserved_connections = 13
unix_socket_directories = '.'
shared_buffers = 16GB
maintenance_work_mem = 1024MB
dynamic_shared_memory_type = posix
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 5.0
bgwriter_flush_after = 0
max_parallel_workers_per_gather = 0
max_parallel_workers = 0
old_snapshot_threshold = -1
backend_flush_after = 0
wal_buffers = 512MB
wal_writer_delay = 10ms
wal_writer_flush_after = 0
checkpoint_timeout = 55min
max_wal_size = 128GB
min_wal_size = 8GB
checkpoint_completion_target = 0.5
checkpoint_flush_after = 0
random_page_cost = 1.0
effective_cache_size = 400GB
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
log_autovacuum_min_duration = 0
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
pg_ctl start
查找可用的opclass
由于分区表的键值列必须支持btree索引,所以建议确认一下。
创建主表时,需要指定ops,如果没有指定,会自动选择类型默认的btree ops.
postgres=# select opcintype::regtype,opcname from pg_opclass ;
opcintype | opcname
-----------------------------+------------------------
abstime | abstime_ops
anyarray | array_ops
anyarray | array_ops
bit | bit_ops
boolean | bool_ops
character | bpchar_ops
character | bpchar_ops
。。。。。。
分区表
1. range 分区表
例子为按时间字段分区的分区表,用户在创建主表时,指定分区键,创建分区时,指定每个分区的范围。
1.1 主表
range分区表,支持混合式的分区键(多列,多表达式,或者混合式)
和前面说的一样,不能使用全局PK
postgres=# create table t_range(id int primary key, info text, crt_time timestamp) partition by range (crt_time, mod(hashtext(info), 4));
ERROR: 0A000: primary key constraints are not supported on partitioned tables
LINE 1: create table t_range(id int primary key, info text, crt_time...
^
LOCATION: transformColumnDefinition, parse_utilcmd.c:620
postgres=# create table t_range(id int, info text, crt_time timestamp) partition by range (crt_time);
CREATE TABLE
1.2 分区
postgres=# create table t_range_0_201610 partition of t_range (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01'); -- >= 20161001 and < 20161101
CREATE TABLE
postgres=# create table t_range_0_201611 partition of t_range (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01'); -- >= 20161101 and < 20161201
CREATE TABLE
甚至你可以将分区表放到不同的schema下面
postgres=# create schema x;
CREATE SCHEMA
postgres=# create table x.t_range_0_201612 partition of t_range (id primary key, info , crt_time ) for values from ('2016-12-01') to ('2017-01-01');
CREATE TABLE
postgres=# \d+ t_range
Table "public.t_range"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
crt_time | timestamp without time zone | | not null | | plain | |
Partition key: RANGE (crt_time)
Partitions: t_range_0_201610 FOR VALUES FROM ('2016-10-01 00:00:00') TO ('2016-11-01 00:00:00'),
t_range_0_201611 FOR VALUES FROM ('2016-11-01 00:00:00') TO ('2016-12-01 00:00:00'),
x.t_range_0_201612 FOR VALUES FROM ('2016-12-01 00:00:00') TO ('2017-01-01 00:00:00')
2. list 分区表
2.1 主表
postgres=# create table t_list(id int, info text, crt_time timestamp) partition by list ( mod(hashtext(info), 4) );
CREATE TABLE
2.2 分区
postgres=# create table t_list_0 partition of t_list (id primary key, info , crt_time ) for values in (0);
CREATE TABLE
postgres=# create table t_list_1 partition of t_list (id primary key, info , crt_time ) for values in (1);
CREATE TABLE
postgres=# create table t_list_2 partition of t_list (id primary key, info , crt_time ) for values in (2);
CREATE TABLE
postgres=# create table t_list_3 partition of t_list (id primary key, info , crt_time ) for values in (3);
CREATE TABLE
多级分区
创建主表
create table t_range_list(id int, info text, crt_time timestamp) partition by list ( mod(hashtext(info), 4) );
创建一级分区,主表
create table t_range_list_0 partition of t_range_list (id , info , crt_time ) for values in (0) partition by range (crt_time);
create table t_range_list_1 partition of t_range_list (id , info , crt_time ) for values in (1) partition by range (crt_time);
create table t_range_list_2 partition of t_range_list (id , info , crt_time ) for values in (2) partition by range (crt_time);
create table t_range_list_3 partition of t_range_list (id , info , crt_time ) for values in (3) partition by range (crt_time);
创建2级分区表
create table t_range_list_0_201611 partition of t_range_list_0 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01');
create table t_range_list_0_201612 partition of t_range_list_0 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01');
create table t_range_list_1_201611 partition of t_range_list_1 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01');
create table t_range_list_1_201612 partition of t_range_list_1 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01');
create table t_range_list_2_201611 partition of t_range_list_2 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01');
create table t_range_list_2_201612 partition of t_range_list_2 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01');
create table t_range_list_3_201611 partition of t_range_list_3 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01');
create table t_range_list_3_201612 partition of t_range_list_3 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01');
执行计划
1. 分区键条件建议使用同类型的常量、如果是函数则必须使用返回值与分区键类型一致,stable或immutable的函数。
postgres=# explain select * from t_range where crt_time=now()::timestamp; -- 如果需要转换,转换函数必须为immutable(常量),这样可以在进行逻辑推理前被调用
QUERY PLAN
---------------------------------------------------------------------------
Append (cost=0.00..23929.55 rows=14 width=42)
-> Seq Scan on t_range (cost=0.00..0.00 rows=1 width=44)
Filter: (crt_time = (now())::timestamp without time zone)
-> Seq Scan on t_range_0_201610 (cost=0.00..23870.00 rows=1 width=17)
Filter: (crt_time = (now())::timestamp without time zone)
-> Seq Scan on t_range_0_201611 (cost=0.00..29.78 rows=6 width=44)
Filter: (crt_time = (now())::timestamp without time zone)
-> Seq Scan on t_range_0_201612 (cost=0.00..29.78 rows=6 width=44)
Filter: (crt_time = (now())::timestamp without time zone)
(9 rows)
postgres=# create or replace function sysdate() returns timestamp as $$
select now()::timestamp ;
$$ language sql strict immutable;
CREATE FUNCTION
postgres=# explain select * from t_range where crt_time=sysdate();
QUERY PLAN
----------------------------------------------------------------------------------------
Append (cost=0.00..24.12 rows=7 width=44)
-> Seq Scan on t_range (cost=0.00..0.00 rows=1 width=44)
Filter: (crt_time = '2016-12-16 09:58:03.246322'::timestamp without time zone)
-> Seq Scan on t_range_0_201612 (cost=0.00..24.12 rows=6 width=44)
Filter: (crt_time = '2016-12-16 09:58:03.246322'::timestamp without time zone)
(5 rows)
返回类型一致的C代码stable函数也可以被排除,否则会被解释
postgres=# create or replace function sysdate() returns timestamp as $$
select now()::timestamp ;
$$ language sql strict stable;
CREATE FUNCTION
postgres=# explain select * from t_range where crt_time=sysdate();
QUERY PLAN
---------------------------------------------------------------------------
Append (cost=0.00..23929.55 rows=14 width=42)
-> Seq Scan on t_range (cost=0.00..0.00 rows=1 width=44)
Filter: (crt_time = (now())::timestamp without time zone)
-> Seq Scan on t_range_0_201610 (cost=0.00..23870.00 rows=1 width=17)
Filter: (crt_time = (now())::timestamp without time zone)
-> Seq Scan on t_range_0_201611 (cost=0.00..29.78 rows=6 width=44)
Filter: (crt_time = (now())::timestamp without time zone)
-> Seq Scan on t_range_0_201612 (cost=0.00..29.78 rows=6 width=44)
Filter: (crt_time = (now())::timestamp without time zone)
(9 rows)
返回类型一致的C代码stable函数也可以被排除
postgres=# create table p(id int, info text, crt_time timestamptz);
CREATE TABLE
postgres=# create table t(like p) inherits(p);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "info" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
postgres=# create table p1(like p) inherits(p);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "info" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
postgres=# create table p2(like p) inherits(p);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "info" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
postgres=# alter table t add constraint ck check(crt_time >='2016-11-01' and crt_time <'2016-12-01');
ALTER TABLE
postgres=# alter table p1 add constraint ck check(crt_time >='2016-10-01' and crt_time <'2016-11-01');
ALTER TABLE
postgres=# alter table p2 add constraint ck check(crt_time >='2016-09-01' and crt_time <'2016-10-01');
ALTER TABLE
postgres=# explain select * from p where crt_time=now();
QUERY PLAN
----------------------------------------------------------------------------------------
Append (cost=0.00..0.00 rows=1 width=44)
-> Seq Scan on p (cost=0.00..0.00 rows=1 width=44)
Filter: (crt_time = '2016-12-16 10:02:40.483355+08'::timestamp with time zone)
(3 rows)
绑定分区,解绑分区
ALTER TABLE可以将表绑定到分区表的某个分区,也可以将某个已有分区从分区表剔除。
ALTER TABLE [ IF EXISTS ] name
ATTACH PARTITION partition_name FOR VALUES partition_bound_spec
ALTER TABLE [ IF EXISTS ] name
DETACH PARTITION partition_name
解释
绑定时,需要指定分区键的边界(范围、或LIST值)
列名与主表一致、列类型一致、列顺序一致、NOT NULL、CHECK约束一致、不建议UNIQUE,PK,FK。
不能有任何主表非继承的CHECK约束,(建议对约束重建,不要加not inherit属性)
加入的分区需要进行全表扫描,确认所有的记录都在指定的分区键边界内。
如果要避免全表扫描,建议在执行绑定前,对这个表加入CHECK约束,确保约束可以保证记录都在边界内。 但是这种方法不适用于分区键包含表达式并且分区不允许NULL值的情况。
ATTACH PARTITION partition_name FOR VALUES partition_bound_spec
This form attaches an existing table (which might itself be partitioned) as a partition of the target table using the same syntax for partition_bound_spec as CREATE TABLE.
The partition bound specification must correspond to the partitioning strategy and partition key of the target table.
The table to be attached must have all the same columns as the target table and no more; moreover, the column types must also match.
Also, it must have all the NOT NULL and CHECK constraints of the target table. Currently UNIQUE, PRIMARY KEY, and FOREIGN KEY constraints are not considered.
If any of the CHECK constraints of the table being attached is marked NO INHERIT, the command will fail; such a constraint must be recreated without the NO INHERIT clause.
A full table scan is performed on the table being attached to check that no existing row in the table violates the partition constraint.
It is possible to avoid this scan by adding a valid CHECK constraint to the table that would allow only the rows satisfying the desired partition constraint before running this command.
It will be determined using such a constraint that the table need not be scanned to validate the partition constraint.
This does not work, however, if any of the partition keys is an expression and the partition does not accept NULL values.
If attaching a list partition that will not accept NULL values, also add NOT NULL constraint to the partition key column, unless it's an expression.
DETACH PARTITION partition_name
This form detaches specified partition of the target table. The detached partition continues to exist as a standalone table, but no longer has any ties to the table from which it was detached.
All the actions except RENAME, SET TABLESPACE, SET SCHEMA, ATTACH PARTITION, and DETACH PARTITION can be combined into a list of multiple alterations to apply in parallel.
For example, it is possible to add several columns and/or alter the type of several columns in a single command. This is particularly useful with large tables, since only one pass over the table need be made.
You must own the table to use ALTER TABLE. To change the schema or tablespace of a table, you must also have CREATE privilege on the new schema or tablespace.
To add the table as a new child of a parent table, you must own the parent table as well. Also, to attach a table as a new partition of the table, you must own the table being attached.
To alter the owner, you must also be a direct or indirect member of the new owning role, and that role must have CREATE privilege on the table's schema.
(These restrictions enforce that altering the owner doesn't do anything you couldn't do by dropping and recreating the table.
However, a superuser can alter ownership of any table anyway.) To add a column or alter a column type or use the OF clause, you must also have USAGE privilege on the data type.
性能测试
range 分区表
插入
postgres=# insert into t_range select generate_series(1,1000000),'test','2016-10-01';
INSERT 0 1000000
Time: 3849.514 ms (00:03.850)
直接插分区表
postgres=# truncate t_range;
TRUNCATE TABLE
Time: 72.181 ms
postgres=# insert into t_range_0_201610 select generate_series(1,1000000),'test','2016-10-01';
INSERT 0 1000000
Time: 3587.772 ms (00:03.588)
查询,主表未消除
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_range where id=1 and crt_time='2016-10-01';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..2.45 rows=2 width=30) (actual time=0.037..0.038 rows=1 loops=1)
Buffers: shared hit=4
-> Seq Scan on public.t_range (cost=0.00..0.00 rows=1 width=44) (actual time=0.020..0.020 rows=0 loops=1)
Output: t_range.id, t_range.info, t_range.crt_time
Filter: ((t_range.id = 1) AND (t_range.crt_time = '2016-10-01 00:00:00'::timestamp without time zone))
-> Index Scan using t_range_0_201610_pkey on public.t_range_0_201610 (cost=0.42..2.45 rows=1 width=17) (actual time=0.016..0.016 rows=1 loops=1)
Output: t_range_0_201610.id, t_range_0_201610.info, t_range_0_201610.crt_time
Index Cond: (t_range_0_201610.id = 1)
Filter: (t_range_0_201610.crt_time = '2016-10-01 00:00:00'::timestamp without time zone)
Buffers: shared hit=4
Planning time: 0.248 ms
Execution time: 0.069 ms
(12 rows)
on conflict do
postgres=# insert into t_range select generate_series(1,1000000),'test','2016-10-01' on conflict do nothing;
ERROR: 0A000: ON CONFLICT clause is not supported with partitioned tables
LOCATION: transformInsertStmt, analyze.c:825
Time: 0.350 ms
参考
1. https://www.postgresql.org/docs/devel/static/sql-createtable.html
《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》