Partitioned Indexes in Practice - Alibaba Cloud RDS PostgreSQL Best Practices


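Background

When a table grows very large, partitioning is the usual answer. A user table, for example, can be hash- or range-partitioned on the user ID and split into many tables.

A behavioral data table, likewise, can be partitioned by time and split into many tables.

Benefits of splitting a table:

1. The pieces can be placed in different tablespaces, and tablespaces map to block devices. Historical data that is large but rarely accessed can live in a tablespace on spinning disks, while active data goes in a tablespace on SSD.

2. Maintenance becomes easier. To remove historical data, simply DROP TABLE the old partition; unlike a bulk DELETE, this writes almost no WAL (redo).

Indexes can be partitioned too, for example by hash of the user ID or by time.

The benefits of a partitioned index are similar to those of a partitioned table, with some extras:

1. Data that never needs to be searched does not have to be indexed.

Take a user table where we only ever search activated users and never search inactive ones. The index can then be built on the activated users only.

2. Differently shaped data can use different index access methods.

Suppose the data in a table is skewed: some values account for a large share of the rows, others for very few. The high-frequency values can use an index method suited to many duplicates, such as GIN (or a bitmap index where the product provides one), while the low-frequency values use btree.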
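The first benefit can be sketched as follows. This is a minimal illustration only: the `users` table, its columns, and the index name are hypothetical and do not come from the original article.

```sql
-- Hypothetical table; names and columns are illustrative only.
create table users (
    uid       int primary key,
    nickname  text,
    activated boolean not null default false
);

-- Only rows with activated = true are indexed;
-- inactive users take up no space in this index.
create index idx_users_nickname_active on users (nickname) where activated;

-- The query repeats the index predicate, so the planner can consider the partial index.
select uid, nickname from users where activated and nickname = 'digoal';
```

A query that omits the `activated` condition cannot use this index, because the index does not cover the inactive rows.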

So how are partitioned indexes implemented in PostgreSQL? Let's take a look.

Global index

First, the global index, which is simply the ordinary index we create every day.

create table test(id int, crt_time timestamp, info text);  
  
create index idx_test_id on test(id);  

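Single-level partitioned index

create table test(id int, crt_time timestamp, info text);

The partitioned indexes are as follows (half-open ranges are used because BETWEEN includes both endpoints, which would make adjacent months overlap)

create index idx_test_id_1 on test(id) where crt_time >= '2017-01-01' and crt_time < '2017-02-01';
create index idx_test_id_2 on test(id) where crt_time >= '2017-02-01' and crt_time < '2017-03-01';
...
create index idx_test_id_12 on test(id) where crt_time >= '2017-12-01' and crt_time < '2018-01-01';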
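To show how such an index gets picked, here is a small sketch of two queries against the `test` table above. The literal dates and the `id` value are arbitrary; whether the planner actually chooses the index also depends on table statistics, so no plan output is shown.

```sql
-- The time range lies entirely inside January 2017, so the predicate of
-- idx_test_id_1 is implied and that index is a candidate for this query.
explain
select * from test
where id = 1
  and crt_time >= '2017-01-10' and crt_time < '2017-01-20';

-- No crt_time condition: the planner cannot prove that the rows fall inside
-- any single index predicate, so none of the monthly indexes can be used.
explain
select * from test where id = 1;
```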

Multi-level partitioned index

create table test(id int, crt_time timestamp, province_code int, info text);

The partitioned indexes are as follows

create index idx_test_id_1_1 on test(id) where crt_time >= '2017-01-01' and crt_time < '2017-02-01' and province_code=1;
create index idx_test_id_1_2 on test(id) where crt_time >= '2017-02-01' and crt_time < '2017-03-01' and province_code=1;
...
create index idx_test_id_1_12 on test(id) where crt_time >= '2017-12-01' and crt_time < '2018-01-01' and province_code=1;

....

create index idx_test_id_2_1 on test(id) where crt_time >= '2017-01-01' and crt_time < '2017-02-01' and province_code=2;
create index idx_test_id_2_2 on test(id) where crt_time >= '2017-02-01' and crt_time < '2017-03-01' and province_code=2;
...
create index idx_test_id_2_12 on test(id) where crt_time >= '2017-12-01' and crt_time < '2018-01-01' and province_code=2;
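Writing dozens of near-identical statements by hand is error-prone, so one option is to generate them. The block below is a sketch, not part of the original article: the list of province codes (1 and 2) and the year 2017 are assumptions taken from the examples above, and it uses half-open month ranges so that adjacent months do not overlap. It assumes PostgreSQL 9.5 or later for `create index if not exists`.

```sql
do $$
declare
    p int;    -- province code
    m date;   -- first day of the month being indexed
begin
    foreach p in array array[1, 2] loop   -- assumed list of province codes
        for i in 0..11 loop
            m := (date '2017-01-01' + make_interval(months => i))::date;
            -- Build and run one CREATE INDEX per (province, month) pair.
            execute format(
                'create index if not exists %I on test (id) '
                'where crt_time >= %L and crt_time < %L and province_code = %s',
                format('idx_test_id_%s_%s', p, i + 1),
                m,
                (m + interval '1 month')::date,
                p
            );
        end loop;
    end loop;
end
$$;
```

Every additional index adds write overhead on INSERT and UPDATE, so in practice it may be worth generating indexes only for the provinces and months that are actually queried.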

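Partitioning by data skew: an example

create table test(uid int, crt_time timestamp, province_code int, info text);

create extension if not exists btree_gin;  -- GIN operator classes for scalar types such as int
create index idx_test_1 on test using gin(uid) where uid<1000;     -- this range holds many duplicate (high-frequency) values; use a GIN index
create index idx_test_2 on test using btree(uid) where uid>=1000;  -- this range holds low-frequency values; use a btree index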
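With the skewed table split this way, an equality search lands on one index or the other depending on the constant. The two queries below are a sketch with arbitrary `uid` values; no plan output is shown because the chosen plan depends on the data.

```sql
-- uid = 5 implies uid < 1000, so the GIN partial index is a candidate.
explain select * from test where uid = 5;

-- uid = 5000 implies uid >= 1000, so the btree partial index is a candidate.
explain select * from test where uid = 5000;
```

The planner proves these implications at plan time from the constants in the query. A prepared statement such as `where uid = $1` may not be able to use either index when it is planned without knowing the parameter value.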

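Summary

1. When searching, include both the index's partition condition and the indexed column in the query, using operators the index supports. The partitioned index can then be used for the search.

2. Partitioned indexes are typically used in multi-condition searches, with the partition condition as one of the conditions. They can of course also be used for searches on a single column.

3. Besides partitioned indexes (partial indexes), PostgreSQL also supports expression indexes and functional indexes.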
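As a brief sketch of the third point, an index can be built on an expression rather than a plain column, and the two features can be combined. The index names below are hypothetical, and the `test` table is the one with the `province_code` column from the earlier examples.

```sql
-- Expression index: supports case-insensitive equality searches on info.
create index idx_test_info_lower on test (lower(info));

-- The query must use the same expression for the index to be considered.
select * from test where lower(info) = 'abc';

-- Expression index combined with a partial-index predicate.
create index idx_test_info_lower_p1 on test (lower(info)) where province_code = 1;
```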

You are welcome to try Alibaba Cloud RDS PostgreSQL.


Digoal.zhou
